🚀 New on the DownToZero Blog: Scale-To-Zero PostgreSQL Databases

We took our scale-to-zero philosophy one step further, all the way to PostgreSQL. By combining systemd socket activation and Docker Compose, we built databases that only run when needed, saving compute, memory, and cost. Discover how it works, when it shines, and where it doesn’t.

👉 https://lnkd.in/eskvCrAP

#PostgreSQL #ScaleToZero #CloudComputing #GreenTech #DowntoZero #DevOps #Sustainability
How to scale-to-zero PostgreSQL databases with systemd and Docker Compose
More Relevant Posts
Solving a Kubernetes Storage Challenge with Longhorn

We hit a storage dilemma in Kubernetes: some databases can’t be clustered. If a pod dies on one node, it must restart fast on another with the same data. Another challenge appeared when we needed to migrate a PostgreSQL cluster’s backup into a new Kubernetes environment with a new PostgreSQL (CNPG) cluster, a flow that differs from typical app data restores. After testing our options, it all came down to how we handle Volumes/PVCs.

We evaluated Ceph vs Longhorn.
Ceph ✅ powerful & feature-rich, but resource-heavy and complex to operate at our scale.
Longhorn ✅ lightweight, easy to deploy, and a great fit for our use case.

What we achieved with Longhorn
• Brought up PostgreSQL (CNPG) in a new cluster from existing backups/snapshots.
• Added high availability to single-node SQL databases via replicated volumes. Each volume keeps 2–3 replicas across nodes. If a node/pod fails, the workload can be rescheduled and attach a replica on another node in seconds (controller + scheduler permitting).

Why Longhorn for this scenario?
• Simple to run and resource-friendly
• Kubernetes-native operations (CSI snapshots, backups/DR)
• Fast restore paths for both single-node DBs and CNPG-managed clusters

Ceph still has powerful, unique capabilities, especially at very large scale or when you need unified block/file/object storage, but for our goals, Longhorn was the perfect fit.

🔗 I’ve shared a step-by-step doc on restoring a CNPG cluster from an existing Longhorn backup, links below.
https://lnkd.in/dDvqpFX2
https://lnkd.in/dwzi2UwH

#kubernetes #longhorn #cloudnative #devops #sre #postgresql #cnpg #statefulsets #storage #ceph
Scaling Postgres 388 is released! In this episode, we discuss PG17 and PG18 benchmarks across storage types, more about Postgres locks, sanitizing SQL, and whether a faster software & hardware environment can cause performance problems. https://lnkd.in/eHXb6uwi #Postgres #PostgreSQL
PostgreSQL 18 was officially released on September 25, 2025, and it brings one of the biggest architectural upgrades in years: a next-generation asynchronous I/O subsystem. Before PostgreSQL 18, most disk I/O was synchronous; the new async I/O layer can fetch data without blocking the main process and intelligently batch operations.

This upgrade results in meaningful real-world performance gains:
✅ Faster queries, especially for read-heavy workloads
✅ Higher throughput under concurrency
✅ Better hardware utilization without extra tuning

For teams building scalable products, this means smoother performance during peak usage, without over-engineering infrastructure. If you would like to better understand how this upgrade can impact your architecture and improve scalability in your stack, feel free to reach out to us at hello@whitecodelabs.com

#PostgreSQL #CTO #CIO #CEOPerspective #StartupTech #Scalability #CloudArchitecture #Databases #OpenSource #BackendEngineering
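For readers who want to check this on their own server, here is a minimal sketch of how the new I/O path can be inspected in SQL, assuming a stock PostgreSQL 18 instance; the io_* setting names and the pg_aios view are the AIO-related objects as I understand them from the PG18 release notes, so verify against your version's documentation:

```sql
-- Which asynchronous I/O method is this PostgreSQL 18 server using?
-- Expected values: 'worker' (default), 'io_uring', or 'sync'.
SHOW io_method;

-- List the I/O-related settings introduced alongside the AIO work.
SELECT name, setting, short_desc
FROM pg_settings
WHERE name LIKE 'io_%';

-- Inspect in-flight asynchronous I/O handles (pg_aios is new in PG18).
SELECT * FROM pg_aios;
```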
Excited to be over halfway through #PG18Hacktober: 51% and counting! After years of anticipation, PostgreSQL 18 finally delivers one of its most transformative features: Asynchronous I/O (AIO). In the Day 16 blog, we break down how this long-awaited feature reshapes PostgreSQL’s performance model, freeing backend processes from blocking reads, unlocking better CPU utilization, and setting the stage for future innovations like Direct I/O.

🔍 Inside the post:
🐘 Why AIO matters and how it evolved over 5 years
🐘 The difference between the worker, io_uring, and sync methods (see the sketch after this post)
🐘 What’s next for PostgreSQL, from async writes to Direct I/O

It’s a deep dive into PostgreSQL internals and the performance revolution behind PG18. Check it out here: https://lnkd.in/gF7zzTGw

#PostgreSQL #PG18 #OpenSource #DatabasePerformance #AIO #PG18Hacktober
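As a rough companion to the bullet on the three methods, a hedged sketch of how a PostgreSQL 18 server could be switched between them (superuser access assumed; this is not from the linked post, and io_method changes need a restart):

```sql
-- io_method selects the AIO implementation on PostgreSQL 18:
--   'worker'   - a pool of dedicated I/O worker processes (the default)
--   'io_uring' - Linux io_uring; backends submit I/O themselves
--   'sync'     - the pre-18 synchronous behaviour, kept as a fallback
ALTER SYSTEM SET io_method = 'io_uring';

-- With the 'worker' method, the size of the worker pool is tunable.
ALTER SYSTEM SET io_workers = 6;

-- io_method only takes effect after a server restart; verify with:
SHOW io_method;
```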
🚫 Don’t Do This! Before you scale, tune, or deploy PostgreSQL… learn what not to do first.

Even experienced Postgres teams fall into avoidable traps, from replication missteps to index misuse to HA gaps that break under load.

🎥 Watch our on-demand session: “Don’t Do This! Learn from Common PostgreSQL Mistakes Before You Make Them”

🔍 You’ll learn:
• Common schema & query design mistakes
• Replication pitfalls + HA gotchas
• Scaling misconfigurations to avoid
• How to fix issues before they cause downtime

Save time. Save stress. Save your Postgres.

👉 Watch now & level up your PostgreSQL expertise: https://hubs.la/Q03RkR3y0..

#Postgres #PostgreSQL #DatabaseReliability #DBA #DevOps #DataEngineering #pgEdge #OpenSource #HighAvailability #DistributedPostgres Jimmy Angelakos
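One illustrative example of the index-misuse trap mentioned above (a generic sketch, not taken from the webinar itself): a query along these lines flags indexes that have never been scanned since the last statistics reset, which are often candidates for review before they keep slowing down writes:

```sql
-- Indexes that have never been used since the last statistics reset.
-- Exclude unique/primary-key indexes, and remember that index usage
-- statistics are tracked per node, so check replicas too.
SELECT s.schemaname,
       s.relname                                      AS table_name,
       s.indexrelname                                 AS index_name,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC;
```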
Upgrading PostgreSQL: A potentially multi-million dollar decision that doesn't have to happen yet.

Your PostgreSQL version is reaching end-of-life. Your team estimates 18 months and significant budget to upgrade across hundreds of databases. Sound familiar?

PgLTS from Command Prompt, Inc. extends PostgreSQL support by 3 additional years, giving you eight years total instead of five.

What this means:
— Time to plan properly instead of rushing
— Budget spread over additional years
— FedRAMP compliance maintained
— Support for PostgreSQL 12+

Think of it like extending a lease while you plan your next building, except for your database infrastructure.

Learn more at https://lnkd.in/gpiMGVam

#PostgreSQL #Database #EnterpriseTech #DevOps #EnterpriseIT
5 PostgreSQL patterns for zero-downtime replication.

It's 2 AM during peak load. The primary PostgreSQL instance failed. Write I/O ground to a halt. The team had configured replication, but we overlooked the critical detail of automated promotion for synchronous commit mode.

The battle-tested architectural framework we mandated to prevent the next enterprise-scale outage:

1. Implement Asynchronous Streaming Replication. Use this as your baseline for read scaling and disaster recovery across availability zones (AZs). Monitor lag using pg_stat_replication metrics ingested by Prometheus to ensure replication delay stays below 50ms, mitigating data loss windows.

2. Automate Failover with Patroni/Keepalived. Never trust manual intervention for promotion. Use Patroni, managed by a central orchestration engine like Kubernetes and deployed via Helm charts, to handle automatic leader election and subsequent DNS/service discovery updates seamlessly.

3. Use Quorum Synchronous Commit. For true enterprise HA, demand durability. Set synchronous_standby_names to require confirmation from at least two replica nodes before the commit returns success, preventing data loss during an immediate primary-node failure (see the sketch after this post).

4. Codify Replication Topology in Terraform. Treat the entire replication cluster (including load balancers and connection pooling via PgBouncer) as immutable infrastructure. Deploy changes via ArgoCD for GitOps adherence and enable fast, reliable rollback capability.

5. Practice Chaos Engineering with Replica Testing. Use tools like the AWS Fault Injection Simulator or bespoke Docker environments to intentionally terminate primary nodes. Validate that the cluster self-heals and that the recovery time objective (RTO) remains under 60 seconds consistently.

Durability and automated recovery are foundational requirements; complexity is only acceptable if it guarantees 99.99% uptime.

Patroni or Repmgr: which tool simplified your organization's failover architecture more effectively?

Save this framework for your next infrastructure review.

#SystemDesign #PostgreSQL #SiteReliabilityEngineering #PlatformEngineering
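Point 3 above boils down to a single setting. A minimal sketch, assuming three standbys registered with the hypothetical application_name values replica1, replica2, and replica3:

```sql
-- Quorum-based synchronous commit: a commit only returns success after
-- at least two of the three named standbys confirm the WAL flush.
ALTER SYSTEM SET synchronous_standby_names = 'ANY 2 (replica1, replica2, replica3)';

-- Reload so the setting takes effect without a restart.
SELECT pg_reload_conf();

-- Confirm which standbys are currently counted as synchronous.
SELECT application_name, state, sync_state
FROM pg_stat_replication;
```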
Scaling PostgreSQL is never about hardware alone. The real challenge is keeping performance, availability, and consistency aligned as your workload grows.

This article from pgEdge breaks it down clearly:
• Tune configuration before scaling out.
• Combine vertical and horizontal scaling.
• Use logical replication for distributed load (see the sketch after this post).
• Plan for automatic failover and monitoring from day one.

Practical, production-ready guidance for anyone managing mission-critical Postgres systems.

https://lnkd.in/dH3S7pwv

#PostgreSQL #Scaling #HighAvailability #Replication #Sharding #Partitioning #ReadReplicas #DatabasePerformance #DBA #OpenSource #pgEdge #DataEngineering #DistributedSystems #DatabaseArchitecture #CloudDatabases #PostgresCommunity
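As a minimal sketch of the logical-replication bullet (table, database, host, and user names here are placeholders, not taken from the article):

```sql
-- On the publisher: publish the tables whose read/write load you want
-- to distribute to another node.
CREATE PUBLICATION orders_pub FOR TABLE public.orders, public.order_items;

-- On the subscriber database: create a subscription pointing at the
-- publisher (connection-string values are placeholders).
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=primary.example.internal dbname=shop user=replicator password=***'
  PUBLICATION orders_pub;

-- Monitor the apply worker on the subscriber.
SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;
```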
PostgreSQL Cost Optimization: The Impact of Query Efficiency

Technical leaders often focus on infrastructure sizing, but query performance offers the most direct lever for cost reduction.

The Reality: A single 50ms query running 100,000 times daily consumes roughly 1.4 hours of CPU time. High-frequency, moderately slow queries create more infrastructure strain than occasional slow queries, and they're often overlooked.

Query-level improvements translate directly to reduced infrastructure costs and improved scalability. Command Prompt offers practical guidance for PostgreSQL workload optimization and cost analysis.

Read more in the blog post "Cost Optimization for PostgreSQL: Practical Tips for Technical Teams" by Command Prompt, Inc.'s Director of Client Success & Compliance, Debra Cerda, here: https://lnkd.in/gDmufY8P

#PostgreSQL #CostOptimization #TechnicalLeadership #DatabaseManagement #ProTips #Tips
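To surface the high-frequency, moderately slow queries the post describes, one common approach (not necessarily the one in the linked article) is the pg_stat_statements extension; a sketch, assuming it is listed in shared_preload_libraries:

```sql
-- Requires pg_stat_statements in shared_preload_libraries.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Rank queries by total elapsed time rather than single-call slowness:
-- a 50 ms query called 100,000 times costs ~5,000 s (~1.4 h) per day.
SELECT calls,
       round(mean_exec_time::numeric, 2)          AS mean_ms,
       round(total_exec_time::numeric / 1000, 1)  AS total_s,
       left(query, 80)                            AS query_start
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```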
PostgreSQL replication is easy, until it silently isn’t.

It was a Tuesday afternoon, peak traffic, when the latency alerts started screaming. Our synchronous replica was lagging, badly, and a minor network hiccup almost took down the entire production database. Here’s the battle-tested playbook we now use for rock-solid PostgreSQL HA.

1. We stopped defaulting to async. Speed is tempting, but for our core services, even a few seconds of data loss on failover is unthinkable. We now use synchronous replication for transactions touching the auth and payment tables. The ~5ms latency hit was a small, and honestly necessary, price to pay for a zero RPO.

2. Automate failover like your job depends on it. Because it does. Manual promotion is a recipe for disaster under pressure. We run Patroni with etcd to manage the cluster state and automate leader election. It handles failover in under 30 seconds, which saved us during a partial AWS AZ outage last quarter.

3. PgBouncer is non-negotiable. Your application shouldn't know or care which DB instance is the primary. A central connection pooler like PgBouncer sits in front of the cluster, maintaining persistent connections and redirecting traffic seamlessly after a failover. This completely eliminated the cascade of app-level connection errors we used to see.

4. Monitor replication lag like a hawk. An out-of-sync replica isn't high availability; it's a high-stakes liability. We have a non-negotiable Datadog alert that queries pg_stat_replication and pages the on-call if replay_lag exceeds 500ms for more than two minutes. It’s our best early warning for network saturation (see the sketch after this post).

5. Brutally test your recovery. Your HA setup is just a theory until you pull the plug. We use Terraform to spin up a staging clone of our production DB and run weekly chaos tests: terminating the primary instance, severing network connections via security group rules. It’s the only way to build real confidence.

True high availability isn't about uptime; it's about predictable, tested recovery.

What's the most counter-intuitive failure you've seen in a PostgreSQL HA setup? Save this as a sanity check for your own cluster.

#DevOps #PostgreSQL #HighAvailability #SiteReliability
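Point 4 of the playbook can be expressed as a single check. A sketch of the kind of query an external monitor might run (the 500 ms threshold mirrors the post; the exact Datadog wiring is not shown here):

```sql
-- Flag any standby whose replay lag exceeds 500 ms; an external monitor
-- can page the on-call whenever this returns a non-empty result.
SELECT application_name,
       client_addr,
       state,
       sync_state,
       replay_lag
FROM pg_stat_replication
WHERE replay_lag > interval '500 milliseconds';
```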