Kubernetes Outage and Recovery

Taweh Ruhle
May 27

Why Kubernetes Pod Disruption Budgets (PDBs) Are Essential for High Availability

Zero downtime is now a requirement rather than a nice-to-have. Organizations expect high availability and a seamless user experience, especially when CI/CD pipelines deliver features and updates frequently. Kubernetes, when configured correctly, supports this objective through features like rolling deployments, auto-scaling, and fault-tolerant architecture.

Yet even the best Kubernetes setup is exposed to disruptions that the platform itself initiates, such as node upgrades, whether they are planned or triggered by a misconfigured infrastructure pipeline. This is where Pod Disruption Budgets (PDBs) play a crucial role.

The Foundations of High Availability in Kubernetes

To achieve resilient systems, teams typically:

Deploy services across multiple availability zones or regions.
Adopt advanced deployment strategies like rolling updates, canary, or blue/green.
Align Kubernetes deployment mechanisms with the service’s SLOs and traffic patterns.

These practices ensure uptime in most failure scenarios, but they don't guard against platform-level maintenance events such as node replacements, cluster autoscaler scale-downs, or OS patching. During these disruptions, Kubernetes may evict pods without any awareness of how many the application needs to keep serving traffic. Enter the PDB.

What Is a Pod Disruption Budget (PDB)?

A Pod Disruption Budget tells Kubernetes how many pods of a specific application must remain available during voluntary disruptions, such as node drains or upgrades. It does not apply to involuntary disruptions such as hardware failures. It helps maintain service reliability while allowing essential maintenance to proceed.

A PDB defines exactly one of two mutually exclusive constraints:

minAvailable: The minimum number or percentage of pods that must be available at any time.
maxUnavailable: The maximum number or percentage of pods that can be disrupted simultaneously.

These settings are critical in preventing service degradation or downtime during node lifecycle events.
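As a rough illustration of how the two constraints translate into a number of permitted evictions, the following Python sketch models the arithmetic. The function names are hypothetical and not part of any Kubernetes client. The round-up of percentages follows what the Kubernetes documentation describes; everything else is a simplification of what the cluster actually computes.

```python
from typing import Optional, Union

IntOrPercent = Union[int, str]


def scaled_value(value: IntOrPercent, total: int) -> int:
    """Resolve an absolute count or a percentage string such as "50%".

    Percentages are rounded up to the next whole pod.
    """
    if isinstance(value, str):
        if not value.endswith("%"):
            raise ValueError(f"expected a percentage like '50%', got {value!r}")
        percent = int(value[:-1])
        # Integer ceiling division avoids floating-point rounding surprises.
        return (total * percent + 99) // 100
    return value


def allowed_disruptions(
    expected_pods: int,
    healthy_pods: int,
    min_available: Optional[IntOrPercent] = None,
    max_unavailable: Optional[IntOrPercent] = None,
) -> int:
    """Return how many pods may be evicted right now (simplified model)."""
    # The two constraints are mutually exclusive: exactly one must be set.
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = scaled_value(min_available, expected_pods)
    else:
        unavailable = scaled_value(max_unavailable, expected_pods)
        desired_healthy = max(0, expected_pods - unavailable)
    return max(0, healthy_pods - desired_healthy)


# Four pods, all healthy, at least three must stay up: one eviction is allowed.
print(allowed_disruptions(4, 4, min_available=3))
# One pod is already down: no further eviction is allowed.
print(allowed_disruptions(4, 3, min_available=3))
```

With a percentage such as `min_available="50%"` on 7 pods, the sketch requires 4 healthy pods, which leaves room for 3 evictions.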

Real-World Scenario: Without PDB

Let’s explore a concrete example:

You’re running a User Access Management service across three availability zones (s#a, s#b, s#c) within a Kubernetes cluster in the Singapore region:

s#a: 1 pod
s#b: 1 pod
s#c: 2 pods
Total: 4 pods
Minimum of 3 pods needed to handle typical load
An HPA (Horizontal Pod Autoscaler) is in place to scale on demand

Now imagine a cluster upgrade is triggered—either planned or mistakenly initiated by an infrastructure pipeline:

Case 1: Rolling Node Upgrades (No PDB)

Kubernetes drains nodes one at a time, evicting every pod on a node together; assume replacement pods are running again before the next drain starts:

s#a node drains → 3 pods remain → service operates normally.
s#b node drains → 3 pods remain → still fine.
s#c node drains → both of its pods are evicted → only 2 pods remain, below the minimum of 3 → potential service degradation or user impact, especially if pods on that node were handling active sessions or spikes in traffic.
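The sequence above can be reproduced with a few lines of Python. This is only a toy model under two stated assumptions: all pods on a drained node are evicted at once, and replacements are back before the next node is drained. The function name and data layout are illustrative.

```python
from typing import Dict, List


def drain_without_pdb(pods_per_zone: Dict[str, int], drain_order: List[str]) -> Dict[str, int]:
    """Return how many pods are still running while each zone's node is drained.

    Assumption: every pod on the drained node goes away together, and the
    pod count is restored before the next drain begins.
    """
    total = sum(pods_per_zone.values())
    return {zone: total - pods_per_zone[zone] for zone in drain_order}


zones = {"s#a": 1, "s#b": 1, "s#c": 2}
required = 3  # minimum pods needed for typical load, from the scenario

remaining = drain_without_pdb(zones, ["s#a", "s#b", "s#c"])
for zone, count in remaining.items():
    status = "ok" if count >= required else "below minimum"
    print(f"{zone} drained: {count} pods remain ({status})")
```

Only the s#c drain falls below the required count, because that node hosts two of the four pods.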

Case 2: Blue/Green Cluster Replacement

A new cluster (green) is spun up, and workloads are migrated from the old (blue) one:

More controlled, but complex and potentially expensive.
Even then, migration sequencing without PDB constraints can still cause transient unavailability.

Real-World Scenario: With PDB

In the same scenario, suppose a PDB with minAvailable: 3 is applied to the application:

s#a node drains → 3 pods remain → allowed.
s#b node drains → 3 pods remain → allowed.
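s#c node attempts to drain → evicting both pods would reduce the pod count to 2 → the first eviction is allowed, the second is refused until a replacement pod is ready.

Thanks to the PDB, Kubernetes refuses any eviction that would violate the budget, so the drain (and the upgrade behind it) pauses until the pod availability condition is met, maintaining performance and uptime guarantees.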
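To make the scenario concrete, here is a sketch of what such a PDB might look like, together with a small simulation of a drain that respects it. The manifest is built as a Python dictionary and printed as JSON, which kubectl accepts alongside YAML. The resource name and the `app` label are hypothetical, and the simulation is a simplified model in which refusals apply to individual evictions rather than to the whole upgrade. It is not a reproduction of the real eviction logic.

```python
import json
from typing import List

# Hypothetical name and label; adjust to match the real workload.
pdb_manifest = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "user-access-management-pdb"},
    "spec": {
        "minAvailable": 3,
        "selector": {"matchLabels": {"app": "user-access-management"}},
    },
}


def drain_node(pods_on_node: int, desired_replicas: int, min_available: int) -> List[str]:
    """Evict pods one at a time, checking the budget before each eviction.

    Simplified model: a refused eviction waits for one replacement pod to
    become ready on another node, then retries. If no replacement is
    pending, the drain stays blocked.
    """
    events = []
    healthy = desired_replicas
    remaining = pods_on_node
    while remaining > 0:
        if healthy - 1 >= min_available:
            healthy -= 1
            remaining -= 1
            events.append(f"evicted, {healthy} healthy")
        elif healthy < desired_replicas:
            healthy += 1  # a replacement pod became ready elsewhere
            events.append(f"refused, waited for replacement, {healthy} healthy")
        else:
            events.append("refused, drain blocked")
            break
    return events


print(json.dumps(pdb_manifest, indent=2))
budget = pdb_manifest["spec"]["minAvailable"]
for event in drain_node(pods_on_node=2, desired_replicas=4, min_available=budget):
    print(event)
```

In this model the s#c drain never takes the service below 3 healthy pods. Setting the budget equal to the replica count (for example `min_available=4` with 4 replicas) leaves no room for any eviction, so the drain stays blocked until someone intervenes.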

Benefits of Using Pod Disruption Budgets

Implementing PDBs yields numerous operational benefits:

Minimizes downtime during voluntary disruptions, safeguarding customer experience; for quorum-based stateful workloads, it also helps avoid losing quorum.
Ensures resilience during cluster events, keeping critical applications functional.
Offers fine-grained control over how many pods can be taken down at once.
Reduces the need for manual oversight during disruptive platform events.
Encourages intentional architecture, especially in multi-zone deployments.

Best Practices and Recommendations

While not always mandatory, every production-grade application should define a PDB.
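Choose between minAvailable and maxUnavailable based on the application’s scaling model: an absolute minAvailable stays fixed while an autoscaler changes the replica count, whereas maxUnavailable adapts to it.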
Platform administrators should consider enforcing admission policies to block deployments without PDBs.
Combine PDBs with robust observability, chaos testing, and cluster lifecycle management for a holistic strategy.
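The idea of blocking deployments without PDBs can be prototyped as a simple audit before it is turned into an admission policy. The sketch below is one possible approach, not a real admission controller: it takes workload pod labels and PDB selectors as plain dictionaries (in practice these would come from the cluster API), supports only matchLabels-style selectors, and uses hypothetical workload names.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selector_matches(match_labels: Labels, pod_labels: Labels) -> bool:
    """True when every key/value pair in the selector is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def workloads_without_pdb(workloads: Dict[str, Labels], pdb_selectors: List[Labels]) -> List[str]:
    """Return the names of workloads whose pod labels no PDB selector covers."""
    return sorted(
        name
        for name, labels in workloads.items()
        if not any(selector_matches(selector, labels) for selector in pdb_selectors)
    )


# Hypothetical inventory: workload name -> pod template labels.
workloads = {
    "user-access-management": {"app": "user-access-management", "tier": "backend"},
    "reports": {"app": "reports"},
}
pdb_selectors = [{"app": "user-access-management"}]

uncovered = workloads_without_pdb(workloads, pdb_selectors)
print("Workloads without a PDB:", uncovered)
```

A policy engine or CI check could reject a change when this list is non-empty; how strictly to enforce that is a platform decision.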

Final Thoughts

Pod Disruption Budgets are your safety net. They ensure that your services remain stable during the very events that keep your cluster healthy and secure. If you’re deploying production workloads on Kubernetes without PDBs, now’s the time to start.

Writer’s Overview

Taweh Ruhle – Co-Founder & Head of DevSecOps & Cloud, Midships

Taweh leads Midships’ DevSecOps and Cloud services, with nearly two decades of experience delivering secure, automated solutions for enterprise environments. He’s the creator of Midships’ ForgeRock Accelerator and a pioneer in CI/CD architecture.

Short bio: Taweh is a cloud security expert and delivery architect, known for designing and implementing complex DevOps ecosystems across regulated industries.

📧 taweh@midships.io

🔗 LinkedIn: Taweh Ruhle

Stronger Identity,
Happier Customers.
