Keycloak at Production Scale: High Availability Patterns for Regulated Cloud-Native Platforms
- Ajit Gupta

- Jun 10

Abstract
Keycloak deployments often begin well. They pass user acceptance testing, perform acceptably in staging, and go live without visible issues. The problems usually appear later: session state becomes inconsistent under load, Kubernetes maintenance disrupts authentication, cache replication behaves unpredictably, or a regional failover exposes assumptions that were never tested.
For production environments where authentication is critical infrastructure, Keycloak must be designed as more than an application server. Its reliability depends on early architectural choices around cache topology, session replication, database access, deployment promotion, Kubernetes disruption controls, and observability. Many of these choices are difficult to reverse after go-live.
This paper explains the production patterns that make Keycloak stable at scale: separating the cache tier from the application tier, designing explicit multi-region behavior, supporting blue-green deployment, protecting availability during Kubernetes maintenance, and monitoring the failure modes that matter.
1. Introduction
Keycloak gives engineering teams control over identity infrastructure, custom authentication flows, configuration, and extensions. That control is valuable, but it also means the operating team must make architectural decisions that other platforms may hide.
In production, Keycloak failures are rarely about whether a feature exists. They are usually about whether the deployment model survives real operational pressure: pod restarts, node drains, database bottlenecks, cache replication lag, failed promotion, or configuration mistakes.
Authentication sits on the critical path. If Keycloak is unavailable, downstream services may still be healthy, but users and services may not be able to access them. For that reason, Keycloak should be treated as core platform infrastructure from the beginning.
2. The limit of embedded cache
Keycloak’s default deployment model uses an embedded Infinispan cache for session replication between cluster members. For a single-region deployment with stable networking and modest session volume, this can be acceptable. At larger scale, or in environments with stronger availability requirements, it becomes limiting.
The embedded model couples session state to the Keycloak process. When a pod restarts because of node maintenance, rescheduling, or an application update, the cluster topology changes and must stabilise. Under high concurrency, that instability can surface as session validation failures.
The same model is also weak for multi-region deployments: it cannot provide cross-region session continuity without incurring significant network latency. In active-active topologies, that makes the embedded cache a bottleneck.
The issue is not simply that embedded cache is “bad.” The issue is that it ties application lifecycle and session state too closely together for production environments that require predictable maintenance, scaling, and failover.
3. External cache as the production pattern
The more reliable pattern separates the cache tier from the Keycloak application tier. Infinispan runs as a standalone cluster and handles session replication. Keycloak connects to it as a remote cache store.
This separation has three practical benefits.
First, cache sizing and Keycloak sizing become independent. Session volume creates memory pressure, and that pressure should be handled by the cache tier rather than the application pods.
Second, the cache cluster can be scaled, upgraded, or failed over without directly changing the Keycloak application layer. Maintenance becomes easier to plan and isolate.
Third, cross-region cache replication can use Infinispan’s native cross-site replication, which is built on the JGroups RELAY2 protocol, instead of relying on Keycloak cluster discovery. This makes replication more predictable and easier to monitor.
The original production guidance notes that a three-node Infinispan cluster per region, sized at 16 GB RAM and 4 vCPU per node, has supported approximately 10 million active sessions with a 30-minute idle timeout. Shorter idle timeouts reduce the number of sessions held in memory, and therefore the memory requirement, roughly in proportion.
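The sizing figure above can be turned into a rough planning estimate. The Python sketch below divides total cluster RAM by the session count to get an upper bound on memory per session, then scales the session count linearly with the idle timeout. The function names are illustrative, 16 GB is read as 16 GiB, and JVM overhead, replica copies, and headroom are ignored, so the result is a starting point rather than a sizing rule.

```python
GIB = 1024 ** 3


def bytes_per_session(nodes: int, ram_gib_per_node: float, sessions: int) -> float:
    """Upper bound: total cluster RAM divided by the number of active sessions."""
    return nodes * ram_gib_per_node * GIB / sessions


def sessions_at_timeout(baseline_sessions: int,
                        baseline_timeout_min: float,
                        timeout_min: float) -> int:
    """Scale active sessions linearly with idle timeout (an assumption, not a measurement)."""
    return round(baseline_sessions * timeout_min / baseline_timeout_min)


# Figures quoted in the text: 3 nodes, 16 GB each, ~10 million sessions, 30-minute timeout.
per_session = bytes_per_session(nodes=3, ram_gib_per_node=16, sessions=10_000_000)
print(f"{per_session / 1024:.1f} KiB per session (upper bound)")
print(sessions_at_timeout(10_000_000, 30, 15), "sessions expected at a 15-minute timeout")
```

Measuring real per-session memory in a load test is still the safer basis for capacity planning, because session size depends on the realm's mappers, client count, and stored notes.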
The principle is clear: session state should be treated as a dedicated production concern, not as incidental memory inside Keycloak pods.
4. Multi-region session behavior
A true active-active multi-region deployment requires explicit decisions about session ownership and failover behavior. Keycloak does not make those decisions automatically.
A stable pattern is to route session creation to a primary region using weighted DNS routing or a global load balancer policy. The creating region holds the authoritative session copy. The cache layer replicates session state to the secondary region asynchronously.
Replication lag must be monitored as a production metric. If lag exceeds a defined threshold, alerts should fire before users experience session failures.
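A lag alert of this kind can be sketched as a small classifier over recent lag samples. The thresholds (500 ms warning, 2000 ms critical) and the names `LagStatus` and `evaluate_lag` are hypothetical; in practice the samples would come from whatever metrics system scrapes the cache tier, and the thresholds from the session-loss tolerance agreed for failover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LagStatus:
    level: str                      # "ok", "warning", or "critical"
    worst_lag_ms: Optional[float]   # None when no samples were received


def evaluate_lag(samples_ms: list[float],
                 warn_ms: float = 500.0,
                 crit_ms: float = 2000.0) -> LagStatus:
    """Classify the worst replication lag seen in the evaluation window."""
    if not samples_ms:
        # Missing data usually means the metric pipeline is broken,
        # which is itself worth paging on.
        return LagStatus("critical", None)
    worst = max(samples_ms)
    if worst >= crit_ms:
        return LagStatus("critical", worst)
    if worst >= warn_ms:
        return LagStatus("warning", worst)
    return LagStatus("ok", worst)


print(evaluate_lag([120.0, 340.0, 610.0]))
```

Alerting on the worst sample in the window is deliberately conservative; a percentile over the window would be a reasonable alternative if single spikes prove noisy.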
During regional failover, the secondary region promotes its cache to authoritative. Sessions that have not yet replicated are invalidated and require re-authentication. This is an intentional trade-off: synchronous replication would close that gap, but it would add a cross-region round trip to session writes, and that latency is generally unacceptable for authentication flows.
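The size of that trade-off can be estimated with simple arithmetic: the number of sessions exposed at failover is roughly the session creation rate multiplied by the replication lag at that moment. The sketch below is a back-of-the-envelope model only. The function name and example figures are hypothetical, and it counts newly created sessions, not updates to existing ones.

```python
import math


def sessions_at_risk(new_sessions_per_sec: float, replication_lag_sec: float) -> int:
    """Approximate count of sessions created but not yet replicated at failover time."""
    return math.ceil(new_sessions_per_sec * replication_lag_sec)


# Hypothetical load: 200 logins per second with 1.5 seconds of replication lag.
print(sessions_at_risk(200, 1.5), "sessions would need to re-authenticate")
```

Working backwards from an acceptable number of forced re-authentications gives a defensible alert threshold for replication lag.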
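Persistent state must also be available after failover. A database read replica in the secondary region lets Keycloak read user records and realm configuration without cross-region reads. Because Keycloak also writes to the database, the failover procedure should include promoting that replica to accept writes.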
5. Blue-green deployment and rollback
Keycloak upgrades, realm configuration changes, and custom extension updates can all interrupt service if handled as simple in-place changes. The safer pattern is blue-green promotion.
Two production-equivalent environments exist at the same time. The active environment serves traffic. Changes are applied to the inactive environment, validated, and then traffic is promoted. The previous environment is retained for a defined period as a rollback target.
This pattern depends on three conditions.
Database schema changes must be backward-compatible. If a migration breaks the previous application version, rollback is no longer reliable.
Session state must be shared between environments or drained before promotion. Sessions created in the blue environment must remain valid after traffic moves to green.
Configuration and code changes must be coordinated. A realm configuration that references a custom extension before that extension exists in the active environment will fail immediately.
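The three conditions above can be encoded as an explicit gate in a deployment pipeline. The sketch below is illustrative only: the `PromotionChecks` fields are hypothetical flags that a pipeline would have to populate from its own migration, session, and extension checks, and nothing here calls Keycloak itself.

```python
from dataclasses import dataclass


@dataclass
class PromotionChecks:
    schema_backward_compatible: bool     # previous version still runs on the new schema
    sessions_shared_or_drained: bool     # existing sessions stay valid after the switch
    extensions_deployed_in_target: bool  # every referenced extension exists in the target


def promotion_blockers(checks: PromotionChecks) -> list[str]:
    """Return the reasons promotion must not proceed; an empty list means go."""
    blockers = []
    if not checks.schema_backward_compatible:
        blockers.append("schema change is not backward-compatible, so rollback is unreliable")
    if not checks.sessions_shared_or_drained:
        blockers.append("sessions are neither shared between environments nor drained")
    if not checks.extensions_deployed_in_target:
        blockers.append("realm configuration references an extension missing from the target")
    return blockers


def can_promote(checks: PromotionChecks) -> bool:
    return not promotion_blockers(checks)


print(promotion_blockers(PromotionChecks(True, False, True)))
```

Returning the list of blockers rather than a bare boolean keeps the failure reason visible in pipeline logs.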
Blue-green deployment is not just a release pattern. For Keycloak, it is a way to reduce authentication downtime during upgrades, configuration changes, and extension releases.
6. Kubernetes disruption controls
Replica count alone does not guarantee availability. Without a Pod Disruption Budget (PDB), voluntary Kubernetes operations such as node drains, cluster upgrades, patching, or autoscaler scale-down events can terminate multiple Keycloak pods at once.
For a three-pod Keycloak deployment, a PDB with minAvailable: 2 ensures that at least two pods remain running during voluntary disruptions. If a node drain would violate that constraint, Kubernetes refuses the eviction, and the drain waits until it can proceed safely.
The Infinispan cache cluster needs its own PDB because cache quorum matters separately from Keycloak pod availability. Keycloak pods may be healthy, but if the cache cluster loses quorum, authentication can still fail.
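As a minimal sketch, the helper below builds a `policy/v1` PodDisruptionBudget manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The resource names and the `app` labels are assumptions and must match the labels on the real pods. `allowed_disruptions` mirrors how the budget behaves: healthy pods minus `minAvailable`, never below zero.

```python
import json


def pdb_manifest(name: str, app_label: str, min_available: int) -> dict:
    """Build a PodDisruptionBudget manifest (labels and names are illustrative)."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions the budget permits right now."""
    return max(0, healthy_pods - min_available)


# Separate budgets for the application tier and the cache tier.
for manifest in (pdb_manifest("keycloak-pdb", "keycloak", 2),
                 pdb_manifest("infinispan-pdb", "infinispan", 2)):
    print(json.dumps(manifest, indent=2))

print(allowed_disruptions(healthy_pods=3, min_available=2))
print(allowed_disruptions(healthy_pods=2, min_available=2))
```

With three healthy pods one eviction is permitted; once a pod is already down, the budget permits none, which is what stalls a node drain.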
PDBs only protect against voluntary disruptions. Involuntary failures still require sufficient replicas and anti-affinity rules to distribute pods across nodes and availability zones.
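Whether a given pod placement tolerates an involuntary zone failure can be checked with a few lines of Python. The sketch below assumes the zone of each pod is already known, for example from node labels. The function name and the zone names are hypothetical.

```python
from collections import Counter


def survives_zone_loss(pod_zones: list[str], min_available: int) -> bool:
    """True if losing any single zone still leaves at least min_available pods."""
    if not pod_zones:
        return min_available <= 0
    per_zone = Counter(pod_zones)
    total = len(pod_zones)
    return all(total - count >= min_available for count in per_zone.values())


# Three pods spread across three zones tolerate the loss of any one zone.
print(survives_zone_loss(["zone-a", "zone-b", "zone-c"], min_available=2))
# Two pods packed into the same zone do not.
print(survives_zone_loss(["zone-a", "zone-a", "zone-b"], min_available=2))
```

A check like this is a useful assertion in a post-deployment test, since anti-affinity expressed as a preference rather than a requirement can be silently ignored by the scheduler under resource pressure.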
7. Observability and incident response
A production Keycloak deployment must expose the signals that predict authentication failure.
The minimum useful observability set includes authentication event rate by flow type and outcome, cache replication lag, database connection pool utilisation, and token issuance and introspection latency at the 95th and 99th percentiles.
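For teams validating dashboards against raw load-test data, tail latency can be computed directly from samples. The sketch below uses the nearest-rank method; the function name and the sample data are illustrative. Production monitoring systems typically estimate percentiles from histograms instead, so small differences from this calculation are expected.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical token issuance latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = [float(ms) for ms in range(1, 101)]
print("p95:", percentile(latencies_ms, 95))
print("p99:", percentile(latencies_ms, 99))
```

Tracking the 95th and 99th percentiles rather than the mean matters because a small fraction of slow token requests can stall downstream services while the average still looks healthy.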
These metrics matter because they map directly to real failure modes. Authentication failure spikes may indicate credential attacks or downstream misconfiguration. Cache lag can cause session errors. Database pool exhaustion can make authentication failures look like application errors. Latency increases may reveal configuration or infrastructure problems before they become outages.
Runbooks should cover cache split-brain, database replica lag, configuration corruption, and failed deployment promotion. Each runbook should include detection, recovery, and rollback steps. Most importantly, these paths should be tested before they are needed.
Conclusion
Keycloak can operate successfully as a production identity platform, but stable deployments are the result of deliberate architecture rather than default installation choices.
The key pattern is separation of concerns. Keycloak should handle authentication. The cache tier should handle session state. The database layer should support persistent state and regional recovery. Kubernetes should enforce disruption limits. Deployment pipelines should support promotion and rollback. Observability should expose failure signals early.
A deployment that works in staging is not necessarily ready for long-running production use. A deployment that survives pod restarts, cache failover, regional promotion, blue-green release, token refresh load, and rollback is much closer to the standard required for critical identity infrastructure.
Writer’s Overview
Ajit Gupta – Co-Founder & CEO, Midships
Ajit leads Midships Group’s transition from a specialist identity consultancy to a portfolio of autonomous, AI-native business units. He focuses on long-term business relevance through platform thinking, customer outcomes, and scalable operating models.
Short bio: Ajit is a strategic founder with deep expertise in IAM, platform delivery, and AI services, driving Midships’ expansion across Asia, the Middle East, and beyond.


