- Deploying a Highly Available ForgeRock Stack on Kubernetes
For this blog I am sharing my experience of deploying #ForgeRock #accessmanager and #directoryservices in a highly available manner onto a #Kubernetes cluster.

About Juan Redondo
I am a full stack developer with experience across #IAM, #Kubernetes, #Cloud, and #DevOps. I am accredited on #ForgeRock Access Manager and have Mentor Status. For any queries or feedback you may have, please contact me on juan@midships.io

Now that we have explained in a previous post how our accelerator implements the CIA triad for our sensitive data, it is time to move on to configuring our ForgeRock stack to achieve the same practice at the application level. Let's assume that the target architecture to be deployed is the following one: As you can observe, each of the components is deployed in HA, with the DS instances making use of the replication protocol that can be easily enabled in our accelerator during deployment, and AM using the clustering configuration (site) that ForgeRock offers. You should also note that DS instances do not automatically scale horizontally due to their stateful nature (they are deployed as stateful sets), whereas the AM deployment can be configured to auto-scale based on incoming traffic load, due to its stateless nature.

The end state of our Kubernetes cluster should look like the following: We can also verify that automatic self-replication has been enabled for our DS instances. We can use the DS user-store pods as an example to check this feature: As observed, the two configured user-store pods are replicating all the data from the ou=users LDAP branch, which holds all the user entries that have been created during registration in AM. It is important to mention that our ForgeRock accelerator also takes care of the host naming convention of the DS instances (userstore, tokenstore, configstore), which is based on the number of replicas specified beforehand while configuring the deployment. This ensures that all the replicas have connectivity to one another so they can replicate the data.

The next step is to verify how the AM clustering configuration has been applied, considering that there are some caveats when deploying AM in HA, mainly related to the use of secrets in a clustered environment (https://bugster.forgerock.org/jira/browse/OPENAM-14771). The bug that the ticket describes will cause the following Catalina stack trace on your secondary AM instance once scaled: This exception is caused by the fact that during the deployment of the primary AM instance, AM creates a /secrets/encrypted/storepass and /secrets/encrypted/entrypass entry, containing randomly generated hashes used to access the AM keystore and the keystore aliases respectively. These entries are not automatically created on the secondary server, thereby preventing AM from opening the keystore and from using any of the aliases to sign tokens/assertions issued by that AM instance. This also results in an exception being thrown when trying to access the JWK_URI endpoint for token verification on all the secondary AM instances: In a VM world, the solution to this issue would be to copy the keystore files and password entries (/secrets/encrypted/storepass and /secrets/encrypted/entrypass) from the primary instance to all the secondary instances, followed by a restart of the secondary instances so that the new keys can be picked up by AM during start-up. This can all be easily automated using an IT automation tool such as Ansible.
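Going back to the DS side for a moment, the stable, predictable hostnames that replication relies on typically come from a StatefulSet fronted by a headless Service. Below is a minimal, illustrative sketch of that pattern; the names, image, port and sizes are placeholders rather than the accelerator's actual manifests.

```yaml
# Illustrative only: a headless Service plus StatefulSet gives each DS replica a
# stable DNS name (userstore-0.userstore, userstore-1.userstore, ...) that the
# replication configuration can rely on.
apiVersion: v1
kind: Service
metadata:
  name: userstore
spec:
  clusterIP: None              # headless: creates per-pod DNS records
  selector:
    app: userstore
  ports:
    - name: ldaps
      port: 1636
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: userstore
spec:
  serviceName: userstore       # pods become userstore-0.userstore, userstore-1.userstore, ...
  replicas: 2                  # should match the replica count given to the deployment configuration
  selector:
    matchLabels:
      app: userstore
  template:
    metadata:
      labels:
        app: userstore
    spec:
      containers:
        - name: userstore
          image: my-registry/forgerock-ds:latest   # hypothetical image name
          ports:
            - containerPort: 1636
              name: ldaps
          volumeMounts:
            - name: data
              mountPath: /opt/ds/data              # illustrative data path
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```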
Back to the keystore alignment problem: in Kubernetes this operation is a bit more complex, since restarting the container can kill the pod's main process, leaving the pod stuck in a continuous “Error” status. Our accelerator has been designed to take care of this issue by storing and retrieving the keystore files and password entries at runtime, and by automatically managing the restart of the secondary AM servers, so that once you configure the desired number of AM replicas during deployment, you do not need to worry about key alignment and can instead focus on using the features ForgeRock can offer! I hope this solves a few frustrations for those trying to resolve high availability issues when deploying ForgeRock on Kubernetes, Juan
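For illustration only, one simplified way to keep every AM replica opening the same keystore is to hold the keystore password entries in a single Kubernetes Secret and mount it into each pod. This is a hedged sketch of that idea, not how the Midships accelerator actually implements it; the secret name, image and mount path are assumptions.

```yaml
# Simplified illustration (NOT the accelerator's implementation): keep the
# keystore password entries in one Secret and mount it into every AM replica,
# so each instance can open the same keystore with the same passwords.
apiVersion: v1
kind: Secret
metadata:
  name: am-keystore-passwords                  # placeholder name
type: Opaque
stringData:
  storepass: replace-with-keystore-password    # placeholder values
  entrypass: replace-with-entry-password
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: am
spec:
  replicas: 2
  selector:
    matchLabels:
      app: am
  template:
    metadata:
      labels:
        app: am
    spec:
      containers:
        - name: am
          image: my-registry/forgerock-am:latest   # hypothetical image name
          volumeMounts:
            - name: keystore-passwords
              mountPath: /secrets/encrypted        # illustrative path matching the entries above
              readOnly: true
      volumes:
        - name: keystore-passwords
          secret:
            secretName: am-keystore-passwords
```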
- How to keep your ForgeRock configuration secure when deploying to Kubernetes
For my second blog I thought it might be interesting to address the problem of how to keep your #ForgeRock configuration secure when deploying to #Kubernetes.

About Juan Redondo
I am a full stack developer with experience across #IAM, #Kubernetes, #Cloud, and #DevOps. I am accredited on #ForgeRock Access Manager and have Mentor Status. For any queries or feedback you may have, please contact me on juan@midships.io

Now that you have decided to move your ForgeRock deployment to K8s, you might be concerned about two important areas in your architecture. Yes, we are talking about High Availability (HA) and Secrets Management. As an enterprise, you will want to adhere to the well-known CIA triad in security policy development (Confidentiality-Integrity-Availability). So, the questions we need to answer are: How do we implement these practices in our brand-new ForgeRock K8s deployment? Does it differ in any way from the standard approaches taken in the virtual server world? To answer these questions, we will rely on the out-of-the-box settings provided in our ForgeRock accelerator.

One of the key features that our accelerator provides is a Secrets Management solution (#Hashicorp Vault) that takes care of retrieving the required secrets at runtime for each of the components of the ForgeRock stack (AM, Config store, User store and Token store), as observed in the CI/CD architecture below: We use the Vault to store all of our ForgeRock-related secrets (certificates, keys, passwords, etc.). In addition, our accelerator uses the Vault to hold bespoke, customer-specific configuration. This ensures that all this sensitive data is centrally managed, remains secure and can be environment-specific. Since the Vault can be scaled, it also ensures that the secret information is always available to the ForgeRock components at runtime. This approach also removes the dependency on the Kubernetes Secrets implementation, which would rely on multi-cluster deployments to provide the same HA for the secrets used by the ForgeRock stack.

Once we trigger a deployment, we can observe in the pod logs how each component's configuration and secrets are securely pulled from the Vault paths at runtime. Taking the AM pod as an example, we can observe how the certificates and the application passwords are retrieved from the Vault: Once the deployment is finished, you will have a running ForgeRock stack fully integrated with a secure Secrets Management solution, which is used to centrally manage your ForgeRock application configuration and any sensitive data. I hope you found this helpful. If you have any queries please let me know. Juan
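To make the runtime-retrieval idea concrete, here is one common pattern for pulling secrets from Vault as a pod starts, using the HashiCorp Vault Agent Injector annotations. This is only a hedged sketch: it assumes the injector is installed in the cluster, and the Vault role, secret path, service account and image are placeholders, not the accelerator's actual mechanism or paths.

```yaml
# Common pattern (assuming the Vault Agent Injector is installed): annotations on
# the pod template cause an agent container to fetch secrets from Vault at
# start-up and write them under /vault/secrets inside the pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: am
spec:
  replicas: 2
  selector:
    matchLabels:
      app: am
  template:
    metadata:
      labels:
        app: am
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "forgerock-am"                                   # placeholder Vault role
        vault.hashicorp.com/agent-inject-secret-am-passwords: "secret/data/forgerock/am"  # placeholder path
    spec:
      serviceAccountName: am               # bound to the Vault role via Kubernetes auth
      containers:
        - name: am
          image: my-registry/forgerock-am:latest   # hypothetical image name
```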
- So you want to deploy the ForgeRock User Store on Kubernetes?
This post is aimed at #ForgeRock practitioners who are deploying the user store on #Kubernetes. ForgeRock's advice currently recommends: "... deploying DS in VMs as the processes to do maintenance and troubleshooting are well mastered..." however they "understand and accept that customers choose to deploy in Kubernetes, as long as they understand the limitations". Refer to: https://forum.forgerock.com/2019/06/directory-services-docker-kubernetes-friends-foes/ In this paper we will explain how Midships mitigate the risks associated with deploying the ForgeRock User Store on Kubernetes. Please reach out to us if you have any queries.

Utilise the benefits of Kubernetes (Auto-Restarts and Statefulsets)
Our ForgeRock Accelerator leverages the benefits that come with using Kubernetes, namely: [1] auto-scaling; [2] auto-restart on failure; and [3] persistent storage. [2] and [3] are leveraged by the User, Configuration and Token stores, whereas [1] is used by the Access Manager. Kubernetes orchestrates stateful applications using a combination of its “stateful sets”, “persistent volume” and “persistent volume claims” frameworks. All stores are set up with persistent volumes using a “Retain” reclaim policy (a sketch follows at the end of this post). This ensures that when the application is deleted, the data remains for later use. Note: Persistent Volumes use disks outside of the Kubernetes estate, with varying performance and costs, just like Virtual Machines, i.e. the risk of data loss through physical failure is similar to that of Virtual Machines. On most cloud providers you can utilise block-level data storage products that are low-latency, high-performing, durable and reliable.

Deploy Multiple User Store Instances
Our ForgeRock Accelerator by default deploys a minimum of two User Store instances in each region in an active-active state, ensuring high availability and data redundancy. Each instance has its own dedicated storage. Prior to production, we recommend that customers size the User Store to support peak production loads so that, in the event of a User Store failure, the impact to customers is minimised.

Data Replication
By default, all User, Configuration and Token Stores are deployed and configured with self-replication turned on to ensure that all instances are kept in sync. Where replication is required across regions or cloud providers, replication servers can be used. Following the restart of a failed User Store instance, the already running instances will bring it up to date by replicating any information that was added, removed or modified while it was unavailable.

Regular Backups
We provide our customers with a runbook on how to take regular snapshots of the underlying cloud disks supporting the persistent volumes and how to restore in the event of a disaster. Note we have procedures for AWS, GCP, Azure, AliCloud and OCI. Note that we recommend customers move to a multi-region and/or multi-cloud model to provide an additional layer of resilience where possible. In the event the underlying storage does fail, the procedures provide an expedient mechanism to recover customer accounts. Please contact us if you have any queries, require clarification or would like to discuss other topics relating to #ForgeRock #Kubernetes #DevOps.
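As an illustration of the “Retain” reclaim policy mentioned above, here is a minimal StorageClass sketch. The provisioner shown is the GCP persistent disk CSI driver purely as an example; substitute your cloud provider's driver. The class name, disk type and expansion setting are placeholders rather than the accelerator's actual configuration; the store's StatefulSet would reference this class via storageClassName in its volumeClaimTemplates.

```yaml
# Illustrative StorageClass: volumes provisioned from it are kept (Retain) when
# the PVC or StatefulSet that used them is deleted, so the data survives.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ds-retain                           # placeholder name
provisioner: pd.csi.storage.gke.io          # example: GCP PD CSI driver; use your provider's driver
reclaimPolicy: Retain                       # keep the underlying disk after the claim is deleted
volumeBindingMode: WaitForFirstConsumer     # provision the disk in the same zone as the pod
allowVolumeExpansion: true
parameters:
  type: pd-ssd                              # example disk type
```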
- Backup your containerised ForgeRock User Store reliably today!
This is my first blog and is aimed at any ForgeRock SMEs looking to deploy Directory Services on Kubernetes, providing instructions on how to back up and quickly recover data in the event of a major loss of data. Before I get started, let me introduce myself. My name is Juan Redondo and I am a full stack developer with experience across #IAM, #Kubernetes, #Cloud, and #DevOps. I am accredited on #ForgeRock Access Manager and also have Mentor Status. For any queries or feedback you may have 😊, and any other topics you would like me to pick up in the future, please contact me on juan@midships.io

Ok, so you have decided to migrate your on-premises ForgeRock deployment to the cloud and you want to automate your deployments using Kubernetes to adopt the newest DevOps practices and benefits (open source, scalability, resource management, automation, etc.), but what about handling incidents such as data loss and recovering from them? After all, is Kubernetes really geared to handle stateful data sets (and critical ones such as the user store)? At Midships we have developed an accelerator which consistently and reliably deploys a full ForgeRock stack on K8s, as illustrated below, using a combination of persistent volumes, persistent volume claims and a Kubernetes cluster. Our architecture is highly available with self-replication between the DS instances turned on, i.e. if we lose one DS instance we can rebuild easily from the other. However, what happens if either the DS Token Store or User Store becomes corrupt and inadvertently replicates the corruption to the other instances? We don't want to lose data, otherwise authentications and authorisations will fail, and if we cannot recover, this could have a long-standing impact on customers.

Below is how we back up and restore the Directory Services on #AliCloud; the approach is similar for #GCP, #AWS and #Azure too. In the AliCloud console, under Elastic Compute Service->Disks, we will first take a snapshot of the DS User Store persistent volume disks. Once the snapshot is ready it will be available in the Snapshots section: We will then proceed to create a Persistent Volume using the generated snapshot image. This can be achieved under Elastic Compute Service->Disks->Create Disk. We will create the disk using the snapshot generated in the previous step: At this point we already have a backup of our User Store instances and have securely created a disk which can be used to mount a PV in case data gets corrupted in the PV already mounted in the DS User Store instances. This can be automated to run at regular intervals as required. The image below shows how to create a new PV under Container Services Kubernetes->Persistent Volumes. We will just need to specify the ID of the disk that was created in the previous step: Once the PV has been created, we will also create a PVC under Container Services Kubernetes->Persistent Volume Claims and associate the PVC with the newly created PV.

LET'S RESTORE
As part of our restore testing, we are going to delete the helm release for the DS User Store as well as all the associated PVs that the DS User Store pods were mounted on.
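For reference, the manually created PV and PVC described above might look roughly like the following when expressed as manifests for the Alibaba Cloud disk CSI driver. The disk ID, names and sizes are placeholders, and this is only a sketch of the console steps, not the accelerator's own templates.

```yaml
# Illustrative PV/PVC pair: the PV points at the disk created from the snapshot,
# and the PVC binds explicitly to that PV (no dynamic provisioning).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: userstore-restore-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: diskplugin.csi.alibabacloud.com
    volumeHandle: d-xxxxxxxxxxxxxxxxxxxx   # ID of the disk created from the snapshot
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: userstore-restore-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""                     # empty: do not dynamically provision a new volume
  volumeName: userstore-restore-pv         # bind explicitly to the PV above
  resources:
    requests:
      storage: 20Gi
```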
We should now see that the newly created PV from the previous step is displayed in the list of PVs in the defined namespace and that the original DS User Store PV has been deleted: The final step to test the restore procedure is to specify in the DS User Store deployment YAML the PVC that we created, which is bound to the PV containing our data (identities) to be restored, since we do not want Kubernetes to dynamically provision a new PV in this case (a sketch of this wiring follows at the end of this post). Now it is time to deploy the DS User Store helm chart again and wait until the PV is mounted in the DS User Store pod. If we describe the DS User Store pod, it should look like the following: Also, if we check the DS User Store pod logs, we will see that the Midships ForgeRock accelerator code for this component detects that the instance has already been configured, because we are mounting a PV on which the DS User Store was previously configured and which already holds user identities, as the pod logs describe: Finally, we can access the AM instance and verify under the Identities section that we have successfully recovered the identities that were in the DS User Store instance when we took the backup of the PV.

As we have observed, the process of taking a backup of a PV and restoring it does not differ much from the procedures and operations we would run in a VM environment. To summarise, if you are deploying Directory Services on Kubernetes, make sure you have multiple replicas (ideally across availability zones and regions) and take regular backups, as it is easy to restore. Happy backup! Juan
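As promised, here is a hedged sketch of what pointing the User Store at the restored claim might look like; the image, paths and names are assumptions rather than the accelerator's actual chart values. With a StatefulSet, an alternative is simply to name the restored PVC to match the claim the volumeClaimTemplate would generate (for example data-userstore-0) so the existing volume is adopted.

```yaml
# Hypothetical fragment: mount the restored PVC into the User Store pod instead
# of letting Kubernetes provision a fresh volume.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: userstore
spec:
  serviceName: userstore
  replicas: 1
  selector:
    matchLabels:
      app: userstore
  template:
    metadata:
      labels:
        app: userstore
    spec:
      containers:
        - name: userstore
          image: my-registry/forgerock-ds:latest   # hypothetical image name
          volumeMounts:
            - name: data
              mountPath: /opt/ds/data              # illustrative data path
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: userstore-restore-pvc       # the PVC bound to the restored PV
```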
- Stretched / Multi Zone Cluster Considerations
At a recent client, the subject of Stretched / Multizone Clusters came up. The following provides a summary of what these are and some of the considerations that should be taken into account when using them.

DEFINITIONS
A Stretched Cluster (as illustrated below) is defined as a cluster which has worker nodes distributed across multiple Availability Zones (AZ) within a single region, thereby providing resilience in the event of a single Availability Zone failure.

Stretched / Multizone Clustering Limitations of Note: (refer to: https://kubernetes.io/docs/setup/best-practices/multiple-zones/)
- Kubernetes assumes that the different zones are located close to each other in the network and, as a result, it does not perform any zone-aware routing. Inter-service traffic may therefore cross zones (even if the same pods exist in the caller's zone), incurring additional latency and cost.
- Whilst nodes are in multiple zones, by default kube-up currently builds a single master node. So while services are highly available and can tolerate the loss of a zone, the control plane is located in a single zone. Where a highly available control plane is required, this needs to be configured (refer to: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/).
- Communications between pods on a cluster are not encrypted by default.
- Pods will only be able to attach to Persistent Volumes in the same Availability Zone as themselves.
- Kubernetes will distribute pods across nodes in multiple Availability Zones using its “SelectorSpreadPriority”, which works on a best-endeavours basis (not guaranteed).

KEY CONSIDERATIONS

BENEFITS
- The cluster is highly available in the event of a single AZ failure. All of the pods/containers deployed to the remaining AZs will continue to operate.
- Efficient resource allocation, as all pods share the single cluster's resources, which means higher overall utilisation.
- It is easier to manage a single cluster as opposed to multiple clusters across AZs.

CHALLENGES
- The cluster itself remains a single point of failure (SPOF). It is not uncommon for Kubernetes upgrades to lead to cluster failures or to require full restarts (refer to: https://www.alibabacloud.com/help/doc-detail/86497.htm). Mitigations include: deploy a second stretched cluster; ensure any upgrades are tested in lower environments (and operated for a period of time) before applying to production.
- Increased security risk where multiple services share a common stretched cluster. If one application running on the cluster is compromised and the attackers gain access to the underlying host, they may be able to gain access to other services running on the cluster. Mitigations over and above hardening include: separating sensitive workloads by either having dedicated clusters or by using node pools.
- Pods may not be equally distributed across the AZs, thereby making the failure of one AZ more impactful than another. Mitigations include: use Kubernetes constraints such as “podAffinity”, “podAntiAffinity”, “nodeAffinity”, etc. (see the sketch after this list); and ensure that each service has at least one instance in each AZ.
- Network latency may impact performance where AZs are not located near one another, given that Kubernetes does not intelligently route inter-service traffic so as to use services from the same zone where available. Mitigations include: performance test to understand the impact if key services are being called from another AZ.
- Misconfigurations, or poor adoption of best practices such as configuring resource limits, can affect all services hosted on the cluster. For example, where limits are not applied, a pod/container could consume resources required by other services, thereby causing failures. Mitigations include: ensure Kubernetes best practices are adhered to (refer to: https://kubernetes.io/docs/tasks/administer-cluster/securing-a-cluster/).

#multizone #kubernetes
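To make the scheduling and resource mitigations above concrete, below is a minimal, illustrative Deployment fragment that prefers spreading replicas across zones and nodes and sets explicit resource requests and limits. The labels, image and numbers are placeholders, not recommendations for any specific workload.

```yaml
# Illustrative only: prefer placing replicas in different zones and on different
# nodes, and cap each container's resource usage so one workload cannot starve others.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service                                     # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: topology.kubernetes.io/zone   # spread replicas across AZs
                labelSelector:
                  matchLabels:
                    app: my-service
            - weight: 50
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname        # avoid stacking replicas on one node
                labelSelector:
                  matchLabels:
                    app: my-service
      containers:
        - name: my-service
          image: my-registry/my-service:latest             # placeholder image
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 512Mi
```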