29 items found for ""
- Midships’ Approach To Digital Transaction Signing
Executive Summary In this paper, we presents Midships' solution for digital transaction signing that enhances the security and integrity of digital banking services. The solution is built on Passwordless and FIDO principles, utilising device-level PKI to facilitate transaction signing. The private key is securely stored in the user's device, while the public key is registered with the backend. This method ensures zero knowledge of the user's signing credential on the backend system, shifting the burden of proof to the customer in the event of repudiation. Through digital signatures and JWTs injected with the transaction details, this solution provides clear and indisputable evidence of a transaction's authenticity. Midships can provide expertise and capabilities to financial service organisations, integrating this cutting-edge solution into their existing infrastructures. Background & Purpose For financial institutions, it is imperative to maintain the integrity and security of their digital banking services. One essential aspect is to incorporate digital transaction signing to ensure the bank’s ability to demonstrate that a transaction could only have been authorised by the customer. This ability is crucial for managing liability and customer repudiation. The process of transaction signing can provide a secure and reliable method of verifying and authenticating digital transactions. By utilising digital signatures, we can unequivocally link a transaction to a specific individual, thereby providing clear and indisputable evidence of a transaction's authenticity. The presence of transaction signing will not only enhance the security of the digitalised services but also allow the bank to effectively counter repudiation claims. In order for these desired outcomes to be realised, the digital transaction signing solution must address a few key considerations, notably: Zero Knowledge : The concept of zero knowledge has been widely adopted in user password implementations to ensure that the exposure of users’ stored credentials in the backend does not amount to a password compromised situation. The same concept is equally applicable for a transaction signing solution where the the backend system should have zero knowledge of the user’s credential that is to be used to sign a transaction, thereby is able to effectively shift the burden of proof from the bank to the customer in the event of repudiation. Transaction Binding : Transaction signing is more than step-up authentication enforced by transactional authorisation. A key outcome through the process of transaction signing is to bind the attributes of a transaction to an identity such that the authenticity established illustrates clearly the transaction that has been authorised. This is different from a pure step-up authentication where we just want to authenticate the user on a transactional basis in order to provide an extra layer of security. This paper outlines Midships' digital transaction signing solution that addresses the above considerations and expectations. We provide a straightforward and effective solution framework on transaction signing that can be implemented on any tech stack. We also offer a readily implementable Ping Identity (ForgeRock) native solution. Our approach, which has been successfully implemented with a digital bank in Indonesia, provided a frictionless and secure transaction signing experience on the mobile banking application. Midships' Solution Midships’ digital transaction signing solution is built on Passwordless and FIDO principles. Through the establishment of a device level PKI, similar to FIDO2 passkeys, customers can carry out transaction signing using their devices. As the device holds the private key and the backend is registered with the public key, the device will be able to sign a transaction where the digital signature can be verified by the backend. For end-to-end integration, our solution make use of the OAuth 2.0 protocol where JWTs (JSON Web Tokens) containing the transaction attributes are used as the proof of transaction signing completion. Our approach allows the realisation of the below desired outcomes: Shift Of Proof : As the private key on the customer device was never held by the bank (zero knowledge), the bank no longer shares the burden of proof on whether or not the private key was compromised. Via reasonable T&Cs terms to indicate that the customers are responsible for their device access, our solution will allow the bank to achieve non-repudiation by demonstrating that the transaction could only have been signed by the customer. Enhanced Security : As the mobile device itself serves as the basis of authentication, the solution provides additional security through a Passwordless approach. The customer experience to carry out transaction signing will also be seamless as only OS level authorisation gestures are involved to unlock the private key for signing . Moreover, since there is no OTP sent to the customer, there are no opportunities exposed for malicious actors to trick customers into sharing any credentials. Clear Evidence : In the solution, we make use of JWT that is injected with the transaction attributes that has been signed and verified to act as an immutable evidence that the user has authorised the transaction. A transaction will be submitted with its corresponding JWT such that the backend service that is handling the transaction can rely on the details contained in the JWT to ensure only the authorised transaction is executed. Clean Integration : This solution has a clear separation of concerns. The entire transaction signing process is managed by the bank's Customer Identity & Access Management (CIAM) system. End-to-end integration is straightforward via the use of JWT for validation. Our integration approach can address stateless and immutable requirements for different type of enterprise architecture. How it works Registration & Enrolment To allow transaction signing ability on the user’s device, an authenticated user will need to first enrol the device with the CIAM system via registration of the device’s meta data (with unique deviceID) and a public key. The private/public key pair needs to be generated by the device such that the private key will be stored in the device’s secure enclave, ensuring zero exposure of the private key outside of the device. The banking app can configure private key access policy to necessitate local authentication using OS-level authenticator options, such as face ID or device pin to further protect the private key. Digital Transaction Signing On Registered Device The transaction signing process on registered device involves the banking app to first initiate transaction signing with the CIAM system by informing the CIAM system on the transaction details such as the amount and the beneficiary that the user want to carryout. The CIAM system will then facilitate the app to sign those transaction details. To carryout the signing, the app will ask for the user to carryout an authorisation gesture that can be in the form of a faceID/device pin validation which will unlock the private key. The generated digital signature can then be submitted to the CIAM system for verification using the registered public key against the transaction details. Upon successful digital signature verification, the CIAM system will coordinate with the banking app to issue an short lived transaction token (JWT) that is injected with the transaction details. Once the banking app receives the transaction token, it can submit the transaction request with the transaction token as the authorisation header. The API gateway that is protecting the payment service can locally validate the transaction token to control the access. The payment service can rely on the details contained in the transaction token to ensure the transaction request has not been tampered and only authorised transaction is executed. Digital Transaction Signing using Registered Device In our solution, a registered device can also be used for transaction signing when the transaction is initiated on a separate application. For example, the user can be initiating a transaction via the internet banking channel on the browser. The internet banking channel will first initiate transaction signing with the CIAM system by informing the CIAM system on the transaction details that the user want to carryout. The CIAM system will then send push notification to the user’s registered device and facilitate the transaction signing. While the user access his/her registered device to sign the transaction, the internet banking channel will poll against the CIAM system for the result. Once the user has completed the transaction signing process, the CIAM system will coordinate with the internet banking channel to issue the transaction token. Once the banking app receives the transaction token, it can submit the transaction request with the transaction token as the authorisation header. Other Considerations/Decisions Non-Replayability As the transaction token (JWT) can be configured to have an extremely short expiry, the time window for replay is minimised. However this approach does not guarantee non-replayability. To achieve non-replayability of the transaction token, there are 2 approaches in our solution. First is to validate the transaction token against the CIAM system which will also blacklist the transaction token for one time use. CIAM systems that support OAuth 2.0 flows typically also provide token revocation capabilities. Second is for the API gateway to utilise a cache service such as Redis to blacklist the transaction token’s JWT ID (JTI) with the time to live equal to the token’s expiry. This approach will allow the transaction token validation to be carried locally on the API gateway and at the same time guarantee non-replayability. Extend with transaction Pin In the above solution, we included only local OS-level authenticator options as part of the transaction signing process. We can also easily include other forms of authentication options such as a transaction pin. Integration of other authentication options in our solution will only require the CIAM service to facilitate the authentication process prior to the unlock of the private key. Offline support As a registered device can sign transactions and generate transaction signatures without an internet connection, our solution will allow organisations to utilise the transaction signing ability to handle cases where the customer has limited access to internet connectivity. This will allow more flexibility in the way organisations design their customer journeys. Are you interested? If you would like to learn more, please contact ajit@midships.io
- Agent Verification: A low cost and simple approach to battle Social Engineering!
Executive Summary This paper outlines Midships’ approach to combat social engineering frauds, focusing on the scenario of phone call-based attacks. The solution presented in this paper enables agent verification via Time-based One-Time Password (TOTP) technology that can prevent malicious actors from successfully impersonating call center staff. The solution allows customers to authenticate an agent from the organization using their mobile devices even in offline mode. Midships can provide expertise and capabilities to organisations, integrating this solution into their existing infrastructures to better protect their customers. Background & Purpose Frauds involving social engineering is a significant security concern for organisations as it directly targets the human element, often the weakest link in the security chain. Finding a solution to mitigate the risk of customers becoming victims of fraud from social engineering can be extremely challenging as this problem revolves around human nature and psychology. In this paper, we focus on social engineering through phone calls. Compare to other mediums of communication, phone calls allow the malicious actor to exert pressure and practise psychological techniques such as cold reading and neuro linguistic programming. Therefore phone call based social engineering has a better chance for the malicious actor to deceive the victim into performing financial transactions or sharing certain sensitive information. To effectively mitigate this risk, we have identified the below key considerations : Agent Authentication : In order for the malicious actor to pull off the attack, he/she will need to first impersonate a call center staff from the organisation. Therefore to allow the customer to have the ability to authenticate the agent easily will be an effective approach to block the attack from being successfully executed. Journey Standardisation : The customer journey when interacting with a genuine staff must be different from fraudulent actors. By including agent authentication as part of the SOP for call center staffs and informing the customers on this SOP will be important to mitigate the risk of social engineering attacks. As the customers get accustomed to the standard journey that involves them carrying out an agent authentication, they will be less likely to fall into the traps of malicious actors even in a stressful situation. This paper outlines Midships' agent verification solution that is built on TOTP technology. We provide a straightforward and effective solution framework on agent verification that can be implemented on any tech stack. We also offer a readily implementable Ping Identity (ForgeRock) native solution. For any organisation that provide mobile applications to their customers, this solution will grant customers the ability to authenticate an agent from the organisation using their mobile phones. Midships Solution Midships’ agent verification solution is built on TOTP technology. When customers onboard to the mobile application on their mobile devices, the device registration journey will include a TOTP seed exchange using secure key exchange protocol. This seed is to be securely stored in both the device and in the backend. This will allows the same OTP to be generated by both the device and the backend that can be used for agent authentication. Our approach allows the realisation of the below desired outcomes: Bi-Directional Authentication : As TOTP is a symmetric cryptographic algorithm, this allow bi-directional authentication to be possible. An OTP can be generated by the backend and that OTP can be verified by the device. This will allow a user to established the authenticity of the agent. Wide Coverage : As almost everybody these days owns a smart phone, this approach will be able to cover most if not all customers. This will allow inclusion of the agent authentication into SOPs to have little concerns over customer coverage. Offline Support : For the device to generate and verify TOTP, it does not require any internet connectivity. This will allow customers that has limited internet access on their mobile devices to be able to carryout agent authentication in offline manner. How it works Registration & Enrolment To enable agent verification ability on the mobile device, the device must first be registered with the Customer Identity & Access Management (CIAM) system. During the device enrolment process, the CIAM system will generate a TOTP seed for agent verification which will be registered to the user profile and also returned to the mobile app via secure key exchange protocol. The mobile app will store the TOTP seed in the device’s secure enclave and can configure access policy to necessitate local authentication using OS-level authenticator options, such as face ID or device pin to further protect the TOTP seed. Agent Verification The agent verification process begins with the agent retrieving an TOTP from the CIAM system. The CIAM system will protect the agent verification TOTP generation endpoint such that only authenticated and authorised agent can access this endpoint. The agent will need to provide the CIAM system with the user identifier which the CIAM system will use to pinpoint the seed for TOTP generation. Upon receiving the TOTP, the agent can call the customer and provide the TOTP. The customer can verify this TOTP locally on the device as the device can generate the same TOTP using the same seed stored. Other Considerations Seed Dedication And TOTP Time Step The TOTP seed for agent verification must be dedicated such that the TOTP generated can only be used for agent verification and nothing else. This is particularly important for organisations that already provide TOTP features for user authentication. The seed used for user authentication should never be the same seed used for agent verification. This will ensure the exposure of TOTP to the agent does not amount to opportunities of internal hacking. The expiry of an TOTP depends on the time step used in TOTP generation. As agent interacts with the customer in real-time, the time step used for agent verification should be in the range of a few minutes to minimise the time window of the TOTP replayability. In addition, to handle time out of sync cases, the CIAM system will need to provide a time sync API for the mobile app to periodically carryout time synchronisation. Other Communication Medium This solution can also be applied to email and SMS where the message includes a TOTP for the customer to verify the authenticity of the message. As email and SMS do not involve real time interaction with the user, the time step used for TOTP generation will need to be extended from minutes to hours. The message should also inform the user when the OTP will be expired and provide a customer journey where the customer can request for the message to be resent with a new TOTP. Are you interested? If you would like to learn more, please contact ajit@midships.io
- Hybrid CIAM Architecture With ID Cloud
Control CIAM Resiliency with Hybrid Architecture This paper discusses Midships' design and implementation of a hybrid Customer Identity & Access Management (CIAM) architecture. The hybrid solution is proposed to address concerns about shifting resiliency responsibilities entirely to Software as a Service (SaaS) CIAM services, such as ID Cloud. The hybrid architecture utilises an on-premise CIAM deployment as a standby in the event that ID Cloud is unavailable. Where ID Cloud is unavailable, this hybrid design allows for a near-zero recovery point objective (RPO) and a recovery time objective (RTO) within minutes. We cover how to: Manage user data synchronisation and traffic control; Options for the type of standby deployment are given to balance RTO and cost; and, How Midships uses one pipeline to ensure both the ID Cloud and the on-premise deployment always share the same configuration. By adopting this hybrid architecture, organisations can achieve a highly resilient CIAM architecture and retain control to act swiftly in the event of a disaster. Background & Purpose Customer Identity & Access Management (CIAM) service is one of the most critical system within an enterprise architecture. The availability of the CIAM service directly affects whether the organisation’s digital services can be accessed by its customers. Therefore, CIAM service requires a resilient architecture design to meet the desired Recovery Point Objective (RPO) and Recovery Time Objective (RTO). While organisations consider Software as a Service (SaaS) CIAM services such as ID Cloud, a common concern is towards the shift of resiliency responsibilities for CIAM entirely to SaaS. In this paper, we address this concern by illustrating a hybrid design where an on-premise deployment acts as a standby CIAM service to ID Cloud. This hybrid architecture allows the organization to achieve the following key outcomes during an ID Cloud outage: Near Zero RPO: This is achieved by synchronising user data from the ID Cloud to the on-premise setup. Minutes Ranged RTO: This is achieved by changing the CIAM traffic routing to the on-premise setup. By adopting this hybrid architecture, organisations can achieve a highly resilient CIAM architecture and retain the control to act swiftly in a disaster situation to recover the CIAM service and minimise any customer impact. Solution Hybrid Architecture In this solution, we introduce a hybrid architecture that involves an active-passive (standby) deployment of the CIAM service. The active deployment of the CIAM service will be on ID cloud and the standby deployment of the CIAM service will be on-premise. In order to achieve the desired outcomes on RPO and RTO, this solution addresses the below key considerations: User Data Sync : In order to achieve a near zero RPO, user data changes on the active deployment must be synchronised to the standby deployment in near real time. Traffic Control : In order to achieve a RTO within minutes, there should be control over the routing of the CIAM traffic such that we can easily direct traffic from the active deployment to the standby deployment . Hot Standby Vs Warm Standby : Maintaining the standby deployment involves costs, there should be options on the standby approach to optimised the balance of RTO time and costs. Single Pipeline, Same Configuration : Both deployments of CIAM should have the same configuration and behaviour, this confidence is important for the shift decision to be made during an outage. User Store Sync Near real-time sync of user data is achieved via configuring the Live-Sync capability supported on ID Cloud. In order for ID Cloud to perform live-sync with the on-premise user store, a remote connector server (RCS) needs to be deployed on-premise which acts as the on-premise user store connector for ID Cloud. RCS establishes web-socket connections with ID Cloud for bi-directional communications to facilitate the data sync process. The communications between RCS and ID Cloud are secured via standard SSL and OAuth 2 access token (client credential grant). For more details on RCS, please refer to : Remote connectors :: ICF 1.5.20.21 Traffic Control On-premise deployed Identity Gateways (IG) act as an reverse proxy to route all CIAM traffic to either ID Cloud or the on-premise CIAM deployment. This allow IG to be the platform to control all the CIAM traffic. In the event of an outage, IG can be updated to switch the routing targets and achieve a RTO within minutes. IG being an stateless gateway, can auto-scale and recover easily. The resiliency and availability of IG can be addressed separately via multi-cluster deployment approaches. Hot Standby Vs Warm Standby For the on-premise standby deployment, there are 2 options an organisation can adopt. The first option is a hot standby approach where all the components of the CIAM system are deployed and running. The compute layer (IDM and AM) can be maintained at the minimum capacity with auto-scaling capability. The data layer (user store, token store and app policy store) needs to be maintained at the production-required capacity. This approach can achieve an extremely short RTO time as the activities involve during a traffic switch is only on updating the IG routing target. The second option is a warm standby approach where only the user store is deployed and running with production-required capacity. In the event of a traffic switch, CIAM deployment pipeline need to be invoked to deploy the rest of the CIAM components before updating the IG routing target. Compared to the hot standby approach, this option requires a longer RTO time (~30min and could be as low as 10min) but incurs lower cost in maintaining the standby deployment. Single Pipeline, Same Configuration ID Cloud and the on-premise CIAM deployment should always share the same configuration and behaviour such that we can confidently carryout a traffic switch whenever the situation amounts to. This confidence is achieved via a single pipeline and codebase that facilitate a CICD process where the same configuration are applied to both deployments for any release. Manual configuration needs to be avoided completely via codification of CIAM configuration. By using the Midship’s accelerator, one will be achieve this single pipeline CICD process. Additionally, the accelerator also makes version upgrade and security patching much easier to handle as the organisation seek to keep the on-premise deployment secure and updated. Other considerations Shift back to ID cloud When ID Cloud has recovered and is ready for traffic to be switched back, there will be a few key operational activities to be carried before the switch back. On ID Cloud, turn off user data live-sync (from ID Cloud to on-premise) ON ID Cloud, run user data reconciliation (from on-premise to ID Cloud) On ID Cloud, turn on live-sync (from on-premise to ID Cloud) and take down the sync token Carry out the traffic switch On ID Cloud, turn off live-sync (from on-premise to ID Cloud) On ID Cloud, turn on user data live-sync (from ID Cloud to on-premise) with sync token Long-live refresh token and Dynamic OAuth 2 client registration By avoiding synchronisation of token store, we can reduce the complexity of the data synchronisation between the 2 deployment without much draw back as it is generally acceptable for customers to login again in the event of an outage. However, in the case where the organisation issues long-live refresh tokens, token store synchronisation maybe required. Similarly, if the organisation supports dynamic OAuth 2 client registration (as part of OIDC), app policy store synchronisation may be required. The approach to synchronise token store or app policy store will be the same as user store but one must pay extra attention on secret management of AM to ensure both deployments use the same secrets. Parallel Run Parallel Run can be achieved in this hybrid architecture with additional enhancements. First is that IG will need to have logic to enforce stickiness such that CIAM requests within a user’s session are handled by one deployment. This stickiness can be lifted if there is no stateful features used (such a refresh token, stateful session, Auth Code grants) and by paying extra attention on secret management of AM to ensure both deployments uses the same secrets. Second is to configure bi-directional live-sync on ID cloud. Extra attention and filtering scripts will be needed to avoid synchronisation loops and address data conflicts. In generally, unless necessary, parallel run should be avoided due to the amount of added complexity. Are you interested? If you would like to learn more, please contact sales@midships.io
- Optimise Your Directory Service
Keep the User Store Clean Over time, dormant user profiles and historical device profiles can accumulate in the user store (directory service), resulting in a significant portion of unused data. This paper presents a simple solution to keep the user store clean by maintaining only active user profiles and device profiles. This solution includes a directory service clean-up process and alterations to the Access Management (AM) Authentication Tree to manage dormant user journeys. Consequently, organisations can lower their license costs and optimise the user store's size (which is important for speed of backup & restore). Solution DS Clean Up Process The DS clean up process involves the below key activities: A periodic CronJob triggers a dormant profile batch service to carryout filtered LDIF search on the user store. From the search results, the dormant profile batch service determines the dormant user profiles and archive them by posting to the profile archival service. User credentials such as the password hash do not need to be archived The dormant profile batch service then carry out LDIF delete on the user store to clean up on the inactive/dormant user profiles and device profile To carryout filtered LDIF search based on timestamp and profile status require certain configurations on the directory service. Time based filtering on LDIF search : LDAP search :: ForgeRock Directory Services Big index for attributes like profile status: Index types :: ForgeRock Directory Services The DS clean up process can be handled by IDM (or cron job with bash scripts) via configuration of scheduled script jobs and the archival of the dormant user profile can be achieved via configuring DB connector and mapping. Therefore If the organisation uses IDM , dormant profile batch service and profile archival service can be replaced by IDM. AM Authentication Tree Changes Key changes on AM authentications trees to handle dormant users: Authentication Tree : During authentication flow, when AM could not find the userID in the user store, AM should query the profile archival service to determine if this user is a archived dormant user. If this user is a archived dormant user, AM should facilitate the user to the reset password flow. Reset Password Tree : During reset password flow, when AM could not find the user profile in the user store, AM should query the profile archival service to determine if this user is a archived dormant user. If this user is a archived dormant user, AM should migrate the profile from the database to the user store. Similar to the DS clean up process, if the organisation uses IDM , profile archival service can be replaced by IDM and AM can migrate the archived dormant user profile from the database to the user store by using IDM’s system object APIs. Other DS Optimisations At Midships we can also help you optimise your onboarding/registration processes to limit the creation of user accounts until after eKYC has been completed and thereby further limiting DS to holding verified active user accounts. Are you interested? If you would like to learn more, please contact sales@midships.io
- Mobile Message Broker – Midships approach to offline & fast mobile banking
Executive Summary Now mobile banking has become the cornerstone of financial management, Midships is pioneering solutions to meet emerging consumer demands. One of our focus is to achieve two pivotal enhancements: Offline Mobile Banking: We're revolutionising the way customers interact with their finances in connectivity-challenged areas. Our solution empowers users to view balances and prepare transactions offline, a boon for those in remote regions or reliant on intermittent Wi-Fi access. Enhanced Performance: We're redefining responsiveness in mobile banking. Our architecture significantly reduces the latency from app launch to displaying the latest account status, ensuring customers have immediate access to their financial information. Leveraged by Indonesia's first digital bank in 2019, the Midships Mobile Message Broker (MMB) framework is a testament to our commitment to innovation. Via an event-driven architecture, we empowered the mobile banking application to be more than just a window to its backend services. This architecture negates the need for the mobile application to perform synchronous data pull requests and instead keeps the application updated via event messages. This significantly reduced the time needed for the mobile application to be ready for use after user authentication. It also enabled offline accessibility of the mobile application to allow user interactions even when there is no internet connectivity . Midships can provide expertise and capabilities to financial service organisations, integrating this cutting-edge solution into their existing infrastructures, thereby elevating the mobile banking experience for customers worldwide. Background & Purpose In the digital banking revolution, mobile banking applications have become essential for customers to manage their finances and execute transactions such as money transfers and bill payments. Users demand seamless, highly responsive solutions that provide comprehensive banking capabilities. Sometimes, they require these digitalised banking services to remain available without any internet connectivity to accommodate scenarios like travel. Some users may not have internet connectivity available to them all the time or have poor internet connection. The mobile banking solutions should be inclusive for those users by providing a good customer experience in situations where the internet connection is not available or not stable. However, several banking practices have not aligned with these expectations, notably: Synchronous Design: Banks often employ synchronous post-authentication data pulling on user’s full financial information. Due to the large amount of data to be pulled per user login, this approach often creates a long loading time for the app to enter the main dashboard. In addition, as the mobile application is often not designed to store any user financial information, retrieving user’s historical financial data per user login also places additional strains on the backend services. Online Dependency: Traditional apps require an internet connection for access, where slow or unstable connections can degrade the user experience. While some apps offer offline viewing, they fall short in supporting transaction queuing for offline execution, which is crucial for users in areas with poor internet reception or during activities like flying. This paper outlines Midships' innovative strategy to overcome these challenges through an event-driven architecture and by utilising a Mobile Message Broker (MMB) to integrate with the mobile app. Our approach, which has been successfully implemented with a digital bank in Indonesia, ensures a frictionless and efficient mobile banking experience, catering to the modern customer's needs for flexibility and reliability. Midships Solution Midships has recognised the necessity for an event-driven architecture that embraces asynchronous communication between the mobile app and backend services to fulfil the outlined requirements effectively. However, it is also understood that certain synchronous elements must be preserved. This balance allows customers to initiate manual updates, caters to scenarios demanding real-time information, and provides additional resilience in the event that the asynchronous service encounters downtime. One key concept here is to change the perspective that a mobile banking app will be largely a stateless application. We still deem the backend to be the source of truth for a user’s financial data, but given the nature of a mobile app, it can hold a copy of the user financial data and be given only the changes in those data through event messages via a mobile message broker integration. This will allow us to realise the below benefits. Performance Efficiency: Data Stream Optimisation: As the mobile application is only retrieving the updates on the user’s financial information, the size of data streams are optimised. This reduces the strain on the backend systems, allows faster loading of pages in the presence of local data and reduces the amount of network traffic. Latency Indifference: As the mobile application’s consumption of event messages is in an asynchronous manner, customer experience degradation due to network latency will be minimised. This means a customer can be in any part of the world and not feel impacted when utilising the mobile banking app. Customer Flexibility: Offline Accessibility : As the mobile application is holding on to a copy of the user’s financial data locally, read only operations remain available to customer in offline mode. Backend Resiliency Slowness Resilience: In an event-driven architecture, customers’ requests can be handled asynchronously where the response will be given to the customers via event messages. This means that connections do not need to be kept alive for an extended period of time due to database or system slowness. This ensures shared layers such API gateways and authorisation gateways will not be impacted and cause downtime due to slowness from downstream systems. Other Benefits Push Notification Capability: this solution also delivers push notification capability to customer devices that can be utilised in different use cases. Transaction Queueing Capability: this solution also allows features on the mobile application to enable customer to queue transactions in offline mode and be executed once internet connectivity is obtained. Global updates: Beside using the mobile message broker for customer level financial events, this solution also enable global level messages to be received by all the mobile apps. Asynchronous communication, while advantageous, presents its own set of challenges that must be meticulously addressed: Message Integrity Consideration Event messages need to be complete and be delivered to the mobile application without information loss. To address this consideration, this solution proposes a pulling mechanism on the mobile app to begin retrieving the event messages from the mobile message broker at app launch. While the push mechanism to the mobile is more towards notifying the mobile app to carryout the pulling mechanism when its is running in the background or in an inactive state. The message broker must also ensure message order, uniqueness and durability such that the correct and complete information can be retrieved by the mobile app. In addition, the backend remains as the source of truth for the data and should continue to provide the necessary APIs for the application to reconcile the user financial information in the event discrepancies are raised. Message Availability Consideration Event messages need to be readily available to the mobile apps. This means the message broker should be scalable as per the customer base and has the required reliability to ensure there is no down time. A customer should not be restricted to a single device and the message broker should support multiple devices to retrieve a customer’s event messages. Message Confidentiality Consideration Event message should only be accessible by the authorised identity. In the mobile app context, we address data confidentiality on twofolds, First is that the message broker needs to support user level private channels where a user’s event messages can only be retrieved by that user and no one else. Second is that the message content should be encrypted and the encryption key should only be held by the mobile application and trusted systems in the domain of the organisation to ensure no one else will be able to peek into the content of the event messages. By addressing these considerations, Midships aims to provide a robust, secure, and user-friendly mobile banking experience that aligns with the dynamic needs of today's customers. High Level Solution Architecture Key activities: Setup A) Upon opting for mobile banking, customers initiate the registration of their app and device with the bank's Customer Identity & Access Management (CIAM) service. The CIAM service will create a message channel token that the Mobile message broker (MMB) recognises. This token will have an expiry of 30days and include claims on the user’s identity reference ID and the private message channel assigned to the user. This token will be returned to the mobile app for it to persist for event message retrieval. B) The CIAM service then coordinates with the MMB to establish a private message channel for the customer. The MMB will create a new message channel if the customer is new to mobile banking. If the customer is not new to mobile banking and is only registering due to a change of device, the MMB will identify the customer’s existing message channel to register the new device for push notification. C) The CIAM service then coordinates with the Notification service to retrieve the message decryption key for the customer. Similar to MMB, the Notification service will create a new message key for encryption and decryption if the customer is new to mobile banking. The message decryption key will be returned to the mobile app via a secure key exchange mechanism for it to persist into OS level secure enclave for data decryption. This streamlined setup ensures a robust foundation for a secure and efficient mobile banking experience, aligning with Midships' commitment to customer convenience and safety. Updates Banking services processed the transaction asynchronously and publish the completion result to the Pub/Sub service. The Notification service subscribes to the completed transactions topic and receives the event message on the transaction completion The Notification service encrypts the event message content using the user’s message encryption key and publishes the transformed event message to the user’s private channel on the MMB The MMB signals the the Push notification service to send push notification to the user’s registered devices. The Push notification service sends background notification to user’s registered devices to pull message events. The mobile app receives the notification and subscribes to MMB’s user private channel with the message channel token to retrieve event messages of the user. The mobile app will filter out messages that are already stored and persists any new messages. App launch During app launch, the app should handle both offline mode and online mode. Offline mode: If the device does not have internet connectivity, the app should facilitate a local authentication using OS level authenticator options. Upon successful authentication, unlock the message decryption key from the secure enclave to decrypt the stored event messages as well as the stored user financial data. The app should then process the event messages along with the user financial data to load the dashboard for the user and persist the newest user financial data in encrypted form into the device. Now the customer can interact with the app in offline mode. Online mode: if the device has internet connectivity, the app should establish a subscription to MMB’s user private channel with the message channel token. The app should then facilitate the normal authentication flow with the CIAM service and additionally, upon successful authentication, requests the CIAM service to refresh the message channel token. The app should then carryout similar process as per offline mode to load the dashboard for the user. As the app’s subscription to the MMB will be kept alive in online mode, the app will receive in near real time the message events while the user interacts with the app. Other Considerations/Decisions Mobile Message Broker While a custom-built event-driven architecture could address the challenges outlined, we also considered third party solutions. In 2018, Midships conducted a thorough evaluation of two leading services, PubNub and Ably. Our assessment revealed that both platforms offered: Scalability: Robust solutions capable of supporting a vast customer base and their diverse device ecosystems. Global Reach: Data centers strategically positioned worldwide, enabling message storage at the edge, closest to the customer, thus achieving minimal latency. Reliability: Services boasting an impressive 99.999% uptime. Durable messages: Messages can be persisted. Low Latency: Less than 50ms from device to Edge Server Quality of Service: A commitment to guaranteed message delivery. Security and Efficiency: End-to-end message encryption coupled with message compression. Seamless Integration: Compatibility with strategic CIAM solutions. Secure Authentication: Utilisation of token-based authentication mechanisms (whose lifetime can be customised). Additional Features: A suite of other beneficial functionalities. We found that the comprehensive benefits offered by these third-party services significantly outweighed the costs and efforts associated with developing and maintaining a comparable in-house solution. By adopting a Mobile Message Broker, Midships ensures a cutting-edge, efficient, and secure mobile banking experience for our customers. Transaction Queuing In Offline Mode With Digital Transaction Signing In Midships' affirmID solution, we allow any transaction to be signed using a PKI approach similar to FIDO2 passkeys. The signing of a transaction in the affirmID solution can be carried out in offline mode and this can be used to queue signed transactions. When the app gains internet connectivity, the queued signed transactions can be posted to the backend services with the signature verified before processing.For more details on Midships' affirmID solution, please refer to this link: Midships | AffirmID, our low-code anti-fraud solutions for financial services Are you interested? If you would like to learn more, please contact ajit@midships.io
- SCA using Dependency Track (with AWS CDK Deployment Template)
Introduction Software Composition Analysis (SCA) is an essential activity in the development of software or infrastructure. It helps reduce the risk of shipping software or infrastructure containing known vulnerabilities within operating systems, 3rd party libraries or binaries. It can also help alert you when your systems which already in production are found to contain new CVEs. Dependency Track is an open-source application from OWASP that allows you to run an internal system that you can feed with details of your applications and infrastructure and track your overall risk position regarding these systems and the vulnerabilities that they may contain. This post should help you to understand how this is valuable for your organisation and also provide you with code to create a deployment of Dependency Track in AWS to allow you to test it out and hopefully reduce your software supply chain risk. Risk in the Software Supply Chain Log4J is a ubiquitous logging framework that is present in countless applications, both open-source and bespoke. Security researchers discovered a remote code exploitation vulnerability in this library in late 2021 (see CVE-2021-44228). A patch was released a month later that removed that vulnerability. But because of the widespread use of this library, it quickly became one of the most widely exploited vulnerabilities out there. This excellent article from Sophos describes the details of the exploit for those interested. Systems that used a version prior to the patched version were vulnerable to this exploit. Organisations that didn’t have good visibility into their software supply chain didn’t even know what systems they were running in production that were vulnerable to this attack. Some still don’t as it is still being exploited in 2024. Software Composition Analysis is the technique that you and your organisation can use to protect yourself with data about your production systems and the software supply chain risk inherent in them. How SCA using Dependency Track Works Software Composition Analysis involves creating a Software Bill of Materials (SBOM) at build time that describes the 3rd-party libraries and binaries present in the application or infrastructure components. An SBOM can easily be created using the Cyclone DX open-source tool in a format supported by Dependency Track. SBOMs can be created for any type of system that includes operating systems, binaries and or libraries used to host, run or add functionality to those systems. Examples include Java applications packaged with Maven, JavaScript applications packaged with Node Package Manager, Container Images packaged with Docker, Virtual Machine Images packaged with Packer and many more. The SBOM is uploaded to Dependency Track using its API interface or available open-source tools that provide the integration from your build system to the Dependency Track server. Once the SBOM has been received, it is analysed asynchronously, creating a matrix of data points within Dependency Track detailing the project (name, ID, version, parent etc.) along with the list of libraries and binaries that it holds. Licence information is also stored within Dependency Track to enable legal, risk and compliance teams to track the software licences that are in use in the systems created by your organisation. Once analysed in Dependency Track, the application version is continuously assessed against the frequently updated database of known Common Vulnerabilities and Exploits (CVE) listed by organisations like NIST, Sonatype and GitHub. If the application is found to be vulnerable to a newly listed CVE, the risk score of that application is updated accordingly. This risk score analysis of each application is maintained over the entire portfolio of applications and is displayed over time, from when the application version is first registered in Dependency Track. Using this tool your development, security and risk teams have a rich data set that they can use to prioritise the patching of all systems in the fleet. When new critical CVEs are released, all applications tracked with Dependency Track that are vulnerable to this new CVE are easily identified and can be then scheduled for patching. This provides an extremely powerful view of the software vulnerability posture of the organisation. Personally, the author has seen this pay dividends, allowing an organisation he worked with in the past to patch all systems that were affected by the Log4Shell vulnerability in a few days. Why is this better than other tools? While there are many systems available that provide SCA, they have drawbacks: Docker's Scout is specialised to a particular technology, Docker containers Dependabot can only be used if you are using GitHub Artifactory's X-Ray and Sonatype Life Cycle are expensive paid-for products that require significant effort to integrate with, especially in delivery pipelines Dependency Track is the best of breed in regarding the tracking your software supply chain risk. No other tool provides this level of detail for all types of applications and infrastructure. If you can create an SBOM for it, Dependency Track can track it. Deploying Dependency Track using the AWS CDK To help with our mission of reducing fraud and helping to keep our customers secure, we have developed a CDK-based deployment tool for anyone to use, that creates a deployment of Dependency Track in the Amazon Web Services (AWS) cloud platform. This deployment will allow your organisation to run this tool within your own private cloud, configured and managed by your team. Alternatively, the code can be used to see how the configuration can be adapted to run it within your own AWS or other cloud infrastructure. A Note on Cloud Costs This deployment will cost you roughly $300 USD per month to run. This is due to the large amount of memory required by Dependency Track to run, and the Aurora Postgres database used to store and secure the data that your organisation generates. If you choose at deploy time, to use the embedded database instead of Aurora Postgres, these costs will be reduced though the CDK program does not include external disk storage so this is not recommended as you will lose data if the ECS task is restarted. Overall, in terms of ROI, this is a small outgoing that can help you have greater visibility of your application security risk over time. If you don’t find it to be valuable, then a quick execution of the "cdk destroy" command will remove it from your AWS account. Dependency Track AWS Architecture Here is the code that you can use to deploy Dependency Track in your org: https://gitlab.com/midships-public/dependency-track-cdk The above AWS CDK code deploys Dependency Track by creating the following AWS resources: A VPC with public and private subnets An ECS Cluster backed by the minimum size of EC2 instance for running Dependency Track in the public subnet 2 ECS Tasks for the Dependency Track front end and API server components An optional Aurora Postgres database cluster for storing the data generated A load balancer to route traffic to the front end for the User Interface and to the API server for build time integration Does not currently include: HTTPS configuration for the load balancer Custom URL management linked to your Route 53 service This diagram, generated from the CDK output using the cdk-dia tool, shows the architecture: Deployment Prerequisites You need the following set up before you start: The AWS CDK installed A terminal program to run the CDK commands like MacOS Terminal, Git Bash for Windows, or a good old Linux prompt An AWS account with access rights to create resources The following code steps (using Bash or a similar shell program) will get the Dependency Track application running in your environment in minutes. Get The Code Grab the source code from our GitLab repository. We strongly suggest that you read the code before running the CDK program, so you can be sure you know what resources it will be creating and running inside your AWS account git clone https://gitlab.com/midships-public/dependency-track-cdk Run The Code Firstly, you will need to make valid AWS credentials available to the shell that you are using. There are a few ways to do this, the simplest is to export temporary credentials as environment variables in your shell environment. AWS provides you these on their login screen. Remember not to use the above process for production credentials. Better to use the aws cli to log in. Check the deployment options in the cdk.json file before you run the CDK commands, making sure they are suitable for your use case and environment. Note that the dependency.track.database.X parameters are only needed if dependency.track.database.embedded is set to false. "dependency.track.network.vpc.cidr": "10.25.0.0/24", "dependency.track.application.instance.type": "t3.xlarge", "dependency.track.database.embedded": false, "dependency.track.database.instance.type": "t4g.medium", "dependency.track.database.name": "dtrack", "dependency.track.database.username": "dtrack", "dependency.track.database.password": "OverrideThi$" Run the following CDK commands providing approval as needed: cdk synth --all cdk deploy --all Accessing Dependency Track When completed, the CDK program will output the URL that the application can be accessed from. Something like the following: Outputs: DependencyTrackApplicationStack.DependencyTrackFrontEndAddress = dependencytrack-1234567890.us-east-1.elb.amazonaws.com Stack ARN: arn:aws:cloudformation:us-east-1:1234567890:stack/DependencyTrackApplicationStack/d67430b2-f880-4639-a946-ad1db96015b8 ✨ Total time: 290.25s Grab the URL from the CDK output and put it into a browser. Remember that HTTPS is not configured on the ALB for this deployment so make sure you enter http explicitly in your browser address bar, for example: http://dependencytrack-1234567890.us-east-1.elb.amazonaws.com You should be presented with the login page. You can use the default admin credentials to get started: Username: admin Password: admin You will be forced to change the admin the first time you log in. Dependency Track Vulnerability Sources Dependency Track follows an eventual consistency model. It receives data from your inputs and from internet vulnerability databases and will parse and update these over time. When you first deploy the application, it will take some time for it to pull and parse all of the CVE data sources it is configured with out of the box. So give it some time, but in the mean time you can start uploading your data. Dependency Track Integration Here we will show 2 ways you can integrate with the Dependency Track server from the CLI. One using a bash script wrapping Docker’s internal SBOM generation capability and another integrated with the Maven build system. Before starting, go to the admin UI and update the permissions for the API Key already created for API-based integration, e.g.: http://dependencytrack-1234567890.us-east-1.elb.amazonaws.com/admin/accessManagement/permissions Add the following permissions to the Automation Team: BOM_UPLOAD PORTFOLIO_MANAGEMENT PROJECT_CREATION_UPLOAD VIEW_PORTFOLIO VIEW_VULNERABILITY Next, export two environment variables into your linux terminal environment for the next 2 integration approaches to use: export DEPENDENCY_TRACK_API_KEY=odt_qwertyuiopasdfghjklzxcvbnm export DEPENDENCY_TRACK_BASE_URL=http://dependencytrack-12345679890.us-east-1.elb.amazonaws.com:8081 Command Line Integration To make this easier, we created a bash script that wraps the "docker sbom" command and uses curl to upload the generated SBOM to the Dependency Track server. This is located in the root of the GitLab repository containing the CDK program. Again, its good security sense to read the script before executing it. Make sure you install the latest version of the Docker SBOM plugin first of all: https://github.com/docker/sbom-cli-plugin/tree/main The Docker SBOM capability can create an SBOM from any of your container images. This sca.sh script executes that command, then pushes the generated SBOM via API to your Dependency Track server. This example pulls 2 Ubuntu LTS container images from Docker Hub but you can use any that you have in your local container image cache. # Generate and upload SBOM for ubuntu:16.04 ./sca.sh --image ubuntu --tag 16.04 # Generate and upload SBOM for ubuntu:24.04 ./sca.sh --image ubuntu --tag 24.04 Then look at the Projects page in the Dependency Track UI for details of these images and the CVEs they contain. http://dependencytrack-123456789.us-east-1.elb.amazonaws.com/projects Maven Life Cycle Integration Dependency Track can be used to provide your teams with fast feedback on the presence of CVEs within the application or infrastructure they are creating at build time, on their local environment or in their build pipelines. The best time to do this kind of integration with Dependency Track is before committing to the source code repository, i.e. from a local development environment. Additionally, this integration should be a mandatory part of your build pipelines that run before code is merged to the main branch of your code repository. To facilitate this kind of development and build time validation of your application, a number of open-source tools exist. This example show how to use the Maven build system along with a Dependency Track plugin to get fast feedback on vulnerabilities in your application or infrastructure. Dependency Track Maven Plugin The author has previously created an open-source tool for integrating the Maven build system with the Dependency Track server. Full details can be found on the GitHub page for that project: https://github.com/pmckeown/dependency-track-maven-plugin. Check the profiles in the pom.xml file for an example on how to use this. Other Dependency Track Integrations Dependency Track has a rich integration ecosystem, both in terms of data sources for vulnerabilities and build and analysis tools for integration in your environment. Check them all out here: Ecosystem Overview Support to Integrate If you need help deploying Dependency Track, or implementing an integration to it from your pipelines, then get in touch with Midships. Our team of DevSecOps experts have considerable experience with this tool and can help you to get the best from it. We can help you to configure the Dependency Track application, tweak the cloud deployment or deploy on-premise, integrate your delivery pipelines to Dependency Track or help your security teams with understanding its use and setting internal policy regarding SCA for your teams to adhere to. Feel free to reach out to paul@midships.io if you want to know more.
- Performance Testing ForgeRock with K6 (for beginners :-))
Introduction: k6 is a modern, developer-friendly load testing tool built with Go programming language. It offers a simple and efficient way to create, execute, and analyse load tests. k6 provides a JavaScript-based scripting approach, making it accessible to both developers and non-technical users. Testing ForgeRock installation can be tricky as there are several components to ForgeRock, and testing each component and optimising it for the production environment can be time-consuming. The technical test is composed of 3 parts: Load testing is a crucial aspect of ensuring the performance and reliability of web applications. It involves simulating realistic user traffic to determine how an application behaves under different load conditions. Soak testing ensures that the system can operate under high demand for a longer period. Stress testing allows you to find the breaking point of the system. k6 is a powerful open-source load-testing tool that simplifies the process of testing and provides actionable insights. This guide will explore the key concepts and steps involved in load testing with k6 for ForgeRock setup. About the Author Debasis is a Senior ForgeRock Subject Matter Expert at Midships leading ForgeRock implementations at customers across South East Asia. For further details about this blog, please reach out on debasis.dwivedy@midships.io Installation and Setup: To get started with k6, you need to install it on your local machine or a dedicated load testing environment. k6 supports major operating systems like Windows, macOS, and Linux. The installation process is straightforward and well-documented in the official k6 documentation. The details of the installation can be found in the link below: https://k6.io/docs/get-started/installation/ Writing a k6 Test Script: k6 test scripts are written in JavaScript, allowing you to define realistic user scenarios, specify HTTP requests, and add assertions to verify the application's response. You can utilize the k6 API to simulate complex user interactions, handle dynamic data, and control test flow. I found this better than jmeter as sometimes the plugins in jmeter can be overwhelming and hard to understand. k6 allows programmers to write the test scripts as normal JavaScript using the request/response syntax and scale it up if and when required. It also provides fine granularity as the response from a previous request can be used to determine the next request. All these requests can be clubbed together to form a long/short test script which can be loaded independently to test specific scenarios. Below is a sample: import http from 'k6/http'; import { group } from "k6"; import { Counter } from 'k6/metrics'; let CounterErrors = new Counter("Errors"); let CounterDuration1 = new Counter("Login1"); let CounterDuration2 = new Counter("Login2"); let CounterDuration3 = new Counter("Login3"); const url ="https://am.example.com"; export function login1(token) { let params = { headers: { "accept-api-version": "protocol=1.0,resource=2.1", "Accept": "application/json", "Content-Type": "application/json", "X-OpenAM-Username": "testuser", "X-OpenAM-Password": "Password12345", }, }; let res = http.post(`${url}/am/json/realms/root/realms/test/authenticate`, {}, params); CounterDuration1.add(res.timings.duration); return tokenId; } export function login2(token) { let params = { headers: { "accept-api-version": "protocol=1.0,resource=2.1", "Accept": "application/json", "Content-Type": "application/x-www-form-urlencoded", "Cookie": token, }, redirects: 0, }; let payload ={ "scope": "openid", "response_type": "code", "client_id": "testClient", "redirect_uri": "http://example.com", "decision": "allow", "code_challenge": "j3wKnK2Fa_mc2tgdqa6GtUfCYjdWSA5S23JKTTtPF8Y", "code_challenge_method": "S256", "csrf": token, "state": "abc123", }; let res = http.post(`${url}/am/oauth2/test/authorize`, payload, params); CounterDuration2.add(res.timings.duration); let code=res.headers.Location.split('&')[0].split('?')[1].split('=')[1]; return code; } export function login3(code) { let params = { headers: { "Content-Type": "application/x-www-form-urlencoded", }, }; let payload ={ "code": code, "grant_type": "authorization_code", "client_id": "testClient", "code_verifier": "ZpJiIM_G0SE9WlxzS69Cq0mQh8uyFaeEbILlW8tHs62SmEE6n7Nke0XJGx_F4OduTI4", "redirect_uri": "http://example.com", }; let res = http.post(`${url}/am/oauth2/test/access_token`, payload, params); CounterDuration3.add(res.timings.duration); let access_token=res.json()["access_token"]; return access_token } export default function() { let token=login1(); let code=login2(token); let access_token=login3(code); if(!access_token) { CounterErrors.add(1); } } The above script name sample.js could be run by using the command below, which will run it once for one user: k6 run sample.js For more sample on how to write the scripts can be found from the link below: https://k6.io/docs/examples/ Configuring Load Scenarios: k6 provides various configuration options to simulate different types of load scenarios. You can define the number of virtual users (VUs), duration of the test, and ramp-up periods. Additionally, k6 allows for specifying thresholds and tolerances to determine the test's success criteria. The above script could be ramped up for multiple used by multiple users for a longer period of time using the command below: k6 run –vus 6 –duration 180m sample.js This will run the scripts for 6 users for 3 hours. For more sample on how to write the scripts can be found from the link below: https://k6.io/docs/examples/ Executing Load Tests: Once you have written your test script and configured the load scenario, you can execute the load test using the k6 command-line interface (CLI). k6 provides real-time test execution metrics and progress updates, enabling you to monitor the test's progress and identify any issues. We have integrated k6 run with a Grafana dashboard to visualize the progress of each test and send alerts if required. Below the dashboard for the run above. We have used Grafana+influxdb for our test example, but there are multiple ways to integrate. Please follow the link below for k6 and Grafana integration: https://k6.io/docs/results-output/grafana-dashboards/ It is important to note that we are using our own standalone k6 installation. There is a k6 cloud offering that operates similarly but without the need of adding your own infrastructure. For more details visit the link below: https://k6.io/docs/cloud/ Analysing Test Results: After the load test execution, k6 generates comprehensive test reports and result summaries. These reports include key performance indicators like response times, throughput, error rates, and more. You can leverage these insights to identify performance bottlenecks, evaluate system capacity, and make data-driven optimizations. The response to the above run was as below: k6 run --vus 6 --duration 180m sample.js data_received..................: 78 MB 16 kB/s data_sent......................: 25 MB 5.0 kB/s http_req_blocked...............: avg=155.07µs min=0s med=3.23µs max=110.01ms p(90)=4.47µs p(95)=4.96µs http_req_connecting............: avg=7.69µs min=0s med=0s max=3.83ms p(90)=0s p(95)=0s http_req_duration..............: avg=397.61ms min=0s med=300.28ms max=7.63s p(90)=896.04ms p(95)=1.38s { expected_response:true }...: avg=412.65ms min=12.1ms med=313.19ms max=7.63s p(90)=951.83ms p(95)=1.4s http_req_failed................: 3.67% ✓ 1326 ✗ 34741 http_req_receiving.............: avg=92.1µs min=0s med=91.76µs max=2ms p(90)=129.22µs p(95)=139.91µs http_req_sending...............: avg=32.58µs min=0s med=28.54µs max=10.42ms p(90)=43.86µs p(95)=49.93µs http_req_tls_handshaking.......: avg=139.17µs min=0s med=0s max=78.23ms p(90)=0s p(95)=0s http_req_waiting...............: avg=397.48ms min=0s med=300.12ms max=7.63s p(90)=895.95ms p(95)=1.38s http_reqs......................: 36067 7.255703/s iteration_duration.............: avg=1.15s min=514.67µs med=823.96ms max=30s p(90)=2.4s p(95)=2.8s iterations.....................: 12905 2.596136/s Login1.........................: 5617456.782563 1130.080107/s Login2.........................: 3079675.395106 619.54725/s Login3.........................: 5643533.119942 1135.32596/s vus............................: 3 min=3 max=3 vus_max........................: 3 min=3 max=3 If there is no visualization tool integration, then this output can be saved to a file for future analysis. Integrations and Extensibility: k6 integrates with popular tools and platforms like Grafana, InfluxDB, and New Relic, allowing you to visualize and analyse load test results in a more advanced manner. k6 also supports custom metrics, thresholds, and result exports, providing flexibility and extensibility to suit different testing requirements. There are several ways to generate the script automatically without writing them: Har-to-k6 : https://github.com/grafana/har-to-k6 Postman-to-k6: https://github.com/apideck-libraries/postman-to-k6 Swagger-to-k6: https://k6.io/blog/load-testing-your-api-with-swagger-openapi-and-k6/ Best practice would be to generate the scripts based on any of the method above and then modify it according to the requirement. More details about these integration tools can be found below: https://k6.io/docs/integrations/ Advanced Features and Best Practices: As you gain proficiency with k6, you can explore advanced features like test setup and teardown functions, custom performance profiles, distributed load testing, and more. Additionally, it is crucial to follow best practices like realistic test scenarios, data parameterization, and proper resource management to ensure accurate and reliable load testing results. Conclusion: Load testing with k6 empowers developers and testers to identify and address performance issues early in the development lifecycle. With its ease of use, flexibility, and powerful reporting capabilities, k6 proves to be an excellent tool for load testing web applications. By following this beginner's guide, you can get started with k6 and effectively conduct load tests to optimize your application's performance and user experience. I hope you enjoyed this blog. Please reach out if you have any queries, Debasis.
- Midships: Our Distinctive Consulting Methodology
By Ajit Gupta, CEO of Midships (ajit@midships.io) Before the inception of Midships, I found myself deeply involved in challenging IT delivery programs, working withinin some of the most prominent consulting firms in the industry. Our clientele was diverse, spanning sectors such as the public domain, financial services, logistics, and utilities. In each team that I became a part of, our relentless commitment often led us to conquer challenges that appeared insurmountable. Despite our successes, I consistently identified areas for improvement that were, at the time, outside my capacity to change. The genesis of Midships has been fueled by a desire to transform the consulting landscape, to align it with my convictions regarding the execution of consulting services. Initially, the focus was not on scaling this concept, but as it turned out, our distinctive consulting methodology seems to be scalable (time will tell), though it has presented more challenges than usual. Midships is unique – we are not just another name in the consulting industry, nor do we aspire to be. Say goodbye to the old way of doing things. Say hello to Midships. My Vision / My Aspiration / My Organisation / My Strategy Problem Solving, Not Selling – We immerse ourselves in comprehending the hurdles our clients face and showcase our solutions, allowing the inherent value we offer to do the talking. Mutual High Value – As a consulting firm dedicated to delivering high value, our goal is to provide benefits not only for ourselves but for our clients as well. If they cannot recognize our value, it prompts us to reevaluate our role. Empowerment Over Dependence – By imparting our knowledge and resources, we ensure that our clients are equipped to be independent after our engagement. Our partnership persists only if they perceive continuous value in our services. Preparedness Over On-the-Job Learning – We invest in training as necessary, empowering our teams to operate more efficiently with smaller, more agile groups, and to deliver results promptly and most importantly avoid on the job learning. Focus on Outcomes, Not Change Requests – Our commitment is to outcomes, not to change requests. We view challenges as opportunities to assist, not as pretexts for additional fees. Attitude and Capability Over Tenure – Experience is invaluable, but the right attitude and capability are often the keys to success. We recruit based on these traits, with our senior team offering insights from their diverse experiences. Ethics Before Profit – Our ethical principles guide our operations. We decline engagements that conflict with our values, a stance that has been tested and upheld even against proposals from major players. No Debt or Short-Term Thinking – We grow steadily to avoid debt and foster a sustainable work environment. Rewarding Integrity and Growth – We acknowledge that mistakes happen, but as long as they are not repeated and the intentions are sound, we support our team. This principle has been tested over the years, and I am proud to say that we have consistently acted in our team’s best interest, despite external pressures. Connecting Our Customers – We aim for our customers to trust the advice we provide. To achieve this, they need to connect with their peers and learn from them. We facilitate customer forums (where we are invited guests) to enable our customers to learn from one another. When Midships was first established, I was unsure whether a consulting firm with such values could prosper. Today, not only have we found success, but we are also flourishing! Putting Our Principles into Practice The past five years has been a steep learning curve and putting the vision into practice has been a challenge and has forced to dig deep and validate that our convictions are really more important than both money and prestige. In the process of upholding our convictions we have ruffled feathers with our partners, and this has not been easy. This is where we are today: Our hiring practices have evolved to recruit individuals who distinguish themselves by being multi skilled (see below), have an exceptional attitude, are eager to learn, deeply committed to our customers’ best interests and relish the challenge of overcoming obstacles. In exchange, we try to be generous and considerate. As a result our attrition is exceptionally low. We offer a transparent work environment where their voice is heard, they have opportunities for growth at their pace, equal pay for equal roles (regardless of location), profit sharing, and more. This unique approach has consistently resulted in positive feedback from our clients about our team. To maintain our focus on business outcomes, we prefer to undertake delivery on a Fixed Price basis. This approach allows us the flexibility to bring in the right people at the right time and to strengthen the team as needed. It also provides our customers with cost certainty. When determining the cost for fixed price, we provide a detailed breakdown of the anticipated SME effort required to deliver the scope and give our customers the opportunity to review and confirm that our expectations of effort are aligned. We create accelerators to enable us to work more quickly, consistently, predictably, and reliably. While we are not a products company and have no desire to become one, our accelerators are frameworks that can be easily integrated into a customer’s environment to drive outcomes. These frameworks are continuously updated and enhanced as we learn more, allowing our newest customers to immediately benefit from the learnings we’ve gained from our previous customers. What we don’t know, we learn and prepare in our own time. Our cloud bill is larger than I’d like, as I encourage my team to prove solutions prior to any engagement. This allows us to anticipate and inform customers of potential challenges and to acquire the necessary knowledge (or declare our gaps) before an engagement begins. While it can sometimes be painful, we have open and honest conversations with our customers. We own our mistakes, regardless of how difficult it may be. We connect and encourage our customers to connect with our SMEs—before, during, and after an engagement—to reach out if they need help. My team are passionate with what they do and is always happy to assist fellow engineers wherever possible. In fact, I can’t stop it even if I wanted to, so I don’t try. We are selective about our customers and have been known to turn away opportunities where a customer cannot align with our values. Our customers have high expectations of us, and similarly, we have high expectations of them. Here are a few examples of what we expect: Customer SMEs integrated into the delivery team so we approach the delivery and associated challenges as partners to achieve an outcome, moving away from a ‘vendor’ and ‘customer’ mindset. Honesty, so we can lean in and prepare for challenges upfront. The earlier the challenges are known, the more we can do to mitigate them or align resourcing to address them efficiently. If the values and approach of Midships resonate with you, please feel free to reach out to me at ajit@midships.io
- How Midships migrates Forgerock IDM from a Database to Forgerock DS without any disruption or interruption of service
By Ravivarma Baru, Senior Specialist at Midships (ravivarma.baru@midships.io) This article is intended for anyone who is looking migrate PingIdentity (ForgeRock) Identity Manager from using a Database to Directory Service. Midships has been leading these migrations seamlessly since 2019 and we thought it was time to share :-) Introduction: ForgeRock’s Identity Management Solution (IDM) makes it easier for organizations to manage the identity of their members. IDM uses managed objects to represent different kinds of identities such as users, groups, devices, credentials, etc. Attributes of these managed objects are usually synchronised with other systems including ForgeRock’s Directory Service(DS) which is used by Forgerock’s Access Manager(AM) as a identity store to enable authentication and authorisation of identities. IDM requires a data repository backend such as a database to store the managed objects and metadata (used for syncing). Problem Statement: Prior to IDM v6, IDM used a database as it’s primary data repository. Since v6, DS can also be used as IDM’s data repository and this both simplifies the solution by removing the operational overhead of managing a backend database and removes complexity around synchronizing which was previously required between the ForgeRock AM’s userstore and the IDM backend Database containing the managed objects. Midships has significant experience helping our customers migrate their IDM from using a database to using DS seamlessly and with zero downtime. In this blog, we will address how we do this. Challenges: The main challenges that we encountered at one of the banks at which this migration was performed: How to design the DS servers, such that: it can be consumed by both IDM as well as AM, without losing any data? it doesn’t duplicate any data element, example IDM managed object and AM identity object can be shared? How to deploy the newly designed DS servers, the IDM + AM servers (configuration updated to consume the new DS servers) without needing additional infrastructure or VMs? How to reconcile the data from the IDM current backend DB as well as the current Forgerock DS(that was used by AM as idRepo) to the new DS servers? How to use attributes that were defined in DB to hold certain data but in DS the corresponding attributes are operational and are not subjected to userchange? (More information in solution section) How to keep the data in sync after performing initial reconciliation to all the way till the traffic is switched over to updated AM, IDM and DS instances? Our Approach: Conclusion: Organisations and Enterprises often deem that migrating the IDM backend repo from JDBC to ForgeRock DS is difficult. Particularly with financial institutions, there is always an inertia of thought to stay where they are at, especially when it comes to customer or business data and it is impossible to migrate backend repo without downtime, without losing data or without spending additional cost on new infrastructure. But as discussed above having a holistic approach and defining solid automation will make the transition easy. One caveat though in migrating to the DS is it doesn’t support workflows, unlike a DB backend. In summary, the migration of the IDM backend repository from DB to DS can be achieved with zero downtime by designing the DS, planning ways to synchronize data, identifying the functional issues in lower environments, fixing them and automating the processes that can be automated. If you would like to learn more, please don't hesitate to contact me. Ravi
- AI and ML are not the answer to everything!
Author: Yuxiang Lin, Senior Solution Architect at Midships, yuxiang.lin@midships.io This article is for anyone looking to leverage the power of AI and ML to solve problems within their organisations. In recent years, Artificial Intelligence (AI) and Machine Learning (ML) have emerged as prominent forces within the dynamic technology landscape. Organisations across various industries are embracing this transformative wave, aiming to leverage its power to unlock unprecedented opportunities and innovations. However, the path to effectively adopt AI and ML remains unclear to some, with the risk of reducing these terms to mere buzzwords. This article aims to provide insights on how organisations can translate the immense potential of AI and ML into tangible value. It's important to note that AI and ML are not the solution to everything. To fully utilise their benefits, organisations need to embark on a carefully planned journey to reach their desired outcomes. It Is Not Magic Artificial Intelligence (AI) is a broad field of study that focuses on making machines intelligent. Within AI, Machine Learning (ML) is a subfield that's been the focus of academic and industrial interest for over 80 years. Hence, AI and ML are not new concepts. Within ML, the subfield of deep learning, characterised by neural network architecture, started in 1986 and this is where GenAI and Large Language Models (LLM) fall within. Fundamentally, ML has introduced to the world a new way to build software systems. Instead of creating logic and rules to handle data, ML uses data to establish these rules and logic. However, the logic and rules created by ML models through training differ from those created by human intelligence. A trained ML model is a product of intelligence derived from large volumes of data and therefore the rules and logic it entails are statistical in nature. It’s not the kind of intelligence you are thinking It's essential for one to understand that at its root, ML are algorithms designed for making statistical predictions based on given inputs, also known as features. The parameters regulating these algorithms are computed through iterative training to reduce the model’s loss/cost function using available data. Understanding the nature of machine learning (ML) helps us to see its limitations and challenges. Firstly, mainstream ML is best suited for problems that can be modelled statistically or where feedback loops can be easily simulated. This means that the belief in AI solutionism, where it assumes that given enough data, ML algorithms can solve any problem, is far from reality. It's important to recognise that causality-based logic, typically used by human intelligence to solve problems, is often lacking in ML models due to their statistical nature. This implies that when making decisions using ML outcomes, one must be careful when justifying those decisions based on causality. Moreover, advanced learning models often lack transparency and interpretability. This makes it challenging to identify specific modifications for desired behaviours and raises concerns when it's important to explain how a system produces results in a given context. It can take a while Secondly, the effort and time required to solve a problem using Machine Learning (ML) are typically harder to determine compared to other software development. This is primarily due to the non-deterministic nature of ML. Training ML models to achieve satisfactory performance involves experimentation and intuition. When organization faces challenges in addressing ML issues such as over-fitting (where the model cannot make generalised predictions with new inputs) and under-fitting (where the model is highly biased with training data), they will have to go through cycles of feature engineering, data collection, model training, and evaluation, this could easily lead the organization down a rabbit hole. It‘s all about the data Thirdly, the performance of a machine learning (ML) model in solving a problem largely depends on the data it was trained on. Essentially, a ML model is only as good as the data it receives. Data availability and quality can often become significant obstacles for many organisations trying to harness the power of AI and ML. Issues such as data privacy, potential bias, and discrimination in the collected data are also crucial considerations. Therefore, before embracing AI and ML, one must first embrace data. It takes much more to go production Finally, machine learning (ML) and artificial intelligence (AI) do not operate independently. These systems, like any other software, require thoughtful architecture and engineering considerations. Topics such as scaling, resiliency, monitoring, continuous integration and continuous deployment (CICD), API security, and infrastructure must be addressed before any proof-of-concept can progress towards production. Given the complexity and the nature of ML, a mature and robust DevOps pipeline tailored to the needs of a ML pipeline is a key success factor in an organization's journey to embrace AI and ML. It Is A Tool Not A Solution In the previous section, we discussed machine learning (ML), a subfield of artificial intelligence (AI). This section will outline our recommendations for how organisations should approach AI and ML when addressing real business challenges. It’s about the problem Some may say that ML and AI have refined the way we approach problems. This is true to some extent, they have given us tools to tackle problems previously deemed technically too challenging to address. However, real business problems are often more complex than a single technical hurdle. While designing a solution for a complex problem, it's crucial to let the problem drive the solution, not the other way around. AI and ML are tools, not solutions. To effectively address a problem, one needs to understand the problem and its context, define requirements, break down the problem into smaller sub-problems and finding appropriate approaches to address these sub-problems. So, organisations must not rush into using AI and ML to solve a problem before carrying out the essential solutioning processes. This avoids creating a solution that merely serves as buzzwords and fails to address the problem effectively. It’s like cooking Crafting a solution is akin to cooking; there are various approaches and tools that can be used as ingredients. However, there's no one-size-fits-all recipe. A proficient solution engineer must consider all aspects, including the timeline, budget, contextual limitations and constraints. The key is to choose the right ingredient, not necessarily the most extravagant one. This is a decision that the solution engineer, acting as the chef, must make. Sometimes AI and ML will be the main ingredients, other times they'll serve as seasonings, and sometimes they simply won't be the right ingredients for the dish. It’s not a safe haven Organisations and solution engineers should not view AI and ML as a safe haven. While AI and ML are powerful tools, they are not a panacea for avoiding complexity in the solutioning process. It can be tempting to turn to AI and ML when faced with a complicated problem where AI and ML use cases can be applied, thereby forgoing the more traditional human intelligence-based logic and rule approach. However, AI and ML also have their own challenges and limitations. Sometimes, stepping back may reveal new possibilities that could prove to be better options. It Has Potential But How To Create Value In the final section of this article, we will provide recommendations on how organisations can transform AI and ML's potential into actual business value. While AI and ML are not the solution to everything, they are crucial tools that can unlock innovations, providing organisations with a competitive edge. In this section we will use 3 examples to illustrate how organisations can use AI and ML in different problem contexts. It can be used for optimisation In manufacturing, machinery parameters such as vibration and temperature are continuously monitored. A common challenge can be scheduling maintenance to prevent production impact while minimising maintenance costs. A simple solution might be rotating equipment on a fixed schedule to minimise operation faults. However, this approach does not necessarily minimise costs. Given that data are collected that can predict machinery health, we can improve this simple solution by shifting from a fixed to a dynamic schedule. We can implement a maintenance scheduling service that uses a Machine Learning (ML) model trained with historical data. This model predicts when maintenance is likely needed based on the current data of each piece of equipment. The maintenance scheduling service will be designed with logic that uses these predicted timings and the planned working capacity to create optimised schedules, thus reducing costs. It can compliment other solutions For financial institutions, detecting fraud is essential for customer protection. While deep learning models might seem attractive for detecting abnormalities, they pose challenges such as real-time evaluation performance and flexibility in adjusting behaviours. An alternative solution is to create a rule and logic-based engine that identifies defined anomalies. Although this alternative resolves many issues, it lacks the ability to find new patterns as fraud techniques evolve. Therefore, we can improve this solution by using machine learning techniques to do offline clustering and classification of tracked data. This will uncover new rules and patterns that can be added to the fraud detection engine, allowing its rule and logic profile to evolve. It can help to save time Organisations frequently maintain various internal documents. Searching for relevant documents and extracting necessary information for specific tasks can be a tedious and time-consuming process. With the current technology stacks around LLM, the creation of an internal knowledge base has become much simpler. This enables the extraction of organization-specific information in a precise and relevant manner. Such an approach can be an efficient solution to alleviate some of the daily pain points within the organization. It does not need to revolutionary As a final note of this article, the journey for an organization to embrace AI and ML is not an overnight one. Rather than viewing AI and ML as all-encompassing solutions, start by building a strong foundation. Begin small with data exploration and visualisation to gain insights. Then, apply ML models to address specific problems. Over time, the value and benefits of AI and ML will naturally become apparent. Take the first step into this exciting field, and see what unfolds. I hope you enjoyed my first blog with Midships. Please reach out to me at yuxiang.lin@midships.io if you would like to learn more Yuxiang
- It's all about performance... Key Performance Tuning Considerations on ForgeRock
This Blog explains some attributes that Midships tunes across the ForgeRock Platform to meet the non-functional requirements that our Financial Services Customers have given us over the years. ForgeRock has proven time and time again to be a high-performing service and when tuned correctly can support in excess of 3,000 TPS (that's 259 million transactions a day). I am Ravivarma and one of Midships' ForgeRock Senior SMEs. If you have any queries on this Blog, please feel free to get in touch with me at ravivarma.baru@midships.io Caveats: Performance tuning depends on the deployment configuration including whether you are running services on VMs or Containers. The tuning suggestions mentioned below are generic and for consideration. ForgeRock Directory Services ServerId (server name) Keep the server name small and identifiable (ex: USDC1). This is because the server name is used in every replication activity, for storing the sync state. It does have a material size on the size of logfiles and the DB. Change Number Indexer disabling: If no other 3rd party application (outside of AM, so yes IDM is a third party application) is querying your Directory Service e.g. User Store or CTS then you can disable the change number indexer. Refer to changelog-enabled:disabled. This will increase the performance of Directory Service replication. Replication Groups: Pair the DS and RS in groups based on criteria like if they are in the same data centre for a faster replication changes (and reduce cross DC traffic). DB Verifier Thread: If the data size is vast, then change the schedule of the DB verifier thread that verifies the integrity of each backend in a DS to spread over different times in off-peak hours in different DSes. By default, it is scheduled at 12 AM on all the DSes. FSyncs: If the FSync delays are high (which can be observed in support extracts), better to upgrade the disks to better I/O disks. Indexes: Remove unwanted indexes. Many clients have indexes that include both equality and presence, but in most cases, the presence is never used. From DS 7.3.0, the startup shows a warning if both indexes are created and presence is not required. Those indexes can be removed which results in lesser storage of the data. ForgeRock Access Manager Session cache and JVM Settings Set the maximum number of sessions that needs to be cached in AM, (used to enable AM to search through the session validity and properties in memory instead of going to the CTS) . Configure > Server Defaults > Session > Session Limits, default value is 5000. This can be tuned as per client production load (can be safely set to 100,000s). Using G1GC, try to limit the GC pause time for minor GCs to 200ms, -XX:MaxGCPauseMillis=200 Increasing the session cache will require AM to have sufficient memory. This can be achieved by increasing the heap size, also heap size depends on other cache settings described below. Ldap Connections: For CTS: default max connections 10, tune it to 65 / 129 etc (2^n + 1 [one is reserved for reaper thread]). For US: default max connection is 10, which can be increased to 64 or other value depending on the load and the resources available on the server. For CFG: the default max connection is 10, which can be set to 64 or other value depending on the load and the resources available on the server. For Nodes / Authentication Modules: default is set as 10, this can be tuned to 65 ForgeRock AM 7 > Maintenance Guide > Tuning LDAP Connectivity Log level Log-level must be set in ERROR level for AM deployments on production, otherwise we will see performance implications with Message or Info Level. SDK caching The maximum number of user entries cached can be set using SDK caching: Configure > Server Defaults > SDK. Default 10000, tune this value based on the load. DN Caching Setup DN caching, so that the user DNs are cached (applicable only do this if the user dns are not changed very frequently). Configure > Server Defaults > SDK > caching and replica. Set the number of DNs to cache default is 1500. Session Blacklist Cache Size If using client-based sessions, with session blacklisting, then tune the session blacklist cache size to meet the performance requirements. Configure > Global Services > Session > Client-Based Sessions > Session Blacklist Cache Size. This can be set to 100,000s based on the performance requirements and tests. OAuth2 Provider Token Blacklist cache size OAuth2 tokens blacklist can be stored in AM cache, to reduce the number of calls to CTS. Configure > Global Services > OAuth2 Provider > Global Attributes > Token Blacklist Cache Size. This can be set to 100,000s based on the performance requirements and tests. ForgeRock Identity Manager Additional JVM Tuning As IDM handles data that has common keys example DS data or DB data, using string deduplication can also help : -XX:+UseStringDeduplication Sync Use Cases : service accounts : Ensure that correct privileges are given to service accounts that query or write data to the DS. bypass-acl setting can be turned on the DS, which will remove the additional check for service account to increase throughput. change numbers: Allow IDM to read change numbers for live sync, which will have improved performance. Note does not work if you use TimeStamp-based LiveSync. Indexes: Index the fields with which IDM reads data from DS or DB frequently. If livesync is used with timestamps, instead of change logs, index modify and createTimestamp in DS, so that IDM performance is optimal. If IDM uses managed objects and uses DB as backend, use searchable as true (default true) for the required fields of Managed Object, which helps to create index in DB for the field. Sync per objectclass: IDM sync uses single thread and it cannot handle more requests per second. To make IDM handle multiple requests in parallel, configure sync per target objectclass, instead of using a single top level object class. ForgeRock Identity Gateway Resource release issue: IG Custom Filters written in Java implements filter method, which returns a response with an object of class Promise. Promise is implemented asynchronously in ForgeRock Java APIs. We recommend you separate requests (that use the same filter and the default ForgeRock handlers) into different handlers & routes. If a single filter makes two or more http calls using the ForgeRock Handler object asynchronously, then the resources will not be released by the ForgeRock OOB code. Example: check below filter method of a filter class: public class ExampleFilter implements Filter { public ExampleFilter(Handler handler, ....) { } public Promise filter(Context context, Request, request, Handler handler) { Headers headers = request.getHeaders(); //any handler, even if another handler like amHanlder used to the right, will still result in issue Promise promiseToBeReturned = handler.handle( context, request).thenAlways(request::close). thenAsync(new AsyncFunction(){ Request newRequest = new Request().setHeaders(headers.add(newHeader)); newRequest.setEntity(....newEntity); handler.handle(context, newRequest). thenAlways(newRequest::close); }); return promiseToBeReturned; } } The above code shows that inside the filter there are two requests made. The second request won’t release the resources with the existing ForgeRock code. Instead use a different handler in the route, which can execute after this filter by using handler.next(…) or If there is no choice to change the route, then a custom HttpClient can be implemented, which can be used to do any of the inner calls and close them to release resources. Custom HttpClient can be used like Apache Http Client, set the connection resources like max-routes per connection, total connections, timeouts etc as required and tune them as per performance requirements. Logging Level: Use Error Level Logging for production use cases, don’t use Debug or Info level as they can cause performance implications. Loggers: Do not use system.ou.println(…) for logging for debug or error as it won’t release the resources and can cause thread locks. No of workers: Generally should be equal to the number of CPU Cores of the VM in which it is running, can be optionally tuned for routes having more frequent calls. Additional JVM settings : As IG requests and response generally have more data that is common like header names, request and response body keys, using string deduplication can also help : -XX:+UseStringDeduplication Other JVM Settings and Cache : (XMX and XMS) is based on a calculation that considers the size of keys and values and the cache size. Performance Tuning :: ForgeRock Directory Services Based on the deployment approach, whether to use shared cache or cache per backend and the above calculation, recheck Xmx and Xms options. Using G1GC, try to limit the GC pause time for minor GCs to 200ms, -XX:MaxGCPauseMillis=200 Ulimits Ulimits for number of open files need to be appropriately set for the user with which the application runs for all components, general recommendation is to set the soft limit to 65536 and hard limits to 131072 for open files. For those customers using JBOSS (we don't recommend it), then please take note of the additional performance tuning for IG and AM . Note IG now works in standalone mode (which we do recommend) and AM works well with Tomcat: Below can be adjusted in the standalone.xml Default worker Threads Set the default worker threads as per the load requirements. Ex: default value is 100, which can be set to 1000s, as per the load requirement. Threads (core, max, queue) : Adjust the core, max and queue threads to meet the requirements, can be set equal to the default worker threads. KeepAlive Time out: Default is 100 milliseconds, which is very low for IG and AM connections. This can be adjusted to 30/60 seconds depending on the request clients to the JBoss server. Max-connections: max-connections set in the http(s)-listener, default 20000, which can be removed or increased to allow JBoss to handle more number of concurrent connections. Below Java Settings can be applied in env.sh and env.properties : Max Http Connections : Adjust the apache max http connection threads and count in env.properties : org.apache.tomcat.util.net.MAX_THREADS=10000 (can be set as 10000s as per requirement) org.apache.tomcat.util.http.Parameters=100000 (can be set as 100000s as per requirement) There is a lot that goes into performance tuning, but when done well, ForgeRock is incredibly stable and performant. If you need help with tuning your ForgeRock stack, please don't hesitate to reach out to Midships at sales@midships.io I hope you found this blog a helpful starting point. Ravi.
- Maintaining ForgeRock Blog Series: ForgeRock ChangeLogs are not purging what do I do?
This is the first in a series of blogs detailing our experience resolving challenges that can arise when using the ForgeRock Platform. Applicable to ForgeRock Directory Services v7.3 and earlier About Taweh Ruhle Taweh leads the Midships’ DevSecOps practice and is a certified ForgeRock Access Manager engineer. For any queries, or feedback you may have, please contact Taweh me on taweh@midships.io What are ChangeLogs? The ForgeRock Directory Server running as a Replication Server by default will have a Changelog Database enabled (to be changed in future releases of ForgeRock). The database holds the change history of all Directory Servers (non replication servers) in the Replication Topology. In the case of Token Stores, it will hold all the token changes made over time. For User Stores, it will hold all User changes made over time. Changes in the changelog database are held for a pre-defined window. This window or period of time is controlled by an attribute called the purge-deplay. By default this is set to 3 days. Meaning the change-log database will only have changes for the last 3 days. How do you detect whether the ChangeLogs are purging on the Replication Servers (RS)? This is simple, you can monitor the volume assigned to the Replication Server. Default directory is: /path_to_ds_installation/changelogDb/ Check if the volume utilisation is increasing over a period of a couple of weeks. Check if there are ChangeLogs within that are more than 3 days old. Finally, check if you can see either of these terms mentioned in the error logs: disk-low-threshold disk-full-threshold If the utilisation is stable and you don't see ChangeLogs more than three days old or those terms in the error log, you can probably get that well-deserved coffee! What should happen? By default the change log should auto-purge changes that are older than 3 days (or set purge delay). What happens if I don't address the ChangeLog purge issue? A service outage once the disk-full-threshold is breached. Let me explain! When the available disk space falls below the disk-low-threshold, the directory server only allows updates from users and applications that have the bypass-lockdown privilege. When available space falls below disk-full-threshold, the directory server stops allowing updates, instead returning an UNWILLING_TO_PERFORM error to each update request. Your service will be done. See here for more details. When this happens it could prevent the following (depending on your architecture): For unauthenticated users, session information cannot be written to the Core Token Store Authenticated users will be unable to update session data or user store information. Note if you run statelessly then authenticated users maybe okay. NOTE By default disk-low-threshold is set to 5% of volume size + 5GB disk-full-threshold is set to 5% of volume size + 1GB. Example command to check Replication Server disk thresholds: ./dsconfig get-replication-server-prop \ --advanced --hostname --port \ --bindDn --bindPassword \ --provider-name "Multimaster Synchronization" \ --property disk-low-threshold \ --property disk-full-threshold \ --no-prompt -X Why didn’t the change logs purge? A couple of reasons are mentioned below, but there could be others: 1. You have used an LDIF export/import to populate the User Store. When you use LDIF Export on ForgeRock DS, it includes ForgeRock specific server metadata. When imported into a new replication topology this metadata causes a bad state which in turn affects the changelog purge from working as required. Make sure you use the following LDIF Export command as it strips out server meta data: Ldap-export with the below parameters will ensure that the server meta data is not exported. This will ensure that the changelog DB continue to be purged as per the setting for the purge delay. ./export-ldif --hostname --port 4444 --bindDN uid=admin --bindPasswordFile --backend-name userStore –-excludeAttribute ds-sync-hist --excludeAttribute ds-sync-state --excludeAttribute ds-sync-generation-id --ldifFile 2. Kill -9 had been used to terminate DS when it had hung. This resulted in the domain state being inconsistent. This should only be used after waiting a minimum of 200ms and then check that your DS is healthy afterwards. How do you purge the Change Logs? As warned by ForgeRock here, it is stated: Do not compress, tamper with, or otherwise alter changelog database files directly, unless specifically instructed to do so by a qualified ForgeRock technical support engineer.External changes to changelog database files can render them unusable by the server. i.e. Do Not Use “rm -RF *” as suggested by Google Bard recently! In order to purge the changelog, ForgeRock has created the below command: ./dsrepl clear-changelog See ForgeRock Backstage here for more details. Note: NEVER clear the changelog without using the above command. Approach 01: Without Downtime (we recommend you raise a ticket and work with ForgeRock before you proceed) Sample Replication Topology 3 User Stores with Customer Identities 2 Replication Servers with large Changelog Database Confirm that service is up and running as normal in your environment (Health Check) Verify that the current changelog database sizing. You can do this by list the changelogDb directory size locatated in the Directory Server (DS) instance folder. See below, yours should be significantly large than the example below: Shutdown a Replication Servers using the ./stop-ds command. In this scenario on RS1. Run the ./dsrepl clear-changelog command on RS1. This command requires the server to be offline. If you run It online, you will get a message like the below: Verify on RS1 that the changelogDb folder has been cleared down by checking the size as you did in step #B above. Startup RS1 using the ./start-ds command and confirm server starts successfully. You can verify this by checking the server.out logs on the server for Errors. At the end of the logs you should see a successful start message like the below: You should also see confirmation of connection to the other replication server. In this scenario connection to RS2. For instance: Current state of the environment: Following startup of RS1, it will sync up the changelog as required and align with the other Replication Servers in the Replication topology. In this case RS2. Verify that the changelogDb folder size has increased since the cleardown from step #D. Note: Monitor the size of the changelogDB folder for a few minutes and ensure it is either not increasing or increase is very minimal. This is to verify that it is aligned with the other replication servers. On RS1 run the ./dsrepl status command to verify the relication topology and status. Confirm everything (Delays, Domains, Records count, etc.) is as expected. Below is an example command and output: ./dsrepl status --hostname --port 4444 --trustAll \ --bindDn uid=admin \ --bindPassword \ -–showReplicas Example output: Note: In DS v7.2.0 and below, the ./dsrel status command does not include the Entry Count column. To see the entry count we suggest you run the below command: ./status --hostname \ --port 4444 --bindDn uid=admin \ --bindPassword \ --trustAll Repeat steps #B to #H for all other replication Servers. In this scenario, we will be doing this for RS2. Possible Challenges with Solution Approach 01 “Bad Generation ID” error when you check your replication status or start up the any of the DSes in the Replication Topology. A DS Generation ID is a calculated value (shorthand form) of the initial state of its dataset. The generation ID is a hash of the first 1000 entries in a backend. If the replicas' generation IDs match, the servers can replicate data without user intervention. This ID is used by both the DSes and Replications Servers in the topology. Steps to resolve on affected servers (at this point you should have raised a ForgeRock ticket especially when in production): Locate the Replication Domain / baseDN with the Bad Geberation ID error. FO instance it could be ou=Tokens. This can be seen from the DS server.out log file on server startup on from the ./dsrepl status command. Run the below command to remove the affected Replication domain identities ./dsconfig delete-replication-domain \ --provider-name "Multimaster Synchronization" \ --domain-name ou=tokens --hostname \ --port --bindDN \ --bindPassword \ --trustALL --no-prompt Verify that the domain has been removed successfully from the Replication Server ./dsconfig list-replication-domains \ --provider-name Multimaster\ Synchronization \ --hostname --port \ --bindDn \ --bindPassword \ --trustAll --no-prompt Sample Output Run the below command to re-add the affected Replication Domain / baseDN ./dsconfig create-replication-domain \ --provider-name "Multimaster Synchronization" \ --domain-name ou=tokens --set base-dn: ou=tokens \ --type generic --hostname \ --port \ --bindDn \ --bindPassword \ --trustAll --no-prompt Check the status of the replication using the below command: ./dsrepl status --hostname --port 4444 --trustAll \ --bindDn uid=admin \ --bindPassword \ --showReplicas NOTE: Solution 02 below can also be used to resolve “Bad Generations ID”. More details available here from ForgeRock. Approach 02: With Downtime or Blue-Green Deployment Disable traffic to all DSes in the replication Topology affected with the ever-increasing change log database. Check that all DSes from #1 above in the replication topology have the same data count. If it is not, either wait for all servers to catch-up or initialise all DSes with the same data. Below is an example command to get the status and records count (in v7+ only): ./dsrepl status --hostname --port 4444 --trustAll \ --bindDn uid=admin \ --bindPassword \ -–showReplicas Below is an example command to see records count (in v7.2.0- only): ./status --hostname \ --port 4444 --bindDn uid=admin \ --bindPassword \ --trustAll Below is an example command to initialize all DSes with the same data: ./dsrepl initialize --baseDN \ --bindDN uid=admin --bindPasswordFile \ --hostname "" --toAllServers \ --port 4444 --trustAll Shutdown all Directory from #1 above, including the Replication Servers that are affected. Below is an example command to stop the DS: ./stop-ds Execute the dsrepl clear-change log command on all Replication Servers in the replication topology Start up all Replication Servers in the replication topology Below is an example command to stop the DS: ./start-ds Start up all DSes with data to be replicated in the replication topology Below is an example command to stop the DS: ./start-ds Monitor the change log database on the RSes and confirm that it is decreasing after the purge delay is executed I hope this post has been useful (although I hope you don’t face this issue in the first place). In theory, Bad Generation ID etc should not occur from ForgRock DS v7.2.1 onwards. If you face this and need help, please contact Midships. We will be happy to help! To learn more about our Midships please get in touch with us at sales@midships.io