Limiting the Impact: Exploring Blast Radius Management in Software Systems

What constitutes a blast radius?

Most of us are familiar with the Chernobyl nuclear plant disaster of April 26, 1986, widely regarded as the worst nuclear accident in history. Its aftermath, in both cost and casualties, is still evident today. The accident occurred during a safety test that simulated a power outage to check whether the reactor’s cooling could be kept running. With proper planning and analysis, the impact could have been mitigated. The term blast radius in software delivery, which describes the area affected by a change, is often explained by analogy with disasters of this kind.

While we are not dealing with a scenario that could lead to such extensive casualties, exercising due diligence is imperative to minimize adverse impacts on any production system whenever changes are implemented. 

In the tech world, the severity of a change’s impact is described, metaphorically, as its blast radius. Reliability is a key performance indicator (KPI) for both systems and tech teams, so it is essential to know the tools and processes that limit the blast radius in software delivery.

Salesforce, like many other platforms and microservice architectures, offers features and patterns that can serve more than one purpose once they are well understood. Even simple features can double as a strategy for disaster control or for reducing the blast radius in adverse conditions.

It’s important to note that the term “blast radius” is utilized to measure the magnitude of disasters in various aspects of the software lifecycle, including microservices, security, reliability, cloud infrastructure, deployment, and access management. However, this write-up primarily focuses on the security and reliability of applications.

Effects are determined by layers of software. In the context of the blast radius perspective in Salesforce, the impact layers are typically conceptualized as follows:

[Figure: Blast Radius Management]

In this blog, I will elaborate on patterns that are both widely used and effective at constraining the blast radius in software. These patterns include:

  1. Bulkhead Pattern
  2. Circuit Breaker Pattern
  3. Service Registry Pattern

Strategies for Damage Mitigation

Bulkhead Pattern

When discussing security concerns, our immediate focus often shifts to potential hacks and unauthorized access. Indeed, implementing access control measures and robust security practices forms an effective strategy for disaster management. This damage-limiting approach is known as the bulkhead pattern, borrowing its name from naval terminology. 

Like a compartment in a ship that contains damage so the other sections can keep functioning normally (albeit at reduced capacity), the bulkhead pattern in software swiftly isolates unforeseen issues. This keeps the impact on customers minimal and makes problems easier to identify and resolve. Ultimately, this approach creates a win-win situation for all stakeholders involved.

Some considerations for scenarios on the Salesforce platform:

a) Ensure that any newly developed features are encapsulated within permission sets or custom permissions, akin to creating bulkheads. Distribute features and accesses across these permission sets or custom permissions to maximize the effectiveness of this pattern.

b) Implement a strategy to compartmentalize features into distinct individual components, acting as bulkheads. This could involve employing an Apex helper pattern, creating reusable Lightning Web Components (LWC) dedicated to specific features, or establishing an integration framework composed of multiple components that serve different purposes in sequence. Alternatively, consider a service-based integration framework to further enhance compartmentalization.

If a change deployed to production threatens system performance or security, promptly cutting off the affected bulkhead can prevent wider consequences. This cut-off can be automated with try-catch blocks so that the system responds in real time. The resulting damage is then confined to the logic layer or, at most, to the process encapsulated within the bulkhead.
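The automated cut-off described above can be sketched as follows. This is a minimal, platform-neutral illustration in Python rather than Apex. The `Bulkhead` class, the `max_failures` threshold, and the example feature are all hypothetical names introduced for this sketch, and the `enabled` flag merely stands in for a custom permission or feature switch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Bulkhead:
    """Wraps one feature so that its failures stay inside its own compartment."""
    name: str
    max_failures: int = 3      # hypothetical threshold before the automatic cut-off
    failures: int = 0
    enabled: bool = True       # stands in for a custom permission or feature flag
    errors: list = field(default_factory=list)

    def run(self, feature: Callable[[], str], fallback: str) -> str:
        if not self.enabled:
            return fallback    # compartment is sealed: skip the feature entirely
        try:
            return feature()
        except Exception as exc:           # contain the failure inside the bulkhead
            self.failures += 1
            self.errors.append(f"{self.name}: {exc}")
            if self.failures >= self.max_failures:
                self.enabled = False       # automated cut-off
            return fallback


def broken_recommendations() -> str:
    # Hypothetical optional feature that misbehaves after a deployment.
    raise RuntimeError("recommendation engine timed out")


def render_order_page(bulkhead: Bulkhead) -> str:
    # The core order summary never depends on the optional feature succeeding.
    extras = bulkhead.run(broken_recommendations,
                          fallback="(recommendations unavailable)")
    return f"Order summary | {extras}"
```

Once the failure count reaches the threshold, the feature is no longer invoked at all, so the rest of the page keeps working while the recorded errors point to the compartment that needs attention.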

Circuit Breaker Pattern

Governor limits, common in cloud-based and multi-tenant systems, act as a safeguard: an automated ally that keeps the application running without interruption. They regulate resource usage so that each process receives its allocated share. However, a process that exceeds its limits raises an error and halts. If the halted process is critical and another system depends on its completion, the consequences can be financially severe.

In a real-world analogy, imagine needing to catch a flight, and your taxi breaks down on the way to the airport. The solution? Have a backup plan: get another cab and reach your destination. While not a perfect analogy, it emphasizes the importance of having backup processes or bypass mechanisms for critical operations.

This safety net is embodied in the Circuit Breaker and Retry patterns. With the Circuit Breaker pattern, development teams can focus on handling a dependency’s unavailability rather than merely detecting and managing failures. For instance, if a team is developing a web page that relies on ContentMicroservice for a widget’s content, the page can still be served without the widget’s content when ContentMicroservice is unavailable.
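The ContentMicroservice example can be sketched as a small circuit breaker. This is an illustrative Python sketch under stated assumptions, not a reference implementation: the `CircuitBreaker` class, its `failure_threshold` and `reset_timeout` parameters, and the `render_page` helper are hypothetical names chosen for this example.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stops calling a failing dependency until a cool-down period has passed."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold   # assumed value, tune per service
        self.reset_timeout = reset_timeout           # seconds to wait before a trial call
        self.clock = clock                           # injectable so the logic can be tested
        self.failures = 0
        self.opened_at: Optional[float] = None       # None means the circuit is closed

    def call(self, operation: Callable[[], str], fallback: Callable[[], str]) -> str:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()   # open: fail fast without touching the dependency
            # Cool-down elapsed: half-open, let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open (or re-open) the circuit
            return fallback()
        self.failures = 0
        self.opened_at = None       # a success closes the circuit again
        return result


def render_page(breaker: CircuitBreaker, fetch_widget: Callable[[], str]) -> str:
    # The page is served with or without the widget's content.
    widget = breaker.call(fetch_widget, fallback=lambda: "")
    return "<h1>Home</h1>" + widget
```

While the circuit is open, the page renders immediately without the widget instead of waiting on a dependency that is known to be down.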

For the Salesforce platform, specific considerations include:

a) Implement checks for resource limits on critical features, with clauses that restrict resource usage. Salesforce’s built-in timeout limits in many libraries help control resources, but other areas still require attention.

b) Have a bypass ready in the catch block for these features, one that flags the issue and routes the process to an alternative channel or logic (Plan B). For instance, if a network issue or system downtime occurs while retrieving data from an integrated system, an alternate path should dictate the next steps. It may not yield exact results, but it keeps the system running and minimizes the impact. At the same time, implement a robust flagging mechanism for impacted records, so that they can be found and corrected later.

c) Consider retries in running processes with a threshold limit, ensuring a cap on the number of retry attempts.
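Points b) and c) above can be combined in one small routine. The following Python sketch is an illustration only: `fetch_with_retry`, `stale_copy`, the `flagged` list, and the `max_attempts` default are hypothetical names and values, and a real implementation would usually wait between attempts.

```python
from typing import Callable, Dict, List


def stale_copy(record_id: str) -> Dict:
    # Hypothetical Plan B: return a degraded result instead of failing outright.
    return {"id": record_id, "source": "stale"}


def fetch_with_retry(record_id: str,
                     primary: Callable[[str], Dict],
                     plan_b: Callable[[str], Dict],
                     flagged: List[str],
                     max_attempts: int = 3) -> Dict:
    """Try the primary path a capped number of times, then fall back and flag."""
    for _ in range(max_attempts):          # hard cap on retry attempts
        try:
            return primary(record_id)
        except ConnectionError:
            continue                       # transient failure: try again
    flagged.append(record_id)              # mark the record for corrective action
    return plan_b(record_id)               # keep the process running via Plan B
```

The `flagged` list plays the role of the flagging mechanism for impacted records: anything that had to take the alternate path is recorded so it can be reprocessed later.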

Service Registry Pattern 

While this pattern may not directly reduce the blast radius, it significantly enhances operational reliability. The service registry acts as a repository storing information about services, including details about their instances and locations. In a microservices application, this pattern allows the application to dynamically search the repository for an available service instance, avoiding reliance on static connections. Before providing the service’s location, the registry may perform a Health Check API invocation to ensure the service’s availability.
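A registry lookup with a health check can be sketched as follows. This in-memory Python example is an assumption-laden illustration: `ServiceRegistry`, its methods, and the `example` URLs are hypothetical, and a production registry would typically be a separate, replicated service.

```python
from typing import Callable, Dict, List, Optional


class ServiceRegistry:
    """Stores service locations and returns only instances that pass a health check."""

    def __init__(self, health_check: Callable[[str], bool]):
        self._instances: Dict[str, List[str]] = {}
        self._health_check = health_check   # stands in for a Health Check API call

    def register(self, service: str, location: str) -> None:
        self._instances.setdefault(service, []).append(location)

    def lookup(self, service: str) -> Optional[str]:
        # Check each known instance before handing out its location.
        for location in self._instances.get(service, []):
            if self._health_check(location):
                return location
        return None                          # no healthy instance is available
```

A caller asks the registry for a service by name on each request instead of holding a static address, so an unhealthy instance is skipped without any change to the caller.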

Conclusion

In conclusion, system reliability is paramount for business success. Whether intentional or inadvertent, any changes to the business system should not result in a system failure or disrupt business operations. The patterns discussed here represent a subset of innovative ideas aimed at maintaining system resilience. While additional ideas are encouraged, these guidelines provide a solid foundation for establishing a robust application architecture.

Mohammad Parwez Akhtar
