Modernizing IT Operations for System Reliability for a Big Four Bank – Altimetrik

Altimetrik’s SRE transformation helped a global bank reduce downtime, automate operations, and achieve real-time system visibility worldwide.

December 8, 2024
5 minute read

90%

Reduction of MTTR for PO Incidents

27%

Cost Optimization on Underutilized Systems

32%

License and Operational Cost Reduction

Background

The financial industry increasingly relies on technology to serve customers. As a result, banks need to ensure that their systems are reliable, scalable, and efficient. They also operate in a highly regulated industry where downtime or outages can result in significant financial losses or regulatory penalties.

Site Reliability Engineering (SRE) is transforming global financial services companies by providing a set of platforms and practices that enable them to deliver more reliable and scalable services to their customers. SRE practices focus on building and operating software systems that are highly reliable, scalable, and efficient, and on reducing the likelihood of outages or downtime.

Our client, a global investment bank and financial services conglomerate, is present in 160 countries, providing payments, cards, cash management, working capital, and trade solutions to companies, governments, and other large institutions. With over $13 trillion in assets under custody, it also integrates the capabilities of markets with trading floors in more than 80 countries. Because its technology operations are large and span several geographies, the company wanted a comprehensive strategy to simplify monitoring, enable system tracing, and fully automate its tech operations.

The end goal was real-time, 360-degree insight into a collective view of system health, service management activities, product quality index, missed revenue, and total cost of ownership.

Centralized Observability to get real-time insights

Financial Insights (TCO)

Time-based calculation of the planned vs. actual financial cost of building and running a feature, and of missed revenue.

Feature Health Checks

Real-time view of service / feature availability and performance metrics vs. defined targets.

Service Management

View into deployments and incidents – system response and recovery time, and product quality.

Real Time Alerting

Detection of business and technology anti-patterns based on benchmark KPIs, with trend analysis.

Challenge

Altimetrik’s SRE Transformation team engaged with the client to help define and deploy a scalable site reliability engineering framework, policies, and procedures to modernize their IT operations. In our discovery phase, we mapped their current way of working and identified several challenges affecting system reliability, such as:

  1. Scattered dashboards with noisy reactive alerts (~1000 alerts/day).
  2. No end-to-end traceability.
  3. Tech deliverables were not up to the mark and did not match the client’s business priorities.
  4. Multiple ticket SLA breaches with delayed RCAs.
  5. Fragmented SRE tool adoption that resulted in high maintenance cost.

Solution

Our team outlined a 12-month transformation roadmap, assisted in adopting SRE foundational services, optimized observability capabilities, and developed an automated suite to provide self-healing capabilities.

1. Cockpit Controller

We developed a self-service automation platform to support critical flow remediation and introduced system throttling to manage event queues and transition of payment methods.
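
As an illustration only, throttling of an event queue is often implemented with a token bucket. The sketch below is not the Cockpit Controller's actual implementation, which the case study does not describe; the class name, parameters, and injectable clock are assumptions made for the example.

```python
import time


class TokenBucket:
    """Hypothetical throttle: about `rate` events/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens held at once
        self.clock = clock          # injectable so the logic can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if one event may proceed now, False if it should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A consumer would call `allow()` before taking each event off the queue and defer the event when it returns `False`.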

2. ISDMS Integration

Simplified monitoring and built dashboards that capture the average requests served by active nodes, using a Prometheus and HAProxy integration to highlight unused and underutilized nodes.
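
A minimal sketch of how per-node request averages could be turned into a utilization label. It assumes the request-rate samples have already been collected (for example, exported from a metrics store); the function name, the sample layout, and the threshold are hypothetical and not taken from the client's setup.

```python
from statistics import mean


def classify_nodes(samples, threshold_rps):
    """Label each node from its request-rate samples.

    samples: dict mapping node name -> list of requests-per-second readings.
    threshold_rps: assumed cutoff below which a node counts as underutilized.
    """
    labels = {}
    for node, values in samples.items():
        # A node with no readings is treated as serving no traffic.
        avg = mean(values) if values else 0.0
        if avg == 0:
            labels[node] = "unused"
        elif avg < threshold_rps:
            labels[node] = "underutilized"
        else:
            labels[node] = "active"
    return labels


readings = {
    "node-a": [120, 130, 110],
    "node-b": [2, 0, 1],
    "node-c": [0, 0, 0],
}
labels = classify_nodes(readings, threshold_rps=5)
```

Nodes labeled `unused` or `underutilized` are the candidates a dashboard would surface for consolidation.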

3. Mission Control

Consolidated monitoring and logging services into a single interface providing a comprehensive view of live traffic. Also optimized logging and monitoring with distributed tracing to reduce MTTD.
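
For readers unfamiliar with the metrics, the sketch below shows one common way to compute MTTD (mean time to detect) and MTTR (mean time to resolve) from incident timestamps. Definitions vary between organizations; here both are measured from the incident start, which is an assumption, and the `Incident` record is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime   # when the fault began
    detected: datetime  # when monitoring or a person noticed it
    resolved: datetime  # when service was restored


def _mean_delta(deltas):
    deltas = list(deltas)
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    """Mean time from start to detection."""
    return _mean_delta(i.detected - i.started for i in incidents)


def mttr(incidents):
    """Mean time from start to resolution (assumed definition)."""
    return _mean_delta(i.resolved - i.started for i in incidents)


t0 = datetime(2024, 1, 1, 12, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=60)),
    Incident(t0, t0 + timedelta(minutes=20), t0 + timedelta(minutes=40)),
]
```

With these two sample incidents, `mttd(incidents)` is 15 minutes and `mttr(incidents)` is 50 minutes.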

Results

We optimized observability capabilities and developed automated suites that provide self-healing capabilities, reducing the client’s operational toil.

  1. SLO-based alerting.
  2. Automated service management.
  3. Eliminated development team overhead in supporting production events.
  4. Reduced operational and maintenance cost for SRE tools.
  5. Predictive capacity planning and management.
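
To make SLO-based alerting concrete, here is a minimal sketch of a burn-rate check, a widely used approach in SRE practice. The case study does not state how the client's alerts were defined, so the function names, window shapes, and the 14.4 threshold are illustrative assumptions.

```python
def burn_rate(errors, total, slo):
    """How fast the error budget is being spent.

    A burn rate of 1.0 means errors arrive exactly at the rate the SLO allows.
    """
    if total == 0:
        return 0.0
    error_budget = 1.0 - slo          # e.g. 0.001 for a 99.9% SLO
    return (errors / total) / error_budget


def should_page(long_window, short_window, slo, threshold=14.4):
    """Page only if both windows burn budget faster than `threshold`.

    Each window is an (errors, total) pair. Requiring the short window as well
    keeps the alert from firing after the problem has already stopped.
    """
    return (
        burn_rate(*long_window, slo) >= threshold
        and burn_rate(*short_window, slo) >= threshold
    )


# 99.9% SLO: 2% errors over the long window, 3% over the short window.
page_now = should_page((200, 10_000), (30, 1_000), slo=0.999)
# Long window still elevated, but the short window has recovered.
page_later = should_page((200, 10_000), (1, 1_000), slo=0.999)
```

Alerting on budget consumption rather than on individual error spikes is one way to cut the volume of noisy, reactive alerts described in the Challenge section.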

Amit Singh

“Amit Singh is the Chief Strategy Officer and Chief of Staff to the CEO at Altimetrik, where he drives corporate strategy, growth acceleration, and value creation through transformation initiatives. In this dual role, he partners closely with leadership teams, investors, and the board to align business strategy with sustained, technology-driven growth.

With over two decades of experience at the intersection of technology, business, and transformation, Amit brings a unique perspective on how organizations can innovate and adapt in a rapidly evolving digital landscape. His career has been defined by building high-performing teams, scaling innovative platforms, and driving organizational change to deliver lasting impact.

Before joining Altimetrik, Amit held senior leadership roles at Visa, where he led technology strategy, engineering, and product development for Real-Time Payments and the Visa Developer Platform. Earlier, he served as Chief Product Officer at a startup and spent more than a decade at Oracle, leading product and engineering teams across a wide range of enterprise software applications.”
