Streamlined financial reporting from over 18 source systems using Databricks on AWS, achieving a 40% reduction in operational costs for a Fortune 500 healthcare company


About the Life Sciences and Analytical Instruments Company

Our client, a Fortune 500 organization, is a global leader in Life Sciences and Analytical Instruments, with over 80,000 employees worldwide and annual revenues exceeding $44 billion.

Technology

  • Databricks
  • AWS
  • Databricks Unity Catalog
  • Power BI
  • Apache Airflow
  • AWS S3

The Objectives

Our client aimed to achieve parallel data integration from more than 18 source systems, including SAP, Oracle Cloud ADW, Progress, SQL Server, DB2, and various other databases. Their goal was to streamline data integration for the GLA finance module across different business units, enabling near-real-time executions. They sought to harness Databricks Delta Lake and an ETL process to provide timely insights to the business team.

 

Challenge

  • Managing Parallel Data: Handling parallel deletes, inserts, and updates within a tight 30-minute window was challenging (a Delta Lake merge sketch follows this list).
  • Lengthy Pipeline Execution: The existing pipeline execution time was 4 hours and 15 minutes, requiring significant improvement.
  • Inefficient Data Handling: Data fragmentation from 18+ source systems led to inefficient parallel data operations.
  • Data Catalog and Analytics Gaps: The absence of a centralized data catalog hindered data discovery and analysis.
  • Data Security Concerns: Inadequate role-based security measures posed risks to data integrity and confidentiality.
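One common way to handle such change batches on Delta Lake is a MERGE, which applies deletes, updates, and inserts against the target table in a single atomic operation. Below is a minimal PySpark sketch of that pattern; the table name, S3 path, key column, and op flag are illustrative assumptions rather than the client's actual schema.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical target table and change batch; names are illustrative only.
target = DeltaTable.forName(spark, "finance_gl_bronze")
changes = spark.read.format("delta").load("s3://example-bucket/staging/gl_changes/")

# One MERGE applies deletes, updates, and inserts atomically,
# avoiding conflicting parallel writes against the same table.
(
    target.alias("t")
    .merge(changes.alias("c"), "t.gl_entry_id = c.gl_entry_id")
    .whenMatchedDelete(condition="c.op = 'D'")
    .whenMatchedUpdateAll(condition="c.op = 'U'")
    .whenNotMatchedInsertAll(condition="c.op != 'D'")
    .execute()
)
```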

 

Solution 

  • Build Delta Lake: Leveraged Databricks Delta Lake for seamless integration of data from 18+ source systems into the cloud, and implemented a CI/CD pipeline with GitHub Actions for efficient code deployment
  • Unity Catalog: Introduced Unity Catalog for enhanced data analytics capabilities and democratization of data
  • 2x Faster ETL Processing: The improved ETL process cut execution time by 50%, doubling efficiency, aided by Apache Airflow for data orchestration (see the Airflow DAG sketch after this list)
  • Databricks Integration: Seamlessly integrated Databricks with diverse environments, including Oracle Cloud, SQL Server, DB2, and on-premises Oracle databases, using the Spark engine for data processing (a JDBC ingestion sketch follows this list)
  • Efficient Data Management: Utilized AWS S3 for unified storage of source OLTP data, streamlining data management
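As a sketch of the source-to-cloud ingestion described above, Spark can read a relational source over JDBC in parallel partitions and land the extract as a Delta table on S3. The connection string, table, key column, and bucket below are placeholders, not the client's configuration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder JDBC connection for one of the relational sources (e.g. SQL Server).
jdbc_url = "jdbc:sqlserver://example-host:1433;databaseName=finance"
connection_props = {
    "user": "etl_user",
    "password": "***",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Pull the source table in parallel partitions to speed up extraction.
gl_entries = spark.read.jdbc(
    url=jdbc_url,
    table="dbo.gl_entries",
    column="gl_entry_id",       # numeric key used to split the read
    lowerBound=1,
    upperBound=50_000_000,
    numPartitions=16,
    properties=connection_props,
)

# Land the raw extract as a bronze Delta table on S3.
(
    gl_entries.write.format("delta")
    .mode("append")
    .save("s3://example-bucket/bronze/gl_entries/")
)
```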

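For the Airflow orchestration, the Databricks provider package offers operators that trigger jobs in a workspace on a schedule. A minimal sketch assuming a pre-created Databricks job; the DAG name, schedule, connection ID, and job ID are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

# Placeholder DAG that kicks off the finance ETL job every 30 minutes.
with DAG(
    dag_id="finance_gl_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="*/30 * * * *",
    catchup=False,
) as dag:
    run_etl = DatabricksRunNowOperator(
        task_id="run_databricks_etl_job",
        databricks_conn_id="databricks_default",  # Airflow connection to the workspace
        job_id=12345,                             # hypothetical Databricks job ID
    )
```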
 

Solution Architecture

Figure: Generic Databricks Medallion Architecture on AWS
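In the medallion pattern shown above, raw extracts land in a bronze layer, cleansed and conformed records are promoted to silver, and business-level aggregates consumed by the reporting layer (for example, Power BI) live in gold. A simplified PySpark sketch with hypothetical table and column names:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze -> Silver: deduplicate and standardize the raw GL extracts.
bronze = spark.read.format("delta").load("s3://example-bucket/bronze/gl_entries/")
silver = (
    bronze.dropDuplicates(["gl_entry_id"])
    .withColumn("posting_date", F.to_date("posting_date"))
    .filter(F.col("amount").isNotNull())
)
silver.write.format("delta").mode("overwrite").saveAsTable("finance.silver_gl_entries")

# Silver -> Gold: aggregate to the grain consumed by the reporting layer.
gold = (
    silver.groupBy("business_unit", "account_code", "posting_date")
    .agg(F.sum("amount").alias("total_amount"))
)
gold.write.format("delta").mode("overwrite").saveAsTable("finance.gold_gl_daily_balances")
```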

 

Impact

  • 50% Faster Data Processing: Our ETL optimization slashed execution time in half, accelerating insights
  • 40% Lower Operational Costs: Databricks job clusters reduced costs significantly
  • Minimized Data Discrepancies: The Unity Catalog improved data accuracy
  • Enhanced Data Security: Robust, role-based security measures protected sensitive data (a Unity Catalog grant sketch follows below)
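Unity Catalog expresses role-based access as SQL grants on catalogs, schemas, and tables. A sketch of what such grants might look like here; the catalog, schema, table, and group names are assumptions for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical Unity Catalog objects and workspace groups; adjust to the real governance model.
spark.sql("GRANT USE CATALOG ON CATALOG finance TO `finance_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA finance.gold TO `finance_analysts`")
spark.sql("GRANT SELECT ON TABLE finance.gold.gold_gl_daily_balances TO `finance_analysts`")

# Engineers who maintain the pipelines get broader privileges on the silver layer.
spark.sql("GRANT ALL PRIVILEGES ON SCHEMA finance.silver TO `data_engineers`")
```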

