Optimizing Healthcare ETL Pipelines with Hybrid Cloud Data Warehousing: A Case Study Using Snowflake and Azure Data Factory
DOI:
https://doi.org/10.63282/3050-9416.IJAIBDCMS-V3I3P111
Keywords:
Healthcare, ETL, Azure Data Factory, Snowflake, Hybrid Cloud, Data Warehousing, Data Integration, HL7, HIPAA, Real-Time Processing
Abstract
The exponential growth of healthcare data, driven by electronic health records (EHRs), Internet of Medical Things (IoMT) devices, and evolving regulatory mandates, necessitates scalable and secure data integration pipelines. Traditional ETL architectures struggle to meet the demands of real-time processing, hybrid cloud interoperability, and compliance with standards such as HIPAA and HL7. This paper presents a hybrid cloud ETL framework that uses Azure Data Factory (ADF) and Snowflake to build scalable, secure, and resilient healthcare data pipelines. The architecture integrates on-premises clinical systems with cloud-native services through a self-hosted integration runtime, supporting ingestion, transformation, and analytics at scale. We evaluate the system through a real-world deployment at a multi-hospital healthcare network in the Midwestern United States. The implementation achieved a 41% reduction in data latency and a 60% decrease in infrastructure overhead compared to legacy systems. Key contributions include dynamic schema handling, end-to-end encryption, audit-ready transformation workflows, and optimized parallel loading. We also identify challenges around cost governance, schema drift, and network reliability. This work provides a replicable model for healthcare organizations seeking to modernize their data engineering infrastructure using hybrid cloud technologies.
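The abstract names dynamic schema handling and schema drift without showing what a drift check looks like. The following is a minimal Python sketch of one possible check, not the method used in the paper: it compares an incoming record against an expected column-to-type mapping and reports added, removed, and retyped fields. The names `SchemaDrift`, `infer_schema`, and `detect_drift`, and the sample field names, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SchemaDrift:
    """Differences between an expected schema and one observed record (hypothetical helper)."""
    added: dict[str, str] = field(default_factory=dict)      # new column -> observed type
    removed: list[str] = field(default_factory=list)         # expected columns that are missing
    retyped: dict[str, tuple[str, str]] = field(default_factory=dict)  # column -> (expected, observed)

    @property
    def has_drift(self) -> bool:
        return bool(self.added or self.removed or self.retyped)


def infer_schema(record: dict) -> dict[str, str]:
    """Map each field name to the Python type name of its value."""
    return {name: type(value).__name__ for name, value in record.items()}


def detect_drift(expected: dict[str, str], record: dict) -> SchemaDrift:
    """Compare one incoming record against the expected column -> type mapping."""
    observed = infer_schema(record)
    drift = SchemaDrift()
    for name, type_name in observed.items():
        if name not in expected:
            drift.added[name] = type_name
        elif expected[name] != type_name:
            drift.retyped[name] = (expected[name], type_name)
    drift.removed = sorted(set(expected) - set(observed))
    return drift


if __name__ == "__main__":
    # Illustrative field names only; not taken from the deployment described above.
    expected = {"patient_id": "str", "heart_rate": "int"}
    record = {"patient_id": "p-1", "heart_rate": 72.5, "spo2": 98}
    print(detect_drift(expected, record))
```

In a pipeline of the kind described, a check like this would plausibly run before loading, so that unexpected columns are routed to review instead of failing the load; that placement is an assumption of this sketch.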