Designing Data and Analytics Ecosystems for High Volume Transaction Processing Applications

Raj Kiran Chennareddy

doi:10.63282/3050-9416.IJAIBDCMS-V2I2P111

Authors

Raj Kiran Chennareddy, Data & Analytics Senior Manager, Citibank NA.

DOI:

https://doi.org/10.63282/3050-9416.IJAIBDCMS-V2I2P111

Keywords:

Data And Analytics Ecosystem Design, Integrated Operational And Analytical Systems, Application-Oriented Data Architectures, Mixed Operational And Analytical Workloads, High-Throughput Application Data Processing, Incremental Data Processing, Change Propagation Across Systems, Data Synchronization Across Components, Schema Evolution In Production Systems, Throughput-Oriented System Design, Performance Isolation Techniques, Embedded Analytics In Applications

Abstract

High-volume transaction processing systems in digital commerce, financial services, healthcare, and cloud-native enterprises increasingly need unified architectures that deliver ultra-low-latency operations alongside near-real-time analytics. Traditionally, separating the OLTP and OLAP environments creates synchronization delays, redundant data replication, schema inconsistencies, and performance contention under mixed workloads. Hybrid Transactional/Analytical Processing (HTAP) systems seek to close this gap, but most focus on optimizing the database rather than addressing ecosystem-wide concerns such as incremental change propagation between services, service-to-service coordination, embedding analytics in applications, and throughput-centered system design. Consequently, loosely coupled architectures tend to suffer from cascading latency, inconsistent state synchronization, and limited adaptability to workload changes. This paper proposes the Throughput-Oriented Integrated Data and Analytics Ecosystem (TIDAE), a unified architectural framework for high-volume, mixed-workload systems. The framework combines transaction-oriented processing, streaming-based change data capture (CDC), incremental processing, efficient analytical storage, and embedded analytics displays in a coordinated, throughput-oriented design. Formal throughput analysis, architectural modeling, and empirical benchmarking indicate that the framework achieves 35-50% higher throughput and about 40% lower synchronization latency than traditional decoupled architectures, while meeting latency SLOs for analytical workloads.
The findings confirm the value of workload isolation, incremental processing, and scalable synchronization in enabling enterprise-grade systems to incorporate analytics into operational pipelines without compromising performance or resiliency.
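To make the idea of CDC-driven incremental processing concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical `ChangeEvent` shape carrying before and after row images, and a hypothetical `IncrementalTotals` view that keeps per-account totals current by applying deltas rather than rescanning the transactional store. All names and fields here are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change captured from the transactional store (hypothetical shape)."""
    op: str                  # "insert", "update", or "delete"
    key: str                 # primary key of the changed row
    before: Optional[dict]   # row image before the change (None for an insert)
    after: Optional[dict]    # row image after the change (None for a delete)


class IncrementalTotals:
    """Analytical view maintained by deltas instead of full recomputation."""

    def __init__(self) -> None:
        self.totals = defaultdict(float)

    def apply(self, event: ChangeEvent) -> None:
        # Retract the old row's contribution, then add the new row's contribution.
        if event.before is not None:
            self.totals[event.before["account"]] -= event.before["amount"]
        if event.after is not None:
            self.totals[event.after["account"]] += event.after["amount"]


def replay(events):
    """Apply a stream of change events in order and return the resulting view."""
    view = IncrementalTotals()
    for event in events:
        view.apply(event)
    return dict(view.totals)


# Illustrative event stream: two inserts, one update, one delete.
events = [
    ChangeEvent("insert", "t1", None, {"account": "A", "amount": 100.0}),
    ChangeEvent("insert", "t2", None, {"account": "B", "amount": 40.0}),
    ChangeEvent("update", "t1",
                {"account": "A", "amount": 100.0},
                {"account": "A", "amount": 70.0}),
    ChangeEvent("delete", "t2", {"account": "B", "amount": 40.0}, None),
]
result = replay(events)
print(result)
```

Each event costs a constant amount of work regardless of table size, which is the property that lets an analytical view track the operational store without competing with it for scans. A production pipeline would also need ordering guarantees, idempotent delivery, and schema evolution handling, none of which this sketch attempts.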
