Enhancing Data Throughput and Latency in Distributed In-Memory Systems for AI-Driven Applications across Public Cloud Infrastructure

Authors

  • Thulasiram Yachamaneni, Senior Engineer II, USA
  • Uttam Kotadiya, Software Engineer II, USA
  • Amandeep Singh Arora, Senior Engineer I, USA

DOI:

https://doi.org/10.63282/3050-9416.IJAIBDCMS-V2I4P107

Keywords:

Distributed In-Memory Systems, AI Workloads, Public Cloud Infrastructure, Data Throughput, Adaptive Caching, Edge Computing

Abstract

Data processing systems face inordinate pressure to deliver real-time computation in the era of artificial intelligence (AI), especially in distributed cloud environments. Distributed In-Memory Systems (DIMS) have become a crucial infrastructure for AI-driven applications, which demand both low latency and high throughput. This paper presents improvements to data throughput and end-to-end latency in DIMS serving AI workloads on major public cloud platforms such as AWS, Azure, and Google Cloud. We examine the architectural performance bottlenecks of existing systems and propose a hybrid in-memory data distribution model that combines adaptive caching, intelligent data sharding, and proximity-based data placement. Through simulations and deployments of benchmark AI applications, the proposed methodology shows considerable performance gains. Our solution is a layered architecture with modular components that addresses scalability, consistency, and fault tolerance, backed by efficient memory management techniques. The paper includes a comparative study against baseline systems, namely Apache Ignite, Redis Cluster, and Memcached, deployed at the public cloud edge. Test results indicate that the enhancements lower average latency by 35 percent and raise data throughput by 47 percent across a variety of AI workloads, including image classification, natural language processing, and predictive analytics. The paper concludes with a discussion of the implications of this research for large-scale, AI-enabled cloud computing infrastructures, as well as directions for future work.
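The paper's implementation is not published on this page, but two of the mechanisms named in the abstract are concrete enough to illustrate. The Python sketch below is a minimal, hypothetical rendering of proximity-based placement (a consistent-hash shard router that prefers replicas in the caller's cloud region) and adaptive caching (an LRU cache whose admission policy requires a second access before a key occupies memory). The class names ProximityShardRouter and AdaptiveCache, and all parameters, are invented for illustration; this is not the authors' design.

# Illustrative sketch only: names and policies are hypothetical, not the paper's code.
import hashlib
from bisect import bisect_right
from collections import OrderedDict

class ProximityShardRouter:
    """Consistent-hash ring that prefers replicas in the caller's region."""

    def __init__(self, nodes, vnodes=64):
        # nodes: list of (node_id, region) pairs; vnodes smooths the key distribution
        self.regions = dict(nodes)
        self.ring = sorted(
            (self._hash(f"{node_id}#{i}"), node_id)
            for node_id, _ in nodes
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def replicas(self, key, n=3):
        """Return up to n distinct nodes clockwise from the key's ring position."""
        idx = bisect_right(self.ring, (self._hash(key), ""))
        seen, out = set(), []
        for i in range(len(self.ring)):
            node = self.ring[(idx + i) % len(self.ring)][1]
            if node not in seen:
                seen.add(node)
                out.append(node)
                if len(out) == n:
                    break
        return out

    def route(self, key, client_region):
        """Prefer a replica co-located with the client to avoid cross-region hops."""
        reps = self.replicas(key)
        local = [r for r in reps if self.regions[r] == client_region]
        return (local or reps)[0]

class AdaptiveCache:
    """LRU cache that admits a key only on its second access: a crude frequency
    filter standing in for the adaptive admission policies the paper evaluates."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.data = OrderedDict()
        self.seen_once = set()  # unbounded here; a real system would age this out

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # refresh recency on hit
            return self.data[key]
        return None

    def put(self, key, value):
        if key in self.data or key in self.seen_once:
            self.seen_once.discard(key)
            self.data[key] = value
            self.data.move_to_end(key)
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)  # evict least-recently used
        else:
            self.seen_once.add(key)  # remember the first touch, admit on the second

if __name__ == "__main__":
    router = ProximityShardRouter(
        [("node-a", "us-east-1"), ("node-b", "eu-west-1"), ("node-c", "us-east-1")]
    )
    print(router.route("feature-vector:42", client_region="us-east-1"))

Under these assumptions, a read for an AI feature vector first consults the region-local replica chosen by route(), and the two-touch admission rule keeps one-off keys (common in scan-heavy AI workloads) from evicting hot entries.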

References

1. Acharya, S. (2018). Apache Ignite Quick Start Guide: Distributed data caching and processing made easy. Packt Publishing Ltd.

2. Fitzpatrick, B. (2004). Distributed caching with memcached. Linux Journal, 2004(124), 5.

3. Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., ... & Ng, A. (2012). Large-scale distributed deep networks. Advances in Neural Information Processing Systems, 25.

4. Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., & Stoica, I. (2013, November). Discretized streams: Fault-tolerant streaming computation at scale. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (pp. 423-438).

5. Venkataraman, S., Yang, Z., Franklin, M., Recht, B., & Stoica, I. (2016). Ernest: Efficient performance prediction for large-scale advanced analytics. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16) (pp. 363-378).

6. Cui, H., Zhang, H., Ganger, G. R., Gibbons, P. B., & Xing, E. P. (2016, April). GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server. In Proceedings of the Eleventh European Conference on Computer Systems (pp. 1-16).

7. Schwarzkopf, M., Konwinski, A., Abd-El-Malek, M., & Wilkes, J. (2013, April). Omega: Flexible, scalable schedulers for large compute clusters. In Proceedings of the 8th ACM European Conference on Computer Systems (pp. 351-364).

8. Kalia, A., Kaminsky, M., & Andersen, D. G. (2014, August). Using RDMA efficiently for key-value services. In Proceedings of the 2014 ACM Conference on SIGCOMM (pp. 295-306).

9. Jonas, E., Schleier-Smith, J., Sreekanti, V., Tsai, C. C., Khandelwal, A., Pu, Q., ... & Patterson, D. A. (2019). Cloud programming simplified: A Berkeley view on serverless computing. arXiv preprint arXiv:1902.03383.

10. Cheng, Y., Wang, D., Zhou, P., & Zhang, T. (2017). A survey of model compression and acceleration for deep neural networks. arXiv preprint arXiv:1710.09282.

11. Kirilin, V., Sundarrajan, A., Gorinsky, S., & Sitaraman, R. K. (2019, August). RL-Cache: Learning-based cache admission for content delivery. In Proceedings of the 2019 Workshop on Network Meets AI & ML (pp. 57-63).

12. Velev, D., & Zlateva, P. (2010, March). Cloud infrastructure security. In International Workshop on Open Problems in Network Security (pp. 140-148). Berlin, Heidelberg: Springer Berlin Heidelberg.

13. Sousa, E., Lins, F., Tavares, E., Cunha, P., & Maciel, P. (2014). A modeling approach for cloud infrastructure planning considering dependability and cost requirements. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(4), 549-558.

14. Barbosa, F. P., & Charão, A. S. (2012, June). Impact of pay-as-you-go cloud platforms on software pricing and development: a review and case study. In International Conference on Computational Science and Its Applications (pp. 404-417). Berlin, Heidelberg: Springer Berlin Heidelberg.

15. Mutlu, O., Ghose, S., Gómez-Luna, J., & Ausavarungnirun, R. (2019, June). Enabling practical processing in and near memory for data-intensive computing. In Proceedings of the 56th Annual Design Automation Conference 2019 (pp. 1-4).

16. Zhou, X., Chai, C., Li, G., & Sun, J. (2020). Database meets artificial intelligence: A survey. IEEE Transactions on Knowledge and Data Engineering, 34(3), 1096-1116.

17. Jha, S., Katz, D. S., Luckow, A., Chue Hong, N., Rana, O., & Simmhan, Y. (2017). Introducing distributed dynamic data‐intensive (D3) science: Understanding applications and infrastructure. Concurrency and Computation: Practice and Experience, 29(8), e4032.

18. Lv, M., Guan, N., Reineke, J., Wilhelm, R., & Yi, W. (2016). A survey on static cache analysis for real-time systems. Leibniz Transactions on Embedded Systems, 3(1), 05-1.

19. Hu, C., Wang, X., Yang, R., & Wo, T. (2016, December). ScalaRDF: a distributed, elastic and scalable in-memory RDF triple store. In 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS) (pp. 593-601). IEEE.

20. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., ... & Stoica, I. (2012). Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12) (pp. 15-28).

21. Tang, X., Zhai, J., Yu, B., Chen, W., Zheng, W., & Li, K. (2017). An efficient in-memory checkpoint method and its practice on fault-tolerant HPL. IEEE Transactions on Parallel and Distributed Systems, 29(4), 758-771.

Published

2021-12-30

Section

Articles

How to Cite

Yachamaneni T, Kotadiya U, Arora AS. Enhancing Data Throughput and Latency in Distributed In-Memory Systems for AI-Driven Applications across Public Cloud Infrastructure. IJAIBDCMS [Internet]. 2021 Dec. 30 [cited 2025 Sep. 14];2(4):69-7. Available from: https://ijaibdcms.org/index.php/ijaibdcms/article/view/197