Scaling AI: Best Practices in Designing On-Premise & Cloud Infrastructure for Machine Learning
DOI:
https://doi.org/10.63282/3050-9416.IJAIBDCMS-V4I2P105
Keywords:
AI scalability, machine learning infrastructure, cloud computing, hybrid cloud, Kubernetes, on-premise ML, GPU optimization, cloud cost management, distributed ML training, MLOps
Abstract
Demand for scalable, efficient infrastructure for ML workloads has reached unprecedented levels as businesses increasingly adopt AI-driven applications. Designing infrastructure that balances cost, performance, and flexibility remains a major challenge. Cloud platforms provide flexibility but can incur unanticipated costs, while on-premise systems offer control and security yet often struggle to scale. A hybrid cloud approach, which uses on-premise infrastructure for stable workloads and cloud resources for peak demand, helps organizations improve resource allocation. This approach scales ML tasks dynamically based on demand, improving both cost efficiency and performance. Key elements include choosing suitable hardware accelerators, using containerized machine learning pipelines for portability, and using automation to economize on resources. In addition, monitoring and budget-control systems help teams understand consumption trends and tune the infrastructure for efficiency. Applying best practices in infrastructure design allows businesses to build robust AI systems that support innovation while keeping operating costs reasonable. This paper examines practical ways to scale AI infrastructure, including guidance on balancing on-premise and cloud systems to meet the needs of ML.
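The hybrid allocation described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes demand is measured in GPUs per hour, that on-premise capacity is fixed, and that cloud GPUs are billed at a flat hourly rate. The names `Placement` and `place_demand`, the capacity of 8 GPUs, the price of 2.5 per GPU-hour, and the demand figures are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    on_prem_gpus: int   # GPUs served by the fixed on-premise cluster
    cloud_gpus: int     # GPUs burst to the cloud for this interval
    cloud_cost: float   # estimated cloud spend for this interval


def place_demand(demand_gpus: int, on_prem_capacity: int,
                 cloud_price_per_gpu_hour: float,
                 hours: float = 1.0) -> Placement:
    """Fill on-premise capacity first, then send only the overflow to the cloud."""
    if demand_gpus < 0 or on_prem_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    on_prem = min(demand_gpus, on_prem_capacity)
    cloud = demand_gpus - on_prem
    return Placement(on_prem, cloud, cloud * cloud_price_per_gpu_hour * hours)


# Hypothetical hourly GPU demand: a stable baseline with one peak.
hourly_demand = [6, 8, 7, 20, 9]
plan = [place_demand(d, on_prem_capacity=8, cloud_price_per_gpu_hour=2.5)
        for d in hourly_demand]
total_cloud_cost = sum(p.cloud_cost for p in plan)
print(total_cloud_cost)  # 32.5
```

In this toy run only the peak hour and the hour after it exceed on-premise capacity, so cloud spend is limited to those two intervals. A real deployment would also have to account for data transfer, provisioning delay, and non-flat pricing, none of which are modeled here.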