A Review of AI and Machine Learning Solutions for Fault Detection and Self-Healing in Cloud Services
DOI:
https://doi.org/10.63282/3050-9416.IJAIBDCMS-V2I3P107
Keywords:
Artificial Intelligence, Machine Learning, Fault Detection, Self-Healing, Cloud Services
Abstract
As cloud computing becomes a fundamental part of present-day digital infrastructure, the reliability and availability of cloud services have become critical concerns. As cloud environments grow larger and more complex, traditional fault detection approaches become inadequate, increasing the risk of system crashes, degraded performance, and service interruptions. This paper reviews the use of Artificial Intelligence (AI) and Machine Learning (ML) for fault detection and self-healing in cloud services. It highlights the shift from traditional rule-based monitoring to AI-based solutions that use large operational datasets to find anomalies and perform predictive maintenance. It classifies the main types of AI/ML models, including supervised, unsupervised, and deep learning models, and discusses their effectiveness in detecting faults and automating recovery operations. It also considers the challenges and prospects of integrating AI/ML into cloud fault management, with the ultimate goal of improving system resilience and operational effectiveness.