Potential of AI and ML to Enhance Error Detection, Prediction, and Automated Remediation in Batch Processing

Sandeep Kumar Jangam; Nagireddy Karri

doi:10.63282/3050-9416.IJAIBDCMS-V3I4P108

Authors

Sandeep Kumar Jangam, Independent Researcher, USA
Nagireddy Karri, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9416.IJAIBDCMS-V3I4P108

Keywords:

Batch Processing, Machine Learning, Error Detection, Predictive Maintenance, Automated Remediation, Fault Tolerance, Anomaly Detection

Abstract

Batch processing systems are critical to many data-dependent, mission-sensitive functions in sectors such as finance, healthcare, and supply chain management. However, these systems are highly vulnerable to runtime errors, performance degradation, and outright failures, because they depend on sequential task completion and concurrent scheduling while offering few mechanisms for real-time feedback. Traditional rule-based monitoring and manual intervention cannot detect subtle deviations or anticipate failures in advance, which leads to operational downtime and wasted resources. This paper discusses the need for next-generation solutions and examines how Artificial Intelligence (AI) and Machine Learning (ML) can extend batch processing with the ability to identify, model, and resolve errors preventively. We introduce a unified framework that combines unsupervised and supervised learning models to make batch processing environments more resilient, more autonomous, and less prone to failure. The approach consists of preprocessing log data, identifying patterns, and training models on a history of past failed executions, so that failures can be predicted and averted before they occur. Results from our prototype implementation show considerable improvements over traditional systems in error-detection accuracy, advance warning, and recovery efficiency. Our study also shows that AI-based remediation agents can correct errors automatically and efficiently with little or no human intervention; in benchmark cases, they reduced Mean Time To Recovery (MTTR) by up to 40 percent.
Overall, the results demonstrate the feasibility of deploying AI/ML in real batch operations to reduce downtime, optimize resource use, and improve Service-Level Agreement (SLA) compliance. This study provides guidance on building intelligent, self-healing batch systems that learn throughout their operation and continue to improve over time.
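The abstract describes the detection and prediction pipeline only at a high level: log preprocessing, pattern identification, and a combination of unsupervised and supervised models trained on past failed executions. The following Python sketch illustrates one way those stages could fit together; it is not the authors' implementation. The log format, the three features, the 3.0 z-score threshold, and the choice of a z-score detector plus a plain logistic regression are all assumptions made for illustration, and every function name is hypothetical.

```python
from __future__ import annotations

import math
import re
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical log format; real batch schedulers emit their own formats.
LOG_PATTERN = re.compile(
    r"job=(?P<job>\S+) duration=(?P<duration>[\d.]+) "
    r"records=(?P<records>\d+) errors=(?P<errors>\d+)"
)


@dataclass
class RunFeatures:
    job: str
    duration: float  # wall-clock seconds (assumed feature)
    records: int     # records processed (assumed feature)
    errors: int      # error lines emitted (assumed feature)

    def vector(self) -> list[float]:
        return [self.duration, float(self.records), float(self.errors)]


def preprocess(line: str) -> RunFeatures | None:
    """Parse one raw log line into features; unparseable lines yield None."""
    m = LOG_PATTERN.search(line)
    if m is None:
        return None
    return RunFeatures(m["job"], float(m["duration"]), int(m["records"]), int(m["errors"]))


def _stats(runs: list[RunFeatures]) -> tuple[list[float], list[float]]:
    """Per-feature mean and population std dev (1.0 for constant features)."""
    cols = list(zip(*(r.vector() for r in runs)))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def is_anomalous(baseline: list[RunFeatures], run: RunFeatures, threshold: float = 3.0) -> bool:
    """Unsupervised step: flag a run whose largest per-feature z-score exceeds threshold."""
    mu, sd = _stats(baseline)
    return max(abs(v - m) / s for v, m, s in zip(run.vector(), mu, sd)) > threshold


def _sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_failure_model(runs: list[RunFeatures], labels: list[int], lr: float = 0.1, epochs: int = 500):
    """Supervised step: logistic regression on standardized features (label 1 = failed run)."""
    mu, sd = _stats(runs)

    def scale(r: RunFeatures) -> list[float]:
        return [(v - m) / s for v, m, s in zip(r.vector(), mu, sd)]

    w, b = [0.0] * len(mu), 0.0
    data = [(scale(r), y) for r, y in zip(runs, labels)]
    for _ in range(epochs):
        for x, y in data:  # plain stochastic gradient descent on log loss
            g = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g

    def predict(run: RunFeatures) -> float:
        """Estimated probability that `run` ends in failure."""
        x = scale(run)
        return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

    return predict


# Illustrative history (made-up values): four healthy runs, then two failed runs.
history = [preprocess(line) for line in [
    "job=etl duration=120 records=10000 errors=0",
    "job=etl duration=118 records=10050 errors=0",
    "job=etl duration=125 records=9980 errors=1",
    "job=etl duration=122 records=10020 errors=0",
    "job=etl duration=300 records=4000 errors=12",
    "job=etl duration=310 records=3500 errors=15",
]]
labels = [0, 0, 0, 0, 1, 1]
predict_failure = train_failure_model(history, labels)

normal = preprocess("job=etl duration=121 records=10010 errors=0")
suspect = preprocess("job=etl duration=290 records=4200 errors=10")
print(is_anomalous(history[:4], normal), is_anomalous(history[:4], suspect))  # False True
print(round(predict_failure(normal), 3), round(predict_failure(suspect), 3))
```

The division of labor is the design point: the unsupervised detector needs only a baseline of healthy runs and no failure labels, while the supervised model learns from labeled history to estimate failure risk. A production system would likely use many more features and stronger models, such as gradient-boosted trees or autoencoders.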
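The abstract also reports that AI-based remediation agents lowered MTTR by up to 40 percent. The Python sketch below, which is hypothetical and not the authors' agent, shows a minimal remediation loop with human escalation and how MTTR and its percentage reduction are computed. The `Incident` and `RemediationAgent` names, the playbook that maps a failure class to an action, and the retry limit are assumptions. As a worked example of the metric, MTTR is total recovery time divided by the number of recovered incidents, and a 40 percent reduction corresponds to an MTTR falling from 50 to 30 minutes, since (50 - 30) / 50 = 0.4; these numbers are illustrative, not the paper's data.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Incident:
    job: str
    failure_class: str  # e.g. output of a failure classifier (hypothetical labels)
    detected_at: float  # timestamp from the same clock used for recovery
    recovered_at: float | None = None


class RemediationAgent:
    """Runs the playbook action for an incident's failure class, retrying up to
    max_attempts times, and escalates to a human when no action succeeds."""

    def __init__(self, playbook: dict[str, Callable[[str], bool]], max_attempts: int = 3):
        self.playbook = playbook
        self.max_attempts = max_attempts
        self.escalated: list[Incident] = []

    def remediate(self, incident: Incident, clock: Callable[[], float] = time.monotonic) -> bool:
        action = self.playbook.get(incident.failure_class)
        if action is not None:
            for _ in range(self.max_attempts):
                if action(incident.job):  # action returns True once the job is healthy
                    incident.recovered_at = clock()
                    return True
        self.escalated.append(incident)  # unknown failure class or retries exhausted
        return False


def mttr(incidents: list[Incident]) -> float:
    """Mean Time To Recovery over incidents that were actually recovered."""
    durations = [i.recovered_at - i.detected_at for i in incidents if i.recovered_at is not None]
    if not durations:
        raise ValueError("no recovered incidents")
    return sum(durations) / len(durations)


def mttr_reduction(baseline: float, improved: float) -> float:
    """Percentage reduction of an improved MTTR relative to a baseline MTTR."""
    return (baseline - improved) / baseline * 100.0


# Illustrative usage with made-up values and a fake clock for determinism.
attempts = {"etl": 0}


def restart_job(job: str) -> bool:
    """Hypothetical action that succeeds on its second attempt."""
    attempts[job] += 1
    return attempts[job] >= 2


agent = RemediationAgent({"transient_io": restart_job})
ticks = iter([12.0])
io_incident = Incident("etl", "transient_io", detected_at=0.0)
agent.remediate(io_incident, clock=lambda: next(ticks))
drift_incident = Incident("report", "schema_drift", detected_at=5.0)  # no playbook entry
agent.remediate(drift_incident)

print(mttr([io_incident, drift_incident]))  # 12.0: only the recovered incident counts
print(mttr_reduction(50.0, 30.0))           # 40.0 for these illustrative numbers
```

Escalating unknown failure classes instead of guessing keeps a human in the loop for cases the agent has not learned, which matches the abstract's "little or no human intervention" rather than none.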
