Metric-Driven Automatic Software Vulnerability Detection Using Feature Extraction Pipelines and Interpretable ML

Authors

  • Ananya Nair, Artificial Intelligence Department, Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu, India
  • Rohit Menon, Information Technology, Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu, India
  • Pranav Kulkarni, Artificial Intelligence Department, Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu, India

DOI:

https://doi.org/10.63282/3050-9416.IJAIBDCMS-V5I4P120

Keywords:

Software Security, Vulnerability Detection, Interpretable Machine Learning, Feature Extraction Pipelines, DevSecOps, Code Representation Learning, Program Graphs, Risk Metrics, CI/CD Governance

Abstract

Software vulnerability detection has progressed from signature-driven static analysis toward learning-based detection over code, build, and operational signals. However, many machine-learning (ML) vulnerability detectors remain difficult to operationalize in real enterprise pipelines because they (i) rely on brittle or opaque feature representations, (ii) provide limited traceability from predictions to security-relevant metrics and engineering controls, and (iii) under-support governance requirements such as auditable decision pathways, risk thresholds, and continuous integration and delivery (CI/CD) gating. This manuscript proposes a metric-driven framework for automatic vulnerability detection that explicitly couples (a) a modular feature-extraction pipeline spanning program semantics and software delivery telemetry, (b) a unified risk metric schema grounded in vulnerability taxonomies and operational signals, and (c) interpretable ML for actionable explanations at multiple levels of granularity. The core idea is to treat vulnerability detection as a metric-aware learning problem: the detector outputs a probability of vulnerability and an aligned, human-auditable rationale mapped to metrics such as weakness category, severity tier, change-risk indicators, and runtime observability signals. The framework integrates transformer-based code representations and graph-based program semantics with explanation methods to produce transparent triage artifacts suitable for DevSecOps and compliance-constrained domains. A deployment-oriented methodology is presented, including feature engineering, model training, explanation generation, CI/CD integration, and evaluation protocols that emphasize decision utility in addition to predictive accuracy.
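
The metric-aware output described in the abstract (a vulnerability probability paired with a rationale grouped by metric family, then compared against a CI/CD risk threshold) can be illustrated with a minimal sketch. This is not the authors' model: a plain logistic scorer stands in for the transformer and graph components, because its per-feature log-odds terms are exactly additive and therefore easy to audit. All feature names, weights, the bias, and the 0.7 threshold are hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical metric schema: feature name -> (metric family, weight).
# Names and weights are illustrative assumptions, not values from the paper.
WEIGHTS = {
    "unchecked_buffer_ops": ("weakness_category", 1.2),
    "code_churn_last_30d": ("change_risk", 0.8),
    "severity_tier": ("severity", 0.6),
    "runtime_error_rate": ("runtime_observability", 0.4),
}
BIAS = -3.0  # assumed intercept in log-odds space


@dataclass
class Finding:
    probability: float   # estimated probability that the change is vulnerable
    contributions: dict  # metric family -> additive log-odds contribution
    gate: str            # "block" or "pass" decision for the pipeline


def score(features, block_threshold=0.7):
    """Score one change and return the probability, rationale, and gate decision."""
    contributions = {}
    logit = BIAS
    for name, (family, weight) in WEIGHTS.items():
        term = weight * features.get(name, 0.0)  # missing features count as 0
        contributions[family] = contributions.get(family, 0.0) + term
        logit += term
    probability = 1.0 / (1.0 + math.exp(-logit))
    gate = "block" if probability >= block_threshold else "pass"
    return Finding(probability, contributions, gate)


if __name__ == "__main__":
    finding = score({
        "unchecked_buffer_ops": 2.0,
        "code_churn_last_30d": 1.5,
        "severity_tier": 1.0,
        "runtime_error_rate": 0.5,
    })
    print(round(finding.probability, 2), finding.gate)
    # List the rationale with the largest contribution first.
    for family, value in sorted(finding.contributions.items(), key=lambda kv: -kv[1]):
        print(f"{family}: {value:+.2f}")
```

With the sample input, the log-odds sum to 1.4, which gives a probability of about 0.80 and a "block" decision. The sorted contributions show which metric family drove the score, which is the kind of traceable rationale the abstract calls for. A non-linear detector would need a post-hoc attribution method to produce comparable per-metric terms.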

Published

2024-12-30

Section

Articles

How to Cite

1. Nair A, Menon R, Kulkarni P. Metric-Driven Automatic Software Vulnerability Detection Using Feature Extraction Pipelines and Interpretable ML. IJAIBDCMS [Internet]. 2024 Dec. 30 [cited 2026 Mar. 15];5(4):182-8. Available from: https://ijaibdcms.org/index.php/ijaibdcms/article/view/460