End-to-End Visibility in Cloud Deployments: Building Real-Time Program Health Systems

Soumya Remella

doi:10.63282/3050-9416.IJAIBDCMS-V6I4P117

Authors

Soumya Remella, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9416.IJAIBDCMS-V6I4P117

Keywords:

Cloud Observability, Telemetry Correlation, Anomaly Detection, Real-Time Monitoring, Program Health, Cloud-Native Systems

Abstract

Cloud-native applications now run across distributed services, containers, and serverless functions, each emitting its own logs, metrics, traces, and events. While modern observability tools collect these signals effectively, they tend to process them in isolation, leaving engineers to manually correlate symptoms during incidents. This fragmentation slows detection, obscures root-cause analysis, and weakens real-time understanding of program health. This paper introduces the Real-Time Program Health (RTPH) Framework, a multi-layer model that unifies telemetry ingestion, real-time stream processing, machine-learning-based anomaly detection, and health scoring into a single, interpretable view of system behavior. RTPH is evaluated in a hybrid cloud environment running microservice workloads on Kubernetes, with synthetic faults injected under controlled conditions. Its performance is compared against established observability stacks that include metrics, logging, and tracing tools. Experimental results show that RTPH reduces anomaly detection latency by 32–45%, lowers false-positive alerts by 28–40%, and correctly correlates 87–93% of cross-service anomalies, while keeping CPU and memory overhead below 8% and 6% per node, respectively. These findings indicate that unified, real-time health modeling can provide more accurate, actionable visibility into cloud deployments than traditional, signal-specific monitoring approaches.
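The abstract does not specify how RTPH computes anomalies or health scores, so the following is only an illustrative sketch of the general idea of turning several telemetry signals into one health value. It assumes a simple rolling z-score detector in place of the paper's machine-learning models, and the signal names, weights, window size, and threshold are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class RollingZScore:
    """Flags a sample that deviates strongly from a sliding window of recent values.

    A stand-in for a real anomaly detector; not the method used by RTPH.
    """

    def __init__(self, window: int = 30, threshold: float = 3.0):
        self.samples = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        anomalous = False
        if len(self.samples) >= 5:  # wait for a small baseline before judging
            mu = mean(self.samples)
            sigma = pstdev(self.samples)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                anomalous = value != mu
        self.samples.append(value)
        return anomalous


def health_score(flags: dict[str, bool], weights: dict[str, float]) -> float:
    """Weighted share of non-anomalous signals, scaled to the range 0-100."""
    total = sum(weights[name] for name in flags)
    penalty = sum(weights[name] for name, bad in flags.items() if bad)
    return 100.0 * (1.0 - penalty / total)


# Hypothetical signals and weights, chosen only for illustration.
weights = {"latency_ms": 0.5, "error_rate": 0.3, "cpu_util": 0.2}
detectors = {name: RollingZScore() for name in weights}


def ingest(sample: dict[str, float]) -> float:
    """Run every signal through its detector and return the combined score."""
    flags = {name: detectors[name].observe(value) for name, value in sample.items()}
    return health_score(flags, weights)
```

Calling `ingest` once per telemetry sample yields a single score per time step: a stable workload stays at 100, and a flagged signal lowers the score by its weight's share of the total. A real system would also need to correlate anomalies across services, which this sketch does not attempt.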

