Reinforcement Learning for Adaptive Resource Management in Cloud Systems

Authors

  • Rajender Reddy Muddam, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9416.ICAIDSCT26-129

Keywords:

Reinforcement Learning, Cloud Resource Management, Adaptive Scheduling, Auto-Scaling, Dynamic Resource Allocation, Cloud Computing, Intelligent Orchestration, Performance Optimization, QoS Management, AI-driven Cloud Systems

Abstract

Cloud software systems operate under unpredictable, constantly changing workloads. Static provisioning and rule-based auto-scaling often respond too late to performance degradation or waste resources during low-demand periods, creating a tradeoff between cost efficiency and service reliability that traditional approaches struggle to balance. This paper presents a reinforcement-learning-based framework for adaptive cloud resource management. The system learns to allocate computing resources by interacting with the cloud environment and observing the long-term outcomes of its decisions. Cloud management is modeled as a sequential decision process in which the learning agent balances performance, cost, and service-level agreement (SLA) compliance. We evaluate the approach on simulated cloud workloads and compare it with threshold-based and reactive scaling strategies. Results show improved resource utilization, fewer SLA violations, and smoother adaptation to workload changes. The findings suggest that reinforcement learning offers a practical foundation for building self-adaptive cloud systems that improve over time without manual rule tuning.
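
To make the sequential decision framing concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not name the learning algorithm, state space, or reward, so everything here is an assumption chosen for illustration: tabular Q-learning, a state made of (server count, load level), three scaling actions, and a reward that charges a per-server cost plus a fixed penalty when demand exceeds capacity. All constants and function names (`reward`, `step`, `train`, `greedy_action`) are hypothetical.

```python
import random

# Every constant and name below is a hypothetical choice for illustration;
# none of them comes from the paper.
ACTIONS = (-1, 0, 1)           # remove one server, hold, add one server
MAX_SERVERS = 5
CAPACITY_PER_SERVER = 10       # requests one server can serve per step
COST_PER_SERVER = 1.0          # cost charged per running server per step
SLA_PENALTY = 10.0             # charged when demand exceeds capacity
LOAD_LEVELS = (10, 30, 50)     # low / medium / high demand


def reward(servers, load):
    """Negative of (server cost + SLA penalty if demand exceeds capacity)."""
    violated = load > servers * CAPACITY_PER_SERVER
    return -(COST_PER_SERVER * servers + (SLA_PENALTY if violated else 0.0))


def step(state, action, rng):
    """Apply a scaling action, score it against the current load, then drift the load."""
    servers, load_idx = state
    servers = min(MAX_SERVERS, max(1, servers + action))
    r = reward(servers, LOAD_LEVELS[load_idx])
    drift = rng.choice((-1, 0, 1))
    load_idx = min(len(LOAD_LEVELS) - 1, max(0, load_idx + drift))
    return (servers, load_idx), r


def greedy_action(q, state):
    """Return the action with the highest learned value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(steps=50_000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration on the toy environment."""
    rng = random.Random(seed)
    q = {((s, i), a): 0.0
         for s in range(1, MAX_SERVERS + 1)
         for i in range(len(LOAD_LEVELS))
         for a in ACTIONS}
    state = (1, 0)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)       # explore
        else:
            action = greedy_action(q, state)   # exploit
        next_state, r = step(state, action, rng)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        # Standard Q-learning update toward the bootstrapped target.
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        state = next_state
    return q


if __name__ == "__main__":
    q = train()
    # Print the learned scaling action for each load level and server count.
    for load_idx, load in enumerate(LOAD_LEVELS):
        policy = [greedy_action(q, (s, load_idx)) for s in range(1, MAX_SERVERS + 1)]
        print(load, policy)
```

In this toy setting the agent is given no scaling thresholds; it has to learn from the reward alone when adding capacity is worth the cost, which is the contrast with rule-based auto-scaling that the abstract draws. A realistic system would likely need a richer state (for example, recent utilization and latency) and function approximation instead of a lookup table.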

Published

2026-02-17

How to Cite

1. Muddam RR. Reinforcement Learning for Adaptive Resource Management in Cloud Systems. IJAIBDCMS [Internet]. 2026 Feb. 17 [cited 2026 Feb. 17];:258-61. Available from: https://ijaibdcms.org/index.php/ijaibdcms/article/view/419