Cloud Data Optimization: Performance Tuning for Batch Processing and Real-Time Streaming
DOI:
https://doi.org/10.63282/3050-9416.ICAIDSCT26-144
Keywords:
Cloud Computing, Data Optimization, Performance Tuning, Batch Processing, Real-Time Streaming, Scalability, Latency Reduction
Abstract
This paper explores performance tuning techniques for cloud data workflows, focusing on both batch processing and real-time streaming. It addresses key challenges in scalability, efficiency, and latency reduction to optimize data handling in cloud environments. Various strategies for resource allocation, load balancing, and data partitioning are analyzed to enhance throughput and minimize processing delays. The study evaluates the impact of tuning parameters on system performance through experimental results and case studies. Emphasis is placed on balancing cost-effectiveness with computational demands. Insights into adaptive optimization approaches for dynamic workloads are also provided. The findings demonstrate significant improvements in processing speed and resource utilization. This work contributes practical guidelines for optimizing cloud-based data pipelines in diverse operational contexts.