The Future of Serverless Architectures in Data Engineering
DOI: https://doi.org/10.63282/3050-9416.IJAIBDCMS-V7I1P103
Keywords: Serverless, Python, Kubernetes, Interactive batch processing, Data pipelines, Analytics Monitoring, Cost Engineering, Scalability
Abstract
As data engineering workloads grow in complexity and volume, organizations face critical architectural choices about compute execution environments. Two dominant paradigms, serverless computing platforms and containerized orchestration systems, offer different trade-offs in performance, cost, scalability, and operational overhead. Serverless platforms provide automatic scaling, simplified operations, and a pay-per-use billing model, whereas containerized deployments offer predictable performance, fine-grained resource control, and better suitability for long-running workloads. This paper presents an empirical, data-driven comparison of the two architectures across data engineering pipelines, namely ETL workflows, event-driven processing, streaming analytics, and batch jobs. Drawing on public cloud benchmark datasets, academic measurement literature, OpenTelemetry traces, and published pricing models, the study measures latency, throughput, economic efficiency, scalability under burst workloads, and operational reliability. The results show that serverless systems perform better for bursty, event-driven workloads with unpredictable demand, while containerized systems outperform for sustained, high-throughput pipelines and resource-intensive workloads. The paper concludes with practical recommendations to help practitioners select the execution model best matched to their workload characteristics.
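To make the economic-efficiency comparison concrete, the sketch below contrasts a pay-per-use serverless cost model with a reserved-capacity container cluster for two hypothetical workload profiles. All pricing constants, invocation counts, and function names here are illustrative placeholders, not the benchmark values or pricing models analyzed in the paper.

# Illustrative cost-model sketch (hypothetical prices, not measured results).
# Serverless: billed per request plus per GB-second of execution time.
# Containerized: billed per node-hour of provisioned cluster capacity.

def serverless_monthly_cost(invocations, avg_duration_s, memory_gb,
                            price_per_million_req=0.20, price_per_gb_s=0.0000167):
    """Pay-per-use cost: request charges plus compute charges in GB-seconds."""
    request_cost = invocations / 1_000_000 * price_per_million_req
    compute_cost = invocations * avg_duration_s * memory_gb * price_per_gb_s
    return request_cost + compute_cost

def container_monthly_cost(nodes, price_per_node_hour=0.10, hours=730):
    """Reserved-capacity cost: nodes run continuously regardless of load."""
    return nodes * price_per_node_hour * hours

if __name__ == "__main__":
    # Bursty, low-duty-cycle workload: pay-per-use tends to be cheaper.
    bursty = serverless_monthly_cost(invocations=2_000_000, avg_duration_s=1.5, memory_gb=0.5)
    # Sustained, high-throughput workload: a fixed cluster amortizes better.
    sustained = serverless_monthly_cost(invocations=500_000_000, avg_duration_s=1.5, memory_gb=0.5)
    cluster = container_monthly_cost(nodes=4)
    print(f"Serverless (bursty):    ${bursty:,.2f}/month")
    print(f"Serverless (sustained): ${sustained:,.2f}/month")
    print(f"Container cluster:      ${cluster:,.2f}/month")

Under these placeholder parameters the bursty workload costs tens of dollars per month on serverless while the sustained workload is far cheaper on a fixed cluster, mirroring the break-even behavior the abstract describes.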