Serverless Architecture Patterns for Enterprise AI Agents: ECS Fargate, OpenSearch k-NN, and DynamoDB for Knowledge-Grounded LLM Workflows

Authors

  • Raj Sunkara, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9416.IJAIBDCMS-V7I2P129

Keywords:

Serverless Architecture, AWS, ECS Fargate, OpenSearch, k-NN, HNSW, Vector Search, Amazon Titan Embeddings, DynamoDB, Retrieval-Augmented Generation, RAG, AI Agent, Knowledge Base, Ingestion Pipeline, AI Infrastructure

Abstract

Deploying enterprise AI agents that combine large language models with proprietary knowledge bases in production presents specific architectural challenges: state has to be managed across asynchronous workflows, retrieval latency has to stay low enough for interactive use, the knowledge base has to stay fresh as the underlying source systems change, costs have to be predictable, and the agent has to be observable and recoverable when individual steps fail. This paper describes a serverless reference architecture for a domain-specific AI agent. The architecture is built on Amazon ECS on AWS Fargate for compute, Amazon OpenSearch with k-Nearest Neighbor (k-NN) search over Hierarchical Navigable Small World (HNSW) indexes for vector search, Amazon Titan embeddings for semantic representation, Amazon DynamoDB for workflow and conversation state, and an automated ingestion pipeline that synchronizes issue trackers, architecture documentation, and source code repositories into the knowledge base. The paper traces the end-to-end data flow from ingestion through embedding, retrieval, language model invocation, and response persistence. It discusses the design choices around HNSW parameter tuning, embedding refresh strategy, Fargate cold-start mitigation, isolation between tenant workloads, and incremental knowledge base updates without full reindexing. The architecture supports a graphics engineering bug triage agent that builds on the Retrieval-Augmented Generation (RAG) approach described in earlier work and extends it with the operational properties that production deployment at scale requires. The contribution is a deployment-tested blueprint for teams building RAG-based agents on AWS, with attention to operational concerns that frequently receive less coverage in prototype-focused literature.
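
To make the HNSW tuning surface mentioned in the abstract concrete, the following is a minimal sketch of what an OpenSearch k-NN index definition and query body could look like. It is an illustration under stated assumptions, not the configuration used in the paper: the field names, the 1024-dimension vector size (the default output size of Titan Text Embeddings V2), the `lucene` engine, the cosine space type, and the `m` and `ef_construction` values are all chosen here for the example. The code only builds request bodies and does not contact a cluster.

```python
import json

# All values below are assumptions for illustration, not taken from the paper.
EMBEDDING_DIM = 1024        # assumed: Titan Text Embeddings V2 default size
VECTOR_FIELD = "embedding"  # hypothetical field name


def build_index_body(m=16, ef_construction=128):
    """Return an index definition with an HNSW-backed knn_vector field.

    m controls how many graph links each node keeps; ef_construction
    controls how many candidates are explored while building the graph.
    Larger values tend to improve recall at the cost of memory and
    indexing time.
    """
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},         # hypothetical chunk text field
                "source": {"type": "keyword"},    # hypothetical origin label
                VECTOR_FIELD: {
                    "type": "knn_vector",
                    "dimension": EMBEDDING_DIM,
                    "method": {
                        "name": "hnsw",
                        "engine": "lucene",       # assumed engine choice
                        "space_type": "cosinesimil",
                        "parameters": {
                            "m": m,
                            "ef_construction": ef_construction,
                        },
                    },
                },
            }
        },
    }


def build_knn_query(query_vector, k=5):
    """Return a k-NN search body asking for the k nearest chunks."""
    if len(query_vector) != EMBEDDING_DIM:
        raise ValueError(
            f"expected {EMBEDDING_DIM} dimensions, got {len(query_vector)}"
        )
    return {
        "size": k,
        "query": {
            "knn": {VECTOR_FIELD: {"vector": list(query_vector), "k": k}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

In a deployment, these bodies would be sent to the index-creation and search endpoints of the OpenSearch domain, with the query vector produced by the same embedding model that was used at ingestion time.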

Published

2026-05-04

Section

Articles

How to Cite

Sunkara R. Serverless Architecture Patterns for Enterprise AI Agents: ECS Fargate, OpenSearch k-NN, and DynamoDB for Knowledge-Grounded LLM Workflows. IJAIBDCMS [Internet]. 2026 May 4 [cited 2026 May 27];7(2):197-201. Available from: https://ijaibdcms.org/index.php/ijaibdcms/article/view/580