Distributed RAG Architecture for Enterprise Knowledge QA on Databricks

Authors

  • Vamshi Krishna Malthummeda, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9416.IJAIBDCMS-V7I2P117

Keywords:

Databricks, Vector Databases, LLM, Ray, RAG, Chatbot, Vector Embeddings, Salesforce, SOQL

Abstract

Organizations rely on centralized knowledge repositories such as Salesforce Knowledge to support internal and external teams. Traditional keyword- and rule-based search often fails to provide accurate, up-to-date information at scale. This paper presents a unified data-centric and model-centric Retrieval-Augmented Generation (RAG) architecture implemented on Databricks using Apache Spark. The architecture integrates large-scale ETL processing with efficient model execution for vector embedding generation within a single distributed framework. The proposed solution employs an open, modular design with separate components for data extraction, data cleansing, workflow orchestration, embedding generation, vector retrieval, and LLM inference. In the proposed approach, knowledge articles are extracted by executing Salesforce Object Query Language (SOQL) queries against the Salesforce Knowledge object REST API endpoint and stored in a Databricks Lakehouse. The RAG pipeline employs stateful model reuse and fine-grained resource allocation, coupled with concurrent multithreaded encoding within Databricks, to perform large-scale vector embedding generation. The vector embeddings generated from the extracted content are stored in a vector database. At inference time, user queries from the chatbot are converted into vectors and matched against the vector database to find the most relevant information chunks, which are then sent along with prompts to an LLM to generate responses. This design ensures that answers are traceable to authoritative knowledge sources, reduces hallucinations, and supports continuous updates as knowledge articles evolve. In the experiments, the RAG pipeline implemented on a Databricks cluster sustained 100% CPU utilization with high execution efficiency on the vector embedding workload.
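The inference-time flow described in the abstract (query → vector embedding → similarity search → prompted LLM call) can be sketched as below. This is an illustrative toy, not the paper's implementation: the `embed` function, the in-memory `VectorStore`, and the returned prompt are hypothetical stand-ins for the real embedding model, vector database, and LLM endpoint.

```python
import numpy as np

# Hypothetical stand-in for the real embedding model: a deterministic
# character-bigram hash vector, used only to make the sketch runnable.
def embed(text: str) -> np.ndarray:
    vec = np.zeros(64)
    for a, b in zip(text.lower(), text.lower()[1:]):
        vec[(ord(a) * 31 + ord(b)) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class VectorStore:
    """Minimal in-memory cosine-similarity index standing in for a vector database."""
    def __init__(self):
        self.chunks, self.vectors = [], []

    def add(self, chunk: str):
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk))

    def search(self, query: str, k: int = 2):
        q = embed(query)
        scores = np.array(self.vectors) @ q  # cosine similarity (vectors are unit-norm)
        top = np.argsort(scores)[::-1][:k]
        return [self.chunks[i] for i in top]

def answer(query: str, store: VectorStore) -> str:
    # Retrieve the most relevant knowledge chunks and assemble a grounded prompt.
    context = "\n".join(store.search(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # in the real pipeline this prompt is sent to the LLM

store = VectorStore()
store.add("To reset a Salesforce password, use the Forgot Password link.")
store.add("Knowledge articles are synced nightly into the Lakehouse.")
print(answer("How do I reset my password?", store))
```

Because every answer is assembled only from retrieved chunks, each response remains traceable to the knowledge articles that produced it, which is the property the abstract highlights for reducing hallucinations.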
The proposed architecture provides fine-grained control over resource allocation, concurrency, and batching strategies, enabling enterprises to generate high quality embeddings while leveraging existing infrastructure without significant redesign.
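One way to realize the stateful model reuse, concurrency, and batching strategies described above is to keep a single encoder instance per worker thread and feed it fixed-size batches. The sketch below stubs the encoder with a hypothetical `StubEncoder`, since the paper's actual model and Databricks/Ray wiring are not reproduced here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class StubEncoder:
    """Hypothetical stand-in for a sentence-embedding model. Loading a real model
    is expensive, so each worker thread constructs it once and reuses it."""
    def encode(self, batch):
        return [[float(len(text))] for text in batch]  # dummy 1-d "embeddings"

_local = threading.local()

def get_encoder() -> StubEncoder:
    # Stateful model reuse: one encoder per thread, created lazily on first use,
    # avoiding repeated model loads and cross-thread contention.
    if not hasattr(_local, "encoder"):
        _local.encoder = StubEncoder()
    return _local.encoder

def encode_batch(batch):
    return get_encoder().encode(batch)

def embed_corpus(texts, batch_size=4, max_workers=4):
    # Fixed-size batching plus a bounded thread pool gives fine-grained control
    # over memory per call and degree of concurrency.
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(encode_batch, batches)  # preserves batch order
    return [vec for batch_vecs in results for vec in batch_vecs]

articles = [f"knowledge article {i}" for i in range(10)]
vectors = embed_corpus(articles)
print(len(vectors))
```

Tuning `batch_size` and `max_workers` against the cores and memory of each Databricks worker is what allows the pipeline to saturate CPU, as reported in the experiments.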

Published

2026-04-17

Section

Articles

How to Cite

Malthummeda VK. Distributed RAG Architecture for Enterprise Knowledge QA on Databricks. IJAIBDCMS [Internet]. 2026 Apr. 17 [cited 2026 Apr. 23];7(2):98-104. Available from: https://ijaibdcms.org/index.php/ijaibdcms/article/view/550