Distributed RAG Architecture for Enterprise Knowledge QA on Databricks

Authors

  • Vamshi Krishna Malthummeda, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9416.IJAIBDCMS-V7I2P117

Keywords:

Databricks, Vector Databases, LLM, Ray, RAG, Chatbot, Vector Embeddings, Salesforce, SOQL

Abstract

Organizations rely on centralized knowledge repositories such as Salesforce Knowledge to support internal and external teams. Traditional keyword- and rule-based search often fails to provide accurate, up-to-date information at scale. This paper presents a unified data-centric and model-centric Retrieval-Augmented Generation (RAG) architecture implemented on Databricks using Apache Spark. The architecture integrates large-scale ETL processing with efficient model execution for vector embedding generation within a single distributed framework. The proposed solution employs an open, modular design with separate components for data extraction, data cleansing, workflow orchestration, embedding generation, vector retrieval, and LLM inference. In the proposed approach, knowledge articles are extracted by executing Salesforce Object Query Language (SOQL) queries against the Salesforce Knowledge object REST API endpoint and stored in a Databricks Lakehouse. The RAG pipeline employs stateful model reuse and fine-grained resource allocation, coupled with concurrent multithreaded encoding within Databricks, to perform large-scale vector embedding generation. The vector embeddings generated from the extracted content are stored in a vector database. At inference time, user queries from the chatbot are converted into vectors and matched against the vector database to find the most relevant information chunks, which are then sent along with prompts to an LLM to generate responses. This design ensures that answers are traceable to authoritative knowledge sources, reduces hallucinations, and supports continuous updates as knowledge articles evolve. In the experiments, the RAG pipeline implemented on a Databricks cluster sustained 100% CPU utilization with high execution efficiency on the vector embedding workload.
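The inference-time flow described in the abstract (query → vector embedding → similarity search → prompted LLM call) can be sketched as below. This is an illustrative toy, not the paper's implementation: the `embed` function, the in-memory `VectorStore`, and the returned prompt are hypothetical stand-ins for the real embedding model, vector database, and LLM endpoint.

```python
import numpy as np

# Hypothetical stand-in for the real embedding model: a deterministic
# character-bigram hash vector, used only to make the sketch runnable.
def embed(text: str) -> np.ndarray:
    vec = np.zeros(64)
    for a, b in zip(text.lower(), text.lower()[1:]):
        vec[(ord(a) * 31 + ord(b)) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class VectorStore:
    """Minimal in-memory cosine-similarity index standing in for a vector database."""
    def __init__(self):
        self.chunks, self.vectors = [], []

    def add(self, chunk: str):
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk))

    def search(self, query: str, k: int = 2):
        q = embed(query)
        scores = np.array(self.vectors) @ q  # cosine similarity (vectors are unit-norm)
        top = np.argsort(scores)[::-1][:k]
        return [self.chunks[i] for i in top]

def answer(query: str, store: VectorStore) -> str:
    # Retrieve the most relevant knowledge chunks and assemble a grounded prompt.
    context = "\n".join(store.search(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # in the real pipeline this prompt is sent to the LLM

store = VectorStore()
store.add("To reset a Salesforce password, use the Forgot Password link.")
store.add("Knowledge articles are synced nightly into the Lakehouse.")
print(answer("How do I reset my password?", store))
```

Because every answer is assembled only from retrieved chunks, each response remains traceable to the knowledge articles that produced it, which is the property the abstract highlights for reducing hallucinations.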
The proposed architecture provides fine-grained control over resource allocation, concurrency, and batching strategies, enabling enterprises to generate high quality embeddings while leveraging existing infrastructure without significant redesign.
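One way to realize the stateful model reuse, concurrency, and batching strategies described above is to keep a single encoder instance per worker thread and feed it fixed-size batches. The sketch below stubs the encoder with a hypothetical `StubEncoder`, since the paper's actual model and Databricks/Ray wiring are not reproduced here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class StubEncoder:
    """Hypothetical stand-in for a sentence-embedding model. Loading a real model
    is expensive, so each worker thread constructs it once and reuses it."""
    def encode(self, batch):
        return [[float(len(text))] for text in batch]  # dummy 1-d "embeddings"

_local = threading.local()

def get_encoder() -> StubEncoder:
    # Stateful model reuse: one encoder per thread, created lazily on first use,
    # avoiding repeated model loads and cross-thread contention.
    if not hasattr(_local, "encoder"):
        _local.encoder = StubEncoder()
    return _local.encoder

def encode_batch(batch):
    return get_encoder().encode(batch)

def embed_corpus(texts, batch_size=4, max_workers=4):
    # Fixed-size batching plus a bounded thread pool gives fine-grained control
    # over memory per call and degree of concurrency.
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(encode_batch, batches)  # preserves batch order
    return [vec for batch_vecs in results for vec in batch_vecs]

articles = [f"knowledge article {i}" for i in range(10)]
vectors = embed_corpus(articles)
print(len(vectors))
```

Tuning `batch_size` and `max_workers` against the cores and memory of each Databricks worker is what allows the pipeline to saturate CPU, as reported in the experiments.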

Published

2026-04-17

Section

Articles

How to Cite

Malthummeda VK. Distributed RAG Architecture for Enterprise Knowledge QA on Databricks. IJAIBDCMS [Internet]. 2026 Apr. 17 [cited 2026 Apr. 23];7(2):98-104. Available from: https://ijaibdcms.org/index.php/ijaibdcms/article/view/550