Jitendra's Blog · Salesforce + AI · RAG Tutorial

Talk to Salesforce Data Using OpenAI, LangChain & ChromaDB

Build a conversational AI that understands your Salesforce CRM data using Retrieval-Augmented Generation (RAG)

At a glance:

Architecture pattern: RAG
ChromaDB Rust rewrite: 4x performance boost
Recommended embedding model: text-embedding-3
Implementation language: Python

1 What is RAG (Retrieval-Augmented Generation)?

What's New in This Update (February 2026)

This article has been refreshed with the latest LangChain patterns, ChromaDB improvements, and OpenAI embedding model recommendations. Originally published December 2023.

This is blog post 2 in my AI series. In this tutorial, I'll share source code and a video walkthrough for using LangChain with OpenAI embeddings and ChromaDB vector database to create a conversational interface for Salesforce Lead data.

The concept behind this is called RAG - Retrieval-Augmented Generation. Instead of relying solely on the LLM's training data, we provide it with relevant context from our own database, enabling accurate answers about your specific Salesforce records.

Why RAG for Salesforce?
LLMs don't know about your specific customer data. RAG lets you "teach" the AI about your leads, opportunities, and accounts by providing relevant context at query time—without fine-tuning or retraining models.
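The core of RAG can be shown in a few lines: retrieved records are stitched into the prompt so the LLM answers from your data rather than its training set. A minimal sketch (the function name and prompt wording here are illustrative, not from the demo code):

```python
# Minimal sketch of the RAG idea: enrich the prompt with retrieved context
# before sending it to the LLM. Names and wording are illustrative.

def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved Salesforce context with the user's question."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

chunks = ["Lead: Acme Corp, Industry: Technology, Status: Open"]
prompt = build_rag_prompt("Which leads are from the technology industry?", chunks)
print(prompt)
```

Everything downstream (embeddings, vector store, retrieval chain) exists to pick the right `retrieved_chunks` for each question.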

2 Architecture Overview

The RAG architecture for this demo follows these key steps:

1. Extract data: get Lead records from Salesforce via the REST API
2. Convert to text: format records as text documents
3. Create embeddings: generate vectors using the OpenAI API
4. Store in ChromaDB: persist vectors for fast retrieval
5. Query with LangChain: run a conversational retrieval chain

Key Components

Component | Purpose | 2026 Recommendation
LangChain | Orchestration framework for LLM applications | Use LangGraph for agentic workflows
ChromaDB | Open-source vector database | Rust-core rewrite offers 4x performance
OpenAI Embeddings | Convert text to vector representations | text-embedding-3-large for production
GPT Model | Generate natural language responses | GPT-4 or GPT-4-turbo for accuracy

3 Implementation Steps

Here's a summary of what the demo code accomplishes:

  1. Get data from Salesforce - Connect via OAuth and export Lead records to a text file
  2. Convert to embeddings - Use OpenAI's embedding model to create vector representations and store them in ChromaDB
  3. Query with context - When a user asks a question, LangChain retrieves relevant chunks from ChromaDB to enrich the prompt
  4. Generate response - OpenAI's GPT model uses the enriched context to answer questions accurately
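Steps 1 and 2 above boil down to flattening each Lead record into a text chunk that can be embedded. A hedged sketch of that conversion (the field names are standard Lead fields, but adjust to your org; the helper is illustrative, not from the demo repo):

```python
# Sketch of steps 1-2: turning Lead records (as returned by the Salesforce
# REST API as dicts) into plain-text documents ready for embedding.
# Field names are standard Lead fields; adapt to your own org.

def lead_to_document(lead: dict) -> str:
    """Flatten one Lead record into a single text chunk."""
    return (
        f"Lead: {lead.get('Name', '')}. "
        f"Company: {lead.get('Company', '')}. "
        f"Industry: {lead.get('Industry', '')}. "
        f"Status: {lead.get('Status', '')}."
    )

leads = [
    {"Name": "Jane Doe", "Company": "Acme Corp",
     "Industry": "Technology", "Status": "Open - Not Contacted"},
]
docs = [lead_to_document(lead) for lead in leads]
print(docs[0])
```

Keeping one record per chunk makes retrieval results easy to trace back to a specific Lead.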

Sample Code Structure

# Key imports (LangChain >= 0.2 package layout)
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain.chains import ConversationalRetrievalChain

# Create embeddings and store in ChromaDB
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vectorstore = Chroma.from_documents(
    documents=salesforce_docs,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

# Create conversational chain
llm = ChatOpenAI(model="gpt-4")
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True
)

# Query your Salesforce data
response = qa_chain.invoke({
    "question": "Which leads are from the technology industry?",
    "chat_history": []
})
Cost Optimization: ChromaDB stores embeddings locally, so you only pay for the initial embedding generation. Subsequent queries use cached vectors, significantly reducing OpenAI API costs.

4 Video Tutorial & Complete Source Code

Watch the complete video walkthrough demonstrating the RAG implementation with Salesforce data.

Complete Source Code

The full Python implementation is available on GitHub. It includes Salesforce authentication, data extraction, embedding generation, and the conversational interface.

5 2026 Updates & Best Practices

The RAG landscape has evolved significantly since this article was first published. Here are the key updates for building production RAG systems in 2026:

ChromaDB Improvements (2025-2026)

OpenAI Embedding Models

Model | Dimensions | Best For
text-embedding-3-large | 3072 (adjustable) | Production RAG, multilingual support
text-embedding-3-small | 1536 | Cost-sensitive applications, prototypes

Dimension Reduction: With text-embedding-3-large, you can reduce dimensions from 3072 to 1024 via the API parameter, trading off some accuracy for lower storage and faster retrieval.
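The API's `dimensions` parameter applies this reduction server-side, but the underlying idea is simple: truncate the vector to the first k dimensions and re-normalize to unit length. A sketch of that math (the helper name and toy vectors are illustrative):

```python
import math

# Sketch of the dimension-reduction idea behind text-embedding-3:
# keep the first k dimensions, then re-normalize to unit length so
# cosine similarity still behaves. The API does this server-side via
# its `dimensions` parameter.

def shorten_embedding(vec: list[float], k: int) -> list[float]:
    truncated = vec[:k]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

full = [1.0, 1.0, 1.0, 1.0]          # toy 4-dim embedding
short = shorten_embedding(full, 2)    # keep first 2 dims, renormalize
print(round(sum(x * x for x in short), 6))  # 1.0 — unit length again
```

Smaller vectors mean less ChromaDB storage and faster similarity search, at the cost of some retrieval accuracy.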

Salesforce Native RAG (Agentforce)

Salesforce now offers native RAG capabilities through Agentforce and Data Cloud. The Agentforce Data Library (ADL) automatically configures vector stores, search indexes, and retrieval pipelines.

Production RAG Best Practices (2026)

  1. Agentic RAG: Use LangGraph for dynamic retrieval decisions—35-50% improvement on complex queries
  2. Hierarchical chunking: Preserve document structure, validate chunk boundaries semantically
  3. Smart routing: Route simple queries to cheaper models and cache responses for a 30-45% cost reduction
  4. Observability: Integrate with LangSmith for production monitoring and debugging
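Practice 3 can be prototyped with a simple heuristic router. The model names and complexity heuristic below are illustrative assumptions (production routers typically use a trained classifier or the provider's routing features):

```python
# Hedged sketch of smart routing: send simple queries to a cheaper model,
# reserve the strong model for complex ones. Model names and the heuristic
# are illustrative, not prescriptive.

CHEAP_MODEL = "gpt-4o-mini"   # assumed cheap tier
STRONG_MODEL = "gpt-4"        # assumed strong tier

def route_model(question: str) -> str:
    complex_markers = ("compare", "why", "explain", "analyze")
    is_long = len(question.split()) > 20
    is_complex = any(m in question.lower() for m in complex_markers)
    return STRONG_MODEL if (is_long or is_complex) else CHEAP_MODEL

print(route_model("List open leads"))                  # gpt-4o-mini
print(route_model("Compare Q3 and Q4 conversion"))     # gpt-4
```

Combined with response caching, even a crude router like this keeps the strong model for the queries that actually need it.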

6 Frequently Asked Questions

What is RAG?
RAG is a technique that combines retrieval-based and generative AI models. It retrieves relevant information from a knowledge base (using vector embeddings) and uses that context to generate more accurate, grounded responses from an LLM like GPT-4. This allows AI to answer questions about your specific data without retraining.

Why use ChromaDB?
ChromaDB is an open-source vector database that stores embeddings locally or in the cloud. It reduces OpenAI API costs by caching embeddings—you only pay for initial generation. It enables fast semantic search over your Salesforce records and supports persistence for production use.

Which embedding model should I use?
OpenAI's text-embedding-3-large is recommended for production RAG systems. It offers better multilingual support and allows dimension reduction (3072 → 1024) for cost optimization. For prototypes or budget-conscious projects, text-embedding-3-small provides excellent performance at lower cost.

Does Salesforce offer native RAG?
Yes! Salesforce Data Cloud now includes native RAG capabilities through Agentforce. The Agentforce Data Library (ADL) automatically sets up vector stores, search indexes, and retrieval pipelines for enterprise use cases. This is ideal for organizations already invested in the Salesforce ecosystem.

How can I reduce RAG costs?
Key strategies include: (1) Cache embeddings in ChromaDB so you don't regenerate them, (2) Use dimension reduction with text-embedding-3-large, (3) Implement smart routing to cheaper models for simple queries, (4) Use hierarchical chunking to reduce retrieval calls by 30-40%.

7 Related Reading

Continue your Salesforce and AI learning journey with these related guides:

8 Abbreviations & Glossary

Technical Terms

Reference guide for abbreviations and technical terms used in this article.

RAG - Retrieval-Augmented Generation
LLM - Large Language Model
API - Application Programming Interface
GPT - Generative Pre-trained Transformer
CRM - Customer Relationship Management
OAuth - Open Authorization Protocol
ADL - Agentforce Data Library
GIL - Global Interpreter Lock (Python)
BM25 - Best Matching 25 (ranking function)