Jitendra's Blog · Salesforce + AI · RAG Tutorial

Talk to Salesforce Data Using OpenAI, LangChain & ChromaDB

Build a conversational AI that understands your Salesforce CRM data using Retrieval-Augmented Generation (RAG)

At a glance:

Architecture pattern: RAG
ChromaDB Rust rewrite: 4x performance boost
Recommended embedding model: text-embedding-3
Implementation language: Python

1 What is RAG (Retrieval-Augmented Generation)?

What's New in This Update (February 2026)

This article has been refreshed with the latest LangChain patterns, ChromaDB improvements, and OpenAI embedding model recommendations. Originally published December 2023.

This is blog post 2 in my AI series. In this tutorial, I'll share source code and a video walkthrough for using LangChain with OpenAI embeddings and ChromaDB vector database to create a conversational interface for Salesforce Lead data.

The concept behind this is called RAG - Retrieval-Augmented Generation. Instead of relying solely on the LLM's training data, we provide it with relevant context from our own database, enabling accurate answers about your specific Salesforce records.

Why RAG for Salesforce?
LLMs don't know about your specific customer data. RAG lets you "teach" the AI about your leads, opportunities, and accounts by providing relevant context at query time—without fine-tuning or retraining models.
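The core of RAG can be shown in a few lines: retrieved records are stitched into the prompt so the LLM answers from your data rather than its training set. A minimal sketch (the function name and prompt wording here are illustrative, not from the demo code):

```python
# Minimal sketch of the RAG idea: enrich the prompt with retrieved context
# before sending it to the LLM. Names and wording are illustrative.

def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved Salesforce context with the user's question."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

chunks = ["Lead: Acme Corp, Industry: Technology, Status: Open"]
prompt = build_rag_prompt("Which leads are from the technology industry?", chunks)
print(prompt)
```

Everything downstream (embeddings, vector store, retrieval chain) exists to pick the right `retrieved_chunks` for each question.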

2 Architecture Overview

The RAG architecture for this demo follows these key steps:

1. Extract data: get Lead records from Salesforce via the REST API
2. Convert to text: format records as text documents
3. Create embeddings: generate vectors using the OpenAI API
4. Store in ChromaDB: persist vectors for fast retrieval
5. Query with LangChain: run a conversational retrieval chain

Key Components

Component | Purpose | 2026 Recommendation
LangChain | Orchestration framework for LLM applications | Use LangGraph for agentic workflows
ChromaDB | Open-source vector database | Rust-core rewrite offers 4x performance
OpenAI Embeddings | Convert text to vector representations | text-embedding-3-large for production
GPT Model | Generate natural language responses | GPT-4 or GPT-4-turbo for accuracy

3 Implementation Steps

Here's a summary of what the demo code accomplishes:

  1. Get data from Salesforce - Connect via OAuth and export Lead records to a text file
  2. Convert to embeddings - Use OpenAI's embedding model to create vector representations and store them in ChromaDB
  3. Query with context - When a user asks a question, LangChain retrieves relevant chunks from ChromaDB to enrich the prompt
  4. Generate response - OpenAI's GPT model uses the enriched context to answer questions accurately
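Steps 1 and 2 above boil down to flattening each Lead record into a text chunk that can be embedded. A hedged sketch of that conversion (the field names are standard Lead fields, but adjust to your org; the helper is illustrative, not from the demo repo):

```python
# Sketch of steps 1-2: turning Lead records (as returned by the Salesforce
# REST API as dicts) into plain-text documents ready for embedding.
# Field names are standard Lead fields; adapt to your own org.

def lead_to_document(lead: dict) -> str:
    """Flatten one Lead record into a single text chunk."""
    return (
        f"Lead: {lead.get('Name', '')}. "
        f"Company: {lead.get('Company', '')}. "
        f"Industry: {lead.get('Industry', '')}. "
        f"Status: {lead.get('Status', '')}."
    )

leads = [
    {"Name": "Jane Doe", "Company": "Acme Corp",
     "Industry": "Technology", "Status": "Open - Not Contacted"},
]
docs = [lead_to_document(lead) for lead in leads]
print(docs[0])
```

Keeping one record per chunk makes retrieval results easy to trace back to a specific Lead.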

Sample Code Structure

# Key imports (LangChain >= 0.2 package layout)
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
from langchain.chains import ConversationalRetrievalChain

# Create embeddings and store in ChromaDB
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vectorstore = Chroma.from_documents(
    documents=salesforce_docs,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

# Create conversational chain
llm = ChatOpenAI(model="gpt-4")
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True
)

# Query your Salesforce data
response = qa_chain.invoke({
    "question": "Which leads are from the technology industry?",
    "chat_history": []
})
Cost Optimization: ChromaDB stores embeddings locally, so you only pay for the initial embedding generation. Subsequent queries use cached vectors, significantly reducing OpenAI API costs.

4 Video Tutorial & Complete Source Code

Watch the complete video walkthrough demonstrating the RAG implementation with Salesforce data.

Complete Source Code

The full Python implementation is available on GitHub. It includes Salesforce authentication, data extraction, embedding generation, and the conversational interface.

5 2026 Updates & Best Practices

The RAG landscape has evolved significantly since this article was first published. Here are the key updates for building production RAG systems in 2026:

ChromaDB Improvements (2025-2026)

OpenAI Embedding Models

Model | Dimensions | Best For
text-embedding-3-large | 3072 (adjustable) | Production RAG, multilingual support
text-embedding-3-small | 1536 | Cost-sensitive applications, prototypes

Dimension Reduction: With text-embedding-3-large, you can reduce dimensions from 3072 to 1024 via the API parameter, trading off some accuracy for lower storage and faster retrieval.
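The API's `dimensions` parameter applies this reduction server-side, but the underlying idea is simple: truncate the vector to the first k dimensions and re-normalize to unit length. A sketch of that math (the helper name and toy vectors are illustrative):

```python
import math

# Sketch of the dimension-reduction idea behind text-embedding-3:
# keep the first k dimensions, then re-normalize to unit length so
# cosine similarity still behaves. The API does this server-side via
# its `dimensions` parameter.

def shorten_embedding(vec: list[float], k: int) -> list[float]:
    truncated = vec[:k]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

full = [1.0, 1.0, 1.0, 1.0]          # toy 4-dim embedding
short = shorten_embedding(full, 2)    # keep first 2 dims, renormalize
print(round(sum(x * x for x in short), 6))  # 1.0 — unit length again
```

Smaller vectors mean less ChromaDB storage and faster similarity search, at the cost of some retrieval accuracy.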

Salesforce Native RAG (Agentforce)

Salesforce now offers native RAG capabilities through Agentforce and Data Cloud. The Agentforce Data Library (ADL) automatically configures vector stores, search indexes, and retrieval pipelines.

Production RAG Best Practices (2026)

  1. Agentic RAG: Use LangGraph for dynamic retrieval decisions—35-50% improvement on complex queries
  2. Hierarchical chunking: Preserve document structure, validate chunk boundaries semantically
  3. Smart routing: Route simple queries to cheaper models and cache responses for a 30-45% cost reduction
  4. Observability: Integrate with LangSmith for production monitoring and debugging
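Practice 3 can be prototyped with a simple heuristic router. The model names and complexity heuristic below are illustrative assumptions (production routers typically use a trained classifier or the provider's routing features):

```python
# Hedged sketch of smart routing: send simple queries to a cheaper model,
# reserve the strong model for complex ones. Model names and the heuristic
# are illustrative, not prescriptive.

CHEAP_MODEL = "gpt-4o-mini"   # assumed cheap tier
STRONG_MODEL = "gpt-4"        # assumed strong tier

def route_model(question: str) -> str:
    complex_markers = ("compare", "why", "explain", "analyze")
    is_long = len(question.split()) > 20
    is_complex = any(m in question.lower() for m in complex_markers)
    return STRONG_MODEL if (is_long or is_complex) else CHEAP_MODEL

print(route_model("List open leads"))                  # gpt-4o-mini
print(route_model("Compare Q3 and Q4 conversion"))     # gpt-4
```

Combined with response caching, even a crude router like this keeps the strong model for the queries that actually need it.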

6 Frequently Asked Questions

What is RAG?
RAG is a technique that combines retrieval-based and generative AI models. It retrieves relevant information from a knowledge base (using vector embeddings) and uses that context to generate more accurate, grounded responses from an LLM like GPT-4. This allows AI to answer questions about your specific data without retraining.

Why use ChromaDB?
ChromaDB is an open-source vector database that stores embeddings locally or in the cloud. It reduces OpenAI API costs by caching embeddings—you only pay for initial generation. It enables fast semantic search over your Salesforce records and supports persistence for production use.

Which embedding model should I use?
OpenAI's text-embedding-3-large is recommended for production RAG systems. It offers better multilingual support and allows dimension reduction (3072 → 1024) for cost optimization. For prototypes or budget-conscious projects, text-embedding-3-small provides excellent performance at lower cost.

Does Salesforce offer native RAG?
Yes! Salesforce Data Cloud now includes native RAG capabilities through Agentforce. The Agentforce Data Library (ADL) automatically sets up vector stores, search indexes, and retrieval pipelines for enterprise use cases. This is ideal for organizations already invested in the Salesforce ecosystem.

How can I reduce RAG costs?
Key strategies include: (1) Cache embeddings in ChromaDB so you don't regenerate them, (2) Use dimension reduction with text-embedding-3-large, (3) Implement smart routing to cheaper models for simple queries, (4) Use hierarchical chunking to reduce retrieval calls by 30-40%.

7 Related Reading

Continue your Salesforce and AI learning journey with these related guides:

8 Abbreviations & Glossary

Technical Terms

Reference guide for abbreviations and technical terms used in this article.

RAG - Retrieval-Augmented Generation
LLM - Large Language Model
API - Application Programming Interface
GPT - Generative Pre-trained Transformer
CRM - Customer Relationship Management
OAuth - Open Authorization Protocol
ADL - Agentforce Data Library
GIL - Global Interpreter Lock (Python)
BM25 - Best Matching 25 (ranking function)