LangChain
This guide shows you how to integrate Vantage, a vector database and search platform, with LangChain 🦜️🔗, a framework for building applications powered by large language models (LLMs).
The integration combines Vantage's vector search capabilities with LangChain's tooling for LLM-based natural language understanding and processing.
Together, they give developers a toolkit for building applications that can search, interpret, and act on large amounts of data.
Let's see how you can use it yourself.
Step 1: Environment Setup
The first step in this process is setting up the environment by installing necessary libraries and setting all important keys and values for later use.
- Installing libraries
pip install -qU \
vantage-sdk \
langchain-openai==0.1.1 \
langchain==0.1.13
- Setting environment variables
export VANTAGE_API_KEY=<YOUR_VANTAGE_API_KEY>
export OPENAI_API_KEY=<YOUR_OPENAI_API_KEY - available at platform.openai.com/api-keys>
import os
vantage_api_key = os.environ.get('VANTAGE_API_KEY')
openai_api_key = os.environ.get('OPENAI_API_KEY')
- Setting Vantage Account ID
account_id = "<YOUR_VANTAGE_ACCOUNT_ID>"
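Because os.environ.get returns None when a variable is not set, a missing key only surfaces later as a confusing authentication error. A minimal sketch of a fail-fast check is shown below; the helper name load_required_env is hypothetical and not part of the Vantage SDK or LangChain.

```python
import os

# Names of the environment variables this guide relies on.
REQUIRED_VARS = ("VANTAGE_API_KEY", "OPENAI_API_KEY")


def load_required_env(names=REQUIRED_VARS, environ=None):
    """Return the requested variables as a dict, or raise if any is missing or empty.

    `environ` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Demonstration with placeholder values instead of real credentials.
demo_keys = load_required_env(
    environ={"VANTAGE_API_KEY": "demo-vantage-key", "OPENAI_API_KEY": "demo-openai-key"}
)
print(sorted(demo_keys))
```

In a real script you would call load_required_env() without arguments, so that it reads the variables exported above.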
Step 2: Data Preparation
Next, we are going to prepare the data we want to index. We will demonstrate two different methods of data preparation: the first method involves setting up simple text and metadata lists, and the second method uses LangChain's Document objects.
Option 1: Simple text and metadata lists
- To upload data to LangChain's vector store, it needs to be in the proper format: either texts or documents. For texts, we will use the add_texts method, which requires a simple list of texts, as created below. Optionally, you can provide metadata for your texts; each metadata entry is stored together with the text at the same position in the list.
TEXTS = [
"Ted goes to the gym and exercises three times a week during summer.",
"Yuriko and Mina are going to Hawaii this summer.",
"Many people eat cereal for breakfast.",
]
METADATA = [
{"planet": "Earth", "something_else": "Some value"},
{"planet": "Earth"},
{"planet": "Mars"},
]
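Since texts and metadata are matched by position, the two lists should have the same length. The sketch below shows one way to check this before indexing; the helper pair_texts_with_metadata is hypothetical and only illustrates the pairing, it is not something the vector store requires you to call.

```python
TEXTS = [
    "Ted goes to the gym and exercises three times a week during summer.",
    "Yuriko and Mina are going to Hawaii this summer.",
    "Many people eat cereal for breakfast.",
]
METADATA = [
    {"planet": "Earth", "something_else": "Some value"},
    {"planet": "Earth"},
    {"planet": "Mars"},
]


def pair_texts_with_metadata(texts, metadatas=None):
    """Return (text, metadata) pairs, matching entries by list position."""
    if metadatas is None:
        # Metadata is optional: fall back to one empty dict per text.
        metadatas = [{} for _ in texts]
    if len(texts) != len(metadatas):
        raise ValueError(
            f"Got {len(texts)} texts but {len(metadatas)} metadata entries"
        )
    return list(zip(texts, metadatas))


pairs = pair_texts_with_metadata(TEXTS, METADATA)
for text, meta in pairs:
    print(meta.get("planet"), "|", text)
```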
Option 2: LangChain's Documents
- Another method of data upload involves providing LangChain's Documents, which can be created either manually or by using LangChain's document loaders.
Creating Documents manually
You can create documents from your text and metadata objects simply by creating a list of LangChain's Document objects, as shown below.
from langchain_core.documents import Document
documents = [Document(page_content=text, metadata=meta) for text, meta in zip(TEXTS, METADATA)]
Using Document Loaders
A more popular approach is to use a document loader. There are plenty of loaders and input sources you can use to create Documents from your data. For instance, you can use simple PDF files, as we will in our example, or you can use a HuggingFace dataset by instantiating HuggingFaceDatasetLoader. If you want to use data stored in Azure Blob Storage, you can use AzureBlobStorageFileLoader, among others. Refer to LangChain's documentation to explore all the possibilities.
In our example, we are going to use the simple PyPDFLoader.
from langchain.document_loaders.pdf import PyPDFLoader
data = PyPDFLoader(file_path="<path_to_your_PDF_file>")
documents = data.load()
Step 3: Vantage Client Initialization
Before we create the LangChain vector store object, we need to instantiate our Vantage client object, which will be passed as a parameter when creating the Vantage vector store. We use the Vantage API key and account ID to instantiate the client in the code block below.
from vantage_sdk import VantageClient
vantage_client = VantageClient.using_vantage_api_key(
vantage_api_key=vantage_api_key,
account_id=account_id,
)
Step 4: LangChain's Vantage Vector Store Initialization
Another required parameter for LangChain's vector store is the embedding parameter. Here, we need to provide an instance of a LangChain Embeddings class, which will be used to create embeddings from the data that needs to be ingested. More on this is described in the section on embedding creation below.
In our example, we will be using OpenAIEmbeddings for this purpose.
from langchain_openai import OpenAIEmbeddings
langchain_embeddings = OpenAIEmbeddings(
openai_api_key=openai_api_key,
model="text-embedding-ada-002"
)
For the Vantage vector store specifically, we need to set the embedding_dimension parameter as well.
embedding_dimension = 1536 # matching the OpenAIEmbeddings model from the previous code block
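To avoid hard-coding a dimension that silently drifts from the chosen model, you could look it up from a small table. The sketch below is only an illustration: the table KNOWN_DIMENSIONS and the helper embedding_dimension_for are hypothetical names, and the values are the default output sizes of the listed OpenAI models.

```python
# Default embedding sizes of some OpenAI embedding models.
KNOWN_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def embedding_dimension_for(model):
    """Return the embedding dimension for a known model name, or raise ValueError."""
    try:
        return KNOWN_DIMENSIONS[model]
    except KeyError:
        raise ValueError(
            f"Unknown model {model!r}; add its dimension to KNOWN_DIMENSIONS"
        ) from None


model_name = "text-embedding-ada-002"
embedding_dimension = embedding_dimension_for(model_name)
print(model_name, embedding_dimension)
```

Keeping the model name and its dimension in one place means that switching models changes both values together.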
Who Handles the Embedding Creation Process?
We are going to create a UPE (user-provided embeddings) collection, which means we don't need to specify a large language model. During LangChain's vector store initialization, the embedding parameter is provided, and it is used to create the embeddings.
Conversely, if you create a VME (Vantage-managed embeddings) collection, you will need to provide an llm and an external_api_key, and ensure the embedding_dimension matches the provided llm model. In this scenario, Vantage will handle the embedding creation process.
Below is an example of that scenario. In that case, LangChain's embedding parameter is still required but will be ignored internally.
from langchain_community.vectorstores.vantage import Vantage
# collection_id must be set to the ID of your collection before running this block
vector_store_vme = Vantage(
    client=vantage_client,
    embedding=langchain_embeddings,
    collection_id=collection_id,
    user_provided_embeddings=False,
    llm="text-embedding-3-large",
    external_api_key="<YOUR_EXTERNAL_API_KEY>",
    embedding_dimension=3072, # matching text-embedding-3-large model (llm parameter)
)
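The difference between the two collection types comes down to which keyword arguments you pass. The sketch below collects that rule in one place; the helper vantage_store_kwargs is hypothetical, and it only uses the parameter names mentioned above. Dropping llm and external_api_key for UPE collections is a choice made in this sketch, not documented behavior of the vector store.

```python
def vantage_store_kwargs(
    user_provided_embeddings,
    embedding_dimension,
    llm=None,
    external_api_key=None,
):
    """Build the collection-type-specific keyword arguments for the vector store."""
    kwargs = {
        "user_provided_embeddings": user_provided_embeddings,
        "embedding_dimension": embedding_dimension,
    }
    if user_provided_embeddings:
        # UPE: LangChain's embedding object creates the vectors, so no llm is needed.
        return kwargs
    # VME: Vantage creates the vectors, so it needs a model name and an API key.
    if not llm or not external_api_key:
        raise ValueError("A VME collection needs both llm and external_api_key")
    kwargs["llm"] = llm
    kwargs["external_api_key"] = external_api_key
    return kwargs


upe_kwargs = vantage_store_kwargs(True, 1536)
vme_kwargs = vantage_store_kwargs(
    False,
    3072,
    llm="text-embedding-3-large",
    external_api_key="<YOUR_EXTERNAL_API_KEY>",
)
print(upe_kwargs)
print(sorted(vme_kwargs))
```

The resulting dictionaries could then be unpacked into the vector store constructor together with client, embedding, and collection_id.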
Below, we are finally initializing LangChain's Vantage vector store. For this, we are setting the collection_id, along with the embedding and client that we created above, the embedding_dimension, and the user_provided_embeddings parameter, which we are setting to True, thereby choosing to create the UPE collection.
from langchain_community.vectorstores.vantage import Vantage
collection_id = "langchain-collection-texts"
vector_store_upe = Vantage(
    client=vantage_client,
    embedding=langchain_embeddings,
    collection_id=collection_id,
    embedding_dimension=embedding_dimension,
    user_provided_embeddings=True,
)
Step 5: Indexing
Indexing your data into LangChain's vector store can be done either by providing texts or documents, using the add_texts or add_documents methods, respectively. In this step, we are using the add_texts method and providing the lists we created in step 2.
ids = vector_store_upe.add_texts(TEXTS, METADATA)
What About IDs?
A list of ingested IDs mapped to your data is returned. You have the option to provide your own list of IDs, along with texts and metadata lists. If not provided, IDs will be automatically generated.
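If you prefer to supply your own IDs, one option is to derive them deterministically from the texts, so that the same text always gets the same ID. The sketch below uses Python's uuid module; the helper make_ids and the namespace string are hypothetical. Whether Vantage accepts UUID strings as IDs, and how the list is passed to add_texts (for example through an ids argument), are assumptions to check against the method's signature.

```python
import uuid

# A fixed namespace keeps the generated IDs stable across runs.
ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "langchain-collection-texts")


def make_ids(texts, namespace=ID_NAMESPACE):
    """Return one deterministic UUID string per text (identical texts share an ID)."""
    return [str(uuid.uuid5(namespace, text)) for text in texts]


custom_ids = make_ids(
    [
        "Ted goes to the gym and exercises three times a week during summer.",
        "Yuriko and Mina are going to Hawaii this summer.",
        "Many people eat cereal for breakfast.",
    ]
)
for custom_id in custom_ids:
    print(custom_id)
```

Deterministic IDs make it easier to recognize a text you have already ingested, at the cost that duplicate texts map to the same ID.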
Alternative [Step 4 + Step 5]: LangChain's Vantage Vector Store Indexing during Initialization
An alternative way to ingest your data is by using the class methods from_texts and from_documents, which ingest your data during the initialization of the vector store.
This offers a concise, single-call approach to accomplishing what we described in steps 4 and 5. All parameters remain the same, except that now we are using documents instead of texts; the documents were created using a document loader in the second part of step 2.
collection_id = "langchain-collection-documents"
vector_store_document_loader = Vantage.from_documents(
documents=documents,
embedding=langchain_embeddings,
client=vantage_client,
collection_id=collection_id,
embedding_dimension=embedding_dimension,
user_provided_embeddings=True,
)
Step 6: RAG with Vantage & LangChain
In progress