Vantage and LangChain

This guide shows you how to integrate Vantage, a vector database and search platform, with LangChain 🦜️🔗, a framework for building applications powered by large language models (LLMs).

This collaboration utilizes Vantage's capabilities in handling complex vector search queries alongside LangChain's innovative approach to leveraging LLMs for natural language understanding and processing.

Together, they offer developers a comprehensive toolkit for creating highly responsive, intuitive applications that can understand, interpret, and act on vast amounts of data in real-time.

Let's see how you can use them together.

Step 1: Environment Setup

The first step is to set up the environment: install the required libraries and set the keys and values used later in this guide.

  • Installing libraries
pip install -qU \
  vantage-sdk \
  langchain-openai==0.1.1 \
  langchain==0.1.13
  • Setting environment variables
# In your shell:
export VANTAGE_API_KEY=<YOUR_VANTAGE_API_KEY>
export OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>  # available at platform.openai.com/api-keys

# In Python:
import os

vantage_api_key = os.environ.get('VANTAGE_API_KEY')
openai_api_key = os.environ.get('OPENAI_API_KEY')
  • Setting Vantage Account ID
account_id = "<YOUR_VANTAGE_ACCOUNT_ID>"
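Note that os.environ.get returns None when a variable is missing, so a forgotten export only surfaces later as an authentication error. A minimal sketch of a fail-fast check follows; the helper name require_env is hypothetical and not part of the Vantage SDK or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Possible usage in place of os.environ.get (raises early if a key is missing):
# vantage_api_key = require_env("VANTAGE_API_KEY")
# openai_api_key = require_env("OPENAI_API_KEY")
```

This way a missing key stops the script at startup rather than at the first API call.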

Step 2: Data Preparation

Next, we are going to prepare the data we want to index. We will demonstrate two different methods of data preparation: the first method involves setting up simple text and metadata lists, and the second method uses LangChain's Document objects.

Option 1: Simple text and metadata lists

  • To upload data to LangChain's vector store, it needs to be in the proper format - either as texts or documents. For texts, we will use the add_texts method, which requires a simple list of texts, as created below. Optionally, you can provide metadata for your text data, and it will be stored as matching pairs with the texts.
TEXTS = [
    "Ted goes to the gym and exercises three times a week during summer.",
    "Yuriko and Mina are going to Hawaii this summer.",
    "Many people eat cereal for breakfast.",
]

METADATA = [
    {"planet": "Earth", "something_else": "Some value"},
    {"planet": "Earth"},
    {"planet": "Mars"},
]
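Because texts and metadata are stored as matching pairs, the two lists have to line up by position. The sketch below shows one way to check that before ingestion; the helper name align_metadata is hypothetical, and the sample lists are shortened stand-ins for TEXTS and METADATA.

```python
from typing import Any, Dict, List, Optional


def align_metadata(
    texts: List[str],
    metadatas: Optional[List[Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Return one metadata dict per text, raising if the list lengths differ."""
    if metadatas is None:
        # No metadata supplied: pair every text with an empty dict.
        return [{} for _ in texts]
    if len(texts) != len(metadatas):
        raise ValueError(
            f"Got {len(texts)} texts but {len(metadatas)} metadata entries"
        )
    return list(metadatas)


sample_texts = ["Many people eat cereal for breakfast.", "Ted goes to the gym."]
sample_metadata = [{"planet": "Mars"}, {"planet": "Earth"}]

# Each text is paired with the metadata dict at the same index.
pairs = list(zip(sample_texts, align_metadata(sample_texts, sample_metadata)))
```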

Option 2: LangChain's Documents

  • Another method of data upload involves providing LangChain's Documents, which can be created either manually or by using LangChain's document loaders.
    Creating Documents manually
    You can create documents from your text and metadata objects simply by creating a list of LangChain's Document objects, as shown below.

    from langchain_core.documents import Document
    
    documents = [Document(page_content=text, metadata=meta) for text, meta in zip(TEXTS, METADATA)]
    

    Using Document Loaders
    A more popular approach is using the document loader object. There are plenty of options and input sources you can use to create Documents from your data. For instance, you can use simple PDF files, as we will in our example, or you can use a HuggingFace dataset by instantiating HuggingFaceDatasetLoader. If you want to use data stored on your Azure Blob Storage, you can simply use AzureBlobStorageFileLoader, among others. Refer to LangChain's documentation to explore all the possibilities.

In our example, we use PyPDFLoader, which requires the pypdf package (pip install pypdf).

from langchain_community.document_loaders import PyPDFLoader

# Load the PDF into a list of Documents, one per page
loader = PyPDFLoader(file_path="<path_to_your_PDF_file>")
documents = loader.load()

Step 3: Vantage Client Initialization

Before we create the LangChain vector store object, we need to instantiate a Vantage client, which is passed as a parameter when creating the Vantage vector store. In the code block below, we use the Vantage API key and account ID to instantiate the client.

from vantage_sdk import VantageClient

vantage_client = VantageClient.using_vantage_api_key(
    vantage_api_key=vantage_api_key,
    account_id=account_id,
)

Step 4: LangChain's Vantage Vector Store Initialization

Besides the client, LangChain's vector store requires an embedding parameter: an instance of one of LangChain's Embeddings classes, used to create embeddings from the data being ingested. The note further below explains when these embeddings are actually used.

In our example, we will be using OpenAIEmbeddings for this purpose.

from langchain_openai import OpenAIEmbeddings

langchain_embeddings = OpenAIEmbeddings(
  openai_api_key=openai_api_key, 
  model="text-embedding-ada-002"
)

For the Vantage vector store specifically, we also need to set the embedding_dimension parameter.

embedding_dimension = 1536 # matching the OpenAIEmbeddings model from the previous code block
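Since embedding_dimension has to match the embedding model, it can help to look the value up instead of hard-coding it. The following is a small sketch, not part of either library: the table lists the default output sizes of three OpenAI embedding models, and the helper name embedding_dimension_for is hypothetical.

```python
# Default output dimensions of OpenAI embedding models.
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def embedding_dimension_for(model: str) -> int:
    """Look up the default embedding dimension for a known model name."""
    try:
        return EMBEDDING_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"Unknown embedding model: {model!r}") from None


embedding_dimension = embedding_dimension_for("text-embedding-ada-002")
```

If you use a different model, add its dimension to the table from the provider's documentation rather than guessing.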

🚧

Who Handles the Embedding Creation Process?

We are going to create a UPE (user-provided embeddings) collection, which means we don't need to specify an embedding model for Vantage. The embedding object passed to LangChain's vector store at initialization creates the embeddings instead.

Conversely, if you create a VME (Vantage-managed embeddings) collection, you need to provide llm and external_api_key, and make sure embedding_dimension matches the model named in llm. In this scenario, Vantage handles the embedding creation process.

Below is an example of that scenario. In that case, LangChain's embedding parameter is still required but is ignored internally.

# The Vantage import and collection_id are defined in the next code block.
vector_store_vme = Vantage(
    client=vantage_client,
    embedding=langchain_embeddings,  # required, but ignored for VME collections
    collection_id=collection_id,
    user_provided_embeddings=False,
    llm="text-embedding-3-large",
    external_api_key="<YOUR_EXTERNAL_API_KEY>",
    embedding_dimension=3072,  # matching text-embedding-3-large model (llm parameter)
)
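The two modes described above differ only in a few keyword arguments. As a rough sketch, the hypothetical helper below (vantage_mode_kwargs is not part of the Vantage SDK or LangChain) collects the mode-specific arguments named in this guide and rejects a VME configuration that lacks llm or external_api_key.

```python
from typing import Any, Dict, Optional


def vantage_mode_kwargs(
    user_provided_embeddings: bool,
    embedding_dimension: int,
    llm: Optional[str] = None,
    external_api_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the mode-specific keyword arguments for a UPE or VME collection."""
    kwargs: Dict[str, Any] = {
        "user_provided_embeddings": user_provided_embeddings,
        "embedding_dimension": embedding_dimension,
    }
    if user_provided_embeddings:
        # UPE: LangChain creates the embeddings, so no model or key is needed.
        return kwargs
    if not llm or not external_api_key:
        raise ValueError("A VME collection needs both llm and external_api_key")
    kwargs.update(llm=llm, external_api_key=external_api_key)
    return kwargs


upe_kwargs = vantage_mode_kwargs(True, 1536)
vme_kwargs = vantage_mode_kwargs(
    False, 3072, llm="text-embedding-3-large", external_api_key="<YOUR_EXTERNAL_API_KEY>"
)
# Assumed usage: Vantage(client=..., embedding=..., collection_id=..., **upe_kwargs)
```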

Below, we finally initialize LangChain's Vantage vector store. We pass the client and embedding created above together with a collection_id and the embedding_dimension, and set user_provided_embeddings to True to create a UPE collection.

from langchain_community.vectorstores.vantage import Vantage

collection_id = "langchain-collection-texts"

vector_store = Vantage(
    client=vantage_client,
    embedding=langchain_embeddings,
    collection_id=collection_id,
    embedding_dimension=embedding_dimension,
    user_provided_embeddings=True,  # UPE collection: LangChain creates the embeddings
)

Step 5: Indexing

Indexing your data into LangChain's vector store can be done by providing either texts or documents, using the add_texts or add_documents method, respectively. In this step, we use the add_texts method with the lists we created in step 2.

ids = vector_store.add_texts(TEXTS, METADATA)

📘

What About IDs?

A list of ingested IDs mapped to your data is returned. You have the option to provide your own list of IDs, along with texts and metadata lists. If not provided, IDs will be automatically generated.
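If you prefer to supply your own IDs, one option is to derive them deterministically from the text so that re-running ingestion produces the same IDs. The sketch below uses only the standard library; the helper name make_ids is hypothetical, and both the ids keyword on add_texts and the acceptability of UUID strings as Vantage IDs are assumptions to verify against the vector store's documentation.

```python
import uuid
from typing import List


def make_ids(texts: List[str], collection_id: str) -> List[str]:
    """Derive a stable UUID string for each text, scoped to a collection name."""
    return [
        str(uuid.uuid5(uuid.NAMESPACE_URL, f"{collection_id}/{text}"))
        for text in texts
    ]


custom_ids = make_ids(
    ["Many people eat cereal for breakfast.", "Ted goes to the gym."],
    "langchain-collection-texts",
)
# Assumed usage: ids = vector_store.add_texts(TEXTS, METADATA, ids=custom_ids)
```

Identical texts map to the same ID with this scheme, so deduplicate the input first if that matters for your data.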

Alternative [Step 4 + Step 5]: LangChain's Vantage Vector Store Indexing during Initialization

An alternative way to ingest your data is by using the class methods from_texts and from_documents, which ingest your data during the initialization of the vector store.

This offers a concise, single-call approach to accomplishing what we described in steps 4 and 5. All parameters remain the same, except that we now pass documents instead of texts. These are the documents created with a document loader in the second part of step 2.

collection_id = "langchain-collection-documents"

vector_store_document_loader = Vantage.from_documents(
    documents=documents,
    embedding=langchain_embeddings,
    client=vantage_client,
    collection_id=collection_id,
    embedding_dimension=embedding_dimension,
    user_provided_embeddings=True,
)

Step 6: RAG with Vantage & LangChain

🚧

In progress