Embeddings
Low-dimensional representations of high-dimensional data
What are Embeddings?
Embeddings are a fundamental concept in machine learning and natural language processing (NLP). At their core, embeddings are dense, low-dimensional representations of high-dimensional data such as text or images. The original data is often sparse and difficult for algorithms to process efficiently. Embeddings capture its meaningful features, such as what a word or image is about, in a compact form that models can work with more easily.
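To make the contrast between sparse and dense representations concrete, the sketch below compares a one-hot vector, which has one slot per vocabulary word, with a dense embedding looked up from a table. The four-word vocabulary and the 2-dimensional embedding values are made up for illustration. A trained model learns these values, and real vocabularies and embedding sizes are much larger. Looking up a row of the table is equivalent to multiplying the one-hot vector by the table.

```python
# Toy illustration: a sparse one-hot vector vs. a dense embedding.
# The vocabulary and embedding values below are hypothetical.

vocabulary = ["cat", "dog", "car", "truck"]

def one_hot(word: str) -> list[int]:
    """Sparse representation: one slot per vocabulary word, a single 1."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(word)] = 1
    return vec

# Dense embedding table: one 2-dimensional row per word (made-up values).
embedding_table = [
    [0.9, 0.1],  # cat
    [0.8, 0.2],  # dog
    [0.1, 0.9],  # car
    [0.2, 0.8],  # truck
]

def embed(word: str) -> list[float]:
    """Dense representation: the word's row in the embedding table."""
    return embedding_table[vocabulary.index(word)]

def matmul_vec(row: list[int], matrix: list[list[float]]) -> list[float]:
    """Multiply a row vector by a matrix, to show lookup == one_hot @ table."""
    return [
        sum(r * matrix[i][j] for i, r in enumerate(row))
        for j in range(len(matrix[0]))
    ]

print(one_hot("dog"))  # [0, 1, 0, 0]
print(embed("dog"))    # [0.8, 0.2]
print(matmul_vec(one_hot("dog"), embedding_table))  # [0.8, 0.2]
```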
Embeddings are often used to transform discrete objects (like words, sentences, or entire documents) into continuous vectors. This transformation allows computational models to understand and process complex relationships between objects, such as semantic similarity or contextual relevance. By mapping data to a continuous vector space, embeddings enable a wide range of machine learning applications, from recommender systems and search engines to sophisticated NLP tasks like sentiment analysis and machine translation.
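Semantic similarity between embeddings is commonly measured with cosine similarity, the cosine of the angle between two vectors. The following minimal sketch uses hypothetical 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions) to find the word whose embedding points in the most similar direction.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings chosen so that related words point in similar directions.
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "banana": [0.1, 0.2, 0.9],
}

def most_similar(query: str) -> str:
    """Return the other word whose embedding is closest to the query's embedding."""
    q = embeddings[query]
    others = (w for w in embeddings if w != query)
    return max(others, key=lambda w: cosine_similarity(q, embeddings[w]))

print(most_similar("king"))  # queen
```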
Vantage & Embeddings
Vantage is designed to efficiently store and retrieve embeddings. It is compatible with embeddings from leading providers, including OpenAI and HuggingFace, so you can use state-of-the-art embedding models for a variety of applications directly within Vantage. For more specific use cases, Vantage also lets you upload your own embeddings.
You choose which embedding option to use when you create a collection. The examples below show each option and how to use it.
Vantage-Managed Embeddings
1 | OpenAI Embeddings
Vantage supports OpenAI embeddings, so you can generate embeddings with OpenAI's language models.
Collections using OpenAI Embeddings
To create a Vantage collection that uses OpenAI embeddings, create an instance of Vantage's OpenAICollection class with the required fields. Besides the collection_id and embeddings_dimension parameters, you must provide the llm parameter, which specifies the OpenAI model. To authenticate with OpenAI, choose one of two options: provide an llm_secret, which is your OpenAI secret key, or provide external_account_id, the ID of an already created external API key.
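Example code block (assuming vantage_client is an already initialized Vantage client):
# Option 1: authenticate with the ID of an existing external API key
openai_collection = OpenAICollection(
    collection_id="my-openai-collection",
    embeddings_dimension=1536,  # output dimension of text-embedding-ada-002
    llm="text-embedding-ada-002",
    external_account_id="YOUR_EXTERNAL_KEY_ID",
)
created_collection = vantage_client.create_collection(
    collection=openai_collection,
)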
# Option 2: authenticate with your OpenAI secret key
openai_collection = OpenAICollection(
    collection_id="my-openai-collection",
    embeddings_dimension=1536,
    llm="text-embedding-ada-002",
    llm_secret="YOUR_OPENAI_SECRET_KEY",
)
created_collection = vantage_client.create_collection(
    collection=openai_collection,
)
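The two OpenAI examples differ only in how they authenticate. If you build collection settings programmatically, a small helper can make sure exactly one authentication option is set before calling OpenAICollection. The helper below is hypothetical (not part of the Vantage SDK) and assumes the two options are mutually exclusive. It only builds a dictionary of the keyword arguments used in the examples, so it runs without the SDK installed. The dimension check covers text-embedding-ada-002, whose embeddings have 1536 dimensions.

```python
from typing import Optional

# Known output dimension for the model used in the examples above.
KNOWN_DIMENSIONS = {"text-embedding-ada-002": 1536}

def openai_collection_kwargs(
    collection_id: str,
    embeddings_dimension: int,
    llm: str,
    llm_secret: Optional[str] = None,
    external_account_id: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build keyword arguments for OpenAICollection.

    Assumes exactly one authentication option must be supplied.
    """
    if (llm_secret is None) == (external_account_id is None):
        raise ValueError("Provide exactly one of llm_secret or external_account_id")
    expected = KNOWN_DIMENSIONS.get(llm)
    if expected is not None and expected != embeddings_dimension:
        raise ValueError(
            f"{llm} produces {expected}-dimensional embeddings, not {embeddings_dimension}"
        )
    kwargs = {
        "collection_id": collection_id,
        "embeddings_dimension": embeddings_dimension,
        "llm": llm,
    }
    # Include only the authentication option that was provided.
    if llm_secret is not None:
        kwargs["llm_secret"] = llm_secret
    else:
        kwargs["external_account_id"] = external_account_id
    return kwargs

print(openai_collection_kwargs(
    "my-openai-collection", 1536, "text-embedding-ada-002",
    external_account_id="YOUR_EXTERNAL_KEY_ID",
))
```

With the SDK installed, the resulting dictionary can be unpacked into the constructor with OpenAICollection(**kwargs).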
2 | HuggingFace Embeddings
Vantage also works with HuggingFace's wide range of pre-trained models, so you can choose the embeddings that best fit your requirements.
Collections using HuggingFace Embeddings
To create a Vantage collection that uses HuggingFace embeddings, create an instance of Vantage's HuggingFaceCollection class with the required fields. Besides the collection_id and embeddings_dimension parameters, you must provide the external_url parameter, which is the URL of your deployed HuggingFace Model Endpoint. To authenticate with HuggingFace, choose one of two options: provide an llm_secret, which is your HuggingFace secret key, or provide external_account_id, the ID of an already created external API key.
external_url parameter
To use a HuggingFace model, you must first deploy it. HuggingFace Inference Endpoints lets you deploy a model with just a few clicks. Once the model is deployed, copy its Endpoint URL and use it as the value of the external_url parameter. The llm_secret should be the secret key of the account from which your HuggingFace model was deployed.
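Example code block (assuming vantage_client is an already initialized Vantage client):
# Option 1: authenticate with the ID of an existing external API key
hf_collection = HuggingFaceCollection(
    collection_id="my-hf-collection",
    embeddings_dimension=123,  # placeholder: set to your model's output dimension
    external_url="HF_ENDPOINT_URL",
    external_account_id="YOUR_EXTERNAL_KEY_ID",
)
created_collection = vantage_client.create_collection(
    collection=hf_collection,
)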
# Option 2: authenticate with your HuggingFace secret key
hf_collection = HuggingFaceCollection(
    collection_id="my-hf-collection",
    embeddings_dimension=123,  # placeholder: set to your model's output dimension
    external_url="HF_ENDPOINT_URL",
    llm_secret="YOUR_HF_SECRET_KEY",
)
created_collection = vantage_client.create_collection(
    collection=hf_collection,
)
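The embeddings_dimension=123 in the HuggingFace examples is a placeholder. It should match the length of the vectors your deployed model returns, and one way to find that length is to send a sample request to your endpoint and measure the result. The sketch below is a hypothetical helper (not part of the Vantage SDK) that assumes the endpoint returns a list with one flat embedding vector per input text. The actual response shape depends on the model and task you deployed, and the sample data here is made up.

```python
def infer_embeddings_dimension(sample_response: list[list[float]]) -> int:
    """Hypothetical helper: derive embeddings_dimension from a sample response.

    Assumes the response is a list with one flat embedding vector per input text.
    """
    if not sample_response:
        raise ValueError("Sample response contains no embeddings")
    dims = {len(vector) for vector in sample_response}
    if len(dims) != 1:
        raise ValueError(f"Inconsistent embedding lengths: {sorted(dims)}")
    return dims.pop()

# Made-up response for two input texts, each embedded as a 4-dimensional vector.
sample = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
print(infer_embeddings_dimension(sample))  # 4
```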
User-Provided Embeddings
3 | Custom Embeddings
Because projects have different needs, Vantage also lets you upload your own custom embeddings. This lets you tailor embeddings to specific data types, domains, or performance requirements.
Note: Each embedding vector must be a unit vector (its length, or L2 norm, must equal 1) for Vantage collection operations to work correctly.
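If your embeddings are not already normalized, you can scale each vector to unit length before uploading it by dividing every component by the vector's L2 norm. A minimal sketch in plain Python:

```python
import math

def normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        # A zero vector has no direction, so it cannot be normalized.
        raise ValueError("Cannot normalize a zero vector")
    return [x / norm for x in vector]

print(normalize([3.0, 4.0]))  # [0.6, 0.8]
```

For unit vectors, cosine similarity reduces to a plain dot product, which is one common reason vector stores expect normalized input.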
Collections using custom user-provided embeddings
In this scenario, create an instance of Vantage's UserProvidedEmbeddingsCollection class with the required fields, which in this case are only collection_id and embeddings_dimension.
Example code block:
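# Assumes vantage_client is an already initialized Vantage client
upe_collection = UserProvidedEmbeddingsCollection(
    collection_id="my-upe-collection",
    embeddings_dimension=123,  # placeholder: set to the length of your embedding vectors
)
created_collection = vantage_client.create_collection(
    collection=upe_collection,
)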
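Before uploading vectors to a UserProvidedEmbeddingsCollection, it can help to check that every vector has exactly embeddings_dimension components and is unit length. The validate_embeddings function below is a hypothetical helper, not part of the Vantage SDK, and its tolerance of 1e-6 is an assumed value for absorbing floating-point rounding, not a documented Vantage threshold.

```python
import math

def validate_embeddings(
    vectors: list[list[float]], embeddings_dimension: int, tolerance: float = 1e-6
) -> list[str]:
    """Hypothetical pre-upload check: return a list of problems (empty if all pass).

    The tolerance is an assumed value, not a documented Vantage setting.
    """
    problems = []
    for i, vector in enumerate(vectors):
        if len(vector) != embeddings_dimension:
            problems.append(
                f"vector {i}: length {len(vector)}, expected {embeddings_dimension}"
            )
            continue
        norm = math.sqrt(sum(x * x for x in vector))
        if abs(norm - 1.0) > tolerance:
            problems.append(f"vector {i}: norm {norm:.4f}, expected 1.0")
    return problems

# Made-up batch: one valid vector, one not normalized, one with the wrong length.
batch = [[0.6, 0.8, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0]]
for problem in validate_embeddings(batch, embeddings_dimension=3):
    print(problem)
```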
Custom Embeddings - Full Tutorial
For a complete guide on using custom embeddings, check out our basic embedding search tutorial. It covers the entire process, from setting up your environment and preparing your data to indexing and querying your collection.