Furniture Search With Images

In this tutorial, you'll revisit familiar territory one more time. As in the basic semantic search tutorial, we're building a search index to enable search across a set of furniture products, this time using both images and text with the help of OpenAI's CLIP model. As before, we will load and search through furniture listings sourced from various sellers on Etsy. There's no need to familiarise yourself with the data; we are using the same dataset as in the basic semantic search tutorial.

Prerequisites

Before you begin, you will need:

  1. Vantage Account:
    Sign up for a Vantage account through our console.
  2. Vantage API Key:
    You can find your Vantage API key on the console by navigating to the API keys tab.
  3. Furniture Sample Data File:
    Download the data file, which contains approximately 5k listings with images and descriptions.

Play On Your Own

To explore this example further, you can run the Jupyter Notebook on your own. Find it in our Vantage Tutorials GitHub repository.

Happy discovering!

Step 1: Environment Setup

As always, we'll start with setting up the environment. In this tutorial, you will need to create a new virtual environment and install the prerequisite libraries. Additionally, you will have to download and install the CLIP model. All the necessary steps are outlined in our notebook, which you can follow to prepare your local environment for the subsequent steps.

Step 2: CLIP Model Preparation

What Is the CLIP Model?

OpenAI's CLIP (Contrastive Language–Image Pre-training) model is a machine learning model trained to relate images to natural language descriptions. It maps images and text into a shared embedding space, so it can recognize a wide range of visual concepts and associate them with textual descriptions. This enables applications such as zero-shot classification, where the model assigns images to categories it was never explicitly trained on, based only on textual descriptions of those categories.
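To make the zero-shot idea concrete, here is a minimal sketch of the scoring step only: given an image embedding and one text embedding per candidate label, pick the label with the highest cosine similarity. The three-dimensional vectors and the label strings are made-up stand-ins, not real CLIP output, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, label_embeddings):
    """Return the label whose text embedding is closest to the image embedding."""
    return max(
        label_embeddings,
        key=lambda label: cosine_similarity(image_embedding, label_embeddings[label]),
    )


# Toy vectors standing in for real CLIP embeddings (assumption: illustration only).
label_embeddings = {
    "a photo of a chair": [0.9, 0.1, 0.0],
    "a photo of a table": [0.1, 0.9, 0.0],
    "a photo of a lamp": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]

best_label = zero_shot_classify(image_embedding, label_embeddings)
print(best_label)
```

With a real model, the label embeddings would come from the text encoder and the image embedding from the image encoder; the comparison step stays the same.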

Notebook Info:

In the notebook, we outline the steps to instantiate the model and preprocess objects. The model generates embeddings from both tokenized text and preprocessed image data. Meanwhile, preprocess is a callable that prepares images for the CLIP model before it creates embeddings.

Step 3: Data Preparation

In the data preparation step, we aim to transform our original furniture data, retaining only the columns that will be used for subsequent uploading and searching. The essential fields for us are id, text (which contains item descriptions), and image_url (referred to as noop_image_url in the dataset), holding the paths to the item images.

Each original record is transformed into two records compatible with Vantage, each containing id, text, and embeddings fields. The difference between them lies in the source of their embeddings: one record contains embeddings derived from the text field, while the other contains embeddings derived from the image specified in image_url. As a result, the same object is represented twice in the collection, once based on its image and once based on its text.
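The transformation described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: the function name to_vantage_records is hypothetical, the two embedding functions are injected as plain callables (in practice they would wrap the CLIP model), and the "_text" / "_image" ID suffixes are an assumption made here to keep the two records distinguishable; the notebook may handle IDs differently.

```python
def to_vantage_records(item, embed_text, embed_image):
    """Turn one furniture listing into two records: text-based and image-based.

    `embed_text` and `embed_image` are callables returning a list of floats.
    The ID suffixes are an assumption of this sketch, not a Vantage requirement.
    """
    text_record = {
        "id": f"{item['id']}_text",
        "text": item["text"],
        "embeddings": embed_text(item["text"]),
    }
    image_record = {
        "id": f"{item['id']}_image",
        "text": item["text"],
        "embeddings": embed_image(item["noop_image_url"]),
    }
    return [text_record, image_record]


# Stand-in embedders so the sketch runs without CLIP (assumption: toy values).
def fake_embed_text(text):
    return [float(len(text)), 0.0]


def fake_embed_image(image_url):
    return [0.0, float(len(image_url))]


item = {
    "id": 42,
    "text": "Oak coffee table",
    "noop_image_url": "https://example.com/table.jpg",
    "price": 120,  # extra columns are dropped by the transformation
}
records = to_vantage_records(item, fake_embed_text, fake_embed_image)
print(records)
```

Passing the embedders in as arguments keeps the record-shaping logic independent of how the model is loaded.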

Notebook Info:

The notebook provides all the necessary steps for preparing and transforming the data into a format compatible with Vantage. For more details on this format, refer to the Data Ingestion documentation pages.

Step 4: Vantage: Data Upload & Indexing

At this stage, the Vantage vector database plays a crucial role. We will upload our prepared data into a Vantage collection, which must first be created. This can be done programmatically using our Python SDK, although the Console UI offers an alternative method. For guidance on using the Console UI, refer to our Create Collections guide.

Notebook Info:

If you're following our notebook, you'll see that data upload is done through our Python SDK. We provide various methods for data upload, but we use the JSONL format in this use case. You can find more information about this approach on the Management API documentation page. If you wish, you can upload your data using the Console UI as well, but make sure that the provided data is correctly formatted in the Parquet format as outlined in the Vantage Parquet Format documentation.
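As a rough illustration of the JSONL format mentioned above, the sketch below writes prepared records to a file with one JSON object per line and reads them back. It uses only the Python standard library and deliberately does not show the SDK upload call; the helper names, file name, and sample records are hypothetical.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write each record as a single JSON object on its own line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Toy records with the id, text, and embeddings fields (assumption: sample values).
records = [
    {"id": "42_text", "text": "Oak coffee table", "embeddings": [0.1, 0.2, 0.3]},
    {"id": "42_image", "text": "Oak coffee table", "embeddings": [0.4, 0.5, 0.6]},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "furniture.jsonl")
    write_jsonl(records, path)
    loaded = read_jsonl(path)

print(len(loaded))
```

Reading the file back before uploading is a cheap way to confirm that every line parses as valid JSON.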

Step 5: Vantage: Search

Finally, we arrive at the core component of this tutorial: search. In this section, we will conduct various types of searches to demonstrate how you can utilize both images and text for embedding search, as well as More-Like-This search if you wish to retrieve results based on specific documents from the collection.

Note that this example uses a UPE (User-Provided Embeddings) collection. Therefore, semantic search is not applicable here.

5.1 Query Preparation

To perform searches within our collection, we need to prepare our queries properly. This involves creating embedding vectors from our images and text for the embedding search. Additionally, we will extract two sample document IDs to conduct a More-Like-This search, which finds documents similar to the ones with the provided IDs. Detailed steps on how to accomplish this are described in the notebook.

5.2 Search

In this example, we will explore two of the four types of search that Vantage offers: Embedding search and More-Like-This search.

For the embedding search, we will use image and text query embeddings prepared in the previous step, 5.1 Query Preparation. Performing searches in Vantage is straightforward once your data is uploaded. Simply provide an appropriate query, based on the search type, and the ID of the collection you wish to search.

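# ...

# Embedding search: query with an image or text embedding
search_results = vantage_client.embedding_search(
    embedding=query_embedding,
    collection_id=COLLECTION_ID,
)

# ...

# ...

# More-Like-This search: query with the ID of a document in the collection
search_results = vantage_client.more_like_this_search(
    document_id=query_document_id,
    collection_id=COLLECTION_ID,
)

# ...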

Again, you can visit our notebook to see these searches in action.

5.3 Search Results Analysis

Each search returns a Vantage SearchResults object. It lists all the returned records with their IDs and scores, where each score represents how similar that record is to your query.

In the notebook, you will find some helper functions that we use to display each result as an image grid, making it easier to see and compare them. Regardless of how they are displayed, all results adhere to the same schema of the SearchResults object.
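Because each furniture item is stored twice (once with a text embedding and once with an image embedding), the same item can appear twice in one result list. The sketch below shows one possible way to collapse such duplicates. It makes several assumptions: results have already been extracted from the SearchResults object into plain (id, score) pairs, record IDs end in a hypothetical "_text" or "_image" suffix, and a higher score means a closer match. The function name is hypothetical as well.

```python
def best_score_per_item(results):
    """Keep the highest score per underlying item and sort best-first.

    `results` is a list of (record_id, score) pairs. Record IDs are assumed
    to look like "<item_id>_text" or "<item_id>_image"; IDs in any other
    shape would need different parsing.
    """
    best = {}
    for record_id, score in results:
        item_id = record_id.rsplit("_", 1)[0]  # strip the assumed suffix
        if item_id not in best or score > best[item_id]:
            best[item_id] = score
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)


# Made-up pairs standing in for real search output.
results = [
    ("101_text", 0.91),
    ("205_image", 0.88),
    ("101_image", 0.84),
    ("317_text", 0.79),
]
deduplicated = best_score_per_item(results)
print(deduplicated)
```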

When comparing our results, you'll find that both search methods return results similar to your query. Which one to use depends on what you have at hand: an image or a piece of text to embed, or an existing document in the collection to use as a starting point. To view concrete examples and experiment with different queries yourself, please visit our notebook.

Conclusion

If you followed the notebook, you can see that all of our search results returned items that are similar to our queries.

While the results are not identical, due to the slight differences in the methods used for each search, the similarity across items demonstrates the effectiveness of both Embedding search and More-Like-This search. This also highlights the versatility of accepting both text and image inputs for queries.

We hope this tutorial has been informative and helpful to you! For further exploration and more use cases, feel free to check out our other Tutorials or examples.