Search Options

Parameters Common to All Search Endpoints

Vantage supports several parameters that are common to all search endpoints:

  • Collection ID: The only required field, specifying the collection to search within.
  • Accuracy: Defines the accuracy threshold for the search results.
  • Pagination: Controls the pagination settings for navigating through search results.
  • Filter: Allows for narrowing down search results based on specific criteria.
  • Sort: Determines the sorting order of the search results.
  • Weighted Field Values: Applies specific weights to certain fields to influence search relevance.

Required Parameters

Collection Identification (required)

An account can hold many collections, each with different types and compositions of data.

To tell the Vantage platform which collection within your account to search, you must provide collection_id and account_id as part of the endpoint path.

  • account_id: The Vantage account ID that the collection is contained within. This can be found in the Console UI and it is typically your company or organization name.
  • collection_id: The unique identifier of the collection you are searching. You specified this ID when you created the collection. This can be found in the Console UI or by API request.

💻

Python SDK

If you are accessing the Vantage platform through our Python SDK, account_id can be provided during the client initialization process, while collection_id can be provided during the method call.

from vantage_sdk import VantageClient

vantage_instance = VantageClient.using_vantage_api_key(
    vantage_api_key=VANTAGE_API_KEY,
    account_id=ACCOUNT_ID,
)

vantage_instance.semantic_search(
    collection_id="example-collection",
    text="some query text"
)

Additional Optional Parameters

Accuracy

The Vantage platform lets you tune the recall of every search query, controlling how much of your collection data to search over. Generally, a lower accuracy number gives great results with exceptional speed (tens of milliseconds). A higher accuracy number may provide additional or better results, but takes longer to process (one to three seconds).

  • collection.accuracy: A number between 0.001 and 1.000 that tells the Vantage platform how much of the collection to search across. A higher number will search across more of the collection but take longer. If unsure, a good place to start is 0.2.
{
  ...
  "collection": {
    "accuracy" : 0.15
    ...
  }
  ...
}
{
  ...
  "collection": {
    "accuracy" : 0.5
    ...
  }
  ...
}

💻

Python SDK

If you are accessing the Vantage platform through our Python SDK, accuracy can be provided during the method call.

...

vantage_instance.semantic_search(
    collection_id="example-collection",
    text="some query text",
    accuracy=0.15
)

...
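
As a minimal sketch of the range described above, a client could check an accuracy value before sending a request. The helper name validate_accuracy is hypothetical and is not part of the Vantage SDK. Only the 0.001 to 1.000 bounds and the suggested starting value of 0.2 come from this page, and treating both bounds as inclusive is an assumption.

```python
def validate_accuracy(accuracy: float) -> float:
    """Return accuracy unchanged if it lies in [0.001, 1.0], else raise ValueError.

    Hypothetical client-side check; inclusive bounds are an assumption.
    """
    if not 0.001 <= accuracy <= 1.0:
        raise ValueError(
            f"accuracy must be between 0.001 and 1.000, got {accuracy}"
        )
    return accuracy


# Request body fragment using the suggested starting value.
body = {"collection": {"accuracy": validate_accuracy(0.2)}}
print(body)
```

Checking on the client side simply surfaces an out-of-range value before the request is made.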

Sort

To enable sorting of your search results, follow the steps outlined below:

Data Ingestion: When ingesting your data, ensure that the column names intended for sorting have the prefix meta_ordered_. This prefix differentiates sortable columns from other metadata fields, which typically use the prefix meta_. For instance, if you wish to sort by price, name the column meta_ordered_price.

Value Type Restriction: Values provided for the meta_ordered_ columns must be of type float.

Executing a Search: During your search query, refer to the field by its base name without the prefix. For example, use price to sort by the previously defined meta_ordered_price column.
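
The naming convention in these steps can be sketched as a small helper that prepares a record before ingestion. The function names and the record layout (id, text, price) are hypothetical and only for illustration. The meta_ordered_ prefix, the float requirement, and the use of the base name at query time are the parts taken from this page.

```python
SORT_PREFIX = "meta_ordered_"


def to_sortable_record(record: dict, sortable_fields: list) -> dict:
    """Copy record, renaming each sortable field to meta_ordered_<name>
    and converting its value to float."""
    prepared = {k: v for k, v in record.items() if k not in sortable_fields}
    for name in sortable_fields:
        prepared[SORT_PREFIX + name] = float(record[name])
    return prepared


def sort_field_name(column: str) -> str:
    """Return the base name to use in a search request for a sortable column."""
    if column.startswith(SORT_PREFIX):
        return column[len(SORT_PREFIX):]
    return column


# Hypothetical record; the price arrives as a string and is converted to float.
record = {"id": "sku-1", "text": "red shoes", "price": "19.99"}
prepared = to_sortable_record(record, ["price"])
print(prepared)
print(sort_field_name("meta_ordered_price"))  # price
```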

  • field: The name of the field by which search results are sorted. For instance, based on the context provided earlier, price would be the value of field when you want to organize search results according to price values.
  • order: Specifies the direction in which search results are organized. It can be either ascending (asc) to sort from lowest to highest values, or descending (desc) to sort from highest to lowest values. The default sorting order is descending (desc).
  • mode: Indicates the criteria used for sorting search results. Options include field_selection, which organizes results based on the values of the chosen field, and semantic_threshold, which sorts results based on their relevance or similarity to the search query. The default sorting mode is field_selection.
{
  "sort": {
    "field": "price",
    "order": "asc",
    "mode": "field_selection"
  }
}
{
  "sort": {
    "field": "price",
    "order": "desc",
    "mode": "semantic_threshold"
  }
}
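
To illustrate the defaults described for order and mode, the sketch below builds the sort object as a plain dictionary and rejects unknown values. build_sort is a hypothetical helper, not an SDK function. The allowed values and defaults are the ones listed in this section.

```python
VALID_ORDERS = {"asc", "desc"}
VALID_MODES = {"field_selection", "semantic_threshold"}


def build_sort(field: str, order: str = "desc",
               mode: str = "field_selection") -> dict:
    """Build a sort fragment for a search request body.

    Defaults mirror the documented ones: desc and field_selection.
    """
    if order not in VALID_ORDERS:
        raise ValueError(f"order must be one of {sorted(VALID_ORDERS)}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return {"sort": {"field": field, "order": order, "mode": mode}}


print(build_sort("price"))
print(build_sort("price", order="asc"))
```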

💻

Python SDK

If you are accessing the Vantage platform through our Python SDK, sort options can be provided during the method call, using the Sort object.

from vantage_sdk.model.search import Sort

...

sort_options = Sort(
    field="price",
    order="asc",
    mode="field_selection",
)

vantage_instance.semantic_search(
    collection_id="example-collection",
    text="some query text",
    sort=sort_options,
)

Pagination

Pagination lets you control which results you receive within the larger set of results. You can call the endpoint repeatedly to page your results, requesting batches of results up to a total of 1000 results.

  • pagination.page: A number, starting at 0, that indicates the page of results to return, where each page is of size pagination.count.
  • pagination.count: The number of results to return for this request. Must be greater than 0.
  • pagination.threshold: Determines the "pool" of records to match before sorting. Must be lower than 10,000.
{
	...
  "pagination": {
    "page": 0,
    "count": 40
  }
	...
}
{
	...
  "pagination": {
    "page": 1,
    "count": 40
  }
	...
}
{
  ...
  "pagination": {
    "page": 0,
    "count": 40,
    "threshold": 300
  }
  ...
}
{
  ...
  "pagination": {
    "page": 0,
    "count": 40,
    "threshold": 5000
  }
  ...
}
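
As a rough sketch of paging through results, the generator below yields the pagination fragment for each successive page until the 1000-result total mentioned above is covered. page_requests is a hypothetical helper. How the service treats a final page that would extend past the limit is not specified on this page, so the sketch stops at the last page that starts below it.

```python
MAX_TOTAL_RESULTS = 1000  # total cap stated in this section


def page_requests(count: int, max_results: int = MAX_TOTAL_RESULTS):
    """Yield pagination fragments for pages 0, 1, ... that start below max_results."""
    if count <= 0:
        raise ValueError("count must be greater than 0")
    page = 0
    while page * count < max_results:
        yield {"pagination": {"page": page, "count": count}}
        page += 1


pages = list(page_requests(count=40))
print(len(pages))  # 25 pages of 40 results cover 1000 results
print(pages[1])    # {'pagination': {'page': 1, 'count': 40}}
```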

🚧

Result order determinism

The overall search result set for a given query may change for a variety of reasons between requests. While it's very likely that the next page of results will begin on the precise next result from the overall set, it's possible that new content being ingested into the collection may alter the overall result set.
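
Because the result set can shift between requests, a client that stitches pages together may want to guard against the same item appearing twice. The sketch below assumes each result is a dictionary with an id key. That shape is an assumption for illustration, not a documented response format.

```python
def merge_pages(pages):
    """Concatenate pages of results, keeping the first occurrence of each id."""
    seen = set()
    merged = []
    for page in pages:
        for result in page:
            if result["id"] not in seen:
                seen.add(result["id"])
                merged.append(result)
    return merged


page_0 = [{"id": "a"}, {"id": "b"}]
page_1 = [{"id": "b"}, {"id": "c"}]  # "b" shifted onto the next page
print(merge_pages([page_0, page_1]))
```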

💻

Python SDK

If you are accessing the Vantage platform through our Python SDK, pagination options can be provided during the method call, using the Pagination object.

from vantage_sdk.model.search import Pagination

...

pagination_options = Pagination(
    page=0,
    count=40,
    threshold=300,
)

vantage_instance.semantic_search(
    collection_id="example-collection",
    text="some query text",
    pagination=pagination_options,
)

Filtering

Filters let you combine your collection's ingested features or categorical data with semantic similarity search. Filtered searches are generally very fast, and filters are frequently used in traditional faceted search interfaces. For example, in product catalog search, you may only want product results within a single category, brand, size or color.

  • filter.boolean_filter: Either an empty string (no filters) or a boolean clause that will filter the results while the Vantage platform scores for semantic similarity. The string itself is composed of:
    • field:"value": Limits results based on an exact match to a meta_ field provided during ingestion. Both field and value are case sensitive.
    • Combinations of these limits joined with AND and OR.
    • Parentheses ( and ), which nest clauses to build trees of complex filters.
    • NOT in front of a filter, which negates it.
# product_category was ingested as meta_product_category
product_category:"Fashion"
product_BrandName:"Brand XYZ"
(product_category:"Fashion" AND product_BrandName:"Brand XYZ")
(product_category:"Fashion" OR product_category:"Clothing")
NOT content_rating:"TV-14"
(
  (product_category:"Fashion" OR product_category:"Clothing")
  AND 
  product_BrandName:"Brand XYZ"
)
  • boolean_filter is sent in JSON, so a filter typically has the quotes (") escaped in the JSON request. Most JSON libraries do this automatically on your behalf when you create JSON from an object or string containing quotes.
{
  "filter": {
    "boolean_filter": "((product_category:\"Fashion\" OR product_category:\"Clothing\") AND product_BrandName:\"Brand XYZ\")"
  }
}
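
The automatic escaping can be seen with Python's standard json module. This sketch only demonstrates serialization of the filter fragment. It does not send a request, and the rest of the request body is omitted.

```python
import json

# The filter as you would write it by hand, with plain double quotes.
boolean_filter = '(product_category:"Fashion" AND product_BrandName:"Brand XYZ")'

# json.dumps escapes the inner quotes when producing the JSON text.
body = json.dumps({"filter": {"boolean_filter": boolean_filter}})
print(body)

# Parsing the JSON text recovers the original, unescaped filter string.
assert json.loads(body)["filter"]["boolean_filter"] == boolean_filter
```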

💻

Python SDK

If you are accessing the Vantage platform through our Python SDK, filter options can be provided during the method call, using the Filter object.

from vantage_sdk.model.search import Filter

...

filter_options = Filter(
    boolean_filter='(product_category:"Fashion" AND product_BrandName:"Brand XYZ")',
)

vantage_instance.semantic_search(
    collection_id="example-collection",
    text="some query text",
    filter=filter_options,
)
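
If you build filter strings programmatically, a few small helpers can keep the parentheses balanced. The function names below are hypothetical and are not part of the Vantage SDK. The sketch assumes values do not themselves contain double quotes.

```python
def clause(field: str, value: str) -> str:
    """Return a single field:"value" limit."""
    return f'{field}:"{value}"'


def all_of(*parts: str) -> str:
    """Join clauses with AND inside parentheses."""
    return "(" + " AND ".join(parts) + ")"


def any_of(*parts: str) -> str:
    """Join clauses with OR inside parentheses."""
    return "(" + " OR ".join(parts) + ")"


def negate(part: str) -> str:
    """Prefix a clause with NOT."""
    return f"NOT {part}"


expr = all_of(
    any_of(
        clause("product_category", "Fashion"),
        clause("product_category", "Clothing"),
    ),
    clause("product_BrandName", "Brand XYZ"),
)
print(expr)
print(negate(clause("content_rating", "TV-14")))
```

The resulting string can be used wherever a boolean_filter value is expected.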

Request ID

To enable asynchronous calls to the search endpoints, an identifier is included in the request which is then returned with the results.

  • request_id: An integer that will be returned with the results. It should be unique across all in-progress calls to any search endpoint.
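
One way to keep request_id unique across in-progress calls is a simple counter plus a lookup table, sketched below. The register and resolve helpers are hypothetical, and the response is assumed to be a dictionary that carries the request_id back. Only the fact that the ID is an integer returned with the results comes from this page.

```python
import itertools

_next_id = itertools.count(1)  # yields 1, 2, 3, ...
in_flight = {}                 # request_id -> query text awaiting a response


def register(query_text: str) -> int:
    """Assign a fresh integer request_id and remember the pending query."""
    request_id = next(_next_id)
    in_flight[request_id] = query_text
    return request_id


def resolve(response: dict) -> str:
    """Match a response to its query and mark the request as finished."""
    return in_flight.pop(response["request_id"])


first = register("red shoes")
second = register("blue jacket")

# Responses may arrive in any order; the ID ties each one to its query.
print(resolve({"request_id": second}))  # blue jacket
print(resolve({"request_id": first}))   # red shoes
```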

Field Value Weighting

Keyword Support

If you are using Vantage Managed embeddings, the text field is processed during ingestion to support a straightforward keyword boosting method for search. The following two fields use the extracted tokens to boost the core semantic matching score when keywords match directly. This is useful if you want to add a bit of keyword help to the existing semantic search, so that direct and long-tail phrases from your users are well represented in the initial results.

  • field_value_weighting.query_key_word_max_overall_weight: A number that sets the largest adjustment to the score that matched keywords or phrases can produce. A value of 1 is neutral: the semantic score is unaffected regardless of how many keywords match. A value between 0 and 1 reduces the score when keywords match. A value between 1 and 2 increases the score based on the number of matching words and phrases, up to the maximum.
  • field_value_weighting.query_key_word_weighting_mode: Instructs Vantage how to weight keywords. none indicates that no keyword matching will be part of the query. uniform treats all words and phrases (after stemming) in the query input as equal in weight, applying consistent score additions for any keyword matches. weighted uses embeddings to match words and phrases (after stemming) to the query input, and lets the embedding distances determine the relative weights applied to any matches.
{
  "field_value_weighting": {
      "query_key_word_weighting_mode": "uniform",
      "query_key_word_max_overall_weight": 1.05
  }
}
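
A small builder can catch typos in the mode name or an out-of-range weight before a request is sent. build_keyword_weighting is a hypothetical helper. The three mode names come from this section, while the inclusive 0 to 2 range is inferred from the 0-1 and 1-2 descriptions above and is an assumption.

```python
KEYWORD_MODES = ("none", "uniform", "weighted")


def build_keyword_weighting(mode: str, max_overall_weight: float) -> dict:
    """Build the keyword part of field_value_weighting as a plain dict."""
    if mode not in KEYWORD_MODES:
        raise ValueError(f"mode must be one of {KEYWORD_MODES}")
    # Assumed inclusive bounds: 1 is neutral, below 1 reduces, above 1 boosts.
    if not 0 <= max_overall_weight <= 2:
        raise ValueError("max_overall_weight must be between 0 and 2")
    return {
        "field_value_weighting": {
            "query_key_word_weighting_mode": mode,
            "query_key_word_max_overall_weight": max_overall_weight,
        }
    }


print(build_keyword_weighting("uniform", 1.05))
```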

Field Value Boosting

There are many use cases where items of a particular category, brand, color, or other defined attribute (in meta_ fields) should be boosted (or reduced) slightly to improve the overall results. For example, a semantic search for red shoes will generally return good results for shoes, for red things, and for both, but without the context of the corpus the semantic scores will often favor these general ideas rather than the specific items in your collection. You can boost field values based on the context in which you are calling the search (for example, a brand-specific landing page on your site), or by parsing the search query itself for values. Vantage takes a set of fields, values, and weights and, where they match exactly, adjusts the scores for those items accordingly. If the field values don't match, no adjustments to scores occur.

  • field_value_weighting.weighted_field_values: An array of objects that instruct Vantage to boost the scores for the specified fields, values, and weights. A weight of 1 is neutral; values between 0 and 1 reduce scores and values between 1 and 2 increase scores for items that match.
{
  "field_value_weighting": {
    "weighted_field_values": [
      {
          "field": "category", "value": "shoes", "weight": 1.03
      },
      {
          "field": "color", "value": "red", "weight": 1.03
      },
      {
          "field": "style", "value": "bogus", "weight": 1.03
      }
    ]
  }
}

The score of any document with category:shoes will be multiplied by 1.03. The same applies to items that are color:red. The style:bogus entry above, which by construction never matches any item, is ignored, and no adjustments are made for unmatched field values.
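
The arithmetic can be illustrated with a short sketch. This is not Vantage's implementation. It only mirrors the multiplication described above, and it assumes that when several weighted values match the same item, their weights are multiplied together. The item layout and starting score are made up for the example.

```python
weighted_field_values = [
    {"field": "category", "value": "shoes", "weight": 1.03},
    {"field": "color", "value": "red", "weight": 1.03},
    {"field": "style", "value": "bogus", "weight": 1.03},
]


def apply_boosts(score: float, item_fields: dict, weights: list) -> float:
    """Multiply score by the weight of every exactly matching field value."""
    for entry in weights:
        if item_fields.get(entry["field"]) == entry["value"]:
            score *= entry["weight"]
    return score


# Hypothetical items with a starting semantic score of 0.8.
red_shoe = {"category": "shoes", "color": "red", "style": "sneaker"}
blue_hat = {"category": "hats", "color": "blue", "style": "cap"}

boosted = apply_boosts(0.8, red_shoe, weighted_field_values)
unchanged = apply_boosts(0.8, blue_hat, weighted_field_values)
print(boosted > 0.8)     # True: two values matched, so the score rose
print(unchanged == 0.8)  # True: nothing matched, so the score is untouched
```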


📘

Reference Guide for Search API