Search Options

Parameters Common to All Search Endpoints

Vantage supports several parameters that are common to all search endpoints:

  • Collection ID: The only required field, specifying the collection to search within.
  • Accuracy: Defines the accuracy threshold for the search results.
  • Pagination: Controls the pagination settings for navigating through search results.
  • Filter: Allows for narrowing down search results based on specific criteria.
  • Sort: Determines the sorting order of the search results.
  • Weighted Field Values: Applies specific weights to certain fields to influence search relevance.

Required Parameters

Collection Identification (required)

You can have many collections, each with different types and compositions of data.

To tell the Vantage platform which collection within your account to search, provide account_id and collection_id as part of the endpoint path.

  • account_id: The Vantage account ID that the collection is contained within. This can be found in the Console UI and it is typically your company or organization name.
  • collection_id: The unique identifier of the collection you are searching. You specified this ID when you created the collection. This can be found in the Console UI or by API request.

💻

SDK Usage

If you are accessing the Vantage platform through one of our SDKs, Account ID can be provided during the client initialization process, while Collection ID can be provided during the method call.

from vantage_sdk import VantageClient

vantage_instance = VantageClient.using_vantage_api_key(
    vantage_api_key=VANTAGE_API_KEY,
    account_id=ACCOUNT_ID,
)

vantage_instance.semantic_search(
    collection_id="example-collection",
    text="some query text"
)
import { VantageClientConfiguration, VantageClient } from "@vantage-sdk";

const configuration: VantageClientConfiguration = {
    vantageApiKey: vantageApiKey,
    accountId: accountId,
}
let client = new VantageClient(configuration)

const collectionId = "example-collection"
const queryText = "Example query"

const searchResults = client.semanticSearch(collectionId, queryText);
import com.vantagediscovery.sdk.VantageClient;

public static void main(String[] args) {
    final VantageClient client = VantageClient.usingVantageApiKey()
        .withAccountId(ACCOUNT_ID)
        .withVantageApiKey(VANTAGE_API_KEY)
        .build();
}

Additional Optional Parameters

Accuracy

The Vantage platform lets you tune the recall of every search query by controlling how much of your collection data is searched. Generally, a lower accuracy number gives great results with exceptional speed (tens of milliseconds). A higher accuracy number may provide additional or better results but takes longer to process (one to three seconds).

  • collection.accuracy: A number between 0.001 and 1.000 that tells the Vantage platform how much of the collection to search across. A higher number will search across more of the collection but take longer. If unsure, a good place to start is 0.2.
{
  ...
  "collection": {
    "accuracy" : 0.2
    ...
  }
  ...
}
{
  ...
  "collection": {
    "accuracy" : 0.5
    ...
  }
  ...
}
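
The accuracy bounds described above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of any Vantage SDK: the helper name accuracy_body is hypothetical, and only the 0.001 to 1.000 range, the suggested 0.2 starting point, and the collection.accuracy field come from this page.

```python
import json

MIN_ACCURACY = 0.001  # lower bound stated in the parameter description
MAX_ACCURACY = 1.0    # upper bound stated in the parameter description


def accuracy_body(accuracy: float = 0.2) -> dict:
    """Return the `collection` fragment of a search request body.

    Hypothetical helper for illustration only. The default of 0.2 is the
    suggested starting point from the documentation.
    """
    if not MIN_ACCURACY <= accuracy <= MAX_ACCURACY:
        raise ValueError(
            f"accuracy must be between {MIN_ACCURACY} and {MAX_ACCURACY}, got {accuracy}"
        )
    return {"collection": {"accuracy": accuracy}}


# Serialize the fragment so it can be merged into a larger request body.
print(json.dumps(accuracy_body(0.2)))
```

Rejecting out-of-range values locally avoids a round trip for a request that could not be valid under the documented range.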

💻

SDK Usage

If you are accessing the Vantage platform through one of our SDKs, accuracy can be provided during the method call.

...

vantage_instance.semantic_search(
    collection_id="example-collection",
    text="some query text",
    accuracy=0.15
)

...
...

const collectionId = "example-collection"
const queryText = "Example query"
const accuracy = 0.15

const searchResults = client.semanticSearch(collectionId, queryText, accuracy);

...

final String collectionId = "example-collection";

final SearchResult result = client.search()
    .collection("example-collection")
    .semantic()
    .withSearchProperties(
        CommonSearchProperties
           .builder()
           .withAccuracy(BigDecimal.valueOf(0.15))
           .build()
    )
    .withSearchText("Example query")
    .execute();

vantage search-semantic --vantage-api-key 'API_KEY' --text "Example query" --accuracy 0.15 example-collection

Sort

To enable sorting of your search results, follow the steps outlined below:

Data Ingestion: When ingesting your data, ensure that the column names intended for sorting have the prefix meta_ordered_. This prefix differentiates sortable columns from other metadata fields, which typically use the prefix meta_. For instance, if you wish to sort by price, name the column meta_ordered_price.

Value Type Restriction: Values provided for the meta_ordered_ columns must be of type float.

Executing a Search: During your search query, refer to the field by its base name without the prefix. For example, use price to sort by the previously defined meta_ordered_price column.

  • field: The name of the field by which search results are sorted. In the example above, price is the value to use for field when you want to order results by price.
  • order: The direction in which search results are sorted: ascending (asc), from lowest to highest values, or descending (desc), from highest to lowest values. The default is desc.
  • mode: The criterion used for sorting. Options are field_selection, which orders results by the values of field, and semantic_threshold, which orders results by their relevance or similarity to the search query. The default is field_selection.
{
  "sort": {
    "field": "price",
    "order": "asc",
    "mode": "field_selection"
  }
}
{
  "sort": {
    "field": "price",
    "order": "desc",
    "mode": "semantic_threshold"
  }
}
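
The naming rule above (ingest as meta_ordered_price, query as price) is easy to get wrong, so it may help to derive the query field from the ingested column name. This is a rough sketch under that assumption; sort_body is a hypothetical helper, not an SDK function, and the allowed order and mode values and their defaults are the ones listed above.

```python
ORDERED_PREFIX = "meta_ordered_"
VALID_ORDERS = {"asc", "desc"}
VALID_MODES = {"field_selection", "semantic_threshold"}


def sort_body(column: str, order: str = "desc", mode: str = "field_selection") -> dict:
    """Build the `sort` fragment from an ingested column name.

    Hypothetical helper for illustration only. Defaults mirror the
    documented defaults (desc, field_selection).
    """
    if not column.startswith(ORDERED_PREFIX):
        raise ValueError(f"sortable columns must start with {ORDERED_PREFIX!r}")
    if order not in VALID_ORDERS:
        raise ValueError(f"order must be one of {sorted(VALID_ORDERS)}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    # The search request refers to the column by its base name, without the prefix.
    field = column[len(ORDERED_PREFIX):]
    return {"sort": {"field": field, "order": order, "mode": mode}}


print(sort_body("meta_ordered_price", order="asc"))
```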

💻

SDK Usage

If you are accessing the Vantage platform through one of our SDKs, sort options can be provided during the method call, using the Sort object.

from vantage_sdk.model.search import Sort

...

sort_options = Sort(
    field="price",
    order="asc",
    mode="field_selection",
)

vantage_instance.semantic_search(
    collection_id="example-collection",
    text="some query text",
    sort=sort_options,
)
...

const collectionId = "example-collection"
const queryText = "Example query"

const sortOptions = new Sort(
    "price", // field
    "asc", // order
    "field_selection" // mode
)

client.semanticSearch(
    collectionId,
    queryText, 
    undefined, // accuracy
    undefined, // pagination
    undefined, // filter
    sortOptions
)

...

final String collectionId = "example-collection";

final SearchResult result = client.search()
    .collection("example-collection")
    .semantic()
    .withSearchProperties(
        CommonSearchProperties
           .builder()
           .withAccuracy(BigDecimal.valueOf(0.15))
           .withSort(new Sort("price", Sort.SortOrderType.ASC, Sort.SortModeType.FIELD_SELECTION))
           .build()
    )
    .withSearchText("Example query")
    .execute();

vantage search-semantic --vantage-api-key API_KEY --text "Example query" --accuracy 0.15 --sort-field price --sort-order asc --sort-mode field_selection example-collection

Pagination

Pagination lets you control which results you receive within the larger set of results. You can call the endpoint repeatedly to page your results, requesting batches of results up to a total of 1,000 results.

  • pagination.page: A number, starting at 0, that indicates the page of results to return, where each page is of size pagination.count.
  • pagination.count: The number of results to return for this request. Must be greater than 0.
  • pagination.threshold: Determines the "pool" of records to match before sorting. Must be lower than 10,000.
{
  ...
  "pagination": {
    "page": 0,
    "count": 40
  }
  ...
}
{
  ...
  "pagination": {
    "page": 1,
    "count": 40
  }
  ...
}
{
  ...
  "pagination": {
    "page": 0,
    "count": 40,
    "threshold": 300
  }
  ...
}
{
  ...
  "pagination": {
    "page": 0,
    "count": 40,
    "threshold": 5000
  }
  ...
}
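
Paging through a result set amounts to sending the same query with an increasing page value. The sketch below generates the pagination fragment for each page needed to cover a desired number of results. The generator name pagination_bodies is hypothetical; the limits it enforces (count greater than 0, threshold lower than 10,000, at most 1,000 results in total) are the ones stated above.

```python
from typing import Iterator, Optional

MAX_TOTAL_RESULTS = 1000  # documented cap on the total number of paged results
MAX_THRESHOLD = 10_000    # threshold must be lower than this value


def pagination_bodies(
    total: int, count: int, threshold: Optional[int] = None
) -> Iterator[dict]:
    """Yield one `pagination` fragment per page needed to cover `total` results.

    Hypothetical helper for illustration only. Validation runs when the
    generator is first advanced.
    """
    if count <= 0:
        raise ValueError("count must be greater than 0")
    if threshold is not None and threshold >= MAX_THRESHOLD:
        raise ValueError(f"threshold must be lower than {MAX_THRESHOLD}")
    total = min(total, MAX_TOTAL_RESULTS)
    pages = -(-total // count)  # ceiling division
    for page in range(pages):  # pages start at 0
        body = {"page": page, "count": count}
        if threshold is not None:
            body["threshold"] = threshold
        yield {"pagination": body}


for fragment in pagination_bodies(total=100, count=40):
    print(fragment)
```

Each yielded fragment would be merged into an otherwise identical search request. The last page may return fewer than count results.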

🚧

Result order determinism

The overall search result set for a given query may change for a variety of reasons between requests. While it's very likely that the next page of results will begin on the precise next result from the overall set, it's possible that new content being ingested into the collection may alter the overall result set.

💻

SDK Usage

If you are accessing the Vantage platform through one of our SDKs, pagination options can be provided during the method call, using the Pagination object.

from vantage_sdk.model.search import Pagination

...

pagination_options = Pagination(
    page=0,
    count=40,
    threshold=300,
)

vantage_instance.semantic_search(
    collection_id="example-collection",
    text="some query text",
    pagination=pagination_options,
)
...

const collectionId = "example-collection"
const queryText = "Example query"

const paginationOptions = new Pagination(
    0, // page
    40, // count
    300 // threshold
)

client.semanticSearch(
    collectionId,
    queryText, 
    undefined, // accuracy
    paginationOptions,
)

...

final String collectionId = "example-collection";

final SearchResult result = client.search()
    .collection("example-collection")
    .semantic()
    .withSearchProperties(
        CommonSearchProperties
           .builder()
           .withAccuracy(BigDecimal.valueOf(0.15))
           .withSort(new Sort("price", Sort.SortOrderType.ASC, Sort.SortModeType.FIELD_SELECTION))
           .withPagination(
               Pagination.builder()
                   .withPage(0)
                   .withCount(40)
                   .withThreshold(300)
                   .build()
           )
           .build()
    )
    .withSearchText("Example query")
    .execute();

vantage search-semantic --vantage-api-key API_KEY --text "Example query" --accuracy 0.15 --page 0 --items-per-page 40 --pagination-threshold 300 example-collection

Filtering

Filters enable your collection's ingested features or categorical data to be used in conjunction with semantic similarity search. Using filters generally results in lightning quick results. They are frequently used in traditional faceted search interfaces. For example, in product catalog search, you may only want product results within a single category, brand, size or color.

  • filter.boolean_filter
    Either an empty string (no filters) or a boolean clause that filters the results while the Vantage platform scores for semantic similarity. The string consists of:

    • field:"value": Limits results based on exact matching against a meta_ field provided during ingestion. Both field and value are case-sensitive.
    • Combinations of these limits joined with AND and OR.
    • Trees of complex filters, composed by grouping clauses with parentheses ( and ).
    • Negated clauses, formed by adding NOT in front of the filter.
  • filter.variant_filter
    Either an empty string (no filters) or a boolean clause that filters the results while the Vantage platform scores for semantic similarity. The string consists of:

    • field:"value": Limits results based on exact matching against a field inside the variants list of objects provided during ingestion. Both field and value are case-sensitive.
    • Combinations of these limits joined with AND and OR.
    • Trees of complex filters, composed by grouping clauses with parentheses ( and ).
    • Negated clauses, formed by adding NOT in front of the filter.
# product_category was ingested as meta_product_category
product_category:"Fashion"
product_BrandName:"Brand XYZ"
(product_category:"Fashion" AND product_BrandName:"Brand XYZ")
(product_category:"Fashion" OR product_category:"Clothing")
NOT content_rating:"TV-14"
(
  (product_category:"Fashion" OR product_category:"Clothing")
  AND 
  product_BrandName:"Brand XYZ"
)
  • Both boolean_filter and variant_filter are sent as JSON strings, so the double quotes (") inside a filter must be escaped in the JSON request. Most JSON libraries do this automatically when you serialize an object whose string values contain quotes.
{
  "filter": {
    "boolean_filter": "((product_category:\"Fashion\" OR product_category:\"Clothing\") AND product_BrandName:\"Brand XYZ\")"
  }
}
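
Filter strings can also be assembled programmatically instead of written by hand. The following sketch composes the clause syntax described above and lets the standard json module handle quote escaping. The helper names term, all_of, any_of, and negate are hypothetical and not part of any Vantage SDK, and the sketch does not handle values that themselves contain double quotes.

```python
import json


def term(field: str, value: str) -> str:
    """Return an exact-match clause such as product_category:"Fashion"."""
    return f'{field}:"{value}"'


def all_of(*clauses: str) -> str:
    """Join clauses with AND inside parentheses."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses: str) -> str:
    """Join clauses with OR inside parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def negate(clause: str) -> str:
    """Reverse a clause by prefixing it with NOT."""
    return f"NOT {clause}"


boolean_filter = all_of(
    any_of(term("product_category", "Fashion"), term("product_category", "Clothing")),
    term("product_BrandName", "Brand XYZ"),
)

# json.dumps escapes the inner double quotes automatically.
request_fragment = json.dumps({"filter": {"boolean_filter": boolean_filter}})
print(request_fragment)
```

Building the string from small functions keeps the parentheses balanced and leaves escaping to the JSON serializer.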

💻

SDK Usage

If you are accessing the Vantage platform through one of our SDKs, filter options can be provided during the method call, using the Filter object.

from vantage_sdk.model.search import Filter

...

filter_options = Filter(
    boolean_filter='(product_category:"Fashion" AND product_BrandName:"Brand XYZ")',
    variant_filter='(color:"Black" OR color:"Brown")',
)

vantage_instance.semantic_search(
    collection_id="example-collection",
    text="some query text",
    filter=filter_options,
)
...

const collectionId = "example-collection"
const queryText = "Example query"

const filterOptions = new Filter(
    '(product_category:"Fashion" AND product_BrandName:"Brand XYZ")', // boolean filter
    '(color:"Black" OR color:"Brown")' // variant filter
)

client.semanticSearch(
    collectionId,
    queryText, 
    undefined, // accuracy
    undefined, // pagination
    filterOptions
)

...

final String collectionId = "example-collection";

final SearchResult result = client.search()
    .collection("example-collection")
    .semantic()
    .withSearchProperties(
        CommonSearchProperties
           .builder()
           .withFilter(
               Filter.builder()
                   .withBooleanFilter("(product_category:\"Fashion\" AND product_BrandName:\"Brand XYZ\")")
                   .withVariantFilter("(color:\"Black\" OR color:\"Brown\")")
                   .build()
           )
           .build()
    )
    .withSearchText("Example query")
    .execute();

vantage search-semantic --vantage-api-key API_KEY --text "Example query" --boolean-filter '(product_category:"Fashion" AND product_BrandName:"Brand XYZ")' --variant-filter '(color:"Black" OR color:"Brown")' example-collection

Request ID

To enable asynchronous calls to the search endpoints, an identifier is included in the request which is then returned with the results.

  • request_id: An integer that will be returned with the results. It should be unique across all in-progress calls to any search endpoint.
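
One simple way to keep request_id unique across in-progress calls is a process-wide counter plus a table of pending requests. The sketch below illustrates that idea only. The names register and complete are hypothetical, and how the identifier is read back from a response is left to the caller, since the response shape is not described here.

```python
import itertools
from typing import Dict

_next_id = itertools.count(1)      # monotonically increasing integer IDs
in_flight: Dict[int, str] = {}     # request_id -> query text awaiting results


def register(query_text: str) -> int:
    """Reserve a unique request_id for a search call that is about to be sent."""
    request_id = next(_next_id)
    in_flight[request_id] = query_text
    return request_id


def complete(request_id: int) -> str:
    """Match a returned request_id to its original query and release it."""
    return in_flight.pop(request_id)


first = register("red shoes")
second = register("blue hats")
# Results may arrive in any order; the ID identifies which query they belong to.
print(complete(second), "|", complete(first))
```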

Field Value Weighting

Keyword Support

If you are using Vantage Managed embeddings, the text field is processed during ingestion to support a straightforward keyword boosting method for search. The following two fields use the extracted tokens to boost direct keyword matches on top of the core semantic matching score. This is useful when you want to add a bit of keyword help to the existing semantic search so that direct and long-tail phrases from your users are well represented in the initial results.

  • field_value_weighting.query_key_word_max_overall_weight: A number that represents the largest change in score that keyword or phrase matches can produce. A value of 1 is neutral: regardless of how many keywords match, the semantic score is not affected. A value between 0 and 1 reduces the score when there are keyword matches. A value between 1 and 2 increases the score based on the number of words and phrases matched, up to the maximum.
  • field_value_weighting.query_key_word_weighting_mode: A field that instructs Vantage how to weight keywords. none indicates that no keyword matching will be part of the query. uniform treats all words and phrases (after stemming) in the query input as equal in weight, using consistent score additions for any keyword matches. weighted uses embeddings to match words and phrases (after stemming) to the query input and lets the embedding distances determine the relative weights to apply to any matches.
{
  "field_value_weighting": {
      "query_key_word_weighting_mode": "uniform",
      "query_key_word_max_overall_weight": 1.05
  }
}
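
The two keyword fields can be validated together before they are added to a request. This is a minimal sketch with a hypothetical helper name, keyword_weighting_body. The three mode values and the 0 to 2 weight range are taken from the descriptions above, and the defaults chosen here (uniform, 1.0) are an assumption made for illustration.

```python
KEYWORD_MODES = {"none", "uniform", "weighted"}


def keyword_weighting_body(mode: str = "uniform", max_overall_weight: float = 1.0) -> dict:
    """Build the keyword part of `field_value_weighting`.

    Hypothetical helper for illustration only. A weight of 1 is neutral,
    values below 1 reduce scores, values above 1 increase them.
    """
    if mode not in KEYWORD_MODES:
        raise ValueError(f"mode must be one of {sorted(KEYWORD_MODES)}")
    if not 0 <= max_overall_weight <= 2:
        raise ValueError("max_overall_weight must be between 0 and 2")
    return {
        "field_value_weighting": {
            "query_key_word_weighting_mode": mode,
            "query_key_word_max_overall_weight": max_overall_weight,
        }
    }


print(keyword_weighting_body("uniform", 1.05))
```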

Field Value Boosting

There are many use cases where items in a particular category, brand, color, or other defined attribute (in meta_ fields) should be boosted (or reduced) slightly to improve the overall results. For example, a search for red shoes will generally receive good semantic results for shoes and/or red things. However, without the context of the corpus, semantic scores often favor these ideas in general rather than the specific items in your collection. You can boost field values based on the context in which you are calling the search (for example, a brand-specific landing page on your site) or by parsing the search text itself for values. Vantage takes a set of fields, values, and weights and, where an item matches a field value exactly, adjusts that item's score accordingly. If the field values don't match, no score adjustment occurs.

  • field_value_weighting.weighted_field_values: An array of objects that instructs Vantage to adjust the scores of items matching the specified fields, values, and weights. A weight of 1 is neutral; a weight between 0 and 1 reduces scores, and a weight between 1 and 2 increases scores for items that match.
{
  "field_value_weighting": {
    "weighted_field_values": [
      {
          "field": "category", "value": "shoes", "weight": 1.03
      },
      {
          "field": "color", "value": "red", "weight": 1.03
      },
      {
          "field": "style", "value": "bogus", "weight": 1.03
      }
    ]
  }
}

The score of any document with category:shoes is multiplied by 1.03, and the same applies to items with color:red. The style entry above uses the value bogus, which no item has, so it is ignored; field values that do not match cause no adjustment.
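
To make the arithmetic concrete, the sketch below applies the same three weights to locally held scores. It only illustrates the multiplication described above and is not how the platform is implemented. The function apply_weights is hypothetical, and compounding the weights when an item matches more than one entry is an assumption, since the combined behavior is not specified here.

```python
from typing import Dict, List

weighted_field_values = [
    {"field": "category", "value": "shoes", "weight": 1.03},
    {"field": "color", "value": "red", "weight": 1.03},
    {"field": "style", "value": "bogus", "weight": 1.03},
]


def apply_weights(score: float, item: Dict[str, str], rules: List[dict]) -> float:
    """Multiply `score` by the weight of every rule the item matches exactly.

    Hypothetical illustration. Compounding multiple matches is an assumption.
    """
    for rule in rules:
        if item.get(rule["field"]) == rule["value"]:
            score *= rule["weight"]
    return score


red_shoes = {"category": "shoes", "color": "red", "style": "casual"}
blue_hat = {"category": "hats", "color": "blue", "style": "casual"}

print(apply_weights(0.8, red_shoes, weighted_field_values))  # boosted twice
print(apply_weights(0.8, blue_hat, weighted_field_values))   # unchanged
```

The bogus style rule never matches either item, so it leaves both scores untouched.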

Facets

Facets are like filters that allow users to drill down into specific attributes of the data. For example, if you're searching for clothing items in an online store, you might use facets like color and size to narrow down the results to just red shirts in medium size. Facets provide a structured way to explore data by enabling easy filtering on object attributes.

In our case, the API will return the count for each facet value provided, rather than the specific objects themselves. You can retrieve the objects by using boolean_filter to filter on different facet values.

  • facets: An array of objects containing name, type, and values fields. The name is the facet's name (upserted during ingestion as meta_<name>). The type is an enum that defines whether to count concrete values (count) or a range of values (range). Currently, only the count type is available. The values field is an array of the values for which you want to receive a count. If values is an empty list, the API returns counts for all possible values.
{
  "facets": [
    {
        "name": "color", "type": "count", "values": []
    },
    {
        "name": "size", "type": "count", "values": ["sm", "md"]
    }
  ]
}
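
Because facets return counts rather than objects, a typical flow is to request counts first and then issue a second search with a boolean_filter for the value the user picked. The sketch below builds both request fragments. The helper names count_facet and facet_filter are hypothetical, and the sketch does not parse any response, since the response shape is not described here.

```python
import json
from typing import Iterable, Optional


def count_facet(name: str, values: Optional[Iterable[str]] = None) -> dict:
    """Build one facet entry of type count.

    Hypothetical helper. An empty values list asks for counts of all values.
    """
    return {"name": name, "type": "count", "values": list(values or [])}


def facet_filter(name: str, value: str) -> str:
    """Return a boolean_filter clause that selects a single facet value."""
    return f'{name}:"{value}"'


# First request: ask for counts per color and for two specific sizes.
facet_request = {"facets": [count_facet("color"), count_facet("size", ["sm", "md"])]}

# Follow-up request: retrieve the objects behind one facet value.
drill_down = {"filter": {"boolean_filter": facet_filter("color", "red")}}

print(json.dumps(facet_request))
print(json.dumps(drill_down))
```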

💻

SDK Usage

If you are accessing the Vantage platform through one of our SDKs, facets options can be provided during the method call, using the Facet object.

from vantage_sdk.model.search import Facet, FacetType

...

facets = [
    Facet(
        name="color",
        type=FacetType.COUNT,
    ),
    Facet(
        name="size",
        type=FacetType.COUNT,
        values=["sm", "md"],
    ),
]

vantage_instance.semantic_search(
    collection_id="example-collection",
    text="some query text",
    facets=facets,
)
...

const collectionId = "example-collection"
const queryText = "Example query"

const facets = [
            new Facet("color", FacetTypeEnum.Count),
            new Facet("size", FacetTypeEnum.Count, ["sm", "md"])
        ]

client.semanticSearch(
    collectionId,
    queryText, 
    undefined, // accuracy
    undefined, // pagination
    undefined, // filter
    undefined, // sort
    undefined, // field value weighting
    facets,
)

...

final SearchResult result = client
    .search()
    .collection("example-collection")
    .semantic()
    .withSearchProperties(
        CommonSearchProperties
            .builder()
            .withAccuracy(BigDecimal.ONE)
            .withFacets(List.of(
                 Facet.countAllFacet("color"),
                 Facet.countValuesFacet("size", List.of("sm", "md", "lg"))
            ))
            .build()
    )
    .withSearchText("test search")
    .execute();
vantage search-semantic --vantage-api-key API_KEY --text "Example query" --facets '[ { "name": "color", "type": "count", "values": [] }, { "name": "size", "type": "count", "values": [ "sm", "md", "lg" ] } ]' example-collection

📘

Reference Guide for Search API