Vantage Ingestion Format

The file format for ingesting data into the Vantage platform

The data you ingest or upload into your collections must conform to specific field and format requirements. We call this the Vantage Ingestion Format (VIF). You can use this format in 🔜 both our REST API and the Vantage Parquet Format when uploading via the console. All field names are lowercase, except where noted.

Required Fields

id

This represents your ID for this document. It will be handed back to you in search results.

  • Length: 1 to 256 characters
{
 "id": "http://mydomain.com/products/1234567"
}
{
 "id": "product-id-12345" 
}
{
  "id": "9938173" 
}
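
As a minimal sketch of the length rule above, the check below accepts only strings of 1 to 256 characters. The function name `is_valid_id` is hypothetical, and counting Unicode code points with `len` is an assumption, since the text does not say how characters are counted.

```python
def is_valid_id(doc_id: object) -> bool:
    """Return True if doc_id is a string of 1 to 256 characters.

    Assumption: "characters" means Unicode code points, which is what
    len() counts for a Python str.
    """
    return isinstance(doc_id, str) and 1 <= len(doc_id) <= 256


for candidate in ("http://mydomain.com/products/1234567", "product-id-12345", ""):
    print(repr(candidate), is_valid_id(candidate))
```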

text

This field holds the text that will be embedded using your provided model. It should be a UTF-8 string, from zero bytes up to 1 GB in size.

embeddings

This is an array of 32-bit floating-point numbers (4 bytes each). The array length must match the Dimension Size of the collection you're putting data into. For instance, if you're using OpenAI's text-embedding-ada-002 model, which has a dimension size of 1536, your embeddings array must contain exactly 1536 values.
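
A pre-flight check along these lines might catch a mismatched vector before upload. This is a sketch only: `check_embeddings` is a hypothetical helper, and rejecting NaN and infinity is an assumption, since the text only specifies the element type and the array length.

```python
import math
import struct


def check_embeddings(embeddings, dimension):
    """Return True if embeddings has `dimension` values that fit in float32."""
    if len(embeddings) != dimension:
        return False
    for value in embeddings:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        try:
            # Assumption: NaN and infinity are not acceptable values.
            if not math.isfinite(value):
                return False
            # Packing as a 4-byte float fails if the value is out of range.
            struct.pack("<f", value)
        except (OverflowError, struct.error):
            return False
    return True


# An ada-002 sized vector passes; a vector of the wrong length does not.
print(check_embeddings([0.0] * 1536, 1536))
print(check_embeddings([0.0] * 768, 1536))
```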

👍

Vantage Managed Embeddings (VME) or User Provided Embeddings (UPE)

During ingestion, you must include either text or embeddings, depending on whether Vantage is managing embeddings for you or you are providing them yourself.

{
  "id" : "123",
  "text" : "The quick brown fox jumps over the lazy dog"
}
{
  "id" : "document-100",
  "embeddings" : [-0.010061902925372124, -0.017514921724796295, .... ]
}
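
Records like the two above could be assembled programmatically. The sketch below uses a hypothetical `build_record` helper that refuses to build a document with neither field. The two-element embeddings list is a shortened illustration, not a real vector of the collection's dimension size.

```python
import json


def build_record(doc_id, text=None, embeddings=None):
    """Build a VIF document dict with an id plus text and/or embeddings."""
    if text is None and embeddings is None:
        raise ValueError("a record needs either text or embeddings")
    record = {"id": doc_id}
    if text is not None:
        record["text"] = text
    if embeddings is not None:
        record["embeddings"] = list(embeddings)
    return record


# Vantage Managed Embeddings: send text only.
vme_record = build_record("123", text="The quick brown fox jumps over the lazy dog")

# User Provided Embeddings: send the vector only (truncated here for brevity).
upe_record = build_record(
    "document-100",
    embeddings=[-0.010061902925372124, -0.017514921724796295],
)

print(json.dumps(vme_record))
print(json.dumps(upe_record))
```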

🚧

File with both text and embeddings

If your file includes both, the embeddings field takes precedence and we will use it instead of re-embedding the text. This means that even if you are using Vantage Managed Embeddings, where the text would normally be processed, the embeddings are used instead. This may be useful if you have already created embeddings for the data you are ingesting into the Vantage platform.
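
The precedence rule can be pictured with a small client-side helper. `embedding_source` is a hypothetical name, and the function only mirrors the rule as described here. It is not Vantage's actual implementation.

```python
def embedding_source(record):
    """Report which field would drive the embedding for a record."""
    # Embeddings win whenever they are present, even alongside text.
    if "embeddings" in record:
        return "embeddings"
    if "text" in record:
        return "text"
    raise ValueError("record has neither text nor embeddings")


both = {"id": "123", "text": "The quick brown fox", "embeddings": [0.1, 0.2]}
print(embedding_source(both))
```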

Optional Fields

meta_ fields

Other fields, prefixed with meta_, can be provided to support querying and filtering. A document can have any number of these fields.

meta_<fieldname> is a field that should be indexed for search query filtering. The <fieldname> is case-sensitive: meta_FieldName is different from meta_fieldname.

The <fieldname> part of the field has specific naming restrictions:

  • Characters: May contain only [a-zA-Z0-9-_] characters
  • Minimum length: 3 characters
  • Maximum length: 255 characters

The values of these fields can only be these types:

  • Scalar Single Values: meta_<fieldname> : int, string
  • Array or List Values: meta_<fieldname> : [int], [string]
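
The naming and type rules above might be checked before ingestion with something like the following sketch. Both helper names are hypothetical. Treating a list as valid only when all elements share one type is an assumption drawn from the [int] and [string] notation, and whether empty lists are accepted is not stated, so this sketch lets them through.

```python
import re

# <fieldname>: 3 to 255 characters from [a-zA-Z0-9-_].
_FIELDNAME_RE = re.compile(r"[a-zA-Z0-9_-]{3,255}")
_PREFIX = "meta_"


def is_valid_meta_name(key):
    """Return True if key is meta_<fieldname> with a valid <fieldname>."""
    if not key.startswith(_PREFIX):
        return False
    return _FIELDNAME_RE.fullmatch(key[len(_PREFIX):]) is not None


def _is_scalar(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    return isinstance(value, (int, str))


def is_valid_meta_value(value):
    """Return True for an int, a string, or a list of ints or of strings."""
    if isinstance(value, list):
        # Assumption: lists must be homogeneous ([int] or [string]).
        same_type = len({type(item) for item in value}) <= 1
        return same_type and all(_is_scalar(item) for item in value)
    return _is_scalar(value)


doc = {"id": "123", "meta_color": "red", "meta_sizes": [38, 40, 42]}
for key, value in doc.items():
    if key.startswith(_PREFIX):
        print(key, is_valid_meta_name(key) and is_valid_meta_value(value))
```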

📘

The meta_ prefix is only used in the ingestion format. During querying, the meta_ prefix is dropped.
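
To illustrate the mapping, a hypothetical helper such as `query_field_name` could translate an ingestion field name into the name used at query time. The query syntax itself is not covered here, so this sketch shows only the renaming.

```python
def query_field_name(ingestion_key):
    """Strip the meta_ prefix from an ingestion field name."""
    prefix = "meta_"
    if not ingestion_key.startswith(prefix):
        raise ValueError("not a meta_ field: " + ingestion_key)
    # Only the leading prefix is dropped; the rest keeps its case.
    return ingestion_key[len(prefix):]


print(query_field_name("meta_FieldName"))
```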