JSONL Format

The Vantage JSONL format is used to bulk upload your data via the Console UI and via the direct upload links available in the REST API. The same format is used by the Upload Documents API.

Vantage Ingestion Format

The schema and format of the documents for ingestion.

Required Fields

  • id: Physical type is BYTE_ARRAY, logical type is String
  • text: Physical type is BYTE_ARRAY, logical type is String
  • embeddings: Physical type is a list of DOUBLE

Optional Fields

  • operation: Specifies the action to perform on the document. Available options are update, add, and delete
  • meta_[...]: Support querying and filtering; also used for Facets
  • meta_ordered_[...]: Support sorting
  • variants: Describe variants of a document

Typically you'd have only one of text or embeddings.

For more details, see the Vantage Documents page.
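As a rough illustration of the field rules listed above, the following sketch checks a single JSONL line before upload. It is not part of the Vantage SDK: `validate_document` and `ALLOWED_OPERATIONS` are hypothetical names, and the rule that every document needs either text or embeddings is an assumption drawn from the note above (it may not apply to delete operations).

```python
import json

# Operation values listed under "Optional Fields".
ALLOWED_OPERATIONS = {"update", "add", "delete"}


def validate_document(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        doc = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["line is not a JSON object"]

    problems = []
    if not isinstance(doc.get("id"), str):
        problems.append("id must be a string")

    has_text = isinstance(doc.get("text"), str)
    if "text" in doc and not has_text:
        problems.append("text must be a string")

    embeddings = doc.get("embeddings")
    has_embeddings = (
        isinstance(embeddings, list)
        and len(embeddings) > 0
        and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in embeddings
        )
    )
    if "embeddings" in doc and not has_embeddings:
        problems.append("embeddings must be a list of numbers")

    # Assumption: a document carries text, embeddings, or both.
    if not has_text and not has_embeddings:
        problems.append("document needs text or embeddings")

    operation = doc.get("operation")
    if "operation" in doc and (
        not isinstance(operation, str) or operation not in ALLOWED_OPERATIONS
    ):
        problems.append("operation must be one of update, add, delete")
    return problems
```

Running this over each line of a file before uploading gives a quick local check; the service itself remains the authority on what it accepts.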

JSONL Documents

Examples of correctly prepared JSONL data are below:

Documents with text only:

{"id": "1", "text": "Example text", "meta_color": "green", "variants": [{"id": "size-xl", "size": "XL"}, {"id": "size-l", "size": "L"}]}
{"id": "2", "text": "Sample text", "meta_color": "blue", "variants": [{"id": "size-m", "size": "M"}, {"id": "size-s", "size": "S"}]}

Documents with precomputed embeddings:

{"id": "1", "text": "Example text", "meta_color": "green", "embeddings": [1,2,3, ...], "variants": [{"id": "size-xl", "size": "XL"}, {"id": "size-l", "size": "L"}]}
{"id": "2", "text": "Sample text", "meta_color": "blue", "embeddings": [4,5,6, ...], "variants": [{"id": "size-m", "size": "M"}, {"id": "size-s", "size": "S"}]}

Note: the meta_color and variants fields are optional.
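If your documents start out as Python dictionaries, JSONL like the examples above can be produced with the standard `json` module. This is a minimal sketch, not an SDK feature: `to_jsonl` is a hypothetical helper name, and the sample documents mirror the ones shown above.

```python
import json

# Sample documents mirroring the examples above.
documents = [
    {
        "id": "1",
        "text": "Example text",
        "meta_color": "green",
        "variants": [{"id": "size-xl", "size": "XL"}, {"id": "size-l", "size": "L"}],
    },
    {"id": "2", "text": "Sample text", "meta_color": "blue"},
]


def to_jsonl(docs):
    """Serialize each document to one line of JSON, joined by newlines."""
    # json.dumps escapes newlines inside strings, so every document
    # stays on a single line as JSONL requires.
    return "\n".join(json.dumps(doc) for doc in docs)


documents_jsonl = to_jsonl(documents)
print(documents_jsonl)
```

Serializing with `json.dumps` rather than hand-writing the string avoids quoting mistakes and guarantees one object per line.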

Once your data is ready, you can upload it with the Vantage Python SDK:

from vantage_sdk import VantageClient

vantage_instance = VantageClient.using_vantage_api_key(
    vantage_api_key=VANTAGE_API_KEY,
    account_id=ACCOUNT_ID,
)

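# JSONL data from above: one JSON object per line, joined by a newline.
# The "..." stands for the remaining embedding values, elided here.
documents = "\n".join([
    '{"id": "1", "text": "Example text", "meta_color": "green", "embeddings": [1,2,3, ...]}',
    '{"id": "2", "text": "Sample text", "meta_color": "blue", "embeddings": [4,5,6, ...]}',
])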

vantage_instance.upsert_documents_from_jsonl_string(
    collection_id="example-collection",
    documents_jsonl=documents
)

To upload from a JSONL file instead:

from vantage_sdk import VantageClient

vantage_instance = VantageClient.using_vantage_api_key(
    vantage_api_key=VANTAGE_API_KEY,
    account_id=ACCOUNT_ID,
)

# Path to a file containing valid JSONL data
documents_path = "my_documents.jsonl"

vantage_instance.upsert_documents_from_jsonl_file(
    collection_id="example-collection",
    jsonl_file_path=documents_path
)
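
A file such as `my_documents.jsonl` can be written with the standard library before calling the SDK. The sketch below is illustrative only: `write_jsonl` is a hypothetical helper, not part of `vantage_sdk`, and the sample documents are placeholders.

```python
import json
from pathlib import Path


def write_jsonl(docs, path):
    """Write each document as one JSON object per line; return the count."""
    with Path(path).open("w", encoding="utf-8") as handle:
        for doc in docs:
            handle.write(json.dumps(doc) + "\n")
    return len(docs)


if __name__ == "__main__":
    sample_documents = [
        {"id": "1", "text": "Example text", "meta_color": "green"},
        {"id": "2", "text": "Sample text", "meta_color": "blue"},
    ]
    written = write_jsonl(sample_documents, "my_documents.jsonl")
    print(f"Wrote {written} documents")
```

The resulting path can then be passed as `jsonl_file_path`, as in the upload call above.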