Parquet Format

Use the Vantage Parquet format to bulk-upload your data through the Console UI or through the direct upload links available in the REST API.

👍
Vantage Ingestion Format
The schema and format of the documents for ingestion.

Required Fields

id: Physical type is BYTE_ARRAY, logical type is String
text: Physical type is BYTE_ARRAY, logical type is String
embeddings: Physical type is a list of DOUBLE

Optional Fields

operation: The action to perform on the document. One of: update, add, delete
meta_[...]: Metadata fields that support querying and filtering
meta_ordered_[...]: Metadata fields that support sorting
variants: Describes variants of a document

📘
Typically a document has only one of text or embeddings.
For more details, see the Vantage Documents page.
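
The field rules above can be sketched as a pre-flight check on each document before it is written to Parquet. This is an illustrative helper, not part of any Vantage SDK: the name check_document is hypothetical, and two rules are assumptions rather than documented behavior (that every document needs at least one of text or embeddings, and that any other column must be variants or start with meta_).

```python
ALLOWED_OPERATIONS = {"update", "add", "delete"}


def check_document(doc):
    """Return a list of problems for one document dict; empty means it looks fine."""
    problems = []

    if not isinstance(doc.get("id"), str):
        problems.append("id must be a string")

    has_text = isinstance(doc.get("text"), str)
    has_embeddings = isinstance(doc.get("embeddings"), list)
    # Assumption: at least one of text or embeddings must be supplied.
    if not (has_text or has_embeddings):
        problems.append("provide text or embeddings")
    if has_embeddings and not all(isinstance(x, float) for x in doc["embeddings"]):
        problems.append("embeddings must be a list of floats")

    if "operation" in doc and doc["operation"] not in ALLOWED_OPERATIONS:
        problems.append(f"unknown operation: {doc['operation']!r}")

    # Assumption: every other column is either variants or a meta_ field.
    known = {"id", "text", "embeddings", "operation", "variants"}
    for key in doc:
        if key not in known and not key.startswith("meta_"):
            problems.append(f"unexpected field: {key}")
    return problems
```

For example, check_document({"id": "1", "text": "hello", "meta_product_type": "shoe"}) returns an empty list, while a document with operation set to "upsert" is reported as having an unknown operation.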

Check the Format of a Parquet File

The following Python prints the physical and logical type of each column so you can check the format of a .parquet file. The examples are available in our vantage-tutorials GitHub repository.

import pyarrow.parquet as pq

# Read the Parquet file metadata
parquet_file = pq.ParquetFile('hello_world.parquet')

# Get the schema
schema = parquet_file.schema

# Print columns and their types
for field in schema:
    physical_type = field.physical_type
    logical_type = field.logical_type
    print(f"Column: {field.path}, Physical Type: {physical_type}, Logical Type: {logical_type}")

Column: id, Physical Type: BYTE_ARRAY, Logical Type: String
Column: text, Physical Type: BYTE_ARRAY, Logical Type: String
Column: meta_product_type, Physical Type: BYTE_ARRAY, Logical Type: String

import pyarrow.parquet as pq

# Same check, this time for a file that stores embeddings instead of text
# Read the Parquet file metadata
parquet_file = pq.ParquetFile('hello_world_embeddings.parquet')

# Get the schema
schema = parquet_file.schema

# Print columns and their types
for field in schema:
    physical_type = field.physical_type
    logical_type = field.logical_type
    print(f"Column: {field.path}, Physical Type: {physical_type}, Logical Type: {logical_type}")

Column: id, Physical Type: BYTE_ARRAY, Logical Type: String
Column: embeddings.list.element, Physical Type: DOUBLE, Logical Type: None
