POST /api/kb/ingest
Ingest
curl --request POST \
  --url https://api.example.com/api/kb/ingest \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form file='@example-file' \
  --form 'collection=<string>' \
  --form parse_method=default \
  --form chunk_strategy=recursive \
  --form chunk_size=1000 \
  --form chunk_overlap=200 \
  --form embedding_model_id=text-embedding-v4 \
  --form embedding_batch_size=10 \
  --form max_retries=3 \
  --form retry_delay=1.0
{
  "status": "<string>",
  "message": "<string>",
  "doc_id": "<string>",
  "parse_hash": "<string>",
  "chunk_count": 0,
  "embedding_count": 0,
  "vector_count": 0,
  "completed_steps": [
    {
      "name": "<string>",
      "metadata": {}
    }
  ],
  "failed_step": "<string>",
  "warnings": [
    "<string>"
  ]
}
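
The curl request above can also be built from Python. The following is a minimal sketch using only the standard library; the helper names `encode_multipart` and `build_ingest_request`, the file name `example.pdf`, and the collection name `my-collection` are illustrative assumptions and not part of the API. The host is the placeholder from the curl sample. The request is only constructed here, not sent.

```python
import uuid
import urllib.request

# Placeholder host copied from the curl sample above.
API_URL = "https://api.example.com/api/kb/ingest"


def encode_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one "file" part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode())
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")
        lines.append(str(value).encode())
    # The uploaded document goes in the part named "file".
    lines.append(f"--{boundary}".encode())
    lines.append(
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"'.encode()
    )
    lines.append(b"Content-Type: application/octet-stream")
    lines.append(b"")
    lines.append(file_bytes)
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    body = b"\r\n".join(lines)
    return body, f"multipart/form-data; boundary={boundary}"


def build_ingest_request(token, file_name, file_bytes, **fields):
    """Build (but do not send) a POST request for the ingest endpoint."""
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
    )


# Hypothetical values for illustration only.
req = build_ingest_request(
    "<token>",
    "example.pdf",
    b"%PDF-1.4 ...",
    collection="my-collection",
    chunk_size=1000,
    chunk_overlap=200,
)
```

To send the request, pass `req` to `urllib.request.urlopen` and decode the JSON body of the response.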

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

multipart/form-data
file
file
required
collection
string
parse_method
enum<string> | null

Parser used during ingestion.

Available options: default, pypdf, pdfplumber, unstructured, pymupdf, deepdoc
chunk_strategy
enum<string> | null

Chunking strategy (default: recursive).

Available options: recursive, fixed_size, markdown
chunk_size
integer | null

Chunk size in characters (default: 1000)

Required range: x > 0
chunk_overlap
integer | null

Chunk overlap in characters (default: 200)

Required range: x >= 0
embedding_model_id
string
default:text-embedding-v4

Embedding model ID (default: text-embedding-v4)

embedding_batch_size
integer | null

Batch size for embedding (default: 10)

Required range: x > 0
max_retries
integer | null

Maximum retries for embedding failures (default: 3)

Required range: x >= 0
retry_delay
number | null

Delay between retries in seconds (default: 1.0)

Required range: x >= 0
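
The enum values and "Required range" notes above can be checked on the client before uploading a file. This is a minimal sketch, not the server's own validation; the function name `validate_ingest_params` and the error message wording are assumptions. Fields left as `None` are skipped, on the assumption that omitting a nullable field lets the documented default apply.

```python
PARSE_METHODS = {"default", "pypdf", "pdfplumber", "unstructured", "pymupdf", "deepdoc"}
CHUNK_STRATEGIES = {"recursive", "fixed_size", "markdown"}

# field -> (minimum, inclusive), copied from the "Required range" notes
RANGES = {
    "chunk_size": (0, False),            # x > 0
    "chunk_overlap": (0, True),          # x >= 0
    "embedding_batch_size": (0, False),  # x > 0
    "max_retries": (0, True),            # x >= 0
    "retry_delay": (0, True),            # x >= 0
}


def validate_ingest_params(params):
    """Return a list of problems; an empty list means every given value fits."""
    errors = []
    enums = (("parse_method", PARSE_METHODS), ("chunk_strategy", CHUNK_STRATEGIES))
    for name, allowed in enums:
        value = params.get(name)
        if value is not None and value not in allowed:
            errors.append(f"{name}: {value!r} is not one of {sorted(allowed)}")
    for name, (minimum, inclusive) in RANGES.items():
        value = params.get(name)
        if value is None:
            continue  # nullable field: leave it out of the form
        ok = value >= minimum if inclusive else value > minimum
        if not ok:
            op = ">=" if inclusive else ">"
            errors.append(f"{name}: must be {op} {minimum}, got {value}")
    return errors


# The documented defaults pass; a zero chunk size does not.
problems = validate_ingest_params({"chunk_size": 0, "chunk_overlap": 200})
```

The sketch checks only the constraints listed on this page. Any further rules the server may enforce, such as a relationship between chunk_overlap and chunk_size, are not documented here and are not covered.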

Response

Successful Response

Structured response for the document ingestion pipeline.

status
string
required

Pipeline status. One of: success, error, partial

message
string
required

Human-readable summary of the pipeline result

doc_id
string | null

Document identifier produced by the register_document step

parse_hash
string | null

Parse hash produced during the parse_document step

chunk_count
integer
default:0

Number of chunks created; must be non-negative

Required range: x >= 0
embedding_count
integer
default:0

Number of embeddings generated; must be non-negative

Required range: x >= 0
vector_count
integer
default:0

Number of vectors written to storage; must be non-negative

Required range: x >= 0
completed_steps
IngestionStepResult · object[]

List of successfully completed steps

failed_step
string | null

Pipeline step where failure occurred, if any

warnings
string[]

Non-fatal warnings encountered
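
A client will usually branch on status and report failed_step when the pipeline does not finish. The sketch below maps the response fields listed above onto a dataclass and produces a one-line summary. The names `IngestResult`, `parse_ingest_response`, and `summarize` are assumptions, as are the sample payload and the step name `embed_chunks`; only `register_document` and `parse_document` appear on this page. It also assumes that a partial result carries a failed_step, which the page does not state.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IngestResult:
    status: str
    message: str
    doc_id: Optional[str] = None
    parse_hash: Optional[str] = None
    chunk_count: int = 0
    embedding_count: int = 0
    vector_count: int = 0
    completed_steps: list = field(default_factory=list)
    failed_step: Optional[str] = None
    warnings: list = field(default_factory=list)


def parse_ingest_response(payload):
    """Build an IngestResult from decoded JSON, ignoring unknown keys."""
    known = IngestResult.__dataclass_fields__
    return IngestResult(**{k: v for k, v in payload.items() if k in known})


def summarize(result):
    """Return a one-line, human-readable outcome for logs."""
    steps = ", ".join(step["name"] for step in result.completed_steps) or "none"
    if result.status == "success":
        return f"ok: {result.vector_count} vectors stored for {result.doc_id}"
    if result.status == "partial":
        return f"partial: completed {steps}; failed at {result.failed_step}"
    return f"error at {result.failed_step}: {result.message}"


# Hypothetical payload for illustration; not captured from a real server.
payload = {
    "status": "partial",
    "message": "Embedding failed after retries",
    "doc_id": "doc-123",
    "parse_hash": "abc",
    "chunk_count": 12,
    "embedding_count": 0,
    "vector_count": 0,
    "completed_steps": [
        {"name": "register_document", "metadata": {}},
        {"name": "parse_document", "metadata": {}},
    ],
    "failed_step": "embed_chunks",
    "warnings": [],
}
result = parse_ingest_response(payload)
summary = summarize(result)
```

Comparing chunk_count, embedding_count, and vector_count is a quick way to see how far a partial run got before it stopped.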