Upload and ingest a document into the knowledge base.
Args:
    collection: Target collection name. If not provided, uses the file name.
    file: The document file to upload and process.
    parse_method: Parser used during ingestion.
    chunk_strategy: Strategy for chunking the document.
    chunk_size: Target chunk size in characters.
    chunk_overlap: Overlap between consecutive chunks.
    embedding_model_id: Embedding model ID from model hub.
    embedding_batch_size: Batch size for embedding operations.
    max_retries: Maximum retry attempts for failures.
    retry_delay: Delay between retry attempts in seconds.
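To illustrate how chunk_size and chunk_overlap interact, here is a minimal sketch of a fixed-size splitter. It is not the service's implementation: the function name fixed_size_chunks is hypothetical, and the rule that the overlap must be smaller than the chunk size is an assumption made so the window always advances.

```python
def fixed_size_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be > 0")
    if chunk_overlap < 0:
        raise ValueError("chunk_overlap must be >= 0")
    if chunk_overlap >= chunk_size:
        # Assumption: without this check the window would never move forward.
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # distance between window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

For example, fixed_size_chunks("abcdefghij", chunk_size=4, chunk_overlap=1) returns ['abcd', 'defg', 'ghij']: each chunk starts three characters after the previous one and repeats its last character.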
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
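A client might assemble the header and the request fields along these lines. This is a sketch under stated assumptions: the names IngestOptions, auth_headers, and form_fields are hypothetical, sending the parameters as string form fields is an assumption, and "default" as the default parse_method is a guess, since the reference lists it as an option without stating the default. No network call is made because the endpoint URL is not given here.

```python
from dataclasses import dataclass, asdict
from typing import Optional

PARSE_METHODS = {"default", "pypdf", "pdfplumber", "unstructured", "pymupdf", "deepdoc"}
CHUNK_STRATEGIES = {"recursive", "fixed_size", "markdown"}


@dataclass
class IngestOptions:
    """Hypothetical container for the documented ingestion parameters."""
    parse_method: str = "default"  # assumption: the default is not stated
    chunk_strategy: str = "recursive"
    chunk_size: int = 1000
    chunk_overlap: int = 200
    embedding_model_id: str = "text-embedding-v4"
    embedding_batch_size: int = 10
    max_retries: int = 3
    retry_delay: float = 1.0

    def validate(self) -> None:
        """Check the documented options and numeric ranges before sending."""
        if self.parse_method not in PARSE_METHODS:
            raise ValueError(f"unknown parse_method: {self.parse_method}")
        if self.chunk_strategy not in CHUNK_STRATEGIES:
            raise ValueError(f"unknown chunk_strategy: {self.chunk_strategy}")
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be > 0")
        if self.chunk_overlap < 0:
            raise ValueError("chunk_overlap must be >= 0")
        if self.embedding_batch_size <= 0:
            raise ValueError("embedding_batch_size must be > 0")
        if self.max_retries < 0:
            raise ValueError("max_retries must be >= 0")
        if self.retry_delay < 0:
            raise ValueError("retry_delay must be >= 0")


def auth_headers(token: str) -> dict[str, str]:
    """Build the Bearer authentication header."""
    return {"Authorization": f"Bearer {token}"}


def form_fields(options: IngestOptions, collection: Optional[str] = None) -> dict[str, str]:
    """Validate the options and render them as string form fields."""
    options.validate()
    fields = {name: str(value) for name, value in asdict(options).items()}
    if collection is not None:
        # Omitted when not provided, so the server can fall back to the file name.
        fields["collection"] = collection
    return fields
```

Validating on the client mirrors the documented ranges, so an out-of-range value fails before the file is uploaded.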
parse_method: Parser used during ingestion. Options: default, pypdf, pdfplumber, unstructured, pymupdf, deepdoc
chunk_strategy: Chunking strategy. Options: recursive (default), fixed_size, markdown
chunk_size: Chunk size in characters (default: 1000). Required range: x > 0
chunk_overlap: Chunk overlap in characters (default: 200). Required range: x >= 0
embedding_model_id: Embedding model ID (default: text-embedding-v4)
embedding_batch_size: Batch size for embedding (default: 10). Required range: x > 0
max_retries: Maximum retries for embedding failures (default: 3). Required range: x >= 0
retry_delay: Delay between retries in seconds (default: 1.0). Required range: x >= 0
Successful Response
Structured response for the document ingestion pipeline.
Pipeline status: one of success, error, or partial
Human-readable summary of pipeline result
Document identifier produced by register_document
Parse hash produced during parse_document step
Number of chunks created; must be non-negative (x >= 0)
Number of embeddings generated; must be non-negative (x >= 0)
Number of vectors written to storage; must be non-negative (x >= 0)
List of successfully completed steps
Pipeline step where failure occurred, if any
Non-fatal warnings encountered
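The response fields above could be consumed roughly as follows. The field names used here (status, message, document_id, parse_hash, chunk_count, embedding_count, vector_count, completed_steps, failed_step, warnings) are hypothetical, since this reference gives only the descriptions; check the actual response schema before relying on them.

```python
from dataclasses import dataclass, field
from typing import Optional

VALID_STATUSES = {"success", "error", "partial"}


@dataclass
class IngestResult:
    """Hypothetical model of the ingestion response; field names are assumed."""
    status: str
    message: str
    document_id: Optional[str] = None
    parse_hash: Optional[str] = None
    chunk_count: int = 0
    embedding_count: int = 0
    vector_count: int = 0
    completed_steps: list[str] = field(default_factory=list)
    failed_step: Optional[str] = None
    warnings: list[str] = field(default_factory=list)


def parse_result(payload: dict) -> IngestResult:
    """Build an IngestResult from a decoded JSON object and check its invariants."""
    result = IngestResult(**payload)  # unknown or missing keys raise TypeError
    if result.status not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {result.status}")
    for name in ("chunk_count", "embedding_count", "vector_count"):
        if getattr(result, name) < 0:
            raise ValueError(f"{name} must be non-negative")
    return result


def summarize(result: IngestResult) -> str:
    """Turn a result into a one-line summary for logs."""
    if result.status == "success":
        return f"ok: {result.vector_count} vectors stored"
    if result.status == "partial":
        return f"partial: {len(result.warnings)} warning(s), {result.vector_count} vectors stored"
    return f"failed at step: {result.failed_step or 'unknown'}"
```

Handling partial separately from error lets a caller keep the vectors that were written while still surfacing the warnings.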