Save ingestion configuration for a specific collection.
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
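As a minimal sketch of the header format described above, the following builds (but does not send) an authenticated JSON request with Python's standard library. The URL, the POST method, and the helper name `build_request` are placeholders assumed for illustration; they are not documented values for this endpoint.

```python
import json
import urllib.request


def build_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build a JSON request carrying a Bearer token. Nothing is sent."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",  # assumed; check the endpoint's documented method
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder URL and token, for illustration only.
req = build_request(
    "https://example.invalid/ingestion-config",
    "my-token",
    {"chunk_size": 512},
)
print(req.get_header("Authorization"))
```

The token is passed in as an argument here; in practice it would usually come from an environment variable or secret store rather than source code.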
Configuration values for the document ingestion pipeline.
DeepDoc processing mode (e.g., 'pipeline', 'default').
DeepDoc parallel threads (DEEPDOC_PARALLEL_THREADS). Range: x >= 1.
DeepDoc reserved CPU cores (DEEPDOC_RESERVE_CPU). Range: x >= 0.
DeepDoc CapacityLimiter capacity (DEEPDOC_LIMITER_CAPACITY). Range: x >= 1.
Enable DeepDoc pipeline monitor (DEEPDOC_PIPELINE_MONITOR).
DeepDoc S1 worker count (DEEPDOC_PIPELINE_S1_WORKERS). Range: x >= 1.
DeepDoc GPU sessions count/preference (DEEPDOC_GPU_SESSIONS). Range: x >= 0.
Override DashScope base URL for embedding requests.
Override DashScope API key for embedding requests.
Override embedding request timeout (seconds). Range: x > 0.
Parse method used during parse_document step. Options: default, pypdf, pdfplumber, unstructured, pymupdf, deepdoc.
Chunk strategy passed to chunk_document. Options: recursive, fixed_size, markdown.
Custom chunk method identifier. If provided, takes precedence over chunk_strategy.
Chunk size passed to chunk_document; must be a positive integer. If None, semantic splitting is used without size limits. Range: x > 0.
Chunk overlap passed to chunk_document; must be non-negative. Range: x >= 0.
Markdown headers split rules for markdown strategy.
Custom separators for recursive/markdown strategies
If True, chunk_size and chunk_overlap are in tokens (tiktoken); only applies to RECURSIVE strategy
tiktoken encoding name when use_token_count=True (e.g. cl100k_base for GPT-4/3.5). Should align with config.DEFAULT_TIKTOKEN_ENCODING.
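To make the interplay of chunk size and chunk overlap concrete, here is a minimal sketch of fixed-size chunking measured in characters (that is, without token counting). It is not the pipeline's implementation; the function name `fixed_size_chunks` and the rule that the overlap must be smaller than the size are assumptions of this sketch.

```python
def fixed_size_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    if chunk_overlap < 0:
        raise ValueError("chunk_overlap must be non-negative")
    if chunk_overlap >= chunk_size:
        # Assumption of this sketch: the window must advance on every step.
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(fixed_size_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is what an overlap of 1 means here. With token counting enabled, the same windowing would apply to token positions instead of character positions.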
If True, do not split inside code blocks, formulas, tables (P1).
Optional regex patterns for protected regions; None uses config default.
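One way to picture protected regions is as regex matches that a split point is not allowed to land inside. The sketch below is an illustration under that assumption only: the default patterns, `protected_spans`, and `safe_split_point` are hypothetical names, not the pipeline's actual defaults or functions.

```python
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this example readable

# Hypothetical defaults: fenced code blocks and $$...$$ display formulas.
DEFAULT_PATTERNS = [
    re.escape(FENCE) + r".*?" + re.escape(FENCE),
    r"\$\$.*?\$\$",
]


def protected_spans(text: str, patterns=None) -> list[tuple[int, int]]:
    """Return sorted (start, end) spans matched by the protection patterns."""
    active = DEFAULT_PATTERNS if patterns is None else patterns
    spans = []
    for pattern in active:
        for match in re.finditer(pattern, text, flags=re.DOTALL):
            spans.append(match.span())
    return sorted(spans)


def safe_split_point(text: str, pos: int, patterns=None) -> int:
    """If pos falls strictly inside a protected span, move it to the span's end."""
    for start, end in protected_spans(text, patterns):
        if start < pos < end:
            return end
    return pos


sample = "intro " + FENCE + "code here" + FENCE + " outro"
print(safe_split_point(sample, 10))  # inside the code block, so pushed to its end
print(safe_split_point(sample, 3))   # outside any protected span, unchanged
```

Passing `patterns=None` falls back to the defaults, mirroring the description above where None means the configured default is used.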
Chars from prev/next chunk to attach to table chunks; 0 = off (P2). Range: x >= 0.
Chars from prev/next chunk to attach to image chunks; 0 = off (P2). Range: x >= 0.
Embedding model identifier registered in AgentOS model hub. If omitted, the pipeline attempts to auto-detect a single available embedding model.
Whether to lock collection configuration. When True, enforces strict config validation.
Whether to allow mixed parse methods within the collection. When False, enforces type-based parse method consistency.
Skip collection configuration validation. Use with caution.
Batch size for embedding provider requests; must be positive
Maximum concurrent requests for embedding computation when using async mode (for models that don't support batch processing, e.g., text-embedding-v4). Must be positive. Adjust based on machine configuration and API rate limits.
Whether to use async concurrent processing for embeddings. Set to True for models that don't support batch processing (e.g., text-embedding-v4). When True, embeddings are processed concurrently using asyncio instead of batch API calls.
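The concurrent mode described above can be sketched with an asyncio semaphore that caps the number of in-flight requests. This is an illustration, not the pipeline's code: `fake_embed` is a stand-in for a real single-text embedding call, and `embed_concurrently` and `max_concurrency` are hypothetical names.

```python
import asyncio


async def fake_embed(text: str) -> list[float]:
    """Stand-in for one embedding request (hypothetical; no network call)."""
    await asyncio.sleep(0.01)
    return [float(len(text))]


async def embed_concurrently(texts: list[str], max_concurrency: int) -> list[list[float]]:
    """Embed texts one request each, with at most max_concurrency in flight."""
    if max_concurrency <= 0:
        raise ValueError("max_concurrency must be positive")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(text: str) -> list[float]:
        async with semaphore:  # waits while the concurrency cap is reached
            return await fake_embed(text)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(t) for t in texts))


vectors = asyncio.run(embed_concurrently(["a", "bb", "ccc"], max_concurrency=2))
print(vectors)
# [[1.0], [2.0], [3.0]]
```

Raising the cap increases throughput until the provider's rate limit or the machine becomes the bottleneck, which is why the setting is meant to be tuned per deployment.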
Maximum number of retries for embedding provider failures; must be non-negative. Range: x >= 0.
Delay in seconds between embedding retries; must be non-negative. Range: x >= 0.
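How a retry count and a fixed retry delay combine can be sketched as a small wrapper: one initial attempt, then up to the configured number of retries with a pause between them. The names `call_with_retries`, `max_retries`, and `retry_delay` are hypothetical, and the real pipeline may retry only on specific error types or use a different backoff.

```python
import time


def call_with_retries(func, max_retries: int, retry_delay: float, sleep=time.sleep):
    """Call func(); on failure, retry up to max_retries times,
    sleeping retry_delay seconds before each retry."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    if retry_delay < 0:
        raise ValueError("retry_delay must be non-negative")

    retries_used = 0
    while True:
        try:
            return func()
        except Exception:
            if retries_used >= max_retries:
                raise  # retries exhausted: surface the last error
            retries_used += 1
            sleep(retry_delay)


calls = {"count": 0}


def flaky():
    """Fails twice, then succeeds (simulated provider failure)."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


print(call_with_retries(flaky, max_retries=2, retry_delay=0))
# ok
```

With `max_retries=0` the function is called exactly once and any error is raised immediately.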
Successful Response
Response payload for collection-level management operations.
Operation status: success|partial_success|error
Collection identifier affected by the operation
Human-readable summary of the collection operation
Non-fatal issues encountered while processing the collection
Subset of documents impacted by the collection operation
Aggregated deletion counts per table when applicable
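A caller will typically branch on the three status values listed above. The sketch below shows one way to do that; the dictionary keys `status`, `message`, and `warnings` are assumed names chosen for illustration, since the actual field names are not shown here.

```python
def summarize_result(result: dict) -> str:
    """Turn a collection-operation response into a short summary,
    raising if the operation reported an error."""
    status = result.get("status")
    if status == "success":
        return "ok"
    if status == "partial_success":
        warnings = result.get("warnings") or []
        return f"ok with {len(warnings)} warning(s)"
    if status == "error":
        raise RuntimeError(result.get("message", "collection operation failed"))
    raise ValueError(f"unexpected status: {status!r}")


print(summarize_result({"status": "partial_success", "warnings": ["skipped 1 file"]}))
# ok with 1 warning(s)
```

Treating partial_success separately from success keeps non-fatal issues visible instead of silently discarding them.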