POST /api/kb/collections/{collection}/config
Save Collection Config
curl --request POST \
  --url https://api.example.com/api/kb/collections/{collection}/config \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "deepdoc_processing_mode": "<string>",
  "deepdoc_parallel_threads": 2,
  "deepdoc_reserve_cpu": 1,
  "deepdoc_limiter_capacity": 2,
  "deepdoc_pipeline_monitor": true,
  "deepdoc_pipeline_s1_workers": 2,
  "deepdoc_gpu_sessions": 1,
  "embedding_base_url": "<string>",
  "embedding_api_key": "<string>",
  "embedding_timeout_sec": 1,
  "parse_method": "default",
  "chunk_strategy": "recursive",
  "chunk_method": "<string>",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "headers_to_split_on": [
    [
      "<string>",
      "<string>"
    ]
  ],
  "separators": [
    "<string>"
  ],
  "use_token_count": false,
  "tiktoken_encoding": "cl100k_base",
  "enable_protected_content": true,
  "protected_patterns": [
    "<string>"
  ],
  "table_context_size": 0,
  "image_context_size": 0,
  "embedding_model_id": "<string>",
  "collection_locked": false,
  "allow_mixed_parse_methods": false,
  "skip_config_validation": false,
  "embedding_batch_size": 10,
  "embedding_concurrent": 10,
  "embedding_use_async": false,
  "max_retries": 3,
  "retry_delay": 1
}
'

Example response:
{
  "status": "<string>",
  "collection": "<string>",
  "message": "<string>",
  "warnings": [
    "<string>"
  ],
  "affected_documents": [
    {
      "doc_id": "<string>",
      "status": "pending",
      "message": "<string>"
    }
  ],
  "deleted_counts": {}
}
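
The same request can be built from Python. The following is a minimal sketch using only the standard library; the helper name `build_save_config_request` and the collection name `my docs` are hypothetical, and the host is the placeholder from the curl example. It builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request


def build_save_config_request(base_url, collection, token, config):
    """Build (but do not send) the POST request that saves a collection config."""
    # Percent-encode the collection name so it is safe as a single path segment.
    path = "/api/kb/collections/{}/config".format(
        urllib.parse.quote(collection, safe="")
    )
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(config).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_save_config_request(
    "https://api.example.com",  # placeholder host from the curl example
    "my docs",                  # hypothetical collection name
    "<token>",
    {
        "parse_method": "default",
        "chunk_strategy": "recursive",
        "chunk_size": 1000,
        "chunk_overlap": 200,
    },
)
print(request.get_method(), request.full_url)
```

To send it, pass `request` to `urllib.request.urlopen` and decode the JSON body of the reply.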

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

collection
string
required

Body

application/json

Configuration values for the document ingestion pipeline.

deepdoc_processing_mode
string | null

DeepDoc processing mode (e.g., 'pipeline', 'default').

deepdoc_parallel_threads
integer | null

DeepDoc parallel threads (DEEPDOC_PARALLEL_THREADS).

Required range: x >= 1
deepdoc_reserve_cpu
integer | null

DeepDoc reserved CPU cores (DEEPDOC_RESERVE_CPU).

Required range: x >= 0
deepdoc_limiter_capacity
integer | null

DeepDoc CapacityLimiter capacity (DEEPDOC_LIMITER_CAPACITY).

Required range: x >= 1
deepdoc_pipeline_monitor
boolean | null

Enable DeepDoc pipeline monitor (DEEPDOC_PIPELINE_MONITOR).

deepdoc_pipeline_s1_workers
integer | null

DeepDoc S1 worker count (DEEPDOC_PIPELINE_S1_WORKERS).

Required range: x >= 1
deepdoc_gpu_sessions
integer | null

DeepDoc GPU sessions count/preference (DEEPDOC_GPU_SESSIONS).

Required range: x >= 0
embedding_base_url
string | null

Override DashScope base URL for embedding requests.

embedding_api_key
string | null

Override DashScope API key for embedding requests.

embedding_timeout_sec
number | null

Override embedding request timeout (seconds).

Required range: x > 0
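
The "Required range" constraints listed so far can be checked on the client before sending a request. This is a sketch only: `RANGES` and `range_errors` are hypothetical names, the bounds are copied from the entries above, and the server remains the authority on validation.

```python
# Lower bounds copied from the "Required range" lines above.
# Each entry is (minimum, inclusive).
RANGES = {
    "deepdoc_parallel_threads": (1, True),
    "deepdoc_reserve_cpu": (0, True),
    "deepdoc_limiter_capacity": (1, True),
    "deepdoc_pipeline_s1_workers": (1, True),
    "deepdoc_gpu_sessions": (0, True),
    "embedding_timeout_sec": (0, False),
}


def range_errors(config):
    """Return a list of human-readable range violations (hypothetical helper)."""
    errors = []
    for field, (minimum, inclusive) in RANGES.items():
        value = config.get(field)
        if value is None:  # nullable fields may be omitted or sent as null
            continue
        ok = value >= minimum if inclusive else value > minimum
        if not ok:
            op = ">=" if inclusive else ">"
            errors.append(f"{field} must be {op} {minimum}, got {value}")
    return errors
```
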
parse_method
enum<string>
default:default

Parse method used during the parse_document step.

Available options: default, pypdf, pdfplumber, unstructured, pymupdf, deepdoc
chunk_strategy
enum<string>
default:recursive

Chunk strategy passed to chunk_document.

Available options: recursive, fixed_size, markdown
chunk_method
string | null

Custom chunk method identifier. If provided, it takes precedence over chunk_strategy.

chunk_size
integer | null
default:1000

Chunk size passed to chunk_document; must be a positive integer. If null, semantic splitting is used without a size limit.

Required range: x > 0
chunk_overlap
integer
default:200

Chunk overlap passed to chunk_document; must be non-negative.

Required range: x >= 0
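
To illustrate how chunk_size and chunk_overlap interact, here is a minimal sketch of a sliding window over characters. It is not the service's chunk_document implementation: `fixed_size_chunks` is a hypothetical name, measuring in characters is an assumption, and so is the rule that the overlap must be smaller than the chunk size.

```python
def fixed_size_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into character windows that share chunk_overlap characters."""
    if chunk_size <= 0 or chunk_overlap < 0:
        raise ValueError("chunk_size must be > 0 and chunk_overlap >= 0")
    if chunk_overlap >= chunk_size:
        # Assumption: the window has to advance, so overlap must be smaller.
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks
```

With the defaults, each window starts 800 characters after the previous one, so consecutive chunks share 200 characters.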
headers_to_split_on
tuple[] | null

Header split rules for the markdown strategy.

separators
string[] | null

Custom separators for the recursive and markdown strategies.
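
As a sketch of a markdown-strategy body, the fragment below fills in headers_to_split_on and separators. This reference does not define what the two strings in each pair mean; reading them as (header marker, metadata label) is an assumption borrowed from common Markdown header splitters, and the label values are hypothetical.

```python
import json

# Assumption: each pair is (header marker, metadata label).
markdown_config = {
    "chunk_strategy": "markdown",
    "headers_to_split_on": [["#", "h1"], ["##", "h2"]],
    # Tried in order, from coarse to fine (also an assumption).
    "separators": ["\n\n", "\n", " "],
}

# JSON has no tuple type, so each pair is serialized as a two-item array.
body = json.dumps(markdown_config)
```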

use_token_count
boolean
default:false

If true, chunk_size and chunk_overlap are measured in tokens (counted with tiktoken); applies only to the recursive strategy.

tiktoken_encoding
string
default:cl100k_base

tiktoken encoding name used when use_token_count is true (e.g. cl100k_base for GPT-4/3.5). Should align with config.DEFAULT_TIKTOKEN_ENCODING.

enable_protected_content
boolean
default:true

If true, the splitter does not split inside code blocks, formulas, or tables (P1).

protected_patterns
string[] | null

Optional regex patterns for protected regions; null uses the configured default.
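
One way to picture protected regions: find every span matched by a pattern and reject split points that fall inside one. The sketch below is illustrative only; `protected_spans` and `is_safe_split` are hypothetical helpers, and compiling with `re.DOTALL` is an assumption about how multi-line regions would be matched.

```python
import re


def protected_spans(text, patterns):
    """Return sorted (start, end) spans matched by any protected pattern."""
    spans = []
    for pattern in patterns:
        # Assumption: DOTALL so one pattern can cover a multi-line region.
        for match in re.finditer(pattern, text, flags=re.DOTALL):
            spans.append(match.span())
    return sorted(spans)


def is_safe_split(position, spans):
    """A split is safe when it does not fall strictly inside a span."""
    return not any(start < position < end for start, end in spans)


FENCE = "`" * 3  # a Markdown code fence, built here to keep the example readable
text = "intro\n" + FENCE + "\ncode\n" + FENCE + "\noutro"
fence_pattern = re.escape(FENCE) + r".*?" + re.escape(FENCE)
spans = protected_spans(text, [fence_pattern])
```

Here `spans` holds a single span covering the fenced block, so a splitter may cut immediately before or after the block but not within it.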

table_context_size
integer
default:0

Number of characters from the previous and next chunks to attach to table chunks; 0 disables this (P2).

Required range: x >= 0
image_context_size
integer
default:0

Number of characters from the previous and next chunks to attach to image chunks; 0 disables this (P2).

Required range: x >= 0
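
The effect of table_context_size and image_context_size can be sketched as below. How the service actually joins the context to the chunk is not described here, so plain concatenation is an assumption and `with_context` is a hypothetical name.

```python
def with_context(chunks, index, context_size):
    """Return chunk `index` with characters from its neighbours attached."""
    if context_size <= 0:
        return chunks[index]  # 0 means the feature is off
    # Tail of the previous chunk and head of the next one, when they exist.
    before = chunks[index - 1][-context_size:] if index > 0 else ""
    after = chunks[index + 1][:context_size] if index + 1 < len(chunks) else ""
    return before + chunks[index] + after
```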
embedding_model_id
string | null

Embedding model identifier registered in AgentOS model hub. If omitted, the pipeline attempts to auto-detect a single available embedding model.

collection_locked
boolean
default:false

Whether to lock the collection configuration. When true, strict config validation is enforced.

allow_mixed_parse_methods
boolean
default:false

Whether to allow mixed parse methods within the collection. When false, type-based parse method consistency is enforced.

skip_config_validation
boolean
default:false

Skip collection configuration validation. Use with caution.

embedding_batch_size
integer
default:10

Batch size for embedding provider requests; must be positive.
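
As a rough illustration of what a batch size means for the number of provider requests, the sketch below groups texts into batches. `batches` is a hypothetical helper, not part of the service.

```python
def batches(items, batch_size=10):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 25 texts with the default batch size lead to three provider requests.
sizes = [len(batch) for batch in batches(list(range(25)), 10)]
```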

embedding_concurrent
integer
default:10

Maximum concurrent requests for embedding computation when using async mode (for models that don't support batch processing, e.g., text-embedding-v4). Must be positive. Adjust based on machine configuration and API rate limits.

embedding_use_async
boolean
default:false

Whether to use async concurrent processing for embeddings. Set to true for models that don't support batch processing (e.g., text-embedding-v4). When true, embeddings are processed concurrently using asyncio instead of batch API calls.
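
The async mode described above can be pictured as one request per text, capped by a semaphore whose size plays the role of embedding_concurrent. This is a sketch under that assumption; `embed_concurrently` and the `embed_one` callable are hypothetical and stand in for whatever the pipeline really calls.

```python
import asyncio


async def embed_concurrently(texts, embed_one, concurrent=10):
    """Call embed_one(text) for every text, at most `concurrent` at a time."""
    semaphore = asyncio.Semaphore(concurrent)

    async def guarded(text):
        async with semaphore:  # waits while `concurrent` calls are in flight
            return await embed_one(text)

    # gather returns results in the same order as the input texts.
    return await asyncio.gather(*(guarded(text) for text in texts))
```

Lowering `concurrent` trades throughput for fewer simultaneous requests, which helps when the provider enforces rate limits.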

max_retries
integer
default:3

Maximum number of retries for embedding provider failures; must be non-negative.

Required range: x >= 0
retry_delay
number
default:1

Delay in seconds between embedding retries; must be non-negative.

Required range: x >= 0
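
How max_retries and retry_delay combine can be sketched as a loop with a fixed delay. The sketch assumes that max_retries counts attempts after the first one and that the delay is constant; neither is stated in this reference, and `call_with_retries` is a hypothetical helper.

```python
import time


def call_with_retries(call, max_retries=3, retry_delay=1.0, sleep=time.sleep):
    """Run call(); on failure, retry up to max_retries more times."""
    attempt = 0
    while True:
        try:
            return call()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            attempt += 1
            sleep(retry_delay)  # fixed delay; any backoff policy is unknown
```

Under these assumptions the defaults allow up to four calls in total, with one second of waiting between consecutive calls.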

Response

Successful Response

Response payload for collection-level management operations.

status
string
required

Operation status; one of success, partial_success, or error.

collection
string
required

Collection identifier affected by the operation

message
string
required

Human-readable summary of the collection operation

warnings
string[]

Non-fatal issues encountered while processing the collection

affected_documents
CollectionOperationDetail · object[]

Subset of documents impacted by the collection operation

deleted_counts
Deleted Counts · object

Aggregated deletion counts per table when applicable
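
A client might reduce the response to a pass/fail flag plus a few log lines. The sketch below works on an already decoded JSON object; `summarize_result` is a hypothetical helper, and treating partial_success as a non-failure is a design choice made for this example, not something the API prescribes.

```python
def summarize_result(payload):
    """Return (ok, lines); ok is False only when status is "error"."""
    lines = [
        "{}: {} - {}".format(
            payload["collection"], payload["status"], payload["message"]
        )
    ]
    # warnings and affected_documents are optional, so tolerate absence or null.
    for warning in payload.get("warnings") or []:
        lines.append("warning: " + warning)
    for detail in payload.get("affected_documents") or []:
        lines.append("{}: {}".format(detail.get("doc_id"), detail.get("status")))
    return payload["status"] != "error", lines
```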