POST /api/kb/ingest-web/jobs
Create Ingest Web Job
curl --request POST \
  --url https://api.example.com/api/kb/ingest-web/jobs \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data 'collection=<string>' \
  --data 'start_url=<string>' \
  --data max_pages=100 \
  --data max_depth=3 \
  --data 'url_patterns=<string>' \
  --data 'exclude_patterns=<string>' \
  --data same_domain_only=true \
  --data 'content_selector=<string>' \
  --data 'remove_selectors=<string>' \
  --data concurrent_requests=3 \
  --data request_delay=1 \
  --data timeout=30 \
  --data respect_robots_txt=true \
  --data chunk_size=1 \
  --data chunk_overlap=1 \
  --data 'separators=<string>' \
  --data embedding_model_id=text-embedding-v4 \
  --data embedding_batch_size=1 \
  --data max_retries=1 \
  --data retry_delay=1
{
  "id": "<string>",
  "user_id": 123,
  "job_type": "<string>",
  "queue": "<string>",
  "status": "<string>",
  "attempts": 123,
  "max_attempts": 123,
  "progress": {},
  "result": {},
  "error_message": "<string>",
  "celery_task_id": "<string>",
  "started_at": "2023-11-07T05:31:56Z",
  "finished_at": "2023-11-07T05:31:56Z",
  "created_at": "2023-11-07T05:31:56Z",
  "updated_at": "2023-11-07T05:31:56Z"
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
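
As an illustration of the header format described above, the following is a minimal Python sketch that builds (but does not send) the same request as the curl sample using only the standard library. The helper name `build_ingest_request` and the token and field values are hypothetical; the host `api.example.com` is the placeholder from the sample request.

```python
import urllib.parse
import urllib.request

# Placeholder URL taken from the sample request; replace with your deployment's host.
API_URL = "https://api.example.com/api/kb/ingest-web/jobs"


def build_ingest_request(token: str, fields: dict) -> urllib.request.Request:
    """Build a POST request with a Bearer token and a form-encoded body.

    Hypothetical helper for illustration; it only constructs the request object.
    """
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


# Example values are made up for the sketch.
req = build_ingest_request(
    "my-token",
    {"collection": "docs", "start_url": "https://example.com/"},
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access.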

Body

application/x-www-form-urlencoded
collection
string
required

Target collection name

start_url
string
required

Starting URL for crawling

max_pages
integer | null
default:100
max_depth
integer | null
default:3
url_patterns
string | null
exclude_patterns
string | null
same_domain_only
boolean | null
default:true
content_selector
string | null
remove_selectors
string | null
concurrent_requests
integer | null
default:3
Required range: 1 <= x <= 10
request_delay
number | null
default:1
Required range: x >= 0
timeout
integer | null
default:30
Required range: x >= 1
respect_robots_txt
boolean | null
default:true
parse_method
enum<string> | null

Available parsing methods

Available options: default, pypdf, pdfplumber, unstructured, pymupdf, deepdoc
chunk_strategy
enum<string> | null

Available chunk strategies

Available options: recursive, fixed_size, markdown
chunk_size
integer | null
Required range: x > 0
chunk_overlap
integer | null
Required range: x >= 0
separators
string | null
embedding_model_id
string
default:text-embedding-v4
embedding_batch_size
integer | null
Required range: x > 0
max_retries
integer | null
Required range: x >= 0
retry_delay
number | null
Required range: x >= 0
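
The ranges and enum values listed above can be checked on the client before a job is submitted. The sketch below is one possible way to do that in Python; the function name `encode_job_form` is hypothetical. It makes two assumptions that the reference does not state: that fields left out of the body fall back to the server defaults, and that booleans should be sent in lowercase as in the sample request.

```python
from urllib.parse import urlencode

# Ranges copied from the parameter list: (lower bound, bound is inclusive, upper bound).
RANGES = {
    "concurrent_requests": (1, True, 10),
    "request_delay": (0, True, None),
    "timeout": (1, True, None),
    "chunk_size": (0, False, None),
    "chunk_overlap": (0, True, None),
    "embedding_batch_size": (0, False, None),
    "max_retries": (0, True, None),
    "retry_delay": (0, True, None),
}

# Enum values copied from the parameter list.
ENUMS = {
    "parse_method": {"default", "pypdf", "pdfplumber", "unstructured", "pymupdf", "deepdoc"},
    "chunk_strategy": {"recursive", "fixed_size", "markdown"},
}

REQUIRED = ("collection", "start_url")


def encode_job_form(params: dict) -> str:
    """Validate params against the documented constraints and form-encode them."""
    # Assumption: None means "omit the field and let the server apply its default".
    fields = {k: v for k, v in params.items() if v is not None}

    for name in REQUIRED:
        if not fields.get(name):
            raise ValueError(f"{name} is required")

    for name, (low, inclusive, high) in RANGES.items():
        if name not in fields:
            continue
        value = fields[name]
        too_low = value < low if inclusive else value <= low
        too_high = high is not None and value > high
        if too_low or too_high:
            raise ValueError(f"{name}={value} is outside the documented range")

    for name, allowed in ENUMS.items():
        if name in fields and fields[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")

    # Assumption: booleans are sent as "true"/"false", matching the sample request.
    encoded = {
        k: (str(v).lower() if isinstance(v, bool) else v) for k, v in fields.items()
    }
    return urlencode(encoded)


print(encode_job_form({
    "collection": "docs",
    "start_url": "https://example.com/",
    "same_domain_only": True,
    "concurrent_requests": 5,
}))
```

The server remains the authority on validation; a client-side check like this only catches obvious mistakes earlier.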

Response

Successful Response

id
string
required
user_id
integer
required
job_type
string
required
queue
string
required
status
string
required
attempts
integer
required
max_attempts
integer
required
progress
Progress · object
result
Result · object
error_message
string | null
celery_task_id
string | null
started_at
string<date-time> | null
finished_at
string<date-time> | null
created_at
string<date-time> | null
updated_at
string<date-time> | null
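
The response fields above map naturally onto a small typed record. The following Python sketch parses a response body into a dataclass, treating the fields marked required as mandatory and the rest as optional. The names `Job` and `parse_job` are hypothetical, and the sample values (including the `status` and `job_type` strings) are made up for illustration, since the reference does not list the possible values.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


def _parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as 2023-11-07T05:31:56Z, or pass None through."""
    if value is None:
        return None
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 onward.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class Job:
    # Fields marked "required" in the response schema.
    id: str
    user_id: int
    job_type: str
    queue: str
    status: str
    attempts: int
    max_attempts: int
    # Optional or nullable fields.
    progress: dict = field(default_factory=dict)
    result: dict = field(default_factory=dict)
    error_message: Optional[str] = None
    celery_task_id: Optional[str] = None
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None


def parse_job(payload: str) -> Job:
    """Turn a JSON response body into a Job; raises KeyError if a required field is missing."""
    data = json.loads(payload)
    return Job(
        id=data["id"],
        user_id=data["user_id"],
        job_type=data["job_type"],
        queue=data["queue"],
        status=data["status"],
        attempts=data["attempts"],
        max_attempts=data["max_attempts"],
        progress=data.get("progress") or {},
        result=data.get("result") or {},
        error_message=data.get("error_message"),
        celery_task_id=data.get("celery_task_id"),
        started_at=_parse_ts(data.get("started_at")),
        finished_at=_parse_ts(data.get("finished_at")),
        created_at=_parse_ts(data.get("created_at")),
        updated_at=_parse_ts(data.get("updated_at")),
    )


# Made-up sample payload shaped like the response example.
sample = json.dumps({
    "id": "job-1", "user_id": 123, "job_type": "ingest_web", "queue": "default",
    "status": "pending", "attempts": 0, "max_attempts": 3,
    "started_at": None, "created_at": "2023-11-07T05:31:56Z",
})
job = parse_job(sample)
print(job.id, job.status, job.created_at)
```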