POST /crawl

Authorizations

Authorization
string
header
required

Body

application/json
url
string
required

The URL where the crawl starts. If a protocol is given, it is used; otherwise https is tried first, and http is tried if that fails.
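
The protocol fallback described above can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the helper name `candidate_urls` is hypothetical, and the rule used to detect a protocol (a scheme followed by `://`) is an assumption.

```python
import re

# Assumption: a URL "has a protocol" when it starts with a scheme followed by "://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def candidate_urls(url: str) -> list[str]:
    """Return the URLs to try, in order, for a given starting url."""
    if _SCHEME.match(url):
        # An explicit protocol is used as given.
        return [url]
    # No protocol: try https first, then fall back to http.
    return [f"https://{url}", f"http://{url}"]


print(candidate_urls("example.com"))
print(candidate_urls("http://example.com/docs"))
```

Passing an explicit protocol in `url` avoids the fallback entirely, which may be preferable when a site is known to serve only one of the two.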

goal
string | null
required

The goal or purpose of the crawl, used to guide the crawling process. The goal is compared against the content of crawled pages to determine the relevance of neighboring pages.

max_pages
integer | null

The maximum number of pages to crawl. Defaults to 1.

max_depth
integer | null

The maximum depth of the crawl. Defaults to 0.

render
boolean | null

Whether to render pages with JavaScript. Rendering sometimes yields more accurate content, at additional cost and time. Defaults to false.

children_paths
string[] | null

A list of regex patterns to match against the hrefs of the pages crawled. If a page's href matches one of the patterns, it will be followed. Defaults to an empty list.
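
To preview which hrefs a set of patterns would select, a local check along these lines may help. It is only a sketch: the helper name `matches_children_paths` is hypothetical, and it assumes unanchored matching (`re.search`); the API may apply the patterns differently, so anchor patterns explicitly with `^` and `$` when that matters.

```python
import re


def matches_children_paths(href: str, patterns: list[str]) -> bool:
    """Return True if href matches at least one of the regex patterns."""
    # Assumption: unanchored matching; the service may anchor differently.
    return any(re.search(pattern, href) for pattern in patterns)


patterns = [r"^/blog/", r"/docs/.+\.html$"]
print(matches_children_paths("/blog/2024/launch", patterns))  # True
print(matches_children_paths("/about", patterns))             # False
```

This helper returns False for an empty pattern list. How the API itself treats the default empty list is not stated here, so the sketch should not be read as describing that case.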

json_schema
string | null

If a JSON schema is provided, the output of the crawl will be in the given format.
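
A request using the body fields above might be assembled as in the sketch below. Several details are assumptions rather than documented facts: `BASE_URL` is a placeholder host, `build_crawl_request` is a hypothetical helper, and the `Authorization` value is passed through verbatim because its expected format is not described here. Since `json_schema` is typed as a string, the schema is serialized with `json.dumps` before being placed in the body.

```python
import json
import urllib.request

# Hypothetical placeholder; substitute the real API host.
BASE_URL = "https://api.example.com"


def build_crawl_request(authorization, url, goal, **options):
    """Build (but do not send) a POST /crawl request."""
    body = {"url": url, "goal": goal}
    # Optional fields are included only when the caller sets them,
    # so the server-side defaults apply to everything else.
    body.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        f"{BASE_URL}/crawl",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": authorization,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Illustrative schema; json_schema is sent as a JSON-encoded string.
schema = {"type": "object", "properties": {"title": {"type": "string"}}}

request = build_crawl_request(
    authorization="<your credential>",
    url="example.com",
    goal="Collect product documentation pages",
    max_pages=10,
    max_depth=2,
    render=False,
    children_paths=[r"^/docs/"],
    json_schema=json.dumps(schema),
)

print(request.get_method(), request.full_url)
print(json.loads(request.data))
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the payload easy to inspect before any crawl is started.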

Response

200 - application/json
id
string
required

Unique identifier for the crawl.

created_at
string
required

Timestamp indicating when the crawl was created.

state
string
required

Current state of the crawl (e.g., 'pending', 'running', 'completed', 'failed').

base_url
string
required

The starting URL for the crawl.

goal
string
required

The purpose or objective of the crawl.

children_paths
string[] | null

List of regex patterns for URLs to follow during the crawl.

json_schema
string | null

JSON schema for structuring the crawl output, if provided.

error
object | null

Error information if the crawl failed.

documents
object[] | null

List of documents retrieved during the crawl.

result
string[] | null
deprecated

List of page results from the crawl.
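
On the client side, the response fields listed above could be mapped onto a small typed record, as sketched below. The class name `Crawl`, the helper `parse_crawl`, and every value in the sample payload (including the timestamp format) are illustrative assumptions, not output from the real API. The deprecated `result` field is deliberately ignored in favor of `documents`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Crawl:
    # Required fields
    id: str
    created_at: str
    state: str
    base_url: str
    goal: str
    # Nullable fields
    children_paths: Optional[list[str]] = None
    json_schema: Optional[str] = None
    error: Optional[dict[str, Any]] = None
    documents: Optional[list[dict[str, Any]]] = None


def parse_crawl(payload: dict[str, Any]) -> Crawl:
    """Convert a decoded 200 response body into a Crawl record."""
    return Crawl(
        # Required fields raise KeyError if the response omits them.
        id=payload["id"],
        created_at=payload["created_at"],
        state=payload["state"],
        base_url=payload["base_url"],
        goal=payload["goal"],
        # Nullable fields fall back to None when absent.
        children_paths=payload.get("children_paths"),
        json_schema=payload.get("json_schema"),
        error=payload.get("error"),
        documents=payload.get("documents"),
    )


# Made-up sample body, for illustration only.
sample = {
    "id": "crawl_123",
    "created_at": "2024-01-01T00:00:00Z",
    "state": "pending",
    "base_url": "https://example.com",
    "goal": "Collect product documentation pages",
    "children_paths": ["^/docs/"],
    "json_schema": None,
    "error": None,
    "documents": None,
}

crawl = parse_crawl(sample)
print(crawl.id, crawl.state)
```

Because the documented state values are given only as examples, client code should tolerate states beyond 'pending', 'running', 'completed', and 'failed'.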