Create Crawl
Authorizations
Body
The URL at which the crawl starts. If a protocol is given, it will be used; otherwise https will be tried first, and if that fails, http will be tried.
The goal or purpose of the crawl, used to guide the crawling process. The goal is compared against the content of each crawled page to determine the relevance of neighboring pages.
The maximum number of pages to crawl. Defaults to 1.
The maximum depth of the crawl. Defaults to 0.
Whether to render pages with JavaScript. This sometimes yields more accurate content, at additional cost and time. Defaults to False.
A list of regex patterns to match against the hrefs found on crawled pages. If an href matches one of the patterns, it will be followed. Defaults to an empty list.
If a JSON schema is provided, the output of the crawl will be in the given format.
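The body fields described above can be sketched as a small Python helper. This is only an illustration under stated assumptions: the field names (`url`, `goal`, `max_pages`, `max_depth`, `render_js`, `follow_patterns`, `output_schema`) are hypothetical placeholders, since this page does not list the actual parameter names, and the two helper functions model the documented protocol fallback and href matching on the client side rather than reproducing the service's implementation.

```python
import re
from typing import Any, Optional


def candidate_urls(url: str) -> list[str]:
    """Return the order in which a start URL would be tried, per the description above."""
    # An explicit protocol (scheme followed by "://") is used as given.
    if re.match(r"^[A-Za-z][A-Za-z0-9+.-]*://", url):
        return [url]
    # No protocol: https is tried first, then http.
    return [f"https://{url}", f"http://{url}"]


def should_follow(href: str, patterns: list[str]) -> bool:
    """Return True if the href matches at least one regex pattern."""
    # Assumption: a pattern may match anywhere in the href (re.search).
    # With an empty list this helper returns False; how the service treats
    # the empty default is not stated on this page.
    return any(re.search(pattern, href) for pattern in patterns)


def build_crawl_body(
    url: str,
    goal: str,
    max_pages: int = 1,        # documented default: 1
    max_depth: int = 0,        # documented default: 0
    render_js: bool = False,   # documented default: False
    follow_patterns: Optional[list[str]] = None,  # documented default: empty list
    output_schema: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a request body. All key names here are hypothetical."""
    body: dict[str, Any] = {
        "url": url,
        "goal": goal,
        "max_pages": max_pages,
        "max_depth": max_depth,
        "render_js": render_js,
        "follow_patterns": list(follow_patterns or []),
    }
    # The JSON schema is optional, so include it only when provided.
    if output_schema is not None:
        body["output_schema"] = output_schema
    return body
```

A call such as `build_crawl_body("example.com", "Find pricing pages", max_pages=10, max_depth=2, follow_patterns=[r"/pricing"])` would produce a dictionary ready to serialize as JSON; the real key names should be taken from the API's parameter list.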
Response
Unique identifier for the crawl.
Timestamp indicating when the crawl was created.
Current state of the crawl (e.g., 'pending', 'running', 'completed', 'failed').
The starting URL for the crawl.
The purpose or objective of the crawl.
List of regex patterns for URLs to follow during the crawl.
JSON schema for structuring the crawl output, if provided.
Error information if the crawl failed.
List of documents retrieved during the crawl.
List of page results from the crawl.
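The response fields above could be modeled on the client side roughly as follows. This is a sketch, not the API's schema: the key names (`id`, `created_at`, `status`, `url`, `goal`, `follow_patterns`, `output_schema`, `error`, `documents`, `pages`) are hypothetical, and the set of terminal states is inferred from the example status values listed above, which may not be exhaustive.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Assumption: 'completed' and 'failed' are the terminal states; the status
# description above gives them only as examples.
TERMINAL_STATUSES = {"completed", "failed"}


@dataclass
class CrawlResponse:
    """Client-side view of a crawl response. All field names are hypothetical."""

    id: str
    created_at: str
    status: str
    url: str
    goal: Optional[str] = None
    follow_patterns: list[str] = field(default_factory=list)
    output_schema: Optional[dict[str, Any]] = None
    error: Optional[Any] = None
    documents: list[Any] = field(default_factory=list)
    pages: list[Any] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "CrawlResponse":
        """Build a CrawlResponse from a decoded JSON object."""
        return cls(
            id=data["id"],
            created_at=data["created_at"],
            status=data["status"],
            url=data["url"],
            goal=data.get("goal"),
            follow_patterns=list(data.get("follow_patterns") or []),
            output_schema=data.get("output_schema"),
            error=data.get("error"),
            documents=list(data.get("documents") or []),
            pages=list(data.get("pages") or []),
        )

    @property
    def is_finished(self) -> bool:
        """True once the crawl has reached a terminal state."""
        return self.status in TERMINAL_STATUSES

    @property
    def failed(self) -> bool:
        """True if the crawl ended in the 'failed' state."""
        return self.status == "failed"
```

A caller would typically check `is_finished` before reading `documents` or `pages`, and inspect `error` when `failed` is true.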