
Extract API


The /extract endpoint returns clean, structured article content - title, author, publication date, main text, and images - from blog posts, news articles, and other long-form pages.

GET https://api.ujeebu.com/extract

POST https://api.ujeebu.com/extract
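
As a minimal sketch of the GET form, the snippet below only builds the request URL and does not send it. The helper name `build_extract_url` is hypothetical, and authentication is not covered in this section, so it is left out.

```python
from urllib.parse import urlencode

EXTRACT_ENDPOINT = "https://api.ujeebu.com/extract"


def build_extract_url(page_url: str, **params) -> str:
    """Return the GET URL for /extract with all query parameters percent-encoded.

    build_extract_url is a hypothetical helper, not part of any official client.
    """
    query = {"url": page_url, **params}
    return f"{EXTRACT_ENDPOINT}?{urlencode(query)}"


# Example: request the native extractor for one article.
request_url = build_extract_url("https://example.com/post", mode="super")
```

The target URL must be encoded because it travels inside another URL's query string; `urlencode` handles that for every parameter.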

TIP - Want markdown output?

/extract returns cleaned text and metadata. If you want LLM-ready markdown (chunks, semantic splits, main-content filter), call /markdown instead - it wraps /extract under the hood and bills at the same rate.

Choosing a mode

Goal	Call
Article / blog / news clean-text extraction	`/extract?url=…`
Native Go extractor (bypass remote API)	`/extract?url=…&mode=super`

Parameters

Required (one of)

Exactly one of url or raw_html must be provided.

Parameter	Type	Required	Default	Description
`url`	`string`	Yes (one of)	`-`	URL to extract from
`raw_html`	`string`	Yes (one of)	`-`	Pre-fetched HTML (skip fetching)
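
The "exactly one of" rule can be checked on the client before a request is made. This is a sketch only: `source_params` is a hypothetical helper, and the server's actual error response for a violation is not described here.

```python
from typing import Optional


def source_params(url: Optional[str] = None, raw_html: Optional[str] = None) -> dict:
    """Return the source parameter, enforcing that exactly one of url / raw_html is set.

    Hypothetical client-side check; it mirrors the rule stated above.
    """
    # Both None or both set are equally invalid.
    if (url is None) == (raw_html is None):
        raise ValueError("provide exactly one of url or raw_html")
    return {"url": url} if url is not None else {"raw_html": raw_html}
```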

Extraction flags

Parameter	Type	Required	Default	Description
`text`	`bool`	No	`true`	Return cleaned text
`html`	`bool`	No	`true`	Return cleaned HTML
`media`	`bool`	No	`false`	Extract videos / embeds
`feeds`	`bool`	No	`false`	Extract RSS feeds
`images`	`bool`	No	`true`	Extract image URLs
`author`	`bool`	No	`true`	Extract author
`pub_date`	`bool`	No	`true`	Extract publication date
`publisher_country`	`bool`	No	`false`	Detect publisher country
`publisher_tz`	`bool`	No	`false`	Detect publisher timezone
`is_article`	`bool`	No	`true`	Return article-probability score
`partial`	`int`	No	`0`	Partial extraction mode
`quick_mode`	`bool`	No	`false`	Fast, less thorough
`heavy_mode`	`bool`	No	`false`	Slower, more thorough
`strip_tags`	`string | []string`	No	`"form"`	Tags to strip from HTML
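
Because most flags have defaults, a client only needs to send the ones it changes. The sketch below copies the boolean defaults from the table; `flag_overrides` is a hypothetical helper, and serializing booleans as `true` / `false` strings is an assumption about the wire format.

```python
# Defaults for the boolean extraction flags, copied from the table above.
FLAG_DEFAULTS = {
    "text": True, "html": True, "media": False, "feeds": False,
    "images": True, "author": True, "pub_date": True,
    "publisher_country": False, "publisher_tz": False,
    "is_article": True, "quick_mode": False, "heavy_mode": False,
}


def flag_overrides(**flags: bool) -> dict:
    """Return only the flags that differ from their documented defaults.

    Hypothetical helper; the "true"/"false" string encoding is an assumption.
    """
    unknown = set(flags) - set(FLAG_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown extraction flags: {sorted(unknown)}")
    return {
        name: "true" if value else "false"
        for name, value in flags.items()
        if value != FLAG_DEFAULTS[name]
    }
```

For example, `flag_overrides(html=False, feeds=True)` drops the cleaned HTML and adds feed discovery while leaving everything else at its default.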

Image processing

Parameter	Type	Required	Default	Description
`image_analysis`	`bool`	No	`true`	Analyze images (dims, relevance)
`min_image_width`	`int`	No	`200`	Min image width (px)
`min_image_height`	`int`	No	`100`	Min image height (px)
`image_timeout`	`int`	No	`2`	Per-image timeout (s)
`return_only_enclosed_text_images`	`bool`	No	`true`	Only images surrounded by text
`main_image_in_html`	`bool`	No	`false`	Embed main image in returned HTML

Browser / JS

Parameter	Type	Required	Default	Description
`js`	`bool`	No	`false`	Enable JS rendering
`js_timeout`	`int`	No	`30`	JS timeout (s)
`wait_until`	`enum`	No	`load`	One of `load`, `domcontentloaded`, `networkidle`, `commit`
`scroll_down`	`bool`	No	`false`	Scroll page after load
`scroll_wait`	`int`	No	`100`	Wait between scrolls (ms)
`scroll_percent`	`int`	No	`0`	Scroll fraction
`progressive_scroll`	`bool`	No	`false`	Incremental scroll
`scroll_callback`	`string`	No	`null`	JS to run during scroll
`scroll_to_selector`	`string`	No	`null`	CSS selector to scroll to
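
A sketch of assembling the rendering options, with `wait_until` checked against the four documented values. `js_params` is a hypothetical helper; sending scroll options only together with `js=true` and encoding booleans as `true` strings are both assumptions.

```python
WAIT_UNTIL_VALUES = ("load", "domcontentloaded", "networkidle", "commit")


def js_params(wait_until: str = "load", js_timeout: int = 30,
              scroll_down: bool = False, scroll_wait: int = 100) -> dict:
    """Build query parameters for a JS-rendered extraction (hypothetical helper).

    js_timeout is in seconds and scroll_wait in milliseconds, as in the table.
    """
    if wait_until not in WAIT_UNTIL_VALUES:
        raise ValueError(f"wait_until must be one of {WAIT_UNTIL_VALUES}")
    params = {"js": "true", "wait_until": wait_until, "js_timeout": js_timeout}
    if scroll_down:
        # Assumption: scroll options are sent only when scrolling is enabled.
        params.update({"scroll_down": "true", "scroll_wait": scroll_wait})
    return params
```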

Proxy

Parameter	Type	Required	Default	Description
`proxy_type`	`string`	No	`null`	`datacenter`, `residential`, `premium`, `custom`, …
`proxy_country`	`string`	No	`null`	Two-letter country code
`session_id`	`string`	No	`null`	Sticky session id
`auto_proxy`	`bool`	No	`false`	Smart proxy selection
`auto_premium_proxy`	`bool`	No	`false`	Auto-upgrade to premium on failure
`custom_proxy`	`string`	Conditional	`null`	Required if `proxy_type=custom`
`custom_proxy_username`	`string`	No	`null`	Custom proxy auth
`custom_proxy_password`	`string`	No	`null`	Custom proxy auth
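
The conditional requirement on `custom_proxy` can be enforced before sending a request. In this sketch `proxy_params` is a hypothetical helper, and the two-letter check is a client-side approximation of the `proxy_country` format; the letter case the API expects is not stated here.

```python
from typing import Optional


def proxy_params(proxy_type: Optional[str] = None,
                 proxy_country: Optional[str] = None,
                 custom_proxy: Optional[str] = None) -> dict:
    """Validate and collect proxy parameters (hypothetical helper)."""
    if proxy_type == "custom" and not custom_proxy:
        raise ValueError("custom_proxy is required when proxy_type=custom")
    if proxy_country is not None and not (len(proxy_country) == 2 and proxy_country.isalpha()):
        raise ValueError("proxy_country must be a two-letter country code")
    candidates = {
        "proxy_type": proxy_type,
        "proxy_country": proxy_country,
        "custom_proxy": custom_proxy,
    }
    # Omit unset values so the server-side defaults apply.
    return {key: value for key, value in candidates.items() if value is not None}
```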

INFO - Non-US residential proxies are metered

residential / residential_us (US) bill a flat per-request rate. Setting a non-US proxy_country routes the request through residential_geo, which is metered: the residential base rate plus 10 credits per MB of response over the first 1 MB. The exact charge is returned in the Ujb-credits response header.
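
The metered formula can be worked through in a few lines. This is an estimate under stated assumptions: the base rate is not given in this section, so it is a parameter; 1 MB is taken as 1,000,000 bytes; and the overage is prorated rather than rounded up. The `Ujb-credits` header remains the authoritative figure.

```python
def residential_geo_credits(base_rate: float, response_bytes: int,
                            bytes_per_mb: int = 1_000_000) -> float:
    """Estimate the charge for a metered residential_geo request.

    Assumptions (not confirmed by the docs): 1 MB = 1,000,000 bytes and
    the overage beyond the first MB is billed pro rata.
    """
    billable_mb = max(0.0, response_bytes / bytes_per_mb - 1.0)
    return base_rate + 10 * billable_mb
```

With a placeholder base rate of 5 credits, a 3 MB response would come to 5 + 10 × 2 = 25 credits, while a 0.5 MB response stays at the base rate of 5.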

Pagination

Parameter	Type	Required	Default	Description
`pagination`	`bool`	No	`false`	Follow pagination links
`pagination_max_pages`	`int`	No	`0`	Max pages to follow (`0` = unlimited)

CAPTCHA

Parameter	Type	Required	Default	Description
`auto_captcha_solve`	`bool`	No	`false`	Detect & solve CAPTCHAs (forces super-mode)
`auto_captcha_solve_timeout`	`int`	No	`0`	CAPTCHA solve timeout (ms, `0` = use server default)

Other

Parameter	Type	Required	Default	Description
`timeout`	`int`	No	`60`	Overall timeout (s)
`cookies`	`string | map`	No	`null`	Cookies to send
`extra_headers`	`object`	No	`null`	Headers to forward with the fetch (can also be sent as `UJB-*` request headers)
`mode`	`string`	No	`null`	`super` / `d1*` to force native QuickExtract

Scrape API for raw HTML / screenshot / PDF
Markdown API for clean Markdown output for RAG
