Article Extractor · Extract API

Any article, clean structured JSON.

Point it at a news story, blog post, or any long-form page and get back the title, author, publish date, clean body text, images, and feeds. Deterministic, multi-language, no LLM bill.


5 credits / call · 40+ languages · ~0.9s median response

POST /extract

{
  "article": {
    "title": "Why we should learn German",
    "author": "John le Carré",
    "pub_date": "2017-07-02 00:05:12",
    "text": "I began learning German at the age of 13...",
    "html": "<p>I began learning German...</p>",
    "image": "https://.../lead.jpg",
    "images": ["https://.../fig-1.jpg"],
    "language": "en",
    "site_name": "the Guardian",
    "is_article": 1
  },
  "credits_used": 5
}
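A response like the one above can be loaded into a typed record before it goes anywhere else in your pipeline. The sketch below is one way to do that in Python, not an official client: the `Article` class and `parse_extract_response` helper are hypothetical names, and the date format is assumed from the sample (`YYYY-MM-DD HH:MM:SS`).

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

# Trimmed copy of the sample response shown above.
SAMPLE = """
{
  "article": {
    "title": "Why we should learn German",
    "author": "John le Carré",
    "pub_date": "2017-07-02 00:05:12",
    "text": "I began learning German at the age of 13...",
    "images": ["https://.../fig-1.jpg"],
    "language": "en",
    "is_article": 1
  },
  "credits_used": 5
}
"""


@dataclass
class Article:
    """Hypothetical container for the fields used downstream."""
    title: str
    author: Optional[str]
    pub_date: Optional[datetime]
    text: str
    language: Optional[str]
    images: List[str] = field(default_factory=list)
    is_article: bool = False


def parse_extract_response(payload: str) -> Article:
    """Turn a raw /extract JSON body into an Article record."""
    data = json.loads(payload)["article"]
    raw_date = data.get("pub_date")
    return Article(
        title=data.get("title", ""),
        author=data.get("author"),
        # Assumption: dates follow the layout seen in the sample response.
        pub_date=datetime.strptime(raw_date, "%Y-%m-%d %H:%M:%S") if raw_date else None,
        text=data.get("text", ""),
        language=data.get("language"),
        images=list(data.get("images") or []),
        # The sample returns 1/0, so coerce to a real boolean.
        is_article=bool(data.get("is_article")),
    )


article = parse_extract_response(SAMPLE)
print(article.title, article.pub_date.year)
```

Missing fields fall back to `None` or an empty value, so a page without a byline or date should still parse.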

Live playground

See it work, before you sign up.

Drop in a URL, run a real call against the live API, and watch the JSON come back in about a second. No API key required.

Clean body text + HTML

Nav, ads, and boilerplate stripped out. Get plain text and sanitised HTML of just the article.

Author, date & metadata

Byline, publish and modified dates, canonical URL, site name, favicon, and language, all detected for you.

Images, media & feeds

Lead image plus in-article images (filtered by size + relevance), embedded videos, and RSS feeds.

Multi-language

Non-Latin scripts and mixed encodings handled cleanly. Accurate extraction in 40+ languages.

JS rendering + proxies

Flip on headless Chrome for JS-heavy pages, rotate residential proxies, and solve CAPTCHAs on the hard targets.

Pagination + raw HTML

Stitch multi-page articles into one result, or pass raw_html you already have and skip the fetch entirely.
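If you would rather not pull in an SDK, the options above can be toggled from the Python standard library. Treat this as a sketch under assumptions: `url`, `raw_html`, and the `ApiKey` header appear on this page, but the `js` and `pagination` flag names, and sending `raw_html` in a JSON POST body, are guesses to check against the docs. `build_extract_request` is a hypothetical helper.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

API_ENDPOINT = "https://api.ujeebu.com/extract"


def build_extract_request(
    api_key: str,
    url: str,
    *,
    js: bool = False,
    pagination: bool = False,
    raw_html: Optional[str] = None,
) -> Request:
    """Build (but do not send) a request for the /extract endpoint."""
    params = {"url": url}
    if js:
        params["js"] = 1          # assumed flag name for headless rendering
    if pagination:
        params["pagination"] = 1  # assumed flag name for multi-page stitching
    headers = {"ApiKey": api_key}

    if raw_html is None:
        # Plain GET: urlencode percent-escapes the target URL for us.
        return Request(f"{API_ENDPOINT}?{urlencode(params)}", headers=headers, method="GET")

    # Assumption: HTML you already hold is sent in a JSON POST body.
    params["raw_html"] = raw_html
    headers["Content-Type"] = "application/json"
    body = json.dumps(params).encode("utf-8")
    return Request(API_ENDPOINT, data=body, headers=headers, method="POST")


req = build_extract_request("YOUR_API_KEY", "https://example.com/post", js=True)
print(req.get_method(), req.full_url)
```

Pass the returned object to `urllib.request.urlopen` to make the call. Building the request separately keeps the parameter logic easy to unit-test without touching the network.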

Drop-in code

Copy. Paste. Ship.

cURL

curl 'https://api.ujeebu.com/extract?url=https://example.com/post&media=1&images=1' \
  -H 'ApiKey: YOUR_API_KEY'

JavaScript (Node.js)

import { Ujeebu } from 'ujeebu';

const uj = new Ujeebu(process.env.UJEEBU_KEY);

const res = await uj.extract({
  url: 'https://example.com/post',
  media: true,
  images: true,
});

console.log(res.article.title, res.article.author);

Python

from ujeebu import Ujeebu

uj = Ujeebu(api_key='YOUR_API_KEY')

res = uj.extract(
    url='https://example.com/post',
    media=True,
    images=True,
)

print(res['article']['title'], res['article']['author'])

What people build with it

Real things real teams shipped this quarter.

News & media monitoring

Turn thousands of article URLs into a clean, searchable text corpus with author and date attached. Track coverage without babysitting parsers.

Content aggregation

Pull full-text from blogs, publishers, and newsrooms into your own reader, newsletter, or knowledge base, images and feeds included.

RAG & LLM ingest

Deterministic, boilerplate-free article text is the cleanest input for embeddings and retrieval. Pair with the Markdown API for chunk-ready output.
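For embedding pipelines that work from the plain `text` field, a simple paragraph-aware chunker is often enough to start with. The sketch below is a generic greedy packer, not part of the API. It assumes paragraphs in `text` are separated by newlines, and `chunk_article` and its `max_chars` limit are illustrative names.

```python
from typing import List


def chunk_article(text: str, max_chars: int = 1000) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters."""
    paragraphs = [p.strip() for p in text.split("\n") if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is hard-split so no chunk exceeds the limit.
        while len(para) > max_chars:
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        current = para
    if current:
        chunks.append(current)
    return chunks


body = "First paragraph.\nSecond paragraph.\n\nThird paragraph, a bit longer than the others."
for i, chunk in enumerate(chunk_article(body, max_chars=40)):
    print(i, len(chunk))
```

Splitting on paragraph boundaries keeps most chunks semantically whole. A token-based limit would be the natural next step if your embedding model counts tokens rather than characters.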

Pricing

5 credits per Article Extractor call. Here's what that buys.

One credit pool covers every endpoint. Failed calls cost 0. No per-feature upcharges, no premium-proxy tax.

Starter · $49 per month

60k Article Extractor calls / month (300k credits, used flexibly across all endpoints)

Next up: any page to JSON, automatically.

Article extraction is what we do today, deterministically, for blogs, news, and long-form pages. Auto-Extract takes the same idea to every other page type: point it at a product listing, a profile, a directory row, and it detects the page type and returns structured JSON, with no selectors and no schema to write.

Any page type. Products, listings, profiles, and more, not just articles.
No rules, no schema. It works out the shape of the page and hands back clean JSON.
Same endpoint you already call. A single type=auto flag on /extract, when it ships.

On the roadmap. The /extract endpoint is built to grow into it, so today's article integration keeps working unchanged.

POST /extract?type=auto · soon

{
  "page_type": "product",
  "data": {
    "name": "Sony WH-1000XM5",
    "price": 399.99,
    "currency": "USD",
    "in_stock": true,
    "rating": 4.7
  },
  "credits_used": 20
}
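Since Auto-Extract has not shipped, the response shape above may still change. With that caveat, here is a sketch of a handler that accepts both today's article response and the previewed `page_type` / `data` shape, so calling code does not need to branch in many places. `classify_result` is a hypothetical helper.

```python
from typing import Any, Dict, Tuple


def classify_result(result: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (kind, payload) for either /extract response shape."""
    if "article" in result:
        # Today's article response.
        return "article", result["article"]
    if "page_type" in result:
        # Previewed Auto-Extract shape; an assumption until the feature ships.
        return str(result["page_type"]), result.get("data", {})
    raise ValueError("unrecognised /extract response shape")


kind, payload = classify_result(
    {"page_type": "product", "data": {"name": "Sony WH-1000XM5", "price": 399.99}}
)
print(kind, payload["name"])
```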

Ship Article Extractor tonight.

5,000 credits free. No card. Real residential proxies on the free tier.


Credit packs

Pack	Credits	Approx. Article Extractor calls	Price	Price per 1K credits
Mini	50,000	10k	$10.00	$0.20
Starter	100,000	20k	$15.00	$0.15
Standard	500,000	100k	$60.00	$0.12
Pro	1,000,000	200k	$100.00	$0.10
Enterprise	5,000,000	1M	$400.00	$0.08
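The pack figures follow directly from the 5-credits-per-call rate, and the arithmetic is easy to reproduce when budgeting. The sketch below recomputes each row and adds an effective price per call. That last figure is derived here and is not published on the page. `pack_stats` is an illustrative helper.

```python
from typing import Dict, Tuple

CREDITS_PER_CALL = 5  # Article Extractor rate stated on this page

# (credits, price in USD), copied from the pack table.
PACKS: Dict[str, Tuple[int, float]] = {
    "Mini": (50_000, 10.00),
    "Starter": (100_000, 15.00),
    "Standard": (500_000, 60.00),
    "Pro": (1_000_000, 100.00),
    "Enterprise": (5_000_000, 400.00),
}


def pack_stats(credits: int, price_usd: float) -> Tuple[int, float, float]:
    """Return (calls, USD per 1K credits, USD per call) for one pack."""
    calls = credits // CREDITS_PER_CALL
    per_1k_credits = round(price_usd / credits * 1000, 2)
    per_call = round(price_usd / calls, 5)
    return calls, per_1k_credits, per_call


for name, (credits, price) in PACKS.items():
    calls, per_1k, per_call = pack_stats(credits, price)
    print(f"{name}: {calls:,} calls, ${per_1k:.2f} per 1K credits, ${per_call:.5f} per call")
```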