HTML in. LLM-ready Markdown out.
Convert any URL into clean Markdown: chunked, embed-ready, deduplicated. The fastest path from "the live web" to a vector store. Built specifically for RAG and LLM training pipelines.
POST /markdown
{
  "url": "https://docs.stripe.com/api/charges",
  "chunk": "semantic",
  "max_tokens": 800,
  "include": "main, article, .content"
}
→ 200 OK · 3 credits · 1.1s
{
  "chunks": [
    { "id": "c0", "tokens": 412, "text": "# Charges\n\nA Charge..." },
    { "id": "c1", "tokens": 786, "text": "## Create a Charge\n\nPOST /..." },
    ...
  ],
  "title": "Stripe API · Charges",
  "url": "https://docs.stripe.com/api/charges"
}
See it work, before you sign up.
Drop in a URL, run a real call against the live API, and watch the JSON come back in about a second. No API key required.
Headings stay headings. Tables stay tables. Code blocks keep their language hints. Links keep their anchors. None of the GPT-flattened slop.
Pick semantic (header-aware), fixed (token-bounded), sentence, or none. We respect natural document boundaries.
Every chunk comes with tokens pre-counted for the tokenizer of your choice (cl100k, o200k, claude). No surprises at embedding time.
Nav, footer, ads, "related articles": all gone. Pass include selectors to keep specific zones; pass exclude selectors to drop specific ones.
Pass embed: "openai" or embed: "voyage" and we return chunks with embedding vectors attached. One round-trip, vector-store-ready.
Pass a sitemap URL with crawl: "site" and we Markdown-ify the entire docs site, deduplicate it, and stream chunks to your webhook.
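To make "header-aware" concrete, here is a minimal local sketch of the idea in Python. It is not the service's implementation: the function names are hypothetical, the token count is a crude whitespace split standing in for a real tokenizer, and the output only mirrors the id/tokens/text shape of the sample response.

```python
import re

HEADING = re.compile(r"^#{1,6}\s")
FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the sample fence-safe


def approx_tokens(text: str) -> int:
    # Assumption: whitespace-separated words as a rough stand-in for a tokenizer.
    return len(text.split())


def split_oversize(section: str, max_tokens: int) -> list[str]:
    # Greedily pack blank-line-separated paragraphs up to the token budget.
    # A single paragraph larger than the budget is kept whole.
    parts, buf = [], []
    for para in section.split("\n\n"):
        candidate = "\n\n".join(buf + [para])
        if buf and approx_tokens(candidate) > max_tokens:
            parts.append("\n\n".join(buf))
            buf = [para]
        else:
            buf.append(para)
    if buf:
        parts.append("\n\n".join(buf))
    return parts


def chunk_by_headings(markdown: str, max_tokens: int = 800) -> list[dict]:
    # Pass 1: start a new section at every heading that is not inside a code fence.
    sections, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        if not in_fence and HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    # Pass 2: split sections that exceed the budget, then number the chunks.
    texts = []
    for section in sections:
        if section:
            texts.extend(split_oversize(section, max_tokens))
    return [
        {"id": f"c{i}", "tokens": approx_tokens(t), "text": t}
        for i, t in enumerate(texts)
    ]
```

Calling chunk_by_headings(doc, max_tokens=800) on a page with two headings yields two chunks, each starting at its own heading; lines beginning with # inside a fenced code block are not treated as headings.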
Copy. Paste. Ship.
import { Ujeebu } from "ujeebu";

const uj = new Ujeebu(process.env.UJEEBU_KEY);
const { chunks } = await uj.markdown({
  url: "https://docs.stripe.com/api/charges",
  chunk: "semantic",
  max_tokens: 800,
});

// Embed every chunk in parallel, then upsert the whole batch.
await pinecone.upsert(await Promise.all(chunks.map(async c => ({
  id: c.id, values: await embed(c.text)
}))));
import os

from ujeebu import Ujeebu

uj = Ujeebu(api_key=os.environ["UJEEBU_KEY"])
result = uj.markdown(
    url="https://docs.stripe.com/api/charges",
    chunk="semantic", max_tokens=800,
    embed="openai",  # vectors included
)

# Each tuple is (id, vector, metadata).
pinecone.upsert([
    (c["id"], c["embedding"], {"text": c["text"]})
    for c in result["chunks"]
])
curl -X POST https://api.ujeebu.com/markdown \
  -H "ApiKey: $UJEEBU_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.stripe.com/api/charges",
    "filter": "fit"
  }'
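If you'd rather skip the SDK, the same call can be sketched with Python's standard library. The endpoint, the ApiKey header, and the request fields come from the examples on this page; the helper names are hypothetical, and reading a top-level chunks array from the response is an assumption based on the sample response.

```python
import json
import urllib.request

API_URL = "https://api.ujeebu.com/markdown"  # endpoint as shown in the curl example


def build_markdown_request(url: str, api_key: str, **options) -> urllib.request.Request:
    # Extra keyword arguments (chunk, max_tokens, include, ...) are passed through
    # as top-level JSON fields, matching the request bodies shown on this page.
    payload = {"url": url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"ApiKey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def fetch_chunks(url: str, api_key: str, **options) -> list:
    # Assumption: the response body is JSON with a top-level "chunks" array.
    req = build_markdown_request(url, api_key, **options)
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    return body.get("chunks", [])
```

Usage would look like fetch_chunks("https://docs.stripe.com/api/charges", os.environ["UJEEBU_KEY"], chunk="semantic", max_tokens=800), with os imported by the caller.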
// Markdown-ify a whole docs site
const job = await uj.markdown({
  url: "https://docs.stripe.com/sitemap.xml",
  crawl: "site",
  chunk: "semantic",
  webhook: "https://your-app.com/ingest"
});

// Chunks stream to your webhook as they finish.
console.log(job.id, job.estimated_pages); // job id, then the page estimate (e.g. 4,200)
Real things real teams shipped this quarter.
A devtools company keeps embeddings of 18k docs sites fresh. Site-crawl mode + nightly delta. No crawler code. No HTML stripping. No reprocessing.
Build domain-specific Markdown datasets. Filter by token count and language at extraction time, not in post-processing.
Customer points at their docs URL. 90 seconds later they have a working "ask my docs" widget. Zero-config ingest, real-time freshness.
Build LLM eval sets from real public content. Same chunks day-over-day means stable eval scores; no "the doc moved" noise.
3 credits per Markdown call. Here's what that buys.
One credit pool covers every endpoint. Failed calls cost 0. No per-feature upcharges, no premium-proxy tax. See full pricing →
Ship Markdown tonight.
5,000 credits free. No card. Real residential proxies on the free tier.