Markdown

HTML in. LLM-ready Markdown out.

Convert any URL into clean Markdown: chunked, embed-ready, deduplicated. The fastest path from "the live web" to a vector store. Built specifically for RAG and LLM training pipelines.

3 credits per call · 1.1s p50 latency · 4 chunk strategies
POST /markdown
{
  "url":      "https://docs.stripe.com/api/charges",
  "chunk":    "semantic",
  "max_tokens": 800,
  "include":  "main, article, .content"
}

→ 200 OK · 3 credits · 1.1s
{
  "chunks": [
    { "id":"c0", "tokens": 412, "text":"# Charges\n\nA Charge..." },
    { "id":"c1", "tokens": 786, "text":"## Create a Charge\n\nPOST /..." },
    ...
  ],
  "title": "Stripe API · Charges",
  "url":   "https://docs.stripe.com/api/charges"
}
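If you'd rather not pull in an SDK, the same call can be made with nothing but Python's standard library. This is a minimal sketch: the endpoint and the `ApiKey` header come from the curl example on this page, while the `Content-Type: application/json` header and the helper names `build_markdown_request` and `fetch_chunks` are our own assumptions, not part of the Ujeebu API.

```python
import json
import urllib.request

# Endpoint and "ApiKey" header as used in this page's curl example.
API_URL = "https://api.ujeebu.com/markdown"


def build_markdown_request(url: str, api_key: str, chunk: str = "semantic",
                           max_tokens: int = 800,
                           include: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST /markdown request with a JSON body."""
    payload = {"url": url, "chunk": chunk, "max_tokens": max_tokens}
    if include:
        payload["include"] = include  # e.g. "main, article, .content"
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        # Content-Type is an assumption; the API key header matches the curl example.
        headers={"ApiKey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def fetch_chunks(req: urllib.request.Request) -> list:
    """Send the request and return the "chunks" list from the JSON response."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    return body["chunks"]
```

Usage: `fetch_chunks(build_markdown_request("https://docs.stripe.com/api/charges", os.environ["UJEEBU_KEY"]))` returns the `chunks` list in the shape of the response above.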
Live playground

See it work before you sign up.

Drop in a URL, run a real call against the live API, and watch the JSON come back in about a second. No API key required.

Real Markdown

Headings stay headings. Tables stay tables. Code blocks keep their language hints. Links keep their anchors. None of the GPT-flattened slop.

Smart chunking

Pick semantic (header-aware), fixed (token-bounded), sentence, or none. We respect natural document boundaries.
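To make "header-aware" concrete, here is a rough sketch of the idea behind semantic chunking: cut at Markdown headings, and only split a section further (at paragraph breaks) when it exceeds the token budget. This is an illustration, not Ujeebu's implementation. It counts whitespace-separated words as a stand-in for a real tokenizer, doesn't skip `#` lines inside fenced code, and leaves a single oversized paragraph unsplit.

```python
import re

HEADING = re.compile(r"^#{1,6} ")  # ATX headings: "# ", "## ", ... "###### "


def count_tokens(text: str) -> int:
    # Rough stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


def semantic_chunks(markdown: str, max_tokens: int = 800) -> list:
    """One chunk per heading section; oversized sections split at paragraphs."""
    # 1. Cut the document into sections, each starting at a heading line.
    sections, current = [], []
    for line in markdown.splitlines():
        if HEADING.match(line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    # 2. Keep sections whole when they fit; otherwise pack paragraphs greedily.
    chunks = []
    for section in filter(None, sections):
        if count_tokens(section) <= max_tokens:
            chunks.append(section)
            continue
        buf = []
        for para in section.split("\n\n"):
            candidate = "\n\n".join(buf + [para])
            if buf and count_tokens(candidate) > max_tokens:
                chunks.append("\n\n".join(buf))
                buf = [para]
            else:
                buf.append(para)
        if buf:
            chunks.append("\n\n".join(buf))
    return chunks
```

The greedy paragraph packing is the simplest choice that never crosses a heading boundary; a production chunker would also need to handle code fences and tables.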

Token-counted

Every chunk comes with tokens pre-counted for the tokenizer of your choice (cl100k, o200k, claude). No surprises at embedding time.
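Because each chunk arrives with a `tokens` count, you can size embedding batches before calling your embedding provider instead of discovering limits from an error. A minimal sketch, assuming chunks shaped like the example response (`id`, `tokens`, `text`); `batch_by_tokens` is a hypothetical helper and `max_batch_tokens` is whatever per-request limit your provider enforces, not a value from this page.

```python
def batch_by_tokens(chunks: list, max_batch_tokens: int) -> list:
    """Group chunks into batches whose pre-counted tokens fit the limit."""
    batches, current, used = [], [], 0
    for chunk in chunks:
        n = chunk["tokens"]  # pre-counted by the API, no local tokenizer needed
        if n > max_batch_tokens:
            raise ValueError(f"chunk {chunk['id']} has {n} tokens, over the batch limit")
        # Start a new batch when this chunk would overflow the current one.
        if current and used + n > max_batch_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(chunk)
        used += n
    if current:
        batches.append(current)
    return batches
```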

Boilerplate stripped

Nav, footer, ads, "related articles", all gone. Pass include selectors to keep specific zones, or exclude selectors to drop them.
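For intuition about what boilerplate stripping does, the sketch below uses Python's built-in `html.parser` to drop everything nested inside a few excluded tags and keep the remaining text. It is deliberately simplified and hypothetical: it matches tag names only (no CSS selectors like `.content`), its default exclude list is our own guess, and it emits plain text rather than Markdown.

```python
from html.parser import HTMLParser


class BoilerplateStripper(HTMLParser):
    """Collect text, skipping anything nested inside an excluded tag."""

    def __init__(self, exclude):
        super().__init__()
        self.exclude = set(exclude)
        self.depth = 0  # how many excluded elements we are currently inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.exclude:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.exclude and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.depth:
            self.parts.append(text)


def strip_boilerplate(html: str,
                      exclude=("nav", "header", "footer", "aside", "script", "style")) -> str:
    """Return the visible text outside the excluded tags, one run per line."""
    parser = BoilerplateStripper(exclude)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```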

Embed-ready

Pass embed: "openai" or embed: "voyage" and we return chunks with embedding vectors attached. One round-trip, vector-store-ready.

Site-crawl mode

Pass a sitemap URL with crawl: "site" and we Markdown-ify the entire docs site, deduplicate the chunks, and stream them to your webhook.
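On the receiving end, your webhook handler may still see the same content more than once, for example across repeated crawls of the same site. One way to keep ingest idempotent is to key each chunk on a hash of its whitespace-normalized text. This sketch assumes each delivery is JSON with a `chunks` list like the response at the top of the page; that payload shape and the `ingest` helper are hypothetical.

```python
import hashlib


def content_key(text: str) -> str:
    # Collapse whitespace so cosmetic reflows don't count as new content.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def ingest(payload: dict, seen: set) -> list:
    """Return only the chunks whose content hasn't been seen before."""
    fresh = []
    for chunk in payload.get("chunks", []):
        key = content_key(chunk["text"])
        if key not in seen:
            seen.add(key)
            fresh.append(chunk)
    return fresh
```

Call `ingest(payload, seen)` from whatever web framework receives the webhook and pass only the returned chunks on to embedding; persist `seen` somewhere durable if the process can restart.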

Drop-in code

Copy. Paste. Ship.

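import { Ujeebu } from "ujeebu";
const uj = new Ujeebu(process.env.UJEEBU_KEY);

const { chunks } = await uj.markdown({
  url:        "https://docs.stripe.com/api/charges",
  chunk:      "semantic",
  max_tokens: 800,
});

// `embed` and `pinecone` stand in for your own embedding helper and Pinecone index.
// `await` is only valid inside an async callback, so embed in parallel first.
const records = await Promise.all(chunks.map(async c => ({
  id: c.id, values: await embed(c.text),
})));
await pinecone.upsert(records);
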
import os
from ujeebu import Ujeebu

uj = Ujeebu(api_key=os.environ["UJEEBU_KEY"])

result = uj.markdown(
    url="https://docs.stripe.com/api/charges",
    chunk="semantic", max_tokens=800,
    embed="openai",   # vectors included
)

# `pinecone` stands in for your Pinecone index handle.
pinecone.upsert([
    (c["id"], c["embedding"], {"text": c["text"]})
    for c in result["chunks"]
])

curl -X POST https://api.ujeebu.com/markdown \
  -H "ApiKey: $UJEEBU_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url":    "https://docs.stripe.com/api/charges",
    "filter": "fit"
  }'

// Markdown-ify a whole docs site
const job = await uj.markdown({
  url:     "https://docs.stripe.com/sitemap.xml",
  crawl:   "site",
  chunk:   "semantic",
  webhook: "https://your-app.com/ingest"
});

// Chunks stream to your webhook as they finish.
console.log(job.id, job.estimated_pages); // logs the job ID and the page estimate, e.g. 4200
What people build with it

Real things real teams shipped this quarter.

RAG ingest at scale

A devtools company keeps embeddings of 18k docs sites fresh. Site-crawl mode + nightly delta. No crawler code. No HTML stripping. No reprocessing.
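A nightly delta can be as simple as comparing content fingerprints between runs. This sketch assumes you use each chunk's text hash as its vector ID, so unchanged chunks are skipped, new or edited ones are re-embedded, and vanished ones are deleted. The `fingerprint` and `nightly_delta` names are hypothetical and not part of the Ujeebu API.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 of whitespace-normalized text, used as a stable vector ID."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def nightly_delta(previous_keys: set, chunks: list):
    """Compare tonight's chunks with the fingerprints stored after the last run.

    Returns (chunks_to_embed, keys_to_delete, keys_to_store_for_next_run).
    """
    tonight = {fingerprint(c["text"]): c for c in chunks}
    to_embed = [c for key, c in tonight.items() if key not in previous_keys]
    to_delete = previous_keys - set(tonight)
    return to_embed, to_delete, set(tonight)
```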

LLM training corpora

Build domain-specific Markdown datasets. Filter by token count and language at extraction time, not in post-processing.

Doc-search products

A customer points at their docs URL. 90 seconds later they have a working "ask my docs" widget. Zero-config ingest, real-time freshness.

Ground-truth chunks for evals

Build LLM eval sets from real public content. The same page yields the same chunks day over day, so eval scores stay stable, with no "the doc moved" noise.

Ship Markdown tonight.

5,000 credits free. No card. Real residential proxies on the free tier.