Turn any URL into clean Markdown your LLM can actually read.
Web pages were not designed for context windows. Our Markdown endpoint strips the chrome, preserves headings, tables, and code, and ships citations — so your RAG and agent loops eat tokens, not navbars.
<nav>
<ul class="menu">
<li><a href="/">Home</a></li>
<li><a href="/docs">Docs</a></li>
</ul>
</nav>
<aside class="cookie-banner">
We use cookies. <button>Accept all</button>
</aside>
<main>
<article>
<h1>Quickstart</h1>
<p>Welcome to the <em>API</em>. Follow these steps:</p>
<ol>
<li>Sign up for a free account</li>
<li>Get your API key</li>
<li>Make your first request</li>
</ol>
<pre><code class="lang-bash">
curl -X GET https://api.example.com/data
</code></pre>
</article>
</main>
<footer>© 2026 · Privacy · Terms</footer>

# Quickstart

Welcome to the *API*. Follow these steps:

1. Sign up for a free account
2. Get your API key
3. Make your first request

```bash
curl -X GET https://api.example.com/data
```

filtered: nav, cookie-banner, footer
tokens: 47 · source tokens: 184
Why "just convert HTML to markdown" doesn’t work.
A working pipeline has to handle the dirty parts of the modern web — and emit Markdown an LLM can actually use.
Pages ship 60–90% boilerplate: navs, footers, cookie banners, related-articles widgets, share buttons. Naive converters drag it all into the context window.
Headings, tables, and code blocks survive the round-trip only if the converter respects the DOM. Most don’t.
For RAG, every chunk needs a stable source URL. Most "html→md" tools forget where the text came from.
Docs sites, SPAs, and modern marketing pages render in the browser. Static-only converters return empty shells.
Four workflows. One endpoint.
Feed your retrieval system, not your noise filter.
Crawl any source — docs, news, forums — and pipe clean, chunked Markdown straight into your vector store. Headings become natural chunk boundaries; tables stay tables; code blocks stay code.
- Cut ingest tokens by 60–80% vs raw HTML
- Preserve heading hierarchy as chunk metadata
- Drop in alongside LlamaIndex, LangChain, Haystack
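The "headings become chunk boundaries" idea can be sketched in a few lines of Python. This is an illustration of the technique, not the service's chunker: the function name `chunk_by_heading` and the chunk fields (`headings`, `text`, `source`) are assumptions made up for this example.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def chunk_by_heading(markdown: str, source_url: str) -> list[dict]:
    """Split Markdown at headings; each chunk keeps its heading path and source URL."""
    chunks = []
    path = []        # current heading hierarchy, e.g. ["Quickstart", "Authentication"]
    body = []        # lines collected since the last heading
    in_fence = False

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"headings": list(path), "text": text, "source": source_url})
        body.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence          # "# comment" inside code is not a heading
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()                          # close the chunk under the previous heading
            depth = len(match.group(1))
            del path[depth - 1:]             # drop headings at this depth or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks
```

Each chunk carries the full heading path, so a retriever can filter or display by section, and the `source` field keeps the URL attached to every chunk.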
Hand the agent the page, not the wallpaper.
When an autonomous agent fetches a URL, you have one chance to give it usable context. Markdown that respects structure means the model can answer questions about "the third row of the pricing table" instead of giving up.
- Drop into MCP tool / OpenAI function loops
- 1 credit per page · sub-second latency on cached pages
- Token-budget filter modes (raw / fit / bm25)
Mirror docs sites your team uses every day.
Continuously sync vendor docs, API references, and changelogs into your internal Notion/Confluence/Slack. Markdown renders losslessly in every modern wiki, and the link graph survives the import.
- Webhook-friendly batched output
- JS rendering covers SPAs (Stripe, Docusaurus, MkDocs)
- Heading anchors preserved for deep links
Bring the open web into your CMS, your way.
Editorial teams converting press releases, briefs, and competitor research need clean text, not screen-scraped soup. Markdown is the lowest-friction input format for every CMS shipped this decade.
- Strip ads + related-content blocks automatically
- Hero image + caption extracted as front-matter
- Plays nicely with Sanity, Contentful, Strapi
Three modes. Pick what fits the prompt.
For agents that pay per token, the right filter is often worth more than the right LLM. Switch modes per request.
# Quickstart

Welcome to the *API*. Follow these steps:

1. Sign up for a free account
2. Get your API key
3. Make your first request

## Authentication

All requests require an API key sent as a Bearer token in the Authorization header. Keys are issued from the dashboard.

```bash
curl -X GET https://api.example.com/data \
  -H "Authorization: Bearer YOUR_KEY"
```

## Rate Limits

| Plan     | Requests/min | Burst |
| -------- | ------------ | ----- |
| Free     | 60           | 100   |
| Pro      | 300          | 500   |
| Business | 1000         | 2000  |

…and another 18 sections of detail.
Everything in the article. Full text, full headings, full tables — no token budget enforced.
?url=...&filter=raw
# Quickstart

Welcome to the *API*. Follow these steps:

1. Sign up for a free account
2. Get your API key
3. Make your first request

## Authentication

All requests require an API key sent as a Bearer token in the Authorization header.

```bash
curl -X GET https://api.example.com/data \
  -H "Authorization: Bearer YOUR_KEY"
```

… (truncated at 600 tokens, structure preserved)
Truncates to a target token budget while keeping headings + structure. Default mode.
?url=...&filter=fit
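To make "truncate to a budget while keeping structure" concrete, here is a minimal sketch of one way such a filter could work. It is not the service's algorithm: the helper names are hypothetical, and the token estimate (one token per whitespace-separated word) is a crude stand-in for a real tokenizer.

```python
def split_blocks(markdown: str) -> list[str]:
    """Split Markdown into blank-line-separated blocks, keeping fenced code whole."""
    blocks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        if line.strip() == "" and not in_fence:
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks

def estimate_tokens(text: str) -> int:
    # Assumption: word count approximates token count closely enough for a sketch.
    return len(text.split())

def fit(markdown: str, budget: int) -> str:
    """Keep whole blocks, in document order, until the budget is spent."""
    kept, used = [], 0
    for block in split_blocks(markdown):
        cost = estimate_tokens(block)
        if used + cost > budget:
            break
        kept.append(block)
        used += cost
    # Do not end on a heading that has nothing under it.
    while kept and kept[-1].startswith("#"):
        kept.pop()
    return "\n\n".join(kept)
```

Cutting only at block boundaries is what keeps the output well-formed: a table, list, or code fence is either included whole or dropped, never split in the middle.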
# Quickstart

> query: "how do I authenticate?"

## Authentication

All requests require an API key sent as a Bearer token in the Authorization header. Keys are issued from the dashboard.

```bash
curl -X GET https://api.example.com/data \
  -H "Authorization: Bearer YOUR_KEY"
```

(showing 1 of 24 sections, ranked by relevance)
Keeps only the sections that rank highest for your query, which is useful when you know what the agent is looking for.
?url=...&filter=bm25&q=how%20do%20I%20authenticate
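For readers unfamiliar with BM25, the sketch below ranks sections against a query using the standard Okapi BM25 formula. It illustrates the ranking technique only and makes no claim about how the endpoint implements its `bm25` mode; the function names and the defaults `k1=1.5`, `b=0.75` are conventional choices, assumed here.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_rank(sections: list[str], query: str,
              k1: float = 1.5, b: float = 0.75) -> list[tuple[float, int]]:
    """Return (score, section_index) pairs, best match first."""
    if not sections:
        return []
    docs = [tokenize(s) for s in sections]
    avg_len = sum(len(d) for d in docs) / len(docs)
    # Document frequency: the number of sections each term appears in.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((len(docs) - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((score, i))
    return sorted(scores, key=lambda pair: pair[0], reverse=True)
```

This version matches exact words only. Without stemming, the query "authenticate" would not match a section that says "Authentication", so a production filter would need normalization on top of the scoring shown here.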
URL in. Clean Markdown out.
Send a URL
Call GET /markdown?url=... with optional flags for JS rendering, filter mode, and chunking. One credit per successful conversion.
We render + clean
Headless Chromium executes the page. The cleaner then walks the DOM, drops boilerplate (nav, footer, ads), and converts the rest to Markdown with its structure preserved.
You get LLM-ready text
Markdown body, chunked by heading if you asked, with a citations[] array tying every chunk back to its source URL.
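The "clean" step can be illustrated with Python's standard-library `html.parser`. This is a toy sketch under stated assumptions, not the production cleaner: it treats `nav`, `aside`, `footer`, `script`, and `style` as boilerplate, assumes language hints look like `class="lang-bash"`, handles only headings, paragraphs, flat lists, emphasis, and `pre` blocks, and does no JS rendering.

```python
from html.parser import HTMLParser

SKIP = {"nav", "aside", "footer", "script", "style"}   # assumed boilerplate containers
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class MarkdownSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        self.blocks = []       # finished Markdown blocks
        self.items = []        # list items waiting for their list to close
        self.buf = []          # text collected for the current block
        self.prefix = ""       # "# ", "1. ", "- " for the current block
        self.skip_depth = 0    # > 0 while inside a boilerplate element
        self.lists = []        # one counter per open list; None means unordered
        self.in_pre = False
        self.lang = ""

    def take(self) -> str:
        """Return the pending block (whitespace collapsed) and reset state."""
        text = " ".join("".join(self.buf).split())
        line = self.prefix + text if text else ""
        self.buf, self.prefix = [], ""
        return line

    def flush(self):
        line = self.take()
        if line:
            self.blocks.append(line)

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        if self.skip_depth:
            return
        if tag in HEADINGS:
            self.prefix = "#" * int(tag[1]) + " "
        elif tag in ("ol", "ul"):
            self.flush()
            self.lists.append(0 if tag == "ol" else None)
        elif tag == "li" and self.lists:
            if self.lists[-1] is None:
                self.prefix = "- "
            else:
                self.lists[-1] += 1
                self.prefix = f"{self.lists[-1]}. "
        elif tag == "pre":
            self.flush()
            self.in_pre = True
        elif tag == "code" and self.in_pre:
            css = dict(attrs).get("class") or ""
            self.lang = css.removeprefix("lang-")
        elif tag in ("em", "i"):
            self.buf.append("*")

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in ("em", "i"):
            self.buf.append("*")
        elif tag == "li":
            line = self.take()
            if line:
                self.items.append(line)
        elif tag in ("ol", "ul") and self.lists:
            self.lists.pop()
            if not self.lists and self.items:
                self.blocks.append("\n".join(self.items))
                self.items = []
        elif tag == "pre":
            code = "".join(self.buf).strip("\n")   # keep inner whitespace as-is
            self.blocks.append(f"```{self.lang}\n{code}\n```")
            self.buf, self.in_pre, self.lang = [], False, ""
        elif tag in HEADINGS or tag == "p":
            self.flush()

    def handle_data(self, data):
        if not self.skip_depth:
            self.buf.append(data)

def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.blocks)
```

Tracking a skip depth rather than a boolean means nested boilerplate (a list inside a `nav`, a button inside a cookie banner) stays excluded until the outermost container closes.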
Drop a URL in. See the Markdown.
Use the playground for a real round-trip — your URL, our renderer, fully cited output.
curl 'https://api.ujeebu.com/markdown' \
-H 'ApiKey: YOUR_API_KEY' \
-G \
--data-urlencode 'url=https://docs.stripe.com/api' \
--data-urlencode 'filter=fit' \
--data-urlencode 'js=true'
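The same request can be built from Python with only the standard library. This is a sketch, not an official client: the endpoint, the `ApiKey` header, and the `url`, `filter`, and `js` parameters are taken from the curl command above, while the function names `build_request` and `fetch_markdown` are assumptions. The response format is not shown on this page, so the body is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://api.ujeebu.com/markdown"

def build_request(page_url: str, api_key: str,
                  filter_mode: str = "fit", js: bool = True) -> Request:
    """Build (but do not send) the GET request that the curl command issues."""
    params = {"url": page_url, "filter": filter_mode}
    if js:
        params["js"] = "true"    # only the js=true form appears in the source
    return Request(f"{ENDPOINT}?{urlencode(params)}", headers={"ApiKey": api_key})

def fetch_markdown(request: Request, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A call such as `fetch_markdown(build_request("https://docs.stripe.com/api", "YOUR_API_KEY"))` mirrors the curl example. Building the request separately from sending it makes the URL encoding easy to inspect before any credit is spent.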
Everything a clean Markdown pipeline needs.
Smart filtering
Three filter modes — raw, fit, bm25 — keep token budgets predictable. Pick per request.
Citations included
Every response includes a citations[] array tying each chunk back to its source URL. Drop straight into a RAG store.
JS rendering
Real headless Chromium when you need it. Docusaurus, MkDocs, Notion — the SPAs your team actually reads.
Heading hierarchy
Headings survive the round-trip with depth intact, so chunkers can use them as natural boundaries.
Code & table fidelity
Code blocks keep their language hint. Tables stay tables. Lists stay lists. Inline emphasis preserved.
Anti-bot when needed
Heavily protected pages? Pass premium_proxy=true and stealth=true. Same endpoint, no separate API.
Frequently asked.
How is this different from Firecrawl, Jina Reader, or just calling html-to-md?
What page types work best?
Does it work for SPAs / JS-rendered pages?
How do filter modes work?
Can I get chunks instead of a single Markdown blob?
What does it cost?
Ship Markdown your LLM actually likes.
One endpoint. One credit per page. Citations included. Free tier covers ~5,000 conversions before you spend a dollar.