Use case · Markdown for AI

Turn any URL into clean Markdown your LLM can actually read.

Web pages were not designed for context windows. Our Markdown endpoint strips the chrome, preserves headings, tables, and code, and ships citations - so your RAG and agent loops eat tokens, not navbars.


5,000 free credits · no card · 1 credit per markdown conversion

Source HTML

docs.example.com/quickstart

<nav>
  <ul class="menu">
    <li><a href="/">Home</a></li>
    <li><a href="/docs">Docs</a></li>
  </ul>
</nav>

<aside class="cookie-banner">
  We use cookies. <button>Accept all</button>
</aside>

<main>
  <article>
    <h1>Quickstart</h1>
    <p>Welcome to the <em>API</em>. Follow these steps:</p>
    <ol>
      <li>Sign up for a free account</li>
      <li>Get your API key</li>
      <li>Make your first request</li>
    </ol>
    <pre><code class="lang-bash">
curl -X GET https://api.example.com/data
    </code></pre>
  </article>
</main>

<footer>© 2026 · Privacy · Terms</footer>

Clean Markdown

ready for context window

# Quickstart

Welcome to the *API*. Follow these steps:

1. Sign up for a free account
2. Get your API key
3. Make your first request

```bash
curl -X GET https://api.example.com/data
```

- filtered: nav, cookie-banner, footer
- tokens: 47  ·  source tokens: 184

The challenge

Why "just convert HTML to markdown" doesn’t work.

A working pipeline has to handle the dirty parts of the modern web - and emit Markdown an LLM can actually use.

Noisy HTML

Pages ship 60–90% boilerplate: navs, footers, cookie banners, related-articles widgets, share buttons. Naive converters drag it all into the context window.

Lost structure

Headings, tables, and code blocks survive the round-trip only if the converter respects the DOM. Most don’t.

No citations

For RAG, every chunk needs a stable source URL. Most "html→md" tools forget where the text came from.

JS-rendered pages

Docs sites, SPAs, and modern marketing pages render in the browser. Static-only converters return empty shells.

Use cases

Four workflows. One endpoint.

RAG pipelines

Feed your retrieval system, not your noise filter.

Crawl any source - docs, news, forums - and pipe clean, chunked Markdown straight into your vector store. Headings become natural chunk boundaries; tables stay tables; code blocks stay code.

What you get

Cut ingest tokens by 60–80% vs raw HTML
Preserve heading hierarchy as chunk metadata
Drop in alongside LlamaIndex, LangChain, Haystack
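To make "headings become natural chunk boundaries" concrete, here is a minimal sketch of a heading-based chunker you could run on returned Markdown yourself. The names `Chunk` and `chunk_by_heading` are illustrative, not part of the API, and this is not necessarily how the endpoint chunks internally.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = re.compile(r"^(```|~~~)")


@dataclass
class Chunk:
    heading_path: list[str]  # e.g. ["Quickstart", "Authentication"]
    text: str


def chunk_by_heading(markdown: str) -> list[Chunk]:
    """Split Markdown at ATX headings, keeping each chunk's heading trail."""
    chunks: list[Chunk] = []
    path: list[tuple[int, str]] = []  # stack of (level, title)
    buf: list[str] = []
    in_fence = False

    def flush() -> None:
        body = "\n".join(buf).strip()
        if body:
            chunks.append(Chunk([title for _, title in path], body))
        buf.clear()

    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence
        # A "#" line inside a code fence is a comment, not a heading.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            buf.append(line)
    flush()
    return chunks
```

Storing the heading trail as metadata lets a retriever show "Quickstart > Authentication" next to each hit without re-parsing the page.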

Agent context

Hand the agent the page, not the wallpaper.

When an autonomous agent fetches a URL, you have one chance to give it usable context. Markdown that respects structure means the model can answer a question about "the third row of the pricing table" instead of giving up.

What you get

Drop into MCP tool / OpenAI function loops
1 credit per page · sub-second latency on cached pages
Token-budget filter modes (raw / fit / bm25)

Knowledge base sync

Mirror docs sites your team uses every day.

Continuously sync vendor docs, API references, and changelogs into your internal Notion, Confluence, or Slack. Markdown renders losslessly in every modern wiki, and the link graph survives the import.

What you get

Webhook-friendly batched output
JS rendering covers SPAs and JS-heavy docs sites (Stripe, Docusaurus, MkDocs)
Heading anchors preserved for deep links

Content workflows

Bring the open web into your CMS, your way.

Editorial teams converting press releases, briefs, and competitor research need clean text, not screen-scraped soup. Markdown is the lowest-friction input format for every CMS shipped this decade.

What you get

Strip ads + related-content blocks automatically
Hero image + caption extracted as front-matter
Plays nicely with Sanity, Contentful, Strapi

Filter modes

Three modes. Pick what fits the prompt.

For agents that pay per token, the right filter is often worth more than the right LLM. Switch modes per request.

response.markdown · ~4,218 tokens

# Quickstart

Welcome to the *API*. Follow these steps:

1. Sign up for a free account
2. Get your API key
3. Make your first request

## Authentication

All requests require an API key sent as a Bearer token in the
Authorization header. Keys are issued from the dashboard.

```bash
curl -X GET https://api.example.com/data \
  -H "Authorization: Bearer YOUR_KEY"
```

## Rate Limits

| Plan     | Requests/min | Burst |
| -------- | ------------ | ----- |
| Free     | 60           | 100   |
| Pro      | 300          | 500   |
| Business | 1000         | 2000  |

…and another 18 sections of detail.

When to use

Everything in the article. Full text, full headings, full tables - no token budget enforced.

Token budget: None

Structure kept: Full

Citations: Yes - every output

Credit cost: 1 / page

?url=...&filter=raw

response.markdown · ~612 tokens

# Quickstart

Welcome to the *API*. Follow these steps:

1. Sign up for a free account
2. Get your API key
3. Make your first request

## Authentication

All requests require an API key sent as a Bearer token in the
Authorization header.

```bash
curl -X GET https://api.example.com/data \
  -H "Authorization: Bearer YOUR_KEY"
```

… (truncated at 600 tokens, structure preserved)

When to use

Truncates to a target token budget while keeping headings + structure. Default mode.

Token budget: target_tokens

Structure kept: Full

Citations: Yes - every output

Credit cost: 1 / page

?url=...&filter=fit
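The idea behind fit mode, truncating to a budget without cutting through structure, can be sketched client-side as follows. This is an illustration under assumptions, not the endpoint's algorithm: `approx_tokens` is a crude word count standing in for a real tokenizer, and `fit` and `split_blocks` are hypothetical helper names.

```python
def approx_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    return len(text.split())


def split_blocks(markdown: str) -> list[str]:
    """Split on blank lines, but keep fenced code blocks in one piece."""
    blocks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence
        if not line.strip() and not in_fence:
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def fit(markdown: str, target_tokens: int) -> str:
    """Keep whole blocks, in order, until the budget would be exceeded."""
    kept: list[str] = []
    used = 0
    truncated = False
    for block in split_blocks(markdown):
        cost = approx_tokens(block)
        if used + cost > target_tokens:
            truncated = True
            break
        kept.append(block)
        used += cost
    # If we cut right after a heading, drop it: a heading with no body is noise.
    while truncated and kept and kept[-1].startswith("#"):
        kept.pop()
    return "\n\n".join(kept)
```

Cutting at block boundaries is the design choice that matters here: a code fence or table is either kept whole or dropped, never split in half.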

response.markdown · ~184 tokens

# Quickstart

> query: "how do I authenticate?"

## Authentication

All requests require an API key sent as a Bearer token in the
Authorization header. Keys are issued from the dashboard.

```bash
curl -X GET https://api.example.com/data \
  -H "Authorization: Bearer YOUR_KEY"
```

(showing 1 of 24 sections, ranked by relevance)

When to use

Keeps only the paragraphs that match your query - useful when you know what the agent is looking for.

Token budget: top-N by relevance

Structure kept: Per-section

Citations: Yes - every output

Credit cost: 1 / page

?url=...&filter=bm25&q=how%20do%20I%20authenticate
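For readers unfamiliar with BM25, here is a small self-contained sketch of the standard Okapi BM25 score applied to page sections. It illustrates the ranking idea only: the parameters `k1=1.5` and `b=0.75` are common defaults chosen for this example, `bm25_rank` is a hypothetical name, and the sketch does no stemming, so the query term "authenticate" would not match "authentication" here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, sections: list[str],
              k1: float = 1.5, b: float = 0.75) -> list[tuple[float, int]]:
    """Return (score, section_index) pairs, best match first."""
    docs = [tokenize(s) for s in sections]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: in how many sections does each term appear?
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((score, i))
    return sorted(scores, key=lambda s: (-s[0], s[1]))


sections = [
    "Welcome to the API. Follow these steps.",
    "Authentication: all requests require an API key sent as a Bearer token.",
    "Rate limits: the free plan allows 60 requests per minute.",
]
best_score, best_index = bm25_rank("bearer token authentication", sections)[0]
print(best_index)  # index of the Authentication section
```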

How it works

URL in. Clean Markdown out.

1. Send a URL

Call GET /markdown?url=... with optional flags for JS rendering, filter mode, and chunking. A successful conversion costs one credit, or five when JS rendering is on.

2. We render + clean

When JS rendering is requested, headless Chromium executes the page. The cleaner then walks the DOM, drops boilerplate (nav, footer, ads), and converts the rest to Markdown with its structure preserved.
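The "walk the DOM, drop boilerplate, convert the rest" step can be sketched with Python's standard-library HTML parser. This is a toy under stated assumptions, not the production cleaner: it works on static HTML only, treats a fixed set of tags (`SKIP`) as boilerplate, and handles just headings, paragraphs, ordered lists, emphasis, and code blocks with a `lang-` class.

```python
from html.parser import HTMLParser

SKIP = {"nav", "aside", "footer", "script", "style"}  # assumed boilerplate tags
BLOCKS = {"h1", "p", "li", "pre"}
FENCE = "`" * 3


class Cleaner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate element
        self.blocks = []     # finished Markdown blocks
        self.buf = []        # text of the block being read
        self.item = 0        # counter for the current ordered list
        self.lang = ""       # language hint for the current code block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in BLOCKS:
            self.buf = []
        elif tag == "ol":
            self.item = 0
        elif tag == "em":
            self.buf.append("*")
        elif tag == "code":
            cls = dict(attrs).get("class") or ""
            self.lang = cls.removeprefix("lang-")

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag == "em":
            self.buf.append("*")
        elif tag == "pre":
            code = "".join(self.buf).strip()
            self.blocks.append(f"{FENCE}{self.lang}\n{code}\n{FENCE}")
            self.buf, self.lang = [], ""
        elif tag in BLOCKS:
            text = " ".join("".join(self.buf).split())
            self.buf = []
            if tag == "h1":
                self.blocks.append("# " + text)
            elif tag == "li":
                self.item += 1
                self.blocks.append(f"{self.item}. {text}")
            else:
                self.blocks.append(text)

    def handle_data(self, data):
        if not self.skip_depth:
            self.buf.append(data)


def html_to_markdown_blocks(html: str) -> list[str]:
    cleaner = Cleaner()
    cleaner.feed(html)
    cleaner.close()
    return cleaner.blocks
```

Fed the Source HTML sample from the top of this page, the sketch keeps the heading, paragraph, list items, and code block, and drops the nav, cookie banner, and footer.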

3. You get LLM-ready text

Markdown body, chunked by heading if you asked, with a citations[] array tying every chunk back to its source URL.

Try it

Drop a URL in. See the Markdown.

Use the playground for a real round-trip - your URL, our renderer, fully cited output.


curl 'https://api.ujeebu.com/markdown' \
  -H 'ApiKey: YOUR_API_KEY' \
  -G \
  --data-urlencode 'url=https://docs.stripe.com/api' \
  --data-urlencode 'filter=fit' \
  --data-urlencode 'js=true'
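The same request can be built in Python with only the standard library. This sketch mirrors the curl call above (same endpoint, `ApiKey` header, and query parameters); the helper name `build_markdown_request` is made up for the example, and the request is constructed but not sent, since the response format is not described here.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.ujeebu.com/markdown"  # endpoint taken from the curl example


def build_markdown_request(url: str, api_key: str, **params: str) -> Request:
    """Build (but do not send) a GET request for the Markdown endpoint."""
    query = urlencode({"url": url, **params})  # percent-encodes the target URL
    return Request(f"{API_BASE}?{query}", headers={"ApiKey": api_key}, method="GET")


req = build_markdown_request(
    "https://docs.stripe.com/api", "YOUR_API_KEY", filter="fit", js="true"
)
print(req.full_url)
# To send it: urllib.request.urlopen(req), then read and decode the body.
```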

No API key required for testing in the playground. Powered by /markdown

Features

Everything a clean Markdown pipeline needs.

Smart filtering

Three filter modes - raw, fit, bm25 - keep token budgets predictable. Pick per request.

Citations included

Every response includes a citations[] array tying each chunk back to its source URL. Drop straight into a RAG store.

JS rendering

Real headless Chromium when you need it. Docusaurus, MkDocs, Notion - the JS-heavy sites your team actually reads.

Heading hierarchy

Headings survive the round-trip with depth intact, so chunkers can use them as natural boundaries.

Code & table fidelity

Code blocks keep their language hint. Tables stay tables. Lists stay lists. Inline emphasis preserved.

Anti-bot when needed

Heavily protected pages? Pass premium_proxy=true and stealth=true. Same endpoint, no separate API.


FAQ

Frequently asked.

How is this different from Firecrawl, Jina Reader, or just calling html-to-md?

All three return Markdown. The differences: we ship filter modes (raw / fit / bm25) so you control token spend per request, we include citations[] for RAG, and we share one credit pool with our scraper/SERP/AI endpoints - no separate subscription. We’re also the cheapest if you’re already paying us for proxies.

What page types work best?

Docs sites, news articles, blogs, knowledge bases, FAQ pages, product changelogs, API references. Highly interactive apps (Figma, Notion live editing) are out of scope - the Markdown we’d return wouldn’t represent the page meaningfully.

Does it work for SPAs / JS-rendered pages?

Yes. Pass js=true and we render the page with headless Chromium before converting it. It is the same flag our /scrape endpoint uses, and it inherits the same anti-bot stack.

How do filter modes work?

raw returns everything we extract - full Markdown, no token cap. fit truncates to a target token budget while keeping headings intact. bm25 ranks paragraphs by your query and returns the most relevant ones. The default is fit with a 2K-token target.

Can I get chunks instead of a single Markdown blob?

Yes - pass chunk=true. We return an array of chunks split by heading (h1/h2/h3 depending on document length), each with its own citation URL and heading path. Drop straight into a vector store.
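As a rough illustration of "drop straight into a vector store", the sketch below turns chunks into upsert-ready records. The field names `markdown`, `citation`, and `heading_path` are assumptions made for this example: the text above says each chunk carries a citation URL and a heading path, but it does not give the response schema, so check the API reference for the real keys.

```python
import hashlib


def to_records(chunks: list[dict]) -> list[dict]:
    """Map chunk dicts (assumed shape) to records with stable IDs."""
    records = []
    for chunk in chunks:
        source = chunk["citation"]                 # assumed key
        path = " > ".join(chunk["heading_path"])   # assumed key
        # Hash source + heading path so re-syncing a page overwrites, not duplicates.
        digest = hashlib.sha256(f"{source}#{path}".encode()).hexdigest()[:16]
        records.append({
            "id": digest,
            "text": chunk["markdown"],             # assumed key
            "metadata": {"source": source, "heading_path": path},
        })
    return records
```

Deriving the ID from the source URL and heading path, rather than from the text, means an edited section replaces its old vector on the next sync.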

What does it cost?

1 credit per successful conversion, or 5 credits for a JS-rendered page. Failed requests cost nothing. Other vendors add a per-token markup on top; we charge only for the conversion.
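A quick way to budget a crawl from the prices above (1 credit per plain conversion, 5 per JS-rendered page, 0 for failures). The function name `estimate_credits` is illustrative.

```python
def estimate_credits(static_ok: int, js_ok: int, failed: int = 0) -> int:
    """Credits consumed by a batch, using the per-page prices quoted above."""
    return static_ok * 1 + js_ok * 5 + failed * 0


# 1,000 static pages, 200 JS-rendered pages, 50 failures
print(estimate_credits(1000, 200, 50))  # 2000
```

By the same arithmetic, the 5,000 free credits cover 5,000 static conversions or 1,000 JS-rendered ones.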

5,000 free credits to start.

No credit card. Failed requests cost zero.


Ship Markdown your LLM actually likes.

One endpoint. One credit per page. Citations included. Free tier covers ~5,000 conversions before you spend a dollar.
