
AI Scraper

Extract structured data from websites using natural language prompts powered by Large Language Models

Overview

The AI Scraper API uses Large Language Models to extract structured data from websites based on simple natural language instructions. Instead of writing complex CSS selectors or parsing HTML, you describe what you want to extract in plain English, and the AI does the heavy lifting.

Perfect for:

Extracting data from complex or dynamically structured pages
Websites where CSS selectors change frequently
Unstructured content like articles, reviews, or social media
Quick prototyping without reverse-engineering page structure
When you need flexibility and don't want to maintain selectors

Quick Start

Basic Extraction

The simplest way to use AI Scraper is with just a URL and a prompt:

curl -X POST https://api.ujeebu.com/ai-scraper \
  -H "Content-Type: application/json" \
  -H "ApiKey: YOUR_API_KEY" \
  -d '{
    "url": "https://example.com/product",
    "prompt": "Extract the product name, price, and rating"
  }'

import { UjeebuClient } from '@ujeebu-org/ujeebu-sdk';

const client = new UjeebuClient('YOUR_API_KEY');

const response = await client.aiScrape(
  'https://example.com/product',
  'Extract the product name, price, and rating'
);

console.log(response.data.data);

from ujeebu_python import UjeebuClient

client = UjeebuClient('YOUR_API_KEY')

response = client.ai_scrape(
    'https://example.com/product',
    'Extract the product name, price, and rating'
)

print(response.json()['data'])

package main

import (
    "fmt"
    "log"
    ujeebu "github.com/ujeebu/ujeebu-go"
)

func main() {
    client, err := ujeebu.NewClient("YOUR_API_KEY")
    if err != nil {
        log.Fatal(err)
    }

    result, _, err := client.AIScrape(ujeebu.AIScrapeParams{
        URL:    "https://example.com/product",
        Prompt: "Extract the product name, price, and rating",
    })

    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(result.Data)
}

Response

{
  "success": true,
  "data": {
    "product_name": "Wireless Headphones",
    "price": "$79.99",
    "rating": 4.5
  },
  "metadata": {
    "html_length": 15420,
    "chunks_processed": 2,
    "extraction_time_ms": 2300,
    "input_tokens": 3200,
    "output_tokens": 150
  }
}

Request Parameters

Required Parameters

Parameter	Type	Required	Default	Description
`url`	`string`	Yes	`-`	URL of the webpage to scrape. Supports all standard scraping features (JavaScript rendering, proxies, etc.).
`prompt`	`string`	Conditional	`-`	Natural language instruction describing what data to extract. Be specific about field names and data types. Required unless a `schema` is provided; when a schema is given, the prompt is optional.

AI Configuration

Parameter	Type	Required	Default	Description
`temperature`	`number`	No	`0.0`	LLM temperature (0.0–1.0). Lower values make output more deterministic; higher values make it more varied.
`schema`	`object`	No		JSON schema defining the expected structure of the extracted data. Helps keep output consistent and type-safe.

Standard Scraping Parameters

All standard scraping parameters are supported:

Parameter	Type	Required	Default	Description
`js`	`boolean`	No	`false`	Enable JavaScript rendering in the browser.
`proxy_type`	`string`	No		Proxy type: 'rotating', 'advanced', 'premium', 'residential', 'residential_us', 'residential_geo'. Auto proxy is enabled by default if not specified.
`proxy_country`	`string`	No		ISO country code for proxy location (e.g., 'US', 'GB', 'FR').
`timeout`	`number`	No	`120`	Request timeout in seconds.
`wait_for`	`string | number`	No
`auto_captcha_solve`	`boolean`	No	`true`	Enable automatic CAPTCHA solving. Note: unlike other endpoints, this defaults to `true` here.
`auto_captcha_solve_timeout`	`number`	No		Timeout in milliseconds for CAPTCHA solving.
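To see how these parameters fit together in a raw HTTP call, here is a minimal Python sketch using only the standard library. The function name `build_ai_scrape_request` and its client-side checks are illustrative assumptions, not part of any Ujeebu SDK; it mirrors the endpoint and headers from the curl examples and the "prompt unless schema" rule, and it builds the request without sending it.

```python
import json
import urllib.request

# Endpoint and header names as shown in the curl examples above.
API_ENDPOINT = "https://api.ujeebu.com/ai-scraper"

# Proxy types listed in the parameter table above.
PROXY_TYPES = {
    "rotating", "advanced", "premium",
    "residential", "residential_us", "residential_geo",
}


def build_ai_scrape_request(api_key, url, prompt=None, schema=None, **options):
    """Build (but do not send) a POST request for the AI Scraper endpoint.

    Hypothetical helper: the checks mirror the documented rules, but the
    server stays the final authority on what is valid.
    """
    if not url:
        raise ValueError("url is required")
    # `prompt` may only be omitted when a `schema` is provided.
    if not prompt and schema is None:
        raise ValueError("provide a prompt, a schema, or both")

    # Only checked when given; otherwise the server default (0.0) applies.
    if "temperature" in options and not 0.0 <= options["temperature"] <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    proxy_type = options.get("proxy_type")
    if proxy_type is not None and proxy_type not in PROXY_TYPES:
        raise ValueError(f"unknown proxy_type: {proxy_type}")

    # Any other scraping parameters (js, timeout, ...) pass through as-is.
    body = {"url": url, **options}
    if prompt:
        body["prompt"] = prompt
    if schema is not None:
        body["schema"] = schema

    return urllib.request.Request(
        API_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "ApiKey": api_key},
        method="POST",
    )
```

To actually send it, pass the request to `urllib.request.urlopen(req)` and parse the JSON body.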

Structured Extraction with Schemas

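A schema defines the exact structure you expect, which keeps the output consistent and type-safe. If the extracted data does not fully match it, the response includes `metadata.validation_warnings`.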

Basic Schema Example

{
  "url": "https://example.com/product",
  "prompt": "Extract product with variants and reviews",
  "schema": {
    "type": "object",
    "properties": {
      "name": { "type": "string" },
      "variants": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "color": { "type": "string" },
            "size": { "type": "string" },
            "price": { "type": "number" },
            "available": { "type": "boolean" }
          }
        }
      },
      "reviews": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "author": { "type": "string" },
            "rating": { "type": "integer" },
            "comment": { "type": "string" },
            "date": { "type": "string" }
          }
        }
      }
    }
  }
}

Schema Response

{
  "success": true,
  "data": {
    "name": "Premium T-Shirt",
    "variants": [
      {
        "color": "Blue",
        "size": "M",
        "price": 29.99,
        "available": true
      },
      {
        "color": "Red",
        "size": "L",
        "price": 29.99,
        "available": false
      }
    ],
    "reviews": [
      {
        "author": "John D.",
        "rating": 5,
        "comment": "Great quality!",
        "date": "2025-01-05"
      }
    ]
  },
  "metadata": {
    "html_length": 24530,
    "chunks_processed": 3,
    "extraction_time_ms": 3100,
    "input_tokens": 5400,
    "output_tokens": 280
  }
}
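The server reports mismatches in `metadata.validation_warnings`, but you may also want a local check before storing results. The sketch below is a hypothetical helper (`type_mismatches`) that only covers the `type`, `properties`, and `items` keywords used in the schema above; it is not a full JSON Schema validator (a library such as `jsonschema` covers the whole specification).

```python
# JSON Schema type names -> Python types produced by json.loads.
_PY_TYPES = {
    "string": str,
    "number": (int, float),
    "array": list,
    "object": dict,
    "null": type(None),
}


def _matches(value, expected):
    # bool is a subclass of int in Python, so handle it first.
    if isinstance(value, bool):
        return expected == "boolean"
    if expected == "boolean":
        return False
    if expected == "integer":
        return isinstance(value, int) or (isinstance(value, float) and value.is_integer())
    return isinstance(value, _PY_TYPES[expected])


def type_mismatches(value, schema, path="$"):
    """Return readable paths where `value` breaks the schema's `type` rules."""
    expected = schema.get("type")
    if expected is not None:
        # `type` may be a single name or a list such as ["string", "null"].
        allowed = expected if isinstance(expected, list) else [expected]
        if not any(_matches(value, t) for t in allowed):
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    problems = []
    if isinstance(value, dict):
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                problems += type_mismatches(value[key], sub_schema, f"{path}.{key}")
    elif isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            problems += type_mismatches(item, schema["items"], f"{path}[{i}]")
    return problems
```

Run it as `type_mismatches(body["data"], schema)`; an empty list means every present field has the declared type.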

Common Examples

Example 1: E-commerce Product

Extract product details with variants and pricing:

curl -X POST https://api.ujeebu.com/ai-scraper \
  -H "Content-Type: application/json" \
  -H "ApiKey: YOUR_API_KEY" \
  -d '{
    "url": "https://shop.example.com/product",
    "prompt": "Extract product name, price, currency, rating, and available colors",
    "schema": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "rating": {"type": "number"},
        "colors": {
          "type": "array",
          "items": {"type": "string"}
        }
      },
      "required": ["name", "price", "currency"]
    }
  }'

import { UjeebuClient } from '@ujeebu-org/ujeebu-sdk';

const client = new UjeebuClient('YOUR_API_KEY');

const response = await client.aiScrape(
  'https://shop.example.com/product',
  'Extract product name, price, currency, rating, and available colors',
  {
    schema: {
      type: 'object',
      properties: {
        name: { type: 'string' },
        price: { type: 'number' },
        currency: { type: 'string' },
        rating: { type: 'number' },
        colors: {
          type: 'array',
          items: { type: 'string' }
        }
      },
      required: ['name', 'price', 'currency']
    }
  }
);

console.log(response.data.data);

from ujeebu_python import UjeebuClient

client = UjeebuClient('YOUR_API_KEY')

response = client.ai_scrape(
    'https://shop.example.com/product',
    'Extract product name, price, currency, rating, and available colors',
    params={
        'schema': {
            'type': 'object',
            'properties': {
                'name': {'type': 'string'},
                'price': {'type': 'number'},
                'currency': {'type': 'string'},
                'rating': {'type': 'number'},
                'colors': {
                    'type': 'array',
                    'items': {'type': 'string'}
                }
            },
            'required': ['name', 'price', 'currency']
        }
    }
)

data = response.json()
print(data['data'])

package main

import (
    "fmt"
    "log"
    ujeebu "github.com/ujeebu/ujeebu-go"
)

func main() {
    client, err := ujeebu.NewClient("YOUR_API_KEY")
    if err != nil {
        log.Fatal(err)
    }

    result, _, err := client.AIScrape(ujeebu.AIScrapeParams{
        URL:    "https://shop.example.com/product",
        Prompt: "Extract product name, price, currency, rating, and available colors",
        Schema: map[string]any{
            "type": "object",
            "properties": map[string]any{
                "name":     map[string]any{"type": "string"},
                "price":    map[string]any{"type": "number"},
                "currency": map[string]any{"type": "string"},
                "rating":   map[string]any{"type": "number"},
                "colors": map[string]any{
                    "type":  "array",
                    "items": map[string]any{"type": "string"},
                },
            },
            "required": []string{"name", "price", "currency"},
        },
    })

    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(result.Data)
}

Example 2: News Article Extraction

Extract article content with metadata:

{
  "url": "https://news.example.com/article",
  "prompt": "Extract the article headline, author, publish date, full content, and tags",
  "schema": {
    "type": "object",
    "properties": {
      "headline": { "type": "string" },
      "subheadline": { "type": "string" },
      "author": { "type": "string" },
      "publish_date": { "type": "string" },
      "content": { "type": "string" },
      "tags": {
        "type": "array",
        "items": { "type": "string" }
      },
      "category": { "type": "string" }
    }
  }
}

Example 3: Restaurant Reviews

Extract multiple reviews from a restaurant page:

{
  "url": "https://restaurant-reviews.example.com/place/123",
  "prompt": "Extract all customer reviews including ratings and dates",
  "schema": {
    "type": "object",
    "properties": {
      "restaurant_name": { "type": "string" },
      "overall_rating": { "type": "number" },
      "reviews": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "author": { "type": "string" },
            "rating": { "type": "integer" },
            "title": { "type": "string" },
            "comment": { "type": "string" },
            "date": { "type": "string" },
            "helpful_count": { "type": "integer" }
          }
        }
      }
    }
  }
}

Example 4: Contact Information

Extract contact details from a business website:

{
  "url": "https://business.example.com/contact",
  "prompt": "Extract all contact information including phone, email, address, and social media",
  "schema": {
    "type": "object",
    "properties": {
      "phone": { "type": "string" },
      "email": { "type": "string" },
      "address": {
        "type": "object",
        "properties": {
          "street": { "type": "string" },
          "city": { "type": "string" },
          "state": { "type": "string" },
          "zip": { "type": "string" },
          "country": { "type": "string" }
        }
      },
      "social_media": {
        "type": "object",
        "properties": {
          "facebook": { "type": "string" },
          "twitter": { "type": "string" },
          "instagram": { "type": "string" },
          "linkedin": { "type": "string" }
        }
      }
    }
  }
}

Best Practices

Writing Effective Prompts

Be Specific:

❌ "Get product info"
✅ "Extract the product name, price in USD, availability status, and customer rating"

Mention Field Names:

❌ "Extract price and stock"
✅ "Extract 'price' as a number and 'in_stock' as a boolean"

Specify Data Types:

❌ "Extract the rating"
✅ "Extract the rating as a decimal number between 0 and 5"

Handle Missing or Messy Data:

✅ "If rating is not available, return null"
✅ "If price includes currency symbol, remove it and return only the number"
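If you generate prompts programmatically, the tips above (name each field, state its type, say what to return when it is missing) can be applied consistently with a small helper. `build_prompt` is a hypothetical illustration; the wording it produces is only one reasonable phrasing, not a format the API requires.

```python
def build_prompt(fields, missing="null"):
    """Compose an extraction prompt that names each field and its type.

    `fields` maps output field names to a plain-English type description.
    """
    if not fields:
        raise ValueError("at least one field is required")
    parts = [f"'{name}' as {desc}" for name, desc in fields.items()]
    if len(parts) == 1:
        listed = parts[0]
    elif len(parts) == 2:
        listed = f"{parts[0]} and {parts[1]}"
    else:
        listed = ", ".join(parts[:-1]) + ", and " + parts[-1]
    return f"Extract {listed}. If a field is not available, return {missing}."
```

For example, `build_prompt({"price": "a number", "in_stock": "a boolean"})` yields "Extract 'price' as a number and 'in_stock' as a boolean. If a field is not available, return null."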

Schema Design Tips

Use Required Fields: Mark essential fields as required so the extraction treats them as mandatory:

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "price": { "type": "number" }
  },
  "required": ["name", "price"]
}

Define Defaults: Provide default values for optional fields:

{
  "properties": {
    "rating": { "type": "number", "default": 0 },
    "in_stock": { "type": "boolean", "default": false }
  }
}
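In JSON Schema, `default` is an annotation: standard validators do not insert it, and this page does not say whether the AI Scraper fills defaults in for you. If your pipeline needs them guaranteed, a client-side pass is a safe fallback. A minimal sketch, assuming a flat schema like the one above; `apply_defaults` is a hypothetical helper, and it treats `null` as missing to match the "return null" prompt advice.

```python
import copy


def apply_defaults(data, schema):
    """Return a copy of `data` with missing or null top-level fields set from `default`."""
    result = copy.deepcopy(data) if data is not None else {}
    for key, sub_schema in schema.get("properties", {}).items():
        if result.get(key) is None and "default" in sub_schema:
            # Copy the default so callers can't mutate the schema through it.
            result[key] = copy.deepcopy(sub_schema["default"])
    return result
```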

Use Enums for Fixed Values: Constrain values to specific options:

{
  "properties": {
    "size": {
      "type": "string",
      "enum": ["S", "M", "L", "XL"]
    }
  }
}
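To confirm that `required` and `enum` constraints actually held in a response, a quick local check can help. This is a hypothetical sketch for top-level fields only. Note that JSON Schema's `required` only checks that a key is present; this helper is stricter and also treats `null` as missing.

```python
def constraint_violations(data, schema):
    """Check top-level `required` and `enum` constraints on extracted data."""
    problems = []
    for key in schema.get("required", []):
        # Stricter than JSON Schema: a null value counts as missing here.
        if data.get(key) is None:
            problems.append(f"missing required field: {key}")
    for key, sub_schema in schema.get("properties", {}).items():
        allowed = sub_schema.get("enum")
        if allowed is not None and key in data and data[key] not in allowed:
            problems.append(f"{key}: {data[key]!r} not in {allowed}")
    return problems
```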

Performance Optimization

Enable JavaScript Only When Needed: Static pages don't need JavaScript rendering, and leaving `js` at its default of `false` makes them faster to process:

{
  "url": "https://static-site.example.com",
  "prompt": "Extract content",
  "js": false
}

Narrow the Scope on Large Pages: Content is chunked automatically (the count is reported in `metadata.chunks_processed`), but for very large pages, consider asking for specific sections only:

{
  "prompt": "Extract only the main article content, ignore navigation and footer"
}

Error Handling

Common Errors

Invalid Schema:

{
  "success": false,
  "error": "Invalid schema: property 'price' must be of type number"
}

LLM Timeout:

{
  "success": false,
  "error": "LLM request timeout after 60s. Try reducing content size or increasing timeout."
}

Extraction Failed:

{
  "success": true,
  "data": null,
  "error": "Could not extract requested data from page content"
}

Error Recovery

Fallback to Extract Rules: For structured pages, consider using traditional extract rules as a fallback.
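The three response shapes above call for different reactions: fix the request, retry with more time or a smaller scope, or fall back to extract rules. A minimal sketch of that triage, with a hypothetical `classify_result` helper. Detecting timeouts by matching the error text is a heuristic and may break if the message wording changes.

```python
def classify_result(body):
    """Sort a parsed AI Scraper response into 'ok', 'no_data', 'timeout', or 'error'."""
    if not body.get("success"):
        message = str(body.get("error", ""))
        # Heuristic: relies on the error text, which is not a stable contract.
        return "timeout" if "timeout" in message.lower() else "error"
    # success: true with data: null means nothing could be extracted;
    # this is the case where an extract-rules fallback is most useful.
    if body.get("data") is None:
        return "no_data"
    return "ok"
```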

Response Format

Success Response

{
  "success": true,
  "data": {
    // Your extracted data based on prompt/schema
  },
  "metadata": {
    "html_length": 15420,
    "chunks_processed": 2,
    "extraction_time_ms": 2300,
    "input_tokens": 3200,
    "output_tokens": 150
  }
}

INFO — Credits Header

Credits consumed are returned in the `Ujb-credits` response header, not in the response body.
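Because credits arrive in a header rather than the JSON body, they are easy to miss when only the body is logged. A small sketch, assuming the header carries a plain number; `credits_used` is a hypothetical helper. It relies on a case-insensitive header mapping such as `email.message.Message`, which is what `http.client.HTTPMessage` (urllib's `response.headers`) extends; a plain `dict` would be case-sensitive.

```python
from email.message import Message


def credits_used(headers):
    """Return the credits charged from the `Ujb-credits` header, or None if absent."""
    value = headers.get("Ujb-credits")
    # Parsed as float in case the value is ever fractional.
    return float(value) if value is not None else None


# Example with a hand-built header set:
headers = Message()
headers["Ujb-credits"] = "15"
print(credits_used(headers))  # 15.0
```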

Field Descriptions

Field	Type	Description
`success`	boolean	Whether extraction was successful
`data`	object	Extracted structured data matching your prompt/schema
`metadata.html_length`	number	Size of HTML content fetched in bytes
`metadata.chunks_processed`	number	Number of content chunks processed (for large pages)
`metadata.extraction_time_ms`	number	AI extraction time in milliseconds
`metadata.input_tokens`	number	Number of input tokens sent to the LLM
`metadata.output_tokens`	number	Number of output tokens received from the LLM
`metadata.validation_warnings`	array	Warnings if extracted data doesn't fully match the schema (only present when there are warnings)
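For logging or cost tracking, the metadata fields above can be condensed into one record per request. A minimal sketch with a hypothetical `summarize_metadata` helper; it tolerates a missing `metadata` block (as in the error responses) and the optional `validation_warnings` field.

```python
def summarize_metadata(body):
    """Condense the documented metadata fields of a parsed response."""
    meta = body.get("metadata", {})
    return {
        "total_tokens": meta.get("input_tokens", 0) + meta.get("output_tokens", 0),
        "seconds": meta.get("extraction_time_ms", 0) / 1000,
        "chunks": meta.get("chunks_processed", 0),
        # Only present when the output doesn't fully match the schema.
        "warnings": meta.get("validation_warnings", []),
    }
```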

Comparison: AI Scraper vs Extract Rules

Choose the right tool for your use case:

Feature	AI Scraper	Extract Rules
Setup Time	Instant (just write prompt)	Requires CSS selector analysis
Maintenance	Low (AI adapts to changes)	Medium (update selectors when page changes)
Cost	15-40+ credits per page	1-2 credits per page
Speed	2-10 seconds	1-3 seconds
Accuracy	High for unstructured content	Very high for structured content
Best For	Dynamic layouts, unstructured data	Static selectors, high volume
Flexibility	Very high	Medium

When to Use AI Scraper

✅ Content structure varies between pages
✅ Need to extract meaning, not just text
✅ Prototyping and quick extraction
✅ Low-volume, high-value data
✅ Unstructured content (articles, reviews)

When to Use Extract Rules

✅ High-volume scraping (cost-effective)
✅ Consistent page structure
✅ Speed is critical
✅ Need precise control over extraction
✅ Static websites

Hybrid Approach

Combine both for optimal results:

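import { UjeebuClient } from '@ujeebu-org/ujeebu-sdk';

const client = new UjeebuClient('YOUR_API_KEY');
const url = 'https://shop.example.com/product';

// Try the AI scraper first for flexibility
const result = await client.aiScrape(url, 'Extract product details');
const product = result.data.data;

if (product) {
  console.log(product);
} else {
  // Fall back to extract rules if the AI returned no data
  const fallback = await client.scrape({
    url,
    extract_rules: {
      name: { selector: '.product-title', type: 'text' },
      price: { selector: '.price', type: 'text' }
    }
  });
  console.log(fallback.data);
}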

Credits & Billing

Credit Cost by Proxy Type

AI Scraper always uses browser rendering. Pricing reflects the proxy cost plus an AI processing premium.

Proxy Type	Credits per Request
`rotating` (default)	15
`advanced`	20
`premium`	25
`residential`	40
`residential_us`	40
`residential_geo`	25 + 10/MB over 1MB

INFO — Residential Proxy Pricing

US residential proxies (residential or residential_us) cost a flat 40 credits. Non-US residential proxies (residential_geo) have a base cost of 25 credits, plus 10 credits per MB of document size over 1MB.

INFO — Auto Proxy

If no proxy_type is specified, auto proxy is enabled by default. The system automatically tries different proxies until one succeeds. Credits are charged based on the proxy that was actually used for the successful request.

Cost Example

// Extract 50 product pages with default proxy (rotating)
for (const url of productUrls) {
  await client.aiScrape(url, 'Extract product details');
}
// Total: 50 × 15 = 750 credits

// Extract 50 pages with residential proxy (US)
for (const url of productUrls) {
  await client.aiScrape(url, 'Extract product details', {
    proxy_type: 'residential'
  });
}
// Total: 50 × 40 = 2000 credits

Billing Notes

Credits are charged only on successful extraction
Failed requests (4xx, 5xx errors) are not charged
With auto proxy, you are charged for the proxy that succeeded (not failed attempts)
When a CAPTCHA is detected and solved, a 5-credit surcharge is added to the base request cost (`auto_captcha_solve` is enabled by default)
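The pricing rules above can be folded into a rough per-request estimator for budgeting. This is a sketch, not a billing reference: `estimate_credits` is hypothetical, the document size is passed in MB, and rounding each started MB over the first 1MB up to a full MB for `residential_geo` is an assumption, since the page does not say how partial megabytes are billed. Pass the proxy that actually served the request, because with auto proxy that is the one you are charged for.

```python
import math

# Base credits per request, from the credit cost table above.
BASE_CREDITS = {
    "rotating": 15,
    "advanced": 20,
    "premium": 25,
    "residential": 40,
    "residential_us": 40,
    "residential_geo": 25,
}
CAPTCHA_SURCHARGE = 5


def estimate_credits(proxy_type="rotating", document_mb=0.0, captcha_solved=False):
    """Estimate credits for one successful AI Scraper request."""
    credits = BASE_CREDITS[proxy_type]
    if proxy_type == "residential_geo" and document_mb > 1:
        # ASSUMPTION: each started MB over the first 1MB is billed in full.
        credits += 10 * math.ceil(document_mb - 1)
    if captcha_solved:
        credits += CAPTCHA_SURCHARGE
    return credits
```

Multiplying by the number of pages reproduces the cost example above: `50 * estimate_credits()` is 750 and `50 * estimate_credits("residential")` is 2000.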

Next Steps

Learn about Extract Rules for structured extraction
Check out Templates for pre-built configurations
Read the Node.js SDK documentation
