Amazon product pages contain valuable structured data that can be extracted for price monitoring, competitor analysis, product research, and market intelligence.
Overview
Analyze the HTML Structure
Before creating extract rules, inspect the Amazon product page HTML to identify CSS selectors for each data point:
| Data Point | CSS Selector | Type |
|---|---|---|
| Product Title | #productTitle, #title span | text |
| Price | .a-price .a-offscreen, #priceblock_ourprice, #priceblock_dealprice | text |
| Main Image | #landingImage, #imgBlkFront | image |
| Rating | span[data-hook="rating-out-of-text"], #acrPopover span.a-icon-alt | text |
| Review Count | #acrCustomerReviewText, span[data-hook="total-review-count"] | text |
| Availability | #availability span, #outOfStock span | text |
| Features | #feature-bullets ul li span.a-list-item | text (array) |
| Brand | #bylineInfo | text |
Amazon's HTML structure can vary by product category and region. Always test your selectors with multiple product pages.
Build Extract Rules
Extract rules map each output field to a CSS selector and a value type:
{
"title": { "selector": "#productTitle, #title span", "type": "text" },
"price": { "selector": ".a-price .a-offscreen, #priceblock_ourprice", "type": "text" },
"original_price": { "selector": ".a-text-price .a-offscreen", "type": "text" },
"main_image": { "selector": "#landingImage, #imgBlkFront", "type": "image" },
"all_images": { "selector": "#altImages img", "type": "image", "multiple": true },
"rating": { "selector": "span[data-hook='rating-out-of-text'], #acrPopover span.a-icon-alt", "type": "text" },
"review_count": { "selector": "#acrCustomerReviewText", "type": "text" },
"availability": { "selector": "#availability span", "type": "text" },
"brand": { "selector": "#bylineInfo", "type": "text" },
"features": { "selector": "#feature-bullets ul li span.a-list-item", "type": "text", "multiple": true },
"description": { "selector": "#productDescription p, #productDescription span", "type": "text" }
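}
Understanding Extract Rule Types
text: the text content of the matched element
image: the URL of the matched image
attr: the value of an attribute on the matched element
multiple: true: return all matches as an array
Make the API Request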
Call the Scrape API with your extract rules, and set js to true to enable JavaScript rendering:
curl -X POST "https://api.ujeebu.com/scrape" \
-H "ApiKey: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"js": true
}'
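The same request can be sketched in Python using only the standard library. The endpoint, the ApiKey header, and the js flag come from the curl example above. The "extract_rules" key and the optional wait_for value are assumptions about the request body and should be checked against the API documentation. The helper names are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.ujeebu.com/scrape"  # endpoint from the curl example


def build_payload(product_url, extract_rules, wait_for=None):
    """Assemble the JSON body for a scrape request."""
    payload = {
        "url": product_url,
        "js": True,  # JavaScript rendering, as the tutorial recommends
        # Assumption: the rules are passed under an "extract_rules" key.
        "extract_rules": extract_rules,
    }
    if wait_for is not None:
        # Assumption: the accepted wait_for values are defined by the API docs.
        payload["wait_for"] = wait_for
    return payload


def build_request(api_key, payload):
    """Create a POST request with the same headers as the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"ApiKey": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def scrape(api_key, product_url, extract_rules, wait_for=None, timeout=60):
    """Send the request and return the decoded JSON response."""
    payload = build_payload(product_url, extract_rules, wait_for)
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Two rules copied from the extract rules shown earlier.
RULES = {
    "title": {"selector": "#productTitle, #title span", "type": "text"},
    "price": {"selector": ".a-price .a-offscreen, #priceblock_ourprice", "type": "text"},
}
```

Calling scrape("YOUR_API_KEY", product_url, RULES) with a real product page URL would perform the network request. Keeping build_payload and build_request separate makes it possible to inspect the request before anything is sent.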
Handle the Response
The API returns extracted data in JSON format:
{
"success": true,
"result": {
"title": "Atomic Habits: An Easy & Proven Way to Build Good Habits & Break Bad Ones",
"price": "$18.88",
"rating": "4.8 out of 5",
"review_count": "(146,898)",
"availability": "In Stock",
"main_image": "https://m.media-amazon.com/images/I/81kg51XRc1L._SY466_.jpg",
"brand": "James Clear (Author)",
"features": [
"One of the longest-running bestselling books on the science of habit formation",
"Practical strategies for forming good habits and breaking bad ones",
"An easy and proven framework for improving every day"
]
}
}
The response includes all structured data from your extract rules, ready for processing.
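Several fields arrive as display strings rather than numbers, so a small normalization step is often useful before storing them. The sketch below assumes US-style formatting as in the sample response ("$18.88", "4.8 out of 5", "(146,898)"). The function names are hypothetical, and other locales would need different parsing.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Convert a price string such as "$18.88" to a Decimal, or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return Decimal(match.group().replace(",", "")) if match else None


def parse_rating(text):
    """Take the leading number from a string such as "4.8 out of 5"."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


def parse_review_count(text):
    """Convert a string such as "(146,898)" to an int, or None."""
    digits = re.sub(r"\D", "", text or "")
    return int(digits) if digits else None


def normalize(result):
    """Return a copy of the "result" object with numeric fields parsed."""
    cleaned = dict(result)
    cleaned["price"] = parse_price(result.get("price"))
    cleaned["rating"] = parse_rating(result.get("rating"))
    cleaned["review_count"] = parse_review_count(result.get("review_count"))
    return cleaned
```

Decimal is used for the price so that monetary values are not subject to binary floating-point rounding. Missing or empty fields come back as None instead of raising an error.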
Best Practices
Use JavaScript Rendering
Amazon relies heavily on JavaScript. Set js=true and use wait_for so that dynamic content has loaded before the rules are applied.
Implement Rate Limiting
Respect Amazon's servers with 2-3 second delays between requests to avoid being blocked and maintain good scraping etiquette.
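One way to apply such a delay is to sleep for a random 2-3 seconds before every request except the first. This is a minimal sketch: fetch_all is a hypothetical helper, and fetch stands for whatever function performs a single scrape.

```python
import random
import time


def fetch_all(urls, fetch, min_delay=2.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive calls."""
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first.
            sleep(random.uniform(min_delay, max_delay))
        results[url] = fetch(url)
    return results
```

The sleep parameter is injectable so the pacing logic can be tested without actually waiting. Randomizing the delay within the range avoids a perfectly regular request pattern.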
Use Rotating Proxies
Enable proxy rotation for production use to distribute requests across multiple IP addresses and avoid blocks.
Handle Variations
Amazon's HTML varies by product category, region, and session state. Always test selectors across different products.
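A simple way to spot selectors that fail on some pages is to scrape a handful of products and report which expected fields came back empty. The sketch below assumes each page's extracted "result" object is available as a dict keyed by URL. The helper name and the sample data in the usage note are hypothetical.

```python
def missing_fields(results, expected_fields):
    """Map each page URL to the expected fields that came back empty."""
    report = {}
    for url, result in results.items():
        missing = [
            field for field in expected_fields
            if result.get(field) in (None, "", [])
        ]
        if missing:
            report[url] = missing
    return report
```

For example, if one page returns an empty price and an empty features list, the report contains only that page with ["price", "features"]. This points to the selectors that need another fallback for that category or region.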
Legal Notice
Review Amazon's Terms of Service and robots.txt before scraping. Use scraped data responsibly and in compliance with applicable laws. This tutorial is for educational purposes only.