Blog

Notes from the team.

Practical writeups on web scraping, anti-bot tactics, structured extraction, SERPs and the messy realities of building data pipelines for AI.

Filtering by author: Sam

How to Take Full-Page Screenshots with a Screenshot API

Ever tried to capture an entire webpage in one go, only to end up taking multiple screenshots and stitching them together? Taking a full-page screenshot manually is about as fun as printing a web page and scanning it. Whether you're a developer needing a complete page snapshot for testing, or a marketer monitoring how a landing page looks over time, the struggle is real. Even the Chrome full-page screenshot trick in DevTools (handy, but hidden) is fine for one-off captures, not so much for autom

Sam May 4, 2025 14 min read

Safeguarding Your Website from Abusive Web Scraping

Abusive scraping can cause significant problems for website owners, including server overload, unauthorized data extraction, and the potential exposure of sensitive information. Implementing effective anti-scraping mechanisms is crucial to protect your website from these threats.

Sam Sep 19, 2024 7 min read

Puppeteer-based Simple Data Scraper: Advanced Options

In this article, we show how Puppeteer's advanced capabilities can make our scraper better equipped to handle real-world use cases. Specifically, we explore options such as controlling page load behavior, HTTP authentication, adding extra headers, and changing the user agent.

Sam Mar 22, 2024 7 min read

A Simple Rule-based Scraper using Puppeteer's native methods

In the previous article [https://ujeebu.com/blog/simple-puppeteer-based-scraper-rule-based-extraction/] of our Puppeteer series, we implemented a rule-based scraper on headless Chrome using Puppeteer. We injected our scraping functions into the browser's context (window), then used them to run scraping scenarios inside the browser. In this article we aim to achieve the same result, this time using Puppeteer's native methods, without injecting functions into the browser's context. Rew

Sam Mar 1, 2024 9 min read