Blog

Notes from the team.

Practical writeups on web scraping, anti-bot tactics, structured extraction, SERPs and the messy realities of building data pipelines for AI.

AI ChatGPT Content Extraction

Extracting Product Information automatically using ChatGPT

Product information such as prices, descriptions, and reviews is crucial for market analysis, dynamic pricing, and inventory management. However, manually extracting this data from multiple sources can be time-consuming and error-prone.

Youssef Oct 10, 2024 6 min read

Content Extraction Web Scraping

How to Scrape TikTok: A Comprehensive Guide

The TikTok API has several restrictions that limit what data you can access and how frequently you can query it. For this reason, web scraping becomes a viable solution, as long as it is done in compliance with TikTok’s Terms of Service.

Sam Sep 12, 2024 4 min read

Content Extraction Web Scraping Amazon

Step-by-Step Guide to Scraping Amazon Product Data

Uncover the secrets of scraping Amazon product data with our comprehensive step-by-step guide. Scrape data without getting blocked with Ujeebu. Read more!

Manpreet Nagpal Jul 29, 2024 14 min read

Content Extraction Puppeteer

A Simple Rule-based Scraper using Puppeteer's native methods

In our previous article [https://ujeebu.com/blog/simple-puppeteer-based-scraper-rule-based-extraction/] in the Puppeteer series, we implemented a rule-based scraper based on headless Chrome using Puppeteer. We injected our scraping functions into the browser's context (window), then used them to execute scraping scenarios inside the browser. In this article we will try to achieve the same thing, but this time using Puppeteer's methods without injecting functions into the browser's context.

Sam Mar 1, 2024 9 min read

Content Extraction Puppeteer Web Scraping

Simple Puppeteer-based Scraper: Rule based extraction

In this article, we show how to scrape any website with a given set of rules using the Puppeteer library.

Sam Apr 23, 2023 10 min read

Content Extraction Puppeteer Web Scraping

A Simple Scraper using Puppeteer

Web scraping is the process of extracting data from websites. One popular library for web scraping is Puppeteer. Puppeteer is a Node.js library that provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol.

Sam Jan 29, 2023 6 min read

Content Extraction Web Scraping

Is Web Scraping Legal?

The issues of legality and ethics surrounding web scraping are a massive grey area. While some may be in favor of web scraping, others might not share the same enthusiasm. This is what makes the subject so controversial.

Sam Dec 23, 2022 8 min read

Content Extraction Machine Learning

Extracting clean data from blog and news articles

Several open-source tools allow the extraction of clean text from article HTML. We list the most popular ones below, and run a benchmark to see how they stack up against the Ujeebu API.

Sam Aug 9, 2019 6 min read