
Extracting Links & Emails

1

Overview

Extracting links and email addresses is one of the most common web scraping tasks. Whether you're building a crawler, finding contact information, or analyzing website structure, this tutorial shows several approaches.

What You'll Learn

Extract all page links
Find email addresses
Filter by domain
Get social media links
Find mailto links
Build link profiles

Use Cases

Lead generation, competitive analysis, SEO audits, contact discovery, link building, website mapping, and dead link checking.

3

Extract Email Addresses

Match mailto links with an attribute selector, and capture the page's body text so a regex can pick up addresses that are not linked:

JSON - Extract Rules
{
  "mailto_links": {
    "selector": "a[href^='mailto:']",
    "type": "obj",
    "multiple": true,
    "children": {
      "email": { "selector": "", "type": "attr", "attribute": "href" },
      "text": { "selector": "", "type": "text" }
    }
  },
  "page_content": {
    "selector": "body",
    "type": "text"
  }
}

Parse Emails with Regex

Python
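import re
from urllib.parse import unquote

# Loose pattern for email-like strings; it can also match asset names such as image@2x.png
email_pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# `response` is assumed to be the API response to the scrape request made with the extract rules above
data = response.json()["result"]

# From mailto links: drop the scheme and any ?subject=... query, then split comma-separated recipients
mailto_emails = []
for link in data.get("mailto_links") or []:
    href = link.get("email") or ""
    address_part = re.sub(r"^mailto:", "", href, flags=re.IGNORECASE).split("?")[0]
    mailto_emails += [unquote(a).strip() for a in address_part.split(",") if a.strip()]

# From page content using regex
content_emails = re.findall(email_pattern, data.get("page_content") or "")

# Combine and deduplicate case-insensitively; sort for stable output
all_emails = sorted({email.lower() for email in mailto_emails + content_emails})
print(f"Found {len(all_emails)} unique emails: {all_emails}")

6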

Best Practices

01

Deduplicate Results

Pages often repeat the same link or address, for example in the header and the footer. Remove duplicates with set(), or with dict.fromkeys() when the original order matters.

Essential
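
A minimal sketch of order-preserving deduplication, in case the position of a link on the page matters for later analysis. The helper name `dedupe` and its `key` parameter are illustrative and not part of the API; the sample values are made up.

```python
def dedupe(items, key=lambda value: value):
    """Return items without duplicates, keeping the first occurrence of each.

    `key` maps an item to the value used for comparison (hypothetical helper).
    """
    seen = set()
    unique = []
    for item in items:
        marker = key(item)
        if marker not in seen:
            seen.add(marker)
            unique.append(item)
    return unique


# Emails: compare case-insensitively and ignore surrounding whitespace
emails = ["info@example.com", "Info@Example.com ", "sales@example.com"]
unique_emails = dedupe(emails, key=lambda e: e.strip().lower())

# Links: compare exactly, since URL paths can be case-sensitive
links = ["https://example.com/a", "https://example.com/A", "https://example.com/a"]
unique_links = dedupe(links)

print(unique_emails)  # ['info@example.com', 'sales@example.com']
print(unique_links)   # ['https://example.com/a', 'https://example.com/A']
```

Unlike list(set(...)), this keeps the first-seen order, which makes results reproducible between runs.
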
02

Validate Emails

Validate every match before using it. A permissive regex also matches strings that are not addresses, such as the image filename image@2x.png.

Recommended
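
One way to filter out the most common false positives is sketched below. It reuses the tutorial's regex and then rejects matches whose final label is a file extension. The function name `looks_like_email` and the extension list are assumptions chosen for illustration; this is a syntactic check only and does not confirm that a mailbox exists.

```python
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Illustrative list of asset extensions that show up in retina image names and bundles
ASSET_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "svg", "webp", "css", "js"}


def looks_like_email(candidate):
    """Return True if candidate passes basic syntactic checks (hypothetical helper)."""
    if not EMAIL_RE.fullmatch(candidate):
        return False
    local, _, domain = candidate.rpartition("@")
    # Reject consecutive dots and dots at the edges of the local part
    if ".." in candidate or local.startswith(".") or local.endswith("."):
        return False
    # Reject domains that start with a dot or hyphen
    if domain.startswith((".", "-")):
        return False
    # Reject "domains" that are really filenames, e.g. image@2x.png
    last_label = domain.rsplit(".", 1)[-1].lower()
    return last_label not in ASSET_EXTENSIONS


candidates = ["info@example.com", "image@2x.png", "a..b@example.com"]
valid = [c for c in candidates if looks_like_email(c)]
print(valid)  # ['info@example.com']
```
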
03

Resolve Relative URLs

Convert relative URLs to absolute URLs using the page's base URL for proper link analysis.

Important
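
Relative URLs can be resolved with Python's standard library, as in the sketch below. It assumes the base is the URL of the scraped page (a `<base href>` tag on the page would change that), and the helper name `resolve_links` and the sample hrefs are made up for illustration.

```python
from urllib.parse import urldefrag, urljoin

# Schemes that are not navigable page links
SKIPPED_PREFIXES = ("mailto:", "tel:", "javascript:")


def resolve_links(base_url, hrefs):
    """Turn raw href values into absolute URLs without fragments (hypothetical helper)."""
    resolved = []
    for href in hrefs:
        href = href.strip()
        # Skip empty values, same-page anchors, and non-HTTP schemes
        if not href or href.startswith("#") or href.lower().startswith(SKIPPED_PREFIXES):
            continue
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        resolved.append(absolute)
    return resolved


base = "https://example.com/blog/post-1"
hrefs = ["/about", "../contact", "page-2#comments",
         "https://other.example/x", "mailto:hi@example.com", "#top"]

for url in resolve_links(base, hrefs):
    print(url)
# https://example.com/about
# https://example.com/contact
# https://example.com/blog/page-2
# https://other.example/x
```

Dropping the fragment means that page-2 and page-2#comments count as the same link when deduplicating.
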
04

Respect Privacy

Don't spam extracted emails. Respect GDPR and anti-spam regulations when using contact data.

Legal
