Stealth Startup Scraper | Founder Intelligence Extraction

Name: Stealth Startup Scraper
Author: Abhinav Sinha

The Problem

Stealth Startup Spy is a valuable newsletter that covers founders coming out of stealth mode. But the data is:

Unstructured: Prose format, not a database
Scattered: Across 200+ monthly editions
Not searchable: Can't query "founders from Google" or "AI startups"
Manual to track: No way to monitor new editions automatically

I wanted to build a searchable database of stealth startups for research and networking.

My Approach

I built a multi-stage extraction pipeline:

Newsletter Ingestion: Fetch Substack content via Firecrawl API
Section Detection: Identify "Founders Coming Out of Stealth" and "Key Talent Going Under Stealth"
Profile Extraction: Parse 18+ structured fields using regex patterns
Validation & Storage: Type checking, constraint enforcement, Supabase insert
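
The section-detection stage can be sketched as follows. This is a minimal illustration, not the project's actual parser: the function name `split_sections`, the dictionary keys, and the assumption that each header sits on its own line (optionally prefixed with Markdown `#` characters) are all hypothetical.

```python
import re

# Assumed header spellings, taken from the two section names above.
# Real editions may format these headers differently.
SECTION_HEADERS = {
    "coming_out": re.compile(
        r"^[ \t]*#*[ \t]*Founders Coming Out of Stealth[ \t]*:?[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    ),
    "going_under": re.compile(
        r"^[ \t]*#*[ \t]*Key Talent Going Under Stealth[ \t]*:?[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    ),
}


def split_sections(text: str) -> dict[str, str]:
    """Return the body text found under each recognized section header."""
    # Record (header_start, header_end, section_name) for every header present.
    hits = []
    for name, pattern in SECTION_HEADERS.items():
        match = pattern.search(text)
        if match:
            hits.append((match.start(), match.end(), name))
    hits.sort()

    # Each section body runs from the end of its header to the next header.
    sections = {}
    for i, (_, body_start, name) in enumerate(hits):
        body_end = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        sections[name] = text[body_start:body_end].strip()
    return sections
```

Editions that lack one of the two sections simply produce a dictionary without that key, so later stages can skip it.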

The system uses MCP (Model Context Protocol) to integrate with Claude Code, which lets Claude run database operations directly.

Architecture

[Figure: Stealth Startup Scraper architecture diagram]

Key Features

Dual Section Parsing: Handles both "Coming Out of Stealth" and "Going Under Stealth"
18+ Structured Fields: From basic info to funding details
Gap Detection: Identifies missing newsletter editions
Batch Processing: Process multiple editions in one command
Dry-Run Mode: Preview extractions without database writes
Idempotent Processing: Safe to re-run without duplicates
Monthly Automation: ./run_monthly.sh for scheduled execution
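
Two of the features above, gap detection and idempotent processing, can be illustrated with a short sketch. It assumes editions are numbered consecutively and that a profile is uniquely identified by edition, name, and company. The function names `find_missing_editions` and `profile_key` are hypothetical, not the project's own.

```python
import hashlib


def find_missing_editions(archived: set[int], latest: int, first: int = 1) -> list[int]:
    """Return edition numbers in [first, latest] that are not in the archive."""
    return [n for n in range(first, latest + 1) if n not in archived]


def profile_key(edition: int, name: str, company: str) -> str:
    """Build a stable key so re-processing an edition maps to the same row.

    Names are trimmed and lowercased first, so cosmetic differences
    between runs do not create duplicate profiles.
    """
    normalized = f"{edition}|{name.strip().lower()}|{company.strip().lower()}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Storing a key like this in a uniquely constrained column is one common way to make repeated runs safe; whether the project does exactly this is an assumption.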

Results & Metrics

Metric	Value
Newsletters Archived	200+
Profiles Extracted	1,216+
Fields per Profile	18+
Latest Edition	#266
Processing Rate	~5 seconds/newsletter
Error Rate	<1%

What I Learned

The hardest part was handling format variations. The newsletter's format evolved over time:

Early editions used different section headers
Some editions have inline LinkedIn links, others have separate fields
Funding info is sometimes detailed, sometimes just "stealth"

I built a flexible regex parser that handles variations:

# Multiple patterns for the same field
linkedin_patterns = [
    r"Connect on LinkedIn:\s*(\S+)",
    r"LinkedIn:\s*(\S+)",
    r"\[Connect\]\((https://linkedin\.com/[^)]+)\)"
]
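
One way to apply such a pattern list is to try each pattern in order and return the first capture. The helper below is a hedged sketch: `extract_first` is a hypothetical name, and the list is repeated only to keep the example self-contained.

```python
import re
from typing import Optional

# Same patterns as above, ordered from most specific to least specific.
LINKEDIN_PATTERNS = [
    r"Connect on LinkedIn:\s*(\S+)",
    r"LinkedIn:\s*(\S+)",
    r"\[Connect\]\((https://linkedin\.com/[^)]+)\)",
]


def extract_first(patterns: list[str], text: str) -> Optional[str]:
    """Return group 1 of the first pattern that matches, or None."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return None
```

Adding support for a new format variation then means appending one pattern rather than changing the parsing logic.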

The MCP integration was valuable for ad-hoc queries during development. Instead of writing database queries by hand, I could ask Claude to "find all ex-Google founders."

Frequently Asked Questions

What problem does this scraper solve?

It converts unstructured newsletter prose into a searchable database of 1,216+ founder profiles. You can query by prior company, industry, location, funding status, and more.

What technologies power this project?

Python for the scraping and parsing logic, Firecrawl for robust web content extraction, Supabase PostgreSQL for structured storage, and MCP for Claude Code integration.

How accurate is the extraction?

Accuracy is above 99% for structured fields such as names, companies, and locations. Fields such as "team_size" or "funding_info" depend on what each edition includes and may be incomplete.

Built by Abhinav Sinha

AI-First Product Manager who builds production-grade tools. Passionate about turning complex problems into elegant solutions using AI, automation, and modern web technologies.
