Creative

AI Thumbnail Generator

Multi-stage AI pipeline for YouTube podcast thumbnails. Uses Claude for hook generation and Gemini for image synthesis with character consistency.

Tech Stack: 5 tools
Timeline: Development
Status: In Progress
Impact: Featured

TL;DR: I built an AI pipeline that generates A/B-testable YouTube thumbnails for podcasts. Claude analyzes each episode, generates 5 hook options, and selects the best 3; Gemini then renders the images with consistent character faces and brand colors.

The Problem

YouTube thumbnails make or break click-through rates, but creating good ones is:

  • Time-consuming: 30-60 minutes per thumbnail with design tools
  • Skill-dependent: Requires knowing design principles, color theory, expressions
  • Hit-or-miss: Hard to predict what will perform well
  • Inconsistent: Maintaining brand identity across 70+ episodes is tough

I needed a system that could generate multiple professional options quickly for A/B testing.

My Approach

I built a multi-stage generation pipeline:

  1. Hook Generation (Claude): Analyzes the episode and generates 5 hook options, each using a different psychological approach
  2. Hook Selection (Claude): Evaluates all hooks and picks the best 3 for testing
  3. Expression Mapping: Maps hook mood to facial expressions (revelation, authority, controversy)
  4. Image Generation (Gemini): Renders thumbnails with guest images for face consistency

The key insight was separating conceptual work (what message?) from visual work (how to render it?).
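
One way to picture this separation is a conceptual layer that produces plain data and a visual layer that consumes it. The sketch below is illustrative only and is not the project's actual code: `Hook`, `select_hooks`, `run_pipeline`, the `score` field, and both stub functions are hypothetical names, and the model calls are replaced with stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Hook:
    text: str     # overlay text for the thumbnail
    mood: str     # e.g. "revelation", "authority", "controversy"
    score: float  # hypothetical ranking score assigned during selection


def select_hooks(hooks: List[Hook], keep: int = 3) -> List[Hook]:
    """Conceptual stage: keep the highest-scoring hooks."""
    return sorted(hooks, key=lambda h: h.score, reverse=True)[:keep]


def run_pipeline(
    episode_summary: str,
    generate_hooks: Callable[[str], List[Hook]],
    render_thumbnail: Callable[[Hook], str],
) -> List[str]:
    """Run the conceptual stages first, then hand plain data to the renderer."""
    hooks = generate_hooks(episode_summary)      # stage 1: text model
    best = select_hooks(hooks)                   # stage 2: text model
    return [render_thumbnail(h) for h in best]   # stages 3-4: image model


# Stand-ins for the real model calls, so the flow can be exercised offline.
def fake_generate(summary: str) -> List[Hook]:
    moods = ["revelation", "authority", "controversy", "revelation", "authority"]
    return [Hook(f"Hook {i}", moods[i], score=float(i)) for i in range(5)]


def fake_render(hook: Hook) -> str:
    return f"{hook.mood}:{hook.text}.png"
```

Because the renderer only receives `Hook` objects, either side can be swapped or tested without touching the other.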

Architecture

[Figure: AI Thumbnail Generator architecture diagram]

Key Features

  • 5 Hook Approaches: Each uses a different psychological trigger
  • Objective Selection: Claude evaluates the hooks in a separate pass, which is intended to reduce bias toward its own outputs
  • Expression Consistency: Mood maps to specific facial expressions
  • Character Persistence: Reference images maintain face identity
  • Brand Colors: Saved and reused across episodes
  • Session Management: Save/restore incomplete workflows
  • Prompts-Only Mode: Generate hooks without image rendering
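
Session save/restore could look roughly like the following. This is a minimal sketch under stated assumptions: the `Session` fields, the `completed_stage` marker, and the JSON file layout are hypothetical, and it uses standard-library dataclasses rather than the Pydantic models the project relies on.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class Session:
    episode_title: str
    brand_colors: List[str] = field(default_factory=list)  # e.g. hex strings
    hooks: List[str] = field(default_factory=list)
    completed_stage: str = "start"  # hypothetical marker of the last finished stage


def save_session(session: Session, path: Path) -> None:
    """Write the session to disk as JSON so an interrupted run can resume."""
    path.write_text(json.dumps(asdict(session), indent=2), encoding="utf-8")


def load_session(path: Path) -> Optional[Session]:
    """Return the saved session, or None if no session file exists."""
    if not path.exists():
        return None
    return Session(**json.loads(path.read_text(encoding="utf-8")))
```

On restart, a CLI could call `load_session` and skip any stage at or before `completed_stage`.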

Results & Metrics

Metric             Value
Hooks Generated    5 per episode
Thumbnails Output  3 per session
Image Resolution   2048x2048 px
Reference Images   Up to 5 guests + host
Rate Limits        Claude: 50/min, Gemini: 10/min
Output Files       thumbnails + prompts.txt + metadata.json
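
The per-minute limits above could be enforced on the client side with a sliding-window limiter. This is a sketch, not the project's implementation: `SlidingWindowLimiter` and its methods are hypothetical, and the injectable `clock` exists only to make the behavior easy to test.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(
        self,
        max_calls: int,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def record(self) -> None:
        """Register that a call was just made."""
        self._calls.append(self.clock())


# Limits taken from the table above.
claude_limiter = SlidingWindowLimiter(max_calls=50)
gemini_limiter = SlidingWindowLimiter(max_calls=10)
```

A caller would sleep for `wait_time()` seconds before each request and call `record()` right after sending it.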

What I Learned

The hardest part was character consistency. Early versions generated great compositions but the guest's face looked different in each thumbnail. I solved this by:

  1. Reference image feeding: Pass up to 5 guest photos to Gemini
  2. Explicit face instructions: "Maintain exact facial features from reference"
  3. Expression guidance: Specific descriptions like "widened eyes, slight forward lean"

Another challenge was rate limiting. Gemini's image API has strict limits (10/minute), so I added exponential backoff:

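# Automatic retry with exponential backoff (tenacity)
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(5),                          # give up after 5 attempts
    wait=wait_exponential(multiplier=1, min=4, max=60),  # wait 4 to 60 seconds between attempts
)
async def generate_image(prompt: str):
    ...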
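
To get a feel for what these parameters mean in practice, the wait schedule can be approximated in plain Python. This sketch assumes the delay before retry n is `multiplier * 2 ** (n - 1)`, clamped to the `[min, max]` range; the exact formula inside the retry library may differ between versions, and `backoff_waits` is a hypothetical helper, not part of any library.

```python
from typing import List


def backoff_waits(
    attempts: int = 5,
    multiplier: float = 1.0,
    min_wait: float = 4.0,
    max_wait: float = 60.0,
) -> List[float]:
    """Approximate the waits between attempts for clamped exponential backoff."""
    waits = []
    for attempt in range(1, attempts):  # no wait follows the final attempt
        raw = multiplier * 2 ** (attempt - 1)
        waits.append(max(min_wait, min(raw, max_wait)))
    return waits


print(backoff_waits())  # [4.0, 4.0, 4.0, 8.0]
```

Under that assumption, five attempts spend about 20 seconds waiting in total, because the 4-second floor dominates the early retries.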

Using Claude for conceptual work and Gemini for visual work was key, because each model excels at different tasks.

Frequently Asked Questions

What problem does this generator solve?

It reduces thumbnail creation from 30-60 minutes to 5 minutes per episode. Instead of manually designing, you get 3 A/B-testable options with psychological hooks and consistent character rendering.

What technologies power this project?

Claude API for hook generation and selection, Gemini API for image synthesis, Pydantic for data validation, and an interactive Python CLI for the workflow.

How good are the generated thumbnails?

Quality is high for podcast-style thumbnails with text overlays and host/guest faces. Complex scenes or multiple elements may require manual refinement. The psychological hooks are based on proven CTR frameworks.

Built by Abhinav Sinha

AI-First Product Manager who builds production-grade tools. Passionate about turning complex problems into elegant solutions using AI, automation, and modern web technologies.