Building Branded Thumbnails with Runware and Python

The Thumbnail Problem

Every video in my automated pipeline needs a thumbnail. Creating them manually defeats the purpose of automation, and generic screenshots look terrible. I needed a system that generates eye-catching, branded thumbnails automatically for each video topic.

My solution combines Runware's AI image generation API for backgrounds with Python's Pillow library for text overlays and branding. The result is consistent, professional thumbnails generated in about 10 seconds each.

Why Runware?

I evaluated several image generation APIs including DALL-E, Midjourney's API, and Stability AI. Runware stood out for several reasons:

  • Speed: Images generate in 2-4 seconds, much faster than most alternatives
  • Cost: Significantly cheaper per image than DALL-E 3
  • Consistency: Good at generating background-style images that work well with text overlays
  • Simple API: A clean API with an official Python SDK

Generating the Background

The first step is generating an AI background that matches the video topic. I craft prompts that produce bold, colourful backgrounds without text (AI-generated text in images is still unreliable):

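import os

from runware import IImageInference, Runware

# Assumptions: the API key and model identifier are read from environment
# variables (hypothetical names). Pick a model from the Runware catalogue.
RUNWARE_KEY = os.environ["RUNWARE_API_KEY"]
RUNWARE_MODEL = os.environ["RUNWARE_MODEL"]

async def generate_background(topic: str) -> str:
    """Generate one text-free background image and return its URL."""
    client = Runware(api_key=RUNWARE_KEY)
    await client.connect()

    prompt = (
        f"Professional thumbnail background for a tech video about {topic}. "
        "Bold colours, abstract tech elements, no text, no people. "
        "Cinematic lighting, 16:9 aspect ratio."
    )

    # The SDK takes its parameters as an IImageInference request object
    request = IImageInference(
        positivePrompt=prompt,
        negativePrompt="text, words, letters, blurry, low quality",
        model=RUNWARE_MODEL,
        width=1280,
        height=720,
        numberResults=1,
    )
    images = await client.imageInference(requestImage=request)
    return images[0].imageURL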

The negative prompt is important. Explicitly excluding text discourages the model from trying to render words, which almost always look bad. It is not a guarantee, so an occasional background with stray lettering can still slip through.

Adding Text Overlays with Pillow

Once I have the background, I use Pillow to add the video title, channel branding, and any other elements:

from PIL import Image, ImageDraw, ImageFont
import requests
from io import BytesIO

def create_thumbnail(bg_url: str, title: str, output_path: str) -> None:
    # Download the AI background; fail fast on HTTP errors or a stalled request
    response = requests.get(bg_url, timeout=30)
    response.raise_for_status()
    bg = Image.open(BytesIO(response.content)).convert("RGBA").resize((1280, 720))

    # Add semi-transparent overlay for text readability
    overlay = Image.new("RGBA", bg.size, (0, 0, 0, 0))
    overlay_draw = ImageDraw.Draw(overlay)
    overlay_draw.rectangle(
        [(40, 500), (1240, 690)],
        fill=(0, 0, 0, 160)
    )
    bg = Image.alpha_composite(bg, overlay)

    # Add title text on top of the overlay
    draw = ImageDraw.Draw(bg)
    font = ImageFont.truetype("fonts/bold.ttf", 52)
    draw.text((60, 520), title, font=font, fill="white")

    # JPEG has no alpha channel, so convert back to RGB before saving
    bg.convert("RGB").save(output_path, quality=95)

The semi-transparent black rectangle behind the text ensures readability regardless of the background image. This is a simple technique that makes a huge difference.

Text Wrapping

Long titles need to wrap properly. Pillow draws explicit line breaks but does not wrap text to a given width on its own, so I wrote a helper:

def wrap_text(text: str, font, max_width: int, draw) -> list[str]:
    words = text.split()
    lines = []
    current_line = ""

    for word in words:
        test_line = f"{current_line} {word}".strip()
        # textbbox returns (left, top, right, bottom) in pixels
        bbox = draw.textbbox((0, 0), test_line, font=font)
        if bbox[2] - bbox[0] <= max_width:
            current_line = test_line
        else:
            # If the very first word is already too wide there is nothing
            # to flush yet; appending here would emit an empty line
            if current_line:
                lines.append(current_line)
            current_line = word

    if current_line:
        lines.append(current_line)
    return lines

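The helper is greedy: it keeps adding words to the current line while the rendered width stays within the maximum, and starts a new line as soon as the next word would overflow. A single word wider than the maximum still gets a line of its own.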

Branding Consistency

Every thumbnail in my system uses the same:

  • Font family: A bold sans-serif that reads well at small sizes
  • Colour palette: Consistent accent colours that match my channel branding
  • Layout: Title at the bottom third, logo in the corner
  • Text shadow: A subtle dark shadow behind text for depth

This consistency is easy to maintain because it is all defined in code. Changing the branding means updating a few variables, and every future thumbnail automatically uses the new style.
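One way to keep those choices in a single place is a small frozen dataclass that every drawing function reads from. The following is a sketch under assumptions: the Brand class, its field names, and its default values are hypothetical, not the settings used in this pipeline. It also shows the text-shadow technique from the list above: draw the text once in a dark colour at a small offset, then again in the main colour on top.

```python
from dataclasses import dataclass

from PIL import Image, ImageDraw, ImageFont


@dataclass(frozen=True)
class Brand:
    """Hypothetical branding settings; change them here to restyle every thumbnail."""
    font_path: str = "fonts/bold.ttf"
    font_size: int = 52
    text_colour: tuple[int, int, int] = (255, 255, 255)
    shadow_colour: tuple[int, int, int] = (0, 0, 0)
    shadow_offset: tuple[int, int] = (3, 3)
    overlay_fill: tuple[int, int, int, int] = (0, 0, 0, 160)


def draw_text_with_shadow(draw, xy: tuple[int, int], text: str, font, brand: Brand) -> None:
    """Draw the shadow first, then the text itself on top of it."""
    x, y = xy
    dx, dy = brand.shadow_offset
    draw.text((x + dx, y + dy), text, font=font, fill=brand.shadow_colour)
    draw.text((x, y), text, font=font, fill=brand.text_colour)


if __name__ == "__main__":
    brand = Brand()
    canvas = Image.new("RGB", (1280, 720), (30, 60, 120))
    # Stand-in font; with the real asset this would be
    # ImageFont.truetype(brand.font_path, brand.font_size)
    font = ImageFont.load_default()
    draw_text_with_shadow(ImageDraw.Draw(canvas), (60, 520), "Hello", font, brand)
```

Because the dataclass is frozen, a rebrand means constructing a new Brand rather than mutating shared state halfway through a batch.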

Integration with the Pipeline

The thumbnail generator is one step in my larger video production pipeline. After the script generates the video, it calls the thumbnail function with the video title, generates the background, composites the final image, and uploads it alongside the video.

The entire thumbnail generation process takes about 10 seconds end to end. Of that, the AI image accounts for 3-4 seconds and the Pillow compositing for under a second. For a fully automated pipeline, that is negligible.
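As a rough illustration of how the two stages could be wired together and timed, here is a minimal sketch. The make_thumbnail function and its parameters are hypothetical glue, not the pipeline's real code. It takes the background generator and the compositing function as arguments so it can be exercised with stand-ins, and because the Pillow step is synchronous it runs that step in a worker thread via asyncio.to_thread so the event loop is not blocked.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def make_thumbnail(
    topic: str,
    title: str,
    output_path: str,
    generate_background: Callable[[str], Awaitable[str]],
    create_thumbnail: Callable[[str, str, str], None],
) -> dict[str, float]:
    """Run both stages and return how long each took, in seconds."""
    timings: dict[str, float] = {}

    start = time.perf_counter()
    bg_url = await generate_background(topic)  # AI background (network call)
    timings["background"] = time.perf_counter() - start

    start = time.perf_counter()
    # The blocking download and Pillow work run in a thread, off the event loop
    await asyncio.to_thread(create_thumbnail, bg_url, title, output_path)
    timings["composite"] = time.perf_counter() - start

    return timings


if __name__ == "__main__":
    # Stand-ins so the sketch runs without an API key or network access
    async def fake_background(topic: str) -> str:
        return f"https://example.invalid/{topic}.png"

    def fake_composite(bg_url: str, title: str, output_path: str) -> None:
        pass

    print(asyncio.run(make_thumbnail("python", "Hello", "thumb.jpg", fake_background, fake_composite)))
```

Logging the returned timings per video is a cheap way to notice when one stage starts to dominate the total.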

Lessons Learned

A few tips from generating hundreds of thumbnails:

  • Keep text to 5-7 words maximum. Thumbnails are viewed at small sizes.
  • Use high contrast between text and background. The overlay technique solves this reliably.
  • Test your thumbnails at 168x94 pixels, roughly the size YouTube uses for suggested videos in the sidebar and one of the smallest sizes viewers will see.
  • Generate 2-3 background variants and pick the best programmatically based on colour distribution.
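The last two tips can be sketched with Pillow alone. The scoring rule below is an assumption, since the measure of colour distribution is not spelled out here: it adds mean saturation to the spread of brightness as a crude proxy for a bold, high-contrast background. The names colour_score, pick_best_background, and small_preview are hypothetical.

```python
from PIL import Image, ImageStat


def colour_score(image: Image.Image) -> float:
    """Crude 'boldness' proxy: mean saturation plus brightness spread."""
    hsv = image.convert("RGB").convert("HSV")
    stat = ImageStat.Stat(hsv)
    mean_saturation = stat.mean[1]      # band 1 is S
    brightness_spread = stat.stddev[2]  # band 2 is V
    return mean_saturation + brightness_spread


def pick_best_background(candidates: list[Image.Image]) -> Image.Image:
    """Return the candidate with the highest colour score."""
    return max(candidates, key=colour_score)


def small_preview(image: Image.Image, size: tuple[int, int] = (168, 94)) -> Image.Image:
    """Downscale a finished thumbnail to check legibility at a small size."""
    return image.resize(size, Image.LANCZOS)
```

A flat grey image scores near zero while a saturated one scores high, so the rule at least filters out washed-out variants. Any weighting beyond that would need tuning against thumbnails that actually performed well.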