4 min read

How I Automated YouTube Uploads for Under 10p a Month

Tags: YouTube API, automation, cost optimisation, Whisper, Gemini, Python

The Challenge: Automation Without the Price Tag

When people hear "AI automation," they often assume significant ongoing costs. API calls, cloud compute, storage fees. But one of my favourite projects proves that is not always the case. I run a fully automated YouTube upload pipeline that costs less than 10p per month in additional expenses.

This post breaks down exactly where every penny goes and the architectural decisions that keep costs so low.

What the Pipeline Does

As a quick recap, the pipeline handles the entire YouTube upload process:

  1. Detects new video files in a watched directory
  2. Extracts audio and transcribes using Whisper
  3. Generates optimised titles, descriptions, tags, and chapters using Gemini
  4. Uploads to YouTube via the Data API
  5. Archives the source file and logs everything

There is no manual intervention at any step. I drop a video file into a folder, and it appears on YouTube with full metadata within the hour.
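The detection step that kicks all this off can be sketched in a few lines. This is an illustrative version, not my exact code: `find_new_videos`, the extension set, and the `processed` bookkeeping are all stand-ins for whatever state tracking you prefer.

```python
from pathlib import Path

# File extensions treated as uploadable videos (illustrative set)
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv"}

def find_new_videos(watch_dir, processed):
    """Return video files in watch_dir that have not been processed yet."""
    return sorted(
        p for p in Path(watch_dir).iterdir()
        if p.suffix.lower() in VIDEO_EXTENSIONS and p.name not in processed
    )
```

Everything downstream (transcription, metadata, upload) only runs when this returns a non-empty list.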

Cost Breakdown: The Numbers

Whisper Transcription: £0.00

I run Whisper locally using the small model. The model weights are downloaded once and run on CPU. There is no API call, no per-minute charge, nothing. It uses some CPU time on my VPS, but that is a fixed cost I am already paying for other projects.

Processing time for a 10-minute video is about 3 minutes on a standard VPS. That is perfectly acceptable for a pipeline that does not need to be real-time.

Gemini Flash Metadata Generation: ~0.03p per video

Gemini 2.0 Flash is extraordinarily cheap. Let me show the maths:

  • Input: roughly 2,000 tokens (transcript + prompt)
  • Output: roughly 500 tokens (title, description, tags, chapters)
  • Gemini Flash pricing: $0.10 per million input tokens, $0.40 per million output tokens
  • Cost per video: (2000 * 0.0000001) + (500 * 0.0000004) = $0.0004
  • In GBP at current rates: approximately 0.03p

Even if I uploaded 100 videos per month, the Gemini cost would be about 3p. At my actual volume of up to 30 videos, it is roughly 1p, which effectively rounds to zero.
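The arithmetic above is easy to sanity-check in code. The token counts and prices are the estimates from the bullet list; `gemini_cost_usd` is just a helper name for this post:

```python
# Published Gemini Flash prices, converted to dollars per token
INPUT_PRICE_PER_TOKEN = 0.10 / 1_000_000   # $0.10 per million input tokens
OUTPUT_PRICE_PER_TOKEN = 0.40 / 1_000_000  # $0.40 per million output tokens

def gemini_cost_usd(input_tokens, output_tokens):
    """Estimated dollar cost of one Gemini Flash call."""
    return (input_tokens * INPUT_PRICE_PER_TOKEN
            + output_tokens * OUTPUT_PRICE_PER_TOKEN)

per_video = gemini_cost_usd(2_000, 500)   # $0.0004
per_100_videos = 100 * per_video          # $0.04, about 3p
```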

YouTube Data API: £0.00

The YouTube Data API is free within quota limits. Each video upload costs 1,600 quota units out of a daily allocation of 10,000. That gives me 6 uploads per day, which is more than I need.
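The quota arithmetic works out like this (integer division, since a partial upload is not a thing). The function name is mine, but the unit costs match YouTube's published quota table:

```python
DAILY_QUOTA = 10_000   # default daily allocation for a Data API project
UPLOAD_COST = 1_600    # quota units charged per videos.insert call

def uploads_per_day(quota=DAILY_QUOTA, cost=UPLOAD_COST):
    """Whole uploads that fit within the daily quota."""
    return quota // cost
```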

Server Overhead: ~8p per month

This is the only real cost, and it is an estimate. My VPS costs a fixed monthly fee, and the pipeline shares it with several other projects. Based on CPU usage monitoring, I attribute about 8p of the monthly server cost to the YouTube pipeline. This covers:

  • The cron job running every 30 minutes
  • Whisper processing time when videos are present
  • Storage for the archive of processed files

Total: Under 10p per month

For up to 30 videos per month, the total cost is approximately 8 to 9p. The AI costs are so small they barely register.

Design Decisions That Keep Costs Low

1. Local Whisper Instead of API

The OpenAI Whisper API charges $0.006 per minute. For a 10-minute video, that is $0.06. Not expensive in isolation, but multiply by 30 videos and you are at $1.80/month. Running locally eliminates this entirely.

# Local Whisper - zero API cost
import whisper

model = whisper.load_model("small")      # weights cached locally after first download
result = model.transcribe("video.mp4")   # ffmpeg extracts the audio track
transcript = result["text"]

# vs. API Whisper - $0.006/min
# from openai import OpenAI
# client = OpenAI()
# transcription = client.audio.transcriptions.create(
#     model="whisper-1",
#     file=open("audio.mp3", "rb"),
# )

2. Gemini Flash for Non-Critical Generation

YouTube metadata generation does not need the most powerful model. Flash handles it perfectly. If I were using Claude Sonnet or GPT-4 for this task, costs would be 10 to 50 times higher for no meaningful improvement in output quality.
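To give a feel for the Gemini step, here is a sketch of how the prompt is assembled. The wording and the `build_metadata_prompt` helper are illustrative, not my production prompt; the actual call (via the google-generativeai client) is shown commented out because it needs an API key.

```python
def build_metadata_prompt(transcript: str, max_chars: int = 8_000) -> str:
    """Assemble one prompt asking Flash for title, description, tags, chapters.

    The transcript is truncated to keep input tokens (and cost) bounded.
    """
    return (
        "You write YouTube metadata. From the transcript below, produce:\n"
        "1. A title under 100 characters\n"
        "2. A description with timestamps\n"
        "3. Up to 15 comma-separated tags\n"
        "4. Chapter markers\n\n"
        f"Transcript:\n{transcript[:max_chars]}"
    )

# import google.generativeai as genai
# genai.configure(api_key="...")
# model = genai.GenerativeModel("gemini-2.0-flash")
# response = model.generate_content(build_metadata_prompt(transcript))
```

Truncating the transcript is also a cost control: it caps the input token count regardless of video length.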

3. Shared Infrastructure

Running on a VPS I already pay for means the marginal cost of adding this pipeline is near zero. If I had spun up a dedicated server or used serverless functions, the baseline cost would be significantly higher.

4. Efficient Polling

The cron job checks for new files every 30 minutes. The check itself takes milliseconds. Only when a new file is found does the expensive processing (transcription, API calls) begin. This event-driven-ish approach means the system is effectively idle 99% of the time.
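The schedule itself is a single crontab entry. The path is illustrative:

```shell
# Check the watch folder every 30 minutes; the script exits in
# milliseconds when no new video is present.
*/30 * * * * python3 /path/to/pipeline.py >> /path/to/pipeline.log 2>&1
```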

Comparison with Alternatives

To put this in perspective:

  • Zapier or Make.com automation: The free tiers would not cover this workflow. Paid plans start at $20/month
  • Using all cloud APIs: Whisper API + GPT-4 + cloud hosting would cost roughly $5 to $10/month
  • Manual upload: Free in cash terms, but 20 to 30 minutes per video. At 30 videos/month, that is 10+ hours of time

The Takeaway

AI automation does not have to be expensive. The key decisions are: run models locally when you can, use the cheapest model that does the job well, and leverage infrastructure you already have. This pipeline handles a genuinely useful task, runs completely unattended, and costs less than a packet of crisps per month. That is the kind of ROI that makes AI engineering exciting.