FFmpeg for AI Engineers: Automating Video Post-Production

Why AI Engineers Need FFmpeg

If you are building any kind of automated content pipeline that involves video, you will eventually need FFmpeg. It is the Swiss Army knife of video processing, and it runs on every platform. I use it daily in my AI video production pipeline for everything from concatenating clips to adding text overlays and encoding final output.

The learning curve is steep, but once you know the core patterns, FFmpeg becomes an incredibly powerful tool. Here are the commands and techniques I use most often.

Essential Commands

Concatenating Video Segments

My pipeline generates individual video segments (intro, chapters, outro) and then stitches them together. The concat demuxer is the cleanest approach:

# Create a file list
echo "file 'intro.mp4'" > list.txt
echo "file 'chapter1.mp4'" >> list.txt
echo "file 'outro.mp4'" >> list.txt

# Concatenate without re-encoding
ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4

The -c copy flag is crucial: it copies the streams without re-encoding, so the operation is nearly instantaneous even for long videos. This only works when all segments share the same codec, resolution, and frame rate, so I enforce consistency during the generation phase.
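In the pipeline the list file is generated rather than written by hand. Here is a minimal Python sketch of that step (build_concat_list and concat_segments are hypothetical helper names, not part of the commands above; paths containing single quotes would need extra escaping):

```python
import subprocess
from pathlib import Path

def build_concat_list(segments: list[str]) -> str:
    # One "file 'path'" line per segment, as the concat demuxer expects
    return "\n".join(f"file '{p}'" for p in segments) + "\n"

def concat_segments(segments: list[str], output: str) -> None:
    Path("list.txt").write_text(build_concat_list(segments))
    # -c copy stitches the segments without re-encoding
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", "list.txt", "-c", "copy", output],
        check=True,
    )
```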

Adding Audio Tracks

Combining a silent video with a voiceover and background music:

ffmpeg -i video.mp4 -i voiceover.wav -i music.mp3 \
  -filter_complex "
    [1:a]volume=1.0[voice];
    [2:a]volume=0.15[music];
    [voice][music]amix=inputs=2:duration=first[aout]
  " \
  -map 0:v -map "[aout]" \
  -c:v copy -c:a aac -b:a 192k \
  output.mp4

The amix filter mixes the voiceover and music, with the music at 15% volume. The duration=first parameter ensures the output matches the voiceover length.
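Filter strings like this get unwieldy fast, so in practice I build them programmatically. A small sketch, assuming a hypothetical mix_filter helper (the function name and defaults are mine, not part of the command above):

```python
def mix_filter(voice_vol: float = 1.0, music_vol: float = 0.15) -> str:
    # Rebuilds the filter_complex string above with adjustable volume levels
    return (
        f"[1:a]volume={voice_vol}[voice];"
        f"[2:a]volume={music_vol}[music];"
        "[voice][music]amix=inputs=2:duration=first[aout]"
    )
```

The returned string can be passed straight to the -filter_complex argument.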

Text Overlays

Adding title cards or lower thirds:

ffmpeg -i input.mp4 \
  -vf "drawtext=text='Chapter 1':fontsize=48:\
  fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2:\
  enable='between(t,0,3)'" \
  -c:a copy output.mp4

The enable parameter controls when the text appears. This shows the title for the first 3 seconds.
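Because drawtext treats colons and backslashes as special characters, generating the filter from arbitrary chapter titles needs some escaping. A simplified sketch (drawtext_filter is a hypothetical helper; single quotes inside the text would need additional shell-level escaping not handled here):

```python
def drawtext_filter(text: str, start: float, end: float, fontsize: int = 48) -> str:
    # Escape backslashes and colons, which are special inside drawtext values
    escaped = text.replace("\\", "\\\\").replace(":", "\\:")
    return (
        f"drawtext=text='{escaped}':fontsize={fontsize}:"
        "fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2:"
        f"enable='between(t,{start:g},{end:g})'"
    )
```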

Python Automation Patterns

I never run FFmpeg commands manually. Everything goes through Python's subprocess module with careful error handling:

import subprocess
import logging

def run_ffmpeg(args: list[str], timeout: int = 300) -> bool:
    cmd = ["ffmpeg", "-y", "-loglevel", "warning"] + args
    logging.info(f"Running: {' '.join(cmd)}")
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout
        )
        if result.returncode != 0:
            logging.error(f"FFmpeg error: {result.stderr}")
            return False
        return True
    except subprocess.TimeoutExpired:
        logging.error("FFmpeg timed out")
        return False

Key details: the -y flag auto-overwrites output files (essential for automation), and -loglevel warning reduces noise while still showing errors.

Probing Video Metadata

Before processing, I always probe the input to verify dimensions and duration:

import json
import subprocess

def probe_video(path: str) -> dict:
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)

This returns detailed information about codecs, resolution, frame rate, and duration. I use it to validate inputs before expensive processing steps.
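The raw ffprobe JSON is verbose, so I find it useful to reduce it to the few fields that matter for validation. A sketch (video_summary is a hypothetical helper built on the probe output's standard layout):

```python
def video_summary(probe: dict) -> dict:
    # Pick out the first video stream plus the container-level duration
    video = next(s for s in probe["streams"] if s["codec_type"] == "video")
    return {
        "width": video["width"],
        "height": video["height"],
        "duration": float(probe["format"]["duration"]),
    }
```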

Performance Tips

  • Use hardware acceleration: On machines with NVIDIA GPUs, use -c:v h264_nvenc instead of -c:v libx264 for 5-10x faster encoding.
  • Avoid unnecessary re-encoding: Use -c copy whenever possible. Re-encoding is the slowest part of any video pipeline.
  • Process in parallel: If you have multiple independent segments, use Python's ThreadPoolExecutor to run FFmpeg instances concurrently.
  • Set appropriate CRF: For web delivery, CRF 23-28 provides good quality at reasonable file sizes. Lower numbers mean better quality but larger files.
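The parallel-processing tip can be sketched in a few lines. Threads are sufficient here because each FFmpeg run is its own OS process, so the Python threads mostly just wait on subprocesses (process_parallel is a hypothetical helper; worker would be something like the run_ffmpeg wrapper above, partially applied per segment):

```python
from concurrent.futures import ThreadPoolExecutor

def process_parallel(segments: list[str], worker, max_workers: int = 4) -> list:
    # One FFmpeg job per segment, up to max_workers running at once
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, segments))
```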

Common Pitfalls

A few issues that cost me hours of debugging:

  • Audio sync drift: When concatenating segments, mismatched sample rates cause audio drift. Always normalize to 44100 Hz or 48000 Hz before joining.
  • Pixel format mismatches: Some operations require specific pixel formats. Add -pix_fmt yuv420p for maximum compatibility.
  • Filter graph ordering: In complex filter graphs, the order of operations matters. Scale before overlay, not after.
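For the audio-drift pitfall, normalization is a single resampling pass before the concat step. A sketch that builds the command (normalize_cmd is a hypothetical helper; -ar sets the audio sample rate while the video stream is copied untouched):

```python
def normalize_cmd(src: str, dst: str, rate: int = 48000) -> list[str]:
    # Resample audio to a fixed rate; copy video to avoid re-encoding it
    return ["ffmpeg", "-y", "-i", src, "-ar", str(rate),
            "-c:v", "copy", "-c:a", "aac", dst]
```

The resulting list drops straight into subprocess.run or the run_ffmpeg wrapper above.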

Wrapping Up

FFmpeg is not glamorous, but it is indispensable for AI video pipelines. Invest time in learning the filter system and the concat demuxer, and you will have a solid foundation for any video automation project. The commands in this post cover about 80% of what I use in production daily.