FFmpeg for AI Engineers: Automating Video Post-Production
Why AI Engineers Need FFmpeg
If you are building any kind of automated content pipeline that involves video, you will eventually need FFmpeg. It is the Swiss Army knife of video processing, and it runs on every platform. I use it daily in my AI video production pipeline for everything from concatenating clips to adding text overlays and encoding final output.
The learning curve is steep, but once you know the core patterns, FFmpeg becomes an incredibly powerful tool. Here are the commands and techniques I use most often.
Essential Commands
Concatenating Video Segments
My pipeline generates individual video segments (intro, chapters, outro) and then stitches them together. The concat demuxer is the cleanest approach:
# Create a file list
echo "file 'intro.mp4'" > list.txt
echo "file 'chapter1.mp4'" >> list.txt
echo "file 'outro.mp4'" >> list.txt
# Concatenate without re-encoding
ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4
The -c copy flag is crucial. It copies streams without re-encoding, so the join is nearly instant. This only works when all segments share the same codec, resolution, and frame rate, so I enforce consistency during the generation phase.
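The same steps can be sketched in Python. This is a minimal illustration, not the pipeline's actual code: the function name build_concat_args and its parameters are hypothetical. It writes the list file and returns the arguments that would follow "ffmpeg" on the command line.

```python
from pathlib import Path


def build_concat_args(segments: list[str], list_path: str, output: str) -> list[str]:
    """Write a concat-demuxer list file and return the matching ffmpeg arguments."""
    lines = []
    for segment in segments:
        # A literal single quote inside a quoted path is written as '\''
        escaped = segment.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    # Same flags as the shell command: "-c copy" joins without re-encoding
    return ["-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", output]
```

Building the list file in code avoids hand-editing list.txt when the number of chapters changes from one video to the next.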
Adding Audio Tracks
Combining a silent video with a voiceover and background music:
ffmpeg -i video.mp4 -i voiceover.wav -i music.mp3 \
-filter_complex "
[1:a]volume=1.0[voice];
[2:a]volume=0.15[music];
[voice][music]amix=inputs=2:duration=first[aout]
" \
-map 0:v -map "[aout]" \
-c:v copy -c:a aac -b:a 192k \
output.mp4
The amix filter mixes the voiceover and music, with the music at 15% volume. The duration=first parameter ends the mix when the first amix input (here the voiceover) ends, so the mixed audio matches the voiceover length.
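For automation, the filter graph can be assembled as a string in Python. The sketch below assumes a hypothetical helper named build_mix_args with a music_volume parameter. It reproduces the command above and returns the arguments that follow "ffmpeg".

```python
def build_mix_args(video: str, voiceover: str, music: str, output: str,
                   music_volume: float = 0.15) -> list[str]:
    """Return ffmpeg arguments that mix a voiceover with quieter background music."""
    filter_graph = (
        "[1:a]volume=1.0[voice];"
        f"[2:a]volume={music_volume}[music];"
        # duration=first stops the mix when the first amix input (the voiceover) ends
        "[voice][music]amix=inputs=2:duration=first[aout]"
    )
    return [
        "-i", video, "-i", voiceover, "-i", music,
        "-filter_complex", filter_graph,
        "-map", "0:v", "-map", "[aout]",
        "-c:v", "copy", "-c:a", "aac", "-b:a", "192k",
        output,
    ]
```

Because the arguments are passed as a list rather than through a shell, the filter graph needs no extra shell quoting.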
Text Overlays
Adding title cards or lower thirds:
ffmpeg -i input.mp4 \
-vf "drawtext=text='Chapter 1':fontsize=48:\
fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2:\
enable='between(t,0,3)'" \
-c:a copy output.mp4
The enable parameter controls when the text appears. Here the title shows for the first 3 seconds.
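When titles come from data, the drawtext string can be generated. The following is a sketch under stated assumptions: title_filter is a hypothetical helper, and rather than attempting drawtext's escaping rules it conservatively rejects text containing characters that may need escaping.

```python
def title_filter(text: str, start: float, end: float, fontsize: int = 48) -> str:
    """Build a drawtext filter that centers `text` from `start` to `end` seconds."""
    # Assumption: refuse characters that may need filter-level escaping
    # instead of trying to escape them correctly.
    if any(ch in text for ch in "':\\,%"):
        raise ValueError(f"unsupported character in overlay text: {text!r}")
    return (
        f"drawtext=text='{text}':fontsize={fontsize}:fontcolor=white:"
        "x=(w-text_w)/2:y=(h-text_h)/2:"
        f"enable='between(t,{start:g},{end:g})'"
    )
```

The result would be passed as the value of -vf, for example ["-i", "input.mp4", "-vf", title_filter("Chapter 1", 0, 3), "-c:a", "copy", "output.mp4"].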
Python Automation Patterns
I never run FFmpeg commands manually. Everything goes through Python's subprocess module with careful error handling:
import subprocess
import logging

def run_ffmpeg(args: list[str], timeout: int = 300) -> bool:
    cmd = ["ffmpeg", "-y", "-loglevel", "warning"] + args
    logging.info(f"Running: {' '.join(cmd)}")
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout
        )
        if result.returncode != 0:
            logging.error(f"FFmpeg error: {result.stderr}")
            return False
        return True
    except subprocess.TimeoutExpired:
        logging.error("FFmpeg timed out")
        return False
Key details: the -y flag auto-overwrites output files (essential for automation), and -loglevel warning reduces noise while still showing errors.
Probing Video Metadata
Before processing, I always probe the input to verify dimensions and duration:
import json

def probe_video(path: str) -> dict:
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        # check=True raises CalledProcessError if ffprobe fails,
        # instead of passing empty output to json.loads
        capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
This returns detailed information about codecs, resolution, frame rate, and duration. I use it to validate inputs before expensive processing steps.
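To turn the probe output into something easy to validate, the relevant fields can be pulled into a small dictionary. This is an illustrative sketch: summarize_probe is a hypothetical helper, and it assumes the file has at least one video stream with a non-zero avg_frame_rate.

```python
from fractions import Fraction


def summarize_probe(info: dict) -> dict:
    """Extract codec, dimensions, frame rate, and duration from ffprobe JSON."""
    # Take the first stream that ffprobe labels as video
    video = next(s for s in info["streams"] if s.get("codec_type") == "video")
    return {
        "codec": video["codec_name"],
        "width": int(video["width"]),
        "height": int(video["height"]),
        # ffprobe reports frame rate as a fraction string such as "30000/1001"
        "fps": float(Fraction(video["avg_frame_rate"])),
        # duration is reported as a string in the "format" section
        "duration": float(info["format"]["duration"]),
    }
```

Comparing these summaries across segments is one way to confirm that codec, resolution, and frame rate match before attempting a stream-copy concatenation.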
Performance Tips
- Use hardware acceleration: On machines with NVIDIA GPUs, use -c:v h264_nvenc instead of -c:v libx264 for 5-10x faster encoding.
- Avoid unnecessary re-encoding: Use -c copy whenever possible. Re-encoding is the slowest part of any video pipeline.
- Process in parallel: If you have multiple independent segments, use Python's ThreadPoolExecutor to run FFmpeg instances concurrently.
- Set an appropriate CRF: For web delivery with libx264, CRF 23-28 provides good quality at reasonable file sizes. Lower numbers mean better quality but larger files.
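The parallel-processing tip can be sketched as follows. The helper name run_commands_parallel and the default of four workers are assumptions for illustration. Each command is a full argument list, such as ["ffmpeg", "-y", "-i", ...].

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_commands_parallel(commands: list[list[str]], max_workers: int = 4) -> list[int]:
    """Run each command in its own subprocess and return exit codes in input order."""
    def run_one(cmd: list[str]) -> int:
        # Threads are enough here: each one only waits on an external process
        return subprocess.run(cmd, capture_output=True, text=True).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `commands`
        return list(pool.map(run_one, commands))
```

A non-zero entry in the returned list marks the segment whose command failed. Since each FFmpeg process can itself use several cores, a small worker count is a reasonable starting point.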
Common Pitfalls
A few issues that cost me hours of debugging:
- Audio sync drift: When concatenating segments, mismatched sample rates cause audio drift. Always normalize every segment to a single sample rate (44100 Hz or 48000 Hz) before joining.
- Pixel format mismatches: Some operations require specific pixel formats. Add -pix_fmt yuv420p for maximum compatibility.
- Filter graph ordering: In complex filter graphs, the order of operations matters. Scale before overlay, not after.
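One way to avoid the first two pitfalls is to re-encode every segment to a common format before joining. The sketch below is an assumption-laden example rather than a fixed recipe: build_normalize_args is a hypothetical helper, and the 1920x1080, 30 fps, and 48000 Hz defaults are placeholders to adjust per project.

```python
def build_normalize_args(src: str, dst: str, width: int = 1920, height: int = 1080,
                         fps: int = 30, sample_rate: int = 48000) -> list[str]:
    """Return ffmpeg arguments that re-encode `src` to a common format at `dst`."""
    return [
        "-i", src,
        # Force a shared resolution and frame rate across segments
        "-vf", f"scale={width}:{height},fps={fps}",
        # yuv420p is the most widely supported pixel format
        "-pix_fmt", "yuv420p",
        "-c:v", "libx264",
        # Resample audio so every segment has the same sample rate
        "-ar", str(sample_rate),
        "-c:a", "aac",
        dst,
    ]
```

Segments normalized this way share codec, resolution, frame rate, and sample rate, which is the precondition for concatenating them with -c copy.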
Wrapping Up
FFmpeg is not glamorous, but it is indispensable for AI video pipelines. Invest time in learning the filter system and the concat demuxer, and you will have a solid foundation for any video automation project. The commands in this post cover about 80% of what I use in production daily.