4 min read

The Architecture of a Multi-Agent Image Generation Pipeline

Tags: multi-agent, image generation, AI pipeline, automation, Vertex AI, architecture

Why Multi-Agent for Image Generation?

Single-prompt image generation is fine for one-off creative work. But when you need to produce hundreds of consistent, brand-aligned images across multiple styles and formats, you need a system that can handle the complexity. That system is a multi-agent pipeline where each agent specializes in one part of the image creation process.

I built this architecture for a content automation project that required generating 500 unique images per week across 10 different visual styles. Here is how it works.

The Agent Lineup

The pipeline has five specialized agents, each with a distinct responsibility:

  • Brief Interpreter: Converts high-level content briefs into detailed image specifications
  • Prompt Engineer: Transforms specifications into optimized generation prompts
  • Generator: Executes the actual image generation via API
  • QA Inspector: Evaluates generated images against quality and compliance criteria
  • Post-Processor: Handles cropping, resizing, format conversion, and metadata

The Brief Interpreter Agent

This agent receives a content brief like "hero image for blog post about Python async programming" and produces a structured specification:

import json

from openai import OpenAI

class BriefInterpreter:
    def __init__(self, style_guide: dict):
        self.style_guide = style_guide
        self.client = OpenAI()
    
    def interpret(self, brief: str, context: dict) -> ImageSpec:
        prompt = f"""Convert this content brief into an image specification.
        
        Brief: {brief}
        Brand style guide: {json.dumps(self.style_guide)}
        Content context: {json.dumps(context)}
        
        Output a JSON specification with:
        - subject: main visual subject
        - style: art style from approved list
        - mood: emotional tone
        - color_palette: primary colors
        - composition: layout description
        - text_overlay: any text to include
        - aspect_ratio: target ratio
        - negative_elements: things to avoid"""
        
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        return ImageSpec(**json.loads(response.choices[0].message.content))

The Prompt Engineer Agent

This is where the real craft happens. Different image generation models respond differently to prompt structures. The Prompt Engineer agent knows the quirks of each model and optimizes accordingly.

class PromptEngineer:
    MODEL_TEMPLATES = {
        'imagen': {
            'structure': '{style} {subject}, {mood} atmosphere, {colors}, {composition}',
            'max_length': 1024,
            'negative_prompt': True
        },
        'dall-e-3': {
            'structure': 'A {style} image of {subject}. The mood is {mood}. {composition}. Color palette: {colors}.',
            'max_length': 4000,
            'negative_prompt': False
        }
    }
    
    def craft_prompt(self, spec: ImageSpec, model: str) -> GenerationPrompt:
        template = self.MODEL_TEMPLATES[model]
        
        base_prompt = template['structure'].format(
            style=spec.style,
            subject=spec.subject,
            mood=spec.mood,
            colors=', '.join(spec.color_palette),
            composition=spec.composition
        )
        
        # Model-specific optimizations
        if model == 'imagen':
            base_prompt = self._optimize_for_imagen(base_prompt, spec)
        
        return GenerationPrompt(
            positive=base_prompt[:template['max_length']],
            negative=spec.negative_elements if template['negative_prompt'] else None,
            aspect_ratio=spec.aspect_ratio,  # carried through so the Generator can pass it to the API
            model=model
        )
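
To make the template mechanics concrete, here is what the dall-e-3 structure from MODEL_TEMPLATES produces for a sample spec (the field values are illustrative):

```python
# The dall-e-3 structure string from MODEL_TEMPLATES above
template = ('A {style} image of {subject}. The mood is {mood}. '
            '{composition}. Color palette: {colors}.')

prompt = template.format(
    style='flat vector illustration',
    subject='a developer at a desk surrounded by floating async tasks',
    mood='focused',
    composition='Centered subject with generous negative space',
    colors=', '.join(['deep blue', 'teal', 'white']),
)
print(prompt)
```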

The Generator Agent

The Generator is a thin wrapper around the image generation APIs, but it handles retry logic, model fallback, and cost tracking.

class ImageGenerator:
    def __init__(self):
        self.providers = {
            'imagen': VertexImagenClient(),
            'dall-e-3': OpenAIDalleClient()
        }
        self.cost_tracker = CostTracker()
    
    async def generate(self, prompt: GenerationPrompt) -> GeneratedImage:
        provider = self.providers[prompt.model]
        
        for attempt in range(3):
            try:
                result = await provider.generate(
                    prompt=prompt.positive,
                    negative_prompt=prompt.negative,
                    aspect_ratio=prompt.aspect_ratio
                )
                self.cost_tracker.log(prompt.model, result.cost)
                return result
            except RateLimitError:
                await asyncio.sleep(2 ** attempt)
            except ContentFilterError:
                prompt = self._sanitize_prompt(prompt)
        
        raise GenerationFailedError("Failed after 3 attempts")

The QA Inspector Agent

This is the gatekeeper. Every generated image passes through QA before being accepted. The inspector uses a vision model to evaluate quality.

class QAInspector:
    QUALITY_CRITERIA = [
        'visual_quality',      # No artifacts, proper rendering
        'prompt_adherence',    # Matches the specification
        'brand_compliance',    # Aligns with style guide
        'text_accuracy',       # Any text is spelled correctly
        'composition_quality'  # Good visual balance
    ]
    
    def __init__(self):
        # Async client (from openai import AsyncOpenAI), so inspect can await the vision call
        self.client = AsyncOpenAI()
    
    async def inspect(self, image: GeneratedImage, spec: ImageSpec) -> QAResult:
        response = await self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": self._build_qa_prompt(spec)},
                    {"type": "image_url", "image_url": {"url": image.data_url}}
                ]
            }],
            response_format={"type": "json_object"}
        )
        
        scores = json.loads(response.choices[0].message.content)
        passed = all(scores[c] >= 3 for c in self.QUALITY_CRITERIA)
        return QAResult(passed=passed, scores=scores)
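
The gating rule is strict: every criterion must clear the threshold of 3, so a single weak dimension rejects the whole image. A small illustration of that logic (the score values are made up, and the source does not state the scale, so a 1-5 range is assumed):

```python
QUALITY_CRITERIA = [
    'visual_quality', 'prompt_adherence', 'brand_compliance',
    'text_accuracy', 'composition_quality',
]

def gate(scores: dict[str, int], threshold: int = 3) -> bool:
    # Pass only if every criterion meets the threshold
    return all(scores[c] >= threshold for c in QUALITY_CRITERIA)

good = {'visual_quality': 4, 'prompt_adherence': 5, 'brand_compliance': 4,
        'text_accuracy': 3, 'composition_quality': 4}
bad = dict(good, text_accuracy=2)  # one misspelled overlay sinks the image

print(gate(good), gate(bad))  # True False
```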

The Orchestrator

The orchestrator manages the flow between agents. If QA fails, it refines the spec with the inspector's feedback and sends it back through the Prompt Engineer for another attempt.

class PipelineOrchestrator:
    def __init__(self):
        self.interpreter = BriefInterpreter(load_style_guide())
        self.prompt_engineer = PromptEngineer()
        self.generator = ImageGenerator()
        self.qa = QAInspector()
        self.post_processor = PostProcessor()
    
    async def process(self, brief: str, max_attempts: int = 3) -> FinalImage:
        spec = self.interpreter.interpret(brief, context={})  # no extra content context at this entry point
        
        for attempt in range(max_attempts):
            prompt = self.prompt_engineer.craft_prompt(spec, model='imagen')
            image = await self.generator.generate(prompt)
            qa_result = await self.qa.inspect(image, spec)
            
            if qa_result.passed:
                final = await self.post_processor.process(image, spec)
                return final
            
            spec = self._refine_spec(spec, qa_result)
        
        raise QualityThresholdNotMetError(brief)

Scaling to 500 Images Per Week

At scale, the key optimization is parallelism. The interpreter and prompt engineer are fast (LLM calls), but generation and QA are slower. I run generation in parallel batches of 10, with QA running concurrently on completed images.
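
That batching pattern can be sketched with an asyncio semaphore; generate_one here is a stand-in for the real generate-then-inspect stage, not the pipeline's actual code:

```python
import asyncio

async def generate_one(brief: str) -> str:
    # Stand-in for generation + QA; the real coroutine awaits the APIs
    await asyncio.sleep(0.01)
    return f"image for {brief!r}"

async def run_batches(briefs: list[str], batch_size: int = 10) -> list[str]:
    sem = asyncio.Semaphore(batch_size)  # at most 10 generations in flight

    async def bounded(brief: str) -> str:
        async with sem:
            return await generate_one(brief)

    # gather starts all tasks at once; the semaphore caps concurrency,
    # so QA on one image can run while others are still generating
    return await asyncio.gather(*(bounded(b) for b in briefs))

results = asyncio.run(run_batches([f"brief {i}" for i in range(25)]))
print(len(results))  # 25
```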

The multi-agent approach also enables easy A/B testing. I can swap in a new Prompt Engineer agent trained on different optimization strategies and compare QA pass rates between the two versions without touching the rest of the pipeline.
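
The comparison itself is simple bookkeeping; a hypothetical harness (ab_assign and the outcome lists are illustrative, not part of the pipeline):

```python
import random

def ab_assign(brief_id: int, split: float = 0.5) -> str:
    # Seeded per brief so assignment is deterministic across retries
    rng = random.Random(brief_id)
    return 'A' if rng.random() < split else 'B'

def pass_rate(results: list[bool]) -> float:
    # Fraction of images that cleared QA on the first pass
    return sum(results) / len(results) if results else 0.0

# Hypothetical first-pass QA outcomes per Prompt Engineer variant
outcomes = {'A': [True, True, False, True], 'B': [True, False, False, True]}
print({v: pass_rate(r) for v, r in outcomes.items()})  # {'A': 0.75, 'B': 0.5}
```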

This architecture has been running in production for 4 months, generating over 8,000 images with a 91% first-pass QA rate. The multi-agent design makes it maintainable, testable, and adaptable as image generation models continue to improve.