4 min read

What I Learned Building 9 AI Projects in 12 Months

AI engineering, lessons learned, career, portfolio, production AI, retrospective

The Year in Review

Twelve months ago, I made a deliberate decision to go deep on AI engineering. Not just experimenting with APIs, but building real, production systems that handle real data and serve real users. Nine projects later, I want to share the lessons that no tutorial or course taught me.

The Projects

For context, here is what I built:

  • A multi-agent content generation pipeline producing 1000+ articles
  • A financial signal scoring system with 8 quality factors
  • An autonomous art generation system with spatial logic
  • A RAG-powered knowledge base for technical documentation
  • A YouTube SEO optimization pipeline using Gemini
  • An automated video production system with ElevenLabs and Suno
  • A brand compliance scoring engine for content agencies
  • An MCP server for Claude Desktop integration
  • A newsletter automation system with AI-generated digests

Lesson 1: Start with the Error Handling

Every project taught me the same thing: write the error handling before the happy path. AI systems fail in creative and unexpected ways. Rate limits, content filters, malformed outputs, hallucinated JSON, timeout cascades. If you build the happy path first and bolt on error handling later, you will spend more time debugging failures than building features.

My current approach: I build the retry logic, fallback chains, and dead letter queues first, then fill in the actual business logic. This feels backwards but consistently produces more reliable systems.
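A minimal sketch of that failure-first pattern, assuming zero-argument callables as stand-ins for real API clients: retries with exponential backoff on the primary call, a fallback chain tried once each, and a dead-letter list for failures that survive everything. Names and structure here are illustrative, not the author's actual implementation.

```python
import time

def call_with_fallbacks(primary, fallbacks=(), max_retries=3,
                        base_delay=0.5, dead_letter=None):
    """Run `primary` with retries; on exhaustion, try each fallback once.

    All callables take no arguments (wire in your own clients). Failures
    that survive every fallback are appended to `dead_letter` so they can
    be inspected and replayed later instead of being silently dropped.
    """
    last_exc = None
    for fn in (primary, *fallbacks):
        attempts = max_retries if fn is primary else 1
        for attempt in range(attempts):
            try:
                return fn()
            except Exception as exc:  # rate limits, content filters, timeouts...
                last_exc = exc
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    if dead_letter is not None:
        dead_letter.append(repr(last_exc))  # dead-letter queue entry
    raise last_exc
```

The business logic then slots into `primary` last, once the failure paths already exist.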

Lesson 2: Cost Management Is a First-Class Concern

My second project racked up $400 in API costs during a single debugging session because I forgot to implement cost tracking. After that, every project gets a cost monitoring layer from day one.

Specific practices that saved me money:

  • Use the cheapest model that meets quality requirements. GPT-4o-mini handles 70% of tasks that people reflexively route to GPT-4o.
  • Cache everything. Identical prompts with identical inputs should never call the API twice.
  • Set hard spending limits per project, per day, and per user.
  • Log every API call with its token count and cost. You cannot optimize what you do not measure.
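The four practices above can be sketched as one thin wrapper. The prices, model names, and `llm_fn` client signature below are hypothetical placeholders; substitute your provider's current pricing and SDK.

```python
import hashlib

# Hypothetical per-1M-token prices; check your provider's current rates.
PRICE_PER_1M = {"small-model": 0.15, "large-model": 2.50}

class CostTracker:
    """Caches identical prompts, logs every call's cost, enforces a budget."""

    def __init__(self, daily_limit_usd):
        self.daily_limit = daily_limit_usd
        self.spent = 0.0
        self.cache = {}
        self.log = []

    def call(self, model, prompt, llm_fn):
        """`llm_fn(model, prompt)` is a stand-in client returning (text, tokens)."""
        key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()
        if key in self.cache:          # identical input: never hit the API twice
            return self.cache[key]
        if self.spent >= self.daily_limit:
            raise RuntimeError("daily spending limit reached")  # hard stop
        text, tokens = llm_fn(model, prompt)
        cost = tokens / 1_000_000 * PRICE_PER_1M[model]
        self.spent += cost
        self.log.append({"model": model, "tokens": tokens, "cost": cost})
        self.cache[key] = text
        return text
```

The log makes the "measure before optimizing" point concrete: summing `cost` per model shows exactly where routing a task to the cheaper model would pay off.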

Lesson 3: Prompts Are Code, Treat Them Like It

In my early projects, prompts were inline strings scattered throughout the codebase. That was a maintenance nightmare. Now I treat prompts as first-class code artifacts:

  • Stored in dedicated files with version control
  • Tested with evaluation suites that measure output quality
  • Parameterized with clear variable names
  • Reviewed during code review just like any other code change

A single word change in a prompt can dramatically alter system behavior. Treat prompt changes with the same rigor as code changes.
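As a minimal illustration of prompts-as-files, assuming a `prompts/` directory versioned alongside the code (the directory name and template format are my assumptions, not a prescribed layout):

```python
from pathlib import Path
from string import Template

PROMPT_DIR = Path("prompts")  # lives in the repo, reviewed like code

def load_prompt(name, **params):
    """Load prompts/<name>.txt and fill in named parameters.

    Template.substitute raises on a missing parameter instead of
    silently shipping a bare '$placeholder' to the model.
    """
    template = Template((PROMPT_DIR / f"{name}.txt").read_text())
    return template.substitute(**params)
```

Because each prompt is a file, a one-word change shows up as a one-line diff in code review, and an evaluation suite can load the same files the production system does.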

Lesson 4: Multi-Agent Is Not Always Better

After building the multi-agent content pipeline, I tried to apply the pattern to everything. Sometimes a simple function call chain is better than a multi-agent system. The overhead of agent communication, state management, and coordination is only worth it when you genuinely have distinct specialized tasks that benefit from independent optimization.

My rule of thumb: if you can describe the entire workflow in a single prompt without losing quality, you do not need agents.

Lesson 5: Evaluation Is the Hardest Problem

How do you know if your AI system is working well? For the financial system, metrics were clear: Sharpe ratio, win rate, drawdown. For content generation and art systems, evaluation is much harder. "Is this article good?" is a subjective question.

What worked for me:

  • Define specific, measurable quality dimensions (accuracy, readability, brand compliance)
  • Build automated scoring rubrics that measure each dimension independently
  • Calibrate automated scores against human judgment
  • Track quality metrics over time to catch degradation
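The first two bullets reduce to a small aggregation pattern: score each dimension independently, then combine with explicit weights. The scorer functions below are toy placeholders for real automated rubrics, and the weights are arbitrary examples.

```python
def score_article(article, scorers, weights):
    """Score quality dimensions independently, then combine.

    `scorers` maps dimension name -> fn(article) -> score in [0, 1].
    Keeping dimensions separate makes it possible to calibrate each
    one against human judgment and to watch any single dimension
    degrade over time.
    """
    dims = {name: fn(article) for name, fn in scorers.items()}
    total = sum(weights[name] * score for name, score in dims.items())
    return dims, total / sum(weights.values())
```

Logging the per-dimension `dims` alongside the aggregate is what enables the last bullet: a falling readability trend is visible even while the weighted total looks stable.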

Lesson 6: The Prototype-to-Production Gap Is Real

Getting a demo working takes a day. Getting it production-ready takes a month. The gap is filled with error handling, input validation, rate limiting, monitoring, logging, testing, documentation, and deployment infrastructure. Every project confirmed this pattern.

I now plan for this explicitly. When estimating project timelines, I allocate 20% for the core AI logic and 80% for everything else.

Lesson 7: Build Your Own Tools

The MCP server project was the most impactful thing I built. Not because it was technically complex, but because it multiplied my daily productivity. Every AI engineer should invest time building tools that automate their own workflow. The compound returns are enormous.

Lesson 8: Document Everything in Real-Time

Writing these blog posts was not just content marketing. It was forced documentation. Every time I wrote up a project, I discovered architectural decisions I had forgotten the reasoning for, edge cases I had not considered, and optimizations I wanted to revisit. Writing is thinking, and thinking about your systems makes them better.

Lesson 9: The Field Is Moving Fast, but Fundamentals Are Stable

Models change every few months. The fundamentals do not. Good software architecture, robust error handling, systematic testing, cost awareness, and clean code matter just as much in AI engineering as in any other discipline. The engineers who will thrive are not the ones chasing every new model release, but the ones who build reliable systems on solid foundations.

What Is Next

The next 12 months will focus on deeper specialization. I am particularly interested in advanced agentic workflows, real-time AI systems, and the emerging patterns around AI-native application architecture. The foundation is laid. Now it is time to build on it.

If you are considering a similar journey into AI engineering, my advice is simple: build things. Not tutorials, not courses, not toy examples. Real projects with real constraints. That is where the learning happens.