4 min read

How I Built an AI Document Analysis SaaS with Claude and Stripe

Claude API SaaS FastAPI Stripe document analysis AI engineering

Why I Built a Document Analysis SaaS

Late last year I kept running into the same problem: teams at work had stacks of PDFs, policy documents, and technical specs that nobody had time to read properly. People were making decisions based on skimming. I figured if I could build a tool that let anyone upload a document and get structured, accurate analysis back in seconds, it would solve a real pain point.

That idea became my AI document analysis SaaS. It is now live, handling real users, processing real documents, and billing through Stripe. Here is exactly how I built it.

The Architecture

The stack is intentionally lean. I wanted to ship fast and keep operational costs low:

  • Backend: FastAPI running on a VPS with Uvicorn
  • AI layer: Claude API (Anthropic) for document comprehension and structured output
  • Database: PostgreSQL with pgvector for storing document embeddings
  • Payments: Stripe Checkout and webhook-based subscription management
  • Frontend: A clean, minimal interface built with vanilla JS and server-rendered templates

Document Ingestion Pipeline

When a user uploads a PDF, the system runs through a pipeline that I designed to be both robust and cost-efficient:

  1. The file is validated (size limits, file type checks, malware scanning)
  2. Text extraction runs via PyMuPDF, which handles most PDF layouts well
  3. The extracted text gets chunked into overlapping segments of around 1500 tokens
  4. Each chunk is sent to Claude with a structured prompt that returns JSON
  5. Results are aggregated and stored alongside vector embeddings for future retrieval

# Simplified document analysis call
import json

from anthropic import AsyncAnthropic

anthropic_client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

async def analyse_document(chunks: list[str]) -> dict:
    results = []
    for chunk in chunks:
        response = await anthropic_client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            messages=[{
                "role": "user",
                "content": f"Analyse this document section and return structured JSON with key_findings, risks, and action_items:\n\n{chunk}"
            }]
        )
        # json.loads can raise on malformed output; the validation and
        # retry logic described in the next section handles that in production
        results.append(json.loads(response.content[0].text))
    # merge_analysis combines the per-chunk findings into one report
    return merge_analysis(results)

Prompt Engineering for Reliable Output

Getting Claude to return consistent, structured JSON every time was the hardest part of the entire build. Early on, I was getting maybe 85% valid JSON responses. That is not good enough for a production SaaS where users expect reliability.

The fix came down to three things:

  • System prompts with explicit schema definitions: I provide Claude with the exact JSON schema I expect, including field types and constraints
  • Few-shot examples: Two examples of ideal output in the system prompt dramatically improved consistency
  • Validation and retry logic: If the response fails JSON parsing, the system retries once with an even more explicit prompt. This brought success rates above 99.5%

Stripe Integration

I chose Stripe Checkout because it handles the entire payment UI, PCI compliance, and receipt emails. My integration uses webhooks to manage subscription state:

import stripe
from fastapi import HTTPException, Request

@app.post("/webhooks/stripe")
async def stripe_webhook(request: Request):
    payload = await request.body()
    sig = request.headers.get("stripe-signature")
    try:
        # construct_event verifies the signature against the webhook secret
        event = stripe.Webhook.construct_event(payload, sig, WEBHOOK_SECRET)
    except stripe.error.SignatureVerificationError:
        # Reject anything not actually signed by Stripe
        raise HTTPException(status_code=400, detail="invalid signature")

    if event["type"] == "checkout.session.completed":
        await activate_subscription(event["data"]["object"])
    elif event["type"] == "customer.subscription.deleted":
        await deactivate_subscription(event["data"]["object"])

    return {"status": "ok"}

One thing I learned the hard way: always verify webhook signatures. During development I skipped this step, which caused a subtle bug where test events were processed as real ones.

Cost Management

Running an AI SaaS means every API call costs money. I implemented several strategies to keep costs under control:

  • Caching: If the same document has been analysed before (matched by content hash), serve the cached result
  • Tiered models: Simple documents use Haiku for speed and cost savings, while complex documents route to Sonnet
  • Usage limits: Each subscription tier has a monthly document cap, enforced server-side

The average cost per document analysis is around 2p, which gives healthy margins on even the cheapest subscription tier.
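
The caching layer is essentially a lookup keyed on a SHA-256 hash of the extracted text. The real cache is persisted in the database; the in-memory dict here is just to show the idea:

```python
import hashlib

# Illustrative in-memory cache; the live system persists results instead
_analysis_cache: dict[str, dict] = {}

def content_hash(text: str) -> str:
    """Stable key: identical document text always hashes to the same value."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

async def analyse_cached(text: str, analyse) -> dict:
    """Serve a cached analysis if this exact content has been seen before."""
    key = content_hash(text)
    if key in _analysis_cache:
        return _analysis_cache[key]
    result = await analyse(text)
    _analysis_cache[key] = result
    return result
```

Hashing the extracted text rather than the raw file means two visually identical PDFs saved by different tools still hit the same cache entry.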

What I Would Do Differently

If I were starting again, I would add streaming responses from day one. Users upload a document and wait for results, and even though processing takes only 10 to 15 seconds, it feels longer without feedback. I have since added a progress indicator, but true streaming of partial results would be better.
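
If I did add streaming, the shape would be roughly this: relay each chunk's result to the browser as a server-sent event the moment it completes. This is a sketch of what I would build, not code from the live app:

```python
import json
from collections.abc import AsyncIterator

def format_sse(data: dict) -> str:
    """Encode one partial result as a server-sent event frame."""
    return f"data: {json.dumps(data)}\n\n"

async def stream_partial_results(
    chunk_results: AsyncIterator[dict],
) -> AsyncIterator[str]:
    """Relay per-chunk analysis results to the client as they complete."""
    index = 0
    async for result in chunk_results:
        yield format_sse({"chunk": index, "result": result})
        index += 1
    # Final frame tells the frontend to stop listening
    yield format_sse({"done": True})
```

In FastAPI this generator would be wrapped as `StreamingResponse(stream_partial_results(gen), media_type="text/event-stream")`, with the frontend consuming it via `EventSource`.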

I would also invest more time in the chunking strategy upfront. My initial naive approach of splitting on character count caused issues with tables and formatted data. The overlap-aware chunker I use now handles these edge cases, but I lost a few days debugging garbled analysis output before I found the root cause.
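
The chunker I settled on is, at its core, a sliding window with overlap. This sketch uses word counts as a stand-in for tokens (the real version counts tokens and prefers paragraph boundaries), but the overlap logic is the same:

```python
def chunk_words(text: str, chunk_size: int = 1500, overlap: int = 200) -> list[str]:
    """Sliding-window chunker: each chunk shares `overlap` words with the
    previous one, so content near a boundary appears whole in at least
    one chunk. Word counts stand in for tokens in this sketch."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if len(words) <= chunk_size:
        return [text] if words else []
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap is what fixed the garbled-tables problem: a table split across a boundary still appears intact in one of the two overlapping chunks, so at least one analysis pass sees it whole.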

Security and Data Handling

When users upload documents, security is paramount. I implemented several safeguards: files are scanned for malware on upload, stored temporarily with randomly generated filenames, and automatically deleted after processing. Document content is never used for training or shared with third parties. This was important not just for user trust, but for compliance with data handling regulations. I also added rate limiting on the upload endpoint to prevent abuse and protect API costs from runaway usage.
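
The random-filename and delete-after-processing parts are simple enough to show directly. This is a simplified sketch (the `UPLOAD_DIR` location and `.pdf` suffix are illustrative), but the try/finally pattern is the important bit: the file is removed even if analysis blows up halfway through:

```python
import secrets
import tempfile
from pathlib import Path

# Illustrative location; the live app uses a dedicated volume
UPLOAD_DIR = Path(tempfile.gettempdir()) / "doc_uploads"

def store_upload(data: bytes) -> Path:
    """Write an upload under a random, unguessable filename."""
    UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
    path = UPLOAD_DIR / f"{secrets.token_hex(16)}.pdf"
    path.write_bytes(data)
    return path

def process_and_delete(data: bytes, process):
    """Run analysis on a stored upload, deleting the file no matter what."""
    path = store_upload(data)
    try:
        return process(path)
    finally:
        path.unlink(missing_ok=True)
```

Using `secrets.token_hex` rather than a user-supplied name also sidesteps path traversal attacks from crafted filenames.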

Results and Lessons

The SaaS is live and handling documents daily. The key numbers:

  • Average analysis time: 12 seconds for a 20-page document
  • JSON parse success rate: 99.7%
  • Infrastructure cost: under 15 pounds per month at current usage

Building this project taught me that the AI part is honestly the easy bit. The hard parts are billing, error handling, edge cases in document formats, and making the whole thing feel reliable to end users. That is the real work of AI engineering.