4 min read

Building an AI Second Brain with pgvector and MCP

Tags: pgvector, MCP, second brain, knowledge management, RAG, AI engineering

Why I Built Open Brain

I consume a lot of information: research papers, technical articles, project notes, meeting summaries, code snippets, and random ideas. Traditional note-taking apps are fine for storage, but terrible for retrieval. I can never find the thing I half-remember reading three months ago. Search by keyword fails because I rarely remember the exact words I used.

Open Brain is my solution. It is a personal knowledge system that stores everything I capture, embeds it for semantic search, and exposes it through the Model Context Protocol (MCP) so my AI assistants can access my knowledge base directly.

The Core Architecture

The system has three layers:

  • Ingestion: Multiple input methods for capturing knowledge from different sources
  • Storage and retrieval: PostgreSQL with pgvector for semantic search across all stored knowledge
  • Access: MCP server that lets Claude and other AI assistants query my knowledge base
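
The storage layer is a single Postgres table. A minimal sketch of the schema, assuming `asyncpg`-style DDL (the table and column names are inferred from the ingestion snippet later in the post, and `vector(1536)` matches the output dimension of text-embedding-3-small):

```python
# Schema sketch for the storage layer. Names are assumptions based on the
# INSERT statement shown later; the real project may differ.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS knowledge (
    id          BIGSERIAL PRIMARY KEY,
    content     TEXT NOT NULL,
    embedding   vector(1536),  -- text-embedding-3-small dimension
    metadata    JSONB DEFAULT '{}',
    chunk_index INT DEFAULT 0,
    created_at  TIMESTAMPTZ DEFAULT now()
);

-- IVFFlat index for approximate cosine-similarity search.
CREATE INDEX IF NOT EXISTS knowledge_embedding_idx
    ON knowledge USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
"""
```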

Ingestion Methods

I built several ingestion pathways to make it easy to capture knowledge regardless of where I encounter it:

  • CLI tool: A command-line interface for quick notes and code snippets
  • Web clipper: A bookmarklet that sends article content to the API
  • File watcher: Monitors a folder for markdown files and ingests them automatically
  • API endpoint: For programmatic ingestion from other tools and scripts
# CLI usage examples
$ brain add "FastAPI handles async natively, unlike Flask" --tags python,fastapi
$ brain add-url https://example.com/article --tags research
$ brain search "how to handle streaming responses in FastAPI"

Chunking and Embedding

Every piece of content goes through the same pipeline: clean, chunk, embed, store. For longer content like articles, I use a recursive text splitter with 500-token chunks and 50-token overlap. Each chunk is embedded using OpenAI's text-embedding-3-small model and stored alongside the original text and metadata.

async def ingest_content(content: str, metadata: dict):
    chunks = recursive_split(content, chunk_size=500, overlap=50)
    
    for i, chunk in enumerate(chunks):
        embedding = await get_embedding(chunk)
        await db.execute(
            "INSERT INTO knowledge (content, embedding, metadata, chunk_index) "
            "VALUES ($1, $2, $3, $4)",
            chunk, embedding, json.dumps(metadata), i
        )

Semantic Search

The search layer uses pgvector's cosine similarity to find relevant knowledge chunks. But raw vector search is just the starting point. I added several refinements:

  • Reranking: Top 20 vector results are reranked using Claude Haiku for relevance to the actual query
  • Source grouping: If multiple chunks from the same source match, they are grouped and presented together with context
  • Recency weighting: More recent knowledge gets a slight boost, configurable per query
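
The recency weighting can be expressed as a simple exponential decay blended into the similarity score. A sketch under assumed parameters (the half-life and blend weight here are illustrative, not the post's actual tuning):

```python
def recency_boost(similarity: float, age_days: float,
                  half_life_days: float = 90.0, weight: float = 0.1) -> float:
    """Blend cosine similarity with an exponential recency decay.

    Illustrative only: `half_life_days` and `weight` are assumptions, not
    Open Brain's real parameters. `weight` is the per-query knob that
    controls how much recency matters.
    """
    decay = 0.5 ** (age_days / half_life_days)  # 1.0 today, 0.5 at the half-life
    return (1 - weight) * similarity + weight * decay
```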

The MCP Integration

This is where things get really powerful. The Model Context Protocol lets AI assistants call tools and access resources from external systems. I built an MCP server that exposes my knowledge base as a set of tools:

from mcp.server import Server

server = Server("open-brain")

@server.tool()
async def search_knowledge(query: str, limit: int = 5) -> str:
    """Search the knowledge base for relevant information."""
    results = await semantic_search(query, limit=limit)
    return format_results(results)

@server.tool()
async def add_knowledge(content: str, tags: list[str] | None = None) -> str:
    """Add new knowledge to the brain."""
    await ingest_content(content, {"tags": tags or [], "source": "mcp"})
    return "Knowledge stored successfully."

@server.tool()
async def get_recent(days: int = 7, topic: str | None = None) -> str:
    """Get recently added knowledge, optionally filtered by topic."""
    results = await get_recent_entries(days=days, topic=topic)
    return format_results(results)

With this MCP server running, I can use Claude and say things like "search my notes for that article about streaming architectures" or "what did I save about pgvector indexing strategies?" and Claude will query my knowledge base directly.

Daily Usage Patterns

After three months of use, I have around 2,400 knowledge chunks stored. My most common interactions are:

  • Searching for code patterns I have used before in other projects
  • Retrieving notes from technical articles I read weeks ago
  • Asking Claude to summarise everything I know about a specific topic by pulling from my knowledge base
  • Finding connections between ideas stored at different times

Performance and Scaling

With 2,400 chunks, pgvector searches complete in under 15 milliseconds. The IVFFlat index handles this volume easily, and I expect it to perform well up to tens of thousands of chunks before needing any optimisation. The entire system runs on the same VPS as my other projects, using minimal resources.

Tagging and Organisation

Every knowledge chunk can be tagged with multiple labels. Tags help with both filtering and context. When I search for something, I can optionally restrict results to specific tags like "python" or "architecture" to narrow down the results. The tagging system also powers a simple analytics view that shows me which topics I have been capturing the most knowledge about, revealing patterns in my learning that I would not notice otherwise. Over the past three months, my top tags have been "AI engineering," "pgvector," and "prompt engineering," which aligns well with where I have been focusing my work.
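
Filtering by tag can be done either in SQL against the JSONB metadata or in Python after retrieval. A sketch of the post-retrieval version, assuming each chunk carries the `{"tags": [...]}` metadata shape from the ingestion snippet (the dict layout is an assumption):

```python
def filter_by_tags(chunks: list[dict], required_tags: set[str]) -> list[dict]:
    """Keep only chunks whose metadata tags include all required tags.

    Sketch only: assumes each chunk looks like {"metadata": {"tags": [...]}};
    the real system may instead filter in SQL with JSONB containment operators.
    """
    return [
        c for c in chunks
        if required_tags <= set(c.get("metadata", {}).get("tags", []))
    ]
```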

What I Learned

The most surprising lesson was how much the MCP integration changed my workflow. Before MCP, I had to context-switch between my knowledge base and my AI conversations. Now they are seamlessly connected. When I ask Claude a question about one of my projects, it can pull in my actual notes and past decisions rather than guessing.

If you are building any kind of retrieval-augmented generation system, I strongly recommend considering MCP as the access layer. It turns a standalone tool into a capability that every AI assistant in your workflow can use.