
Building a News Intelligence Platform with AI Signal Scoring


Why I Built AI Signal Desk

Information overload is a real problem. Whether you are tracking markets, monitoring competitors, or staying current with AI research, the volume of relevant news is impossible to process manually. I wanted a system that would ingest news from multiple sources, score each item's relevance and significance, and surface only the signals that actually matter.

AI Signal Desk is the result. It is a self-hosted intelligence platform that processes hundreds of articles daily and presents a prioritised feed of actionable signals.

Data Ingestion Layer

The platform pulls from multiple source types:

  • RSS feeds: Over 60 curated feeds covering AI research, fintech, crypto markets, and technology news
  • API sources: Twitter/X lists, selected Reddit subreddits, and Hacker News top stories
  • Custom scrapers: For sources without APIs or feeds, I wrote targeted scrapers using httpx and BeautifulSoup

Each source has a normalisation layer that converts raw content into a unified format:

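from dataclasses import dataclass
from datetime import datetime


@dataclass
class RawSignal:
    source: str
    title: str
    content: str
    url: str
    published_at: datetime
    source_type: str  # rss, api, scraper
    raw_metadata: dict  # original source fields, kept alongside the normalised ones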
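As an illustration, a normaliser for RSS might look like the minimal sketch below. It assumes the feed has already been parsed into a plain dict using the standard RSS 2.0 item fields (title, link, description, pubDate); normalise_rss_item is a hypothetical name, and the dataclass is repeated so the snippet runs on its own.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


@dataclass
class RawSignal:
    source: str
    title: str
    content: str
    url: str
    published_at: datetime
    source_type: str  # rss, api, scraper
    raw_metadata: dict


def normalise_rss_item(feed_name: str, item: dict) -> RawSignal:
    """Convert one parsed RSS item into a RawSignal (hypothetical helper).

    Assumes RSS 2.0 field names and an RFC 822 date string in 'pubDate'.
    """
    published = item.get("pubDate")
    if published:
        published_at = parsedate_to_datetime(published)
    else:
        # Fall back to ingestion time when the feed omits a date
        published_at = datetime.now(timezone.utc)
    return RawSignal(
        source=feed_name,
        title=item.get("title", "").strip(),
        content=item.get("description", "").strip(),
        url=item.get("link", ""),
        published_at=published_at,
        source_type="rss",
        raw_metadata=item,  # keep the untouched item for debugging
    )
```

API and scraper sources would get their own small normalisers that return the same RawSignal shape, so everything downstream only ever sees one type.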

Deduplication

When you are pulling from 60+ sources, the same story appears multiple times. I use a two-stage deduplication approach:

  1. URL normalisation: Strip tracking parameters and compare canonical URLs
  2. Semantic similarity: For stories with different URLs but the same content, I compute title embeddings and flag pairs above a cosine similarity threshold of 0.92
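Both stages can be sketched in a few lines. This is a minimal version under stated assumptions: the set of tracking parameters is illustrative and would need extending per source, and near_duplicates takes precomputed title embeddings from whichever embedding model is in use (plain float lists here, so the snippet has no dependencies). normalise_url and near_duplicates are hypothetical names.

```python
import math
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters to strip; extend as new sources are added
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "fbclid", "gclid", "ref"}


def normalise_url(url: str) -> str:
    """Lower-case scheme and host, drop tracking params, fragment and trailing slash."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in TRACKING_PARAMS]
    path = parts.path.rstrip("/") or "/"
    # Sort the remaining params so equivalent URLs compare equal
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path,
                       urlencode(sorted(query)), ""))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def near_duplicates(embeddings: list[list[float]],
                    threshold: float = 0.92) -> list[tuple[int, int]]:
    """Return index pairs whose title embeddings are above the similarity threshold."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) > threshold:
                pairs.append((i, j))
    return pairs
```

The pairwise loop is O(n²), which is fine for a single day's batch; at larger volumes an approximate nearest-neighbour index would be the usual replacement.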

AI Signal Scoring

This is the core of the platform. Each deduplicated signal runs through a scoring pipeline that assigns three scores:

Relevance Score (0 to 1)

How relevant is this signal to my defined interest areas? I maintain a set of topic profiles (essentially descriptions of what I care about), and the AI rates each signal against these profiles.

Significance Score (0 to 1)

Is this a minor update or a major development? The model evaluates whether the signal represents a genuine shift, a new capability, a market event, or just incremental news.

Actionability Score (0 to 1)

Does this signal require me to do something? Change a strategy, explore a new tool, adjust a position? This score helps separate "interesting to know" from "need to act on."

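import json

from anthropic import AsyncAnthropic

claude_client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment


# TopicProfile and ScoredSignal are project types not shown in this post
async def score_signal(signal: RawSignal, profiles: list[TopicProfile]) -> ScoredSignal:
    prompt = f"""Score this news signal on three dimensions (0-1 each):

    Signal: {signal.title}
    Content: {signal.content[:2000]}

    Topic profiles: {json.dumps([p.description for p in profiles])}

    Return JSON: {{"relevance": float, "significance": float,
                   "actionability": float, "reasoning": str}}"""

    # One call per signal; content is truncated to 2,000 characters to bound cost
    response = await claude_client.messages.create(
        model="claude-3-5-haiku-20241022",  # a Claude Haiku model ID; use the current Haiku release
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    # Parse the JSON reply and unpack it into the ScoredSignal fields
    scores = json.loads(response.content[0].text)
    return ScoredSignal(signal=signal, **scores)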

I use Claude Haiku for scoring because it needs to process hundreds of signals daily and the cost has to stay manageable. Haiku handles this classification task well, and each signal costs a fraction of a penny to score.
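The json.loads call in score_signal assumes the model returns bare JSON. Small models occasionally wrap the object in prose or a code fence, or return a score slightly outside 0 to 1. A defensive parser could stand in for that line; this is a sketch assuming the reply contains exactly one JSON object, and parse_scores is a hypothetical helper name.

```python
import json
import re

SCORE_KEYS = ("relevance", "significance", "actionability")


def parse_scores(text: str) -> dict:
    """Extract the JSON object from a model reply and clamp scores to [0, 1].

    Tolerates surrounding prose or code fences by grabbing the outermost braces.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object in model reply: {text[:80]!r}")
    data = json.loads(match.group(0))
    scores = {}
    for key in SCORE_KEYS:
        value = float(data.get(key, 0.0))
        scores[key] = min(1.0, max(0.0, value))  # clamp out-of-range values
    scores["reasoning"] = str(data.get("reasoning", ""))
    return scores
```

Raising on a missing object, rather than defaulting to zeros, lets the caller retry or log the reply instead of silently burying a signal.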

The Dashboard

Scored signals are stored in PostgreSQL and served through a FastAPI backend. The frontend is a simple but functional dashboard that shows:

  • A ranked list of today's top signals, sorted by composite score
  • Trend detection: topics that are suddenly appearing across multiple sources
  • Source health monitoring: which feeds are active, which have gone stale
  • Historical search with vector similarity matching

Trend Detection

Beyond individual signal scoring, the system looks for emerging patterns. If a topic that normally appears in 2 articles per day suddenly appears in 15, that is flagged as a trend. This is done with a simple statistical approach comparing rolling averages against current-day counts, weighted by the significance scores of the individual signals.
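The exact formula is not given here, so the following is one plausible reading, not the platform's implementation: a spike ratio of today's count over the rolling daily average, multiplied by the mean significance of today's signals. The baseline floor, the 3.0 threshold, and the names trend_score and detect_trends are all assumptions.

```python
from statistics import mean


def trend_score(history: list[int], today_significance: list[float],
                min_baseline: float = 1.0) -> float:
    """Today's count over the rolling average, weighted by mean significance.

    history: daily article counts for the topic over the look-back window.
    today_significance: significance scores of today's signals on the topic.
    """
    if not today_significance:
        return 0.0
    # Floor the baseline so a rare topic cannot spike on a single article
    baseline = max(mean(history) if history else 0.0, min_baseline)
    spike = len(today_significance) / baseline
    return spike * mean(today_significance)


def detect_trends(topics: dict[str, tuple[list[int], list[float]]],
                  threshold: float = 3.0) -> list[str]:
    """Return topics whose weighted spike meets the threshold, strongest first."""
    scored = {t: trend_score(h, s) for t, (h, s) in topics.items()}
    return sorted((t for t, v in scored.items() if v >= threshold),
                  key=lambda t: scored[t], reverse=True)
```

With the example from the text, a topic averaging 2 articles a day that jumps to 15, each with significance 0.6, scores 15 / 2 × 0.6 = 4.5 and clears the threshold; the same jump made of low-significance items would not.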

Operational Considerations

Running this system 24/7 required some practical engineering:

  • Rate limiting: Each source has configurable polling intervals to avoid hitting API limits
  • Graceful degradation: If a source fails, the system continues with the remaining sources and logs the failure
  • Cost monitoring: Daily API spend is tracked and alerts fire if it exceeds a threshold
  • Database maintenance: Signals older than 90 days are archived to keep query performance fast
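Per-source polling intervals and graceful degradation fit naturally into one polling pass. A minimal sketch, assuming each source exposes a synchronous fetch callable and that a scheduler calls poll_due_sources periodically with a monotonic timestamp such as time.monotonic(); Source, fetch, and poll_due_sources are hypothetical names.

```python
import logging
from dataclasses import dataclass
from typing import Callable

log = logging.getLogger("signal_desk.ingest")


@dataclass
class Source:
    name: str
    poll_interval_s: float              # configurable per source
    fetch: Callable[[], list[dict]]     # hypothetical fetcher returning raw items
    last_polled: float = float("-inf")  # never polled yet


def poll_due_sources(sources: list[Source], now: float) -> tuple[list[dict], list[str]]:
    """Poll each source whose interval has elapsed; one failure never stops the run.

    Returns (items collected, names of sources that failed).
    """
    items, failed = [], []
    for src in sources:
        if now - src.last_polled < src.poll_interval_s:
            continue  # not due yet, which keeps us under the source's rate limit
        # Record the attempt before fetching so a broken source is not retried every tick
        src.last_polled = now
        try:
            items.extend(src.fetch())
        except Exception:
            log.exception("source %s failed; continuing with the rest", src.name)
            failed.append(src.name)
    return items, failed
```

The returned list of failed sources is also a natural input for the source health view on the dashboard.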

Daily Briefing System

Beyond the dashboard, I built a daily briefing system that sends a morning email with the top 5 signals from the previous 24 hours. Each signal includes a one-sentence summary, the composite score, and a direct link to the full analysis. This has become one of the most valued features because it means I start every day knowing what happened overnight without needing to open the dashboard. The briefing is generated by sending the top signals to Claude Haiku for synthesis, at a cost of a fraction of a penny per day.
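The selection step before synthesis is simple enough to sketch. This assumes each stored signal carries a summary, composite score, link, and timestamp; BriefingItem, select_for_briefing, and render_briefing are hypothetical names, and both the Haiku synthesis call and the email sending are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BriefingItem:
    title: str
    summary: str        # one-sentence summary
    composite: float
    url: str            # link to the full analysis
    scored_at: datetime


def select_for_briefing(items: list[BriefingItem], now: datetime,
                        top_n: int = 5) -> list[BriefingItem]:
    """Top-N items by composite score from the 24 hours before `now`."""
    cutoff = now - timedelta(hours=24)
    recent = [i for i in items if cutoff <= i.scored_at <= now]
    return sorted(recent, key=lambda i: i.composite, reverse=True)[:top_n]


def render_briefing(items: list[BriefingItem]) -> str:
    """Plain-text email body with one numbered entry per signal."""
    lines = []
    for n, item in enumerate(items, start=1):
        lines.append(f"{n}. {item.title} (score {item.composite:.2f})")
        lines.append(f"   {item.summary}")
        lines.append(f"   {item.url}")
    return "\n".join(lines)
```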

What I Learned

The biggest lesson from building AI Signal Desk is that scoring and ranking is a harder problem than it looks. My first version used a simple weighted sum of the three scores, but this surfaced too many "interesting but not actionable" signals. I eventually added a configurable scoring formula that lets me adjust weights based on what I am focused on at any given time.
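One way to make the weights configurable is a named set of weight profiles with a normalised weighted mean, so the composite stays on the same 0 to 1 scale whatever the weights are. This is a sketch, not the formula the platform uses; ScoreWeights, the preset names, and their values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreWeights:
    relevance: float
    significance: float
    actionability: float


# Hypothetical presets for different focus periods
WEIGHTS = {
    "balanced": ScoreWeights(1.0, 1.0, 1.0),
    "act_now": ScoreWeights(0.5, 1.0, 2.0),
}


def composite(scores: dict[str, float], w: ScoreWeights) -> float:
    """Weighted mean of the three scores, so the result stays in [0, 1]."""
    total = w.relevance + w.significance + w.actionability
    return (w.relevance * scores["relevance"]
            + w.significance * scores["significance"]
            + w.actionability * scores["actionability"]) / total
```

Under the balanced preset, a highly relevant but barely actionable signal (0.9, 0.9, 0.2) outranks a moderately relevant, clearly actionable one (0.5, 0.5, 0.8); switching to act_now reverses the order without touching the stored scores.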

The second lesson is that source quality matters enormously. A handful of high-quality, low-volume sources produce better signals than dozens of noisy, high-volume feeds. I spent more time curating sources than building the scoring pipeline, and that was time well spent.