Setting Up Telegram Alerts for AI Pipeline Monitoring
Why Telegram for Pipeline Monitoring
When you are running AI pipelines in production, things break. Models hit rate limits, APIs go down, data quality drifts, and servers run out of memory. You need to know about these problems immediately, not when a customer complains or when you happen to check your logs.
I use Telegram as my primary alerting channel for all my production AI systems. It hits the perfect balance of immediacy, simplicity, and reliability. My phone buzzes within seconds of any pipeline failure, and I can see exactly what went wrong without opening a laptop.
Creating Your Telegram Bot
Setting up a Telegram bot takes about five minutes. Here is the process:
- Open Telegram and search for @BotFather
- Send /newbot and follow the prompts to name your bot
- Save the API token you receive
- Create a private channel or group for your alerts
- Add the bot to your channel and get the chat ID
Getting Your Chat ID
The easiest way to get your chat ID is to send a message to your bot, then hit the Telegram API:
```bash
curl https://api.telegram.org/bot<YOUR_TOKEN>/getUpdates
```
Look for the chat.id field in the response. For channels, the ID will be a negative number.
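If you prefer to script this step, the same getUpdates call works from Python. A minimal sketch using only the standard library (the function names are illustrative, and the token is a placeholder):

```python
import json
from urllib.request import urlopen


def parse_chat_ids(payload: dict) -> list[int]:
    """Extract every chat.id from a getUpdates response payload."""
    return [
        update["message"]["chat"]["id"]
        for update in payload.get("result", [])
        if "message" in update
    ]


def get_chat_ids(token: str) -> list[int]:
    """Call getUpdates and return the chat IDs seen in recent messages."""
    with urlopen(f"https://api.telegram.org/bot{token}/getUpdates") as resp:
        return parse_chat_ids(json.load(resp))
```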
Building the Alert Module
I keep a reusable alert module that all my pipelines import. Here is the core of it:
```python
import httpx
from datetime import datetime


class TelegramAlerts:
    def __init__(self, token: str, chat_id: str):
        self.token = token
        self.chat_id = chat_id
        self.base_url = f"https://api.telegram.org/bot{token}"

    async def send(self, message: str, level: str = "info"):
        # Map severity levels to message prefixes
        icons = {
            "info": "[INFO]",
            "warning": "[WARN]",
            "error": "[ERROR]",
            "critical": "[CRITICAL]",
        }
        prefix = icons.get(level, "[INFO]")
        timestamp = datetime.now().strftime("%H:%M:%S")
        text = f"{prefix} {timestamp}\n{message}"

        async with httpx.AsyncClient() as client:
            await client.post(
                f"{self.base_url}/sendMessage",
                json={
                    "chat_id": self.chat_id,
                    "text": text,
                    "parse_mode": "HTML",
                },
            )
```
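One detail worth guarding against: Telegram rejects messages longer than 4096 characters, so long tracebacks need truncating before sending. A small helper (the name truncate_for_telegram is illustrative):

```python
TELEGRAM_MAX_LEN = 4096  # Telegram's hard per-message limit for sendMessage


def truncate_for_telegram(text: str, limit: int = TELEGRAM_MAX_LEN) -> str:
    """Trim a message to fit Telegram's length cap, marking the cut."""
    if len(text) <= limit:
        return text
    marker = "\n[truncated]"
    return text[: limit - len(marker)] + marker
```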
What I Monitor
After running nine production AI projects, I have settled on a core set of alerts that catch 95% of problems:
- Pipeline start and completion: Know when jobs kick off and finish, with duration
- API rate limits: Get warned before you hit hard limits on Claude, OpenAI, or Gemini
- Error counts: If errors exceed a threshold within a time window, alert immediately
- Cost tracking: Daily spend summaries and alerts when usage spikes unexpectedly
- Data quality: When scoring pipelines detect quality drops, flag for review
- Server health: CPU, memory, and disk usage on my VPS
Rate Limiting Your Alerts
One mistake I made early on was flooding my phone with alerts during cascading failures. When an API goes down, every request fails, and you do not need 500 individual failure messages. I solved this with a simple debounce pattern:
```python
from collections import defaultdict
from time import time


class AlertThrottler:
    def __init__(self, cooldown_seconds: int = 300):
        self.cooldown = cooldown_seconds
        self.last_sent = defaultdict(float)

    def should_send(self, alert_key: str) -> bool:
        now = time()
        if now - self.last_sent[alert_key] > self.cooldown:
            self.last_sent[alert_key] = now
            return True
        return False
```
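What you key on matters as much as the cooldown: key per failure type rather than per message, so repeated instances of the same problem are suppressed while a different problem still gets through. A tiny helper for building such keys (the naming convention is my assumption, not anything Telegram requires):

```python
def make_alert_key(pipeline: str, stage: str, error_type: str) -> str:
    """Build a throttle key per failure type, so distinct problems still alert."""
    return f"{pipeline}:{stage}:{error_type}"
```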
Structured Alert Messages
Good alert messages tell you exactly what happened, where it happened, and what to do about it. I use a consistent format across all my pipelines:
```text
[ERROR] 14:23:07
Pipeline: content-scorer
Stage: gemini-analysis
Error: Rate limit exceeded (429)
Requests today: 1,847 / 2,000
Action: Auto-retry in 60s
Dashboard: https://example.com/logs
```
This format lets me triage problems at a glance without needing to SSH into the server.
Integration with FastAPI Services
Most of my AI applications run as FastAPI services. I add alert hooks to the exception handlers so that unhandled errors automatically trigger Telegram messages:
```python
from fastapi.responses import JSONResponse


@app.exception_handler(Exception)
async def global_exception_handler(request, exc):
    # Forward any unhandled exception to Telegram before returning a 500
    await alerts.send(
        f"Unhandled exception in {request.url.path}\n"
        f"Type: {type(exc).__name__}\n"
        f"Detail: {str(exc)[:200]}",
        level="error",
    )
    return JSONResponse(status_code=500, content={"detail": "Internal error"})
```
Daily Digest Reports
Beyond real-time alerts, I send myself a daily digest at 8am summarizing all pipeline activity from the previous 24 hours. This includes total requests processed, error rates, costs incurred, and any notable events. It takes about 30 seconds to read and gives me confidence that everything is running smoothly.
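The digest itself can be assembled from whatever counters your pipelines already track. A minimal sketch (the stats layout is a hypothetical example, not a fixed schema):

```python
def build_digest(stats: dict) -> str:
    """Summarize the last 24 hours into a short, scannable message."""
    error_rate = stats["errors"] / max(stats["requests"], 1) * 100
    return (
        f"Daily digest\n"
        f"Requests: {stats['requests']:,}\n"
        f"Errors: {stats['errors']} ({error_rate:.2f}%)\n"
        f"Cost: ${stats['cost_usd']:.2f}\n"
        f"Notable: {stats.get('notable', 'none')}"
    )
```

Schedule it with cron or your task runner of choice and send the result through the same alert module at the info level.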
The best monitoring system is the one you actually check. Telegram wins because it lives on the device I already look at dozens of times a day.
Getting Started
Start with basic error alerts on your most critical pipeline. Once you see how much faster you catch and resolve issues, you will want to add alerts to everything. The whole setup takes under an hour and costs nothing. Telegram bots are free, and the API is generous with rate limits for alerting use cases.