Nginx Reverse Proxy Configuration for AI APIs
Why Nginx in Front of AI APIs?
My AI services run on Node.js and Python, listening on high-numbered ports like 3000, 8000, and 8080. I do not expose these directly to the internet. Instead, Nginx sits in front as a reverse proxy, handling SSL termination, rate limiting, CORS headers, and routing. This is a standard pattern, but the specific configuration for AI APIs has some nuances worth documenting.
Basic Reverse Proxy Setup
Here is the core configuration for proxying requests to a Node.js API server:
server {
    listen 443 ssl http2;
    server_name api.stevecv.com;

    ssl_certificate /etc/letsencrypt/live/api.stevecv.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/api.stevecv.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

The key headers (X-Real-IP, X-Forwarded-For, X-Forwarded-Proto) pass client information through to the backend. Without these, your API sees every request as coming from 127.0.0.1.
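On the backend side, the application has to read these headers instead of the socket peer address. A minimal sketch as a plain WSGI app (the header names match the proxy_set_header lines above; the app itself is hypothetical):

```python
def app(environ, start_response):
    # Behind the proxy, REMOTE_ADDR is always 127.0.0.1. The real client
    # address arrives in X-Real-IP, which WSGI exposes as HTTP_X_REAL_IP.
    client_ip = environ.get("HTTP_X_REAL_IP", environ.get("REMOTE_ADDR", "unknown"))
    # X-Forwarded-Proto tells the app whether the original request was HTTPS.
    scheme = environ.get("HTTP_X_FORWARDED_PROTO", environ.get("wsgi.url_scheme", "http"))
    body = f"client={client_ip} scheme={scheme}".encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```

Only trust these headers when the backend is reachable exclusively through the proxy; a client that can hit the backend port directly can forge them.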
HTTP to HTTPS Redirect
server {
    listen 80;
    server_name api.stevecv.com;
    return 301 https://$server_name$request_uri;
}

Every HTTP request gets redirected to HTTPS. No exceptions.
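Once the redirect is in place, HSTS lets browsers skip the HTTP hop entirely on repeat visits. A sketch (one add_header line in the HTTPS server block; choose max-age deliberately, since it is hard to roll back once browsers have cached it):

```nginx
# In the HTTPS server block: browsers that have seen this header
# will rewrite http:// to https:// locally for the next year.
add_header Strict-Transport-Security "max-age=31536000" always;
```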
Rate Limiting
AI APIs are expensive to run. A single client hammering your endpoint can run up significant costs. Nginx rate limiting is the first line of defence:
# In the http block
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=ai:10m rate=2r/s;

server {
    # Standard API endpoints: 10 requests per second
    location /api/ {
        limit_req zone=api burst=20 nodelay;
        proxy_pass http://127.0.0.1:3000;
    }

    # AI inference endpoints: 2 requests per second
    location /api/generate/ {
        limit_req zone=ai burst=5 nodelay;
        proxy_pass http://127.0.0.1:3000;
    }
}

I set different rate limits for different endpoints. Standard CRUD operations get 10 requests per second. AI inference endpoints that call LLM APIs get a stricter 2 requests per second to prevent cost overruns. Because Nginx picks the longest matching prefix, requests to /api/generate/ use the stricter ai zone even though they also match /api/.
The burst parameter allows short bursts above the rate limit, and nodelay processes burst requests immediately rather than queuing them.
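By default, Nginx rejects rate-limited requests with a 503; for an API, 429 Too Many Requests is the more accurate status and easier for clients to handle. Two directives from the same limit_req module tidy this up (a small addition to the http or server block):

```nginx
# Return 429 instead of the default 503 when a client exceeds the limit
limit_req_status 429;

# Log rejected requests at warn rather than error to cut log noise
limit_req_log_level warn;
```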
CORS Configuration
If your AI API is called from a different domain (like a frontend on a separate subdomain), you need CORS headers:
location /api/ {
    # CORS headers
    add_header Access-Control-Allow-Origin "https://stevecv.com" always;
    add_header Access-Control-Allow-Methods "GET, POST, OPTIONS" always;
    add_header Access-Control-Allow-Headers "Authorization, Content-Type" always;
    add_header Access-Control-Max-Age 86400 always;

    # Handle preflight requests
    if ($request_method = OPTIONS) {
        return 204;
    }

    proxy_pass http://127.0.0.1:3000;
}

Restrict Access-Control-Allow-Origin to your actual domain. Never use * in production, especially for authenticated endpoints.
Streaming Responses
LLM APIs often return streaming responses (Server-Sent Events). Nginx needs specific configuration to avoid buffering these:
location /api/stream/ {
    proxy_pass http://127.0.0.1:3000;
    proxy_http_version 1.1;
    proxy_set_header Connection '';
    proxy_buffering off;
    proxy_cache off;
    chunked_transfer_encoding off;
}

The critical setting is proxy_buffering off. Without it, Nginx buffers the entire response before sending it to the client, which defeats the purpose of streaming.
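The same effect can also be requested per response: Nginx honours an X-Accel-Buffering: no header sent by the backend, which disables buffering for that response only, without a dedicated location block. A hypothetical WSGI sketch of an SSE endpoint:

```python
def sse_app(environ, start_response):
    start_response("200 OK", [
        ("Content-Type", "text/event-stream"),
        ("Cache-Control", "no-cache"),
        # Tells Nginx to disable proxy buffering for this response only,
        # so each event reaches the client as soon as it is yielded.
        ("X-Accel-Buffering", "no"),
    ])
    for i in range(3):  # stand-in for tokens streamed from an LLM
        yield f"data: token {i}\n\n".encode()
```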
WebSocket Support
For real-time features like pipeline status updates, I use WebSockets. Nginx needs the upgrade headers:
location /ws/ {
    proxy_pass http://127.0.0.1:3000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 86400s;
    proxy_send_timeout 86400s;
}

The extended timeouts prevent Nginx from closing idle WebSocket connections. The default 60-second timeout is too short for persistent connections.
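A common refinement, using the standard map module, is to derive the Connection header from $http_upgrade so the same location can also serve plain HTTP requests (hard-coding "upgrade" sends an Upgrade handshake for every request):

```nginx
# In the http block: "upgrade" for WebSocket handshakes, "close" otherwise
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

# Then, inside the location block, replace the hard-coded value with:
#     proxy_set_header Connection $connection_upgrade;
```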
Request Size Limits
AI APIs sometimes receive large payloads: images for analysis, audio files for transcription, or long text documents. Increase the body size limit for these endpoints:
# Global default
client_max_body_size 10m;

# Override for upload endpoints
location /api/upload/ {
    client_max_body_size 100m;
    proxy_pass http://127.0.0.1:3000;
    proxy_read_timeout 300s;  # Allow long processing times
}

Security Headers
I add standard security headers to every response:
add_header X-Content-Type-Options nosniff always;
add_header X-Frame-Options DENY always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Referrer-Policy strict-origin-when-cross-origin always;
add_header Content-Security-Policy "default-src 'self'" always;

X-XSS-Protection is deprecated and ignored by modern browsers, but it does no harm for older ones. One Nginx gotcha: add_header directives are only inherited from the server block if a location defines none of its own, so any location that adds its own headers (like the CORS block above) silently drops these and must repeat them.

Monitoring and Logging
Custom log format for API requests that includes response time:
log_format api '$remote_addr - $remote_user [$time_local] '
               '"$request" $status $body_bytes_sent '
               '"$http_referer" $request_time';

access_log /var/log/nginx/api.access.log api;

The $request_time variable logs how long each request took, which is invaluable for identifying slow API endpoints.
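As a quick way to act on those logs, here is a small script, written against the api log format above (the sample log lines in the usage are made up), that pulls out the slowest requests:

```python
import re

# $request_time is the last field in the "api" log format defined above.
LINE = re.compile(r'"(\S+) (\S+) [^"]*" (\d+) \d+ "[^"]*" ([\d.]+)$')

def slowest_requests(lines, n=3):
    """Return up to n (request_time, path) pairs, slowest first."""
    timings = []
    for line in lines:
        match = LINE.search(line)
        if match:
            _method, path, _status, request_time = match.groups()
            timings.append((float(request_time), path))
    return sorted(timings, reverse=True)[:n]
```

Pointing it at the log file, e.g. slowest_requests(open("/var/log/nginx/api.access.log")), gives an instant shortlist of endpoints worth profiling.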
Nginx is the invisible layer that makes everything work reliably. Spend time getting it right, and you will save yourself from a whole category of production issues.