How I Built a Signal Scoring System with 8 Quality Factors
Not All Signals Are Created Equal
When you run multiple AI models generating trade signals, you quickly discover that raw signals are not enough. A momentum model might fire a buy signal on a stock with terrible liquidity during an earnings blackout period. Without a quality filter, that signal goes straight to execution and loses money.
I built a scoring system that evaluates every signal across 8 quality factors before it reaches the risk engine. Signals that score below the threshold get discarded. Here is how it works.
The 8 Quality Factors
Each factor produces a score between 0 and 1. The composite score is a weighted average of all 8 factors.
Factor 1: Signal Strength
How confident is the model in this signal? I normalize the raw model output to a 0-1 scale based on historical signal distributions.
```python
import numpy as np

def score_signal_strength(raw_signal: float, history: list[float]) -> float:
    # Percentile rank of |raw_signal| within the historical signal distribution
    percentile = np.searchsorted(sorted(history), abs(raw_signal)) / len(history)
    return percentile
```
Factor 2: Model Agreement
When multiple models agree on direction, the signal is more reliable. I measure agreement as the fraction of active models pointing the same way.
```python
def score_model_agreement(signals: dict[str, float]) -> float:
    directions = [1 if s > 0 else -1 for s in signals.values() if s != 0]
    if not directions:
        return 0.0
    majority = max(directions.count(1), directions.count(-1))
    return majority / len(directions)
```
Factor 3: Liquidity
Signals on illiquid instruments are dangerous because you cannot enter or exit positions efficiently. I score liquidity based on average daily volume relative to the intended position size.
```python
def score_liquidity(avg_daily_volume: float, position_size: float) -> float:
    volume_ratio = avg_daily_volume / max(position_size, 1)
    return min(volume_ratio / 50, 1.0)  # Full score at 50x volume coverage
```
Factor 4: Volatility Regime
Signals generated during extreme volatility regimes are less reliable because model training data typically underrepresents these conditions.
```python
def score_volatility_regime(current_vol: float, historical_vol: list[float]) -> float:
    vol_percentile = np.searchsorted(sorted(historical_vol), current_vol) / len(historical_vol)
    if vol_percentile > 0.9:
        return 0.3  # High vol penalty
    elif vol_percentile > 0.75:
        return 0.7  # Moderate penalty
    return 1.0
```
Factor 5: Time Decay
Signals lose relevance over time. A signal generated 5 minutes ago is more actionable than one generated 2 hours ago. I apply exponential decay.
```python
def score_time_decay(signal_age_minutes: float, half_life: float = 30) -> float:
    return np.exp(-0.693 * signal_age_minutes / half_life)  # 0.693 ≈ ln(2)
```
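The decay curve is easy to sanity-check: a signal exactly one half-life old scores 0.5, and each additional half-life halves the score again. A minimal sketch, using `math.log(2)` in place of the hard-coded 0.693 constant:

```python
import math

half_life = 30  # minutes, matching the default above

# Decay scores at a few signal ages; one half-life -> 0.5, two -> 0.25
scores = {age: math.exp(-math.log(2) * age / half_life) for age in (5, 30, 60, 120)}
```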
Factor 6: Calendar Awareness
Certain periods are systematically problematic for trading signals: earnings weeks, options expiration days, major economic releases. I maintain a calendar and penalize signals during these windows.
```python
from datetime import datetime

def score_calendar(symbol: str, timestamp: datetime, event_calendar: dict) -> float:
    events = event_calendar.get(symbol, [])
    for event in events:
        days_until = (event['date'] - timestamp.date()).days
        if 0 <= days_until <= event.get('blackout_days', 2):
            return event.get('penalty', 0.2)
    return 1.0
```
Factor 7: Sector Concentration
If the portfolio already has heavy exposure to a sector, additional signals in that sector receive a penalty to encourage diversification.
```python
def score_sector_concentration(
    symbol_sector: str,
    portfolio_sectors: dict[str, float],
    max_sector_pct: float = 0.25
) -> float:
    current_pct = portfolio_sectors.get(symbol_sector, 0)
    if current_pct >= max_sector_pct:
        return 0.0
    return 1.0 - (current_pct / max_sector_pct)
```
Factor 8: Recent Performance
Models go through hot and cold streaks. I track the recent hit rate of each model and boost or penalize accordingly.
```python
def score_recent_performance(
    model_id: str,
    recent_trades: list[dict],
    lookback: int = 20
) -> float:
    model_trades = [t for t in recent_trades if t['model'] == model_id][-lookback:]
    if len(model_trades) < 5:
        return 0.5  # Neutral for insufficient data
    win_rate = sum(1 for t in model_trades if t['pnl'] > 0) / len(model_trades)
    return win_rate
```
The Composite Scoring Engine
```python
class SignalScorer:
    WEIGHTS = {
        'signal_strength': 0.20,
        'model_agreement': 0.20,
        'liquidity': 0.15,
        'volatility_regime': 0.10,
        'time_decay': 0.10,
        'calendar': 0.10,
        'sector_concentration': 0.08,
        'recent_performance': 0.07
    }

    def score(self, signal: TradeSignal, context: dict) -> tuple[float, dict[str, float]]:
        scores = {
            'signal_strength': score_signal_strength(signal.raw, context['history']),
            'model_agreement': score_model_agreement(context['all_signals']),
            'liquidity': score_liquidity(context['adv'], signal.size),
            'volatility_regime': score_volatility_regime(
                context['current_vol'], context['vol_history']
            ),
            'time_decay': score_time_decay(signal.age_minutes),
            'calendar': score_calendar(signal.symbol, signal.timestamp, context['calendar']),
            'sector_concentration': score_sector_concentration(
                signal.sector, context['sector_weights']
            ),
            'recent_performance': score_recent_performance(
                signal.model_id, context['recent_trades']
            )
        }
        composite = sum(
            score * self.WEIGHTS[factor]
            for factor, score in scores.items()
        )
        return composite, scores
```
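To make the weighting concrete, here is a small self-contained sketch that reproduces the composite calculation by hand for one hypothetical signal. The factor scores are invented for illustration; only the weights come from the engine above:

```python
WEIGHTS = {
    'signal_strength': 0.20, 'model_agreement': 0.20, 'liquidity': 0.15,
    'volatility_regime': 0.10, 'time_decay': 0.10, 'calendar': 0.10,
    'sector_concentration': 0.08, 'recent_performance': 0.07,
}

# Hypothetical factor scores for a single signal
factor_scores = {
    'signal_strength': 0.80, 'model_agreement': 1.00, 'liquidity': 1.00,
    'volatility_regime': 0.70, 'time_decay': 0.79, 'calendar': 1.00,
    'sector_concentration': 0.60, 'recent_performance': 0.55,
}

# Weighted average across all 8 factors
composite = sum(factor_scores[f] * w for f, w in WEIGHTS.items())
print(round(composite, 4))  # 0.8455 -- clears a 0.55-0.70 threshold
```

Note how the two highest-weight factors (strength and agreement) dominate: even with a mediocre recent-performance score, this signal still passes comfortably.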
Setting the Threshold
I calibrate the threshold using historical data. The goal is to find the score cutoff that maximizes the Sharpe ratio of accepted signals. In my systems, this typically lands between 0.55 and 0.70.
Results
After implementing this scoring system, the quality of executed trades improved dramatically. The win rate on accepted signals went from 52% to 61%, and the average profit per trade increased by 40%. More importantly, the worst single-day loss dropped by 55% because the system was filtering out the highest-risk signals before they could do damage.
Signal scoring is the layer between your AI models and your money. Build it carefully, calibrate it regularly, and it will be the most valuable component in your trading system.