
Drift Detection

What is behavioral drift?

Behavioral drift occurs when an AI model’s outputs change systematically over time in ways that were not intended. Drift can result from:

  • Model updates — the model provider silently deploys a new version
  • Prompt changes — your team modifies prompts without realizing the downstream impact
  • Distribution shift — the nature of incoming queries changes
  • Jailbreak accumulation — adversarial inputs gradually shift model behavior

GOVERN Probe’s drift detector establishes a behavioral baseline from your first inferences and continuously compares new inferences against that baseline.

How drift is measured

The drift detector measures change across five behavioral dimensions:

| Dimension | What's measured |
| --- | --- |
| Response length | Mean and variance of response token count |
| Tone | Sentiment distribution (positive/neutral/negative) |
| Format adherence | Structure consistency (markdown usage, list frequency) |
| Topic distribution | Semantic topic clusters across responses |
| Refusal rate | Frequency of model refusals and safety responses |

Each dimension produces a component drift score. The final drift score is a weighted average:

drift_score = 0.3 × topic_drift + 0.25 × tone_drift + 0.2 × length_drift + 0.15 × format_drift + 0.10 × refusal_drift
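The weighted average can be sketched in a few lines. This is an illustrative implementation only, assuming each component score is already normalized to [0, 1]; the function and dictionary names are hypothetical, not GOVERN's actual API.

```python
# Weights from the formula above; each component score is assumed to be in [0, 1].
WEIGHTS = {
    "topic": 0.30,
    "tone": 0.25,
    "length": 0.20,
    "format": 0.15,
    "refusal": 0.10,
}

def drift_score(components: dict) -> float:
    """Weighted average of per-dimension drift scores."""
    return sum(WEIGHTS[dim] * components[dim] for dim in WEIGHTS)

score = drift_score({"topic": 0.4, "tone": 0.2, "length": 0.1,
                     "format": 0.0, "refusal": 0.0})
# 0.3*0.4 + 0.25*0.2 + 0.2*0.1 = 0.12 + 0.05 + 0.02 = 0.19
```

Because the weights sum to 1.0, the final score stays on the same [0, 1] scale as the component scores.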

Baseline establishment

Drift detection requires a minimum number of inferences to establish a reliable baseline:

scoring:
  drift:
    min_baseline_inferences: 100  # default: 100
    baseline_window_hours: 168    # default: 168 (7 days)

During baseline establishment, drift scores are computed but no alerts fire (drift_score is returned as null in telemetry until the baseline is ready).

The baseline is a rolling window — old inferences age out as new ones arrive. This means the baseline reflects your application’s recent normal behavior, not its behavior from months ago.

Alert conditions

Drift alerts fire when:

  1. The baseline is established (min inferences reached)
  2. The current drift score exceeds the threshold
  3. The elevated drift persists for at least 3 consecutive inference batches (to reduce noise)
All three conditions map to configuration keys:

scoring:
  drift:
    enabled: true
    threshold: 0.25               # alert when drift_score exceeds this
    alert_persistence_batches: 3  # consecutive batches required
    baseline_window_hours: 168
    min_baseline_inferences: 100
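The persistence rule can be sketched as a simple streak counter: an alert fires only after the drift score stays above the threshold for the required number of consecutive batches. This is an illustrative sketch with hypothetical names, not GOVERN's internal code.

```python
class DriftAlerter:
    """Fires a sustained-drift alert only after N consecutive elevated batches."""

    def __init__(self, threshold: float = 0.25, persistence_batches: int = 3):
        self.threshold = threshold
        self.persistence_batches = persistence_batches
        self.consecutive = 0

    def observe_batch(self, drift_score: float) -> bool:
        """Return True when a sustained-drift alert should fire."""
        if drift_score > self.threshold:
            self.consecutive += 1
        else:
            self.consecutive = 0  # one quiet batch resets the streak
        return self.consecutive >= self.persistence_batches
```

With the defaults, a sequence of batch scores like 0.3, 0.3, 0.2, 0.3, 0.3, 0.3 alerts only on the last batch: the dip to 0.2 resets the streak, which is what filters out transient noise.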

Drift event types

| Event | Description |
| --- | --- |
| drift.baseline_established | Enough inferences collected for baseline |
| drift.threshold_exceeded | Drift score crossed the threshold for the first time |
| drift.sustained | Drift persists for alert_persistence_batches batches |
| drift.recovered | Drift score returned below the threshold |
| drift.baseline_reset | Manual or automatic baseline reset |

Manual baseline reset

If you intentionally change your prompts or model, reset the baseline so the new behavior becomes the new normal:

# Via API
curl -X POST https://api.govern.archetypal.ai/v1/probes/probe_xxxx/drift/reset \
  -H "Authorization: Bearer gvn_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{"reason": "Upgraded to Claude Sonnet 4 — new model version"}'

After a reset, the baseline establishment period begins again.

Dashboard analysis

The GOVERN dashboard plots the drift score as a time series. You can:

  • See which dimensions are drifting (topic vs. tone vs. format)
  • Compare inference samples from baseline vs. current period
  • Annotate drift events with deployment notes
  • Set up Slack/PagerDuty alerts for sustained drift