
Drift Detection

What is behavioral drift?

Behavioral drift occurs when an AI model’s outputs change systematically over time in ways that were not intended. Drift can result from:

  • Model updates — the model provider silently deploys a new version
  • Prompt changes — your team modifies prompts without realizing the downstream impact
  • Distribution shift — the nature of incoming queries changes
  • Jailbreak accumulation — adversarial inputs gradually shift model behavior

GOVERN Probe’s drift detector establishes a behavioral baseline from your first inferences and continuously compares new inferences against that baseline.

How drift is measured

The drift detector measures change across five behavioral dimensions:

| Dimension | What's measured |
| --- | --- |
| Response length | Mean and variance of response token count |
| Tone | Sentiment distribution (positive/neutral/negative) |
| Format adherence | Structure consistency (markdown usage, list frequency) |
| Topic distribution | Semantic topic clusters across responses |
| Refusal rate | Frequency of model refusals and safety responses |

Each dimension produces a component drift score. The final drift score is a weighted average:

drift_score = 0.3 × topic_drift + 0.25 × tone_drift + 0.2 × length_drift + 0.15 × format_drift + 0.10 × refusal_drift
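The weighted average can be sketched in a few lines. This is an illustrative implementation only, assuming each component score is already normalized to [0, 1]; the function and dictionary names are hypothetical, not GOVERN's actual API.

```python
# Weights from the formula above; each component score is assumed to be in [0, 1].
WEIGHTS = {
    "topic": 0.30,
    "tone": 0.25,
    "length": 0.20,
    "format": 0.15,
    "refusal": 0.10,
}

def drift_score(components: dict) -> float:
    """Weighted average of per-dimension drift scores."""
    return sum(WEIGHTS[dim] * components[dim] for dim in WEIGHTS)

score = drift_score({"topic": 0.4, "tone": 0.2, "length": 0.1,
                     "format": 0.0, "refusal": 0.0})
# 0.3*0.4 + 0.25*0.2 + 0.2*0.1 = 0.12 + 0.05 + 0.02 = 0.19
```

Because the weights sum to 1.0, the final score stays on the same [0, 1] scale as the component scores.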

Baseline establishment

Drift detection requires a minimum number of inferences to establish a reliable baseline:

scoring:
  drift:
    min_baseline_inferences: 100  # default: 100
    baseline_window_hours: 168    # default: 168 (7 days)

During baseline establishment, drift scores are computed but no alerts fire (drift_score is returned as null in telemetry until the baseline is ready).

The baseline is a rolling window — old inferences age out as new ones arrive. This means the baseline reflects your application’s recent normal behavior, not its behavior from months ago.

Alert conditions

Drift alerts fire when:

  1. The baseline is established (min inferences reached)
  2. The current drift score exceeds the threshold
  3. The elevated drift persists for at least 3 consecutive inference batches (to reduce noise)
All three conditions map to configuration keys:

scoring:
  drift:
    enabled: true
    threshold: 0.25               # alert when drift_score exceeds this
    alert_persistence_batches: 3  # consecutive batches required
    baseline_window_hours: 168
    min_baseline_inferences: 100
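The persistence rule can be sketched as a simple streak counter: an alert fires only after the drift score stays above the threshold for the required number of consecutive batches. This is an illustrative sketch with hypothetical names, not GOVERN's internal code.

```python
class DriftAlerter:
    """Fires a sustained-drift alert only after N consecutive elevated batches."""

    def __init__(self, threshold: float = 0.25, persistence_batches: int = 3):
        self.threshold = threshold
        self.persistence_batches = persistence_batches
        self.consecutive = 0

    def observe_batch(self, drift_score: float) -> bool:
        """Return True when a sustained-drift alert should fire."""
        if drift_score > self.threshold:
            self.consecutive += 1
        else:
            self.consecutive = 0  # one quiet batch resets the streak
        return self.consecutive >= self.persistence_batches
```

With the defaults, a sequence of batch scores like 0.3, 0.3, 0.2, 0.3, 0.3, 0.3 alerts only on the last batch: the dip to 0.2 resets the streak, which is what filters out transient noise.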

Drift event types

| Event | Description |
| --- | --- |
| drift.baseline_established | Enough inferences collected for baseline |
| drift.threshold_exceeded | Drift score crossed the threshold for the first time |
| drift.sustained | Drift persists for alert_persistence_batches batches |
| drift.recovered | Drift score returned below the threshold |
| drift.baseline_reset | Manual or automatic baseline reset |

Manual baseline reset

If you intentionally change your prompts or model, reset the baseline so the new behavior becomes the new normal:

# Via API
curl -X POST https://api.govern.archetypal.ai/v1/probes/probe_xxxx/drift/reset \
  -H "Authorization: Bearer gvn_live_xxxx" \
  -H "Content-Type: application/json" \
  -d '{"reason": "Upgraded to Claude Sonnet 4 — new model version"}'

After a reset, the baseline establishment period begins again.

Dashboard analysis

The GOVERN dashboard plots the drift score as a time series. You can:

  • See which dimensions are drifting (topic vs. tone vs. format)
  • Compare inference samples from baseline vs. current period
  • Annotate drift events with deployment notes
  • Set up Slack/PagerDuty alerts for sustained drift