Scoring Modes

Three modes

The GOVERN Probe operates in one of three scoring modes, set via SCORING_MODE or the scoring.mode YAML key:

Mode	Traffic effect	Alert	Use when
`log`	None — always passes	No	Starting out, baselining behavior
`flag`	None — always passes	Yes	Monitoring with human review
`block`	422 returned when threshold exceeded	Yes	Compliance enforcement
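The table can be read as a small decision rule. The Python sketch below is illustrative only: `decide_action` and its parameters are hypothetical names, not part of the Probe, and it assumes that "exceeds" means strictly greater than the threshold and that a blocked inference is reported with the action `block`.

```python
from typing import Dict, List, Tuple

VALID_MODES = ("log", "flag", "block")


def decide_action(
    mode: str,
    scores: Dict[str, float],
    thresholds: Dict[str, float],
) -> Tuple[str, List[str]]:
    """Return (action, violating scorers) for a single inference.

    Hypothetical helper that mirrors the mode table; not the Probe's own code.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown scoring mode: {mode!r}")

    # A scorer violates its policy when its score is above its threshold.
    # Scorers without a configured threshold are ignored here (assumption).
    violations = sorted(
        name
        for name, score in scores.items()
        if name in thresholds and score > thresholds[name]
    )

    # Log mode never changes the outcome; the other modes only act on violations.
    if mode == "log" or not violations:
        return "pass", violations
    return mode, violations  # "flag" or "block"
```

With a security score of 0.83 against a 0.70 threshold, this returns `pass` in log mode, `flag` in flag mode, and `block` in block mode.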

Log mode

SCORING_MODE=log

Every inference passes through unchanged. Scores are computed and telemetry is flushed, but no alerts are emitted and no requests are blocked. This is the default mode.

Use log mode to:

Establish a behavioral baseline (drift detection needs at least 100 inferences to be useful)
Understand your application’s normal score distribution
Test the Probe without any risk to production traffic

Sample log output:

{"level":"info","msg":"inference scored","mode":"log","security":0.02,"bias":0.01,"accuracy":0.91,"drift":0.04,"cost":0.08,"action":"pass"}
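To get a first look at the score distribution while in log mode, the log lines themselves can be summarized. The sketch below assumes the Probe writes one JSON object per line with the field names shown in the sample above. `summarize_scores` and `DIMENSIONS` are hypothetical names introduced for illustration.

```python
import json
from collections import defaultdict
from statistics import mean

# Score fields as they appear in the sample log line above.
DIMENSIONS = ("security", "bias", "accuracy", "drift", "cost")


def summarize_scores(log_lines):
    """Return count/min/mean/max per dimension from JSON log lines."""
    collected = defaultdict(list)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("msg") != "inference scored":
            continue  # ignore unrelated log entries
        for dim in DIMENSIONS:
            if dim in record:
                collected[dim].append(float(record[dim]))
    return {
        dim: {
            "count": len(values),
            "min": min(values),
            "mean": mean(values),
            "max": max(values),
        }
        for dim, values in collected.items()
    }


sample = (
    '{"level":"info","msg":"inference scored","mode":"log",'
    '"security":0.02,"bias":0.01,"accuracy":0.91,"drift":0.04,'
    '"cost":0.08,"action":"pass"}'
)
summary = summarize_scores([sample])
```

In practice the input would be the lines of a log file rather than a single sample string.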

Flag mode

SCORING_MODE=flag

Every inference passes through unchanged. When the score from any enabled scorer exceeds its threshold, a FLAG event is emitted to the GOVERN platform. Alerts fire in the dashboard, webhook events are dispatched, and the inference is tagged as a policy concern.

Use flag mode for:

Production monitoring with human-in-the-loop review
Soft enforcement: visibility without risk of blocking legitimate traffic
High-stakes applications where false positives from blocking would be costly

Sample log output:

{"level":"warn","msg":"inference flagged","mode":"flag","security":0.83,"threshold":0.70,"action":"flag","flag_reason":"credential_exposure","event_id":"evt_01HXYZ"}

Block mode

SCORING_MODE=block

When the score from any enabled scorer exceeds its threshold, the Probe returns 422 Unprocessable Entity to the caller instead of the model response. The response body lists the scorers that triggered the block.

Block response:

HTTP/1.1 422 Unprocessable Entity
Content-Type: application/json

{
  "error": {
    "type": "govern_policy_violation",
    "message": "Inference blocked by GOVERN Probe",
    "violations": [
      {
        "scorer": "security",
        "score": 0.87,
        "threshold": 0.70,
        "reason": "potential_credential_exposure"
      }
    ],
    "event_id": "evt_01HXYZ",
    "probe_id": "probe-production-1"
  }
}
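Callers need to tell a Probe block apart from other 422 responses. A minimal client-side sketch, assuming the body has exactly the shape shown above; `parse_block_response` is a hypothetical helper, not part of any GOVERN SDK.

```python
import json
from typing import List, Optional


def parse_block_response(status_code: int, body: str) -> Optional[List[dict]]:
    """Return the violations list if this is a Probe block, else None."""
    if status_code != 422:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # a 422 with a non-JSON body came from something else
    if not isinstance(payload, dict):
        return None
    error = payload.get("error")
    if not isinstance(error, dict):
        return None
    if error.get("type") != "govern_policy_violation":
        return None  # a 422 from another layer, not a policy block
    return error.get("violations", [])
```

Checking the `type` field rather than the status code alone avoids misreading an ordinary validation error as a policy block.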

Use block mode for:

Regulatory compliance environments (EU AI Act, NIST AI RMF)
Applications handling sensitive data where policy violations must never pass
Enforcement after baselining, once thresholds have been tuned in flag mode

Per-scorer mode override

Each scorer can run in its own mode, overriding the default. For example, to block on security violations, flag on bias, and only log accuracy:

scoring:
  mode: log           # default for all scorers
  security:
    mode: block       # override: block on security violations
    threshold: 0.70
  bias:
    mode: flag        # override: flag on bias
    threshold: 0.60
  accuracy:
    mode: log         # explicitly log (same as default)
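The override rule in this config can be sketched as a lookup with a fallback. The example assumes the `scoring` section has already been loaded into a plain dictionary by a YAML parser. `effective_mode` is a hypothetical name, and the fallback to `log` when no top-level mode is set follows from log being the default mode.

```python
def effective_mode(scoring: dict, scorer: str) -> str:
    """Return the mode that applies to one scorer.

    A per-scorer `mode` wins; otherwise the top-level `mode` is used.
    """
    default = scoring.get("mode", "log")
    override = scoring.get(scorer)
    if isinstance(override, dict):
        return override.get("mode", default)
    return default


# The YAML above, expressed as the dictionary a parser would produce.
scoring = {
    "mode": "log",
    "security": {"mode": "block", "threshold": 0.70},
    "bias": {"mode": "flag", "threshold": 0.60},
    "accuracy": {"mode": "log"},
}

modes = {name: effective_mode(scoring, name)
         for name in ("security", "bias", "accuracy", "drift")}
```

Here `drift` has no section of its own, so it inherits the top-level `log` mode.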

Threshold tuning guide

Scores range from 0.0 (not detected) to 1.0 (maximum confidence), and each threshold is a cutoff on that scale. A score of 0.70 means the scorer is 70% confident that its dimension has been violated.

Starting threshold	Meaning
`0.90`	Only flag/block on very high confidence findings
`0.70`	Standard production default
`0.50`	Aggressive — catches more, more false positives
`0.30`	Very aggressive — research/audit only

Recommended tuning sequence:

1. Deploy in log mode with the default thresholds (0.70).
2. Review at least 500 inferences in the GOVERN dashboard.
3. Identify the p95 score for each dimension in normal traffic.
4. Set each threshold to its p95 score plus a 0.10 buffer.
5. Switch to flag mode and monitor alerts for one week.
6. Adjust thresholds based on the false positive rate.
7. Switch to block mode once the false positive rate is acceptable.
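The p95-plus-buffer step can be worked through as follows. This is a sketch under stated assumptions: it uses the nearest-rank percentile definition (the dashboard may compute percentiles differently), and it caps the result at 1.0 because scores cannot go higher. `suggest_threshold` is a hypothetical helper.

```python
import math
from typing import Sequence


def suggest_threshold(
    scores: Sequence[float],
    percentile: float = 95.0,
    buffer: float = 0.10,
) -> float:
    """Return percentile score + buffer, capped at 1.0 and rounded to 2 places."""
    if not scores:
        raise ValueError("need at least one score")
    ordered = sorted(scores)
    # Nearest-rank percentile: the smallest observed score such that at
    # least `percentile` percent of all scores are at or below it.
    rank = max(1, math.ceil(percentile * len(ordered) / 100.0))
    p_value = ordered[rank - 1]
    return round(min(1.0, p_value + buffer), 2)


# Example: 95 quiet inferences and 5 outliers for one dimension.
baseline = [0.05] * 95 + [0.60] * 5
threshold = suggest_threshold(baseline)
```

With this baseline the p95 score is 0.05, so the suggested threshold is 0.15. The five outliers at 0.60 would be flagged, which is the kind of result to check against the false positive rate during the flag-mode week.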