What is GOVERN Probe?
Overview
GOVERN Probe is a lightweight Docker container that operates as a transparent reverse proxy between your application and any AI model API. It requires zero changes to your application code — you redirect one environment variable, and every inference is automatically monitored.
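For example, an SDK that reads its API base URL from the environment can be redirected in a single line. The variable name `OPENAI_BASE_URL` is an illustrative assumption (check your SDK's documentation); `:4020` is the Probe's listen port from the architecture diagram:

```python
import os

# Point the SDK at the Probe instead of the provider. The variable name
# is SDK-specific; OPENAI_BASE_URL is an illustrative example.
os.environ["OPENAI_BASE_URL"] = "http://localhost:4020/v1"

# Application code is otherwise unchanged: the SDK now routes every
# request through the Probe, which forwards it to the real model API.
print(os.environ["OPENAI_BASE_URL"])
```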
The Probe intercepts both the outbound request and the inbound response. It scores the full conversation turn across five dimensions, buffers telemetry in a ring buffer, and flushes batches to the GOVERN platform asynchronously. Your application never waits for scoring — the Probe returns the model response immediately and scores in parallel.
Architecture
```
┌─────────────────┐         ┌──────────────────────┐         ┌──────────────────┐
│    Your App     │──POST──▶│     GOVERN Probe     │──POST──▶│    Model API     │
│ (any language)  │         │        :4020         │         │   (Anthropic,    │
│                 │◀──resp──│                      │◀──resp──│  OpenAI, etc.)   │
└─────────────────┘         │  ┌────────────────┐  │         └──────────────────┘
                            │  │ Scoring Engine │  │
                            │  │  • Security    │  │
                            │  │  • Bias        │  │
                            │  │  • Accuracy    │  │
                            │  │  • Drift       │  │
                            │  │  • Cost        │  │
                            │  └────────────────┘  │
                            │           │          │
                            │    ┌──────▼──────┐   │
                            │    │ Ring Buffer │   │
                            │    │ (flush 5s)  │   │
                            │    └──────┬──────┘   │
                            └───────────┼──────────┘
                                        │
                                        ▼
                              ┌──────────────────┐
                              │ GOVERN Platform  │
                              │ (telemetry API)  │
                              └──────────────────┘
```
Key design principles
Non-blocking by design
The Probe never adds latency to the request path beyond the proxy overhead (typically 2-4ms). Scoring runs concurrently with the response delivery. Telemetry is buffered locally and flushed in background batches.
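A minimal sketch of this hand-off, using a background thread and a queue to stand in for the scoring engine and ring buffer (the score fields below are invented placeholders, not the Probe's real schema):

```python
import queue
import threading

telemetry = queue.Queue()  # stands in for the Probe's ring buffer

def score(turn):
    # Placeholder: the real engine scores five dimensions
    # (security, bias, accuracy, drift, cost).
    return {"dimension_scores": {"cost": len(turn["response"])}}

def handle(turn):
    """Return the model response immediately; score concurrently."""
    t = threading.Thread(target=lambda: telemetry.put(score(turn)))
    t.start()
    return turn["response"]  # the caller never waits on scoring

resp = handle({"request": "hi", "response": "hello!"})
```

The caller gets `resp` back as soon as the thread is started; the score lands in the buffer whenever the engine finishes.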
Protocol transparency
The Probe forwards all HTTP headers, authentication tokens, and request bodies verbatim. It does not modify your requests or responses. Your model provider never sees the Probe — it sees your exact request.
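As an illustration, a verbatim forward amounts to copying the incoming headers and body into the upstream request unchanged. A stdlib sketch; the endpoint, header values, and body are placeholders:

```python
import urllib.request

def build_upstream_request(upstream_url, path, headers, body):
    """Forward the client's request verbatim: no header is added,
    removed, or rewritten, so the provider sees the exact request."""
    return urllib.request.Request(
        url=upstream_url + path,
        data=body,
        headers=dict(headers),  # copied as-is, auth tokens included
        method="POST",
    )

req = build_upstream_request(
    "https://api.anthropic.com", "/v1/messages",
    {"x-api-key": "<your key>", "anthropic-version": "2023-06-01"},
    b'{"model": "...", "messages": []}',
)
```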
Stateless operation
Each Probe instance maintains only the ring buffer and the policy cache. All durable state lives in the GOVERN platform. You can run multiple Probe replicas without coordination.
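The ring buffer's bounded-memory behavior can be pictured with `collections.deque` (the capacity of 4 is illustrative; the real buffer size and 5s flush interval are configurable):

```python
from collections import deque

buffer = deque(maxlen=4)  # fixed capacity: oldest entries are evicted

for event_id in range(6):
    buffer.append({"event": event_id})

# Only the four most recent events remain. Because nothing here is
# durable, replicas need no coordination: each keeps its own buffer
# and flushes independently to the GOVERN platform.
recent = [e["event"] for e in buffer]  # [2, 3, 4, 5]
```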
Fail-open by default
If the Probe encounters an error (network partition, scoring timeout), it forwards the model response unchanged and flags the event as unscored. Your application always gets a response. Scoring failures do not become application failures.
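Fail-open behavior reduces to the sketch below; the `unscored` flag follows the description above, everything else is illustrative:

```python
def score_turn(turn):
    raise TimeoutError("scoring backend unreachable")  # simulated outage

def handle(turn):
    """Fail open: a scoring error never becomes an application error."""
    try:
        scores, flags = score_turn(turn), []
    except Exception:
        scores, flags = None, ["unscored"]  # flag it, forward anyway
    return {"response": turn["response"], "scores": scores, "flags": flags}

result = handle({"response": "model output"})
```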
What gets scored
Every inference turn is scored. A turn is one request/response pair together with its context metadata:
- Request — the prompt sent to the model (messages array, system prompt, tool definitions)
- Response — the model’s completion (content, tool calls, finish reason)
- Context — model ID, token counts, latency, HTTP status
The Probe does not store prompt or response content on disk. Content is held in memory for scoring and then discarded. Only the scores and metadata are transmitted to the GOVERN platform.
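What gets transmitted can be pictured as a record like the following. The field names and values are illustrative, not the Probe's actual wire schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class TelemetryRecord:
    # Scores and metadata only: prompt and response content is never
    # part of the record, matching the no-content-on-disk guarantee.
    model_id: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    http_status: int
    scores: dict  # the five dimensions

record = TelemetryRecord(
    model_id="example-model", input_tokens=812, output_tokens=96,
    latency_ms=1430.5, http_status=200,
    scores={"security": 0.99, "bias": 0.97, "accuracy": 0.95,
            "drift": 0.02, "cost": 0.0041},
)
payload = asdict(record)  # what is sent to the GOVERN platform
```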
Supported model providers
| Provider | API base | Notes |
|---|---|---|
| Anthropic | https://api.anthropic.com | Claude 3.x, streaming supported |
| OpenAI | https://api.openai.com | GPT-4o, o1, streaming supported |
| Azure OpenAI | https://*.openai.azure.com | All deployments |
| Google Vertex | https://*.aiplatform.googleapis.com | Gemini models |
| Groq | https://api.groq.com | Llama, Mixtral |
| Ollama | http://localhost:11434 | Local models |
| Any OpenAI-compatible | Custom base URL | Set UPSTREAM_URL |
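For providers not in the table, the Probe routes by `UPSTREAM_URL`. A sketch of that resolution logic (the default shown is an assumption for illustration, not the shipped default):

```python
import os

DEFAULT_UPSTREAM = "https://api.anthropic.com"  # illustrative default

def resolve_upstream(env=None):
    """Use UPSTREAM_URL when set; otherwise fall back to the default."""
    if env is None:
        env = os.environ
    return env.get("UPSTREAM_URL", DEFAULT_UPSTREAM)

# Any OpenAI-compatible server works, e.g. a local Ollama instance:
local = resolve_upstream({"UPSTREAM_URL": "http://localhost:11434"})
```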
Next steps
- Quickstart — running in 30 seconds
- Docker deployment — production-ready Docker setup
- Configuration reference — full `config/default.yaml` options