
What is GOVERN Probe?

Overview

GOVERN Probe is a lightweight Docker container that operates as a transparent reverse proxy between your application and any AI model API. It requires zero changes to your application code — you redirect one environment variable, and every inference is automatically monitored.
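For example, an application that reads its API base URL from an environment variable can be rerouted through the Probe without touching application code. A minimal sketch — the variable name `ANTHROPIC_BASE_URL` and the lookup function are illustrative; your SDK or config layer may use different names:

```python
import os

# Before: the app talks to the provider directly, e.g.
#   ANTHROPIC_BASE_URL=https://api.anthropic.com
# After: the same variable points at the local Probe container.
os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:4020"

def resolve_base_url() -> str:
    """The app's existing config lookup — unchanged."""
    return os.environ.get("ANTHROPIC_BASE_URL", "https://api.anthropic.com")

print(resolve_base_url())  # every request now routes through the Probe
```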

The Probe intercepts both the outbound request and the inbound response. It scores the full conversation turn across five dimensions, buffers telemetry in a ring buffer, and flushes batches to the GOVERN platform asynchronously. Your application never waits for scoring — the Probe returns the model response immediately and scores in parallel.
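The buffering pattern can be sketched like this — a simplification, not the Probe's source, assuming a fixed-capacity ring buffer and the 5-second flush cadence described above (capacity and event fields are illustrative):

```python
from collections import deque

FLUSH_INTERVAL_S = 5        # matches the Probe's batch flush cadence
BUFFER_CAPACITY = 1024      # illustrative; oldest entries drop when full

buffer = deque(maxlen=BUFFER_CAPACITY)   # ring buffer: bounded, overwrite-oldest

def record(score_event: dict) -> None:
    buffer.append(score_event)           # never blocks the request path

def flush(send_batch) -> int:
    """Drain the buffer and ship one batch; called every FLUSH_INTERVAL_S."""
    batch = list(buffer)
    buffer.clear()
    if batch:
        send_batch(batch)
    return len(batch)

record({"turn": 1, "security": 0.97})
record({"turn": 2, "security": 0.99})
sent = []
flushed = flush(sent.extend)             # ships both buffered events
```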

Architecture

┌─────────────────┐         ┌──────────────────────┐         ┌──────────────────┐
│     Your App    │──POST──▶│     GOVERN Probe     │──POST──▶│    Model API     │
│ (any language)  │         │        :4020         │         │   (Anthropic,    │
│                 │◀──resp──│                      │◀──resp──│   OpenAI, etc.)  │
└─────────────────┘         │ ┌──────────────────┐ │         └──────────────────┘
                            │ │  Scoring Engine  │ │
                            │ │  • Security      │ │
                            │ │  • Bias          │ │
                            │ │  • Accuracy      │ │
                            │ │  • Drift         │ │
                            │ │  • Cost          │ │
                            │ └────────┬─────────┘ │
                            │ ┌────────▼─────────┐ │
                            │ │   Ring Buffer    │ │
                            │ │   (flush 5s)     │ │
                            │ └────────┬─────────┘ │
                            └──────────┼───────────┘
                                       ▼
                             ┌──────────────────┐
                             │ GOVERN Platform  │
                             │ (telemetry API)  │
                             └──────────────────┘

Key design principles

Non-blocking by design

The Probe adds nothing to the request path beyond raw proxy overhead (typically 2–4 ms). Scoring runs concurrently with response delivery, and telemetry is buffered locally and flushed in background batches.
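The respond-first pattern can be sketched as follows — a simplified illustration, not the Probe's implementation; `score_turn` stands in for the five-dimension scoring engine:

```python
import threading

telemetry = []              # stand-in for the ring buffer
lock = threading.Lock()

def score_turn(request: str, response: str) -> None:
    """Placeholder scorer; the real engine scores five dimensions."""
    with lock:
        telemetry.append({"request_chars": len(request),
                          "response_chars": len(response)})

def proxy_turn(request: str, call_model):
    response = call_model(request)      # forward to the model API
    worker = threading.Thread(target=score_turn, args=(request, response))
    worker.start()                      # scoring runs concurrently
    return response, worker             # caller gets the response immediately

resp, worker = proxy_turn("hello", lambda r: r.upper())
worker.join()
```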

Protocol transparency

The Probe forwards all HTTP headers, authentication tokens, and request bodies verbatim. It does not modify your requests or responses. Your model provider never sees the Probe — it sees your exact request.
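Conceptually, the pass-through looks like this — a toy sketch where `upstream_post` stands in for the HTTP call to the provider; the header names are just examples:

```python
def forward(headers: dict, body: bytes, upstream_post):
    """Pass-through: headers and body reach the provider byte-for-byte."""
    # No header is added, dropped, or rewritten; auth tokens go through as-is.
    return upstream_post(dict(headers), body)

seen = {}
def fake_upstream(headers, body):
    seen["headers"], seen["body"] = headers, body
    return b'{"ok": true}'

reply = forward({"x-api-key": "sk-...", "anthropic-version": "2023-06-01"},
                b'{"model": "claude"}', fake_upstream)
```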

Stateless operation

Each Probe instance maintains only the ring buffer and the policy cache. All durable state lives in the GOVERN platform. You can run multiple Probe replicas without coordination.

Fail-open by default

If the Probe encounters an error (network partition, scoring timeout), it forwards the model response unchanged and flags the event as unscored. Your application always gets a response. Scoring failures do not become application failures.
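The fail-open behavior amounts to a pattern like this — a hedged sketch, with `scorer` and the event shape invented for illustration:

```python
def score_or_flag(turn: dict, scorer, buffer: list) -> None:
    """Fail-open: a scoring error never becomes an application error."""
    try:
        buffer.append(scorer(turn))
    except Exception:
        # The model response has already been forwarded unchanged;
        # here we only flag the turn as unscored.
        buffer.append({"turn_id": turn.get("id"), "unscored": True})

buf = []
score_or_flag({"id": 1}, lambda t: {"turn_id": t["id"], "score": 0.9}, buf)
score_or_flag({"id": 2}, lambda t: 1 / 0, buf)   # scorer fails → flagged unscored
```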

What gets scored

Every inference turn is scored. A turn is one request/response pair:

  • Request — the prompt sent to the model (messages array, system prompt, tool definitions)
  • Response — the model’s completion (content, tool calls, finish reason)
  • Context — model ID, token counts, latency, HTTP status

The Probe does not store prompt or response content on disk. Content is held in memory for scoring and then discarded. Only the scores and metadata are transmitted to the GOVERN platform.
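A sketch of what does and does not leave the Probe — the field names and the model ID are illustrative, not the actual telemetry schema:

```python
def build_telemetry(turn: dict, scores: dict) -> dict:
    """Only scores and metadata leave the Probe; content stays in memory."""
    return {
        "model": turn["model"],
        "input_tokens": turn["input_tokens"],
        "output_tokens": turn["output_tokens"],
        "latency_ms": turn["latency_ms"],
        "status": turn["status"],
        "scores": scores,
        # turn["messages"] and turn["completion"] are deliberately omitted
    }

record = build_telemetry(
    {"model": "claude-sonnet", "input_tokens": 120, "output_tokens": 45,
     "latency_ms": 830, "status": 200,
     "messages": ["..."], "completion": "..."},
    {"security": 0.98, "bias": 0.95},
)
```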

Supported model providers

Provider                API base                              Notes
Anthropic               https://api.anthropic.com             Claude 3.x, streaming supported
OpenAI                  https://api.openai.com                GPT-4o, o1, streaming supported
Azure OpenAI            https://*.openai.azure.com            All deployments
Google Vertex           https://*.aiplatform.googleapis.com   Gemini models
Groq                    https://api.groq.com                  Llama, Mixtral
Ollama                  http://localhost:11434                Local models
Any OpenAI-compatible   Custom base URL                       Set UPSTREAM_URL
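For an OpenAI-compatible endpoint that is not listed, the Probe is pointed at it via `UPSTREAM_URL`. The resolution logic below is a sketch, not the Probe's source, and the internal endpoint URL is made up:

```python
import os

# Tell the Probe where to forward requests (set in the container's env).
os.environ["UPSTREAM_URL"] = "https://llm.internal.example.com/v1"

def resolve_upstream(default: str = "https://api.openai.com") -> str:
    """A custom base URL wins; otherwise fall back to a known provider."""
    return os.environ.get("UPSTREAM_URL", default)
```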

Next steps