Health Endpoints

Overview

GOVERN Probe exposes three health endpoints on the same port as the proxy (default: 4020):

| Path | Purpose | Use for |
|---|---|---|
| /healthz | Liveness | Kubernetes liveness probe, load balancer health check |
| /readyz | Readiness | Kubernetes readiness probe; checks upstream reachability |
| /metrics | Prometheus metrics | Scraping by Prometheus/Grafana |

GET /healthz

Returns 200 OK when the Probe process is running. Does not check upstream or GOVERN platform connectivity.

Response (healthy):

HTTP/1.1 200 OK
Content-Type: application/json
{
  "status": "ok",
  "version": "1.2.0",
  "probe_id": "probe-production-1",
  "uptime_seconds": 3847,
  "timestamp": "2026-04-12T14:23:01.432Z"
}

Response (unhealthy — process starting):

HTTP/1.1 503 Service Unavailable
{"status": "starting"}

The liveness endpoint returns 503 only during the first 10 seconds of startup (configurable via health_start_period_seconds). After that, it always returns 200 as long as the process is running.
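A script or custom health checker can apply the same rule as the liveness probe. The sketch below is illustrative only: the helper names `fetch_health` and `is_live` are hypothetical, and it assumes the 200 body carries `"status": "ok"` as in the example above.

```python
import json
import urllib.error
import urllib.request


def fetch_health(url: str, timeout: float = 2.0) -> tuple[int, dict]:
    """GET a health endpoint and return (status_code, parsed_json_body)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read() or b"{}")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the 503 body is still readable.
        return err.code, json.loads(err.read() or b"{}")


def is_live(status_code: int, body: dict) -> bool:
    """True only for the documented healthy response: 200 with status "ok"."""
    return status_code == 200 and body.get("status") == "ok"
```

With a Probe listening locally, `is_live(*fetch_health("http://localhost:4020/healthz"))` would return `False` during the start period and `True` afterwards. A refused connection raises `urllib.error.URLError`, which the caller should treat as "not live".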

GET /readyz

Returns 200 OK when the Probe is ready to accept traffic, and 503 Service Unavailable otherwise. It performs three active checks:

  1. Upstream reachability — a lightweight OPTIONS or HEAD request to UPSTREAM_URL
  2. GOVERN platform — a ping to api.govern.archetypal.ai
  3. Ring buffer — not full (below 95% capacity)

Response (ready):

HTTP/1.1 200 OK
{
  "status": "ready",
  "checks": {
    "upstream": { "status": "ok", "latency_ms": 47 },
    "govern_platform": { "status": "ok", "latency_ms": 82 },
    "ring_buffer": { "status": "ok", "utilization": 0.12 }
  }
}

Response (not ready):

HTTP/1.1 503 Service Unavailable
{
  "status": "not_ready",
  "checks": {
    "upstream": { "status": "ok", "latency_ms": 45 },
    "govern_platform": { "status": "error", "error": "connection timeout" },
    "ring_buffer": { "status": "ok", "utilization": 0.08 }
  }
}
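When /readyz returns 503, the `checks` object shows which dependency failed. The following is a minimal sketch of pulling the failing checks out of a parsed response body. The function name `failing_checks` is hypothetical, and it assumes every check entry has a `status` field and, on failure, an optional `error` field, as in the examples above.

```python
def failing_checks(readyz_body: dict) -> dict[str, str]:
    """Map each non-ok check name to its error message (or status if no message)."""
    failures = {}
    for name, check in readyz_body.get("checks", {}).items():
        if check.get("status") != "ok":
            failures[name] = check.get("error", check.get("status", "unknown"))
    return failures


# The "not ready" body from the example above.
not_ready = {
    "status": "not_ready",
    "checks": {
        "upstream": {"status": "ok", "latency_ms": 45},
        "govern_platform": {"status": "error", "error": "connection timeout"},
        "ring_buffer": {"status": "ok", "utilization": 0.08},
    },
}

print(failing_checks(not_ready))  # {'govern_platform': 'connection timeout'}
```

An empty result means every reported check passed, which should coincide with a 200 response.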

GET /metrics

Returns Prometheus-format metrics. See the Metrics Reference for the full metric list.

curl http://localhost:4020/metrics

The metrics endpoint does not require authentication by default. Restrict access via network policy if needed.
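For ad hoc inspection without a Prometheus server, the text output can be parsed directly. The sketch below handles only the common cases of the Prometheus text exposition format (comments, optional labels, optional timestamps). The function name `parse_samples` is hypothetical, and the metric names in the example are placeholders rather than real Probe metrics; the actual names are listed in the Metrics Reference.

```python
def parse_samples(text: str) -> dict[str, float]:
    """Parse Prometheus text-format samples into {series: value}.

    Skips blank lines and "#" comment lines; ignores optional timestamps.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        samples[series] = float(rest[0])
    return samples


# Placeholder metric names, for illustration only.
EXAMPLE = """\
# HELP example_requests_total Hypothetical counter, not a real Probe metric.
# TYPE example_requests_total counter
example_requests_total{method="GET",code="200"} 1027
example_uptime_seconds 3847
"""

for series, value in parse_samples(EXAMPLE).items():
    print(series, value)
```

For anything beyond a quick check, a maintained parser such as the one in the official Prometheus client library is a safer choice than this sketch.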

Kubernetes probe configuration

livenessProbe:
  httpGet:
    path: /healthz
    port: 4020
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readyz
    port: 4020
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
startupProbe:
  httpGet:
    path: /healthz
    port: 4020
  initialDelaySeconds: 0
  periodSeconds: 2
  failureThreshold: 15 # 30 seconds to start
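The timing these settings imply can be worked out with simple arithmetic. The sketch below is an approximation that ignores `timeoutSeconds` and scheduling jitter, and the function names are hypothetical: the startup budget is `initialDelaySeconds + periodSeconds * failureThreshold`, and the time for a running probe to trip is roughly `periodSeconds * failureThreshold`.

```python
def startup_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Longest time a container may take before the startup probe gives up."""
    return initial_delay + period * failure_threshold


def detection_window_seconds(period: int, failure_threshold: int) -> int:
    """Approximate time for consecutive failures to trip a running probe."""
    return period * failure_threshold


print(startup_budget_seconds(0, 2, 15))  # 30: matches the "30 seconds to start" comment
print(detection_window_seconds(30, 3))   # 90: liveness, before the container is restarted
print(detection_window_seconds(10, 3))   # 30: readiness, before the pod stops receiving traffic
```

Lowering `periodSeconds` or `failureThreshold` shortens detection time at the cost of more frequent requests and a higher chance of reacting to transient failures.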

Load balancer health check

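For AWS ALB, GCP Cloud Load Balancing, or NGINX, point the health check at GET /healthz on the Probe port (4020 by default) and treat a 2xx response as healthy. An NGINX example: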

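# NGINX upstream health check
# Active checks need an add-on (Lua or an upstream check module). The check*
# directives below follow the syntax of the third-party
# nginx_upstream_check_module, which expects them inside the upstream block.
upstream govern_probe {
    server 10.0.0.1:4020;
    server 10.0.0.2:4020;
    keepalive 32;

    # Probe every 3 s with a 2 s timeout: 2 passes mark a server up, 3 failures mark it down.
    check interval=3000 rise=2 fall=3 timeout=2000 type=http;
    check_http_send "GET /healthz HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx;
}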