Health Endpoints
Overview
GOVERN Probe exposes three health endpoints on the same port as the proxy (default: 4020):
| Path | Purpose | Use for |
|---|---|---|
| /healthz | Liveness | Kubernetes liveness probe, load balancer health check |
| /readyz | Readiness | Kubernetes readiness probe; checks upstream reachability |
| /metrics | Prometheus metrics | Scraping by Prometheus/Grafana |
GET /healthz
Returns 200 OK once the Probe process is running and its start period has passed. It does not check upstream or GOVERN platform connectivity.
Response (healthy):

    HTTP/1.1 200 OK
    Content-Type: application/json

    {
      "status": "ok",
      "version": "1.2.0",
      "probe_id": "probe-production-1",
      "uptime_seconds": 3847,
      "timestamp": "2026-04-12T14:23:01.432Z"
    }

Response (unhealthy, process starting):

    HTTP/1.1 503 Service Unavailable

    {"status": "starting"}

The liveness endpoint returns 503 only during the first 10 seconds of startup (configurable via health_start_period_seconds). After that, it always returns 200 as long as the process is running.
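A client that starts alongside the Probe has to tolerate the 503 returned during the start period. The following is a minimal sketch of such a poller, not part of GOVERN Probe itself: the helper names `fetch_health` and `wait_until_live` and their parameters are hypothetical, and only the Python standard library is assumed.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(url, timeout_s=2.0):
    """Return (HTTP status, parsed JSON body) for a health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status, json.loads(resp.read() or b"{}")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the 503 sent during startup still has a body.
        return err.code, json.loads(err.read() or b"{}")


def wait_until_live(fetch, deadline_s=30.0, interval_s=1.0):
    """Call fetch() until it reports HTTP 200, or give up after deadline_s."""
    stop_at = time.monotonic() + deadline_s
    while time.monotonic() < stop_at:
        try:
            status, _body = fetch()
            if status == 200:
                return True
        except (OSError, ValueError):
            pass  # connection refused or timed out, or the body was not JSON
        time.sleep(interval_s)
    return False
```

With the default port, a call might look like `wait_until_live(lambda: fetch_health("http://localhost:4020/healthz"))`. Passing the fetch function in as an argument keeps the retry loop testable without a running Probe.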
GET /readyz
Returns 200 OK when the Probe is ready to accept traffic. Performs active checks:
- Upstream reachability: a lightweight OPTIONS or HEAD request to UPSTREAM_URL
- GOVERN platform: a ping to api.govern.archetypal.ai
- Ring buffer: not full (below 95% capacity)
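The way these three checks combine into a single answer can be modelled in a few lines. This is a sketch of the documented behavior, not the Probe's actual implementation: it assumes that any one failing check makes the Probe not ready, and the function names are hypothetical. Field names follow the response examples in this section.

```python
RING_BUFFER_LIMIT = 0.95  # "below 95% capacity" from the list above


def evaluate_readiness(upstream_ok, platform_ok, ring_utilization):
    """Combine the three checks into (HTTP status, response body)."""
    checks = {
        "upstream": {"status": "ok" if upstream_ok else "error"},
        "govern_platform": {"status": "ok" if platform_ok else "error"},
        "ring_buffer": {
            "status": "ok" if ring_utilization < RING_BUFFER_LIMIT else "error",
            "utilization": ring_utilization,
        },
    }
    # Assumption: readiness requires every check to pass.
    ready = all(check["status"] == "ok" for check in checks.values())
    body = {"status": "ready" if ready else "not_ready", "checks": checks}
    return (200 if ready else 503), body


def failing_checks(body):
    """List the names of checks in a /readyz body whose status is not "ok"."""
    return [
        name
        for name, check in body.get("checks", {}).items()
        if check.get("status") != "ok"
    ]
```

A helper like `failing_checks` is useful on the client side too, since a 503 from /readyz alone does not say which dependency is unavailable.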
Response (ready):

    HTTP/1.1 200 OK

    {
      "status": "ready",
      "checks": {
        "upstream": { "status": "ok", "latency_ms": 47 },
        "govern_platform": { "status": "ok", "latency_ms": 82 },
        "ring_buffer": { "status": "ok", "utilization": 0.12 }
      }
    }

Response (not ready):

    HTTP/1.1 503 Service Unavailable

    {
      "status": "not_ready",
      "checks": {
        "upstream": { "status": "ok", "latency_ms": 45 },
        "govern_platform": { "status": "error", "error": "connection timeout" },
        "ring_buffer": { "status": "ok", "utilization": 0.08 }
      }
    }

GET /metrics
Returns Prometheus-format metrics. See the Metrics Reference for the full metric list.

    curl http://localhost:4020/metrics

The metrics endpoint does not require authentication by default. Restrict access via network policy if needed.
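For ad hoc inspection without a Prometheus server, the text returned by /metrics can be parsed directly. The sketch below handles only the simple `name{labels} value` lines of the Prometheus text format. The function name is hypothetical, and the metric names in the usage example are placeholders, not actual GOVERN Probe metrics.

```python
def parse_prometheus_text(text):
    """Map each sample (metric name plus any label set) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line is "<name>{<labels>} <value> [timestamp]".
        if "}" in line:
            series, _, rest = line.rpartition("}")
            series += "}"
        else:
            series, _, rest = line.partition(" ")
        samples[series] = float(rest.split()[0])
    return samples
```

For example, with placeholder metric names:

```python
sample = """# TYPE example_requests_total counter
example_requests_total{code="200"} 1027
example_uptime_seconds 3847
"""
print(parse_prometheus_text(sample))
```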
Kubernetes probe configuration

    livenessProbe:
      httpGet:
        path: /healthz
        port: 4020
      initialDelaySeconds: 10
      periodSeconds: 30
      failureThreshold: 3

    readinessProbe:
      httpGet:
        path: /readyz
        port: 4020
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 3

    startupProbe:
      httpGet:
        path: /healthz
        port: 4020
      initialDelaySeconds: 0
      periodSeconds: 2
      failureThreshold: 15  # 30 seconds to start

Load balancer health check
For AWS ALB or GCP Cloud Load Balancing, point the health check at /healthz on port 4020. For NGINX:

    # NGINX upstream health check
    upstream govern_probe {
        server 10.0.0.1:4020;
        server 10.0.0.2:4020;
        keepalive 32;

        # Health check via Lua or the third-party nginx_upstream_check_module.
        # The check directives below come from that module, not stock NGINX,
        # and must sit inside the upstream block.
        check interval=3000 rise=2 fall=3 timeout=2000 type=http;
        check_http_send "GET /healthz HTTP/1.0\r\n\r\n";
        check_http_expect_alive http_2xx;
    }