Health Endpoints

Overview

GOVERN Probe exposes three HTTP endpoints for health checking and monitoring on the same port as the proxy (default: 4020):

Path	Purpose	Use for
`/healthz`	Liveness	Kubernetes liveness probe, load balancer health check
`/readyz`	Readiness	Kubernetes readiness probe; checks upstream, GOVERN platform, and ring buffer
`/metrics`	Prometheus metrics	Scraping by Prometheus (visualize in Grafana)

GET /healthz

Returns 200 OK when the Probe process is running. Does not check upstream or GOVERN platform connectivity.

Response (healthy):

HTTP/1.1 200 OK
Content-Type: application/json

{
  "status": "ok",
  "version": "1.2.0",
  "probe_id": "probe-production-1",
  "uptime_seconds": 3847,
  "timestamp": "2026-04-12T14:23:01.432Z"
}

Response (unhealthy — process starting):

HTTP/1.1 503 Service Unavailable

{"status": "starting"}

The liveness endpoint returns 503 only during the first 10 seconds of startup (configurable via health_start_period_seconds). After that, it always returns 200 as long as the process is running.
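Outside Kubernetes (for example in a deployment script), the same liveness semantics can be checked from a client. A minimal sketch using only the Python standard library; `classify_healthz` and `check_liveness` are hypothetical helper names, and only the path, port 4020, status codes, and `"status"` values come from this page.

```python
import json
import urllib.error
import urllib.request


def classify_healthz(status_code: int, body: str) -> str:
    """Map a /healthz response to "alive", "starting", or "unhealthy"."""
    if status_code == 200:
        # Documented behavior: 200 OK whenever the Probe process is running.
        return "alive"
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        payload = {}
    if status_code == 503 and isinstance(payload, dict) and payload.get("status") == "starting":
        # Documented 503 during the startup period (health_start_period_seconds).
        return "starting"
    return "unhealthy"


def check_liveness(base_url: str = "http://localhost:4020", timeout: float = 2.0) -> str:
    """GET /healthz and classify the result; network errors count as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return classify_healthz(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-2xx codes such as 503; the body is still readable.
        return classify_healthz(err.code, err.read().decode("utf-8"))
    except OSError:
        # URLError, refused connections, and timeouts are all OSError subclasses.
        return "unhealthy"
```

A deploy script could poll `check_liveness()` until it returns `"alive"`, treating `"starting"` as "keep waiting" rather than as a failure.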

GET /readyz

Returns 200 OK when the Probe is ready to accept traffic. It performs three active checks:

Upstream reachability — a lightweight OPTIONS or HEAD request to UPSTREAM_URL
GOVERN platform — a ping to api.govern.archetypal.ai
Ring buffer — not full (below 95% capacity)

Response (ready):

HTTP/1.1 200 OK

{
  "status": "ready",
  "checks": {
    "upstream": { "status": "ok", "latency_ms": 47 },
    "govern_platform": { "status": "ok", "latency_ms": 82 },
    "ring_buffer": { "status": "ok", "utilization": 0.12 }
  }
}

Response (not ready):

HTTP/1.1 503 Service Unavailable

{
  "status": "not_ready",
  "checks": {
    "upstream": { "status": "ok", "latency_ms": 45 },
    "govern_platform": { "status": "error", "error": "connection timeout" },
    "ring_buffer": { "status": "ok", "utilization": 0.08 }
  }
}
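When `/readyz` returns 503, the `checks` object identifies the failing dependency. A minimal sketch for extracting the failing checks from the response body, assuming the JSON shape shown above; `failing_checks` is a hypothetical helper name.

```python
import json


def failing_checks(body: str) -> dict[str, str]:
    """Return {check_name: reason} for every check whose status is not "ok"."""
    payload = json.loads(body)
    failures = {}
    for name, result in payload.get("checks", {}).items():
        if result.get("status") != "ok":
            # Prefer the "error" message; fall back to the raw status value.
            failures[name] = result.get("error", result.get("status", "unknown"))
    return failures
```

Applied to the not-ready response above, this yields `{"govern_platform": "connection timeout"}`; applied to the ready response, it yields an empty dictionary.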

When govern_platform is unhealthy, the Probe still functions as a proxy: it buffers telemetry locally and retries when the platform becomes available. By default, the readiness check still fails in this state so that Kubernetes stops routing traffic to a Probe that is accumulating a backlog. To keep the Probe in rotation while the platform is unreachable, disable the requirement:

health:
  readyz_require_govern_platform: false  # default: true
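The readiness rules on this page can be modeled as a single predicate, which helps when reasoning about what the flag changes. This is a hypothetical sketch, not the Probe's implementation; it assumes `readyz_require_govern_platform` affects only the GOVERN platform check, while the upstream and ring buffer checks always apply.

```python
from dataclasses import dataclass

# Documented threshold: the ring buffer check passes only below 95% capacity.
RING_BUFFER_LIMIT = 0.95


@dataclass
class CheckResults:
    upstream_ok: bool
    govern_platform_ok: bool
    ring_buffer_utilization: float  # fraction of capacity, 0.0 to 1.0


def is_ready(results: CheckResults, require_govern_platform: bool = True) -> bool:
    """Hypothetical model of the /readyz decision (assumption, see lead-in)."""
    if not results.upstream_ok:
        return False
    if results.ring_buffer_utilization >= RING_BUFFER_LIMIT:
        return False
    if require_govern_platform and not results.govern_platform_ok:
        return False
    return True
```

Under this model, turning the flag off keeps a Probe ready during a platform outage only until its backlog pushes ring buffer utilization to 95%.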

GET /metrics

Returns Prometheus-format metrics. See the Metrics Reference for the full metric list.

curl http://localhost:4020/metrics

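The metrics endpoint does not require authentication by default and is served on the same port as the proxy. If metrics should not be reachable by every client of the proxy, restrict access with a network policy.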
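For ad-hoc inspection without a Prometheus server, the endpoint can be read with the Python standard library. This is a minimal sketch of the Prometheus text exposition format (comment lines, optional `{labels}`, optional trailing timestamp), not a complete parser; `parse_prometheus_text` and `scrape` are hypothetical names, and any metric names used with it should come from the Metrics Reference.

```python
import urllib.request


def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse Prometheus text-format samples into {series: value}.

    Skips '# HELP' / '# TYPE' comment lines and ignores an optional
    trailing timestamp. Not a complete implementation of the format.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Series includes the label set; the value follows the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        samples[series] = float(rest[0])  # float() also accepts NaN and +Inf
    return samples


def scrape(url: str = "http://localhost:4020/metrics", timeout: float = 5.0) -> dict[str, float]:
    """Fetch the metrics page and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

With a running Probe, `scrape()` returns a dictionary keyed by series name plus labels, which is convenient for quick checks in scripts.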

Kubernetes probe configuration

livenessProbe:
  httpGet:
    path: /healthz
    port: 4020
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /readyz
    port: 4020
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3

startupProbe:
  httpGet:
    path: /healthz
    port: 4020
  initialDelaySeconds: 0
  periodSeconds: 2
  failureThreshold: 15  # 15 × 2s: up to 30 seconds to start
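These settings translate into approximate reaction times: Kubernetes acts after `failureThreshold` consecutive failures spaced `periodSeconds` apart. Because Kubernetes disables liveness and readiness probes until the startup probe succeeds, the startup budget is what must cover `health_start_period_seconds`. A small arithmetic sketch that ignores probe timeouts and jitter; `ProbeTiming` is a hypothetical name.

```python
from dataclasses import dataclass

# Documented default for health_start_period_seconds (503 from /healthz while starting).
HEALTH_START_PERIOD_SECONDS = 10


@dataclass
class ProbeTiming:
    period_seconds: int
    failure_threshold: int

    def failure_window(self) -> int:
        """Approximate seconds of continuous failure before Kubernetes acts."""
        return self.period_seconds * self.failure_threshold


# Values copied from the manifests above.
probes = {
    "startup": ProbeTiming(period_seconds=2, failure_threshold=15),
    "liveness": ProbeTiming(period_seconds=30, failure_threshold=3),
    "readiness": ProbeTiming(period_seconds=10, failure_threshold=3),
}

for name, probe in probes.items():
    print(f"{name}: ~{probe.failure_window()}s")

# The startup budget must exceed the period during which /healthz returns 503.
assert probes["startup"].failure_window() > HEALTH_START_PERIOD_SECONDS
```

This prints `startup: ~30s`, `liveness: ~90s`, and `readiness: ~30s`: a hung Probe is restarted after roughly 90 seconds, and an unready one stops receiving Service traffic after roughly 30 seconds.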

Load balancer health check

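For AWS ALB, GCP Cloud Load Balancing, or any other load balancer, configure an HTTP health check against /healthz on port 4020 that expects a 2xx response.

Open-source NGINX has no built-in active health checks. The check directives below come from the third-party nginx_upstream_check_module (also bundled with Tengine) and must be placed inside the upstream block:

# NGINX upstream with active health checks (requires nginx_upstream_check_module)
upstream govern_probe {
  server 10.0.0.1:4020;
  server 10.0.0.2:4020;
  keepalive 32;

  # Check /healthz every 3 s with a 2 s timeout;
  # 2 consecutive successes mark a server up, 3 consecutive failures mark it down.
  check interval=3000 rise=2 fall=3 timeout=2000 type=http;
  check_http_send "GET /healthz HTTP/1.0\r\n\r\n";
  check_http_expect_alive http_2xx;
}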
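The `rise` and `fall` parameters of the check directive add hysteresis, so a single failed check does not eject a Probe and a single success does not restore it. A minimal sketch of that counting logic as the parameters describe it; `UpstreamHealth` is a hypothetical name, and the starting state is a parameter here rather than a claim about how the module initializes servers.

```python
class UpstreamHealth:
    """Rise/fall hysteresis: down after `fall` consecutive failures,
    up again after `rise` consecutive successes."""

    def __init__(self, rise: int = 2, fall: int = 3, initially_up: bool = True):
        self.rise = rise
        self.fall = fall
        self.up = initially_up  # assumption: caller chooses the initial state
        self._successes = 0
        self._failures = 0

    def record(self, healthy: bool) -> bool:
        """Record one check result (e.g. a 2xx from /healthz); return the new state."""
        if healthy:
            self._successes += 1
            self._failures = 0
            if not self.up and self._successes >= self.rise:
                self.up = True
        else:
            self._failures += 1
            self._successes = 0
            if self.up and self._failures >= self.fall:
                self.up = False
        return self.up
```

With `interval=3000`, `fall=3` removes an unresponsive Probe after roughly 9 seconds of failed checks, and `rise=2` returns it to rotation after roughly 6 seconds of successful ones.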