Monitoring
The nbe-operator exposes Prometheus metrics and health check endpoints for observability.
Operator Metrics
The operator exposes metrics at :8080/metrics in Prometheus format.
Prometheus Annotations
When metrics.enabled: true (the default), Prometheus scrape annotations are added to the operator pod:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
| Key | Type | Default | Description |
|---|---|---|---|
| metrics.enabled | bool | true | Enable Prometheus annotations on the operator pod |
| metrics.podAnnotations | bool | true | Add standard prometheus.io/* annotations |
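The annotations only take effect if Prometheus is configured to honor them. The following is a minimal sketch of a scrape job using the common annotation-based relabeling pattern for a Prometheus server that is not managed by the Prometheus Operator. The job name `kubernetes-pods` is an assumption, and this configuration belongs to your Prometheus setup; it is not installed by the operator chart.

```yaml
scrape_configs:
  - job_name: kubernetes-pods   # assumed job name
    kubernetes_sd_configs:
      - role: pod               # discover every pod as a potential target
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the prometheus.io/path annotation as the metrics path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: '(.+)'
      # Rewrite the target address to use the prometheus.io/port annotation
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
```

With the default annotations, this job would scrape the operator pod on port 8080 at /metrics.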
ServiceMonitor
For clusters using the Prometheus Operator, create a ServiceMonitor:
serviceMonitor:
  enabled: true
  interval: "30s"
  scrapeTimeout: "10s"
Full ServiceMonitor configuration:
| Key | Type | Default | Description |
|---|---|---|---|
| serviceMonitor.enabled | bool | false | Create a ServiceMonitor resource |
| serviceMonitor.namespace | string | Release namespace | Target namespace |
| serviceMonitor.labels | object | {} | Labels for ServiceMonitor selection |
| serviceMonitor.interval | string | 30s | Scrape interval |
| serviceMonitor.scrapeTimeout | string | 10s | Scrape timeout |
| serviceMonitor.scheme | string | http | Scheme used for scraping (http or https) |
| serviceMonitor.honorLabels | bool | true | Keep labels from the scraped metrics when they conflict with target labels |
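The Prometheus Operator only picks up ServiceMonitors that match the selectors on its Prometheus resource, which is what serviceMonitor.labels is typically used for. A sketch of values combining the keys from the table above, assuming a Prometheus instance that selects on a `release: kube-prometheus-stack` label and watches a `monitoring` namespace (both are assumptions; substitute whatever your Prometheus resource actually selects on):

```yaml
serviceMonitor:
  enabled: true
  namespace: monitoring              # assumed; defaults to the release namespace
  labels:
    release: kube-prometheus-stack   # assumed; must match your serviceMonitorSelector
  interval: "30s"
  scrapeTimeout: "10s"               # keep this at or below the interval
  scheme: http
  honorLabels: true
```

If the operator's metrics never show up as a target, a label or namespace mismatch between the ServiceMonitor and the Prometheus selectors is a likely first thing to check.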
NetBox Application Metrics
Enable the NetBox /metrics endpoint separately via the NetBoxEnterprise spec:
netboxEnterprise:
  spec:
    netbox:
      config:
        metricsEnabled: true
The operator aggregates metrics from NetBox and Diode deployments and exposes them at its own /metrics endpoint, so Prometheus only needs to scrape the operator.
Deployment metrics aggregation (collecting metrics from NetBox/Diode pods) is a new feature. Contact NetBox Labs support for early access.
Diode Component Metrics
Enable per-component metrics with telemetry configuration:
netboxEnterprise:
  spec:
    diode:
      config:
        ingester:
          telemetryConfig:
            metricsEnabled: true
        reconciler:
          telemetryConfig:
            metricsEnabled: true
        auth:
          telemetryConfig:
            metricsEnabled: true
Supported exporters: prometheus, otlp, console, none.
Health Check Endpoints
The operator exposes two health endpoints:
| Endpoint | Port | Purpose |
|---|---|---|
| /healthz | 8081 | Liveness probe: is the operator process alive? |
| /readyz | 8081 | Readiness probe: is the operator ready to serve? |
These are configured as Kubernetes liveness and readiness probes on the operator pod.
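For orientation, probes against these endpoints look roughly like the following in a container spec. This is a sketch, not the chart's rendered output: the paths and port come from the table above, while the timing values are illustrative assumptions.

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8081
  initialDelaySeconds: 15   # assumed value
  periodSeconds: 20         # assumed value
readinessProbe:
  httpGet:
    path: /readyz
    port: 8081
  initialDelaySeconds: 5    # assumed value
  periodSeconds: 10         # assumed value
```

A failing liveness probe causes Kubernetes to restart the container, while a failing readiness probe only marks the pod as not ready.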
Operator Log Levels
Adjust operator verbosity for debugging:
operator:
  logging:
    level: "debug"  # Or: "info", "info,kube=warn", "operator=debug,info"
    format: "json"  # Or: auto, compact, pretty, gcp, aws, otlp
| Format | Use Case |
|---|---|
| auto | Detects environment (JSON in Kubernetes, compact locally) |
| json | Structured logging for log aggregation (Elasticsearch, Loki) |
| compact | Single-line human-readable for local development |
| pretty | Multi-line verbose for debugging |
| gcp | Google Cloud Logging format (auto-ingested in GKE) |
| aws | CloudWatch-optimized JSON (auto-ingested in EKS) |
| otlp | OpenTelemetry export for Azure Monitor, Jaeger, etc. |
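As an example of combining the two settings, the values below sketch a quieter production setup. They assume the comma-separated level syntax is read as a default level followed by per-target overrides, which the sample values suggest but this page does not spell out.

```yaml
operator:
  logging:
    # Assumed reading: "info" is the default level, and "kube=warn"
    # limits the kube target to warnings and above.
    level: "info,kube=warn"
    format: "json"   # structured output for log aggregation
```

For a debugging session, the inverse pattern "operator=debug,info" from the list of sample levels would presumably raise only the operator target to debug while leaving everything else at info.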
Next Steps
- Troubleshooting — Using metrics and logs to diagnose issues
- Helm Values Reference — Full operator configuration