Troubleshooting

This section covers common issues and their solutions when deploying NetBox Enterprise.

Common Issues

Database Connection Failures

Symptoms:

  • Pods fail to start
  • Errors about database connections in logs
  • "could not connect to server" messages

Diagnosis:

# Test database connectivity from a pod
kubectl run -it --rm debug --image=netboxlabs/nbe-utils:5 --restart=Never -- \
psql postgresql://username:password@host:5432/netbox

# Check environment variables
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
env | grep -E "(DATABASE|REDIS)"

Common Solutions:

  1. Verify database credentials in values file
  2. Ensure database service is accessible from Kubernetes cluster
  3. Check security groups/firewall rules
  4. Verify database is accepting connections
  5. Ensure database user has proper permissions
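A frequent root cause for step 1 is a malformed connection URL. The URL passed to psql in the diagnosis step follows the standard PostgreSQL form, which you can sanity-check locally before touching the cluster (all values below are placeholders):

```shell
# Assemble the connection URL from its parts -- placeholders only
DB_USER="netbox"
DB_PASS="s3cret"
DB_HOST="db.example.com"
DB_NAME="netbox"
DB_URL="postgresql://${DB_USER}:${DB_PASS}@${DB_HOST}:5432/${DB_NAME}"
echo "${DB_URL}"
```

Note that if the password contains reserved characters such as `@` or `:`, they must be percent-encoded before being embedded in the URL.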

Redis Authentication Issues

Symptoms:

  • "NOAUTH Authentication required" errors
  • Worker pods unable to connect to Redis
  • Cache connection failures

Solution:

  1. Verify Redis password matches in both Redis deployment and NetBox configuration
  2. Use standalone Redis instead of cluster mode
  3. Ensure Redis URL includes password: redis://:password@host:6379/0
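The empty username before the `:` in the Redis URL is easy to get wrong. A quick local check that the URL is well-formed (password and host below are placeholders):

```shell
REDIS_PASSWORD="s3cret"
# The empty username before ':' is required -- redis://:password@host
REDIS_URL="redis://:${REDIS_PASSWORD}@redis.example.com:6379/0"
# Extract the password back out to confirm the URL parses as expected
echo "${REDIS_URL}" | sed -E 's|redis://:([^@]+)@.*|\1|'
```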

Image Pull Errors

Symptoms:

  • Pods stuck in ImagePullBackOff or ErrImagePull
  • "unauthorized" or "not found" errors

Diagnosis:

# Check pod events
kubectl describe pod -n netbox-enterprise <pod-name>

# Verify secret exists
kubectl get secret -n netbox-enterprise registry-credentials

Solutions:

  1. Verify registry credentials are correct
  2. Ensure image pull secret is referenced in values file
  3. Check network connectivity to registry
  4. Verify image names and tags
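If the `registry-credentials` secret itself is suspect, it helps to know what is inside it: a docker-registry pull secret is a `.dockerconfigjson` document whose `auth` field is simply base64 of `username:password`. You can build and inspect that payload locally before recreating the secret (credentials below are placeholders):

```shell
# Build the auth entry used inside a .dockerconfigjson pull secret
REGISTRY="registry.netboxlabs.com"
AUTH="$(printf '%s:%s' "myuser" "mytoken" | base64)"
printf '{"auths":{"%s":{"auth":"%s"}}}\n' "${REGISTRY}" "${AUTH}"
```

In practice, `kubectl create secret docker-registry registry-credentials --docker-server=... --docker-username=... --docker-password=...` assembles this same document for you.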

License Configuration Issues

Symptoms:

  • "Invalid license" errors
  • Enterprise features not available

Solution:

  1. Verify license ID is correctly set in values file
  2. Ensure license ID matches in both licenseID and global.license.id
  3. Check license expiration date
  4. Contact NetBox Labs support for license issues
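The two keys from step 2 typically appear in the values file as follows (the ID below is a placeholder; use the value from your generated values file, and make sure both keys carry the same value):

```yaml
licenseID: "YOUR-LICENSE-ID"
global:
  license:
    id: "YOUR-LICENSE-ID"
```

A mismatch between the two is a common cause of "Invalid license" errors.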

Persistent Volume Issues

Symptoms:

  • Pods stuck in Pending state
  • PVC not bound
  • "no persistent volumes available" errors

Diagnosis:

# Check PVC status
kubectl get pvc -n netbox-enterprise

# Check available storage classes
kubectl get storageclass

# Describe PVC for events
kubectl describe pvc -n netbox-enterprise <pvc-name>

Solutions:

  1. Set a default storage class (most common issue):

    # Check current storage classes
    kubectl get storageclass

    # Set default storage class (adjust name for your environment)
    # AWS EKS:
    kubectl patch storageclass gp2 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

    # Azure AKS:
    kubectl patch storageclass managed-csi -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

    # Google GKE:
    kubectl patch storageclass standard -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
  2. Verify storage class exists and has required features

  3. Check if cluster has available storage capacity

  4. Ensure PVC size is within storage class limits

  5. Verify node has required CSI drivers installed

IngressClass Conflicts

Symptoms:

  • Installation fails with "IngressClass 'nginx' already exists"
  • Error about missing Helm labels on IngressClass
  • "invalid ownership metadata" errors

Diagnosis:

# Check existing IngressClass
kubectl get ingressclass

# Check if an ingress controller is already installed
kubectl get pods -A | grep -i nginx

Solutions:

Option 1: Use the existing ingress controller

# Disable the chart's built-in ingress controller
diode:
  ingress-nginx:
    enabled: false

Option 2: Disable IngressClass creation only

# Keep the chart's controller but don't create IngressClass
diode:
  ingress-nginx:
    controller:
      ingressClassResource:
        enabled: false

Option 3: Remove existing ingress controller

# If using Minikube
minikube addons disable ingress

# Then proceed with installation using chart's controller

Registry Authentication Issues

Problem: Helm registry authentication failures

# Re-authenticate to correct registry
helm registry login registry.netboxlabs.com \
--username "$USERNAME" \
--password "$SERVICE_ACCOUNT_TOKEN"

Problem: Namespace creation failures

# Always use --create-namespace flag
helm install netbox-enterprise \
oci://registry.netboxlabs.com/netbox-enterprise/beta/netbox-enterprise \
--namespace netbox-enterprise \
--create-namespace \
--values netbox-enterprise-values.yaml

Problem: Values file missing license signatures

  • Solution: Always start with the generated values file from Enterprise Portal
  • Never: Create values files from scratch

Diode Ingress Routing Issues

Symptoms:

  • ✅ NetBox web interface loads normally at your domain
  • ❌ Diode plugin shows connection errors in NetBox logs
  • ❌ Diode agents can't connect to the reconciler service
  • ❌ Error messages about "Diode API not reachable" in NetBox startup logs

Diagnosis:

  1. Check Diode Plugin Status:

    kubectl logs -n netbox-enterprise deployment/netbox-enterprise | grep -i diode

    Expected output when working:

    ⓘ Enabling Diode plugin
    ⓘ Using ingress host: yourdomain.com
    ✅ Diode plugin configured successfully

    Problem indicators:

    ⚠️ No valid ingress host found, Diode API will not be reachable
    ⚠️ Failed to connect to Diode service
  2. Check Ingress Configuration:

    kubectl get ingress -n netbox-enterprise -o yaml | grep -A 10 -B 5 "paths:"

    Problem: Missing /diode route - you only see routes for / and /(.*) going to netbox-enterprise:

    paths:
    - path: /
      pathType: Prefix
      backend:
        service:
          name: netbox-enterprise
          port:
            number: 80
    - path: /(.*)
      pathType: ImplementationSpecific
      backend:
        service:
          name: netbox-enterprise
          port:
            number: 80
  3. Test Diode gRPC Endpoint:

    # This should fail with broken routing
    curl -v https://yourdomain.com/diode

    Problem response: Returns NetBox HTML instead of gRPC response

Solutions:

Option 1: Add Missing /diode Route

Create a values override file to fix the ingress routing:

# diode-routing-fix.yaml
diode:
  enabled: true
  ingressNginx:
    extraHttpPaths:
      # ADD THIS FIRST - routes /diode to actual Diode service
      - path: /diode
        pathType: Prefix
        serviceName: netbox-enterprise-diode-reconciler
        servicePort: 8081
      # Keep existing NetBox routes
      - path: /
        pathType: Prefix
        serviceName: netbox-enterprise
        servicePort: 80
      - path: /(.*)
        pathType: ImplementationSpecific
        serviceName: netbox-enterprise
        servicePort: 80
    # Ensure gRPC is properly configured
    grpcAnnotations:
      nginx.ingress.kubernetes.io/ssl-redirect: "false"
      nginx.ingress.kubernetes.io/proxy-body-size: "25m"
      nginx.ingress.kubernetes.io/grpc-backend: "true"
    httpAnnotations:
      nginx.ingress.kubernetes.io/ssl-redirect: "false"

Apply the fix:

helm upgrade netbox-enterprise \
oci://registry.netboxlabs.com/netbox-enterprise/beta/netbox-enterprise \
--namespace netbox-enterprise \
--values your-existing-values.yaml \
--values diode-routing-fix.yaml

Option 2: Use Existing Ingress Controller

If you have an existing ingress controller and want to use it instead:

# existing-ingress.yaml
diode:
  enabled: true
  # Disable diode's built-in ingress creation
  ingress-nginx:
    enabled: false
  # Configure to use your existing ingress
  ingressNginx:
    extraHttpPaths:
      - path: /diode
        pathType: Prefix
        serviceName: netbox-enterprise-diode-reconciler
        servicePort: 8081
      - path: /
        pathType: Prefix
        serviceName: netbox-enterprise
        servicePort: 80

Option 3: Manual Ingress Resource

For complex ingress setups or when you need full control:

# manual-diode-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: netbox-enterprise-diode
  namespace: netbox-enterprise
  annotations:
    nginx.ingress.kubernetes.io/grpc-backend: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
    - host: yourdomain.com
      http:
        paths:
          - path: /diode
            pathType: Prefix
            backend:
              service:
                name: netbox-enterprise-diode-reconciler
                port:
                  number: 8081

Apply the manifest:

kubectl apply -f manual-diode-ingress.yaml

Option 4: Temporary Disable

If you need NetBox working immediately while investigating:

# disable-diode.yaml
diode:
  enabled: false

Verification:

  1. Check Updated Ingress:

    kubectl get ingress -n netbox-enterprise -o yaml | grep -A 15 "paths:"

    Expected output:

    paths:
    - path: /diode # ← NEW ROUTE ADDED
      pathType: Prefix
      backend:
        service:
          name: netbox-enterprise-diode-reconciler
          port:
            number: 8081
    - path: /
      pathType: Prefix
      backend:
        service:
          name: netbox-enterprise
          port:
            number: 80
  2. Check NetBox Logs:

    kubectl logs -n netbox-enterprise deployment/netbox-enterprise -f | grep -i diode

    Expected success output:

    ⓘ Enabling Diode plugin
    ⓘ Using ingress host: yourdomain.com
    ✅ Inferred ingress host from ingress controller
    ✅ Diode plugin configured successfully
  3. Test gRPC Connectivity:

    # Should now get gRPC response instead of HTML
    curl -H "Content-Type: application/grpc" https://yourdomain.com/diode

Important Notes:

  • Route Order Matters: The /diode route must come BEFORE the catch-all / route
  • Complex Configurations: In environments with multiple ingress controllers, custom ingress classes, or complex routing rules, the automatic route insertion may not work as expected. You may need to manually configure ingress resources or adjust the order of path rules
  • Service Verification: Ensure the netbox-enterprise-diode-reconciler service exists:
    kubectl get svc -n netbox-enterprise | grep diode-reconciler
  • Port Numbers: Diode reconciler uses port 8081, NetBox uses port 80
  • gRPC Annotations: Required for proper gRPC routing through NGINX ingress controllers

Additional Diode Troubleshooting:

If issues persist, check all Diode components:

# Check all Diode services are running
kubectl get pods -n netbox-enterprise | grep diode

# Collect logs from all Diode components
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-diode-reconciler
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-diode-ingester
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-diode-auth

Deployment Issues in Environments with Restricted Connectivity

Symptom: Images not found in private registry

Solutions:

  1. Verify all images are mirrored:

    # Check if images exist in your registry
    docker pull mycompany.jfrog.io/nbe/netbox-enterprise/nbe-core:4.2.9_main-90
  2. Validate private registry configuration:

    # Check the generated private registry values
    cat my-private-registry.yaml

    # Ensure all image references are updated
    grep -r "MY_REGISTRY" my-private-registry.yaml # Should return no results
  3. Use the private registry script helper:

    # Download the helper script
    curl -O https://netboxlabs.com/docs/files/private-registry.sh
    chmod +x private-registry.sh

    # Generate configuration
    ./private-registry.sh mycompany.jfrog.io/nbe > generated-private-registry.yaml

    # Compare with manual configuration
    diff my-private-registry.yaml generated-private-registry.yaml

    Note: The script is provided as a troubleshooting aid. The recommended approach is manual configuration using the template file for better control and understanding.

    Usage:

    # Script generates values for all image references
    ./private-registry.sh your-registry.com/nbe > private-registry-values.yaml

    # Then use in installation
    helm install netbox-enterprise ./netbox-enterprise-1.11.4.tgz \
    --values netbox-enterprise-values.yaml \
    --values private-registry-values.yaml \
    --namespace netbox-enterprise

Pod Troubleshooting

Check Pod Status

# List all pods with wide output
kubectl get pods -n netbox-enterprise -o wide

# Describe specific pod
kubectl describe pod -n netbox-enterprise <pod-name>

# Check pod events
kubectl get events -n netbox-enterprise --sort-by='.lastTimestamp'

View Pod Logs

# Application logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise --tail=100 -f

# Worker logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-worker --tail=100 -f

# Previous container logs (if restarted)
kubectl logs -n netbox-enterprise <pod-name> --previous

Execute Commands in Pods

# Open shell in pod
kubectl exec -it -n netbox-enterprise deployment/netbox-enterprise -- /bin/bash

# Run Django management commands
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py showmigrations

# Check Django settings
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py shell -c "from django.conf import settings; print(settings.DATABASES['default'])"

Database Connectivity

Test PostgreSQL Connection

# From NetBox pod
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -c "SELECT version();"

# Using psql directly
kubectl run -it --rm psql-test --image=netboxlabs/nbe-utils:5 --restart=Never -- \
psql "postgresql://username:password@host:5432/netbox" -c "SELECT 1;"

Test Redis Connection

# From NetBox pod
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python -c "import redis; r=redis.from_url('redis://:password@host:6379/0'); print('Redis ping:', r.ping())"

# Using redis-cli
kubectl run -it --rm redis-test --image=redis:7 --restart=Never -- \
redis-cli -h host -a password ping

Check Database Migrations

# Show migration status
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py showmigrations

# Run pending migrations (if needed)
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py migrate

Performance Issues

High Memory Usage

Symptoms:

  • Pods being OOMKilled
  • Slow response times
  • Memory alerts

Diagnosis:

# Check resource usage
kubectl top pods -n netbox-enterprise

# Check container limits
kubectl describe deployment -n netbox-enterprise netbox-enterprise | grep -A5 "Limits:"

# Check application memory usage
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
ps aux | grep python

Solutions:

  1. Increase memory limits in values file
  2. Scale horizontally by increasing replica count
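Memory limits are raised in the values file via a standard Kubernetes resources stanza. The exact key path for the NetBox container varies by chart version, so treat the following as a sketch and confirm the structure against your generated values file:

```yaml
# Sketch only -- confirm the key path in your generated values file
resources:
  requests:
    memory: "2Gi"
  limits:
    memory: "4Gi"
```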

Slow Database Queries

Diagnosis:

# Check slow queries
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -c "SELECT query, calls, mean_exec_time \
FROM pg_stat_statements \
ORDER BY mean_exec_time DESC LIMIT 10;"

# Check database connections
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -c "SELECT count(*) FROM pg_stat_activity;"

Solutions:

  1. Ensure database has adequate resources
  2. Check for missing indexes
  3. Review database connection pool settings
  4. Consider scaling database vertically
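Note that the slow-query check above relies on the `pg_stat_statements` extension. If that query errors, the extension is likely not enabled; it must be created in the database and also requires `shared_preload_libraries = 'pg_stat_statements'` in postgresql.conf plus a server restart:

```sql
-- Run against the netbox database; requires the extension to be
-- preloaded via shared_preload_libraries and a PostgreSQL restart
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```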

Getting Help

Collect Diagnostic Information

When issues persist, generate a support bundle to send to NetBox Labs support:

# Install the support-bundle kubectl plugin (one-time setup)
curl https://krew.sh/support-bundle | bash

# Generate support bundle
kubectl support-bundle --namespace netbox-enterprise

The support bundle automatically collects:

  • Cluster information and resources
  • NetBox deployment status
  • Application logs
  • Helm values
  • Pod descriptions and events
  • Network configurations
  • Storage information

Contact NetBox Labs Support

When contacting support:

  1. Include all diagnostic information
  2. Provide clear description of the issue
  3. List steps to reproduce
  4. Include any error messages
  5. Specify versions (Helm chart, Kubernetes, databases)

Additional Troubleshooting Resources

Private Registry Resources

For deployments in environments with restricted connectivity, the private-registry.sh helper script and the values template referenced above are available to help with private registry configuration.

The script automates generating private registry values, but manual configuration using the template is recommended for production deployments, as it offers better control and understanding of the resulting configuration.