Troubleshooting

This section covers common issues and their solutions when deploying NetBox Enterprise.

Common Issues

Database Connection Failures

Symptoms:

  • Pods fail to start
  • Errors about database connections in logs
  • "could not connect to server" messages

Diagnosis:

# Test database connectivity from a pod
kubectl run -it --rm debug --image=netboxlabs/nbe-utils:5 --restart=Never -- \
psql postgresql://username:password@host:5432/netbox

# Check environment variables
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
env | grep -E "(DATABASE|REDIS)"

Common Solutions:

  1. Verify database credentials in values file
  2. Ensure database service is accessible from Kubernetes cluster
  3. Check security groups/firewall rules
  4. Verify database is accepting connections
  5. Ensure database user has proper permissions
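A frequent root cause for step 1 is a malformed connection URL. The URL passed to psql in the diagnosis step follows the standard PostgreSQL form, which you can sanity-check locally before touching the cluster (all values below are placeholders):

```shell
# Assemble the connection URL from its parts -- placeholders only
DB_USER="netbox"
DB_PASS="s3cret"
DB_HOST="db.example.com"
DB_NAME="netbox"
DB_URL="postgresql://${DB_USER}:${DB_PASS}@${DB_HOST}:5432/${DB_NAME}"
echo "${DB_URL}"
```

Note that if the password contains reserved characters such as `@` or `:`, they must be percent-encoded before being embedded in the URL.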

Redis Authentication Issues

Symptoms:

  • "NOAUTH Authentication required" errors
  • Worker pods unable to connect to Redis
  • Cache connection failures

Solution:

  1. Verify Redis password matches in both Redis deployment and NetBox configuration
  2. Use standalone Redis instead of cluster mode
  3. Ensure Redis URL includes password: redis://:password@host:6379/0
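The empty username before the `:` in the Redis URL is easy to get wrong. A quick local check that the URL is well-formed (password and host below are placeholders):

```shell
REDIS_PASSWORD="s3cret"
# The empty username before ':' is required -- redis://:password@host
REDIS_URL="redis://:${REDIS_PASSWORD}@redis.example.com:6379/0"
# Extract the password back out to confirm the URL parses as expected
echo "${REDIS_URL}" | sed -E 's|redis://:([^@]+)@.*|\1|'
```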

Image Pull Errors

Symptoms:

  • Pods stuck in ImagePullBackOff or ErrImagePull
  • "unauthorized" or "not found" errors

Diagnosis:

# Check pod events
kubectl describe pod -n netbox-enterprise <pod-name>

# Verify secret exists
kubectl get secret -n netbox-enterprise registry-credentials

Solutions:

  1. Verify registry credentials are correct
  2. Ensure image pull secret is referenced in values file
  3. Check network connectivity to registry
  4. Verify image names and tags
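If the `registry-credentials` secret itself is suspect, it helps to know what is inside it: a docker-registry pull secret is a `.dockerconfigjson` document whose `auth` field is simply base64 of `username:password`. You can build and inspect that payload locally before recreating the secret (credentials below are placeholders):

```shell
# Build the auth entry used inside a .dockerconfigjson pull secret
REGISTRY="registry.netboxlabs.com"
AUTH="$(printf '%s:%s' "myuser" "mytoken" | base64)"
printf '{"auths":{"%s":{"auth":"%s"}}}\n' "${REGISTRY}" "${AUTH}"
```

In practice, `kubectl create secret docker-registry registry-credentials --docker-server=... --docker-username=... --docker-password=...` assembles this same document for you.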

License Configuration Issues

Symptoms:

  • "Invalid license" errors
  • Enterprise features not available

Solution:

  1. Verify license ID is correctly set in values file
  2. Ensure license ID matches in both licenseID and global.license.id
  3. Check license expiration date
  4. Contact NetBox Labs support for license issues
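The two keys from step 2 typically appear in the values file as follows (the ID below is a placeholder; use the value from your generated values file, and make sure both keys carry the same value):

```yaml
licenseID: "YOUR-LICENSE-ID"
global:
  license:
    id: "YOUR-LICENSE-ID"
```

A mismatch between the two is a common cause of "Invalid license" errors.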

Persistent Volume Issues

Symptoms:

  • Pods stuck in Pending state
  • PVC not bound
  • "no persistent volumes available" errors

Diagnosis:

# Check PVC status
kubectl get pvc -n netbox-enterprise

# Check available storage classes
kubectl get storageclass

# Describe PVC for events
kubectl describe pvc -n netbox-enterprise <pvc-name>

Solutions:

  1. Set a default storage class (most common issue):

    # Check current storage classes
    kubectl get storageclass

    # Set default storage class (adjust name for your environment)
    # AWS EKS:
    kubectl patch storageclass gp2 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

    # Azure AKS:
    kubectl patch storageclass managed-csi -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

    # Google GKE:
    kubectl patch storageclass standard -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
  2. Verify storage class exists and has required features

  3. Check if cluster has available storage capacity

  4. Ensure PVC size is within storage class limits

  5. Verify node has required CSI drivers installed

IngressClass Conflicts

Symptoms:

  • Installation fails with "IngressClass 'nginx' already exists"
  • Error about missing Helm labels on IngressClass
  • "invalid ownership metadata" errors

Diagnosis:

# Check existing IngressClass
kubectl get ingressclass

# Check if an ingress controller is already installed
kubectl get pods -A | grep -i nginx

Solutions:

Option 1: Use the existing ingress controller

# Disable the chart's built-in ingress controller
diode:
  ingress-nginx:
    enabled: false

Option 2: Disable IngressClass creation only

# Keep the chart's controller but don't create IngressClass
diode:
  ingress-nginx:
    controller:
      ingressClassResource:
        enabled: false

Option 3: Remove existing ingress controller

# If using Minikube
minikube addons disable ingress

# Then proceed with installation using chart's controller

Registry Authentication Issues

Problem: Helm registry authentication failures

# Re-authenticate to correct registry
helm registry login registry.netboxlabs.com \
--username "$USERNAME" \
--password "$SERVICE_ACCOUNT_TOKEN"

Problem: Namespace creation failures

# Always use --create-namespace flag
helm install netbox-enterprise \
oci://registry.netboxlabs.com/netbox-enterprise/beta/netbox-enterprise \
--namespace netbox-enterprise \
--create-namespace \
--values netbox-enterprise-values.yaml

Problem: Values file missing license signatures

  • Solution: Always start with the generated values file from Enterprise Portal
  • Never: Create values files from scratch

Diode Ingress Routing Issues

Symptoms:

  • ✅ NetBox web interface loads normally at your domain
  • ❌ Diode plugin shows connection errors in NetBox logs
  • ❌ Diode agents can't connect to the reconciler service
  • ❌ Error messages about "Diode API not reachable" in NetBox startup logs

Diagnosis:

  1. Check Diode Plugin Status:

    kubectl logs -n netbox-enterprise deployment/netbox-enterprise | grep -i diode

    Expected output when working:

    ⓘ Enabling Diode plugin
    ⓘ Using ingress host: yourdomain.com
    ✅ Diode plugin configured successfully

    Problem indicators:

    ⚠️ No valid ingress host found, Diode API will not be reachable
    ⚠️ Failed to connect to Diode service
  2. Check Ingress Configuration:

    kubectl get ingress -n netbox-enterprise -o yaml | grep -A 10 -B 5 "paths:"

    Problem: Missing /diode route - you only see routes for / and /(.*) going to netbox-enterprise:

    paths:
    - path: /
      pathType: Prefix
      backend:
        service:
          name: netbox-enterprise
          port:
            number: 80
    - path: /(.*)
      pathType: ImplementationSpecific
      backend:
        service:
          name: netbox-enterprise
          port:
            number: 80
  3. Test Diode gRPC Endpoint:

    # This should fail with broken routing
    curl -v https://yourdomain.com/diode

    Problem response: Returns NetBox HTML instead of gRPC response

Solutions:

Option 1: Add Missing /diode Route

Create a values override file to fix the ingress routing:

# diode-routing-fix.yaml
diode:
  enabled: true
  ingressNginx:
    extraHttpPaths:
      # ADD THIS FIRST - routes /diode to actual Diode service
      - path: /diode
        pathType: Prefix
        serviceName: netbox-enterprise-diode-reconciler
        servicePort: 8081
      # Keep existing NetBox routes
      - path: /
        pathType: Prefix
        serviceName: netbox-enterprise
        servicePort: 80
      - path: /(.*)
        pathType: ImplementationSpecific
        serviceName: netbox-enterprise
        servicePort: 80
    # Ensure gRPC is properly configured
    grpcAnnotations:
      nginx.ingress.kubernetes.io/ssl-redirect: "false"
      nginx.ingress.kubernetes.io/proxy-body-size: "25m"
      nginx.ingress.kubernetes.io/grpc-backend: "true"
    httpAnnotations:
      nginx.ingress.kubernetes.io/ssl-redirect: "false"

Apply the fix:

helm upgrade netbox-enterprise \
oci://registry.netboxlabs.com/netbox-enterprise/beta/netbox-enterprise \
--namespace netbox-enterprise \
--values your-existing-values.yaml \
--values diode-routing-fix.yaml

Option 2: Use Existing Ingress Controller

If you have an existing ingress controller and want to use it instead:

# existing-ingress.yaml
diode:
  enabled: true
  # Disable diode's built-in ingress creation
  ingress-nginx:
    enabled: false
  # Configure to use your existing ingress
  ingressNginx:
    extraHttpPaths:
      - path: /diode
        pathType: Prefix
        serviceName: netbox-enterprise-diode-reconciler
        servicePort: 8081
      - path: /
        pathType: Prefix
        serviceName: netbox-enterprise
        servicePort: 80

Option 3: Manual Ingress Resource

For complex ingress setups or when you need full control:

# manual-diode-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: netbox-enterprise-diode
  namespace: netbox-enterprise
  annotations:
    nginx.ingress.kubernetes.io/grpc-backend: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
    - host: yourdomain.com
      http:
        paths:
          - path: /diode
            pathType: Prefix
            backend:
              service:
                name: netbox-enterprise-diode-reconciler
                port:
                  number: 8081

Apply the manifest:

kubectl apply -f manual-diode-ingress.yaml

Option 4: Temporary Disable

If you need NetBox working immediately while investigating:

# disable-diode.yaml
diode:
  enabled: false

Verification:

  1. Check Updated Ingress:

    kubectl get ingress -n netbox-enterprise -o yaml | grep -A 15 "paths:"

    Expected output:

    paths:
    - path: /diode # ← NEW ROUTE ADDED
      pathType: Prefix
      backend:
        service:
          name: netbox-enterprise-diode-reconciler
          port:
            number: 8081
    - path: /
      pathType: Prefix
      backend:
        service:
          name: netbox-enterprise
          port:
            number: 80
  2. Check NetBox Logs:

    kubectl logs -n netbox-enterprise deployment/netbox-enterprise -f | grep -i diode

    Expected success output:

    ⓘ Enabling Diode plugin
    ⓘ Using ingress host: yourdomain.com
    ✅ Inferred ingress host from ingress controller
    ✅ Diode plugin configured successfully
  3. Test gRPC Connectivity:

    # Should now get gRPC response instead of HTML
    curl -H "Content-Type: application/grpc" https://yourdomain.com/diode

Important Notes:

  • Route Order Matters: The /diode route must come BEFORE the catch-all / route
  • Complex Configurations: In environments with multiple ingress controllers, custom ingress classes, or complex routing rules, the automatic route insertion may not work as expected. You may need to manually configure ingress resources or adjust the order of path rules
  • Service Verification: Ensure the netbox-enterprise-diode-reconciler service exists:
    kubectl get svc -n netbox-enterprise | grep diode-reconciler
  • Port Numbers: Diode reconciler uses port 8081, NetBox uses port 80
  • gRPC Annotations: Required for proper gRPC routing through NGINX ingress controllers

Additional Diode Troubleshooting:

If issues persist, check all Diode components:

# Check all Diode services are running
kubectl get pods -n netbox-enterprise | grep diode

# Collect logs from all Diode components
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-diode-reconciler
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-diode-ingester
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-diode-auth

Deployment Issues in Environments with Restricted Connectivity

Symptom: Images not found in private registry

Solutions:

  1. Verify all images are mirrored:

    # Check if images exist in your registry
    docker pull mycompany.jfrog.io/nbe/netbox-enterprise/nbe-core:4.2.9_main-90
  2. Validate private registry configuration:

    # Check the generated private registry values
    cat my-private-registry.yaml

    # Ensure all image references are updated
    grep -r "MY_REGISTRY" my-private-registry.yaml # Should return no results
  3. Use the private registry script helper:

    # Download the helper script
    curl -O https://netboxlabs.com/docs/files/private-registry.sh
    chmod +x private-registry.sh

    # Generate configuration
    ./private-registry.sh mycompany.jfrog.io/nbe > generated-private-registry.yaml

    # Compare with manual configuration
    diff my-private-registry.yaml generated-private-registry.yaml

    Note: The script is provided as a troubleshooting aid. The recommended approach is manual configuration using the template file for better control and understanding.

    Usage:

    # Script generates values for all image references
    ./private-registry.sh your-registry.com/nbe > private-registry-values.yaml

    # Then use in installation
    helm install netbox-enterprise ./netbox-enterprise-1.11.4.tgz \
    --values netbox-enterprise-values.yaml \
    --values private-registry-values.yaml \
    --namespace netbox-enterprise

Pod Troubleshooting

Check Pod Status

# List all pods with wide output
kubectl get pods -n netbox-enterprise -o wide

# Describe specific pod
kubectl describe pod -n netbox-enterprise <pod-name>

# Check pod events
kubectl get events -n netbox-enterprise --sort-by='.lastTimestamp'

View Pod Logs

# Application logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise --tail=100 -f

# Worker logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-worker --tail=100 -f

# Previous container logs (if restarted)
kubectl logs -n netbox-enterprise <pod-name> --previous

Execute Commands in Pods

# Open shell in pod
kubectl exec -it -n netbox-enterprise deployment/netbox-enterprise -- /bin/bash

# Run Django management commands
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py showmigrations

# Check Django settings
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py shell -c "from django.conf import settings; print(settings.DATABASES['default'])"

Database Connectivity

Test PostgreSQL Connection

# From NetBox pod
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -c "SELECT version();"

# Using psql directly
kubectl run -it --rm psql-test --image=netboxlabs/nbe-utils:5 --restart=Never -- \
psql "postgresql://username:password@host:5432/netbox" -c "SELECT 1;"

Test Redis Connection

# From NetBox pod
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python -c "import redis; r=redis.from_url('redis://:password@host:6379/0'); print('Redis ping:', r.ping())"

# Using redis-cli
kubectl run -it --rm redis-test --image=redis:7 --restart=Never -- \
redis-cli -h host -a password ping

Check Database Migrations

# Show migration status
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py showmigrations

# Run pending migrations (if needed)
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py migrate

Performance Issues

High Memory Usage

Symptoms:

  • Pods being OOMKilled
  • Slow response times
  • Memory alerts

Diagnosis:

# Check resource usage
kubectl top pods -n netbox-enterprise

# Check container limits
kubectl describe deployment -n netbox-enterprise netbox-enterprise | grep -A5 "Limits:"

# Check application memory usage
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
ps aux | grep python

Solutions:

  1. Increase memory limits in values file
  2. Scale horizontally by increasing replica count
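Memory limits are raised in the values file via a standard Kubernetes resources stanza. The exact key path for the NetBox container varies by chart version, so treat the following as a sketch and confirm the structure against your generated values file:

```yaml
# Sketch only -- confirm the key path in your generated values file
resources:
  requests:
    memory: "2Gi"
  limits:
    memory: "4Gi"
```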

Slow Database Queries

Diagnosis:

# Check slow queries
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -c "SELECT query, calls, mean_exec_time \
FROM pg_stat_statements \
ORDER BY mean_exec_time DESC LIMIT 10;"

# Check database connections
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -c "SELECT count(*) FROM pg_stat_activity;"

Solutions:

  1. Ensure database has adequate resources
  2. Check for missing indexes
  3. Review database connection pool settings
  4. Consider scaling database vertically
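Note that the slow-query check above relies on the `pg_stat_statements` extension. If that query errors, the extension is likely not enabled; it must be created in the database and also requires `shared_preload_libraries = 'pg_stat_statements'` in postgresql.conf plus a server restart:

```sql
-- Run against the netbox database; requires the extension to be
-- preloaded via shared_preload_libraries and a PostgreSQL restart
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```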

Getting Help

Collect Diagnostic Information

When issues persist, generate a support bundle to send to NetBox Labs support:

# Install the support-bundle kubectl plugin (one-time setup)
curl https://krew.sh/support-bundle | bash

# Generate support bundle
kubectl support-bundle --namespace netbox-enterprise

The support bundle automatically collects:

  • Cluster information and resources
  • NetBox deployment status
  • Application logs
  • Helm values
  • Pod descriptions and events
  • Network configurations
  • Storage information

Contact NetBox Labs Support

When contacting support:

  1. Include all diagnostic information
  2. Provide clear description of the issue
  3. List steps to reproduce
  4. Include any error messages
  5. Specify versions (Helm chart, Kubernetes, databases)

Additional Troubleshooting Resources

Private Registry Resources

For deployments in environments with restricted connectivity, the private-registry.sh helper script and the values template referenced above are available to help with private registry configuration.

The script automates generating private registry values, but manual configuration using the template is recommended for production deployments, as it offers better control and understanding of the resulting configuration.