Troubleshooting
This section covers common issues and their solutions when deploying NetBox Enterprise.
Common Issues
Database Connection Failures
Symptoms:
- Pods fail to start
- Errors about database connections in logs
- "could not connect to server" messages
Diagnosis:
# Test database connectivity from a pod
kubectl run -it --rm debug --image=netboxlabs/nbe-utils:5 --restart=Never -- \
psql postgresql://username:password@host:5432/netbox
# Check environment variables
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
env | grep -E "(DATABASE|REDIS)"
Common Solutions:
- Verify database credentials in values file
- Ensure database service is accessible from Kubernetes cluster
- Check security groups/firewall rules (a basic connectivity check is shown below)
- Verify database is accepting connections
- Ensure database user has proper permissions
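If these checks do not isolate the problem, a quick network-level test can confirm whether the database is reachable and whether the user can authenticate. This is a minimal sketch; substitute your own host, credentials, and database name:
# Test raw TCP connectivity to the database host (no credentials required)
kubectl run -it --rm nettest --image=busybox --restart=Never -n netbox-enterprise -- \
  nc -zv host 5432
# Confirm the user can authenticate and connect to the netbox database
kubectl run -it --rm pgtest --image=netboxlabs/nbe-utils:5 --restart=Never -- \
  psql "postgresql://username:password@host:5432/netbox" \
  -c "SELECT current_user, has_database_privilege(current_user, 'netbox', 'CONNECT');"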
Redis Authentication Issues
Symptoms:
- "NOAUTH Authentication required" errors
- Worker pods unable to connect to Redis
- Cache connection failures
Solution:
- Verify Redis password matches in both Redis deployment and NetBox configuration
- Use standalone Redis instead of cluster mode
- Ensure Redis URL includes password:
redis://:password@host:6379/0
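To confirm that authentication, rather than networking, is the problem, you can ping Redis with the same URL NetBox uses. This is a minimal sketch, assuming a standalone Redis reachable at host:6379:
# A PONG reply means the password is accepted; NOAUTH or WRONGPASS means it is not
kubectl run -it --rm redis-auth-test --image=redis:7 --restart=Never -- \
  redis-cli -u "redis://:password@host:6379/0" ping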
Image Pull Errors
Symptoms:
- Pods stuck in ImagePullBackOff or ErrImagePull
- "unauthorized" or "not found" errors
Diagnosis:
# Check pod events
kubectl describe pod -n netbox-enterprise <pod-name>
# Verify secret exists
kubectl get secret -n netbox-enterprise registry-credentials
Solutions:
- Verify registry credentials are correct (see the secret check below)
- Ensure image pull secret is referenced in values file
- Check network connectivity to registry
- Verify image names and tags
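One way to confirm the pull secret contains the credentials you expect is to decode it and check which registry and username it targets. This is a minimal sketch, assuming the secret is named registry-credentials as in the diagnosis step above:
# Decode the pull secret and inspect the registry hostname and auth entry
kubectl get secret registry-credentials -n netbox-enterprise \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
# Confirm the deployment actually references the pull secret
kubectl get deployment netbox-enterprise -n netbox-enterprise \
  -o jsonpath='{.spec.template.spec.imagePullSecrets}'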
License Configuration Issues
Symptoms:
- "Invalid license" errors
- Enterprise features not available
Solution:
- Verify license ID is correctly set in values file
- Ensure the license ID matches in both licenseID and global.license.id (see the check below)
- Check license expiration date
- Contact NetBox Labs support for license issues
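A quick way to check both keys at once is to inspect the values that were actually deployed. This is a minimal sketch, assuming the release and values file names used elsewhere in this guide:
# Show the license-related values Helm deployed for the release
helm get values netbox-enterprise -n netbox-enterprise --all | grep -i license
# Compare with the values file you installed from
grep -in license netbox-enterprise-values.yaml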
Persistent Volume Issues
Symptoms:
- Pods stuck in Pending state
- PVC not bound
- "no persistent volumes available" errors
Diagnosis:
# Check PVC status
kubectl get pvc -n netbox-enterprise
# Check available storage classes
kubectl get storageclass
# Describe PVC for events
kubectl describe pvc -n netbox-enterprise <pvc-name>
Solutions:
- Set a default storage class (most common issue):
# Check current storage classes
kubectl get storageclass
# Set default storage class (adjust name for your environment)
# AWS EKS:
kubectl patch storageclass gp2 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
# Azure AKS:
kubectl patch storageclass managed-csi -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
# Google GKE:
kubectl patch storageclass standard -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
- Verify the storage class exists and has the required features
- Check that the cluster has available storage capacity
- Ensure the PVC size is within the storage class limits
- Verify nodes have the required CSI drivers installed (see the check below)
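To check the last point, list the CSI drivers registered in the cluster and the provisioner each storage class uses. This is a sketch; driver and pod names vary by environment:
# CSI drivers registered with the cluster
kubectl get csidrivers
# Provisioner used by each storage class
kubectl get storageclass -o custom-columns=NAME:.metadata.name,PROVISIONER:.provisioner
# CSI controller/node pods (names vary by driver)
kubectl get pods -n kube-system | grep -i csi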
IngressClass Conflicts
Symptoms:
- Installation fails with "IngressClass 'nginx' already exists"
- Error about missing Helm labels on IngressClass
- "invalid ownership metadata" errors
Diagnosis:
# Check existing IngressClass
kubectl get ingressclass
# Check if an ingress controller is already installed
kubectl get pods -A | grep -i nginx
Solutions:
Option 1: Use the existing ingress controller
# Disable the chart's built-in ingress controller
diode:
  ingress-nginx:
    enabled: false
Option 2: Disable IngressClass creation only
# Keep the chart's controller but don't create IngressClass
diode:
  ingress-nginx:
    controller:
      ingressClassResource:
        enabled: false
Option 3: Remove existing ingress controller
# If using Minikube
minikube addons disable ingress
# Then proceed with installation using chart's controller
Registry Authentication Issues
Problem: Helm registry authentication failures
# Re-authenticate to correct registry
helm registry login registry.netboxlabs.com \
--username "$USERNAME" \
--password "$SERVICE_ACCOUNT_TOKEN"
Problem: Namespace creation failures
# Always use --create-namespace flag
helm install netbox-enterprise \
oci://registry.netboxlabs.com/netbox-enterprise/beta/netbox-enterprise \
--namespace netbox-enterprise \
--create-namespace \
--values netbox-enterprise-values.yaml
Problem: Values file missing license signatures
- Solution: Always start with the generated values file from the Enterprise Portal (see the check below)
- Never: Create values files from scratch
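One way to confirm a release was installed from the portal-generated file rather than a hand-written one is to diff the values Helm has stored against it. This is a minimal sketch, assuming the generated file is named netbox-enterprise-values.yaml as in the examples above:
# Export the values the running release was installed with
helm get values netbox-enterprise -n netbox-enterprise -o yaml > deployed-values.yaml
# Differences beyond key ordering suggest the release was not installed from the generated file
diff deployed-values.yaml netbox-enterprise-values.yaml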
Diode Ingress Routing Issues
Symptoms:
- ✅ NetBox web interface loads normally at your domain
- ❌ Diode plugin shows connection errors in NetBox logs
- ❌ Diode agents can't connect to the reconciler service
- ❌ Error messages about "Diode API not reachable" in NetBox startup logs
Diagnosis:
- Check Diode Plugin Status:
kubectl logs -n netbox-enterprise deployment/netbox-enterprise | grep -i diode
Expected output when working:
ⓘ Enabling Diode plugin
ⓘ Using ingress host: yourdomain.com
✅ Diode plugin configured successfully
Problem indicators:
⚠️ No valid ingress host found, Diode API will not be reachable
⚠️ Failed to connect to Diode service
- Check Ingress Configuration:
kubectl get ingress -n netbox-enterprise -o yaml | grep -A 10 -B 5 "paths:"
Problem: Missing /diode route - you only see routes for / and /(.*) going to netbox-enterprise:
paths:
- path: /
  pathType: Prefix
  backend:
    service:
      name: netbox-enterprise
      port:
        number: 80
- path: /(.*)
  pathType: ImplementationSpecific
  backend:
    service:
      name: netbox-enterprise
      port:
        number: 80
- Test Diode gRPC Endpoint:
# This should fail with broken routing
curl -v https://yourdomain.com/diode
Problem response: Returns NetBox HTML instead of a gRPC response
Solutions:
Option 1: Add the Missing /diode Route
Create a values override file to fix the ingress routing:
# diode-routing-fix.yaml
diode:
  enabled: true
  ingressNginx:
    extraHttpPaths:
      # ADD THIS FIRST - routes /diode to actual Diode service
      - path: /diode
        pathType: Prefix
        serviceName: netbox-enterprise-diode-reconciler
        servicePort: 8081
      # Keep existing NetBox routes
      - path: /
        pathType: Prefix
        serviceName: netbox-enterprise
        servicePort: 80
      - path: /(.*)
        pathType: ImplementationSpecific
        serviceName: netbox-enterprise
        servicePort: 80
    # Ensure gRPC is properly configured
    grpcAnnotations:
      nginx.ingress.kubernetes.io/ssl-redirect: "false"
      nginx.ingress.kubernetes.io/proxy-body-size: "25m"
      nginx.ingress.kubernetes.io/grpc-backend: "true"
    httpAnnotations:
      nginx.ingress.kubernetes.io/ssl-redirect: "false"
Apply the fix:
helm upgrade netbox-enterprise \
oci://registry.netboxlabs.com/netbox-enterprise/beta/netbox-enterprise \
--namespace netbox-enterprise \
--values your-existing-values.yaml \
--values diode-routing-fix.yaml
Option 2: Use Existing Ingress Controller
If you have an existing ingress controller and want to use it instead:
# existing-ingress.yaml
diode:
  enabled: true
  # Disable diode's built-in ingress creation
  ingress-nginx:
    enabled: false
  # Configure to use your existing ingress
  ingressNginx:
    extraHttpPaths:
      - path: /diode
        pathType: Prefix
        serviceName: netbox-enterprise-diode-reconciler
        servicePort: 8081
      - path: /
        pathType: Prefix
        serviceName: netbox-enterprise
        servicePort: 80
Option 3: Manual Ingress Resource
For complex ingress setups or when you need full control:
# manual-diode-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: netbox-enterprise-diode
  namespace: netbox-enterprise
  annotations:
    nginx.ingress.kubernetes.io/grpc-backend: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  rules:
    - host: yourdomain.com
      http:
        paths:
          - path: /diode
            pathType: Prefix
            backend:
              service:
                name: netbox-enterprise-diode-reconciler
                port:
                  number: 8081
kubectl apply -f manual-diode-ingress.yaml
Option 4: Temporary Disable
If you need NetBox working immediately while investigating:
# disable-diode.yaml
diode:
  enabled: false
Verification:
- Check Updated Ingress:
kubectl get ingress -n netbox-enterprise -o yaml | grep -A 15 "paths:"
Expected output:
paths:
- path: /diode          # ← NEW ROUTE ADDED
  pathType: Prefix
  backend:
    service:
      name: netbox-enterprise-diode-reconciler
      port:
        number: 8081
- path: /
  pathType: Prefix
  backend:
    service:
      name: netbox-enterprise
      port:
        number: 80
- Check NetBox Logs:
kubectl logs -n netbox-enterprise deployment/netbox-enterprise -f | grep -i diode
Expected success output:
ⓘ Enabling Diode plugin
ⓘ Using ingress host: yourdomain.com
✅ Inferred ingress host from ingress controller
✅ Diode plugin configured successfully
- Test gRPC Connectivity:
# Should now get gRPC response instead of HTML
curl -H "Content-Type: application/grpc" https://yourdomain.com/diode
Important Notes:
- Route Order Matters: The /diode route must come BEFORE the catch-all / route
- Complex Configurations: In environments with multiple ingress controllers, custom ingress classes, or complex routing rules, the automatic route insertion may not work as expected. You may need to manually configure ingress resources or adjust the order of path rules
- Service Verification: Ensure the netbox-enterprise-diode-reconciler service exists:
kubectl get svc -n netbox-enterprise | grep diode-reconciler
- Port Numbers: The Diode reconciler uses port 8081, NetBox uses port 80
- gRPC Annotations: Required for proper gRPC routing through NGINX ingress controllers
Additional Diode Troubleshooting:
If issues persist, check all Diode components:
# Check all Diode services are running
kubectl get pods -n netbox-enterprise | grep diode
# Collect logs from all Diode components
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-diode-reconciler
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-diode-ingester
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-diode-auth
Deployment Issues in Environments with Restricted Connectivity
Symptom: Images not found in private registry
Solutions:
- Verify all images are mirrored:
# Check if images exist in your registry
docker pull mycompany.jfrog.io/nbe/netbox-enterprise/nbe-core:4.2.9_main-90
- Validate the private registry configuration:
# Check the generated private registry values
cat my-private-registry.yaml
# Ensure all image references are updated
grep -r "MY_REGISTRY" my-private-registry.yaml  # Should return no results
- Use the private registry helper script:
# Download the helper script
curl -O https://netboxlabs.com/docs/files/private-registry.sh
chmod +x private-registry.sh
# Generate configuration
./private-registry.sh mycompany.jfrog.io/nbe > generated-private-registry.yaml
# Compare with manual configuration
diff my-private-registry.yaml generated-private-registry.yaml
Note: The script is provided as a troubleshooting aid. The recommended approach is manual configuration using the template file for better control and understanding.
Usage:
# Script generates values for all image references
./private-registry.sh your-registry.com/nbe > private-registry-values.yaml
# Then use in installation
helm install netbox-enterprise ./netbox-enterprise-1.11.4.tgz \
--values netbox-enterprise-values.yaml \
--values private-registry-values.yaml \
--namespace netbox-enterprise
Pod Troubleshooting
Check Pod Status
# List all pods with wide output
kubectl get pods -n netbox-enterprise -o wide
# Describe specific pod
kubectl describe pod -n netbox-enterprise <pod-name>
# Check pod events
kubectl get events -n netbox-enterprise --sort-by='.lastTimestamp'
View Pod Logs
# Application logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise --tail=100 -f
# Worker logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-worker --tail=100 -f
# Previous container logs (if restarted)
kubectl logs -n netbox-enterprise <pod-name> --previous
Execute Commands in Pods
# Open shell in pod
kubectl exec -it -n netbox-enterprise deployment/netbox-enterprise -- /bin/bash
# Run Django management commands
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py showmigrations
# Check Django settings
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py shell -c "from django.conf import settings; print(settings.DATABASES['default'])"
Database Connectivity
Test PostgreSQL Connection
# From NetBox pod
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -c "SELECT version();"
# Using psql directly
kubectl run -it --rm psql-test --image=netboxlabs/nbe-utils:5 --restart=Never -- \
psql "postgresql://username:password@host:5432/netbox" -c "SELECT 1;"
Test Redis Connection
# From NetBox pod
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python -c "import redis; r=redis.from_url('redis://:password@host:6379/0'); print('Redis ping:', r.ping())"
# Using redis-cli
kubectl run -it --rm redis-test --image=redis:7 --restart=Never -- \
redis-cli -h host -a password ping
Check Database Migrations
# Show migration status
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py showmigrations
# Run pending migrations (if needed)
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py migrate
Performance Issues
High Memory Usage
Symptoms:
- Pods being OOMKilled
- Slow response times
- Memory alerts
Diagnosis:
# Check resource usage
kubectl top pods -n netbox-enterprise
# Check container limits
kubectl describe deployment -n netbox-enterprise netbox-enterprise | grep -A5 "Limits:"
# Check application memory usage
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
ps aux | grep python
Solutions:
- Increase memory limits in values file
- Scale horizontally by increasing replica count
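Both changes are normally made in a values override and applied with helm upgrade. The key names below are hypothetical placeholders for illustration only; check your generated values file for the chart's actual schema before applying anything:
# memory-tuning.yaml -- key names are hypothetical; adapt to your chart's values schema
cat > memory-tuning.yaml <<'EOF'
netbox:
  replicaCount: 3
  resources:
    limits:
      memory: 4Gi
EOF
helm upgrade netbox-enterprise \
  oci://registry.netboxlabs.com/netbox-enterprise/beta/netbox-enterprise \
  --namespace netbox-enterprise \
  --values netbox-enterprise-values.yaml \
  --values memory-tuning.yaml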
Slow Database Queries
Diagnosis:
# Check slow queries
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -c "SELECT query, calls, mean_exec_time \
FROM pg_stat_statements \
ORDER BY mean_exec_time DESC LIMIT 10;"
# Check database connections
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -c "SELECT count(*) FROM pg_stat_activity;"
Solutions:
- Ensure database has adequate resources
- Check for missing indexes (see the query below)
- Review database connection pool settings
- Consider scaling database vertically
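To follow up on missing indexes, look for tables with heavy sequential-scan activity and check how close the server is to its connection limit. This is a sketch using standard PostgreSQL statistics views:
# Tables with many sequential scans relative to index scans are candidates for new indexes
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
  python manage.py dbshell -c "SELECT relname, seq_scan, idx_scan FROM pg_stat_user_tables ORDER BY seq_scan DESC LIMIT 10;"
# Compare current connection count to the configured maximum
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
  python manage.py dbshell -c "SELECT count(*) AS connections, current_setting('max_connections') AS max_connections FROM pg_stat_activity;"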
Getting Help
Collect Diagnostic Information
When issues persist, generate a support bundle to send to NetBox Labs support:
# Install the support-bundle kubectl plugin (one-time setup)
curl https://krew.sh/support-bundle | bash
# Generate support bundle
kubectl support-bundle --namespace netbox-enterprise
The support bundle automatically collects:
- Cluster information and resources
- NetBox deployment status
- Application logs
- Helm values
- Pod descriptions and events
- Network configurations
- Storage information
Contact NetBox Labs Support
When contacting support:
- Include all diagnostic information
- Provide clear description of the issue
- List steps to reproduce
- Include any error messages
- Specify versions (Helm chart, Kubernetes, databases)
Additional Troubleshooting Resources
- NetBox Documentation: https://netboxlabs.com/docs/
- Kubernetes Documentation: https://kubernetes.io/docs/
- PostgreSQL Documentation: https://www.postgresql.org/docs/
- Redis Documentation: https://redis.io/documentation
Private Registry Resources
For deployments in environments with restricted connectivity, these resources are available to help with private registry configuration:
- Private Registry Template: https://netboxlabs.com/docs/files/private-registry.yaml
- Registry Configuration Script: https://netboxlabs.com/docs/files/private-registry.sh
- Values Template: https://netboxlabs.com/docs/files/values-extra.yaml
The script automates the process of generating private registry configurations, but manual configuration using the template is recommended for production deployments.