Troubleshooting
Beta Notice: These Helm charts are currently in beta. While stable for testing and development environments, please thoroughly test in your specific environment before production deployment. For the most up-to-date information, please refer to the main documentation.
Table of Contents
- Diagnostic Commands - Essential debugging tools
- Installation Issues - Deployment problems
- Authentication & Registry Issues - Access problems
- Database Connection Issues - Database problems
- Network & Ingress Issues - Connectivity problems
- Performance Issues - Resource and scaling problems
- Configuration Issues - Settings and validation
- Support Bundle Generation - Gathering diagnostic info
- Getting Help - When to contact support
This guide covers common issues and their solutions for NetBox Enterprise Helm deployments.
Diagnostic Commands
Essential Debugging Commands
# Check pod status and events
kubectl get pods -n netbox-enterprise -o wide
kubectl describe pod <pod-name> -n netbox-enterprise
# Check deployment status
kubectl get deployments -n netbox-enterprise
kubectl rollout status deployment/netbox-enterprise -n netbox-enterprise
# Check services and endpoints
kubectl get svc -n netbox-enterprise
kubectl get endpoints -n netbox-enterprise
# Check ingress configuration
kubectl get ingress -n netbox-enterprise
kubectl describe ingress -n netbox-enterprise
# Check persistent volumes
kubectl get pv,pvc -n netbox-enterprise
# Check events for errors
kubectl get events -n netbox-enterprise --sort-by='.lastTimestamp'
# Check resource usage
kubectl top pods -n netbox-enterprise
kubectl top nodes
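As a rough sketch of how the pod status check above might be narrowed to problem pods only, the helper below filters the default `kubectl get pods` table. The function name `unhealthy_pods` is hypothetical, and it assumes STATUS is the third column, as in the default and `-o wide` output.

```shell
# Hypothetical helper: list pods whose STATUS is neither Running nor Completed.
# Reads the table printed by `kubectl get pods` on stdin.
unhealthy_pods() {
  # NR > 1 skips the header row; column 3 is assumed to be STATUS.
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage against a live cluster (not executed here):
#   kubectl get pods -n netbox-enterprise | unhealthy_pods
```

An empty result suggests every pod is Running or Completed; any line printed names a pod worth passing to `kubectl describe pod`.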
Log Analysis Commands
# View application logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise --tail=100 -f
# View worker logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-worker --tail=100 -f
# View previous container logs (if pod crashed)
kubectl logs -n netbox-enterprise deployment/netbox-enterprise --previous
# View logs from all containers in a pod
kubectl logs -n netbox-enterprise <pod-name> --all-containers=true
# Search logs for specific errors
kubectl logs -n netbox-enterprise deployment/netbox-enterprise | grep -i error
kubectl logs -n netbox-enterprise deployment/netbox-enterprise | grep -i "database"
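When the grep searches above return too many lines to read, a quick tally per severity can help decide where to look first. This is a minimal sketch: the function name `count_log_levels` is hypothetical, and it assumes the logs contain the uppercase keywords CRITICAL, ERROR, and WARNING, which may not match the actual log format.

```shell
# Hypothetical helper: count log lines per severity keyword read from stdin.
count_log_levels() {
  # -o prints only the matched keyword, so each match becomes one line to tally.
  grep -oE 'CRITICAL|ERROR|WARNING' | sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Usage against a live cluster (not executed here):
#   kubectl logs -n netbox-enterprise deployment/netbox-enterprise --tail=1000 | count_log_levels
```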
Database Connection Testing
# Test database connectivity from NetBox pod
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -- -c "SELECT 1;"
# Check database configuration
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py shell -c "from django.conf import settings; print(settings.DATABASES)"
# Test external database connection
kubectl run -it --rm debug --image=postgres:14 --restart=Never -- \
psql postgresql://user:pass@host:5432/dbname
Network Connectivity Testing
# Test internal service connectivity
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
curl -I http://netbox-enterprise:80
# Test external connectivity
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
curl -I https://google.com
# Check DNS resolution
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
nslookup netbox-enterprise.netbox-enterprise.svc.cluster.local
Installation Issues
Helm Installation Failures
Problem: Helm installation fails with timeout or resource errors
Diagnostic Steps:
# Check Helm installation status
helm status netbox-enterprise -n netbox-enterprise
# Check for failed resources
kubectl get all -n netbox-enterprise
kubectl get events -n netbox-enterprise --sort-by='.lastTimestamp'
# Check resource constraints
kubectl describe nodes | grep -A5 "Allocated resources"
Solutions:
- Increase timeout: Add --timeout 15m to the helm install command
- Check resource availability: Ensure the cluster has sufficient CPU and memory
- Verify storage class: Confirm storage class exists and is available
- Check image pull: Verify registry authentication and image availability
Pod Startup Failures
Problem: Pods fail to start or remain in pending/error state
Diagnostic Steps:
# Check pod status and events
kubectl describe pod <pod-name> -n netbox-enterprise
# Check for image pull errors
kubectl get events -n netbox-enterprise | grep -i "image"
# Check resource constraints
kubectl describe node <node-name> | grep -A10 "Allocated resources"
Common Solutions:
- ImagePullBackOff:
  # Verify registry authentication
  kubectl get secret -n netbox-enterprise | grep regcred
  # Test image pull manually
  docker pull proxy.enterprise.netboxlabs.com/netbox-enterprise/nbe-core:latest
- Resource constraints:
  # Check resource requests/limits
  kubectl get pods -n netbox-enterprise -o yaml | grep -A10 resources
  # Adjust resource requirements in values file
- Storage issues:
  # Check PVC status
  kubectl get pvc -n netbox-enterprise
  # Verify storage class
  kubectl get storageclass
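The common solutions above can be summarized as a lookup from pod status to the next check. The sketch below is illustrative only: the function name `suggest_next_step` and the wording of each suggestion are assumptions, and the status strings are the ones Kubernetes commonly reports.

```shell
# Hypothetical helper: map a pod status to the next troubleshooting step.
suggest_next_step() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "check the regcred secret and registry credentials" ;;
    Pending)
      echo "check node resources, PVC status, and the storage class" ;;
    CrashLoopBackOff|Error)
      echo "inspect container logs with --previous" ;;
    *)
      echo "run kubectl describe pod and review events" ;;
  esac
}

# Example: suggest_next_step ImagePullBackOff
```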
Database Migration Failures
Problem: Database migrations fail during startup
Diagnostic Steps:
# Check migration logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise | grep -i migration
# Check database connectivity
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -- -c "SELECT 1;"
Solutions:
- Manual migration:
  kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
    python manage.py migrate
- Database permissions:
  # Verify database user has required permissions
  kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
    python manage.py dbshell -- -c "SELECT current_user, session_user;"
Authentication & Registry Issues
Registry Authentication Issues
Problem: Cannot pull images from NetBox Enterprise registry
Diagnostic Steps:
# Test registry authentication
docker login registry.enterprise.netboxlabs.com -u $USERNAME -p $LICENSE_ID
# Check image pull secrets
kubectl get secrets -n netbox-enterprise | grep regcred
# Test image pull manually
kubectl run test-pull --image=registry.enterprise.netboxlabs.com/netbox-enterprise/netbox:latest \
--rm -it --restart=Never -- /bin/sh
Solutions:
- Verify credentials:
  # Ensure LICENSE_ID and USERNAME are correct
  echo "Username: $USERNAME"
  echo "License ID: $LICENSE_ID"
- Recreate registry secret:
  # Delete existing secret
  kubectl delete secret regcred -n netbox-enterprise
  # Create new secret
  kubectl create secret docker-registry regcred \
    --docker-server=registry.enterprise.netboxlabs.com \
    --docker-username=$USERNAME \
    --docker-password=$LICENSE_ID \
    --namespace netbox-enterprise
- Update image pull policy:
  # Add to values file
  netbox:
    image:
      pullPolicy: Always
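After recreating the secret, it can be worth confirming that it actually references the expected registry host. The following is a minimal sketch: the function name `secret_has_registry` is hypothetical, and it assumes the secret is of type kubernetes.io/dockerconfigjson, so its payload lives under the `.dockerconfigjson` data key.

```shell
# Hypothetical helper: succeed if a base64-encoded .dockerconfigjson (stdin)
# mentions the registry host given as the first argument.
secret_has_registry() {
  # Decode the payload and look for the host as a quoted JSON string.
  base64 -d | grep -qF "\"$1\""
}

# Usage against a live cluster (not executed here):
#   kubectl get secret regcred -n netbox-enterprise \
#     -o jsonpath='{.data.\.dockerconfigjson}' \
#     | secret_has_registry registry.enterprise.netboxlabs.com && echo "registry host present"
```

This only checks the host name. It does not validate the username or license ID stored in the secret.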
Network & Ingress Issues
Network Issues
Problem: Connectivity issues between pods or external services
Diagnostic Steps:
# Check network policies
kubectl get networkpolicy -n netbox-enterprise
# Test pod-to-pod connectivity
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
nc -zv netbox-enterprise-postgresql 5432
# Check DNS resolution
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
nslookup netbox-enterprise-postgresql
# Check service endpoints
kubectl get endpoints -n netbox-enterprise
Solutions:
- Verify service configuration:
  # Check service definition
  kubectl describe service netbox-enterprise -n netbox-enterprise
- Test network connectivity:
  # Test external connectivity
  kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
    curl -I https://google.com
- Check firewall rules:
  - Ensure Kubernetes cluster can reach external services
  - Verify ingress controller configuration
  - Check cloud provider security groups
Database Connection Issues
Connection Refused Errors
Problem: NetBox cannot connect to database
Diagnostic Steps:
# Check database service
kubectl get svc -n netbox-enterprise | grep postgres
# Test database connectivity
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -- -c "SELECT 1;"
# Check database configuration
kubectl get secret -n netbox-enterprise netbox-enterprise-secret-config -o yaml
Solutions:
- Internal PostgreSQL:
  # Check PostgreSQL pod status
  kubectl get pods -n netbox-enterprise | grep postgres
  # Check PostgreSQL logs
  kubectl logs -n netbox-enterprise statefulset/netbox-enterprise-postgresql
- External PostgreSQL:
  # Verify external database connectivity
  kubectl run -it --rm debug --image=postgres:14 --restart=Never -- \
    psql postgresql://user:pass@host:5432/dbname
  # Check firewall rules and network policies
Database Performance Issues
Problem: Slow database queries affecting application performance
Diagnostic Steps:
# Check database connections
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -- -c "SELECT count(*) FROM pg_stat_activity;"
# Check slow queries
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -- -c "SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"
Solutions:
- Database maintenance:
  # Run VACUUM ANALYZE
  kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
    python manage.py dbshell -- -c "VACUUM ANALYZE;"
- Connection pooling:
  # Add to values file
  netbox:
    extraEnvironment:
      DB_CONN_MAX_AGE: '300'
      DB_CONN_RETRY_DELAY: '5'
- Resource allocation:
  # Increase database resources
  postgresql:
    resources:
      requests:
        cpu: '2000m'
        memory: '4Gi'
      limits:
        cpu: '4000m'
        memory: '8Gi'
Network & Ingress Issues
Ingress Controller Issues
Problem: Cannot access NetBox through ingress
Diagnostic Steps:
# Check ingress status
kubectl get ingress -n netbox-enterprise
kubectl describe ingress -n netbox-enterprise
# Check ingress controller
kubectl get pods -A | grep ingress
# Test service connectivity
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
curl -I http://netbox-enterprise:80
Solutions:
- Install ingress controller:
  # Install NGINX ingress controller
  helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
  helm install ingress-nginx ingress-nginx/ingress-nginx
- Update ingress configuration:
  # Add to values file
  netbox:
    ingress:
      enabled: true
      className: 'nginx'
      hosts:
        - host: netbox.example.com
          paths:
            - path: /
              pathType: Prefix
TLS Certificate Issues
Problem: TLS certificate errors or warnings
Diagnostic Steps:
# Check certificate status
kubectl get certificates -n netbox-enterprise
# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager
# Test certificate
openssl s_client -connect netbox.example.com:443 -servername netbox.example.com
Solutions:
- Install cert-manager:
  helm repo add jetstack https://charts.jetstack.io
  helm install cert-manager jetstack/cert-manager \
    --namespace cert-manager \
    --create-namespace \
    --set installCRDs=true
- Configure certificate issuer:
  # Add to values file
  netbox:
    ingress:
      annotations:
        cert-manager.io/cluster-issuer: 'letsencrypt-prod'
      tls:
        - secretName: netbox-tls
          hosts:
            - netbox.example.com
Performance Issues
Resource Constraints
# Check resource usage
kubectl top pods -n netbox-enterprise
kubectl top nodes
# Check resource limits
kubectl describe deployment netbox-enterprise -n netbox-enterprise | grep -A5 "Limits:"
# Check for pending pods
kubectl get events -n netbox-enterprise --field-selector reason=FailedScheduling
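To spot pods that are close to their memory limits, the `kubectl top pods` output can be filtered against a threshold. This is a sketch under assumptions: the function name `pods_over_memory` is hypothetical, and it assumes memory is reported in Mi in the third column, which is the usual format but not guaranteed.

```shell
# Hypothetical helper: print pods using more memory than the given MiB threshold.
# Reads `kubectl top pods` output on stdin; $1 is the threshold in MiB.
pods_over_memory() {
  awk -v limit="$1" 'NR > 1 {
    mem = $3
    sub(/Mi$/, "", mem)            # strip the assumed Mi suffix
    if (mem + 0 > limit + 0) print $1, $3
  }'
}

# Usage against a live cluster (not executed here):
#   kubectl top pods -n netbox-enterprise | pods_over_memory 1024
```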
Slow Database Queries
If you're experiencing slow database performance:
# Check database connection pool
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -- -c "SELECT count(*) FROM pg_stat_activity;"
# Monitor slow queries
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -- -c "SELECT query, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"
# Check database size and performance
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py dbshell -- -c "SELECT schemaname,tablename,attname,n_distinct,correlation FROM pg_stats WHERE tablename='dcim_device';"
Database Optimization
For better database performance:
- Use external PostgreSQL with proper indexing
- Configure connection pooling appropriately
- Monitor query performance regularly
- Consider read replicas for large deployments
Configuration: For automated backups, see Values Guide - Production Resources for resource requirements.
Configuration Issues
Invalid Configuration Values
Problem: Helm deployment fails due to invalid configuration
Diagnostic Steps:
# Validate configuration
helm template netbox-enterprise \
oci://registry.enterprise.netboxlabs.com/netbox-enterprise/beta/netbox-enterprise \
--values netbox-enterprise-values.yaml \
--values values-extra.yaml \
--dry-run
# Check for YAML syntax errors
yamllint values-extra.yaml
Solutions:
- Fix YAML syntax: Use proper indentation and structure
- Validate values: Check against chart documentation
- Use dry-run: Test configuration before deployment
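One frequent cause of YAML syntax errors is a tab character used for indentation, which YAML does not allow. The sketch below checks a values file for such lines before running helm. The function name `find_yaml_tabs` is hypothetical, and the check complements rather than replaces yamllint.

```shell
# Hypothetical helper: print "line-number:content" for each line whose
# leading indentation contains a tab character.
find_yaml_tabs() {
  # Matches optional leading spaces followed by a literal tab.
  grep -nE "^ *$(printf '\t')" "$1"
}

# Example (file name taken from the commands above):
#   find_yaml_tabs values-extra.yaml || echo "no tab indentation found"
```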
Environment Variable Issues
Problem: Application not reading environment variables correctly
Diagnostic Steps:
# Check environment variables in pod
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- env | grep -i netbox
# Check secret values
kubectl get secret -n netbox-enterprise netbox-enterprise-secret-config -o yaml
Solutions:
- Verify secret creation: Ensure secrets are properly created
- Check variable names: Verify environment variable names match requirements
- Restart deployment: Apply changes with pod restart
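The variable-name check above could be sketched as a comparison between the pod's environment and a list of names you expect. The function name `missing_env_vars` and the example variable names are hypothetical. Consult the chart documentation for the names your deployment actually requires.

```shell
# Hypothetical helper: print each required variable name that is absent
# from an `env` dump read on stdin. Arguments are the required names.
missing_env_vars() {
  env_dump=$(cat)
  for name in "$@"; do
    printf '%s\n' "$env_dump" | grep -q "^${name}=" || echo "$name"
  done
}

# Usage against a live cluster (not executed here; variable names are placeholders):
#   kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- env \
#     | missing_env_vars DB_HOST DB_NAME
# After correcting secrets or values, restart the pods to pick up changes:
#   kubectl rollout restart deployment/netbox-enterprise -n netbox-enterprise
```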
Support Bundle Generation
When issues persist, generate a support bundle for NetBox Labs support:
Method 1: Using kubectl support-bundle
# Install support-bundle plugin
curl https://krew.sh/support-bundle | bash
# Generate support bundle
kubectl support-bundle --namespace netbox-enterprise
Method 2: Manual Collection
# Create support bundle directory
mkdir netbox-support-bundle
cd netbox-support-bundle
# Collect basic information
kubectl get all -n netbox-enterprise > resources.yaml
kubectl describe all -n netbox-enterprise > descriptions.yaml
kubectl get events -n netbox-enterprise > events.yaml
# Collect logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise > netbox-logs.txt
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-worker > worker-logs.txt
# Collect configuration
helm get values netbox-enterprise -n netbox-enterprise > helm-values.yaml
kubectl get configmap -n netbox-enterprise -o yaml > configmaps.yaml
# Create archive
tar czf netbox-support-bundle-$(date +%Y%m%d-%H%M%S).tar.gz *
For complete support bundle procedures, see Operations - Support Bundle Generation.
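Because `helm get values` prints user-supplied values, the manually collected files may contain passwords or tokens. A minimal sketch of masking them before sharing follows. The function name `redact_secrets` is hypothetical, and the key-name pattern is an assumption that may miss sensitive fields or mask harmless ones such as secretName, so review the result by hand.

```shell
# Hypothetical helper: mask the value of any YAML key whose name contains
# password, secret, or token. Reads YAML on stdin, writes to stdout.
redact_secrets() {
  sed -E 's/^([[:space:]]*[A-Za-z0-9_.-]*([Pp]assword|[Ss]ecret|[Tt]oken)[A-Za-z0-9_.-]*:[[:space:]]*).+$/\1REDACTED/'
}

# Example, run inside the bundle directory before creating the archive:
#   redact_secrets < helm-values.yaml > helm-values.redacted.yaml
```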
Getting Help
Before Contacting Support
- Check this troubleshooting guide for common solutions
- Review the installation guide for proper setup procedures
- Generate a support bundle with diagnostic information
- Document the issue with steps to reproduce
Information to Include
When contacting support, provide:
- Environment details: Kubernetes version, platform (EKS, GKE, etc.)
- Helm chart version: Version of NetBox Enterprise chart
- Error messages: Complete error messages and logs
- Support bundle: Generated diagnostic bundle
- Steps to reproduce: Detailed reproduction steps
- Recent changes: Any recent configuration or version changes
Support Channels
- NetBox Labs Support Portal: Submit tickets with support bundles
- Documentation: Check NetBox Enterprise documentation
- Community Forums: Search existing discussions
Emergency Procedures
For critical production issues:
- Immediate actions: Check Operations - Backup Procedures
- Rollback procedures: Use helm rollback if needed
- Escalation: Contact NetBox Labs support with "URGENT" priority
Complete Installation Guide
- Overview - Architecture and approach
- Prerequisites - System requirements
- Installation - Installation procedures
- Values Guide - Configuration reference
- Operations - Backup and maintenance
- → Troubleshooting - Problem resolution