Troubleshooting

Beta Notice: These Helm charts are currently in beta. While stable for testing and development environments, please thoroughly test in your specific environment before production deployment. For the most up-to-date information, please refer to the main documentation.

Table of Contents

  1. Diagnostic Commands - Essential debugging tools
  2. Installation Issues - Deployment problems
  3. Authentication & Registry Issues - Access problems
  4. Database Connection Issues - Database problems
  5. Network & Ingress Issues - Connectivity problems
  6. Performance Issues - Resource and scaling problems
  7. Configuration Issues - Settings and validation
  8. Support Bundle Generation - Gathering diagnostic info
  9. Getting Help - When to contact support

This guide covers common issues and their solutions for NetBox Enterprise Helm deployments.

Diagnostic Commands

Essential Debugging Commands

# Check pod status and events
kubectl get pods -n netbox-enterprise -o wide
kubectl describe pod <pod-name> -n netbox-enterprise

# Check deployment status
kubectl get deployments -n netbox-enterprise
kubectl rollout status deployment/netbox-enterprise -n netbox-enterprise

# Check services and endpoints
kubectl get svc -n netbox-enterprise
kubectl get endpoints -n netbox-enterprise

# Check ingress configuration
kubectl get ingress -n netbox-enterprise
kubectl describe ingress -n netbox-enterprise

# Check persistent volumes
kubectl get pv,pvc -n netbox-enterprise

# Check events for errors
kubectl get events -n netbox-enterprise --sort-by='.lastTimestamp'

# Check resource usage
kubectl top pods -n netbox-enterprise
kubectl top nodes
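
When a namespace holds many pods, it can help to filter the pod list down to the ones that need attention. The following is a minimal sketch, not part of the chart: `unhealthy_pods` is a hypothetical helper name, and it assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...).

```shell
# List pods whose STATUS column is neither Running nor Completed.
# Assumes the default `kubectl get pods --no-headers` column order:
# NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
  local namespace="${1:-netbox-enterprise}"
  kubectl get pods -n "$namespace" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage: unhealthy_pods netbox-enterprise
```

A pod can report Running while its containers are not ready (for example `0/1` in the READY column), so treat an empty result as a first pass rather than proof of health.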

Log Analysis Commands

# View application logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise --tail=100 -f

# View worker logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-worker --tail=100 -f

# View previous container logs (if pod crashed)
kubectl logs -n netbox-enterprise deployment/netbox-enterprise --previous

# View logs from all containers in a pod
kubectl logs -n netbox-enterprise <pod-name> --all-containers=true

# Search logs for specific errors
kubectl logs -n netbox-enterprise deployment/netbox-enterprise | grep -i error
kubectl logs -n netbox-enterprise deployment/netbox-enterprise | grep -i "database"
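
Before grepping for individual messages, a quick count per severity keyword can show whether a log is dominated by errors or warnings. This is a rough sketch: `summarize_log_levels` is a hypothetical helper, and the keywords it matches (error, warning, critical) are an assumption about the log format, not something the chart guarantees.

```shell
# Count log lines that mention each severity keyword (case-insensitive).
# Reads log text on stdin and prints a one-line summary.
summarize_log_levels() {
  awk '
    { line = tolower($0) }
    line ~ /error/    { errors++ }
    line ~ /warning/  { warnings++ }
    line ~ /critical/ { criticals++ }
    END { printf "error=%d warning=%d critical=%d\n", errors, warnings, criticals }
  '
}

# Usage:
#   kubectl logs -n netbox-enterprise deployment/netbox-enterprise --tail=1000 | summarize_log_levels
```

Note that `kubectl logs deployment/<name>` reads from a single pod of the deployment; with several replicas, run it per pod to see the full picture.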

Database Connection Testing

# Test database connectivity from NetBox pod
# (arguments after the second "--" are passed through to psql)
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
  python manage.py dbshell -- -c "SELECT 1;"

# Check database configuration (the output includes the database password)
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
python manage.py shell -c "from django.conf import settings; print(settings.DATABASES)"

# Test external database connection
kubectl run -it --rm debug --image=postgres:14 --restart=Never -- \
psql postgresql://user:pass@host:5432/dbname

Network Connectivity Testing

# Test internal service connectivity
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
curl -I http://netbox-enterprise:80

# Test external connectivity
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
curl -I https://google.com

# Check DNS resolution
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
nslookup netbox-enterprise.netbox-enterprise.svc.cluster.local

Installation Issues

Helm Installation Failures

Problem: Helm installation fails with timeout or resource errors

Diagnostic Steps:

# Check Helm installation status
helm status netbox-enterprise -n netbox-enterprise

# Check for failed resources
kubectl get all -n netbox-enterprise
kubectl get events -n netbox-enterprise --sort-by='.lastTimestamp'

# Check resource constraints
kubectl describe nodes | grep -A5 "Allocated resources"

Solutions:

  1. Increase timeout: Add --timeout 15m to helm install command
  2. Check resource availability: Ensure cluster has sufficient CPU/memory
  3. Verify storage class: Confirm storage class exists and is available
  4. Check image pull: Verify registry authentication and image availability
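
The timeout suggestion can be sketched as follows. The chart reference and values file name are taken from the validation example later in this guide; `install_with_timeout` is a hypothetical wrapper, and using `helm upgrade --install` with `--wait` is an assumption about your workflow rather than the documented install command.

```shell
# Install or upgrade the release and wait up to 15 minutes for readiness.
# Chart reference and values file are assumptions; adjust to your installation.
install_with_timeout() {
  local values_file="${1:-netbox-enterprise-values.yaml}"
  helm upgrade --install netbox-enterprise \
    oci://registry.enterprise.netboxlabs.com/netbox-enterprise/beta/netbox-enterprise \
    --namespace netbox-enterprise \
    --values "$values_file" \
    --wait \
    --timeout 15m
}

# Usage: install_with_timeout netbox-enterprise-values.yaml
```

With `--wait`, Helm blocks until the release's resources report ready or the timeout expires, so a stuck pod surfaces as a failed command instead of a silently unhealthy release.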

Pod Startup Failures

Problem: Pods fail to start or remain in pending/error state

Diagnostic Steps:

# Check pod status and events
kubectl describe pod <pod-name> -n netbox-enterprise

# Check for image pull errors
kubectl get events -n netbox-enterprise | grep -i "image"

# Check resource constraints
kubectl describe node <node-name> | grep -A10 "Allocated resources"

Common Solutions:

  1. ImagePullBackOff:

    # Verify registry authentication
    kubectl get secret -n netbox-enterprise | grep regcred

    # Test image pull manually
    docker pull proxy.enterprise.netboxlabs.com/netbox-enterprise/nbe-core:latest
  2. Resource constraints:

    # Check resource requests/limits
    kubectl get pods -n netbox-enterprise -o yaml | grep -A10 resources

    # Adjust resource requirements in values file
  3. Storage issues:

    # Check PVC status
    kubectl get pvc -n netbox-enterprise

    # Verify storage class
    kubectl get storageclass

Database Migration Failures

Problem: Database migrations fail during startup

Diagnostic Steps:

# Check migration logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise | grep -i migration

# Check database connectivity
# (arguments after the second "--" are passed through to psql)
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
  python manage.py dbshell -- -c "SELECT 1;"

Solutions:

  1. Manual migration:

    kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
    python manage.py migrate
  2. Database permissions:

    # Verify which database user the application connects as
    kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
      python manage.py dbshell -- -c "SELECT current_user, session_user;"
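
Before running a manual migration, it can be useful to see which migrations are still unapplied. Django's `showmigrations --plan` marks applied migrations with `[X]` and unapplied ones with `[ ]`. The filter below is a small sketch; `pending_migrations` is a hypothetical helper name.

```shell
# Print only unapplied migrations from `manage.py showmigrations --plan`
# output read on stdin. Prints nothing if every migration is applied.
pending_migrations() {
  grep -F '[ ]' || true
}

# Usage:
#   kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
#     python manage.py showmigrations --plan | pending_migrations
```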

Authentication & Registry Issues

Registry Authentication Issues

Problem: Cannot pull images from NetBox Enterprise registry

Diagnostic Steps:

# Test registry authentication (reads the password from stdin so it is not exposed in the process list)
echo "$LICENSE_ID" | docker login registry.enterprise.netboxlabs.com -u "$USERNAME" --password-stdin

# Check image pull secrets
kubectl get secrets -n netbox-enterprise | grep regcred

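# Test image pull from inside the cluster, using the regcred pull secret
kubectl run test-pull -n netbox-enterprise \
  --image=registry.enterprise.netboxlabs.com/netbox-enterprise/netbox:latest \
  --overrides='{"spec":{"imagePullSecrets":[{"name":"regcred"}]}}' \
  --rm -it --restart=Never -- /bin/sh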

Solutions:

  1. Verify credentials:

    # Ensure LICENSE_ID and USERNAME are correct
    echo "Username: $USERNAME"
    echo "License ID: $LICENSE_ID"
  2. Recreate registry secret:

    # Delete existing secret
    kubectl delete secret regcred -n netbox-enterprise

    # Create new secret
    kubectl create secret docker-registry regcred \
    --docker-server=registry.enterprise.netboxlabs.com \
    --docker-username=$USERNAME \
    --docker-password=$LICENSE_ID \
    --namespace netbox-enterprise
  3. Update image pull policy:

    # Add to values file
    netbox:
      image:
        pullPolicy: Always
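
To confirm that a recreated secret actually holds the registry host and username you expect, you can decode it. This is a sketch: `show_pull_secret` is a hypothetical helper, and the secret name `regcred` is the one used in the commands above.

```shell
# Print the decoded Docker config stored in an image pull secret.
# The output contains credentials; avoid pasting it into tickets or chat.
show_pull_secret() {
  local name="${1:-regcred}" namespace="${2:-netbox-enterprise}"
  kubectl get secret "$name" -n "$namespace" \
    -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode
}

# Usage: show_pull_secret regcred netbox-enterprise
```

The `auths` key in the decoded JSON should list the same registry host that appears in the image reference of the failing pod.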

Network & Ingress Issues

Network Issues

Problem: Connectivity issues between pods or external services

Diagnostic Steps:

# Check network policies
kubectl get networkpolicy -n netbox-enterprise

# Test pod-to-pod connectivity
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
nc -zv netbox-enterprise-postgresql 5432

# Check DNS resolution
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
nslookup netbox-enterprise-postgresql

# Check service endpoints
kubectl get endpoints -n netbox-enterprise

Solutions:

  1. Verify service configuration:

    # Check service definition
    kubectl describe service netbox-enterprise -n netbox-enterprise
  2. Test network connectivity:

    # Test external connectivity
    kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
    curl -I https://google.com
  3. Check firewall rules:

    • Ensure Kubernetes cluster can reach external services
    • Verify ingress controller configuration
    • Check cloud provider security groups
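
The pod-to-pod check above can be repeated for several targets in one pass. The sketch below assumes, as the earlier commands do, that `nc` is available in the NetBox container; `check_service_ports` is a hypothetical helper, and the 3-second timeout is an arbitrary choice.

```shell
# Test TCP reachability of each host:port argument from inside the NetBox pod.
# Assumes `nc` exists in the container image.
check_service_ports() {
  local target host port
  for target in "$@"; do
    host="${target%%:*}"
    port="${target##*:}"
    if kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
        nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
      echo "OK   $target"
    else
      echo "FAIL $target"
    fi
  done
}

# Usage: check_service_ports netbox-enterprise-postgresql:5432 netbox-enterprise:80
```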

Database Connection Issues

Connection Refused Errors

Problem: NetBox cannot connect to database

Diagnostic Steps:

# Check database service
kubectl get svc -n netbox-enterprise | grep postgres

# Test database connectivity
# (arguments after the second "--" are passed through to psql)
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
  python manage.py dbshell -- -c "SELECT 1;"

# Check database configuration
kubectl get secret -n netbox-enterprise netbox-enterprise-secret-config -o yaml

Solutions:

  1. Internal PostgreSQL:

    # Check PostgreSQL pod status
    kubectl get pods -n netbox-enterprise | grep postgres

    # Check PostgreSQL logs
    kubectl logs -n netbox-enterprise statefulset/netbox-enterprise-postgresql
  2. External PostgreSQL:

    # Verify external database connectivity
    kubectl run -it --rm debug --image=postgres:14 --restart=Never -- \
    psql postgresql://user:pass@host:5432/dbname

    # Check firewall rules and network policies

Database Performance Issues

Problem: Slow database queries affecting application performance

Diagnostic Steps:

# Check database connections
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
  python manage.py dbshell -- -c "SELECT count(*) FROM pg_stat_activity;"

# Check slow queries (requires the pg_stat_statements extension;
# the column is named mean_time on PostgreSQL 12 and earlier)
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
  python manage.py dbshell -- -c "SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"
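
The slow-query statement above reads from `pg_stat_statements`, an optional PostgreSQL extension. If it is not installed in the NetBox database, the query fails with a "relation does not exist" error. A minimal way to check, assuming a Django version whose `dbshell` passes arguments after `--` through to `psql` (`check_pg_stat_statements` is a hypothetical helper name):

```shell
# Check whether the pg_stat_statements extension is installed in the
# database NetBox connects to. An empty result means it is not installed.
check_pg_stat_statements() {
  kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
    python manage.py dbshell -- -c \
    "SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_stat_statements';"
}

# Usage: check_pg_stat_statements
```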

Solutions:

  1. Database maintenance:

    # Run VACUUM ANALYZE
    kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
      python manage.py dbshell -- -c "VACUUM ANALYZE;"
  2. Connection pooling:

    # Add to values file
    netbox:
      extraEnvironment:
        DB_CONN_MAX_AGE: '300'
        DB_CONN_RETRY_DELAY: '5'
  3. Resource allocation:

    # Increase database resources
    postgresql:
      resources:
        requests:
          cpu: '2000m'
          memory: '4Gi'
        limits:
          cpu: '4000m'
          memory: '8Gi'

Network & Ingress Issues

Ingress Controller Issues

Problem: Cannot access NetBox through ingress

Diagnostic Steps:

# Check ingress status
kubectl get ingress -n netbox-enterprise
kubectl describe ingress -n netbox-enterprise

# Check ingress controller
kubectl get pods -A | grep ingress

# Test service connectivity
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
curl -I http://netbox-enterprise:80

Solutions:

  1. Install ingress controller:

    # Install NGINX ingress controller
    helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    helm install ingress-nginx ingress-nginx/ingress-nginx
  2. Update ingress configuration:

    # Add to values file
    netbox:
      ingress:
        enabled: true
        className: 'nginx'
        hosts:
          - host: netbox.example.com
            paths:
              - path: /
                pathType: Prefix

TLS Certificate Issues

Problem: TLS certificate errors or warnings

Diagnostic Steps:

# Check certificate status
kubectl get certificates -n netbox-enterprise

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager

# Test certificate
openssl s_client -connect netbox.example.com:443 -servername netbox.example.com

Solutions:

  1. Install cert-manager:

    helm repo add jetstack https://charts.jetstack.io
    helm install cert-manager jetstack/cert-manager \
    --namespace cert-manager \
    --create-namespace \
    --set installCRDs=true
  2. Configure certificate issuer:

    # Add to values file
    netbox:
      ingress:
        annotations:
          cert-manager.io/cluster-issuer: 'letsencrypt-prod'
        tls:
          - secretName: netbox-tls
            hosts:
              - netbox.example.com
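
Once a certificate has been issued, you can check what the ingress actually serves. The sketch below pipes the `openssl s_client` call from the diagnostic steps into `openssl x509` to print only the subject, issuer, and validity dates. `cert_summary` is a hypothetical helper name and `netbox.example.com` is the placeholder host used throughout this guide.

```shell
# Print subject, issuer, and validity dates of the certificate a host serves.
cert_summary() {
  local host="$1" port="${2:-443}"
  openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null |
    openssl x509 -noout -subject -issuer -dates
}

# Usage: cert_summary netbox.example.com
```

If the issuer is not the one you configured, the ingress controller may still be serving its default certificate, which usually means the TLS secret has not been created yet.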

Performance Issues

Resource Constraints

# Check resource usage
kubectl top pods -n netbox-enterprise
kubectl top nodes

# Check resource limits
kubectl describe deployment netbox-enterprise -n netbox-enterprise | grep -A5 "Limits:"

# Check for pending pods
kubectl get events -n netbox-enterprise --field-selector reason=FailedScheduling

Slow Database Queries

If you're experiencing slow database performance:

# Count open database connections
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
  python manage.py dbshell -- -c "SELECT count(*) FROM pg_stat_activity;"

# Monitor slow queries (requires the pg_stat_statements extension)
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
  python manage.py dbshell -- -c "SELECT query, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"

# Check planner statistics for the dcim_device table
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- \
  python manage.py dbshell -- -c "SELECT schemaname,tablename,attname,n_distinct,correlation FROM pg_stats WHERE tablename='dcim_device';"

Database Optimization

For better database performance:

  • Use external PostgreSQL with proper indexing
  • Configure connection pooling appropriately
  • Monitor query performance regularly
  • Consider read replicas for large deployments

Configuration: For automated backups, see Values Guide - Production Resources for resource requirements.

Configuration Issues

Invalid Configuration Values

Problem: Helm deployment fails due to invalid configuration

Diagnostic Steps:

# Validate configuration
helm template netbox-enterprise \
oci://registry.enterprise.netboxlabs.com/netbox-enterprise/beta/netbox-enterprise \
--values netbox-enterprise-values.yaml \
--values values-extra.yaml \
--dry-run

# Check for YAML syntax errors
yamllint values-extra.yaml

Solutions:

  1. Fix YAML syntax: Use proper indentation and structure
  2. Validate values: Check against chart documentation
  3. Use dry-run: Test configuration before deployment
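
Rendering the chart only proves that the templates compile. To also have the API server validate the rendered manifests without creating anything, the output can be piped into a server-side dry run. This is a sketch under the same assumptions as the validation command above (chart reference and values file names); `validate_rendered_manifests` is a hypothetical helper name.

```shell
# Render the chart locally, then ask the API server to validate the result
# without persisting any objects.
validate_rendered_manifests() {
  helm template netbox-enterprise \
    oci://registry.enterprise.netboxlabs.com/netbox-enterprise/beta/netbox-enterprise \
    --namespace netbox-enterprise \
    --values netbox-enterprise-values.yaml \
    --values values-extra.yaml |
    kubectl apply --dry-run=server --namespace netbox-enterprise -f -
}

# Usage: validate_rendered_manifests
```

A server-side dry run catches problems that YAML linting cannot, such as unknown fields or missing custom resource definitions.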

Environment Variable Issues

Problem: Application not reading environment variables correctly

Diagnostic Steps:

# Check environment variables in pod
kubectl exec -n netbox-enterprise deployment/netbox-enterprise -- env | grep -i netbox

# Check secret values
kubectl get secret -n netbox-enterprise netbox-enterprise-secret-config -o yaml

Solutions:

  1. Verify secret creation: Ensure secrets are properly created
  2. Check variable names: Verify environment variable names match requirements
  3. Restart deployment: Apply changes with pod restart
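
The restart step can be sketched as a rolling restart of both deployments referenced in this guide. `restart_netbox` is a hypothetical helper name, and the 5-minute rollout timeout is an arbitrary assumption.

```shell
# Trigger a rolling restart of the application and worker deployments,
# waiting for each rollout to finish before moving on.
restart_netbox() {
  local namespace="${1:-netbox-enterprise}" deployment
  for deployment in netbox-enterprise netbox-enterprise-worker; do
    kubectl rollout restart "deployment/${deployment}" -n "$namespace" &&
      kubectl rollout status "deployment/${deployment}" -n "$namespace" --timeout=5m ||
      return 1
  done
}

# Usage: restart_netbox netbox-enterprise
```

Environment variables sourced from secrets or config maps are read when a container starts, so pods need a restart to pick up changed values.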

Support Bundle Generation

When issues persist, generate a support bundle for NetBox Labs support:

Method 1: Using kubectl support-bundle

# Install support-bundle plugin
curl https://krew.sh/support-bundle | bash

# Generate support bundle
kubectl support-bundle --namespace netbox-enterprise

Method 2: Manual Collection

# Create support bundle directory
mkdir netbox-support-bundle
cd netbox-support-bundle

# Collect basic information (plain-text output, so use .txt file names)
kubectl get all -n netbox-enterprise -o wide > resources.txt
kubectl describe all -n netbox-enterprise > descriptions.txt
kubectl get events -n netbox-enterprise --sort-by='.lastTimestamp' > events.txt

# Collect logs
kubectl logs -n netbox-enterprise deployment/netbox-enterprise > netbox-logs.txt
kubectl logs -n netbox-enterprise deployment/netbox-enterprise-worker > worker-logs.txt

# Collect configuration
helm get values netbox-enterprise -n netbox-enterprise > helm-values.yaml
kubectl get configmap -n netbox-enterprise -o yaml > configmaps.yaml

# Create archive
tar czf netbox-support-bundle-$(date +%Y%m%d-%H%M%S).tar.gz *

For complete support bundle procedures, see Operations - Support Bundle Generation.

Getting Help

Before Contacting Support

  1. Check this troubleshooting guide for common solutions
  2. Review the installation guide for proper setup procedures
  3. Generate a support bundle with diagnostic information
  4. Document the issue with steps to reproduce

Information to Include

When contacting support, provide:

  • Environment details: Kubernetes version, platform (EKS, GKE, etc.)
  • Helm chart version: Version of NetBox Enterprise chart
  • Error messages: Complete error messages and logs
  • Support bundle: Generated diagnostic bundle
  • Steps to reproduce: Detailed reproduction steps
  • Recent changes: Any recent configuration or version changes
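
The environment and chart version details can be gathered in one file to attach to a ticket. This is a sketch, not an official collection script: `collect_environment_details` is a hypothetical helper and `environment.txt` is an arbitrary output file name.

```shell
# Write Kubernetes, Helm, release, and node details to a single file.
collect_environment_details() {
  local out="${1:-environment.txt}"
  {
    echo "## kubectl version"; kubectl version
    echo "## helm version";    helm version --short
    echo "## helm releases";   helm list -n netbox-enterprise
    echo "## nodes";           kubectl get nodes -o wide
  } > "$out" 2>&1
}

# Usage: collect_environment_details environment.txt
```

The `helm list` output includes the chart name and version of the installed release, which covers the "Helm chart version" item above.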

Support Channels

  • NetBox Labs Support Portal: Submit tickets with support bundles
  • Documentation: Check NetBox Enterprise documentation
  • Community Forums: Search existing discussions

Emergency Procedures

For critical production issues:

  1. Immediate actions: Check Operations - Backup Procedures
  2. Rollback procedures: Use helm rollback if needed
  3. Escalation: Contact NetBox Labs support with "URGENT" priority
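
The rollback step can be sketched as follows. `rollback_release` is a hypothetical helper; it prints recent release history first so the target revision can be confirmed. When no revision is given it passes `0`, which Helm treats as "the previous release". The 10-minute timeout is an assumption.

```shell
# Show recent release history, then roll back to the given revision
# (or to the previous release when no revision is supplied).
rollback_release() {
  local revision="${1:-0}"
  helm history netbox-enterprise -n netbox-enterprise --max 5 &&
    helm rollback netbox-enterprise "$revision" -n netbox-enterprise --wait --timeout 10m
}

# Usage: rollback_release      # previous release
#        rollback_release 3    # specific revision
```

A Helm rollback restores Kubernetes resources only. Database schema changes applied by a newer version are not reverted, so review the backup procedures before rolling back across versions.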

Complete Installation Guide

  1. Overview - Architecture and approach
  2. Prerequisites - System requirements
  3. Installation - Installation procedures
  4. Values Guide - Configuration reference
  5. Operations - Backup and maintenance
  6. Troubleshooting - Problem resolution