
Bulk Operations at Scale: Introducing NetBox TurboBulk

By Kris Beevers

At NetBox Labs we work with the teams operating the largest and most automated infrastructure environments in the world. Hyperscale operators building massive AI datacenters at incredible speed. Enterprises managing tens of thousands of network devices across hundreds of sites. Teams where NetBox isn’t just a documentation tool — it’s the system of record at the center of their automation stack.

These teams push NetBox hard. They’re syncing data from CMDBs and discovery platforms, running CI/CD pipelines that provision infrastructure programmatically, and exporting complete datasets into analytics and visualization systems. And the common thread in every conversation we have with them is the same: the standard REST and GraphQL APIs are fantastic for interactive use and moderate-scale automation, but when you need to load 500,000 devices or export two million IP addresses, per-object API calls hit a wall.

Today we’re announcing TurboBulk — a high-performance bulk data API for the NetBox Labs platform that processes tens of thousands of objects per second. Insert, upsert, delete, and export millions of rows in minutes instead of hours. Built in partnership with several of our large-scale AI datacenter customers, and available now as a customer preview for NetBox Cloud and NetBox Enterprise premium tier customers.

Why Bulk Operations Need a Different Approach

The NetBox REST API is designed around individual objects. You POST a device, you get back a fully validated, serialized device with all its relationships resolved. Every custom validator runs. Every webhook fires. Every changelog entry is created synchronously. That’s exactly what you want when you’re creating a device from a form or updating a description from a script.

But that per-object overhead adds up. Even on a well-provisioned, performance-tuned NetBox Community deployment, the REST API typically sustains 100-200 objects per second with full validation and serialization. Loading 100,000 devices takes over eight minutes. A million takes over an hour. And real-world bulk workflows don’t involve just devices — they involve interfaces, IP addresses, cables, connections. A single datacenter buildout can easily produce hundreds of thousands of objects across a dozen models. Iterating on a large environment’s worth of infrastructure designs can involve millions.
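The back-of-the-envelope math makes the ceiling obvious. Even at the high end of typical REST throughput:

```python
# Rough load-time math for per-object REST calls at a sustained rate.
def load_time_minutes(num_objects: int, objects_per_second: float) -> float:
    """Wall-clock minutes to load num_objects at a given sustained rate."""
    return num_objects / objects_per_second / 60

# At ~200 objects/sec, the top of the typical REST range:
print(f"{load_time_minutes(100_000, 200):.1f} min")    # → 8.3 min
print(f"{load_time_minutes(1_000_000, 200):.1f} min")  # → 83.3 min
```

And that assumes the high end of the range holds for the full run; at 100 objects per second, double both numbers.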

TurboBulk takes a fundamentally different approach. Instead of processing objects one at a time through the ORM, it operates at the database level — bulk SQL operations that process entire datasets in single statements, wrapped in atomic transactions. The result NetBox Labs customers are seeing is throughput that’s orders of magnitude faster while maintaining the data integrity guarantees that matter.

The Platform Picture

If you’ve been paying attention to what we’ve been building at NetBox Labs, TurboBulk fits into a larger story about what the NetBox Labs platform becomes at scale.

TurboBulk is one of a growing set of capabilities — alongside Visual Explorer, Event Streams, Branching, and NetBox Copilot — that differentiate the NetBox Labs platform for production grade use cases and provide value for the most demanding infrastructure environments. Visual Explorer depends on TurboBulk’s export engine to deliver interactive visualizations of massive-scale infrastructure directly from your live source of truth. Event Streams turns the NetBox Labs platform’s state changes into real-time data feeds for automation and security. TurboBulk makes it possible to read and write data at the scale these workflows demand.

As the environments our customers manage grow larger and more automated, the NetBox Labs platform grows with them.

How It Works

Bulk Exports and Intelligent Caching

The most immediately valuable capability for most teams is bulk export. Instead of paginating through the REST API one page at a time — which for a million objects means thousands of round trips, each with serialization overhead — TurboBulk generates a complete export file server-side and makes it available for download in JSONL or Parquet format. Filters and field selection let you scope the export to exactly what you need.
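One reason JSONL works well here is that it can be consumed incrementally, one JSON object per line, so even a multi-gigabyte export streams in constant memory. A minimal sketch (the field names are illustrative, not the exact export schema):

```python
import io
import json
from collections import Counter
from typing import IO, Iterator

def iter_jsonl(f: IO[str]) -> Iterator[dict]:
    """Stream a JSONL export one object at a time, constant memory."""
    for line in f:
        line = line.strip()
        if line:
            yield json.loads(line)

# Illustrative two-row sample standing in for a downloaded export file.
sample = io.StringIO('{"name": "sw1", "site": "dc1"}\n{"name": "sw2", "site": "dc2"}\n')
per_site = Counter(d["site"] for d in iter_jsonl(sample))
```

The same generator works unchanged against an open file handle, which is how you would process a real downloaded export.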

What makes this particularly powerful for operational workflows is the caching layer. TurboBulk computes a deterministic cache key from your export parameters and checks whether the underlying data has changed since the last export by consulting NetBox’s audit log. If nothing has changed, the cached file is returned immediately — no query, no serialization, no generation. Clients that already have the file can send their cache key and receive an HTTP 304 — no data transferred at all.
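The exact cache-key scheme is internal to TurboBulk, but the general pattern, a deterministic hash over canonicalized export parameters, can be sketched like this (an illustration, not the actual implementation):

```python
import hashlib
import json

def export_cache_key(params: dict) -> str:
    """Deterministic key: the same filters and fields in any order
    canonicalize to the same JSON, and therefore the same hash."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

a = export_cache_key({"model": "device", "fields": ["name", "site"], "site": "dc1"})
b = export_cache_key({"site": "dc1", "fields": ["name", "site"], "model": "device"})
assert a == b  # top-level parameter order doesn't matter
```

A client holding a previously downloaded file can present its key on the next request, in the spirit of an HTTP ETag, and skip the download entirely when the server answers 304.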

This changes what’s practical to build. Analytics dashboards, visualization tools, and sync pipelines can request fresh data frequently, only downloading when something has actually changed. We use this pattern ourselves — it’s what powers the data layer behind Visual Explorer. When you’re navigating across complex, detailed infrastructure views — drilling from a 3D floorplan into a rack elevation, tracing a cable path across a dense fabric — the experience feels real-time because TurboBulk’s cached exports deliver complete, consistent snapshots from your live NetBox data without the latency of regenerating them on every request.

Bulk Writes: Stage, Merge, Commit

For teams that need to write at scale — initial data population, ongoing syncs, iterative infrastructure design — TurboBulk’s write path takes a fundamentally different approach from the REST API. Instead of processing objects one at a time through the ORM, it operates at the database level.

A write operation starts with a data file — we recommend JSONL for most use cases. The file is uploaded and an asynchronous job is created. On the server side, incoming rows are ingested using PostgreSQL’s native COPY protocol — the fastest path into Postgres — streaming into a temporary staging area in configurable chunks to keep memory usage constant regardless of dataset size. The final operation — insert, upsert, or delete — executes as a single bulk SQL statement that merges the staged data into the target table. The entire pipeline is wrapped in an atomic transaction: either every row commits, or none do. There is no partial state.
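The chunked-streaming idea is straightforward to sketch: read rows lazily and hand them off in fixed-size batches so memory stays flat no matter how large the input is. In the real pipeline each batch feeds PostgreSQL's COPY protocol; the chunk size shown here is just an example value:

```python
from itertools import islice
from typing import Iterable, Iterator, List

def chunked(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield fixed-size batches; memory use is bounded by `size`,
    not by the total number of rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

# Ten rows in batches of four: 4 + 4 + 2. Each batch would be streamed
# into the staging table via COPY before the final single-statement merge.
batches = list(chunked(({"id": i} for i in range(10)), size=4))
```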

This is what makes the performance difference fundamental rather than incremental. We’re not optimizing the per-object path — we’re replacing it with a set-based approach that lets PostgreSQL do what it’s built to do.

Knobs for Power Users

TurboBulk is designed for operators who understand their data and want explicit control over the tradeoffs. A few examples:

Validation modes — Three levels that let you choose the right balance of speed and safety. None relies solely on database constraints — the fastest mode, appropriate for trusted sources. Auto (the default) adds SQL-based pre-validation rules for models like IPAM prefixes where database constraints alone aren’t sufficient. Full runs Django’s model validation on every object — 30 to 60 times slower, but the right call for untrusted data or complex models like cables.

Changelogs and event dispatch — Audit trail and webhook/event-rule dispatch are supported and enabled by default, so downstream systems stay in sync. Both can be disabled per-operation for initial loads where the overhead isn’t justified.

Dry-run mode — Validates data through the full pipeline without committing. Ingested, staged, merged, validated — then rolled back. Fix your data, run again. This makes it practical to iterate on large datasets without risk.

Branching — TurboBulk integrates with NetBox Branching, so you can load or delete within an isolated branch, review the changes, and merge to main when ready.
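Pulling the knobs together, a write-job submission might carry options along these lines. To be clear, the field names and shape below are illustrative, not the shipped client API; consult the client library documentation for the real interface:

```python
# Hypothetical options for a TurboBulk write job. Every key name here is
# an assumption for illustration, not the actual request schema.
job_options = {
    "operation": "upsert",           # insert | upsert | delete
    "validation": "auto",            # none | auto | full
    "changelog": True,               # audit trail, on by default
    "events": False,                 # skip webhook/event dispatch for this load
    "dry_run": True,                 # validate end to end, then roll back
    "branch": "dc-buildout-phase2",  # requires NetBox Branching
}
```

The dry-run plus branch combination is the low-risk path for iterating on a large dataset: validate everything, commit nothing, then flip `dry_run` off and merge the branch when the data is clean.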

Performance in Practice

Here’s what this looks like with real data. We recently tested TurboBulk against a production-scale NetBox dataset from a large real-world infrastructure environment: tens of millions of objects — several hundred thousand devices, nearly ten million interfaces, millions of cables and IP addresses.

Using the standard REST API, the export managed a few hundred thousand objects before running out of memory. At roughly 170 objects per second, the full dataset would have taken over 23 hours — if it could fit in memory at all.

TurboBulk exported the entire dataset in about 10 minutes. More than 24,000 objects per second, under 500 MB of memory. Over 100x faster, handling an order of magnitude more data, using a fraction of the memory. On a laptop — not a tuned production server.

This isn’t a synthetic benchmark. It’s a real-world dataset running through the same pipeline our customers use.

Customer Preview — Get Started

TurboBulk is available now as a customer preview, included in the premium tier of NetBox Cloud and NetBox Enterprise. Reach out to your customer success representative or contact us to get it enabled on your account.

The Python client library, comprehensive documentation, and progressive example scripts are available at netboxlabs/netbox-turbobulk-public on GitHub.

TurboBulk was built in partnership with several of our large-scale AI datacenter customers, and we’re continuing to develop it based on real-world feedback. If your team is pushing the limits of what’s possible with NetBox at scale — we built this for you.