When I started building the Boids simulation project, I had one goal: push as many autonomous actors as possible while maintaining 60 fps on a mid-range GPU. The naive approach — one AActor per boid — collapsed at around 500 agents. Here's how spatial partitioning and Instanced Static Mesh rendering got me to 20,000 agents running in real-time.
The Naive Approach — And Why It Dies
Craig Reynolds' classic Boids algorithm is elegant: every agent looks at its neighbours and applies three forces —
separation, alignment, and cohesion.
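The three forces can be sketched in a few lines of plain C++. This is an illustrative stand-alone version, not the project's actual code — the `Vec2`, `Boid`, and `Steer` names and the force weights are all placeholders:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal 2D vector type for the sketch.
struct Vec2 {
    float x = 0, y = 0;
    Vec2 operator+(Vec2 o) const { return {x + o.x, y + o.y}; }
    Vec2 operator-(Vec2 o) const { return {x - o.x, y - o.y}; }
    Vec2 operator*(float s) const { return {x * s, y * s}; }
    Vec2 operator/(float s) const { return {x / s, y / s}; }
};

struct Boid { Vec2 pos, vel; };

// Combined steering force for one boid, given its perceived neighbours.
Vec2 Steer(const Boid& self, const std::vector<Boid>& neighbours) {
    if (neighbours.empty()) return {};
    Vec2 sep{}, avgVel{}, centre{};
    for (const Boid& n : neighbours) {
        sep = sep + (self.pos - n.pos);   // separation: push away from each neighbour
        avgVel = avgVel + n.vel;          // alignment: accumulate neighbour headings
        centre = centre + n.pos;          // cohesion: accumulate neighbour positions
    }
    const float k = static_cast<float>(neighbours.size());
    const Vec2 alignment = avgVel / k - self.vel;  // steer toward the mean heading
    const Vec2 cohesion  = centre / k - self.pos;  // steer toward the local centre of mass
    // Weights are live-tunable parameters; 1.5/1.0/1.0 are arbitrary placeholders.
    return sep * 1.5f + alignment * 1.0f + cohesion * 1.0f;
}
```

The interesting part is that the algorithm itself is cheap — the cost is entirely in finding `neighbours`, which is what the rest of this post is about.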
Spawning 500 AActors, each ticking and querying SweepMultiByChannel
on every frame, turns the Physics thread into a bottleneck almost immediately. At 1,000 agents the frame
budget is already blown.
Spatial Partitioning with a Uniform Grid
The fix is a uniform spatial grid. The world is subdivided into fixed-size cells. Each boid registers itself into whichever cell its current position maps to. Neighbour queries then only check the 3×3 (or 3×3×3 in 3D) block of surrounding cells — reducing the average lookup from O(n) per agent to O(k) where k is the average density per cell, typically a handful of boids.
With the grid in place, overall complexity drops to roughly O(n·k). Because cell size is chosen to match the boid's perception radius, k stays nearly constant regardless of total population, so the whole simulation step effectively becomes O(n).
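A minimal version of such a grid fits in one small class. The sketch below uses a hash map keyed by packed integer cell coordinates; the class and method names are illustrative, not taken from the project source:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Uniform grid: positions hash to fixed-size cells, and a neighbour query
// scans only the 3x3 block of cells around the querying position.
class UniformGrid {
public:
    explicit UniformGrid(float cellSize) : CellSize(cellSize) {}

    void Clear() { Cells.clear(); }

    void Insert(int boidIndex, float x, float y) {
        Cells[Pack(CellCoord(x), CellCoord(y))].push_back(boidIndex);
    }

    // Gather boid indices from the 3x3 block of cells around (x, y).
    std::vector<int> QueryNeighbours(float x, float y) const {
        std::vector<int> result;
        const int cx = CellCoord(x), cy = CellCoord(y);
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy) {
                auto it = Cells.find(Pack(cx + dx, cy + dy));
                if (it != Cells.end())
                    result.insert(result.end(), it->second.begin(), it->second.end());
            }
        return result;
    }

private:
    int CellCoord(float v) const { return static_cast<int>(std::floor(v / CellSize)); }
    // Pack two 32-bit cell coordinates into one 64-bit hash key.
    static int64_t Pack(int cx, int cy) {
        return (static_cast<int64_t>(cx) << 32) ^ static_cast<uint32_t>(cy);
    }

    float CellSize;
    std::unordered_map<int64_t, std::vector<int>> Cells;
};
```

Choosing the cell size equal to the perception radius is what makes the 3×3 scan sufficient: any neighbour within the radius is guaranteed to fall inside that block.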
Eliminating Draw Calls with Instanced Static Mesh
Even with fast neighbour queries, 20,000 individual actors each issuing their own draw call would
saturate the render thread. The solution: one UInstancedStaticMeshComponent
owned by a single manager actor. Each boid is a transform entry in the ISM, not a spawned actor.
Every tick, the manager updates instance transforms in a single batch call. The GPU sees one draw call for all 20,000 meshes.
Parallelising the Simulation with ParallelFor
Unreal's ParallelFor distributes loop iterations across available worker threads
with minimal overhead. Because each boid's steering only reads neighbour data (never writes),
there are no data races. The grid is rebuilt at the start of each tick then treated as read-only
during steering calculation.
Writing to the grid from inside ParallelFor would cause race conditions. Keeping the rebuild (serial) and the steering pass (parallel) as two distinct stages eliminates the need for any locks.
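The two-phase tick can be sketched in portable C++ with a hand-rolled stand-in for Unreal's ParallelFor (UE's real version schedules work through the task graph; the strided split below is only illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Minimal stand-in for Unreal's ParallelFor: split [0, count) across threads.
void ParallelForSketch(int count, const std::function<void(int)>& body) {
    const unsigned numThreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([=, &body] {
            // Each worker takes a strided slice of the index range.
            for (int i = static_cast<int>(t); i < count; i += static_cast<int>(numThreads))
                body(i);
        });
    }
    for (auto& w : workers) w.join();
}

// Two-phase tick: serial grid rebuild, then lock-free parallel steering.
void TickSketch(const std::vector<float>& positions, std::vector<float>& steering) {
    // Phase 1 (serial): rebuild the spatial grid — the only write to shared state.
    // ... grid.Clear(); grid.Insert(...) for every boid ...

    // Phase 2 (parallel): the grid is read-only; each iteration writes
    // only its own output slot, so no locks are needed.
    ParallelForSketch(static_cast<int>(positions.size()), [&](int i) {
        steering[i] = positions[i] * 0.5f;  // placeholder steering computation
    });
}
```

The no-lock guarantee comes from the write pattern, not from any synchronisation primitive: during phase 2, iteration i is the only writer of `steering[i]`, and everything else is read-only.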
Results
On a mid-range system (RTX 3060, Ryzen 5 5600X), the final implementation achieves:
- 500 agents (naive): ~45 fps — barely playable
- 5,000 agents (grid + ISM): ~72 fps
- 20,000 agents (grid + ISM + ParallelFor): ~62 fps
The emergent behaviour — murmurations, splitting flocks, collision avoidance around obstacles — is visually convincing at 20k and completely breaks down in interesting ways when you tweak the perception radius or weight ratios live in-editor. That real-time tunability was one of the most satisfying parts of the project.
Takeaways for Production Code
- Fix the algorithm first. Hardware can't save O(n²).
- ISM is the right tool whenever you have many identical meshes with different transforms.
- ParallelFor in UE is trivial to adopt — the hardest part is ensuring read/write separation.
- Spatial partitioning generalises: the same grid structure powers AI perception, projectile queries, and area-of-effect checks in full production titles.
The full source is available on GitHub if you want to dig into the implementation details or adapt it for your own project.
