When I started building the Boids simulation project, I had one goal: push as many autonomous actors as possible while maintaining 60 fps on a mid-range GPU. The naive approach — one AActor per boid — collapsed at around 500 agents. Here's how spatial partitioning and Instanced Static Mesh rendering got me to 20,000 agents running in real-time.
The Naive Approach — And Why It Dies
Craig Reynolds' classic Boids algorithm is elegant: every agent looks at its neighbours and applies three forces —
separation, alignment, and cohesion.
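The three forces can be sketched in a few lines of plain C++. This is an illustrative stand-alone version, not the project's actual code — the `Vec2`, `Boid`, and `Steer` names and the force weights are all placeholders:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal 2D vector type for the sketch.
struct Vec2 {
    float x = 0, y = 0;
    Vec2 operator+(Vec2 o) const { return {x + o.x, y + o.y}; }
    Vec2 operator-(Vec2 o) const { return {x - o.x, y - o.y}; }
    Vec2 operator*(float s) const { return {x * s, y * s}; }
    Vec2 operator/(float s) const { return {x / s, y / s}; }
};

struct Boid { Vec2 pos, vel; };

// Combined steering force for one boid, given its perceived neighbours.
Vec2 Steer(const Boid& self, const std::vector<Boid>& neighbours) {
    if (neighbours.empty()) return {};
    Vec2 sep{}, avgVel{}, centre{};
    for (const Boid& n : neighbours) {
        sep = sep + (self.pos - n.pos);   // separation: push away from each neighbour
        avgVel = avgVel + n.vel;          // alignment: accumulate neighbour headings
        centre = centre + n.pos;          // cohesion: accumulate neighbour positions
    }
    const float k = static_cast<float>(neighbours.size());
    const Vec2 alignment = avgVel / k - self.vel;  // steer toward the mean heading
    const Vec2 cohesion  = centre / k - self.pos;  // steer toward the local centre of mass
    // Weights are live-tunable parameters; 1.5/1.0/1.0 are arbitrary placeholders.
    return sep * 1.5f + alignment * 1.0f + cohesion * 1.0f;
}
```

The interesting part is that the algorithm itself is cheap — the cost is entirely in finding `neighbours`, which is what the rest of this post is about.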
Spawning 500 AActors, each ticking and querying SweepMultiByChannel
on every frame, turns the Physics thread into a bottleneck almost immediately. At 1,000 agents the frame
budget is already blown.
Spatial Partitioning with a Uniform Grid
The fix is a uniform spatial grid. The world is subdivided into fixed-size cells. Each boid registers itself into whichever cell its current position maps to. Neighbour queries then only check the 3×3 (or 3×3×3 in 3D) block of surrounding cells — reducing the average lookup from O(n) per agent to O(k) where k is the average density per cell, typically a handful of boids.
With the grid in place, overall complexity drops to roughly O(n·k). Because cell size is chosen to match the boid's perception radius, k stays nearly constant regardless of total population, so the whole simulation step effectively becomes O(n).
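A minimal version of such a grid fits in one small class. The sketch below uses a hash map keyed by packed integer cell coordinates; the class and method names are illustrative, not taken from the project source:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Uniform grid: positions hash to fixed-size cells, and a neighbour query
// scans only the 3x3 block of cells around the querying position.
class UniformGrid {
public:
    explicit UniformGrid(float cellSize) : CellSize(cellSize) {}

    void Clear() { Cells.clear(); }

    void Insert(int boidIndex, float x, float y) {
        Cells[Pack(CellCoord(x), CellCoord(y))].push_back(boidIndex);
    }

    // Gather boid indices from the 3x3 block of cells around (x, y).
    std::vector<int> QueryNeighbours(float x, float y) const {
        std::vector<int> result;
        const int cx = CellCoord(x), cy = CellCoord(y);
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy) {
                auto it = Cells.find(Pack(cx + dx, cy + dy));
                if (it != Cells.end())
                    result.insert(result.end(), it->second.begin(), it->second.end());
            }
        return result;
    }

private:
    int CellCoord(float v) const { return static_cast<int>(std::floor(v / CellSize)); }
    // Pack two 32-bit cell coordinates into one 64-bit hash key.
    static int64_t Pack(int cx, int cy) {
        return (static_cast<int64_t>(cx) << 32) ^ static_cast<uint32_t>(cy);
    }

    float CellSize;
    std::unordered_map<int64_t, std::vector<int>> Cells;
};
```

Choosing the cell size equal to the perception radius is what makes the 3×3 scan sufficient: any neighbour within the radius is guaranteed to fall inside that block.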
Eliminating Draw Calls with Instanced Static Mesh
Even with fast neighbour queries, 20,000 individual actors each issuing their own draw call would
saturate the render thread. The solution: one UInstancedStaticMeshComponent
owned by a single manager actor. Each boid is a transform entry in the ISM, not a spawned actor.
Every tick, the manager updates instance transforms in a single batch call. The GPU sees one draw call for all 20,000 meshes.
Parallelising the Simulation with ParallelFor
Unreal's ParallelFor distributes loop iterations across available worker threads
with minimal overhead. Because each boid's steering only reads neighbour data (never writes),
there are no data races. The grid is rebuilt at the start of each tick then treated as read-only
during steering calculation.
Writing to the grid from inside ParallelFor would cause race conditions. Keeping the rebuild (serial) and the steering pass (parallel) as two distinct stages eliminates the need for any locks.
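The two-phase tick can be sketched in portable C++ with a hand-rolled stand-in for Unreal's ParallelFor (UE's real version schedules work through the task graph; the strided split below is only illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Minimal stand-in for Unreal's ParallelFor: split [0, count) across threads.
void ParallelForSketch(int count, const std::function<void(int)>& body) {
    const unsigned numThreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([=, &body] {
            // Each worker takes a strided slice of the index range.
            for (int i = static_cast<int>(t); i < count; i += static_cast<int>(numThreads))
                body(i);
        });
    }
    for (auto& w : workers) w.join();
}

// Two-phase tick: serial grid rebuild, then lock-free parallel steering.
void TickSketch(const std::vector<float>& positions, std::vector<float>& steering) {
    // Phase 1 (serial): rebuild the spatial grid — the only write to shared state.
    // ... grid.Clear(); grid.Insert(...) for every boid ...

    // Phase 2 (parallel): the grid is read-only; each iteration writes
    // only its own output slot, so no locks are needed.
    ParallelForSketch(static_cast<int>(positions.size()), [&](int i) {
        steering[i] = positions[i] * 0.5f;  // placeholder steering computation
    });
}
```

The no-lock guarantee comes from the write pattern, not from any synchronisation primitive: during phase 2, iteration i is the only writer of `steering[i]`, and everything else is read-only.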
Results
On a mid-range system (RTX 3060, Ryzen 5 5600X), the final implementation achieves:
- 500 agents (naive): ~45 fps — barely playable
- 5,000 agents (grid + ISM): ~72 fps
- 20,000 agents (grid + ISM + ParallelFor): ~62 fps
The emergent behaviour — murmurations, splitting flocks, collision avoidance around obstacles — is visually convincing at 20k and completely breaks down in interesting ways when you tweak the perception radius or weight ratios live in-editor. That real-time tunability was one of the most satisfying parts of the project.
Takeaways for Production Code
- Fix the algorithm first. Hardware can't save O(n²).
- ISM is the right tool whenever you have many identical meshes with different transforms.
- ParallelFor in UE is trivial to adopt — the hardest part is ensuring read/write separation.
- Spatial partitioning generalises: the same grid structure powers AI perception, projectile queries, and area-of-effect checks in full production titles.
The full source is available on GitHub if you want to dig into the implementation details or adapt it for your own project.
