How a naive background loop over millions of in-memory objects caused latency spikes — and the Paced Monitor pattern we used to fix it.
for Loop Can Freeze Your Go Service
In backend engineering, the most dangerous code is often the simplest.
We recently encountered a classic scaling pitfall in one of our core services. The system required a background reconciliation loop — a process that periodically checks the health of millions of in-memory objects.
The solution seemed obvious: write a for loop to scan the map.
The result?
A service that periodically “froze” for hundreds of milliseconds, causing API timeouts and P99 latency spikes.
This is the story of how a harmless loop turned into a performance bottleneck — and the “Paced Monitor” pattern we used to fix it.
Imagine you're building:
To keep things fast, you store your objects (let’s call them Entities) in memory.
You need a background worker that:
Sounds simple.
The standard thread-safe pattern in Go:
    func (m *Manager) RunHealthCheck() {
        // 🔒 LOCK THE WORLD: the read lock is held for the entire scan
        m.store.mu.RLock()
        defer m.store.mu.RUnlock()
        for id, entity := range m.store.Entities {
            if !entity.IsHealthy() {
                m.repair(id)
            }
        }
    }
It works perfectly in unit tests with 10 or 100 items.
But working ≠ scaling.
Assume:
When the health check runs:
| Operation | Estimated Cost |
|---|---|
| Fetch pointer | ~10 ns |
| Check condition | ~2 ns |
| Access nested data | ~200 ns |
| Total per item (rounded up) | ~250 ns |
Now multiply:
1,600,000 × 250 ns ≈ 0.4 seconds
The loop takes ~400 milliseconds.
Sounds small?
It’s not.
The issue is this line:
m.store.mu.RLock()
We hold a global read lock for 400ms.
In a highly concurrent system, 400ms is an eternity.
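To make the cost concrete, here is a minimal sketch that reproduces the stall in isolation. `measureWriterWait` is a hypothetical helper written for this illustration, and the sleep simply stands in for the long scan; it is not the service's real code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// measureWriterWait holds a read lock for holdFor (standing in for the long
// scan) and reports how long a concurrent writer waits to acquire Lock().
func measureWriterWait(holdFor time.Duration) time.Duration {
	var mu sync.RWMutex
	readerHasLock := make(chan struct{})
	readerDone := make(chan struct{})

	// Simulated background scan: takes RLock and keeps it for holdFor.
	go func() {
		defer close(readerDone)
		mu.RLock()
		defer mu.RUnlock()
		close(readerHasLock)
		time.Sleep(holdFor)
	}()

	<-readerHasLock // make sure the scan owns the read lock first

	// Simulated user request that needs to write.
	start := time.Now()
	mu.Lock()
	waited := time.Since(start)
	mu.Unlock()

	<-readerDone
	return waited
}

func main() {
	waited := measureWriterWait(400 * time.Millisecond)
	fmt.Printf("writer waited %v for the scan to release its read lock\n", waited)
}
```

The writer cannot proceed until the simulated scan releases its read lock, so the measured wait tracks the scan duration almost exactly.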
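During that window, every writer that calls Lock() is blocked. And because Go's sync.RWMutex stops admitting new readers once a writer is waiting, ordinary RLock() calls queue up behind that writer too.
Now imagine scaling to 10 GB of RAM (~16 million items):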
16,000,000 × 250 ns ≈ 4 seconds
Your service freezes for 4 seconds.
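The arithmetic is easy to sanity-check in code. This is only a back-of-the-envelope sketch: `estimateScan` is a hypothetical helper, and the 250 ns per-item cost is the estimate from the table above, not a measurement.

```go
package main

import (
	"fmt"
	"time"
)

// estimateScan returns the rough time a full pass takes, which is also how
// long the lock is held when the whole loop runs under a single RLock.
func estimateScan(items int, perItem time.Duration) time.Duration {
	return time.Duration(items) * perItem
}

func main() {
	perItem := 250 * time.Nanosecond // per-item estimate from the table above
	for _, items := range []int{1_600_000, 16_000_000} {
		fmt.Printf("%d items: lock held for about %v\n", items, estimateScan(items, perItem))
	}
	// prints:
	// 1600000 items: lock held for about 400ms
	// 16000000 items: lock held for about 4s
}
```

The hold time grows linearly with the number of items, so a loop that is tolerable today becomes a multi-second stall as the dataset grows.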
This creates latency jitter:
We moved from:
Also known as:
Snapshot & Yield
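Instead of holding the lock during processing, we snapshot the IDs under one short lock, check each entity under its own brief lock, and yield between checks. The snapshot comes first: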
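    // GetEntityIDs copies the current keys while holding the read lock,
    // so the lock covers only the key copy, not the health logic.
    func (m *Manager) GetEntityIDs() []string {
        m.store.mu.RLock()
        defer m.store.mu.RUnlock()
        ids := make([]string, 0, len(m.store.Entities))
        for id := range m.store.Entities {
            ids = append(ids, id)
        }
        return ids
    }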
Copying strings is much cheaper than running full health logic inside the lock.
Now we process each entity individually.
We lock only for nanoseconds:
    func (m *Manager) checkSingleEntity(id string) {
        // Hold the read lock only for the map lookup.
        m.store.mu.RLock()
        entity, ok := m.store.Entities[id]
        m.store.mu.RUnlock()
        if !ok {
            return // removed since the snapshot was taken
        }
        if !entity.IsHealthy() {
            m.repair(id)
        }
    }
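One consequence of working from a snapshot is that the ID list can go stale while the scan is running. The sketch below shows why the `!ok` branch matters. The `Entity`, `Store` and `Manager` definitions and the `checkStaleSnapshot` helper are hypothetical stand-ins, since the article does not show the real types.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical minimal types; the article does not show the real definitions.
type Entity struct{ healthy bool }

func (e *Entity) IsHealthy() bool { return e.healthy }

type Store struct {
	mu       sync.RWMutex
	Entities map[string]*Entity
}

type Manager struct {
	store    *Store
	repaired []string // records repair calls so the example is observable
}

func (m *Manager) repair(id string) { m.repaired = append(m.repaired, id) }

// checkSingleEntity mirrors the function above: lock only for the map lookup.
func (m *Manager) checkSingleEntity(id string) {
	m.store.mu.RLock()
	entity, ok := m.store.Entities[id]
	m.store.mu.RUnlock()
	if !ok {
		return // deleted since the snapshot was taken; nothing to do
	}
	if !entity.IsHealthy() {
		m.repair(id)
	}
}

// checkStaleSnapshot deletes an entity after the snapshot was taken and
// returns the IDs that were repaired during the scan.
func checkStaleSnapshot() []string {
	m := &Manager{store: &Store{Entities: map[string]*Entity{
		"a": {healthy: true},
		"b": {healthy: false},
		"c": {healthy: false},
	}}}
	snapshot := []string{"a", "b", "c"} // IDs captured earlier

	// A request deletes "c" after the snapshot but before the check reaches it.
	m.store.mu.Lock()
	delete(m.store.Entities, "c")
	m.store.mu.Unlock()

	for _, id := range snapshot {
		m.checkSingleEntity(id)
	}
	return m.repaired
}

func main() {
	fmt.Println(checkStaleSnapshot()) // prints [b]
}
```

Note that the entity is inspected after the lock is released, so this approach assumes each entity protects its own mutable state (or is effectively immutable). Entities added after the snapshot are simply picked up on the next pass.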
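After each small batch, we yield. Sleeping after every single item would add at least 1 ms × 1,600,000 ≈ 27 minutes per pass, so we pause once per batch instead; with 256 items per batch (an illustrative value, tune it for your workload) that is about 6,250 pauses, or roughly 6 seconds of added time:
    func (m *Manager) RunPacedHealthCheck() {
        const batchSize = 256 // illustrative; tune for your workload
        allIDs := m.GetEntityIDs()
        for i, id := range allIDs {
            m.checkSingleEntity(id)
            // PACING: after each batch, let user requests run
            if (i+1)%batchSize == 0 {
                time.Sleep(1 * time.Millisecond)
            }
        }
    }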
That tiny sleep:
We traded:
| Before | After |
|---|---|
| Fast scan (~0.4s) | Slow scan (5–10s) |
| Massive freeze | Zero user impact |
| High throughput | Stable latency |
| Terrible P99 | Smooth tail |
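For readers who want something to run, the pieces can be assembled into one self-contained program. This is a sketch under stated assumptions, not the service's actual implementation: the types, `runPacedCheck`, `newDemoStore`, `demoScan`, the batch size and the pause length are all hypothetical. It also adds one thing the snippets above do not have, a `context.Context`, so that a slow background scan can be stopped promptly at shutdown.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Hypothetical minimal types; the real definitions are not shown in the article.
type Entity struct{ healthy bool }

func (e *Entity) IsHealthy() bool { return e.healthy }

type Store struct {
	mu       sync.RWMutex
	Entities map[string]*Entity
}

// snapshotIDs copies the keys under one short read lock.
func (s *Store) snapshotIDs() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	ids := make([]string, 0, len(s.Entities))
	for id := range s.Entities {
		ids = append(ids, id)
	}
	return ids
}

// get holds the read lock only for a single map lookup.
func (s *Store) get(id string) (*Entity, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	e, ok := s.Entities[id]
	return e, ok
}

// runPacedCheck scans the store without holding the lock across items and
// pauses after every batchSize items (batchSize must be greater than zero).
// It returns the number of unhealthy entities seen, plus the context error
// if shutdown was requested mid-scan.
func runPacedCheck(ctx context.Context, s *Store, batchSize int, pause time.Duration) (int, error) {
	unhealthy := 0
	for i, id := range s.snapshotIDs() {
		if e, ok := s.get(id); ok && !e.IsHealthy() {
			unhealthy++ // a real service would call its repair logic here
		}
		if (i+1)%batchSize != 0 {
			continue
		}
		// End of a batch: yield, but stop promptly if shutdown is requested.
		select {
		case <-ctx.Done():
			return unhealthy, ctx.Err()
		case <-time.After(pause):
		}
	}
	return unhealthy, nil
}

// newDemoStore builds n entities where every tenth one is unhealthy.
func newDemoStore(n int) *Store {
	s := &Store{Entities: make(map[string]*Entity, n)}
	for i := 0; i < n; i++ {
		s.Entities[fmt.Sprintf("e%d", i)] = &Entity{healthy: i%10 != 0}
	}
	return s
}

// demoScan runs one pass over 1,000 entities, optionally with a context
// that has already been cancelled.
func demoScan(cancelFirst bool) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	return runPacedCheck(ctx, newDemoStore(1000), 100, time.Millisecond)
}

func main() {
	fmt.Println(demoScan(false)) // prints 100 <nil>
}
```

The batch size and the pause are the two tuning knobs: larger batches or shorter pauses finish the pass sooner, while smaller batches and longer pauses leave more room for user requests.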
In distributed systems:
Background tasks must be second-class citizens.
They should never compete with the user request path.
Even O(n) can destroy your system at scale.
Holding a global lock turns CPU time into system-wide pause time.
Sometimes slowing down background work improves overall system performance.
Users don’t care about average latency. They feel P99.
If your background job:
It’s not a loop.
It’s a self-inflicted denial-of-service attack.
You can call it:
But the principle is simple:
Do small work. Release locks quickly. Yield often. Protect the request path at all costs.
In backend engineering, the simplest code can be the most dangerous.
Sometimes the fix is not smarter algorithms.
Sometimes it’s just:
time.Sleep(1 * time.Millisecond)
And the discipline to respect concurrency.
Ayush
Last updated by Ayush on May 3, 2026, 09:53 AM IST
ayushvish6555@gmail.com