How a Simple `for` Loop Can Freeze Your Go Service

#golang #concurrency #backend #performance #systems-design

How a naive background loop over millions of in-memory objects caused latency spikes — and the Paced Monitor pattern we used to fix it.

In backend engineering, the most dangerous code is often the simplest.

We recently encountered a classic scaling pitfall in one of our core services. The system required a background reconciliation loop — a process that periodically checks the health of millions of in-memory objects.

The solution seemed obvious: write a for loop to scan the map.

The result?

A service that periodically “froze” for hundreds of milliseconds, causing API timeouts and P99 latency spikes.

This is the story of how a harmless loop turned into a performance bottleneck — and the “Paced Monitor” pattern we used to fix it.


The Scenario: Managing State in RAM

Imagine you're building:

  • A session manager
  • A job scheduler
  • A real-time game server
  • A high-performance control plane

To keep things fast, you store your objects (let’s call them Entities) in memory.

You need a background worker that:

  • Checks for timeouts
  • Repairs stale state
  • Cleans up inconsistent objects

Sounds simple.


The Naive Implementation

The standard thread-safe pattern in Go:

func (m *Manager) RunHealthCheck() {
    // 🔒 Read lock held for the ENTIRE scan
    m.store.mu.RLock()
    defer m.store.mu.RUnlock()

    for id, entity := range m.store.Entities {
        if !entity.IsHealthy() {
            m.repair(id)
        }
    }
}

It works perfectly in unit tests with 10 or 100 items.

But working ≠ scaling.


The Math: Why 1.6 Million Objects Hurt

Assume:

  • 1 GB RAM allocated for metadata
  • ~1.6 million entities fit in memory

When the health check runs:

  • It iterates over 1.6 million items

CPU Cost Per Iteration

Operation                      Estimated Cost
Fetch pointer                  ~10 ns
Check condition                ~2 ns
Access nested data             ~200 ns
Total per item (rounded up)    ~250 ns

Now multiply:

1,600,000 × 250 ns ≈ 0.4 seconds

The loop takes ~400 milliseconds.

Sounds small?

It’s not.


The Real Problem: Lock Contention

The issue is this line:

m.store.mu.RLock()

We hold a global read lock for 400ms.

In a highly concurrent system, 400ms is an eternity.

During that window:

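  • ❌ Writers block in Lock() until the scan releases its read lock
  • ❌ New readers queue behind any waiting writer (sync.RWMutex blocks new RLock() calls once a Lock() is pending)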
  • ❌ API calls stall
  • ❌ P99 latency spikes
  • ❌ Load balancers may time out

Now imagine scaling to 10GB RAM (~16 million items):

16,000,000 × 250 ns ≈ 4 seconds

Your service freezes for 4 seconds.

This creates latency jitter:

  • Fast one moment
  • Unresponsive the next

The Solution: The “Paced Monitor” Pattern

We moved from:

  • Eager Locking → Lock everything at once

To:

  • Lazy Locking → Lock only what we need, when we need it

Also known as:

Snapshot & Yield


Step 1: Snapshot the Keys

Instead of holding the lock during processing, we:

  1. Lock briefly
  2. Copy the list of keys
  3. Unlock immediately

func (m *Manager) GetEntityIDs() []string {
    m.store.mu.RLock()
    defer m.store.mu.RUnlock()

    ids := make([]string, 0, len(m.store.Entities))
    for id := range m.store.Entities {
        ids = append(ids, id)
    }
    return ids
}

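Copying the keys is much cheaper than running the full health logic inside the lock: in Go, each copied string is only a pointer and a length, not the underlying bytes. The snapshot still walks the whole map under the read lock, so the pause shrinks rather than disappears.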


Step 2: Fine-Grained Locking

Now we process each entity individually.

We hold the lock only for the duration of a single map lookup:

func (m *Manager) checkSingleEntity(id string) {
    m.store.mu.RLock()
    entity, ok := m.store.Entities[id]
    m.store.mu.RUnlock()

    if !ok {
        return
    }

    if !entity.IsHealthy() {
        m.repair(id)
    }
}
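
Steps 1 and 2 can be put together in a self-contained form. This is a sketch under stated assumptions: `Entity`, `Store`, `snapshotIDs`, and `needsRepair` are hypothetical stand-ins, since the service's real types are not shown here. It illustrates why the `ok` check matters: an entity can be deleted between the snapshot and the per-entity lookup.

```go
package main

import "sync"

// Entity and Store are hypothetical stand-ins for the service's real types.
type Entity struct {
	Healthy bool
}

type Store struct {
	mu       sync.RWMutex
	entities map[string]*Entity
}

// snapshotIDs copies the current keys under a short read lock.
func (s *Store) snapshotIDs() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()

	ids := make([]string, 0, len(s.entities))
	for id := range s.entities {
		ids = append(ids, id)
	}
	return ids
}

// needsRepair looks up one entity under the read lock and reports whether
// it exists and is unhealthy. IDs deleted since the snapshot return false.
func (s *Store) needsRepair(id string) bool {
	s.mu.RLock()
	e, ok := s.entities[id]
	unhealthy := ok && !e.Healthy // read the field while the lock is held
	s.mu.RUnlock()
	return unhealthy
}
```

One design choice differs from the snippet above: the health field is read while the lock is still held. That matters if writers mutate entities in place. Calling `IsHealthy()` after unlocking is safe only if entities are immutable or carry their own synchronization.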

Step 3: Yield to the Scheduler (The Secret Sauce)

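After each small batch of items, we yield:

func (m *Manager) RunPacedHealthCheck() {
    // Illustrative batch size (an assumption, tune for your workload):
    // 1.6M IDs / 256 ≈ 6,250 pauses of ~1 ms, i.e. a scan of several seconds.
    // Sleeping after every single item would stretch the scan to ~27 minutes.
    const batchSize = 256

    allIDs := m.GetEntityIDs()

    for i, id := range allIDs {
        m.checkSingleEntity(id)

        // PACING: Let user requests run
        if (i+1)%batchSize == 0 {
            time.Sleep(1 * time.Millisecond)
        }
    }
}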

That tiny sleep:

  • Allows waiting goroutines to acquire locks
  • Lets user requests "squeeze in"
  • Reduces tail latency
  • Prevents request starvation

The Trade-Off

We traded:

Before               After
Fast scan (~0.4s)    Slow scan (5–10s)
Massive freeze       Zero user impact
High throughput      Stable latency
Terrible P99         Smooth tail

In distributed systems:

Background tasks must be second-class citizens.

They should never compete with the user request path.


Key Takeaways

1️⃣ Big-O Is Not Enough

Even O(n) can destroy your system at scale.

2️⃣ Locks Amplify Latency

Holding a global lock turns CPU time into system-wide pause time.

3️⃣ Throughput vs Latency Is a Trade

Sometimes slowing down background work improves overall system performance.

4️⃣ Always Think in Tail Latency

Users don’t care about average latency. They feel P99.


The Principle

If your background job:

  • Iterates millions of items
  • Holds a shared lock
  • Runs periodically

It’s not a loop.

It’s a distributed denial-of-service against yourself.


The Pattern Name

You can call it:

  • Paced Monitor
  • Snapshot & Yield
  • Cooperative Background Processing
  • Latency-Friendly Reconciliation

But the principle is simple:

Do small work. Release locks quickly. Yield often. Protect the request path at all costs.


In backend engineering, the simplest code can be the most dangerous.

Sometimes the fix is not smarter algorithms.

Sometimes it’s just:

time.Sleep(1 * time.Millisecond)

And the discipline to respect concurrency.

Ayush

Last updated by Ayush on May 3, 2026, 09:53 AM IST

ayushvish6555@gmail.com