Async Survival: Defusing Event Loop Blocking
A single synchronous function can take down an entire Node.js service. Because all of your JavaScript runs on one thread, one CPU-bound loop doesn't just slow itself down. It also freezes health checks, stalls every concurrent request, and spikes latency across the board, all while CPU usage looks deceptively fine. In this guided lab, you'll step into an AI-native backend whose endpoint has started blocking the event loop under load, and you'll defuse it the way a senior engineer actually does. You'll instrument the loop with perf_hooks to measure the blocking instead of guessing at it, master the ordering rules that govern process.nextTick, microtasks, setImmediate, and timers, and fix a starvation bug that the wrong primitive quietly causes. Then you'll partition the offending CPU work and yield control back to the loop between chunks, watching event-loop lag improve. You'll walk away able to detect, diagnose, and dissolve event-loop blocking and know precisely when a problem has outgrown the single thread.
Challenge
Introduction to event loop blocking
Node.js can handle large amounts of concurrent I/O because it does not dedicate one operating-system thread to every request. But that model has a sharp edge: your JavaScript executes on one main thread. If one request handler spends 300 milliseconds in a synchronous CPU loop, it does not only make that one request slow. It also prevents timers, health checks, socket callbacks, and other requests from running.
In this lab, the expensive work is a similarity scorer. It loops over a batch of candidate vectors and computes scores synchronously. That is a reasonable shape for AI-native backend work: ranking retrieved context, scoring callback payloads, filtering candidates, post-processing model output, or applying local business rules after an LLM response. The specific math is less important than the operational failure mode: CPU-bound JavaScript monopolizes the event loop.
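The failure mode described above can be reproduced in a few lines. The following is a minimal sketch, not the lab's scorer: `blockFor` and `measureTimerLateness` are hypothetical helper names, and the busy-wait merely stands in for CPU-bound scoring. It schedules a short timer, blocks the main thread, and reports how late the timer fired.

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical stand-in for CPU-bound scoring: spin for roughly `ms` milliseconds.
function blockFor(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // Nothing else on the main thread can run while this loop spins.
  }
}

// Schedule a timer, then block the thread, and resolve with how many
// milliseconds past its requested delay the timer actually fired.
function measureTimerLateness(timerMs, blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = performance.now();
    setTimeout(() => {
      const elapsed = performance.now() - scheduledAt;
      resolve(elapsed - timerMs);
    }, timerMs);
    blockFor(blockMs); // runs to completion before the timer callback can run
  });
}

measureTimerLateness(10, 300).then((lateMs) => {
  console.log(`10 ms timer fired about ${lateMs.toFixed(0)} ms late`);
});
```

The timer stands in for anything else the process needs to do, such as answering a health check: it cannot run until the synchronous loop returns.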
In a terminal, start the service:

`npm start`

In a second terminal, run the load driver:

`npm run load`

At the beginning of the lab, the numbers will not be very helpful because the metrics functions are still stubs. By the end of the lab, the same load driver will show a clear difference between blocking and chunked execution:

`MODE=blocking npm run load`

`MODE=chunked npm run load`

Info: If you get stuck at any point, the `solutions/` folder contains the completed code for every task.
Challenge
Detect and measure the blocking
Before you can fix event-loop blocking, you need to prove that it is happening. CPU usage by itself can be misleading: a single blocked Node process may show only one busy core on a large machine, while the service is still unable to answer health checks.
In this step, you will instrument the loop with Node's `perf_hooks` APIs and capture a baseline you can compare against later.

### Task 2.1: Measure event loop delay

`monitorEventLoopDelay()` creates a histogram that samples how late the event loop is when it wakes up. If synchronous JavaScript monopolizes the thread, the histogram records that delay. The values are stored in nanoseconds, so you will convert them to milliseconds before reporting them.

In this task, you will build the loop-delay monitor and summarize its readings.

### Task 2.2: Add event loop utilization sampling
Delay tells you the loop woke up late. Event loop utilization helps corroborate why: it estimates how much time the loop spent active instead of idle.
For this lab, you will sample utilization as a delta between two points in time so the load driver can report what happened during the recent interval.

### Task 2.3: Capture a baseline snapshot
Now that the raw metrics work, capture a small "before" snapshot and persist it to `outputs/`. This gives you a concrete artifact to compare against after the blocking scorer is refactored.
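The measurements in this step can be sketched as follows. This is an illustration under stated assumptions, not the lab's solution: `startDelayMonitor`, `summarizeDelay`, `createUtilizationSampler`, `blockFor`, and `demo` are hypothetical names, and the 10 ms resolution and the choice of summary fields are arbitrary. Only `monitorEventLoopDelay()` and `performance.eventLoopUtilization()` come from `perf_hooks`.

```javascript
const { monitorEventLoopDelay, performance } = require('node:perf_hooks');

const NS_PER_MS = 1e6; // histogram values are reported in nanoseconds

// Start sampling event loop delay (resolution is in milliseconds).
function startDelayMonitor(resolutionMs = 10) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return histogram;
}

// Convert the histogram's nanosecond readings to milliseconds.
function summarizeDelay(histogram) {
  return {
    meanMs: histogram.mean / NS_PER_MS,
    p99Ms: histogram.percentile(99) / NS_PER_MS,
    maxMs: histogram.max / NS_PER_MS,
  };
}

// Return a function that reports utilization since the previous call.
function createUtilizationSampler() {
  let previous = performance.eventLoopUtilization();
  return function sample() {
    const current = performance.eventLoopUtilization();
    // With two arguments, the delta between the two readings is returned.
    const delta = performance.eventLoopUtilization(current, previous);
    previous = current;
    return delta.utilization;
  };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for CPU-bound work.
function blockFor(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) { /* spin */ }
}

async function demo() {
  const histogram = startDelayMonitor();
  const sampleUtilization = createUtilizationSampler();
  await sleep(50);  // let the monitor take a few samples first
  blockFor(200);    // simulated blocking work
  await sleep(50);  // let the monitor observe the late wake-up
  histogram.disable();
  return { delay: summarizeDelay(histogram), utilization: sampleUtilization() };
}

demo().then((snapshot) => console.log(JSON.stringify(snapshot, null, 2)));
```

The object returned by `demo()` is plain data, so a baseline can be kept by serializing it with `JSON.stringify` and writing it to a file for later comparison.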
Challenge
Task scheduling and ordering
Not every asynchronous-looking primitive yields to the event loop in the same way. `process.nextTick()` and microtasks run before the loop moves on to timers or I/O callbacks. That makes them useful for very small follow-up work, but dangerous for recursive CPU work. A loop that keeps scheduling more next-tick or microtask callbacks can starve the rest of the service while still looking "async" in code review.

### Task 3.1: Verify scheduling order

You will schedule several callback types from inside an I/O callback and record the order in which they run. Scheduling from an I/O callback is deliberate: `setImmediate()` vs. `setTimeout(0)` can be platform-sensitive from top-level code, but the order is deterministic from the poll phase.

### Task 3.2: Fix next-tick starvation

The file includes `runNextTickStarvation()`, which recursively schedules CPU work with `process.nextTick()`. That function is deferred, but it does not let the event loop breathe. You will replace that pattern with bounded chunks and a real loop yield.
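The ordering rules from Task 3.1 can be observed with a short script. This is a sketch, not the lab's test harness: `recordSchedulingOrder` is a hypothetical name, and `fs.stat` on the Node binary is used only as a convenient way to get into an I/O callback.

```javascript
const fs = require('node:fs');

// Schedule four callback types from inside an I/O callback and
// resolve with the order in which they actually ran.
function recordSchedulingOrder() {
  return new Promise((resolve) => {
    // The stat result is ignored; only the I/O callback context matters.
    fs.stat(process.execPath, () => {
      const order = [];
      setTimeout(() => {
        order.push('setTimeout');
        resolve(order); // the timer runs last, so the list is complete here
      }, 0);
      setImmediate(() => order.push('setImmediate'));
      Promise.resolve().then(() => order.push('promise microtask'));
      process.nextTick(() => order.push('nextTick'));
    });
  });
}

recordSchedulingOrder().then((order) => {
  // Expected: nextTick, promise microtask, setImmediate, setTimeout
  console.log(order.join(' -> '));
});
```

The next-tick queue and then the microtask queue drain as soon as the I/O callback returns. Only after that does the loop advance to the check phase, where `setImmediate()` callbacks run, and then wrap around to timers. That is why a callback that keeps re-queuing itself with `process.nextTick()` never lets the loop reach the later phases.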
Challenge
Defuse the CPU blocking and verify
The scorer still does all of its CPU work in one synchronous pass. In this step, you will apply the same technique from the previous step to the service workload: process a bounded amount of CPU work, yield to the event loop, and then resume. This approach does not make the CPU work disappear. It trades some throughput for much better responsiveness, which is often the right trade for request-serving code.

### Task 4.1: Chunk the similarity scorer

The blocking scorer in `src/workload.js` is intentionally left intact so you can compare behavior. Your job is to implement a chunked equivalent in `defuse.js` that processes the batch in bounded slices and yields to the event loop between them.

### Task 4.2: Route the endpoint to the chunked scorer

The HTTP service already passes a `mode` value into `scoreRequest()`. In this task, you will make `mode=chunked` select the yielding implementation. That lets the load driver compare the old and new behavior without changing the service code.

### Task 4.3: Decide when to escalate to a worker thread

Chunking is a mitigation, not magic. It improves responsiveness by sharing the main thread more fairly, but the CPU work still runs on the main thread. Once a job is large enough, the right answer is to move it to a worker thread or an external worker service. In this task, you will encode a simple decision rule.
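A chunk-and-yield scorer along the lines described in this step might look like the sketch below. Everything here is an assumption made for illustration: the dot-product scoring, the names `scoreBlocking`, `scoreChunked`, `yieldToLoop`, and `chooseStrategy`, the default chunk size, and the millisecond budgets are all hypothetical and are not taken from the lab's `src/workload.js` or `defuse.js`.

```javascript
// Yield to the event loop: the promise resolves in the check phase,
// after pending I/O callbacks have had a chance to run.
const yieldToLoop = () => new Promise((resolve) => setImmediate(resolve));

// Hypothetical similarity measure: a plain dot product.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Blocking reference: scores the whole batch in one synchronous pass.
function scoreBlocking(query, candidates) {
  return candidates.map((candidate) => dot(query, candidate));
}

// Chunked equivalent: scores a bounded slice, yields, then resumes.
async function scoreChunked(query, candidates, chunkSize = 256) {
  const scores = new Array(candidates.length);
  for (let start = 0; start < candidates.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, candidates.length);
    for (let i = start; i < end; i++) {
      scores[i] = dot(query, candidates[i]);
    }
    // Only yield when there is more work left to do.
    if (end < candidates.length) await yieldToLoop();
  }
  return scores;
}

// Hypothetical decision rule with illustrative budgets: small jobs run inline,
// medium jobs are chunked, and large jobs are handed off to a worker.
function chooseStrategy(estimatedCpuMs, { inlineBudgetMs = 5, chunkedBudgetMs = 200 } = {}) {
  if (estimatedCpuMs <= inlineBudgetMs) return 'inline';
  if (estimatedCpuMs <= chunkedBudgetMs) return 'chunked';
  return 'worker';
}

const query = [0.1, 0.2, 0.3];
const candidates = Array.from({ length: 1000 }, (_, i) => [i, i + 1, i + 2]);
scoreChunked(query, candidates).then((scores) => {
  console.log(`scored ${scores.length} candidates, strategy for 350 ms: ${chooseStrategy(350)}`);
});
```

Yielding with `setImmediate()` rather than `process.nextTick()` or a resolved promise matters here: only the former lets the loop pass through its I/O phase between chunks. The chunk size controls the trade-off, since smaller chunks improve responsiveness but add scheduling overhead.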