Resilient Concurrency and Rate-Limiting for LLM Callbacks
In this Code Lab, you'll build a rate-limited concurrency system for handling thousands of LLM API callbacks. You'll implement queue-based concurrency control, handle rate-limit responses, manage asynchronous arrays efficiently, and maintain system resilience. When finished, you'll have production-ready patterns for safely orchestrating high-volume LLM operations.
Introduction
Welcome to the Resilient Concurrency and Rate-Limiting for LLM Callbacks Code Lab.
In this lab, you will build the orchestration layer that lets a backend safely send a large batch of LLM callbacks to a rate-limited endpoint: capping how many run at once with a promise pool, backing off when the service pushes back with a 429, retrying the failures worth retrying, and settling the whole batch into a clean success-and-failure report instead of crashing on the first rejection.
About the tools and concepts
- Rate limit: A rate limit is the endpoint's contract: it accepts only so many requests per unit of time and rejects the overflow with an HTTP 429 Too Many Requests, often alongside a hint for how long to wait. Sending an unbounded burst at such an endpoint is the fast path to a wall of 429s and wasted work.
- Avoiding API starvation is the goal. When a naive job sends every request at once, the endpoint rejects almost all of them and useful throughput collapses toward zero: the work starves even though the service is available.
- A promise pool prevents this by capping how many callbacks are in flight at the same instant. Rather than hand-rolling a semaphore, production code reaches for a small, battle-tested library (here, p-limit). You create a limiter with a fixed concurrency, then wrap each unit of work in it: the limiter runs up to the ceiling immediately and holds the rest in an internal queue, releasing a queued caller the moment a running one finishes. This keeps offered load near the endpoint's sustainable rate instead of spiking far above it.
- p-limit also exposes live counters, activeCount for callbacks executing right now and pendingCount for those waiting in its queue, so you can observe backpressure as the batch runs.
- Backoff strategy: Capped exponential backoff is the standard answer to a 429. Each successive retry waits roughly twice as long as the last, so a struggling endpoint gets exponentially more breathing room, but the delay is capped at a ceiling so late retries stay bounded instead of ballooning into minutes. Adding jitter, a random offset on top of the computed delay, keeps a fleet of retrying callers from resynchronizing into a new thundering herd.
- Promise.allSettled is the array primitive for managing massive asynchronous arrays resiliently. Unlike Promise.all, which rejects the moment any single promise rejects, allSettled waits for every promise to finish and reports each outcome as either fulfilled with a value or rejected with a reason, so one failed record never costs you the results of the other ninety-nine.
- A typed error lets the retry layer make decisions. By throwing a dedicated RateLimitError instead of a generic Error, the code that catches it can retry rate-limit rejections specifically while letting genuinely broken requests fail fast.
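The starvation effect described in the list above can be made concrete with a toy model. The sketch below assumes a fixed one-second window that admits 10 requests; the window style, the numbers, and the helper names (createFixedWindowLimiter, countAccepted) are illustrative assumptions, and a real endpoint may count requests differently. It compares a burst of 100 requests sent at the same instant with the same 100 requests paced at one every 100 ms.

```javascript
// Toy fixed-window rate limiter: admits at most `limit` requests per window.
// Hypothetical helper for illustration only; not the lab's mock server.
function createFixedWindowLimiter(limit, windowMs) {
  let windowStart = -Infinity;
  let used = 0;
  return function accept(nowMs) {
    if (nowMs - windowStart >= windowMs) {
      windowStart = nowMs; // a new window opens at this request
      used = 0;
    }
    if (used < limit) {
      used += 1;
      return true; // the request would be served
    }
    return false; // the request would get a 429
  };
}

// Count how many of the given send times (in ms) are accepted.
function countAccepted(sendTimesMs, limit = 10, windowMs = 1000) {
  const accept = createFixedWindowLimiter(limit, windowMs);
  return sendTimesMs.filter((t) => accept(t)).length;
}

// Naive burst: all 100 requests at t = 0.
const burst = Array.from({ length: 100 }, () => 0);
// Paced: one request every 100 ms, i.e. 10 per second.
const paced = Array.from({ length: 100 }, (_, i) => i * 100);

console.log(countAccepted(burst)); // 10 (the other 90 are rejected)
console.log(countAccepted(paced)); // 100
```

Under these assumptions the burst gets only a tenth of its work done on the first pass, while the paced send loses nothing, which is the gap a promise pool is meant to close.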
Prerequisites
Before starting this lab, you should have:
- Understanding of promises and async/await: composing, awaiting, and resolving promises
- Familiarity with arrays and array methods: .map, .filter, and .reduce
- Knowledge of API rate limits and backoff strategies: what a 429 means and why retries need delay
- Basic understanding of the event loop and concurrency: what runs in parallel versus what merely interleaves
- Experience with HTTP requests and callbacks: sending a request and handling its response
The lab environment is ready to use, and the project dependencies are already installed. Run node --version from inside the workspace folder at any time to confirm the runtime.
The scenario
You are a backend engineer at CarvedRock building a system that processes customer data records requiring LLM transformation. The LLM service enforces a rate limit of 10 requests per second, and your application has 100 pending requests to process. Naive parallel execution exhausts the rate limit and triggers 429 responses.
Your task is to implement queue-based concurrency control that respects the rate limit, handle 429 responses with exponential backoff, manage the asynchronous request array efficiently, and maintain system reliability so every record is accounted for.
The application structure
Key files in the lab environment
- workspace/src/config.js: the shared knobs (concurrency ceiling, retry budget, base and maximum backoff delay) every module reads from one place
- workspace/src/mockLlmServer.js: the mock endpoint that enforces the rate limit and returns 429s with instant responses; treat it as a black box
- workspace/src/llmClient.js: the client wrapper that calls the endpoint and raises a typed RateLimitError on a 429
- workspace/src/backoff.js: the capped exponential-backoff delay with jitter
- workspace/src/processRecord.js: the per-record retry loop that ties the client, backoff, and retry budget together
- workspace/src/batchProcessor.js: the orchestrator that creates the p-limit pool, gates every record through it, monitors backpressure, and settles the results
- workspace/src/logger.js: the shared stage logger
- workspace/runPipeline.js: the end-to-end runner that dispatches all 100 records and prints the summary
- workspace/data/records.js: the 100 pending records
Complete the tasks in order. Each task builds on the previous one.
Run the full workload from the workspace directory at any point with:
node runPipeline.js
Establishing the client and the rate-limit contract
Setting the system's limits in one place
Every resilient batch starts from a handful of numbers: how many callbacks may run at once, how many times a single record may try before giving up, and how the backoff delay grows and where it stops.
Centralizing these in config.js means the pool, the retry loop, and the backoff function all read from one source of truth, and tuning the system later is a one-line change rather than a hunt across modules.
The values you set here matter: a concurrency ceiling that sits just under the endpoint's rate limit keeps the pipeline busy without tripping it, and a retry budget large enough to outlast the queue keeps records from failing before their turn comes.
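As a rough sketch of what such a shared config might hold: MAX_CONCURRENCY, MAX_RETRIES, and MAX_DELAY_MS are the names this lab refers to, while BASE_DELAY_MS and every value shown are assumptions chosen only to illustrate the shape. In the lab file the object would also be exported so other modules can read it.

```javascript
// Sketch of the shared limits. All values are illustrative assumptions,
// and BASE_DELAY_MS is an assumed name for the starting backoff delay.
const config = Object.freeze({
  MAX_CONCURRENCY: 8, // in-flight ceiling, just under the endpoint's 10 requests per second
  MAX_RETRIES: 10, // retry budget per record before it is reported as failed
  BASE_DELAY_MS: 100, // first backoff delay; later delays double from here
  MAX_DELAY_MS: 2000, // ceiling on any single computed backoff delay
});

console.log(config.MAX_CONCURRENCY); // 8
```

Freezing the object is a small design choice: it keeps a module from quietly changing a limit at runtime that every other module depends on.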
Calling the endpoint and naming the rate-limit failure
The client wrapper is the single point where your code touches the model service, so it is also the right place to translate the endpoint's response into something the retry layer understands. A 429 is not a normal error: it is a "retry me later" signal, and the rest of the system needs to recognize it as distinct from a malformed request or a broken record.
Raising a dedicated RateLimitError, rather than a generic one, lets the retry loop catch rate-limit rejections specifically and back off, while letting other failures surface immediately.
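A minimal sketch of that translation step follows. RateLimitError is the name the lab uses; the callLlm function, the injected send transport, the response shape ({ status, body, retryAfterMs }), and the retryAfterMs field are all hypothetical, chosen so the example runs without a network or the lab's mock server.

```javascript
// Typed error so callers can tell "retry later" apart from "broken request".
class RateLimitError extends Error {
  constructor(message, retryAfterMs = null) {
    super(message);
    this.name = "RateLimitError";
    this.retryAfterMs = retryAfterMs; // hypothetical field: the endpoint's wait hint, if any
  }
}

// `send` is an injected transport (an assumption of this sketch) that
// resolves to an object shaped like { status, body, retryAfterMs }.
async function callLlm(record, send) {
  const response = await send(record);
  if (response.status === 429) {
    // Rate limited: raise the typed error the retry layer looks for.
    throw new RateLimitError("rate limited", response.retryAfterMs ?? null);
  }
  if (response.status < 200 || response.status >= 300) {
    // Any other non-2xx status is treated as a non-retryable failure.
    throw new Error(`request failed with status ${response.status}`);
  }
  return response.body;
}
```

A caller can then branch on err instanceof RateLimitError instead of parsing status codes or message strings in several places.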
Controlling concurrency with a promise pool
Capping in-flight work with p-limit
The pool is the mechanism that keeps the workload from overwhelming the endpoint. Instead of hand-rolling a semaphore, you use p-limit, the library advanced teams actually reach for in production: it is small, well-tested, and removes the queue bookkeeping you would otherwise own and have to maintain.
You create a limiter bound to the concurrency ceiling, then gate every record's processing through it. p-limit runs up to the ceiling immediately and parks the overflow in an internal queue, admitting the next queued caller the instant a running one finishes, so the number of callbacks in flight stays pinned at the ceiling until the queue drains.
Watching the queue and the in-flight load
A pool you cannot see into is hard to operate. p-limit exposes two live counters: activeCount, the callbacks executing at this instant, and pendingCount, the callbacks waiting in its internal queue.
Reading them as the batch runs turns the pool from a black box into an observable system: you can watch the active count sit pinned at the ceiling while the pending count drains toward zero, which is exactly the backpressure signal an operator needs to confirm the limiter is doing its job and to reason about whether the ceiling is set correctly.
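To make the queueing and the two counters concrete without depending on an installed package, the sketch below hand-rolls a tiny stand-in limiter that mimics the surface described above: a function you wrap work in, plus activeCount and pendingCount. It is an illustration of the behavior, not p-limit's implementation; in the lab you would create the limiter by calling p-limit with the concurrency ceiling and read the same two counters from it. The names createLimiter and runDemo are hypothetical.

```javascript
// Stand-in limiter for illustration only; use p-limit itself in the lab.
function createLimiter(concurrency) {
  const queue = [];
  let active = 0;

  const next = () => {
    if (active >= concurrency || queue.length === 0) return;
    const { fn, resolve, reject } = queue.shift();
    active += 1;
    Promise.resolve()
      .then(fn)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next(); // admit the next queued caller as soon as a slot frees up
      });
  };

  const limit = (fn) =>
    new Promise((resolve, reject) => {
      queue.push({ fn, resolve, reject });
      next();
    });

  // Live, read-only counters in the style the lab describes.
  Object.defineProperties(limit, {
    activeCount: { get: () => active },
    pendingCount: { get: () => queue.length },
  });
  return limit;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runDemo() {
  const limit = createLimiter(3);
  let peak = 0;
  const tasks = Array.from({ length: 10 }, (_, i) =>
    limit(async () => {
      peak = Math.max(peak, limit.activeCount); // highest in-flight count seen
      await sleep(5);
      return i;
    })
  );
  // Snapshot right after dispatch: the ceiling is full, the rest are queued.
  const snapshot = { active: limit.activeCount, queued: limit.pendingCount };
  const results = await Promise.all(tasks);
  return { snapshot, peak, results };
}

runDemo().then(({ snapshot, peak }) => {
  console.log(snapshot, peak); // { active: 3, queued: 7 } 3
});
```

With ten tasks and a ceiling of three, the snapshot taken right after dispatch shows three running and seven waiting, and the in-flight count never rises above three.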
Handling rate limits with exponential backoff
Waiting longer each time, capped, with jitter
Even a well-tuned concurrency ceiling will occasionally draw a 429: bursts overlap, windows roll, and the endpoint pushes back. The right response is to wait, and to wait progressively longer on each successive attempt so a struggling service gets exponentially more room to recover.
Doubling the delay per attempt is the standard curve, but unbounded doubling quickly produces excessive waits, so you cap the delay at a ceiling.
On its own, a fixed curve also makes every retrying caller wake at the same instant, recreating the burst that caused the 429. Adding jitter, a random offset layered on the computed delay, spreads those wakeups out so the retries arrive smoothly instead of in a synchronized wave.
Retrying the right failures and failing hard on the rest
Backoff is the "how" of a retry; the retry loop is the "when" and the "how many".
A robust loop draws a sharp line between failures worth retrying and failures that should stop the record cold. A RateLimitError is transient: wait and try again. The loop treats anything else as non-retryable and re-throws it at once, because retrying a genuinely broken request just wastes the budget. And the budget is finite: once a record exhausts its attempts, the loop raises a clear, final error so the batch layer can record a clean failure instead of letting the record hang or silently vanish.
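The delay curve and the retry loop described in this section could be sketched together as follows. This is one possible shape under stated assumptions, not the lab's reference solution: the function signatures, the injected call and wait parameters, the jitter range (up to one base delay), and the reading of maxRetries as "retries after the first attempt" are all choices made for this example.

```javascript
class RateLimitError extends Error {}

// Capped exponential backoff with jitter. `attempt` is zero-based.
// `random` is injectable so the curve can be checked deterministically.
function backoffDelay(attempt, { baseMs, maxMs, random = Math.random }) {
  const exponential = baseMs * 2 ** attempt; // base, 2x base, 4x base, ...
  const capped = Math.min(exponential, maxMs); // never above the ceiling
  const jitter = Math.floor(random() * baseMs); // random offset layered on top
  return capped + jitter;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry only rate-limit rejections; re-throw everything else at once.
// `call` is the per-record operation (for example, the client wrapper).
async function processRecord(record, call, opts) {
  const { maxRetries, baseMs, maxMs, wait = sleep } = opts;
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      return await call(record);
    } catch (err) {
      if (!(err instanceof RateLimitError)) throw err; // non-retryable: fail fast
      if (attempt === maxRetries) break; // budget spent: stop waiting
      await wait(backoffDelay(attempt, { baseMs, maxMs }));
    }
  }
  // Clear, final error so the batch layer can record a clean failure.
  throw new Error(`record ${record.id} exhausted ${maxRetries} retries`);
}
```

With baseMs of 100, maxMs of 1000, and jitter switched off, successive delays come out as 100, 200, 400, 800, 1000, 1000 ms: the doubling is visible for the first few attempts and the cap takes over after that.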
Managing asynchronous arrays and resilience
Settling every callback instead of bailing on the first
With the pool, backoff, and retry loop in place, the array of pool-governed promises is already in flight. The choice of how you await that array is what makes the batch resilient.
Promise.all would abandon the entire run the instant any one record threw its final error, discarding the ninety-nine that succeeded alongside the one that failed. Promise.allSettled instead waits for all of them and reports each outcome independently, which is exactly the behavior a batch job needs when you expect a few records to fail.
Splitting the outcomes into success and failure
A settled array is a list of outcome objects, not results: each entry reports a status of either fulfilled or rejected, with the real value or the error tucked inside.
The final step reshapes that into the report a caller actually wants: the transformed values that succeeded and the reasons that failed, separated. Partitioning the settled array by status gives the runner a complete, honest picture of the batch: how many records made it through and exactly which ones did not, which is the difference between a job you can operate and one you can only guess at.
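The settle-then-partition step described above might look like the sketch below. The helper names (partitionSettled, settleBatch) and the three stand-in promises are hypothetical; in the lab the input would be the array of pool-governed promises.

```javascript
// Split a settled array into fulfilled values and rejection reasons.
function partitionSettled(settled) {
  const succeeded = [];
  const failed = [];
  for (const outcome of settled) {
    if (outcome.status === "fulfilled") succeeded.push(outcome.value);
    else failed.push(outcome.reason);
  }
  return { succeeded, failed };
}

// Wait for every promise to settle, then report both buckets.
async function settleBatch(promises) {
  const settled = await Promise.allSettled(promises);
  return partitionSettled(settled);
}

// Three stand-in records: two succeed, one fails with its final error.
settleBatch([
  Promise.resolve("record-1 ok"),
  Promise.reject(new Error("record-2 exhausted retries")),
  Promise.resolve("record-3 ok"),
]).then(({ succeeded, failed }) => {
  console.log(succeeded.length, failed.length); // 2 1
  console.log(failed[0].message); // record-2 exhausted retries
});
```

Swapping Promise.allSettled for Promise.all in settleBatch would make the same input reject with the record-2 error, and the two successful values would never reach the report.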
Run the full pipeline
Now that every task is complete, run the end-to-end workload to watch the orchestration layer absorb the rate limit and settle all 100 callbacks.
- Confirm the runtime is available:
  node --version
- Start the workload from the workspace directory against the full batch of 100 records:
  node runPipeline.js
- Watch the log stream print an [INIT] line as the dispatch begins, then a series of [POOL] lines reporting the live active and queued counts, and finally a [DONE] line reporting how many callbacks settled and how long the run took.
- Notice the [POOL] snapshots: the active count holds at the ceiling you set while the queued count falls steadily, visible proof the pool is pacing the work rather than sending it all at once.
- Confirm the final [DONE] summary reports succeeded: 100 and failed: 0. Every record made it through despite the endpoint's rate limit, because the pool held concurrency at the ceiling and capped backoff absorbed the 429s the endpoint returned. Because the endpoint responds instantly, the run's pace is set entirely by the rate limit and your backoff, so it settles in roughly ten to thirteen seconds.
Expected result: Every layer you built is visible in one run: the pool caps in-flight callbacks and exposes its queue, the client raises typed rate-limit errors, capped backoff spaces out the retries, the retry loop recovers the transient failures, and Promise.allSettled settles the whole batch into a clean 100 succeeded, 0 failed report instead of collapsing under a wall of 429s.
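As a back-of-the-envelope check on that run time, assume the endpoint admits at most R = 10 requests per second and all N = 100 records eventually succeed. The rate limit alone then puts a floor on the run of about:

```latex
T_{\min} \approx \frac{N}{R} = \frac{100\ \text{requests}}{10\ \text{requests/s}} = 10\ \text{s}
```

Time above that floor, up to the roughly thirteen seconds mentioned above, is presumably spent waiting out backoff delays, since the endpoint itself responds instantly.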
Conclusion
Congratulations on completing the Resilient Concurrency and Rate-Limiting for LLM Callbacks lab!
You have built the orchestration layer that turns a naive parallel burst into a production-grade batch: capping concurrency with a p-limit pool, backing off on rate limits, retrying the failures worth retrying, and settling every callback into a complete success-and-failure report. These are the patterns you need to safely orchestrate high-volume LLM operations.
What you have accomplished
- Set the concurrency, retry, and backoff limits: Defined the concurrency ceiling, retry budget, and backoff bounds once in a shared config every module reads from.
- Wired up the client and detected rate-limit responses: Routed every record through one client wrapper that raises a typed RateLimitError on a 429.
- Created the pool and gated the workload: Built a p-limit pool at the concurrency ceiling and routed every record through it so in-flight work stays pinned at the limit.
- Surfaced backpressure with the pool's live counters: Read activeCount and pendingCount to make the queue and in-flight load observable as the batch runs.
- Implemented capped exponential backoff: Waited a bounded, exponentially growing, jittered interval before each retry so a struggling endpoint recovers and retries never resynchronize.
- Added bounded retry and error recovery: Retried rate-limit failures within a budget, failed fast on non-retryable errors, and raised a clear final error on exhaustion.
- Awaited settlement of the whole batch: Used Promise.allSettled so one failure never discards the rest of the run.
- Partitioned results for a complete report: Split the settled array into succeeded values and failure reasons for an honest, operable summary.
Key takeaways
- A promise pool such as p-limit keeps offered load near the endpoint's sustainable rate, which avoids the API starvation a naive parallel burst causes and prevents far more 429s than any retry strategy can clean up after.
- The pool's activeCount and pendingCount counters turn concurrency into something you can observe and reason about in production, not just configure and hope.
- Capped exponential backoff with jitter is the standard pairing for rate-limit recovery: the exponential curve gives the endpoint room, the cap keeps late retries bounded, and the jitter stops a fleet of callers from retrying in lockstep.
- A typed error turns retry logic into a clear decision: retry the transient failures, fail fast on the rest.
- Promise.allSettled is the resilient way to await a batch: it reports every outcome instead of abandoning the run on the first rejection.
Experiment before you go
You still have time in the lab environment. Try these explorations:
- Lower MAX_CONCURRENCY and rerun the workload: watch the [POOL] active count drop and the total time climb as fewer callbacks run at once. Then raise it well above the endpoint's ceiling and watch the 429s and retries multiply.
- Lower MAX_RETRIES toward 5 and watch the failed bucket fill: with instant responses, a record that runs out of attempts before the queue clears fails, which is exactly the starvation the retry budget prevents.
- Raise MAX_DELAY_MS and observe how a higher cap lengthens the tail of the run as late retries wait longer.
- Add a [RETRY] log line inside the retry loop so the timeline shows each backoff as it happens, then watch where the retries cluster during a run.
- Explore p-limit's clearQueue() method: imagine a fatal condition partway through the batch and reason about how draining the pending queue would let you abort the remaining work cleanly.