Tom MacWright

tom@macwright.com

A mental model of what d3-queue does

A mental model of callback management challenges and what d3-queue does.

What it is and who it’s for

d3-queue is a library for JavaScript that makes it easier to work with multiple asynchronous functions that use callbacks. If you’re only dealing with one asynchronous function, or if you use Promises, you probably don’t need d3-queue. To manage multiple Promises, you can use Promise.all for a small number of promises, or p-all for concurrency control over large numbers of promises.

What d3-queue does

Let’s start with a reasonable expectation: that code works in the order you write it. For instance, if you wrote

a = 1
b = 2
a = b + 1

You would reasonably expect it to run line by line, in order. If, instead, the third line (a = b + 1) ran first, that would be unexpected.

For all kinds of synchronous code, this mental model works: the order you write things in is the order that they run. You can also think of it as each line waiting for the previous line to finish.

There’s another kind of code: asynchronous. Asynchronous code doesn’t run in order.

get('a.com', function() { console.log(1); });
get('b.com', function() { console.log(2); });
get('c.com', function() { console.log(3); });

This code might print 1, 2, 3, or 2, 1, 3, or 3, 2, 1, or any other permutation of the three numbers: it all depends on when the responses from a.com, b.com, and c.com arrive.
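To see this without a network, here is a small sketch in which get is a hypothetical stand-in: it waits a made-up, fixed delay per URL (using setTimeout) and then calls back. The delays are assumptions chosen so that the responses arrive in a different order from the requests.

```javascript
// Hypothetical stand-in for an HTTP request: `get` waits a made-up,
// fixed delay per URL, then calls back Node-style with (err, result).
var fakeLatency = { 'a.com': 30, 'b.com': 10, 'c.com': 20 };

function get(url, callback) {
  setTimeout(function () {
    callback(null, 'response from ' + url);
  }, fakeLatency[url]);
}

// Record the order in which the callbacks actually run.
var logged = [];

get('a.com', function () { logged.push(1); console.log(1); });
get('b.com', function () { logged.push(2); console.log(2); });
get('c.com', function () { logged.push(3); console.log(3); });

// All three requests start right away. With these delays the output is
// 2, 3, 1: the order the responses arrive, not the order the code was written.
```

With real servers the delays vary from run to run, which is why the order can change each time.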

This is a good thing in an important way: performance. If we instead waited for the response from a.com before asking for b.com, and waited for b.com before asking for c.com, the total time could be a lot longer. The asynchronous approach lets all 3 requests start at the same time, so the total time is closer to that of the slowest single request than to the sum of all three.

But to get that performance boost, asynchronous code gives up any guarantee about the order of results. It also makes it harder to answer the question of when you’re “all done”. You supply one callback that tells you when a.com is done, another for b.com, and another for c.com, but how do you know when all of them are done?

That’s callback management.

The three problems

There are three big problems that d3-queue solves for you. Those are:

  1. Completion: how do you know when work is done?
  2. Order: how do you keep track of which result belongs to which request?
  3. Concurrency: what happens when you do a lot of work all at the same time?
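Without a library, solving completion and order by hand takes some bookkeeping: a counter of outstanding requests, and a results array indexed by each request’s position. The sketch below assumes a hypothetical get(url, delay, callback) that simulates latency with setTimeout; it does nothing about concurrency, since every request starts immediately.

```javascript
// Hypothetical `get`: simulates a request with a made-up delay, then
// calls back Node-style with (err, result).
function get(url, delay, callback) {
  setTimeout(function () {
    callback(null, 'response from ' + url);
  }, delay);
}

var requests = [['a.com', 30], ['b.com', 10], ['c.com', 20]];
var results = new Array(requests.length); // Order: one slot per request
var remaining = requests.length;          // Completion: count down to zero
var failed = false;
var finalResults = null;

function allDone(err, values) {
  if (err) return console.error(err);
  finalResults = values;
  console.log(values); // a.com, b.com, c.com responses, in request order
}

// Concurrency: nothing limits it here; all three requests start at once.
requests.forEach(function (request, index) {
  get(request[0], request[1], function (err, res) {
    if (failed) return;          // an earlier error already ended things
    if (err) {
      failed = true;
      return allDone(err);
    }
    results[index] = res;        // store by position, not arrival time
    remaining -= 1;
    if (remaining === 0) allDone(null, results);
  });
});
```

Every program that fires off several callbacks needs some version of this counter-and-slots pattern, which is the repetition a library like d3-queue packages up.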

Visualizing callback management

Sync requests

Requests in series, where each request waits for the previous one to finish, are easy to manage, and that’s what you get by default in Python and some other languages. Doing requests in series doesn’t trigger these problems: you know when the work is complete, results come back in the same order you sent the requests, and the concurrency is always 1.

But that isn’t the JavaScript default. Because JavaScript aims for high performance and grew up as a language for interactive interfaces, operations like HTTP requests are async by default.
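For contrast, here is roughly what requests in series look like with callbacks in JavaScript: each request starts inside the previous one’s callback. This sketch assumes a hypothetical get(url, delay, callback) that simulates latency with setTimeout. It gets completion, order, and a concurrency of 1 for free, at the cost of waiting for each response in turn.

```javascript
// Hypothetical `get`: simulates a request with a made-up delay, then
// calls back Node-style with (err, result).
function get(url, delay, callback) {
  setTimeout(function () {
    callback(null, 'response from ' + url);
  }, delay);
}

var seriesResults = [];

// Each request starts only after the previous one has called back,
// so at most one request is ever active.
get('a.com', 30, function (err, a) {
  if (err) return console.error(err);
  seriesResults.push(a);
  get('b.com', 10, function (err, b) {
    if (err) return console.error(err);
    seriesResults.push(b);
    get('c.com', 20, function (err, c) {
      if (err) return console.error(err);
      seriesResults.push(c);
      // Completion is obvious: reaching this line means everything is done.
      console.log(seriesResults);
    });
  });
});
```

The simulated total here is about 30 + 10 + 20 = 60 ms, versus about 30 ms if all three started at once.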

Async requests

Async requests immediately ‘go out’, and the results come back whenever each one is independently ready.

  • Completion: it’s much harder to know when you’re done. The last request you sent might be the first to get a response.
  • Order: likewise, results come back in whatever order they please, not in the order you sent the requests.
  • Concurrency: all of the requests are active at the same time.

Without these easy guarantees, it’s really hard to write applications: if you want to do something with the results from A, B, C, it’s hard to tell when you have all of them, and hard to figure out which response is A.

d3-queue

These are the problems that d3-queue solves: you put requests into it, and you give it a single callback, through await or awaitAll, that fires when everything is done and receives the results in the same order you supplied the requests.

  • Completion: you know you’re done when the callback you gave to await or awaitAll is called.
  • Order: results come back in the order you expect: the order of the requests.
  • Concurrency: we haven’t gotten to that yet.
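To make that bookkeeping concrete, here is a simplified sketch of a queue with defer and awaitAll. It is not d3-queue’s actual source: createMiniQueue and everything in it are hypothetical names, concurrency is unlimited, and there is no await or abort. It does show the core idea: remember each task’s position, count finished tasks, and call the final callback once, either with results in request order or with the first error.

```javascript
// A simplified, hypothetical sketch of d3-queue-style bookkeeping.
// Not d3-queue's implementation: no concurrency limit, no await, no abort.
function createMiniQueue() {
  var tasks = [];
  return {
    // Store the function and its arguments; don't call anything yet.
    defer: function (fn) {
      var args = Array.prototype.slice.call(arguments, 1);
      tasks.push({ fn: fn, args: args });
      return this;
    },
    // Start every task, then call `callback` once with (err, results).
    awaitAll: function (callback) {
      var results = new Array(tasks.length);
      var remaining = tasks.length;
      var finished = false;
      if (remaining === 0) {
        callback(null, results);
        return this;
      }
      tasks.forEach(function (task, index) {
        var called = false;
        // The "bookkeeping" callback, appended as the task's last argument.
        task.fn.apply(null, task.args.concat(function (err, result) {
          if (called || finished) return; // ignore duplicate or late callbacks
          called = true;
          if (err != null) {               // first error ends the queue
            finished = true;
            return callback(err);
          }
          results[index] = result;         // Order: slot by position
          remaining -= 1;
          if (remaining === 0) {           // Completion: all tasks called back
            finished = true;
            callback(null, results);
          }
        }));
      });
      return this;
    }
  };
}

// Usage with a hypothetical `get` that simulates latency.
function get(url, delay, callback) {
  setTimeout(function () { callback(null, 'response from ' + url); }, delay);
}

var miniResults = null;
createMiniQueue()
  .defer(get, 'a.com', 30)
  .defer(get, 'b.com', 10)
  .defer(get, 'c.com', 20)
  .awaitAll(function (err, results) {
    if (err) return console.error(err);
    miniResults = results;
    console.log(results); // a.com, b.com, c.com responses, in request order
  });
```

The responses arrive as b, c, a, but the results array comes out as a, b, c because each slot is filled by position.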

Concurrency

Finally, let’s talk about concurrency. If you have lots and lots of requests, for example if you’re trying to scrape thousands of websites, async methods by default make all of those requests at the same time. That can be bad news bears for both your computer and the servers it is hitting. You want to pace yourself.

That’s what the concurrency option in d3-queue is for: it lets you limit the number of requests that happen at the same time.

If you omit the concurrency, as in queue(), it’s unlimited, but if you write queue(2), only 2 requests will be active at a time; with queue(10), 10.
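A concurrency limit can be sketched in the same spirit. The runLimited function below is hypothetical, not d3-queue’s code: it starts at most `concurrency` tasks, and each time one calls back, it starts the next waiting task. It also records the peak number of active tasks so you can check that the limit held. Error handling is omitted to keep it short.

```javascript
// Hypothetical sketch of a concurrency limit (not d3-queue's source).
// Each task is a function that takes a Node-style callback.
// Error handling is omitted for brevity.
function runLimited(tasks, concurrency, done) {
  var results = new Array(tasks.length);
  var next = 0;      // index of the next task to start
  var active = 0;    // tasks currently running
  var finished = 0;  // tasks that have called back
  var peak = 0;      // highest number of tasks active at once

  function startMore() {
    while (active < concurrency && next < tasks.length) {
      start(next++);
    }
  }

  function start(index) {
    active += 1;
    peak = Math.max(peak, active);
    tasks[index](function (err, result) {
      active -= 1;
      finished += 1;
      results[index] = result;
      if (finished === tasks.length) return done(null, results, peak);
      startMore(); // a slot opened up: start the next waiting task
    });
  }

  if (tasks.length === 0) return done(null, results, 0);
  startMore();
}

// Ten simulated requests to hypothetical URLs, at most 2 active at a time.
var urls = [];
for (var i = 0; i < 10; i++) urls.push('site' + i + '.com');

var limitedTasks = urls.map(function (url) {
  return function (callback) {
    setTimeout(function () { callback(null, 'response from ' + url); }, 10);
  };
});

var limitedOutcome = null;
runLimited(limitedTasks, 2, function (err, results, peak) {
  limitedOutcome = { results: results, peak: peak };
  console.log('peak concurrency:', peak); // never more than 2
});
```

Passing Infinity as the concurrency makes this behave like the unlimited case, since `active < Infinity` is always true.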

FAQ

Where do I call the functions and put my callbacks?

It can be weird to adapt your code to use d3-queue, because you’ll be used to seeing a function like

get('a.com', function(err, res) {
  console.log(res);
});

You call get yourself, pass 'a.com' as its argument, and provide a callback.

d3-queue dispenses with all that: you don’t call the functions yourself, and you don’t provide individual callbacks for them.

var queue = require('d3-queue').queue;

var q = queue();
q.defer(get, 'a.com');
q.await(function (err, res) {
  console.log(res);
});

See how get doesn’t have () near it? You aren’t calling it: you’re telling d3-queue to call it when appropriate. Likewise, you don’t provide it a callback: d3-queue takes care of that.

Why?

  • Why don’t you provide a callback? Because d3-queue automatically adds a little ‘bookkeeping’ callback that records the result and saves it until it hands all the results to your await or awaitAll callback.
  • Why don’t you call the function yourself? As you saw in the section about concurrency, get might not run immediately. Providing it as a function to be called lets d3-queue potentially wait before calling it, until there’s concurrency to spare.
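The difference between calling get and handing it over can be shown without d3-queue. In this sketch, deferLater is a hypothetical helper, not part of d3-queue: it captures a function and its arguments without running them, and returns a function that runs them later with a callback appended as the final argument. That appended callback is the slot where a queue can put its bookkeeping callback.

```javascript
// Hypothetical async function following the (err, result) convention.
function get(url, callback) {
  setTimeout(function () { callback(null, 'response from ' + url); }, 5);
}

// Hypothetical helper: capture `fn` and its arguments now, run them later.
function deferLater(fn) {
  var args = Array.prototype.slice.call(arguments, 1);
  return function run(callback) {
    // Equivalent to fn(arg1, arg2, ..., callback)
    fn.apply(null, args.concat(callback));
  };
}

// get('a.com', ...) would start the request right now.
// deferLater(get, 'a.com') starts nothing yet; it only remembers.
var pending = deferLater(get, 'a.com');

var deferredResult = null;
// Later, when there is concurrency to spare, the caller supplies the callback.
pending(function (err, res) {
  deferredResult = res;
  console.log(res); // response from a.com
});
```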

What about errors?

d3-queue relies on a tradition from Node: callbacks look like (err, result), where the first argument is always an error (or null if nothing went wrong). So if any deferred function ever calls its callback with a non-null error as the first argument, d3-queue cancels the rest of the queue and calls your await or awaitAll callback with that error.

Meaning: this will just work with most libraries, but if you’re writing your own async methods, they’ll need to follow the (err, result) pattern.
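Here is a hypothetical parseJSONLater that follows that pattern: it calls its callback exactly once, with an Error first on failure, or with null and then the result on success. The function name, and the use of setTimeout to make it asynchronous, are assumptions for illustration.

```javascript
// Hypothetical async function that follows the Node (err, result) convention:
// the callback is called exactly once, with an Error or null first.
function parseJSONLater(text, callback) {
  setTimeout(function () {
    var value;
    try {
      value = JSON.parse(text);
    } catch (err) {
      return callback(err);   // failure: error first, no result
    }
    callback(null, value);    // success: null error, then the result
  }, 0);
}

var parsedOk = null;
var parseError = null;

parseJSONLater('{"ok": true}', function (err, value) {
  if (err) return console.error('failed:', err.message);
  parsedOk = value.ok;
  console.log(value.ok); // true
});

parseJSONLater('not json', function (err) {
  parseError = err;
  console.log(err instanceof SyntaxError); // true
});
```

Because it follows the convention, a function like this could be deferred the same way as get, for example q.defer(parseJSONLater, text).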

What’s the difference between await and awaitAll?

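var q = queue();
q.defer(get, 'a.com');
q.defer(get, 'b.com');

// Use either await or awaitAll on a given queue, not both:
// d3-queue throws an error if a queue is awaited a second time.

q.await(function (err, aResult, bResult) {
  // each result is provided as an argument
});

q.awaitAll(function (err, results) {
  // results are provided as an array
  results[0]; // = aResult
  results[1]; // = bResult
});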

When to use each? Generally, use awaitAll if you’re doing the same operation to a long list of inputs, like scraping lots of websites. Use await when you’re doing a small number of operations, or operations that are different: if you’re creating a directory and reading a file, for example, it makes more sense to get those 2 or 3 results as named arguments than as an array.

What does queue() do?

d3-queue follows a pattern from the d3 visualization library, in which you create instances of objects by calling a function named as a noun. For instance:

var scale = d3.scaleLinear();

This can confuse new programmers: is it creating a new scale, using a pre-existing scale, or something else? In other JavaScript systems and other languages, you might see a pattern like:

// the 'new' keyword creates instances of classes
var scale = new d3.scaleLinear();

// or sometimes functions will be named with verbs
var scale = d3.createScaleLinear();

So, in d3-queue’s case, we load the module with require('d3-queue'), take the queue function it exports, call queue(), and name the resulting object q. There aren’t many clues in these words.

Put simply, the d3-queue module exports a function, queue, that creates a new queue each time you call it. You could rename it createQueue, and you might want to, since that makes its purpose more obvious:

var createQueue = require('d3-queue').queue;

// You can create multiple queues: they work exactly as described
// in this article.
var pageRequestQueue = createQueue();
var imageRequestQueue = createQueue();

When to use what

  • Callbacks
    • 1 callback: no need for a library
    • Many callbacks: d3-queue
    • Infinite callbacks: if you want to control the concurrency of an unending stream of callbacks, like a certain server-side request that must not overload a system, d3-queue isn’t the right answer. d3-queue always keeps track of all results, so in this case it would just grow and grow forever. Use node-pool or another concurrency control library.
  • Promises
    • 1 promise: no need for a library
    • Many promises with no concurrency concerns: Promise.all
    • Many promises and concurrency concerns: p-all
    • Infinite promises and you want to control concurrency: node-pool or p-throttle
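For comparison with awaitAll, here is the Promise.all version of the three-request example: results come back in input order, and the first rejection rejects the combined promise. The getPromise function is a hypothetical stand-in that resolves after a made-up delay. Note that Promise.all imposes no concurrency limit: every promise is already running by the time you pass it in, which is why the list above points to p-all when concurrency matters.

```javascript
// Hypothetical promise-returning request that resolves after a made-up delay.
function getPromise(url, delay) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve('response from ' + url); }, delay);
  });
}

var promiseResults = null;

// Like awaitAll: one result per input, in input order, whatever the
// arrival order. The first rejection would reject the combined promise.
Promise.all([
  getPromise('a.com', 30),
  getPromise('b.com', 10),
  getPromise('c.com', 20)
]).then(function (results) {
  promiseResults = results;
  console.log(results);
}).catch(function (err) {
  console.error(err);
});
```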