Asynchronous loops and list generation in node.js
If you've gotten any further with node.js than three lines of code on a weekend before saying "Screw it, this isn't worth my sanity. I'm going back to python/ruby/PHP", then you've probably already run into this problem. Take the following Python code, for example:
import os

data = []
for file in os.listdir('.'):
    if os.path.isfile(file):
        with open(file) as f:
            data.append(f.read())
# process data list, for example sorting it then storing its contents in redis
In a synchronous paradigm, this is pretty straightforward to write. Doing the same thing in node.js, where file IO is asynchronous, is a bit more complex: each read finishes later, in its own callback, so you need a way to tell when all of them are done.
Disclaimer
There are many ways to approach this problem, and the solutions I'll explore in this article are certainly not the only ones. Plenty of control flow libraries on npm and elsewhere will help you do these things. My approach here is a minimal one that uses no third-party libraries, as an exercise in leaner code and a better understanding of the asynchronous paradigm.
If you're looking for ready-made solutions, look into async, fibers or async-foreach.
Naive solution
I'm calling this solution naive because it works only thanks to closures (every callback shares the same count and data variables), and as such it is applicable only in simple situations where you can keep your code inline.
var fs = require('fs');

var data = [];

function callback(err, data) {
  // process data list
  console.log(err, data);
}

fs.readdir('.', function(err, dirlist) {
  // error handling omitted for concision
  var count = dirlist.length;
  if (count === 0) {
    // nothing to wait for: the counter below would never reach 0
    return callback(null, data);
  }

  function processFile(filename) {
    fs.stat(filename, function(err, stats) {
      // skip anything that is not a regular file (or could not be stat'ed)
      if (err || !stats.isFile()) {
        if (--count === 0) {
          callback(err, data);
        }
        return;
      }
      fs.readFile(filename, function(err, contents) {
        data.push(contents);
        if (--count === 0) {
          callback(err, data);
        }
      });
    });
  }

  dirlist.forEach(processFile);
});
Ideally, I would want that processFile function to be outside the scope of my readdir callback, so I can reuse this code elsewhere and reduce my nesting level.
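I could write this function:
function processFiles(count, data, callback) {
  return function processFile(filename) {
    // same body as before, now using the count, data and callback
    // parameters instead of variables from the readdir callback
  };
}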
and change my forEach callback to:
dirlist.forEach(processFiles(dirlist.length, data, callback));
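For reference, here is a runnable sketch of processFiles with the "same as before" body filled in from the naive solution. The temporary directory and its a.txt, b.txt and sub entries are a hypothetical fixture, so the demo does not depend on whatever happens to be in the current directory. Because readdir returns bare names, they are joined with the directory path before being stat'ed.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// processFiles with the elided body filled in from the naive solution.
// count, data and callback are parameters shared by every call of the
// returned processFile, instead of variables from the readdir callback.
function processFiles(count, data, callback) {
  return function processFile(filename) {
    fs.stat(filename, function(err, stats) {
      if (err || !stats.isFile()) {
        // Not a regular file (or stat failed): count it as done.
        if (--count === 0) {
          callback(err, data);
        }
        return;
      }
      fs.readFile(filename, function(err, contents) {
        data.push(contents);
        if (--count === 0) {
          callback(err, data);
        }
      });
    });
  };
}

// Hypothetical demo fixture: a temp dir with two files and one subdirectory.
var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'processfiles-'));
fs.writeFileSync(path.join(dir, 'a.txt'), 'alpha');
fs.writeFileSync(path.join(dir, 'b.txt'), 'beta');
fs.mkdirSync(path.join(dir, 'sub'));

fs.readdir(dir, function(err, dirlist) {
  if (err) throw err;
  // readdir returns bare names, so join them with the directory.
  var paths = dirlist.map(function(name) { return path.join(dir, name); });
  paths.forEach(processFiles(paths.length, [], function(err, data) {
    // Two buffers ('alpha' and 'beta'); the subdirectory was skipped.
    console.log(err, data.map(String).sort());
  }));
});
```

Because count is a parameter, every processFile returned by a single processFiles call decrements the same variable. That shared counter is what lets the last read to finish trigger callback.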
But this function still handles one very specific use case: wanting all the results in a list (in another use case, I might want to process each file's contents as soon as they are available). On top of that, I still can't use processFile on its own, and a function that just needs to read file contents contains a lot of control flow code.
Did we even solve anything at all? Let's move on to another solution that can actually get rid of these flaws.
Asynchronous forEach callback
We all have this toolbelt of functions that we often use and sometimes rewrite, and this one is definitely part of mine.
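// utils.js
var _ = require('underscore');

function forEachCallback(length, callback) {
  var results = [];
  var failed = false;
  // calls callback on the length-th successful call
  var delayed = _.after(length, callback);
  return function(err, res) {
    if (failed) {
      return; // an error was already reported; ignore later results
    }
    if (err) {
      failed = true;
      return callback(err);
    }
    results.push(res);
    delayed(null, results);
  };
}

exports.forEachCallback = forEachCallback;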
This is a simple function wrapper that takes any standard Node-style callback (one that receives an error first, then a result) and makes it fit for use in an asynchronous forEach call. The first argument is the number of elements in the list (the count variable we used before); the second is the function that will process the list of results.
forEachCallback stores the result in an array every time it is called and, after length calls, triggers the callback with the array of results. If one of the calls results in an error, it triggers the callback immediately with that error instead, and ignores any results that arrive afterwards.
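One detail worth knowing: forEachCallback pushes each result when its callback fires. With asynchronous IO, the array therefore comes out in completion order, not in the order of the original list. If order matters, one possible variant gives each task a callback bound to its index. This is a sketch, and orderedCallback and callbackFor are hypothetical names, not part of the toolbelt above.

```javascript
// Hypothetical variant of forEachCallback that keeps results in list order.
// orderedCallback(length, callback) returns callbackFor(index), which makes
// a Node-style callback that stores its result at that index.
function orderedCallback(length, callback) {
  var results = new Array(length);
  var pending = length;
  var done = false;

  return function callbackFor(index) {
    return function(err, res) {
      if (done) {
        return; // an error was already reported (or everything finished)
      }
      if (err) {
        done = true;
        return callback(err);
      }
      results[index] = res;
      if (--pending === 0) {
        done = true;
        callback(null, results);
      }
    };
  };
}

// Usage: later items finish sooner, so completion order is the reverse of
// list order, yet the results still line up with the list.
var items = ['first', 'second', 'third'];
var callbackFor = orderedCallback(items.length, function(err, results) {
  console.log(err, results); // null [ 'FIRST', 'SECOND', 'THIRD' ]
});
items.forEach(function(item, i) {
  setTimeout(function() {
    callbackFor(i)(null, item.toUpperCase());
  }, (items.length - i) * 10);
});
```

As with forEachCallback, each per-index callback is expected to be called exactly once. Nothing fires for an empty list, so check for that case before calling forEach.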
We can rewrite our example:
var fs = require('fs');
var forEachCallback = require('./utils').forEachCallback;

function processList(err, data) {
  // process data list
  console.log(err, data);
}

function processFile(filename, cb) {
  fs.stat(filename, function(err, stats) {
    if (err) {
      return cb(err);
    }
    if (!stats.isFile()) {
      // not a regular file: report "no contents" rather than an error
      return cb(null, null);
    }
    fs.readFile(filename, cb);
  });
}

fs.readdir('.', function(err, dirlist) {
  if (err) {
    return processList(err);
  }
  if (dirlist.length === 0) {
    // with no elements, forEachCallback's callback would never be called
    return processList(null, []);
  }
  var callback = forEachCallback(dirlist.length, processList);
  dirlist.forEach(function(file) {
    processFile(file, callback);
  });
});
processFile is much simpler as a result: we minimized our use of closures, but most importantly we've made processFile generic. So if we want to change our code to process each file independently, we can write:
function processContents(err, data) {
  console.log(err, data);
}

fs.readdir('.', function(err, dirlist) {
  if (err) {
    return processContents(err);
  }
  dirlist.forEach(function(file) {
    processFile(file, processContents);
  });
});
You said you wouldn't use any third-party libraries, but this code uses underscore.
This is true, and while underscore is a super-lightweight, extremely useful toolbelt library, you might have your reasons to opt out of using it. The only underscore function we're using in this particular snippet is _.after, which we can easily rewrite:
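function after(count, fn) {
  // Returns a wrapper that ignores the first count - 1 calls and forwards
  // the count-th call (and any later ones) to fn with the same arguments.
  return function() {
    if (--count < 1) {
      return fn.apply(null, arguments);
    }
  };
}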
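Using this after in place of _.after gives a version of forEachCallback with no dependencies at all. The sketch below does that. It also uses a failed flag (an addition to the underscore-based version as first written) so that an error is reported exactly once and later results are ignored. The setTimeout calls merely simulate three asynchronous tasks.

```javascript
// The hand-written replacement for _.after.
function after(count, fn) {
  return function() {
    if (--count < 1) {
      return fn.apply(null, arguments);
    }
  };
}

// forEachCallback with _.after swapped for the local after.
function forEachCallback(length, callback) {
  var results = [];
  var failed = false;
  var delayed = after(length, callback);
  return function(err, res) {
    if (failed) {
      return; // an error was already reported
    }
    if (err) {
      failed = true;
      return callback(err);
    }
    results.push(res); // stored in completion order
    delayed(null, results);
  };
}

// Usage: three simulated async tasks finishing at different times.
var collect = forEachCallback(3, function(err, results) {
  console.log(err, results.length); // null 3
});
[30, 10, 20].forEach(function(delay) {
  setTimeout(function() { collect(null, delay); }, delay);
});
```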
Final words
It's important to be able to separate control-flow code from the rest of your node.js app. In my opinion, when people say "I can't keep track of anything in node.js, it's a callback-ridden nested mess.", it's usually because they didn't make this separation properly. It's a common mistake, especially among people writing asynchronous code for the first time.
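One way to take that separation a step further is to bundle the forEach and the result collection into a single helper, so call sites never touch a counter. The sketch below is hypothetical: mapAsync and double are illustrative names, and the helper is roughly what async.map from the async library provides. It keeps results in list order. It also defers the empty-list case with process.nextTick, so the callback is always asynchronous.

```javascript
// Hypothetical helper that owns all the control flow: run iterator on every
// item and call callback(err, results) once, with results in list order.
function mapAsync(list, iterator, callback) {
  var results = new Array(list.length);
  var pending = list.length;
  var done = false;

  if (pending === 0) {
    // Defer so the callback is asynchronous even for an empty list.
    return process.nextTick(function() { callback(null, results); });
  }

  list.forEach(function(item, index) {
    iterator(item, function(err, res) {
      if (done) {
        return;
      }
      if (err) {
        done = true;
        return callback(err);
      }
      results[index] = res;
      if (--pending === 0) {
        done = true;
        callback(null, results);
      }
    });
  });
}

// Usage: the worker knows nothing about lists or counters.
function double(n, cb) {
  setImmediate(function() { cb(null, n * 2); });
}

mapAsync([1, 2, 3], double, function(err, results) {
  console.log(err, results); // null [ 2, 4, 6 ]
});
```

With processFile from above as the iterator, mapAsync(dirlist, processFile, processList) would replace the whole forEach and forEachCallback setup at the call site.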