Stream Statistics

stream-statistics is a JavaScript library that implements online algorithms for descriptive statistics.

The idea came about while developing simple-statistics, a module I made to understand statistics better. That one takes full datasets - in many cases, massive arrays of numbers. But there's another approach: providing data number-by-number to online algorithms via an interface like nodejs's streams.
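To make "number-by-number" concrete, here's a minimal sketch of an online algorithm: Welford's method for a running mean and variance. The `OnlineStats` name and its methods are made up for illustration and are not the stream-statistics API; the point is only that each value is folded into a few running totals and then thrown away.

```javascript
// A hypothetical accumulator (not the stream-statistics API): it keeps only
// a count, a running mean, and a running sum of squared deviations.
function OnlineStats() {
    this.n = 0;
    this.mean = 0;
    this.m2 = 0; // sum of squared differences from the current mean
}

// Fold one number into the running totals (Welford's update).
OnlineStats.prototype.push = function(x) {
    this.n += 1;
    var delta = x - this.mean;
    this.mean += delta / this.n;
    this.m2 += delta * (x - this.mean);
};

// Population variance; NaN until at least one value has been pushed.
OnlineStats.prototype.variance = function() {
    return this.n > 0 ? this.m2 / this.n : NaN;
};

var stats = new OnlineStats();
[2, 4, 4, 4, 5, 5, 7, 9].forEach(function(x) {
    stats.push(x);
});
console.log(stats.mean, stats.variance());
```

Memory use stays constant no matter how many numbers go through `push`, which is the whole appeal compared to holding a massive array.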

To be clear - stream-statistics doesn't require nodejs and can run in browsers (even old ones). When you use it as a module with npm, it tries to align to nodejs's stream specification.

That said, 'stream specification' kind of overstates what node has - there are no prescriptive docs for how to implement streams, and my experience with making this 'compliant' has been less than sunny.
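For a sense of what "aligning to streams" can mean, here's a rough sketch of a writable stream that tracks a maximum, built on the `Writable` class that ships in node's `stream` module. `MaxStream` is a hypothetical name, and this is an assumption about one way to do it, not how stream-statistics is actually written.

```javascript
var Writable = require('stream').Writable;
var util = require('util');

// Hypothetical MaxStream, for illustration only: a writable stream that
// parses each chunk as a number and remembers the largest one seen.
function MaxStream() {
    Writable.call(this);
    this.maximum = -Infinity;
}
util.inherits(MaxStream, Writable);

// Node calls _write once per chunk; calling done() asks for the next one.
MaxStream.prototype._write = function(chunk, encoding, done) {
    var value = parseFloat(chunk.toString());
    if (!isNaN(value) && value > this.maximum) {
        this.maximum = value;
    }
    done();
};

var max = new MaxStream();

// 'finish' fires after end() once every written chunk has been handled.
max.on('finish', function() {
    console.log(max.maximum);
});

['3', '120', '7'].forEach(function(line) {
    max.write(line);
});
max.end();
```

Anything that can `pipe()` lines of text could feed a stream like this, which is the shape of the example that follows.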

Here's a thing you can do with stream-statistics:

var assert = require('assert'),
    fs = require('fs'),
    StreamStatistics = require('stream-statistics'),
    byline = require('byline');

var ss = new StreamStatistics();
var stream = byline(fs.createReadStream(__dirname + '/samples.txt'));

// Once the input has ended, check the maximum tracked along the way
stream.on('end', function() {
    assert.equal(ss.max(), 120);
});

// Pipe a stream of newline-separated numbers into stream-statistics
stream.pipe(ss);

Unlike `simple-statistics`, the algorithms in `stream-statistics` don't look much like their definitions on Wikipedia - they're made to be quite fast and usable. Like `simple-statistics`, it's just one more implementation in a field of many - Boost.Accumulators is a notably incredible implementation in C++ which I've tinkered with in the context of mapnik. The streaming quantile implementation will be inspired by the C implementation of Efficient Computation of Biased Quantiles over Data Streams in statsite by Armon Dadgar.

To announce this, I wanted to finish either a neat drawing or one of the uber-difficult algorithms for a more complex statistic. The former won out; implementing quantiles was stalled for a while. The different, impenetrable writing on Wikipedia, MathWorld, R, Mathematica, and elsewhere is a shame, and a ready example of how math fails to try to be useful in the gap between theory and pre-baked implementations.

Anyway, when I get more coffee or a pull request, `stream-statistics` will do cool quantiles and k-means analysis.

Install `stream-statistics` with npm or download `stream_statistics.js` from GitHub to use it in the browser.

Posted Aug 04, 2012

Tom MacWright
