Waltz

(new Teen Mom song)

RPL (2) - Terrarium

Bark Bench Terrarium

CC-BY-NC joshleo

The core of rpl is terrarium. Terrarium is an alternative to JavaScript’s eval() function and node’s vm.runInNewContext(). Unlike these systems, rpl is designed to run code as full-fledged scripts and to manage each script’s execution as a proper process.

The predecessor to rpl, called mistakes.io, used eval() at its core - it would execute each 1..n subset of the code text in eval() and display the result. This worked very well for imperative code that didn’t use timing functions, but it couldn’t handle async and timed code properly. That ruins the fun of JavaScript.
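To make the "each 1..n subset" idea concrete, here is a minimal sketch of that style of evaluation. The names evalOnce and evalPrefixes are hypothetical and this is not the actual mistakes.io source, only an illustration of the approach described above.

```javascript
// Hypothetical helper: evaluate one chunk of code and return its completion value.
function evalOnce(code) {
  return eval(code);
}

// Hypothetical helper: evaluate the first line, then the first two lines,
// and so on, recording a value (or an error) for every prefix.
function evalPrefixes(source) {
  var lines = source.split('\n');
  var results = [];
  for (var n = 1; n <= lines.length; n++) {
    var prefix = lines.slice(0, n).join('\n');
    try {
      results.push({ line: n, value: evalOnce(prefix) });
    } catch (err) {
      // A prefix that stops halfway through a statement fails to parse.
      results.push({ line: n, error: err.message });
    }
  }
  return results;
}
```

Every prefix is evaluated synchronously and then forgotten, so a callback scheduled by an earlier prefix has nowhere sensible to report its result.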

The problem is that some programs will finish executing in one pass:

var x = 10;

But other programs will never exit:

setInterval(function() {
  var x = 10;
}, 100);

JavaScript has a few different ways to run forever - setInterval, setTimeout, process.nextTick, and so on. It’s very difficult to ‘clear all intervals’ without brutal hacks.
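For illustration, one such hack might look like the sketch below: wrap the timer functions so that every handle is remembered. trackTimers is a hypothetical helper, not something terrarium does, and it still misses process.nextTick, open sockets, and anything else that keeps a program alive.

```javascript
// Hypothetical helper, not part of terrarium: wrap the timer functions on a
// global object so that every handle they return is remembered.
function trackTimers(globalObj) {
  var originalSetInterval = globalObj.setInterval;
  var originalSetTimeout = globalObj.setTimeout;
  var intervals = [];
  var timeouts = [];

  globalObj.setInterval = function() {
    var handle = originalSetInterval.apply(globalObj, arguments);
    intervals.push(handle);
    return handle;
  };
  globalObj.setTimeout = function() {
    var handle = originalSetTimeout.apply(globalObj, arguments);
    timeouts.push(handle);
    return handle;
  };

  return {
    count: function() { return intervals.length + timeouts.length; },
    // Clear everything that was scheduled and restore the original functions.
    clearAll: function() {
      intervals.forEach(function(h) { globalObj.clearInterval(h); });
      timeouts.forEach(function(h) { globalObj.clearTimeout(h); });
      intervals = [];
      timeouts = [];
      globalObj.setInterval = originalSetInterval;
      globalObj.setTimeout = originalSetTimeout;
    }
  };
}
```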

The other problem of running code in a sandbox is scope: if you run something like eval('x = 10'), the variable x will now be defined in the outer scope. While node’s vm.runInContext method gives more control over scope access, there isn’t a browser equivalent with the same behavior.
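The difference is easy to see in node. The sketch below is only an illustration: the function and variable names are made up, and eval is called indirectly so that the snippet behaves the same way even if the surrounding file is in strict mode.

```javascript
var vm = require('vm');

// An undeclared assignment inside eval'd code ends up on the global object.
function leakWithEval() {
  var globalEval = eval; // indirect call: the code runs in global scope
  globalEval('leakedByEval = 10');
  return globalThis.leakedByEval;
}

// The same kind of assignment inside a new vm context lands on the sandbox
// object instead of the outer global object.
function isolateWithVm() {
  var sandbox = {};
  vm.runInNewContext('sandboxedValue = 10', sandbox);
  return {
    insideSandbox: sandbox.sandboxedValue,
    leakedOutside: typeof globalThis.sandboxedValue !== 'undefined'
  };
}
```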

Terrarium’s solution is true nesting in both cases - for node, it creates a module on the fly and runs it with child_process.fork to create a new subprocess with a stream for communication. In a browser, it creates a new HTML resource with the JavaScript code embedded and runs it in a new iframe with window.top as a communication mechanism.

This allows the sub-scripts in question to do anything normal code can do - in a browser, you can create elements and attach events to them, and in node, you can create web servers, access files, and all the rest. And all this comes without the hacks that are necessary even in node core to make the default node REPL act like software.
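The node half of this approach can be sketched roughly as follows. This is not terrarium’s implementation: writeTempModule and runInChild are hypothetical names, and the sketch uses fork’s built-in message channel (process.send and the 'message' event) as a stand-in for whatever stream terrarium actually sets up.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');
var childProcess = require('child_process');

// Hypothetical helper: write the user's code to a temporary module file.
function writeTempModule(code) {
  var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'sandbox-'));
  var file = path.join(dir, 'script.js');
  fs.writeFileSync(file, code);
  return file;
}

// Hypothetical helper: fork the module as its own process. Anything the
// script passes to process.send() arrives through onMessage, and calling
// kill() on the returned child ends even a script that would never exit.
function runInChild(code, onMessage) {
  var child = childProcess.fork(writeTempModule(code));
  child.on('message', onMessage);
  return child;
}
```

Because the script is a separate process, an endless setInterval inside it stops being a problem: the parent keeps the child handle returned by runInChild and calls child.kill() when it is done.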

var T = require('terrarium');
var terrarium = new T.Node(); // or T.Browser();
terrarium.on('data', function(d) {
  // instrumentation data
  terrarium.destroy();
});
terrarium.run('//=1');

Terrarium is a low-level API: you create a sandbox, write code to it, and receive instrumentation data. The API is identical between node and the browser.

var terrariumStream = require('terrarium-stream');
var terrarium = new terrariumStream.Node();
terrarium.on('data', function(d) {
  // instrumentation data
  terrarium.destroy();
});
terrarium.write({ value: '//=1' });

Under the hood, the two backends are:

  • browser: iframe, communication via window.top
  • node: subprocess, communication via fork and streams

terrarium-stream is a slightly higher-level API that connects the node stream lifecycle - creating, reading, writing, destroying - to terrarium. Using terrarium-stream, rpl is able to interface with browser & node code and build an extremely simple websocket-based interface using shoe.

Three Things

Slides

A presentation given at NYPL to librarians in the historical mapping space.

Change over Time

I think the biggest unsolved problem in maps is change over time. This is unfortunately an unsolved problem in most data structures, which either represent change as replacement, like you can see with street maps - Google Maps, Mapbox, and so on - or as copying, which you can see in annual sources like the US Census. Tools that manage change over time in a mature way and which can present it as a fundamental characteristic of data are extremely rare. Specifically, OpenStreetMap is the only mainstream example.
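As a rough illustration of the third option, where history is part of the data rather than something replaced or copied, here is a small sketch of an append-only record for one map feature. FeatureHistory and its fields are hypothetical and not taken from OpenStreetMap or any other real system, and the sketch assumes edits arrive in chronological order.

```javascript
// Hypothetical append-only record: every edit adds a version instead of
// overwriting the feature (replacement) or duplicating the dataset (copying).
function FeatureHistory() {
  this.versions = [];
}

FeatureHistory.prototype.edit = function(properties, timestamp) {
  this.versions.push({
    version: this.versions.length + 1,
    timestamp: timestamp,
    properties: properties
  });
};

// Return the properties as of a given time, or null if the feature did not
// exist yet. Assumes edits were added in chronological order.
FeatureHistory.prototype.at = function(timestamp) {
  var found = null;
  this.versions.forEach(function(v) {
    if (v.timestamp <= timestamp) found = v.properties;
  });
  return found;
};
```

A single linear list like this is exactly what stops being enough once several people edit at the same time.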

Cracking this code is necessary because it’s the only way to solve collaboration, as software development has shown us. Once multiple people interact with a source, you can no longer count on changes being ordered and ‘versions’ being a strong concept.

Mapbox is working on this problem as part of our larger push towards creating the first scalable geospatial database. GeoGit and dat are also working on potential solutions. We should talk about this, because it’s a massive technology change whose success will rely on fitting your use cases and rely even more on network effects.


Rights

The second stumbling block for data to work is copyright. I know there’s a big copyright symbol there, but a better description would be rights and expectations.

Historical data often has the luxury of Public Domain status in the United States, but Public Domain is an American flavor of a concept that is represented differently, and sometimes not at all, elsewhere. As for the products derived from Public Domain data, whether they’re extracted buildings or even just scans, the licensing of these artifacts is more or less the choice of the maker.

Maps in particular are a battleground for copyright law for two reasons: they have forms as database, data, and image, and they are mostly useful in combination with other data.

Copyright matters to you because combinations of data are combinations of licenses, and the future is in datasets that come from a lot of places. At Mapbox we already have projects like OpenAddresses and a Satellite layer that combine more than 30 datasets and thus have the aggregated legal boilerplate of all of them. The friction of this, combined with the legal risk, is a brake.


Standards

Finally, maps need standards. Like the copyright symbol before, I don’t really mean standards. I mean a standards body. Much like copyright, this is a prickly subject, which is why I chose it. But storage and collaboration are the pain points where being unique is a risk: nobody wants to store data in a format that won’t be openable in 10 years, or 5 years even. And everyone wants to share data in the format that everyone else uses.

There are roughly three kinds of standards: those that come from a standards body, like WMS from the OGC; those from a company, like KML from Keyhole; and those from interest groups, like GeoJSON. The lines admittedly blur. In the last five years, little of importance has come out of a standards body. And the reason for companies like Google or Esri to publish standards is for their products to be more successful. Those products used to be desktop applications like Google Earth, which inspired KML, but they are moving to the web, so the way they communicate is behind the scenes and in APIs, not formats. File formats as a user-facing concern are losing their shine for startups.

Interest groups are the theoretically purest way to produce standards, but they’re rare and the people who participate need to be operating in a sort of time surplus, or have complex motivations that boil down to the previous two types.
