New wave modularity with Lerna, monorepos, and npm organizations

What is modularity?

Modularity, the principle that programs can be made up of small connected parts, is one of the main ingredients of JavaScript’s rising popularity. npm, the central repository where JavaScript modules are shared, has 8 million dollars in funding and contains over 300,000 modules.

Modularity isn’t new: the Unix Philosophy popularized its principles in the 1970s and 1980s:

Write programs that do one thing and do it well.
Write programs to work together.
Write programs to handle text streams, because that is a universal interface.¹

The ease of publishing JavaScript code made these principles actionable. The process of creating an npm module can be executed in a few minutes, and you can publish as many as you want. With few rules, the community created an abundance of modules: some only a line or two long, others complete software packages. Tiny modules that perform trivial math, like testing whether a number is even, take the idea of modularity to its logical extreme.

The JavaScript community’s focus on modularity has earned praise and condemnation: there’s plenty to read for and against tiny modules.¹ I won’t add another opinion to that debate: modularity in my projects is a matter of balance, not rhetoric. Several projects I work on are composed of modules, and for good reason. Let’s talk about a big change in how they work.

The modular application: Turf v2

One of the systems that I help maintain is Turf. It’s a GIS library that lets you do things with geographical data, like calculating the distance between points or the area of a polygon.

You can install all of Turf’s functionality:

$ npm install --save @turf/turf

var turf = require('@turf/turf');

Or you can install one of its functions individually:

$ npm install --save @turf/buffer

var createBuffer = require('@turf/buffer');

Initially Turf was developed as a large collection of repositories and modules - one repository per module. That meant that we had a repository at github.com/Turfjs/turf, and one at github.com/Turfjs/turf-buffer, and so on: over 40 repositories total. The main turf repository - for the module you install to get all the functionality at once - simply included all of the other modules as dependencies and re-exported them:

module.exports = {
  buffer: require('turf-buffer'),
  distance: require('turf-distance'),
  bearing: require('turf-bearing'),
  // ...40 more lines of includes
};

Turf’s setup was very similar to systems like mercury: the main repository has almost no source code and lots of dependencies.

This was how Turf worked until the recently-released version 3, and it had some great advantages:

Users could install just the parts they wanted - if you just want turf-bearing, it could be a few kilobytes of extra code in your application instead of over 100kb for all of Turf.
Since each module had its own package, modules could have great and comprehensive tests.
Since dependencies with JavaScript and npm are versioned - you specify a specific version of a dependency and npm keeps all versions handy if you need an old one - we had the flexibility to upgrade parts of Turf individually.

This approach started to show its warts.

Managing inter-dependencies becomes scary.
Repository sprawl makes issue management harder for maintainers and users.
Managing permissions is a hack.

Managing inter-dependencies becomes scary. We had over 40 repositories, each containing a little bit of the system’s functionality. The repositories depended on each other: general-purpose modules were used by many parts of the Turf system. While strict versions with semver can ensure that modules depend on the right versions of each other, they don’t help with keeping the whole system up to date. Turf releases would often include 3 different versions of some module, since 3 other modules depended on 3 different versions of it.
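To make the version-drift problem concrete, here is a minimal sketch of the kind of check that surfaces it. It is not a tool Turf used: the function name findVersionDrift, the shared-helper module, and the version numbers are all invented for illustration. It takes a list of package.json-like objects and reports every dependency that is requested in more than one version.

```javascript
// Hypothetical manifests: 'shared-helper' and all version numbers
// are invented for illustration, not taken from Turf's history.
const manifests = [
  { name: 'turf-buffer', dependencies: { 'shared-helper': '1.0.0' } },
  { name: 'turf-distance', dependencies: { 'shared-helper': '1.1.0' } },
  { name: 'turf-bearing', dependencies: { 'shared-helper': '2.0.0' } }
];

// Return an object mapping each dependency that is requested in more
// than one distinct version to the list of versions requested.
function findVersionDrift(packages) {
  // Object.create(null) avoids collisions with inherited keys.
  const requested = Object.create(null);
  packages.forEach(function (pkg) {
    const deps = pkg.dependencies || {};
    Object.keys(deps).forEach(function (dep) {
      requested[dep] = requested[dep] || [];
      if (requested[dep].indexOf(deps[dep]) === -1) {
        requested[dep].push(deps[dep]);
      }
    });
  });

  const drift = {};
  Object.keys(requested).forEach(function (dep) {
    if (requested[dep].length > 1) drift[dep] = requested[dep];
  });
  return drift;
}

console.log(findVersionDrift(manifests));
```

With one repository per module, even running a check like this means cloning or fetching dozens of manifests first, which is part of why the drift went unnoticed.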

Developing new versions is equally scary: if we’re trying to improve something fundamental and shared across the codebase, like the way that Turf handles input validation, testing it in-situ requires npm link and other custom commands. This is not just a nuisance - it makes contributing much harder, since potential contributors need to learn about lots of infrastructure concerns just to write a PR.

Repository sprawl makes issue management harder for maintainers and users. GitHub is constantly improving its software, but we’re far away from a perfect system for software management. When we wanted to release a new version of Turf, we’d have to rely on custom searches or custom-built tools in order to collect all issues from all Turf-related repositories and show them in one place and as one checklist. Similarly, since users usually installed Turf with the main package (npm install turf), they would file issues in the main repository, Turfjs/turf, regardless of which function they used. There was constant tension between ticketing an issue about the turf.buffer method in the main repository or in Turfjs/turf-buffer.

Managing permissions is a hack. To release a new version of Turf, we’d have to release over 40 different modules and coordinate everything properly. And responsibility for Turf is split between multiple people - Morgan Herlocker, its creator, Tim Channell, myself, and others. We had to resort to hand-built tools like turf-owners - essentially a loop around npm owner add to make sure everyone had access to the right things. Removing people’s access was unsolved.
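As a rough idea of what a loop around npm owner add looks like, here is a sketch. It is not the actual turf-owners source: the function name ownerCommands and the usernames are hypothetical, and it only builds the command strings rather than running them.

```javascript
// Build one `npm owner add <user> <package>` command for every
// combination of maintainer and package. Nothing is executed here.
function ownerCommands(users, packages) {
  const commands = [];
  packages.forEach(function (pkg) {
    users.forEach(function (user) {
      commands.push('npm owner add ' + user + ' ' + pkg);
    });
  });
  return commands;
}

// Hypothetical usernames, with package names in the old unscoped style.
const commands = ownerCommands(
  ['alice', 'bob'],
  ['turf-buffer', 'turf-distance']
);

commands.forEach(function (command) {
  console.log(command);
});
```

The number of commands is the number of maintainers times the number of packages, and nothing in this approach records who was given access or when.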

Lerna + monorepo

The collaboration friction caused by these problems was slowing down Turf development & putting off contributors. I wanted to fix this problem in Turf v3. Luckily, in the year that had passed a new option arose.

Several well-known software packages, including Babel, React, Ember, PouchDB, and Meteor, had adopted a new way. Instead of the traditional one-repository-per-module approach, they use a single repository that contains all of their associated modules. The modules are still published independently and you can install their parts independently from npm, and semver is still used, but the code & issues live in one place.

We call this the monorepo.

To make this work, most of these projects use a tool called Lerna. Lerna was developed in order to support Babel’s monorepo approach. It’s a command line tool that handles some tricky operations by connecting modules together and running commands against multiple modules:

lerna bootstrap: probably the most important command of the set, this links all modules in a monorepo together. This way, you can immediately test whether a change will break code that relies on a module.
lerna run test: a way to run unit tests across many modules in one command, and to make tests fail if a module fails.
lerna publish: a wrapper around npm publish that can publish multiple packages at a time and is smart enough to only publish changed code.
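The linking step is easier to picture with a small sketch. This is not Lerna’s implementation, just an illustration of the idea under simple assumptions: the function name planLinks and the example manifests are hypothetical. Given the manifests of every package in the monorepo, it works out which dependencies can be satisfied by a sibling package in the same repository, rather than by a copy downloaded from npm.

```javascript
// Return a list of { dependent, linkedTo } pairs: one for each
// dependency that is itself a package in the same repository.
function planLinks(packages) {
  const local = Object.create(null);
  packages.forEach(function (pkg) {
    local[pkg.name] = true;
  });

  const links = [];
  packages.forEach(function (pkg) {
    Object.keys(pkg.dependencies || {}).forEach(function (dep) {
      if (local[dep]) {
        links.push({ dependent: pkg.name, linkedTo: dep });
      }
    });
  });
  return links;
}

// Hypothetical manifests: '@turf/shared' and 'some-external-module'
// are invented names used only for this example.
const monorepoPackages = [
  { name: '@turf/shared', dependencies: {} },
  { name: '@turf/buffer', dependencies: { '@turf/shared': '^3.0.0', 'some-external-module': '^1.0.0' } },
  { name: '@turf/distance', dependencies: { '@turf/shared': '^3.0.0' } }
];

console.log(planLinks(monorepoPackages));
```

Once sibling packages point at each other’s working copies, a change to a shared module is visible to everything that depends on it the next time the tests run, with no npm link required.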

So as a result, Turf’s 40+ repositories are now consolidated into one: Turfjs/turf contains a packages/ path with all of their code.

This has a bunch of great side benefits too:

Continuous integration testing is easier, since all tests run on all commits.
Housekeeping stuff like updating license information and documentation can be done in one simple place. This change greatly simplified the way we generate documentation.

npm organizations

Switching to a monolithic repository solved a lot of Turf’s management problems, but left one: access. How do we keep track of who can publish changes to Turf, in a robust way? How can we stop using these hacky solutions that give people publish access without a paper trail?

This is where npm organizations come in. The most promoted use of organizations is private packages, but they also work well for public projects. Organizations let you manage ownership over a set of projects: adding & removing members of the organization, modifying which packages each contributor can access, and so on. They solve the access problem very well.

But organizations do have one big user-facing effect: they work with scoped packages. Scoped packages are still somewhat rare in the community, and they look a bit different: instead of installing turf-buffer, you now install @turf/buffer. We run the @turf organization and can publish anything we want after the /. This is kind of nice and fancy - it’s obvious to a user whether a package is an official part of Turf, and we have naming flexibility to choose short and descriptive names.
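For anyone updating old code, the rename is mechanical in the two cases shown in this post: turf becomes @turf/turf and turf-buffer becomes @turf/buffer. Here is a small sketch of that mapping. The function name toScopedName is hypothetical, and it assumes every module follows the same turf- prefix convention, which may not hold for every package.

```javascript
// Map an old unscoped Turf module name to its scoped equivalent.
// Names that do not match the old convention are returned unchanged.
function toScopedName(oldName) {
  const prefix = 'turf-';
  if (oldName === 'turf') return '@turf/turf';
  if (oldName.indexOf(prefix) === 0) {
    return '@turf/' + oldName.slice(prefix.length);
  }
  return oldName;
}

['turf', 'turf-buffer', 'turf-distance'].forEach(function (name) {
  console.log(name + ' -> ' + toScopedName(name));
});
```

One thing to know when publishing under a scope: npm treats scoped packages as restricted by default, so a public scoped package has to be published with npm publish --access public.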

Fin

Open source systems and small modules are lovely ways to build software, but they eventually run up against the same problems of scale and coordination seen in any other kind of project. These two big changes - switching to a monorepo and to an npm organization - were very successful in Turf’s case, and I think they have potential for other projects made up of small but interconnected parts.

Notes & references

For: sindresorhus, substack, Against: Pete Hunt, Medium: James Long. There’s another conversation that centers around the left-pad module. You can Google for it, but I don’t recommend reading any of the articles about it: there are few thoughtful responses to the incident and many that are offensive, silly, and unproductive.
Embracing Conway’s Law is an excellent related read.

July 8, 2016 Tom MacWright (@tmcw, @tmcw@mastodon.social)