Tom MacWright

tom@macwright.com

Harder than it looks: globs

Every once in a while, you run into coding problems that seem too deeply embedded and weird to be real. This post is mostly a time capsule for anyone who stumbles upon the same problem in the future. These kinds of problems are solvable, but the solutions will make you groan.

Anyway, these articles are probably irrelevant to many: I’m sorry about that, there’ll be cool mathematical, graphical explorations soon.

First thing that’s broken to the core: globs.

Globs, including wildcards, are characters you type in a terminal to match more than one file. The most familiar glob character is the friendly * - an asterisk, which can stand for any part of a name.

So, if you have a folder with files like this:

folder
├── copy-1.txt
├── copy-1a.txt
├── copy-2.txt
└── copy-2b.txt

And you wanted to list only the files that start with copy-1, you could use the ls command with * as a stand-in for ‘anything at the end of the filename’:

$ ls copy-1*

copy-1.txt
copy-1a.txt

Great: this is a convenience that people get used to pretty quickly, and one that we use in lots of programs. From the application developer’s perspective, when you specify filenames with globs, your shell - for example, bash or zsh - interprets the glob for you, expanding an input like copy-1* into [copy-1.txt, copy-1a.txt] before your program ever sees it.
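To make that expansion step concrete, here is a minimal sketch of what a shell roughly does with a * glob. It is not how bash or zsh are implemented: globToRegExp and expand are hypothetical names, only * is handled, and the directory listing is a hard-coded array rather than a real folder.

```javascript
// Minimal sketch: emulate how a shell expands a `*` glob against a directory listing.
// Assumption: only `*` is supported. Real shells also handle `?`, `[...]`, hidden
// files, and path separators, none of which are modeled here.
function globToRegExp(pattern) {
  // Escape regex metacharacters (everything except `*`) so they match literally.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  // Turn each `*` into "any run of characters" and anchor the whole name.
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

function expand(pattern, filenames) {
  const re = globToRegExp(pattern);
  return filenames.filter((name) => re.test(name));
}

const files = ["copy-1.txt", "copy-1a.txt", "copy-2.txt", "copy-2b.txt"];
console.log(expand("copy-1*", files)); // [ 'copy-1.txt', 'copy-1a.txt' ]
```

The point is that by the time ls starts running, it receives the two matching filenames as separate arguments and never sees the asterisk at all.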

All is fine and dandy until you think cross-platform: what about Windows? Windows has multiple shells, with lackluster and sometimes missing support for globs. Users on Windows rightfully expect globs to work with your program, and from their perspective, the implementation level, whether in a shell or in an application, is irrelevant.

It gets worse.

Consider the rules:

  • globs are special characters like * that indicate flexible spaces in filenames
  • on Windows, these characters aren’t interpreted and expanded into filenames, they’re simply passed to the program as-is
  • on macOS and Linux, you’re allowed to use special glob characters as part of filenames: you can name a file my*file.txt (Windows forbids * and ? in filenames, but still allows other glob characters such as [ and ])

And then the table of outcomes, given a command run in a directory with these files in it:

folder
├── my*file
├── my-big-file
└── my-other-file

Input        macOS & Linux Interpretation           Windows Interpretation
my*file      my-big-file, my*file, my-other-file    my*file
"my*file"    my*file                                my*file
my\*file     my*file                                my*file

Hopefully this helps illustrate the admittedly obscure problem: when your program gets an input of my*file, it can’t tell whether it came from a macOS user trying to open the file literally named my*file, or from a Windows user trying to get all files that start with my and end with file.

So, even though you have nice libraries like node-glob to expand my*file into anything that matches, you can’t just apply them to everything that your program sees as input, or you’ll risk treating literal inputs as globs.

So, without further ado, the solution, as inspired by eslint and soon adopted by my project, documentation.js:

  • Take the list of inputs given to your program.
  • For each input, first test whether the input is a file by using fs.stat or shelljs.
    • If the input is a file, treat it literally.
    • If the input is not a file, treat it as a glob and expand it.
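The steps above can be sketched roughly like this. This is not eslint’s code: isExistingFile, expandStar and resolveInputs are hypothetical names, and expandStar is a deliberately tiny stand-in for a real glob library such as node-glob that only understands * within a single directory.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: true if `filePath` names an existing regular file.
function isExistingFile(filePath) {
  try {
    return fs.statSync(filePath).isFile();
  } catch {
    return false; // statSync throws when the path does not exist
  }
}

// Hypothetical stand-in for a glob library: expands `*` against the
// entries of one directory only. A real program would call node-glob here.
function expandStar(pattern, dir) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
  return fs.readdirSync(dir).filter((name) => re.test(name)).sort();
}

// For each input: if it is a file, keep it literally; otherwise treat it as a glob.
function resolveInputs(inputs, dir = process.cwd()) {
  const resolved = [];
  for (const input of inputs) {
    if (isExistingFile(path.join(dir, input))) {
      resolved.push(input);
    } else {
      resolved.push(...expandStar(input, dir));
    }
  }
  return resolved;
}
```

Calling resolveInputs(process.argv.slice(2)) then behaves the same whether or not the shell already expanded the globs: names that the shell expanded arrive as existing files and pass through untouched, while an unexpanded my*file is only treated as a pattern when no file with that exact name exists.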

All in all, you’ll need two modules and about 180 lines to make globs work cross-platform in a way that people expect them to work by default.

If you’re somehow still reading and even want to read the implementation, here’s eslint’s: glob-util.js, the magical work of one Ian VanSchooten.