Fuzz Testing

Fuzz testing is a neat and under-appreciated way to find bugs in software by brute force.

Fuzzing is just creating random invalid input and seeing what happens. With enough invalid input, it’s often possible to uncover stuff that will crash servers and cause JavaScript to throw exceptions.

Fuzzing is especially great for parsers. Let’s say you have a parser for math expressions:

2 + 5 = 7

A fuzzer will mutate this into a bunch of variants and run them against your code. The vast majority will be invalid, some minority will be valid but wrong, and then, usually, there will be a few that throw a wrench in the gears and cause a crash:

2 + 5= 7 // valid
2 5= 7 // invalid
2 + 5 = 7fdsaflka // invalid
.+ 5 = 7fdsaflka // CRASH

In The Wild

I’ve written a handful of parsers and generators, like tokml, togeojson, geojsonhint, and others. They’re usually satisfying to write, because they can be purely functional and sometimes get to use good specs, like GeoJSON, as their basis.

These libraries are deployed in places like geojson.io, mapbox.com, and elsewhere. Geojson.io has been essential to the development process, because it’s hooked up with getsentry, a tool that catches crashes and errors and emails me backtraces. This way, I’ve learned about a lot of bizarre ways in which files can be invalid - weird variations on KML, GeoJSON, WKT, and absolutely everything else.

I’ve learned that the range of data malformations and screwups is much wider than anyone can imagine: every possible variation exists in the wild. So, fuzzers to the rescue: I wrote a tiny bit of code that does fuzzing, called fuzzer, and used it with a few libraries. It quickly identified corner cases in tokml, a KML generator; in wellknown, a WKT parser; and elsewhere. fuzzer then grew a binary called fuzz-get that runs messed-up GET requests against an API endpoint, so that I could make sure some new Mapbox web services wouldn’t crash.

The only really neat implementation detail of fuzzer is that it uses random-js with a fixed seed, so it provides random-seeming mutations but always produces the same series. So tests that pass locally will pass everywhere, forever.
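Here’s a minimal sketch of why a fixed seed makes a fuzz run reproducible. It is not fuzzer’s code and does not use random-js: `seededRandom`, `mutateString`, and `run` are stand-ins made up for illustration.

```javascript
// Stand-in for a seeded generator (fuzzer itself uses random-js; this is
// a tiny linear congruential generator written only for illustration).
function seededRandom(seed) {
  let state = seed >>> 0;
  return function next() {
    // Advance the 32-bit state and scale it to a float in [0, 1).
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Hypothetical mutation: delete, duplicate, or replace one character.
function mutateString(str, random) {
  const i = Math.floor(random() * str.length);
  const choice = Math.floor(random() * 3);
  if (choice === 0) return str.slice(0, i) + str.slice(i + 1);
  if (choice === 1) return str.slice(0, i) + str[i] + str.slice(i);
  const junk = String.fromCharCode(32 + Math.floor(random() * 95));
  return str.slice(0, i) + junk + str.slice(i + 1);
}

// Produce `count` mutations of `input`, starting from a fresh seed.
function run(seed, input, count) {
  const random = seededRandom(seed);
  const out = [];
  for (let n = 0; n < count; n++) out.push(mutateString(input, random));
  return out;
}

const first = run(0, 'POINT(1.1 1.1)', 5);
const second = run(0, 'POINT(1.1 1.1)', 5);
console.log(first.join('\n') === second.join('\n')); // true: same seed, same series
```

Because the generator is the only source of randomness, a failing input found on one machine shows up at the same iteration on every other machine.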

For example, the tests for wellknown now include this fuzzing run:

test('fuzz', function(t) {
  // A fixed seed makes the series of mutations identical on every run.
  fuzzer.seed(0);
  var inputs = [
    'MULTIPOLYGON (((30 20, 10 40, 45 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))',
    'POINT(1.1 1.1)',
    'LINESTRING (30 10, 10 30, 40 40)',
    'GeometryCollection(POINT(4 6),\nLINESTRING(4 6,7 10))'];
  inputs.forEach(function(str) {
    for (var i = 0; i < 10000; i++) {
      // Declared outside the try block so the catch handler can report it.
      var input;
      try {
        input = fuzzer.mutate.string(str);
        parse(input);
      } catch(e) {
        // Any exception counts as a failure, however mangled the input was.
        t.fail('could not parse ' + input + ', exception ' + e + '\n' + e.stack);
      }
    }
  });
  t.end();
});

There are several types of fuzzers, and fuzzer is the simplest: it mutates known ‘good’ input. Inside, all it’s doing is incrementing and decrementing numbers, chopping letters off of words, messing with object properties: not rocket science. Down the line, it should learn to generate input from given models. It’s somewhat limited in what it can produce - only JavaScript objects in and out. To really kick the tires on toGeoJSON it’ll need to generate and mutate XML documents. In other languages there are pretty solid and tested fuzzers, like rfuzz for Ruby, sulley for Python, fuzzdb, and untidy for XML.
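The kinds of mutations described above fit in a few lines. This is a sketch, not fuzzer’s source: the helper names `nudgeNumbers` and `chopWords` are hypothetical, and the random source is passed in so that a seeded generator could be swapped in.

```javascript
// Hypothetical helpers illustrating mutation of known-good input.
// `random` is any function returning a float in [0, 1).

// Increment or decrement every number found in the string.
function nudgeNumbers(str, random) {
  return str.replace(/-?\d+(\.\d+)?/g, function (match) {
    const delta = random() < 0.5 ? -1 : 1;
    return String(Number(match) + delta);
  });
}

// Chop a random number of letters off the end of each word.
function chopWords(str, random) {
  return str.replace(/[A-Za-z]+/g, function (word) {
    const keep = Math.floor(random() * (word.length + 1));
    return word.slice(0, keep);
  });
}

const input = 'LINESTRING (30 10, 10 30, 40 40)';
console.log(nudgeNumbers(input, Math.random));
console.log(chopWords(input, Math.random));
```

Each helper keeps most of the input intact, which is the point of mutating good input: the result stays close enough to valid that it gets deep into the parser before something goes wrong.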

Not all code needs to be bulletproof. In the past, I’ve believed that core code had the right to crash on invalid input. But, for parsers, a ‘validate’ step is essentially a parse in and of itself, so where do you check? And given the hackiness of using try{}catch{} everywhere, and the idea that code is eventually exposed to the outside world, I think parsers must be more tolerant of problems.
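One way to read ‘tolerant’ is a parser that returns a value meaning ‘I couldn’t read this’ instead of throwing. The sketch below is a made-up, single-shape WKT reader written for illustration; `parsePoint` is hypothetical and is not how any of the libraries above are implemented.

```javascript
// Hypothetical tolerant parser for one WKT shape, e.g. 'POINT(1.1 1.1)'.
// Returns a GeoJSON-like object, or null for anything it cannot read.
function parsePoint(input) {
  if (typeof input !== 'string') return null;
  const match = /^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$/i.exec(input);
  if (!match) return null;
  const x = Number(match[1]);
  const y = Number(match[2]);
  // Reject coordinates that look like tokens but are not finite numbers.
  if (!Number.isFinite(x) || !Number.isFinite(y)) return null;
  return { type: 'Point', coordinates: [x, y] };
}

console.log(parsePoint('POINT(1.1 1.1)'));   // { type: 'Point', coordinates: [ 1.1, 1.1 ] }
console.log(parsePoint('.+ 5 = 7fdsaflka')); // null
```

Callers then check for null once, rather than wrapping every call in try{}catch{}.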

Recently

March was a month of traveling.

NY

The first week I spent teaching a high school class in upstate New York.

In the afternoons, I went on cold runs and worked on the Mapbox Android SDK.

I camped out in the school’s darkroom until 1am, blasting Drake and relearning film chemistry.

SF

We packed the new office, in the top floor of the Code for America building.

On the weekend, we biked north to Tiburon.

DC

Next, to regroup with Teen Mom and properly celebrate birthdays with close friends.

CONSUMPTION

Watching: The Man Who Fell to Earth, Seinfeld

Reading: The Hard Thing About Hard Things, How to Win Friends and Influence People, Seasteading, Going into Detail, the start of something new.

Listening: J. Cole: Born Sinner

ELSEWHERE

Eric Theise contributed distributions to simple-statistics, and it’s nearly there. I helped Alex Barth and Ian Dees make openaddresses.io, a project to index & distribute address data. I wrote geocodify, a CSV geocoding tool, and landpatch, a visualization for our party. The Spanish translation of mapschool went live. The DC Code project got airtime on the Kojo Nnamdi Show.

Teaching

Education means a lot of things: self-education, online classes, Pre-K. It’s as important as it is complicated, especially in technology. Many programmers learned their first lessons on a family computer, not in a classroom. That’s where I started, and without deeper consideration, it could be “the way”: formal education in the backseat, adolescent hacking at center stage.

But self-starting doesn’t scale like we want it to. There are kids with home computers who have time and will write scripts and hacks for fun and have a supportive peer group who will think that’s cool. But they look a lot like I did at the time: white boys with parents who can afford a real computer, and people around them who understand or accept their interests.

I don’t think this will change on its own. Computers are cheaper and more powerful than they were in my youth, but they are poorer gateways to experimentation and learning. Gender stereotypes are still strong and destructive, as are many other lines along which people are discouraged, by their group or by themselves, from experimenting.

Educational resources are one escape hatch: with a little technical knowledge and a reason to learn, you can dial up sites like Codecademy and advance up the rungs of knowledge. But those qualifications are brutal. What seems like an iota of prerequisites to me is a mountain to someone starting out. The spark, the motivation, to take the leap, is huge, and without it few have reason to trudge through the painful initial learning curve.

Teaching can help. It’s hard and time-intensive and can be draining, but it’s uniquely able to inspire and inform.

So I decided to try: I spent a week co-teaching a class with Sarah MacWright at Millbrook School in upstate New York. With ten high schoolers, for five days of five hours a day, we learned how to make the web. Here’s how it went.

Building the web with HTML, JavaScript, and CSS.

The choice of topic wasn’t obvious or easy. As _why astutely said in his Art && Code talk, why would anyone choose to teach a just-starting programmer three languages at the same time? You could use a single language and a single environment, like Python or Ruby. But the end product matters: on the web, kids can share and brag about their creations. And they can connect to websites they already use in a new, fascinating way: view-source feels like understanding deeply.

Then, the question of abstraction: should we teach HTML or Wordpress, or another CMS? Wordpress quickly makes real-looking websites, and spares kids the worst parts of the learning curve.

But our intent was to create deeper understanding and creative urges, not just to make websites. The pain of HTML, CSS, and JavaScript gives way to a different way of thinking about the web and a cheat code to understanding it in a universal light.

The First Lesson

The order of languages taught follows their difficulty level and distance to the ‘basic webpage’: HTML for content, then CSS for style, and finally JavaScript for behavior.

I didn’t want to start the class with a ‘framework’ or Hello, world on a white page. We need a better onramp: something that connects and entertains.

We used ‘Inspect Element’. Everyone opens the school homepage, and right-clicks on a title. They see their first pageful of confusing HTML, but have a clear mission: vandalism. In practice, it was gentle trolling: changing the name of the school to ‘My School’ and redesigning the page in fashionable shades of pink, amongst other tweaks.

Then we moved on to start changing behaviors and animations, by learning our first bits of JavaScript. Given a Flappy Bird clone, students typed their first lines of JavaScript, changing gravity, acceleration, the gap between pipes, and everything else.

This wasn’t a lesson to teach the nitty-gritty. We imparted the hacker mentality, that we can change the stuff in computers, and braved the first sight of intimidating software code while ensuring kids felt positive and were having fun.

The Nitty Gritty

The second day, we started building from scratch. I gave a short, simple presentation on the basics of HTML, and students followed along. Setting up text editors and learning how to save & refresh would have cost time, so we used jsbin to edit our first bits of HTML.1 First, Hello, world, and then Hello, <strong>world</strong>. Then tinkering with tags and typing. In this early stage it’s vital to have instant updates so kids can try variations quickly: this attempt-fail-succeed loop is where you learn.

Then, a presentation about CSS and similar tinkering.

Words Help Us Learn

Around this point I noticed that my style of teaching didn’t use terms effectively.

Words contain and define concepts, and they structure learning. Mine had blended together in day-to-day work, so I wasn’t nearly discrete or precise enough in my language.

For instance, I should have taught the concept of matching HTML start tags to end tags as ‘balancing’, with exercises and analogies, not as a detail or peculiarity of language. Then you can talk about balancing, and kids can grasp why it’s a common characteristic of the languages they learn. I needed to create words and mnemonics for when to use which brackets in which language. Familiar metaphors for syntax, like “it’s like the end of a sentence” for ; in JavaScript, helped students in a way that tens of mistakes wouldn’t.

First Bite of JavaScript

Then, the final language: JavaScript. I ran through a quick presentation and encouraged the class to try out examples in mistakes.io.2

As expected, teaching JavaScript is significantly harder than HTML or CSS. These lessons, more than any others, identified disconnects between the world of code and the language of English.

  • People are sentence-case by default. HTML is permissive about capitalization, CSS is less so, and JavaScript is even less.
  • Balancing brackets like method({}) requires a mental stack of what you still have to end. The ability to do this varied widely.
  • It was remarkably easy to avoid an in-depth explanation of objects and JavaScript ‘warts’ like this.
  • Explaining ‘what happens when’ in JavaScript is extremely hard.
  • JavaScript error reporting in browsers, while great and improving, could always be better and friendlier.
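The bookkeeping behind the bracket-balancing point above is what a program would keep on a stack: every opener goes on top, and every closer has to match whatever is currently on top. A rough sketch, with a made-up `isBalanced` helper that is naive on purpose:

```javascript
// Returns true when every (, [ and { is closed in the right order.
// Deliberately naive: it does not skip brackets inside strings or comments.
function isBalanced(source) {
  const closers = { ')': '(', ']': '[', '}': '{' };
  const open = [];
  for (const ch of source) {
    if (ch === '(' || ch === '[' || ch === '{') {
      open.push(ch);
    } else if (closers[ch]) {
      // A closer must match the most recently opened bracket.
      if (open.pop() !== closers[ch]) return false;
    }
  }
  // Anything still open at the end was never closed.
  return open.length === 0;
}

console.log(isBalanced('method({})')); // true
console.log(isBalanced('method({)}')); // false
console.log(isBalanced('method({}'));  // false
```

That `open` array is exactly what a student has to hold in their head while typing.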

Only two students decided to grapple with JavaScript for the rest of the class, and they constructed incredible games. For the other students, this was just a glimmer, and a lesson I was divided on: the amount of knowledge you need to start using JavaScript is significant, and its usefulness in your first HTML/CSS websites is limited. But, conversely, as the only imperative programming language of the three, it has creative and academic potential greater than any markup language.

~/Sites

The first lessons used online programming environments that imitated live-coding, but to move forward, we needed to start building websites - that means files, editing, and servers. We set up Sublime Text 2 for each student, and I ran a Terminal window with python -m SimpleHTTPServer running from the ~/Sites directory. Instead of just typing, students needed to save and click ‘refresh’ to see changes.4

In usage, Sublime Text had a number of issues that made it feel like a sub-optimal choice.3 In hindsight, I might have used TextWrangler instead, if an even-simpler text editor doesn’t crop up.

Exercises & Handouts

Sarah wisely added exercises to intermittently help kids solidify their learning and check that they understood what we were talking about. After each lesson, we would make a quick ‘about me’ page or a <table> of our favorite foods: tasks that encourage students to use their skills creatively.

Then we created ‘cheatsheets’ with examples of HTML elements and tips for writing new pages. These were quickly and persistently adopted by the class.

Student Projects

On day three, students began their projects: four students built photo portfolios, two built games, and four worked in a group creating a student blog with Tumblr.

Portfolios

In an afternoon, I built a simple template for photo galleries, with concise HTML & CSS markup. By link-dropping a ZIP file of it on Dropbox, I gave students a running start on their projects: quickly we went towards Google Fonts for customization, switched to auto-advancing slides, tweaked colors and other details.

Seeing the students use the web as their medium was fascinating, especially in those who had experience in photography and framing. There grew new logical questions: how should a carousel work with different aspect ratios? Given infinite options for backgrounds, which complements their photography and style?

Games

Though many expressed interest, only two students ended up building games. A freshman and a senior, they started with the Coquette framework, which provides fundamentals like collision detection and a canvas to draw on. One game, called The Maze Raze, features an awesome hand-drawn player and its author’s first use of trigonometry in the real world.

Another game, called Flying Mario, evolved from a side-scroller to a helicopter-like game with Mario-inspired graphics and style.

The games differed greatly in scope and skills from the other projects - there was an awesome moment where a student used trigonometry in the real world for the first time, making little pixel rectangle targets move in a video game. But we also bumped into some of the hard problems of games, like collision detection and how to have different entities in the game exhibit different behavior.
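I don’t have the student’s code to hand, so this is only a guess at the shape of that trigonometry: moving a target a fixed distance per frame in a given direction. The `step` function and its parameters are hypothetical, and none of this is Coquette’s API.

```javascript
// Move a target `speed` pixels per frame in the direction `angle` (radians).
// Cosine gives the horizontal share of the movement, sine the vertical share.
function step(target, angle, speed) {
  return {
    x: target.x + Math.cos(angle) * speed,
    y: target.y + Math.sin(angle) * speed
  };
}

let target = { x: 100, y: 100 };
for (let frame = 0; frame < 3; frame++) {
  target = step(target, Math.PI / 4, 2); // 45 degrees, 2 pixels per frame
  console.log(target.x.toFixed(2), target.y.toFixed(2));
}
```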

I was fascinated by the ‘tuning’ tasks that students quickly mastered, like adjusting the ratio between gravity and jump height to make a game tricky, or designing a maze with nothing more than pixel coordinates and sizes of rectangles.
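To make the gravity-versus-jump tuning concrete, here’s a small sketch that simulates one jump frame by frame and reports how high the player gets. The function name, the units, and the simple update rule are all my assumptions, not what the students’ games or Coquette actually do.

```javascript
// Simulate one jump, frame by frame, and report the highest point reached.
// Units are pixels and frames; positive y is "up" here for readability.
function peakHeight(jumpSpeed, gravity) {
  if (!(gravity > 0)) throw new RangeError('gravity must be positive');
  let y = 0;
  let velocity = jumpSpeed;
  let peak = 0;
  while (velocity > 0) {
    y += velocity;       // move by the current speed
    velocity -= gravity; // gravity slows the climb every frame
    peak = Math.max(peak, y);
  }
  return peak;
}

console.log(peakHeight(10, 1)); // 55
console.log(peakHeight(10, 2)); // 30
```

Doubling gravity roughly halves the jump, so a gap that was easy to clear suddenly becomes tricky: that’s the ratio students were tuning by feel.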

Coquette was fantastic for this purpose and let us focus mostly on actual topics and behavior of the games. For the small issues we found, I’m making notes to create examples or patches to make things even better.

Integration

At lunch on Thursday, I was talking with the math teacher about class, and reported that two of my students had used math: not just arithmetic, but multiplication, trigonometry, algebra.

I realized something I never had as a student: schools have multiple subjects that are independent. In each subject area, students can be two or more years ahead or behind, or can even opt-out.

The difference between how I barely learned in school and how I learn today is that I now jump subjects through combination. For instance, I wanted to learn statistics, so I wrote simple-statistics, bootstrapping my non-existent stats skills with my decent coding skills. This way you learn by application and avoid the knowledge vacuum. And you have the fun experience of finding new subjects within your range.

But when you don’t know if students have taken tech class or art class, you can’t teach this way. And ideally, that’s how you teach coding: you teach fundamentals, and then apply them everywhere, in math, in art, in English. Coding is a lever, and could be effectively learned as one. You could think of it as writing or reading - skills that you use for the whole journey.

School Blog

The final project was a school blog, brook-posts.com. We chose to host it on Tumblr, since it permitted HTML & CSS editing, but also made the upkeep of a blog-like website less tricky. The places students took the site were interesting and awesome - after they have an understanding of HTML & CSS, they start asking more of ‘custom’: can we change layouts and design part of the site from scratch? And the mix of technical and non-technical tasks let students teach each other.

Let’s Talk URLs

Once the websites were in motion, I realized that we were missing an element of knowledge: URLs. How to link to pages, the difference between relative and absolute, and the different parts from http:// to .html, are essential bits of learning that make the web make sense, and are often abstracted away. Chrome has experimented with hiding http:// in the address bar, and students spend much of their time on singular websites, like Facebook. I never had the time to teach this, but I should have.*
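If I had taught it, something like this might have been the demo. It uses the standard URL class that browsers and Node provide; the addresses are made up for illustration.

```javascript
// The addresses below are invented examples.
const page = new URL('http://example.com/portfolio/index.html');
console.log(page.protocol); // 'http:'
console.log(page.hostname); // 'example.com'
console.log(page.pathname); // '/portfolio/index.html'

// A relative link is resolved against the page it appears on.
console.log(new URL('photos/cat.html', page).href);
// 'http://example.com/portfolio/photos/cat.html'

// A link starting with / is resolved against the site root.
console.log(new URL('/about.html', page).href);
// 'http://example.com/about.html'

// An absolute link ignores the current page entirely.
console.log(new URL('https://example.org/', page).href);
// 'https://example.org/'
```

The same `photos/cat.html` link points somewhere else when the page moves, which is the relative-versus-absolute distinction in one line.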

Going Live

The final step was going live: putting portfolios & games on the internet, where students could pass around URLs or even put them on college applications. This was a surprising challenge to source: where, on the internet, can you just drag & drop files, for free, simply? It’s easy to find application hosting like Wordpress.com or more advanced tools like GitHub’s gh-pages functionality, but the low end is scarce.

The answer came from an unexpected place: neocities.org, a volunteer project that captures the spirit of now-shuttered GeoCities, worked perfectly. Registering for an account is simple, and uploading a site is just drag & drop. Since photos in portfolios were hosted on Flickr, all of the student projects were well under the 10MB storage limit. And in a moment that made my oldness visible, no-one in the class had heard of the once-popular GeoCities, so the name had no connotations.

Landing

The class was a few weeks ago, and it’s still sinking in. When I write code or try to explain a topic, there’s a much greater range of considerations and possibilities, thanks to my sister and these students. Like any sort of teaching, the results will come in time - I hope that we inspired people to think creatively and feel like they can change and make more kinds of things.

Teachers reading this post will probably wince at my newbie mistakes.5 Without my sister’s teaching ability, I would have only confused and bored this class. Like many non-teachers, I didn’t know how to teach. But this is a new area, and is still a little mysterious to everyone. There’s a blossoming of adult tech education, like nodeschool.io, and Hacker School and its clones, but we’re still looking for a good way to start earlier. Teachers and tech people alike are still finding what works and what doesn’t.

And so, just like coding, I’m hoping to iterate. For the things that didn’t work, I have guesses about what would work better, but they’re only guesses. The variables of time, location, and demographics would all massively influence content & style.

The full class materials are in a GitHub repository and are CC0 licensed. If you’re interested in trying out teaching, there are plenty of places to start, like CoderDojo, TEALS, Citizen Schools, and code.org.

Footnotes

  1. It was hard to choose an HTML editor that would work great: Mozilla Thimble was a first choice because of its great HTML error reporting, but I was expecting to need to save the output, and the Persona requirement was a deal-breaker. JSBin did well, but exhibited a few bugs and oddities: sometimes the cursor would jump to an incorrect spot, external links don’t work in the right pane, and refreshing the right pane is somewhat unreliable. In the end, we didn’t actually need to save output often in this stage, so Thimble would have fared well.
  2. For the most part this was successful, but it made me realize that an even-simpler mistakes.io might be useful - something that slims it down to a single line of JavaScript and explains what that JS does in painstaking detail. You type var foo = 4; and it explains what every token means - you’re creating a variable, called foo, assigning it to 4, which is a number, and ending the line with ;. And it would give fantastic guidance for incorrect or incomplete input.
  3. Unlicensed Sublime nags you for a license every few saves. Automatic bracket insertion was, in the vast majority of cases, counterproductive. I think that bracket insertion is an example of overoptimizing for writing code while crippling the experience of editing and rewriting code, which is very much the majority task. The ‘tabs + sidebar’ navigation was confusing. Disappearing tabs are poorly communicated by the interface, and tabs don’t convey the difference between files with the same name in different folders. And by default, Sublime doesn’t have JavaScript formatting and I didn’t have enough time to configure a plugin for it (or find some hidden preference).
  4. Autorefresh code does exist, of course, but given the time constraints and the limited ability of students to debug software, I went for the simplest route.
  5. Incidentally, the school is also trying to find a teacher who could do technology, and it’s a tough hunt. Resources that would help teachers cross over into technology seem pretty slim, and there aren’t a lot of experienced educators who can or will cross over. Perhaps this is an area where teacher fellowship programs can contribute.
  * This was inspired by Eric Mill, who previously presented on URLs at Open Data Day and discusses how using URLs is a skill that needs to be explained and taught.