For Sense City I wanted to isolate foreground objects in video. This is really a computer vision problem, but there are simple ways to cheat.

This technique works well for scenes like mine, which have:

  • A relatively static background
  • Lighting that changes over the course of a day
  • Moving, solid, and opaque objects

Isolating the Background

There are always foreground elements in the scene, like pedestrians, birds, or cars. This rules out the approach of choosing a single reference frame and computing foreground elements relative to an empty street. So, we’ll need a more flexible definition of foreground and background:

  • Foreground is something that isn’t there most of the time
  • Background is something that is there most of the time

Start with three frames. There are no ‘clean’ frames in this sample - each one contains cars, pedestrians, bicyclists, and so on.

To isolate the background, we’re going to run an operation over every pixel in each of these images. For instance, for the very top left corner, if that pixel is black for 2 frames and white for 1, running an average over the frames will make it a 66% gray.
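That per-pixel operation can be sketched in a few lines of Python. Real frames would come from a library like rasterio; here, tiny synthetic grayscale frames stand in, as an illustration only:

```python
# Three tiny 2x2 grayscale "frames": 0 = black, 255 = white.
# The top-left pixel is black in two frames and white in one.
frames = [
    [[0, 10], [20, 30]],
    [[0, 12], [22, 28]],
    [[255, 11], [21, 29]],  # an "object" passes through the top-left pixel
]

def mean_frame(frames):
    """Average each pixel position across all frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(f[y][x] for f in frames) / len(frames) for x in range(width)]
        for y in range(height)
    ]

avg = mean_frame(frames)
print(avg[0][0])  # 85.0 - two parts black, one part white: a ghostly gray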

Here’s what a mean looks like:

Ghostly cars are not ideal. The same ghosts haunt min & max:

min, max

So, statistical minds have already figured it out: foreground objects are outliers, so we’ll need a more robust statistic: the median.
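The median version is the same per-pixel loop with a different reducer - a sketch using Python's standard statistics module, again on synthetic stand-in frames:

```python
from statistics import median

frames = [
    [[0, 10], [20, 30]],
    [[0, 12], [22, 28]],
    [[255, 11], [21, 29]],  # the outlier: a bright object in the corner
]

def median_frame(frames):
    """Take the per-pixel median: outliers (foreground objects) drop out."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(f[y][x] for f in frames) for x in range(width)]
        for y in range(height)
    ]

bg = median_frame(frames)
print(bg[0][0])  # 0 - the passing object is ignored entirely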

The result isn’t perfect - some of the areas where bicyclists and motorists overlapped are still noticeable in the green protected area of the intersection.

But it’s quite usable: from here, all we need to do is subtract the median from each frame, and we’re starting to see isolated motion.

Now, instead of simply subtracting one pixel value from the other, check whether the difference between the two exceeds a certain threshold, and if it does, return the frame’s pixel unchanged. This way colors stay true to what’s visible in the image, rather than being flipped and skewed by the difference between the median and the frame.
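The thresholding step might look like this sketch; the threshold value here is an assumption, and in practice you’d tune it per scene:

```python
def foreground(frame, background, threshold=30):
    """Keep a pixel from the frame only where it differs from the
    background median by more than the threshold; otherwise black."""
    height, width = len(frame), len(frame[0])
    return [
        [
            frame[y][x] if abs(frame[y][x] - background[y][x]) > threshold else 0
            for x in range(width)
        ]
        for y in range(height)
    ]

background = [[0, 11], [21, 29]]      # per-pixel median of the sample
frame = [[255, 11], [21, 29]]         # a frame with an object top-left
print(foreground(frame, background))  # [[255, 0], [0, 0]]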



The frames we picked for this example are close together in time, so they share similar daylight. A larger sample includes much more diversity in lighting:

The solution to this issue is to use windows: instead of finding the median of all frames in the dataset, run medians over local samples. So, a frame of video at 2pm would be compared against a median of all frames between 2:00 and 2:15pm.
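A minimal sketch of the windowing idea, assuming frames arrive as an ordered list and the window is measured in frame counts rather than wall-clock time:

```python
from statistics import median

def windowed_background(frames, index, window=7):
    """Median of the frames within `window` positions of `index`,
    so the background can track slow lighting changes."""
    lo = max(0, index - window)
    hi = min(len(frames), index + window + 1)
    sample = frames[lo:hi]
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(f[y][x] for f in sample) for x in range(width)]
        for y in range(height)
    ]

# A scene that slowly brightens: 11 one-pixel frames, with one outlier.
frames = [[[v]] for v in [10, 12, 14, 16, 18, 255, 22, 24, 26, 28, 30]]
print(windowed_background(frames, 5, window=2)[0][0])  # 22: outlier rejected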

In Code

Implementing this algorithm is incredibly simple. I used Sean Gillies’s rasterio Python package for the first pass. For this post, I reimplemented it in JavaScript as isolate-movement. I’ve stashed some of the prettier examples on /myland.

See Also

  • This approach is inspired by the Cloudless Atlas technique, which also uses pixel math to eliminate obstructions.
  • SIOX is a recent solution to a static version of this problem.
  • OpenCV’s optical flow algorithms also try to detect and track motion.

Sense City


Washington, DC has a gunshot detector network. In technical terms, it’s a system of 300 acoustic sensors mounted on buildings, with networking fast enough that sounds can be triangulated using precise timing.*


The Washington Post broke the story to most residents. Gun violence in cities is an unquestionable scourge and the detector system is a clear step toward a solution, so the story was entirely positive.

ShotSpotter, the company* that installed DC’s system, offers a cloud service by which their own personnel monitor the sensors, listen to recordings, and pass the results to police officers. The police provided data to the media - the raw data for the Post’s story - but what’s the raw product? After all, a network of gunshot detectors is a network of microphones, installed throughout a city and running 24/7. The detectors don’t just detect high-pressure sounds: SST advertises them as having subsonic to supersonic range. The implication should be clear: gunshot detectors are microphones. Their being microphones is certainly useful for their main purpose, since many shooters will say their own or the victim’s name right before or after the crime. The applications outside of this purpose, like the ability to record private conversations, are more troublesome.4

ShotSpotter’s UI, from SST via Richmond Confidential

ShotSpotter retains the data to let police review incidents. The database is geographically distributed, just like any good web service, and one can only assume that it runs on public TCP.


Above are Capital Bikeshare stations - 337 locations, including a few that have closed or haven’t opened yet. The Bikeshare system publishes trip data every quarter - start & end locations, with minutely accuracy.

DC Capital Bikeshare

photo by James Schwartz, CC-BY-ND

The data inevitably ends up as beautiful visualizations and tools for calculating odds of getting a bike.


The District Department of Transportation maintains cameras that remotely monitor traffic and occasionally prove useful in criminal proceedings. They license the data from these cameras to a private company called TrafficLand, which then resells the data to online and television news.

While some municipalities do this themselves - for instance, the NYC DOT’s system - TrafficLand* has similar contracts with 50+ departments and handles 18,000 cameras. The cameras in DC update every two seconds or less.

Cities are sensors.

I’ll remove my tin-foil hat and add a few thoughts.

Space and Time is Identity

Given the sensor infrastructure that’s public and obvious in operation, the most powerful technique is cross-referencing. That is, when data is released with minutely precision or a few extra decimal places of latitude and longitude, the potential hacks multiply. Which is to say:


DC traffic camera on

Several cameras include Capital Bikeshare racks in their field of view. Given a 2 second camera frequency and minutely Bikeshare data frequency, it’s reasonable to assume that by recording all cameras and cross-referencing bike trips one could de-anonymize the dataset.

Deanonymization is a technique for recovering personally identifiable information from supposedly anonymized datasets. The first popular example was Netflix’s dataset, which in 2006 was successfully remapped to specific members. Later, New York City’s weakly anonymized taxi data was quickly decoded to reveal exact license plate numbers.anon

One day of one camera’s footage takes about 200MB of storage. The roughly 175 cameras that cover the DMV would consume 35GB/day, or 3.15 terabytes per quarter. That’s around $90 to store three months of data on Amazon’s S3 service.s3
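The back-of-the-envelope arithmetic, using the post’s figures; the S3 price here is an assumption, roughly the 2014 standard-storage rate:

```python
per_camera_mb_per_day = 200
cameras = 175
days = 90
s3_price_per_gb_month = 0.03  # assumption: approximate 2014 S3 rate

daily_gb = per_camera_mb_per_day * cameras / 1000       # 35.0 GB/day
quarter_tb = daily_gb * days / 1000                     # 3.15 TB per quarter
monthly_bill = daily_gb * days * s3_price_per_gb_month  # roughly $90-ish
print(daily_gb, quarter_tb, round(monthly_bill))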


These sensor systems are large investments, made over long periods of time, with support of the government and often the community. Capital Bikeshare’s data releases are well-known and traffic cameras are an expected utility. The MPD’s relationship with ShotSpotter has been relatively quiet, but is casually mentioned in annual reports. On the other coast, Oakland, CA residents opposed the removal of ShotSpotter, saying it did good for their community.

What’s interesting about these examples is that they are on the edge of open data. The police department has also operated 103 CCTV cameras since 2006, but nobody expects residents to be invited to tune in.


But traffic cameras are cameras, gunshot detectors are microphones.* Some seem innocuous, some creepy, some cross into the public domain, and some don’t. The gap between stated purpose and usage is real: in Boston, a license plate reader system that was supposed to be tracking stolen cars seemed to do everything but.

And so sensor data feels different. Letting everyone listen through a gunshot detector would expose not just crime but police brutality. Open access to high-resolution cameras throughout the city would show traffic and also presidential convoys.

Every new eye added to the network, every new connection from the world to a database, has mixed consequences. In Ferguson, both volunteers and police used body cameras to establish wrongdoing and record encounters. It’s remarkable that surveillance could be a tool for activism.

Unfiltered, sensor data shows everything, reveals everything - or rather, everything where the sensors are. In DC, that’s where the police chose to install the sensors, and it initially excluded the far northwest - so a map of detected gunshots is more a map of where gunshot detectors are than of gunshots.

Commuting to the old office, I would pass three cameras two times a day.

This is one of them, after being split into 2-second chunks and run through simple and stupid processing to show only foreground, moving objects.

This is a work in progress: I’m experimenting with what it really means, and how to think of this relatively new sort of thing.

Thoughts and prior art are greatly appreciated - let me know at @tmcw.


  • If you stored every photo as a separate object on S3, the pricing would be steeper due to PUT costs; the estimate assumes you’d keep a local cache and push archived tar files to the service.
  • Deanonymization takes advantage of mistakes made by the people trying to anonymize data, whether they're cryptographic, in NYC's case, or statistical, in Netflix's. It's by no means a new problem - the US Census has been grappling with it for years, since they release data for aggregate units of land - blocks, parcels, states - that can contain as few as one person. If the Census were to release exact information about every unit, they'd reveal personal details of any sole residents of Census blocks.
  • It's interesting that most of the relatively positive articles about ShotSpotter refer to the microphones as acoustic sensors. But maybe that is accurate, because microphones have become something totally different from the ones we see onstage. An iPhone 5 has three microphones, one for background noise and two that perform short-range beamforming. Further in the deep end is badBIOS's vivid illustration of what can happen outside our hearing range.
  • DC doesn't publicize the locations of the devices, only where gunshots have been detected. The data doesn't immediately make this obvious, especially since it appears that the coordinates were set to a lower precision for export.
  • The exact range of ShotSpotter's system isn't clear: they advertise a wide radius for gunshots, but other sounds would vary. Microphones are very interesting in this way, because their coverage isn't usually directional but can be amplified in interesting ways by city structures - sound bounces off of walls readily whereas light needs a reflective surface.


Thanks to Eric Mill for reviewing drafts of this article. Thanks to the satellite crew at Mapbox for image-processing advice.

See Also




  • I’m slowly getting into Semiology of Graphics. It’s a special kind of read, because I know that a lot of the ideas I employ already are derived from this book, but I’ve never read it directly.
  • Winter is Coming (Probably) Soon. Something that everyone in the tech industry likely nervously read this month. The connection between general interest rates and a specific industry - and thinking about where that industry lies in the economy - is something new to me. Bad Notes on Venture Capital is less of an intro and I’m embarrassed not to follow a lot of it.
  • What We Talk About When We Talk About What We Talk About When We Talk About Making: it seems like everyone read this this month, and it really is quite good. Though I hadn’t read anything from Tim Maly prior, the style feels familiar, in a good way, to some other folks I admire. That said, I think I’ve realized why I don’t always like reading this type of article: I can’t find the intended mood or actions or whatever thoughtpieces should evoke. On a similar level of micro/macro technology social thinking is a preliminary atlas of gizmo landscapes.
  • Zoe Quinn’s Depression Quest: #gamergate loomed large in the public consciousness this month, and I followed Zoe’s twitter with interest. There are lots of directions - understanding the way the media talks about 4chan and the difference between the ‘underbelly of the internet’ and the usual internet. Or about what isn’t the underbelly of the internet: what separates Facebook and 4chan?