Washington, DC has a gunshot detector network. In technical terms, a system of 300 acoustic sensors mounted on buildings with fast enough networking that sounds can be triangulated using precise timing.*
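To make "triangulated using precise timing" concrete, here is a minimal sketch of locating a sound from arrival times. It is not ShotSpotter's method, which isn't public here: the sensor layout, the search area, and the brute-force grid search are all assumptions chosen for readability, and `locate` is a hypothetical helper.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 °C


def locate(sensors, arrival_times, extent, step=1.0):
    """Grid-search the point whose implied emission times agree best."""
    width, height = extent
    best_point, best_error = None, float("inf")
    for i in range(int(width / step) + 1):
        for j in range(int(height / step) + 1):
            x, y = i * step, j * step
            # If the sound started at (x, y), each sensor implies an emission time.
            implied = [
                t - math.hypot(x - sx, y - sy) / SPEED_OF_SOUND
                for (sx, sy), t in zip(sensors, arrival_times)
            ]
            mean = sum(implied) / len(implied)
            # Those implied times only line up near the true source.
            error = sum((v - mean) ** 2 for v in implied)
            if error < best_error:
                best_point, best_error = (x, y), error
    return best_point


# Hypothetical layout: four sensors on the corners of a 400 m x 300 m area.
sensors = [(0.0, 0.0), (400.0, 0.0), (0.0, 300.0), (400.0, 300.0)]
true_source = (120.0, 80.0)
# Simulated arrival times for a sound emitted at t = 5 seconds.
arrival_times = [
    5.0 + math.hypot(true_source[0] - sx, true_source[1] - sy) / SPEED_OF_SOUND
    for sx, sy in sensors
]
estimate = locate(sensors, arrival_times, extent=(400.0, 300.0))
print(estimate)
```

The emission time itself never has to be known: only the differences between arrival times matter, which is why clock precision across the network is the hard part.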
The Washington Post broke the story to most residents. Gun violence in cities is an unquestionable scourge and the detector system is a clear step toward a solution, so the story was entirely positive.
ShotSpotter, the company* that installed DC’s system, offers a cloud service by which their own personnel monitor the sensors, listen to recordings, and pass the results to police officers. While the police provided data to the media - the raw data for the WP’s story - what’s the raw product? After all, a network of gunshot detectors is a network of microphones, installed throughout a city and running 24/7. The detectors don’t just detect high-pressure sounds: SST advertises them as having subsonic to supersonic range. The implication should be clear: gunshot detectors are microphones. Their being microphones is certainly useful for their main purpose, since many shooters will say their own or the victim’s name right before or after the crime. The applications outside of this purpose, like the ability to record private conversations, are more troublesome.*
ShotSpotter’s UI, from SST via Richmond Confidential
ShotSpotter retains the data to let police review incidents. The database is geographically distributed, just like any good web service, and one can only assume that it runs on public TCP.
Above are Capital Bikeshare stations - 337 locations, including a few that have closed or haven’t opened yet. The Bikeshare system publishes trip data every quarter - start & end locations, with to-the-minute accuracy.
photo by James Schwartz, CC-BY-ND
The data inevitably ends up as beautiful visualizations and tools for calculating odds of getting a bike.
The District Department of Transportation maintains cameras that remotely monitor traffic and occasionally prove useful in criminal proceedings. They license the data from these cameras to a private company called TrafficLand.com, which then resells the data to online and television news.
While some municipalities do this themselves - for instance, the NYC DOT’s system - TrafficLand* has similar contracts with 50+ departments and handles 18,000 cameras. The cameras in DC update every two seconds or less.
Cities are sensors.
I’ll remove my tin-foil hat and add a few thoughts.
Space and Time is Identity
Given the sensor infrastructure that’s public and obvious in operation, the most powerful technique is cross-referencing. That is, when data is released with to-the-minute precision or a few extra decimal places of latitude and longitude, the potential hacks multiply. Which is to say,
DC traffic camera on TrafficLand.com
Several cameras include Capital Bikeshare racks in their field of view. Given a 2-second camera interval and to-the-minute Bikeshare trip data, it’s reasonable to assume that by recording all cameras and cross-referencing bike trips one could de-anonymize the dataset.
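The cross-reference is, at its core, a join on time. A minimal sketch, assuming trip start times are truncated to the minute and that frame timestamps are available for a camera that sees the rack; `candidate_frames` and the sample timestamps are hypothetical, not drawn from either dataset:

```python
from datetime import datetime, timedelta


def candidate_frames(trip_start, frame_times):
    """Return the frames captured during the minute in which a trip began."""
    window_start = trip_start.replace(second=0, microsecond=0)
    window_end = window_start + timedelta(minutes=1)
    return [t for t in frame_times if window_start <= t < window_end]


# Hypothetical frames from one camera: one every 2 seconds for 5 minutes.
first = datetime(2014, 1, 1, 8, 0, 0)
frames = [first + timedelta(seconds=2 * i) for i in range(150)]

# A hypothetical trip recorded as starting at 08:03.
trip = datetime(2014, 1, 1, 8, 3)
matches = candidate_frames(trip, frames)
print(len(matches), matches[0], matches[-1])
```

At one frame every 2 seconds, a trip known to the minute narrows to 30 frames from a single camera, which is few enough to look through by hand.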
Deanonymization is the trick of recovering personally identifiable information from supposedly anonymized datasets. The first popular example was Netflix’s dataset, which in 2006 was successfully remapped to specific members. Later, New York City’s weakly anonymized taxi data was quickly decoded to reveal exact license plate numbers.*
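To see why hashing alone is weak anonymization, here is a small sketch. It assumes identifiers were replaced with an unsalted hash (MD5 is used purely as an illustration) and uses a made-up ID format of digit, letter, digit, digit; neither detail is taken from the taxi release itself.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def build_lookup():
    """Hash every ID in a small, hypothetical format and index by digest."""
    table = {}
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        plain = f"{d1}{letter}{d2}{d3}"
        table[hashlib.md5(plain.encode()).hexdigest()] = plain
    return table


lookup = build_lookup()

# A "released" value: the hash of a made-up ID, as it might appear in a dataset.
released = hashlib.md5(b"7B42").hexdigest()
recovered = lookup[released]
print(recovered)
```

When the set of possible inputs is this small, every hash can be computed ahead of time, so the hash works more like a simple substitution than like encryption.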
Storing one day of one camera’s footage takes 200MB. Roughly 175 cameras that cover the DMV would consume 35GB/day or 3.1 terabytes of storage per quarter. That’s around $90 to store three months of data on Amazon’s S3 service.*
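The arithmetic, spelled out. The per-camera size and camera count come from the paragraph above; the 90-day quarter and the price per gigabyte-month are assumptions, so treat the dollar figure as a ballpark rather than a quote.

```python
MB_PER_CAMERA_PER_DAY = 200   # from the estimate above
CAMERAS = 175                 # from the estimate above
DAYS_PER_QUARTER = 90         # assumption: a quarter is roughly 90 days
PRICE_PER_GB_MONTH = 0.03     # assumption: illustrative storage price in USD

gb_per_day = MB_PER_CAMERA_PER_DAY * CAMERAS / 1000
gb_per_quarter = gb_per_day * DAYS_PER_QUARTER
tb_per_quarter = gb_per_quarter / 1000
# Cost of keeping a full quarter of footage stored for one month.
monthly_cost = gb_per_quarter * PRICE_PER_GB_MONTH

print(f"{gb_per_day:.0f} GB/day, {tb_per_quarter:.2f} TB/quarter, ${monthly_cost:.2f}/month")
```

With these assumptions the result lands in the same range as the figure above; a different price per gigabyte scales the cost linearly.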
These sensor systems are large investments, made over long periods of time, with support of the government and often the community. Capital Bikeshare’s data releases are well-known and traffic cameras are an expected utility. The MPD’s relationship with ShotSpotter has been relatively quiet, but is casually mentioned in annual reports. On the other coast, Oakland, CA residents opposed the removal of ShotSpotter, saying it did good for their community.
What’s interesting about these examples is that they are on the edge of open data. The police department has also had 103 CCTV cameras since 2006, but nobody expects residents to be invited to tune in.
But traffic cameras are cameras, gunshot detectors are microphones.* Some seem innocuous, some creepy, some cross into the public domain, and some don’t. The gap between stated purpose and usage is real: in Boston, a license plate reader system that was supposed to be tracking stolen cars seemed to do everything but.
And so sensor data feels different. Letting everyone listen through a gunshot detector would expose not just crime but police brutality. Open access to high-resolution cameras throughout the city will show traffic and also presidential convoys.
Every new eye added to the network, every new connection from the world to a database, has mixed consequences. In Ferguson, both volunteers and police are using body cameras to establish wrongdoing and record encounters. It’s remarkable that surveillance could be a tool for activism.
Unfiltered, sensor data shows everything, reveals everything. Or rather, everything where the sensors are. In DC, that’s where the police chose to install the sensors, which initially excluded the far northwest - so a gunshot-detected map of the city is more a map of where gunshot detectors are than gunshots.
Commuting to the old office, I would pass three cameras two times a day.
This is one of them, after being split into 2-second chunks and run through simple and stupid processing to show only foreground, moving objects.
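One simple way to get that kind of result is background subtraction. This is a sketch of the general idea rather than the processing used for the image: it assumes grayscale frames stored as nested lists, takes the per-pixel median over time as the background, and uses an arbitrary threshold. `foreground_masks` is a hypothetical helper.

```python
from statistics import median


def foreground_masks(frames, threshold=30):
    """Mark pixels that differ from the per-pixel median background."""
    height, width = len(frames[0]), len(frames[0][0])
    # The median over time ignores anything that only passes through briefly.
    background = [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]
    return [
        [
            [abs(frame[y][x] - background[y][x]) > threshold for x in range(width)]
            for y in range(height)
        ]
        for frame in frames
    ]


# Tiny synthetic example: a bright "car" moves across a dark 1x5 road.
frames = []
for position in range(5):
    row = [10] * 5
    row[position] = 200
    frames.append([row])

masks = foreground_masks(frames)
for mask in masks:
    print("".join("#" if pixel else "." for pixel in mask[0]))
```

The median works because moving objects occupy any given pixel for only a small share of the frames, so the static scene wins out.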
This is a work in progress: I’m experimenting with what it really means, and how to think of this relatively new sort of thing.
Thoughts and prior art are greatly appreciated - let me know at @tmcw.
- If you stored every photo as a separate object on S3, the pricing would be steeper due to PUT costs, but this is assuming that you'd have a local cache and push archived tar files to the service.
- Deanonymization takes advantage of mistakes made by the people trying to anonymize data, whether they're cryptographic, in NYC's case, or statistical, in Netflix's. It's by no means a new problem - the US Census has been grappling with it for years, since they release data for aggregate units of land - blocks, parcels, states - that can contain as few as one person. If the Census were to release exact information about every unit, they'd reveal personal details of any sole residents of Census blocks.
- It's interesting that most of the relatively positive articles about ShotSpotter refer to the microphones as acoustic sensors. But maybe it is accurate, because I think that microphones have become something totally different than the ones we see onstage. An iPhone 5 has three microphones, one for background noise and two that perform short-range beamforming. Further into the deep end is badBIOS's vivid illustration of what can happen outside our hearing range.
- DC doesn't publicize the locations of the devices, only where gunshots have been detected. The data doesn't immediately make this obvious, especially since it appears that the coordinates were set to a lower precision for export.
- The exact range of ShotSpotter's system isn't clear: they advertise a wide radius for gunshots, but other sounds would vary. Microphones are very interesting in this way, because their coverage isn't usually directional but can be amplified in interesting ways by city structures - sound bounces off of walls readily whereas light needs a reflective surface.
Thanks to Eric Mill for reviewing drafts of this article. Thanks to the satellite crew at Mapbox for image-processing advice.