Prioritizing privacy when using location in apps

5 Recommendations for developers

Mapbox
Jan 29, 2019

By: Tom Lee

Location is finally a prominent part of the data privacy conversation. This is overdue. The physical location of your body in a given moment, and over time, is uniquely personal.

The future of location is exciting: connecting our digital lives to the real world is going to let us solve old problems and uncover new insights. But we have to approach that future responsibly. As more developers and companies experiment with these capabilities, it's essential that they also consider the ethical obligations that come with handling personal data, both within their own teams and in the third-party services they depend on.

The privacy and dignity of our customers and users have been a north star for us since before we collected our first byte of user data. Our team has spent a lot of time considering these questions. Here are five privacy recommendations for developers and companies building with location:

1. De-identification & anonymization

Location can reveal a lot about an individual. Even after obvious identifiers like IP addresses or session tokens are removed, a pattern of travel between specific places can still reveal private details about who someone is. The risk of data being connected back to an individual can be reduced by breaking location data into shorter segments that can't be linked back together. At Mapbox we also discard the beginning and end of each trace, as well as data that appears to come from residences. This process leaves us with short segments that are useful for detecting traffic conditions but useless for identifying individuals.
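A minimal Python sketch of these steps, assuming a trace is an ordered list of timestamped points. The `Point` type, the `deidentify_trace` function, its `segment_len` and `trim` parameters and the `looks_residential` predicate are hypothetical names for illustration, not Mapbox's actual pipeline. A production system would also need to stop segments from being re-linked through their timestamps or endpoints.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Point:
    lat: float
    lon: float
    t: float  # timestamp in seconds


def deidentify_trace(
    trace: List[Point],
    segment_len: int,
    trim: int,
    looks_residential: Callable[[Point], bool],  # hypothetical predicate
) -> List[List[Point]]:
    # Nothing is left once both ends are trimmed.
    if len(trace) <= 2 * trim:
        return []
    # Discard the start and end of the trace, which tend to reveal
    # where a trip began and ended.
    core = trace[trim:len(trace) - trim]

    segments: List[List[Point]] = []
    current: List[Point] = []
    for p in core:
        if looks_residential(p):
            # Drop the point and break the segment here, so the pieces
            # on either side are not stitched back together.
            if current:
                segments.append(current)
                current = []
            continue
        current.append(p)
        if len(current) == segment_len:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    # Segments are returned without any shared trace identifier.
    return segments
```

Splitting at a flagged point, rather than simply filtering it out, keeps the points on either side of a residence from ending up in the same segment.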

2. Fuzzing & aggregation

Anonymizing the data you collect can substantially reduce risk — but not eliminate it. That’s because of what privacy researchers sometimes call the “Mosaic Effect.” Put simply, it says that the privacy implications of a piece of data can’t be fully understood in isolation. You must also consider how your data can interact with other data. Think of a Sherlock Holmes story, where the hero combines many seemingly innocuous details to reach an unexpected revelation. Because it’s impossible to anticipate all of the datasets that might be combined with your data, it’s hard to ever declare data conclusively safe.

But it's possible to reduce risk substantially by discarding, attenuating or obfuscating the signal that data contains. Aggregation is one way of doing this, and the US Census is a good example of how it can be put into action. The Census collects highly personal data from individuals, including details like race and income. But this data is only published in aggregate form, by geographic units such as tracts, block groups and blocks.
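Applied to location data, the same idea might look like the sketch below, which assumes points are bucketed into a fixed latitude/longitude grid and that cells with too few points are withheld. The grid scheme, `aggregate_to_grid` and the `min_count` threshold are illustrative assumptions, not how the Census Bureau or Mapbox publishes data. Note that degree-based cells get narrower toward the poles.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]  # (row, column) index of a grid cell


def aggregate_to_grid(
    points: Iterable[Tuple[float, float]],
    cell_deg: float,
    min_count: int,
) -> Dict[Cell, int]:
    # Assign each (lat, lon) point to the grid cell that contains it.
    counts = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in points
    )
    # Publish only counts; cells with fewer than `min_count` points are
    # suppressed because a tiny count can single out a person.
    return {cell: n for cell, n in counts.items() if n >= min_count}
```

Only the per-cell totals leave the function; the individual points never do.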

If aggregation isn't an option, it may suffice to reduce the data's fidelity. Statistical techniques like differential privacy get researchers and computer scientists excited (trust us, we know). But techniques as simple as rounding geographic coordinates to a few decimal places can substantially reduce risk. Whether this is viable depends on what you need the data for: if you're showing users a weather forecast, you might require less precision than if you're helping them find a coffee shop. And this works in the other direction too: deciding deliberately what your business will and won't do with data lets you offer stronger privacy guarantees. Before Mapbox began collecting location data for our traffic product, we carefully considered whether we would ever sell the data to advertisers. Our decision not to has let us put user privacy first.
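Rounding is about as simple as privacy techniques get. A sketch, assuming latitude/longitude in decimal degrees; `coarsen`, `coarsen_for` and the per-feature precision table are hypothetical. As a rule of thumb, one decimal place of latitude spans roughly 11 km, two roughly 1.1 km and three roughly 110 m; longitude spans shrink toward the poles.

```python
from typing import Dict, Tuple

# Hypothetical precision choices per feature, in decimal places.
PRECISION_BY_USE: Dict[str, int] = {
    "weather_forecast": 1,    # ~11 km of latitude is plenty for a forecast
    "coffee_shop_search": 3,  # ~110 m keeps nearby results useful
}


def coarsen(lat: float, lon: float, decimals: int) -> Tuple[float, float]:
    """Round a coordinate pair to `decimals` decimal places."""
    return round(lat, decimals), round(lon, decimals)


def coarsen_for(use: str, lat: float, lon: float) -> Tuple[float, float]:
    # Unknown uses fall back to the coarsest precision in the table.
    decimals = PRECISION_BY_USE.get(use, min(PRECISION_BY_USE.values()))
    return coarsen(lat, lon, decimals)
```

Defaulting to the coarsest precision means a feature has to ask explicitly for more detail rather than receiving it by accident.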

3. Standardized encryption at rest & in transit

Data should be encrypted both in transit and at rest. Always use widely adopted libraries that implement modern standards and have been independently audited. Unless you employ a PhD cryptographer, you should not even consider homegrown schemes or implementations (and even then, think about it carefully).
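For data in transit, "use a widely adopted library" can be as simple as relying on Python's standard `ssl` module (a wrapper around OpenSSL) with its secure defaults, as in the sketch below. `make_client_context` and `open_tls` are hypothetical helper names; data at rest would similarly go through an audited library's authenticated encryption (for example AES-GCM) rather than hand-written code.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # Loads the system's trusted CA certificates and turns on
    # certificate and hostname verification.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def open_tls(host: str, port: int = 443, timeout: float = 10.0) -> ssl.SSLSocket:
    """Open a verified TLS connection to host:port."""
    ctx = make_client_context()
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        # server_hostname enables SNI and the hostname check.
        return ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
```

The point is what the code does not do: it never disables verification or picks ciphers by hand.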

The specifics of how to best implement encryption depend on your use case and threat model, but there are a variety of techniques that can harden a typical implementation, from certificate pinning to hardware security modules.
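As one example of hardening, here is a minimal certificate-pinning sketch. It assumes the client ships a set of SHA-256 fingerprints of the server's leaf certificate; the helper names are hypothetical. Many mobile stacks pin the public key (SPKI) instead, which survives certificate renewal, whereas a leaf-certificate pin like this one breaks whenever the certificate changes, so a backup pin is prudent. Hardware security modules are not covered here.

```python
import hashlib
import socket
import ssl
from typing import AbstractSet


def cert_fingerprint(der_cert: bytes) -> str:
    # SHA-256 of the DER-encoded certificate, as lowercase hex.
    return hashlib.sha256(der_cert).hexdigest()


def pin_matches(der_cert: bytes, pins: AbstractSet[str]) -> bool:
    return cert_fingerprint(der_cert) in pins


def connect_pinned(host: str, pins: AbstractSet[str], port: int = 443) -> ssl.SSLSocket:
    # Normal CA and hostname verification still run; the pin is an
    # extra check on top of them, not a replacement.
    ctx = ssl.create_default_context()
    sock = socket.create_connection((host, port), timeout=10)
    try:
        tls = ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
    der = tls.getpeercert(binary_form=True)
    if der is None or not pin_matches(der, pins):
        tls.close()
        raise ConnectionError(f"certificate for {host} matches no pinned fingerprint")
    return tls
```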

4. Access control

Scrubbing data’s content and securing its form are essential, but your first and most important line of defense must be controlling access to what you collect. Implementing the principle of least privilege — by which staff may only access the resources they need — is a priority for any top-notch security team. And having such a team in place is a prerequisite to handling location data responsibly. Without carefully designed access control, both your users and your business will be at risk. This should include both procedures for onboarding and offboarding those who need access, and instrumentation to detect unexpected attempts at access or privilege escalation.
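A minimal sketch of least privilege with deny-by-default checks and logging of refused attempts. The roles, permission strings and `is_allowed` helper are hypothetical; a real deployment would typically rely on its identity provider or cloud IAM and feed denials into alerting rather than a local log.

```python
import logging
from typing import Dict, FrozenSet

log = logging.getLogger("access")

# Hypothetical roles; each grants only what that job requires.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "traffic_analyst": frozenset({"read:aggregated_traffic"}),
    "ingest_service": frozenset({"write:raw_segments"}),
    "support": frozenset(),
}


def is_allowed(user: str, role: str, permission: str) -> bool:
    # Deny by default: unknown roles receive no permissions at all.
    granted = ROLE_PERMISSIONS.get(role, frozenset())
    allowed = permission in granted
    if not allowed:
        # Record refused attempts so unexpected access or privilege
        # escalation can be detected and investigated.
        log.warning("denied user=%s role=%s permission=%s", user, role, permission)
    return allowed
```

Offboarding then amounts to removing a person's role, after which every check falls through to the empty default.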

5. Give users choice

Your users deserve to know how their location data is being collected and used. This is not just the right thing to do: in more and more places, it’s the law. Providing clear, unambiguous details about how data is shared and monetized lets users make an informed decision about whether to use your service — or, in some cases, whether to simply opt out of data collection (an option that we require all Mapbox customers to offer their users).

Is a user’s data going to be scrubbed and used to build better maps? Analyzed in aggregate by urban planners? Or is their location going to be used to do things they don’t like or approve of? The answers matter a lot.
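One way to make those answers enforceable in code is to record consent per purpose and check it before anything is collected. This sketch assumes a global opt-out plus explicit, per-purpose grants; the `Purpose` values, `ConsentRecord` and `may_collect` are hypothetical, and whether collection may default to on depends on your product and jurisdiction.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Purpose(Enum):
    MAP_IMPROVEMENT = "map_improvement"  # scrubbed data used to build better maps
    URBAN_PLANNING = "urban_planning"    # aggregate analysis shared with planners
    ADVERTISING = "advertising"


@dataclass
class ConsentRecord:
    opted_out: bool = False
    granted: Set[Purpose] = field(default_factory=set)


def may_collect(consent: ConsentRecord, purpose: Purpose) -> bool:
    # A global opt-out overrides every purpose.
    if consent.opted_out:
        return False
    # Otherwise collect only for purposes the user has agreed to.
    return purpose in consent.granted
```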

The above list isn’t meant to be exhaustive. But it reflects some of the answers that we’ve arrived at as we’ve considered how to do right by our customers and their users. And, like location services themselves, these techniques are both powerful and accessible.

Want to dig into details / ask questions / gush about how cool homomorphic encryption is? Say hi on Twitter, check out our telemetry page, or just head straight to mapbox.com/careers.
