I attended a workshop sponsored by the IAB, W3C, ISOC and MIT on Internet Privacy. The workshop had much more of a web focus than it should have: the web is quite important and should certainly cover a majority of the time, but backend issues, network issues, and mobile applications are certainly important too. For me, this workshop was an excellent place to think about linkability and correlation of information. When people describe attacks such as using the ordered list of fonts installed in a web browser to distinguish one person from another, it’s all too easy to dismiss people who want to solve that attack as the privacy fringe. Who cares if someone knows my IP address or what fonts I use? The problem is that computers are very good at putting data together. If you log into a website once, and then later come back to that same website, it’s relatively easy to fingerprint your browser and determine that it is the same computer. There’s enough information that even if you use private browsing mode, clear your cookies and move IP addresses, it’s relatively easy to perform this sort of linking.

It’s important to realize that partially fixing this sort of issue will make it take longer to link two things with certainty, but tends not to actually help in the long run. Consider the font issue. If your browser returns the set of fonts it has in the order they were installed, then that provides a lot of information. Your fingerprint will look the same only for people who took the same OS updates and browser updates and installed the same additional fonts in exactly the same order as you. Let’s say that the probability that someone has the same font fingerprint as you is one in a million. For a lot of websites that’s enough that you could very quickly be linked. Sorting the list of fonts reduces the information; in that case, let’s say your probability of having the same font set as someone else is one in a hundred. The website gets much less information from the fonts. However, it can immediately rule out all the people who have a different font profile, and it can combine the font information with timing information and other signals. As the other people who have the same font fingerprint access the website over time, differences between them and you will continue to rule them out until eventually only you are left. Obviously this is at a high level. One important high-level note is that you can’t fix these sorts of fingerprinting issues on your own; trying makes things far worse. If you’re the only one whose browser doesn’t give out a font list at all, then it’s really easy to identify you.
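To make the arithmetic above concrete, here is a rough sketch in Python. The one-in-a-million and one-in-a-hundred figures are the illustrative numbers from the paragraph above; the population size, the extra one-in-a-hundred signals, and the function names are my own assumptions for illustration, and the calculation assumes the signals are statistically independent, which real browser attributes often are not.

```python
import math


def bits_of_information(p_match: float) -> float:
    """Bits revealed by a signal that a random other visitor
    matches with probability p_match."""
    return -math.log2(p_match)


def expected_lookalikes(population: int, *p_matches: float) -> float:
    """Expected number of OTHER visitors who match you on every signal.

    Assumes the signals are independent, so the match probabilities
    simply multiply. That is a simplification, not a measured fact.
    """
    p_all = 1.0
    for p in p_matches:
        p_all *= p
    return (population - 1) * p_all


if __name__ == "__main__":
    visitors = 10_000_001  # hypothetical site population, including you

    # Ordered font list: one in a million (figure from the text).
    print(bits_of_information(1e-6))
    print(expected_lookalikes(visitors, 1e-6))

    # Sorted font list alone: one in a hundred (figure from the text).
    print(bits_of_information(1e-2))
    print(expected_lookalikes(visitors, 1e-2))

    # Sorted fonts plus two more hypothetical one-in-a-hundred signals
    # (for example timing or screen size): back to the unsorted case.
    print(expected_lookalikes(visitors, 1e-2, 1e-2, 1e-2))
```

Under these assumptions the ordered font list leaves about ten lookalikes among ten million visitors, sorting the list raises that to about a hundred thousand, and adding just two more weak signals brings it back down to about ten. That is the sense in which a partial fix only delays the linking.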

The big question in my mind now is how much we care about this linking. Governments have the technology to do a lot with linking. We don’t have anything technical we can do to stop them, so we’ll need to handle that with laws. Large companies like Google, Facebook and our ISPs are also in a good position to take significant advantage of linking. Again, though, these companies can be regulated; technology will play a part, especially in telling them what we’re comfortable with and what we’re not, but most users will not need to physically prevent Google and Facebook from linking their data. However, smaller websites are under a lot less supervision than the large companies. Unless you take significant steps, such a website can link all your activities on that website. Also, if any group of websites in that space wants to share information, they can link your activities across those websites.

I’d like to run thought experiments to understand how bad this is. I’d like to come up with examples of things that people share with small websites but don’t want linked together, or alternatively don’t want linked back to their identity, and then look at how this information could be linked. However, I’m having trouble with these thought experiments because I’m just not very privacy minded. I can’t think of something that I share on the web that I wouldn’t link directly to my primary identity. I certainly can’t find anything concrete enough to be able to evaluate how much I care to protect it. Help here would be appreciated if you can think of fairly specific examples. There’s lots of important information I prefer to keep private, like credit card numbers, but there it’s not about linking at all. I can reasonably assume that the person I’m giving my credit card number to has a desire to respect my privacy.
