Archive for December, 2010

Privacy

Friday, December 10th, 2010 by hartmans

I attended a workshop sponsored by the IAB, W3C, ISOC and MIT on Internet Privacy. The workshop had much more of a web focus than it should have: the web is quite important and should certainly cover a majority of the time, but backend issues, network issues, and mobile applications are certainly important too. For me this workshop was an excellent place to think about linkability and correlation of information. When people describe attacks such as using the ordered list of fonts installed in a web browser to distinguish one person from another, it’s all too easy to dismiss people who want to solve that attack as the privacy fringe. Who cares if someone knows my IP address or what fonts I use? The problem is that computers are very good at putting data together. If you log into a web site once and then later come back to that same website, it’s relatively easy to fingerprint your browser and determine that it is the same computer. There’s enough information that even if you use private browsing mode, clear your cookies and change IP addresses, it’s relatively easy to perform this sort of linking.

It’s important to realize that partially fixing this sort of issue makes it take longer to link two things with certainty, but tends not to actually help in the long run. Consider the font issue. If your browser returns the set of fonts it has in the order they were installed, that provides a lot of information: your fingerprint will only look the same as those of people who took the same OS updates and browser updates and installed the same additional fonts in exactly the same order as you. Let’s say the probability that someone has the same font fingerprint as you is one in a million. For a lot of websites, that’s enough that you could very quickly be linked. Sorting the list of fonts reduces the information; in that case, let’s say your probability of having the same font set as someone else is one in a hundred. The website gets much less information from the fonts. However, it can combine that information with timing information and so on. It can immediately rule out all the people who have a different font profile, and as the other people who share your font fingerprint access the website over time, differences between them and you will continue to rule them out until eventually only you are left. Obviously this is at a high level. One important high-level note is that you can’t fix these sorts of fingerprinting issues on your own; trying makes things far worse. If you’re the only one whose browser doesn’t give out a font list at all, then it’s really easy to identify you.
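To make that arithmetic concrete, here is a minimal sketch of how combined weak signals shrink the set of candidate users. The one-in-a-hundred font probability is the hypothetical above; the other probabilities and the population figure are values I’ve assumed for illustration, and the signals are treated as independent.

```python
import math

# Probability that a random other user matches you on each signal.
signals = {
    "sorted font list":  1 / 100,    # the hypothetical used above
    "user-agent string": 1 / 1000,   # assumed for illustration
    "timezone offset":   1 / 20,     # assumed for illustration
}

population = 2_000_000_000  # rough number of web users, for scale

match_probability = 1.0
for name, p in signals.items():
    match_probability *= p  # independent signals multiply
    anonymity_set = population * match_probability
    print(f"after {name:<17}: ~1 in {1 / match_probability:,.0f} "
          f"({anonymity_set:,.0f} matching users, "
          f"{-math.log2(match_probability):.1f} bits)")
```

Three mundane signals already cut two billion users down to about a thousand candidates; timing and behavior can then finish ruling out everyone but you.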

The big question in my mind now is how much we care about this linking. Governments have the technology to do a lot with linking. We don’t have anything technical we can do to stop them, so we’ll need to handle that with laws. Large companies like Google, Facebook and our ISPs are also in a good position to take significant advantage of linking. Again, though, these companies can be regulated; technology will play a part, especially in telling them what we’re comfortable with and what we’re not, but most users will not need to physically prevent Google and Facebook from linking their data. However, smaller websites are under a lot less supervision than the large companies. Unless you take significant steps, such a website can link all your activities on that website. Also, if any group of websites in that space wants to share information, they can link your activity across those websites.

I’d like to run thought experiments to understand how bad this is. I’d like to come up with examples of things that people share with small websites but don’t want linked together, or alternatively don’t want linked back to their identity, and then look at how this information could be linked. However, I’m having trouble with these thought experiments because I’m just not very privacy-minded. I can’t think of something that I share on the web that I wouldn’t link directly to my primary identity, and I certainly can’t find anything concrete enough to be able to evaluate how much I care to protect it. Help would be appreciated here if you can think of fairly specific examples. There’s lots of information I prefer to keep private, like credit card numbers, but there it’s not about linking at all: I can reasonably assume that the person I’m giving my credit card number to has a desire to respect my privacy.

Bad Hair Day for Kerberos

Friday, December 3rd, 2010 by hartmans

Tuesday, MIT Kerberos had a bad hair day—one of those days where you’re looking through your hair and realize that it’s turned to Medusa’s snakes while you weren’t looking. Apparently, since the introduction of RC4, MIT Kerberos has had significant problems handling checksums. Recall that when Kerberos talks about checksums, it’s conflating two things: unkeyed checksums like SHA-1, and message authentication codes like HMAC-SHA1 used with an AES key derivation. The protocol doesn’t have a well-defined concept of an unkeyed checksum, although it does have the concept of checksums like CRC32 that ignore their keys and can be modified by an attacker. One way of looking at it is that checksums were over-abstracted and over-generalized. Around the time that 3DES was introduced, there was a belief that we’d have a generalized mechanism for introducing new crypto systems. By the time RFC 3961 actually got written, we’d realized that we could not abstract things quite as far as we’d done for 3DES. The code, however, was written as part of adding 3DES support.
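Here is a toy illustration of the conflation, using an invented interface rather than the actual RFC 3961 API: both functions below fit the same (key, message) shape, but the CRC32-style “checksum” silently ignores its key, which is exactly what makes it attacker-modifiable.

```python
import hashlib
import hmac
import zlib

def crc32_checksum(key: bytes, msg: bytes) -> bytes:
    # An unkeyed "checksum" wearing a keyed interface: the key is unused.
    return zlib.crc32(msg).to_bytes(4, "big")

def hmac_sha1_checksum(key: bytes, msg: bytes) -> bytes:
    # A real MAC: the output depends on the key.
    return hmac.new(key, msg, hashlib.sha1).digest()

msg = b"AP-REQ authenticator"
k1, k2 = b"A" * 16, b"B" * 16

print(crc32_checksum(k1, msg) == crc32_checksum(k2, msg))          # True: key irrelevant
print(hmac_sha1_checksum(k1, msg) == hmac_sha1_checksum(k2, msg))  # False: key matters
```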

There are two major classes of problem. The first is that the 3DES (and I believe AES) checksums don’t actually depend on the crypto system: they’re just HMACs. They do end up needing to perform encryption operations as part of key derivation. However, the code permitted these checksums to be used with any key, not just the kind of key that was intended. In a nice abstract way, the operations of the crypto system associated with the key were used rather than those of the crypto system loosely associated with the checksum. I guess that’s good: feeding a 128-bit key into 3DES might kind of confuse 3DES, which expects a 168-bit key. On the other hand, RC4 has a block size of 1 because it is a stream cipher. For various reasons, that means that regardless of what RC4 key you start with, if you use the 3DES checksum with that key, there are only 256 possible outputs for the HMAC. Sadly, that’s not a lot of work for the attacker. To make matters worse, one of the common interfaces for choosing the right checksum to use was to enumerate the set of available checksums and pick the first one that would accept the kind of key in question. Unfortunately, 3DES comes before RC4 in that enumeration, and there are some cases where the wrong checksum would be used.
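The selection pattern looks roughly like the sketch below. The names and predicates are hypothetical, not MIT krb5’s actual code, but they show how a first-match loop combined with an over-permissive key check quietly routes an RC4 key to the 3DES checksum.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ChecksumType:
    name: str
    accepts: Callable[[str], bool]  # which kinds of key this checksum will take

CHECKSUMS = [
    # Registered first and, in the buggy case, willing to take any key at all.
    ChecksumType("hmac-sha1-des3", accepts=lambda enctype: True),
    # The checksum that *should* be chosen for RC4 keys.
    ChecksumType("hmac-md5-rc4", accepts=lambda enctype: enctype == "rc4"),
]

def pick_checksum(enctype: str) -> ChecksumType:
    # First match wins, so the permissive 3DES entry shadows the RC4 one.
    for cksum in CHECKSUMS:
        if cksum.accepts(enctype):
            return cksum
    raise LookupError(f"no checksum for {enctype}")

print(pick_checksum("rc4").name)  # hmac-sha1-des3: the wrong checksum wins
```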

Another serious set of problems stems from the handling of unkeyed checksums. It’s important to check that a received checksum is keyed if you are in a context where an attacker could have modified it. Using an unkeyed MD5 outside of encrypted text to integrity-protect a message doesn’t make sense. Some of the code was not good about checking this.
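Here is a toy demonstration of that failure mode: a naive receiver accepting a plain MD5 as integrity protection. This is my simplification, not the actual krb5 code path, but it shows why an unkeyed checksum offers no protection against modification: the attacker simply recomputes it.

```python
import hashlib

def verify_naive(message: bytes, checksum: bytes) -> bool:
    # Buggy receiver: treats a plain (unkeyed) MD5 as integrity protection.
    return hashlib.md5(message).digest() == checksum

original = b"transfer $10 to alice"
token = (original, hashlib.md5(original).digest())

# The attacker rewrites the message and regenerates the unkeyed checksum;
# no key is needed, because there is no key.
tampered = b"transfer $10000 to mallory"
forged = (tampered, hashlib.md5(tampered).digest())

print(verify_naive(*token))   # True
print(verify_naive(*forged))  # True: the forgery verifies too
```

The fix in such contexts is to demand a MAC keyed with the session key, which an attacker cannot recompute.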

What worries me most about this set of issues is how many new vulnerabilities were introduced recently. The set of things you can do against 1.6 based on these errors was significant, but not nearly as impressive as against 1.7. A whole new set of attacks was added in the 1.8 release. In my mind, the most serious attack was added in the 1.7 release: a remote attacker can send an integrity-protected GSS-API token using an unkeyed checksum. Since there’s no key, the attacker doesn’t need to worry about not knowing it. Yet the checksum verifies, and the code is happy to go forward.

I think we need to take a close look at how we got here and what went wrong. The fact that multiple later releases made the problem worse makes it clear that we produced a set of APIs where doing the wrong thing is easier than doing the right thing. It seems like there is something important to fix here about our existing APIs and documentation. It might be possible to add tests, or a list of things to look for when adding new crypto systems. However, I also think there is an important lesson to take away at the design level. Right now I don’t know what the answers are, but I encourage the community to think closely about this issue.

I’m speaking about MIT Kerberos because I’m familiar with the details there. However, it’s my understanding that the entire Kerberos community has been thinking about checksums lately, and MIT is not the only implementation with improvements to make here.