Thursday, July 11, 2013

My discovery of "semantic security"

One interesting thing I forgot to mention in the previous post about homomorphic encryption: the concept of semantic security.

It was actually a major stumbling block for me. When I got to the passage that mentions the concept, the author casually remarks that "semantic security is an expected feature of public key cryptosystems", and then defines the term as follows: a system is semantically secure if, given two plaintexts and the ciphertext of one of them, an attacker cannot do better than chance in guessing which plaintext goes with that ciphertext.

That didn't make sense because I had always assumed that the defining feature of public key cryptography was that the attacker is permitted unlimited chosen-plaintext attacks, which -- I thought -- means that the attacker always gets to know what ciphertext goes with any plaintext. So how can you make it so that the attacker can -- as required for public key encryption -- produce valid ciphertexts from arbitrary plaintext, and yet still have a semantically secure cryptosystem? Couldn't the attacker just encrypt both plaintexts to figure out which one corresponds to the given ciphertext?

What I missed was that you can use a one-to-many cipher: that is, the same plaintext corresponds to many ciphertexts. What's more, it's actually trivial to convert a plain-vanilla one-to-one public key cipher into a one-to-many semantically secure version. Here's how: just before applying the encryption step, generate a random number (a "nonce") and append it to the plaintext, with the proviso that the recipient with look for it in that position and strip it off after decryption.

This way, in order for an attacker to try to guess the plaintext in the game above, it's no longer enough for them to simply encrypt both plaintexts: a random number was inserted in the process. This means that in order to find a match between a plaintext and a ciphertext, the attacker must encrypt each plaintext with every possible nonce, which requires resources that increase exponentially with the size of the nonce used. (That is, an n-bit nonce can have 2^n possible values.)

The more you know (tm).

2 comments:

Tel said...

"Couldn't the attacker just encrypt both plaintexts to figure out which one corresponds to the given ciphertext?"

Not if the algorithm is salted, which most of them are.

For example, check a linux shadow password file, then create a second user with the same password, note that the encrypted passwords are different, even though the original password is the same. Yes, of course this makes the ciphertext at least larger by the size of the salt.

Tel said...

By the way, there's a few situations where you have an opportunity to salt the algorithm, without paying the penalty of excess overhead adding to the size of the data.

Suppose you want to have an encrypted telephone conversation, so you might use some sort of VoIP system, and probably you might want a datagram format because realtime voice is not suitable for packet retransmission (that really messes up the jitter). For various reasons you generally want a protocol with a packet counter in the headers (so you can see if you lost any packets) and with some sort of timestamp (so you can judge the jitter in the channel). Now you have some incrementing values, not random admittedly, but also not often repeating. The trouble with encrypted datagrams is you really want to keep them small (for efficiency reasons) but you also want each packet to be independent (in case one gets lost, you don't want to screw the whole stream), and you don't want to expose any easy statistical information (like the reason WEP was broken so easily).

Using the packet counters and timestamps to contribute to the salt always made sense to me, and you don't even need to encrypt those. After all, someone can see a VoIP transmission just by traffic analysis anyhow, and they can count the packets without your help, and there might be good reasons why you want to allow intermediate routers the ability to detect a dropped packet in the stream (e.g. for congestion management, etc). Something to think about.