Thursday, July 11, 2013

My discovery of "semantic security"

One interesting thing I forgot to mention in the previous post about homomorphic encryption: the concept of semantic security.

It was actually a major stumbling block for me. When I got to the passage that mentions the concept, the author casually remarks that "semantic security is an expected feature of public key cryptosystems", and then defines the term as follows: a system is semantically secure if, given two plaintexts and the ciphertext of one of them, an attacker cannot do better than chance in guessing which plaintext goes with that ciphertext.

That didn't make sense because I had always assumed that the defining feature of public key cryptography was that the attacker is permitted unlimited chosen-plaintext attacks, which -- I thought -- means that the attacker always gets to know what ciphertext goes with any plaintext. So how can you make it so that the attacker can -- as required for public key encryption -- produce valid ciphertexts from arbitrary plaintext, and yet still have a semantically secure cryptosystem? Couldn't the attacker just encrypt both plaintexts to figure out which one corresponds to the given ciphertext?

What I missed was that you can use a one-to-many cipher: that is, the same plaintext corresponds to many ciphertexts. What's more, it's actually trivial to convert a plain-vanilla one-to-one public key cipher into a one-to-many semantically secure version. Here's how: just before applying the encryption step, generate a random number (a "nonce") and append it to the plaintext, with the proviso that the recipient with look for it in that position and strip it off after decryption.

This way, in order for an attacker to try to guess the plaintext in the game above, it's no longer enough for them to simply encrypt both plaintexts: a random number was inserted in the process. This means that in order to find a match between a plaintext and a ciphertext, the attacker must encrypt each plaintext with every possible nonce, which requires resources that increase exponentially with the size of the nonce used. (That is, an n-bit nonce can have 2^n possible values.)

The more you know (tm).