Friday, May 24, 2013

"I added your numbers, and I have no idea what they are."

So it turns out there's a thesis arguing that polynomial-time, fully homomorphic encryption is possible. (Link is to the dumbed-down -- but still journal-published -- version that mortals like me are capable of understanding.)

It's hard to understate the significance of this. This means that it's possible for you to give someone your data in encrypted form, and for them to execute arbitrary operations and give it back to you, without ever knowing what the data is. That is, they transform an input ciphertext to and output ciphertext such that when you decrypt the output, you have the answer to your query about the data, but at no point did they decrypt it or learn what was inside.

In other words: "I just calculated the sum of the numbers you gave me, but I have no idea what the sum is, nor what any of the numbers are."

If it sounds impossible, it's not because you misunderstand it, but because that kind of thing shouldn't be possible -- how can you perform arbitrary operations on data without learning something about it? Sure, maybe there are edge cases, but a rich, Turing-complete set?

It would mean that "the cloud" can ensure your privacy, while *also* doing useful operations on your data (as the author, Craig Gentry, goes at great length to emphasize).

As best I can tell from the paper, here's the trick, and the intuition why it's possible:

1) The computation must be non-deterministic -- i.e. many encrypted outputs correspond to the correct decrypted output. This is the key part that keeps the computation provider from learning about the data.

2) The output must be fixed size, so you have a sort of built-in restriction of "limit to the first n bytes of the answer".

3) It does require a blowup in the computational resources expended to get the answer. However, as noted above, it's only a polynomial blowup. And thanks to comparative advantage, it can still make sense to offload the computation to someone else, for much the same reason that it makes sense for surgeons to hire secretaries even when the surgeon can do every secretarial task faster. (Generally, when the provider's opportunity cost of performing the task is less than yours.)

4) Finally, to be fully homomorphic -- capable of doing every computation, not just a restricted set of additions and such -- the encrypted computation has to find a way around the buildup of "noise" in the computation, i.e. properties of the output that put it outside the range of what can be decrypted (due to exceeding the modulus of the operation needed to extract the output). And to do that, in turn, its operation set must be sufficient to perform its own decryption.

I'm only about halfway through the paper, but it's been really enlightening to get the intuition behind why this kind of thing can work.

1 comment:

Tel said...

I dunno about this one.

The reasons why "the cloud" is valuable right now are basically threefold:

[1] real estate is expensive.

[2] big data pipes are expensive.

[3] proper physical security is expensive.

Thus, hiring a number of virtual machines is a cheap way to get physical diversity (so you are safe from disaster) and shared access to someone's big data pipe. It is absolutely NOT a good way to get access to high power computing machinery... you get much better value for grinding calculations if you run them on your own hardware.

You may find that "cloud" services also include some management and support component, and then you are essentially just hiring engineering labour, and you get some advantage if a lot of people all want the same maintenance done (i.e. division of labour and specialization). Trouble is, generally the sort of regular maintenance that is very commonplace, is also very easy to automate with software, so you don't get good value hiring engineers to type "yum update" for you.