Saturday, November 12, 2011

Another explanation of hash functions

I think I've found a new way to explain hash functions and convey the intuition behind them.

First, think back to the classic problem from grade school: a farmer raises cows and chickens. His animals have a total of, say, 10 heads and 22 feet. How many cows and chickens does he have?

If you know algebra, you probably laugh at how easy the problem is, though it's still interesting for kids, who generally have fun with it. Now, you can either solve it the kid way, or the algebraic way. The kid way is to "guess and check": that is, guess a number of chickens and of cows, find the corresponding number of heads and feet, and check if it matches that given in the problem. The algebraic way is to let x be the number of chickens, y the number of cows, and write:

x + y = 10
2x + 4y = 22

And then solve for x and y the standard way. (In case it needs to be said, chickens have one head and two feet while cows have one head and four feet.)

What does this all have to do with hash functions, though? Glad you asked.

A cryptographic hash function for 10,000 BC

Recall the requirements for a cryptographic hash function: it must be easy to compute the output (digest) for any input (preimage), but hard to compute the input for any output (other than by trying every preimage). In other words, a one-way function (or trapdoor one-way function without the trapdoor).

So, here's a hash function for a mathematically backward world: First, it takes two integers as input. You let the first be number of chickens, and the second be the number of cows. Then, output the total number of heads and feet (perhaps with a simple separator, like "10:22" for the above example).

It works as a hash function for the people of 10,000 BC precisely because of their (relatively) poor mathematical understanding. Since they don't know to express it as a system of linear equations, or otherwise derive a simple general solution, someone trying to "crack" a given hash digest (output) would have no choice but to guess and check a bunch of chicken/cow possibilities. This method is referred to as "brute force" in cryptography, and as long as a hash function hasn't yet been "broken" (by a better understanding of the math theory involved), it's the only option available.

Though this situation may seem contrived to us now, the same dynamic is at play in modern, military grade cryptographic hash functions: so long as people lack sufficient mathematical understanding, it is impossible to invert a hash digest except by brute force. The only difference between now and then is that the math and computations are harder.

In the previous post, I mentioned how hash functions can be used to protect stored passwords while still allowing password-based authentication. Let's go over how this would work.

An encrypted password system for 10,000 BC

Let's say Mike the Merchant wants to set up a service of warehousing people's valuables. He'll deal with many customers and doesn't want to count on remembering their faces when they come to reclaim their stuff. So, he'll give each customer a unique password they must use to get in. This will be a stronger authentication system than just trying to remember every customer.

Like modern website owners, Mike also wants to keep his password records from being stolen or misused (say, to steal his customers' stuff). If he simply kept the passwords in a book, someone who gained access to it could copy them and then come one by one to illicitly claim the goods in the warehouse. (Though of course, Mike can always use common sense security measures, like noticing something is fishy when the same customer one day claims to know all the passwords!)

So, rather than store the passwords, he converts them into two integers (if the password isn't already in that form) and stores the digest from putting them through the hash function described above. So, if Harry the Hoarder's password were 100:350, Mike would store Harry's password as 450:1600. (That is, interpret the number before the colon in the password as chickens and the latter as cows, and then store the number of heads [100 + 350 = 450] and number of feet [2*100 + 4*350 = 1600], separated by a colon.)

Then, when someone comes in claiming to be Harry, Mike asks for the password. Next, instead of comparing Harry's password (100:350) to an entry in his book, he first computes the hash digest (450:1600), and compares that number with his entry for Harry. So, he still has a working system to authenticate people by password.

What protection does this offer Mike if someone gains access to his password book and copies the entries? Well, remember, like with modern systems, all they get are the digests that result from putting the passwords through the hash function, not the passwords themselves. And knowing a digest won't convince Mike that you're the account holder: remember, he's checking for a pair of numbers that hash to your entry, not the entry itself.

Could the attacker infer the passwords from the digests listed in Mike's book? Yes, but it would take infeasibly long to do so -- that's the role of the hash function. Going from cows/chickens to heads/feet is easy, but going the other way is hard (for a mathematically backward society, at least). To make any use out of Mike's hashed-password book, an attacker would have to guess a huge number of passwords and see if their digests match any in the book. As long as the password space is big enough, and the society remains mathematically backward enough, it's just not feasible for an attacker to guess and check enough passwords to find a match in the book.

And like in the previous section, this is the same position we are in with respect to hashed password storage today: with a good enough mathematical breakthrough, it might become feasible to quickly invert a hashed password, but as it stands now, the hash functions used are enough to render such databases useless to attackers (though obviously some attacks still get through, such as when a website continues to use a long-broken hash function). The only difference is the complexity of the math.

Of course, Mike still has to make sure his customers don't give out their passwords or store them insecurely ... a problem we still grapple with, twelve thousand years later.

Sunday, November 6, 2011

Setting professor fashion straight

How good are you at distinguishing professors from the homeless? Test your prof/hobo classifier here. I only did slightly better than chance -- 6/10. (Though obviously it doesn't sample uniformly over all professors.)

HT: Jerry Coyne

Saturday, November 5, 2011

Setting unilateral disarmament (obligations) straight

Apparently a Steve Landsburg post from a few months ago has been rediscovered and sparked a new blogosphere debate. The question: "If you favor higher taxes on a class of people that includes yourself, are you obligated, in the absence of the higher taxes, to make voluntary contributions to the government so as to push the world closer to your preferred one?"

The debate also considers (in greater depth) a weaker claim: "If you favor higher taxes on a class of people that includes yourself, you have a greater moral obligation to voluntarily pay (part of) such taxes regardless of whether they are enacted."

Bryan Caplan, Tyler Cowen, and Bob Murphy, all libertarians, weigh in and agree with that weaker claim. (They are listed in approximate decreasing order of confidence in the claim.) Karl Smith, one of those who does want higher taxes, disagrees.

My take is that, despite superficial dissimilarities, the question reduces to that of unilateral disarmament (UD). That is, if everyone (else) would be better of for each person who (metaphorically) disarms, but you would be much worse off if only you disarmed, should you disarm? I say that, you do not have such an obligation, either morally, or for logical consistency, though it would certainly be a noble act. So, I think Karl Smith is basically right (about the implied obligation -- obviously, not about taxes!).

Just as in UD/public goods/free rider cases, the decision to UD will, for lack of a better term, "weed out the meme pool" of people like you, effectively rewarding those who favor opposite policies (which you, by stipulation, regard as pernicious).

It is for the same reason that you should not pay Coasean extortioners: though ostensibly, it works toward your goals, it undermines them by rewarding the wrong people.

I think Douglas Hofstadter made the point very well in his Tale of Happiton, which discusses this dynamic, but in the (less relevant, IMHO) context of nuclear disarmament. He's set up a public goods type situation in which "writing postcards" (i.e. to advocate nuclear disarmament) benefits everyone, but has its costs paid purely by whoever writes them. Watch how he subtly describes the dynamics of what happens when one person takes it upon himself or herself to do the postcard writing. (Here, it's a girl named Andrea.)

Andrea’s older sister’s boyfriend, Wayne, was a star halfback at Happiton High. One evening he was over and teased Andrea about her postcards. She asked him, “Why don’t you write any, Wayne?”.

“I’m out lifeguardin’ every day, and the rest of the time I got scrimmages – for the fall season.”

“But you could take some time out - just 15 minutes a day - and write a few postcards!” she argued. He just laughed and looked a little fidgety. “I don’t know, Andrea”, he said. “Anyway, me ‘n Ellen have got better things to do-huh, Ellen?” Ellen giggled and blushed a little. Then they ran out of the house and jumped into Wayne’s sports car to go bowling at the Happi-Bowl.


Naturally, Hofstadter wrote this piece to encourage people to UD ("write postcards") in such a situation. But I think he's just as well shown that, in the absence of a collective agreement, your decision to unilaterally disarm is, well, spitting in the wind.

(A version of this post was made as a comment on Bob Murphy's blog.)

Thursday, November 3, 2011

Setting universal debt paydown straight

Not that this will get the article any more hits, but I strongly recommend Bob Murphy's takedown of the all-too-common belief that it's somehow impossible or damaging for everyone to reduce or eliminate their debts. It was in response to the latest articulation of the idea by Paul Krugman.

Really, folks, any economy that relies on a certain level of indebtedness is not an economy I care to defend, as it rests on a poor foundation. The purpose of an economy is to provide people with the best consumption/leisure/labor bundle possible, not to goose the hippest new econometric.

Wednesday, November 2, 2011

Virtualization is a riot!

Isn't it neat how computers can completely simulate other computers purely via software? (Background). Well, right now I'm reading a book that comes with a Linux Live CD containing relevant source code -- basically, a CD that you can boot from to see what it's like to use the operating system without interfering with the one you currently have. (Not to mention the benefit of having the exact same environment as the author so you can make sure It Works.) And that's how Live CDs are typically run: from bootup.

But with emulation, you don't need to reboot your whole computer just to get that Linux (or whatever OS) experience! You can set up a "sandbox" environment, or virtual machine that "pretends to be a computer". The software allocates a portion of your computing resources that you specify (disk space, RAM, etc.), and you just run the Live CD through that "pretend computer", freely switching from the window containing that virtual machine, to your web browser and whatnot. Again, it's without the hassle of rebooting every time you want to switch between that and the cute cat video you were watching.

So there I am -- I've got the virtual machine running a "sample" of an operating system off a Live CD, no need to reboot my "real" machine. But it gets better! I can go one step further and tell my pretend computer, "You know what? Let's go all the way. I want the full operating system -- not just the "sample" -- installed on your pretend hardware!" And then it dutifully runs through all the screens you would normally see when installing a Linux distro as your operating system, seizing control of the sandboxed software-that-thinks-it's-hardware, for a full wipe of the, um, non-OS you had there before.

Sorry, I don't know why ... I just find this all so hilarious ... virtualizing the use of a "sample" operating system before I install it on its virtual hardware.

(For those who are curious, the software I'm using is Oracle VM VirtualBox, available free for (IIUC) personal non-commercial use. I learned about it from LessWrong.com's failed attempts to teach others how to play with the site's code.)