Spreading chaos and entropy

NB: this blog post is fairly old.

Imagine you're a hosting provider and you want to generate cryptographic keys. Lots of them. On virtual machines. Pretty soon you'll find out that computers are indeed very logical devices, which makes them particularly bad at coming up with random data. And guess what makes or breaks a cryptographic key? That's right: random data. In this post we will show you how we tackled this problem at Greenhost, both to clarify how we work and, hopefully, to share some of our knowledge with the wider community.

Virtual randomness is hard

Usually your operating system will try to scrounge together bits of randomness from the big bad illogical outside world: keystrokes, mouse movements, network interrupts, etcetera. Anyone who has ever generated their own PGP key knows how long that can take. On virtual machines there's really not all that much usable random data coming in. That's because on virtual machines logical devices are pretending to be other logical devices, and the virtualisation cuts off most direct contact between the virtual machine and the big bad illogical outside world.
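
To see how starved a machine actually is, you can ask the kernel for its own entropy estimate. The sketch below assumes a Linux system, where that estimate is exposed in /proc/sys/kernel/random/entropy_avail. The function names and the threshold of 200 bits are our own illustrative choices, not anything the kernel defines, and recent kernels report this value quite differently from the kernels of this post's era.

```python
from pathlib import Path

# Linux exposes the kernel's entropy estimate (in bits) through procfs.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_estimate(path: str = ENTROPY_AVAIL) -> int:
    """Return the kernel's current entropy estimate in bits."""
    return int(Path(path).read_text().strip())


def is_starved(bits: int, threshold: int = 200) -> bool:
    """Hypothetical helper: flag a pool below an arbitrary threshold."""
    return bits < threshold


if __name__ == "__main__":
    bits = read_entropy_estimate()
    print(f"entropy_avail: {bits} bits, starved: {is_starved(bits)}")
```

Running this on a physical machine and on a freshly booted virtual machine is a quick way to compare the two.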

Creating random out of thin air

Luckily there are these things called hardware Random Number Generators (RNGs). Those nifty little boxes concentrate all the illogical-ness of the big bad outside world and generate random numbers ready to be fed into your computer. Yes, hardware RNGs are a perfect solution for the entropy problem and will solve it pretty much out of the box on a desktop machine. But things get a bit more complicated when you're a hosting provider with tons of virtual machines. You can stick an RNG into a server and feed that one server, making sure it'll always have a big entropy pool, but all that random data won't just magically find its way to your virtual machines. (Side note: KVM has some options to feed its VMs with entropy, but we're using Xen here!) And besides, with the large amount of data produced by the average RNG, buying a separate RNG for every server would be both excessive and rather expensive.

We want something that will spread the random data from one RNG to a cluster of virtual machines. Luckily, Folkert van Heusden wrote a piece of software that does exactly that: the Entropy Broker. The system consists of three parts: the broker, the server and the client. The server sends random data from the RNG to the broker. The broker collects and distributes the random data. The client pulls random data from the broker to feed the local entropy pool on the machine. Sounds pretty neat, eh?
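
The division of roles can be sketched in a few lines. This is a toy, in-memory illustration of the three roles only, and emphatically not Entropy Broker's actual protocol or code: the class name, the SHA-256 mixing and the counter-based output are all our own assumptions, and there is no networking, authentication or entropy accounting here.

```python
import hashlib
import os


class ToyBroker:
    """Toy stand-in for the broker role: collects data and hands it out."""

    def __init__(self) -> None:
        self._state = b"\x00" * 32  # running hash of everything received
        self._counter = 0           # makes every output block distinct

    def push(self, data: bytes) -> None:
        """Server role: mix new data into the pool state."""
        self._state = hashlib.sha256(self._state + data).digest()

    def pull(self, num_bytes: int) -> bytes:
        """Client role: derive output bytes from the current state."""
        out = b""
        while len(out) < num_bytes:
            self._counter += 1
            block = hashlib.sha256(
                self._state + self._counter.to_bytes(8, "big")
            ).digest()
            out += block
        return out[:num_bytes]


if __name__ == "__main__":
    broker = ToyBroker()
    broker.push(os.urandom(64))    # stands in for the hardware RNG
    print(broker.pull(16).hex())   # a client would feed this to its local pool
```

In the real setup each role is a separate program talking over the network. The toy only shows the direction of the data flow: servers push, the broker mixes, clients pull.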

Getting it up and running

First of all, we want redundancy, so we got ourselves two Araneus Alea RNGs. We plugged them into two physical machines. On these machines we created two virtual machines: let's call them broker1 and broker2. Then we made ourselves a custom Debian package for the entropy broker to easily install it across the network.* We added that to our local repository and hooray! apt-get install entropybroker!

Then the slightly more tricky part: how to get a redundant broker out of two RNGs? We opted for having two brokers (broker1 and broker2), each fed by one of the RNGs and then both feeding each other, just in case one of the RNGs goes haywire. As a side bonus, mixing different sources also decreases vulnerability to potential backdoors. So we have the entropy broker server installed on the physical machines. On the brokers we have not only the broker installed, but also the client (feeding the local entropy pool) and again the server (using that local entropy pool to feed the other broker). Add keepalived for a virtual interface in between the brokers and haproxy to forward incoming connections and you end up with an architecture that looks like this:

The last step is to install the entropy broker client on the virtual machines in the hosting park and share unique credentials between them and the broker. Now you can cat /dev/random and the flow of random characters that fills your screen just keeps on coming. As a simple test, I tried generating 10 gpg key pairs on a virtual machine. With the entropy broker client installed that took 9 seconds. Without the entropy broker client... well... that one is still running.
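
A rougher but quicker check than generating keys is to time a read from the random device. The sketch below is a minimal example under our own assumptions: the function name is ours, and the default of /dev/random only stalls on kernels where that device blocks when the pool runs low, as it did on the systems described here.

```python
import time


def time_random_read(num_bytes: int, device: str = "/dev/random") -> float:
    """Read num_bytes from device and return the elapsed seconds."""
    start = time.monotonic()
    remaining = num_bytes
    with open(device, "rb", buffering=0) as dev:
        while remaining > 0:
            chunk = dev.read(remaining)  # may return fewer bytes than asked
            if not chunk:
                raise OSError(f"{device} returned no data")
            remaining -= len(chunk)
    return time.monotonic() - start


if __name__ == "__main__":
    print(f"read 512 bytes in {time_random_read(512):.2f} s")
```

On a starved virtual machine with a blocking /dev/random, this call can hang for a long time, which is exactly the symptom the client is meant to cure.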

A few last remarks.

Why entropy broker and not one of the other comparable systems? Well, the code base is small and pretty clean, it's been audited, and the upstream developer is a nice guy. There are a few small downsides: management of credentials is a bit of a hassle, but not unworkable. The main reason we haven't implemented this on customer virtual machines yet is the lack of configurable access control that would allow a client to retrieve randomness from the broker, but not push data to it. Without that, clients might be able to corrupt the entropy pool by pushing in non-random data, though it is debatable to what degree random XOR non-random is still really random, or at least random enough. Either way, we're experimenting with a patch that fixes this problem and hopefully the next release will have a configuration option for read-only access.
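
The "random XOR non-random" argument can be checked exhaustively for a single byte. The sketch below (function names are ours) shows that XOR with any fixed byte merely permutes the 256 possible values, so a uniformly random byte stays uniformly random. It assumes the attacker's data is chosen independently of the random data, and it says nothing about how the broker actually mixes its inputs internally.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def xor_preserves_uniformity(attacker_byte: int) -> bool:
    """Check that XOR with a fixed byte maps 0..255 onto 0..255 exactly once."""
    outputs = {value ^ attacker_byte for value in range(256)}
    return len(outputs) == 256


if __name__ == "__main__":
    print(all(xor_preserves_uniformity(k) for k in range(256)))
```

The argument breaks down if the pushed data can depend on the pool's contents, which is one reason read-only access for clients is still the safer configuration.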

* The package isn't public yet (working on that), but if you're interested, just send us an e-mail. The only thing that really sets it apart from a git checkout and manual build is the init-script, which uses an /etc/defaults/entropybroker file to select which service you want to start.