blog.voltone.net

Who wants cookies?

Posted 2016-06-13 19:35:52

There has been some discussion on the erlang-questions mailing list about the security of cookies in Erlang distribution. I touched on this in my talk but didn't have time to go into all the details, so let's take another look at the security of Erlang clusters.

Does it matter?

The Erlang distribution interface is designed to make the full power of the Erlang VM available over the network, and an attacker who penetrates an Erlang node gains almost full control over the user account under which the process is running, unless the BEAM process is in some way contained (see below). For instance, the :os.cmd/1 and :erlang.open_port/2 functions allow execution of arbitrary shell commands. That power extends to all nodes in the cluster, not just the node that was compromised.

So in many ways the Erlang distribution protocol is as attractive a target for attackers as SSH. sshd goes to great lengths to prevent unauthorised access. For instance, its listener process can be made to run in a sandboxed environment to restrict the damage that can be done should the process be compromised prior to authentication. And the sshd binary is typically compiled to take maximum advantage of Address Space Layout Randomisation (ASLR), to frustrate any attempts to leverage buffer overflows or similar bugs in low-level (C) code.

I don’t think anyone is arguing that Erlang distribution doesn’t require special scrutiny from a security perspective. So let’s see what the discussion is about.

Cookie strength

Much of the discussion on the Erlang mailing list is about the strength of cookies. Auto-generated cookies (created in the ~/.erlang.cookie file if you didn't specify a cookie when starting a node) are 20 capital letters. That means 26^20 ≈ 2.0 × 10^28 possible values, or approximately 94 bits. Ideally I think you'd want at least 128 bits, to match the 128-bit MD5 digest used on the wire. That would require 28 letters, or 22 base64 characters.
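The arithmetic above is easy to double-check. Here is a small Python sketch (Python is just my choice for illustration, and the helper names are hypothetical) that computes the entropy of a uniformly random string, the minimum length needed for 128 bits, and generates a 22-character cookie from 16 random bytes using the standard secrets module.

```python
import math
import secrets

def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy, in bits, of a uniformly random string of the given length."""
    return length * math.log2(alphabet_size)

def min_length(alphabet_size: int, target_bits: int) -> int:
    """Shortest uniformly random string that reaches target_bits of entropy."""
    return math.ceil(target_bits / math.log2(alphabet_size))

def generate_cookie() -> str:
    """Hypothetical helper: 16 random bytes (128 bits) as URL-safe base64, padding stripped."""
    return secrets.token_urlsafe(16)

if __name__ == "__main__":
    print(f"{entropy_bits(26, 20):.1f} bits")  # 20 capital letters: 94.0 bits
    print(min_length(26, 128))                 # capital letters needed for 128 bits: 28
    print(min_length(64, 128))                 # base64 characters needed for 128 bits: 22
    print(generate_cookie())                   # 22 random characters
```

If you write such a value to ~/.erlang.cookie, keep the file readable by its owner only; as far as I know the runtime refuses cookie files with looser permissions.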

If you generate properly random cookies of sufficient length, a brute force attack is unlikely to be successful: while the distribution protocol doesn’t seem to rate-limit connections or black-list sources of spurious connection attempts, you would probably notice that someone is doing something nasty before they’d hit the jackpot. My only concern here is that the log message printed when a node fails to authenticate does not identify the culprit: it does show the alleged node name of the initiator, but that can be easily spoofed by the attacker.
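To put "unlikely to be successful" into numbers: an online attacker has to try, on average, half of the possible cookie values. The sketch below estimates how long that takes. The guess rate is a made-up figure for illustration, not a measurement of how fast a node actually processes handshakes.

```python
def expected_years(bits: float, guesses_per_second: float) -> float:
    """Expected time, in years, to hit a uniformly random secret by online guessing.

    On average an attacker needs half the keyspace: 2**(bits - 1) guesses.
    """
    seconds_per_year = 365.25 * 24 * 3600
    return 2 ** (bits - 1) / guesses_per_second / seconds_per_year

# Assumed (hypothetical) rate: 10,000 handshake attempts per second against one node.
RATE = 10_000

if __name__ == "__main__":
    for bits in (40, 64, 94):
        print(f"{bits:>3} bits: {expected_years(bits, RATE):.3g} years")
```

The 94-bit auto-generated cookie is far out of reach even at much higher rates, while a short or guessable cookie (say around 40 bits of effective entropy) could fall within a couple of years of unnoticed probing.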

Drawbacks of standard distribution protocol

So cookies can be a secure enough way to authenticate nodes, but there are a few real issues with the standard TCP-based distribution protocol that make it unsuitable for use outside an isolated, fully trusted network.

For one thing there is no protection against man-in-the-middle attacks. An attacker who is able to modify the TCP channel between two nodes can simply let the nodes authenticate one another, and then take over. At this point both nodes are compromised.

And while a passive attacker snooping the traffic between two nodes won't be able to learn the cookie value, thanks to the MD5-based challenge/response mechanism, all subsequent application data is exchanged in cleartext. The implications are of course application-specific.
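To see why the cookie itself never crosses the wire, here is a simplified Python model of the challenge/response step. As far as I can tell from OTP's dist_util module, each node answers the other's random 32-bit challenge with an MD5 digest over the cookie followed by the challenge written as a decimal string. Message framing and the rest of the handshake are left out, and the function names are my own.

```python
import hashlib
import hmac
import secrets

def gen_digest(challenge: int, cookie: str) -> bytes:
    """MD5(cookie ++ decimal(challenge)), modelled on OTP's digest computation."""
    return hashlib.md5(cookie.encode() + str(challenge).encode("ascii")).digest()

def new_challenge() -> int:
    """A random unsigned 32-bit challenge."""
    return secrets.randbits(32)

def check_reply(challenge: int, reply: bytes, cookie: str) -> bool:
    """Verify the peer's digest using a constant-time comparison."""
    return hmac.compare_digest(reply, gen_digest(challenge, cookie))

if __name__ == "__main__":
    cookie = "SECRETCOOKIE"                       # hypothetical shared cookie
    challenge = new_challenge()                   # sent by node A
    reply = gen_digest(challenge, cookie)         # computed and sent back by node B
    print(check_reply(challenge, reply, cookie))  # True
    print(check_reply(challenge, reply, "WRONG")) # False
```

Only the challenge and the 16-byte digest are visible to an eavesdropper. A captured pair does allow offline guessing against a weak cookie, which is one more reason to use long random values.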

And finally, the cookie mechanism makes it difficult to rotate credentials periodically. While it is possible to gradually roll out a new cookie value across a cluster, link by link, using :erlang.set_cookie/2, it does require careful coordination among nodes.

The good news is that the distribution protocol is actually pluggable: if the standard, TCP-based distribution protocol does not suit your needs you can swap it out for the TLS-based alternative included in the :ssl application. And if that one's not good enough you could even write your own.

SSL/TLS distribution protocol

Let's have a look at how the TLS distribution module addresses the shortcomings of the standard TCP protocol.

First of all, it leverages the TLS protocol handshake to prevent man-in-the-middle attacks and it protects the confidentiality and integrity of the data being exchanged between nodes.

On top of that, it allows you to augment or replace the cookie mechanism with certificate-based mutual authentication of nodes. Among other security benefits this allows key rotation without coordination: the certificate and private key of a node can be replaced at any time, with no need to simultaneously reconfigure other cluster nodes.

However, TLS distribution has its own drawbacks: it is more CPU-intensive due to the encryption and decryption of data (though Erlang/OTP 19 should alleviate that concern somewhat), and establishing a private PKI to issue node certificates is not for the faint of heart. A nice middle ground might be TLS distribution with PSK authentication instead of certificates, but unfortunately this does not appear to be supported at this time.

Practical advice

Use strong cookie values - make sure your cookies can withstand a brute-force (e.g. dictionary) attack
Do not run a distributed Erlang node as root - that’s a no-brainer, and of course it applies to almost any application
Use TLS rather than TCP distribution on an open network - for the reasons given above
Bind the distribution protocol to the loopback interface if you’re only ever connecting locally - I often launch BEAM instances with a short name just to be able to connect locally for monitoring and debugging, from within an SSH session or using SSH tunnelling; by default the distribution protocol binds to the wildcard interface (0.0.0.0), leading to unnecessary exposure outside of the machine
Run your distributed Erlang node in a container - this could be a Docker container, or some other sandboxing mechanism that restricts the privileges of the BEAM process; for instance, suid binaries in /bin or /usr/bin are a potential target for further privilege escalation, so they probably should not even be accessible from your application
Compile the BEAM binaries as Position Independent Executables (PIE), to leverage ASLR - this reduces the risk of exploitation of buffer overflows in the C-code that makes up the BEAM; to do this, compile Erlang/OTP from source and set CFLAGS=-fpie and LDFLAGS="-pie -z now" when calling ./configure; note that this does not work with HiPE!
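To confirm that such a build actually produced a PIE, you can look at the ELF header: a position-independent executable has type ET_DYN, while a fixed-address executable has type ET_EXEC. The Python sketch below reads that field directly; readelf -h reports the same field. Note that shared libraries are also ET_DYN, so only point it at executables such as beam.smp in your installation's erts-*/bin directory.

```python
import struct
import sys

ET_EXEC = 2  # fixed-address executable
ET_DYN = 3   # shared object, or position-independent executable

def elf_type(header: bytes) -> int:
    """Return e_type from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian
    order = "<" if header[5] == 1 else ">"
    # e_type follows the 16-byte e_ident array in both 32- and 64-bit ELF
    (e_type,) = struct.unpack_from(order + "H", header, 16)
    return e_type

def is_pie(path: str) -> bool:
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == ET_DYN

if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, "PIE" if is_pie(path) else "not PIE")
```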
