Encrypting EC2 ephemeral volumes with LUKS and AWS KMS
Update 2017-03-06: If I did this again today I’d probably use the Systems Manager Parameter Store to store the passphrase, rather than using envelope encryption and keeping the encrypted passphrase on disk. All the other concepts remain viable.
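For reference, here is a minimal sketch of the Parameter Store approach. It assumes the AWS CLI with a configured region, and an instance role that is allowed to call ssm:PutParameter and ssm:GetParameter and to use the KMS key. The parameter name /cassandra/luks-passphrase and the alias alias/cassandra are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: keep the LUKS passphrase in SSM Parameter Store as a SecureString
# (encrypted with a KMS key) instead of storing KMS ciphertext on local disk.
# The parameter name and key alias are hypothetical defaults.
PARAM_NAME="${PARAM_NAME:-/cassandra/luks-passphrase}"
KMS_KEY_ID="${KMS_KEY_ID:-alias/cassandra}"

# Read the passphrase from stdin and store it. "file:///dev/stdin" makes the
# CLI load the value from stdin, so the passphrase is never a command-line arg.
store_passphrase() {
  aws ssm put-parameter \
    --name "$PARAM_NAME" \
    --type SecureString \
    --key-id "$KMS_KEY_ID" \
    --value file:///dev/stdin \
    --overwrite > /dev/null
}

# Print the decrypted passphrase with no trailing newline, so it can be piped
# straight into "cryptsetup --key-file=-".
fetch_passphrase() {
  local value
  value=$(aws ssm get-parameter \
    --name "$PARAM_NAME" \
    --with-decryption \
    --query Parameter.Value \
    --output text) || return 1
  printf '%s' "$value"
}
```

Usage would be along the lines of `printf '%s' "$passphrase" | store_passphrase` when the volume is created, and `fetch_passphrase | cryptsetup --key-file=- luksOpen /dev/md0 cassandra` at boot. Stripping the newline matters because cryptsetup treats every byte read from a key file as part of the key.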
A project I worked on recently had a business requirement to encrypt data at rest. We had a mid-sized Cassandra cluster on EC2 that, for various reasons, stored data on ephemeral volumes. The system had previously relied on Gazzang (now owned by Cloudera) for on-disk encryption, but according to the operations team it was unwieldy to manage and an “operational bottleneck.” I can’t attest to that, as I wasn’t involved in the implementation, but I was asked to replace it.
Enter dm-crypt and its sidekicks, cryptsetup and LUKS. Perfect for what we needed: an open source disk encryption system built into the Linux kernel.
Operationally, one challenge with full disk encryption on servers is key management. LUKS/cryptsetup takes a passphrase and uses it to unlock the symmetric key that actually encrypts the disk. How can you safely provide that passphrase when mounting the encrypted volume at boot, without operator intervention?
I chose to tackle this with AWS Key Management Service. The scheme works like this:
- All Cassandra nodes run with an IAM role (attached through an instance profile) that has access to a Cassandra master key. This is the easiest step, since the KMS Create Key wizard walks you right through it. I ended up with a key policy that lets my IAM user account administer the key and lets the C* IAM role encrypt and decrypt data with the KMS master key.
- At boot, the C* nodes generate a random passphrase, encrypt it using KMS envelope encryption, and store the resulting ciphertext on disk.
- Create a RAID device from all the ephemeral volumes, then encrypt it with LUKS using the passphrase.
- After a reboot, send the ciphertext to KMS, which decrypts and hands back the passphrase. Use it to open the volume.
The production implementation is Chef-ified and somewhat abstract, but I used the following scripts as a proof of concept.
Create a KMS key
You can do this either from the console or with the AWS CLI. It requires a gnarly key policy, which is created for you automatically if you use the console wizard. See an example in this gist.
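If you take the CLI route, the sketch below writes a key policy along the lines described earlier (one IAM user administers the key, the Cassandra instance role can encrypt and decrypt) and then creates the key and an alias. The account ID, user name, role name, and alias are hypothetical placeholders, and the admin action list imitates the style of the console's default policy rather than being a verified copy of it.

```bash
#!/usr/bin/env bash
# Sketch: create a KMS master key that one IAM user administers and that the
# Cassandra instance role can use. All names passed in are hypothetical.

# write_key_policy FILE ACCOUNT_ID ADMIN_USER NODE_ROLE
write_key_policy() {
  local out=$1 account_id=$2 admin_user=$3 node_role=$4
  cat > "$out" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnableIAMPolicies",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::${account_id}:root"},
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "AllowKeyAdministration",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::${account_id}:user/${admin_user}"},
      "Action": [
        "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:List*",
        "kms:Put*", "kms:Update*", "kms:Revoke*", "kms:Disable*",
        "kms:Get*", "kms:Delete*", "kms:ScheduleKeyDeletion",
        "kms:CancelKeyDeletion"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowCassandraNodesToUseKey",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::${account_id}:role/${node_role}"},
      "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey*", "kms:DescribeKey"],
      "Resource": "*"
    }
  ]
}
EOF
}

# create_key POLICY_FILE ALIAS  -> prints the new key ID
create_key() {
  local policy_file=$1 alias_name=$2 key_id
  key_id=$(aws kms create-key \
    --description "Cassandra LUKS passphrase key" \
    --policy "file://${policy_file}" \
    --query KeyMetadata.KeyId --output text) || return 1
  # Alias names must start with "alias/", e.g. alias/cassandra.
  aws kms create-alias --alias-name "$alias_name" --target-key-id "$key_id" || return 1
  echo "$key_id"
}
```

For example: `write_key_policy policy.json 123456789012 alice cassandra-node && create_key policy.json alias/cassandra`. The root-account statement, which the console also adds by default, lets IAM policies in the account grant access to the key and helps avoid locking yourself out of it. KMS rejects a policy that names a principal that doesn't exist yet, so create the IAM role first.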
Create and encrypt the ephemeral disks
The script for this step covers items 2 and 3 in the list above: it creates a RAID array, generates a passphrase, encrypts the RAID device using dm-crypt, and finally creates and mounts a filesystem on the encrypted device.
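A minimal sketch of those steps follows. It assumes bash, mdadm, cryptsetup, and the AWS CLI with a configured region, running as root on an instance whose role may call kms:Encrypt. The ciphertext path (/root/luks-passphrase.enc), mapper name, mount point, and key alias are hypothetical defaults. The passphrase is encrypted with a direct KMS Encrypt call against the master key, which is one simple way to get a ciphertext to store; treat it as an illustration, not the exact production calls.

```bash
#!/usr/bin/env bash
# Sketch: RAID the ephemeral disks, generate a random passphrase, store only its
# KMS ciphertext on disk, then LUKS-format, open, format and mount the array.
# Every default below is a hypothetical example; override via the environment.
KMS_KEY_ID="${KMS_KEY_ID:-alias/cassandra}"
CIPHERTEXT_FILE="${CIPHERTEXT_FILE:-/root/luks-passphrase.enc}"
RAID_DEV="${RAID_DEV:-/dev/md0}"
MAPPER_NAME="${MAPPER_NAME:-cassandra}"
MOUNT_POINT="${MOUNT_POINT:-/var/lib/cassandra}"

# 32 random bytes from the kernel CSPRNG, base64-encoded (44 characters).
generate_passphrase() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}

# Encrypt stdin with the KMS master key; prints the base64 ciphertext.
encrypt_with_kms() {
  aws kms encrypt --key-id "$KMS_KEY_ID" --plaintext fileb:///dev/stdin \
    --query CiphertextBlob --output text
}

# encrypt_ephemeral_disks DEVICE...   (run as root)
encrypt_ephemeral_disks() {
  local passphrase
  mdadm --create "$RAID_DEV" --run --level=0 --raid-devices="$#" "$@" || return 1

  # The passphrase lives only in this shell variable; printf is a bash
  # builtin, so the passphrase never shows up in the process list.
  passphrase=$(generate_passphrase)
  printf '%s' "$passphrase" | encrypt_with_kms > "$CIPHERTEXT_FILE" || return 1
  [[ -s "$CIPHERTEXT_FILE" ]] || return 1
  chmod 600 "$CIPHERTEXT_FILE"

  # --key-file=- makes cryptsetup read the key from stdin.
  printf '%s' "$passphrase" |
    cryptsetup --batch-mode --key-file=- luksFormat "$RAID_DEV" || return 1
  printf '%s' "$passphrase" |
    cryptsetup --key-file=- luksOpen "$RAID_DEV" "$MAPPER_NAME" || return 1

  mkfs.ext4 -q "/dev/mapper/$MAPPER_NAME" || return 1
  mkdir -p "$MOUNT_POINT"
  mount "/dev/mapper/$MAPPER_NAME" "$MOUNT_POINT"
}
```

Call it with the ephemeral devices, e.g. `encrypt_ephemeral_disks /dev/xvdb /dev/xvdc`. On Amazon Linux the first ephemeral volume is often auto-mounted at /media/ephemeral0, so it may need to be unmounted first, and mdadm refuses a single-device RAID 0 unless you add --force. The ciphertext is written before luksFormat runs, so a failed KMS call aborts before any disk is encrypted with a passphrase you could never recover.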
Set up post-boot decryption
At boot we need to decrypt the volume before Cassandra needs it. We’re using Amazon Linux, which still uses SysV-style init, so I put the script in /etc/init.d/luks-mount with a symlink from /etc/rc3.d/S15luks, but this will vary by Linux distro. The script that installs it runs just after the encrypted volume is first mounted.
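A minimal sketch of such an /etc/init.d/luks-mount script follows. It assumes bash, the AWS CLI with a configured region, and an instance role allowed to call kms:Decrypt. The ciphertext path, /dev/md0, mapper name, and mount point are hypothetical and must match whatever was used when the volume was created. A start priority of 15 places it after the network service (priority 10 on RHEL-style systems), which matters because the KMS call needs the network.

```bash
#!/usr/bin/env bash
# chkconfig: 345 15 85
# description: Decrypt the LUKS passphrase with KMS, then open and mount the volume.
#
# Sketch of /etc/init.d/luks-mount. All paths and names are hypothetical and
# must match the values used when the volume was created.
CIPHERTEXT_FILE="${CIPHERTEXT_FILE:-/root/luks-passphrase.enc}"
RAID_DEV="${RAID_DEV:-/dev/md0}"
MAPPER_NAME="${MAPPER_NAME:-cassandra}"
MOUNT_POINT="${MOUNT_POINT:-/var/lib/cassandra}"

# Send the stored ciphertext to KMS and print the plaintext passphrase.
# The ciphertext identifies its key, so no --key-id is needed.
decrypt_passphrase() {
  aws kms decrypt \
    --ciphertext-blob fileb://<(base64 -d "$CIPHERTEXT_FILE") \
    --query Plaintext --output text | base64 -d
}

luks_start() {
  if [[ ! -e "/dev/mapper/$MAPPER_NAME" ]]; then
    # Piped straight into cryptsetup; the passphrase is never written to disk.
    decrypt_passphrase |
      cryptsetup --key-file=- luksOpen "$RAID_DEV" "$MAPPER_NAME" || return 1
  fi
  mkdir -p "$MOUNT_POINT"
  mountpoint -q "$MOUNT_POINT" || mount "/dev/mapper/$MAPPER_NAME" "$MOUNT_POINT"
}

luks_stop() {
  umount "$MOUNT_POINT" 2>/dev/null
  cryptsetup luksClose "$MAPPER_NAME"
}

case "${1:-}" in
  start) luks_start ;;
  stop) luks_stop ;;
  "") ;;  # no argument: only define the functions
  *) echo "Usage: $0 {start|stop}" >&2; exit 2 ;;
esac
```

`/etc/init.d/luks-mount start` opens and mounts the volume. Running it twice is harmless, because it skips luksOpen when the mapping already exists and skips mount when the mount point is already mounted. With the chkconfig header in place, `chkconfig --add luks-mount` should create the runlevel links instead of making the symlink by hand.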
Make sure Amazon Linux can boot
A final wrinkle cropped up: Amazon Linux would auto-discover the LUKS volume early in the boot process and try to decrypt it. On reboot, it would prompt for a passphrase and hang indefinitely. The system log looked something like this:
0.478204 dracut: luksOpen /dev/md0 luks-9445aa60-9664-4d3b-a65d-bae6576b77bb
/dev/md0 (luks-9445aa60-9664-4d3b-a65d-bae6576b77bb) is password protected
Enter passphrase for /dev/md0:
The fix was to prevent the initramfs from discovering and unlocking encrypted devices. I did this by removing dracut’s crypt module altogether.
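One way to do this is to tell dracut to omit its crypt module in a config file and then rebuild the initramfs for the running kernel. The sketch assumes a dracut version that reads /etc/dracut.conf.d/*.conf and supports omit_dracutmodules, plus RHEL-style /boot/initramfs-<kernel>.img naming. The file name 90-omit-crypt.conf is a hypothetical choice.

```bash
#!/usr/bin/env bash
# Sketch: keep dracut's "crypt" module out of the initramfs so early boot no
# longer tries to unlock LUKS devices (and hangs at a passphrase prompt).
# Assumes a dracut that reads /etc/dracut.conf.d/*.conf; the file name is a
# hypothetical choice.
DRACUT_CONF_DIR="${DRACUT_CONF_DIR:-/etc/dracut.conf.d}"

omit_crypt_module() {
  mkdir -p "$DRACUT_CONF_DIR"
  # dracut list values need the surrounding spaces.
  echo 'omit_dracutmodules+=" crypt "' > "$DRACUT_CONF_DIR/90-omit-crypt.conf"
}

# Rebuild the initramfs for the running kernel, keeping a backup first.
rebuild_initramfs() {
  local kver img
  kver=$(uname -r)
  img="/boot/initramfs-${kver}.img"
  cp -p "$img" "${img}.bak" || return 1
  dracut --force "$img" "$kver"
}
```

Run both functions as root before the next reboot. Depending on the dracut version, passing rd_NO_LUKS (older releases) or rd.luks=0 (newer releases) on the kernel command line is an alternative that disables LUKS detection without rebuilding the image.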
Conclusion
Is it a perfect scheme? Probably not. But it protects the data at rest on the hosts running our EC2 instances, and it never stores the passphrase in plaintext anywhere. The passphrase only ever lives in variables in the bash scripts, so it exists only in memory.