Meltdown and Spectre are Critical Vulnerabilities for Cloud Infrastructure. Here’s How the Aptible Security Team Responded
Frank Macreery
CEO
By now it’s likely that you’ve heard about Meltdown and Spectre[1], which were publicly disclosed on January 3rd.
As an Aptible customer, here’s what you need to know:
Enclave is architected to mitigate vulnerabilities like Meltdown and Spectre.
The Aptible Security Team immediately responded to the disclosure to further remediate the issue.
We provided a real-time account of our response efforts on our status page. This blog post adds context on that response and explains some of the ways our architecture is designed to protect against these sorts of vulnerabilities.
How these vulnerabilities impact cloud infrastructure
Meltdown in particular (more on Spectre later in this post) allows an unprivileged process to read memory it should not have access to, including kernel memory. By extension, in a PaaS environment running untrusted customer code, it allows customers to read memory they should not be allowed to read.
The vulnerability isn’t trivial to exploit at scale, but in theory, it allows for:
Escalation: one customer reads data (e.g. credentials) belonging to the PaaS provider, and uses that to compromise the PaaS provider itself, and by extension other customers.
Lateral compromise: one customer reads data belonging to another customer whose apps are deployed on the same underlying instance.
In other words, Meltdown is a critically important vulnerability for any PaaS provider. However, as an Aptible Enclave customer, you’re protected by the intrinsic architecture of Enclave, as well as an active Security Team. Here’s how.
Aptible Enclave is architected to protect against attacks like Meltdown and Spectre
In fact, this exact type of vulnerability, where a customer gains access to memory they should not be able to read, is part of our threat model, and Enclave is architected to protect against it.
Here’s how this plays out in terms of the escalation and lateral compromise attacks explained earlier:
Escalation: instances running customer containers on Enclave are unprivileged by design. All privileged access (e.g., to AWS or Aptible APIs) is orchestrated through isolated “coordinator” instances, which do not host customer containers.
Lateral compromise: for sensitive data, Enclave requires that customers deploy on dedicated-tenancy stacks, which host a single customer’s containers.
In other words: the container boundary is our first line of defense, but it’s not the only one.
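To make the two rules above concrete, here is a minimal placement-check sketch in Python. It is not Enclave’s actual scheduler, which this post does not describe: the `Instance` and `Container` types, the `"coordinator"`/`"worker"` roles, and the `tenant` and `handles_sensitive_data` fields are hypothetical stand-ins for whatever metadata a real scheduler tracks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instance:
    instance_id: str
    role: str              # hypothetical roles: "coordinator" or "worker"
    tenant: Optional[str]  # owning customer for dedicated tenancy; None means shared


@dataclass(frozen=True)
class Container:
    customer: str
    handles_sensitive_data: bool


def can_place(container: Container, instance: Instance) -> bool:
    """Return True only if placing this container respects both isolation rules."""
    # Escalation: coordinators hold privileged credentials, so they never run customer code.
    if instance.role == "coordinator":
        return False
    # Lateral compromise: sensitive workloads run only on instances dedicated to their owner.
    if container.handles_sensitive_data:
        return instance.tenant == container.customer
    # Other workloads may use shared workers or their own customer's dedicated workers,
    # but never another customer's dedicated instance.
    return instance.tenant is None or instance.tenant == container.customer
```

Because the check is enforced at placement time rather than relying on the container boundary, a break in that boundary can only expose what already shares the instance: never coordinator credentials and, for sensitive workloads, only the same customer’s data.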
Aptible’s Meltdown remediation efforts
As soon as the Meltdown vulnerability was publicized, we began deploying patches across our infrastructure to restore the integrity of the container boundary before public exploits were available. These patches had to be applied to the Linux Kernel and are known as the “Kernel Page-Table Isolation” patch set (“KPTI”). KPTI removes most kernel memory from the page tables in use while user code runs, so a user process no longer has kernel mappings for Meltdown to read through.
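On a running host, one way to confirm that KPTI is active is the kernel’s vulnerability-reporting interface, which writes a one-line status such as `Mitigation: PTI` to `/sys/devices/system/cpu/vulnerabilities/meltdown`. That file appeared in Linux 4.15 and later stable releases, so a kernel from early January 2018 may not expose it; the sketch below therefore treats a missing file as “unknown” rather than “safe”. The function names are illustrative.

```python
from pathlib import Path

MELTDOWN_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def classify_meltdown_status(text: str) -> str:
    """Map the kernel's one-line Meltdown report to a coarse status."""
    status = text.strip()
    if status == "Not affected":
        return "not-affected"
    if status.startswith("Mitigation:"):
        return "mitigated"   # e.g. "Mitigation: PTI"
    if status.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"         # anything else, e.g. a hypervisor-dependent report


def meltdown_status(path: Path = MELTDOWN_SYSFS) -> str:
    """Read and classify the report; kernels without the interface yield "unknown"."""
    try:
        return classify_meltdown_status(path.read_text())
    except FileNotFoundError:
        return "unknown"


if __name__ == "__main__":
    print(meltdown_status())
```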
Our remediation was complicated by the fact that Ubuntu, the Linux distribution Enclave relies on, was caught off guard by the earlier-than-planned disclosure of the vulnerability on January 3rd and did not yet have patched Kernels available.
As a result, within hours of the announcement we started on a contingency plan: building our own patched Kernels targeting Linux 4.14.11[2]. On January 4th, we concluded that Ubuntu was unlikely to provide patched Kernels before January 9th (which turned out to be correct) and decided to roll out our own Kernels instead[3]. Other providers have since announced that they took a similar approach.
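A self-built kernel only helps if KPTI is actually compiled in. On x86, the KPTI patches are gated by the `CONFIG_PAGE_TABLE_ISOLATION` build option, and Ubuntu installs each kernel’s build configuration as `/boot/config-<release>`. The minimal sketch below checks that option; the helper names are illustrative, and it does not describe Aptible’s own build pipeline.

```python
import platform
from pathlib import Path


def kernel_config_options(text: str) -> dict:
    """Parse CONFIG_KEY=value lines from a kernel build config."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks and comments, including "# CONFIG_FOO is not set" (option off).
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key] = value
    return options


def pti_built_in(config_text: str) -> bool:
    """True if Kernel Page-Table Isolation is compiled into the kernel."""
    return kernel_config_options(config_text).get("CONFIG_PAGE_TABLE_ISOLATION") == "y"


def running_kernel_config() -> str:
    """Read the build config that Ubuntu installs next to the running kernel."""
    return Path(f"/boot/config-{platform.release()}").read_text()
```

Calling `pti_built_in(running_kernel_config())` answers whether the running kernel was built with KPTI. Support that is compiled in can still be switched off at boot with the `nopti` or `pti=off` kernel parameters, so a build-config check complements a runtime check rather than replacing it.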
Once we had validated our newly minted Kernel against Enclave’s suite of integration tests, we published our plans on our status page and contacted customers to coordinate maintenance windows. Over the course of a few days, we replaced thousands of instances with minimal disruption. Our Meltdown patching was complete early in the morning of January 9th, before public Meltdown PoCs were available and before Ubuntu had released patched Kernels.
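Replacing thousands of instances with minimal disruption generally means replacing them in waves, so that no single stack loses too much capacity at once. The post does not describe Aptible’s rollout tooling; the following is a hypothetical wave planner in which the `(stack, instance_id)` input pairs and the `per_stack` limit are assumptions. It deliberately leaves out validation, maintenance windows, and database restarts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def plan_waves(instances: Iterable[Tuple[str, str]], per_stack: int = 1) -> List[List[str]]:
    """Group (stack, instance_id) pairs into replacement waves.

    Each wave takes at most `per_stack` instances from any one stack, so a stack
    never has more than that many instances being replaced at the same time.
    """
    if per_stack < 1:
        raise ValueError("per_stack must be at least 1")
    by_stack: Dict[str, List[str]] = defaultdict(list)
    for stack, instance_id in instances:
        by_stack[stack].append(instance_id)

    waves: List[List[str]] = []
    while any(by_stack.values()):
        wave: List[str] = []
        for stack in sorted(by_stack):
            queue = by_stack[stack]
            wave.extend(queue[:per_stack])  # take the next instances from this stack
            del queue[:per_stack]
        waves.append(wave)
    return waves
```

For example, `plan_waves([("a", "i1"), ("a", "i2"), ("b", "i3")])` yields two waves, `["i1", "i3"]` and then `["i2"]`, so stack `a` keeps one healthy instance while the other is replaced.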
Timeline
January 3, 2018: We posted on our status page that the Security Team was monitoring the expected release of information about an upcoming vulnerability.
January 4, 2018: Once the details of the vulnerability were released, we published our response plan on our status page and prioritized patching Shared Stacks (which are inherently vulnerable to Meltdown) and otherwise vulnerable Dedicated Stacks. Because an official kernel patch had not yet been released, we used a bespoke kernel, and we completed kernel patching for Shared and Dedicated Stacks. We began contacting each customer to coordinate a scheduled maintenance window during which we could restart databases as needed.
January 9, 2018: We completed all patching and database restarts needed for all remediation efforts related to Meltdown.
Looking ahead and Spectre remediation
As of now, Aptible has fully remediated Meltdown on Enclave Stacks.
Going forward, we are continuing to assess the impact of the Spectre vulnerabilities and to track the development of Linux Kernel mitigations against them. As those mitigations mature, we’ll likely follow a similar approach (albeit with less urgency) to deploy them.
The stakes keep rising: the threat environment is growing more hostile just as the consequences of a data breach grow more severe. The Aptible Security Team will continue to be aggressive about protecting our customers’ environments from these and all other critical vulnerabilities.
Footnotes
[1] The site provides useful information, recommendations, and links to security advisories that describe Meltdown in a broader context than this blog post. You may find guidance there on how to respond appropriately to Meltdown in your own cloud or personal data environments.
[2] Some additional fixes to the KPTI patch series were included in the subsequent Linux 4.14.12 release. 4.14.12 had not yet been released when we started rolling out our Linux 4.14.11-based Kernel, but we had already backported the relevant patches onto our 4.14.11 tree.
[3] We were able to move faster than Ubuntu because we had fewer constraints. Ubuntu guarantees a stable Kernel version for a given Ubuntu release, which meant they had to backport the KPTI patches onto older Kernels, a lot of work to complete on short notice. We, by comparison, had the flexibility to upgrade to a newer Kernel instead, which we did.