By Paweł Płatek
In the race to secure cloud applications, AWS Nitro Enclaves have emerged as a powerful tool for isolating sensitive workloads. But with great power comes great responsibility—and potential security pitfalls. As pioneers in confidential computing security, we at Trail of Bits have scrutinized the attack surface of AWS Nitro Enclaves, uncovering potential bugs that could compromise even these hardened environments.
This post distills our hard-earned insights into actionable guidance for developers deploying Nitro Enclaves. After reading, you’ll be equipped to:
- Identify and mitigate key security risks in your enclave deployment
- Implement best practices for randomness, side-channel protection, and time management
- Avoid common pitfalls in virtual socket handling and attestation
We’ll cover a range of topics, from virtual sockets and randomness to side channels, memory, time sources, attestation, and the NSM driver.
Whether you’re new to Nitro Enclaves or looking to harden existing deployments, this guide will help you navigate the unique security landscape of confidential computing on AWS.
A brief threat model
First, a brief threat model. Enclaves can be attacked from the parent Amazon EC2 instance, which is the only component that has direct access to an enclave. In the context of an attack on an enclave, we should assume that the parent instance’s kernel (including its nitro_enclaves
drivers) is controlled by the attacker. DoS attacks from the instance are not really a concern, as the parent can always shut down its enclaves.
If the EC2 instance forwards user traffic from the internet, then attacks on its enclaves could come from that direction and could involve all the usual attack vectors (business-logic, memory corruption, cryptographic, etc.). And in the other direction, users could be targeted by malicious EC2 instances with impersonation attacks.
In terms of trust zones, an enclave should be treated as a single trust zone. Enclaves run normal Linux and can theoretically use its access control features to “draw lines” within themselves. But that would be pointless—adversarial access (e.g., via a supply-chain attack) to anything inside the enclave would diminish the benefits of its strong isolation and of attestation. Therefore, compromise of a single enclave component should be treated as a total enclave compromise.
Finally, the hypervisor is trusted—we must assume it behaves correctly and not maliciously.
Vsocks
The main entrypoint to an enclave is the local virtual socket (vsock). Only the parent EC2 instance can use the socket. Vsocks are managed by the hypervisor—the hypervisor provides the parent EC2 instance’s and the enclave’s kernels with /dev/vsock
device nodes.
Vsocks are identified by a context identifier (CID) and port. Every enclave must use a unique CID, which can be set during initialization and can listen on multiple ports. There are a few predefined CIDs:
- VMADDR_CID_HYPERVISOR = 0
- VMADDR_CID_LOCAL = 1
- VMADDR_CID_HOST = 2
- VMADDR_CID_PARENT = 3 (the parent EC2 instance)
- VMADDR_CID_ANY = 0xFFFFFFFF = -1U (listen on all CIDs)
Enclaves usually use only the VMADDR_CID_PARENT CID
(to send data) and the VMADDR_CID_ANY CID
(to listen for data). An example use of the VMADDR_CID_PARENT
can be found in the init.c
module of AWS’s enclaves SDK—the enclave sends a “heartbeat” signal to the parent EC2 instance just after initialization. The signal is handled by the nitro-cli
tool.
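To make this concrete, here is a minimal Rust sketch of an enclave that signals its parent and then serves vsock connections. It assumes the community vsock crate (not part of the AWS SDK); the application port and the heartbeat payload are hypothetical and do not reproduce the exact protocol used by nitro-cli.

```rust
// Minimal sketch: assumes the community `vsock` crate; the port numbers and
// payload below are illustrative, not the real nitro-cli heartbeat protocol.
use std::io::{Read, Write};
use vsock::{VsockAddr, VsockListener, VsockStream};

const VMADDR_CID_PARENT: u32 = 3; // the parent EC2 instance
const VMADDR_CID_ANY: u32 = 0xFFFF_FFFF; // listen on all CIDs
const APP_PORT: u32 = 5005; // hypothetical application port

fn main() -> std::io::Result<()> {
    // Tell the parent instance we are up (heartbeat-like message).
    let mut parent = VsockStream::connect(&VsockAddr::new(VMADDR_CID_PARENT, 9000))?;
    parent.write_all(b"ready")?;

    // Accept connections forwarded by the parent EC2 instance.
    let listener = VsockListener::bind(&VsockAddr::new(VMADDR_CID_ANY, APP_PORT))?;
    loop {
        let (mut stream, _peer) = listener.accept()?;
        let mut buf = [0u8; 1024];
        let n = stream.read(&mut buf)?;
        stream.write_all(&buf[..n])?; // echo back for illustration
    }
}
```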
Standard socket-related issues are the main issues to worry about when it comes to vsocks. When developing an enclave, consider the following to ensure such issues cannot enable certain attack vectors:
- Does the enclave accept connections asynchronously (with multithreading)? If not, a single user may block other users from accessing the enclave for a long period of time.
- Does the enclave time out connections? If not, a single user may persistently occupy a socket or open multiple connections to the enclave and drain available resources (like file descriptors).
- If the enclave uses multithreading, is its state synchronization correctly implemented?
- Does the enclave handle errors correctly? Reading from a socket with the recv method is especially tricky. A common pattern is to loop over the recv call until the desired number of bytes is received, but this pattern should be carefully implemented (see the sketch after this list):
  - If the EINTR error is returned, the enclave should retry the recv call. Otherwise, the enclave may drop valid and live connections.
  - If there is no error but the returned length is 0, the enclave should break the loop. Otherwise, the peer may shut down the connection before sending the expected number of bytes, making the enclave loop infinitely.
  - If the socket is non-blocking, then reading data correctly is even more tricky.
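To illustrate the points above, here is a minimal sketch of the receive loop in Rust, written against the standard library’s generic Read trait so it applies to a vsock stream or any other socket:

```rust
use std::io::{self, Read};

/// Read exactly `buf.len()` bytes, retrying on EINTR and failing cleanly
/// if the peer shuts the connection down early (a read of length 0).
fn read_exact_checked<R: Read>(stream: &mut R, buf: &mut [u8]) -> io::Result<()> {
    let mut filled = 0;
    while filled < buf.len() {
        match stream.read(&mut buf[filled..]) {
            // Peer closed the connection before sending everything:
            // break out instead of looping forever.
            Ok(0) => return Err(io::ErrorKind::UnexpectedEof.into()),
            Ok(n) => filled += n,
            // EINTR: the call was interrupted by a signal; retry.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            // Any other error: propagate instead of spinning.
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

(In Rust specifically, the standard library’s read_exact already behaves this way, retrying on ErrorKind::Interrupted and failing with UnexpectedEof on a zero-length read; whatever language you use, both cases must be handled explicitly.)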
The main risk of these issues is DoS. The parent EC2 instance may shut down any of its enclaves, so the actual risks are present only if a DoS can be triggered by external users. Providing timely access to the system is the responsibility of both the enclave and the EC2 instance communicating with the enclave.
Another vulnerability class involving vsocks is CID confusion: if an EC2 instance runs multiple enclaves, it may send data to the wrong one (e.g., due to a race condition issue). However, even if such a bug exists, it should not pose much risk or contribute much to an enclave’s attack surface, because traffic between users and the enclave should be authenticated end to end.
Finally, note that enclaves use the SOCK_STREAM socket type by default. If you change the type to SOCK_DGRAM, do some research to learn about the security properties of this communication type.
Randomness
Enclaves must have access to secure randomness. The word “secure” in this context means that adversaries don’t know or control all the entropy used to produce random data. On Linux, a few entropy sources are mixed together by the kernel. Among them are the CPU-provided RDRAND/RDSEED
source and platform-provided hardware random number generators (RNGs). The AWS Nitro Security Module (NSM) provides its own hardware RNG (called nsm-hwrng).
The final randomness can be obtained via the getrandom
system call or from (less reliable) /dev/{u}random
devices. There is also the /dev/hwrng
device, which gives more direct access to the selected hardware RNG. This device should not be used by user-space applications.
When a new hardware RNG is registered by the kernel, it is used right away to add entropy to the system. A list of available hardware RNGs can be found in the /sys/class/misc/hw_random/rng_available
file. One of the registered RNGs is selected automatically to periodically add entropy and is indicated in the /sys/devices/virtual/misc/hw_random/rng_current
file.
We recommend configuring your enclaves to explicitly check that the current RNG (rng_current) is set to nsm-hwrng. This check will ensure that the AWS Nitro RNG was successfully registered and that it’s the one the kernel uses periodically to add entropy.
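A minimal sketch of such a check is shown below, using the sysfs path mentioned above and the libc crate for the getrandom call; whether to fail hard is an application decision.

```rust
use std::fs;

/// Runtime sanity check: refuse to run if the kernel's current hardware RNG
/// is not the Nitro Security Module's RNG.
fn assert_nsm_hwrng() -> Result<(), String> {
    let current = fs::read_to_string("/sys/devices/virtual/misc/hw_random/rng_current")
        .map_err(|e| format!("cannot read rng_current: {e}"))?;
    if current.trim() != "nsm-hwrng" {
        return Err(format!("unexpected hardware RNG: {}", current.trim()));
    }
    Ok(())
}

fn main() {
    assert_nsm_hwrng().expect("refusing to start without nsm-hwrng");

    // Obtain randomness through the usual kernel interface (the getrandom
    // system call), not /dev/hwrng directly.
    let mut seed = [0u8; 32];
    let ret = unsafe { libc::getrandom(seed.as_mut_ptr().cast(), seed.len(), 0) };
    assert_eq!(ret, seed.len() as isize);
}
```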
To further boost the security of your enclave’s randomness, have it pull entropy from external sources whenever there are convenient sources available. A common external source is the AWS Key Management Service, which provides a convenient GenerateRandom
method that enclaves can use to bring in entropy over an encrypted channel.
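As an illustration, here is a rough sketch of pulling entropy from KMS with the official Rust SDK (aws-sdk-kms). It assumes the enclave’s KMS traffic is already routed to the outside world (e.g., through a vsock proxy on the parent), and it mixes the returned bytes into the kernel’s pool by writing them to /dev/random, which stirs the input pool without crediting entropy.

```rust
use std::fs::OpenOptions;
use std::io::Write;

// Sketch only: assumes the aws-config and aws-sdk-kms crates, and that the
// enclave can actually reach KMS (e.g., via a vsock proxy on the parent).
async fn mix_in_kms_entropy() -> Result<(), Box<dyn std::error::Error>> {
    let config = aws_config::load_from_env().await;
    let kms = aws_sdk_kms::Client::new(&config);

    // Ask KMS for 32 random bytes over an encrypted channel.
    let resp = kms.generate_random().number_of_bytes(32).send().await?;
    let random = resp.plaintext().ok_or("GenerateRandom returned no data")?;

    // Writing to /dev/random mixes the bytes into the kernel's input pool
    // without crediting entropy, which is fine for defense in depth.
    let mut dev_random = OpenOptions::new().write(true).open("/dev/random")?;
    dev_random.write_all(random.as_ref())?;
    Ok(())
}
```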
If you want to follow NIST/AIS standards (see section 5.3.1 in “Documentation and Analysis of the Linux Random Number Generator”) or suspect issues with the RDRAND/RDSEED instructions (see also this LWN.net article and this tweet), you can disable the random.trust_{bootloader,cpu} kernel parameters. That will tell the kernel not to count these sources toward its estimate of available entropy.
Lastly, make sure that your enclaves use a kernel version greater than 5.17.12
—important changes were introduced to the kernel’s random algorithm.
Side channels
Application-level timing side-channel attacks are a threat to enclaves, as they are to any application. Applications running inside enclaves must process confidential data in constant time. Attacks from the parent EC2 instance can use almost system-clock-precise time measurements, so don’t count on network jitter for mitigations. You can read more about timing attack vectors in our blog post “The life of an optimization barrier.”
Also, though this doesn’t really constitute a side-channel attack, error messages returned by an enclave can be used by attackers to reason about the enclave’s state. Think about issues like padding oracles and account enumeration. We recommend keeping errors returned by enclaves as generic as possible. How generic errors should be will depend on the given business requirements, as users of any application will need some level of error tracing.
CPU memory side channels
The main type of side-channel attack to know about involves CPU memory. CPUs share some memory—most notably the cache lines. If memory is simultaneously accessible to two components from different trust zones—like an enclave and its parent EC2 instance—then it may be possible for one component to indirectly leak the other component’s data via measurements of memory access patterns. Even if an application processes secret data in constant time, attackers with access to this type of side channel can exploit data-dependent branching.
In a typical architecture, CPUs are organized into NUMA nodes, CPU cores, and CPU threads. The smallest physical processing unit is the CPU core. A core may have multiple logical threads (virtual CPUs)—the smallest logical processing units—and threads share the L1 and L2 caches. The L3 cache (also called the last-level cache) is shared among all cores in a NUMA node.
Parent EC2 instances may have been allocated only a few CPU cores from a NUMA node. Therefore, they may share an L3 cache with other instances. However, the AWS white paper “The Security Design of the AWS Nitro System” claims that the L3 cache is never shared simultaneously. Unfortunately, there is not much more information on the topic.
What about CPUs in enclaves? CPUs are taken from the parent EC2 instance and assigned to an enclave. According to the AWS and nitro-cli
source code, the hypervisor enforces the following:
- The CPU #0 core (all its threads) is not assignable to enclaves.
- Enclaves must use full cores.
- All cores assigned to an enclave must be from the same NUMA node.
In the worst case, an enclave will share the L3 cache with its parent EC2 instance (or with other enclaves). However, whether the L3 cache can be used to carry out side-channel attacks is debatable. On one hand, the AWS white paper doesn’t make a big deal of this attack vector. On the other hand, recent research indicates the practicality of such an attack (see “Last-Level Cache Side-Channel Attacks Are Feasible in the Modern Public Cloud”).
If you are very concerned about L3 cache side-channel attacks, you can run the enclave on a full NUMA node. To do so, you would have to allocate more than one full NUMA node to the parent EC2 instance so that one NUMA node can be used for the enclave while saving some CPUs on the other NUMA node for the parent. Note that this mitigation is resource-inefficient and costly.
Alternatively, you can experiment with Intel’s Cache Allocation Technology (CAT) to isolate the enclave’s L3 cache from the parent (see the intel-cmt-cat software). Note, however, that we don’t know whether CAT can be reconfigured dynamically while an enclave is running; if it can, this mitigation would be ineffective.
If you implement any of the above mitigations, you will have to add relevant information to the attestation. Otherwise, users won’t be able to ensure that the L3 side-channel attack vector was really mitigated.
In any case, you want your security-critical code (like cryptography) to be implemented with secret-independent memory access patterns. Both hardware- and software-level security controls are important here.
Memory
Memory for enclaves is carved out from parent EC2 instances. It is the hypervisor’s responsibility to protect access to an enclave’s memory and to clear it after it’s returned to the parent. When it comes to enclave memory as an attack vector, developers really only need to worry about DoS attacks. Applications running inside an enclave should have limits on how much data external users can store. Otherwise, a single user may be able to consume all of an enclave’s available memory and crash the enclave (try running cat /dev/zero
inside the enclave to see how it behaves when a large amount of memory is consumed).
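One simple defense is to cap how much data a single request may occupy in memory; the sketch below uses a hypothetical limit that you should tune to your enclave’s memory budget.

```rust
use std::io::{self, Read};

// Hypothetical per-request cap; tune it to your enclave's memory budget.
const MAX_REQUEST_BYTES: u64 = 1 << 20; // 1 MiB

/// Read a request without letting the peer stream unbounded data into memory.
fn read_bounded_request<R: Read>(stream: R) -> io::Result<Vec<u8>> {
    let mut body = Vec::new();
    // `take` stops reading once the cap is hit, even if the peer keeps sending.
    stream.take(MAX_REQUEST_BYTES + 1).read_to_end(&mut body)?;
    if body.len() as u64 > MAX_REQUEST_BYTES {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "request too large"));
    }
    Ok(body)
}
```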
So how much space does your enclave have? The answer is a bit complicated. First of all, the enclave’s init process doesn’t mount a new root filesystem, but keeps the initial initramfs
and chroots
to a directory (though there is a pending PR that will change this behavior once merged). This puts some limits on the filesystem’s size. Also, data saved in the filesystem will consume available RAM.
You can check the total available RAM and filesystem space by executing the free command inside the enclave. The filesystem’s size limit should be around 40–50% of that total space. You can confirm that by filling the whole filesystem’s space and checking how much data ends up being stored there:
dd count=9999999999 if=/dev/zero > /fillspace
du -h -d1 /
Another issue with memory is that the enclave doesn’t have any persistent storage. Once it is shut down, all its data is lost. Moreover, AWS Nitro doesn’t provide any specific data sealing mechanism. It’s your application’s responsibility to implement it. Read our blog post “A trail of flipping bits” for more information.
Time
A less common source of security issues is an enclave’s time source—namely, from where the enclave gets its time. An attacker who can control an enclave’s time could perform rollback and replay attacks. For example, the attacker could switch the enclave’s time to the past and make the enclave accept expired TLS certificates.
Getting a trusted source of time may be a somewhat complex problem in the space of confidential computing. Fortunately, enclaves can rely on the trusted hypervisor for delivery of secure clock sources. From the developer’s side, there are only three actions worth taking to improve the security and correctness of your enclave’s time sources:
- Ensure that current_clocksource is set to kvm-clock in the enclave’s kernel configuration; consider even adding an application-level runtime check for the clock, in case something goes wrong during enclave bootstrapping and the enclave ends up with a different clock source (a sketch of such a check follows this list).
- Enable the Precision Time Protocol for better clock synchronization between the enclave and the hypervisor. It’s like the Network Time Protocol (NTP) but works over a hardware connection. It should be more secure (as it has a smaller attack surface) and easier to set up than the NTP.
- For security-critical functionalities (like replay protections) use Unix time. Be careful with UTC and time zones, as daylight saving time and leap seconds may “move time backwards.”
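Here is a minimal sketch of the application-level clock check mentioned in the first item, together with the use of Unix time for security-critical logic:

```rust
use std::fs;
use std::time::{SystemTime, UNIX_EPOCH};

/// Application-level check that the enclave is actually using kvm-clock.
fn assert_kvm_clock() -> Result<(), String> {
    let path = "/sys/devices/system/clocksource/clocksource0/current_clocksource";
    let current = fs::read_to_string(path).map_err(|e| format!("cannot read {path}: {e}"))?;
    if current.trim() != "kvm-clock" {
        return Err(format!("unexpected clock source: {}", current.trim()));
    }
    Ok(())
}

fn main() {
    assert_kvm_clock().expect("refusing to start with an unexpected clock source");

    // For replay protection and similar checks, work with Unix time directly
    // instead of local or zoned time.
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_secs();
    println!("unix time: {now}");
}
```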
Why kvm-clock?
Machines using an x86 architecture can have a few different sources of time. We can use the following command to check the sources available to enclaves:
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
Enclaves should have two sources: tsc
and kvm-clock
(you can see them if you run a sample enclave and check its sources); the latter is enabled by default, as can be checked in the current_clocksource
file. How do these sources work?
The TSC mechanism is based on the Time Stamp Counter register. It is a per-CPU monotonic counter implemented as a model-specific register (MSR). Every (virtual) CPU has its own register. The counter increments with every CPU cycle (more or less). Linux computes the current time based on the counter scaled by the CPU’s frequency and some initial date.
We can read (and write!) TSC values if we have root privileges. To do so, we need the TSC’s offset (which is 16) and its size (which is 8 bytes). MSR registers can be accessed through the /dev/cpu
device:
dd iflag=count_bytes,skip_bytes count=8 skip=16 if=/dev/cpu/0/msr
dd if=<(echo "34d6 f1dc 8003 0000" | xxd -r -p) of=/dev/cpu/0/msr seek=16 oflag=seek_bytes
The TSC can also be read with the clock_gettime
method using the CLOCK_MONOTONIC_RAW
clock ID, and with the RDTSC
assembly instruction.
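For example, both reads look like this in Rust (assuming the libc crate for clock_gettime; the RDTSC intrinsic is in the standard library and only compiles on x86_64 targets):

```rust
use std::arch::x86_64::_rdtsc;
use std::mem::MaybeUninit;

fn main() {
    // CLOCK_MONOTONIC_RAW is derived from the TSC without NTP-style adjustments.
    let mut ts = MaybeUninit::<libc::timespec>::uninit();
    let rc = unsafe { libc::clock_gettime(libc::CLOCK_MONOTONIC_RAW, ts.as_mut_ptr()) };
    assert_eq!(rc, 0);
    let ts = unsafe { ts.assume_init() };
    println!("monotonic raw: {}.{:09}s", ts.tv_sec, ts.tv_nsec);

    // Raw cycle counter straight from the CPU (not adjusted by kvm-clock).
    let cycles = unsafe { _rdtsc() };
    println!("tsc: {cycles}");
}
```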
Theoretically, if we change the TSC, the wall clock reported by clock_gettime
with the CLOCK_REALTIME
clock ID, by the gettimeofday
function, and by the date
command should change. However, the Linux kernel works hard to try to make TSCs behave reasonably and be synchronized with each other (for example, check out the tsc
watchdog code and functionality related to the MSR_IA32_TSC_ADJUST
register). So breaking the clock is not that easy.
The TSC can be used to track time elapsed, but where do enclaves get the “some initial date” from which the time elapsed is counted? Usually, in other systems, that date is obtained using the NTP. However, enclaves do not have out-of-the-box access to the network and don’t use the NTP (see slide 26 of this presentation from AWS’s 2020 re:Invent conference).
With the tsc
clock and no NTP, the initial date is somewhat randomly selected—the truth is we haven’t determined where it comes from. You can force an enclave to boot without the kvm-clock
by passing the no-kvmclock no-kvmclock-vsyscall
kernel parameters (but note that these parameters should not be provided at runtime) and check the initial date for yourself. In our experiments, the date was:
Tue Nov 30 00:00:00 UTC 1999
As you can see, the TSC mechanism doesn’t work well with enclaves. Moreover, it breaks badly when the machine is virtualized. Because of that, AWS introduced the kvm-clock
as the default source of time for enclaves. It is an implementation of the paravirtual clock driver (pvclock) protocol (see this article and this blog post for more info on pvclock). With this protocol, the host (the AWS Nitro hypervisor in our case) provides the pvclock_vcpu_time_info
structure to the guest (the enclave). The structure contains information that enables the guest to adjust its time measurements—most notably, the host’s wall clock (system_time
field), which is used as the initial date.
Interestingly, the guest’s userland applications can use the TSC mechanism even if the kvm-clock is enabled. That’s because the RDTSC instruction is (usually) not emulated and therefore may provide non-adjusted TSC register readings.
Please note that if your enclaves use different clock sources or enable NTP, you should do some additional research to see if there are related security issues.
Attestation
Cryptographic attestation is the source of trust for end users. It is essential that users correctly parse and validate attestations. Fortunately, AWS provides good documentation on how to consume attestations.
The most important attestation data is protocol-specific, but we have a few generally applicable tips for developers to keep in mind (in addition to what’s written in the AWS documentation):
- The enclave should enforce a minimal nonce length.
- Users should check the timestamp provided in the attestation in addition to nonces (a sketch of such checks follows this list).
- The attestation’s timestamp should not be used to reason about the enclave’s time. This timestamp may differ from the enclave’s time, as the former is generated by the hypervisor, and the latter by whatever clock source the enclave is using.
- Don’t use RSA for the public_key feature.
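Here is a rough sketch of the user-side nonce and timestamp checks, operating on a hypothetical struct that represents an already signature-verified attestation document; the field names, minimum nonce length, and freshness window are illustrative.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical view of an attestation document whose signature and
// certificate chain have already been verified (timestamp in milliseconds).
struct AttestationDoc {
    timestamp_ms: u64,
    nonce: Option<Vec<u8>>,
}

const MIN_NONCE_LEN: usize = 16; // example minimal nonce length
const MAX_AGE_MS: u64 = 5 * 60 * 1000; // example freshness window

fn check_freshness(doc: &AttestationDoc, expected_nonce: &[u8]) -> Result<(), String> {
    let nonce = doc.nonce.as_deref().ok_or("missing nonce")?;
    if nonce.len() < MIN_NONCE_LEN || nonce != expected_nonce {
        return Err("nonce is too short or does not match".into());
    }

    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map_err(|_| "clock is before the Unix epoch")?
        .as_millis() as u64;
    // The timestamp is produced by the hypervisor, not by the enclave's clock.
    if now_ms.saturating_sub(doc.timestamp_ms) > MAX_AGE_MS {
        return Err("attestation document is stale".into());
    }
    Ok(())
}
```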
The NSM driver
Your enclave applications will use the NSM driver, which is accessible via the /dev/nsm
node. Its source code can be found in the aws-nitro-enclaves-sdk-bootstrap
and kernel
repositories. Applications communicate with the driver via the IOCTL
system call and can use the nsm-api
library to do so.
Developers should be aware that applications running inside an enclave may misuse the driver or the library. However, there isn’t much that can go wrong if developers take these steps:
- The driver lets you extend and lock more platform configuration registers (PCRs) than the basic 0–4 and 8 PCRs. Locked PCRs cannot be extended, and they are included in enclave attestations. How these additional PCRs are used depends on how you configure your application. Just make sure that it distinguishes between locked and unlocked ones.
- Remember to make the application check the PCRs’ lock state properties when sending the DescribePCR request to the NSM driver (see the sketch after this list). Otherwise, it may be consulting a PCR that may still be manipulated.
- Requests and responses are CBOR-encoded. Make sure to get the encoding right. Incorrectly decoded responses may provide false data to your application.
- It is not recommended to use the nsm_get_random method directly. It skips the kernel’s algorithm for mixing multiple entropy sources and therefore is more prone to errors. Instead, use common randomness APIs (like getrandom).
- The nsm_init method returns -1 on error, which is an unusual behavior in Rust, so make sure your application accounts for that.
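Below is a rough sketch of talking to the NSM driver, assuming the aws-nitro-enclaves-nsm-api crate’s driver and api modules (adjust names to the crate version you use). It shows the -1 check for nsm_init and the lock check for DescribePCR.

```rust
// Sketch only: assumes the aws-nitro-enclaves-nsm-api crate; adjust the
// module paths and variant names to the version you actually depend on.
use aws_nitro_enclaves_nsm_api::api::{Request, Response};
use aws_nitro_enclaves_nsm_api::driver::{nsm_exit, nsm_init, nsm_process_request};

fn main() {
    // nsm_init returns a raw file descriptor and signals errors with -1.
    let fd = nsm_init();
    if fd < 0 {
        panic!("failed to open /dev/nsm");
    }

    // Ask the NSM about PCR 8 and refuse to trust it unless it is locked.
    match nsm_process_request(fd, Request::DescribePCR { index: 8 }) {
        Response::DescribePCR { lock, data } => {
            assert!(lock, "PCR 8 is not locked and can still be extended");
            println!("PCR 8 = {:x?}", data);
        }
        Response::Error(_) => panic!("the NSM returned an error"),
        _ => panic!("unexpected NSM response"),
    }

    nsm_exit(fd);
}
```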
That’s (not) all folks
Securing AWS Nitro Enclaves requires vigilance across multiple attack vectors. By implementing the recommendations in this post—from hardening virtual sockets to verifying randomness sources—you can significantly reduce the risk of compromise to your enclave workloads, helping shape a more secure future for confidential computing.
Key takeaways:
- Treat enclaves as a single trust zone and implement end-to-end security.
- Mitigate side-channel risks through proper CPU allocation and constant-time processing.
- Verify enclave entropy sources at runtime.
- Use the right time sources inside the enclave.
- Implement robust attestation practices, including nonce and timestamp validation.
For more security considerations, see our first post on enclave images and attestation. If your enclave uses external systems—like AWS Key Management Service or AWS Certificate Manager—review the systems and supporting tools for additional security footguns.
We encourage you to critically evaluate your own Nitro Enclave deployments. Trail of Bits offers in-depth security assessments and custom hardening strategies for confidential computing environments. If you’re ready to take your Nitro Enclaves’ security to the next level, contact us to schedule a consultation with our experts and ensure that your sensitive workloads remain truly confidential.