The cybersecurity community was recently alerted to a critical vulnerability in the NVIDIA Container Toolkit, an essential component for GPU-accelerated applications in AI and machine learning (ML). Identified as CVE-2024-0132, this flaw affects a wide range of AI applications that rely on GPU resources in both cloud and on-premises environments. Following its discovery in September 2024, NVIDIA acknowledged the issue and released a patch shortly after.
Details of the Vulnerability
The vulnerability in the NVIDIA Container Toolkit, which affects versions up to and including 1.16.1, stems from a Time-of-Check Time-of-Use (TOCTOU) flaw: a race condition in which a resource changes between the moment it is validated and the moment it is used. This weakness can be exploited to elevate privileges, escape containers, and manipulate GPU workloads, potentially leading to erroneous AI outputs or complete service disruptions.
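To make the TOCTOU pattern concrete, here is a minimal, generic sketch in Python. It is not the toolkit's code, and it does not reproduce CVE-2024-0132. The function names are hypothetical, and it assumes a POSIX system where `os.O_NOFOLLOW` is available. It contrasts a check-then-use sequence with a single open call that enforces the check atomically.

```python
import os


def read_racy(path: str) -> bytes:
    """Check-then-use: the path is validated first and opened afterwards."""
    # Time of check: the path is inspected here...
    if os.path.islink(path):
        raise PermissionError("symlinks are not allowed")
    # ...time of use: the path is resolved a second time here. If an attacker
    # swaps the file for a symlink between the two calls, the check is bypassed.
    with open(path, "rb") as f:
        return f.read()


def read_safer(path: str) -> bytes:
    """Open once and let the kernel refuse a symlink as part of the open."""
    # O_NOFOLLOW makes the open fail if the final path component is a symlink,
    # so there is no gap between the check and the use. It does not protect
    # against symlinks in earlier path components.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as f:
        return f.read()
```

The safer variant works because validation and use happen in one system call on one resolved object, rather than in two separate lookups of the same name.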
The specific vulnerabilities related to this incident include:
- CVE-2024-0132: This critical flaw, rated 9.0 on the CVSS severity scale, may allow specially crafted container images to access the host file system, potentially leading to code execution, denial of service, and privilege escalation.
- CVE-2024-0133: This medium-severity vulnerability, rated 4.1, permits specially crafted container images to create empty files on the host file system, which could result in data tampering.
NVIDIA promptly addressed both vulnerabilities by releasing a security bulletin and updated versions of the affected software.
Who is Affected?
According to research by Wiz, over a third (35%) of cloud environments that use NVIDIA GPUs are at risk.
Organizations using the NVIDIA Container Toolkit versions up to and including 1.16.1, as well as the NVIDIA GPU Operator up to and including 24.6.1, should evaluate their environments and take the necessary steps to mitigate the cascading effects of this vulnerability.
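As a starting point for that evaluation, the affected ranges above can be expressed as a small version check. This is a sketch only: the component keys and function names are hypothetical, and it assumes you have already collected the installed version strings by some other means (a package manager query, a Helm release listing, and so on).

```python
# Last affected versions as stated in the text: Container Toolkit 1.16.1
# and GPU Operator 24.6.1. The dictionary keys are illustrative labels.
LAST_AFFECTED = {
    "nvidia-container-toolkit": (1, 16, 1),
    "gpu-operator": (24, 6, 1),
}


def parse_version(text: str) -> tuple:
    """Turn '1.16.1', 'v24.6.1', or '1.16.1-1' into a tuple of ints.

    Anything after the first '-' (such as a package revision) is ignored.
    Other version formats are not handled by this sketch.
    """
    core = text.strip().lstrip("vV").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def is_affected(component: str, installed: str) -> bool:
    """True when the installed version is at or below the last affected one."""
    return parse_version(installed) <= LAST_AFFECTED[component]


if __name__ == "__main__":
    print(is_affected("nvidia-container-toolkit", "1.16.1"))
    print(is_affected("gpu-operator", "v24.6.2"))
```

Versions are compared as integer tuples rather than strings, so that 1.9.0 sorts before 1.16.1 as intended.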
Understanding the NVIDIA Container Toolkit
The NVIDIA Container Toolkit is designed to facilitate the creation and execution of GPU-accelerated Docker containers. By default, containers do not have access to GPUs; the toolkit enables users to expose their NVIDIA GPUs to their containers. It consists of runtime libraries and utilities that automate container configuration so that users can leverage NVIDIA GPUs for high-performance AI workloads. In short, the NVIDIA Container Toolkit gives containers access to the NVIDIA GPU so that applications that need GPU acceleration can run faster and more efficiently.
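For orientation, the sketch below builds the kind of `docker run` invocation commonly used to expose GPUs to a container once the toolkit is installed. The helper name and its defaults are hypothetical, and the code only assembles the argument list; it does not start a container.

```python
from typing import List, Sequence


def build_gpu_run_command(
    image: str,
    gpus: str = "all",
    command: Sequence[str] = ("nvidia-smi",),
) -> List[str]:
    """Assemble a `docker run` argument list that requests GPU access.

    `--gpus` is the Docker CLI flag for requesting GPUs; the value is passed
    through unchanged. `--rm` removes the container when it exits.
    """
    return ["docker", "run", "--rm", "--gpus", gpus, image, *command]


if __name__ == "__main__":
    # Print the command instead of running it; running requires Docker,
    # an NVIDIA driver, and the toolkit on the host.
    print(" ".join(build_gpu_run_command("ubuntu")))
```

On a correctly configured host, passing this list to `subprocess.run` should print the `nvidia-smi` device table from inside the container, which is a quick way to confirm that GPUs are being exposed.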
Alongside the NVIDIA GPU Operator – which orchestrates GPU resources in Kubernetes environments – the toolkit plays a pivotal role in modern AI and ML applications. Essentially, it enhances the performance and efficiency of applications that require HPC (high-performance computing) for data-heavy tasks such as AI training.
However, a vulnerability in the toolkit can introduce risk in several ways:
- Unauthorized GPU Access: Attackers could gain access to the GPU to steal data or hijack resources.
- Privilege Escalation: Attackers may break out of containers and execute code on the host system to compromise the underlying infrastructure.
- Cross-Container Attacks: A compromised container could gain illegitimate access to the GPU resources of other containers. This can result in data leaks or denial of service across multiple applications running on the same system.
- Sensitive Data Exposure: Rather than mining sensitive data directly, attackers often exploit vulnerabilities in system components to move through the environment and escalate privileges until they reach that data. Container technology adds further complexity to these exploits.
Potential Attack Scenario
A potential attack flow exploiting the NVIDIA Container Toolkit can be generalized in three steps:
- Create a Malicious Image: An attacker crafts a malicious container image designed to exploit CVE-2024-0132.
- Access the Host File System: The attacker then runs the malicious image on a vulnerable platform, either directly through shared GPU services or indirectly through a supply chain attack or social engineering. This allows them to mount the host file system and gain unauthorized access to the underlying infrastructure and potentially to confidential data from other users.
- Complete Control: With access to critical Unix sockets (docker.sock/containerd.sock), the attacker can issue arbitrary commands on the host system with root privileges and ultimately seize control of the machine.
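The final step depends on reaching the container runtime's control sockets. The sketch below does not detect exploitation of CVE-2024-0132. It audits a related exposure: containers that already have those sockets bind-mounted from the host. It assumes input in the JSON shape produced by `docker inspect` (a list of objects with `Name` and `Mounts` fields). The socket paths are common defaults, listed here as assumptions, and the function name is hypothetical.

```python
import json

# Common default locations of the runtime sockets (assumption: your hosts
# may use different paths).
SENSITIVE_SOCKETS = (
    "/var/run/docker.sock",
    "/run/docker.sock",
    "/run/containerd/containerd.sock",
    "/var/run/containerd/containerd.sock",
)


def containers_exposing_sockets(inspect_json: str) -> dict:
    """Map container name -> list of sensitive host sockets it mounts."""
    findings = {}
    for container in json.loads(inspect_json):
        hits = [
            mount["Source"]
            for mount in container.get("Mounts", [])
            if mount.get("Source") in SENSITIVE_SOCKETS
        ]
        if hits:
            name = container.get("Name") or container.get("Id", "<unknown>")
            findings[name] = hits
    return findings


if __name__ == "__main__":
    sample = json.dumps([
        {"Name": "/trainer", "Mounts": [
            {"Type": "bind", "Source": "/var/run/docker.sock",
             "Destination": "/var/run/docker.sock"}]},
        {"Name": "/web", "Mounts": []},
    ])
    print(containers_exposing_sockets(sample))
```

Any container flagged this way can control the runtime even without a container escape, so it is worth reviewing whether the mount is needed.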
Recommendations to Protect Against Container Vulnerabilities
This incident serves as a timely reminder that even trusted container images from reputable sources can harbor serious vulnerabilities. Organizations utilizing the NVIDIA Container Toolkit should:
Upgrade to the Latest Version
Users are strongly encouraged to update to NVIDIA Container Toolkit version 1.16.2 and NVIDIA GPU Operator 24.6.2 as soon as possible, especially on container hosts that may run untrusted images.
Conduct Regular Security Scans
Implement regular scanning for malicious container images and any other components that go into your applications in cloud environments. Regular scans help assess risks and identify security blind spots associated with these images. Automated scanning tools can continuously monitor for known vulnerabilities and misconfigurations.
Additionally, integrating security scans into CI/CD pipelines helps detect vulnerabilities before deployment, while comprehensive reports provide insights into identified risks and recommended remediation steps.
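One way to wire a scan into a pipeline is a small gate step that reads the scanner's report and fails the build when serious findings are present. The sketch below is illustrative only. The report schema (a `findings` list with `id` and `severity` fields), the severity names, and the function names are all assumptions, not the output format of any particular scanner.

```python
import json

# Hypothetical severity ranking used by the assumed report format.
RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def blocking_findings(report: dict, threshold: str = "high") -> list:
    """Return findings at or above the threshold severity.

    Unknown or missing severities are treated as critical (fail closed).
    """
    limit = RANK[threshold]
    return [
        finding
        for finding in report.get("findings", [])
        if RANK.get(str(finding.get("severity", "")).lower(), RANK["critical"]) >= limit
    ]


def gate(report_json: str, threshold: str = "high") -> int:
    """Return a process exit code: 0 to continue, 1 to fail the pipeline."""
    blockers = blocking_findings(json.loads(report_json), threshold)
    for finding in blockers:
        print(f"BLOCKING: {finding.get('id')} ({finding.get('severity')})")
    return 1 if blockers else 0


if __name__ == "__main__":
    sample = json.dumps({"findings": [
        {"id": "CVE-2024-0132", "severity": "critical"},
        {"id": "CVE-2024-0133", "severity": "medium"},
    ]})
    raise SystemExit(gate(sample))
```

Returning an exit code rather than raising keeps the step easy to drop into most CI systems, which treat any non-zero exit as a failed job.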
Secure Container Images with MetaDefender Software Supply Chain
To mitigate vulnerabilities like those found in the NVIDIA Container Toolkit, OPSWAT MetaDefender Software Supply Chain provides robust threat scanning capabilities for container registries and source code repositories.
Software development and DevSecOps teams will be informed of potentially malicious or vulnerable container images within their application stacks. By leveraging multiple layers of threat detection and prevention, MetaDefender Software Supply Chain also provides insights and recommendations for remediation, including updates to secure versions of affected container images.
You can assess the threat status for the packages in your container images at both a general and detailed level.
Container Security is Part of AI Security
Container vulnerabilities have exposed the need for vigilant and proactive security for organizations that increasingly depend on AI and ML technologies. To learn more about container security and software supply chain security, check out our resources:
MetaDefender Software Supply Chain
Docker Image – a Rising Threat Vector?