Unmasking Hidden Threats: How to Detect Secrets in Code

Sep 22, 2023 by Armin Ziaie Tabari, Ph.D., Technical Program Manager

API keys, passwords, encryption keys, and other sensitive data—collectively known in software development as "secrets"—secure a wide range of technologies throughout the software development lifecycle. Unfortunately, they keep finding their way into the hands of attackers.

Researchers at North Carolina State University demonstrated two different approaches for mining secrets; one technique discovered 99% of newly committed files containing secrets in near real time. They found that the median time to discovery was 20 seconds. Even if you realize you've accidentally pushed a secret to a public repository, you may have only about 20 seconds to replace it with a new one.

Leaking secrets enables cybercriminals to access and manipulate data. With this in mind, OPSWAT set out to create a developer-friendly solution for detecting secrets in code before it’s pushed to a public repository or compiled into a release. Additionally, the enhanced technology better detects concealed secrets and confidential information.

Let's look at what types of secrets leak, the consequences, and how to stop it from happening.

The Silent Risks of Embedded Secrets

Hardcoded secrets are sensitive values embedded directly in software, such as usernames, passwords, SSH keys, and access tokens. If developers leave secrets in an application's source code or configuration, those secrets become prime targets for attackers. Malicious actors constantly scan public code repositories for patterns that identify secrets.
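To make "patterns that identify secrets" concrete, here is a minimal sketch of a pattern-based scan in Java. The class name, rule names, and regular expressions are illustrative assumptions rather than any particular tool's rule set. Real scanners use far larger rule sets and extra checks to reduce false positives.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Illustrative pattern-based scanner; rule names and patterns are assumptions. */
class SecretScanner {

    // Hypothetical rule set: rule name -> regular expression.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        // Long-term AWS access key IDs are 20 characters and start with "AKIA".
        RULES.put("aws-access-key-id", Pattern.compile("\\bAKIA[0-9A-Z]{16}\\b"));
        // Header line of a PEM-encoded private key.
        RULES.put("private-key",
                Pattern.compile("-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"));
        // A quoted literal assigned to a name that looks like a credential.
        RULES.put("hardcoded-credential", Pattern.compile(
                "(?i)(?:password|passwd|secret|access_?key|api_?key|token)\\w*\\s*[:=]\\s*\"[^\"]{6,}\""));
    }

    /** Returns one "line N: rule" entry for each rule that matches a line. */
    static List<String> scan(String source) {
        List<String> findings = new ArrayList<>();
        String[] lines = source.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                if (rule.getValue().matcher(lines[i]).find()) {
                    findings.add("line " + (i + 1) + ": " + rule.getKey());
                }
            }
        }
        return findings;
    }
}
```

Calling `SecretScanner.scan(fileContents)` on a source file returns the line numbers and rule names of suspicious lines. An attacker can run the same kind of scan against every new public commit.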

To better illustrate this, consider the following Java code:

[Image: Java code containing endpoint, accessId, accessKey, projectName, and topicName]

Developers might use code similar to this for local testing and forget to remove it before pushing to a repository. After compilation, the executable still contains credentials such as "admin" or "secretpass" as plain strings. Whatever the environment, whether a desktop application, a server component, or any other software platform, these embedded strings lie exposed. Exposed keys are often the first step cybercriminals take to breach an otherwise fortified system.
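One common way to keep such literals out of source code and compiled artifacts is to read credentials from the environment at runtime. The sketch below assumes hypothetical variable names (`APP_ACCESS_ID`, `APP_ACCESS_KEY`) and a hypothetical `CredentialLoader` class. It is an illustration of the idea, not a prescribed setup.

```java
import java.util.Map;

/** Illustrative only: class, method, and variable names are hypothetical. */
class CredentialLoader {

    /** Looks up a required setting and fails fast if it is missing or blank. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required setting: " + name);
        }
        return value;
    }

    /** Reads credentials from the process environment instead of string literals. */
    static String[] loadFromEnvironment() {
        Map<String, String> env = System.getenv();
        return new String[] {
            require(env, "APP_ACCESS_ID"),   // hypothetical variable name
            require(env, "APP_ACCESS_KEY")   // hypothetical variable name
        };
    }
}
```

Because the values never appear as string literals, they are absent from both the repository and the compiled class files. Failing fast on a missing variable also makes a misconfigured deployment obvious at startup.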

Recent Secret Leaks and Their Impact

In 2021, attackers exploited a flaw in how Codecov built its Docker images, modifying an upload tool so that it relayed credentials to them and potentially compromising the development environments of numerous businesses. In a separate incident, hackers leaked the source code of the game-streaming platform Twitch, exposing over 6,000 Git repositories and about 3 million documents.

This breach exposed over 6,600 development secrets, paving the way for potential subsequent intrusions. Notably, other incidents of inadvertent exposure after a security breach were reported at prominent companies such as Samsung Electronics, Toyota Motor Corporation, and Microsoft.

The problem of secrets getting exposed is widespread. The following are some of the other significant leaks that have occurred in recent years:

Binance's Data Breach

On July 3, 2022, Changpeng Zhao, CEO of Binance, tweeted about a significant data breach, reporting that criminals were selling 1 billion records on the dark web. The records included names, addresses, and even police and medical records. The leak reportedly began when someone posted source code containing access credentials on a Chinese blog site, where anyone could see it.

Twitter's Leak

On August 2, 2022, 3,207 mobile apps were found leaking Twitter API keys. This means private messages between Twitter users in these apps could be accessed.

Issues with AWS Tokens

On September 1, 2022, Symantec reported that 1,859 apps, on both iOS and Android, contained hardcoded AWS tokens. Of these tokens, 77% could access private AWS services, and 47% allowed access to large amounts of stored files.

Target's Big Problem

Target faced issues when attackers stole 40 million customer card details. Their sales dropped by 4% after this. The New York Times reported that this issue cost Target $202 million.

After all that has happened, we must ask, "Why did these companies not discover these leaks earlier?" A careful code review or regular check could have caught these issues before they became a huge problem.

How OPSWAT's Proactive Data Loss Prevention (DLP) Can Help Prevent Data Leaks

Proactive DLP has an integrated secret detection function. This feature promptly notifies you when it detects secrets, such as API keys or passwords, in source code. Specifically, it can detect secrets associated with Amazon Web Services, Microsoft Azure, and Google Cloud Platform.
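Provider-specific patterns only catch secrets with a known shape. A heuristic that secret scanners in general often add is to flag long strings that look random, measured by Shannon entropy. The sketch below illustrates that general technique and is not a description of how Proactive DLP works internally. The class name and both thresholds are arbitrary assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative entropy heuristic; thresholds are arbitrary assumptions. */
class EntropyCheck {

    /** Shannon entropy of the string, in bits per character. */
    static double shannonEntropy(String s) {
        if (s.isEmpty()) {
            return 0.0;
        }
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        double entropy = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / s.length();
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }

    /** Flags tokens that are both long and high in entropy. */
    static boolean looksRandom(String token) {
        return token.length() >= 20 && shannonEntropy(token) >= 4.0;
    }
}
```

Ordinary identifiers and repeated characters score low, while generated keys tend to score high. In practice the thresholds need tuning, because hashes and encoded data can also trigger this check.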

Let's take the Binance data leak as an example. According to reports, someone submitted the following information to the blog. Proactive DLP can scan the code, identify the generic access key, and notify the developers before they push the code to a public repository.

Proactive DLP catches potential mistakes before they become more significant problems.

[Image: Screenshot of the Proactive DLP secret detection function detecting sensitive data]

About OPSWAT Proactive DLP

OPSWAT Proactive DLP detects, blocks, or redacts sensitive data, helping organizations prevent potential data leaks and compliance violations. Besides hardcoded secrets, Proactive DLP catches sensitive and confidential information before it becomes a significant problem.
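To illustrate what redaction means in general terms, the sketch below masks anything shaped like an AWS access key ID while keeping its four-character prefix, so a reviewer can still see what kind of value was removed. The `SecretRedactor` class and its masking format are hypothetical and are not taken from Proactive DLP.

```java
import java.util.regex.Pattern;

/** Illustrative redaction helper; the class name and masking format are hypothetical. */
class SecretRedactor {

    // Long-term AWS access key IDs are 20 characters and start with "AKIA".
    private static final Pattern AWS_KEY_ID = Pattern.compile("\\bAKIA[0-9A-Z]{16}\\b");

    // Keep the prefix and mask the remaining 16 characters.
    private static final String MASK = "AKIA" + "****************";

    /** Returns the text with every AWS-style key ID replaced by the mask. */
    static String redact(String text) {
        return AWS_KEY_ID.matcher(text).replaceAll(MASK);
    }
}
```

Applied to log lines or file contents before they leave a trusted boundary, a helper like this removes the usable part of the key while leaving the surrounding text intact.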