How Threat Actors Access Sensitive Data in Unexpected Ways

Jun 21, 2024 by Vinh Lam, Senior Technical Program Manager


Attackers often use sophisticated encryption and steganographic techniques to conceal malicious code, but many rely instead on surprisingly simple yet effective methods to achieve their goals. In this post, we examine several cases that highlight the creativity behind these seemingly straightforward techniques.

Unintentional Leaks

Data security threats aren't always external; sometimes, they lurk within our systems. Unintentional data leaks, often resulting from human error or oversight, pose a significant risk to organizations and individuals.

Revision

Users may inadvertently share sensitive information when they share files that have multiple versions in cloud storage services like OneDrive. Even if they delete sensitive information from the latest version, previous revisions might still contain it. Recipients who access the files through sharing links can potentially view all versions, including the problematic ones. The Microsoft Word screenshot below shows a shared Word file that still exposes revisions the owner was unaware of.

Microsoft Word screenshot with text "Removed the SSN"

The same problem occurs in Git. Overwriting sensitive data with a new commit, without properly removing the original commit from the history, leaves that data accessible to anyone who can read the repository.

Screenshot showing a version control diff in TortoiseGitMerge with a JSON config file modification, where a password value has been added

Another similar case can happen with container layers. Secrets or sensitive data may be inadvertently stored in container layers. Even if updates are made to remove this data, the old layers may still contain it, potentially exposing it when inspecting container images. See the below example:

First, we build a new container image, “with-secret,” which contains a source code file with a secret in it:

Screenshot of a Dockerfile command to copy a main.cpp file containing secret keys into an Alpine Docker image

Starting from that new image, we try to overwrite the file with a new one:

Screenshot of a Dockerfile directive to copy a sanitized main.cpp file from a secured Docker image

However, if we inspect the images, we still see the old main.cpp in layer 2. The exported .tar file contains both files, which means the secret in the original file can still be accessed and leaked.

Screenshot showing a 1KB C++ source file named 'main.cpp' in a directory, highlighting the file size and modification date — Copied file
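The layering behavior described above can be imitated without a container runtime. The following is a minimal sketch, assuming each image layer is exported as a plain tar archive; the helper names `make_layer` and `find_in_layers`, the file path, and the secret value are hypothetical and used only for illustration.

```python
import io
import tarfile


def make_layer(files: dict[str, bytes]) -> bytes:
    """Build an in-memory tar archive standing in for one image layer."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def find_in_layers(layers: list[bytes], needle: bytes) -> list[tuple[int, str]]:
    """Return (layer_index, file_name) for every file that contains needle."""
    hits = []
    for index, layer in enumerate(layers):
        with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as tar:
            for member in tar.getmembers():
                if not member.isfile():
                    continue
                data = tar.extractfile(member).read()
                if needle in data:
                    hits.append((index, member.name))
    return hits


# Hypothetical layers: the first adds main.cpp with a secret,
# the second overwrites the same path with a sanitized copy.
layer_with_secret = make_layer({"app/main.cpp": b'const char* key = "SECRET-123";\n'})
layer_sanitized = make_layer({"app/main.cpp": b'const char* key = "";\n'})

# The secret is gone from the top layer but still present in the earlier one.
print(find_in_layers([layer_with_secret, layer_sanitized], b"SECRET-123"))
```

Scanning every layer, rather than only the final filesystem, is the point of the sketch: an overwrite adds a new layer but does not rewrite the earlier one.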

Linked Data and External References

In some cases, the user may insert data from external sources, such as Excel spreadsheets or databases, into a Word document using linked objects or external references. However, they may not realize that changes to the original data source can automatically update the information in the Word document, leading to inconsistencies or unintended disclosures if the external data is modified without proper authorization.

Screenshot showing a spreadsheet with a single column listing names — The user uses linked content in Word from Excel

Screenshot showing a spreadsheet with two columns labeled 'Name' and 'Salary' — The user modifies the Excel file while unaware that the Word file is also being updated.

Cropped Images in Microsoft Office

Cropping is a convenient way for users to swiftly "cut" images in Microsoft Word, but this feature doesn't truly remove the content. The cropped image may still retain hidden or sensitive information that can be reconstructed or recovered.

sheet where a single column list of employee names expands into a detailed table including additional columns for 'DOB' and 'ID'
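One way to check for this is to look inside the document package. The sketch below assumes a .docx file is a ZIP archive whose embedded pictures live under word/media/ and whose crops are recorded as DrawingML `a:srcRect` elements in word/document.xml, with edge offsets expressed in thousandths of a percent. The function name `inspect_crops` and the sample package are hypothetical, and the sample is a stripped-down stand-in rather than a complete Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; a crop is recorded as an <a:srcRect> element.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def inspect_crops(docx_bytes: bytes) -> dict:
    """List embedded media and any crop rectangles found in a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as docx:
        media = [n for n in docx.namelist() if n.startswith("word/media/")]
        root = ET.fromstring(docx.read("word/document.xml"))
    crops = [dict(el.attrib) for el in root.iter(f"{{{A_NS}}}srcRect") if el.attrib]
    return {"media": media, "crops": crops}


# Hypothetical minimal package for illustration only.
document_xml = (
    f'<doc xmlns:a="{A_NS}">'
    '<a:srcRect l="0" t="0" r="60000" b="0"/>'  # 60% hidden from the right edge
    "</doc>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as package:
    package.writestr("word/document.xml", document_xml)
    package.writestr("word/media/image1.png", b"full, uncropped image bytes")

report = inspect_crops(buf.getvalue())
print(report)
```

A non-empty `crops` list alongside an entry in `media` suggests that the stored picture may contain more than what the page displays.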

File Metadata

File metadata, such as company names or GPS locations, can contain sensitive information that users may not be aware of. This metadata can inadvertently reveal details about the document's origin or location, potentially compromising confidentiality or privacy.
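As a rough illustration, the properties stored in an Office document can be listed before the file is shared. This sketch assumes the document is an Office Open XML package that keeps its properties in docProps/core.xml and docProps/app.xml; the function name `read_docx_metadata` and the sample values are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

PROPERTY_PARTS = ("docProps/core.xml", "docProps/app.xml")


def read_docx_metadata(docx_bytes: bytes) -> dict[str, str]:
    """Collect non-empty top-level properties from the package's property parts."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as docx:
        names = set(docx.namelist())
        for part in PROPERTY_PARTS:
            if part not in names:
                continue
            root = ET.fromstring(docx.read(part))
            for element in root:
                if element.text and element.text.strip():
                    # Drop the "{namespace}" prefix that ElementTree adds to tags.
                    local_name = element.tag.rsplit("}", 1)[-1]
                    found[local_name] = element.text.strip()
    return found


# Hypothetical sample package for illustration only.
core_xml = (
    '<cp:coreProperties '
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:creator>Jane Doe</dc:creator>"
    "<cp:lastModifiedBy>Jane Doe</cp:lastModifiedBy>"
    "</cp:coreProperties>"
)
app_xml = (
    '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">'
    "<Company>Example Corp</Company>"
    "</Properties>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as package:
    package.writestr("docProps/core.xml", core_xml)
    package.writestr("docProps/app.xml", app_xml)

metadata = read_docx_metadata(buf.getvalue())
print(metadata)
```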

Intentional Leaks

Intentional breaches are deliberately executed to exploit vulnerabilities and compromise sensitive information for personal gain or nefarious purposes.

Visualization

Hiding data beyond the margins of a page

In the scenario below, the attacker hides data far away from the first columns. Unless users zoom out or scroll far enough, they will not see that data.

Screenshot showing an empty spreadsheet with a message about data outside the visible area
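A simple check for this trick is to flag populated cells that sit far outside the area a user would normally view. The sketch below assumes the worksheet is stored as SpreadsheetML, where each cell is a `c` element with a reference such as `ZZ1` in its `r` attribute. The function names and the thresholds of 50 columns and 1000 rows are arbitrary assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
CELL_REF = re.compile(r"([A-Z]+)(\d+)")


def column_index(letters: str) -> int:
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def far_cells(sheet_xml: str, max_col: int = 50, max_row: int = 1000) -> list[str]:
    """Return references of populated cells beyond the given column or row limits."""
    root = ET.fromstring(sheet_xml)
    flagged = []
    for cell in root.iter(f"{{{S_NS}}}c"):
        match = CELL_REF.fullmatch(cell.get("r", ""))
        if not match or len(cell) == 0:  # skip malformed refs and empty cells
            continue
        col, row = column_index(match.group(1)), int(match.group(2))
        if col > max_col or row > max_row:
            flagged.append(cell.get("r"))
    return flagged


# Hypothetical worksheet: one ordinary cell and one hidden far to the right.
sample_sheet = (
    f'<worksheet xmlns="{S_NS}"><sheetData>'
    '<row r="1"><c r="A1"><v>1</v></c><c r="ZZ1"><v>42</v></c></row>'
    "</sheetData></worksheet>"
)
print(far_cells(sample_sheet))
```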

ZeroFont

The term "ZeroFont" originates from the technique of using tiny, invisible font sizes (often set to zero) to hide malicious URLs or content within the email body. These attacks exploit the fact that many email security filters primarily analyze the visible content of an email to detect phishing attempts or malicious links. By using ZeroFont techniques, attackers may evade detection, increasing the likelihood of successful phishing attacks.

Screenshot displaying HTML code with a hidden span element containing a Social Security Number styled to be invisible
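To make the idea concrete, the sketch below collects text that sits inside elements whose inline style sets a font size of zero. It is a simplified illustration, not a complete detector: it only reads inline `style` attributes, ignores stylesheets, and treats everything nested under a zero-size element as hidden. The class name `ZeroFontFinder` and the sample markup are hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches "font-size: 0", "font-size:0px", "font-size:0.0em", and similar.
ZERO_FONT = re.compile(
    r"font-size\s*:\s*0+(?:\.0+)?(?:px|pt|em|rem|%)?\s*(?:;|$)", re.IGNORECASE
)
# Void elements never get a closing tag, so they must not be tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ZeroFontFinder(HTMLParser):
    """Collect text rendered with a zero font size via inline styles."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one flag per open element: is its text zero-sized?
        self.hidden = []  # text fragments found inside zero-sized elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        inherited = bool(self.stack and self.stack[-1])
        self.stack.append(inherited or bool(ZERO_FONT.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] and data.strip():
            self.hidden.append(data.strip())


# Hypothetical email body with a placeholder value hidden in a zero-size span.
sample_html = '<p>Hello <span style="font-size:0px">000-00-0000</span> world</p>'
finder = ZeroFontFinder()
finder.feed(sample_html)
print(finder.hidden)
```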

Same Text and Background Color

In this method, the text is formatted with a specific font color matching the document's background color, rendering the text invisible to the naked eye. The concept relies on exploiting the viewer's inability to distinguish between the text and the background due to identical coloration.

Manipulating File Structure

Viewing images versus printing images

<</Type/XObject/Subtype/Image/Width 1100/Height 733/ColorSpace/DeviceRGB/BitsPerComponent 8/Filter/DCTDecode/Interpolate true/Length 160490/Alternates[<</Image 14 0 R/DefaultForPrinting true>>]>>

The PDF shown above contains two distinct images. The second image is denoted by an alternate tag that specifies the image as default for printing. Consequently, if sensitive information is concealed within the second image, it can be easily transmitted to an external recipient and accessed simply by selecting the print option. We previously addressed this scenario in a blog post.
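A first-pass check for this pattern can be as simple as searching the raw file for an `/Alternates` array that contains `/DefaultForPrinting true`. The sketch below assumes the image dictionary is stored uncompressed, as in the example above; dictionaries packed into compressed object streams would need a real PDF parser. The function name `has_print_only_alternate` is hypothetical.

```python
import re

# An /Alternates array that marks one of its entries as the default for printing.
ALTERNATE_FOR_PRINT = re.compile(
    rb"/Alternates\s*\[[^\]]*/DefaultForPrinting\s+true"
)


def has_print_only_alternate(pdf_bytes: bytes) -> bool:
    """Return True if the raw bytes contain a print-default alternate image."""
    return ALTERNATE_FOR_PRINT.search(pdf_bytes) is not None


# The image dictionary quoted above, used here as sample input.
sample_dictionary = (
    b"<</Type/XObject/Subtype/Image/Width 1100/Height 733/ColorSpace/DeviceRGB"
    b"/BitsPerComponent 8/Filter/DCTDecode/Interpolate true/Length 160490"
    b"/Alternates[<</Image 14 0 R/DefaultForPrinting true>>]>>"
)
print(has_print_only_alternate(sample_dictionary))
```

A match does not prove malicious intent, since alternate images are a legitimate feature; it only indicates that the printed output may differ from what is shown on screen.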

Hidden Data in an Option Object that won’t Show Up in the Reader Application

Take the % symbol as an example: in a PDF file, it marks the start of a comment. By editing the file in text mode, attackers can write anything after this symbol without impacting the file’s usability when it is opened with Adobe Reader.

Screenshot of a text editor displaying the structure of a PDF file, showing the PDF header, catalog, outlines, and pages objects
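Comment lines like these can be surfaced with a naive line scan. The sketch below is only an approximation: it reports lines that begin with %, skips the `%PDF-` header and the `%%EOF` marker, and does not catch comments placed after other tokens on the same line. It will also report the binary marker comment that many PDFs carry on their second line, and it may misreport stream data that happens to start with %. The function name `suspicious_comments` and the sample file are hypothetical.

```python
def suspicious_comments(pdf_bytes: bytes) -> list[bytes]:
    """Return comment lines other than the PDF header and end-of-file marker."""
    comments = []
    for line in pdf_bytes.splitlines():
        stripped = line.strip()
        if not stripped.startswith(b"%"):
            continue
        if stripped.startswith(b"%PDF-") or stripped == b"%%EOF":
            continue
        comments.append(stripped)
    return comments


# Hypothetical PDF fragment with a placeholder value hidden in a comment.
sample_pdf = (
    b"%PDF-1.4\n"
    b"1 0 obj\n"
    b"<< /Type /Catalog >>\n"
    b"endobj\n"
    b"% ssn: 000-00-0000\n"
    b"%%EOF\n"
)
print(suspicious_comments(sample_pdf))
```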

What to Do to Minimize the Risk

Robust Solutions

Utilize data loss prevention (DLP) solutions to monitor and prevent unauthorized transmission of sensitive information, both within the organization's network and to external destinations.

Utilize content disarm and reconstruction (CDR) solutions to remove unapproved objects hidden in files.

Deploy a multi-layered approach to data security, including firewalls, intrusion detection/prevention systems, antivirus software, and encryption tools to safeguard sensitive information at various levels.

Implement access controls and authentication mechanisms to ensure that only authorized individuals can access sensitive data, and regularly review and update user permissions as needed.

Employ endpoint security measures to protect devices and endpoints from malware, ransomware, and other cyber threats.

Regular Audits and Assessments

Conduct regular security audits and risk assessments to identify vulnerabilities, evaluate existing security measures, and implement necessary improvements.

Perform penetration testing and vulnerability scanning to proactively identify and address weaknesses in the organization's infrastructure and systems.

Monitor and analyze network traffic, system logs, and user activity for signs of anomalous behavior or potential security incidents.

Training and Awareness

Implement comprehensive training programs to educate employees about data security best practices, including recognizing phishing attempts, handling sensitive information, and adhering to company policies.

Raise awareness about the importance of data protection and the potential consequences of data leaks, fostering a culture of cybersecurity awareness throughout the organization.

Provide regular updates and reminders about emerging cybersecurity threats and preventive measures.

Defense-in-depth with OPSWAT MetaDefender Platform

Diagram of OPSWAT MetaDefender Platform showcasing comprehensive cybersecurity solutions and connections to cloud, on-premises, and air-gapped environments

The OPSWAT MetaDefender Platform provides multi-layered protection against file-based threats. OPSWAT MetaDefender combats the constant evolution of new attack types with the following technologies: