MetaDefender Core File Scanning

MetaDefender NDR's file-scanning family is the only detection engine that examines the content of what the network is carrying rather than its metadata. When a sensor carves an executable, an archive, a document, or a script out of an observed Hypertext Transfer Protocol (HTTP) download, File Transfer Protocol (FTP) transfer, or Server Message Block (SMB) share copy, the file is handed off to OPSWAT MetaDefender for multi-antivirus (multi-AV) scanning. The result -- a verdict from dozens of AV engines plus file-type and file-size telemetry -- is stitched back onto the originating FileInfo event and, when any engine raises a threat, promoted to one of three severity-tiered MD Core alerts. This chapter describes how extraction, submission, and scoring work end-to-end, what the operator sees, and which knobs govern the family's behavior.

First-use acronym expansions in this chapter: MD Core (MetaDefender Core -- OPSWAT's on-premises multi-AV scanning server), MD Cloud (MetaDefender Cloud -- the Software-as-a-Service equivalent), AV (antivirus), IOC (Indicator of Compromise), MIME (Multipurpose Internet Mail Extensions), SHA-256 (Secure Hash Algorithm 256-bit), MD5 (Message Digest algorithm 5), PDF (Portable Document Format), HTTP (Hypertext Transfer Protocol), FTP (File Transfer Protocol), SMB (Server Message Block), API (Application Programming Interface), FRD (Functional Requirements Document), MVP (Minimum Viable Product).

What It Is

The MetaDefender Core file-scanning family is an enrichment service that automates the full artifact lifecycle from sensor extraction through analysis result delivery. When MetaDefender NDR sensors carve files out of observed network traffic, the enrichment service submits them for scanning by a MetaDefender backend -- either the OPSWAT MetaDefender Cloud Software-as-a-Service or an on-premises MetaDefender Core cluster -- and publishes the scan verdict as an entity-keyed enrichment that the aggregator stitches back onto the originating FileInfo event. The alert engine then evaluates the verdict against three severity-tiered rules and emits an MD Core Alert when any engine flags the file.

The service sits between two external subsystems: the sensor's file-store, which writes carved files to a shared location and emits a FileInfo event announcing their availability, and the MetaDefender backend, which performs the actual multi-AV scan. Operators choose one of two deployment modes per environment.

Mode	MetaDefender Backend	When to use
Cloud	`api.metadefender.com/v4` (OPSWAT-hosted)	Default. No on-premises infrastructure to run; engine set and signatures managed by OPSWAT; requires outbound internet access and an API key from the My OPSWAT portal
Core	On-premises MetaDefender Core cluster	Air-gapped environments, or deployments with regulatory or bandwidth constraints. Requires a licensed MD Core cluster reachable from the Manager; authentication via API key or session credentials

Deployment mode, credentials, and every scanning parameter are Policy-managed -- operators reconfigure the service without redeployment. See Integrations for the MetaDefender integration setup procedure.

What It Detects

The family covers every malicious artifact the network happens to carry, limited by what the sensor can extract and what the configured MetaDefender backend's engine set covers. Typical hits include:

Commodity malware payloads. Executables, Dynamic-Link Libraries, and scripts associated with commodity trojan families (Emotet, TrickBot, IcedID, RedLine, AsyncRAT), loaders and droppers, and remote-administration tools delivered over Hypertext Transfer Protocol or Server Message Block.
Document-borne malware. Weaponized Portable Document Format files, Office documents carrying macros or exploits, Rich Text Format lures, and archive formats (ZIP, RAR, 7-Zip) concealing second-stage payloads.
Known-bad files by hash. Files whose Secure Hash Algorithm 256-bit hash is already in the MetaDefender cache from a prior submission anywhere on the platform -- a hash-cache hit returns the prior verdict without resubmitting the content.
Multi-engine corroboration on borderline artifacts. Files that one or two engines flag as suspicious are surfaced at the Low tier; the same file flagged by six or more engines is surfaced at the High tier. The positive-engine count carries forward into severity selection.

The family does not attempt content-aware sandbox detonation, static binary feature extraction, or machine-learning classification inside MetaDefender NDR -- those behaviors are inherited from the MetaDefender backend's engine set. YARA rule authoring, Data Loss Prevention classification, Content Disarm and Reconstruction output, and vulnerability assessment against scanned files are MetaDefender Core features that MetaDefender NDR does not surface in the MVP release.

How It Works

The pipeline moves a candidate file through seven stages.

Extraction. Suricata's file-store output, enabled on the sensor, watches HTTP bodies, SMB reads and writes, FTP transfers, and other protocols that carry file content. When a transfer completes and the carved file's hash and type are known, Suricata writes the file to the extraction directory in a {sha256[0:2]}/{sha256} layout and emits a FileInfo event carrying the file's Multipurpose Internet Mail Extensions type, size, Secure Hash Algorithm 256-bit, and Message Digest 5 hashes. The FileInfo event is published to the raw sensor-events stream.
Event filtering. The MetaDefender Core enrichment service consumes every FileInfo event. It keeps only events whose fileinfo.stored field is true and whose fileinfo.state is CLOSED -- the pair of flags Suricata sets once a file is fully written and the hash computation is finished. Events without a Secure Hash Algorithm 256-bit or Message Digest 5 hash are skipped. Events whose Multipurpose Internet Mail Extensions type matches the configured skip-list -- either an exact-match list or a prefix-match list exposed through Policy -- are also skipped so that operators can exclude traffic-heavy benign types such as images, audio, and video without changing the carving policy on the sensor.
Hash lookup. With hash-lookup-first enabled (the default), the service issues a hash-lookup call against the MetaDefender backend before any file data is uploaded. MetaDefender returns a cached verdict when one exists within the rescan window -- three days by default, configurable via Policy. A cache hit produces the same entity enrichment as a full scan but with cached_result: true and zero upload cost. This is the primary deduplication mechanism: the same file observed a hundred times across the fleet produces one scan and ninety-nine cache hits.
File submission. On a cache miss, the service opens the carved file from its configured file source -- either a local filesystem path that the sensor and the enrichment service share, or a Simple Storage Service (S3) compatible bucket -- and uploads it to the MetaDefender file-submission endpoint with the original filename and configured authentication headers. Uploads run concurrently, capped by a configurable concurrency semaphore; rate-limit headers from the backend back off submissions automatically when the account quota is near exhaustion. Transient failures (network errors, 5xx responses) retry with exponential backoff and jitter.
Polling for results. After a successful submission the service receives a scan identifier and polls MetaDefender on a configurable interval until the scan reaches 100% completion, fails, or the maximum poll budget is exhausted. Each outer poll attempt tolerates a small number of inner retries on transient errors so that one flaky response does not abandon the scan.
Result materialization. On scan completion, the service converts MetaDefender's response into a flat scan-result data map -- threat flag, threat name, multi-AV positive-engine count, total engines, file metadata (type, size, hashes), per-engine details, scan timestamp, and whether the result came from cache. The service then attaches this data map to the file hash as an entity enrichment; the aggregator stitches it onto the event at fileinfo.sha256_enrichments.mdcore (the per-entity sibling) and also under a top-level mdcore summary keyed by the file hash so the alert engine can reason across multiple files on the same event.
Alert evaluation and post-scan cleanup. The alert engine evaluates three severity-tiered MD Core rules against the mdcore summary (see the next three sections); any rule that matches emits an MD Core Alert at its configured severity. Independently, the service optionally deletes the carved file from the extraction directory once scanning completes (the Policy-managed default) and optionally archives it to long-term Simple Storage Service storage for later investigation (off by default).
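
The extraction layout and the event-filtering rules above can be sketched as follows. This is a minimal illustration, not the service's actual code: the function names `carved_path` and `should_scan`, the `mimetype` key, and the two skip-list parameters are hypothetical, and the sketch assumes that an event missing either hash is skipped. Only `fileinfo.stored`, `fileinfo.state`, `fileinfo.sha256`, and `fileinfo.md5` are field names taken from this chapter.

```python
from pathlib import PurePosixPath


def carved_path(extract_dir: str, sha256: str) -> str:
    """Build the {sha256[0:2]}/{sha256} location under the extraction directory."""
    return str(PurePosixPath(extract_dir) / sha256[:2] / sha256)


def should_scan(event: dict, skip_exact=frozenset(), skip_prefixes=()) -> bool:
    """Return True when a FileInfo event passes the filtering stage."""
    info = event.get("fileinfo", {})
    # Only fully written files with finished hash computation are candidates.
    if info.get("stored") is not True or info.get("state") != "CLOSED":
        return False
    # Assumption: an event that lacks either hash is skipped.
    if not info.get("sha256") or not info.get("md5"):
        return False
    # "mimetype" is a hypothetical key for the MIME type carried on the event.
    mime = info.get("mimetype", "")
    if mime in skip_exact or mime.startswith(tuple(skip_prefixes)):
        return False
    return True
```

With `skip_prefixes=("image/",)`, a stored and closed `image/png` event is rejected while an `application/pdf` event with both hashes passes and would be read from the path returned by `carved_path`.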

When the MetaDefender integration is unconfigured -- no API key for Cloud mode, or no Core base URL for Core mode -- the service stays running in idle mode, consumes FileInfo events, and emits a disabled enrichment status so that the aggregator's tracker advances cleanly. Once credentials are delivered via the Policy configuration broadcast, the MetaDefender client lazy-initializes and scanning begins without a restart.

Trigger Conditions

An MD Core Alert fires when the enrichment service produces at least one scan result with threat_found: true and the alert engine's severity-tiered rule sees the corresponding positive_engines count on any entity under the event's top-level mdcore summary. The fields below are what analysts read when triaging the alert.

Field	Meaning
`mdcore.<sha256>`	Per-entity scan-result block keyed by the file's Secure Hash Algorithm 256-bit hash. Multiple entities appear when the same event carries several carved files.
`mdcore.<sha256>.threat_found`	Boolean. `true` when at least one MetaDefender engine flagged the file. Drives whether any MD Core rule can fire.
`mdcore.<sha256>.threat_name`	The canonical threat name returned by the engine with the highest-confidence verdict (for example, `Trojan.Generic.PDF`). Present only when `threat_found` is `true`.
`mdcore.<sha256>.scan_result`	Human-readable result label (`Infected`, `Suspicious`, `No Threat Detected`, and so on).
`mdcore.<sha256>.scan_result_code`	Numeric scan verdict. The code-to-meaning map is `0 = No Threat`, `1 = Infected`, `2 = Suspicious`, `3 = Failed to Scan`, `4 = Cleaned / Quarantined`, `5 = Unknown`, `6 = Skipped - Clean`, `7 = Skipped - Infected`.
`mdcore.<sha256>.positive_engines`	Integer count of engines that flagged the file. Drives the severity tier -- the rules compare the maximum value across all entities on the event.
`mdcore.<sha256>.total_engines`	Integer count of engines that successfully scanned the file. Present so analysts can read the positive ratio (for example, 12 positive of 30 total).
`mdcore.<sha256>.data_id`	The MetaDefender-assigned scan identifier. Useful for pivoting into the MetaDefender console for per-engine details beyond what the enrichment carries.
`mdcore.<sha256>.cached_result`	Boolean. `true` when the verdict came from a hash-cache hit rather than a fresh scan.
`mdcore.<sha256>.scan_time`	Timestamp of the scan that produced the verdict (may precede the event's timestamp when the result is cached).
`mdcore.<sha256>.file_info`	Object containing the file's Secure Hash Algorithm 256-bit, Secure Hash Algorithm 1, Message Digest 5, size in bytes, file type, and file-type description.
`mdcore.<sha256>.scan_details`	Object mapping engine name to per-engine detail (engine version, signature version, threat name, per-engine result code). Used to confirm which engines raised the flag.

When an event carries multiple files -- common with multi-part archive transfers and bulk Server Message Block copies -- all files share the same event and the alert engine reduces across them: the highest positive_engines value determines the severity tier, so a single MD Core Alert is emitted per event at the strongest verdict's tier. Each file's full scan-result block remains readable in the sidebar for per-file analysis.

Severity Classification

MD Core alerts are raised at three native severity tiers that track the positive-engine count. This is the only family whose severity tier is a computed reduction rather than a direct label -- the alert engine evaluates the maximum positive_engines across all file entities on the event against the thresholds below.

Trigger	Unified Severity
Any entity has `threat_found: true` and the maximum `positive_engines` across entities is 6 or greater.	High
Any entity has `threat_found: true` and the maximum `positive_engines` across entities is 3, 4, or 5.	Medium
Any entity has `threat_found: true` and the maximum `positive_engines` across entities is 1 or 2.	Low

The High tier corresponds to broad cross-vendor agreement on a malicious verdict -- six or more independent AV engines flagging the same content is strong corroboration. The Medium tier covers the typical early-detection band where three to five engines agree, common on variants that have not yet propagated through every vendor's signature set. The Low tier is labeled "possible false positive" in the rule description -- one or two engines flagging a file is frequently but not always a real positive, and analysts handling Low-tier MD Core alerts should factor in whether the affected hosts, destinations, or filenames corroborate the verdict before escalating.
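
The reduction described above can be sketched as a small function over the top-level mdcore summary. This is an illustrative sketch under stated assumptions, not the alert engine's rule code: `severity_tier` is a hypothetical name, and returning no tier when `threat_found` is set but the maximum `positive_engines` is zero is an assumption the chapter does not cover.

```python
from typing import Optional


def severity_tier(mdcore: dict) -> Optional[str]:
    """Reduce per-file scan results to one tier for the whole event."""
    entities = list(mdcore.values())
    # No MD Core rule can fire unless at least one entity reports a threat.
    if not any(e.get("threat_found") is True for e in entities):
        return None
    # The rules compare the maximum positive_engines across all entities.
    max_positive = max(int(e.get("positive_engines", 0)) for e in entities)
    if max_positive >= 6:
        return "High"
    if max_positive >= 3:
        return "Medium"
    if max_positive >= 1:
        return "Low"
    return None  # Assumption: a flagged entity with zero positives raises no tier.
```

For an event carrying two files with 2 and 4 positive engines, the function returns Medium, matching the single alert per event at the strongest verdict's tier.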

The IOC auto-escalation rule described in Detection Overview promotes an MD Core Alert to Critical / 0.99 confidence when the scanned file's hash -- or the connection that delivered it -- coincides with a C2 feed hit or an InSights Threat Intelligence Database match. A borderline Low-tier file downloaded from a C2-feed destination becomes a Critical MD Core Alert; the matched indicator is visible in the companion C2 Enrichment or InSights Enrichment sidebar section so analysts can see which feed drove the escalation.

Confidence Scoring

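Confidence is fixed per severity tier, so it follows the positive-engine count band rather than the exact positive-to-total ratio. Analysts who need a finer ordering inside a tier can read positive_engines against total_engines on the record.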

Trigger	Alert Confidence
High-tier alert (6+ positive engines).	0.90
Medium-tier alert (3-5 positive engines).	0.75
Low-tier alert (1-2 positive engines).	0.55
Any tier with an IOC auto-escalation.	0.99

The bands correspond to the confidence scale defined in Detection Overview: 0.80-0.94 for strong signals, 0.60-0.79 for moderate, 0.40-0.59 for low. The 0.99 band is reserved for the IOC auto-escalation case, where the file-scan verdict is corroborated by an independent indicator match.
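
The confidence table and the band check can be written down as a lookup. This is a hedged sketch: `alert_confidence` and `confidence_band` are hypothetical names, and only the three bands quoted above are modeled; any other range in the Detection Overview scale is reported as "other".

```python
# Confidence per native tier, taken from the table above.
TIER_CONFIDENCE = {"High": 0.90, "Medium": 0.75, "Low": 0.55}


def alert_confidence(tier: str, ioc_escalated: bool = False) -> float:
    """Return the alert confidence for a tier, honoring IOC auto-escalation."""
    if ioc_escalated:
        return 0.99  # Reserved for alerts corroborated by an indicator match.
    return TIER_CONFIDENCE[tier]


def confidence_band(score: float) -> str:
    """Name the band a score falls in, using only the bands quoted above."""
    if 0.80 <= score <= 0.94:
        return "strong"
    if 0.60 <= score <= 0.79:
        return "moderate"
    if 0.40 <= score <= 0.59:
        return "low"
    return "other"
```

Each native tier value lands in the band named for it: 0.90 is strong, 0.75 is moderate, and 0.55 is low.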

Where It Surfaces

MD Core alerts and the underlying scan-result enrichments appear in five places.

Dashboard -- Recent Severities donut. High-tier MD Core alerts contribute to the High slice; Medium-tier alerts contribute to the Medium slice; Low-tier alerts contribute to the Low slice; IOC-auto-escalated alerts contribute to the Critical slice. Top Signature Hits is Suricata-only and does not include MD Core alerts.
Dashboard -- Recent Alerts feed. Newest MD Core alerts flow through the cross-family feed, distinguishable by the MD Core Alert type label. Clicking a row opens the same sidebar used on the Hunt page.
Hunt page -- MD Core Alert sub-tab. Under the All Alerts bucket, this per-type sub-tab lists every MD Core alert with columns for timestamp, alert.signature, source and destination endpoints, event_type, the severity label, and the enrichment-specific columns mdcore.scan_result and mdcore.threat_name. Analysts filter by severity to split queue load between the High and Medium tiers.
Hunt page -- FileInfo sessions sub-tab. Every carved file is visible as a FileInfo session record whether or not MetaDefender produced a verdict. The columns include fileinfo.filename, fileinfo.magic (the file type), fileinfo.size, fileinfo.sha256, and fileinfo.md5. Rows that scanned with a positive verdict render the MD Core Enrichment sidebar section alongside the FileInfo block; rows that scanned clean render the section with the clean verdict, indicating that the file was examined and found benign.
Hunt detail sidebar -- MD Core Enrichment section. This section renders on any record that carries an mdcore block -- the MD Core alert rows, the underlying FileInfo session rows, and any downstream record that references the same file hash. The section lists the scan verdict, the threat name when present, the positive-to-total engine ratio, the per-engine detail map, the file metadata block, and whether the result came from cache. When the scanned file's companion traffic also matched a C2 indicator or an InSights indicator, the sidebar renders the C2 Enrichment or InSights Enrichment sections alongside so analysts see the full intelligence picture without leaving the record.

Alert Payload Example

Abbreviated JavaScript Object Notation for a High-tier MD Core Alert fired on a Portable Document Format download flagged by twelve MetaDefender engines. The underlying event is a standard Suricata FileInfo record; the alert adds the mdcore entity-keyed payload and the alert block, and the alert engine sets alert_type: "mdcore" and severity 2 (High).

{
  "timestamp": "2025-11-15T14:27:08.912447+0000",
  "flow_id": 1847291847292004,
  "event_type": "alert",
  "alert_type": "mdcore",
  "src_ip": "192.168.10.157",
  "src_port": 49284,
  "dest_ip": "203.0.113.42",
  "dest_port": 443,
  "proto": "TCP",
  "app_proto": "tls",
  "community_id": "1:9XZyM7pL5mQ2rT9uV1wX3qR8",
  "alert": {
    "action": "allowed",
    "gid": 1,
    "signature_id": 1000006,
    "rev": 1,
    "signature": "MetaDefender NDR MDCore High-AV Detection",
    "category": "A Network Trojan was Detected",
    "severity": 2
  },
  "fileinfo": {
    "filename": "invoice-2025-11.pdf",
    "magic": "PDF document, version 1.5",
    "size": 284716,
    "stored": true,
    "state": "CLOSED",
    "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "md5": "d41d8cd98f00b204e9800998ecf8427e"
  },
  "mdcore": {
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855": {
      "threat_found": true,
      "threat_name": "Trojan.PDF.Emotet.Generic",
      "scan_result": "Infected",
      "scan_result_code": 1,
      "positive_engines": 12,
      "total_engines": 30,
      "data_id": "bda4e3201ff54c6aa65f11c7b0e2a9d8",
      "scan_time": "2025-11-15T14:27:07Z",
      "cached_result": false,
      "file_info": {
        "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        "md5": "d41d8cd98f00b204e9800998ecf8427e",
        "file_size": 284716,
        "file_type": "PDF",
        "file_type_description": "PDF document"
      },
      "scan_details": {
        "Kaspersky": {
          "threat_found": "HEUR:Trojan.PDF.Emotet.gen",
          "scan_result_i": 1,
          "engine_version": "21.0.1.45",
          "def_version": "2025.11.15"
        },
        "Bitdefender": {
          "threat_found": "Trojan.PDF.Emotet.A",
          "scan_result_i": 1,
          "engine_version": "7.94302",
          "def_version": "2025.11.15"
        }
      }
    }
  },
  "rule_name": "MetaDefenderHighAVDetection",
  "rule_salience": 8,
  "triggered_at": "2025-11-15T14:27:08.951283Z"
}

A Medium-tier alert carries rule_name: "MetaDefenderMediumAVDetection", rule_salience: 6, alert.severity: 3, and a positive_engines value between 3 and 5. A Low-tier alert carries rule_name: "MetaDefenderLowAVDetection", rule_salience: 4, alert.severity: 4, and a positive_engines value of 1 or 2. When the same event carries multiple file hashes, the mdcore object contains one entry per hash and each scan result is independently browsable in the sidebar; the top-level alert severity reflects the maximum positive-engine count across entities.
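
When triaging outside the sidebar, the payload can be condensed into one line per file. The sketch below is illustrative only: `summarize_mdcore` is a hypothetical helper, and treating a non-empty per-engine `threat_found` string as "this engine flagged the file" is an assumption based on the shape of the example payload rather than a documented contract.

```python
def summarize_mdcore(alert: dict) -> list:
    """Return one summary per scanned file hash, strongest verdict first."""
    summaries = []
    for sha256, result in alert.get("mdcore", {}).items():
        details = result.get("scan_details", {})
        # Assumption: a non-empty per-engine threat_found means the engine flagged it.
        flagged = sorted(name for name, d in details.items() if d.get("threat_found"))
        positive = int(result.get("positive_engines", 0))
        total = int(result.get("total_engines", 0))
        summaries.append({
            "sha256": sha256,
            "verdict": result.get("scan_result"),
            "threat_name": result.get("threat_name"),
            "positive_engines": positive,
            "ratio": f"{positive}/{total}",
            "flagged_engines": flagged,
            "cached": bool(result.get("cached_result", False)),
        })
    # The file with the most positive engines drives the alert's severity tier.
    summaries.sort(key=lambda s: s["positive_engines"], reverse=True)
    return summaries
```

Applied to the High-tier example, the helper yields a single entry with a ratio of 12/30 and the two engines listed in the abbreviated scan_details map.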

Tuning Considerations

Every MetaDefender Core parameter is Policy-managed and changes propagate live without a service restart. The knobs divide into four groups.

Integration and credentials.

Enable or disable the MetaDefender Core enrichment. Operators toggle the enrichment on or off through Policy. Disabling the service leaves the pipeline running -- no crash, no dropped events -- and suppresses all downstream MD Core alerts. Events continue to be consumed and FileInfo records continue to appear in the Hunt page; only the scan verdict is absent.
Deployment mode and credentials. Operators choose Cloud or Core, set the API key (Cloud) or the Core base URL plus API key or session credentials (Core), and optionally specify a scan rule or workflow. See Integrations for the full setup procedure including My OPSWAT portal sign-up for a Cloud API key and credential rotation guidance.

Event filtering.

Multipurpose Internet Mail Extensions skip-list. The service accepts both an exact-match list and a prefix-match list so operators can exclude whole families of traffic from scanning. Common defaults for high-volume benign content: image/, audio/, and video/ on prefix match, and text/plain or text/html on exact match.
Maximum file size. Files larger than the configured limit (100 megabytes by default) are skipped and logged; the limit exists to keep the scanning pipeline responsive under high-volume extraction bursts.

Scanning behavior.

Hash-lookup-first deduplication. On by default. When enabled, the service checks MetaDefender's hash cache before uploading file content; cache hits return the prior verdict at zero upload cost and with cached_result: true in the enrichment payload. Disabling this setting forces every file to be re-uploaded, which multiplies API and bandwidth consumption and is rarely useful outside controlled testing.
Rescan window. The service treats a cached result as fresh for a configurable number of days (three by default). Files seen after the window are re-uploaded so the verdict reflects current signatures.
Concurrency and rate limiting. The service caps concurrent scan workflows at a Policy-managed maximum (five by default) and backs off automatically against MetaDefender's rate-limit headers. Environments with a generous MetaDefender Core deployment may raise the concurrency cap; environments constrained to MetaDefender Cloud's per-tier quota should leave the default and trust the automatic backoff.
Poll interval and budget. The service polls MetaDefender for scan completion on a configurable interval (30 seconds by default) with a maximum poll budget (120 attempts by default, for a 60-minute total scan timeout). Slow scans -- archives with deep nesting, large binaries submitted to a busy Core cluster -- complete inside the default budget; environments that routinely scan very large files may lengthen either setting.
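
The rescan window and the poll budget reduce to simple arithmetic, sketched below. The names `cache_is_fresh`, `total_poll_timeout`, and `poll_until_done` are hypothetical, and `get_progress` stands in for whatever call returns the scan's completion percentage; none of this is the service's actual code. The sketch also ignores the failed-scan exit and the inner retries described earlier.

```python
import time
from datetime import datetime, timedelta
from typing import Callable, Optional


def cache_is_fresh(scan_time: datetime, now: datetime, rescan_days: int = 3) -> bool:
    """A cached verdict is reused only while it is inside the rescan window."""
    return now - scan_time <= timedelta(days=rescan_days)


def total_poll_timeout(interval_s: int = 30, max_attempts: int = 120) -> timedelta:
    """Nominal scan timeout: poll interval multiplied by the poll budget."""
    return timedelta(seconds=interval_s * max_attempts)


def poll_until_done(get_progress: Callable[[], int], interval_s: float = 30.0,
                    max_attempts: int = 120,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Poll until progress reaches 100; return attempts used, or None on timeout."""
    for attempt in range(1, max_attempts + 1):
        if get_progress() >= 100:
            return attempt
        if attempt < max_attempts:
            sleep(interval_s)  # Wait one interval before the next poll.
    return None
```

With the defaults, `total_poll_timeout()` equals 60 minutes, which is the total scan timeout quoted above; doubling either setting doubles the budget.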

Post-scan file handling.

Delete after scan. On by default. The carved file is deleted from the extraction directory once the scan completes to keep the sensor's disk footprint low. Operators who need the raw file retained for manual review should turn this off and configure long-term archive storage below; leaving delete-after-scan off without archival will fill the sensor disk.
Long-term archive storage. Off by default. When enabled, scanned files are copied to a Simple Storage Service bucket (MinIO, Amazon Web Services, or any S3-compatible backend) with a configurable sharding strategy, optionally restricted to files containing threats. Useful when downstream forensic or reverse-engineering workflows require the original bytes beyond the Hunt-page retention window.

All four groups are exposed through Policy. Updates Management shows per-policy enable / disable state and integration connectivity status, and Integrations walks through the MetaDefender integration setup.

The family's main false-positive surface is the Low tier. The threshold is deliberately permissive so analysts can see borderline verdicts rather than silently drop them, which means one or two vendors occasionally flag legitimate installers, patched binaries from obscure publishers, or signed content that still matches a heuristic. When a Low-tier alert's companion traffic shows no other anomaly (no C2 hit, no InSights indicator, no behavioral signal on the downloading host), the appropriate response is usually to leave the verdict as-is rather than tune it down. The Low tier is already a "possible false positive" lane by design, and operators who want fewer file-scanning alerts should narrow the Policy's scanning scope rather than suppress alerts at the engine.

MD Core alerts at the High and Medium tiers -- and any tier that has been IOC-auto-escalated to Critical -- route through Malicious File Investigation, which walks through reading the multi-AV verdict, correlating the download session with its initiating host and user, retrieving the archived file when storage is enabled, and determining whether the file reached a host that executed it. Low-tier alerts with corroborating context (an unusual source host, a rare destination, a companion Beaconing or Long Duration Flow detection) also enter that runbook; isolated Low-tier alerts with no corroborating signal are triaged through Alert, Flow, and PCAP Pivoting to verify the delivery context before escalating.
