ML Anomaly Investigation

This runbook is the family-specific continuation of Critical Alert Triage for any alert whose evidence came from MetaDefender NDR's Machine Learning (ML) anomaly detector. It covers the single trigger surface the family produces — an ML Random Cut Forest (RCF) Anomaly Event on the Hunt page — across all three event types the detector scores (Domain Name System (DNS), Hypertext Transfer Protocol (HTTP), and Flow) and across both the threshold-based Medium path and the Indicator of Compromise (IOC) auto-escalated Critical path. The steps below take the analyst from "the model flagged an event as unusual" to "a triage decision recorded in the incident-management system".

ML anomaly alerts are unusual among MetaDefender NDR's families because they do not tell the analyst what is wrong — only that an event's shape deviates from what the detector has learned is normal. Every other detection family names the matched pattern (a Suricata signature identifier, a C2 feed hit, a cataloged Threat Intelligence Database (TIDB) match, an antivirus verdict, a named behavioral detection). The ML family only says this event looks isolated relative to the last few thousand events of the same type. The analyst's job is to read the original event payload, decide what made it anomalous, and then correlate that observation against other signals on the same host to separate a real detection from a baseline shift.

This runbook is written for Tier 1 Security Operations Center (SOC) analysts performing first-response triage on ML anomaly alerts, Tier 2 analysts resolving escalations when the anomaly corroborates a signal from another family, and threat hunters using the ML surface to find novel activity that the signature, intelligence, and behavioral families have not cataloged. It assumes the analyst has already executed Critical alert triage through step 5 when the lead originated from an IOC-auto-escalated Critical alert — the affected asset, the supporting sidebar evidence, the repetition history, and any same-flow corroboration are already in hand. For Medium-severity ML alerts triaged directly from the Hunt page queue, the analyst records the affected asset and its criticality tier before executing the runbook below.

First-use acronym expansions in this runbook: SOC (Security Operations Center), IOC (Indicator of Compromise), ML (machine learning), RCF (Random Cut Forest), RRCF (Robust Random Cut Forest — the online-learning variant used in MetaDefender NDR), C2 (command-and-control), DNS (Domain Name System), HTTP (Hypertext Transfer Protocol), HTTPS (Hypertext Transfer Protocol Secure), TLS (Transport Layer Security), SNI (Server Name Indication), JA3 and JA4 (TLS client fingerprint hashes), NXDOMAIN (Non-Existent Domain DNS response), TTL (Time-to-Live), TCP (Transmission Control Protocol), SYN / ACK / FIN / PSH / RST (TCP control flags), IP (Internet Protocol), ASN (Autonomous System Number), GeoIP (geographical IP lookup), DGA (Domain Generation Algorithm), URI (Uniform Resource Identifier), CIDR (Classless Inter-Domain Routing), RFC-1918 (the Internet Engineering Task Force standard reserving private IP ranges), TIDB (Threat Intelligence Database), REPDB (Reputation Database), EDR (endpoint detection and response), VPN (Virtual Private Network), SaaS (Software-as-a-Service), IR (incident response), PCAP (packet capture), MVP (Minimum Viable Product), FRD (Functional Requirements Document), CI (continuous integration).

Trigger Scenario

An analyst reaches this runbook from one of two starting points. In both cases the lead describes a single merged event — a DNS query, an HTTP transaction, or a Flow record — whose feature vector the RCF detector scored above the per-event-type threshold.

  • Primary — ML RCF Anomaly Event on the Hunt page. The analyst opens the Hunt Page, navigates to the All Alerts bucket, and selects the ML RCF Anomaly Event sub-tab. A row carries alert_type: "ml_rcf_anomaly", a Medium severity label (the MVP default under the pass-through MLAnomalyDetection rule), an anomaly_score, a threshold, and the suricata_event_type that identifies which detector fired — dns, http, or flow. The column set also carries model_version and, when the infrastructure-aware whitelist reduced the raw score, original_score, whitelist_action, and whitelist_factor.
  • Secondary — Critical alert carrying an ml_rcf_anomaly block. The alert reaches the analyst through Critical Alert Triage because the IOC auto-escalation rule promoted it to Critical / 0.99 confidence: the anomalous event's source, destination, or queried domain coincides with a C2 feed hit or an OPSWAT InSights TIDB or Reputation Database (REPDB) match. The underlying anomaly score may be anywhere from just above the per-type threshold to an extreme outlier.

The two trigger surfaces converge in this runbook because the investigation is the same: the RCF detector found an event worth a closer look, and the analyst needs to decide whether the deviation is malicious, benign-but-unusual (a legitimately-new service or a recent environment change), or a warmup artifact. The evidence-gathering steps are identical once the analyst is in the Hunt workspace; the only divergence is in step 6, where an IOC match on the secondary trigger is an immediate escalation lever.

Prerequisites

Before executing this runbook the analyst confirms the following.

  • Steps 1 through 5 of Critical Alert Triage are complete when the lead is a Critical-severity alert. The affected asset, its owner and criticality tier, the sidebar evidence summary, and any repetition or same-flow correlation already live in the incident ticket. For Medium-severity ML alerts triaged directly, the analyst still records the affected asset and its criticality tier before executing the runbook.
  • A Hunt session open on the originating ML RCF Anomaly Event sub-tab (for the primary trigger) or on the originating Critical alert row carrying an ml_rcf_anomaly block (for the secondary trigger), with the detail sidebar visible.
  • Access to the organization's asset inventory for the internal host named in the event's five-tuple. On a DNS anomaly the internal host is typically the querying client; on an HTTP or Flow anomaly it is typically the connection initiator. The analyst reads the direction from the event's five-tuple and the carrier-protocol role (client versus server).
  • A working mental model of the deployment's recent change history — newly-deployed services, software rollouts, new monitoring probes, subnet migrations, Virtual Private Network (VPN) concentrator changes, or anything else that would legitimately introduce traffic shapes the RCF detectors have not yet seen. Recent legitimate change is the single most common driver of transient ML anomaly alert volume.
  • An understanding of how long the local RCF deployment has been running on production traffic. Alerts that fire within the first several minutes of a detector's lifetime — or within the first several minutes of any service-restart or redeployment — are at meaningful risk of being warmup artifacts (see step 8).
  • Access to the incident-management system used by the SOC and, when available, read access to the organization's EDR console so emergent host-level behavior following the anomalous event can be corroborated against endpoint telemetry.

Missing any of these is a hard stop. An ML anomaly investigation run without asset context tends to misread the direction of the anomalous flow; run without a recent-change baseline, it tends to over-escalate legitimate environment shifts the model has not yet absorbed; run without a sense of detector lifetime, it tends to escalate warmup artifacts that would naturally dissipate within minutes.

Investigation Steps

Each step is numbered, adds a distinct piece of evidence, and feeds the decision tree at the end. The analyst executes steps in order even when the verdict looks obvious early; an ML anomaly disposition is never safely recorded on the raw score alone because the score is a ranking signal and not an explanation — the evidence that determines the disposition lives in the original event payload, in the host's recent baseline, and in the correlation against other detection families.

1. Open the alert sidebar and capture the ml_rcf_anomaly block

On the Hunt page, the analyst clicks the alert row to open the detail sidebar. The sidebar header shows the alert type (ML RCF Anomaly Event), the severity label (Medium on MVP unless IOC-escalated), and the confidence score. MVP does not wire a dedicated ML Anomaly sidebar section; the ml_rcf_anomaly fields appear inline on the record, and the per-protocol sidebar section for the anomalous event's type (Suricata DNS, Suricata HTTP, Suricata TLS, Suricata Flow, or Suricata FileInfo) renders alongside the standard Network Base block. The analyst records the following fields from the ml_rcf_anomaly block before any pivot.

  • ml_rcf_anomaly.anomaly_score — the per-event score after whitelist adjustment. Floating-point, non-negative. The absolute value is meaningless without the threshold; both are read together in step 2.
  • ml_rcf_anomaly.threshold — the per-event-type threshold the score crossed. DNS events use 3.0, HTTP events use 8.0, Flow events use 20.0 under the MVP defaults. A score of 4.35 on a DNS event is moderately above threshold; the same score on a Flow event is below threshold and would not have alerted.
  • ml_rcf_anomaly.suricata_event_type — the event type that the scoring detector received: dns, http, or flow (other event types such as tls or alert may appear when the detector's upstream routing scored them against the matching per-type classifier). This field tells the analyst which per-protocol sidebar section to read in step 3.
  • ml_rcf_anomaly.event_id — the identifier of the original Suricata event that scored anomalous. Used to pivot to the full event in the Hunt page in step 3.
  • ml_rcf_anomaly.timestamp — the timestamp of the original anomalous event, not the scoring time. Used to bound same-source correlation in step 5.
  • ml_rcf_anomaly.original_event — the full merged event payload — DNS, HTTP, TLS, Flow, or FileInfo block — that the detector scored. The protocol-specific sidebar section renders these fields in the usual layout.
  • ml_rcf_anomaly.model_version and ml_rcf_anomaly.version_model — identifiers of the RCF model and its semantic version. Useful when the configuration has changed recently and the analyst needs to distinguish alerts produced by the previous model era from alerts produced by the current one.
  • ml_rcf_anomaly.detector_config — the identifier of the detector configuration snapshot (num_trees, shingle_size, tree_size, per-event-type threshold). Recorded so a tuning follow-up action (the Tune the rule branch of the decision tree) can reference the specific configuration that produced the alert.
  • ml_rcf_anomaly.original_score — present only when the infrastructure-aware whitelist reduced the raw score. The value is the unadjusted forest co-displacement score before the whitelist factor applied. Read together with whitelist_factor in step 7.
  • ml_rcf_anomaly.whitelist_action — present only when the whitelist matched. Typically reduced (the score was multiplied by a fractional factor); the excluded action never appears on an emitted alert because excluded events are dropped before scoring.
  • ml_rcf_anomaly.whitelist_factor — present only when the whitelist reduced the score. A value between 0.4 and 0.9 that the original score was multiplied by.

Above the ml_rcf_anomaly block the base network section carries src_ip, src_port, dest_ip, dest_port, proto, app_proto, and the community_id correlator. The analyst captures the full base record so subsequent pivots have a reference point. "DNS anomaly on 10.40.12.88 (finance-workstation-412, Tier 2), score 4.35 against threshold 3.0, community_id 1:abc..., original event dns query for a1b2c3d4e5f6g7h8i9.example.net" is the shape of a capture note that survives handoff.
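A minimal sketch of the step 1 capture follows, assuming the alert is exported as a flat record. The field names mirror the list above; every value (addresses, identifiers, and the detector-capacity numbers other than the documented shingle size of three) is hypothetical and shown only to illustrate the shape of a capture note.

```python
# Hypothetical ml_rcf_anomaly capture from step 1. Field names follow the
# runbook's list above; all values are illustrative, not real data.
alert = {
    "alert_type": "ml_rcf_anomaly",
    "severity": "Medium",
    "src_ip": "10.40.12.88",
    "dest_ip": "203.0.113.50",
    "community_id": "1:abc...",            # correlator used for same-flow pivots
    "ml_rcf_anomaly": {
        "anomaly_score": 4.35,             # post-whitelist score
        "threshold": 3.0,                  # per-event-type threshold (DNS)
        "suricata_event_type": "dns",
        "event_id": "evt-0001",            # hypothetical identifier
        "timestamp": "2026-04-22T02:14:00Z",
        "model_version": "rcf-mvp",        # hypothetical version label
        # shingle_size of three is documented; the other capacity values are assumed
        "detector_config": {"num_trees": 50, "shingle_size": 3, "tree_size": 256},
    },
}

def capture_note(a: dict) -> str:
    """One-line capture note of the shape recommended in step 1."""
    m = a["ml_rcf_anomaly"]
    return (f"{m['suricata_event_type'].upper()} anomaly on {a['src_ip']}, "
            f"score {m['anomaly_score']} against threshold {m['threshold']}, "
            f"community_id {a['community_id']}")

print(capture_note(alert))
```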

2. Read the score against the threshold and locate the alert on the confidence ladder

The raw anomaly_score is not directly interpretable across event types — a 4.35 on DNS sits moderately above the 3.0 threshold, while a 4.35 on Flow is well below the 20.0 threshold and would not have produced an alert at all. The analyst reads the score relative to its threshold by computing the ratio anomaly_score / threshold and locates the alert on the confidence ladder below, which is documented in ML Anomaly Detections (a minimal computation sketch follows the ladder).

  • IOC auto-escalation fires (any feed hit on the event's entities): confidence band 0.99. Treated as Critical per the IOC auto-escalation rule; confidence is not derived from the score at this band.
  • ≥ 3× threshold (for example, DNS score ≥ 9.0, HTTP score ≥ 24.0, Flow score ≥ 60.0): confidence band 0.90. Extreme anomaly — the forest considers the event isolated by an order of magnitude.
  • 2× to 3× threshold: confidence band 0.80. Strongly anomalous.
  • 1.3× to 2× threshold: confidence band 0.65. Moderately anomalous.
  • 1.0× to 1.3× threshold: confidence band 0.50. At or just above the threshold; most alerts in a well-tuned deployment land here and are corroborating context rather than standalone priorities.
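A minimal sketch of the step 2 arithmetic, assuming the MVP thresholds and the ratio-to-band ladder exactly as listed above. This is ticket arithmetic for the analyst's note, not a product API.

```python
# Score-to-threshold ratio and confidence band per the ladder above.
THRESHOLDS = {"dns": 3.0, "http": 8.0, "flow": 20.0}   # MVP defaults per the runbook

def confidence_band(anomaly_score: float, event_type: str, ioc_match: bool = False) -> tuple[float, str]:
    """Return (confidence, note) derived from the score-to-threshold ratio."""
    if ioc_match:
        return 0.99, "IOC auto-escalated to Critical; confidence not derived from the score"
    ratio = anomaly_score / THRESHOLDS[event_type]
    if ratio >= 3.0:
        return 0.90, f"{ratio:.2f}x threshold: extreme anomaly"
    if ratio >= 2.0:
        return 0.80, f"{ratio:.2f}x threshold: strongly anomalous"
    if ratio >= 1.3:
        return 0.65, f"{ratio:.2f}x threshold: moderately anomalous"
    return 0.50, f"{ratio:.2f}x threshold: at or just above threshold"

# The runbook's running example: 4.35 on a DNS event.
print(confidence_band(4.35, "dns"))   # (0.65, '1.45x threshold: moderately anomalous')
```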

The analyst records the ratio and the band explicitly in the ticket. "Score 4.35 / threshold 3.0 = 1.45× ratio, confidence band 0.65 (moderately anomalous)" is the kind of note a senior analyst or a reviewer can read without pulling the record again. The confidence band shapes the disposition weight in the decision tree: 0.50-band alerts rarely justify escalation on the ML evidence alone, 0.90-band alerts frequently do when at least one corroborating signal exists, and 0.99-band alerts (IOC-escalated) are Critical by construction.

3. Pivot into the original event to identify what shape of traffic was flagged

This is the single most important step in the runbook. The ML family's value is indicating which event is worth a closer look; the analyst's value is reading why the event is worth a closer look. The analyst pivots from the anomaly alert into the original event and reads the per-protocol sidebar section for the values that are actually unusual.

The analyst clicks through to the original event in one of two ways. First and simplest, the ml_rcf_anomaly.original_event block renders inline in the alert sidebar under the standard per-protocol section that matches suricata_event_type. Second, when the event needs to be seen in its native sub-tab with surrounding rows, the analyst right-clicks ml_rcf_anomaly.event_id and selects Show related events — a new tab opens filtered to the underlying record.

What the analyst looks for depends on the event type.

  • DNS anomalies (suricata_event_type: "dns"). The Suricata DNS section exposes dns.type (query or answer), dns.rrname, dns.rrtype, dns.rcode, dns.answers[], and the timestamp. The features the detector baselines most heavily on DNS are query and answer counts and the event's hour. Values that look unusual on inspection include very long second-level labels (a1b2c3d4e5f6g7h8i9... patterns characteristic of Domain Generation Algorithms), uncommon record types (TXT or NULL bulk records, ANY queries), very high NXDOMAIN ratios from the same source, and off-hours query spikes. A DNS anomaly with an obviously-algorithmic rrname is a strong signal by itself and a candidate hand-off to Tunneling Investigation.
  • HTTP anomalies (suricata_event_type: "http"). The Suricata HTTP section exposes http.hostname, http.http_method, http.url, http.user_agent, http.http_content_type, http.status, http.length, and the full request / response header set when recorded. The features the detector baselines on HTTP are the status code and the response length, plus temporal context. Values that look unusual on inspection include atypical user agents (hand-rolled transports, toolkit-default strings like python-requests or Go-http-client, Cobalt Strike or Sliver defaults), unusually large response lengths on a host whose HTTP baseline is small, unfamiliar hostnames, Uniform Resource Identifiers with high-entropy path segments, and status codes outside the host's usual profile.
  • Flow anomalies (suricata_event_type: "flow"). The Suricata Flow section exposes flow.bytes_toserver, flow.bytes_toclient, flow.pkts_toserver, flow.pkts_toclient, flow.age, flow.alerted, the TCP flag tallies (tcp.syn, tcp.ack, tcp.fin, tcp.psh, tcp.rst), and the flow-end reason. The features the detector baselines on Flow are byte and packet asymmetries across directions, TCP flag combinations, and temporal patterns. Values that look unusual on inspection include byte ratios the host's baseline does not produce (very asymmetric uploads or downloads), TCP flag combinations that indicate scanning or reset behavior (high RST, unusual SYN-without-ACK patterns), short flows with disproportionate byte counts, and long-running flows outside the host's usual session durations.

The analyst also scans the companion TLS section when present (Server Name Indication, certificate subject and issuer, JA3 / JA4 client fingerprints). TLS fields are not among the RCF feature vector's fifteen dimensions on MVP but are rendered in the sidebar when the event carries them, and an unfamiliar JA3 fingerprint paired with an anomalous Flow or HTTP event is independent corroboration of the anomaly.

The analyst records the specific values that appear unusual. "DNS query for a1b2c3d4e5f6g7h8i9.example.net, rrtype A, from 10.40.12.88 at 02:14 local time; 18-character random-looking second-level label, off-hours query, query to a domain never previously resolved from this source" is the shape of a finding that determines whether the evidence stands on its own or needs corroboration to justify escalation.
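Where the analyst wants a quick numerical read on a random-looking label, the sketch below scores the leftmost DNS label by length and Shannon entropy. The cut-off values are illustrative assumptions, not product logic, and do not replace the analyst's own reading of the query.

```python
import math

def shannon_entropy(s: str) -> float:
    """Shannon entropy (bits per character) of a string."""
    if not s:
        return 0.0
    counts = {c: s.count(c) for c in set(s)}
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())

def dga_shaped(rrname: str, min_len: int = 16, min_entropy: float = 3.5) -> bool:
    """Flag a long, high-entropy leftmost label as DGA-shaped (assumed cut-offs)."""
    label = rrname.split(".")[0]
    return len(label) >= min_len and shannon_entropy(label) >= min_entropy

print(dga_shaped("a1b2c3d4e5f6g7h8i9.example.net"))   # True for the runbook's example query
print(dga_shaped("www.example.net"))                   # False
```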

4. Compare against the host's normal traffic pattern over the prior 24 hours

An ML anomaly says this event is isolated relative to the forest. A baseline comparison says this event is isolated relative to this specific host's recent behavior. The two are independent questions, and the host-level baseline is often the faster path to disposition — a host that reliably issues a particular kind of traffic every day has an internal baseline the forest cannot see, while a host whose usual footprint is narrow produces a sharper contrast against the anomalous event.

The analyst right-clicks the internal host's IP on the alert row or in the sidebar and selects Hunt all events from this IP. A new All Events tab opens on the Hunt Page filtered to the host. The analyst widens the time range to Last 24 hours (the prior-24 window is a reasonable default; longer windows are appropriate when the alert fires on a weekend or outside a host's typical working pattern) and reads the result.

  • DNS baseline. The analyst filters the DNS sub-tab to the host and surveys the parent domains, the query types, the query-rate pattern across the day, and the usual NXDOMAIN ratio. Does the anomalous query type or label pattern appear in the prior 24 hours at all? If the host's DNS baseline has never produced a TXT query to a random-looking name and now produces one, the contrast is sharp. If the host has issued dozens of similar queries over the window and only the most recent was flagged, the contrast is weaker — the forest's feature space crossed the threshold on this event but the host's internal baseline includes the pattern.
  • HTTP baseline. The analyst filters the HTTP sub-tab to the host and surveys the hostnames visited, the method distribution, the user agents used, and the response-length distribution. Does the anomalous hostname appear elsewhere in the window? Does the anomalous user agent match one this host typically uses? A host whose HTTP history is dominated by a single corporate proxy and suddenly produces a python-requests User-Agent to an unfamiliar host is a far stronger signal than a host whose HTTP history already includes mixed client identities.
  • Flow baseline. The analyst filters the Flow sub-tab (or the Netflows bucket) to the host and surveys the usual destination population, the usual byte volumes per direction, the usual flow durations, and the typical TCP flag distributions. Does the anomalous destination appear in the window's flow history? Does the anomalous byte ratio match the host's usual upload or download profile, or contrast sharply with it? A host whose flow baseline is short symmetric sessions to a known peer set and suddenly produces a long asymmetric upload to an unfamiliar endpoint is a sharply-contrasted anomaly.

The analyst records a one-line read of the baseline contrast. "Prior 24 hours on this host show only corporate-proxy DNS queries during business hours; the anomalous query is the first off-hours query to an off-proxy resolver and the first query with a random-looking second-level label — sharp contrast against the host's own baseline" is the shape of a finding that strongly weighs toward escalation in the decision tree. "Prior 24 hours include half a dozen similar queries to the same parent domain, same query type, same time-of-day pattern — the anomalous event is only marginally different from the host's baseline" is the shape that weighs toward monitor or close.
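A minimal sketch of the DNS baseline read, assuming the prior-24-hour DNS events for the host have been exported as a list of dicts carrying Suricata-style fields (rrname, rrtype, rcode). The export step itself is outside the sketch; the same pattern applies to the HTTP and Flow reads with their own fields.

```python
from collections import Counter

def dns_baseline(events: list[dict]) -> dict:
    """Summarize a host's 24-hour DNS footprint: parent domains, record types, NXDOMAIN ratio."""
    parents = Counter(".".join(e["rrname"].split(".")[-2:]) for e in events if e.get("rrname"))
    rrtypes = Counter(e.get("rrtype", "?") for e in events)
    nx = sum(1 for e in events if e.get("rcode") == "NXDOMAIN")
    return {
        "top_parent_domains": parents.most_common(5),
        "rrtype_distribution": dict(rrtypes),
        "nxdomain_ratio": nx / len(events) if events else 0.0,
    }

def seen_before(events: list[dict], rrname: str) -> bool:
    """Has this host queried the anomalous name's parent domain anywhere in the window?"""
    parent = ".".join(rrname.split(".")[-2:])
    return any(e.get("rrname", "").endswith(parent) for e in events)
```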

5. Correlate the anomaly with other detection signals on the same source

This is the second strongest disposition lever in the runbook. A single RCF anomaly standing alone is a weak hypothesis; the same anomaly paired with a Beaconing alert, a Data Exfiltration alert, a Tunneling alert, a C2 Enrichment match, or an MDCore threat-finding on the same source is multi-signal convergence and is the strongest basis for escalation outside the IOC-auto-escalation rule itself.

The analyst runs two correlation pivots.

First, the analyst right-clicks the internal host's IP and selects Hunt all events from this IP (if a tab from step 4 is already open, the analyst reuses it), narrows the tab to the All Alerts bucket, and widens the time range to Last 7 days to capture the fullest correlation window. The analyst reads which alert types have fired on the same source over the window.

  • Co-occurring Beaconing or C2 alerts. When a Beaconing Detection Alert on the same source appeared within the hours before or after the RCF anomaly, the anomaly is likely part of the beacon traffic the forest flagged as isolated (for example, a specific check-in whose flow shape differed from the steady-state beacon). The analyst hands off to C2 Beacon Investigation.
  • Co-occurring Data Exfiltration alerts. When a Data Exfiltration Detection Alert on the same source appeared near the anomaly, the anomaly is likely part of the exfiltration upload (an unusually large flow, an unusual upload-to-download ratio). The analyst hands off to Data Exfiltration Investigation.
  • Co-occurring DNS Tunneling alerts. When a DNS Tunneling Detection Alert or a DNS Tunneling Hourly aggregation alert on the same source appeared near the anomaly, the DNS anomaly is likely part of the tunneling channel. The analyst hands off to Tunneling Investigation.
  • Co-occurring MDCore alerts. When a MetaDefender Core alert on the same source appeared before the anomaly, the anomaly may be post-delivery behavior (a delivered payload executing and making outbound contact). The analyst hands off to Malicious File Investigation and carries the anomaly as post-delivery corroboration.
  • Co-occurring Suricata signature or InSights alerts. A signature match or a Threat Intelligence Database / Reputation Database hit on the same source that appeared within the correlation window is parallel corroboration — the anomaly augments the existing lead rather than originating its own.
  • No co-occurring alerts from any other family. The anomaly stands alone. The case for escalation now rests on the sharpness of the baseline contrast in step 4, the strength of the values observed in step 3, the confidence band in step 2, and the IOC enrichment check in step 6.

Second, the analyst uses the Show all events with this community id pivot on the community_id of the original event to check whether any other detection family fired on the same specific connection. A same-flow correlation is tighter than a same-source correlation; a Beaconing detection on the same community_id as the RCF anomaly is the strongest form of convergence because both signals describe the same traffic at the same time.

The analyst records the correlation reading explicitly. "No other alerts on this source in Last 7 days; the anomaly stands alone" is a different disposition pressure from "Beaconing Detection on the same source within 90 minutes before the anomaly, and a Data Exfiltration Detection on the same source within 30 minutes after — three independent signals within a two-hour window".
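A minimal sketch of the two correlation reads, assuming the source's alerts over the correlation window are available as a list of dicts with alert_type, src_ip, community_id, and ISO-format timestamps. The symmetric window and the field names are assumptions for illustration; in practice the analyst reads the same answer off the Hunt pivots described above.

```python
from datetime import datetime, timedelta

def correlate(anomaly: dict, alerts: list[dict], window_days: int = 7) -> dict:
    """Collect non-ML alert types on the same source and on the same community_id."""
    t0 = datetime.fromisoformat(anomaly["timestamp"])
    lo, hi = t0 - timedelta(days=window_days), t0 + timedelta(days=window_days)
    same_source, same_flow = [], []
    for a in alerts:
        if a is anomaly or a["alert_type"] == "ml_rcf_anomaly":
            continue                                   # we want signals from other families
        t = datetime.fromisoformat(a["timestamp"])
        if a["src_ip"] == anomaly["src_ip"] and lo <= t <= hi:
            same_source.append(a["alert_type"])
            if a.get("community_id") == anomaly.get("community_id"):
                same_flow.append(a["alert_type"])      # tightest form of convergence
    return {"same_source": same_source, "same_flow": same_flow}
```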

6. Check for IOC enrichment on the event's entities

The IOC auto-escalation check produces the same disposition pressure it does in every other family-specific runbook. The alert sidebar carries the C2 Enrichment section whenever any C2 feed match fired on the event's entities; the Insights Enrichment section carries parallel matches from the OPSWAT InSights TIDB and REPDB feeds.

  • Any c2.matches[] entry present on the event's src_ip, dest_ip, resolved DNS name, or queried DNS name. The alert is IOC-Critical by construction. Severity is Critical, alert-level confidence is 0.99, and the match payload identifies the threat actor or malware family the indicator belongs to. An ML anomaly with a C2-flagged entity is a two-source corroborated finding: the ML detector flagged the event as isolated, and an independent intelligence feed identifies the entity as known-bad. The finding on its own is sufficient for escalation to incident response.
  • Parallel InSights TIDB match on an event entity. See InSights TIDB and REPDB for the feed semantics. A TIDB hit on the source, destination, or queried domain promotes the alert to Critical under the IOC auto-escalation rule.
  • Parallel InSights REPDB match on an event entity. A REPDB hit is lower-confidence than a TIDB hit but is still material and also promotes to Critical under the IOC auto-escalation rule, especially when the REPDB category names a class consistent with the anomaly shape (for example, an anonymous file-sharing host on a Flow anomaly with high upload volume).
  • No IOC match on any feed. The alert is threshold-Medium on RCF evidence alone. The case for escalation now depends on the confidence band in step 2, the anomalous-value reading in step 3, the baseline contrast in step 4, the same-source correlation in step 5, and the whitelist reading in the next step.

The analyst records the IOC state explicitly in the ticket — "IOC match present on src_ip / dest_ip / resolved domain / queried domain / none" — because it is the single field that changes downstream dispositions most often. A Medium-tier ML anomaly with an IOC match on the destination is triaged identically to any IOC-Critical alert across the family; a 0.90-band ML anomaly with no IOC match still warrants escalation on same-source convergence alone when the baseline contrast is sharp, but the disposition record must reflect the absence of IOC evidence.

7. Read the whitelist fields to confirm the score's provenance

The infrastructure-aware whitelist reduces or excludes scores that match patterns the service ships with for MetaDefender NDR's own platform traffic, Kafka, PostgreSQL, service-discovery and pod-network IPs, multicast, broadcast, and a short allowlist of common internal endpoints. When the whitelist reduced (but did not exclude) a score, the alert sidebar carries the original_score, whitelist_action, and whitelist_factor fields alongside the adjusted anomaly_score. The analyst reads these fields to confirm that the score the alert fired on is after any reduction and to see how aggressively the whitelist is smoothing traffic from the matched pattern.

  • No whitelist fields present. The event did not match any whitelist pattern. The anomaly_score is the raw forest co-displacement score, and the alert's position on the confidence ladder in step 2 reflects the model's unmodified view.
  • whitelist_action: "reduced" with whitelist_factor between 0.4 and 0.9. The event matched an infrastructure pattern and the score was multiplied by the factor. The original_score is the unadjusted value. The post-reduction score still crossed the threshold, which is why the alert emitted despite the reduction. When the raw score is very high and the reduction factor is near the top of the range (0.8 or 0.9), the event is likely legitimately anomalous infrastructure traffic that the whitelist softened but did not suppress; when the reduction factor is near the bottom of the range (0.4 or 0.5) and the adjusted score still crosses the threshold, the event is very strongly anomalous even against the whitelist's reduction.
  • whitelist_action: "excluded". This action never appears on an emitted alert (excluded events are dropped before scoring). If it ever appears, treat the field as an environmental fault and record the anomaly.

The analyst records the whitelist reading in the ticket. "Raw score 40.18 reduced by factor 0.6 to 24.11; post-reduction score crosses Flow threshold 20.0 by 1.2× — the whitelist matched but the event is still anomalous against the adjusted view" is the shape of a whitelist-aware note. A whitelist reduction is not a disposition by itself — the adjusted score is what the alert fired on and what the confidence ladder reads — but the provenance matters when an analyst needs to distinguish "anomalous despite infrastructure smoothing" from "anomalous in raw space only".
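A minimal sketch of the provenance arithmetic, reproducing the runbook's Flow example under the assumption stated in the field descriptions above: the emitted anomaly_score is the original score multiplied by the whitelist factor, and the alert exists only because the adjusted value still crossed the per-type threshold.

```python
# Whitelist provenance check from step 7, using the runbook's Flow example values.
original_score = 40.18
whitelist_factor = 0.6
threshold = 20.0                                  # Flow threshold, MVP default

adjusted = original_score * whitelist_factor      # 24.108, reported as 24.11
still_anomalous = adjusted >= threshold           # True: alert emitted despite the reduction
print(f"adjusted {adjusted:.2f}, ratio {adjusted / threshold:.1f}x threshold, "
      f"emitted={still_anomalous}")
```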

8. Check for warmup artifacts and recent detector-configuration change

The RCF detectors warm up inside their first few hundred events, and the first six events per detector (two times the default shingle size of three) should be disregarded entirely per ML anomaly detections — Tuning Considerations. Alerts that fire within minutes of a service restart, redeployment, or fresh rollout are at meaningful risk of being warmup artifacts — high scores the forest would not produce once it has absorbed a few hundred more events. The analyst checks two conditions before closing on a warmup hypothesis.

  • Recent detector-service restart. The analyst checks the ML RCF Anomaly Event sub-tab for the overall alert volume over the hour preceding the alert. A sudden spike in ML anomaly alerts across many sources and many event types at the same minute is the signature of a detector warmup after a service restart. A single alert on a single source, with no matching spike, is not a warmup artifact regardless of absolute score.
  • Recent configuration change to model_version, version_model, or detector_config. When the model_version or detector_config on the alert differs from the values on ML anomaly alerts from the same deployment more than a few hours earlier, the detector has been reconfigured recently and its model is still stabilizing. The analyst records the configuration identifier and treats the alert with additional caution in the decision tree — a high score under a freshly-reconfigured detector is a candidate warmup artifact; a high score under a detector whose configuration has been stable for days is not.

The analyst also considers whether the anomalous event type has a naturally thin baseline in the deployment. In a network that is mostly TLS with relatively little plaintext HTTP, the HTTP detector warms up more slowly than the DNS or Flow detectors because it sees fewer events per minute; a Medium HTTP anomaly from a sparsely-exercised detector in the first hours of deployment is more plausibly a warmup artifact than the same anomaly in a dense-traffic detector that has been running for days.

When session evidence from step 3 is insufficient — the anomalous event rode an encrypted carrier with no usable SNI, the HTTP record lacks header detail, or the protocol parser did not populate the expected fields — the analyst requests a PCAP for the event's time window through the organization's PCAP request workflow. PCAP availability is configuration-dependent and selective on MetaDefender NDR (see Alert, Flow, and PCAP Pivoting); when one is available, the analyst inspects the raw transport payload for the features the sidebar cannot surface. When a PCAP is unavailable, the investigation concludes on the evidence already gathered, and the analyst explicitly records the gap in the ticket.

Decision Tree

The analyst records one of four outcomes. Each branch lists the minimum artifacts captured before the ticket is closed.

  • Escalate — corroborated anomaly with an identified threat hypothesis. The disposition when any of the following holds: the alert is IOC-Critical (C2, TIDB, or REPDB match on any event entity) on any severity tier; at least one co-occurring detection from another family (Beaconing, Data Exfiltration, Tunneling, MDCore, Suricata signature, InSights) fired on the same source within a same-source correlation window (typically the prior seven days) or on the same community_id; the baseline-contrast reading in step 4 is sharp and the anomalous values in step 3 match a known attacker tradecraft pattern (DGA-shaped DNS query, Cobalt Strike / Sliver toolkit default User-Agent, asymmetric upload to an unfamiliar external endpoint); or the confidence band in step 2 is 0.90 or higher and the host is a Tier 1 or Tier 2 asset. The analyst opens an IR ticket, requests endpoint isolation when the evidence warrants it, transfers the evidence record and pivot tabs into the incident, and hands off to the corresponding companion runbook (C2 Beacon Investigation, Data Exfiltration Investigation, Malicious File Investigation, or Tunneling Investigation) when one applies. The Escalate branch is where this runbook ends and incident-response procedures take over.
  • Monitor — isolated anomaly that warrants a recurrence watch. The disposition when the evidence is partial: a 0.50-band or 0.65-band alert on a Tier 2 or Tier 3 asset, no IOC enrichment, no co-occurring alerts from other families, a baseline-contrast reading that is present but not sharp, and no obvious warmup-artifact indicator. The analyst records the observation in the ticket, leaves the Hunt tabs open, schedules a follow-up review (typically within twenty-four hours for a Medium alert), and watches for recurrence on the same source, for a similar anomaly on a neighboring host, or for a belated corroborating signal from another family. Because the RCF family adapts continuously, an isolated anomaly that does not recur is the expected shape — the forest absorbs the pattern into its learned baseline within hours. When a recurrence fires within the review window, the analyst re-enters at step 1 with the accumulated context and re-evaluates; when no recurrence surfaces, the disposition moves to close as benign.
  • Close as benign — identified legitimate explanation. The disposition when the evidence positively identifies a legitimate driver for the unusual traffic shape: a known recent environment change (a newly-deployed internal service whose traffic pattern the forest has not yet absorbed, a new monitoring probe, a new vendor integration, a subnet migration, a VPN concentrator swap), a warmup artifact within the first few minutes of a detector-service restart, a detector-configuration change that has not yet stabilized, an infrastructure pattern the whitelist reduced but did not fully suppress, or an unusual-but-explained traffic shape (a CI job producing a burst of HTTP requests, a scheduled backup with an off-hours signature). The analyst records the specific fields that justify the conclusion in the ticket — "score 4.35 on DNS anomaly, 1.45× threshold; source host 10.40.12.88 recently moved to a new resolver per change ticket CHG-12345 on 2026-04-20, anomalous queries are to the new resolver's upstream, baseline contrast explained by the change, no IOC enrichment, no co-occurring alerts, closing as benign" is the shape of a defensible close-note — so later readers can reopen the conclusion when one of those fields changes.
  • Tune the rule — known benign baseline shift with a scoped policy fix. The disposition when the same benign anomaly pattern has fired repeatedly across the same population, the source of the noise is understood (a specific internal service producing a specific traffic shape, a specific infrastructure pattern the whitelist does not yet cover, a per-event-type threshold set too low for the deployment's traffic mix), and the Policy surface for the RCF family supports a scoped adjustment. Relevant tuning levers exposed through Policy and documented in ML anomaly detections — Tuning Considerations include the per-event-type anomaly threshold (DNS 3.0, HTTP 8.0, Flow 20.0 defaults), the whitelist pattern set (CIDR ranges, port lists, HTTP Uniform Resource Locator wildcards, complete-exclusion rules), and the detector-capacity fields (num_trees, shingle_size, tree_size). Tuning is a detection-engineering action, not a triage shortcut: the analyst opens a follow-up task against the RCF Policy definition rather than tuning inline, submits the change through the organization's peer-review workflow, and keeps the alert live until the tuned policy deploys. The default disposition for a single confirmed-benign alert is close, not tune — a single occurrence does not justify raising a threshold or broadening the whitelist, and either change reduces coverage for future variants of the matched pattern. An analyst raising a threshold should adjust one per-event-type value at a time and observe the resulting queue depth over at least a one-day warmup window.
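For illustration only, the sketch below evaluates the four branches in the priority order the text above implies, from boolean evidence flags. It paraphrases the branches and is not a substitute for the analyst's recorded judgement or the minimum artifacts each branch requires.

```python
def disposition(ioc_match: bool, co_occurring_family: bool, sharp_baseline_contrast: bool,
                tradecraft_values: bool, band: float, asset_tier: int,
                benign_explanation: bool, recurring_benign_pattern: bool) -> str:
    """Illustrative ordering of the four decision-tree branches; flags mirror the prose above."""
    if ioc_match or co_occurring_family or (sharp_baseline_contrast and tradecraft_values) \
            or (band >= 0.90 and asset_tier <= 2):
        return "Escalate"
    if recurring_benign_pattern:
        return "Tune the rule"        # detection-engineering follow-up, never inline tuning
    if benign_explanation:
        return "Close as benign"
    return "Monitor"

# Example: 0.65-band alert, no IOC, no co-occurring family, explained by a change ticket.
print(disposition(False, False, False, False, 0.65, 2, True, False))   # Close as benign
```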

Every branch is recorded in the incident-management system. The runbook reference, the disposition, the asset context, the confidence-band reading, the anomalous-value reading, the baseline-contrast reading, the same-source correlation reading, the IOC state, and the pivot tab references form the minimum record. When the disposition is Escalate, the record hands off directly into the incident.

Common False-Positive Patterns

A material share of ML anomaly alerts have benign explanations, especially at the 0.50 and 0.65 confidence bands where most alerts land in a well-tuned deployment. Recognizing the patterns saves investigation time and keeps analysts from over-escalating legitimate traffic the forest has not yet absorbed.

  • Newly-deployed services, vendors, or integrations. A service that started producing outbound traffic within the last few hours or days has a pattern the RCF detectors have not yet absorbed. The first N events from the new service score high on every feature axis the forest uses — destinations never seen before, flow shapes with no learned precedent, DNS queries to new parent domains, HTTP content types or response-length patterns that contrast sharply against the learned baseline. The same services produce no alerts a few days later once the detectors have ingested enough events. Disposition: close when the deployment change is confirmed against the organization's change-management record; consider tune only if the new service produces sustained alert volume across many events over days and the traffic pattern is well-characterized.
  • Monitoring probes and synthetic transactions. Synthetic-monitoring agents, uptime-check services, endpoint-telemetry probes, external-monitoring providers (Pingdom, UptimeRobot), and internal network-monitoring synthetic transactions regularly produce small, repeating traffic shapes that score anomalously on Flow and HTTP detectors when the probe's source, destination, or timing does not match the forest's baseline. Disposition: close when the monitor identity is confirmed via user agent, source IP, or destination; tune via whitelist CIDR addition when the monitor produces sustained cross-host volume.
  • Infrastructure patterns the shipped whitelist does not yet cover. The RCF whitelist ships with MetaDefender NDR platform traffic, Kafka, PostgreSQL, common pod-network CIDR ranges, multicast, and broadcast patterns. Deployments with non-standard infrastructure (corporate-specific management networks, non-standard service-discovery protocols, custom orchestration platforms, atypical VPN concentrator layouts) routinely produce anomalous scores on traffic the infrastructure-aware whitelist does not recognize. Disposition: close when the infrastructure identity is confirmed against the network diagram; tune by adding a CIDR range or a port-list pattern to the whitelist through Policy when the pattern produces sustained alert volume.
  • Warmup after detector service restart, redeployment, or configuration change. Alerts that fire within minutes of a detector-service restart or within the first hours of a configuration change to num_trees, shingle_size, tree_size, or per-type threshold are at risk of being warmup artifacts. The signature is a sudden spike in alert volume across many sources and many event types at the same minute (for restart) or across the same event type (for configuration change). Disposition: close when the restart or configuration-change timestamp aligns with the alert timestamp; disregard alerts in the first few minutes of a detector's lifetime regardless of absolute score.
  • Off-hours automated work (backups, reporting exports, CI pipelines, scheduled scans). Jobs scheduled for nights, weekends, or maintenance windows produce traffic shapes whose time-of-day feature contrasts sharply with the host's daytime baseline. A nightly backup producing a long asymmetric upload at 02:00, a scheduled reporting export producing a burst of HTTPS requests every Sunday, or a CI pipeline issuing repeated artifact pulls every 15 minutes all score anomalously on a forest that has mostly seen daytime traffic from the same source. Disposition: close when the scheduled-job identity is confirmed against the SOC's maintenance-window baseline; tune is not usually appropriate because the underlying signal is useful.
  • Load-balancer, proxy, and NAT-boundary artifacts. Forward proxies, reverse proxies, outbound NAT gateways, and load balancers that aggregate traffic from many hosts onto a single source IP can produce flow shapes on the aggregated address that do not match any single host's baseline — many short flows with diverse destinations, unusual byte-ratio aggregates, and atypical temporal patterns. Disposition: close once the proxy identity is confirmed; tune via whitelist when the proxy IP produces sustained volume.
  • Red-team, penetration-testing, and authorized security-research activity. Authorized internal security activity produces high-scoring ML anomaly alerts by design — red-team toolkits, pentest frameworks, and malware-research workflows produce traffic shapes the forest has never seen because the whole point is that the traffic does not fit the network's norm. Disposition: close once the activity is confirmed with the red-team, the vulnerability-management program, or the malware-research team. Every SOC maintains a communication channel with internal security teams so these alerts are identifiable on sight.

Closing on a false-positive pattern still requires the runbook's evidence record. "Looks like a warmup artifact" is not a disposition; "fifteen-minute spike in ML RCF Anomaly alerts across all three event types at 14:02 on 2026-04-22, coinciding with RCF-inference pod restart per CHG-12389, alert volume returned to baseline at 14:14; closing this alert and the sibling spike alerts as warmup artifacts" is.

See Also

  • Critical Alert Triage — the generic first-response runbook that hands off to this one on the IOC-Critical secondary trigger.
  • C2 Beacon Investigation — companion runbook when the ML anomaly co-occurs with a Beaconing Detection or a C2 Infrastructure Alert on the same source or community_id.
  • Data Exfiltration Investigation — companion runbook when the ML anomaly co-occurs with a Data Exfiltration Detection on the same source or community_id.
  • Malicious File Investigation — companion runbook when an MDCore alert preceded the ML anomaly and the anomaly is plausibly post-delivery behavior on the recipient host.
  • Tunneling Investigation — companion runbook when the ML anomaly is a DNS anomaly whose original event resembles the tunneling-family's per-query suspicion signals.
  • Alert, Flow, and PCAP Pivoting — the pivot-mechanics meta-runbook referenced by steps 3, 4, 5, and 8.
  • ML Anomaly Detections — background on the ML family, including the feature vector, the warmup window, the whitelist semantics, the per-event-type thresholds, the confidence ladder keyed off score-to-threshold ratios, and the Policy-managed tuning levers referenced in the Tune the rule branch.
  • C2 and Threat Intelligence — background on the C2 feed whose matches drive the IOC auto-escalation in step 6.
  • InSights TIDB and REPDB — parallel intelligence feeds referenced in step 6.
  • Behavioral Detections — background on the Beaconing, Data Exfiltration, Long Duration, and DNS Tunneling detections cited as same-source correlation targets in step 5.
  • Detection Overview — unified severity scale, confidence scale, and the IOC auto-escalation rule.
  • Hunt Page — tabs, sidebar, right-click pivots (Hunt all events from this IP, Show all events with this community id, Show related events), the ML RCF Anomaly Event sub-tab, and the per-protocol sidebar sections referenced throughout this runbook.
  • Manager Configuration — Policy management surface referenced in the Tune the rule branch.