Alert Correlation & Reducing False Positives With Intelligence

The False Positive Problem — Why It Is a CTI Problem

Alert fatigue in most SOC environments is not caused by too many threats. It is caused by too many alerts from rules written without enough context to distinguish real threats from normal behavior. Threat intelligence is the primary mechanism for adding that context, either when a detection rule is written or when an alert is triaged.

An organization that generates 5,000 alerts per day with a 90% false positive rate is processing 4,500 irrelevant alerts per day. Holding daily volume constant, reducing the false positive rate to 60% by adding CTI context brings that down to 3,000. That removes 1,500 irrelevant alerts per day without adding a single analyst.
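The arithmetic above holds total alert volume fixed. Here is a minimal sketch of that calculation, using hypothetical helper names and integer percentages to avoid floating-point rounding. It also covers a second case: CTI suppression removes false positives outright, so the true positives stay constant and total volume shrinks.

```python
def irrelevant_alerts(total_alerts: int, fp_rate_pct: int) -> int:
    """False-positive alerts per day for a given daily volume and FP rate (percent)."""
    return total_alerts * fp_rate_pct // 100


def volume_at_fp_rate(true_positives: int, fp_rate_pct: int) -> int:
    """Total daily alerts if true positives stay fixed and only false positives change."""
    return true_positives * 100 // (100 - fp_rate_pct)


# Same daily volume, lower FP rate (the framing used in the text)
before = irrelevant_alerts(5_000, 90)  # 4,500 false positives
after = irrelevant_alerts(5_000, 60)   # 3,000 false positives
print(f"before={before} after={after} saved={before - after}")

# Alternative: 500 true positives held constant, FPs suppressed until the rate is 60%
total_after = volume_at_fp_rate(500, 60)  # 1,250 alerts in total
print(f"total={total_after} false_positives={total_after - 500}")
```

Under the second assumption, the same 60% target leaves only 750 false positives per day. The savings figure in the text is therefore the conservative case.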

False Positives Are a Detection Engineering Problem

A rule that fires on a legitimate behavior — and always fires on that behavior — is not tunable by analysts. It has to be fixed at the rule level. CTI helps detection engineers write rules that include threat context from the start: known-bad infrastructure, specific attacker behaviors, combinations of signals that only exist together in malicious contexts. A rule with good CTI context fires rarely and accurately. A rule without it fires constantly and uselessly.

Where False Positives Come From

False Positive Source	Example	CTI-Based Fix
Overly broad rule — matches legitimate behavior	Alert on any PowerShell execution — fires for Windows Updates, software installs, admin scripts	Add parent process and command line filters. Correlate with known-good process signatures. Alert only on unexpected parent+command combinations
Missing allowlist for known-good infrastructure	Alert on connections to 'unknown external IPs' — fires for every new SaaS tool	Maintain enriched allowlist of legitimate business SaaS, CDN ranges, and update infrastructure. Auto-suppress matched destinations
Time-insensitive rules — no context for what is normal	Alert on logins outside business hours — fires for overnight backups, scheduled tasks, international users	Profile baseline per user or asset. Alert on deviation from their specific pattern, not a generic threshold
Single-signal rules — no corroboration required	Alert on any failed login — fires constantly when users forget or mistype passwords	Require two signals: failed login + success from different geo, or failed login + new device, or failed login count > 20
Missing asset context	Alert on suspicious process on any machine — analyst cannot tell if it is a dev workstation or a DC	Enrich every alert with asset classification at fire time. Severity auto-scales with asset criticality
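The single-signal fix in the table can be sketched as a rule that fires only when a failed login is corroborated. This is a hedged illustration under several assumptions. The event shape (`AuthEvent`), the per-user baselines (`known_countries`, `known_devices`), and the pre-grouping of one user's events into a single correlation window are all hypothetical. "Different geo" is read as "a country not in the user's baseline".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AuthEvent:
    user: str
    minute: int      # simplified timestamp: minutes since the window started
    success: bool
    country: str
    device_id: str


FAILED_LOGIN_THRESHOLD = 20  # "failed login count > 20" from the table


def should_alert(window_events: list[AuthEvent],
                 known_countries: set[str],
                 known_devices: set[str]) -> bool:
    """Fire only if a failed login is backed by a second, independent signal."""
    failures = [e for e in window_events if not e.success]
    if not failures:
        return False  # nothing to corroborate
    # Signal pair 3: failed login + high failure count
    if len(failures) > FAILED_LOGIN_THRESHOLD:
        return True
    # Signal pair 1: failed login, then a success from an unfamiliar country
    first_failure = min(e.minute for e in failures)
    for e in window_events:
        if e.success and e.minute >= first_failure and e.country not in known_countries:
            return True
    # Signal pair 2: failed login from a device never seen for this user
    return any(e.device_id not in known_devices for e in failures)
```

A user who mistypes a password once and then logs in from their usual laptop and country produces no alert. A failure followed by a success from a new country does produce one.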

The Allowlist Architecture — Making Suppressions Durable

The worst allowlist is a list of IP addresses. IPs change, CDN ranges rotate, SaaS providers add new ranges — and your allowlist becomes stale within weeks. Durable allowlists suppress based on attributes, not addresses.

Building Durable Allowlists

splunk-spl

# Instead of suppressing specific IPs, build attribute-based suppressions

# CDN and cloud provider allowlisting — by ASN, not IP:
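# Assumes ip_asn is a lookup (e.g. CIDR-matched) that maps an IP to asn_name
| lookup ip_asn ip AS dst_ip OUTPUT asn_name
# lower() makes matching case-insensitive; many ASN names are uppercase (e.g. AMAZON-02)
| eval asn_lower = lower(asn_name)
| eval is_known_cloud = case(
    like(asn_lower, "%cloudflare%"), "true",
    like(asn_lower, "%akamai%"), "true",
    like(asn_lower, "%amazon%"), "true",
    like(asn_lower, "%microsoft%"), "true",
    like(asn_lower, "%google%"), "true",
    true(), "false"
  )
# Caveat: this trusts every tenant of these providers, including attacker-hosted infrastructure
| where is_known_cloud = "false"
| table src_ip, dst_ip, asn_name, alert_type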

# This suppression does not need updating when CDN adds a new /16
# because it matches on ASN name, not IP range

# User baseline allowlisting — alert on deviation from individual profile:
# First, establish baseline per user (30-day lookback), one row per username + country:
index=auth earliest=-30d latest=-1h
| stats count as baseline_logins by username, country
| outputlookup user_baseline.csv

# Then use baseline in alert evaluation (assumes events carry a country field;
# if not, derive it first with: | iplocation src_ip | rename Country AS country):
index=auth earliest=-1h
| lookup user_baseline.csv username country OUTPUT baseline_logins
| where isnull(baseline_logins)
| table username, src_ip, country
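The same two attribute-based suppressions can be sketched outside Splunk to make the logic explicit. Matching is on the ASN name rather than IP ranges, and alerts fire on deviation from a per-user set of countries. The keyword list mirrors the SPL above. The input shapes (plain `(username, country)` pairs and a possibly missing ASN name) are assumptions for illustration. As in the SPL, ASN matching trusts every customer of those providers, attackers included.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Keywords mirroring the SPL case(); matched case-insensitively against the ASN name
KNOWN_CLOUD_ASN_KEYWORDS = ("cloudflare", "akamai", "amazon", "microsoft", "google")


def is_known_cloud(asn_name: Optional[str]) -> bool:
    """Attribute-based suppression: survives new IP ranges because it keys on ASN name."""
    if not asn_name:
        return False  # unknown ASN: do not suppress
    name = asn_name.lower()
    return any(keyword in name for keyword in KNOWN_CLOUD_ASN_KEYWORDS)


def build_baseline(history: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """history: (username, country) pairs from the 30-day lookback."""
    baseline: Dict[str, Set[str]] = defaultdict(set)
    for user, country in history:
        baseline[user].add(country)
    return baseline


def off_baseline(events: Iterable[Tuple[str, str]],
                 baseline: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (username, country) events whose country was never seen for that user."""
    return [(user, country) for user, country in events
            if country not in baseline.get(user, set())]
```

Users with no baseline at all (for example, new joiners) are flagged on every login. This matches the `isnull(baseline_logins)` behavior in the SPL, and such users may need a grace period.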

The CTI Enrichment Layer in Alert Routing

splunk-spl

# Multi-layer alert enrichment pipeline — adds CTI before analyst sees alert

# Layer 1: Asset enrichment
| lookup asset_inventory ip AS src_ip OUTPUT hostname, owner, criticality, environment
| eval alert_severity = case(
    criticality="critical", "P1",
    criticality="high", "P2",
    criticality="medium", "P3",
    true(), "P4"
  )

# Layer 2: User enrichment
| lookup user_directory username OUTPUT department, manager, role, privileged_user
| eval user_risk = if(privileged_user="true", "elevated", "standard")

# Layer 3: Threat intel enrichment on destination
| lookup threat_intel_ioc ioc AS dst_ip OUTPUT threat_type, confidence, actor
| eval threat_context = if(isnotnull(threat_type), threat_type, "no_intel_match")

# Layer 4: Combine signals to determine routing
| eval final_priority = case(
    threat_context != "no_intel_match" AND in(alert_severity, "P1", "P2"), "IMMEDIATE",
    threat_context != "no_intel_match" AND user_risk = "elevated", "IMMEDIATE",
    alert_severity = "P1" AND user_risk = "elevated", "HIGH",
    in(alert_severity, "P1", "P2"), "HIGH",
    threat_context != "no_intel_match", "MEDIUM",
    true(), "LOW"
  )

# Low priority + no intel = Tier 1 auto-close review
# Medium = Tier 1 investigation required
# High = Tier 2 escalation
# Immediate = Tier 2 + notify IR lead
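The routing logic of Layers 1 to 4 is easier to review and unit-test when written out as a plain function. The sketch below mirrors the `case()` branches above, in the same order. The function names and the `TIER_ROUTING` table are illustrative, not part of any Splunk API. The `confidence` field from the threat intel lookup is ignored here, as it is in the original `case()`.

```python
from typing import Optional

TIER_ROUTING = {
    "IMMEDIATE": "Tier 2 + notify IR lead",
    "HIGH": "Tier 2 escalation",
    "MEDIUM": "Tier 1 investigation required",
    "LOW": "Tier 1 auto-close review",
}


def alert_severity(criticality: Optional[str]) -> str:
    """Layer 1: asset criticality -> P1..P4 (unknown assets default to P4)."""
    return {"critical": "P1", "high": "P2", "medium": "P3"}.get(criticality or "", "P4")


def final_priority(criticality: Optional[str], privileged_user: str,
                   threat_type: Optional[str]) -> str:
    """Layers 2-4: combine asset, user, and threat intel signals."""
    severity = alert_severity(criticality)
    elevated = privileged_user == "true"
    intel = threat_type is not None
    if intel and severity in ("P1", "P2"):
        return "IMMEDIATE"
    if intel and elevated:
        return "IMMEDIATE"
    if severity in ("P1", "P2"):
        return "HIGH"  # also covers the "P1 + elevated user" branch
    if intel:
        return "MEDIUM"
    return "LOW"
```

Writing the branches out exposes a property of the design: without an intel match, a privileged user on a medium-criticality asset still routes to LOW. Whether that is intended is a policy decision.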

Cross-Source Alert Correlation — Building the Story

Individual alerts from individual sources are data points. Correlation across multiple sources within a time window builds the story that confirms or rules out malicious activity. CTI provides the connective tissue that makes correlation meaningful.

The Correlation Time Window Approach

splunk-spl

# Correlation: same asset, multiple alert types, short time window
# This is how you find the incidents that individual alerts would miss

# Group on a single entity key so events where the asset is the destination
# are not split into other src_ip groups
index=* earliest=-4h (src_ip="10.1.1.85" OR dst_ip="10.1.1.85" OR host="HOSTNAME-85")
| eval entity = "10.1.1.85"
| stats
    count as total_events,
    values(alert_type) as alert_types,
    values(dst_ip) as destinations,
    values(process_name) as processes,
    dc(alert_type) as unique_alert_types
    by entity
| where unique_alert_types > 2
| table entity, total_events, unique_alert_types, alert_types, destinations, processes

# Three or more different alert types from the same asset in 4 hours:
# a single alert type is often a false positive, but several different alert
# types on the same asset in a short window rarely are, and usually indicate
# related activity worth investigating as one incident

# Correlation: sequential login anomaly then lateral movement
# Filter to the relevant eventtypes first, then sort oldest-first so that
# list() records the events in chronological order (searches return newest first)
index=* earliest=-2h eventtype IN ("failed_login", "successful_login", "lateral_movement_attempt")
| sort 0 _time
| stats
    min(_time) as first_event,
    max(_time) as last_event,
    list(eventtype) as event_sequence,
    values(dst_ip) as targets
    by src_ip, username
| eval sequence_suspicious = if(
    like(mvjoin(event_sequence, ","), "%failed_login%successful_login%lateral_movement_attempt%"),
    "HIGH",
    "LOW"
  )
| where sequence_suspicious = "HIGH"
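The time-window correlation in the first SPL search above can also be expressed as a sliding window per entity. This helps when the logic has to run in a SOAR playbook or stream processor instead of Splunk. The sketch assumes alerts arrive as `(timestamp, entity, alert_type)` tuples. It reuses the 4-hour window and the "more than two distinct alert types" threshold from the SPL. The alert type names in the usage are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(hours=4)  # mirrors earliest=-4h
MIN_DISTINCT_TYPES = 3       # mirrors unique_alert_types > 2


def correlate(alerts):
    """Return entities with MIN_DISTINCT_TYPES+ distinct alert types inside any WINDOW.

    alerts: iterable of (timestamp, entity, alert_type), where entity is the key
    that ties events to one asset (IP, hostname, ...).
    """
    recent = defaultdict(deque)  # entity -> (timestamp, alert_type) still in window
    flagged = set()
    for ts, entity, alert_type in sorted(alerts):
        window = recent[entity]
        window.append((ts, alert_type))
        # Drop alerts older than WINDOW relative to the current alert
        while ts - window[0][0] > WINDOW:
            window.popleft()
        if len({t for _, t in window}) >= MIN_DISTINCT_TYPES:
            flagged.add(entity)
    return flagged


t0 = datetime(2024, 5, 1, 8, 0)
alerts = [
    (t0, "10.1.1.85", "phishing_attachment"),
    (t0 + timedelta(hours=1), "10.1.1.85", "encoded_powershell"),
    (t0 + timedelta(hours=3), "10.1.1.85", "new_scheduled_task"),
    # Same three types, but spread over 6 hours: never 3 inside one window
    (t0, "10.1.1.90", "phishing_attachment"),
    (t0 + timedelta(hours=3), "10.1.1.90", "encoded_powershell"),
    (t0 + timedelta(hours=6), "10.1.1.90", "new_scheduled_task"),
]
print(correlate(alerts))  # {'10.1.1.85'}
```

A sliding window avoids a weakness of fixed time buckets: activity that straddles a bucket boundary would otherwise be split and missed.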

Using ATT&CK to Structure Correlation

MITRE ATT&CK provides a framework for thinking about correlation — not just individual techniques but the sequences that make up a full attack chain. An attacker who progresses from Initial Access to Execution to Persistence leaves evidence in multiple log sources at multiple stages. Correlating across those stages catches what single-source monitoring misses.

ATT&CK Stage	Log Source	Key Indicator to Correlate
Initial Access — Phishing	Email gateway, endpoint	Email with attachment + process creation from email client within 10 minutes
Execution	Endpoint/Sysmon	Unexpected process tree: Office → PowerShell → cmd.exe or wscript.exe
Persistence	Windows Registry events, Scheduled Tasks	New registry run key or scheduled task created by non-admin user or unexpected process
Defense Evasion	EDR behavioral events	Process injection, clearing event logs (EventID 1102), disabling security tools
Credential Access	Windows Security events	EventID 4769 with ticket encryption type 0x17 (RC4), indicating Kerberoasting, or EventID 4624 Logon Type 3 from an unexpected source
Lateral Movement	Authentication + network logs	New RDP or SMB authentication to multiple hosts from same source in short window
Command and Control	Network/firewall logs	Regular interval connections to same destination, encrypted, small payload
Exfiltration	Firewall traffic logs	Sustained large outbound transfer to unknown destination, unusual hours
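One way to use the table for correlation is to map each alert to its ATT&CK tactic. You can then measure how far a host has progressed through the stages in time order. The sketch below computes the longest time-ordered subsequence of strictly advancing stages, a longest-increasing-subsequence calculation. It does not merely count distinct stages, so out-of-order noise scores lower than a real progression. The stage order comes from the table. The threshold of 3 is a hypothetical tuning value, and the caller is assumed to pass one host's events from one correlation window.

```python
from datetime import datetime, timedelta

# Tactic order from the table above (a subset of ATT&CK's tactic order)
ATTACK_STAGES = [
    "Initial Access", "Execution", "Persistence", "Defense Evasion",
    "Credential Access", "Lateral Movement", "Command and Control", "Exfiltration",
]
STAGE_INDEX = {name: i for i, name in enumerate(ATTACK_STAGES)}
CHAIN_THRESHOLD = 3  # hypothetical: flag hosts showing 3+ stages in order


def longest_progression(events):
    """Longest time-ordered subsequence of strictly advancing ATT&CK stages.

    events: (timestamp, tactic) pairs for one host. Tactics not in the table are ignored.
    """
    stages = [STAGE_INDEX[tactic] for _, tactic in sorted(events) if tactic in STAGE_INDEX]
    best = [1] * len(stages)  # best[i] = longest chain ending at event i
    for i in range(len(stages)):
        for j in range(i):
            if stages[j] < stages[i]:
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)


t0 = datetime(2024, 5, 1, 9, 0)
host_events = [
    (t0, "Initial Access"),
    (t0 + timedelta(minutes=8), "Execution"),
    (t0 + timedelta(hours=1), "Persistence"),
    (t0 + timedelta(hours=2), "Execution"),
]
print(longest_progression(host_events) >= CHAIN_THRESHOLD)  # True
```

The quadratic loop is fine for the handful of alerts one host produces in a window. Real intrusions do not always follow the tactic order strictly, so treat the score as a prioritization signal rather than a verdict.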

Measuring Detection Quality — The Metrics That Matter

Detection engineering without measurement is tuning by feel. These metrics make alert quality objective and track improvement over time.

False Positive Rate per rule: alerts closed without action divided by total alerts from that rule. Any rule with an FP rate above 80% should be reviewed or retired
True Positive Escalation Rate: percentage of Tier 1 alerts that result in a Tier 2 investigation. Healthy range: 10-25%. Below 5% = rules are too noisy. Above 40% = rules may be too narrow (missing context)
Mean Time from Alert to Escalation Decision: how long Tier 1 spends on each alert before deciding to escalate or close. If this exceeds 15 minutes for common alert types, the alert lacks enrichment
Detection Coverage Map: what percentage of ATT&CK techniques relevant to your threat model have active detection rules? Map this quarterly — coverage gaps drive threat hunting priorities
Alert Volume Trend: total alerts per week, per source. A rule whose volume increases month-over-month without a corresponding increase in true positives = environmental drift requiring re-tuning
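The first two metrics can be computed directly from closed-alert records. Here is a minimal sketch, assuming each closed alert records its rule name, whether any action was taken, and whether Tier 1 escalated it. The record shape and rule names are hypothetical. The 80% review threshold comes from the list above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClosedAlert:
    rule: str
    action_taken: bool  # False = closed without action (counted as a false positive)
    escalated: bool     # True = Tier 1 sent it to Tier 2


def rule_fp_rates(alerts: List[ClosedAlert]) -> Dict[str, float]:
    """False Positive Rate per rule: closed-without-action / total alerts from that rule."""
    totals: Dict[str, int] = {}
    fps: Dict[str, int] = {}
    for a in alerts:
        totals[a.rule] = totals.get(a.rule, 0) + 1
        if not a.action_taken:
            fps[a.rule] = fps.get(a.rule, 0) + 1
    return {rule: fps.get(rule, 0) / n for rule, n in totals.items()}


def rules_to_review(alerts: List[ClosedAlert], threshold: float = 0.80) -> List[str]:
    """Rules whose FP rate exceeds the review threshold."""
    return sorted(rule for rule, rate in rule_fp_rates(alerts).items() if rate > threshold)


def escalation_rate(alerts: List[ClosedAlert]) -> float:
    """Share of Tier 1 alerts that became a Tier 2 investigation."""
    return sum(a.escalated for a in alerts) / len(alerts) if alerts else 0.0


alerts = (
    [ClosedAlert("any_powershell", False, False)] * 9
    + [ClosedAlert("any_powershell", True, True)]
    + [ClosedAlert("kerberoast_rc4", True, True)] * 3
    + [ClosedAlert("kerberoast_rc4", False, False)]
)
print(rules_to_review(alerts))  # ['any_powershell']
```

Running this weekly per rule, and keeping the results, also yields the Alert Volume Trend input without a separate pipeline.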


The goal of CTI-enhanced detection is not zero false positives. It is a false positive rate low enough that analysts can properly investigate every real alert. In a high-volume environment, that target is typically below 30% false positives for Tier 1 queues. Above 50%, alert fatigue erodes the quality of every investigation.
