Alert Correlation & Reducing False Positives With Intelligence
The False Positive Problem — Why It Is a CTI Problem
Alert fatigue in most SOC environments is not caused by too many threats. It is caused by too many alerts from rules written without enough context to distinguish real threats from normal behavior. Threat intelligence is the primary mechanism for adding that context, either at detection rule creation time or at alert triage time.
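An organization that generates 5,000 alerts per day and has a 90% false positive rate is processing 4,500 irrelevant alerts per day. Reducing that to a 60% false positive rate by adding CTI context cuts irrelevant work by at least 1,500 alerts per day (60% of the same 5,000 is 3,000) without adding a single analyst. If the suppressed alerts drop out of the queue entirely and the 500 real alerts stay constant, the saving is larger still: only 750 false positives remain at a 60% rate.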
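The arithmetic can be checked with a short sketch. The function names are illustrative, and the two scenarios (volume held at 5,000 versus real alerts held at 500 while the queue shrinks) are assumptions about how the 60% figure is reached, not something the text specifies.

```python
def false_positives(total_alerts: int, fp_rate: float) -> int:
    """False positives in a queue of total_alerts at the given FP rate."""
    return round(total_alerts * fp_rate)


def false_positives_for_fixed_true(true_alerts: int, fp_rate: float) -> int:
    """False positives left when real alerts are fixed and the queue shrinks.

    Solves fp / (fp + true) = fp_rate for fp.
    """
    return round(true_alerts * fp_rate / (1 - fp_rate))


before = false_positives(5000, 0.90)
same_volume = false_positives(5000, 0.60)
true_alerts = 5000 - before
shrunk_queue = false_positives_for_fixed_true(true_alerts, 0.60)

# Daily saving if alert volume stays at 5,000
saving_same_volume = before - same_volume
# Daily saving if suppressed alerts leave the queue and real alerts stay constant
saving_shrunk_queue = before - shrunk_queue
print(saving_same_volume, saving_shrunk_queue)
```

Either way the reduction comes from context, not headcount; the second scenario simply shows that the 1,500 figure is a conservative floor.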
False Positives Are a Detection Engineering Problem
Where False Positives Come From
| False Positive Source | Example | CTI-Based Fix |
|---|---|---|
| Overly broad rule — matches legitimate behavior | Alert on any PowerShell execution — fires for Windows Updates, software installs, admin scripts | Add parent process and command line filters. Correlate with known-good process signatures. Alert only on unexpected parent+command combinations |
| Missing allowlist for known-good infrastructure | Alert on connections to 'unknown external IPs' — fires for every new SaaS tool | Maintain enriched allowlist of legitimate business SaaS, CDN ranges, and update infrastructure. Auto-suppress matched destinations |
| Time-insensitive rules — no context for what is normal | Alert on logins outside business hours — fires for overnight backups, scheduled tasks, international users | Profile baseline per user or asset. Alert on deviation from their specific pattern, not a generic threshold |
| Single-signal rules — no corroboration required | Alert on any failed login — fires for forgotten passwords constantly | Require two signals: failed login + success from different geo, or failed login + new device, or failed login count > 20 |
| Missing asset context | Alert on suspicious process on any machine — analyst cannot tell if it is a dev workstation or a DC | Enrich every alert with asset classification at fire time. Severity auto-scales with asset criticality |
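
The single-signal fix in the table (failed login plus a corroborating signal) can be sketched as a small decision function. This is a minimal illustration: the `LoginContext` fields and the function name are hypothetical, and only the threshold of 20 comes from the table.

```python
from dataclasses import dataclass


@dataclass
class LoginContext:
    failed_count: int            # failed logins in the evaluation window
    success_from_new_geo: bool   # a later success came from a different geo
    new_device: bool             # the login used a device not seen before


FAILED_LOGIN_THRESHOLD = 20  # "failed login count > 20" from the table


def should_alert(ctx: LoginContext) -> bool:
    """Alert only when a failed login is corroborated by a second signal."""
    if ctx.failed_count == 0:
        return False  # no failed login, so this rule has nothing to say
    return (
        ctx.success_from_new_geo
        or ctx.new_device
        or ctx.failed_count > FAILED_LOGIN_THRESHOLD
    )


# A single forgotten password stays quiet; a failure followed by a
# success from a new geo fires.
print(should_alert(LoginContext(1, False, False)))
print(should_alert(LoginContext(1, True, False)))
```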
The Allowlist Architecture — Making Suppressions Durable
The worst allowlist is a list of IP addresses. IPs change, CDN ranges rotate, SaaS providers add new ranges — and your allowlist becomes stale within weeks. Durable allowlists suppress based on attributes, not addresses.
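To make the difference concrete, here is a minimal sketch of an attribute-based suppression. The `ASN_TABLE` contents, the `EXAMPLE-*` names, and both function names are assumptions for illustration; in practice the table would come from a maintained IP-to-ASN dataset. The addresses are documentation ranges, not real infrastructure.

```python
import ipaddress
from typing import Optional

# Hypothetical IP-to-ASN data; refreshed from an external source in practice.
ASN_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): "EXAMPLE-CDN",
    ipaddress.ip_network("198.51.100.0/24"): "EXAMPLE-HOSTING",
}

# The allowlist names attributes (ASN names), never individual addresses.
ALLOWED_ASN_KEYWORDS = ("EXAMPLE-CDN",)


def asn_name_for(ip: str) -> Optional[str]:
    """Return the ASN name for an IP, or None if the table has no match."""
    addr = ipaddress.ip_address(ip)
    for network, name in ASN_TABLE.items():
        if addr in network:
            return name
    return None


def is_suppressed(dst_ip: str) -> bool:
    """Suppress only when the destination's ASN name matches the allowlist."""
    name = asn_name_for(dst_ip)
    if name is None:
        return False
    return any(k.lower() in name.lower() for k in ALLOWED_ASN_KEYWORDS)


print(is_suppressed("203.0.113.10"))   # inside the allowlisted ASN
print(is_suppressed("198.51.100.5"))   # known ASN, not allowlisted
```

When the provider announces a new range, only the ASN data changes; the allowlist itself is untouched, which is what keeps it from going stale.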
Building Durable Allowlists
# Instead of suppressing specific IPs, build attribute-based suppressions
# CDN and cloud provider allowlisting — by ASN, not IP:
| lookup ip_asn ip AS dst_ip OUTPUT asn_name
| eval is_known_cloud = case(
asn_name LIKE "%Cloudflare%", "true",
asn_name LIKE "%Akamai%", "true",
asn_name LIKE "%Amazon%", "true",
asn_name LIKE "%Microsoft%", "true",
asn_name LIKE "%Google%", "true",
true(), "false"
)
| where is_known_cloud = "false"
| table src_ip, dst_ip, asn_name, alert_type
# This suppression does not need updating when a CDN adds a new /16,
# because it matches on ASN name, not IP range
# (the ip_asn lookup itself still has to be kept current)
# User baseline allowlisting: alert on deviation from the individual profile
# First, establish a baseline per user (30-day lookback), with one row per
# username/country pair so the lookup can match on both fields:
index=auth earliest=-30d
| stats count as baseline_logins
by username, country
| outputlookup user_baseline.csv
# Then use the baseline in alert evaluation. Matching on the username/country
# pair avoids testing membership in a multivalue field; no matching row
# means the country is new for that user:
index=auth
| lookup user_baseline.csv username, country OUTPUT baseline_logins
| eval is_baseline_country = if(isnotnull(baseline_logins), "true", "false")
| where is_baseline_country = "false"
| table username, src_ip, country
The CTI Enrichment Layer in Alert Routing
# Multi-layer alert enrichment pipeline — adds CTI before analyst sees alert
# Layer 1: Asset enrichment
| lookup asset_inventory ip AS src_ip OUTPUT hostname, owner, criticality, environment
| eval alert_severity = case(
criticality="critical", "P1",
criticality="high", "P2",
criticality="medium", "P3",
true(), "P4"
)
# Layer 2: User enrichment
| lookup user_directory username OUTPUT department, manager, role, privileged_user
| eval user_risk = if(privileged_user="true", "elevated", "standard")
# Layer 3: Threat intel enrichment on destination
| lookup threat_intel_ioc ioc AS dst_ip OUTPUT threat_type, confidence, actor
| eval threat_context = if(isnotnull(threat_type), threat_type, "no_intel_match")
# Layer 4: Combine signals to determine routing
| eval final_priority = case(
threat_context != "no_intel_match" AND alert_severity IN ("P1", "P2"), "IMMEDIATE",
threat_context != "no_intel_match" AND user_risk = "elevated", "IMMEDIATE",
alert_severity = "P1" AND user_risk = "elevated", "HIGH",
alert_severity IN ("P1", "P2"), "HIGH",
threat_context != "no_intel_match", "MEDIUM",
true(), "LOW"
)
# Low priority + no intel = Tier 1 auto-close review
# Medium = Tier 1 investigation required
# High = Tier 2 escalation
# Immediate = Tier 2 + notify IR lead
Cross-Source Alert Correlation — Building the Story
Individual alerts from individual sources are data points. Correlation across multiple sources within a time window builds the story that confirms or rules out malicious activity. CTI provides the connective tissue that makes correlation meaningful.
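As a rough sketch of the idea, the function below groups alerts by asset and flags any asset that shows several distinct alert types inside one sliding time window. The function name, the tuple layout, and the default of three types in four hours are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlated_assets(alerts, window=timedelta(hours=4), min_types=3):
    """Return {asset: alert types} for assets with at least min_types
    distinct alert types inside any single window.

    alerts: iterable of (timestamp, asset, alert_type) tuples.
    """
    by_asset = defaultdict(list)
    for ts, asset, alert_type in alerts:
        by_asset[asset].append((ts, alert_type))

    flagged = {}
    for asset, events in by_asset.items():
        events.sort()  # oldest first
        start = 0
        for end in range(len(events)):
            # Shrink the window from the left until it spans <= window
            while events[end][0] - events[start][0] > window:
                start += 1
            types = {t for _, t in events[start:end + 1]}
            if len(types) >= min_types:
                flagged[asset] = sorted(types)
                break
    return flagged


t0 = datetime(2024, 1, 1)
sample = [
    (t0, "host-a", "beacon"),
    (t0 + timedelta(hours=1), "host-a", "new_service"),
    (t0 + timedelta(hours=3), "host-a", "rare_process"),
    (t0, "host-b", "beacon"),
    (t0 + timedelta(hours=5), "host-b", "new_service"),
    (t0 + timedelta(hours=10), "host-b", "rare_process"),
]
print(correlated_assets(sample))
```

In the sample, `host-a` is flagged because its three alert types fall within four hours, while `host-b` has the same types spread over ten hours and is not.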
The Correlation Time Window Approach
# Correlation: same asset, multiple alert types, short time window
# This is how you find the incidents that individual alerts would miss
index=* (src_ip="10.1.1.85" OR dst_ip="10.1.1.85" OR host="HOSTNAME-85") earliest=-4h
| stats
count as total_events,
values(alert_type) as alert_types,
values(dst_ip) as destinations,
values(process_name) as processes,
dc(alert_type) as unique_alert_types
by src_ip
| where unique_alert_types > 2
| sort -total_events
| table src_ip, total_events, unique_alert_types, alert_types, destinations
# Three or more different alert types from the same source in 4 hours:
# a single alert type firing alone is often a false positive, but overlapping
# alert types from the same source rarely are and usually indicate coordinated activity
# Correlation: sequential login anomaly then lateral movement
# Filter to the relevant event types first, then sort oldest-first so that
# list() records the sequence in the order it happened
| search eventtype IN ("failed_login", "successful_login", "lateral_movement_attempt")
| sort 0 _time
| stats
min(_time) as first_event,
max(_time) as last_event,
list(eventtype) as event_sequence,
values(dst_ip) as targets
by src_ip, username
# Keep only sequences that complete within 2 hours (7200 seconds)
| where last_event - first_event <= 7200
| eval sequence_suspicious = if(
like(mvjoin(event_sequence, ","), "%failed_login%successful_login%lateral_movement_attempt%"),
"HIGH",
"LOW"
)
| where sequence_suspicious = "HIGH"
Using ATT&CK to Structure Correlation
MITRE ATT&CK provides a framework for thinking about correlation — not just individual techniques but the sequences that make up a full attack chain. An attacker who progresses from Initial Access to Execution to Persistence leaves evidence in multiple log sources at multiple stages. Correlating across those stages catches what single-source monitoring misses.
| ATT&CK Stage | Log Source | Key Indicator to Correlate |
|---|---|---|
| Initial Access — Phishing | Email gateway, endpoint | Email with attachment + process creation from email client within 10 minutes |
| Execution | Endpoint/Sysmon | Unexpected process tree: Office → PowerShell → cmd.exe or wscript.exe |
| Persistence | Windows Registry events, Scheduled Tasks | New registry run key or scheduled task created by non-admin user or unexpected process |
| Defense Evasion | EDR behavioral events | Process injection, clearing event logs (EventID 1102), disabling security tools |
| Credential Access | Windows Security events | EventID 4769 RC4 tickets (Kerberoasting) or EventID 4624 Type 3 from unexpected source |
| Lateral Movement | Authentication + network logs | New RDP or SMB authentication to multiple hosts from same source in short window |
| Command and Control | Network/firewall logs | Regular interval connections to same destination, encrypted, small payload |
| Exfiltration | Firewall traffic logs | Sustained large outbound transfer to unknown destination, unusual hours |
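
One way to turn the table into a correlation signal is to measure how far a host's alerts progress through the stages in order. The sketch below is an assumption-laden illustration: the tactic identifiers are hypothetical labels, the ordering simply follows the table, and real intrusions do not always move through stages in this sequence.

```python
# Stage order taken from the table above; the identifiers are illustrative.
TACTIC_ORDER = [
    "initial_access",
    "execution",
    "persistence",
    "defense_evasion",
    "credential_access",
    "lateral_movement",
    "command_and_control",
    "exfiltration",
]


def longest_forward_chain(tactics):
    """Length of the longest run of stages that moves forward through
    TACTIC_ORDER, given one host's alert tactics in time order.

    Unknown tactic names are ignored.
    """
    rank = {name: i for i, name in enumerate(TACTIC_ORDER)}
    ranks = [rank[t] for t in tactics if t in rank]
    best = [1] * len(ranks)
    for i in range(len(ranks)):
        for j in range(i):
            if ranks[j] < ranks[i]:
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)


observed = ["initial_access", "execution", "persistence"]
print(longest_forward_chain(observed))
```

A host whose chain length keeps growing across log sources is a stronger escalation candidate than one with many alerts that all sit in a single stage.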
Measuring Detection Quality — The Metrics That Matter
Detection engineering without measurement is tuning by feel. These metrics make alert quality objective and track improvement over time.
- False Positive Rate per rule: alerts closed without action divided by total alerts from that rule. Any rule above 80% FP rate should be reviewed or retired
- True Positive Escalation Rate: percentage of Tier 1 alerts that result in a Tier 2 investigation. Healthy range: 10-25%. Below 5% = rules are too noisy. Above 40% = rules may be too narrow (missing context)
- Mean Time from Alert to Escalation Decision: how long Tier 1 spends on each alert before deciding to escalate or close. If this exceeds 15 minutes for common alert types, the alert lacks enrichment
- Detection Coverage Map: what percentage of ATT&CK techniques relevant to your threat model have active detection rules? Map this quarterly — coverage gaps drive threat hunting priorities
- Alert Volume Trend: total alerts per week, per source. A rule whose volume increases month-over-month without a corresponding increase in true positives signals environmental drift and needs re-tuning
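
The first two metrics can be computed directly from closed-alert records. This is a minimal sketch: the record fields (`rule`, `disposition`) and the disposition values are assumed names for whatever the case management system exports, and only the 80% review threshold comes from the list above.

```python
from collections import Counter


def fp_rate_per_rule(alerts):
    """Alerts closed without action divided by total alerts, per rule."""
    totals, closed = Counter(), Counter()
    for alert in alerts:
        totals[alert["rule"]] += 1
        if alert["disposition"] == "closed_no_action":
            closed[alert["rule"]] += 1
    return {rule: closed[rule] / totals[rule] for rule in totals}


def rules_to_review(alerts, threshold=0.80):
    """Rules whose false positive rate exceeds the review threshold."""
    rates = fp_rate_per_rule(alerts)
    return sorted(rule for rule, rate in rates.items() if rate > threshold)


def escalation_rate(alerts):
    """Share of alerts that resulted in a Tier 2 escalation."""
    alerts = list(alerts)
    if not alerts:
        return 0.0
    escalated = sum(1 for a in alerts if a["disposition"] == "escalated")
    return escalated / len(alerts)


records = (
    [{"rule": "any_powershell", "disposition": "closed_no_action"}] * 9
    + [{"rule": "any_powershell", "disposition": "escalated"}]
    + [{"rule": "office_spawns_shell", "disposition": "closed_no_action"}]
    + [{"rule": "office_spawns_shell", "disposition": "escalated"}]
)
print(fp_rate_per_rule(records))
print(rules_to_review(records))
print(escalation_rate(records))
```

Running this weekly against the same export gives the per-rule trend the last bullet asks for without any extra tooling.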
The goal of CTI-enhanced detection is not zero false positives. It is a false positive rate low enough that analysts can properly investigate every real alert. In a high-volume environment, that target is typically below 30% false positives for Tier 1 queues. Above 50%, alert fatigue erodes the quality of every investigation.