Incident Response in Production — From Alert to Resolution
Incident Response vs Alert Investigation
Alert investigation is what Tier 1 does every shift — triage, determine real or false positive, close or escalate. Incident response is what happens when a confirmed threat requires coordinated action across multiple teams to contain, eradicate, and recover.
The transition from investigation to incident response happens at a specific decision point: when the threat is confirmed and the scope extends beyond a single user or system. That decision point is the most important moment in the IR lifecycle. Acting too slowly lets attackers expand their foothold; acting too aggressively without understanding scope can tip off the attacker before containment is ready.
The IR Lifecycle — What Each Phase Actually Looks Like
Phase 1: Preparation (Before the Incident)
The quality of incident response is determined before the incident happens. Teams that have documented playbooks, tested communication channels, and practiced containment procedures respond in minutes. Teams without them lose hours to confusion.
- Documented playbooks for the most likely incident types: ransomware, data breach, account compromise, insider threat
- Pre-approved containment authority — who can authorize isolating a production server at 3am?
- Communication tree — who gets called when, in what order, via what channel
- Asset inventory that IR can query in real time — which assets are critical, who owns them
- SIEM and EDR access for all IR team members, tested in advance rather than for the first time on the night of the incident
Phase 2: Detection and Analysis
An incident starts when a Tier 2 analyst confirms a threat is real and scoped. The immediate questions:
| Question | Where to Find the Answer |
|---|---|
| What is the initial access vector? | Phishing logs, VPN auth logs, vulnerability scan hits, firewall traffic logs for unusual inbound connections |
| What accounts are involved? | Authentication logs — which accounts logged in from affected systems, what privilege level |
| What systems are affected? | EDR — lateral movement events, new processes on other systems by the same attacker tools |
| Is data being exfiltrated? | Firewall traffic logs — large outbound transfers, encrypted channels to unknown destinations, DNS tunneling patterns |
| Is the attacker still active? | Real-time EDR alerts, active sessions in SIEM, live network connections from affected systems |
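The exfiltration question in the table above can be approached by totalling outbound bytes per internal host and external destination. The following is a minimal Python sketch, assuming firewall logs have already been parsed into records with the hypothetical fields `src`, `dst`, and `bytes_out`. The 500 MB threshold is an arbitrary illustration, not a recommended value.

```python
from collections import defaultdict
from ipaddress import ip_address


def large_outbound_transfers(records, threshold_bytes=500 * 1024 * 1024):
    """Return (src, dst, total_bytes) for external destinations over the threshold.

    records: iterable of dicts with hypothetical keys "src", "dst", "bytes_out".
    """
    totals = defaultdict(int)
    for rec in records:
        # Skip internal-to-internal traffic; only external destinations matter here.
        if ip_address(rec["dst"]).is_private:
            continue
        totals[(rec["src"], rec["dst"])] += rec["bytes_out"]
    flagged = [(s, d, b) for (s, d), b in totals.items() if b >= threshold_bytes]
    # Largest transfers first so the analyst sees the worst case at the top.
    return sorted(flagged, key=lambda item: item[2], reverse=True)
```

Volume alone will not catch slow exfiltration or DNS tunneling, so a check like this would complement the other signals in the table rather than replace them.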
Phase 3: Containment — The Critical Decision
Containment is the phase where timing matters most. The goal is to stop attacker movement without tipping them off prematurely. An attacker who realizes they have been discovered before containment is complete may destroy evidence, trigger ransomware, or pivot to backup access paths.
Soft containment first: Increase monitoring, prepare isolation scripts, stage blocks in firewall — do not apply yet. Use this time to scope the full compromise before cutting access.
Hard containment when scope is understood: Isolate affected systems simultaneously across all confirmed compromised hosts. Sequential isolation gives the attacker time to pivot away from the first-isolated system before the others are cut off.
Preserve evidence before containment: Memory images of affected systems, network captures, logs exported to a separate system. Once a system is isolated or shut down, volatile evidence such as the attacker's running processes and live network connections can be lost.
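The ordering described above (evidence first, then all hosts isolated at once) can be sketched in Python. This is an illustration only: `capture_evidence` and `isolate_host` are hypothetical callables that would wrap whatever EDR or firewall tooling is actually in use, and no real vendor API is implied.

```python
from concurrent.futures import ThreadPoolExecutor


def contain_hosts(hosts, capture_evidence, isolate_host):
    """Capture evidence on every host, then isolate all hosts in parallel.

    capture_evidence and isolate_host are caller-supplied callables
    (hypothetical wrappers around the organization's own tooling).
    Returns a dict mapping each host to the result of isolate_host.
    """
    if not hosts:
        return {}
    # Step 1: evidence first, for every host, before any isolation starts.
    for host in hosts:
        capture_evidence(host)
    # Step 2: one worker per host so isolation requests go out together
    # rather than one after another.
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        results = list(pool.map(isolate_host, hosts))
    return dict(zip(hosts, results))
```

Running the isolation calls through a thread pool avoids the sequential pattern the text warns about, where the first isolated host alerts the attacker while the others are still reachable.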
Real Incident Scenario: Suspicious Outbound Beacon Escalates to IR
Walk through a real incident timeline to understand how phases overlap and decisions interact.
08:15 — Tier 1 Alert
SIEM fires: "Outbound connection to known-bad IP from internal host 10.1.1.85." Tier 1 analyst opens the alert.
    # Tier 1 first check: is the destination actually bad?
    # Query threat intel platform: 185.220.101.45
    # Result: confirmed Tor exit node associated with C2 infrastructure, seen in 3 campaigns
    # Is this a known scanner or internal tool? Check asset inventory
    # Asset: 10.1.1.85 = workstation, assigned to Finance user Jane Smith
    # Not a scanner, not a server
    # Pull all traffic from this host to this destination
    # Result: 47 connections over 6 hours, each lasting 58-62 seconds at precise intervals
    # Beaconing pattern confirmed
    # Tier 1 decision: escalate to Tier 2 on two signals (bad IP + beaconing pattern)
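The "precise intervals" judgment in the Tier 1 notes can be made repeatable by measuring how much the gaps between connections vary. The sketch below is one possible approach, assuming connection start times are available as seconds since some epoch. The `min_events` and `max_jitter_ratio` thresholds are illustrative assumptions, not tuned values.

```python
from statistics import mean, pstdev


def looks_like_beaconing(timestamps, min_events=10, max_jitter_ratio=0.1):
    """Return True if connection start times are spaced at near-constant intervals.

    timestamps: connection start times in seconds, one per connection.
    """
    if len(timestamps) < min_events:
        return False
    ordered = sorted(timestamps)
    intervals = [b - a for a, b in zip(ordered, ordered[1:])]
    avg = mean(intervals)
    if avg <= 0:
        return False
    # Coefficient of variation: spread of the intervals relative to their mean.
    # Human-driven traffic tends to score high; timer-driven traffic scores low.
    return pstdev(intervals) / avg <= max_jitter_ratio
```

Many C2 frameworks add random jitter to their check-in timer, so a low score is a strong signal but a high score does not rule beaconing out.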
08:32 — Tier 2 Takes Over
    # Tier 2 opens investigation: pull full picture for 10.1.1.85
    # Step 1: What process is making these connections?
    # EDR query: network connections from 10.1.1.85 to 185.220.101.45
    # Result: process = chrome.exe, but check the parent process
    # Parent: powershell.exe → parent of powershell: outlook.exe
    # Execution chain: outlook → powershell → chrome → C2
    # This is a phishing-triggered execution
    # Step 2: When did it start?
    # First beacon: 02:17 (user was not at work at 2am)
    # Email arrived at 01:54 with attachment
    # User opened attachment at 02:13 (logged in after hours? check badge access)
    # No badge access: email opened automatically by auto-preview? Or user worked late?
    # Step 3: What else did powershell do?
    # EDR: powershell executed → downloaded payload to %TEMP%\update_helper.exe
    # File hash: check against VirusTotal → 43/70 vendors detect as Cobalt Strike beacon
    # Step 4: Did it move laterally?
    # Check for logins FROM 10.1.1.85 to other systems in same time window
    # Found: RDP session from 10.1.1.85 to 10.1.1.200 (file server) at 03:45
    # Scope update: two systems confirmed (workstation + file server)
    # Tier 2 escalates to Tier 3 / IR lead: confirmed multi-host compromise
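The parent-process check in Step 1 amounts to walking a process tree upward and looking for a script host launched by a mail or Office application. The sketch below assumes process records have been exported from the EDR into a dict of `pid -> (name, parent_pid)`. That data shape, and the two name sets, are assumptions for illustration rather than any product's schema.

```python
def process_chain(processes, pid):
    """Return process names from the given pid up to the root, child first.

    processes: dict mapping pid -> (name, parent_pid or None).
    """
    chain, seen = [], set()
    while pid is not None and pid in processes and pid not in seen:
        seen.add(pid)  # guard against cycles caused by pid reuse in the data
        name, pid = processes[pid]
        chain.append(name)
    return chain


# Illustrative, deliberately short lists; a real rule would be broader.
OFFICE_APPS = {"outlook.exe", "winword.exe", "excel.exe"}
SCRIPT_HOSTS = {"powershell.exe", "cmd.exe", "wscript.exe"}


def mail_spawned_script(chain):
    """True if a script host in the chain was launched directly by an Office or mail app."""
    lowered = [name.lower() for name in chain]
    return any(
        child in SCRIPT_HOSTS and parent in OFFICE_APPS
        for child, parent in zip(lowered, lowered[1:])
    )
```

Applied to the chain in this scenario (chrome.exe, then powershell.exe, then outlook.exe), the check fires on the powershell.exe and outlook.exe pair, which is the same link the analyst spotted by hand.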
09:15 — IR Decision Point
IR lead reviews findings. Scope: two hosts confirmed, initial access via phishing attachment, Cobalt Strike beacon active for 7 hours, lateral movement to file server occurred.
- Preserve evidence: EDR memory capture on both systems initiated immediately
- Confirm file server access: what did the attacker access? File access logs pulled for 10.1.1.200 for the 03:45-09:15 window
- Check for other lateral movement: search all systems for connections FROM 10.1.1.85 and 10.1.1.200 to other internal hosts
- Prepare simultaneous isolation: firewall rule drafted to block both hosts, EDR isolation staged — NOT applied yet
- Notify management: CISO informed of confirmed compromise, brief without speculation
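The lateral movement search in the list above is a graph traversal: start from the confirmed hosts and follow every internal connection outward. A minimal sketch, assuming connection logs for the incident window have already been reduced to `(src, dst)` pairs of internal addresses (that preprocessing is not shown and would depend on the log source):

```python
from collections import deque


def reachable_hosts(connections, seeds):
    """Breadth-first search over (src, dst) connection records.

    Returns hosts reachable from the seed hosts that are not seeds themselves.
    """
    outgoing = {}
    for src, dst in connections:
        outgoing.setdefault(src, set()).add(dst)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        host = queue.popleft()
        for nxt in outgoing.get(host, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - set(seeds)
```

An empty result supports a "no further lateral movement found" conclusion only as far as the logs are complete. Every host that does appear is a candidate to investigate, not yet a confirmed compromise.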
09:45 — Containment Executed
Scope confirmed: two hosts. File server access confirmed: attacker accessed HR document share. No further lateral movement found.
- 10.1.1.85 and 10.1.1.200 isolated simultaneously via EDR network isolation
- Firewall rules applied to block both IPs at perimeter
- Jane Smith's user account disabled pending investigation
- C2 destination IP 185.220.101.45 and associated domain blocked in firewall
- HR notified of potential document exposure
Communication During an Incident
Poor communication during incidents causes as much damage as delayed technical response. There are two failure modes: over-communicating speculation, which causes panic, and under-communicating confirmed facts, which leaves stakeholders making decisions without information.
What to Communicate and When
| Phase | What to Say | What NOT to Say |
|---|---|---|
| Initial escalation | 'We are investigating a potential security incident involving [asset type]. More details in 30 minutes.' | 'We may have been breached' — scope is unknown, do not speculate |
| During investigation | 'Confirmed incident affecting [N] systems. Containment in progress. Update in 60 minutes.' | Do not name specific employees or data until confirmed and legally reviewed |
| Post-containment | 'Incident contained. [N] systems isolated. Eradication and recovery in progress. Full report within 24 hours.' | Do not declare resolution until post-incident review is complete |
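One way to keep updates limited to confirmed facts is to generate them from fixed templates, so the only variable parts are the fields the team has verified. The sketch below reuses the wording from the table. The phase keys and field names are hypothetical choices for this example.

```python
TEMPLATES = {
    "initial": (
        "We are investigating a potential security incident involving "
        "{asset_type}. More details in {next_update} minutes."
    ),
    "investigation": (
        "Confirmed incident affecting {systems} systems. "
        "Containment in progress. Update in {next_update} minutes."
    ),
    "post_containment": (
        "Incident contained. {systems} systems isolated. "
        "Eradication and recovery in progress. "
        "Full report within {report_hours} hours."
    ),
}


def status_update(phase, **fields):
    """Fill the template for a phase; raises KeyError if a required field is missing."""
    return TEMPLATES[phase].format(**fields)
```

For example, `status_update("investigation", systems=2, next_update=60)` produces the during-investigation message for this scenario. Because a missing field raises an error, an update cannot go out with a blank where a confirmed number should be.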
Post-Incident Review — The Most Skipped Step
The post-incident review is where incidents pay dividends for the future. Most teams skip it under the pressure of returning to normal operations. Teams that conduct it consistently get measurably better at detection and response with each incident.
- Timeline reconstruction — from initial attacker action to detection to containment. Every gap is a finding.
- Detection gap analysis: could we have caught this earlier? What log source or rule would have caught it?
- Response gap analysis — what slowed us down? Missing playbook, unclear escalation path, access issues?
- New detection rules — every confirmed TTP in the incident that is not currently detected becomes a new SIEM rule
- Control gap findings — what allowed initial access? What failed to prevent lateral movement?
- Lessons documented and tracked — findings assigned to owners with due dates, not filed and forgotten
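Timeline reconstruction becomes concrete once the gaps are computed rather than eyeballed. The sketch below uses the timestamps from the scenario earlier in this lesson. The event labels are paraphrased, and the helper assumes all events fall on the same day.

```python
from datetime import datetime


def minutes_between(start, end, fmt="%H:%M"):
    """Minutes from start to end, both on the same day."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


timeline = [
    ("email delivered", "01:54"),
    ("attachment opened", "02:13"),
    ("first beacon", "02:17"),
    ("RDP to file server", "03:45"),
    ("Tier 1 alert", "08:15"),
    ("Tier 2 takes over", "08:32"),
    ("IR decision point", "09:15"),
    ("containment executed", "09:45"),
]


def gaps(events):
    """Return (from_event, to_event, minutes) for each consecutive pair."""
    return [
        (a_name, b_name, minutes_between(a_time, b_time))
        for (a_name, a_time), (b_name, b_time) in zip(events, events[1:])
    ]


for a_name, b_name, minutes in gaps(timeline):
    print(f"{a_name} -> {b_name}: {minutes} min")
```

In this scenario the first beacon preceded the Tier 1 alert by 358 minutes, while alert to containment took 90 minutes. The largest single gap, 270 minutes between the RDP session and the first alert, points at detection rather than response as the main finding.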
The Loop That Improves Security