Incident Response in Production — From Alert to Resolution

Incident Response vs Alert Investigation

Alert investigation is what Tier 1 does every shift — triage, determine real or false positive, close or escalate. Incident response is what happens when a confirmed threat requires coordinated action across multiple teams to contain, eradicate, and recover.

The transition from investigation to incident response happens at a specific decision point: the threat is confirmed and its scope extends beyond a single user or system. That decision is the most important moment in the IR lifecycle. Acting too slowly lets attackers expand their foothold; acting too aggressively before the scope is understood can tip off the attacker before containment is ready.

The IR Lifecycle — What Each Phase Actually Looks Like

Phase 1: Preparation (Before the Incident)

The quality of incident response is determined before the incident happens. Teams with documented playbooks, tested communication channels, and practiced containment procedures respond in minutes. Teams without them lose hours to confusion.

Documented playbooks for the most likely incident types: ransomware, data breach, account compromise, insider threat
Pre-approved containment authority — who can authorize isolating a production server at 3am?
Communication tree — who gets called when, in what order, via what channel
Asset inventory that IR can query in real time — which assets are critical, who owns them
SIEM and EDR access tested for every IR team member ahead of time, not on the night of the incident
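Parts of this checklist can be verified on a schedule rather than discovered at 3am. The readiness check below is a minimal sketch under stated assumptions: the `Asset` and `Responder` records, the hostnames, and the 90-day freshness window are all hypothetical, and a real version would read from your asset inventory and access-review records instead of hard-coded lists.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Asset:
    hostname: str
    critical: bool
    owner: Optional[str]  # person or team accountable; None means unknown


@dataclass
class Responder:
    name: str
    siem_tested: Optional[date]  # last date SIEM access was verified; None if never
    edr_tested: Optional[date]   # last date EDR access was verified; None if never


def readiness_gaps(assets: List[Asset], responders: List[Responder],
                   today: date, max_age_days: int = 90) -> List[str]:
    """List preparation gaps that would slow down an after-hours response."""
    gaps = []
    for asset in assets:
        if asset.critical and not asset.owner:
            gaps.append(f"critical asset {asset.hostname} has no owner")
    cutoff = today - timedelta(days=max_age_days)
    for r in responders:
        for tool, tested in (("SIEM", r.siem_tested), ("EDR", r.edr_tested)):
            if tested is None or tested < cutoff:
                gaps.append(f"{r.name}: {tool} access not verified in last {max_age_days} days")
    return gaps


# Hypothetical inventory and roster, for illustration only.
assets = [Asset("fs-01", True, None), Asset("ws-1085", False, None)]
responders = [Responder("analyst-a", date(2024, 1, 10), None)]
gaps = readiness_gaps(assets, responders, today=date(2024, 3, 1))
for g in gaps:
    print(g)
```

The non-critical workstation without an owner is deliberately not flagged here; whether that is acceptable is a policy decision rather than a coding one.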

Phase 2: Detection and Analysis

An incident starts when a Tier 2 analyst confirms a threat is real and scoped. The immediate questions:

Question	Where to Find the Answer
What is the initial access vector?	Phishing logs, VPN auth logs, vulnerability scan hits, firewall traffic logs for unusual inbound connections
What accounts are involved?	Authentication logs — which accounts logged in from affected systems, what privilege level
What systems are affected?	EDR: lateral movement events, and new processes on other systems launched by the same attacker tools
Is data being exfiltrated?	Firewall traffic logs — large outbound transfers, encrypted channels to unknown destinations, DNS tunneling patterns
Is the attacker still active?	Real-time EDR alerts, active sessions in SIEM, live network connections from affected systems
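The exfiltration question in this table often starts as a simple aggregation over firewall logs: total bytes sent from each internal host to each external destination. The sketch below assumes records are dicts with `src_ip`, `dst_ip`, and `bytes_out` fields (hypothetical names; map them to your firewall's log schema), treats only RFC 1918 space as internal, and uses an arbitrary 500 MB threshold. It does not cover encrypted-channel or DNS-tunneling detection, which need different features.

```python
import ipaddress
from collections import defaultdict

# Assumption: internal space is RFC 1918 only. Adjust for your network.
INTERNAL_NETS = [ipaddress.ip_network(n)
                 for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_internal(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETS)


def large_outbound_transfers(records, threshold_bytes=500_000_000):
    """Sum bytes per (internal src, external dst) pair and return pairs over the threshold."""
    totals = defaultdict(int)
    for r in records:
        if is_internal(r["src_ip"]) and not is_internal(r["dst_ip"]):
            totals[(r["src_ip"], r["dst_ip"])] += r["bytes_out"]
    flagged = [(pair, total) for pair, total in totals.items() if total >= threshold_bytes]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Invented sample records (documentation-range external IPs).
records = [
    {"src_ip": "10.1.1.200", "dst_ip": "203.0.113.10", "bytes_out": 400_000_000},
    {"src_ip": "10.1.1.200", "dst_ip": "203.0.113.10", "bytes_out": 300_000_000},
    {"src_ip": "10.1.1.85", "dst_ip": "10.1.1.200", "bytes_out": 900_000_000},  # internal, ignored
    {"src_ip": "10.1.1.85", "dst_ip": "198.51.100.7", "bytes_out": 1_000},      # below threshold
]
flagged = large_outbound_transfers(records)
print(flagged)
```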

Phase 3: Containment — The Critical Decision

Containment is the phase where timing matters most. The goal is to stop attacker movement without tipping them off prematurely: an attacker who realizes they have been discovered before containment is complete may destroy evidence, trigger ransomware, or pivot to backup access paths.

Soft containment first: Increase monitoring, prepare isolation scripts, and stage firewall blocks, but do not apply them yet. Use this time to scope the full compromise before cutting access.

Hard containment when scope is understood: Isolate affected systems simultaneously across all confirmed compromised hosts. Sequential isolation gives the attacker time to pivot away from the first-isolated system before the others are cut off.

Preserve evidence before containment: Capture memory images of affected systems and network traffic, and export logs to a separate system. Once a host is isolated, the attacker's live sessions and C2 connections are cut, and the volatile evidence tied to them can be lost.
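Two of the rules above, preserve evidence before isolating and isolate all confirmed hosts together, can be encoded so a responder cannot skip a step under pressure. This is a minimal sketch: `isolate_host` stands in for whatever isolation call your EDR exposes and is hypothetical here, and a thread pool only makes the isolation requests concurrent, not perfectly simultaneous.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def contain(hosts: List[str], evidence_preserved: Dict[str, bool],
            isolate_host: Callable[[str], bool]) -> Dict[str, bool]:
    """Isolate all confirmed hosts concurrently, refusing to start until evidence is preserved."""
    if not hosts:
        return {}
    missing = [h for h in hosts if not evidence_preserved.get(h, False)]
    if missing:
        raise RuntimeError(f"evidence not yet preserved for: {missing}")
    # Submit every isolation request at once instead of one host after another,
    # so the attacker has minimal time to pivot away from the first host cut off.
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        outcomes = list(pool.map(isolate_host, hosts))
    return dict(zip(hosts, outcomes))


# Stand-in for a real EDR isolation API call (hypothetical).
isolated = []


def fake_isolate(host: str) -> bool:
    isolated.append(host)
    return True


result = contain(["10.1.1.85", "10.1.1.200"],
                 {"10.1.1.85": True, "10.1.1.200": True},
                 fake_isolate)
print(result)
```

If any single isolation call raises, `pool.map` re-raises that exception when the results are collected, which stops the runbook and forces a human to look at a partially contained incident.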

Real Incident Scenario: Suspicious Outbound Beacon Escalates to IR

Walk through a real incident timeline to understand how phases overlap and decisions interact.

08:15 — Tier 1 Alert

SIEM fires: "Outbound connection to known-bad IP from internal host 10.1.1.85." Tier 1 analyst opens the alert.

siem-query

# Tier 1 first check — is the destination actually bad?
# Query threat intel platform: 185.220.101.45
# Result: confirmed Tor exit node associated with C2 infrastructure, seen in 3 campaigns

# Is this a known scanner or internal tool? Check asset inventory
# Asset: 10.1.1.85 = workstation, assigned to Finance user Jane Smith
# Not a scanner, not a server

# Pull all traffic from this host to this destination
# Result: 47 connections over 6 hours, each lasting 58-62 seconds, spaced at near-constant intervals
# Beaconing pattern confirmed

# Tier 1 decision: escalate to Tier 2 — two signals (bad IP + beaconing pattern)
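The "precise intervals" signal Tier 1 relied on can be quantified. One simple heuristic is the coefficient of variation (standard deviation divided by mean) of the gaps between connections: scheduled check-ins produce a value near zero, human browsing produces a large one. The sketch below assumes you already have the connection timestamps for one source/destination pair; the 10-connection minimum, the 0.1 cutoff, and the sample data are illustrative assumptions, and C2 tooling that adds random jitter to its check-ins will need a looser threshold.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def beacon_score(timestamps, min_connections=10):
    """Return (mean interval in seconds, coefficient of variation), or None if too few events."""
    if len(timestamps) < min_connections:
        return None
    ts = sorted(timestamps)
    intervals = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    avg = mean(intervals)
    cv = pstdev(intervals) / avg if avg > 0 else float("inf")
    return avg, cv


def looks_like_beacon(timestamps, max_cv=0.1):
    score = beacon_score(timestamps)
    return score is not None and score[1] <= max_cv


# Synthetic data: 47 check-ins roughly 460 s apart with a few seconds of wobble...
start = datetime(2024, 1, 1, 2, 17)
beacon_times = [start + timedelta(seconds=460 * i + (i % 3)) for i in range(47)]

# ...versus irregular, human-driven connections.
gaps_s = [5, 300, 20, 1800, 60, 7, 900, 45, 3000, 12]
human_times = [start]
for g in gaps_s:
    human_times.append(human_times[-1] + timedelta(seconds=g))

print(looks_like_beacon(beacon_times), looks_like_beacon(human_times))
```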

08:32 — Tier 2 Takes Over

siem-query

# Tier 2 opens investigation — pull full picture for 10.1.1.85

# Step 1: What process is making these connections?
# EDR query: network connections from 10.1.1.85 to 185.220.101.45
# Result: process = chrome.exe — but wait, check parent process
# Parent: powershell.exe → parent of powershell: outlook.exe

# Execution chain: outlook → powershell → chrome → C2
# This is a phishing-triggered execution

# Step 2: When did it start?
# First beacon: 02:17 — user was not at work at 2am
# Email arrived at 01:54 with attachment
# User opened attachment at 02:13 (logged in after hours? check badge access)
# No badge access — email opened automatically by auto-preview? Or user worked late?

# Step 3: What else did powershell do?
# EDR: powershell executed → downloaded payload to %TEMP%\update_helper.exe
# File hash: check against VirusTotal → 43/70 vendors detect as Cobalt Strike beacon

# Step 4: Did it move laterally?
# Check for logins FROM 10.1.1.85 to other systems in same time window
# Found: RDP session from 10.1.1.85 to 10.1.1.200 (file server) at 03:45

# Scope update: two systems confirmed — workstation + file server
# Tier 2 escalates to Tier 3 / IR lead — confirmed multi-host compromise
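Step 1 above walks from the process holding the C2 connection up through its parents. Against raw process-creation events that walk is a short loop. The field names below (`pid`, `ppid`, `image`) and the PID values are hypothetical; real EDR data usually carries a unique process ID or GUID alongside the OS PID because PIDs get reused, so key on that if your tooling provides it.

```python
def execution_chain(events, pid):
    """Follow parent links from `pid` to the oldest known ancestor; return images oldest-first."""
    by_pid = {e["pid"]: e for e in events}
    chain, seen = [], set()
    while pid in by_pid and pid not in seen:  # `seen` stops loops caused by PID reuse
        seen.add(pid)
        proc = by_pid[pid]
        chain.append(proc["image"])
        pid = proc["ppid"]
    return chain[::-1]


# Hypothetical process-creation events for 10.1.1.85.
events = [
    {"pid": 1200, "ppid": 900, "image": "outlook.exe"},   # parent 900 not in the data set
    {"pid": 4410, "ppid": 1200, "image": "powershell.exe"},
    {"pid": 5120, "ppid": 4410, "image": "chrome.exe"},   # process holding the C2 connection
]
chain = execution_chain(events, 5120)
print(" -> ".join(chain))
```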

09:15 — IR Decision Point

IR lead reviews findings. Scope: two hosts confirmed, initial access via phishing attachment, Cobalt Strike beacon active for 7 hours, lateral movement to file server occurred.

Preserve evidence: EDR memory capture on both systems initiated immediately
Confirm file server access: what did the attacker access? File access logs pulled for 10.1.1.200 for the 03:45-09:15 window
Check for other lateral movement: search all systems for connections FROM 10.1.1.85 and 10.1.1.200 to other internal hosts
Prepare simultaneous isolation: firewall rule drafted to block both hosts, EDR isolation staged — NOT applied yet
Notify management: CISO briefed on the confirmed compromise, facts only, no speculation
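The "check for other lateral movement" step amounts to filtering connection logs for traffic from the known-compromised hosts to other internal hosts on remote-administration ports inside the attack window. This is a sketch with assumed record fields (`time`, `src_ip`, `dst_ip`, `dst_port`) and an illustrative port list. The sample records are invented and do not reflect the scenario, where no further lateral movement was found.

```python
import ipaddress
from datetime import datetime

# Ports commonly used for remote access and lateral movement (illustrative, not exhaustive).
LATERAL_PORTS = {22: "SSH", 445: "SMB", 3389: "RDP", 5985: "WinRM", 5986: "WinRM"}


def lateral_candidates(conns, compromised, start, end):
    """Connections from compromised hosts to other private-range hosts on lateral-movement ports."""
    hits = []
    for c in conns:
        if (c["src_ip"] in compromised
                and c["dst_ip"] not in compromised          # already-known hops are excluded
                and ipaddress.ip_address(c["dst_ip"]).is_private
                and c["dst_port"] in LATERAL_PORTS
                and start <= c["time"] <= end):
            hits.append((c["time"], c["src_ip"], c["dst_ip"], LATERAL_PORTS[c["dst_port"]]))
    return sorted(hits)


day = datetime(2024, 1, 1)  # placeholder date
conns = [
    {"time": day.replace(hour=3, minute=45), "src_ip": "10.1.1.85", "dst_ip": "10.1.1.200", "dst_port": 3389},
    {"time": day.replace(hour=5, minute=10), "src_ip": "10.1.1.200", "dst_ip": "10.1.1.40", "dst_port": 445},
    {"time": day.replace(hour=6), "src_ip": "10.1.1.85", "dst_ip": "185.220.101.45", "dst_port": 443},
    {"time": day.replace(hour=10, minute=30), "src_ip": "10.1.1.200", "dst_ip": "10.1.1.41", "dst_port": 3389},
]
hits = lateral_candidates(conns, {"10.1.1.85", "10.1.1.200"},
                          start=day.replace(hour=2, minute=17), end=day.replace(hour=9, minute=15))
print(hits)
```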

09:45 — Containment Executed

Scope confirmed: two hosts. File server access confirmed: attacker accessed HR document share. No further lateral movement found.

10.1.1.85 and 10.1.1.200 isolated simultaneously via EDR network isolation
Firewall rules applied to block both IPs at perimeter
User Jane Smith account disabled pending investigation
C2 destination IP 185.220.101.45 and associated domain blocked in firewall
HR notified of potential document exposure

Communication During an Incident

Poor communication during an incident can cause as much damage as a delayed technical response. There are two failure modes: over-communicating speculation, which causes panic, and under-communicating confirmed facts, which leaves stakeholders making decisions without information.

What to Communicate and When

Phase	What to Say	What NOT to Say
Initial escalation	'We are investigating a potential security incident involving [asset type]. More details in 30 minutes.'	'We may have been breached' — scope is unknown, do not speculate
During investigation	'Confirmed incident affecting [N] systems. Containment in progress. Update in 60 minutes.'	Do not name specific employees or data until confirmed and legally reviewed
Post-containment	'Incident contained. [N] systems isolated. Eradication and recovery in progress. Full report within 24 hours.'	Do not declare resolution until post-incident review is complete

Post-Incident Review — The Most Skipped Step

The post-incident review is where incidents pay dividends for the future. Most teams skip it under the pressure of returning to normal operations. Teams that conduct it consistently get measurably better at detection and response with each incident.

Timeline reconstruction — from initial attacker action to detection to containment. Every gap is a finding.
Detection gap analysis — could we have caught this earlier? What log or rule would have?
Response gap analysis — what slowed us down? Missing playbook, unclear escalation path, access issues?
New detection rules — every confirmed TTP in the incident that is not currently detected becomes a new SIEM rule
Control gap findings — what allowed initial access? What failed to prevent lateral movement?
Lessons documented and tracked — findings assigned to owners with due dates, not filed and forgotten
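Timeline reconstruction is mostly arithmetic once the events are in order. The sketch below replays the scenario's own timestamps (the calendar date is a placeholder, since the scenario gives only times of day) and computes the gap between each pair of consecutive events plus two summary figures. The largest single gap, 270 minutes between the RDP hop to the file server and the first alert, is exactly the kind of finding the detection gap analysis should explain.

```python
from datetime import datetime

DAY = "2024-01-01"  # placeholder; the scenario only gives times of day

# (time, event) pairs taken from the incident scenario, in order.
TIMELINE = [
    ("01:54", "phishing email delivered"),
    ("02:13", "attachment opened"),
    ("02:17", "first C2 beacon"),
    ("03:45", "RDP to file server"),
    ("08:15", "SIEM alert fires"),
    ("08:32", "Tier 2 takes over"),
    ("09:15", "IR decision point"),
    ("09:45", "containment executed"),
]


def parse(hhmm: str) -> datetime:
    return datetime.strptime(f"{DAY} {hhmm}", "%Y-%m-%d %H:%M")


def minutes(a: datetime, b: datetime) -> int:
    return int((b - a).total_seconds() // 60)


def gaps(timeline):
    """Minutes between each pair of consecutive events."""
    events = [(parse(t), label) for t, label in timeline]
    return [(la, lb, minutes(ta, tb)) for (ta, la), (tb, lb) in zip(events, events[1:])]


times = {label: parse(t) for t, label in TIMELINE}
time_to_detect = minutes(times["phishing email delivered"], times["SIEM alert fires"])
time_to_contain = minutes(times["SIEM alert fires"], times["containment executed"])

for a, b, m in gaps(TIMELINE):
    print(f"{m:4d} min  {a} -> {b}")
print("time to detect:", time_to_detect, "min; time to contain:", time_to_contain, "min")
```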

The Loop That Improves Security

Detection finds the incident. Investigation scopes it. Response contains it. Post-incident review feeds new detections back into the SIEM. The next time the same attacker technique is used, the detection fires earlier and the response is faster because the playbook already exists. This loop — incident to detection improvement — is how security programs get materially better over time rather than just responding to the same attacks repeatedly.
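As an example of feeding a confirmed TTP back into detection: the scenario's execution chain began with Outlook launching PowerShell. The predicate below expresses that logic over a single process-creation event. It is only a sketch of the rule's logic in Python; in production it would be written in your SIEM or EDR's own rule language (or a portable format such as Sigma), and the event field names and the parent/child process lists are illustrative assumptions to tune against your environment's normal activity.

```python
# Illustrative lists; tune against legitimate activity in your environment.
OFFICE_PARENTS = {"outlook.exe", "winword.exe", "excel.exe", "powerpnt.exe"}
SCRIPT_HOSTS = {"powershell.exe", "pwsh.exe", "cmd.exe", "wscript.exe", "cscript.exe", "mshta.exe"}


def basename(path: str) -> str:
    """Lower-cased file name from a Windows path."""
    return path.lower().rsplit("\\", 1)[-1]


def office_spawns_script_host(event: dict) -> bool:
    """True when a mail/Office process directly launches a script interpreter."""
    return (basename(event["parent_image"]) in OFFICE_PARENTS
            and basename(event["image"]) in SCRIPT_HOSTS)


hit = {"parent_image": r"C:\Program Files\Microsoft Office\root\Office16\OUTLOOK.EXE",
       "image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"}
benign = {"parent_image": r"C:\Windows\explorer.exe",
          "image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"}
print(office_spawns_script_host(hit), office_spawns_script_host(benign))
```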

Course Complete

You have completed SOC Operations Basics. You now understand how SOC teams are structured, how alerts flow through real triage decisions, how to read NGFW logs for investigation, how to correlate across multiple sources, and how real incident response works from first alert to post-incident review. These are the foundations every SOC analyst needs — not the theory of incident response, but how it actually works under production pressure.
