TACUNS
Module 1

SOC Structure & Real Alert Triage Workflow

What a SOC Actually Is (vs What It Looks Like on Paper)

On paper: a Security Operations Center is a team of analysts monitoring security alerts 24 hours a day, detecting threats, and responding to incidents.

In reality: a SOC is a team fighting a constant battle between alert volume and analyst bandwidth. The average enterprise SIEM generates thousands of alerts per day. A Tier 1 analyst handles dozens of them per shift. The gap between alerts generated and alerts properly investigated is where attackers live.

Understanding this tension is the foundation of SOC operations. Every decision — how to tune detection rules, when to escalate, how to triage — exists to close that gap.
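
The gap can be put into rough numbers. The sketch below is back-of-the-envelope arithmetic only: the function name `triage_gap` and every figure passed to it are hypothetical placeholders, not benchmarks, and should be replaced with your own queue data.

```python
def triage_gap(alerts_per_day, analysts_per_shift, shifts_per_day, alerts_per_analyst_shift):
    """Compare daily alert volume with the number of alerts analysts can properly triage."""
    capacity = analysts_per_shift * shifts_per_day * alerts_per_analyst_shift
    uninvestigated = max(alerts_per_day - capacity, 0)
    if alerts_per_day:
        coverage_pct = round(100 * min(capacity, alerts_per_day) / alerts_per_day, 1)
    else:
        coverage_pct = 100.0
    return {
        "capacity": capacity,
        "uninvestigated": uninvestigated,
        "coverage_pct": coverage_pct,
    }


# Hypothetical example: 3000 alerts/day, 4 analysts on each of 3 shifts, 40 alerts each.
gap = triage_gap(alerts_per_day=3000, analysts_per_shift=4,
                 shifts_per_day=3, alerts_per_analyst_shift=40)
print(gap)
```

With these assumed inputs, most of the daily volume is never properly investigated, which is the argument for tuning detections rather than only adding analysts.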

SOC Tier Structure — What Each Tier Actually Does

| Tier | Role | What They Actually Do | Escalates To |
|---|---|---|---|
| Tier 1 | Alert Analyst | First responder to every SIEM alert. Validates whether alert is real or false positive. Follows playbooks. Does NOT do deep investigation. | Tier 2 when confirmed true positive or high-confidence suspicious |
| Tier 2 | Incident Analyst | Takes escalated alerts. Performs deep investigation: log correlation, endpoint forensics, network traffic analysis. Determines scope and impact. | Tier 3 for complex or high-severity incidents, or for detection gap findings |
| Tier 3 | Threat Hunter / IR Lead | Proactive threat hunting. Builds new detections. Leads incident response for major events. Reviews Tier 1/2 work for quality. | CISO / executive for P1 incidents, external IR firm for major breaches |
| Detection Engineering | Rule Developer | Writes and maintains SIEM detection rules, SOAR playbooks, and alert logic. Feeds output of Tier 2/3 findings back into detection. | Tier 2/3 for testing new rules |

The Problem With Tier 1 Alert Factories

Many SOCs treat Tier 1 as a checkbox exercise: open alert, apply playbook, close alert, repeat. This produces no real security value when the playbook says "mark as false positive" for the vast majority of alerts. A mature SOC invests in detection quality so that Tier 1 alerts are high confidence and analysts spend their time investigating, not triaging noise. If your Tier 1 analysts are closing 80% or more of alerts as false positives without escalating, the problem is the detection rules, not the analysts.

How Alerts Actually Flow

Understanding the alert flow prevents the most common SOC failure mode: treating every alert as independent instead of part of a story.

The Alert Lifecycle

1. Detection fires: SIEM correlation rule matches, EDR fires a behavioral alert, or an external threat feed match occurs. Alert appears in the queue.

2. Tier 1 triage: Analyst opens the alert. First question: is this known-good activity? Check against the allowlist, known scheduled tasks, vulnerability scanners, and backup agents. If known good, close as false positive and document why. If not known good, proceed.

3. Initial investigation: Analyst pulls context: who the user is, what the asset is, what the asset's risk classification is, and what happened in the 30 minutes before the alert. This context can change the severity significantly.

4. Confidence decision: Based on context, analyst decides: confirmed false positive (close), unclear (investigate further), confirmed suspicious (escalate to Tier 2), confirmed malicious (escalate immediately + initiate containment playbook).

5. Escalation or closure: If escalating — write a clear handoff note. What triggered the alert, what investigation was done, what makes it suspicious. Tier 2 should not repeat Tier 1 work.
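
Steps 4 and 5 can be sketched as a small lookup plus a structured handoff note. This is an illustrative sketch, not a SOAR integration: the names `Verdict`, `NEXT_ACTION`, and `HandoffNote` are hypothetical, and the note fields simply mirror the three things the handoff should state.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    FALSE_POSITIVE = "confirmed false positive"
    UNCLEAR = "unclear"
    SUSPICIOUS = "confirmed suspicious"
    MALICIOUS = "confirmed malicious"


# Mirrors the confidence decision in step 4.
NEXT_ACTION = {
    Verdict.FALSE_POSITIVE: "close and document why",
    Verdict.UNCLEAR: "investigate further",
    Verdict.SUSPICIOUS: "escalate to Tier 2",
    Verdict.MALICIOUS: "escalate immediately and initiate containment playbook",
}


@dataclass
class HandoffNote:
    """What Tier 2 needs so it does not repeat Tier 1 work (step 5)."""
    alert_id: str
    trigger: str
    investigation_done: list[str]
    why_suspicious: str

    def render(self) -> str:
        steps = "\n".join(f"  - {s}" for s in self.investigation_done)
        return (f"Alert: {self.alert_id}\n"
                f"Trigger: {self.trigger}\n"
                f"Investigation done:\n{steps}\n"
                f"Why suspicious: {self.why_suspicious}")


# Hypothetical alert, for illustration only.
note = HandoffNote(
    alert_id="ALERT-EXAMPLE-1",
    trigger="EDR behavioral alert: PowerShell spawned by winword.exe",
    investigation_done=["Checked allowlist: no match", "Pulled 30 minutes of prior host activity"],
    why_suspicious="Office parent process plus encoded command line",
)
print(NEXT_ACTION[Verdict.SUSPICIOUS])
print(note.render())
```

Forcing the note into fixed fields makes an incomplete handoff obvious before it reaches Tier 2.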

The Triage Decision Framework

A Tier 1 analyst makes three sequential assessments for every alert. This framework prevents both false positive fatigue and missed real threats.

Assessment 1: Is This Real?

  • Is the source a known scanner, backup agent, or monitoring tool? Check your known-good inventory.
  • Is this activity happening at a time when it normally occurs? A file backup at 2am is expected. An interactive login at 2am on an executive's account, originating from a timezone the executive is not in, is not.
  • Does the alert volume match what you expect? One failed login is noise. 500 failed logins in 30 seconds is not.
  • Is the destination a known-good internal or external service?
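
Two of the checks above lend themselves to a short sketch: the known-good lookup and the volume check. The inventory contents, IP addresses, and function names below are hypothetical; the 500-in-30-seconds threshold is taken from the example in the list.

```python
from datetime import datetime, timedelta

# Hypothetical known-good inventory: source IP -> reason it is expected.
KNOWN_GOOD_SOURCES = {
    "10.0.5.20": "vulnerability scanner",
    "10.0.5.31": "backup agent",
}


def known_good_reason(source_ip):
    """Return the documented reason a source is known good, or None if it is not listed."""
    return KNOWN_GOOD_SOURCES.get(source_ip)


def is_burst(timestamps, threshold, window):
    """True if any span of length `window` contains at least `threshold` events."""
    ts = sorted(timestamps)
    start = 0
    for end in range(len(ts)):
        # Shrink the window from the left until it spans at most `window`.
        while ts[end] - ts[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False


base = datetime(2024, 1, 1, 9, 0, 0)
one_failure = [base]
many_failures = [base + timedelta(milliseconds=50 * i) for i in range(500)]

print(known_good_reason("10.0.5.20"))
print(is_burst(one_failure, threshold=500, window=timedelta(seconds=30)))
print(is_burst(many_failures, threshold=500, window=timedelta(seconds=30)))
```

Returning the reason rather than a bare boolean gives the analyst the text to record when closing the alert as a false positive.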

Assessment 2: What Is the Asset?

  • Is the affected asset a workstation, server, or critical infrastructure?
  • Does the asset handle sensitive data — PII, payment data, intellectual property?
  • Is the asset reachable from the internet directly?
  • What is the blast radius if this asset is compromised — can it reach other critical systems?

Asset Context Changes Everything

A PowerShell alert on a developer workstation is different from the same alert on a domain controller. Same alert logic, completely different severity and response. Without asset inventory and classification in your SIEM, every alert looks the same — and you are triaging blind.
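
A minimal sketch of how asset context might adjust severity, assuming a simple additive score. The hostnames, inventory fields, and weights are all hypothetical; a real SIEM would pull classification from an asset inventory or CMDB.

```python
# Hypothetical asset inventory keyed by hostname.
ASSET_INVENTORY = {
    "dev-ws-014": {"type": "workstation", "sensitive_data": False, "internet_facing": False},
    "dc-01": {"type": "domain_controller", "sensitive_data": True, "internet_facing": False},
}

# Hypothetical weights: more critical asset types add more to the base severity.
TYPE_WEIGHT = {"workstation": 1, "server": 2, "domain_controller": 3}


def adjusted_severity(base_severity, hostname):
    """Raise an alert's base severity according to what the affected asset is."""
    asset = ASSET_INVENTORY.get(hostname)
    if asset is None:
        # No classification available: the alert keeps its base severity.
        return base_severity
    score = base_severity + TYPE_WEIGHT.get(asset["type"], 1)
    if asset["sensitive_data"]:
        score += 1
    if asset["internet_facing"]:
        score += 1
    return min(score, 10)


# Same PowerShell alert (base severity 3), two different assets.
print(adjusted_severity(3, "dev-ws-014"))
print(adjusted_severity(3, "dc-01"))
print(adjusted_severity(3, "unknown-host"))
```

The unknown host illustrates the "triaging blind" case: without inventory data the score cannot distinguish a lab machine from a domain controller.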

Assessment 3: What Happened Around This Event?

  • What did this user/asset do in the 30 minutes before the alert?
  • Is this alert part of a sequence — did another alert fire on the same asset recently?
  • Is there corroborating activity in other log sources — DNS, network, endpoint?
  • Has this same pattern been seen before? Was it investigated?
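
The 30-minute lookback can be sketched as a filter over already-collected events. The event shape (dicts with `host`, `time`, `source`, and `summary` keys) is an assumption made for illustration; in practice this would be a SIEM query across DNS, network, and endpoint logs.

```python
from datetime import datetime, timedelta


def events_before(events, host, alert_time, lookback=timedelta(minutes=30)):
    """Return events on `host` in the lookback window before the alert, oldest first."""
    window_start = alert_time - lookback
    hits = [e for e in events
            if e["host"] == host and window_start <= e["time"] < alert_time]
    return sorted(hits, key=lambda e: e["time"])


alert_time = datetime(2024, 1, 1, 12, 0)
# Hypothetical events from several log sources.
events = [
    {"host": "ws-07", "time": datetime(2024, 1, 1, 11, 58), "source": "endpoint", "summary": "powershell.exe started"},
    {"host": "ws-07", "time": datetime(2024, 1, 1, 11, 45), "source": "dns", "summary": "lookup of rare domain"},
    {"host": "ws-07", "time": datetime(2024, 1, 1, 11, 20), "source": "endpoint", "summary": "user logon"},
    {"host": "ws-12", "time": datetime(2024, 1, 1, 11, 50), "source": "network", "summary": "outbound connection"},
]

for e in events_before(events, "ws-07", alert_time):
    print(e["time"].strftime("%H:%M"), e["source"], e["summary"])
```

Sorting oldest first presents the activity as a sequence, which helps when reading the alert as part of a story rather than an isolated event.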

Common Alert Types and How to Triage Each

Failed Login Alerts

| Pattern | Likely Explanation | Action |
|---|---|---|
| 5-10 failures from known user, normal business hours | User forgot password | Confirm with user, close as benign |
| 5-10 failures from known user, 3am | Password guessing or account takeover attempt | Escalate to Tier 2, investigate account |
| 500+ failures across many usernames from one external IP | Credential stuffing attack | Block source IP, escalate, check for any successes in the sequence |
| Failures followed immediately by successful login | Brute force succeeded OR MFA bypass | Critical: escalate immediately, consider account lockdown |
| Failures on service account from unexpected source | Lateral movement or misconfigured application | Escalate: service accounts should have known, fixed source IPs |
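
The table can be read as an ordered decision: the most severe patterns are checked first. The sketch below is one possible encoding, not a production rule. The function name, its parameters, and the `MANY_USERNAMES` cutoff are assumptions; the table itself only says "many usernames".

```python
MANY_USERNAMES = 10  # Assumed cutoff for "many usernames"; tune for your environment.


def triage_failed_logins(failures, distinct_usernames=1, business_hours=True,
                         followed_by_success=False, service_account=False,
                         expected_source=True):
    """Return (likely_explanation, action) for a failed-login pattern."""
    if followed_by_success:
        return ("Brute force succeeded or MFA bypass",
                "Escalate immediately, consider account lockdown")
    if service_account and not expected_source:
        return ("Lateral movement or misconfigured application", "Escalate")
    if failures >= 500 and distinct_usernames >= MANY_USERNAMES:
        return ("Credential stuffing attack",
                "Block source IP, escalate, check for any successes in the sequence")
    if 5 <= failures <= 10 and distinct_usernames == 1:
        if business_hours:
            return ("User forgot password", "Confirm with user, close as benign")
        return ("Possible account takeover attempt",
                "Escalate to Tier 2, investigate account")
    return ("No table pattern matched", "Investigate manually")


print(triage_failed_logins(failures=7, business_hours=True))
print(triage_failed_logins(failures=7, business_hours=False))
print(triage_failed_logins(failures=800, distinct_usernames=300))
print(triage_failed_logins(failures=6, followed_by_success=True))
```

Anything that falls outside the listed patterns is deliberately routed to manual investigation rather than silently closed.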

Unusual Process Execution Alerts

siem-query
# What Tier 1 looks for when EDR fires on PowerShell:

# 1. What launched it?
# - Office application (winword.exe, excel.exe) → high suspicion
# - Browser → high suspicion
# - Windows scheduler → check if the task is known
# - svchost.exe → confirm it maps to a known scheduled task (on current Windows the Task Scheduler service runs inside svchost.exe); otherwise treat as very suspicious

# 2. What did it do?
# - Encoded command (-EncodedCommand or -enc) → suspicious
# - Downloaded from internet (Invoke-WebRequest, WebClient) → high suspicion
# - Accessed LSASS or other security-sensitive process → critical
# - Ran entirely in memory, no file on disk → critical

# 3. Did it communicate externally?
# Cross-reference process creation time with network logs for the same host
# Outbound connection immediately after PowerShell launch = possible C2; treat as high suspicion until explained
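
Some of the parent-process and command-line checks in the comments above can be sketched in Python. This is an illustration of the logic, not an EDR rule: the function name is hypothetical, the browser process names are assumed examples, and only the two flag spellings mentioned above (`-EncodedCommand` and `-enc`) are matched.

```python
import re

OFFICE_PARENTS = {"winword.exe", "excel.exe"}
BROWSER_PARENTS = {"chrome.exe", "msedge.exe", "firefox.exe"}  # assumed examples

# Matches -EncodedCommand or -enc as a standalone argument, case-insensitively.
ENCODED_FLAG = re.compile(r"(?<!\S)-(?:encodedcommand|enc)\b", re.IGNORECASE)
DOWNLOAD_MARKERS = ("invoke-webrequest", "webclient")


def powershell_findings(parent_process, command_line):
    """Return a list of human-readable findings for a PowerShell process event."""
    findings = []
    parent = parent_process.lower()
    if parent in OFFICE_PARENTS:
        findings.append("launched by Office application: high suspicion")
    elif parent in BROWSER_PARENTS:
        findings.append("launched by browser: high suspicion")
    if ENCODED_FLAG.search(command_line):
        findings.append("encoded command: suspicious")
    lowered = command_line.lower()
    if any(marker in lowered for marker in DOWNLOAD_MARKERS):
        findings.append("downloads from internet: high suspicion")
    return findings


print(powershell_findings("WINWORD.EXE", "powershell.exe -enc SQBFAFgA"))
print(powershell_findings("explorer.exe", "powershell.exe -File C:\\scripts\\report.ps1"))
```

An empty list means none of these specific indicators fired; it does not mean the process is benign, and the network cross-reference still needs to be done separately.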

Alert Fatigue — The Real SOC Threat

Alert fatigue is not a people problem. It is a process and tooling problem. When analysts are forced to process hundreds of low-quality alerts per shift, two failure modes emerge: they start auto-closing alerts without proper investigation, or they burn out and leave. Both destroy the security value of the SOC.

Measuring SOC Health

| Metric | What It Measures | Healthy Range |
|---|---|---|
| False Positive Rate | Percentage of alerts closed without action | Below 30%; above 50% means detection rules need tuning |
| Mean Time to Triage (MTTT) | Average time to complete initial triage per alert | Under 15 minutes for Tier 1 |
| Escalation Rate | Percentage of alerts escalated to Tier 2 | 5-20% depending on environment maturity |
| Mean Time to Detect (MTTD) | Time from attacker action to first alert | Under 24 hours is good; under 1 hour is mature |
| Mean Time to Respond (MTTR) | Time from first alert to containment | Under 4 hours for high-severity incidents |
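
Three of these metrics can be computed directly from closed-alert records. The sketch below assumes a hypothetical record shape (`created_at`, `triaged_at`, and an `outcome` of `closed_no_action`, `escalated`, or `actioned`); field names in a real ticketing or SIEM export will differ.

```python
from datetime import datetime, timedelta
from statistics import mean


def soc_metrics(alerts):
    """Compute false positive rate, escalation rate, and mean time to triage."""
    total = len(alerts)
    if total == 0:
        return {"false_positive_rate_pct": 0.0, "escalation_rate_pct": 0.0, "mttt_minutes": 0.0}
    closed_no_action = sum(1 for a in alerts if a["outcome"] == "closed_no_action")
    escalated = sum(1 for a in alerts if a["outcome"] == "escalated")
    triage_minutes = [(a["triaged_at"] - a["created_at"]).total_seconds() / 60 for a in alerts]
    return {
        "false_positive_rate_pct": round(100 * closed_no_action / total, 1),
        "escalation_rate_pct": round(100 * escalated / total, 1),
        "mttt_minutes": round(mean(triage_minutes), 1),
    }


t0 = datetime(2024, 1, 1, 8, 0)
# Hypothetical sample of four alerts.
sample = [
    {"created_at": t0, "triaged_at": t0 + timedelta(minutes=5), "outcome": "closed_no_action"},
    {"created_at": t0, "triaged_at": t0 + timedelta(minutes=10), "outcome": "escalated"},
    {"created_at": t0, "triaged_at": t0 + timedelta(minutes=15), "outcome": "actioned"},
    {"created_at": t0, "triaged_at": t0 + timedelta(minutes=10), "outcome": "actioned"},
]
print(soc_metrics(sample))
```

MTTD and MTTR are left out because they need timestamps for the attacker action and for containment, which usually come from incident records rather than the alert queue.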