Spanning Tree Loops & Broadcast Storms — 'The Network Just Died'

The Call Pattern That Means Spanning Tree

"Everything was fine, now nothing is working — no warning"
"Network is down but all switches show green lights and ports are up"
"Users can ping the switch but cannot reach anything beyond it"
"Network recovered on its own, then went down again 10 minutes later"
"We added a new access switch and everything died"
"The core switch CPU is at 100% — nothing is responding"

Why Spanning Tree Failures Are Total

A broadcast storm does not degrade gracefully. Ethernet frames carry no TTL, so a looped broadcast is never discarded: every pass around the loop floods it out of every port again, and traffic multiplies until the links saturate. Within seconds of a loop forming, broadcast traffic fills every port. Switches spend all CPU handling broadcast frames and have nothing left to process management traffic, routing protocols, or user data. The network appears completely dead while every interface shows as up and passing traffic.

How a Spanning Tree Loop Forms in Production

Spanning Tree Protocol prevents loops by blocking redundant paths. When STP is working correctly, one path between any two points is active and all others are in blocking state. A loop forms when the blocking is removed — intentionally or not — and two active paths exist between the same switches.

Trigger	What Happens	How Common
New switch added without STP configured	Unmanaged switch creates a second path between two managed switches — STP not running on the new device	Very common — unmanaged switches from IT stores or user-brought-in equipment
STP disabled 'temporarily' for a migration	Engineer disables STP, or a guard feature such as BPDU Guard, on a trunk and forgets to re-enable it	Common: temporary changes that become permanent until a failure
Trunk link reconnected after physical work	Both ends of a trunk become active before STP converges — loop exists during convergence	Common — physical maintenance windows
Rogue BPDU from a new device claiming to be root	New switch sends superior BPDUs, core switch yields root, topology recalculates — old blocking ports become active	Less common but causes immediate network-wide impact
Software bug causing BPDU flooding or port state error	Switch fails to properly block a port after STP topology change	Rare but documented in specific switch firmware versions

Identifying a Broadcast Storm — The Debug Sequence

Step 1: Confirm the Symptom at Layer 2

cisco-ios

! First thing to check — interface counters
! A broadcast storm shows up as massive input/output rates on multiple interfaces simultaneously

show interfaces | include GigabitEthernet|packets input|packets output|broadcasts
! In a normal network: broadcast counts are a small fraction of total packets
! In a storm: broadcasts can be 90-100% of all traffic

! More specific view:
show interface GigabitEthernet1/0/1
! Look for (values illustrative, on a 1 Gbps port):
!   5 minute input rate 1000000000 bits/sec   ← sustained wire-rate input is abnormal
!   5 minute output rate 1000000000 bits/sec  ← sustained wire-rate output is abnormal
!   Received N broadcasts                     ← counter climbing by tens of thousands per second

! Check all interfaces at once for wire-rate traffic:
show interfaces | include line protocol|input rate|output rate
! Any interface showing near line-rate on input AND output = loop suspect

! Check CPU utilization — a storm maxes out CPU:
show processes cpu sorted | exclude 0.00
! If CPU is at 95-100% and the top process is "Cat4k Mgmt LoPri" or similar
! spanning tree/bridging process = almost certainly a loop
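
The rate counters default to a 5-minute average, so they lag the start of a storm. A minimal sketch of getting a fresher reading follows. It assumes GigabitEthernet1/0/1 is a suspect port (the name is a placeholder) and that a small config change is acceptable during the incident.

cisco-ios

! Assumption: GigabitEthernet1/0/1 is a placeholder for the suspect port
! Shorten the rate-averaging window from the default 5 minutes to 30 seconds
configure terminal
interface GigabitEthernet1/0/1
  load-interval 30
end

! Zero the counters (IOS asks for confirmation), wait a few seconds, then re-read
clear counters GigabitEthernet1/0/1
show interfaces GigabitEthernet1/0/1 | include rate|broadcasts
! A broadcast count that jumps by a large amount between two reads points to a live storm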

Step 2: Check Spanning Tree State

cisco-ios

! Check STP topology for all VLANs
show spanning-tree summary
! Shows: number of ports in each state (forwarding, blocking, listening, learning)
! A healthy network has some blocking ports — if ALL ports are forwarding, there may be no redundancy protection

! Check specific VLAN STP state
show spanning-tree vlan 1
! Shows:
! Root ID — who is the root bridge
! Bridge ID — this switch's priority and MAC
! Port states — which ports are designated, root, alternate (blocked)

! Look for topology changes — these indicate the storm has started:
show spanning-tree detail | include topology change
! Shows: "Number of topology changes 47 last change occurred 0:00:05 ago"
! High topology change count = loop or instability
! Recent timestamp = active problem

! Find ports causing topology changes:
show spanning-tree detail | include ieee|occurred|from
! Identifies which port is causing TCs — this is your suspected loop port

! Check if any port is in an unexpected state:
show spanning-tree vlan 1 detail | include Port|State
! A port that should be blocking but is now forwarding = loop
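
For a view across every VLAN rather than one at a time, the commands below may help. This is a sketch only; the output layout differs by platform and software release.

cisco-ios

! Root bridge, root port, and root path cost for every VLAN on one screen
! An unexpected root ID here suggests a rogue switch has taken over as root
show spanning-tree root

! Ports currently held in blocking state, per VLAN
show spanning-tree blockedports

! Ports STP has put in an inconsistent state (root guard, loop guard, PVID or type mismatch)
show spanning-tree inconsistentports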

Step 3: Identify the Loop Port

cisco-ios

! Find which MAC addresses are flapping between ports
! (same MAC moving between ports in the same VLAN = loop)
show mac address-table dynamic
! Run it several times: the table holds one port per MAC per VLAN,
! so a MAC whose port changes from run to run = loop confirmed

! More targeted — look for MAC address instability:
show mac address-table count
! Unusually high MAC count can indicate MAC thrashing

! On Cisco Catalyst, check whether a protection has already fired:
show interfaces status err-disabled
! Ports may have been error-disabled by storm-control or BPDU guard

! Check log for STP and MAC flap events (include is case-sensitive):
show logging | include SPANTREE|MACFLAP|PSECURE|BPDU
! Look for:
! "%SPANTREE-2-LOOPGUARD_BLOCK" = loop guard activated
! "%SPANTREE-2-RECV_PVID_ERR" = native VLAN mismatch on trunk
! "%SPANTREE-7-TOPOTCHANGE" = topology change events
! "%SW_MATM-4-MACFLAP_NOTIF" = same MAC flapping between two ports
! "%PORT_SECURITY-2-PSECURE_VIOLATION" = port-security violation (unexpected MAC on a secured port)

Emergency Response — Stopping the Storm

During a live broadcast storm, the priority is stopping the traffic flood before investigating root cause. The network is unusable until the loop is broken.

cisco-ios

! Option 1: Shut down suspected loop port immediately
! This stops the storm — you can investigate after connectivity is restored
configure terminal
interface GigabitEthernet1/0/24
  shutdown
! If network recovers immediately after this: this port was the loop source

! Option 2: Shut down the access switch that was recently added
! Physically disconnect or shut the uplink port to the new switch
! Network should recover within 30 seconds of STP reconvergence

! Option 3: If you cannot identify the port — shutdown all recently changed ports
! Start with the most recently added/changed connection and work backward

! After the storm is stopped, before re-enabling the port:
! Apply storm control to prevent recurrence:
interface GigabitEthernet1/0/24
  storm-control broadcast level 20
  storm-control action shutdown
  ! The next two lines are for endpoint-facing access ports only; omit them on a switch uplink
  spanning-tree portfast
  spanning-tree bpduguard enable
! storm-control: err-disables the port when broadcasts exceed 20% of port bandwidth
! bpduguard: err-disables the port as soon as it receives a BPDU (a switch is attached)
! portfast: skip listening/learning states for access ports (safe for endpoint ports only)

! Re-enable port after protection is configured:
interface GigabitEthernet1/0/24
  no shutdown
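
Once the port is back up, it is worth confirming that the protections took effect. The sketch below reuses GigabitEthernet1/0/24 from the example above; exact output varies by platform.

cisco-ios

! Storm-control thresholds and the current broadcast level on the port
show storm-control GigabitEthernet1/0/24 broadcast

! PortFast and BPDU Guard state for the port
show spanning-tree interface GigabitEthernet1/0/24 detail

! Confirm nothing is still err-disabled
show interfaces status err-disabled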

BPDU Guard — The Right Default for Access Ports

Every access switch port connected to an endpoint (workstation, phone, printer) should have BPDU Guard enabled. If a user connects a switch that sends BPDUs, or an unmanaged switch that passes BPDUs back from another port, BPDU Guard err-disables the port as soon as the first BPDU arrives. This prevents the most common cause of broadcast storms: switches added without IT knowledge. The port can be re-enabled after the rogue device is removed. Enable BPDU Guard globally in every access switch configuration, not port-by-port.
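
A minimal sketch of the global form follows. The 300-second recovery interval is an assumed example value, not a recommendation from this module. Newer software releases spell the first two commands with an extra `edge` keyword (`spanning-tree portfast edge default`, `spanning-tree portfast edge bpduguard default`), so check which form the platform accepts.

cisco-ios

configure terminal
! Treat every non-trunking port as a PortFast port by default
spanning-tree portfast default
! Enable BPDU Guard on every port that is in PortFast state
spanning-tree portfast bpduguard default
! Optional (assumption: 300 s suits the site): auto-recover ports err-disabled by BPDU Guard
errdisable recovery cause bpduguard
errdisable recovery interval 300
end

! Verify the global defaults
show spanning-tree summary

Without the errdisable recovery lines, an err-disabled port stays down until someone issues shutdown followed by no shutdown on it.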

Root Cause Patterns After Recovery

The Unmanaged Switch Problem

The most common root cause in enterprise networks: a user or technician adds an inexpensive unmanaged switch to expand ports at a desk or in a server room. Unmanaged switches do not run STP. When one is connected with two cables, one to each of two managed switch ports, it creates a loop. If the device passes BPDUs through, the managed switches can still block one of the two ports. If it drops BPDUs, or BPDU filtering is enabled on those ports, STP on the managed switches cannot detect the loop, and it stays invisible until the storm starts.

Check LLDP/CDP neighbors on all ports — an unmanaged switch will not appear in CDP/LLDP neighbor tables
Any port with a high MAC count that is not a trunk = suspect (multiple MACs on an access port)
Check for the same MAC address moving between separate interfaces (MAC flap log messages, or a changing port in the MAC table)
Physical walkthrough of recently changed areas — look for power strips with built-in network switches
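
The first three checks above might be run as follows. This is a sketch: GigabitEthernet1/0/24 and the MAC address are placeholders, not values from a real network.

cisco-ios

! Assumption: GigabitEthernet1/0/24 is a placeholder access port, 0011.2233.4455 a placeholder MAC
! Neighbors that identify themselves; an unmanaged switch will be absent from both lists
show cdp neighbors
show lldp neighbors

! MACs learned on one access port; more than a PC plus a phone suggests a hidden switch
show mac address-table interface GigabitEthernet1/0/24

! Port on which a given MAC is currently learned; repeat to see whether it moves
show mac address-table address 0011.2233.4455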

The Trunk Mismatch Trigger

cisco-ios

! A trunk native VLAN mismatch can cause spanning tree issues:
! If a VLAN is native (untagged) on one side but tagged on the other, BPDUs land in the wrong VLAN

show interfaces trunk
! Verify: VLAN lists match on both sides of each trunk
! Verify: native VLAN is same on both ends of each trunk

! Check for native VLAN mismatch specifically (ports blocked for a PVID mismatch are listed here):
show spanning-tree inconsistentports
! Or check CDP for mismatch warnings:
show cdp neighbors detail | include Native VLAN

! Native VLAN mismatch fix:
configure terminal
interface GigabitEthernet1/0/1
  switchport trunk native vlan 10
! Must match on both ends of the trunk
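
After the change, the trunk can be re-checked from one end. The sketch below reuses GigabitEthernet1/0/1 from the example above; run the same commands on the far-end switch and compare.

cisco-ios

! Operational trunk settings for this port, including the native VLAN
show interfaces GigabitEthernet1/0/1 switchport

! Native VLAN and allowed/active VLAN lists for this trunk only
show interfaces GigabitEthernet1/0/1 trunk

! Native VLAN advertised by the neighbor over CDP (requires CDP on both ends)
show cdp neighbors GigabitEthernet1/0/1 detail | include Native VLAN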

Prevention — Spanning Tree Hardening

Protection	Purpose	Apply To
BPDU Guard	Err-disables port if BPDU received — prevents user-connected switches from participating in STP	All access ports (endpoints, phones, printers)
Root Guard	Prevents a port from becoming root port — protects root bridge selection from rogue BPDUs	All distribution and core switch downlinks
Storm Control	Limits broadcast/multicast/unknown-unicast rate — contains storm before it takes down the network	All switch ports, especially access ports
Loop Guard	Prevents alternate/backup port from becoming designated if BPDUs stop — protects against unidirectional link failures	All STP redundant uplinks
UDLD	Detects unidirectional fiber links where one strand works and STP sees the link as up but traffic only flows one way	All fiber uplinks between switches
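
A minimal sketch of how the protections in the table might be applied on a Catalyst switch follows. The interface names and the 20% threshold are assumptions for illustration. Root Guard and Loop Guard cannot both be active on one port, so the interface-level Root Guard setting takes the place of the global Loop Guard default on that port.

cisco-ios

configure terminal
! Loop Guard on all point-to-point STP ports by default
spanning-tree loopguard default
! UDLD in normal mode on all fiber ports
udld enable

! Assumption: GigabitEthernet1/0/1 is a distribution downlink toward an access switch
interface GigabitEthernet1/0/1
  spanning-tree guard root

! Assumption: GigabitEthernet1/0/10 is an endpoint-facing access port; 20% is an example threshold
interface GigabitEthernet1/0/10
  storm-control broadcast level 20
  storm-control action shutdown
  spanning-tree bpduguard enable
end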

STP hardening is a one-time configuration applied globally, not reactive troubleshooting. A network that has never had a broadcast storm has usually had these protections applied from the beginning. A network that has periodic unexplained outages has usually skipped them. Apply BPDU Guard and Root Guard as part of initial switch configuration, not after the first storm.
