Spanning Tree Loops & Broadcast Storms — 'The Network Just Died'
The Call Pattern That Means Spanning Tree
- "Everything was fine, now nothing is working — no warning"
- "Network is down but all switches show green lights and ports are up"
- "Users can ping the switch but cannot reach anything beyond it"
- "Network recovered on its own, then went down again 10 minutes later"
- "We added a new access switch and everything died"
- "The core switch CPU is at 100% — nothing is responding"
Why Spanning Tree Failures Are Total
How a Spanning Tree Loop Forms in Production
Spanning Tree Protocol prevents loops by blocking redundant paths. When STP is working correctly, one path between any two points is active and all others are in blocking state. A loop forms when the blocking is removed — intentionally or not — and two active paths exist between the same switches.
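The blocking described above can be observed directly on a switch. The commands below are a minimal sketch assuming a Cisco IOS Catalyst platform; VLAN 1 is only an example.

```
! List every port STP is currently blocking, per VLAN
show spanning-tree blockedports
! Show the role and state of each port in one VLAN (VLAN 1 is an example)
show spanning-tree vlan 1
! Redundant links normally appear with role "Altn" and state "BLK"
```

Capturing this output while the network is healthy gives a baseline: during an incident, a port that was blocking in the baseline and is now forwarding is the first place to look.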
| Trigger | What Happens | How Common |
|---|---|---|
| New switch added without STP configured | Unmanaged switch creates a second path between two managed switches; STP is not running on the new device | Very common: unmanaged switches bought off the shelf or brought in by users |
| STP disabled 'temporarily' for a migration | Engineer disables STP or enables BPDU filtering on a trunk, then forgets to revert it | Common: temporary changes that stay in place until a failure |
| Trunk link reconnected after physical work | Both ends of a trunk become active before STP converges — loop exists during convergence | Common — physical maintenance windows |
| Rogue BPDU from a new device claiming to be root | New switch sends superior BPDUs, core switch yields root, topology recalculates — old blocking ports become active | Less common but causes immediate network-wide impact |
| Software bug causing BPDU flooding or port state error | Switch fails to properly block a port after STP topology change | Rare but documented in specific switch firmware versions |
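For the rogue-root trigger in the table above, it helps to know which switch is root before and after a change. The following is a hedged sketch for Cisco IOS; the VLAN range and the priority value 4096 are illustrative assumptions, not values from this document.

```
! Show the current root bridge, root cost and root port for every VLAN
show spanning-tree root
!
! On the intended root switch only: set a low priority so it wins the election
! (VLAN range and priority are example values; priority must be a multiple of 4096)
configure terminal
spanning-tree vlan 1-4094 priority 4096
end
```

A low priority alone does not stop a device that advertises an even lower one, which is why Root Guard on downlinks is still needed.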
Identifying a Broadcast Storm — The Debug Sequence
Step 1: Confirm the Symptom at Layer 2
! First check: interface counters.
! A broadcast storm shows up as very high input/output rates on multiple interfaces at once.
show interfaces | include GigabitEthernet|packets input|packets output|broadcasts
! In a normal network, broadcasts are a small fraction of total packets.
! In a storm, broadcasts can be 90-100% of all traffic.
!
! More specific view:
show interfaces GigabitEthernet1/0/1
! Look for (illustrative values for a 1 Gb/s port):
!   Input rate: 1000000000 bits/sec    ← wire-rate input is abnormal
!   Output rate: 1000000000 bits/sec   ← wire-rate output is abnormal
!   Broadcasts: counter rising by ~50000/sec ← massive broadcast rate
!
! Check all interfaces at once for wire-rate traffic:
show interfaces | include line protocol|input rate|output rate
! Any interface near line rate on input AND output is a loop suspect.
!
! Check CPU utilization; a storm can max out the CPU:
show processes cpu sorted | exclude 0.00
! If CPU is at 95-100% and the top process is "Cat4k Mgmt LoPri" or a similar
! spanning-tree/bridging process, a loop is very likely.
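Interface counters are cumulative, so a large broadcast total may only reflect long uptime. One way to measure the current rate, sketched here for Cisco IOS with an example interface name, is to reset the counters and read them again a few seconds later.

```
! Reset counters on the suspect port (interface name is an example; IOS asks for confirmation)
clear counters GigabitEthernet1/0/1
! Wait a few seconds, then read the totals accumulated since the reset
show interfaces GigabitEthernet1/0/1 | include packets input|broadcasts
! Compare the "Received N broadcasts" figure with the "N packets input" figure
```

If nearly all packets received since the reset are broadcasts, the port is carrying storm traffic rather than normal user traffic.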
Step 2: Check Spanning Tree State
! Check STP topology for all VLANs:
show spanning-tree summary
! Shows the number of ports in each state (blocking, listening, learning, forwarding).
! A healthy redundant network has some blocking ports. If ALL ports are forwarding,
! there may be no loop protection on the redundant paths.
!
! Check STP state for a specific VLAN:
show spanning-tree vlan 1
! Shows:
!   Root ID: which switch is the root bridge
!   Bridge ID: this switch's priority and MAC
!   Port roles and states: designated, root, alternate (blocked)
!
! Look for topology changes; a high, fast-rising count accompanies a loop:
show spanning-tree detail | include topology change
! Shows: "Number of topology changes 47 last change occurred 0:00:05 ago"
! High topology change count = loop or instability
! Recent timestamp = active problem
!
! Find the port the last topology change came from:
show spanning-tree detail | include is exec|occurr|from
! The "from <interface>" line names the port that delivered the last TC; this is your suspected loop port
!
! Check whether any port is in an unexpected state:
show spanning-tree vlan 1 detail | include Port|State
! A port that should be blocking but is now forwarding = loop
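A port can also change state because BPDUs stopped arriving on it. The sketch below, assuming Cisco IOS and an example interface name, shows where to read the per-port BPDU counters.

```
! Ports STP has placed in an inconsistent (blocked) state, with the reason
show spanning-tree inconsistentports
! Per-port detail, including BPDUs sent and received (interface is an example)
show spanning-tree interface GigabitEthernet1/0/24 detail
! Look at the "BPDU: sent N, received N" line and run the command twice
```

On a root or alternate port the received counter should keep increasing between runs; a counter that has stalled suggests BPDUs are being lost on that link.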
Step 3: Identify the Loop Port
! Find which MAC addresses are flapping between ports
! (the same MAC appearing on multiple ports in one VLAN = loop)
show mac address-table dynamic
! If the same MAC address appears on multiple interfaces = loop confirmed
!
! Look for MAC address instability:
show mac address-table count
! An unusually high or rapidly changing MAC count can indicate MAC thrashing
!
! On Cisco Catalyst, check for ports that have been error-disabled:
show interfaces status err-disabled
! Ports may have been error-disabled by storm control or BPDU Guard
!
! Check the log for STP and MAC flap events (include is case-sensitive):
show logging | include SPANTREE|MACFLAP|PSECURE
! Look for:
! "%SPANTREE-2-LOOPGUARD_BLOCK" = Loop Guard activated
! "%SPANTREE-2-RECV_PVID_ERR" = native VLAN mismatch on trunk
! "%SPANTREE-7-TOPOTCHANGE" = topology change events
! "%SW_MATM-4-MACFLAP_NOTIF" = a MAC address is flapping between two ports
! "%PORT_SECURITY-2-PSECURE_VIOLATION" = port security violation (unexpected MAC on a secured port)
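Once a flapping MAC is known, it can be followed port by port. This is a minimal sketch for Cisco IOS; the MAC address shown is a placeholder, not a value from this document.

```
! Look up one MAC reported as flapping (address is a placeholder)
show mac address-table address 0011.2233.4455
! Repeat the lookup a few times and note which port is listed each time
!
! The flap notifications name both ports directly
show logging | include MACFLAP
```

If the port listed for the MAC changes between runs, the ports it alternates between are on the loop path, and shutting one of them is a reasonable first containment step.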
Emergency Response — Stopping the Storm
During a live broadcast storm, the priority is stopping the traffic flood before investigating root cause. The network is unusable until the loop is broken.
! Option 1: Shut down the suspected loop port immediately.
! This stops the storm; investigate after connectivity is restored.
configure terminal
interface GigabitEthernet1/0/24
 shutdown
! If the network recovers immediately, this port was in the loop path.
!
! Option 2: Isolate the access switch that was recently added.
! Physically disconnect it or shut the uplink port facing it.
! The network should recover within about 30 seconds of STP reconvergence.
!
! Option 3: If you cannot identify the port, shut down all recently changed ports.
! Start with the most recently added/changed connection and work backward.
!
! After the storm is stopped, and before re-enabling the port,
! apply storm control and edge-port protection to prevent recurrence:
interface GigabitEthernet1/0/24
 storm-control broadcast level 20
 storm-control action shutdown
 spanning-tree portfast
 spanning-tree bpduguard enable
! bpduguard: err-disables the port if a BPDU arrives on it (an STP-speaking switch is attached)
! portfast: skips listening/learning states (safe for endpoint-facing access ports only)
!
! Re-enable the port after protection is configured:
 no shutdown
end
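After applying the protections, it is worth confirming they are active. The commands below are a sketch assuming Cisco IOS; the interface name matches the example above and is not a recommendation.

```
! Ports currently err-disabled, with the cause
show interfaces status err-disabled
! Which err-disable causes recover automatically, and the recovery timer
show errdisable recovery
! Storm-control thresholds and current traffic level on the port
show storm-control GigabitEthernet1/0/24 broadcast
```

A port that lands in err-disabled shortly after being re-enabled means the protection fired, so the loop source is still attached to that port.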
BPDU Guard — The Right Default for Access Ports
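One way to make BPDU Guard the default, sketched here for Cisco IOS, is to enable it globally for every PortFast port. The 300-second recovery interval is an illustrative assumption, and some newer releases spell the first command `spanning-tree portfast edge bpduguard default`.

```
configure terminal
! Enable BPDU Guard on every port that has PortFast enabled
spanning-tree portfast bpduguard default
! Optional: bring err-disabled ports back automatically (interval is an example value)
errdisable recovery cause bpduguard
errdisable recovery interval 300
end
! Verify that the default is shown as enabled
show spanning-tree summary
```

Automatic recovery is a trade-off: it saves a manual `shutdown` / `no shutdown`, but a port with a switch still attached will keep cycling in and out of err-disabled until someone removes the device.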
Root Cause Patterns After Recovery
The Unmanaged Switch Problem
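The most common root cause in enterprise networks: a user or technician adds an inexpensive unmanaged switch to expand ports at a desk or in a server room. Unmanaged switches do not run STP. When one is connected with two cables, one to each of two managed switch ports, it creates a second path that the managed switches were never designed around. Many unmanaged switches pass BPDUs through, in which case STP can still block one of the ports. If the device drops BPDUs, or the managed ports filter them, STP on the managed switches cannot detect the loop, and it stays invisible until the storm starts.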
- Check LLDP/CDP neighbors on all ports — an unmanaged switch will not appear in CDP/LLDP neighbor tables
- Any port with a high MAC count that is not a trunk = suspect (multiple MACs on an access port)
- Check for duplicate MAC addresses appearing on separate interfaces simultaneously
- Physical walkthrough of recently changed areas — look for power strips with built-in network switches
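The first two checks in the list above can be run per port. This is a minimal sketch for Cisco IOS; the interface name is an example.

```
! MAC addresses learned on one access port (interface is an example)
show mac address-table interface GigabitEthernet1/0/12
! Does anything on that port identify itself via CDP or LLDP?
show cdp neighbors GigabitEthernet1/0/12
show lldp neighbors
```

Several MACs on an access port with no CDP or LLDP neighbor is consistent with an unmanaged switch, although a hypervisor host or an IP phone with a PC behind it can look similar.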
The Trunk Mismatch Trigger
! A native VLAN mismatch on a trunk can cause spanning tree issues:
! if a VLAN is native (untagged) on one side but tagged on the other, STP BPDUs land in the wrong VLAN.
show interfaces trunk
! Verify: allowed VLAN lists match on both sides of each trunk
! Verify: native VLAN is the same on both ends of each trunk
!
! Check for ports STP has blocked as inconsistent (for example, a PVID inconsistency):
show spanning-tree inconsistentports
! Or check what native VLAN each CDP neighbor reports:
show cdp neighbors detail | include Device ID|Native VLAN
!
! Native VLAN mismatch fix:
configure terminal
interface GigabitEthernet1/0/1
 switchport trunk native vlan 10
! Must match on both ends of the trunk
Prevention — Spanning Tree Hardening
| Protection | Purpose | Apply To |
|---|---|---|
| BPDU Guard | Err-disables port if BPDU received — prevents user-connected switches from participating in STP | All access ports (endpoints, phones, printers) |
| Root Guard | Prevents a port from becoming root port — protects root bridge selection from rogue BPDUs | All distribution and core switch downlinks |
| Storm Control | Limits broadcast/multicast/unknown-unicast rate — contains storm before it takes down the network | All switch ports, especially access ports |
| Loop Guard | Prevents a root, alternate, or backup port from becoming designated (forwarding) when BPDUs stop arriving; protects against unidirectional link failures | All STP redundant uplinks |
| UDLD | Detects unidirectional links, typically fiber where one strand has failed: the link stays up and STP treats it as healthy, but traffic flows in only one direction | All fiber uplinks between switches |
STP hardening belongs in the baseline configuration of every switch; it is not reactive troubleshooting. A network that has never had a broadcast storm has usually had these protections applied from the beginning. A network with periodic unexplained outages has usually skipped them. Apply BPDU Guard and Root Guard as part of initial switch configuration, not after the first storm.
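The uplink and downlink protections from the table can be sketched as a baseline fragment. This assumes Cisco IOS; the interface name is a placeholder, and the choice of global defaults versus per-port commands is an assumption to adapt per platform.

```
configure terminal
! Loop Guard on all point-to-point STP links by default
spanning-tree loopguard default
! UDLD on all fiber ports
udld enable
!
! Downlink from distribution/core toward access (interface is an example)
interface GigabitEthernet1/0/1
 spanning-tree guard root
end
```

Root Guard and Loop Guard are not meant to run on the same port, so Root Guard is applied per interface on downlinks while Loop Guard covers the remaining uplinks through the global default.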