QoS Misconfiguration & Interface Flapping — 'VoIP Breaks When Busy'
Two Failure Modes, One Shift
QoS misconfiguration and interface flapping often appear together during high-traffic events. A link that flaps under load looks like a hardware problem, but the cause may be configuration or environmental. VoIP that drops only during business hours looks like a capacity problem, but the cause is often a queue configuration that was never validated under real load. Typical reports:
- "VoIP calls are crystal clear in the morning, choppy and dropping after 9am"
- "Interface going up/down cycling — once every few minutes"
- "Video conferencing is degraded but file transfers are fine — same time of day"
- "We upgraded bandwidth but VoIP is still dropping at peak"
- "Link was stable for months, now cycling since the firmware update"
- "Specific site loses connectivity periodically — no pattern we can find"
QoS Misconfiguration — Why It Only Breaks Under Load
QoS queuing is invisible when bandwidth is available. When the link is not congested, every packet is forwarded immediately regardless of queue configuration. The moment utilization approaches capacity, queuing decisions start to matter. A bad QoS configuration can have no visible impact at 30% utilization and a catastrophic impact at 85%.
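To put rough numbers on this, the sketch below estimates how long a voice packet waits when it shares a single FIFO queue with data. It is a simplified model, not a measurement of any real device: the function name, the packet sizes, and the link speed are illustrative assumptions.

```python
def fifo_wait_ms(bytes_ahead: int, link_bps: int) -> float:
    """Time (ms) a packet waits while bytes_ahead are serialized onto the link.

    Assumes one FIFO queue and a constant link rate (a simplification).
    """
    return bytes_ahead * 8 * 1000 / link_bps


# Uncongested link: nothing queued ahead of the voice packet.
idle_wait = fifo_wait_ms(0, 10_000_000)

# Congested 10 Mbps link: ten 1500-byte data packets queued ahead.
busy_wait = fifo_wait_ms(10 * 1500, 10_000_000)

print(f"idle: {idle_wait:.1f} ms, congested: {busy_wait:.1f} ms")
```

With an empty queue the wait is zero whatever the queue configuration, which is why a bad policy goes unnoticed until the link fills.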
The QoS Validation Gap
Understanding Where Voice Fails
| QoS Condition | Effect on VoIP | What You Hear |
|---|---|---|
| No QoS — all traffic in one queue | VoIP packets wait behind large data transfers | Choppy audio, one-way audio, dropped calls under load |
| Wrong DSCP marking — voice marked as best-effort | VoIP gets no priority treatment even if QoS is configured | Same as no QoS — voice suffers when data is competing |
| Correct DSCP but wrong queue — voice in wrong priority queue | Voice gets queued but wrong priority level — inconsistent behavior | Audio is OK at low load, degrades under moderate load |
| Priority queue starving data queues (over-provisioned voice) | Data traffic cannot get through — users report slow file transfers alongside working voice | VoIP works, everything else slows dramatically |
| Jitter buffer exhaustion | Packets arrive with variable delay — buffer cannot absorb it | Robotic or warbling audio — the classic VoIP quality complaint |
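When checking markings in a packet capture, note that many tools show the full ToS/Traffic Class byte rather than the DSCP value. The sketch below converts between the two for the standard code points EF, AF41, and CS3. The dictionary and function names are illustrative, not part of any tool.

```python
# Standard DSCP code points (decimal) for common voice/video classes.
DSCP = {"default": 0, "cs3": 24, "af41": 34, "ef": 46}


def tos_byte(dscp: int) -> int:
    """DSCP occupies the upper 6 bits of the ToS byte; ECN is the lower 2."""
    return dscp << 2


def dscp_from_tos(tos: int) -> int:
    """Recover the DSCP value from a ToS byte seen in a capture."""
    return tos >> 2


for name, value in DSCP.items():
    print(f"{name:8s} dscp={value:2d} tos=0x{tos_byte(value):02X}")
```

Under this mapping, correctly marked voice shows a ToS byte of 0xB8, while voice that has been re-marked to best-effort shows 0x00.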
Debug Sequence — Finding QoS Failures
! Step 1: Check current queue statistics (the most important command)
show policy-map interface GigabitEthernet0/0
! Shows for each class:
!   Class-map: VOICE
!     packets: 50000
!     bytes: 8000000
!     rate: 1000000 bps
!     Match: dscp ef (46)
!     Queueing: Strict Priority
!     queue limit: 64 packets
!     (output) queue drops: 0   ← this is what matters
!     (tail) drop: 0            ← drops = voice is being discarded
! In a voice quality problem: look for queue drops in the VOICE class
! If drops are non-zero: VoIP packets are being discarded = quality issues

! Step 2: Check DSCP markings on actual voice traffic
! Voice should be marked DSCP EF (Expedited Forwarding = decimal 46)
show policy-map interface | include dscp|ef|class VOICE
! To verify live traffic is marked correctly, use NBAR or debug:
show ip nbar protocol-discovery
! Shows what traffic NBAR is classifying; verify voice protocols are being identified

! Step 3: Check interface utilization at peak times
show interface GigabitEthernet0/0 | include input rate|output rate
! If output rate is near line rate when VoIP quality degrades:
! Congestion is confirmed, and QoS configuration is what determines voice behavior here

! Step 4: Check queue depths during congestion
! Poll this command during peak load:
show queueing interface GigabitEthernet0/0
! Look for: queue depth increasing for specific classes
! A queue that is constantly at its maximum depth = packets being dropped from that class

! Step 5: Check if trust is configured properly
show mls qos interface GigabitEthernet0/0
! Check: trust dscp is enabled (not trust cos or untrusted)
! If interface is "untrusted": all DSCP markings from devices are re-marked to 0 = no QoS
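Steps 1 and 3 are most useful when read together across two polls taken a few minutes apart. The sketch below combines the output rate with the change in the VOICE class drop counter. The function name, the return labels, and the 85% congestion threshold are assumptions chosen for illustration; the inputs are values you would read manually from the commands above.

```python
def classify_voice_sample(output_rate_bps: int, line_rate_bps: int,
                          voice_drops_before: int, voice_drops_after: int,
                          congestion_threshold: float = 0.85) -> str:
    """Combine link utilization with the VOICE drop-counter delta."""
    utilization = output_rate_bps / line_rate_bps
    new_drops = voice_drops_after - voice_drops_before
    congested = utilization >= congestion_threshold

    if new_drops > 0 and congested:
        return "voice dropped under congestion"
    if new_drops > 0:
        return "voice dropped without congestion"
    if congested:
        return "congested, voice protected"
    return "no congestion"


# Hypothetical peak-hour sample on a 1 Gbps link.
print(classify_voice_sample(920_000_000, 1_000_000_000, 0, 140))
```

Drops that increase while the link is far from line rate point away from simple congestion, for example toward a policer or an undersized priority queue, and are worth investigating separately.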
Fixing QoS for Voice — What Actually Works
! Correct QoS policy structure for voice traffic:
! Step 1: Create class map that matches voice DSCP marking
class-map match-any VOICE
match dscp ef
! ef = Expedited Forwarding, DSCP 46 — standard voice marking
class-map match-any VIDEO
match dscp af41
! af41 = Assured Forwarding — video conferencing
class-map match-any SIGNALING
match dscp cs3
! cs3 = Class Selector 3 — call signaling (SIP, H.323)
! Step 2: Create policy that prioritizes voice
policy-map QOS-POLICY
class VOICE
priority percent 20
! Reserve 20% of bandwidth for voice, always served first
! Do not exceed 33% — above this, data starvation occurs
class VIDEO
bandwidth percent 30
! Guaranteed minimum for video
class SIGNALING
bandwidth percent 5
class class-default
fair-queue
! Remaining bandwidth shared fairly
! Step 3: Apply policy to the WAN/uplink interface
interface GigabitEthernet0/0
service-policy output QOS-POLICY
! Always apply on output — this is where congestion occurs
! Step 4: Configure IP phones to mark their own traffic
! On the switch port connected to a Cisco IP phone:
interface FastEthernet1/0/10
mls qos trust dscp
! Trust the DSCP markings coming from the phone
! Without this, the phone's markings are ignored
Interface Flapping — The Link That Keeps Cycling
An interface that cycles up and down — flapping — is one of the most disruptive network events. Every flap triggers routing protocol reconvergence, breaks active sessions through the interface, and can cause BGP or OSPF to drop adjacencies if flapping is fast enough.
Identifying and Categorizing the Flap
! Check carrier transitions (times the link physically went up/down):
show interface GigabitEthernet0/0 | include carrier transitions|line protocol

! Check the log for flap events:
show log | include changed state|line protocol
! Look for:
!   "GigabitEthernet0/0 changed state to down"
!   "GigabitEthernet0/0 changed state to up"
!   Timestamp pattern: regular interval vs random = different causes

! Check error counters on the flapping interface:
show interface GigabitEthernet0/0
! Look for:
!   Input errors: high CRC or frame errors = physical/cabling problem
!   Output errors: drops or queue failures = bandwidth or hardware issue
!   Runts/Giants: frame size errors = duplex mismatch or hardware fault

! Check for duplex/speed mismatch:
show interface GigabitEthernet0/0 | include duplex|speed
! "Half-duplex" on a GigabitEthernet = forced half or auto-negotiation failure
! Duplex mismatch causes errors and eventual flapping under load
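One way to make the "regular interval vs random" judgment less subjective is to measure how much the gaps between flap events vary. The sketch below takes "changed state to down" timestamps, converted to seconds, and labels the pattern using the coefficient of variation. The function name and the 0.25 cutoff are assumptions for illustration, not an established threshold.

```python
from statistics import mean, pstdev


def flap_pattern(down_times_s: list[float], cv_threshold: float = 0.25) -> str:
    """Label flap timing as regular or irregular from event timestamps."""
    if len(down_times_s) < 3:
        return "insufficient data"
    ordered = sorted(down_times_s)
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(intervals)
    if average <= 0:
        return "insufficient data"
    # Coefficient of variation: spread of the gaps relative to their mean.
    variation = pstdev(intervals) / average
    return "regular" if variation <= cv_threshold else "irregular"


print(flap_pattern([0, 180, 360, 545]))        # gaps of about three minutes
print(flap_pattern([0, 20, 400, 430, 1500]))   # no consistent gap
```

Regular spacing suggests a timer-driven or periodic cause; irregular spacing fits physical or environmental causes better. Either result is only a hint to be confirmed with the tests in the table below.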
Root Cause Patterns for Interface Flapping
| Cause | Log Pattern | Confirming Test |
|---|---|---|
| Physical cable fault — bad connector, damaged fiber, kink | Flapping correlates with physical movement (HVAC, building vibration) | Replace cable — if flapping stops immediately, cable was the cause |
| SFP/transceiver failure | Flapping on fiber link, optics show low receive power | show interface transceiver — check Rx power levels against threshold |
| Duplex mismatch | Many CRC errors, half-duplex detected, errors increase with load | Configure both sides identically: both auto-negotiating, or both hard-coded to the same speed/duplex. Never mix auto on one side with a fixed setting on the other |
| Power instability on PoE port | Flapping correlates with power consumption events (phone call initiation) | show power inline — check power draw vs budget. Increase PoE budget or move to dedicated circuit |
| Keepalive failure (routing protocol) | Interface stays physically up but protocol goes down — loopback issue or BFD problem | show interface — line protocol down while hardware is up = keepalive timeout |
The Optic Power Check — Fiber Interface Diagnosis
! Check optical signal strength: the most specific test for fiber link flapping
show interface GigabitEthernet0/0 transceiver
! Shows:
!   Tx Power: -2.5 dBm    ← transmit power (what this end is sending)
!   Rx Power: -12.8 dBm   ← receive power (what this end is receiving)
!   Temperature, Voltage, Current

! Interpreting Rx power:
!   Typical SFP operating range: -3 dBm to -20 dBm (varies by optic type)
!   Above -3 dBm = too much light = possible cause of errors
!   Below -20 dBm = too little light = cable or SFP failure
!   Near -20 dBm and fluctuating = intermittent fiber, will cause flapping

! If Rx power is near threshold, check the fiber path:
!   Clean the SFP ferrule and patch cable connector (fiber contamination is common)
!   Test with a known-good fiber patch cable
!   Test with a known-good SFP from spare inventory

! Check optical DOM (Digital Optical Monitoring) thresholds:
show interface transceiver detail
! Shows alarm thresholds: compare current values to alarm levels
! If current Rx is within 2 dB of an alarm threshold: link is marginal and likely to flap
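The interpretation rules above can be written as a small check. In the sketch below, the default thresholds of -3 dBm and -20 dBm and the 2 dB margin are taken from the text as placeholders; in practice the alarm levels should come from the DOM output of the installed optic. The function name and return labels are illustrative.

```python
def rx_power_status(rx_dbm: float,
                    low_alarm_dbm: float = -20.0,
                    high_alarm_dbm: float = -3.0,
                    margin_db: float = 2.0) -> str:
    """Classify a receive-power reading against alarm thresholds."""
    if rx_dbm < low_alarm_dbm:
        return "too low"
    if rx_dbm > high_alarm_dbm:
        return "too high"
    # Inside the operating range but close to either alarm level.
    if rx_dbm - low_alarm_dbm <= margin_db or high_alarm_dbm - rx_dbm <= margin_db:
        return "marginal"
    return "ok"


for reading in (-12.8, -18.5, -21.0, -2.0):
    print(reading, rx_power_status(reading))
```

A single reading is only a snapshot. Because a fluctuating value near the low threshold is the pattern associated with flapping, it helps to sample the reading several times before drawing a conclusion.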
Damping Interface Flaps at the Routing Protocol Level
While the physical issue is being investigated, interface dampening keeps the flapping interface from destabilizing the entire routing domain. This is a temporary measure, not a substitute for fixing the physical cause.
! Configure interface dampening to reduce routing protocol impact:
interface GigabitEthernet0/0
dampening
! Default dampening values (IP event dampening):
!   Half-life: 5 seconds (penalty decays by half every 5 seconds when link is stable)
!   Reuse threshold: 1000 (interface is unsuppressed when penalty drops below this)
!   Suppress threshold: 2000 (interface is suppressed when penalty exceeds this)
!   Max suppress time: 20 seconds

! Custom dampening for more aggressive suppression
! Argument order: half-life, reuse threshold, suppress threshold, max suppress time
interface GigabitEthernet0/0
dampening 5 750 2000 20

! View current dampening status:
show interface GigabitEthernet0/0 dampening
! Shows: penalty value, flap count, suppressed status

! On BGP, route dampening suppresses flapping prefixes:
! (in router bgp config)
bgp dampening
! Similar effect: prefixes that flap repeatedly are suppressed
! This is separate from interface dampening
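The half-life and reuse threshold together determine how long a suppressed interface stays suppressed once the link is stable. The sketch below models the exponential decay described above. The function names are illustrative, time is expressed in whatever unit the half-life uses, and the model ignores the maximum suppress time cap.

```python
import math


def decayed_penalty(penalty: float, elapsed: float, half_life: float) -> float:
    """Penalty remaining after a stable period; halves every half_life."""
    return penalty * 0.5 ** (elapsed / half_life)


def time_until_reuse(penalty: float, reuse_threshold: float,
                     half_life: float) -> float:
    """Stable time needed for the penalty to decay to the reuse threshold."""
    if penalty <= reuse_threshold:
        return 0.0
    return half_life * math.log2(penalty / reuse_threshold)


# Hypothetical case: accumulated penalty of 3000, reuse threshold 750,
# half-life 5. The penalty must halve twice, so the wait is two half-lives.
print(time_until_reuse(3000, 750, 5))
```

Lowering the reuse threshold or lengthening the half-life both extend the suppression period, which is the trade-off behind a "more aggressive" custom setting.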
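Every flap resets any OSPF adjacency formed over the link. An outage longer than the OSPF dead timer (40 seconds by default on broadcast and point-to-point networks) also drops neighbors that cannot see the link failure directly, for example across a Layer 2 switch. BGP hold time defaults to 180 seconds on Cisco IOS (90 seconds on some other platforms), so a multihop session may survive a short flap, but sessions to directly connected eBGP peers typically reset as soon as the interface goes down. Apply dampening as soon as flapping is confirmed, then investigate and fix the physical cause. A flapping interface that carries critical traffic is a major outage risk.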
Course Summary — The Pattern Behind All Five Modules
Every production network failure in this course follows the same pattern: a specific trigger creates a specific failure signature, and the debug sequence follows the failure signature — not the complaint. The complaint is always "the network is broken." The failure signature is specific.
| Module | Failure Signature | First Debug Command |
|---|---|---|
| BGP/OSPF Failures | Multiple systems fail simultaneously — internet AND internal | show bgp summary — check session state and uptime |
| MTU/MSS Mismatch | Large transfers fail, small ones succeed — works on LAN, breaks on VPN | ping <dest> -f -l 1472 — confirm MTU threshold |
| Spanning Tree Loop | All interfaces show up and passing traffic, but CPU is at 100% | show interfaces — check for wire-rate broadcast traffic |
| DHCP/DNS Failures | Users get 169.254.x.x or cannot resolve names — routing is fine | show ip dhcp pool — check pool utilization and bindings |
| QoS/Interface Flapping | Voice quality degrades at peak hours — link cycling | show policy-map interface — check queue drops during load |
Course Complete