Incident
|
#17 at
Stack Exchange on
2014/08/25
|
Full report
|
https://stackstatus.net/post/96025967369/outage-post-mortem-august-25th-2014
|
How it happened
|
Defective configuration change pushed to firewall on load balancers, without testing, which prevented load balancers from connecting with web servers since response traffic (from web servers) was blocked.
|
Architecture
|
HAProxy load balancers (with firewall) and IIS web servers (maintained through Git).
|
Technologies
|
HAProxy Load Balancer, Internet Information Services (IIS), Puppet
|
Root cause
|
A defective iptable configuration change to the firewall on the load balancers. 3
|
Failure
|
The load balancers were not able to complete connections to the web servers because response traffic for those connections was being blocked by the firewall.
|
Impact
|
All requests to web servers (through load balancers) failed causing a complete outage.
|
Mitigation
|
Reverted the change and manually ran the configuraion update (using Puppet) rather than wait from normal automatic updates.
|