Incident
|
#36 at Amazon Web Services on 2016/06/04 by AWS Team
|
Full report
|
https://aws.amazon.com/message/4372T8/
|
How it happened
|
Severe weather caused a reduction of utility power at one datacenter, experienced initially as an unusually long voltage sag rather than a complete outage. A set of breakers that would have opened in a complete outage did not open soon enough, so the stored power in the DRUPS drained into the power grid. The DRUPS system then shut down, leaving many virtual instances with no power.
|
Architecture
|
Virtual servers (EC2) and attached storage units (EBS)
|
Technologies
|
Amazon Elastic Block Store (EBS), Diesel Rotary Uninterruptible Power Supply (DRUPS), Amazon Elastic Compute Cloud (EC2)
|
Root cause
|
The power redundancy system failed to open breakers soon enough during a loss of utility power (initially an unusually long voltage sag and then a complete outage).
|
Failure
|
Loss of power to a significant number of instances led to unavailable (virtual) servers and attached storage units.
|
Impact
|
Customers and auto-scaling systems could not launch new virtual servers (until responders manually failed over to other availability zones).
|
Mitigation
|
Once responders determined it was safe to do so, they re-engaged the power line-ups, and then some virtual servers and storage units needed to be manually restored.
|
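The root cause above is that the breakers reacted to a complete outage but not to a prolonged voltage sag. The following is a minimal sketch of isolation logic that handles both cases. It is an illustration only, not the logic AWS or any DRUPS vendor actually uses. The class name `BreakerController`, the nominal voltage, and every threshold are hypothetical values chosen for the example.

```python
from dataclasses import dataclass

# All constants below are hypothetical and chosen only for illustration.
NOMINAL_VOLTS = 230.0    # assumed nominal utility voltage
OUTAGE_FRACTION = 0.10   # below 10% of nominal is treated as a complete outage
SAG_FRACTION = 0.90      # below 90% of nominal is treated as a voltage sag
MAX_SAG_SECONDS = 0.5    # longest continuous sag tolerated before isolating


@dataclass
class BreakerController:
    """Decides when to isolate the backup power system from the utility grid."""

    sag_seconds: float = 0.0   # duration of the current continuous sag
    breaker_open: bool = False

    def observe(self, volts: float, dt: float) -> bool:
        """Process one voltage sample covering dt seconds.

        Returns True once the breaker is open (isolated from the grid).
        """
        if self.breaker_open:
            return True
        if volts < OUTAGE_FRACTION * NOMINAL_VOLTS:
            # Complete outage: isolate immediately.
            self.breaker_open = True
        elif volts < SAG_FRACTION * NOMINAL_VOLTS:
            # Sag: tolerate it briefly, then isolate so that stored
            # energy does not drain back into the grid.
            self.sag_seconds += dt
            if self.sag_seconds >= MAX_SAG_SECONDS:
                self.breaker_open = True
        else:
            # Voltage is back in the normal range: reset the sag timer.
            self.sag_seconds = 0.0
        return self.breaker_open
```

In this sketch a controller that only implemented the first branch would behave like the failure described in this report: a long sag would never trip the breaker. The second branch bounds how long a sag can persist before the backup system is isolated.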