Incident
|
#31 at Google on 2015/08/13
|
Full report
|
https://status.cloud.google.com/incident/compute/15056#5719570367119360
|
How it happened
|
Four successive lightning strikes on the local electric grid that powers the datacenter caused a brief loss of power to storage systems. Some of those storage systems were more susceptible to power failure than others, and they failed.
|
Architecture
|
Virtual machine instances and attached persistent disks.
|
Technologies
|
Google Compute Engine (GCE), Standard Persistent Disks
|
Root cause
|
Local power outage affecting storage hardware that was susceptible to power failure.
|
Failure
|
Storage systems failed, including some containing data not yet saved to stable storage.
|
Impact
|
5% of persistent disks in the region sporadically returned I/O errors to attached virtual machine instances; some disk management operations (e.g., snapshot creation) returned errors; a small number of disks (0.000001% of persistent disks in the region) permanently lost data that had not yet been written to stable storage.
|
Mitigation
|
Engineers recovered most of the data through snapshots and other operations.
|
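The failure and impact rows above turn on the difference between data that has been accepted by a storage system and data that has reached stable storage. The sketch below illustrates that distinction at the application level. It is a generic example, not a description of how Google Compute Engine or Persistent Disk works internally, and the helper name `durable_write` is hypothetical. It also assumes the underlying device honors flush requests; hardware that loses buffered writes on power failure, as in this incident, can still lose data despite these calls.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, returning only after asking the OS to flush it.

    Hypothetical helper for illustration; not part of any Google API.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays on one filesystem and is atomic on POSIX systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # move Python's buffer into the OS
            os.fsync(tmp.fileno())   # ask the OS to push it to the device
        os.replace(tmp_path, path)   # swap the new file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory entry too, where the platform supports it.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Without the `flush` and `fsync` steps, a write can sit in volatile buffers after the call returns, which is the window in which a power loss discards it.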