Incident
|
#37 at Duo on 2018/08/29
|
Full report
|
https://status.duo.com/incidents/4w07bmvnt359
|
How it happened
|
A surge of inbound requests exceeded database capacity, and the database requests that failed were queued by the application. Even after the traffic subsided, the queued requests prevented the database from recovering (i.e., capacity was still exceeded).
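The feedback loop described above can be illustrated with a toy discrete-time model. This is a hypothetical sketch, not Duo's system: the capacity (100 requests per tick), the traffic numbers, and especially the assumption that the database's useful throughput drops to half of capacity once it is overloaded are all invented for illustration. Failed requests are re-queued and re-sent every tick, so the database keeps seeing the backlog on top of fresh traffic.

```python
def simulate(arrivals, capacity=100, overload_efficiency=0.5):
    """Per-tick (offered, served, backlog) for an app that queues failed DB requests.

    Hypothetical assumptions, not taken from the incident report:
    - the database completes up to `capacity` requests per tick while not overloaded;
    - once offered more than `capacity`, its useful throughput falls to
      capacity * overload_efficiency (e.g. contention and timeouts);
    - every failed request is queued on the application server and re-sent next tick.
    """
    backlog = 0
    history = []
    for new in arrivals:
        offered = new + backlog          # fresh traffic plus queued retries
        if offered <= capacity:
            served = offered             # healthy: everything completes
        else:
            served = int(capacity * overload_efficiency)  # overloaded: degraded goodput
        backlog = offered - served       # failures go back onto the queue
        history.append((offered, served, backlog))
    return history


# Normal load of 80/tick, a five-tick surge to 150/tick, then normal load again.
traffic = [80] * 5 + [150] * 5 + [80] * 10
for t, (offered, served, backlog) in enumerate(simulate(traffic)):
    print(f"t={t:2d} offered={offered:4d} served={served:3d} backlog={backlog:4d}")
```

In this model, once the surge ends at t=10 the database is still offered 80 new requests plus the backlog, so it stays above capacity and the backlog grows by 30 per tick (80 arriving, 50 served) instead of draining. With `overload_efficiency=1.0` the degradation assumption is removed and the backlog drains by 20 per tick (capacity 100 minus 80 arriving), so here self-recovery hinges on throughput degrading under overload.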
|
Architecture
|
A fleet of application servers (running an authentication service) backed by a relational database.
|
Technologies
|
|
Root cause
|
Traffic exceeded database capacity, compounded by a strategy of queuing failed database requests.
|
Failure
|
Database requests failed and were queued on the application servers.
|
Impact
|
Increased authentication latency and intermittent request timeouts for all customer applications.
|
Mitigation
|
Application request queues were flushed.
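Why flushing helps can be shown with a small self-contained model. This is a hypothetical sketch: the capacity, traffic, flush tick, and the assumption that an overloaded database serves only half its capacity are illustrative, not from the report. In this model the flushed requests are simply discarded, which removes the backlog that kept offered load above capacity.

```python
def backlog_after(arrivals, capacity=100, overload_efficiency=0.5, flush_at=None):
    """Return the per-tick backlog of failed requests queued on the app servers.

    Hypothetical model (assumptions, not from the report): the database serves up
    to `capacity` requests per tick, drops to capacity * overload_efficiency when
    offered more than that, and every failed request is re-queued for the next tick.
    Flushing at tick `flush_at` discards the whole queue.
    """
    backlog = 0
    history = []
    for t, new in enumerate(arrivals):
        if t == flush_at:
            backlog = 0  # flush the application request queues
        offered = new + backlog
        served = offered if offered <= capacity else int(capacity * overload_efficiency)
        backlog = offered - served  # whatever was not served stays queued
        history.append(backlog)
    return history


traffic = [80] * 5 + [150] * 5 + [80] * 10    # surge, then normal traffic
stuck = backlog_after(traffic)                # queues never flushed
flushed = backlog_after(traffic, flush_at=12) # queues flushed at tick 12

print(stuck[-1])    # 800
print(flushed[-1])  # 0
```

After the flush, offered load (80 per tick) is back under capacity, so every request completes and the backlog stays at zero; without the flush it keeps growing.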
|