Incident
|
#12 at
Cloudflare on
2019/07/02 by John Graham-Cumming (CTO)
|
Full report
|
https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/
|
How it happened
|
Engineer deployed a new web application firewall rule (to protect against cross-site scripting attacks). That rule required excessive CPU time to process for each request and led to a failure of all services accessed through the firewall, including services needed to mitigate the issue.
|
Architecture
|
Web application firewalls and tooling for quickly updating rules worldwide.
|
Technologies
|
|
Root cause
|
A regular expression in a web application firewall rule contained excessive backtracking.
|
Failure
|
CPU exhaustion leading to a failure in proxy, CDN and firewall services globally.
|
Impact
|
Users were served a 502 error page when visiting any affected domain.
|
Mitigation
|
Responders disabled the web application firewall component, reverted the change and re-enabled the firewall.
|