Details of the Cloudflare outage on July 2, 2019

“The real story of how the Cloudflare service went down for 27 minutes is much more complex than 'a regular expression went bad'.”
Incident #12 at Cloudflare on 2019/07/02 by John Graham-Cumming (CTO)
Full report https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/
How it happened Engineer deployed a new web application firewall rule (to protect against cross-site scripting attacks). That rule required excessive CPU time to process for each request and led to a failure of all services accessed through the firewall, including services needed to mitigate the issue.
Architecture Web application firewalls and tooling for quickly updating rules worldwide.
Technologies
Root cause A regular expression in a web application firewall rule contained excessive backtracking.
Failure CPU exhaustion leading to a failure in proxy, CDN and firewall services globally.
Impact Users were served a 502 error page when visiting any affected domain.
Mitigation Responders disabled the web application firewall component, reverted the change and re-enabled the firewall.