Cloudflare initially believed it was under a DDoS attack during an incident that was actually triggered by a database permissions change. The change caused a 'feature file' used by its Bot Management system to double in size, exceeding a file size limit and causing system failures. The problem was compounded by fluctuating configuration files, which led to intermittent recovery before the system settled into a failed state.
The cause of the outage was a change to database permissions, and the company initially thought the symptoms of that adjustment indicated it was the target of a “hyper-scale DDoS attack” before figuring out the real problem.
In a post explaining the incident, Cloudflare CEO Matthew Prince wrote that it was “triggered by a change to one of our database systems' permissions which caused the database to output multiple entries into a ‘feature file’ used by our Bot Management system.” The file describes malicious bot activity, and Cloudflare distributes it so that the software running its routing infrastructure is aware of emerging threats. The permissions change caused the feature file to double in size and grow beyond the file size limit Cloudflare imposes on that software. When the code encountered the oversized feature file, it failed.
Then it recovered, for a while. When the incident started, Cloudflare was updating permissions management on a ClickHouse database cluster it uses to generate new versions of the feature file. The change aimed to give users access to underlying data and metadata, but the query Cloudflare used to retrieve that data was flawed, so it returned extra information that more than doubled the size of the feature file.
“Bad data was only generated if the query ran on a part of the cluster which had been updated. As a result, every five minutes there was a chance of either a good or a bad set of configuration files being generated and rapidly propagated across the network,” Prince wrote.
“This fluctuation made it unclear what was happening as the entire system would recover and then fail again as sometimes good, sometimes bad configuration files were distributed to our network,” Prince wrote. “Initially, this led us to believe this might be caused by an attack. Eventually, every ClickHouse node was generating the bad configuration file and the fluctuation stabilized in the failing state.”
That “stabilized failing state” arrived a few minutes before 13:00 UTC, which was when the fun really started and Cloudflare customers began to experience persistent outages.
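The mechanism can be modelled in a few lines. The Python sketch below is not Cloudflare's code; it assumes a hypothetical metadata query that returns every feature entry twice on nodes carrying the new permissions, a hypothetical limit of 200 entries (`MAX_FEATURES`), and a consumer that treats an oversized file as fatal. `generate_feature_file`, `load_features` and `simulate` are illustrative names. `simulate` lets one randomly chosen node regenerate the file each cycle, so a partially updated cluster yields the good/bad fluctuation Prince describes, while a fully updated cluster fails every time.

```python
import random

# Toy model of the failure; NOT Cloudflare's implementation.
MAX_FEATURES = 200  # hypothetical hard limit enforced by the consuming software
FEATURES = [f"feature_{i}" for i in range(150)]  # hypothetical feature names


class FeatureFileTooLarge(Exception):
    """Raised when a feature file exceeds the consumer's size limit."""


def generate_feature_file(features, node_updated):
    """Build a feature file from a (hypothetical) metadata query.

    On a node whose permissions were updated, the flawed query also sees a
    second copy of the same metadata, so every entry appears twice.
    """
    rows = list(features)
    if node_updated:
        rows += list(features)  # duplicate entries double the file size
    return rows


def load_features(rows):
    """Consumer side: fails hard instead of tolerating an oversized file."""
    if len(rows) > MAX_FEATURES:
        raise FeatureFileTooLarge(f"{len(rows)} entries > limit {MAX_FEATURES}")
    return rows


def simulate(cycles, updated_fraction, seed=0):
    """Each cycle, one randomly chosen node regenerates the feature file.

    updated_fraction is the share of nodes already carrying the new
    permissions; each cycle's outcome is recorded as "ok" or "fail".
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(cycles):
        node_updated = rng.random() < updated_fraction
        try:
            load_features(generate_feature_file(FEATURES, node_updated))
            outcomes.append("ok")
        except FeatureFileTooLarge:
            outcomes.append("fail")
    return outcomes


if __name__ == "__main__":
    print(simulate(12, updated_fraction=0.5))  # typically a mix of "ok" and "fail"
    print(simulate(12, updated_fraction=1.0))  # every node bad: stuck failing
```

Raising `updated_fraction` toward 1.0 mirrors the rollout progressing across the cluster until every regeneration produced a bad file.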
Cloudflare eventually identified the source of the problem, stopped the generation and propagation of bad feature files, and manually inserted a known good file into the feature file distribution queue. The company then forced a restart of its core proxy so its systems would read only good files.
“An outage like today is unacceptable,” Prince said. “We've architected our systems to be highly resilient to failure to ensure traffic will always continue to flow. When we've had outages in the past it's always led to us building new, more resilient systems.”
The remediation steps listed include “Hardening ingestion of Cloudflare-generated configuration files in the same way we would for user-generated input” and “Eliminating the ability for core dumps or other error reports to overwhelm system resources.”
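The first remediation item, treating Cloudflare's own configuration files like user input, could look something like the following Python sketch. It assumes the consumer validates each new feature file (no duplicate entries, and at most a hypothetical `MAX_FEATURES = 200`) and, when validation fails, keeps serving the last known good file instead of crashing, which automates what Cloudflare did by hand during recovery. `FeatureStore`, `validate` and `update` are illustrative names, not a Cloudflare API.

```python
# Sketch of "hardened" ingestion; hypothetical, NOT Cloudflare's design.
MAX_FEATURES = 200  # hypothetical size limit for a feature file


class FeatureStore:
    """Holds the active feature list and keeps the last known good version."""

    def __init__(self, initial):
        problem = self.validate(initial)
        if problem is not None:
            raise ValueError(f"initial feature file invalid: {problem}")
        self.active = list(initial)
        self.last_error = None

    @staticmethod
    def validate(rows):
        """Check an internally generated file as strictly as untrusted input."""
        if len(rows) != len(set(rows)):
            return "duplicate entries"
        if len(rows) > MAX_FEATURES:
            return f"{len(rows)} entries exceeds limit {MAX_FEATURES}"
        return None

    def update(self, rows):
        """Swap in a new file only if it validates; otherwise keep the old one."""
        problem = self.validate(rows)
        if problem is not None:
            self.last_error = problem  # record for alerting, but do not crash
            return False
        self.active = list(rows)
        self.last_error = None
        return True


if __name__ == "__main__":
    good = [f"feature_{i}" for i in range(150)]
    store = FeatureStore(good)
    store.update(good + good)  # doubled file is rejected
    print(len(store.active), store.last_error)  # 150 duplicate entries
```

Rejecting a bad file keeps traffic flowing on the previous feature set; the trade-off is that the bot-detection data may go stale until a valid file arrives.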