Everything you need to know about the Facebook outage
Boston University computer scientist explains what caused Facebook’s global shutdown and discusses its widespread impact
By Molly Gluck
On October 4, Facebook and its Instagram and WhatsApp platforms experienced a gigantic global outage, preventing 3.5 billion users from interacting with friends and family, sharing information and sustaining their businesses. This is just the tip of the iceberg: according to Similarweb, WhatsApp is the most popular mobile messaging app in many of the world's most populous countries, and The New York Times reports that in some countries “Facebook is synonymous with the Internet.” Furthermore, three million businesses that actively advertise on Facebook were left unable to reach, engage with and sell to customers for almost an entire workday.
Facebook has separately come under fire from whistleblower Frances Haugen, a former Facebook data scientist who gathered and shared documents that formed ‘The Facebook Files,’ a Wall Street Journal series that describes how Facebook “favors elites; how its algorithms foster discord; and how drug cartels and human traffickers use its services openly.”
So what really happened during the outage, and how far did its impact reach on an interpersonal and business level? We turned to Boston University computer science and engineering expert Gianluca Stringhini for answers.
What was the root cause of the global outage? Is there any chance that malicious activity played a role?
Gianluca Stringhini: This was a spectacular series of cascading failures in multiple protocols that we normally use to retrieve data from the Web. Before contacting any website, Web browsers first query Domain Name System (DNS) servers, which maintain a mapping between a website’s name (for example, facebook.com) and an IP address. These mappings are critical for the Web to function, because mnemonic names are easy for people to remember, but computers actually need numeric IP addresses to talk to each other. Any network in the world knows how to reach Facebook’s DNS servers because the route to them is advertised through another protocol, the Border Gateway Protocol (BGP). What happened on Monday is that a botched network update by Facebook caused its DNS servers to withdraw those BGP routes, preventing any other network from reaching them.
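The dependency described above can be sketched as a toy model in which a DNS answer is only obtainable while some announced route covers the DNS server's address. This is a minimal illustration, not Facebook's actual setup: the addresses come from the RFC 5737 documentation ranges, and the names DNS_SERVER_IP, WEB_SERVER_IP, ZONE, reachable and resolve are hypothetical.

```python
import ipaddress

# Hypothetical addresses from the RFC 5737 documentation ranges,
# standing in for Facebook's real (different) ones.
DNS_SERVER_IP = ipaddress.ip_address("192.0.2.53")  # hypothetical authoritative DNS server
WEB_SERVER_IP = "198.51.100.10"                     # hypothetical web server address

# Hypothetical zone data held by that DNS server: name -> IP address.
ZONE = {"facebook.com": WEB_SERVER_IP}


def reachable(ip, announced_prefixes):
    """An address is reachable only if some announced BGP prefix contains it."""
    return any(ip in prefix for prefix in announced_prefixes)


def resolve(name, announced_prefixes):
    """Ask the DNS server for name; returns None when no route to the server exists."""
    if not reachable(DNS_SERVER_IP, announced_prefixes):
        return None  # the query never reaches the server, so no answer comes back
    return ZONE.get(name)


routes = {
    ipaddress.ip_network("192.0.2.0/24"),     # covers the DNS server
    ipaddress.ip_network("198.51.100.0/24"),  # covers the web server
}
print(resolve("facebook.com", routes))  # 198.51.100.10

# The faulty update withdraws the announcement covering the DNS server.
routes.discard(ipaddress.ip_network("192.0.2.0/24"))
print(resolve("facebook.com", routes))  # None
```

Note that in the model the web server's route is still announced after the withdrawal; clients simply have no way to learn its address.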
Essentially, when a computer issued a request for facebook.com, it received no answer, so it could not contact the actual Facebook servers. While we regularly see attacks against the BGP system, in which attackers actively hijack routes, that does not seem to have been the case this time; it was most likely human error.
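From a client program's point of view, this failure shows up before any connection is attempted. The sketch below uses Python's standard socket.getaddrinfo, which raises socket.gaierror when a name cannot be resolved; the wrapper name lookup_addresses and the default port 443 are illustrative choices, not part of the original text.

```python
import socket


def lookup_addresses(hostname, port=443):
    """Return the sorted IP addresses the system resolver reports for hostname.

    An empty list means resolution failed: the client has no IP address to
    connect to, even if the servers behind the name are still running.
    """
    try:
        results = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Raised for lookup failures such as an unknown name or DNS servers
        # that cannot be reached or that return an error.
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in results})
```

During the outage, a call such as lookup_addresses("facebook.com") would be expected to return an empty list for most clients once cached answers expired, so the program never got as far as opening a connection.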
This outage marked the longest stretch of downtime for Facebook since 2008. Why did the crash last so long, and could Facebook’s response have been faster or stronger?
Facebook’s network is extremely complex and geographically dispersed. It likely took the engineering team time to figure out what was happening, and the errors had to be resolved by visiting data centers in person because the network used to access them remotely had gone down. Even after the network came back up, the configuration changes needed to bring things back to normal took time to propagate across the entire Internet.
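One reason recovery is not instantaneous is that BGP announcements travel from network to network, each one passing the route on to its neighbors. The toy model below, a minimal sketch with a made-up topology and a hypothetical fixed per-hop delay, uses breadth-first search to count how many hops separate each network from the origin of a re-announced route. It deliberately ignores routing policy, route selection and DNS caching, all of which affect real recovery times.

```python
from collections import deque

# Hypothetical topology: each network (autonomous system) lists its BGP neighbors.
NEIGHBORS = {
    "origin": ["transit-a", "transit-b"],
    "transit-a": ["origin", "isp-1", "isp-2"],
    "transit-b": ["origin", "isp-2", "isp-3"],
    "isp-1": ["transit-a", "campus"],
    "isp-2": ["transit-a", "transit-b"],
    "isp-3": ["transit-b"],
    "campus": ["isp-1"],
}


def propagation_hops(origin, neighbors):
    """Breadth-first search: hops until each network first hears the announcement."""
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for peer in neighbors[current]:
            if peer not in hops:  # first time this peer learns the route
                hops[peer] = hops[current] + 1
                queue.append(peer)
    return hops


PER_HOP_DELAY_S = 30  # hypothetical delay per hop, in seconds

# BFS discovers networks in order of increasing distance from the origin.
for network, hop_count in propagation_hops("origin", NEIGHBORS).items():
    print(f"{network}: ~{hop_count * PER_HOP_DELAY_S} s after re-announcement")
```

Even in this tiny graph the most distant network hears the route three hops later; on the real Internet the graph is far larger and each hop applies its own policies and timers.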
With millions of businesses worldwide depending heavily on Facebook for paid and organic promotion, sales and revenue (and, in turn, their livelihoods), is it reasonable for Facebook to hold all of the power? If not, how should this conglomerate be broken up?
I think that it all comes down to user demand. Facebook is the largest social network at the moment, but there are other services that businesses use to advertise, for example YouTube. The dynamics of power on the Web are constantly changing: MySpace, for example, was in a position similar to Facebook’s 15 years ago, but it has since faded away.
What are your thoughts on ‘The Facebook Files’ report, given your research focus on malicious activity on the Internet? Do you have recommendations for Facebook to reduce its fundamental flaws which cause harm?
A big focus in online recommender systems has always been to identify the content that users will like the most and that will generate the most engagement. Unfortunately, we have come to realize that this content is often unsavory and toxic. I think that online services should shift their focus from maximizing engagement to safeguarding their users’ mental well-being, but this is not as straightforward as it sounds because it might hurt their business model.
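The trade-off described above can be illustrated with a toy ranking function. This is a hypothetical sketch, not how Facebook or any particular recommender works: the Post fields, the scores and the harm_weight parameter are all invented for illustration. A weight of zero ranks purely by predicted engagement; a positive weight penalizes content predicted to be harmful.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    predicted_engagement: float  # hypothetical model output in [0, 1]
    predicted_harm: float        # hypothetical toxicity / well-being risk score in [0, 1]


def rank(posts, harm_weight=0.0):
    """Order posts by engagement minus a weighted harm penalty, highest first.

    harm_weight=0.0 reproduces pure engagement maximization.
    """
    return sorted(
        posts,
        key=lambda p: p.predicted_engagement - harm_weight * p.predicted_harm,
        reverse=True,
    )


feed = [
    Post("outrage-bait", predicted_engagement=0.9, predicted_harm=0.8),
    Post("friend-update", predicted_engagement=0.6, predicted_harm=0.1),
    Post("news-summary", predicted_engagement=0.5, predicted_harm=0.2),
]
print([p.post_id for p in rank(feed)])
# ['outrage-bait', 'friend-update', 'news-summary']
print([p.post_id for p in rank(feed, harm_weight=1.0)])
# ['friend-update', 'news-summary', 'outrage-bait']
```

With the penalty switched on, the predicted engagement of the top-ranked post drops from 0.9 to 0.6, a small-scale version of the business-model tension described above.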