05/25/2017

Website Downtime: The Biggest Outages of 2016

From unexpected traffic spikes to massive DDoS attacks, 2016 saw more than its fair share of major website downtime. As we all become more aware of the threat that such downtime poses to our businesses, it would seem that these episodes should fall by the wayside. If 2016 is any indication, however, downtime is just as prevalent as ever. Here we will discuss some of the biggest outages of 2016 and how they could have been prevented, helping you ensure that your website doesn't face these issues unprepared in the years to come.

The Takedown of Dyn

DDoS attacks seemed to grow steadily worse in 2016 and began to expose just how much we depend on critical infrastructure such as DNS. This was all too evident on October 21st, when the largest DDoS attack known to date was launched, bringing down social networks, SaaS companies, media websites, gaming websites, consumer product websites, and more. The incident was a series of three large-scale attacks against the managed DNS provider Dyn. It affected more than 1,200 domains, predominantly across North America and Europe. Anyone who relied on Dyn for DNS services was vulnerable at the time, and many were severely impacted.

How do you cope when someone is hell-bent on taking down such a big target and your website becomes part of the collateral damage? In this type of scenario, websites that had spread their DNS name servers across multiple providers were able to fall back on secondary vendors during the attack. When the problem arose, a website monitoring service could have raised the alert so that the failsafe could be put into place. With DDoS attacks still on the rise, this is definitely something any website owner should consider, if such measures are not already in place, as we move into the second half of 2017.
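As a rough illustration of the monitoring side of this, the sketch below checks whether each DNS provider still has at least one reachable name server and reports which providers appear to be down. Everything in it is an assumption made for illustration: the provider names and hostnames are hypothetical placeholders, and the probe is only a crude check that a TCP connection to port 53 can be opened, not a full DNS query.

```python
import socket
from typing import Callable, Dict, List


def tcp_probe(server: str, timeout: float = 2.0) -> bool:
    """Crude reachability check: can a TCP connection be opened to port 53?"""
    try:
        with socket.create_connection((server, 53), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False


def provider_status(
    providers: Dict[str, List[str]],
    probe: Callable[[str], bool] = tcp_probe,
) -> Dict[str, bool]:
    """Mark a provider healthy if at least one of its name servers answers the probe."""
    return {
        name: any(probe(server) for server in servers)
        for name, servers in providers.items()
    }


def providers_down(status: Dict[str, bool]) -> List[str]:
    """List the providers with no reachable name server, sorted by name."""
    return sorted(name for name, ok in status.items() if not ok)


# Hypothetical setup: two DNS vendors, each with its own name servers.
PROVIDERS = {
    "primary": ["ns1.primary.example", "ns2.primary.example"],
    "secondary": ["ns1.secondary.example"],
}
```

Run on a schedule, a non-empty result from providers_down(provider_status(PROVIDERS)) would be the cue to send an alert. As long as at least one provider stays healthy, resolvers can still reach the site through the remaining name servers.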

When PokemonGo was Pokemon No

In the summer of 2016 everyone seemed to be going crazy over PokemonGo. To the dismay of Pokemon trainers everywhere, on both July 16th and July 20th, players couldn't catch or train their favorite Pokemon. The first outage, ushered in by increased packet loss over a period of four hours, was caused by a combination of the network architecture and overloaded servers. The second outage was caused by a software update that resulted in users being unable to log in and game content being incomplete.

Both of these incidents are, again, perfect examples of why redundancy and proper planning are crucial to the success and continued uptime of any online business, no matter what form that business may take. Had quality monitoring been in place to catch the decrease in performance when the servers began to overload, proper steps could have been taken to recover more quickly. Had the software update been tested on a test server before being rolled out, the July 20th downtime could have been avoided altogether. While it is important to have quality website monitoring in place to notify you when a problem occurs, it is also important to have benchmarking and capacity planning for network operations. Test your network prior to new software updates and reinforce your network architecture through CDN vendors.
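To make the idea of catching a decrease in performance more concrete, here is a minimal sketch that compares recent response times against a benchmarked baseline. The baseline, the multiplier, and the window size are hypothetical values chosen for illustration, and a real monitoring service would track far more than a single latency figure.

```python
from collections import deque
from statistics import median


class LatencyMonitor:
    """Flags degradation when recent response times drift well above a baseline."""

    def __init__(self, baseline_ms: float, factor: float = 2.0, window: int = 5):
        self.baseline_ms = baseline_ms        # benchmark taken under normal load
        self.factor = factor                  # how far above baseline counts as degraded
        self.samples = deque(maxlen=window)   # keeps only the most recent samples

    def record(self, latency_ms: float) -> bool:
        """Store one sample; return True if the window median exceeds the threshold."""
        self.samples.append(latency_ms)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and median(self.samples) > self.baseline_ms * self.factor
```

Using the median over a small window, rather than reacting to a single slow response, is one way to keep a lone outlier from triggering an alert while still noticing sustained slowdowns of the kind that precede an overload.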

The ASOS Website Crash

In June of 2016, ASOS, the popular British clothing website, went down for well over 24 hours after the Brexit referendum (the vote to leave the European Union) passed. The official word from ASOS was that the crash was due to a power outage at one of its third-party data centers. However, the timing of the downtime has led many to wonder whether the real cause was a serious influx of traffic.

If it was a power outage, this website needed redundancy in place, including multiple servers spread across multiple data center locations. If power goes out in one location, another location can pick up the slack.

The Canadian Immigration Panic

Another instance of politics affecting website uptime could be seen during the presidential election of 2016. On November 8th, the Canadian Immigration website was brought down as United States citizens panicked during the close of the presidential polls. As polls closed in more and more states and results began coming in, the immigration website started choking before it finally gave out under a spike in traffic that it wasn't ready to handle. Again, this is an instance in which proper capacity planning and real-time scalability would have been essential to the continued uptime of the website.
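The arithmetic behind basic capacity planning can be sketched in a few lines. The function below estimates how many servers a given traffic peak would require, with some headroom on top. All of the figures are hypothetical, and the per-server capacity is assumed to come from your own benchmarking rather than from any of the incidents described here.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float, headroom: float = 0.3) -> int:
    """Estimate the server count for a traffic peak, with spare capacity on top.

    peak_rps:       expected peak requests per second (an assumption you supply)
    per_server_rps: requests per second one server sustains in benchmarks
    headroom:       extra fraction of capacity kept in reserve (0.3 means 30%)
    """
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    return math.ceil(peak_rps * (1 + headroom) / per_server_rps)
```

For example, a hypothetical peak of 1,000 requests per second on servers benchmarked at 200 requests per second each, with 30 percent headroom, works out to 7 servers. Real-time scalability then amounts to re-running this kind of estimate against live traffic and adding capacity before the limit is reached.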

Looking Toward the Future

While 100 percent uptime may not be a realistic goal, we can clearly see where these instances of downtime could have been reduced in severity, if not eliminated altogether, with proper planning, tools, and failsafes. As we head into the second half of 2017, let us reflect on the lessons learned during 2016 and make sure our websites don’t fall prey to the same mistakes that caused some of the biggest instances of downtime during that year.