Amazon Prime Day came and went, and it frustrated many a shopper who tried to log on to the Amazon app to buy Echo Dots, vitamins, and things that had been sitting in their carts for weeks (I’m not the only one with 100+ things in “Save for later,” right?). What did they see instead of the goods of their choosing with 2-day shipping (Prime or nothing!)?
Don’t get me wrong, I love cute dogs as much as the next guy. In fact, I found myself refreshing my app just to see more images of puppies.
— Mary Hart (@MaryEHHart) July 16, 2018
Okay, if we are being completely honest, I had no intention of buying anything on Prime Day; I was keeping an eye on the servers and their capacity. AWS (Amazon Web Services), Amazon’s hosting provider, is the industry leader in cloud computing, with more regions, availability zones, and data centers than any other cloud provider in the world. AWS regularly holds up Black Friday as a reason to move your hosting to the public cloud and avoid site crashes, overloads, and downtime.
So what went wrong?
While I don’t know for certain what to make of the AWS fumble on Prime Day, I do have my thoughts. (I want to be very clear: as an AWS Partner, we have no inside knowledge of this event and have not reached out to AWS for comment. What follows is entirely my opinion on what happened and how to keep it from happening to you.)
Reserved instances are the easiest explanation for why the site went down. They are one of the most affordable ways to run your environment: a Solutions Architect can often look at an environment’s usage trends, predict its workload, and purchase reserved instances to keep it up and running at the lowest price (reserved instances are discounted relative to on-demand instances).
The danger of reserved instances? Well, guessing (predicting, forecasting, etc.) the demand on your servers is not always easy when you are dealing with something like Black Friday, Prime Day, or another major sale. If you run only the capacity you have reserved and demand outgrows it, your servers slow down and overload, just as they would in a private cloud, and visitors get the error page (in this case, puppies).
So how did this happen to Amazon? They guessed wrong.
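To make the reserved-versus-on-demand trade-off a little more concrete, here is a minimal Python sketch. Every number in it is hypothetical (we have no visibility into Amazon’s actual fleet): a fixed pool of 100 reserved instances, each assumed to handle 1,000 requests per second, and an hourly demand curve whose real spike blows past the forecast. The helper names `instances_needed`, `on_demand_to_add`, and `unserved_rps` are made up for illustration and are not part of any AWS API.

```python
import math

# Hypothetical figures for illustration only; not Amazon's real numbers.
RESERVED_INSTANCES = 100   # instances bought in advance at a discount
REQS_PER_INSTANCE = 1_000  # assumed requests/second one instance can serve


def instances_needed(demand_rps: int) -> int:
    """Smallest number of instances that can serve demand_rps."""
    return math.ceil(demand_rps / REQS_PER_INSTANCE)


def on_demand_to_add(demand_rps: int, reserved: int = RESERVED_INSTANCES) -> int:
    """Extra on-demand instances required once demand exceeds the reserved fleet."""
    return max(0, instances_needed(demand_rps) - reserved)


def unserved_rps(demand_rps: int, reserved: int = RESERVED_INSTANCES) -> int:
    """Requests/second left unserved if the fleet is capped at the reserved count."""
    return max(0, demand_rps - reserved * REQS_PER_INSTANCE)


if __name__ == "__main__":
    # Hypothetical hourly traffic: the forecast peaked near 100k rps, the real spike did not.
    for demand in [60_000, 95_000, 180_000, 120_000]:
        print(demand, instances_needed(demand), on_demand_to_add(demand), unserved_rps(demand))
```

With these made-up figures, the 180,000 requests-per-second hour needs 80 on-demand instances on top of the reserved pool; a fleet capped at its reservations would leave 80,000 requests per second unserved (hello, puppies). One common way to cover that gap is to let an autoscaling group add on-demand capacity above a reserved baseline rather than relying on the forecast alone.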
Feedvisor reported a 5 percent drop in sales at 1 p.m. PT, an hour into the sale and right when many were complaining about outages. However, sales recovered quickly after that and were more than double the previous year’s by 4 p.m. PT. To put that into perspective, Prime Day 2017 brought in an estimated $2.5 billion to $2.9 billion in global sales, including sales from third-party vendors, according to the financial firm Cowen.
Some reports suggest that Amazon may have surpassed the $5 billion mark in global sales this year, so it is possible that even Amazon wasn’t entirely prepared for its own Prime Day.
AWS is the leader in cloud computing, so assuming it got its own gig wrong is iffy (but not impossible). So what are the alternatives?
Well, the truth is that Prime Day poses unique challenges even for Amazon. The sale is compressed into a very short window rather than a whole shopping season, it kicks off at the same moment across the globe (instead of being phased in at, say, 12 a.m. local time), and it is focused on just one retailer.
To add fuel to that fire, Amazon added four new countries to Prime Day this year: Australia, Singapore, the Netherlands, and Luxembourg.
All of this adds up to a big global wave of shoppers arriving at Amazon at the same moment, the start of the sale, which seemed to temporarily overwhelm AWS capacity. Additionally (oh yes, there’s more), Prime Day has its own set of more than 1 million deals globally and a homepage customized specifically for the day, all of which adds to the work of developing, operating, and maintaining Prime Day.
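Here is a rough illustration of why a single global kickoff is harder on capacity than a phased midnight-local launch. This Python sketch uses invented shopper counts (in thousands) and approximate summer UTC offsets for a handful of regions. It is a toy model that assumes each region’s entire rush lands in its first hour, not a reconstruction of Amazon’s traffic; the `REGIONS` table and the `peak_load` function are hypothetical.

```python
from collections import Counter

# Hypothetical first-hour shoppers (thousands) per region, with an approximate
# summer UTC offset in whole hours. Illustrative numbers only.
REGIONS = {
    "US (Eastern)": (400, -4),
    "UK": (150, 1),
    "Germany": (120, 2),
    "Japan": (100, 9),
    "Australia": (50, 10),
}


def peak_load(regions: dict, staggered: bool) -> int:
    """Largest number of shoppers (thousands) landing in any single UTC hour."""
    load = Counter()
    for shoppers, utc_offset in regions.values():
        # Global launch: every region's rush lands in the same UTC hour (0).
        # Midnight-local launch: a region at UTC+k reaches midnight k hours
        # earlier in UTC, so its rush lands in UTC hour -k.
        hour = -utc_offset if staggered else 0
        load[hour] += shoppers
    return max(load.values())


if __name__ == "__main__":
    print(peak_load(REGIONS, staggered=False), peak_load(REGIONS, staggered=True))
```

With these made-up numbers, the global launch piles every region into one hour for a peak of 820 (thousand shoppers), while the staggered launch spreads the rushes across different UTC hours, so the peak drops to 400, the size of the single largest region.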
“We’ve seen this sort of thing happen on Black Friday and Cyber Monday in the past with other retailers,” said eMarketer retail analyst Andrew Lipsman. “In many ways, it’s a double edged sword. On the one hand, you don’t want to erode consumer trust by having an outage. On the other side, it’s often a reflection of higher-than-expected demand.”
Amazon handles more than 1 million global deals from hundreds of thousands of vendors. It is possible that some of those vendors had configuration issues that contributed to the page loops and cart problems some users experienced. This was likely a smaller part of what went wrong and probably didn’t play a major role in the overall error messages, but when you are dealing with a global influx of traffic, every little bit of code counts, and the slightest error or warning can turn into a major bug.
While many made jokes about puppies and frustrations grew, with some people tweeting that they would “never use Amazon again” (I’m sure…), Amazon once again came out on top with limited downtime globally (sure, Canada and the U.S. saw extended downtime, but I don’t know a single person who didn’t get the bottle opener or Echo Dot they wanted).
Some vendors may balk at the claims AWS makes about its cloud computing services and point to this outage as evidence that AWS is unreliable. The truth of the matter is that without cloud computing there would have been far less uptime and even more outages. It would have been practically impossible for a vendor to handle Prime Day traffic without cloud computing, and AWS is still the world leader, ahead of Microsoft Azure (which does have more global reach, but fewer data centers) and light-years ahead of Google Cloud, which I mention only to remind people not to use Google Cloud.
Want to talk more about this outage and how to learn from it in time for Black Friday or your next big sale? Let’s talk about your cloud, or about how to bring your company, organization, or product there. Or feel free to download our Cloud 101: Introduction to the cloud white paper for more information on cloud computing.
Author: Roy Edwards
Source: Capitol Presence