I survived my first Cyber Monday and came away with a new appreciation for how difficult it can be to work with applications at scale. Thankfully our data centers performed well and held up under the extreme load. We had a few database issues with one of our brands, but the others all ran fine without any significant problems. The excitement started around 8AM yesterday, but it really picked up just after 8PM, a traditionally busy hour on Cyber Monday, when we noticed a significant increase in the number of users hitting the site. At that time we started splitting traffic across multiple data centers to try to alleviate the compute load; Internet bandwidth wasn’t the bottleneck. As midnight drew closer, the load on all our data centers surged as shoppers tried to cash in on the sales and deals. Here’s a combined graph of all Internet traffic for yesterday across all our Internet Service Providers. Thankfully our sites are front-ended by a Content Distribution Network, so we only see the traffic to/from the origin servers and we’re spared the actual edge bandwidth.
Looking at some of the stats from our CDN provider, they served up 233 million page views on Monday, December 1, 2014, totaling some 31TB of data. To put that into comparison, last Monday we only had 130 million page views totaling some 14.8TB of data; that’s a 79% increase in page views and a 109% increase in edge bandwidth (roughly 1.8 times and 2.1 times last week’s numbers). Looking at the graph above, you can see that we peaked around 450Mbps outbound. That’s still a lot of data when you remember that the majority of caching is going on in the CDN, and that traffic is just the raw HTML and XML data for the category pages and the shopping cart. Mobile apps were a big hit this year, as projected in the retail industry. They also caused us some significant performance issues: as Apple Push notifications went out to 50K+ users at a time, the site would start to break under the load, only to recover shortly thereafter.
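As a quick sanity check on the week-over-week comparison, here is a minimal Python sketch of the arithmetic. The function and variable names are my own for illustration; the inputs are just the CDN figures quoted above.

```python
def percent_change(old, new):
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100.0


# Figures quoted from the CDN provider's stats (names are illustrative).
page_views_last_monday = 130e6   # page views, previous Monday
page_views_cyber_monday = 233e6  # page views, Cyber Monday
edge_tb_last_monday = 14.8       # TB served at the edge, previous Monday
edge_tb_cyber_monday = 31.0      # TB served at the edge, Cyber Monday

views_change = percent_change(page_views_last_monday, page_views_cyber_monday)
edge_change = percent_change(edge_tb_last_monday, edge_tb_cyber_monday)

print(f"Page views: {views_change:.0f}% increase "
      f"({page_views_cyber_monday / page_views_last_monday:.2f}x)")
print(f"Edge data:  {edge_change:.0f}% increase "
      f"({edge_tb_cyber_monday / edge_tb_last_monday:.2f}x)")
```

The distinction worth keeping straight is between the ratio (Cyber Monday was about 1.79 times a normal Monday in page views) and the increase (about 79% more than a normal Monday).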
While there are still quite a few shopping days left this holiday season, a few of the folks in the war room were already talking about preparing for next year. I was speaking to a few of the developers who are eager to look into utilizing Docker in their staging and development environments. While I’ve played with Docker a little myself, I need to figure out how it interacts with the network.
All in all it was a rewarding but very tiring experience. It was impressive to work with such a talented and dedicated team, all working tirelessly to make sure that the lights stayed on throughout the past 4 days. I’m excited and look forward to what we’ll be able to do together in the future.
Note: This post is part of a series made under the Network Engineer in Retail 30 Days of Peak; it is post number 8 of 30. All the posts can be viewed from the 30in30 tag.