Most Read This Week
Amazon Web Services and the NetFlix Fix?
Do not be scared of clouds, however be ready, do your homework, learn and understand what needs to be done or done differently
By: Greg Schulz
Jul. 11, 2012 06:00 AM
I received the following note from Amazon Web Services (AWS) about an enhancement to their Elastic Compute Cloud (EC2) service that can be seen by some as an enhancement to service or perhaps by others after last weeks outages, a fix or addressing a gap in their services. Note for those not aware, you can view current AWS service status portal here.
The following is the note I received from AWS:
Either way you look at it, AWS (disclosure I'm a paying EC2 and S3 customer) is taking responsibility on their part to do what is needed to enable a resilient, flexible, scalable data infrastructure. What I mean by that is that protecting data and access to it in cloud environments is a shared responsibility including discussing what went wrong, how to fix and prevent it, as well as communicating best practices. That is both the provider or service along with those who are using those capabilities have to take some ownership and responsibility on how they get used.
For example, last week a major thunderstorms rolled across the U.S. causing large-scale power outages along the eastern seaboard of the U.S. and in particular in the Virginia area where one of Amazons availability zones (US East-1) has data centers located. Keep in mind that Amazon availability zones are made up of a collection of different physical data centers to cut or decrease chances of a single point of failure. However on June 30, 2012 during the major storms on the East coast of the U.S. something did go wrong, and as is usually the case, a chain of events resulted in or near a disaster (you can read the AWS post-mortem here).
The result is that AWS based out of the Virginia availability zone were knocked off line for a period which impacted EC2, Elastic Block Storage (EBS), Relational Database Service (RDS) and Elastic Load Balancer (ELB) capabilities for that zone. This is not the first time that the Virginia availability zone has been affected having met a disruption about a year ago. What was different about this most recent outage is that a year ago one of the marquee AWS customers NetFlix was not affected during that outage due to how they use multiple availability zones for HA. In last weeks AWS outage NetFlix customers or services were affected however not due to loss of data or systems, rather, loss of access (which to a user or consumer is the same thing). The loss of access was due to failure of elastic load balancing not being able to allow users access to other availability zones.
Consequently, if you choose to read between the lines on the above email note I received from AWS, you can either look at the new service capabilities as an enhancement, or AWS learning and improving their capabilities. Also reading between the lines you can see how some environments such as NetFlix take responsibility in how they use cloud services designing for availability, resiliency and scale with stability as opposed to simply using as a cost cutting tool.
Thus when both the provider and consumer take some responsibility for ensuring data protection and accessibility to services, there is less of a chance of service disruptions. Likewise when both parties learn from incidents or mistakes or leverage experiences, it makes for a more robust solution on a go forward basis. For those who have been around the block (or file) a few times thinking that clouds are not reliable or still immature you may have a point however think back to when your favorite or preferred platform (e.g. Mainframe, Mini, PC, client-server, iProduct, Web or other) initially appeared and teething problems or associated headaches.
IMHO AWS along with other vendors or service providers who take responsibility to publish post-mortem's of incidents, find and fix issues, address and enhance capabilities is part of the solution for laying the groundwork for the future vs. simply playing to a near term trend theme. Likewise vendors and service providers who are reaching out and helping to educate and get their customers to take some responsibility in how they can use services for removing complexity (and cost) to enhance services as opposed to simply cutting cost and introducing risk will do better over the long run.
As I discuss in my book Cloud and Virtual Data Storage Networking (CRC Press), do not be scared of clouds, however be ready, do your homework, learn and understand what needs to be done or done differently. This means taking a shared responsibility one that the service provider should also be taking with you not to mention identifying new best practices, tools to be used along with conducting proof of concepts (POCs) to learn what to do and what not to do.
Some related information:
Ok, nuff said for now.
All Comments, (C) and (TM) belong to their owners/posters, Other content (C) Copyright 2006-2012 StorageIO All Rights Reserved
Reader Feedback: Page 1 of 1
Subscribe to the World's Most Powerful Newsletters
Today's Top Reads