Real-World Cloud Computing
Storms in the Cloud
No technology negates the need for proper planning and the cloud is no different
Apr. 14, 2014 09:45 AM
There are things we tend to take for granted in our everyday lives. We have certain expectations that don't even have to be spoken, they're just a given. If you walk into a room and turn on the light switch, the lights will go on, it's assumed. If you turn the water faucet on, water will come out; if you pick up the telephone, there will be a dial tone. The concept of any of those things not happening does not enter the conversation. These are services we have that are ubiquitous; we don't even think about them - they are just there.
In recent years people have seen the impact Mother Nature has had on those core services such as electricity, water and phone, Storms, hurricanes, floods and blizzards have taken our expectations of these services and turned them on their head.
Cloud Computing, the New Light Switch
Cloud computing has become pervasive in both our personal and business lives; you cannot have a conversation about technology without the word "cloud" in it.
On a personal level, our music players are streaming from the cloud, our tablets and eReaders are getting books from the cloud, our TVs are streaming video from the cloud and our smart phones and PCs are being backed up to the cloud. Google has glasses that connect you to the cloud and Samsung just came out with a watch that connects you to the cloud. Like the electricity and water in your home, the cloud is always there - at least that has become the perception and expectation.
On a business level, our expectations are influenced by our personal exposure and experiences with technology. There is an assumption that by going to the cloud, the services provided will always be there, like the light switch.
Recent Heavy Weather in the Cloud
Cloud services and service providers do enhance those expectations. By dispersing applications across multiple servers and multiple data centers, the technology implementations allow for higher levels of fault tolerance. The risk is that the higher levels of complexity needed to implement these infrastructures introduce new potential ‘technology storms' that can expose a business to unexpected failures and outages.
One need only read the headlines of public cloud outages over the last year whether it be NASDQ, Amazon, Google, and numerous other providers to understand that going to the cloud does not come with 100% availability, and that comes with a cost.
- In January of this year, DropBox experienced an outage due to a ‘routine maintenance episode' on a Friday evening. Customers experienced 2-5 hour loss of access to services, some lasting into the weekend.
- In August of last year, NASDAQ was shut down for 3 hours. The root cause was determined to be a ‘data flood' on requests that peaked at 26,000/sec, (26 times normal volumes) that exposed a software flaw that prevented the fail-safes from being triggered to allow operations to continue.
- In that same month, Google experienced an outage of their services that only lasted 4 minutes. In that short period of time, Internet traffic dropped by 40%. (The fact the outage only lasted 4 minutes speaks well of Google's recovery plans and services.)
- On January 31st, 2013, Amazon had an outage that lasted only 49 minutes. The estimated cost to Amazon in lost sales for that 49 minutes is estimated to be between $4-$5M dollars. (Several other companies that utilize Amazon's services, such as Netflix, also experienced the impact of this outage.)
- As far back as two years ago, a large portion of the State of Maryland's IT services to the public were down for days due to a double failure in the storage sub-systems and their failover systems. No system is immune.
Planning for Availability and Recoverability
Going to the cloud does not in and of itself provide high availability and resiliency. Like any technology architecture, these capabilities need to be designed in and come with a cost. Higher availability has always required more effort and associated costs, and going to the cloud alone does not necessarily provide what your business is expecting from that light switch.
When moving to cloud architectures, whether they are public or private, business needs and expectations around availability and resiliency must be defined and understood. You cannot take for granted that by being in the cloud the needs will be met. Due diligence must still be performed.
- When going to the public clouds, you need to make sure the availability requirements from the business are included in the SLAs with the cloud vendor.
- When building a private cloud network, it is incumbent on the IT organization to ensure the needs and requirements are baked into the design and implementation of that infrastructure, and that expectations with the business are properly set and understood.
- Risk mitigation plans need to be developed and in place before outages occur, as even the best infrastructure may still have a failure (such as the State of Maryland). Going to the cloud does not negate the need to develop and have a business continuity plan.
- If working with a public cloud provider, this is a joint effort, not solely the vendor's responsibility or yours. Vendors will have their own set of plans, and you must dovetail yours with theirs. Make sure you understand what they have in place before signing on the dotted line.
No technology negates the need for proper planning and the cloud is no different. Ultimately, weathering the technological natural disasters in the cloud is accomplished just like we weather those of Mother Nature, prepare a plan, so when the storm does hit, you can make it out the other side.