Most Read This Week
Wasted IT Resources
Resource consolidation through newer architectures and solutions like PaaS is a way to tackle this
By: Phil Whelan
May. 4, 2014 08:00 AM
David Rubinstein's post "Industry Watch: Be resilient as you PaaS" makes a very good point about underutilized hardware in IT data-centers.
Millions of enterprise workloads remain in data centers, where servers are 30% to 40% underutilized, and that's if they're virtualized. If not, they're only using 5% to 7% of capacity.
The reason for this?
Take, for example, servers that are spun up for a project two years ago that were never decommissioned, just sitting there, waiting for a new workload that will never come. And, because the costs of blades and racks went down, cheap hardware has led to a kind of data center sprawl.
So is IT buying new hardware for each new project and not recycling? Is this because projects do not clearly die? Do we need "time of death" announcements mandated on IT projects?
Late last year, the US Government Accountability Office reported that the Federal IT spends 70% of its IT budget on care for legacy systems. That's $56 billion! How many of those legacy systems are rusty old projects with low utilization and are over-consuming resources?
Who wants those old servers? They might only be 2 years "old," but each new project has its own requirements and trying to retrofit old hardware to shiny new projects is probably undesirable. A project has a budget and specifications. A project needs X number of machines, with Y gigabytes of RAM and numerous other requirements. Can second-hand hardware fit the bill? If it can, can we find enough of it? Nobody wants a Frankenstein's monster of a cluster or the risk that it would be difficult to grow the cluster in the future. Better to go with new servers.
But does IaaS go far enough?
IaaS provides infrastructure to be used by projects. But as David Rubinstein points out, even virtualized machines are still 30% to 40% underutilized. This is because virtual machines are provisioned as a best guess of what might be required by the software that is intended to run on it. The business does the math of the traffic they hope to receive and then adds some padding to be safe. They pass the numbers to Operations who provision the machines. Once a machine is provisioned those resources are locked in.
Virtualized or not, adding and removing servers to a cluster can be logistically difficult if the application is not designed with scale in mind. When it comes to legacy IT, it is probably better to leave as-is than to invest more resources and risk in a project that is receiving no engineering attention.
Configuration management helps us to wrangle and make sense of our cattle. It whips our machines into some kind of uniform shape with the intention of moving them all in the same direction, but not without considerable sweat and tears. Snowflake servers that did not receive that last vital update, for reasons unknown, bring disease to the herd.
In Andrew C. Oliver's InfoWorld post "The platform-as-a-service winner is ... Puppet", he makes a good point that IT organizations are also "not a beautiful or unique snowflake".
Whatever you're doing with your IT infrastructure, someone else is probably doing the exact same thing. If what you're doing is so damn unique, then you're probably adding needless layers of complexity and you should stop.
Is configuration management the answer? It enables every organization to build beautiful unique snowflakes, but all the business actually wants is an igloo.
If we encapsulate the complexity of software applications and services in containers then the host operating system becomes much simpler, less of a concern and more standardized. If we can find an elegant way to spread the containers across our cluster then we only need to provision a cluster of thin standardized servers to house them.
Filling a cluster with virtual machines, where each virtual machine is dedicated to a single task is wasteful. It's like trying to utilize the space in a jar by filling it with marbles, when we could be filling it with sand. Depending on your hardware and virtual machine relationship, it might even be like filling it with oranges.
Containers are also portable - and not just in a deployment sense. In a recent talk on "Google Compute Engine and Docker", Marc Cohen from Google spoke about how they migrate running GAE instances from one datacenter to another, while only seeing a flicker of transitional downtime. vSphere also has similar tools for doing this with virtual machines. Surely, it is only a matter of time before we see this commonly available with Linux containers.
Just as we have stopped caring about the unique name of individual servers, we should stop caring about the details of a cluster. When I have a machine with 16Gb of RAM, do I need to care if it's 2x8Gb DIMMS or 4x4Gb? No, I only need to know that this machine gives me "16Gb of RAM". Eventually we will view clusters in the same way.
Now that we can fill jars with sand instead of oranges, we can be less particular about the size and shape of the jars we choose to build our cluster. Throwing away old jars is easier when we can pour the sand into a new jar.
The "why" is ultimately a business reason - which is a long way from the command-line prompt.
Todd Underwood's excellent talk on "PostOps: A Non-Surgical Tale of Software, Fragility, and Reliability" highlights the SLA as the contract that is made with the business and ensures that the Operations team is delivering the throughput, response time and uptime that the business expects. Beyond that they have free reign to ensure that the SLA is met.
Too many SysAdmins and Operations Engineers are stuck in the mindset of machines and operating systems. They have spent their careers thinking this way and honing their skills. They focus too much on the "what" and the "how", not on the "why".
At some level these skills are vital. The harder you push, the more likely it is that you will find a problem further down the stack and have to roll up your sleeves and get dirty.
Operations teams are responsible every level of stack, from the metal upwards. Even in the age of "the cloud" and outsourcing, an Operations team that does not feel that they themselves are ultimately responsible for their stack will have issues at some point. An example of this is Netflix, who use Amazon's cloud infrastructure service and may actually understand it better than Amazon. They take responsibility for their stack very seriously and because of this they are able to provide an amazingly resilient service at incredible scale.
Focusing solely on machines and operating systems does not scale and too many IT departments find it difficult to transcend this. Netflix are a good example here too. When you have tens of thousands of machines you are not going worry about keeping them all in sync or fixing machines that are having issues unless it is pandemic. Use virtualization, create golden images, if a machine is limping - shoot it. This is how Netflix operates. And to ensure that their sharp-shooters are always at the top of their game, Netflix uses tools like Chaos Monkey. They are purposely putting wolves amongst their sheep.
DevOps is a movement that aims to liberate IT from its legacy mindset. It aims to step back and take a look at the bigger picture. View business needs. Address the needs that span across the currently siloed Dev and Ops. This may be through fancy new orchestration and CI tools or be through organizational changes.
Tools and infrastructure are evolving so quickly that the best tool today will not be the best tool tomorrow. It is difficult to keep up.
Todd Underwood said that Operations Engineers, or rather Site Reliability Engineers, at Google are always working just outside of their own understanding. This is the best way to keep moving forward. From this, I take that if you fully understand the tools you are using, you are probably moving too slowly and falling behind. Just do not move so quickly that you jeopardize operational safety.
As long as we focus on "why", we can keep building our stack around that purpose instead of around our favorite naming convention, our favorite operating system and favorite toolset.
As Troy Topnik points out in a recent post IaaS is not required to run Stackato. Although an IaaS will make managing your cluster easier as you grow.
But will an IaaS help with managing your resources at the application level? No. Will it help identify wasted resources? No. Will it put you directly in touch with the owners of the applications? No. But PaaS will.
Lack of visibility comes from not being able to see or reason about machines and projects. Machine responsibilities may be cataloged by IT, but will it be visible by all involved? Building the relationship between machines, processes and projects is still a documentation task without PaaS.
If redundant applications are not visible to the entire organization, then those accountable for announcing "time of death" will less likely make that call. Old projects which should be end-of-lifed will continue to over-consume their allocated resources - resources that were allocated with the hope of what the project may one day become.
Source: ActiveState, originally published, here.
Reader Feedback: Page 1 of 1
Subscribe to the World's Most Powerful Newsletters
Today's Top Reads