Detecting Anomalies that Matter!
Like needles in a haystack
Feb. 10, 2014 09:00 AM
As Netuitive's Chief Data Scientist, I am fortunate to work closely with some of the world's largest banks, telcos, and eCommerce companies. Increasingly, the executives I speak with at these companies are no longer focused on just detecting application performance anomalies - they want to understand the impact those anomalies have on the business. For example - "is the current slowdown in the payment service impacting sales?"
You can think of it as detecting IT operations anomalies that really matter - but this is easier said than done.
Like Needles in a Haystack
Large organizations typically have access to voluminous data generated by dozens of monitoring tools that track thousands of infrastructure and application components. At the same time, these companies often track hundreds of business metrics using a totally different set of tools.
The problem is that, collectively, these monitoring tools do not communicate with each other. Not only is it hard to get holistic visibility into the performance and health of a particular business service, it's even harder to discover complex anomalies that have business impact.
Anomalies are Like Snowflakes
Enter IT Operations Analytics
The Shift Towards IT Operations Analytics is Already Happening
Several years ago, thought-leading enterprises (primarily large banks with critical revenue-driving services) began experimenting with a new breed of IT analytics platform. These companies' electronic and web-facing businesses had so much revenue (and reputation) at stake that they needed to find the anomalies that matter: the ones that were truly indicative of current or impending problems.
Starting with an almost "blank slate", these forward-thinking companies began developing open IT analytics platforms that easily integrated any type of data source in real time to provide a comprehensive view of patterns and relationships between IT infrastructure and business service performance. This was only possible with technologies that leveraged sophisticated data integration, knowledge modeling, and analytics to discover and capture the unique behavior of complex business services. Anything less would fail, because, like snowflakes, no two anomalies are alike.
The Continuous Need for Algorithm Research
For this reason, analytics research is an ongoing endeavor at Netuitive - driven in part by customer needs and in part by advances in technology. Once Netuitive technology is installed in an enterprise and is integrating data collected across multiple layers in the service stack, behavior learning begins immediately. As time passes, the statistical algorithms have more observations to work from, and this leads to increasing confidence in both the anomalies detected and the proactive forecasts. Additionally, customer domain knowledge can be layered into Netuitive's real-time analysis in the form of knowledge bases and supervised learning algorithms. The Research Group at Netuitive works closely with our Professional Services Group, as well as directly with customers, to regularly review the quality of the alarms actually delivered, tune the algorithms that we have, and identify new algorithms that would deliver greater value in an actionable timeframe.
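To make the idea of behavior learning concrete, here is a minimal sketch of a baseline that improves as observations accumulate. It is not Netuitive's algorithm: the class name RunningBaseline, the min_observations warm-up, and the z-score threshold are all assumptions chosen for illustration, and the statistics use Welford's well-known online mean and variance update.

```python
import math


class RunningBaseline:
    """Hypothetical online baseline for one metric (illustrative only).

    Keeps a running mean and variance with Welford's update, so each new
    observation refines the learned "normal" behavior.
    """

    def __init__(self, min_observations=30, threshold=3.0):
        self.n = 0              # number of observations seen so far
        self.mean = 0.0         # running mean
        self.m2 = 0.0           # running sum of squared deviations
        self.min_observations = min_observations  # assumed warm-up period
        self.threshold = threshold                # assumed z-score cutoff

    def update(self, x):
        """Fold one new observation into the baseline."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        """Sample standard deviation, or 0.0 with fewer than two points."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomaly(self, x):
        """Flag x only once enough history exists to trust the baseline."""
        if self.n < self.min_observations:
            return False
        s = self.std()
        if s == 0.0:
            return x != self.mean
        return abs(x - self.mean) / s > self.threshold
```

The warm-up check reflects the point made above: with few observations the baseline stays silent, and it only starts raising alarms once it has seen enough data to be reasonably confident.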
Since Netuitive's software architecture allows for "pluggable" algorithms, we can incrementally introduce new analytics capabilities easily, at first in an experimental or laboratory setting and ultimately, once verified, into production.
The IT operations management market has matured over the past two decades to the point that most critical components are well instrumented. The data is there, and mainstream IT organizations (not just visionary early adopters) realize that analytics deliver measurable and tangible value. My vision and challenge is to get our platform to the point where customers can easily customize the algorithms on their own, as their needs and IT infrastructure evolve over time. Platforms need to reach this point because of the endless variety of ways that enterprises must discover and remediate "anomalies that matter".
Stay tuned. In an upcoming blog post I will drill down on some specific industry examples of algorithms we developed as part of some large enterprise IT analytics platform solutions.