Shots Across the Data Lake
Big Data Analytics Range War
By: Tim Negris
Feb. 7, 2014 11:00 AM
About a century later, with the first tech land rush of the late 1980s and early '90s - before the Web - came battles between those who wanted software and data to be centrally controlled on corporate servers and those who wanted them to be distributed to workers' desktops. Oracle and IBM versus Microsoft and Lotus. Database versus Spreadsheet.
Now, with the advent of SoMoClo (Social, Mobile, Cloud) technologies and the Big Data they create, have come battles between groups on different sides of the "Data Lake" over how it should be controlled, managed, used, and paid for. Operations versus Strategy. BI versus Data Science. Governance versus Discovery. Oversight versus Insight.
The range wars of the Old West were not a fight over property ownership, but rather over access to natural resources. The farmers and their fences won that one, for the most part.
Those tech battles in the enterprise are fights over access to the "natural" resource of data and to the tools for managing and analyzing it.
In the '90s and most of the following decade, the farmers won again. Data was harvested from corporate systems and piled high in warehouses, with controlled access by selected users for milling it into Business Intelligence.
But now, in the era of Big Data Analytics, it is not looking so good for the farmers. The public cloud, open source databases, and mobile tablets are all chipping away at the centralized command-and-control infrastructure down by the riverside. And new cloud-based Big Data analytics solution providers like BigML, Yottamine (my company), and others are putting unprecedented analytical power in the hands of the data ranchers.
A Rainstorm, Not a River
Big Data is more like a relentless rainstorm - falling heavily from the cloud and flowing freely over and around corporate boundaries, with small amounts channeled into analytics and most draining to the digital deep.
Many large companies are failing to master this new data ecology because they are trying to do Big Data analytics in the same way, with the same tools as they did with BI, and that will never work. There is a lot more data, of course, but it is different data - tweets, posts, pictures, clicks, GPS, etc., not RDBMS records - and different analytics - discovery and prediction, not reporting and evaluation.
Successfully gleaning business value from the Big Data rainstorm requires new tools and maybe new rules.
But, it really doesn't matter which view is right. Advanced analytics on Big Data takes more computing horsepower than most companies can afford. Jobs like machine learning from the Twitter Fire Hose will take hundreds or even thousands of processor cores and terabytes of memory (not disk!) to build accurate and timely predictive models.
Most companies will have no choice but to embrace the shadow and use AWS or some other elastic cloud computing service, along with new, more scalable software tools, to do effective large-scale advanced analytics.
Time for New Rules?
If the data to be analyzed were actual business records for customers and transactions, as they are in the BI world, those concerns would be reasonable. But more often than not, advanced analytics does not work that way. Machine learning and other advanced algorithms do not look at business data. They look at statistical information derived from business data, usually in the form of an inscrutable mass of binary truth values that is only actionable to the algorithm. That is what gets sent to the cloud, not the customer file.
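To make the idea of "binary truth values" concrete, here is a minimal sketch of one way a business record could be turned into a vector of 0/1 flags before it leaves the building. It uses feature hashing, which is an assumption chosen for illustration and not necessarily how any particular product works. The function name `binarize`, the 32-slot vector length, and the sample record are all hypothetical.

```python
import hashlib


def binarize(record, n_bits=32):
    """Map a record's (field, value) pairs to a fixed-length list of 0/1 flags.

    Each pair is hashed to a slot index; the slot is set to 1.
    The field names and values themselves are not kept in the output.
    """
    bits = [0] * n_bits
    for field, value in sorted(record.items()):
        token = f"{field}={value}".encode("utf-8")
        digest = hashlib.sha256(token).digest()
        # Use the first 8 bytes of the digest as an integer slot index.
        index = int.from_bytes(digest[:8], "big") % n_bits
        bits[index] = 1
    return bits


# Hypothetical customer attributes, for illustration only.
record = {"region": "west", "plan": "premium", "tickets_gt_3": True}
vector = binarize(record)
print(vector)
```

The resulting list is what a learning algorithm would consume. It carries no readable field names or values, although a real deployment would still need its own review of what can be inferred from such vectors.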
If you want to do advanced cloud-scale Big Data analytics and somebody is telling you it is against the rules, you should look at the rules. They probably don't even apply to what you are trying to do.
First User Advantage
Some day, technologies like high performance statistical machine learning will be ubiquitous and the business winners will be the ones who use the software best. But right now, solutions are still scarce and the business winners are the ones willing to use the software at all.