
Data Warehouse and Data Lake
Maybe the best way to understand today’s role of the data warehouse is with a bit of history

This blog was written with the thoughtful assistance of David Leibowitz, Dell EMC Director of Business Intelligence, Analytics & Big Data

So data warehousing may not be cool anymore, you say? It’s yesterday’s technology (or 1990s technology if you’re as old as me) that served yesterday’s business needs. And while it’s true that recent big data and data science technologies, architectures and methodologies seem to have relegated data warehousing to the back burner, it is entirely false that there is no critical role for the data warehouse and Business Intelligence in digitally transformed organizations.

Maybe the best way to understand today’s role of the data warehouse is with a bit of history. And please excuse us if we take a bit of liberty with history (since we were there for most of this!).

Phase 1: The Data Warehouse Era
In the beginning, the Gods (Ralph Kimball and Bill Inmon, depending upon your data warehouse religious beliefs) created the data warehouse. And it was good. The data warehouse, coupled with Business Intelligence (BI) tools, served the management and operational reporting needs of the organization so that executives and line-of-business managers could quickly and easily understand the status of the business, identify opportunities, and highlight potential areas of under-performance (see Figure 1).

Figure 1: The Data Warehouse Era

The data warehouse served as a central integration point, collecting, cleansing and aggregating data from a variety of sources: AS/400, relational and file-based (such as EDI). For the first time, data from supply chain, warehouse management, AP/AR, HR and point of sale was available in a “single version of the truth.”

Extract-transform-load (ETL) processing wasn’t always quick, and could require a degree of technical gymnastics to bring together all of these disparate data sources. At one point, the “enterprise service bus” entered the playing field to lighten the load on ETL maintenance, but routines quickly went from proprietary data sources to proprietary (and sometimes arcane) middleware business logic code (anyone remember Monk?).
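For readers who never had to wrestle with one, here is a minimal sketch of that collect, cleanse and aggregate flow. Everything in it is an assumption made up for illustration (the two sources, the field names, the sample rows); a real ETL job of the era would have been far messier.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extracts: a file-based point-of-sale feed and rows pulled
# from a relational order system. Names and values are invented.
POS_CSV = """sku,qty,amount
 a-100 ,2,19.98
B-200,1,5.00
"""
ORDER_ROWS = [
    {"SKU": "A-100", "QTY": 3, "AMOUNT": 29.97},
    {"SKU": "b-200", "QTY": 4, "AMOUNT": 20.00},
]


def extract():
    """Yield raw records from every source, tagged with their origin."""
    for row in csv.DictReader(io.StringIO(POS_CSV)):
        yield "pos", row
    for row in ORDER_ROWS:
        # The relational source uses upper-case column names.
        yield "orders", {key.lower(): value for key, value in row.items()}


def transform(record):
    """Cleanse one record: normalize the SKU and coerce the types."""
    return {
        "sku": str(record["sku"]).strip().upper(),
        "qty": int(record["qty"]),
        "amount": float(record["amount"]),
    }


def load(records):
    """Aggregate the cleansed records into one total per SKU."""
    totals = defaultdict(lambda: {"qty": 0, "amount": 0.0})
    for _source, raw in records:
        clean = transform(raw)
        totals[clean["sku"]]["qty"] += clean["qty"]
        totals[clean["sku"]]["amount"] += clean["amount"]
    return dict(totals)


warehouse = load(extract())
for sku, total in sorted(warehouse.items()):
    print(sku, total["qty"], round(total["amount"], 2))
```

The point of the sketch is the shape, not the detail: both sources disagree on casing and whitespace, and only after the cleansing step do they roll up into one set of figures per SKU.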

The data warehouse supported reports and interactive dashboards that enabled business management to have a full grasp on the state of the business. That said, report authoring was static and not really enabled for democratizing data. Typically, the nascent concept of self-service BI was limited to cloning a subset of the data warehouse to smaller data marts, and extracts to Excel for business analysis purposes. This proliferation of additional data silos created reporting environments that were out of sync (remember the heated sales meetings where teams couldn’t agree as to which report figures were correct?) and the analysis paralysis caused by spreadmarts meant that more time was spent working the data rather than driving insight. But we all dealt with it, as it was agreed that some information (no matter the effort it took to acquire) was more important than no data.

Phase 2: Optimize the Data Warehouse
But IT Man grew unhappy with being held captive by proprietary data warehouse vendors. The costs of proprietary software and expensive hardware (and let’s not even get started on user-defined functions in PL/SQL and proprietary SQL extensions that created architectural lock-in) forced organizations to limit the amount and granularity of data in the data warehouse. IT Man grew restless and looked for ways to reduce the costs associated with operating these proprietary data warehouses while delivering more value to Business Man.

Then Hadoop was born out of the ultra-cool and hip labs of Yahoo. Hadoop provided a low-cost data management platform that leveraged commodity hardware and open source software, and that was estimated to be 20x to 100x cheaper than proprietary data warehouses.

Man soon realized the financial and operational benefits afforded by a commodity-based, natively parallel, open source Hadoop platform to provide an Operational Data Store (now that’s really going old school!) to off-load those nasty Extract, Transform and Load (ETL) processes from the expensive data warehouse (see Figure 2).

Figure 2: Optimize the Data Warehouse

The Hadoop-based Operational Data Store was deemed very good as it helped IT Man to decrease spending on the data warehouse (guess not so good if you were a vendor of those proprietary data warehouse solutions…and you know who you are T-man!). Since it’s estimated that ETL consumes 60% to 90% of the data warehouse processing cycles, and since some vendors licensed their products based upon those cycles, this concept of “ETL Offload” could provide substantial cost reductions. So in an environment limited by Service Level Agreements (because outside of Doc Brown’s DeLorean equipped with a flux capacitor, there are still only 24 hours in a day in which to do all the ETL work), Hadoop provided a low-cost, high-performance environment for dramatically slowing the investment in proprietary data warehouse platforms.
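To put rough numbers on the “ETL Offload” idea, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the load left on the warehouse shrinks in direct proportion to the ETL cycles moved off it; the function name and the `offload_fraction` parameter are hypothetical, and real licensing terms will differ.

```python
def remaining_warehouse_load(etl_share, offload_fraction=1.0):
    """Fraction of the original processing cycles still run on the warehouse.

    etl_share: share of warehouse cycles consumed by ETL (0.0 to 1.0).
    offload_fraction: share of that ETL work moved to Hadoop (0.0 to 1.0).
    Assumes a simple linear relationship, which is an illustration only.
    """
    if not 0.0 <= etl_share <= 1.0:
        raise ValueError("etl_share must be between 0 and 1")
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    return 1.0 - etl_share * offload_fraction


# The 60% and 90% figures come from the estimate quoted above.
for share in (0.60, 0.90):
    left = remaining_warehouse_load(share)
    print(f"ETL at {share:.0%} of cycles: {left:.0%} of the load stays on the warehouse")
```

Under those assumptions, moving all of the ETL work off the warehouse leaves somewhere between 10% and 40% of the original cycles, which is why the idea caught IT Man’s attention.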

Things were getting better, but still weren’t perfect. While IT Man could shave costs, he couldn’t make the tools easy to use for simple data consumers (like Executive Man). And while Hadoop was great for storing unstructured and semi-structured data, it couldn’t always keep up with the speed expected of relational or cube-based reporting from traditional transactional systems.

See blog “The Data Warehouse Modernization Act” for more details on the role of the Hadoop-based Operational Data Store and how it has helped to “modernize” today’s existing data warehouse environment.

Phase 3: Introducing Data Science
Then God created the Data Scientists, or maybe it was the Devil based upon one’s perspective. The data scientists needed an environment where they could rapidly ingest high volumes of granular structured (tables), semi-structured (log files) and unstructured data (text, video, images). They realized that data beyond the firewall was needed in order to drive intelligent insight. Data such as weather, social, sensor and third party could be mashed up with the traditional data stores in the EDW and Hadoop to determine customer insight, customer behavior and product effectiveness. This made Marketing Man happy. The scientists needed an environment where they could quickly test new data sources, new data transformations and enrichments, and new analytic techniques in search of those variables and metrics that might be better predictors of business and operational performance. Thusly, the analytic sandbox, which also runs on Hadoop, was born (see Figure 3).

Figure 3: Introducing Data Science
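As a toy illustration of that sandbox workflow, the sketch below mashes up warehouse sales with two candidate data sources and checks which one tracks sales more closely. All of the figures, dates and source names are invented for the example, and a plain Pearson correlation stands in for whatever analytic technique a data scientist would actually choose.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented daily sales, as they might come out of the data warehouse.
sales = {"2018-07-01": 120, "2018-07-02": 135, "2018-07-03": 160, "2018-07-04": 150}

# Invented candidate variables: an external weather feed and internal promo spend.
candidates = {
    "temperature": {"2018-07-01": 24, "2018-07-02": 27, "2018-07-03": 33, "2018-07-04": 30},
    "promo_spend": {"2018-07-01": 5, "2018-07-02": 5, "2018-07-03": 4, "2018-07-04": 6},
}

# Join every source on the dates they all share.
days = sorted(set(sales).intersection(*candidates.values()))
target = [sales[day] for day in days]

scores = {
    name: pearson([series[day] for day in days], target)
    for name, series in candidates.items()
}
best = max(scores, key=lambda name: abs(scores[name]))
print(best, round(scores[best], 3))
```

With four data points this proves nothing, of course. What the sandbox buys you is the freedom to try a new source like this in minutes, throw it away if it doesn’t help, and move on to the next one.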

The characteristics of a data science “sandbox” couldn’t be more different than the characteristics of a data warehouse:

Finance Man tried desperately to combine these two environments, but the audiences, responsibilities and business outcomes were just too different to cost-effectively deliver business reporting and predictive analytics in a single bubble.

Ultimately, the analytic sandbox became one of the drivers for the creation of the data lake that could support both the data science and data warehousing (Operational Data Store) needs.

Data access was getting better for the data scientists, but we were again moving towards proprietary processes and technical skills reserved for the elite. Still, things were good as IT Man, Finance Man and Marketing Man could work through the data scientists to drive innovation. But they soon wanted more.

See the following blogs for more details on the complementary nature of the data warehouse and the data lake:

Phase 4: Creating Actionable Dashboards
But Executive Man was still unsatisfied. The Data Scientists were developing wonderful predictions about what was likely to happen and prescriptions about what to do, but the promise of self-service BI was missing. Instead of running to IT Man for reports as in the old days, he was now requesting them from the Data Scientists.

The reports and dashboards created to support executive and front-line management in Phase 1 were the natural channel for rendering the predictive and prescriptive insights, effectively closing the loop between the data warehouse and the data lake. With data visualization tools like Tableau and Power BI, IT Man could finally deliver on the promise of self-service BI by providing interactive descriptive and predictive dashboards that even Executive Man could operate (see Figure 4).

Figure 4: Closing the Analytics Loop

See the blog “Creating Actionable Dashboards” for more details on how to convert existing reports and dashboards into actionable reports and dashboards!

And Man was happy (until the advent of Terminator robots began making decisions for us).

The post Data Warehouse and Data Lake Analytics Collaboration appeared first on InFocus Blog | Dell EMC Services.


About William Schmarzo
Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting strategy and defining the Big Data service offerings for Dell EMC’s Big Data Practice.

As a CTO within Dell EMC’s 2,000+ person consulting organization, he works with organizations to identify where and how to start their big data journeys. He’s written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill as #4 Big Data Influencer worldwide.

Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata.

Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications.

Bill holds a Master of Business Administration from University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.
