Big Data = BI + ADD
Posted by Bob Warfield on April 26, 2012
Big Data is Business Intelligence plus Attention Deficit Disorder? That’s gotta be linkbait of an order I’ve not used since my NoSQL is a Premature Optimization post. What’s up with that?
I just got done attending Jeff Kaplan’s excellent Cloud BI Summit at the Computer History Museum. It was a very enjoyable event and it helped me align a lot of disparate threads that had been circulating in my noggin into some coherent insights. It also reminded me what an excellent facilitator of panels Jeff can be.
One of those threads is that the problems Big Data is focused on are not necessarily all that hard. Everyone wants to talk breathlessly about the incredible volumes of data that are flying around these days and how hard it is to deal with that data. But I just don’t buy it. We’ve had Big Data for a long time and we’ve been dealing with it for a long time. Web Scale only affects a relatively few organizations at the very high end–probably not enough organizations to make it the interesting investment thesis that it is currently held out as. At smaller scales, the volumes are just not all that crazy relative to the tools that are available. Put another way, it’s ADD because we simply have that much more data, and that many more excuses to obsess over nuts-and-bolts technology instead of seeking the real actionable insights our organizations need.
Listen, I’ve been doing Big Data for a long time. You want to talk lots of transactions? I did a startup called iMiner / PriceRadar that had to process all of eBay’s transactions every day in just a few hours. This was back in 1999. There was no Amazon Cloud to leverage. Nobody was using MySQL, and nobody had really even heard of NoSQL. eBay itself was still crashing weekly because they hadn’t started using their relational databases in all the ad hoc NoSQL ways (no joins or transactions!) that people would eventually codify and start talking about. I asked all of my high-powered database expert friends how to handle the data volumes and they said it would require Oracle and lots of expensive high-powered Unix hardware. We couldn’t afford that, so I told them we were going to use SQL Server running on commodity Windows hardware. They laughed and said that wasn’t even possible. 8 weeks later it was up and running and worked well. Today we’d use MySQL and commodity hardware at Amazon. This was the whole point of my NoSQL is a Premature Optimization post. You don’t need column stores and flash memory architectures to get this kind of work done for the majority of companies out there. Who really has Facebook’s data volumes (and BTW, they used MySQL to handle those)? Who has Google’s data volumes? Relatively few organizations.
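That "no joins or transactions" style is simple to sketch. Purely as an illustration (the table, columns, and data are invented, and SQLite is standing in for SQL Server or MySQL), the pattern is: denormalize everything into one wide table and read strictly by key, so every fetch is a single indexed lookup with no cross-table coordination:

```python
# A sketch of the "ad hoc NoSQL" pattern on a relational engine: no joins,
# no multi-table transactions. Illustrative only -- names and data are
# made up; SQLite stands in for SQL Server or MySQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE auction (
        auction_id   INTEGER PRIMARY KEY,  -- every read goes through this key
        title        TEXT,
        seller_name  TEXT,   -- denormalized: copied in, not joined from a sellers table
        high_bid     REAL
    )
""")

# Writes are single-row, single-statement -- nothing to coordinate across tables.
conn.execute(
    "INSERT INTO auction VALUES (?, ?, ?, ?)",
    (42, "Vintage modem", "bob123", 19.99),
)
conn.commit()

# Reads are single-key lookups: one indexed probe, trivially shardable,
# because the row already holds everything the page render needs.
row = conn.execute(
    "SELECT title, seller_name, high_bid FROM auction WHERE auction_id = ?",
    (42,),
).fetchone()
print(row)  # ('Vintage modem', 'bob123', 19.99)
```

Give up joins and you give up some query flexibility, but you gain reads and writes that partition cleanly across cheap boxes, which was the whole trick.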
So then what’s the real problem?
The best part of the conference for me was a panel Ken Rudin participated in. I’ve interviewed Ken on this blog twice before, when he was running LucidEra. He went on from there to run Analytics at Zynga, and now he is at Facebook running their Analytics. The rest of the people on the panel were from companies building the tools needed to wrangle Big Data. Ken pretty much cleared the decks when he said that was all well and good, but the hard part wasn’t really managing the data. The hard part isn’t the ETL, the data feeds, the database, or the dashboards and report generators. That’s all doable and easier than you’d think up to very large scales. The hard part, according to Ken, is knowing what questions to ask. He is so right about that.
There’s not that much new under the sun with Big Data. When I worked at Callidus, terabyte data stores for our application were common. We were handling huge transaction volumes around sales commissions for the world’s largest companies. We got it done largely with off-the-shelf technologies for the Analytics and a little tuning. If you don’t think managing the complex comp plans for 250,000 insurance agents at one of the largest insurance companies in the world is Big Data, you just don’t know much about the comp plan business. For one customer we had to horizontally scale our architecture out to over 200 CPUs.
Jane Griffin, from Deloitte, amplified Ken’s thought in some nice ways. She said what I had been thinking ever since Ken uttered those words during his session: there is a growing shortage of the sort of person who knows how to ask the questions. Once again, she was so right about that.
As a person who does know how to ask those questions and get answers, I have seen it over and over again. I have often wondered in various organizations why I was doing that kind of work. Wasn’t there someone besides the SVP of Products or Engineering that could formulate these questions and get them answered with Analytics? Worse, why was it that when I presented the data and the answers there were so often blank faces? Why didn’t they get it? Didn’t they think analytically?
The simple fact is that we are moving from a world of intuition and gut feel to a world of data and analytical thinking. Before we had the data, intuition was all we had to fall back on when making decisions. It’s better than nothing, but it can result in spectacular failures. I’m fond of telling the story of a certain expensive marketing program that went very wrong because it was based on someone’s gut feel. It involved spending a lot of money without producing much of a result. When my friend Marc Randolph, someone who is very analytical in his thinking, was asked why it failed, he said, “I don’t know, but it was tragically knowable.” He meant the idea should have been tested at much smaller volumes before all that wood was put behind an arrow that was doomed to miss.
In this world, if businesses want to succeed, they have to realize that data is plentiful. First they have to collect it up. That’s the easy part. Then comes the hard part. Someone in the organization must have both the skills and the empowerment to ask the right questions, get some answers backed up by hard analytics, and then get changes made to take advantage of this newfound knowledge. If your competitors have digital strategies driven by hard analytics, and you’re flying by the seat of your pants, you may as well be piloting a biplane and barnstorming, because they’re flying a sophisticated jet with radar and GPS navigation. You don’t want to try to win that contest!
Thinking about the kinds of people who can perform that analytics-question-asking task, it’s clear they’re scarce as hen’s teeth in most corporations. I am reminded of a similar essential skill set that is equally scarce: great software developers. The world went through an interesting evolution largely because of the shortage of great software developers. At first, when there weren’t that many computers around, IT built all their own software. Eventually, they couldn’t hire enough great developers, and so packaged software had a chance. Then there weren’t enough great developers there either, so the world of professional services was spawned. Demand outstripped that supply too, and so we went global, with offshoring and outsourcing. There is no monopoly on this talent in the US. Now we have a sort of software development supply chain where software of varying degrees of sophistication can be created, and the scarce supply of these developers has to be rationed across the different levels in that chain.
Expect to see the same thing going on with the Data Scientists, Quants, or whatever you want to call these people that know how to ask the Big Data Questions. They’re going to be the Moneyballers in their organizations and they’ll be worth every expensive penny you wind up having to pay them. Someday soon you’ll be turning to the Deloittes of the world to hire these people when you can no longer attract them. Who knows, maybe we’ll see them popping up in India, China, or Vietnam not long after that?
BTW, it’s a great pity the space isn’t called “Big Questions” or maybe “Big Insights” instead of “Big Data.” It would’ve been much more to the point, or at least more benefit oriented and less feature oriented.