Phil Simon, author of Too Big to Ignore: The Business Case for Big Data, discusses two myths of collecting and analyzing data.
Myth: You can get all of the data
We are living in unprecedented times. Never before has so much data been available to us. Forget megabytes and petabytes; exabytes of data now exist. I read recently that the average person in an industrialized society today consumes more information in one day than his fifteenth-century counterpart did in a lifetime.
Despite this unfathomable amount of data, no person or organization can store and retrieve all of it. And yes, that includes Google. Its software indexes the Surface Web, not the Deep Web. Some estimates put the latter at 25 times the size of the former. As a result, when you search, you are accessing only about four to six percent of all information on the Internet.
Taking it down a level or thirty, individual authors like me cannot access some very valuable information, such as which specific customers are buying my books. Sites like Amazon and stores like Barnes & Noble keep that information. Nothing would make me happier than knowing my customers, but even in a Big Data world that information eludes me.
You will never get all of the data. Period. Deal with it.
Myth: You need all of the data
No doubt more data helps, but don’t for a minute think that you need all of the data to make an informed business decision. Organizations that are effectively leveraging the power of Big Data realize that they will never capture all relevant information.
New sources of data spring up seemingly every day, and it’s not as if they’re all valuable. Some are. For instance, e-mail messages often contain extremely valuable insights into the state of an enterprise. Smart companies are mining individual messages to gauge employee sentiment and potentially determine who might be leaving.
This is a far cry from saying that all e-mails are equally valuable. It’s hard to make the argument that using text analytics on spam makes much sense.
You don’t need all of the data. Yes, more is better than less, but don’t waste time trying to achieve the impossible.