Post written by Evan Hu, Co-founder of Ideaca. View his blog here: evanhu.wordpress.com
In a 2001 research report for META Group, Doug Laney planted the seeds of Big Data, defining data growth challenges and opportunities in a “3Vs” model. The elements of this 3Vs model are volume (the sheer, massive amount of data, the “Big” in Big Data), velocity (the speed at which data is generated and processed), and variety (the breadth of data types and sources). Roger Magoulas of O’Reilly Media popularized the term “Big Data” in 2005 to describe these challenges and opportunities. Gartner now defines Big Data as “high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” Most recently, IBM has added a fourth “V,” veracity, as an “indication of data integrity and the ability for an organization to trust the data and be able to confidently use it to make crucial decisions.”
The volume of data being created in our world today is growing exponentially. McKinsey’s 2011 paper “Big data: The next frontier for innovation, competition, and productivity” noted that:
- $600 buys a disk drive that can store all of the world’s music
- 5 billion mobile phones were in use in 2010
- over 30 billion pieces of content are shared on Facebook every month
- global data generated is projected to grow 40% per year, versus 5% annual growth in global IT spending
- the US Library of Congress had collected 235 terabytes of data by April 2011
- 15 out of 17 sectors in the United States have more data stored per company than the US Library of Congress
This sheer volume of data presents huge challenges. So does its velocity: for time-sensitive processes such as fraud detection, a quick response is critical. How does one find the signal in all that noise? The variety of both structured and unstructured data is ever expanding in form: numeric files, text documents, audio, video, and more. And finally, in a world where one in three business leaders lacks trust in the information they use to make decisions, data veracity is a barrier to taking action.
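The post does not prescribe a technique, but as a rough, hypothetical sketch of what “finding the signal” at velocity can mean, consider an online anomaly detector that scores each transaction in constant time against running statistics maintained with Welford’s algorithm. Everything here (class and method names, the threshold, the toy data) is my illustration, not a real fraud-detection API.

```python
# A minimal sketch of velocity-oriented fraud screening: score each
# incoming transaction amount in O(1) time against a running mean and
# variance maintained with Welford's algorithm. Illustrative only.

class StreamingAnomalyDetector:
    def __init__(self, threshold=3.0):
        self.n = 0                  # transactions seen so far
        self.mean = 0.0             # running mean of amounts
        self.m2 = 0.0               # running sum of squared deviations
        self.threshold = threshold  # flag beyond this many std devs

    def score(self, amount):
        """Update running stats; return True if `amount` looks anomalous."""
        self.n += 1
        delta = amount - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (amount - self.mean)
        if self.n < 30:             # too little history to judge yet
            return False
        std = (self.m2 / (self.n - 1)) ** 0.5
        return std > 0 and abs(amount - self.mean) > self.threshold * std

detector = StreamingAnomalyDetector()
for amount in [25.0, 40.0, 32.0] * 20 + [5000.0]:
    if detector.score(amount):
        print(f"flagged: ${amount:,.2f}")  # only the $5,000 outlier fires
```

Because the detector never stores the stream itself, it keeps up no matter how fast transactions arrive; that constant-time, constant-memory property is exactly what velocity demands.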
The solution lies in ever more inexpensive, accessible processing power and in the nascent science of machine learning. Yet Abraham Kaplan’s (1964) principle of the drunkard’s search still holds true: “There is the story of a drunkard, searching under a lamp for his house key, which he dropped some distance away. Asked why he didn’t look where he dropped it, he replied ‘It’s lighter here!’” A massive dataset with the same bias as a small one will only give you a more precise estimate of a flawed answer, and we are still in the early days.
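Kaplan’s point can be made concrete with a few lines of simulation (my illustration, not the author’s): when the collection process systematically misses part of the population, a bigger sample only tightens the estimate around the wrong answer. All numbers below are invented for the demonstration.

```python
# A minimal simulation of the drunkard's search: a biased sample
# converges precisely to the wrong answer as n grows. Illustrative only.
import random

random.seed(42)
TRUE_MEAN = 100.0  # the quantity we actually want to estimate

def biased_sample(n):
    """Draw n observations, but only 'under the lamp post':
    values above 110 are never recorded (the bias)."""
    out = []
    while len(out) < n:
        x = random.gauss(TRUE_MEAN, 25.0)
        if x <= 110.0:  # large values are systematically invisible
            out.append(x)
    return out

for n in (100, 10_000, 1_000_000):
    estimate = sum(biased_sample(n)) / n
    print(f"n={n:>9,}  estimate={estimate:6.1f}  (true mean = {TRUE_MEAN})")

# The estimate settles near 86, not 100: more data, same bias,
# just a more precise version of a flawed answer.
```

No volume of additional lamplight data fixes the estimate; only changing where you look does.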
Big Data is the opportunity to unlock answers to previously unanswerable questions and to uncover insights previously unseen. With it come new dangers, too, as the NSA warrantless surveillance controversy clearly exposes.
I have had the privilege of listening to Clayton Christensen speak several times. One through line of his stuck with me and forever embedded itself in my consciousness: “I don’t have an opinion. But I have a theory, and I think my theory has an opinion.” I believe the same about Big Data. The data has an opinion; the data has the answers.