People often ask me: "What is big data?" The answer I usually give: any data too large to process using your normal tools and techniques.
That's a very context-dependent answer: last week, in one case it meant one million records, too big for Stata on a laptop; in another, a dataset growing at about 1TB per day. To put it another way, big data is anything that forces you to think about the engineering side of data science: where you can't just open up R and run lm(), because that would take a day and need a terabyte of memory.
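To make the "terabyte of memory" claim concrete, here is a back-of-envelope sketch. The row and column counts are illustrative assumptions, not figures from any of the datasets mentioned above; the point is only that a dense in-memory model matrix blows past laptop RAM very quickly.

```python
# Back-of-envelope memory estimate for an in-memory regression.
# Assumptions (illustrative): 1.5 billion rows, 100 predictors,
# 8-byte floats, and a fitting routine that materialises the
# full dense model matrix in RAM.

rows = 1_500_000_000
cols = 100
bytes_per_value = 8

model_matrix_bytes = rows * cols * bytes_per_value
terabytes = model_matrix_bytes / 1024**4

print(f"Model matrix alone: {terabytes:.2f} TB")
```

And that is before any working copies the fitting routine makes, which is why at this scale you reach for out-of-core or distributed tools instead.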
The beginnings of a debate have emerged in the Financial Times, between Evgeny Morozov and Hal Varian, regarding Google's alleged monopoly – not in search, but in data.
A new blog post at Nesta, reflecting on Tim Harford's recent critique of big data and why the recommendation to continue a decennial census is a good thing:
We live in a world of exponentially expanding data. Digitisation and the emerging internet of things have created a world in which our daily activities leave a digital trail. To an organisation or an individual with the right skills, that digital trail becomes data, able to be probed and interrogated for meaning, for correlations and for trends. But in the rush to take advantage of this tsunami of zeroes and ones, it’s important to remember that not all data is created equal.