big data

What is big data?

People often ask me what big data is.

The answer I usually give is:

Any data too large to process using your normal tools & techniques.

That’s a very context-dependent answer: last week, in one case it meant one million records, too big for Stata on a laptop, and in another it meant a dataset growing at about 1 TB per day. To put it another way, big data is anything where you have to think about the engineering side of data science: where you can’t just open up R and run lm(), because that would take a day and need a terabyte of memory.
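As a rough illustration of what thinking about the engineering side can mean, here is a minimal sketch in Python of fitting a straight line without ever holding the whole dataset in memory. It is not a substitute for lm(): it handles one predictor only, and the chunked data source (synthetic_chunks) is a hypothetical stand-in for reading a large file piece by piece. The idea is to keep a handful of running sums and update them one chunk at a time, so memory use depends on the chunk size, not on the number of records.

```python
from typing import Iterable, Iterator, List, Tuple

Chunk = List[Tuple[float, float]]  # a batch of (x, y) records


def fit_line_streaming(chunks: Iterable[Chunk]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope * x, one chunk at a time.

    Only five running totals are kept, however many records pass through.
    """
    n = 0
    sum_x = sum_y = sum_xx = sum_xy = 0.0
    for chunk in chunks:
        for x, y in chunk:
            n += 1
            sum_x += x
            sum_y += y
            sum_xx += x * x
            sum_xy += x * y

    denom = n * sum_xx - sum_x ** 2
    if n < 2 or denom == 0:
        raise ValueError("need at least two distinct x values")

    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


def synthetic_chunks(n_chunks: int, chunk_size: int) -> Iterator[Chunk]:
    """Hypothetical data source: yields points on y = 3 + 2x, chunk by chunk."""
    for c in range(n_chunks):
        start = c * chunk_size
        yield [(float(i), 3.0 + 2.0 * i) for i in range(start, start + chunk_size)]


intercept, slope = fit_line_streaming(synthetic_chunks(n_chunks=10, chunk_size=1000))
print(f"intercept={intercept:.3f} slope={slope:.3f}")
```

The running-sums formula is the simplest version of the idea and can lose precision on badly scaled data, so treat it as a sketch of the approach. It is not production code.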


Getting the data you pay for

A new blog post at Nesta reflects on Tim Harford’s recent critique of big data and on why the recommendation to continue a decennial census is a good thing:

We live in a world of exponentially expanding data. Digitisation and the emerging internet of things have created a world in which our daily activities leave a digital trail. To an organisation or an individual with the right skills, that digital trail becomes data, able to be probed and interrogated for meaning, for correlations and for trends. But in the rush to take advantage of this tsunami of zeroes and ones, it’s important to remember that not all data is created equal.