hadoop

What is big data?

People often ask me

What is big data?

The answer I usually give is

Any data too large to process using your normal tools & techniques.

That’s a very context dependent answer: last week in one case it meant one million records, too big for Stata on a laptop, and in another it meant a dataset growing at about 1TB per day. To put it another way, big data is anything where you have to think about the engineering side of data science: where you can’t just open up R and run lm(), because that would take a day and need a terabyte of memory.

(more…)