People often ask me:
What is big data?
The answer I usually give is:
Any data too large to process using your normal tools & techniques.
That’s a very context-dependent answer: last week, in one case it meant one million records, too big for Stata on a laptop, and in another it meant a dataset growing at about 1 TB per day. To put it another way, big data is anything where you have to think about the engineering side of data science: where you can’t just open up R and run lm(), because that would take a day and need a terabyte of memory.
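As a rough illustration of what thinking about the engineering side can look like, here is a minimal Python sketch of a one-predictor least-squares fit computed in a single streaming pass. It is not what lm() does internally, and the function name `streaming_simple_ols` is hypothetical. The point is only that the fit needs a handful of running sums rather than the whole dataset in memory.

```python
from typing import Iterable, Tuple


def streaming_simple_ols(rows: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x in one pass over (x, y) pairs.

    Only running means and co-moments are kept, so memory use is constant
    no matter how many rows stream through.
    """
    n = 0
    mean_x = 0.0
    mean_y = 0.0
    sxx = 0.0  # running sum of squared deviations of x
    sxy = 0.0  # running sum of co-deviations of x and y

    for x, y in rows:
        n += 1
        dx = x - mean_x              # deviation from the previous mean of x
        mean_x += dx / n
        mean_y += (y - mean_y) / n
        sxx += dx * (x - mean_x)     # Welford-style update
        sxy += dx * (y - mean_y)

    if n < 2 or sxx == 0.0:
        raise ValueError("need at least two rows with distinct x values")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Usage: the generator stands in for rows read lazily from a file or database.
rows = ((float(x), 2.0 * x + 1.0) for x in range(100_000))
slope, intercept = streaming_simple_ols(rows)
```

The same idea extends to several predictors by accumulating the cross-product matrices chunk by chunk, though at that point numerical stability and I/O throughput become part of the design.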