167 million TikTok videos viewed per minute x 60 minutes x 24 hours x 365 days
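To get a feel for the scale, the per-minute figure can be multiplied out over a day and over a full year. This is a rough sketch: it assumes the rate of 167 million views per minute stays constant and uses a 365-day year; the variable names are illustrative only.

```python
# Views per minute, as quoted in the line above.
VIEWS_PER_MINUTE = 167_000_000

MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: leap years ignored

# Scale the per-minute rate up to a day, then to a year.
views_per_day = VIEWS_PER_MINUTE * MINUTES_PER_HOUR * HOURS_PER_DAY
views_per_year = views_per_day * DAYS_PER_YEAR

print(f"{views_per_day:,} views per day")
print(f"{views_per_year:,} views per year")
```

Under these assumptions the yearly total is on the order of 10^13 views.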
This is big data
The simplest definition of big data is large and complex unstructured data (images posted on Facebook, email, text messages, GPS signals from mobile phones, tweets, social media updates, etc.) that cannot be processed by traditional database tools
Ex. Walmart collects over 2.5 petabytes of data every hour from customers’ transactions
Peta = 10^15
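The Walmart figure can be converted into bytes and into a daily volume. This is a minimal sketch that takes 2.5 petabytes per hour as a lower bound (the source says "over"), uses the decimal prefix peta = 10^15, and assumes the hourly rate holds around the clock.

```python
PETABYTE = 10**15  # bytes, decimal (SI) prefix

petabytes_per_hour = 2.5  # lower bound quoted above
hours_per_day = 24        # assumption: constant rate all day

# Convert the hourly figure to bytes and scale it up to one day.
bytes_per_hour = int(petabytes_per_hour * PETABYTE)
petabytes_per_day = petabytes_per_hour * hours_per_day

print(f"{bytes_per_hour:,} bytes per hour")
print(f"{petabytes_per_day} petabytes per day")
```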
Background
Before discussing big data analytics, a few terms need to be defined
Statistics uses numbers to summarize and quantify data
Center: Mean, mode, median, midrange
Variation: standard deviation, variance, range
Time
Distribution
Outliers
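The measures of center and variation listed above can be computed with Python's standard `statistics` module. The data set below is made up for illustration, and the outlier rule (more than two standard deviations from the mean) is only one common rule of thumb, not the only definition.

```python
import statistics

# Made-up sample; the value 40 is deliberately far from the rest.
data = [2, 4, 4, 5, 7, 9, 40]

# Center
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
midrange = (min(data) + max(data)) / 2

# Variation
data_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # sample standard deviation

# Outliers: values more than 2 standard deviations from the mean
# (an assumed rule of thumb for this sketch).
outliers = [x for x in data if abs(x - mean) > 2 * std_dev]

print("mean:", mean, "median:", median, "mode:", mode, "midrange:", midrange)
print("range:", data_range, "variance:", variance, "std dev:", std_dev)
print("outliers:", outliers)
```

Note how the single large value pulls the mean well above the median, which is why the center is usually reported with more than one measure.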
Data mining uses statistics and programming languages to find patterns hidden in the data
Machine learning uses data mining to build models to predict future outcomes
Artificial intelligence uses models built by machine learning to make machines act in an intelligent way
Big data analytics is the process of studying big data to uncover hidden patterns and correlations to make better decisions using technologies like NoSQL databases, Hadoop, and MapReduce
The main goal of big data analytics is to help organizations make better decisions
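The MapReduce idea mentioned above can be illustrated with a word count in plain Python. This is a toy, single-machine sketch of the map, shuffle, and reduce steps; it does not use the Hadoop API, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group the emitted values by key (word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped values to get one count per word."""
    return {key: sum(values) for key, values in grouped.items()}


# Made-up input; in a real cluster each document could be mapped on a different node.
documents = ["big data is big", "data is everywhere"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(pairs))
print(word_counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across many machines, which is what makes the pattern suitable for data too large for a single database server.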
Big data as Three Vs
Volume
Unstructured data streaming in from social media
Increasing amounts of sensor and machine-to-machine data being collected
Velocity
Data is streaming in at unprecedented speed and must be dealt with in a timely manner
Variety
Data today comes in all types of formats, including structured, numeric data in traditional databases
Information created from line-of-business applications
The Future?
Analytics 3.0 is the new wave of big data analytics
It follows Analytics 1.0, which is BI (business intelligence), and Analytics 2.0, which is used by online companies only (Google, Facebook, etc.)