Lecture Notes - CS441

Tuesday, October 15th, 2024


  • Big Data

  • 167 million TikTok videos viewed per minute x 60 minutes x 24 hours x 365 days per year
  • This is big data
  • The simplest definition of big data is large and complex unstructured data (images posted on Facebook, email, text messages, GPS signals from mobile phones, tweets, social media updates, etc.) that cannot be processed by traditional database tools
  • Ex. Walmart collects over 2.5 petabytes of data every hour from customer transactions
  • Peta = 10^15, so 1 petabyte = 10^15 bytes (1,000 terabytes)
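As a rough check of the scale figures above, the arithmetic can be sketched in Python. This assumes the 167 million views per minute and 2.5 petabytes per hour quoted in these notes, a 365-day year, and decimal (SI) prefixes; the variable names are illustrative only.

```python
# Figures quoted in the notes (treated here as given, not verified).
VIEWS_PER_MINUTE = 167_000_000
PB_PER_HOUR = 2.5

# Time conversions, assuming a 365-day year.
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365

# Decimal (SI) prefixes: tera = 10**12, peta = 10**15.
TERA = 10**12
PETA = 10**15

views_per_day = VIEWS_PER_MINUTE * MINUTES_PER_HOUR * HOURS_PER_DAY
views_per_year = views_per_day * DAYS_PER_YEAR

bytes_per_hour = PB_PER_HOUR * PETA
terabytes_per_hour = bytes_per_hour / TERA

print(f"{views_per_day:,} views per day")
print(f"{views_per_year:,} views per year")
print(f"{terabytes_per_hour:,.0f} TB per hour")
```

Under these assumptions the yearly total comes to roughly 87.8 trillion views, and 2.5 petabytes per hour is 2,500 terabytes per hour.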
  • Background

  • Before discussing big data analytics, a few terms need to be defined
    • Statistics uses numbers to quantify data, describing its:
      1. Center: Mean, mode, median, midrange
      2. Variation: standard deviation, variance, range
      3. Time
      4. Distribution
      5. Outliers
    • Data mining uses statistics and programming languages to find patterns hidden in the data
    • Machine learning uses data mining to build models to predict future outcomes
    • Artificial intelligence uses models built by machine learning to make machines act in an intelligent way
    • Big data analytics is the process of studying big data with technologies like NoSQL databases, Hadoop, and MapReduce to uncover hidden patterns and correlations
    • The main goal of big data analytics is to help organizations make better decisions
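The statistics items listed above (center, variation, outliers) can be sketched with Python's standard `statistics` module. The sample data is hypothetical, and the 1.5 x IQR rule used to flag outliers is one common rule of thumb, not something these notes prescribe. Time and distribution are not covered in this sketch.

```python
import statistics

# Hypothetical sample: daily order counts with one unusually large value.
data = [12, 15, 15, 18, 20, 22, 95]

# Center
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
midrange = (min(data) + max(data)) / 2

# Variation
value_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance
std_dev = statistics.stdev(data)      # sample standard deviation

# Outliers: flag values more than 1.5 x IQR outside the quartiles
# (an assumed rule of thumb, chosen for illustration).
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(f"mean={mean:.2f} median={median} mode={mode} midrange={midrange}")
print(f"range={value_range} variance={variance:.2f} std_dev={std_dev:.2f}")
print(f"outliers={outliers}")
```

The single large value pulls the mean well above the median, which is why both measures of center are usually reported together.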
  • Big data as Three Vs
  • Volume
    • Unstructured data streaming in from social media
    • Increasing amounts of sensor and machine-to-machine data being collected
  • Velocity
    • Data is streaming in at unprecedented speed and must be dealt with in a timely manner
  • Variety
    • Data today comes in all types of formats, such as structured, numeric data in traditional databases
    • Information created from line-of-business applications
  • The Future?

  • Analytics 3.0 is the new wave of big data analytics
  • Analytics 3.0 follows Analytics 1.0, which is BI (business intelligence), and Analytics 2.0, which is used by online companies only (Google, Facebook, etc.)