Data Science 4 - Definition of Big Data

I. Definition of Big Data:


Definition by Ernst and Young: "Big Data refers to the dynamic, large and disparate volumes of data created by people, tools, and machines."

Technology Requirement: Demands new, innovative, and scalable technology for collection, hosting, and analytical processing.

Purpose: Derive real-time business insights related to consumers, risk, profit, performance, productivity management, and enhanced shareholder value.

II. Characteristics of Big Data - The V's:


Velocity:


Definition: Speed at which data accumulates.

Data Generation: Data is generated extremely fast, in a process that never stops. Near-real-time or real-time streaming, local, and cloud-based technologies make it possible to process that data quickly.

Volume:


Definition: Scale of data or the increase in stored data.

Drivers: Increase in data sources, higher resolution sensors, and scalable infrastructure.

Example: People using digital devices worldwide generate approximately 2.5 quintillion bytes of data every day.

Variety:


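Definition: Diversity of data, including structured and unstructured forms. Structured data fits neatly into rows and columns, as in a relational database; unstructured data, such as posts, pictures, and video, has no predefined organization.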

Data Sources: From various machines, people, processes, and internal/external organizations.

Example: Different types of data, such as text, pictures, film, sound, health data from wearables, and IoT-connected devices.

Veracity:


Definition: Quality, origin, conformity to facts, and accuracy of data.

Attributes: Consistency, completeness, integrity, and ambiguity.

Example: About 80% of data is considered unstructured, which makes it challenging to categorize, analyze, and visualize.
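
The veracity attributes above can be made concrete with a small check. The following is only a minimal sketch in Python: the record layout, the field names (heart_rate, steps), and the plausible heart-rate range of 30 to 220 are assumptions chosen for illustration and do not come from the original notes. It measures completeness (how many records have a value for a field) and flags values that fall outside a plausible range.

```python
from typing import Any

Record = dict[str, Any]


def completeness(records: list[Record], fields: list[str]) -> dict[str, float]:
    """Return, per field, the fraction of records that carry a usable value."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    result: dict[str, float] = {}
    for field in fields:
        # A value counts as missing when the key is absent, None, or empty.
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = present / total
    return result


def out_of_range(records: list[Record], field: str,
                 low: float, high: float) -> list[Record]:
    """Return records whose value for `field` is present but implausible."""
    return [
        r for r in records
        if r.get(field) is not None and not (low <= r[field] <= high)
    ]


# Hypothetical wearable readings, invented for this example.
readings: list[Record] = [
    {"user": "a", "heart_rate": 72, "steps": 4000},
    {"user": "b", "heart_rate": None, "steps": 5200},
    {"user": "c", "heart_rate": 400, "steps": 100},
    {"user": "d", "heart_rate": 65},
]

scores = completeness(readings, ["heart_rate", "steps"])
suspect = out_of_range(readings, "heart_rate", low=30, high=220)

print(scores)
print([r["user"] for r in suspect])
```

Checks like these cover only the completeness and conformity side of veracity; origin and ambiguity usually need domain knowledge rather than a simple rule.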

Value:


Definition: Ability and need to turn data into value, not limited to profit.

Examples: Medical or social benefits, customer satisfaction, employee satisfaction.

III. Examples of V's in Action:


Velocity Example: Every 60 seconds, hours of footage are uploaded to YouTube, which shows how rapidly data accumulates.

Volume Example: People using digital devices worldwide generate 2.5 quintillion bytes of data daily, a figure commonly described as equivalent to 10 million Blu-ray discs.

Variety Example: Diverse types of data, including text, pictures, film, sound, health data, and IoT data.

Veracity Example: Devising ways to produce reliable insights from 80% unstructured data.
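
The volume figure above is easier to picture after a little arithmetic. The sketch below is a rough back-of-the-envelope calculation, not an authoritative measurement. It assumes a single-layer Blu-ray disc capacity of 25 GB (decimal gigabytes), which the original notes do not state, and converts the daily total into a per-second rate and a disc count.

```python
# 2.5 quintillion bytes per day, the figure quoted in the notes.
BYTES_PER_DAY = 25 * 10**17

# Assumption: single-layer Blu-ray capacity of 25 GB (decimal).
BLU_RAY_BYTES = 25 * 10**9

SECONDS_PER_DAY = 24 * 60 * 60

# Average rate at which data accumulates.
bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY
terabytes_per_second = bytes_per_second / 10**12

# Number of 25 GB discs needed to hold one day of data.
discs_at_25_gb = BYTES_PER_DAY // BLU_RAY_BYTES

# Capacity each disc would need if exactly 10 million discs held it all.
implied_capacity_gb = BYTES_PER_DAY // 10_000_000 // 10**9

print(f"{terabytes_per_second:.1f} TB per second")
print(f"{discs_at_25_gb:,} discs at 25 GB each")
print(f"{implied_capacity_gb} GB per disc for 10 million discs")
```

Under the 25 GB assumption the count comes out at 100 million discs, so the 10-million comparison only holds for a much larger per-disc capacity (about 250 GB). Either way, the order of magnitude makes the point about scale.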

IV. Big Data Analysis and Tools:


Scale Challenges: Conventional data analysis tools are not feasible at this scale of data.

Alternative Tools: Apache Spark and Hadoop, along with the Hadoop ecosystem, leverage distributed computing power to extract, load, analyze, and process data.

Benefits: Provide new insights, knowledge, and ways for organizations to connect with customers and enhance services.
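
The idea behind these distributed tools can be illustrated with the map, shuffle, and reduce pattern. The sketch below is plain Python running in a single process. It does not use the Spark or Hadoop APIs, and the function names and sample text are assumptions made up for this example. In a real cluster, each chunk would be mapped on a different machine and the grouped keys would be reduced in parallel.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as a cluster would do over the network."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Combine the values collected for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce over independent chunks of input."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle_phase(mapped))


# Each string stands in for a block of data stored on a different node.
chunks = ["big data needs big tools", "data tools scale out"]
counts = word_count(chunks)
print(counts)
```

Because every chunk is mapped independently, adding machines lets the same logic handle more data, which is the property that conventional single-machine tools lack.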

V. Conclusion:


Data Journey: From smartwatches, smartphones, and workout tracking devices, user data embarks on a global journey through big data analysis, returning with valuable insights.



