Understanding Data - Types of Data Based on Structure



Data is raw, unorganized information that is processed to derive meaning. It encompasses facts, observations, numbers, characters, symbols, and images. Based on structure, data falls into three categories:


Structured Data:


Well-defined structure adhering to a specified data model.

Stored according to a schema, as in a database, and often presented in tabular form with rows and columns.

Objective facts and numbers suitable for databases.

Sources include SQL databases, OLTP systems, spreadsheets, online forms, sensors (GPS, RFID), and server logs.

Easily stored in relational databases and analyzed using standard methods and tools.
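As a minimal sketch of what "well-defined structure" means in practice, the snippet below uses Python's built-in sqlite3 module with an in-memory database. The table name, column names, and values are hypothetical and chosen only for illustration. Because every row follows the same schema, a standard SQL aggregate can be applied directly.

```python
import sqlite3

# Hypothetical table and values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (sensor_id TEXT NOT NULL, temperature REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO readings (sensor_id, temperature) VALUES (?, ?)",
    [("s1", 20.0), ("s1", 22.0), ("s2", 18.5)],
)

# Every row has the same columns and types, so a standard query works as-is.
rows = conn.execute(
    "SELECT sensor_id, AVG(temperature) FROM readings "
    "GROUP BY sensor_id ORDER BY sensor_id"
).fetchall()
avg_by_sensor = dict(rows)
conn.close()

print(avg_by_sensor)
```

The schema is declared before any data is inserted, which is what allows generic tools such as SQL to analyze the data without custom parsing.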

Semi-structured Data:


Has some organizational properties but lacks a fixed schema.

Does not fit neatly into the rows and columns of a traditional relational database.

Contains tags, elements, or metadata for grouping and hierarchy.

Sources include emails, XML, other markup languages, binary executables, TCP/IP packets, zipped files, and integrated data.

XML and JSON are widely used to store and exchange hierarchical data.
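To illustrate "some organizational properties but no fixed schema", here is a small sketch using Python's built-in json module. The records and field names are made up for the example. The keys act as tags that label values and express hierarchy, yet each record is free to carry different fields.

```python
import json

# Hypothetical records: both are labelled with keys, but their fields differ.
raw = """
[
  {"id": 1, "name": "Ada", "contact": {"email": "ada@example.com"}},
  {"id": 2, "name": "Grace", "tags": ["admin", "ops"]}
]
"""
records = json.loads(raw)

# The set of fields varies per record, so there is no single fixed schema.
fields_per_record = [sorted(r.keys()) for r in records]

# Nested data is reached through the hierarchy; missing fields must be handled.
emails = [r.get("contact", {}).get("email") for r in records]

print(fields_per_record)
print(emails)
```

The second record has no "contact" field, so the lookup yields None for it. Code that reads semi-structured data usually has to account for optional fields like this.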

Unstructured Data:


Lacks easily identifiable structure and cannot be organized in a relational database format.

Does not follow a specific format, sequence, semantics, or rules.

Comes from heterogeneous sources and has diverse business intelligence and analytics applications.

Sources include web pages, social media feeds, images (JPEG, GIF, PNG), video/audio files, documents/PDFs, presentations, media logs, and surveys.

Stored in files and documents for manual analysis, or in NoSQL databases that provide their own analysis tools.
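The sketch below shows, under simple assumptions, why unstructured data needs its own analysis step. The review text is invented for the example, and word counting is only one basic technique among many. Since free text has no columns or tags to query, structure has to be imposed first (here, by splitting the text into words) before anything can be measured.

```python
import re
from collections import Counter

# Hypothetical free-text review: no fields, tags, or schema to query.
text = "Great phone, great battery. The camera is not great."

# Impose a simple structure by extracting lowercase words.
words = re.findall(r"[a-z']+", text.lower())

# Only after that step can a standard operation such as counting be applied.
counts = Counter(words)

print(counts.most_common(3))
```

Note that counting words loses context: "not great" still adds to the count for "great", which is one reason unstructured data often calls for more specialized tools.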

In summary, structured data is well-organized, suitable for databases and standard analysis; semi-structured data has some organization relying on metadata; unstructured data lacks a conventional structure and is utilized for various analytics applications.




