Summary: Common Data Sources for Analytics

 

  1. Relational Databases:

    • Internal applications store structured data in relational databases such as SQL Server, Oracle, and MySQL.
    • Data from transactions, human resources, and workflows are valuable for analysis.
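As a minimal sketch of pulling analysis-ready numbers out of a relational database, the snippet below uses Python's built-in sqlite3 module with an in-memory database. SQLite stands in here for the servers named above, and the `transactions` table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; a stand-in for a production database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, department TEXT, amount REAL)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO transactions (department, amount) VALUES (?, ?)",
    [("sales", 120.0), ("sales", 80.0), ("hr", 50.0)],
)

# Aggregate transaction amounts per department.
totals = dict(
    conn.execute(
        "SELECT department, SUM(amount) FROM transactions GROUP BY department"
    ).fetchall()
)
conn.close()
print(totals)
```

The same query pattern would apply to other relational databases, though the connection library and connection string would differ.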
  2. External Datasets:

    • Publicly and privately available datasets, e.g., government demographic data.
    • Companies sell specific data (POS, financial, weather) for strategic decisions.
  3. Flat Files and XML Datasets:

    • Flat files (CSV) store data in plain text format, one record per line.
    • Spreadsheets (Excel, Google Sheets) organize data in rows and columns, and a single file can contain multiple worksheets.
    • XML files use tags for data identification, supporting complex hierarchical structures.
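To illustrate the difference between the two formats, here is a small sketch that reads a flat CSV record set and a nested XML document using only the Python standard library. The sample data, field names, and tag names are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Flat file: one record per line, with column names in the header row.
csv_text = "name,city\nAda,London\nLin,Taipei\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# XML: tags identify the data and can nest (customer -> orders -> order).
xml_text = """<customers>
  <customer id="1">
    <name>Ada</name>
    <orders><order total="30.5"/><order total="12.0"/></orders>
  </customer>
</customers>"""
root = ET.fromstring(xml_text)

# Map each customer id to the list of that customer's order totals.
order_totals = {
    customer.get("id"): [float(order.get("total")) for order in customer.iter("order")]
    for customer in root.findall("customer")
}

print(rows)
print(order_totals)
```

The CSV rows come back as flat dictionaries, while the XML has to be walked level by level, which is the price of its hierarchical structure.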
  4. APIs and Web Services:

    • Data providers expose APIs and web services that applications can query for data to process.
    • Examples include the Twitter and Facebook APIs (used, for instance, for sentiment analysis), stock market APIs, and data lookup APIs.
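Many APIs return JSON, so the processing step often looks like the sketch below. The response body, the `ACME` symbol, and the `quotes`, `date`, and `close` fields are hypothetical; each real API defines its own schema, and the network call itself is omitted.

```python
import json

# Hypothetical response body, as it might arrive from a stock market API.
response_body = """{
  "symbol": "ACME",
  "quotes": [
    {"date": "2024-01-02", "close": 10.0},
    {"date": "2024-01-03", "close": 11.0}
  ]
}"""

payload = json.loads(response_body)

# Pull out the closing prices and compute their average.
closes = [quote["close"] for quote in payload["quotes"]]
average_close = sum(closes) / len(closes)
print(payload["symbol"], average_close)
```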
  5. Web Scraping:

    • Extracts relevant data from unstructured sources on web pages.
    • Used for various purposes, such as price comparisons, lead generation, and collecting training datasets for machine learning.
    • Tools include BeautifulSoup, Scrapy, Pandas, and Selenium.
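The idea behind a price-comparison scraper can be sketched with the standard library's `html.parser` module, used here in place of the tools listed above so the example has no dependencies. The HTML snippet and the `price` class name are assumptions for illustration; a real page would need its own selectors.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects the numeric value of every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(float(data.strip().lstrip("$")))

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False


# Hypothetical page fragment.
html = (
    '<ul><li>Widget <span class="price">$9.99</span></li>'
    '<li>Gadget <span class="price">$24.50</span></li></ul>'
)
parser = PriceParser()
parser.feed(html)
print(parser.prices)
```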
  6. Data Streams:

    • Constant streams of data from instruments, IoT devices, applications, etc.
    • Data points are typically timestamped and often geo-tagged so they can be identified by time and location.
    • Applications in financial trading, supply chain management, threat detection, sentiment analysis, industrial monitoring, web performance, and flight events.
    • Processing tools include Apache Kafka, Apache Spark Streaming, and Apache Storm.
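Stream processing handles each event as it arrives rather than loading a full dataset first. The sketch below imitates that with a plain Python generator and a rolling average. It is not Kafka, Spark Streaming, or Storm code, and the `(timestamp, value)` event shape is an assumption.

```python
from collections import deque


def rolling_mean(stream, window=3):
    """Yield (timestamp, mean of the last `window` values) for each event."""
    recent = deque(maxlen=window)  # Old values drop off automatically.
    for timestamp, value in stream:
        recent.append(value)
        yield timestamp, sum(recent) / len(recent)


# Hypothetical sensor readings as (timestamp, value) pairs.
events = [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)]
results = list(rolling_mean(iter(events)))
print(results)
```

Because the function keeps only the last few values in memory, the same logic would work on an unbounded stream.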
  7. RSS Feeds:

    • RSS (Really Simple Syndication) feeds capture updated content from online forums and news sites that refresh continuously.
    • Feed readers convert the RSS text files into a stream of updates delivered to user devices.
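Since an RSS feed is an XML text file, a feed reader's core step can be sketched with the standard library. The feed content and the example.com links below are made up, and fetching the feed over the network is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed content.
rss_text = """<rss version="2.0"><channel>
<title>Example News</title>
<item><title>First post</title><link>https://example.com/1</link></item>
<item><title>Second post</title><link>https://example.com/2</link></item>
</channel></rss>"""

channel = ET.fromstring(rss_text).find("channel")

# Each <item> is one update; collect its title and link.
items = [
    (item.findtext("title"), item.findtext("link"))
    for item in channel.findall("item")
]
print(items)
```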

The diverse nature of data sources requires flexibility in data analytics approaches.
