Summary: Common Data Sources for Analytics
Relational Databases:
- Internal applications typically store structured data in relational databases such as SQL Server, Oracle, and MySQL.
- Data from transactions, human resources, and workflows are valuable for analysis.
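As a minimal sketch of pulling analysis-ready data out of a relational database, the example below uses Python's built-in sqlite3 module with an in-memory database. The transactions table, its columns, and its rows are hypothetical stand-ins for an internal application's data; a production system such as SQL Server, Oracle, or MySQL would need its own driver and connection details.

```python
import sqlite3

# In-memory database standing in for an internal application's store.
# The table name, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions (department, amount) VALUES (?, ?)",
    [("sales", 120.0), ("sales", 80.0), ("hr", 50.0)],
)

# Aggregate transaction amounts per department with plain SQL.
totals = dict(
    conn.execute(
        "SELECT department, SUM(amount) FROM transactions GROUP BY department"
    ).fetchall()
)
conn.close()

print(totals)
```

The same GROUP BY query pattern should carry over to other relational databases, with only the connection code changing.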
External Datasets:
- Publicly and privately available datasets, e.g., government demographic data.
- Some companies sell specific datasets (point-of-sale, financial, weather) that organizations use to inform strategic decisions.
Flat Files and XML Datasets:
- Flat files (e.g., CSV) store data in plain text, one record per line, with values separated by a delimiter such as a comma.
- Spreadsheets (Excel, Google Sheets) organize data in rows and columns, and a single file can contain multiple worksheets.
- XML files use tags for data identification, supporting complex hierarchical structures.
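To illustrate the difference between the two formats, here is a small sketch that reads the same records from a CSV string and from an XML string using only the standard library. The customer fields and sample values are made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same two records in both formats.
csv_text = "id,name,city\n1,Ada,London\n2,Linus,Helsinki\n"
xml_text = """<customers>
  <customer id="1"><name>Ada</name><city>London</city></customer>
  <customer id="2"><name>Linus</name><city>Helsinki</city></customer>
</customers>"""

# CSV: one record per line; the header row supplies the field names.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# XML: tags and attributes identify each value within a nested structure.
root = ET.fromstring(xml_text)
xml_rows = [
    {"id": c.get("id"), "name": c.findtext("name"), "city": c.findtext("city")}
    for c in root.findall("customer")
]

print(csv_rows)
print(xml_rows)
```

Both parsers return every value as a string, so numeric fields would still need explicit conversion before analysis.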
APIs and Web Services:
- Data providers expose APIs and web services that applications can query to retrieve data for processing.
- Examples include Twitter and Facebook APIs for sentiment analysis, stock market APIs, and data lookup APIs.
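The sketch below shows the general shape of consuming a JSON web API: build a request URL, fetch the response, and reduce the payload to the fields needed for analysis. The endpoint https://api.example.com/v1/quotes, the symbols parameter, and the response layout are all hypothetical; a real provider's documentation would define these, and most require authentication.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.example.com/v1/quotes"  # hypothetical endpoint


def build_url(symbols):
    """Build the request URL with a comma-separated 'symbols' query parameter."""
    return f"{BASE_URL}?{urlencode({'symbols': ','.join(symbols)})}"


def parse_quotes(payload):
    """Map each symbol to its price from a JSON payload (assumed layout)."""
    data = json.loads(payload)
    return {q["symbol"]: q["price"] for q in data["quotes"]}


def fetch_quotes(symbols):
    """Call the API over HTTP. Not invoked here because the endpoint is made up."""
    with urlopen(build_url(symbols), timeout=10) as resp:
        return parse_quotes(resp.read().decode("utf-8"))


# Offline demonstration with a sample response body.
sample = '{"quotes": [{"symbol": "AAA", "price": 10.5}, {"symbol": "BBB", "price": 20.0}]}'
prices = parse_quotes(sample)
print(build_url(["AAA", "BBB"]))
print(prices)
```

Keeping the parsing step separate from the network call makes it possible to test the data handling without contacting the provider.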
Web Scraping:
- Extracts relevant data from unstructured sources on web pages.
- Used for various purposes, such as price comparisons, lead generation, and collecting training datasets for machine learning.
- Tools include BeautifulSoup, Scrapy, Pandas, and Selenium.
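As a rough illustration of the price-comparison use case, the sketch below extracts prices from a small HTML fragment. It uses the standard library's html.parser rather than BeautifulSoup or Scrapy so that it runs without extra installs; the markup and the "price" class name are invented for the example.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text of every <span class="price"> element as a float."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(float(data.strip().lstrip("$")))


# Hypothetical page fragment; a real scraper would download this first.
page_html = """
<ul>
  <li>Widget <span class="price">$9.99</span></li>
  <li>Gadget <span class="price">$24.50</span></li>
</ul>
"""

parser = PriceParser()
parser.feed(page_html)
print(parser.prices)
```

Real pages are rarely this tidy, which is one reason dedicated libraries such as BeautifulSoup are commonly preferred for tolerant parsing and richer selectors.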
Data Streams:
- Constant streams of data from instruments, IoT devices, applications, etc.
- Records are often timestamped and geo-tagged for identification.
- Applications in financial trading, supply chain management, threat detection, sentiment analysis, industrial monitoring, web performance, and flight events.
- Processing tools include Apache Kafka, Apache Spark Streaming, and Apache Storm.
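Stream processing generally means computing results incrementally as events arrive instead of loading a complete dataset first. The sketch below shows that idea in plain Python with a rolling mean over timestamped sensor readings; it does not use Kafka, Spark Streaming, or Storm, and the readings generator is a hypothetical stand-in for a live feed.

```python
from collections import deque


def readings():
    """Hypothetical stream of (timestamp_seconds, temperature) events."""
    yield from [(0, 20.0), (1, 22.0), (2, 24.0), (3, 30.0)]


def rolling_mean(events, window=3):
    """Yield (timestamp, mean of the last `window` values) for each event."""
    buf = deque(maxlen=window)  # old values fall out automatically
    for ts, value in events:
        buf.append(value)
        yield ts, sum(buf) / len(buf)


results = list(rolling_mean(readings()))
for ts, mean in results:
    print(ts, round(mean, 2))
```

Because only the last few values are held in memory, the same pattern works on a stream that never ends.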
RSS Feeds:
- Really Simple Syndication feeds for capturing updated data from online forums and news sites.
- Feed readers convert RSS text files into a stream of updated data for user devices.
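An RSS feed is an XML document, so a basic reader can be sketched with the standard library's XML parser. The feed content below is a made-up RSS 2.0 sample; a real reader would download the feed from the site's URL and poll it for new items.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document with two items.
rss_text = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item><title>First story</title><link>https://example.com/1</link></item>
    <item><title>Second story</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""

root = ET.fromstring(rss_text)

# Each <item> element is one update; collect its title and link.
items = [
    (item.findtext("title"), item.findtext("link"))
    for item in root.iter("item")
]

for title, link in items:
    print(title, link)
```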
The diverse nature of data sources requires flexibility in data analytics approaches.