Commercial Tools for Data Science
This post gives an overview of commercial tools for data science. The landscape is diverse, covering data management, integration, visualization, model building, deployment, monitoring and assessment, code and data asset management, and fully integrated development environments. Each category below lists its key players.
To summarize the key points:
Data Management Tools:
- Oracle Database, Microsoft SQL Server, and IBM Db2 are industry-standard choices.
- Commercial support availability is crucial due to the central role of data in organizations.
Data Integration Tools (ETL):
- Informatica PowerCenter and IBM InfoSphere DataStage are Gartner Magic Quadrant leaders.
- Other notable players include SAP, Oracle, SAS, Talend, Microsoft products, and Watson Studio Desktop with Data Refinery.
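
The extract-transform-load pattern that these products automate can be sketched in a few lines of Python. This is a toy illustration built on the standard library only, not a picture of how any of the products above work internally; the sample rows, the column names, and the `sales` table are made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a source system.
RAW_CSV = """region,amount
north, 120.50
south,80
north,
east,42.25
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount and normalize types."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((row["region"].strip().upper(), float(amount)))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a data warehouse
load(transform(extract(RAW_CSV)), conn)

# Aggregate per region to confirm what was loaded.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(totals)
```

Commercial ETL tools wrap the same three stages in graphical designers, connectors to many source systems, scheduling, and error handling, which is where most of their value lies.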
Data Visualization Tools:
- Business intelligence tools such as Tableau, Microsoft Power BI, and IBM Cognos Analytics are used to create visual reports and live dashboards.
- Watson Studio Desktop provides visualization functionality for data scientists.
Model Building Tools:
- SPSS Modeler and SAS Enterprise Miner are prominent choices.
- SPSS Modeler is available in Watson Studio Desktop.
Model Deployment and Monitoring:
- Commercial tools tightly integrate deployment into the model-building process.
- Open standards such as PMML are supported, so a model exported by one tool can be read by other software. Model monitoring is a newer discipline, and open source tools are currently preferred for it.
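
PMML is an XML-based format, which is what makes a model portable between tools. As a rough sketch, the snippet below reads a hand-written PMML document for a one-variable linear regression and scores one input with it. The document, the field names `x` and `y`, and the helper functions are all hypothetical, and this is far from a complete PMML evaluator; it only handles numeric predictors in a single regression table.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical PMML document describing y = 1.5 + 2.0 * x.
PMML_DOC = """
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="toy linear model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# PMML elements live in a versioned XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_4"}

def load_linear_model(pmml_text):
    """Read the intercept and coefficients from the regression table."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients

def predict(model, features):
    """Score one record: intercept plus the weighted sum of the features."""
    intercept, coefficients = model
    return intercept + sum(c * features[name] for name, c in coefficients.items())

model = load_linear_model(PMML_DOC)
prediction = predict(model, {"x": 3.0})
print(prediction)
```

In practice the PMML file would be exported by the model-building tool and consumed by a scoring engine, so neither side needs to know which product the other is.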
Code Asset Management:
- Git, an open source version control system, and GitHub, a hosting service built on it, are the de facto standard for code asset management.
Data Asset Management (Data Governance, Data Lineage):
- Informatica Enterprise Data Governance, IBM, and Watson Studio provide tools for versioning, metadata annotation, data lineage, and data governance.
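
To make versioning, metadata annotation, and data lineage more concrete, here is a minimal sketch of a dataset catalog. It does not reflect the API of any product named above; the `catalog` dictionary, the function names, and the sample datasets are assumptions made purely for illustration.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Version identifier: the SHA-256 hash of the dataset's bytes."""
    return hashlib.sha256(data).hexdigest()

catalog = {}  # version id -> metadata record

def register(name, data, description, derived_from=None):
    """Store metadata for a dataset version and return its version id."""
    version = fingerprint(data)
    catalog[version] = {
        "name": name,
        "description": description,    # metadata annotation
        "derived_from": derived_from,  # parent version, if any (lineage)
    }
    return version

def lineage(version):
    """Walk parent links from a dataset back to its original source."""
    chain = []
    while version is not None:
        chain.append(catalog[version]["name"])
        version = catalog[version]["derived_from"]
    return chain

# Hypothetical raw dataset and a cleaned version derived from it.
raw = b"id,age\n1,34\n2,\n3,51\n"
clean = b"id,age\n1,34\n3,51\n"

raw_id = register("customers_raw", raw, "export from the CRM system")
clean_id = register("customers_clean", clean,
                    "rows with a missing age dropped", derived_from=raw_id)

print(lineage(clean_id))
```

Hashing the content means any change to the data produces a new version id, while the `derived_from` link records where each version came from.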
Integrated Development Environments:
- Watson Studio (Desktop and Cloud) combines Jupyter Notebooks with graphical tools for data scientists.
- H2O Driverless AI is another fully integrated commercial tool covering the complete data science lifecycle.
Fully Integrated Tools:
- Watson Studio, together with Watson OpenScale, covers the entire data science lifecycle and can be deployed in a local data center on top of Kubernetes or Red Hat OpenShift.
- H2O Driverless AI is another example of a fully integrated commercial tool.
This information is valuable for those looking to navigate the landscape of commercial tools in the field of data science.