Open-Source Tools for Data Science Part 1.
Open-Source Data Management Tools:
- Relational databases: MySQL, PostgreSQL
- NoSQL databases: MongoDB, Apache CouchDB, Apache Cassandra
- File-based tools: Hadoop Distributed File System (HDFS), cloud file systems such as Ceph
- Elasticsearch (stores text data and creates search indexes for fast retrieval)
Open-Source Data Integration and Transformation Tools:
- Apache Airflow
- Kubeflow
- Apache Kafka
- Apache NiFi
- Apache Spark SQL
- Node-RED
Open-Source Data Visualization Tools:
- PixieDust (a library with a user interface for plotting in Python)
- Hue (creates visualizations from SQL queries)
- Kibana (limited to Elasticsearch as its data source)
- Apache Superset (data exploration and visualization web application)
Tools for Model Building, Deployment, Monitoring, and Assessment:
Model Deployment Tools:
- Apache PredictionIO (supports Apache Spark ML models)
- Seldon (supports various frameworks, runs on Kubernetes and Red Hat OpenShift)
- MLeap (specifically for deploying Spark ML models)
- TensorFlow Serving, TensorFlow Lite, TensorFlow.js (for serving TensorFlow models)
Model Monitoring Tools:
- ModelDB (a machine learning model metadatabase where model information is stored and can be queried)
- Prometheus (widely used, not specifically for model monitoring)
- IBM AI Fairness 360 (detects and mitigates bias in models)
- IBM Adversarial Robustness 360 Toolbox (detects vulnerability against adversarial attacks)
- IBM AI Explainability 360 (addresses the interpretability of models)
Code and Data Asset Management Tools:
Code Asset Management Tools:
- Git (de facto standard for version control)
- GitHub, GitLab, Bitbucket (services built around Git)
Data Asset Management Tools:
- Apache Atlas (supports versioning and metadata annotation)
- ODPi Egeria (an open ecosystem with APIs for metadata repositories)
- Kylo (an open-source data management software platform with extensive data asset management support)
Summary: This part covered a wide range of open-source tools across different categories, giving data scientists options for data management, integration, visualization, model building, deployment, monitoring, assessment, code asset management, and data asset management.