Open-Source Tools for Data Science Part 1.

  1. Open-Source Data Management Tools:

    • Relational databases: MySQL, PostgreSQL
    • NoSQL Databases: MongoDB, Apache CouchDB, Apache Cassandra
    • File-based tools: Hadoop Distributed File System (HDFS), cloud file systems such as Ceph
    • Elasticsearch (stores text data and creates search indexes)
  2. Open-Source Data Integration and Transformation Tools:

    • Apache Airflow
    • Kubeflow
    • Apache Kafka
    • Apache NiFi
    • Apache Spark SQL
    • Node-RED
  3. Open-Source Data Visualization Tools:

    • PixieDust (with a user interface for plotting in Python)
    • Hue (creates visualizations from SQL queries)
    • Kibana (limited to Elasticsearch as its data provider)
    • Apache Superset (data exploration and visualization web application)
  4. Open-Source Tools for Model Building, Deployment, Monitoring, and Assessment:

    • Model Deployment Tools:

      • Apache PredictionIO (supports Apache Spark ML models)
      • Seldon (supports various frameworks; runs on Kubernetes and Red Hat OpenShift)
      • MLeap (specifically for deploying Spark ML models)
      • TensorFlow Serving, TensorFlow Lite, TensorFlow.js (for serving TensorFlow models)
    • Model Monitoring and Assessment Tools:

      • ModelDB (a machine learning model metadatabase)
      • Prometheus (widely used, not specifically for model monitoring)
      • IBM AI Fairness 360 (detects and mitigates bias in models)
      • IBM Adversarial Robustness 360 Toolbox (detects vulnerability against adversarial attacks)
      • IBM AI Explainability 360 (addresses the interpretability of models)
  5. Code and Data Asset Management Tools:

    • Code Asset Management Tools:
      • Git (de facto standard for version control)
      • GitHub, GitLab, Bitbucket (services around Git)
    • Data Asset Management Tools:
      • Apache Atlas (supports versioning and metadata annotation)
      • ODPi Egeria (an open ecosystem with APIs for metadata repositories)
      • Kylo (an open-source data management software platform with extensive data asset management support)

We have learned about a wide range of open-source tools across different categories, which give data scientists options for data management, integration, visualization, model building, deployment, monitoring, assessment, code asset management, and data asset management.
