Establishing Data Mining Goals

 

  1. Establishing Data Mining Goals:

    • Identify key questions to be answered.
    • Consider costs, benefits, and expected accuracy.
    • Address cost-benefit trade-offs for desired accuracy levels.
  2. Selecting Data:

    • Quality of data is crucial for data mining outcomes.
    • Availability varies; may require new data collection initiatives.
    • Type, size, and frequency of data collection impact mining costs.
  3. Preprocessing Data:

    • Raw data may be messy with errors or missing information.
    • Identify and remove irrelevant attributes.
    • Address errors, ensuring data integrity.
    • Determine whether data is missing systematically or at random, and develop methods to handle it accordingly.
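
A minimal sketch of the preprocessing bullets above, in plain Python. The records, the field names, the `IRRELEVANT` set, and the valid age range are all hypothetical, chosen only for illustration. The sketch drops an irrelevant attribute, treats an impossible value as missing, records which rows lack income before filling anything in, and then imputes the median.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value, -1 is an entry error.
raw = [
    {"id": 1, "age": 34, "income": 52000, "notes": "n/a"},
    {"id": 2, "age": -1, "income": 61000, "notes": ""},
    {"id": 3, "age": 45, "income": None, "notes": "call back"},
    {"id": 4, "age": 29, "income": 48000, "notes": ""},
    {"id": 5, "age": 51, "income": None, "notes": ""},
]

IRRELEVANT = {"notes"}  # assumed to carry no signal for the analysis


def clean(records):
    """Drop irrelevant attributes and turn invalid ages into missing values."""
    cleaned = []
    for rec in records:
        row = {k: v for k, v in rec.items() if k not in IRRELEVANT}
        if row["age"] is not None and not 0 <= row["age"] <= 120:
            row["age"] = None  # an impossible age is treated as missing
        cleaned.append(row)
    return cleaned


def impute_median(records, field):
    """Fill missing values of `field` with the median of the observed ones."""
    observed = [r[field] for r in records if r[field] is not None]
    fill = median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


rows = clean(raw)
# Inspect who is missing income before imputing: a pattern hints at
# systematic rather than random missingness.
missing_income_ages = [r["age"] for r in rows if r["income"] is None]
rows = impute_median(impute_median(rows, "age"), "income")
```

In this toy data the two missing incomes belong to the two oldest people, which would suggest systematic missingness. In that case a simple median fill may bias the results, and a different method could be more appropriate.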
  4. Transforming Data:

    • Determine the appropriate data format and reduce the number of attributes.
    • Use data reduction algorithms (e.g., Principal Component Analysis).
    • Transform variables for better representation.
    • Convert continuous variables to categorical for capturing non-linearities.
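
The last bullet can be sketched as follows. This is an illustrative example only: the bracket edges, the labels, and the income values are assumptions, not part of the original post. It bins a continuous variable into categories and encodes each category as 0/1 indicators.

```python
from bisect import bisect_right

# Hypothetical bracket edges for annual income; edges and labels are assumptions.
EDGES = [30000, 60000, 100000]
LABELS = ["low", "lower-middle", "upper-middle", "high"]


def to_category(value, edges=EDGES, labels=LABELS):
    """Map a continuous value to the label of the bracket it falls in."""
    # bisect_right counts how many edges are <= value, which is the bracket index.
    return labels[bisect_right(edges, value)]


def one_hot(label, labels=LABELS):
    """Encode a category as 0/1 indicators, one per bracket."""
    return [1 if label == name else 0 for name in labels]


incomes = [18000, 30000, 75000, 250000]
categories = [to_category(x) for x in incomes]
encoded = [one_hot(c) for c in categories]
```

A model that estimates a separate effect for each indicator can let the outcome differ from bracket to bracket instead of forcing one slope across the whole income range, which is how the categorical form captures non-linearities.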
  5. Storing Data:

    • Store transformed data in a format conducive to data mining.
    • Allow unrestricted read/write access for data scientists.
    • Ensure efficient data storage and prioritize data safety and privacy.
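
One possible way to store transformed data for analysis is a small relational table, sketched here with Python's built-in `sqlite3` module. The table name, column names, and rows are hypothetical, and an in-memory database is used only to keep the example self-contained.

```python
import sqlite3

# Hypothetical transformed records: (customer_id, income_bracket, age).
rows = [(1, "low", 34), (2, "high", 45), (3, "low", 29)]

# An in-memory database keeps the sketch self-contained; a real project
# would point to a file or a database server instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, income_bracket TEXT NOT NULL, age INTEGER)"
)
# An index on a column that analyses filter on often can speed up reads.
conn.execute("CREATE INDEX idx_bracket ON customers (income_bracket)")

with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

# Parameterized queries keep supplied values out of the SQL text.
low_income = conn.execute(
    "SELECT customer_id, age FROM customers "
    "WHERE income_bracket = ? ORDER BY customer_id",
    ("low",),
).fetchall()
conn.close()
```

Privacy controls such as access permissions and encryption are outside the scope of this sketch and would need to be handled by the storage system chosen.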
  6. Mining Data:

    • Apply data mining methods, including parametric, non-parametric, and machine-learning algorithms.
    • Start with data visualization for a preliminary understanding of trends.
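
To illustrate the difference between parametric and non-parametric methods, here is a rough sketch on made-up data. The observations, the function names `fit_line` and `knn_predict`, and the choice of `k=2` are assumptions for illustration. The parametric method summarizes the data with two estimated parameters, while the non-parametric one predicts directly from nearby observations.

```python
# Hypothetical observations: x is an input, y the outcome to predict.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]


def fit_line(xs, ys):
    """Parametric: estimate slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def knn_predict(xs, ys, x_new, k=2):
    """Non-parametric: average the outcomes of the k nearest observations."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


slope, intercept = fit_line(xs, ys)
line_pred = slope * 3.5 + intercept  # prediction from the fitted line
knn_pred = knn_predict(xs, ys, 3.5)  # prediction from the two closest points
```

On data this close to a straight line the two predictions nearly agree. They would diverge more where the underlying relationship bends, which is one reason to visualize the data first.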
  7. Evaluating Mining Results:

    • Formally evaluate results by testing how well the models reproduce observed data (an in-sample forecast).
    • Share results with key stakeholders for feedback.
    • Incorporate feedback into subsequent iterations for continuous improvement.
    • Data mining and result evaluation form an iterative process.
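
An in-sample forecast can be sketched as follows: fit a model on the observed data, predict those same observations, and measure the error. The data and the helper names `fit_line` and `rmse` are hypothetical, and root mean squared error is only one of several measures that could be used.

```python
from math import sqrt

# Hypothetical observed data, used both to fit and to evaluate the model.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]


def fit_line(xs, ys):
    """Estimate slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


slope, intercept = fit_line(xs, ys)
in_sample = [slope * x + intercept for x in xs]  # predictions on the fitting data
error = rmse(ys, in_sample)
```

Because the model has already seen these observations, an in-sample error tends to understate the error on new data. Comparing it across iterations is still a useful way to track whether changes from stakeholder feedback improve the fit.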
