Glossary: From Problem to Approach

 Welcome! This alphabetized glossary contains many of the terms you'll find within this lesson. These terms are important for you to recognize when working in the industry, when participating in user groups, and when participating in other certificate programs.

Term: Definition
Analytic Approach: The process of selecting the appropriate method or path to address a specific data science question or problem.
Analytics: The systematic analysis of data using statistical, mathematical, and computational techniques to uncover insights, patterns, and trends.
Business Understanding: The initial phase of data science methodology, which involves seeking clarification on and understanding the goals, objectives, and requirements of a given task or problem.
Clustering Association: An approach used to learn about human behavior and identify patterns and associations in data.
Cohort: A group of individuals who share a common characteristic or experience and are studied or analyzed as a unit.
Cohort study: An observational study where a group of individuals with a specific characteristic or exposure is followed over time to determine the incidence of outcomes or the relationship between exposures and outcomes.
Congestive Heart Failure (CHF): A chronic condition in which the heart cannot pump enough blood to meet the body's needs, resulting in fluid buildup and symptoms such as shortness of breath and fatigue.
CRISP-DM: Cross-Industry Standard Process for Data Mining, a widely used methodology for data mining and analytics projects encompassing six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
Data analysis: The process of inspecting, cleaning, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making.
Data cleansing: The process of identifying and correcting or removing errors, inconsistencies, or inaccuracies in a dataset to improve its quality and reliability.
Data science: An interdisciplinary field that combines scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
Data science methodology: A structured approach to solving business problems using data analysis and data-driven insights.
Data scientist: A professional using scientific methods, algorithms, and tools to analyze data, extract insights, and develop models or solutions to complex business problems.
Data scientists: Professionals with data science and analytics expertise who apply their skills to solve business problems.
Data-Driven Insights: Insights derived from analyzing and interpreting data to inform decision-making.
Decision tree: A supervised machine learning algorithm that uses a tree-like structure of decisions and their possible consequences to make predictions or classify instances.
Decision Tree Classification Model: A model that uses a tree-like structure to classify data based on conditions and thresholds, providing predicted outcomes and associated probabilities.
Decision Tree Classifier: A classification model that uses a decision tree to determine outcomes based on specific conditions and thresholds.
Decision-Tree Model: A model used to review scenarios and identify relationships in data, such as the reasons for patient readmissions.
Descriptive approach: An approach used to show relationships and identify clusters of similar activities based on events and preferences.
Descriptive modeling: A modeling technique that focuses on describing and summarizing data, often through statistical analysis and visualization, without making predictions or inferences.
Domain knowledge: Expertise and understanding of a specific subject area or field, including its concepts, principles, and relevant data.
Goals and objectives: The sought-after outcomes and specific objectives that support the overall goal of the task or problem.
Iteration: A single cycle or repetition of a process, often involving refining or modifying a solution based on feedback or new information.
Iterative process: A process that involves repeating a series of steps or actions to refine and improve a solution or analysis. Each iteration builds upon the previous one.
Leaf: A final node of a decision tree, where data is categorized into a specific outcome.
Machine Learning: A field of study that enables computers to learn from data without being explicitly programmed, identifying hidden relationships and trends.
Mean: The average value of a set of numbers, calculated by summing all the values and dividing by the total number of values.
Median: The middle value in a set of numbers arranged in ascending or descending order; it divides the data into two equal halves.
Model (Conceptual model): A simplified representation or abstraction of a real-world system or phenomenon used to understand, analyze, or predict its behavior.
Model building: The process of developing predictive models to gain insights and make informed decisions based on data analysis.
Pairwise comparison (correlation): A statistical technique that measures the strength and direction of the linear relationship between two variables by calculating a correlation coefficient.
Pattern: A recurring or noticeable arrangement or sequence in data that can provide insights or be used for prediction or classification.
Predictive model: A model used to determine probabilities of an action or outcome based on historical data.
Predictors: Variables or features in a model that are used to predict or explain the outcome variable or target variable.
Prioritization: The process of organizing objectives and tasks based on their importance and impact on the overall goal.
Problem solving: The process of addressing challenges and finding solutions to achieve desired outcomes.
Stakeholders: Individuals or groups with a vested interest in the data science model's outcome and its practical application, such as solution owners, marketing, application developers, and IT administration.
Standard deviation: A measure of the dispersion or variability of a set of values from their mean; it provides information about the spread or distribution of the data.
Statistical analysis: Analysis applied to problems that require counts, such as yes/no answers or classification tasks.
Statistics: The collection, analysis, interpretation, presentation, and organization of data to understand patterns, relationships, and variability in the data.
Structured data (data model): Data organized and formatted according to a predefined schema or model, typically stored in databases or spreadsheets.
Text analysis data mining: The process of extracting useful information or knowledge from unstructured textual data through techniques such as natural language processing, text mining, and sentiment analysis.
Threshold value: The specific value used to split data into groups or categories in a decision tree.
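
To make the Mean, Median, Standard deviation, and Pairwise comparison (correlation) entries above more concrete, here is a minimal Python sketch. The sample values (`ages`, `visits`) and the helper name `pearson` are hypothetical, chosen only for illustration, and the population form of the standard deviation is an assumption, since the glossary does not say which form is meant.

```python
import math
import statistics

# Hypothetical sample values, used only to illustrate the definitions.
ages = [34, 45, 52, 61, 68]
visits = [1, 2, 2, 4, 5]

# Mean: sum of all values divided by the number of values.
mean_age = statistics.mean(ages)

# Median: the middle value once the numbers are sorted.
median_age = statistics.median(ages)

# Standard deviation (population form): spread of the values around the mean.
stdev_age = statistics.pstdev(ages)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Pairwise comparison: strength and direction of the linear relationship.
r = pearson(ages, visits)

print(mean_age, median_age, round(stdev_age, 2), round(r, 2))
```

A coefficient near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none.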
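
The decision tree entries above (Decision Tree Classifier, Leaf, Threshold value) can be sketched as a small hand-built tree. This is only an illustration under stated assumptions: the features (`age`, `prior_admissions`), the threshold values, the outcomes, and the probabilities are all hypothetical, and a real model would learn them from historical data rather than have them written by hand.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Final node: a predicted outcome and its associated probability."""
    outcome: str
    probability: float


@dataclass
class Split:
    """Internal node: compares one feature against a threshold value."""
    feature: str
    threshold: float
    below: "Node"  # branch taken when the value is <= threshold
    above: "Node"  # branch taken when the value is > threshold


Node = Union[Leaf, Split]


def classify(node, record):
    """Walk from the root to a leaf, following the threshold tests."""
    while isinstance(node, Split):
        if record[node.feature] <= node.threshold:
            node = node.below
        else:
            node = node.above
    return node


# Hypothetical tree: every number and label here is made up for illustration.
tree = Split(
    feature="age",
    threshold=65,
    below=Leaf("no readmission", 0.85),
    above=Split(
        feature="prior_admissions",
        threshold=1,
        below=Leaf("no readmission", 0.60),
        above=Leaf("readmission", 0.75),
    ),
)

result = classify(tree, {"age": 70, "prior_admissions": 2})
print(result.outcome, result.probability)
```

Each `Split` holds one threshold value, and each `Leaf` carries the predicted outcome together with its probability, matching the Decision Tree Classification Model entry.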
