Glossary: From Requirements to Collection

 This alphabetized glossary contains many of the terms you'll find within this lesson. These terms are important for you to recognize when working in the industry, when participating in user groups, and when participating in other certificate programs.


Term | Definition
Analytics team | A group of professionals, including data scientists and analysts, responsible for performing data analysis and modeling.
Data collection | The process of gathering data from various sources, including demographic, clinical, coverage, and pharmaceutical information.
Data integration | The merging of data from multiple sources to remove redundancy and prepare it for further analysis.
Data preparation | The process of organizing and formatting data to meet the requirements of the modeling technique.
Data requirements | The identification and definition of the necessary data elements, formats, and sources required for analysis.
Data understanding | A stage where data scientists discuss various ways to manage data effectively, including automating certain processes in the database.
DBAs (Database Administrators) | The professionals who are responsible for managing and extracting data from databases.
Decision tree classification | A modeling technique that uses a tree-like structure to classify data based on specific conditions and variables.
Demographic information | Information about patient characteristics, such as age, gender, and location.
Descriptive statistics | Techniques used to analyze and summarize data, providing initial insights and identifying gaps in data.
Intermediate results | Partial results obtained from predictive modeling that can influence decisions on acquiring additional data.
Patient cohort | A group of patients with specific criteria selected for analysis in a study or model.
Predictive modeling | The building of models to predict future outcomes based on historical data.
Training set | A subset of data used to train or fit a machine learning model; consists of input data and corresponding known or labeled output values.
Unavailable data | Data elements that are not currently accessible or integrated into the data sources.
Univariate | Analysis focused on a single variable or feature at a time, considering its characteristics on its own rather than its relationships to other variables.
Unstructured data | Data that does not have a predefined structure or format, typically text, images, audio, or video; it requires special techniques to extract meaning or insights.
Visualization | The process of representing data visually to gain insights into its content and quality.
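
To make a few of these terms concrete, here is a minimal sketch in plain Python that touches data integration, descriptive statistics (as a univariate summary), and the training set. The records, field names (`patient_id`, `age`, `readmitted`), and helper functions are all hypothetical and made up for illustration; they are not taken from the lesson.

```python
import random
import statistics

# Hypothetical records from two sources, keyed by patient_id (made-up values).
demographics = [
    {"patient_id": 1, "age": 54},
    {"patient_id": 2, "age": 67},
    {"patient_id": 3, "age": 71},
    {"patient_id": 3, "age": 71},  # redundant row that integration should drop
    {"patient_id": 4, "age": 45},
]
clinical = {
    1: {"readmitted": 0},
    2: {"readmitted": 1},
    3: {"readmitted": 1},
    4: {"readmitted": 0},
}


def integrate(demo_rows, clinical_by_id):
    """Data integration: merge both sources on patient_id, one row per patient."""
    merged = {}
    for row in demo_rows:
        pid = row["patient_id"]
        if pid in clinical_by_id:
            # Later duplicates overwrite earlier ones, so each id appears once.
            merged[pid] = {**row, **clinical_by_id[pid]}
    return [merged[pid] for pid in sorted(merged)]


def describe(values):
    """Descriptive statistics for one variable at a time (a univariate summary)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def split_training_set(rows, train_fraction=0.75, seed=0):
    """Shuffle the rows and hold back a portion; the rest is the training set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = integrate(demographics, clinical)
age_summary = describe([r["age"] for r in records])
training_set, holdout = split_training_set(records)
```

In practice a library such as pandas or scikit-learn would usually handle these steps; the standard-library version above is only meant to show what each term refers to.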
