Data Collection

 


The Data Collection stage of the Data Science Methodology is often explained with the analogy of shopping for ingredients: after an initial round of collection, data scientists assess what they have obtained, revise the data requirements, and decide whether more or better data is needed. Just as a chef inspects the ingredients on the cutting board before cooking, data scientists examine the collected data with techniques such as descriptive statistics and visualization to understand its content and quality and to draw initial insights.
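As a rough illustration of that first assessment, the sketch below computes a few descriptive statistics and a missing-value count for one column at a time. The records, the column names (`age`, `num_admissions`), and the `summarize` helper are hypothetical and exist only for this example; a real project would more likely use a data analysis library.

```python
from statistics import mean, median

# Hypothetical sample of collected patient records; None marks a missing value.
records = [
    {"patient_id": 1, "age": 67, "num_admissions": 2},
    {"patient_id": 2, "age": 72, "num_admissions": None},
    {"patient_id": 3, "age": None, "num_admissions": 1},
    {"patient_id": 4, "age": 80, "num_admissions": 4},
]


def summarize(rows, column):
    """Return basic descriptive statistics for one numeric column."""
    # Keep only the values that are actually present.
    values = [r[column] for r in rows if r.get(column) is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": mean(values) if values else None,
        "median": median(values) if values else None,
        "min": min(values) if values else None,
        "max": max(values) if values else None,
    }


age_summary = summarize(records, "age")
admissions_summary = summarize(records, "num_admissions")
print(age_summary)
print(admissions_summary)
```

A high `missing` count or an implausible `min` or `max` is the kind of signal that might lead to revising the data requirements or going back for more data.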


The case study highlights the following aspects of data collection:


Data Sources: Identifying a source for each required data element. For the congestive heart failure patients in the case study, these elements include demographic, clinical, and coverage information, provider details, claims records, and pharmaceutical information.


Integration Challenges: Some data sources, such as drug information, may not initially be integrated with the rest of the data. It is acceptable to defer decisions about unavailable data and to try to acquire it at a later stage, especially if initial modeling results suggest that it is important.


Collaboration with DBAs and Programmers: Data collection often involves collaboration between data scientists, database administrators (DBAs), and programmers to extract and merge data from various sources. This collaboration helps preserve data integrity, remove redundancy, and prepare the data for further analysis.


Data Management Discussions: Data scientists and analytics team members may discuss ways to improve data management processes, including automation of certain tasks in the database to streamline data collection and ensure its accuracy and efficiency.
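The extract-and-merge work described above could look something like the following minimal sketch. All tables, field names, and the `deduplicate` and `left_join` helpers are hypothetical, and the join assumes at most one row per patient in the right-hand source. The empty `drug_info` list stands in for a source that is not yet integrated: the merge still succeeds, and the missing fields can be added later if modeling shows they matter.

```python
# Hypothetical extracts from separate sources, keyed by patient_id.
demographics = [
    {"patient_id": 1, "age": 67},
    {"patient_id": 2, "age": 72},
    {"patient_id": 2, "age": 72},  # duplicate row from a repeated extract
]
claims = [
    {"patient_id": 1, "claim_total": 1200.0},
    {"patient_id": 2, "claim_total": 450.0},
]
drug_info = []  # not yet integrated; acquisition is deferred


def deduplicate(rows):
    """Drop exact duplicate rows while preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def left_join(left, right, key):
    """Attach right-side fields to each left row.

    Left rows without a match are kept unchanged. Assumes at most one
    right-side row per key value.
    """
    index = {r[key]: r for r in right}
    return [{**row, **index.get(row[key], {})} for row in left]


merged = left_join(deduplicate(demographics), claims, "patient_id")
merged = left_join(merged, drug_info, "patient_id")
print(merged)
```

In practice this step would usually run inside the database or a data pipeline maintained with the DBAs, which is also where the automation mentioned above would live.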


By following a systematic approach to data collection, data scientists can ensure they have the necessary ingredients for analysis and make informed decisions about managing and integrating data from different sources. This sets the stage for the next phase of the methodology, which is data understanding.




