Data science - 1
The notes discuss the evolution of data processing, particularly in the context of big data clusters. Traditionally, data was brought to the computer for processing; in a big data cluster, the data is instead sliced into pieces that are distributed across thousands of computers. At Google, Larry Page and Sergey Brin pioneered this approach: each computer in the cluster runs the same program on its allocated subset of the data (the map step), and the partial results are then sorted and combined into a final answer (the reduce step). This architecture scales roughly linearly, so doubling the number of computers roughly doubles the throughput, which makes it efficient for handling large datasets.
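The split, map, sort, and reduce flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "cluster" is simulated with an ordinary loop on one machine, word counting stands in for whatever program the cluster would run, and all function names (`split_into_chunks`, `map_chunk`, `shuffle_and_sort`, `reduce_groups`, `word_count`) are hypothetical, not part of Hadoop or any Google system.

```python
from collections import defaultdict


def split_into_chunks(records, n_workers):
    """Slice the dataset into pieces, one per (simulated) worker."""
    return [records[i::n_workers] for i in range(n_workers)]


def map_chunk(chunk):
    """Map step: the same program runs on every piece and emits (key, value) pairs."""
    pairs = []
    for line in chunk:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle_and_sort(mapped_outputs):
    """Group the emitted values by key, then sort the groups by key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return sorted(grouped.items())


def reduce_groups(grouped):
    """Reduce step: combine each key's values into a single result."""
    return {key: sum(values) for key, values in grouped}


def word_count(records, n_workers=3):
    chunks = split_into_chunks(records, n_workers)
    # On a real cluster each chunk would be processed on a different machine;
    # here the workers are simulated sequentially.
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_groups(shuffle_and_sort(mapped))


lines = ["big data clusters", "data science and big data", "clusters scale"]
counts = word_count(lines)
print(counts)
```

Because each piece is mapped independently, the final counts do not depend on how many workers the data is split across; that independence is what allows the work to be spread over more machines as the dataset grows.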
The notes also touch on the history of big data, including the development of Hadoop, an open-source system inspired by Google's architecture. Yahoo adopted the technology, and the scalability of big data clusters proved beneficial for major social media companies.
Turning to data science, the notes point out that its foundational components, such as probability, statistics, algebra, programming, and databases, have existed for decades. What has changed is computational capability: it is now practical to apply machine learning to large datasets to recognize patterns. This has led to the emergence of Decision Sciences, which combines traditional areas like computer science and mathematics.
The notes conclude by highlighting how recently terms like "data science" and "big data" became popular, and how business analytics and data science continue to evolve and grow. Deep learning, which is built on multi-layer neural networks, has become a prominent addition to the field, with an impact on companies like Google and Facebook.