How Can Someone Become a Data Scientist?

 

The real data scientists, the high-end data scientists, 
are mostly PhDs. 
They often come out of physics, out of statistics, 
they have to have a computer science background, 
they have to have a math background, 
they have to know about databases and statistics 
and probability and all that stuff. 
I think the first skill you need is 
to know how to program, 
or at least have some computational thinking, 
so having taken a programming course. 
You need to know some algebra, at least up to analytic 
geometry, and hopefully some calculus, 
some basic probability, some basic statistics. 
I mean, you really have to understand 
the different statistical distributions, and databases. 
I mean, one of the easiest places to start 
is relational databases, which store lots and lots 
of our data. So people can first walk before they can run 
by at least understanding computers and databases 
and how we store things. And if you understand 
relational databases, nowadays you can still, 
just with that understanding, use big data clusters 
as if they were just a big relational database. 
You don't really have to understand the whole 
MapReduce programming model. 
But then, as you go further up in the field, 
you have to know a lot of computer science theory, 
statistics, and probability. 
It's really the intersection of them 
that the high-end data scientists, 
the PhD data scientists, work with. 
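
The point above about relational databases can be sketched in a few lines. This is a minimal illustration, not something from the interview: the `trips` table, its columns, and the sample rows are hypothetical, and it uses Python's built-in `sqlite3` module as a stand-in for any relational database. The same kind of declarative `GROUP BY` query is what SQL layers on big data clusters (Hive and Spark SQL are examples, named here as an assumption about what the speaker has in mind) accept, which is why no hand-written map and reduce steps are needed.

```python
import sqlite3


def trips_per_city(rows):
    """Count trips and total fares per city with one declarative SQL query."""
    # In-memory database: a stand-in for any relational store (assumption).
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and column names, chosen only for illustration.
    conn.execute("CREATE TABLE trips (city TEXT, fare REAL)")
    conn.executemany("INSERT INTO trips VALUES (?, ?)", rows)
    # The grouping and aggregation are stated in SQL; the engine decides
    # how to execute them, so no explicit map or reduce code is written.
    result = conn.execute(
        "SELECT city, COUNT(*), SUM(fare) "
        "FROM trips GROUP BY city ORDER BY city"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("Boston", 12.5), ("Austin", 8.0), ("Boston", 7.5)]
    for city, count, total in trips_per_city(sample):
        print(city, count, total)
```
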
(music)
I do a lot of self-learning. 
I think everybody does these days. 
I mean, I learned about Hadoop all by myself. 
I read some articles, I watched some videos, 
I thought, I played. Although I'm a builder, 
I'm a tinkerer, so if I want to figure out 
how to do something, I build it. 
I mean, my first HPC cluster: 
I heard about this term, a Beowulf cluster, 
and I mean, yeah, what the hell's that? 
So I looked it up and said, oh, 
it's just a bunch of computers hooked together 
with a TCP/IP network, that's pretty easy. 
So we got a grant from Citibank 
and we built a five thing cluster and I said, 
oh, well, that's HPC. 
So I had one of the first HPC clusters 
at the university. It was tiny, but a lot of our 
researchers loved it because they could run stuff 
40 and 50 times faster. 
So I think one of the ways you learn things is you do them. 
You have to do them, and these online learning platforms, 
especially now that we have things like IPython 
and Jupyter Notebooks and, I guess, Zeppelin, 
mean that you can actually go in 
and take some of these courses 
and you can do things right then 
and you can see them and feel them and play with them. 
And at that point, you know, you'll start to get your 
head around what is actually happening. 
Motivation is the key problem in all of these: 
how to keep people motivated. 
And I think the badge system that, what was it, 
Big Data University has is one of the ways 
to get people to keep going through. 
But if they want to, they can. 
It's up to the individual. 
So they have to understand what the goal is. 
(music)
The place it can't sit 
is probably under the CIO, the Chief Information Officer. 
The current chief information officers in many companies 
got there from an accounting background 
or a finance background. They're clueless. 
Sorry. 
But really, it has to come out of the research side. 
So you'll find data scientists primarily in companies 
that have some research agenda: pharmaceuticals, 
finance, any technology company. 
We can't keep some of our 
PhD data scientists in our program. 
They are now at Facebook, 
they're at LinkedIn, they're at Uber, they're at Lyft, 
because the demand out there for PhD-level 
data scientists is just unbelievable. 
They make large amounts of money 
and they're playing with problems 
that are really, really neat. 
How do you schedule the Uber cars? 
You have enormous amounts of data.
