
How to quickly prepare for a Data Science Career

  • Job Skills
  • Thread starter FallenApple
  • #1
565
60

Main Question or Discussion Point

Some background: I have a Master's in Statistics and a Bachelor's in Applied Mathematics, so it's not like I'm starting from scratch. But since graduating I've been working in education, and I'm about one year out. I suppose this will hurt my chances of breaking into the field. I've interviewed at a few places. I didn't pass because they asked a few questions about things I should have known but had become rusty on. I did make it to the second round once, though, which I suppose isn't too shabby. Some questions I flat out didn't know. I suppose those had to do with domain knowledge specific to the company, so I won't bother prepping for those until the last minute.

So what is the best way to prepare given my situation?

Right now I'm reading Introduction to Statistical Learning and Elements of Statistical Learning. I wonder if it's worth it.

What stepping stone positions would help me break in?

Would Kaggle help? I only know linear regression and generalized linear models off the top of my head. I've forgotten Bayesian methods and longitudinal regression, though I can review them if need be.

What about machine learning algorithms? I know how descent algorithms work, but it seems lots of packages already have these built in. So should I skip these?

I know R and some Python. Some OOP but rusty.

I just need a catch-all list of the skills I need. For company-specific domain knowledge, I'll just prep once I'm offered an interview.

I don't care to get into a top company at this point. Most important thing to me is that I kickstart my momentum so that I can actually get somewhere.


I feel like I'm running out of time.
 

Answers and Replies

  • #2
138
25
This might not be the best place to ask since, as far as I know, none of the regular posters here actually work as data scientists.

The other data science sites I've seen are inundated with students trying to break into the field, with very few actual working data scientists.
 
  • #3
1,806
168
I work in a department that assists in building large software solutions for internal clients at a very large company. We employ traditional statistics, Bayesian statistics, machine learning and various optimization methods (LP/IP/MIP/etc.) to build the scientific core of each solution. We do not call ourselves data scientists, but I don't see the difference. Take my opinions with however much salt you see appropriate.

Some background: I have a Master's in Statistics and a Bachelor's in Applied Mathematics, so it's not like I'm starting from scratch. But since graduating I've been working in education, and I'm about one year out. I suppose this will hurt my chances of breaking into the field.
Yes, but not that much. As long as you're not rusty you should have a shot.

I've interviewed at a few places. I didn't pass because they asked a few questions about things I should have known but had become rusty on.
Doh, time to brush back up on that stuff. Our interviewing process has a few steps. One of them is very heavy on the basics; think statistics qualifying exam type questions.

So what is the best way to prepare given my situation?

Right now I'm reading Introduction to Statistical Learning and Elements of Statistical Learning. I wonder if it's worth it.
Those are both great texts. Elements is obviously the more comprehensive of the two, but I might suggest you go through ISL once (reasonably thoroughly). You'll gain more in the short term by knowing all of Introduction well than you will by knowing some of Elements well. (Of course, if you can just blow through Elements in no time, do that!)

The next big thing is programming languages, which you mention later.

What stepping stone positions would help me break in?
Someone else will have to help with this one. For actuarial work, almost any office job, but especially an analyst job, goes a long way. But for data science, I don't notice those carrying the same weight. Most data scientists I know got the job because they had relevant education or experience (relevant education meaning graduate work in the area the employer needed).

Would Kaggle help? I only know linear regression and generalized linear models off the top of my head. I've forgotten Bayesian methods and longitudinal regression, though I can review them if need be.
Yes. I mean, personally I don't think much of Kaggle competitions, and lots of my coworkers feel the same way. But if you documented your process well and did a few different ones, the work you did could still mean a lot. I suppose I'm saying focus on the process and the learnings rather than how well you scored. It isn't that hard to copy and paste a solution that does reasonably well.

What about machine learning algorithms? I know how descent algorithms work, but it seems lots of packages already have these built in. So should I skip these?
In our interviews we expect candidates to be able to walk us through how some of the basic algorithms work. We expect them to understand some detail around how they function.
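
To give a sense of that level of detail, here is a minimal sketch of batch gradient descent fitting a one-variable linear regression. It is only an illustration, not a question from any actual interview; the function name `fit_linear_gd`, the learning rate and the iteration count are all assumptions chosen for this toy data.

```python
def fit_linear_gd(xs, ys, lr=0.05, n_iter=5000):
    """Fit y ~ a + b*x by batch gradient descent on mean squared error.

    lr and n_iter are illustrative defaults, not tuned values.
    """
    n = len(xs)
    a, b = 0.0, 0.0  # start from intercept 0 and slope 0
    for _ in range(n_iter):
        # Residuals of the current fit on every observation
        residuals = [a + b * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(residual^2)
        grad_a = 2.0 / n * sum(residuals)
        grad_b = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        # Step against the gradient
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Toy data generated exactly from y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_linear_gd(xs, ys)
print(round(a, 3), round(b, 3))
```

Being able to explain why the step size matters (too large and the iterates diverge, too small and convergence is slow) is the kind of detail that goes beyond calling a package.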

I know R and some Python. Some OOP but rusty.
Yes, keep with those two. In our department we also leverage Java and C++ (and a few others to a lesser degree), but Python and R are plenty to expect from candidates, and frankly I'd love it if that's all we used.

I feel like I'm running out of time.
It's good you're energized, keep at it.

I might also suggest actuarial work. It also suffers from a big influx of applicants, so don't think of it as a choice between the two; instead, if you are interested in both, pursue both.

Best wishes!
 
  • #4
565
60
I work in a department that assists in building large software solutions for internal clients at a very large company. We employ traditional statistics, bayesian statistics, machine learning and various optimization methods (LP/IP/MIP/etc.) to build the scientific core to the solution. We do not call ourselves data scientists, but I don't see the difference. Take my opinions with however much salt you see appropriate.



Yes, but not that much. As long as you're not rusty you should have a shot.



Doh, time to brush back up on that stuff. Our interviewing process has a few steps. One of them is very heavy on the basics; think statistics qualifying exam type questions.
Thanks for your detailed response! It seems like you are enjoying working on cutting edge stuff.

It's good to hear that my degrees haven't reached an expiration date.

Should I prioritize the more practical aspects of data analysis? Or would knowing the theory and equations at an academic level matter as well? I passed those quals, but a lot of it was really heavy on matrix computations and probability theory.



Those are both great texts. Elements is obviously the more comprehensive of the two, but I might suggest you go through ISL once (reasonably thoroughly). You'll gain more in the short term by knowing all of Introduction well than you will by knowing some of Elements well. (Of course, if you can just blow through Elements in no time, do that!)
Ah ok, got it. I suppose it goes without saying that I should work through many of the problems as well. I'll work through them and document my code.



Someone else will have to help with this one. For actuarial work, almost any office job, but especially analyst jobs, go a long way. But for data science, I don't notice those having the same weight. Most data scientists I know got the job because they had relevant education or experience (relevant education meaning graduate work in the area the employer needed).
I'm trying to break into those jobs. Seems competitive. I was thinking about going back for a PhD to improve my chances. But I see people with a Master's in Stats make it.





In our interviews we expect candidates to be able to walk us through how some of the basic algorithms work. We expect them to understand some detail around how they function.
Ah ok, that makes sense. So more of a verbal description of the intuition behind them rather than an implementation. When I coded those, it took the better part of a day and lots and lots of troubleshooting.

Yes, keep with those two. In our department we also leverage Java and C++ (and a few others to a lesser degree), but Python and R is plenty to expect from candidates, and frankly I'd love it if that's all we used.
Got it. Are hard coding tests commonly given? Like the ones you would prepare for with LeetCode or Cracking the Coding Interview?


It's good you're energized, keep at it.

I might also suggest actuarial work. It also suffers from a big influx of applicants, so don't think of it as a choice between the two; instead, if you are interested in both, pursue both.

Best wishes!
Thanks! Yeah I really like statistics/machine learning so a career in something utilizing those would be really fun. I actually passed the first actuarial exam way back in 2011, but I didn't follow through with that path. I regret that now.
 
  • #5
gleem
Science Advisor
Education Advisor
1,602
963
Over the last several years, in response to the growing importance of big data and data analytics, and to the plethora of trained scientists and engineers coming onto the market without finding adequate employment, a number of educational programs have been set up to fill the knowledge and expertise gaps that stand in the way of employment in the big data/analytics field. They are called data boot camps. For college graduates with expertise in math, statistics and programming they are fairly short, varying from about one to six months. Many are legit, but as usual you have to use good judgment in selecting a program, as I am sure there are some just trying to take advantage of students. Some of the camps are not cheap; some offer refunds if you do not get a job within a certain span of time, or so they say. See for example: https://www.cio.com/article/3051124/careers-staffing/10-boot-camps-to-kick-start-your-data-science-career.html

Many companies expect experience/knowledge in their particular area of interest which the new grad probably does not have. Companies tend to want people who have some knowledge of their company and work product and want the new employee to hit the ground running.
 
  • #6
Zap
170
56
I'm reading an Introduction to Statistical Learning, too.

Why not just take classes on DataCamp.com? It's only 30 dollars a month to take their classes. I haven't taken any of them yet, but I've heard good things, and I plan to try them out soon. It's only 30 bucks. I've read mixed reviews about those data bootcamps. They are pretty fishy, so to speak.

I think some data analyst positions are being labeled as data science when they may or may not be. I am cool with being a data analyst, though. Depending on the type of analysis you do, I think it could be a stepping stone to data science, but I really don't know.

Why not apply to be a data analyst? You need experience to be a data scientist. I know of at least one position in data analysis that doesn't require experience and will train you, as long as you have a math or physics degree. It's not data science, though. I'm not sharing it, either, because I want it.
 
  • #7
32
27
I'm reading an Introduction to Statistical Learning, too.

Why not just take classes on DataCamp.com? It's only 30 dollars a month to take their classes. I haven't taken any of them yet, but I've heard good things, and I plan to try them out soon. It's only 30 bucks. I've read mixed reviews about those data bootcamps. They are pretty fishy, so to speak.

I think some data analyst positions are being labeled as data science when they may or may not be. I am cool with being a data analyst, though. Depending on the type of analysis you do, I think it could be a stepping stone to data science, but I really don't know.

Why not apply to be a data analyst? You need experience to be a data scientist. I know of at least one position in data analysis that doesn't require experience and will train you, as long as you have a math or physics degree. It's not data science, though. I'm not sharing it, either, because I want it.
I've used DataCamp before and it's pretty basic. It's OK for an introduction, but realize that you'll need to do projects or other courses to get a solid knowledge of what you're studying. Both Coursera and edX have more comprehensive courses if that's what you're looking for.
 
  • #8
gleem
Science Advisor
Education Advisor
1,602
963
@FallenApple I know little of data science. However, if I were interested in seeking employment in this area I would spend a lot of time on the web, which is full of info on data science requirements and skills, some of which may not be obvious. For example:

https://www.artificialintelligence-news.com/2018/11/26/how-to-be-a-data-scientist-get-deeply-involved-in-big-data-and-cloud-to-build-high-quality-ai-products/

You might also look at this LinkedIn Learning blog:

https://www.linkedin.com/learning/topics/data-science?trk=lilblog_09-17-18_how-to-thrive-in-the-age-of-AI_tl&cid=70132000001AyziAAC

or these Codecademy offerings:

https://www.codecademy.com/learn/technical-interview-practice-python?utm_source=customer_io&utm_campaign=monday_course_drop_1_7_19&utm_medium=email&utm_content=inline_link
 
  • #9
Zap
170
56
I am taking a course in SQL, R and Artificial Intelligence. I am also studying a bit about statistical learning. I realize that is probably not enough to become a data scientist, but I think it could be enough to be a data analyst. During my phone interview, they were just interested in any kind of generic coding experience. I think they were just waiting for a few key words like Java or C++. The HR guy didn't even know what SQL was. I know data analysts don't typically have the same salaries as data scientists, but the pay can still be pretty good nonetheless. I don't think it's possible to cram in as much experience as it would take to become a data scientist between now and May, but I will make note of this thread for future reference.
 
  • #10
rkr
Gold Member
15
15
The most significant part of practical data science work is just {data munging, preprocessing, cleanup}. Kaggle unfortunately doesn't do a good job of teaching that, as it usually comes with a cleaned and standardized data set. Often the cleanup involves specific domain knowledge. Software frameworks involved here include annotation/labeling GUIs, the ELK stack, pandas for time series, sklearn's preprocessing module.
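
As a rough picture of what that cleanup looks like in practice, here is a small pandas sketch. The table, the column names and the `clean` helper are all made up for illustration; real cleanup rules depend on the domain knowledge mentioned above.

```python
import pandas as pd

# Hypothetical raw extract with the usual problems: inconsistent labels,
# unparseable values and a repeated row.
raw = pd.DataFrame({
    "timestamp": ["2019-01-01", "2019-01-02", "2019-01-02", "not a date", "2019-01-04"],
    "region": [" east", "East ", "East ", "west", "WEST"],
    "sales": ["100", "250", "250", "n/a", "175"],
})


def clean(df):
    """Return a cleaned copy: parsed types, normalized labels, no bad or duplicate rows."""
    out = df.copy()
    # Unparseable dates and numbers become NaT / NaN instead of raising
    out["timestamp"] = pd.to_datetime(out["timestamp"], errors="coerce")
    out["sales"] = pd.to_numeric(out["sales"], errors="coerce")
    # Normalize free-text category labels
    out["region"] = out["region"].str.strip().str.lower()
    # Drop rows that failed parsing, then exact duplicates
    out = out.dropna(subset=["timestamp", "sales"]).drop_duplicates()
    return out.sort_values("timestamp").reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Whether a bad row should be dropped, imputed or sent back to the data owner is exactly the kind of decision a pre-cleaned competition data set never forces you to make.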

Depending on the industry you're going into, the next most significant part is generally experimental design. In web-based companies, this means things like designing A/B tests so that you can decide whether moving a button several pixels in a certain direction would increase user click-through rates, then transforming these experimental designs into repeatable (production-ready) pipelines. Tools that get used in this space include varying flavors of SQL, Spark, Docker containers, Airflow, etc.
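
For the analysis end of an A/B test like the button example, one common choice (an assumption here, not necessarily what any particular company uses) is a two-proportion z-test on click-through rates. A minimal standard-library sketch, with made-up counts:

```python
import math


def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in click-through rate between A and B."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    # Pooled rate under the null hypothesis that both variants share one rate
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical counts: 5000 users per variant
z, p_value = two_proportion_ztest(clicks_a=200, n_a=5000, clicks_b=260, n_b=5000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The design side (how many users per arm, how long to run, what counts as a click) is decided before any data comes in; the test above is only the last step.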

I argue that Python is a lot more monetizable than R for data science work, because R is neither appropriate for preprocessing nor production.

I don't think there's one sequence of things to do that will best prepare you for data science, but I'd pick up anything that gives you motivation and sufficient time to learn some subset of the frameworks/libraries/tools named above.
 
  • #11
1,806
168
I totally agree with rkr's sentiment and general post. I would like to clarify his tool list, not because it's wrong but just to emphasize how flexible those tools are.

SQL, Spark and Pandas are all general ETL and data manipulation tools, and are not specific to time series or A/B testing (though they can all play a part in preparing data for that). They are not mutually exclusive either, as Spark and Pandas can both issue SQL against any database that has a SQL API (and most structured DBs will).
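
A small sketch of that combination, using an in-memory SQLite database so it is self-contained. The `orders` table and its columns are invented for the example; against a real database you would pass a different connection.

```python
import sqlite3

import pandas as pd

# Throwaway in-memory database standing in for a real source system
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("bob", 5.0), ("ann", 7.5)],
)
conn.commit()

# Let the database do the aggregation in SQL...
totals = pd.read_sql_query(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer",
    conn,
)
# ...then keep manipulating the result as a DataFrame
totals["share"] = totals["total"] / totals["total"].sum()
print(totals)

conn.close()
```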

Containerization (Docker or otherwise) is an environment management tool, and could also be described as a type of system architecture. Typically you wrap your module in an API (e.g. Flask), containerize it and launch it onto one or more hosts. The point being that you can containerize almost anything: ETL processes using Pandas, your model training modules, your forecasting modules, and of course your A/B testing modules as rkr mentioned. Today, service-based architectures composed of interacting containers are very common. It looked like the future too, until serverless services such as AWS Lambda came along.
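
For the "wrap your module in an API" step, a minimal Flask sketch might look like the following. The `predict` function, its coefficients and the `/predict` route are placeholders for illustration, not anyone's actual service; the container image would simply run this app behind whatever server you choose.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(x):
    """Placeholder model: a fixed linear rule standing in for a trained model."""
    return 2.0 * x + 1.0


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Expect a JSON body such as {"x": 3.5}
    payload = request.get_json(force=True)
    x = float(payload["x"])
    return jsonify({"prediction": predict(x)})


# To serve locally, one option is the Flask CLI, e.g. `flask --app <module> run`.
# Inside a container this command (or a production WSGI server) would be the entrypoint.
```

Flask's built-in test client (`app.test_client()`) can post to `/predict` without starting a server, which is handy for the unit tests that go into the image build.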
 
  • #12
1,806
168
The longer I've spent in the prescriptive analytics area the more the modeling seems like the easiest part of the job. Data procurement, data management, data cleaning, architecting the solution, code management and standardization, testing (unit tests, property/contract testing, system integration testing, UAT, E2E, etc) all add up to so much. I suppose the models are still the heart, right?

But my point is that there's more to data science than modeling, which means there are other avenues into the field (or close enough to it).
 
  • #13
Zap
170
56
* Extract and validate data for the company's databases
* Develop and troubleshoot programming code
* Research and troubleshoot data discrepancies
* Software Development
 
