
Getting into data science from computational physics?

  1. Feb 6, 2015 #1


    I'm currently working on a master's degree in physics where my project uses C++. I have read that some physics PhDs were able to get data scientist roles despite having worked on computational astrophysics. This is a little surprising to me, since I thought experimentalists would be better suited to data science: they work with data and data analysis more than computational physicists do.

    If anyone here has gone from computational physics to data science, or knows anyone who made the switch, can you explain how? Just go to Kaggle and work on data sets?

    Also, do companies only hire PhDs for entry-level data scientist roles? With just a master's, if I can't get a data scientist role, what other similar jobs are available?
  3. Feb 6, 2015 #2
    Where did you read that about moving from computational astrophysics to data science?
  4. Feb 6, 2015 #3


    I browsed a few profiles of former computational astrophysics students (for example, by checking the 'previous students' pages of some profs) and read that they now work as data scientists.
  5. Feb 6, 2015 #4
    Ah okay, was wondering if there was an article, etc. out there about it.

    As far as I know, one does not typically obtain many of the requisite skills for handling or analyzing large data sets in astrophysics, experimental or otherwise. Of course, anyone who gets a physics degree hopefully picks up the ability to learn new and difficult things quickly and without much external help.

    "Data scientist" is a pretty vague term that seems to mean different things to different people. As far as I can tell, it comes down to an understanding of database architecture, the skills needed to retrieve and analyze that data (SQL, SAS, Access, Excel, etc.) and some knowledge of more advanced data mining techniques.

    SQL would be a very good place to start. Play around with some cloud data analytics and maybe some student versions of TOAD, SAS or other query platforms. Ultimately anyone who hires you will have their own (sometimes very unusual) process for data extraction and analysis, so be prepared to be flexible.
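
For anyone who wants to try SQL without installing a database server, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `measurements`, its columns, and the sample rows are made up for illustration; it only shows the basic load-then-aggregate pattern, not any particular employer's workflow.

```python
import sqlite3


def mean_by_run(rows):
    """Load (run_id, energy) pairs into an in-memory table and aggregate per run.

    The schema is hypothetical; real data sets will have their own columns.
    """
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing written to disk
    conn.execute("CREATE TABLE measurements (run_id INTEGER, energy REAL)")
    conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
    # GROUP BY collapses all rows of a run into one row of summary statistics.
    result = conn.execute(
        "SELECT run_id, COUNT(*), AVG(energy) "
        "FROM measurements GROUP BY run_id ORDER BY run_id"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [(1, 2.0), (1, 4.0), (2, 10.0)]
    for run_id, n, avg in mean_by_run(sample):
        print(run_id, n, avg)
```

The same `SELECT ... GROUP BY` query carries over, with minor dialect differences, to most other SQL databases.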
  6. Feb 7, 2015 #5


    Education Advisor

    From what I understand, that is only true for those who have completed only an undergraduate degree in physics. Once you go further into graduate studies, particularly a PhD but even master's-level studies, students in experimental areas of physics are expected to analyze large data sets as part of their research work. At least that is how it was explained to me by my friends who ended up pursuing graduate studies in physics.

    "Data science" is as much about the statistical analysis of high-dimensional data generated in a variety of areas (e.g. marketing data, sensor data from GIS, genomics/proteomics data, time series data of financial transactions, medical data, etc.) as it is about understanding the underlying database architecture. Certainly understanding how to retrieve the data using SQL, SAS, or Access is important, but that is only part of the picture. In essence, I regard "data science" as a fancy re-labelling of statistical analysis.
  7. Feb 7, 2015 #6


    So then how can students in computational physics get to analyze large data sets?
  8. Feb 7, 2015 #7
    Hah yea, that is a good description of what they think.
  9. Feb 8, 2015 #8
    Yes, that was one of my standard explanations when somebody asked how a physicist could transition into 'IT'. I worked on optimizing the properties of thin films, so I varied different experimental parameters when manufacturing those films and then measured their electrical and optical properties. Making sense of the experimental results effectively meant navigating a region of a multi-dimensional space of parameters and properties.

    Another thing, mundane as it was, was organizing data in a database and 'normalizing ambiguous data'. I learned about relational databases when trying to keep track of tons of STEM / TEM images and important findings, at that time usually documented only by hand-written comments in a lab notebook.
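
To give a rough idea of what that kind of bookkeeping can look like, here is a small sketch in Python with the built-in `sqlite3` module. Everything in it is an assumption for illustration: the `samples` and `images` tables, the `ALIASES` spelling map, and the example records are invented, not the schema actually used in that lab.

```python
import sqlite3

# Hypothetical map from inconsistent spellings to one canonical method label.
ALIASES = {"tem": "TEM", "t.e.m.": "TEM", "stem": "STEM", "s.t.e.m.": "STEM"}


def normalize_method(raw):
    """Return the canonical label for a raw method string, or None if unknown."""
    return ALIASES.get(raw.strip().lower())


def build_catalog(records):
    """Store (sample_name, raw_method, note) records in two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
    conn.execute(
        "CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE images ("
        "image_id INTEGER PRIMARY KEY, "
        "sample_id INTEGER NOT NULL REFERENCES samples(sample_id), "
        "method TEXT, note TEXT)"
    )
    for name, raw_method, note in records:
        # Each sample is stored once; images refer to it by id.
        conn.execute("INSERT OR IGNORE INTO samples (name) VALUES (?)", (name,))
        sample_id = conn.execute(
            "SELECT sample_id FROM samples WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO images (sample_id, method, note) VALUES (?, ?, ?)",
            (sample_id, normalize_method(raw_method), note),
        )
    return conn


if __name__ == "__main__":
    conn = build_catalog([
        ("film-A", " tem", "grain boundary"),
        ("film-A", "S.T.E.M.", "interface"),
        ("film-B", "TEM", "void"),
    ])
    query = (
        "SELECT s.name, i.method, i.note "
        "FROM images i JOIN samples s USING (sample_id) ORDER BY i.image_id"
    )
    for row in conn.execute(query):
        print(row)
```

Keeping samples and images in separate tables means a sample name is typed once and referenced by id afterwards, which is one common way to stop the same thing from appearing under several spellings.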
  10. Feb 9, 2015 #9
    A friend of mine did his PhD in computational biophysics and is currently doing a post-doc in bioinformatics. There may be a similar sort of strategy for you although nothing springs to mind.