Data analysis software for hydrology research

  1. Jul 26, 2012 #1
    Hello there!

    This is my first time posting, but I'm a long time reader. I could not find a more appropriate sub-forum.

    I have just been hired to compile and organize data for an arctic hydrology research project in Fairbanks, AK. They are studying climate change and have recorded a few years' worth of data: 150 temperature sensors in a "glacial polygon" (that's what the researcher said), about 1 meter between each sensor, recording once every hour. I'm scared to calculate how many "values" that would give us...

    My employer has said she isn't very picky about how the data is compiled. She just wants to start by finding data anomalies and spotting general trends. I'm basically going to help her enter data and make graphs. She plans to request a "final product" and I can go about getting there however I see fit.

    I am familiar with these programs: Excel, Mathematica, MATLAB, Microlab, Datastudio.

    Microlab and Datastudio were used in my lower-level Chemistry and Physics labs. I've never used Mathematica or MATLAB for generating graphs, but I've heard they work well for that. She only knows Excel, which is fairly easy to use. I'm comfortable with it, but it doesn't seem as versatile as the subject-specific programs I mentioned.

    I am only a second-year undergraduate in chemistry. I have lab experience from 1st and 2nd year science courses and one summer job, but not much else. I essentially know nothing except how to work (which can be enough sometimes). Can anyone recommend programs that are usually used for large amounts of data like this? Actually, any advice at all is welcome. I just want to do this small job well, and I would put in the time to learn how to use new tools and techniques.

    Thank you!

    Note: I can provide much more information, but this post is already long.
  3. Jul 27, 2012 #2
    I would actually argue Excel is more versatile than most of the other programs you mention for what you're doing. That's a subjective opinion.

    HOWEVER, Excel just isn't going to handle all that data well.

    24 hours x 365 days a year x 150 sensors = 1,314,000 readings a year, which is already past the Excel row limit.
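
    For anyone who wants to check that arithmetic, here is a minimal sketch in Python. The sensor count and hourly interval come from the first post; the 1,048,576-row figure is the per-worksheet limit in Excel 2007 and later.

```python
# Rough size of the hourly data set versus Excel's worksheet row limit.
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365
SENSORS = 150                 # figure quoted in the first post
EXCEL_MAX_ROWS = 1_048_576    # per-worksheet limit in Excel 2007 and later

# One row per individual reading ("long" layout).
readings_per_year = HOURS_PER_DAY * DAYS_PER_YEAR * SENSORS

# One row per hour with one column per sensor ("wide" layout).
rows_if_one_column_per_sensor = HOURS_PER_DAY * DAYS_PER_YEAR

print(readings_per_year)                    # 1314000
print(readings_per_year > EXCEL_MAX_ROWS)   # True
print(rows_if_one_column_per_sensor)        # 8760
```

    Note that the limit bites in the one-row-per-reading layout. A wide layout with one column per sensor is only 8,760 rows a year, though a sheet that size is still unpleasant to work in.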

    So nix that one. I'd mess with Mathematica and MATLAB last.

    What about R?
  4. Jul 27, 2012 #3
    If I were in your shoes, here’s what I would do:

    Use R or some similar program (if one of the others you listed works, great) to find the extreme outliers. Figure out if there’s a reason for their difference, or if they are erroneous, and remove IF appropriate.
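
    One way the outlier step could look, sketched in Python rather than R. This is only an illustration: the modified z-score based on the median absolute deviation is one common technique among several, the 3.5 cutoff is a rule of thumb, and the readings are made up.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold.

    Uses the median and the median absolute deviation (MAD), which are
    less distorted by the outliers themselves than the mean and the
    standard deviation. The 3.5 cutoff is a rule of thumb, not a value
    taken from the thread.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical hourly temperatures (deg C) from one sensor;
# the -60.0 value is an invented glitch.
readings = [-12.1, -12.4, -11.9, -12.0, -60.0, -12.3, -12.2]
print(flag_outliers(readings))  # [4]
```

    The function only flags candidates. Whether a flagged value is an error or a real event is still a judgment call to make with the researcher.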

    Export the weekly averages to Excel (or to a file that can be imported to Excel). I would do this because I’m good with Excel; few people realize how powerful a program it really is. By using weekly averages you’re down to roughly 7,800 data points per year (150 sensors x 52 weeks), which is very manageable. You may choose to roll it up again into monthly values or by groups (or all) of sensors, but I always try to get the most detailed data I can into the workbook before I go rolling anything up, in case I need to get back to it later.
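
    A rough sketch of the weekly roll-up and export, again in Python with only the standard library. The sensor ID "T001", the column names, and the data are all hypothetical, and grouping by ISO week is an assumption about what "weekly" should mean here.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

def weekly_averages(rows):
    """rows: iterable of (sensor_id, timestamp, value) tuples.

    Returns {(sensor_id, iso_year, iso_week): mean value}.
    """
    buckets = defaultdict(list)
    for sensor_id, ts, value in rows:
        iso_year, iso_week, _ = ts.isocalendar()
        buckets[(sensor_id, iso_year, iso_week)].append(value)
    return {key: mean(vals) for key, vals in buckets.items()}

def write_csv(averages, out):
    """Write the averages as CSV that Excel can open directly."""
    writer = csv.writer(out)
    writer.writerow(["sensor_id", "iso_year", "iso_week", "mean_value"])
    for (sensor_id, year, week), avg in sorted(averages.items()):
        writer.writerow([sensor_id, year, week, round(avg, 3)])

# Made-up data: two weeks of hourly readings from one hypothetical sensor.
start = datetime(2012, 1, 2)  # a Monday, so the data fills whole ISO weeks
rows = [("T001", start + timedelta(hours=h), -20.0 if h < 168 else -10.0)
        for h in range(336)]

averages = weekly_averages(rows)
buffer = io.StringIO()  # use open("weekly.csv", "w", newline="") for a real file
write_csv(averages, buffer)
print(buffer.getvalue())
```

    Keeping the hourly rows in their original files and exporting only the roll-up follows the advice above about not discarding detail you may need later.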

    Find your mean, your median, your variance for the whole set, then various subsets (groups of devices, periods of time, etc.). Make some graphs – maybe a lot of graphs – to try and find patterns or problems in the data.
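
    As an illustration of that step, here is a small Python sketch that computes the same three statistics for the whole set and for each subset. The group names "north" and "south" and all values are invented, and sample variance is assumed (swap in pvariance for population variance).

```python
from statistics import mean, median, variance

def summarize(values):
    """Basic descriptive statistics for one list of readings."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "variance": variance(values),  # sample variance; needs n >= 2
    }

# Hypothetical readings (deg C) grouped by an invented sensor grouping.
readings = {
    "north": [-12.0, -11.0, -13.0, -12.0],
    "south": [-8.0, -9.0, -7.0, -8.0],
}

# Whole set first, then each subset.
whole = [v for vals in readings.values() for v in vals]
print("all", summarize(whole))
for group, vals in readings.items():
    print(group, summarize(vals))
```

    The same function works for subsets by time period: group the readings by month or season instead of by sensor group.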

    Ask questions about why the data is the way it is. Do your best to answer them. Once you have some specific questions, more specific statistical tools may be necessary to answer them.

    Produce some reports that can serve as a basis for discussion with your supervisor; these should include your methods, what you’ve done, and what information you obtained from it. This will be made easier if you make notes as you go. Set a meeting time with your supervisor and present them.

    But that’s just me, someone else may have some better advice.
  5. Jul 28, 2012 #4
    Thank you!

    That is tremendously helpful.

    I finally got my hands on the actual data itself just yesterday. Most of this weekend will involve basic sorting and organization, which is just fine in Excel. The next step is converting voltage into temperature and soil moisture. It'll be a bit before I can even start on analysis, but Excel will be fine for most of the first steps. Actually, this will probably keep me busy for a while.
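
    For the conversion step, a sketch of how a per-sensor calibration could be applied in Python. Everything specific here is an assumption: the thread does not say how these sensors map voltage to temperature, so the linear form, the sensor IDs, and the coefficients are placeholders. Real values would come from the sensor datasheet or the project's calibration records, and a nonlinear sensor would need a different formula inside the same function.

```python
# Hypothetical per-sensor calibration: temperature = slope * volts + offset.
# These coefficients are placeholders, not real calibration values.
CALIBRATION = {
    "T001": (100.0, -50.0),
    "T002": (98.5, -49.2),
}

def volts_to_celsius(sensor_id, volts):
    """Apply the (assumed linear) calibration for one sensor to one reading."""
    slope, offset = CALIBRATION[sensor_id]
    return slope * volts + offset

# Made-up raw readings: (sensor_id, volts).
raw = [("T001", 0.38), ("T002", 0.40), ("T001", 0.375)]
converted = [(sid, round(volts_to_celsius(sid, v), 2)) for sid, v in raw]
print(converted)
```

    Keeping the coefficients in one table makes it easy to update a single sensor after a recalibration without touching the rest.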

    She listed several reasons for outliers or "gaps." Some of the sensors are solar powered, and Alaskan winters have very little sun, so two or three months out of every year "cut out." Also, some simply get dug up and chewed on by wild animals. They try to check every once in a while to recalibrate, but one little blip obviously isn't worth a flight up to Barrow.
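
    A possible way to locate those gaps automatically, sketched in Python. It assumes each sensor should report once an hour, as described in the first post; the timestamps are invented to mimic a winter power-down.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, expected=timedelta(hours=1)):
    """Return (last_seen, next_seen) pairs where spacing exceeds `expected`."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > expected]

# Made-up timestamps for one sensor: five hourly readings in November,
# then nothing until readings resume in February.
before = [datetime(2011, 11, 1) + timedelta(hours=h) for h in range(5)]
after = [datetime(2012, 2, 1) + timedelta(hours=h) for h in range(3)]

gaps = find_gaps(before + after)
for last_seen, next_seen in gaps:
    print(last_seen, "->", next_seen, "missing for", next_seen - last_seen)
```

    Labeling each gap with its likely cause (no sun, animal damage, recalibration visit) is the part that still needs the researcher's field notes.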

    The nature of my employment seems more secretarial than student. She requested more in terms of tracking down and labeling sets of data than in terms of actual analysis and scientific judgment.

    Thank you again for your reply!

    -- Salome'