Data analysis software for hydrology research

Alkemyst
Hello there!

This is my first time posting, but I'm a long-time reader. I could not find a more appropriate sub-forum.

I have just been hired to compile and organize data for an arctic hydrology research project in Fairbanks, AK. They are studying climate change and have recorded a few years' worth of data. There are 150 temperature sensors in a "glacial polygon" (that's what the researcher said), about 1 meter between each sensor, recording once every hour. I'm scared to calculate how many "values" that would give us...

My employer has said she isn't very picky about how the data is compiled. She just wants to start by finding data anomalies and spotting general trends. I'm basically going to help her enter data and make graphs. She plans to request a "final product" and I can go about getting there however I see fit.

I am familiar with these programs: Excel, Mathematica, MATLAB, Microlab, Datastudio.

Microlab and Datastudio were used in my lower-level Chemistry and Physics labs. I've never used Mathematica or MATLAB for generating graphs, but I've heard they work well for that. She only knows Excel, which is fairly easy to use. I'm comfortable with it, but it doesn't seem as versatile as the subject-specific programs I mentioned.

I am only a second-year undergraduate in chemistry. I have lab experience from 1st and 2nd year science courses and one summer job, but not much else. I essentially know nothing except how to work (which can be enough sometimes). Can anyone recommend programs that are usually used for large amounts of data like this? Actually, any advice at all is welcome. I just want to do this small job well, and I would put in the time to learn how to use new tools and techniques.

Thank you!

Note: I can provide much more information, but this post is already long.
 
I would actually argue Excel is more versatile than most of the other programs you mention for what you're doing. That's a subjective opinion.

HOWEVER, Excel just isn't going to handle all that data well.

24 hours X 365 days a year X 150 sensors is already past the Excel row limit.
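
A quick check of that arithmetic, as a small Python sketch. It assumes one reading per row, and the 1,048,576 figure is the per-worksheet row limit in Excel 2007 and later:

```python
# Count hourly readings per year and compare with Excel's worksheet row limit.
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365
SENSORS = 150
EXCEL_MAX_ROWS = 1_048_576  # per-worksheet limit in Excel 2007 and later

readings_per_year = HOURS_PER_DAY * DAYS_PER_YEAR * SENSORS
print(readings_per_year)                   # 1314000
print(readings_per_year > EXCEL_MAX_ROWS)  # True
```

So a single year in "one row per reading" form already overflows one sheet, before counting the other years.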

So nix that one. I'd mess with Mathematica and MATLAB last.

What about R?
 
If I was in your shoes, here’s what I would do:

Use R or some similar program (if one of the others you listed works, great) to find the extreme outliers. Figure out if there’s a reason for their difference, or if they are erroneous, and remove IF appropriate.
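
One possible way to flag extreme outliers, sketched in Python rather than R. The method here (a modified z-score built on the median and the median absolute deviation) is my own choice for illustration, not something prescribed above, and the 3.5 cutoff and the sample readings are assumptions:

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 scales the MAD so the score is comparable to a standard z-score
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

# Made-up hourly temperatures (deg C) with one obviously bad reading
readings = [-12.1, -12.4, -11.9, -12.0, 85.0, -12.3, -12.2]
print(flag_outliers(readings))  # [4]
```

The function only flags positions; deciding whether a flagged reading is erroneous or a real event is still a judgment call.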

Export the weekly averages to Excel (or to a file that can be imported to Excel). The reason I would do this is that I’m good with Excel; few people realize how powerful a program it really is. By using weekly averages you’re down to about 7,800 data points per year (150 sensors × 52 weeks), which is very manageable. You may choose to roll it up again into monthly values or by groups (or all) of sensors, but I always try to get the most detailed data I can into the workbook before I go rolling anything up, in case I need to get back to it later.
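
A rough sketch of the weekly roll-up and export, in Python for illustration. The row layout `(sensor_id, timestamp, value)`, the function names, and the sample data are all hypothetical; weeks are ISO weeks, and the output is CSV text that Excel can open:

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

def weekly_averages(rows):
    """rows: iterable of (sensor_id, datetime, value).
    Returns {(sensor_id, iso_year, iso_week): mean value}."""
    buckets = defaultdict(list)
    for sensor_id, stamp, value in rows:
        iso_year, iso_week, _ = stamp.isocalendar()
        buckets[(sensor_id, iso_year, iso_week)].append(value)
    return {key: mean(vals) for key, vals in buckets.items()}

def write_csv(averages, handle):
    """Write one row per sensor per week to an open text file handle."""
    writer = csv.writer(handle)
    writer.writerow(["sensor_id", "iso_year", "iso_week", "mean_value"])
    for (sensor_id, year, week), avg in sorted(averages.items()):
        writer.writerow([sensor_id, year, week, round(avg, 3)])

# Made-up example: two weeks of hourly readings for one sensor
start = datetime(2024, 1, 1)  # a Monday, so ISO week 1 starts here
rows = [("S001", start + timedelta(hours=h), -10.0 if h < 168 else -14.0)
        for h in range(336)]
averages = weekly_averages(rows)
out = io.StringIO()  # stand-in for a real file opened with newline=""
write_csv(averages, out)
print(out.getvalue())
```

Keeping the hourly data untouched and writing the averages to a separate file means the detailed values are still there if a roll-up needs redoing.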

Find your mean, your median, your variance for the whole set, then various subsets (groups of devices, periods of time, etc.). Make some graphs – maybe a lot of graphs – to try and find patterns or problems in the data.
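
As a small illustration of "whole set, then subsets", here is a Python sketch using only the standard library. The sensor IDs, the "north"/"south" grouping, and the numbers are invented for the example:

```python
from statistics import mean, median, variance

def summarize(values):
    """Count, mean, median, and sample variance of a list of numbers."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "variance": variance(values),  # sample variance (n - 1 denominator)
    }

# Made-up readings per sensor and a hypothetical grouping of sensors
readings = {
    "S001": [1.0, 2.0, 3.0],
    "S002": [2.0, 4.0, 6.0],
    "S003": [-1.0, 0.0, 1.0],
}
groups = {"north": ["S001", "S002"], "south": ["S003"]}

overall = summarize([v for vals in readings.values() for v in vals])
by_group = {
    name: summarize([v for sensor in sensors for v in readings[sensor]])
    for name, sensors in groups.items()
}
print(overall)
print(by_group["north"])
```

The same `summarize` helper can be pointed at any slice of the data, such as one month or one sensor.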

Ask questions about why the data is the way it is. Do your best to answer them. Once you have some specific questions, more specific statistical tools may be necessary to answer them.

Produce some reports that will serve as a useful basis for discussion with your supervisor; these should include your methods, what you’ve done, and what information you obtained from it. This will be made easier if you make notes as you go. Set a meeting time with your supervisor and present them.

But that’s just me, someone else may have some better advice.
 
Thank you!

That is tremendously helpful.

I finally got my hands on the actual data itself just yesterday. Most of this weekend will involve basic sorting and organization, which is just fine in Excel. The next step is conversions from voltage into temperature and soil moisture. It'll be a bit before I can even start on analysis, but Excel will be fine for most of the first steps. Actually, this will probably keep me busy for a while.
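
For the voltage conversion step, something like the following Python sketch might be a starting point. It assumes a simple linear calibration, which may not match these sensors (many need a nonlinear curve instead), and the slope and offset are placeholders, not real values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LinearCalibration:
    """Hypothetical linear calibration: value = slope * volts + offset."""
    slope: float
    offset: float

    def convert(self, volts):
        # Pass missing readings (None) through rather than inventing a value
        if volts is None:
            return None
        return self.slope * volts + self.offset

# Placeholder coefficients, NOT real sensor values; take the real ones
# from the sensor datasheet or the project's calibration records.
temperature_cal = LinearCalibration(slope=100.0, offset=-50.0)

raw_volts = [0.5, 0.25, None, 0.75]
print([temperature_cal.convert(v) for v in raw_volts])
# [0.0, -25.0, None, 25.0]
```

Keeping the raw voltages alongside the converted values makes it possible to redo the conversion if a calibration changes.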

She listed several reasons for outliers or "gaps." Some of the sensors are solar powered, and Alaskan winters have very little sun, so two or three months out of every year "cut out." Also, some simply get dug up and chewed on by wild animals. They try to check every once in a while to recalibrate, but one little blip obviously isn't worth a flight up to Barrow.
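
Finding those gaps could be sketched like this in Python. It assumes each sensor's readings carry a timestamp and that one hour is the expected spacing; the function name and sample times are made up:

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, expected=timedelta(hours=1)):
    """Return (last_seen, next_seen) pairs where consecutive readings
    are further apart than the expected interval."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > expected]

# Made-up hourly series with the 03:00 and 04:00 readings missing
start = datetime(2024, 1, 1)
stamps = [start + timedelta(hours=h) for h in (0, 1, 2, 5, 6)]
for last_seen, next_seen in find_gaps(stamps):
    print(last_seen, "->", next_seen, "missing", next_seen - last_seen)
```

A list of gap start and end times per sensor would also make it easier to label each gap with its likely cause (winter darkness, animal damage, and so on).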

The nature of my employment seems more secretarial than student. She requested more in terms of tracking down and labeling sets of data than in terms of actual analysis and scientific judgment.

Thank you again for your reply!

-- Salome'
 
