Undergrad Data Analysis -- How to choose the best statistical model to use?

When choosing a statistical model for data analysis with limited data points, one common approach is to compare the candidate models with the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC). However, with only 10 data points the results may be unreliable because they are highly sensitive to sample noise, so it is advisable to gather more data: 40 points at a bare minimum (20 for model selection and 20 for model testing), and ideally substantially more. When the underlying distribution is unknown, a spline curve through the data can be as valid a representation as any. Software such as R and RStudio makes AIC and BIC easy to use, since they are already implemented in many statistical packages. Finally, a numerical comparison alongside visual inspection can give a better understanding of model performance.
parazit
TL;DR
What is the most convenient statistical model or method to determine which model is most consistent with the measurements?
Hi all.

Let's assume I have the following situation. I have a set of 10 x values. For each x value I also have the corresponding measured y value and its error. Then I perform calculations with, say, five different models, each of which uses the x values to predict the y values.

In the end, I have the x values, the measured y values and their errors, and five sets of calculated y values. See the attached file for an example.

My question is this: what is the most convenient statistical model or method to determine which model is most consistent with the measurements? Should I use chi-square, reduced chi-square, mean squared error, root mean square error, mean weighted deviation, the relative variance, Kolmogorov-Smirnov, or something else?
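Several of the measures listed in the question can be computed side by side for each model. A minimal sketch in Python, assuming the measured values, their one-sigma errors, and each model's predictions are plain lists of equal length; the data and model names below are hypothetical, not the attached file. `n_params` is an assumed argument for the number of parameters a model tuned to these same data (0 if it was not fitted to them), which sets the degrees of freedom for the reduced chi-square.

```python
import math

def goodness_of_fit(y_obs, y_err, y_model, n_params=0):
    """Common agreement measures between the measurements and one model."""
    n = len(y_obs)
    residuals = [o - m for o, m in zip(y_obs, y_model)]
    # Chi-square: residuals weighted by the measurement uncertainties
    chi2 = sum((r / s) ** 2 for r, s in zip(residuals, y_err))
    dof = n - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    mse = sum(r ** 2 for r in residuals) / n  # ignores the quoted errors
    return {
        "chi2": chi2,
        "reduced_chi2": chi2 / dof,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }

# Hypothetical measurements and two hypothetical models
y_obs = [2.1, 3.9, 6.2, 7.8, 10.1]
y_err = [0.2] * 5
models = {
    "model_A": [2.0, 4.0, 6.0, 8.0, 10.0],
    "model_B": [1.5, 3.5, 5.5, 7.5, 9.5],
}
for name, y_model in models.items():
    stats = goodness_of_fit(y_obs, y_err, y_model)
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

Only the chi-square variants use the measurement errors; MSE and RMSE weight every point equally. When the errors are well estimated, a reduced chi-square near 1 indicates agreement within the stated uncertainties.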

You may wonder how the y values are distributed, e.g. whether they are linear, polynomial, etc. Let's assume they do not follow a particular form, or that the form varies from case to case. My main interest here is to identify a statistical method for such cases.

Thank you in advance for your time.
 

This is called "model comparison". I generally use the BIC or the closely related AIC for model comparison:

https://en.wikipedia.org/wiki/Bayesian_information_criterion
https://en.wikipedia.org/wiki/Akaike_information_criterion
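For the setting in the question, AIC and BIC can be written in terms of the chi-square. A minimal sketch, assuming the measurement errors are Gaussian with the quoted standard deviations treated as known: then -2 ln(L) equals the chi-square plus a constant that is the same for every model on the same data, so up to that constant AIC = chi2 + 2k and BIC = chi2 + k ln(n). The chi-square values and parameter counts `k` below are hypothetical; `k` should be the number of parameters each model tuned to these data (0 for a model that was not fitted to them, in which case both criteria simply rank by chi-square). Only differences between models are meaningful.

```python
import math

def aic_bic_from_chi2(chi2, n_points, n_params):
    """AIC and BIC for Gaussian errors with known sigma, up to a shared constant."""
    aic = chi2 + 2 * n_params
    bic = chi2 + n_params * math.log(n_points)
    return aic, bic

# Hypothetical (chi-square, number of fitted parameters) for three models
candidates = {"model_A": (12.0, 2), "model_B": (9.5, 4), "model_C": (30.0, 1)}
n = 10  # number of data points

scores = {name: aic_bic_from_chi2(c, n, k) for name, (c, k) in candidates.items()}
best_aic = min(scores, key=lambda m: scores[m][0])
best_bic = min(scores, key=lambda m: scores[m][1])
for name, (aic, bic) in scores.items():
    # Report differences from the best model, which is what the criteria compare
    print(f"{name}: dAIC={aic - scores[best_aic][0]:.2f}  dBIC={bic - scores[best_bic][1]:.2f}")
```

BIC penalizes extra parameters more heavily than AIC once n exceeds about 7 (ln n > 2), so the two can disagree when a model with more parameters fits slightly better.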
Note, you should not use the same data for model selection as for model testing. With only 10 data points you have far too few to do either job reliably. Your result is likely to be highly dependent on the sample noise and not very robust at all. I would recommend acquiring much more data, 40 at a bare minimum so that you can have 20 points for model selection and 20 points for model testing, but ideally substantially more than that.
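Keeping model selection and model testing on separate data can be done with a random split. A minimal sketch, assuming 40 hypothetical points split into two halves of 20, as in the recommendation above; `split_selection_test` is a hypothetical helper name and the fixed `seed` only makes the split reproducible.

```python
import random

def split_selection_test(x, y, y_err, seed=0):
    """Randomly split the points into a model-selection half and a testing half."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2

    def pick(ids):
        ids = sorted(ids)  # keep each subset in x order
        return [x[i] for i in ids], [y[i] for i in ids], [y_err[i] for i in ids]

    return pick(idx[:half]), pick(idx[half:])

# Hypothetical data set of 40 points
x = list(range(40))
y = [2.0 * xi + 1.0 for xi in x]
y_err = [0.5] * 40
selection_set, test_set = split_selection_test(x, y, y_err)
print(len(selection_set[0]), len(test_set[0]))
```

The models would be compared (for example by AIC or BIC) using only `selection_set`, and the chosen model's agreement would then be checked on `test_set`, which played no part in the choice.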
 
It sounds like you want a general method to use when you have no idea about the theoretical model or statistical distributions that apply. In that situation, I do not think that you should concern yourself with a statistical result. A spline curve through the data points would be as valid as anything.
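A spline through the data points needs no assumed functional form. A minimal sketch of a natural cubic spline (second derivative zero at both ends) in plain Python, assuming strictly increasing x values; the example data are hypothetical. Libraries offer the same thing, for instance SciPy's `scipy.interpolate.CubicSpline` with `bc_type='natural'`.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return a function interpolating (xs, ys) with a natural cubic spline.

    xs must be strictly increasing; second derivatives are zero at both ends.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots
    if n > 1:
        # Tridiagonal system for the interior second derivatives m[1..n-1]
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Thomas algorithm: forward elimination, then back substitution
        for k in range(1, n - 1):
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def spline(x):
        # Interval [xs[i], xs[i+1]] containing x, clamped to the end intervals
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return (m[i] * a ** 3 / (6 * h[i]) + m[i + 1] * b ** 3 / (6 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6) * b)

    return spline

# Hypothetical measurements, evaluated on a finer grid for plotting
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
s = natural_cubic_spline(xs, ys)
print([round(s(1 + 0.5 * k), 3) for k in range(9)])
```

The spline passes exactly through every point, so it describes the data but carries no statistical statement about which model is better.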
 
Dale said:
This is called "model comparison". I generally use the BIC or the closely related AIC for model comparison:

https://en.wikipedia.org/wiki/Bayesian_information_criterion
https://en.wikipedia.org/wiki/Akaike_information_criterion
Note, you should not use the same data for model selection as for model testing. With only 10 data points you have far too few to do either job reliably. Your result is likely to be highly dependent on the sample noise and not very robust at all. I would recommend acquiring much more data, 40 at a bare minimum so that you can have 20 points for model selection and 20 points for model testing, but ideally substantially more than that.

Thank you so much for your reply, Dale. The file was just a sample to show the data and their distribution. I have looked into BIC and AIC, but I am more confused now since I have no experience computing them. It would be a great help if you could point me in the right direction or show me an example. Thank you in advance for your guidance.
Best regards.
 
FactChecker said:
It sounds like you want a general method to use when you have no idea about the theoretical model or statistical distributions that apply. In that situation, I do not think that you should concern yourself with a statistical result. A spline curve through the data points would be as valid as anything.

Dear FactChecker,

Thanks for your reply. You're right, actually. I normally plot a spline through the calculated results to compare them visually with the experimental data. However, I would also like a numerical comparison for a better understanding, which is why I asked about these measures. Thank you for your contribution and your time.
Best regards.
 
parazit said:
I have looked into BIC and AIC, but I am more confused now since I have no experience computing them. It would be a great help if you could point me in the right direction or show me an example.
Most good software packages will have them already implemented and you just have to call them. I would not recommend implementing them by hand!

I use R for my statistics; it is free and very powerful, and RStudio ( https://www.rstudio.com/ ) is a very nice development environment for R. Installing RStudio will probably take less time than programming your own AIC or BIC. Here is the R documentation page for AIC and BIC, including an example at the bottom:

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/AIC.html
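When the measurement errors are not used as known weights, packages typically compute AIC and BIC from the maximized Gaussian log-likelihood with the error variance estimated from the residuals. A minimal sketch of that calculation, intended only for cross-checking numbers; to the best of my understanding it matches the convention R's `logLik()`/`AIC()` use for an unweighted `lm()` fit (variance estimated as RSS/n and counted as one extra parameter), but verify against R's own output. `fit_line` and `gaussian_aic_bic` are hypothetical helper names and the data are made up.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def gaussian_aic_bic(residuals, n_coef):
    """AIC and BIC from the maximized Gaussian log-likelihood (variance = RSS/n)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    k = n_coef + 1  # fitted coefficients plus the error variance
    return -2 * log_lik + 2 * k, -2 * log_lik + k * math.log(n)

# Hypothetical data: compare a constant model with a straight line
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4, 13.9, 16.2, 17.8, 20.3]
mean_y = sum(y) / len(y)
const_res = [yi - mean_y for yi in y]
b0, b1 = fit_line(x, y)
line_res = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print("constant:", gaussian_aic_bic(const_res, 1))
print("line:    ", gaussian_aic_bic(line_res, 2))
```

The model with the lower value is preferred; as with any information criterion, only differences between models fitted to the same data carry meaning.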
 