Data Analysis -- How to choose the best statistical model to use?

SUMMARY

This discussion focuses on selecting the most suitable statistical model for data analysis when dealing with limited data points. The user has 10 x values and corresponding y values with errors, and seeks guidance on methods such as chi-square, AIC, and BIC for model comparison. Key recommendations include acquiring at least 40 data points for reliable model selection and testing, and utilizing spline curves for visual comparisons. R and RStudio are suggested as effective tools for implementing AIC and BIC calculations.

PREREQUISITES
  • Understanding of statistical models and their applications
  • Familiarity with AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion)
  • Basic knowledge of R programming for statistical analysis
  • Concept of model selection and testing in statistics
NEXT STEPS
  • Learn how to implement AIC and BIC in R using the provided documentation
  • Research the use of spline curves for data visualization and comparison
  • Explore methods for increasing sample size and its impact on model reliability
  • Investigate the implications of using different statistical models on data interpretation
USEFUL FOR

Data analysts, statisticians, and researchers looking to enhance their understanding of model selection and comparison in statistical analysis.

parazit
TL;DR
What is the most convenient statistical measure or method to determine which model is most consistent with the measurements?
Hi all.

Let's assume I have the following situation. I have a set of 10 x values. For each x value I also have a measured y value and its error. I then run calculations with, say, five different models, each of which uses the x values to predict the y values.

In the end, I have the x values, the measured y values with their errors, and five different sets of calculated y values. You may see the attached file as an example.

My question is this: what is the most convenient statistical measure or method to determine which model is most consistent with the measurements? Should I use chi-square, reduced chi-square, mean squared error, root mean square error, mean weighted deviation, the relative variance, Kolmogorov-Smirnov, or something else?

You may wonder about the form of the y values, such as whether they are linear, polynomial, etc. Let's assume they do not follow a particular form, or that the form varies from one situation to another. My main interest here is to identify a statistical method for such cases.

Thank you so much for your time in advance.
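
For reference, a few of the measures named in the question can be written down in a handful of lines. The following is only a minimal sketch, assuming independent measurement errors and using made-up numbers rather than the attached data; the function names and the `n_fitted_params` argument are illustrative choices, not part of any library.

```python
import math

def chi_square(y_obs, y_err, y_model):
    """Sum of squared residuals, each scaled by its measurement error."""
    return sum(((yo - ym) / e) ** 2 for yo, e, ym in zip(y_obs, y_err, y_model))

def reduced_chi_square(y_obs, y_err, y_model, n_fitted_params=0):
    """Chi-square divided by the degrees of freedom (points minus fitted parameters)."""
    dof = len(y_obs) - n_fitted_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return chi_square(y_obs, y_err, y_model) / dof

def rmse(y_obs, y_model):
    """Root mean square error; ignores the measurement errors entirely."""
    n = len(y_obs)
    return math.sqrt(sum((yo - ym) ** 2 for yo, ym in zip(y_obs, y_model)) / n)

# Made-up numbers, purely for illustration.
y_obs = [1.0, 2.1, 2.9, 4.2]
y_err = [0.1, 0.2, 0.1, 0.2]
model_a = [1.1, 2.0, 3.0, 4.0]
model_b = [1.5, 2.5, 3.5, 4.5]

chi2_a = chi_square(y_obs, y_err, model_a)
chi2_b = chi_square(y_obs, y_err, model_b)
print("chi-square:", chi2_a, chi2_b)
print("reduced chi-square:", reduced_chi_square(y_obs, y_err, model_a),
      reduced_chi_square(y_obs, y_err, model_b))
print("RMSE:", rmse(y_obs, model_a), rmse(y_obs, model_b))
```

Unlike RMSE, the chi-square variants weight each point by its error, so a poorly measured point counts for less.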
 

This is called "model comparison". I generally use the BIC or the closely related AIC for model comparison:

https://en.wikipedia.org/wiki/Bayesian_information_criterion
https://en.wikipedia.org/wiki/Akaike_information_criterion
Note, you should not use the same data for model selection as for model testing. With only 10 data points you have far too few to do either job reliably. Your result is likely to be highly dependent on the sample noise and not very robust at all. I would recommend acquiring much more data, 40 at a bare minimum so that you can have 20 points for model selection and 20 points for model testing, but ideally substantially more than that.
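
The separation between selection data and testing data described above could be sketched as follows. This is an illustrative example only: the helper name `split_selection_test`, the even split, and the fixed seed are assumptions made here, not a prescribed procedure.

```python
import random

def split_selection_test(n_points, seed=0):
    """Return two disjoint, sorted index lists: one for selection, one for testing."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    half = n_points // 2
    return sorted(indices[:half]), sorted(indices[half:])

# With 40 points this gives 20 for model selection and 20 for model testing.
selection_idx, test_idx = split_selection_test(40)
print("selection:", selection_idx)
print("testing:  ", test_idx)
```

The models would then be compared using only the points in `selection_idx`, and the chosen model checked against the points in `test_idx`.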
 
It sounds like you want a general method to use when you have no idea about the theoretical model or statistical distributions that apply. In that situation, I do not think that you should concern yourself with a statistical result. A spline curve through the data points would be as valid as anything.
 
Dale said:
This is called "model comparison". I generally use the BIC or the closely related AIC for model comparison:

https://en.wikipedia.org/wiki/Bayesian_information_criterion
https://en.wikipedia.org/wiki/Akaike_information_criterion
Note, you should not use the same data for model selection as for model testing. With only 10 data points you have far too few to do either job reliably. Your result is likely to be highly dependent on the sample noise and not very robust at all. I would recommend acquiring much more data, 40 at a bare minimum so that you can have 20 points for model selection and 20 points for model testing, but ideally substantially more than that.

Thank you so much for your reply, Dale. The file was just a sample to show the data and their distribution. I have looked into BIC and AIC, yet I am more confused now since I have no experience in calculating them. It would be a blessing if you could show me a way forward or an example. Thank you so much for your guidance in advance.
Best regards.
 
FactChecker said:
It sounds like you want a general method to use when you have no idea about the theoretical model or statistical distributions that apply. In that situation, I do not think that you should concern yourself with a statistical result. A spline curve through the data points would be as valid as anything.

Dear FactChecker,

Thanks for your reply. You're right, actually. I normally plot a spline curve through the calculated results to compare them visually with the experimental data. However, I would also like a numerical comparison for a better understanding. This is why I asked about these methods. Thank you for your contribution, reply and time.
Best regards.
 
parazit said:
I have looked into BIC and AIC, yet I am more confused now since I have no experience in calculating them. It would be a blessing if you could show me a way forward or an example.
Most good software packages will have them already implemented and you just have to call them. I would not recommend implementing them by hand!

I use R for my statistics. It is free and very powerful, and RStudio ( https://www.rstudio.com/ ) is a very nice development environment for R. Installing R and RStudio will probably take less time than programming your own AIC or BIC. Here is the page for AIC and BIC using R, including an example at the bottom:

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/AIC.html
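
To show what these criteria compute, here is a rough sketch in Python rather than R. It assumes independent Gaussian measurement errors with known standard deviations, so that -2 ln L equals chi-square plus a constant; the parameter counts `k_a` and `k_b` and the data are made up for illustration. R's `AIC()` and `BIC()` work on fitted model objects instead, so this is not a reproduction of them, and a well-tested package remains preferable for real work.

```python
import math

def gaussian_log_likelihood(y_obs, y_err, y_model):
    """Log-likelihood assuming independent Gaussian errors with known sigma."""
    ll = 0.0
    for yo, e, ym in zip(y_obs, y_err, y_model):
        ll += -0.5 * ((yo - ym) / e) ** 2 - 0.5 * math.log(2 * math.pi * e ** 2)
    return ll

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_lik

# Made-up numbers, purely for illustration.
y_obs = [1.0, 2.1, 2.9, 4.2]
y_err = [0.1, 0.2, 0.1, 0.2]
model_a = [1.1, 2.0, 3.0, 4.0]   # hypothetical model with 2 fitted parameters
model_b = [1.5, 2.5, 3.5, 4.5]   # hypothetical model with 1 fitted parameter
k_a, k_b = 2, 1
n = len(y_obs)

ll_a = gaussian_log_likelihood(y_obs, y_err, model_a)
ll_b = gaussian_log_likelihood(y_obs, y_err, model_b)
aic_a, aic_b = aic(ll_a, k_a), aic(ll_b, k_b)
bic_a, bic_b = bic(ll_a, k_a, n), bic(ll_b, k_b, n)
print("AIC:", aic_a, aic_b)
print("BIC:", bic_a, bic_b)
```

For both criteria the lower value indicates the preferred model, and only differences between models are meaningful. BIC penalizes extra parameters more heavily than AIC once n exceeds about 7.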
 
