Statistical significance of a ML model...

In summary, classical models such as linear regression or logistic regression come with their own significance tests (t-tests, F-tests). For other models such as decision trees, SVMs, or neural nets, the thread points to uncertainty quantification (UQ), a subfield under active development, and to testing the model's predictions on held-out data. Setting aside part of the input data for testing, and keeping it out of training, avoids an optimistically biased result.
  • #1
fog37
TL;DR Summary
Determining if a ML model is statistically significant...
Hello,

How do we check if a ML model is statistically significant? For models like linear regression, logistic regression, etc. there are tests (t-tests, F-tests, etc.) that will tell us if the model, trained on some dataset, is statistically significant or not.

But in the case of ML models, like decision trees, SVM, or neural nets, how do we determine if the model is statistically significant? I have not seen any specific test to do that...

Thank you!
 
  • #2
There is a whole subfield on this called UQ - uncertainty quantification. It is an area of active development.
 
  • #3
fog37 said:
TL;DR Summary: Determining if a ML model is statistically significant...

But in the case of ML models, like decision trees, SVM, or neural nets, how do we determine if the model is statistically significant? I have not seen any specific test to do that...
The t test will work with any predictive model. You're supposed to set aside a part of the input data, keep it out of training, and use it for testing later (because predicting your training data with a ML model is cheating). For a yes/no model, you can score a 1 for correct and a 0 for wrong, and then compare the model's scores against other ways to predict the outcomes (or against random guessing).
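A minimal sketch of the scoring idea above, under some loud assumptions: the labels and predictions are made-up data, the helper name `binomial_p_value` is hypothetical, and "random guessing" is taken to mean a 50% hit rate. Because the scores are 0/1, the sketch swaps in an exact one-sided binomial test in place of the t test mentioned in the post; it answers the same question (is the held-out accuracy better than guessing?) using only the standard library.

```python
from math import comb

def binomial_p_value(n_correct: int, n_total: int, chance: float = 0.5) -> float:
    """One-sided p-value: P(X >= n_correct) for X ~ Binomial(n_total, chance)."""
    return sum(
        comb(n_total, k) * chance**k * (1 - chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )

# Hypothetical held-out labels and model predictions (illustrative data only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]

# Score 1 for a correct prediction and 0 for a wrong one.
scores = [int(t == p) for t, p in zip(y_true, y_pred)]
n_correct, n_total = sum(scores), len(scores)

# Assumed baseline: random guessing is right half the time.
p_value = binomial_p_value(n_correct, n_total, chance=0.5)
print(f"accuracy = {n_correct}/{n_total}, one-sided p-value = {p_value:.6f}")
```

If the classes are imbalanced, 50% is too easy a baseline; passing the majority-class frequency as `chance` is probably a fairer comparison.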
 

What is statistical significance?

A result is statistically significant when it would be unlikely to occur if only chance were at work, that is, if the null hypothesis were true. It is used to judge whether an observed effect reflects more than sampling noise and can reasonably be expected to hold in the larger population.

Why is it important to assess the statistical significance of a ML model?

Assessing the statistical significance of a ML model is important because it helps us judge whether the model's measured performance could plausibly be explained by chance, or whether the model has picked up a real relationship between the variables being studied. This informs how far we can trust the model to make accurate predictions on new data.

How is statistical significance calculated?

Statistical significance is typically assessed using a p-value, which is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis (no real effect) is true. A p-value of less than 0.05 is conventionally considered statistically significant. Note that the p-value is not the probability that the results are due to chance; it only says how surprising the results would be if chance alone were at work.
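One way to make the p-value concrete for an arbitrary ML model is a label permutation test, sketched below. This is an illustration under stated assumptions, not a prescribed method: the data are made up, and `accuracy` and `permutation_p_value` are hypothetical helper names. Shuffling the held-out labels simulates a world where predictions and labels are unrelated, and the p-value is estimated as the share of shuffles that score at least as well as the real labels.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_p_value(y_true, y_pred, n_permutations=2000, seed=0):
    """Estimate P(accuracy >= observed) when labels are unrelated to predictions."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between labels and predictions
        if accuracy(shuffled, y_pred) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so the estimate is never 0.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical held-out labels and model predictions (illustrative data only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]

observed = accuracy(y_true, y_pred)
perm_p = permutation_p_value(y_true, y_pred)
print(f"observed accuracy = {observed:.2f}, permutation p-value = {perm_p:.4f}")
```

The appeal of this approach is that it makes no assumption about the model type, so it applies equally to a decision tree, an SVM, or a neural net.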

What factors can affect the statistical significance of a ML model?

The size of the dataset (in particular the held-out test set), the quality of the data, and the choice of statistical test can all affect the statistical significance of a ML model. The strength of the relationship between the variables being studied matters as well: a stronger effect is easier to distinguish from chance.

Can a ML model be statistically significant but not useful?

Yes, a ML model can be statistically significant but not useful. This can happen when the model captures a relationship that is real but too small, or too specific to the dataset, to matter in real-world situations; with enough data, even a tiny effect can reach statistical significance. It is important to assess not only statistical significance, but also the practical significance and usefulness of a ML model.
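A rough numerical illustration of that gap, using invented numbers: a yes/no classifier that is right 51% of the time, evaluated on hypothetical test sets of 100 and 100,000 examples. The helper `approx_p_value` is a hypothetical name, and it uses a normal approximation to the binomial against an assumed 50% chance baseline.

```python
from math import sqrt
from statistics import NormalDist

def approx_p_value(accuracy: float, n_test: int, chance: float = 0.5) -> float:
    """One-sided p-value for accuracy > chance, via a normal approximation."""
    standard_error = sqrt(chance * (1 - chance) / n_test)
    z = (accuracy - chance) / standard_error
    return 1 - NormalDist().cdf(z)

# Same hypothetical accuracy, two hypothetical test-set sizes.
p_small = approx_p_value(0.51, 100)
p_large = approx_p_value(0.51, 100_000)

print(f"n = 100:     p = {p_small:.3f}")
print(f"n = 100,000: p = {p_large:.2e}")
```

On the small test set the one-point edge is indistinguishable from guessing, while on the large one it is overwhelmingly significant. Whether a one-point edge over a coin flip is worth deploying is a separate, practical question.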
