Over-searching error (statistics)

  • Context: Graduate
  • Thread starter: 0rthodontist
  • Tags: Error, Statistics
SUMMARY

The discussion focuses on the concept of "over-searching" error in machine learning statistics. When models from a search space S are scored against data D, searching the entire space (choosing K = S) can lead to overfitting and poor performance on unseen data. The key takeaway is that while a larger K yields a better fit to D, the score of the best-fitting model becomes more optimistically biased as K grows, which harms generalization. Understanding the balance between the breadth of the search and generalization is crucial for effective model selection.

PREREQUISITES
  • Understanding of machine learning model evaluation
  • Familiarity with concepts of overfitting and generalization
  • Knowledge of search space and model selection techniques
  • Basic statistics related to bias and variance
NEXT STEPS
  • Research "Bias-Variance Tradeoff in Machine Learning"
  • Learn about "Cross-Validation Techniques for Model Evaluation"
  • Explore "Regularization Methods to Prevent Overfitting"
  • Study "Model Selection Criteria: AIC and BIC"
USEFUL FOR

Data scientists, machine learning practitioners, and statisticians interested in improving model selection processes and understanding the implications of overfitting in predictive modeling.

0rthodontist
Science Advisor
I guess this should properly go in the Programming forum, but I think I might get a better response here.

My question is about statistics (in the context of machine learning), specifically the "over-searching" error. You have a search space S of all possible models, from which you choose a subset K. Then you have some test data D, and you evaluate how well each model in K fits D. You pick the best model in K and use that as your model.

Over-searching says that it is bad to do an exhaustive search, where K = S. Although the model you end up with fits D better than the model you get when K is much smaller than S, for some reason the model chosen when K = S does not work as well when it is tested against new data that is not in D.

I didn't quite catch the reason for this and I still do not understand. I wrote down, "two or more search spaces contain different numbers of models. The maximum scores in each space are biased to different degrees." I understand this but I don't see its relevance to over-searching.
 
Difficult to answer without measures to quantify things. The general idea could be: the bigger K is, the better the fit, but also the smaller the tolerances. This means that when additional data are brought in to check against, sub-optimal solutions can still be acceptable solutions, whereas an optimum is likely to be destroyed.
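
To connect this with the note in the opening post about maximum scores being biased: the best score found among K candidate models is an optimistic estimate of that model's true quality, and the optimism grows with K, because a larger search has more chances to stumble on a model that fits the noise in D. Below is a minimal sketch of the effect in Python. The setup is illustrative and not from the thread: every candidate model is a purely random labelling rule with no real predictive power, yet the best of a large K still looks good on D while doing no better than chance on fresh data.

```python
# Toy illustration (hypothetical setup): K candidate "models" that predict
# labels purely at random. Picking the one with the best score on D shows
# how the maximum of K noisy scores is optimistically biased, and more so
# as K grows, while performance on new data stays at chance level.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test = 50, 10_000           # size of D and of the fresh data
y_train = rng.integers(0, 2, n_train)  # true labels in D (coin flips)
y_test = rng.integers(0, 2, n_test)    # true labels in unseen data

for K in (1, 10, 100, 10_000):         # number of models searched
    # Each candidate predicts every label at random, independently of the data.
    preds_train = rng.integers(0, 2, (K, n_train))
    train_scores = (preds_train == y_train).mean(axis=1)
    best_score_on_D = train_scores.max()   # fit of the winning model on D

    # The winner is still just a coin flip; since the models are independent
    # of the test data, fresh random predictions stand in for its test output.
    test_score = (rng.integers(0, 2, n_test) == y_test).mean()
    print(f"K={K:6d}  best fit on D: {best_score_on_D:.2f}  "
          f"on new data: {test_score:.2f}")
```

With these numbers the best training accuracy climbs well above 0.5 as K grows (roughly 0.6 for K = 10 and above 0.7 for K = 10,000 on only 50 points), while accuracy on the new data hovers around 0.5 for every K. The same selection bias is present, in milder form, when the candidate models do have genuine predictive power.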
 
