Over-searching error (statistics)

I guess this should properly go in the Programming forum, but I think I might get a better response here.

My question is with respect to statistics (context of machine learning) about "over-searching" error. You have a search space S of all possible models, from which you choose a subset K. Then you have some test data D, and you evaluate how well each model in K fits D. You pick the best model in K and use that as your model.

Over-searching says that it is bad to do an exhaustive search, where K = S. Although the model you end up with then fits D better than the model you end up with when K is much smaller than S, for some reason the K = S model does not perform as well when tested against new data that is not in D.

I didn't quite catch the reason for this and I still do not understand. I wrote down, "two or more search spaces contain different numbers of models. The maximum scores in each space are biased to different degrees." I understand this but I don't see its relevance to over-searching.
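The note about maximum scores being "biased to different degrees" can be illustrated with a toy simulation. Assume (a big simplification, purely for illustration) that every model in the search space has the same true quality, and its measured score on D is that quality plus independent Gaussian noise. Then the best observed score is just the maximum of k noise draws, and its expected value grows with k even though no model is actually better:

```python
import random

random.seed(0)

def best_observed_score(k, trials=2000):
    """Average best score over k candidate models, where every model's
    true quality is 0 and its score on D is pure N(0, 1) noise.
    The maximum is upward-biased, and the bias grows with k
    (roughly like sqrt(2 ln k) for Gaussian noise)."""
    total = 0.0
    for _ in range(trials):
        total += max(random.gauss(0.0, 1.0) for _ in range(k))
    return total / trials

for k in [1, 10, 100, 1000]:
    print(k, round(best_observed_score(k), 2))
```

So a bigger search space K reports a better fit to D for a reason that has nothing to do with finding a genuinely better model, which is why the maximum scores of differently sized spaces are not comparable.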


Difficult to answer without quantitative measures. The general idea could be: the bigger K, the better the fit, but also the smaller the tolerances. This means that when checked against additional data, suboptimal solutions can still be solutions, whereas an optimum is likely to be destroyed.
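The point that an optimum is "likely to be destroyed" by new data can be sketched with the same kind of toy model (again assuming, purely for illustration, that all candidates have equal true quality and scores are independent Gaussian noise): the winner on D was selected for favourable noise, so on fresh data D' its score reverts to average, and the drop grows with the size of the search:

```python
import random

random.seed(1)

def selection_gap(k, trials=2000):
    """Select the best of k models by their (pure-noise) scores on D,
    then re-score the winner on independent new data D'.
    Return the average drop from selection score to fresh score;
    it grows with k because the winner was chosen for lucky noise."""
    gap = 0.0
    for _ in range(trials):
        scores_on_D = [random.gauss(0.0, 1.0) for _ in range(k)]
        best_on_D = max(scores_on_D)
        score_on_new = random.gauss(0.0, 1.0)  # noise on D' is independent of D
        gap += best_on_D - score_on_new
    return gap / trials

for k in [1, 10, 100]:
    print(k, round(selection_gap(k), 2))
```

With k = 1 there is no selection, so there is no systematic drop; the larger the searched space, the larger the gap between apparent and actual performance.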
