I guess this should properly go in the Programming forum, but I think I might get a better response here.

My question is about "over-searching" error in statistics, in the context of machine learning. You have a search space S of all possible models, from which you choose a subset K. You then have some test data D, and you evaluate how well each model in K fits D. You pick the best-fitting model in K and use it as your model.

Over-searching says that it is bad to search exhaustively, i.e. to take K = S. The model you end up with when K = S fits D better than the one you get when K is much smaller than S, yet for some reason it does not work as well when it is tested against new data that is not in D.
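To make the setup concrete, here is a minimal simulation sketch. It rests on assumptions that are not part of the question: every model is a pure guesser with no real skill (each prediction is right with probability 0.5), the models are independent, and the names `score_on`, `best_of` and `simulate` are made up for illustration. Under those assumptions, the winner's score on D rises as K grows, while its accuracy on new data stays at chance. The sketch only shows that the score on D becomes more inflated as more models are searched; it does not by itself show that the K = S winner is worse on new data.

```python
import random


def score_on(n_points, rng):
    """Accuracy of a skill-less model on n_points examples.

    Assumption: each prediction is correct with probability 0.5,
    independently of everything else.
    """
    return sum(rng.random() < 0.5 for _ in range(n_points)) / n_points


def best_of(k, n_points, rng):
    """Score on D of the best-scoring model among k skill-less models."""
    return max(score_on(n_points, rng) for _ in range(k))


def simulate(k, n_points=50, n_fresh=2000, trials=100, seed=0):
    """Average (score on D, score on new data) of the selected model."""
    rng = random.Random(seed)
    on_d = 0.0
    on_fresh = 0.0
    for _ in range(trials):
        on_d += best_of(k, n_points, rng)
        # The winner has no real skill, so its accuracy on new data
        # is independent of how well it happened to score on D.
        on_fresh += score_on(n_fresh, rng)
    return on_d / trials, on_fresh / trials


if __name__ == "__main__":
    for k in (10, 500):  # a small subset K versus a much larger search
        d_score, fresh_score = simulate(k)
        print(f"K = {k}: score on D = {d_score:.3f}, on new data = {fresh_score:.3f}")
```

The gap between the two printed columns is the optimism introduced by picking the maximum, and it widens with the number of models searched.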

I didn't quite catch the reason for this, and I still do not understand it. In my notes I wrote, "Two or more search spaces contain different numbers of models. The maximum scores in each space are biased to different degrees." I understand this statement, but I don't see its relevance to over-searching.
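The quoted note can be checked numerically. The sketch below is only an illustration under assumptions I am adding: all models in a space are independent and share the same true accuracy `p`, so each score on D is a Binomial(n, p) count divided by n. The function names `expected_best_score` and `selection_bias` are hypothetical. It computes the expected maximum score exactly and subtracts `p`, which gives the bias of the maximum for a space of a given size.

```python
from math import comb


def expected_best_score(num_models, n_points, p=0.5):
    """Expected maximum accuracy on D over num_models independent models.

    Assumption: each model's number of correct answers on D is
    Binomial(n_points, p), so no model is truly better than another.
    """
    # Binomial CDF F(j) = P(correct count <= j), for j = 0..n_points.
    cdf = []
    running = 0.0
    for j in range(n_points + 1):
        running += comb(n_points, j) * p**j * (1 - p) ** (n_points - j)
        cdf.append(min(running, 1.0))
    # For a count M in 0..n: E[M] = sum_{j=0}^{n-1} P(M > j),
    # and P(max <= j) = F(j) ** num_models for independent models.
    expected_count = sum(1.0 - cdf[j] ** num_models for j in range(n_points))
    return expected_count / n_points


def selection_bias(num_models, n_points, p=0.5):
    """How far the expected maximum score sits above the true accuracy p."""
    return expected_best_score(num_models, n_points, p) - p


if __name__ == "__main__":
    for size in (1, 10, 100, 1000):
        print(size, round(selection_bias(size, n_points=50), 3))
```

With a single model the bias is zero, and it grows as the space gets larger, so the best scores found in two spaces of different sizes are not directly comparable.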

**Physics Forums | Science Articles, Homework Help, Discussion**

# Over-searching error (statistics)
