Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Over-searching error (statistics)

  1. Nov 21, 2006 #1


    User Avatar
    Science Advisor

    I guess this should properly go in the Programming forum, but I think I might get a better response here.

    My question is with respect to statistics (context of machine learning) about "over-searching" error. You have a search space S of all possible models, from which you choose a subset K. Then you have some test data D, and you evaluate how well each model in K fits D. You pick the best model in K and use that as your model.

    Over-searching says that it is bad to do exhaustive sampling, where K = S. Though the model you end up with fits D better than the model you end up with when K is much smaller than S, for some reason the model when K = S does not work as well when it's tested against new data that's not in D.

    I didn't quite catch the reason for this and I still do not understand. I wrote down, "two or more search spaces contain different numbers of models. The maximum scores in each space are biased to different degrees." I understand this but I don't see its relevance to over-searching.
  2. jcsd
Share this great discussion with others via Reddit, Google+, Twitter, or Facebook

Can you offer guidance or do you also need help?
Draft saved Draft deleted