I Linear regression on data collection error

  1. Nov 2, 2016 #1

    I've collected several sets of data, and the quality of the linear regression fit (R^2) is significantly different for 2 particular sets.
    Does that indicate that those 2 sets are invalid, possibly because of a data collection error?

    For example, 20 sets of data have R^2 values above 0.900 (0.994, 0.983, 0.932, ...), while the other 2 sets have R^2 values of 0.720 and 0.810 respectively.
  3. Nov 2, 2016 #2
    It depends on the uncertainty in R, which in turn depends on the size of each dataset and the distribution of the data.
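
    To make the dependence on dataset size concrete, here is a rough sketch that puts an approximate 95% confidence interval on R^2 using the Fisher z-transformation. It assumes a simple (one-predictor) linear regression, so that R^2 is the square of the correlation coefficient, and roughly bivariate-normal data. The thread does not say how many points each set contains, so the values of n below are purely hypothetical.

```python
import math

def r2_confidence_interval(r2, n, z_crit=1.96):
    """Approximate CI for R^2 of a simple linear regression.

    Uses the Fisher z-transformation of r = sqrt(R^2); n is the number
    of (x, y) points in the dataset and must be greater than 3.
    """
    r = math.sqrt(r2)
    z = math.atanh(r)                 # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    r_lo = math.tanh(z - z_crit * se)
    r_hi = math.tanh(z + z_crit * se)
    # R^2 cannot go below zero, so clip a negative lower bound on r.
    return max(r_lo, 0.0) ** 2, r_hi ** 2

# n = 10 and n = 100 are hypothetical dataset sizes, not from the thread.
small_lo, small_hi = r2_confidence_interval(0.72, n=10)
large_lo, large_hi = r2_confidence_interval(0.72, n=100)
print(f"n=10:  R^2 in [{small_lo:.2f}, {small_hi:.2f}]")
print(f"n=100: R^2 in [{large_lo:.2f}, {large_hi:.2f}]")
```

    With only a handful of points per set, the interval around 0.72 is wide enough to overlap values above 0.9, so the difference may not mean much; with many points per set, it is much narrower.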
  4. Nov 2, 2016 #3
Are the regression equations themselves significantly different, or is it just that R^2 is smaller? If 2 out of 20 are weak results, you should only be suspicious if their estimates are very different. The unusually low R^2 might mean that those sets have some outliers. You may want to look at the data and see if some points look unreasonable. If there are outliers pulling the regression equation out of line with the others, I would see what happens if the outliers are thrown out.
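
As a minimal sketch of that check, the snippet below fits a line, flags points whose residuals are far from the rest, and refits without them so the two regression equations can be compared. The data are made up for illustration (a line with slope 2 plus small noise and one bad point), and the cutoff of 3 robust standard deviations is an arbitrary assumption, not a rule from this thread.

```python
from statistics import mean, median

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept, R^2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)

def flag_outliers(x, y, k=3.0):
    """Indices of points whose residual is more than k robust SDs from the median residual."""
    slope, intercept, _ = fit_line(x, y)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    med = median(resid)
    # Median absolute deviation, scaled to match the SD for normal data.
    scale = 1.4826 * median([abs(r - med) for r in resid])
    if scale == 0:
        return []
    return [i for i, r in enumerate(resid) if abs(r - med) > k * scale]

# Made-up example data: y = 2x + 1 plus small noise, with one bad reading at index 7.
x = list(range(10))
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, 0.0, -0.05, 0.1]
y = [2 * xi + 1 + e for xi, e in zip(x, noise)]
y[7] += 8.0

slope_all, intercept_all, r2_all = fit_line(x, y)
flagged = flag_outliers(x, y)
x_clean = [xi for i, xi in enumerate(x) if i not in flagged]
y_clean = [yi for i, yi in enumerate(y) if i not in flagged]
slope_clean, intercept_clean, r2_clean = fit_line(x_clean, y_clean)

print("flagged indices:", flagged)
print(f"all points:       slope={slope_all:.3f}, R^2={r2_all:.3f}")
print(f"outliers removed: slope={slope_clean:.3f}, R^2={r2_clean:.3f}")
```

If the slope and intercept after removing the flagged points fall back in line with the other datasets, the low R^2 was probably caused by a few bad points rather than by the whole set being invalid. Any point dropped this way should still be examined to see why it is off.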