Linear regression on data collection error

SUMMARY

The discussion focuses on the implications of varying R² values in linear regression analysis, specifically addressing the potential for data collection errors. Two data sets yielded significantly lower R² values of 0.720 and 0.810 compared to others above 0.900, raising concerns about their validity. The conversation emphasizes the importance of examining outliers and the distribution of data points, suggesting that the presence of outliers may distort regression results. It concludes that if the regression equations differ significantly, further investigation into the data quality is warranted.

PREREQUISITES
  • Understanding of linear regression analysis and R² values
  • Familiarity with statistical concepts such as outliers and data distribution
  • Experience with data validation techniques
  • Knowledge of data collection methods and their potential errors
NEXT STEPS
  • Investigate methods for identifying and handling outliers in datasets
  • Learn about data validation techniques to ensure data integrity
  • Explore the impact of sample size on R² values in linear regression
  • Study the use of statistical software like R or Python for regression analysis
USEFUL FOR

Data analysts, statisticians, researchers, and anyone involved in data collection and analysis who seeks to understand the reliability of their regression results.

Travis T
Hi,

I've collected a few sets of data, and two of them give a noticeably different linear regression fit (R^2) from the rest.
Does that indicate that those two sets are not valid, possibly because of a data collection error?

For example, 20 sets of data have R^2 values above 0.900 (0.994, 0.983, 0.932...), while the other 2 sets have R^2 values of 0.720 and 0.810 respectively.
 
It depends on the uncertainty in R, which in turn depends on the size of each dataset and the distribution of the data.
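To get a feel for how much R² can wobble with sample size, here is a rough sketch using the Fisher z-transform to put an approximate 95% interval on a correlation. It assumes simple linear regression with one predictor (so R² is the square of the Pearson r) and roughly bivariate-normal data. The function names and the sample sizes of 10 and 100 are made up for illustration, since the thread does not say how many points each set has.

```python
import math

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate CI for a Pearson correlation r from n points (Fisher z).

    z_crit=1.96 corresponds to a roughly 95% interval.
    """
    if n <= 3:
        raise ValueError("need more than 3 points")
    z = math.atanh(r)                 # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def r2_interval(r2, n):
    """Rough interval on R^2 for a one-predictor fit, via the interval on |r|."""
    lo, hi = r_confidence_interval(math.sqrt(r2), n)
    return max(lo, 0.0) ** 2, hi ** 2  # clamp: the interval on r may cross zero

# Hypothetical sample sizes; the real ones are not given in the thread.
for n in (10, 100):
    lo, hi = r2_interval(0.720, n)
    print(f"n={n}: R^2 = 0.720, approx 95% interval ({lo:.3f}, {hi:.3f})")
```

With only a handful of points per set the interval is wide, so an R² of 0.72 next to a bunch of 0.9+ values would not be alarming on its own. With many points per set the interval tightens and the gap means more.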
 
Are the regression equations significantly different, or is it just a smaller R²? If 2 out of 20 are weak results, you should only be suspicious if their estimates (slope and intercept) are very different from the others. The unusually low R² might mean that those sets have some outliers. You may want to look at the data and see if some points look unreasonable. If there are outliers pulling the regression equation out of line with the others, I would see what happens if the outliers are thrown out.
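The check described above could be sketched like this: fit the line, flag points whose residuals are far from the rest, then refit without them and compare slope, intercept and R². This is only one way to do it. The median-absolute-deviation rule, the 2.5 threshold, the helper names and the data are all assumptions for illustration, not anything from the original poster's measurements.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = slope*x + intercept; returns (slope, intercept, R^2)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy * sxy / (sxx * syy)

def flag_outliers(xs, ys, threshold=2.5):
    """Indices whose residual is more than `threshold` robust SDs from the median residual."""
    slope, intercept, _ = fit_line(xs, ys)
    res = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    med = statistics.median(res)
    # 1.4826 * MAD estimates the standard deviation for normal residuals
    scale = 1.4826 * statistics.median(abs(r - med) for r in res)
    if scale == 0:
        return []
    return [i for i, r in enumerate(res) if abs(r - med) > threshold * scale]

# Made-up data: roughly y = 2x + 1, with one bad reading at x = 7.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.9, 25.0, 17.1, 18.9]

slope_all, intercept_all, r2_all = fit_line(xs, ys)
bad = flag_outliers(xs, ys)
keep = [i for i in range(len(xs)) if i not in bad]
slope_clean, intercept_clean, r2_clean = fit_line(
    [xs[i] for i in keep], [ys[i] for i in keep]
)

print("flagged indices:", bad)
print(f"all points:       slope={slope_all:.3f} intercept={intercept_all:.3f} R^2={r2_all:.3f}")
print(f"outliers dropped: slope={slope_clean:.3f} intercept={intercept_clean:.3f} R^2={r2_clean:.3f}")
```

If the refit slope and intercept land close to what the other 20 sets give, a few bad readings are the likely culprit. If they still disagree, the whole set deserves a closer look. Either way, it is worth checking why a point is off before discarding it.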
 
