Linear regression on data collection error

In summary, the conversation discusses the significance of obtaining noticeably different linear regression fit quality (R^2) in two particular sets of data. The difference in R^2 may indicate a data collection error, or it may simply reflect the uncertainty in R, which depends on the size and distribution of the datasets. The conversation also suggests checking for outliers and their impact on the regression equations.
  • #1
Travis T
Hi

I've collected several sets of data and obtained significantly different linear regression R^2 values in 2 particular sets.
Does that indicate the 2 sets are not valid, possibly due to data collection error?

For example, 20 sets have R^2 of 0.900+ (0.994, 0.983, 0.932...), while the 2 sets have R^2 of 0.720 and 0.810 respectively.
 
  • #2
It depends on the uncertainty on R, which depends on the size of the datasets and the distribution of the data.
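
As a rough illustration of how the uncertainty on R shrinks with dataset size, the sketch below uses the Fisher z-transformation to get an approximate 95% interval. It assumes simple (one-predictor) regression, so that R^2 is the square of the correlation coefficient, and roughly bivariate-normal data. The function name `r_squared_interval` and the sample sizes 10 and 200 are illustrative assumptions; the thread does not say how many points each dataset holds.

```python
import math


def r_squared_interval(r2, n, z_crit=1.96):
    """Approximate confidence interval for R^2 in simple linear regression.

    Uses the Fisher z-transformation of r = sqrt(R^2).
    z_crit=1.96 corresponds to a roughly 95% interval.
    """
    if n <= 3:
        raise ValueError("need more than 3 points for the Fisher approximation")
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must be in [0, 1)")

    r = math.sqrt(r2)
    z = math.atanh(r)               # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)     # approximate standard error of z

    lo_r = math.tanh(z - z_crit * se)
    hi_r = math.tanh(z + z_crit * se)
    lo_r = max(lo_r, 0.0)           # if the interval for r crosses 0, R^2 can be 0

    return lo_r ** 2, hi_r ** 2


# Hypothetical dataset sizes, for illustration only.
for n in (10, 200):
    lo, hi = r_squared_interval(0.72, n)
    print(f"n={n}: R^2 = 0.72, approx. 95% interval [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions, a set with only about 10 points has an interval around 0.72 that is wide enough to overlap the 0.9+ range of the other sets, while a set with a couple of hundred points does not.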
 
  • #3
Are the regression equations significantly different, or is it just a smaller R^2? If 2 out of 20 are weak results, you should only be suspicious if their estimates are very different. The unusually low R^2 might mean that those sets have some outliers. You may want to look at the data and see if some points look unreasonable. If there are outliers pulling the regression equation out of line with the others, I would see what happens if the outliers are thrown out.
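
One way to see what happens when a suspected outlier is thrown out is to fit the line twice and compare the equations. The sketch below is a minimal example in plain Python; the data are made up for illustration (roughly y = 2x + 1 with one bad point at index 7), and the helper names `fit_line` and `fit_without` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


def fit_without(xs, ys, drop_index):
    """Refit after removing the point at drop_index."""
    kept_x = [x for i, x in enumerate(xs) if i != drop_index]
    kept_y = [y for i, y in enumerate(ys) if i != drop_index]
    return fit_line(kept_x, kept_y)


# Made-up data: close to y = 2x + 1, except the point at index 7.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 25.0, 19.1, 20.9]

slope_all, intercept_all, r2_all = fit_line(xs, ys)
slope_trim, intercept_trim, r2_trim = fit_without(xs, ys, drop_index=7)

print(f"all points:      y = {slope_all:.3f} x + {intercept_all:.3f}, R^2 = {r2_all:.3f}")
print(f"outlier dropped: y = {slope_trim:.3f} x + {intercept_trim:.3f}, R^2 = {r2_trim:.3f}")
```

If the slope and intercept barely move when the point is dropped, the lower R^2 is mostly extra scatter; if they shift toward the equations from the other sets, the point was pulling the fit out of line. Dropping a point is only a diagnostic here, not a justification for discarding it.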
 

1. What is linear regression on data collection error?

Linear regression is a statistical technique used to analyze the relationship between a dependent variable and one or more independent variables. Errors in the data collection process can distort the fit, so checking for them is part of the analysis. A fit quality (R^2) or regression equation that differs markedly from comparable datasets can be a sign of such errors, although it can also arise from ordinary statistical uncertainty.

2. Why is it important to account for data collection error in linear regression?

Data collection error can significantly impact the results of a linear regression analysis. If left unaccounted for, it can lead to biased and inaccurate conclusions. By incorporating it into the analysis, the results can be more reliable and valid.

3. How is data collection error identified in linear regression?

Data collection error can be identified through various methods, such as visual inspection of the data, statistical tests, and residual analysis. These techniques help to detect outliers, missing data, and other types of errors that may affect the results.
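
A residual analysis of this kind can be sketched as follows: fit the line, compute the residuals, and flag points whose residual lies far from the median residual in units of the scaled median absolute deviation (MAD). This is one screening rule among several, not a method taken from the thread. The function name `flag_outliers`, the threshold of 3, and the data are illustrative assumptions; the factor 1.4826 makes the MAD comparable to a standard deviation for normally distributed residuals.

```python
import statistics


def flag_outliers(xs, ys, threshold=3.0):
    """Return indices of points with unusually large regression residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual = observed y minus fitted y.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

    # Robust centre and spread of the residuals.
    centre = statistics.median(residuals)
    mad = statistics.median(abs(r - centre) for r in residuals)
    scale = 1.4826 * mad
    if scale == 0:
        return []  # residuals are (almost) all identical; nothing to flag

    return [i for i, r in enumerate(residuals)
            if abs(r - centre) / scale > threshold]


# Made-up data: close to y = 2x + 1, except the point at index 7.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 25.0, 19.1, 20.9]

print("suspect indices:", flag_outliers(xs, ys))
```

Flagged points are candidates for a closer look at how they were collected, not automatic deletions.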

4. What are some common types of data collection error in linear regression?

Some common types of data collection error in linear regression include measurement error, sampling error, and non-response error. Measurement error occurs when there are inaccuracies in the measurement of the variables. Sampling error occurs when the sample used for analysis is not representative of the population. Non-response error occurs when there is a lack of response from a portion of the sample.

5. How can data collection error be minimized in linear regression?

Data collection error can be minimized by ensuring proper training and supervision of data collectors, using reliable and valid measurement tools, and carefully selecting a representative sample. Additionally, conducting pilot studies and performing thorough data cleaning and validation before analysis can also help to minimize data collection error.
