Linear regression and variance

In summary, the thread involves a poster seeking help with calculating the variance from a linear regression fit while also running into difficulty with LaTeX. They explain their working and ask for clarification on the equation, eventually work the statistics out themselves, and still ask for help with the LaTeX.
  • #1
matthyaouw
Gold Member
I'm having some trouble with this, and I was hoping someone could help me.
I have a data set from which I've determined the [tex]\widehat{a}[/tex] and [tex]\widehat{b}[/tex] values and worked out where the line of best fit should go using linear regression. The next thing I have to do is work out the variance using this equation:

[tex]\frac{\sum(y-\widehat{y})^2}{n-2}[/tex]
(edit) Sorry, first time using LaTeX, and I can't access the tutorials for some reason.
I've typed:
"\underline{\sum(y-\widehat{y})^2}
\\n-2"
but I'm not getting a new line after ^2}. How do I do this? (/edit)

I'm a bit unsure what to do here. Does that mean that I have to sum up all of my y values and subtract the expected y values predicted by my line of best fit that correspond to the actual values I've entered?
 
Last edited:
  • #2
Never mind, got it (I think). If someone could still tell me what I'm doing wrong with the LaTeX, I'd appreciate it though.
 
  • #3



I am happy to help you with this concept. Linear regression is a statistical method used to model the relationship between two or more variables. It is commonly used to predict an outcome based on one or more input variables. The goal of linear regression is to find the line of best fit that minimizes the sum of the squared differences between the actual data points and the predicted values on the line.

The [tex]\widehat{a}[/tex] and [tex]\widehat{b}[/tex] values you have determined represent the intercept and slope of the line of best fit, respectively. These values are calculated using the least squares method, which minimizes the sum of squared errors between the actual data points and the predicted values on the line.
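
For reference, the standard least-squares formulas (writing [tex]\bar{x}[/tex] and [tex]\bar{y}[/tex] for the sample means, notation not used earlier in the thread) are:

[tex]\widehat{b} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad \widehat{a} = \bar{y} - \widehat{b}\,\bar{x}, \qquad \widehat{y}_i = \widehat{a} + \widehat{b}\,x_i[/tex]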

Now, to calculate the variance, you need to account for the difference between each actual data point and the corresponding predicted value on the line. This is represented by the term [tex](y-\widehat{y})^2[/tex] in the equation you provided. The differences are squared so that positive and negative deviations both count, and so that larger deviations carry more weight.

To calculate the variance, sum up all the squared differences and divide by n - 2, where n is the number of data points. The quantity n - 2 is the number of degrees of freedom; it adjusts for the two parameters ([tex]\widehat{a}[/tex] and [tex]\widehat{b}[/tex]) that were estimated from the data.
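
As a rough sketch of the whole calculation in Python (the data values and variable names here are placeholders, not taken from your data set):

import numpy as np

# Placeholder data, not from the thread
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares slope (b_hat) and intercept (a_hat)
b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a_hat = y.mean() - b_hat * x.mean()

# Fitted values, residuals, and the residual variance with n - 2 degrees of freedom
y_hat = a_hat + b_hat * x
residuals = y - y_hat
n = len(y)
residual_variance = np.sum(residuals ** 2) / (n - 2)
print(residual_variance)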

I hope this explanation helps you understand the concept of variance in linear regression better. If you are still having trouble, I suggest seeking help from a statistician or a colleague who is familiar with this concept. Good luck with your analysis!
 

1. What is linear regression and how is it used in science?

Linear regression is a statistical method used to model the relationship between two variables. It is commonly used in science to analyze data and make predictions from that relationship. It involves fitting a straight line to a scatter plot of the data points, where the line represents the best fit for the data.
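
As a minimal sketch (the data values are placeholders), such a line can be fitted in Python with numpy.polyfit:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # placeholder data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# A degree-1 polynomial fit returns the slope and intercept of the best-fit line
slope, intercept = np.polyfit(x, y, 1)
y_pred = intercept + slope * x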

2. What is the purpose of calculating variance in linear regression?

Variance is a measure of how spread out values are around their mean. In linear regression, the residual variance measures how spread out the data points are around the fitted line: a lower value indicates a better fit, while a higher value indicates a poorer fit. This information can be used to evaluate the accuracy of the regression model and to decide whether adjustments are needed.

3. How do you interpret the coefficient of determination (R-squared) in linear regression?

The coefficient of determination, or R-squared, is a measure of how well the regression line fits the data. It represents the proportion of the variation in the dependent variable that is explained by the independent variable. A value of 1 indicates a perfect fit, while a value of 0 indicates that the model explains none of the variation in the data.
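
As a small illustration (the function and variable names are placeholders), R-squared can be computed from the fitted values as one minus the ratio of the residual sum of squares to the total sum of squares:

import numpy as np

def r_squared(y, y_hat):
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot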

4. What are the assumptions of linear regression?

Linear regression assumes that there is a linear relationship between the variables being studied, that the data points are independent of each other, and that the errors or residuals are normally distributed with constant variance. When there is more than one independent variable, it also assumes that there is no multicollinearity, meaning the independent variables are not highly correlated with each other.

5. How do you handle outliers in linear regression?

Outliers are data points that are significantly different from the rest of the data and can have a large impact on the regression line and its accuracy. They should be carefully examined to determine if they are valid data points or errors. If they are valid, they can be included in the analysis, but if they are errors, they should be removed from the data set. It is important to note that removing outliers can also affect the results, so it should be done with caution.
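
One common heuristic, offered here as an illustrative sketch rather than something stated above, is to flag points whose standardized residuals exceed a chosen threshold (often around 2 or 3) for closer inspection:

import numpy as np

def flag_outliers(y, y_hat, threshold=3.0):
    # Flag points whose standardized residual exceeds the threshold in absolute value
    residuals = y - y_hat
    return np.abs(residuals / residuals.std(ddof=1)) > threshold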
