Is the correlation coefficient significant in this data set?

Discussion Overview

The discussion revolves around the significance of the correlation coefficient in a dataset related to the height of buildings based on the number of stories. Participants explore the calculation of the least squares regression line, the identification of outliers, and the significance of the correlation coefficient, including its interpretation and implications.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning
  • Homework-related

Main Points Raised

  • One participant initially reported the least squares regression line as y-hat = 11.304 + 106.218x (later corrected to y-hat = 106.218 + 11.304x) and found a correlation coefficient of r = 0.913, which they concluded was not significant.
  • Another participant suggested plotting the data and regression line to identify outliers, recommending the removal of any outlier before recalculating the regression.
  • There was confusion regarding the variables, with participants questioning whether x represented the number of stories and y the height of the building, leading to a discussion about the regression equation.
  • Some participants noted that the regression equation seemed incorrect based on the relationship between stories and height, prompting a re-evaluation of the calculations.
  • One participant mentioned using a linear regression calculator that produced the same equation, but others challenged the accuracy of the calculator's results.
  • Participants discussed the calculation of the standard deviation of the residuals, with varying results and corrections being made along the way.
  • There was a specific mention of an outlier at about stories = 50, height = 1050, which some participants suggested removing to improve the regression's significance.
  • After removing the outlier, a new regression line was proposed, and a new standard deviation of the residuals was calculated.
  • One participant reiterated their earlier conclusion about the correlation coefficient being not significant, prompting further discussion about the interpretation of correlation values.

Areas of Agreement / Disagreement

Participants expressed differing views on the significance of the correlation coefficient, with some agreeing that it was not significant while others questioned this interpretation. The discussion around the regression equation and the identification of outliers also revealed a lack of consensus on the calculations and their implications.

Contextual Notes

There were multiple references to potential errors in calculations, particularly regarding the regression equation and the standard deviation of the residuals. Participants noted the importance of correctly identifying independent and dependent variables, as well as the need for careful calculations to avoid errors.

Who May Find This Useful

This discussion may be useful for students or individuals interested in statistics, regression analysis, and the interpretation of correlation coefficients in practical applications.

freshcoast
16legaq.jpg


I also made a graph which is not pictured.

1.) Calculate the least squares line. Put the equation in the form of: y-hat = a + bx.
I got: y hat = 11.304 + 106.218x

a.) Find correlation coefficient. Is it significant? (use the p-value to decide)
I got: r = 0.913... no it is not significant

b.) Are there any outliers in the data? If so which point(s)? Why is it an outlier? If there are any, recalculate the least squares line after removing the outlier(s).
--Got kinda lost here. Any help appreciated!
 
Plot your data and the regression line together. Is there a point that is much farther (vertically), from the line than the others? That would be an outlier. Remove that point from the data and redo the linear regression to see if it is significant without that point.
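
The procedure described above can be sketched roughly as follows. The data points here are made up for illustration (the thread's actual table is only available as an image), and the helper names `fit_line` and `residuals` are hypothetical. Note that the point with the largest vertical distance from the line is only a candidate outlier; whether to drop it remains a judgment call.

```python
# Hypothetical (stories, height in feet) pairs, NOT the data from the thread.
data = [(10, 210), (20, 320), (30, 430), (40, 540), (50, 1050), (60, 760)]

def fit_line(points):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def residuals(points, a, b):
    """Vertical distances e = y - y-hat for each point."""
    return [y - (a + b * x) for x, y in points]

# Fit all points, then find the point farthest (vertically) from the line.
a, b = fit_line(data)
res = residuals(data, a, b)
worst = max(range(len(data)), key=lambda i: abs(res[i]))
outlier = data[worst]

# Remove that candidate outlier and redo the regression.
trimmed = [p for i, p in enumerate(data) if i != worst]
a2, b2 = fit_line(trimmed)

print("candidate outlier:", outlier)
print("before: y-hat = %.3f + %.3fx" % (a, b))
print("after:  y-hat = %.3f + %.3fx" % (a2, b2))
```

In this made-up data set a single stray point pulls the slope well above the slope of the remaining points, which is why the refit is worth doing.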
 
It's not obvious what x is supposed to represent in the regression equation. Is x supposed to be the number of stories in a building and y-hat the height of the building?
 
SteamKing said:
It's not obvious what x is supposed to represent in the regression equation. Is x supposed to be the number of stories in a building and y-hat the height of the building?

Yes, sorry-- "stories" is the independent variable (x) and "height" is the dependent variable (y).
 
freshcoast said:
Yes, sorry-- "stories" is the independent variable (x) and "height" is the dependent variable (y).
Then you've got a problem with your regression equation. According to it, a 10-story building would be over 1000 feet high.
 
Ohhh, I think I just need to switch them around, making the regression equation:

y hat = 106.218 + 11.304x

Correct?
 
freshcoast said:
Ohhh, I think I just need to switch them around, making the regression equation:

y hat = 106.218 + 11.304x

Correct?
So now, a one-story building is over 100 feet high.

Nope, that's not going to do it. I think you did something fundamentally wrong in calculating a and b.

You did use x as the number of stories and y as the heights from your data table? Look carefully, because these data are listed in reverse order in the table.
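
The plug-in check used in the posts above can be sketched as below: substitute a number of stories into each candidate equation and see whether the predicted height is plausible. The function name `predict` is hypothetical; the coefficients are the two versions quoted in the thread. Keep in mind that the intercept is an extrapolation to zero stories, so it is a weaker check than the slope (feet per story).

```python
def predict(a, b, stories):
    """Predicted height from y-hat = a + b*x, with x = number of stories."""
    return a + b * stories

# First version posted: y-hat = 11.304 + 106.218x
original_10 = predict(11.304, 106.218, 10)

# Version with the coefficients switched: y-hat = 106.218 + 11.304x
swapped_10 = predict(106.218, 11.304, 10)

print("original equation, 10 stories:", round(original_10, 3))
print("swapped equation, 10 stories: ", round(swapped_10, 3))
```

The first equation implies more than 100 feet per story, while the second implies about 11 feet per story.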
 
I'm confused... I also entered it into a linear regression calculator and it gave me the same equation that I got.

11hf9sw.jpg
 
This calculator is giving you the wrong results. It can calculate the mean x and mean y and add up the number of data points, but the rest is incorrect.

You can make your own calculation using a spreadsheet, and some calculators have linear regression fits built in.
 
  • #10
freshcoast said:
Ohhh, I think I just need to switch them around, making the regression equation:

y hat = 106.218 + 11.304x

Correct?
I had some errors in my check calculations for a and b. This equation is correct (as is the calculator).

Sorry for the confusion.
 
  • #11
Ok, great! It was driving me crazy-- what a relief. :biggrin:

Moving on... I had another question for a different problem. By hand, calculate the standard deviation of the residuals.
21l1krm.jpg

For the least square line I got:
y = (284.5/114)x - 1.1
I know the residuals equation is e = y - y hat... but I'm not exactly sure where to start?
 
  • #12
Actually, I think I'm supposed to use the formula: SEE = √(∑e²/(n-p))
 
  • #13
freshcoast said:
Ok, great! It was driving me crazy-- what a relief. :biggrin:

Moving on... I had another question for a different problem. By hand, calculate the standard deviation of the residuals.

For the least square line I got:
y = (284.5/114)x - 1.1

I know the residuals equation is e = y - y hat... but I'm not exactly sure where to start?

For the x value of each data point, calculate y-hat according to the regression formula. The actual y value is taken from the graph. Then calculate the sum of the squares of the residuals. Divide this sum by (n-2). This is the variance. The standard deviation is the square root of the variance of the residuals.
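
As a minimal sketch of those steps, assuming a small made-up data set (not the one in the attached image) whose least squares line is y-hat = 2.2 + 0.6x, the calculation might look like this. The function name `residual_sd` is hypothetical.

```python
import math

def residual_sd(xs, ys, a, b):
    """Standard deviation of residuals about y-hat = a + b*x, dividing by n - 2."""
    # Residual for each point: e = y - y-hat
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Sum of squared residuals
    sse = sum(e ** 2 for e in residuals)
    # Variance uses n - 2 because two parameters (a and b) were estimated
    variance = sse / (len(xs) - 2)
    return math.sqrt(variance)

# Hypothetical data; its least squares line is y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s = residual_sd(xs, ys, a=2.2, b=0.6)
print(round(s, 3))
```

Squaring the residuals, not the y-hat values, is the step that is easy to get wrong by hand.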
 
  • #14
Ok, so after calculating all of that I got: s = 15.7

Correct?
 
  • #15
freshcoast said:
Ok, so after calculating all of that I got: s = 15.7

Correct?
Nope. This is way too big for s.
 
  • #16
I accidentally squared all the y hat variables instead.. Oops. Now I have s = 3.1
 
  • #17
freshcoast said:
I accidentally squared all the y hat variables instead.. Oops. Now I have s = 3.1

You're supposed to calculate the ∑e² and divide that by (n-2) and take the square root. Your s is still too big. Show your calculations, please.
 
  • #18
freshcoast said:
I'm confused... I also entered it into a linear regression calculator and it gave me the same equation that I got.

11hf9sw.jpg
You can see immediately from your plot that one point at about stories = 50, height = 1050 is an outlier. That is the first data point in your table. Did you try removing that and doing the regression again? I think it will help the significance of your regression a lot.
 
  • #19
SteamKing said:
You're supposed to calculate the ∑e² and divide that by (n-2) and take the square root. Your s is still too big. Show your calculations, please.
FactChecker said:
You can see immediately from your plot that one point at about stories = 50, height = 1050 is an outlier. That is the first data point in your table. Did you try removing that and doing the regression again? I think it will help the significance of your regression a lot.
Yes, I took out that outlier and got a new regression line of y hat = 43.19 + 11.59x
 
  • #20
SteamKing said:
You're supposed to calculate the ∑e² and divide that by (n-2) and take the square root. Your s is still too big. Show your calculations, please.
6gwkdk.jpg
 
  • #21
Check your work for x = 3.
 
  • #22
Ahh, my bad. Ok, so after fixing that, I got a new standard deviation of 1.65.
 
  • #23
freshcoast said:
Ahh, my bad. Ok, so after fixing that, I got a new standard deviation of 1.65.

Looks good.

In the future, it helps if you post only one problem or question per thread. That keeps everyone from getting answers to different questions mixed up.
 
  • #24
SteamKing said:
Looks good.

In the future, it helps if you post only one problem or question per thread. That keeps everyone from getting answers to different questions mixed up.
Noted! Sorry about that, got too excited. But I do have one last question regarding the first problem I posted. I was to find the correlation coefficient and tell if it was significant. I got r = 0.913, and said it was not significant. Was that part correct also?
 
  • #25
freshcoast said:
Noted! Sorry about that, got too excited. But I do have one last question regarding the first problem I posted. I was to find the correlation coefficient and tell if it was significant. I got r = 0.913, and said it was not significant. Was that part correct also?

A correlation coefficient of 1.0 indicates a perfect fit of the data. The closer you are to r = 1.00, the better fit you have.
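
Closeness to 1 by itself does not give the p-value the original problem asks for. One common way to get it is the t-test for a correlation coefficient, t = r√(n-2)/√(1-r²) with n-2 degrees of freedom. The sketch below uses only the Python standard library and integrates the t density numerically; the function names are hypothetical, and n = 10 is purely an assumption, since the number of data points is not stated in the thread.

```python
import math

def t_statistic(r, n):
    """Test statistic for H0: no linear correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the t density from 0 to |t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |t|
    return max(0.0, 1 - 2 * area)

r = 0.913   # value reported in the thread
n = 10      # ASSUMPTION: sample size is not given in the thread
t = t_statistic(r, n)
p = two_sided_p(t, n - 2)
print("t =", round(t, 2), " p =", p)
```

Under this assumed n the p-value comes out far below 0.05, which would make r = 0.913 significant rather than not significant; the conclusion for the real data depends on the actual number of points.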
 
