Regression line with zero slope and average as best prediction

  • #1
fog37
TL;DR Summary
Regression line with zero slope and average as best prediction
Hello,

I was considering some made-up data ##(X,Y)## and its best-fit regression line. The outcome variable ##Y## is the number of likes and ##X## is the number of comments on a website.

We have 100 data points that are spread in such a way that the best-fit line has zero slope. This implies that there is no linear relationship between the variables ##X## and ##Y##. It also means that the average of ##Y## would be the best prediction for ##Y## regardless of the value of ##X##: whatever ##X## is, the best prediction for ##Y## is the same constant value, the average.

My question: here we are talking about taking the arithmetic average of ALL the ##Y## values from all different ##X## values, correct?
What about the average of the ##Y## values for the same ##X## value (assuming there is more than just one ##Y## value for each ##X## value)? These two averages should always be numerically close, correct?

Thank you!
 
  • #2
With zero slope, the fit says ##Y## does not depend linearly on ##X##, so knowing ##X## would not let me infer anything about a particular ##Y##, unless there is a nonlinear relationship.
 
  • #3
fog37 said:
Hello,

I was considering some made-up data ##(X,Y)## and its best-fit regression line.
Be careful with "best" here. It is the best that can be done with solid statistical significance. If you "throw everything at the wall to see what sticks", you can often get very good fits to the data that have no statistical significance at all. You want to be able to convince people, even very skeptical ones, that every term in your model probably belongs there. A good linear regression application should only include terms that show a statistically significant reason to be included.
fog37 said:
The outcome variable ##Y## is the number of likes and ##X## is the number of comments on a website.

We have 100 data points that are spread in such a way that the best-fit line has zero slope. This implies that there is no linear relationship between the variables ##X## and ##Y##. It also means that the average of ##Y## would be the best prediction for ##Y## regardless of the value of ##X##: whatever ##X## is, the best prediction for ##Y## is the same constant value, the average.

My question: here we are talking about taking the arithmetic average of ALL the ##Y## values from all different ##X## values, correct?
Yes. They are all involved in the linear regression calculations.
fog37 said:
What about the average of the ##Y## values for the same ##X## value (assuming there is more than just one ##Y## value for each ##X## value)? These two averages should always be numerically close, correct?
No. That is too strong a statement. In the 100 data points that you collected for your sample, there might be values of ##X## where the ##Y## values happened to run high or low. In fact, with only 100 data points, if you collected data at 10 ##X## values, each ##X## has only about 10 ##Y## values on average, so you can expect the ##Y## average at some of those ##X## values to be farther from the overall average than at others.
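To get a feel for how far the per-##X## averages can stray, here is a minimal simulation sketch. Everything in it is an assumption made for illustration, not data from this thread: 10 distinct ##X## values with 10 ##Y## values each, and ##Y## drawn from a normal distribution (mean 50, standard deviation 10) that ignores ##X## entirely. The function name `simulate_group_means` is hypothetical.

```python
import random
import statistics


def simulate_group_means(n_x_values=10, n_per_x=10, mu=50.0, sigma=10.0, seed=0):
    """Draw Y independently of X; return (overall_mean, {x: mean of Y at that x})."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Assumed setup: every X value gets the same number of Y observations.
    groups = {
        x: [rng.gauss(mu, sigma) for _ in range(n_per_x)]
        for x in range(1, n_x_values + 1)
    }
    all_y = [y for ys in groups.values() for y in ys]
    overall_mean = statistics.fmean(all_y)
    group_means = {x: statistics.fmean(ys) for x, ys in groups.items()}
    return overall_mean, group_means


overall_mean, group_means = simulate_group_means()
for x, m in group_means.items():
    print(f"X={x:2d}  mean(Y | X)={m:6.2f}  deviation={m - overall_mean:+6.2f}")
print(f"overall mean(Y) = {overall_mean:.2f}")
```

Under these assumptions each per-##X## average rests on only 10 values, so its standard error is about ##10/\sqrt{10} \approx 3.2##, against ##10/\sqrt{100} = 1## for the overall average. Individual group averages wandering a few units from the overall one is therefore expected even though ##Y## has nothing to do with ##X##.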
 

1. What does it mean when a regression line has zero slope?

A fitted regression line with zero slope indicates that there is no linear association between the independent variable (X) and the dependent variable (Y) in the sample. The line is horizontal, so the predicted value of Y is the same for every X. It does not mean that Y itself is constant (the data still scatter around the line), and it does not rule out a nonlinear relationship.

2. Why might the average value be considered the best prediction in such cases?

When the regression line has zero slope, the best linear predictor of Y, regardless of the value of X, is simply the average of Y. This is because a linear term in X explains none of the variation in Y, so the fitted line reduces to a constant, and the constant that fits best in the least-squares sense is the mean of Y.

3. How is the regression line calculated when it has zero slope?

In ordinary least squares the slope is the sample covariance of X and Y divided by the sample variance of X, and the intercept is mean(Y) - slope * mean(X). When the slope is zero, the intercept is therefore just the mean of Y across all data points, and the line is Y = mean(Y): a horizontal line at the height of the average value of Y on the graph.
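A small sketch of that calculation, using the textbook least-squares formulas. The five data points are hypothetical and were picked so that the sample covariance comes out exactly zero; `ols_fit` is an illustrative helper, not a library function.

```python
import statistics


def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    # Sum of cross-deviations and sum of squared X deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data chosen so that the cross-deviations cancel exactly.
comments = [1, 2, 3, 4, 5]
likes = [3, 1, 2, 1, 3]

slope, intercept = ols_fit(comments, likes)
print("slope:", slope)
print("intercept:", intercept)
print("mean of Y:", statistics.fmean(likes))
```

With zero slope the intercept and the mean of Y coincide, so the fitted line is the horizontal line through the average.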

4. What are the implications of using the average as the best prediction for practical applications?

Using the average of Y as the prediction when the regression line has zero slope simplifies predictions in scenarios where X provides no linear information about Y. Among all constant predictions, the average is the one that minimizes the sum of squared prediction errors. However, a zero-slope fit also means that the model accounts for none of the variability in Y, which could be limiting in situations where other variables not included in the model might explain that variation.
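The least-squares property of the average can be checked numerically. The sketch below uses hypothetical values for Y and an illustrative helper `sse`; it scans a grid of constant predictions and also checks the identity SSE(c) = SSE(mean) + n * (c - mean)^2, which is why no constant can beat the mean.

```python
import statistics


def sse(constant, ys):
    """Sum of squared errors when every Y is predicted by the same constant."""
    return sum((y - constant) ** 2 for y in ys)


likes = [3, 1, 2, 1, 3]  # hypothetical Y values
y_bar = statistics.fmean(likes)

# Grid of constant predictions from y_bar - 2 to y_bar + 2 in steps of 0.1.
candidates = [y_bar + step / 10 for step in range(-20, 21)]
best = min(candidates, key=lambda c: sse(c, likes))
print("best constant on the grid:", best, "mean of Y:", y_bar)

# Check the decomposition SSE(c) = SSE(mean) + n * (c - mean)^2.
for c in (1.5, 2.5):
    lhs = sse(c, likes)
    rhs = sse(y_bar, likes) + len(likes) * (c - y_bar) ** 2
    print(f"c={c}: SSE={lhs:.4f}, SSE(mean) + n*(c-mean)^2={rhs:.4f}")
```

Note that this compares constant predictions only. It says nothing about whether a non-constant, nonlinear function of X could do better.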

5. How does a zero slope affect the interpretation of correlation and causation between variables?

A zero slope in a fitted regression line corresponds to zero sample (Pearson) correlation between the independent and dependent variables, so a linear function of the independent variable cannot be used to predict the dependent variable. By itself this says nothing about causation: regression measures association, and a zero linear association does not exclude a nonlinear dependence. It highlights the need either to reconsider the variables being analyzed or to explore nonlinear models that might better capture the relationship between them.
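A standard illustration of the nonlinear caveat, as a sketch: take X symmetric about zero and Y = X^2. The data are hypothetical and abstract (think of X as centered rather than as a raw count of comments), and `ols_slope` is an illustrative helper.

```python
import statistics


def ols_slope(xs, ys):
    """Least-squares slope: sum of cross-deviations over sum of squared X deviations."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


# Hypothetical data: Y is completely determined by X, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

slope = ols_slope(xs, ys)
overall_mean = statistics.fmean(ys)
print("slope:", slope)
print("overall mean of Y:", overall_mean)
for x, y in zip(xs, ys):
    # One Y per X here, so the average of Y at each X is just that Y.
    print(f"X={x:+d}  Y={y}  difference from overall mean={y - overall_mean:+.1f}")
```

Here the fitted slope is zero, yet the value of Y at a given X can be far from the overall average. A zero slope alone does not make the overall average a good prediction at every X.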
