Undergrad Regression line with zero slope and average as best prediction

SUMMARY

The discussion centers on the implications of a regression line with a zero slope, indicating no linear relationship between the number of comments (X) and the number of likes (Y) on a website. With 100 data points, the best prediction for Y is the arithmetic average of all Y values, regardless of X. The conversation also addresses the potential differences between the average of Y values across all X values and the average of Y values for specific X values, clarifying that these averages may not always be numerically close.

PREREQUISITES
  • Understanding of linear regression concepts
  • Familiarity with statistical significance in model fitting
  • Knowledge of arithmetic averages and their calculation
  • Basic grasp of data sampling and variability
NEXT STEPS
  • Explore the concept of statistical significance in linear regression models
  • Learn about the implications of zero slope in regression analysis
  • Investigate the differences between overall averages and conditional averages in datasets
  • Study the effects of sample size on regression outcomes and predictions
USEFUL FOR

Data analysts, statisticians, and anyone involved in predictive modeling or regression analysis will benefit from this discussion, particularly those interested in understanding the limitations of linear relationships in data.

fog37
TL;DR
Regression line with zero slope and average as best prediction
Hello,

I was considering some made-up data ##(X,Y)## and its best-fit regression line. The outcome variable ##Y## is the number of likes and ##X## is the number of comments on a website.

We have 100 data points which spread in such a way that the best fit line has zero slope. This implies that there is no linear relationship between the variables ##X## and ##Y##. This also means that the average of ##Y## would be the best prediction for ##Y## regardless of the value of ##X##. It does not matter what the value of ##X## is, the best prediction for ##Y## would be equal to the average and have a constant value....

My question: here we are talking about taking the arithmetic average of ALL the ##Y## values from all different ##X## values, correct?
What about the average of the ##Y## values for the same ##X## value (assuming there is more than just one ##Y## value for each ##X## value)? These two averages should always be numerically close, correct?

Thank you!
 
Y does not depend on the value of X, so knowing X would not let me infer anything about a particular Y, unless there is a nonlinear relationship.
 
fog37 said:
TL;DR Summary: Regression line with zero slope and average as best prediction

Hello,

I was considering some made-up data ##(X,Y)## and its best-fit regression line.
Be careful with "best" here. It is the best that can be done with solid statistical significance. If you "throw everything at the wall to see what sticks", you can often get very good fits to the data that have no statistical significance at all. You want to be able to convince people, even very skeptical ones, that every term in your model probably belongs there. A good linear regression application should include only terms that show a statistically significant reason to be included.
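One way to check whether a slope term "belongs there" is a permutation test. This is only a sketch of one possible significance check, not necessarily the test the reply has in mind (a t-test on the slope is the more common textbook choice). The function names `ols_slope` and `permutation_p_value`, the simulated data, and the seed are all assumptions made up for illustration.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the hypothesis 'slope = 0'.

    Shuffling ys breaks any link between x and y, so the shuffled slopes
    show how large a slope can get by chance alone.
    """
    rng = random.Random(seed)
    observed = abs(ols_slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(ols_slope(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 100 posts, likes generated independently of comments.
rng = random.Random(42)
comments = [rng.randint(0, 20) for _ in range(100)]
likes = [rng.gauss(50, 10) for _ in range(100)]

print("fitted slope:", ols_slope(comments, likes))
print("permutation p-value:", permutation_p_value(comments, likes))
```

A large p-value means the fitted slope is no bigger than what shuffled data produces by chance, so the slope term has not earned its place in the model.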
fog37 said:
The outcome variable ##Y## is the number of likes and ##X## is the number of comments on a website.

We have 100 data points which spread in such a way that the best fit line has zero slope. This implies that there is no linear relationship between the variables ##X## and ##Y##. This also means that the average of ##Y## would be the best prediction for ##Y## regardless of the value of ##X##. It does not matter what the value of ##X## is, the best prediction for ##Y## would be equal to the average and have a constant value....

My question: here we are talking about taking the arithmetic average of ALL the ##Y## values from all different ##X## values, correct?
Yes. They are all involved in the linear regression calculations.
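The reason can be written out in one line. This is the standard least-squares result for a line ##\hat{y} = a + bx## fitted to ##n## points; the symbols ##a##, ##b##, ##\bar{x}##, ##\bar{y}## are notation introduced here, not taken from the original post.

$$a = \bar{y} - b\,\bar{x}, \qquad b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Setting ##b = 0## gives

$$\hat{y} = a = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i ,$$

so with ##n = 100## the fitted constant is the arithmetic average of all 100 ##Y## values, pooled across every ##X##.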
fog37 said:
What about the average of the ##Y## values for the same ##X## value (assuming there is more than just one ##Y## value for each ##X## value)? These two averages should always be numerically close, correct?
No. That is too strong a statement. In the 100 data points that you collected for your sample, there might be values of ##X## where the sample average of ##Y## happened to be off. In fact, with only 100 samples, if you collected data at 10 ##X## values, each average would rest on only about 10 points, so you can expect the ##Y## average of some of those ##X## value sets to be off more than others.
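A small simulation makes this concrete. It is a sketch under made-up assumptions: likes are drawn from a normal distribution with mean 50 and standard deviation 10, independently of ##X##, at 10 ##X## values with 10 points each. The helper names `simulate_likes`, `conditional_means`, and `overall_mean` are hypothetical.

```python
import random
from statistics import mean

def simulate_likes(x_values, per_x, mu=50.0, sigma=10.0, seed=1):
    """Draw per_x values of Y for each X; Y does not depend on X at all."""
    rng = random.Random(seed)
    return {x: [rng.gauss(mu, sigma) for _ in range(per_x)] for x in x_values}

def conditional_means(groups):
    """Average of Y within each X value."""
    return {x: mean(ys) for x, ys in groups.items()}

def overall_mean(groups):
    """Average of all Y values pooled across every X."""
    return mean(y for ys in groups.values() for y in ys)

groups = simulate_likes(x_values=range(1, 11), per_x=10)
grand = overall_mean(groups)

for x, m in conditional_means(groups).items():
    print(f"X={x:2d}  mean(Y | X)={m:6.2f}  deviation from overall={m - grand:+6.2f}")
print(f"overall mean of Y = {grand:.2f}")
```

With 10 points per group and a standard deviation of 10, each group average has a standard error of about ##10/\sqrt{10} \approx 3.2##, so deviations of a few likes from the overall average are ordinary even though ##Y## is unrelated to ##X## by construction.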
 
