Weighted average of a set of slopes with different goodness of fit

In summary: If the errors in slope come from the measurement process, and the data converge on a theoretically expected mean value of "m", then a weighted average of the slopes makes sense. By the central limit theorem, repeating the experiment should give values of "m" that cluster around a central value in a bell-shaped distribution.
  • #1
latitude
Hi there, I have a bit of a confusing question, but I'll try to be as clear as I can in asking it.

I have a set of linear fits for four different sets of data. Basically, I have three sets of data, with sample sizes N1 = 5, N2 = 7, N3 = 5 respectively. I have plotted these data with respect to a common x-axis. Then, I have found the linear fit for each data set, giving me a line with a slope, so 3 slopes total (m1, m2, m3) with three errors in slope as well (m'1, m'2, m'3). Each of the linear fits also has a goodness of fit associated with it (0.79, 0.99, 0.89) using the R-squared value.

Here's my question. I need to solve for the average slope and average error in slope. I feel like a simple average and standard deviation isn't indicative of the real average, because the linear fits are not all equally "well-fit". Is there a way to weight the slopes m1, m2, m3 according to their R^2 values so that I can calculate m_avg using a weighted mean? Or is the average of the error in the slope enough? Should the sample sizes Ni come into it at all?

Any advice would be greatly appreciated! Thanks :)
 
  • #2
Hello

Just a suggestion: how about calculating a weighted average for the average slope, i.e. (0.79*m1 + 0.99*m2 + 0.89*m3)/(0.79 + 0.99 + 0.89)?

I do not feel the sample size should matter here.
 
  • #3
latitude said:
I need to solve for the average slope and average error in slope

That phrase doesn't specify a particular mathematical problem. For the word "average" to have a precise meaning you have to say what random variable is involved. If you have difficulty expressing the problem in mathematical terms, try explaining what you are trying to accomplish. If you get a number for "average slope", how do you expect to use it?
 
  • #4
Basically I am changing something incrementally and systematically (x-axis) and noting the induced change that occurs in another quality (y-axis). The response tends to be linearly increasing, y = mx. I repeated this measurement on four different samples from the same "batch", so that theoretically, they should be showing the same response. Since m has a physical significance, I want to take the average of the set of "m"s to show the average response for the samples from this batch. However, there is variance in the slope determined for each sample (not the standard deviation in the set, but experimental error associated with running the same test on the same sample a number of times and getting different results). I guess what I would like to do is show the average of the slope, with each contributor to the slope weighted according to how consistently I received that response during the testing iterations.

So if I tested Sample 1 six times and got a consistently linear response of m1 = 2.0 +/- 0.1, I feel that that should weigh more heavily in calculating the average than Sample 2, which I tested three times for m2 = 1.6 +/- 0.8. Does that make sense? So the average slope would just be a way of saying, "The Batch of samples tends to show this response (mavg +/- error_in_slope) for this particular test."

Thanks very much for the replies!
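
For reference, one common way to do this kind of weighting is the inverse-variance weighted mean, where each slope gets a weight of 1/error². The sketch below is not a method proposed in the thread. It assumes the +/- values are independent one-sigma standard errors and that both samples share one true slope. The function name `weighted_mean_slope` is made up for illustration.

```python
import math

def weighted_mean_slope(slopes, errors):
    """Inverse-variance weighted mean of slopes and its standard error.

    Assumes each error is an independent 1-sigma uncertainty on its slope.
    """
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, slopes)) / total
    return mean, math.sqrt(1.0 / total)

# Numbers from the post: m1 = 2.0 +/- 0.1, m2 = 1.6 +/- 0.8
m_avg, m_err = weighted_mean_slope([2.0, 1.6], [0.1, 0.8])
print(f"m_avg = {m_avg:.3f} +/- {m_err:.3f}")  # m_avg = 1.994 +/- 0.099
```

With these numbers the precisely measured sample dominates, so the result stays close to 2.0. Sample size enters only indirectly, through the size of each slope's error.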
 
  • #5
I assume there is a reason that you cannot do a single linear regression on all the data. In any case, I think you should make a test case of your proposed method, using real or fabricated data, and see how it compares to a single regression of all the data. I am skeptical that your proposed method is valid, but I could be wrong. I would be interested in hearing the result of that test.
 
  • #6
latitude said:
... theoretically, they should be showing the same response. ...but experimental error associated with running the same test on the same sample a number of times and getting different results).

... So if I tested Sample 1 six times and got a consistently linear response of m1 = 2.0 +/- 0.1, I feel that that should weigh more heavily in calculating the average than Sample 2, which I tested three times for m2 = 1.6 +/- 0.8. Does that make sense?

If the errors in slope are due to measurement, and if the data consistently converge to a theoretically expected mean value of m, then it does make sense to take a weighted average based on the frequency of observations rather than the arithmetic mean of the slopes. As per the central limit theorem, the value of m should converge to a central tendency in a bell-shaped curve on repeating the experiment.
 
  • #7
I think the proper statement of your goal is that you want to "estimate" some quantity. (Statistics has two main endeavors. These are "estimation" and "hypothesis testing".)

latitude said:
theoretically, they should be showing the same response

You need a probability model to make clear what theory says and how errors enter the picture. Here are some alternatives:

For example, you might assume each measurement (X,Y) has the form Y = MX + B + E where M and B are constant and E is a random variable representing an error that occurs with each measurement of Y.

Or you might assume each experiment produces data of the form Y = (M + E1)X + B + E2 + E3 where E1 and E2 are random errors that are constant on each "batch" of measurements and E3 is a random error that occurs on each measurement.

Or you might assume each experiment produces data of the form Y = MX + B + E1 + E3 where E1 is an error that is constant for each "batch" and E3 is an error that occurs on each measurement.

In some real-life situations, (X,Y) measurements can have errors in the measurement of X as well as in the measurement of Y. The usual sort of linear regression assumes there is no error in the X measurements.

Until you get into such detail, we don't have a specific mathematical question.

The least painful way to get into such detail is to use simulation. If you can write computer programs, I agree with FactChecker's advice to make test cases using fabricated data. Pick a specific probability model for simulating the data. That way you will know the actual value of M. Then simulate data and compute your estimate in various ways and see which does best.
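
A minimal sketch of such a simulation, using the first model above (Y = MX + B + E) with fabricated data. Everything specific here is an assumption chosen for illustration: the true slope and intercept, the noise levels per data set, the evenly spaced x values, and the three estimators being compared (simple mean of slopes, inverse-variance weighted mean of slopes, and one regression on all the data pooled together).

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b; returns (m, standard error of m)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    rss = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return m, math.sqrt(rss / (n - 2) / sxx)

def simulate(true_m=2.0, true_b=1.0, sizes=(5, 7, 5),
             sigmas=(0.5, 0.1, 0.3), trials=2000, seed=0):
    """Root-mean-square error of each slope estimator over many fake experiments."""
    rng = random.Random(seed)
    sq_err = {"simple_mean": 0.0, "inverse_variance": 0.0, "pooled_fit": 0.0}
    for _ in range(trials):
        slopes, errors, all_x, all_y = [], [], [], []
        for n, sigma in zip(sizes, sigmas):
            xs = [float(i) for i in range(n)]
            ys = [true_m * x + true_b + rng.gauss(0.0, sigma) for x in xs]
            m, se = fit_line(xs, ys)
            slopes.append(m)
            errors.append(se)
            all_x += xs
            all_y += ys
        weights = [1.0 / se ** 2 for se in errors]
        estimates = {
            "simple_mean": sum(slopes) / len(slopes),
            "inverse_variance": sum(w * m for w, m in zip(weights, slopes)) / sum(weights),
            "pooled_fit": fit_line(all_x, all_y)[0],
        }
        for name, est in estimates.items():
            sq_err[name] += (est - true_m) ** 2
    return {name: math.sqrt(total / trials) for name, total in sq_err.items()}

rmse = simulate()
for name, value in rmse.items():
    print(f"{name}: RMSE = {value:.4f}")
```

Because the true slope is known, the estimator with the smallest RMSE is the one that does best under this particular model. Swapping in one of the other models (for example, a per-batch error on the slope) may change the ranking.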
 
  • #8
FactChecker said:
I assume there is a reason that you can not do a single linear regression on all the data.

The only reason I can think of would be that, although the slopes are in principle the same, the intercepts are not. If so, it would be logical to adjust the relative intercepts so as to maximise the R-squared value (equivalently, minimise the residual sum of squares) for the linear regression of the conflated set. There's probably a simple algebraic way to do that.
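
One algebraic route, sketched here as an assumption rather than as the method this post had in mind: fit a single shared slope with a separate intercept for each data set by least squares. That works out to centring each data set on its own mean x and mean y, then pooling the centred sums. The function name `common_slope` and the noise-free test data are made up for illustration.

```python
def common_slope(groups):
    """Least-squares fit of y = m*x + b_g with one slope m and one intercept per group.

    groups: list of (xs, ys) pairs. Returns (m, [b_1, b_2, ...]).
    """
    sxx_total = 0.0
    sxy_total = 0.0
    means = []
    for xs, ys in groups:
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        means.append((x_bar, y_bar))
        # Centring on the group's own means removes that group's intercept.
        sxx_total += sum((x - x_bar) ** 2 for x in xs)
        sxy_total += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy_total / sxx_total
    intercepts = [y_bar - slope * x_bar for x_bar, y_bar in means]
    return slope, intercepts

# Fabricated, noise-free data: same slope (2), different intercepts (1, 5, -3).
xs_a, xs_b, xs_c = [0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0, 2.0, 3.0, 4.0, 5.0]
groups = [
    (xs_a, [2.0 * x + 1.0 for x in xs_a]),
    (xs_b, [2.0 * x + 5.0 for x in xs_b]),
    (xs_c, [2.0 * x - 3.0 for x in xs_c]),
]
slope, intercepts = common_slope(groups)
print(slope, intercepts)
```

Data sets with more points or a wider spread in x contribute more to the shared slope automatically, so no separate weighting step is needed under this model.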
 

1. What is the weighted average of a set of slopes with different goodness of fit?

The weighted average of a set of slopes with different goodness of fit is a statistical method that calculates the mean slope of a group of fitted lines, taking into account the quality of fit for each individual slope. This means that slopes with a higher goodness of fit will have a greater influence on the overall average than slopes with a lower goodness of fit.

2. How is the weighted average of a set of slopes calculated?

The weighted average of a set of slopes is calculated by multiplying each slope by its corresponding weight, which is determined by the goodness of fit. These products are then summed and divided by the total weight of all slopes. The resulting value is the weighted average of the set of slopes.
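
Written as a formula, with w_i standing for the weight given to slope m_i (symbols introduced here for illustration), the calculation is the first expression below. The second expression plugs in the R-squared values from the thread as weights, which is only one possible choice of weight and not necessarily a statistically justified one.

```latex
\bar{m} = \frac{\sum_i w_i\, m_i}{\sum_i w_i}
\qquad\text{e.g.}\qquad
\bar{m} = \frac{0.79\,m_1 + 0.99\,m_2 + 0.89\,m_3}{0.79 + 0.99 + 0.89}
        = \frac{0.79\,m_1 + 0.99\,m_2 + 0.89\,m_3}{2.67}
```

Note that the divisor is the sum of the weights, not the number of slopes.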

3. Why is it important to use a weighted average for slopes with different goodness of fit?

Using a weighted average for slopes with different goodness of fit allows for a more accurate representation of the data. It takes into account the reliability of each slope, giving more weight to those with a higher goodness of fit and reducing the influence of outliers or less reliable data points.

4. How is the weight of a slope determined in a weighted average?

The weight of a slope in a weighted average is determined by the goodness of fit, which is typically measured by the coefficient of determination (R-squared value). The higher the R-squared value, the greater the weight of the slope will be in the weighted average calculation.

5. Can the weighted average of a set of slopes change?

Yes, the weighted average of a set of slopes can change if the slopes or their goodness of fit values change. For example, if a slope with a high weight (due to a high R-squared value) is removed, the weighted average shifts toward the remaining slopes. Similarly, adding a new slope pulls the average toward that slope's value, more strongly the higher its weight. Whether the average rises or falls depends on the values involved.
