Least-squares calculation with a pinch of weights and monte-carlo

  • Thread starter imsolost
In summary: Using total least squares would be a good option because you would be taking into account all of the information available. However, the decision of whether to use this method or not is subjective.
  • #1
imsolost
Hello all,
I am a bit lost with a problem and my reasoning and would like to hear your thoughts about it.
The problem is the following:

For a set of data yi, I need to find the best value for Am given:
1590760886505.png

"i" is an index running over {1, 2, 3, ...} up to about 20. "m" stands for "mass" (Am is a physical quantity called "mass activity").

So I thought about using the ordinary least-squares method...

The data yi are not "exact" values, though. They come from a laboratory report that gives a set of values, each with an associated standard deviation. Some yi appear to be measured accurately, since their standard deviation is small, while for others it is somewhat larger.

So I thought about using the weighted least-squares method...

As I understand weighted LS, I write:
1590761486938.png

I impose the condition:
1590761512899.png

And thus Am should look like:
1590761554641.png

where Wi = 1/σi² are the weights. So far, so good.
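The attached equations are not readable in this copy of the thread. As a sketch only, and assuming the model is the straight line through the origin y_i = A_m K_i (an assumption, not confirmed by the post) with exact K_i, the weighted least-squares steps described above would read:

```latex
% Assumed model (not confirmed by the post): y_i = A_m K_i + \varepsilon_i,
% with Var(\varepsilon_i) = \sigma_i^2 and K_i treated as exact.
\begin{align}
  S(A_m) &= \sum_i W_i \,\bigl(y_i - A_m K_i\bigr)^2,
  \qquad W_i = \frac{1}{\sigma_i^2} \\
  \frac{\partial S}{\partial A_m}
         &= -2 \sum_i W_i K_i \,\bigl(y_i - A_m K_i\bigr) = 0 \\
  \hat{A}_m &= \frac{\sum_i W_i K_i y_i}{\sum_i W_i K_i^2},
  \qquad
  \operatorname{Var}\bigl(\hat{A}_m\bigr) = \frac{1}{\sum_i W_i K_i^2}
\end{align}
```

The variance in the last line holds only under the stated assumptions (independent y_i errors with the quoted σ_i and no uncertainty on K_i).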

Now, here is the tricky part: Ki is also "uncertain". Ki comes from a big expression containing three uncertain parameters: λ1, λ2, λ3. These are calculated from other data through non-linear expressions, and I had to propagate the uncertainties on those data using Monte Carlo methods. So what I have is about 1000 "sets" of slightly different λ1, λ2, λ3. I use these 1000 sets and my big Ki expression to generate 1000 sets of Ki.
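A minimal sketch of this propagation step, under loud assumptions: the function `k_expression`, the times `t`, and the nominal values and spreads of the λ's are all hypothetical placeholders (the actual Ki expression is not given in the post), and the λ's are drawn here as independent normals purely to have something to feed in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nominal values and standard deviations of lambda1..lambda3.
lam_mean = np.array([0.10, 0.02, 0.005])
lam_std = np.array([0.002, 0.0005, 0.0002])

# Hypothetical abscissae for the 20 data points.
t = np.linspace(1.0, 20.0, 20)

def k_expression(lam, t):
    """Placeholder for the 'big Ki expression' (NOT the formula from the post)."""
    l1, l2, l3 = lam
    return np.exp(-l1 * t) + 0.5 * np.exp(-l2 * t) + 0.1 * np.exp(-l3 * t)

n_draws = 1000
# One row per Monte Carlo draw of (lambda1, lambda2, lambda3).
lam_samples = rng.normal(lam_mean, lam_std, size=(n_draws, 3))
# One row of K_1..K_20 per draw.
K_samples = np.array([k_expression(lam, t) for lam in lam_samples])

# Summaries of the propagated uncertainty on each K_i.
K_mean = K_samples.mean(axis=0)
K_std = K_samples.std(axis=0, ddof=1)
K_corr = np.corrcoef(K_samples, rowvar=False)  # K_i share the same lambdas
```

Because every Ki in a draw is computed from the same λ's, the Ki are generally correlated with each other; `K_corr` is one way to check how strongly.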

So, the question becomes:
1) Should I use my WLS expression to calculate Am for each of the 1000 sets of Ki, which would give me 1000 slightly different Am (and then I can happily take the mean, standard deviation or whatever of these 1000 Am values and I'm done)?

2) Or should I use a regular LS expression (i.e. without weights) and handle the uncertainty on the yi with Monte Carlo? I mean drawing about 1000 normally distributed random samples around each yi, using the standard deviation given by the laboratory. Then, with 1000 sets of yi, my previous 1000 sets of Ki and the ordinary LS formula, I would get 1000 slightly different Am (and again take the mean, standard deviation or whatever of these 1000 Am values).

3) Or should I do both 1) AND 2), i.e. do "2)" but use the WLS formula instead of the LS formula?
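The three options can be put side by side on synthetic data. This is a sketch, not the poster's calculation: it assumes the model y_i = A_m K_i, and `A_true`, `K_nominal`, `sigma_y` and the 2 % common scale error on K are made-up stand-ins for the real data and the real Monte Carlo sets.

```python
import numpy as np

rng = np.random.default_rng(1)

n_points, n_draws = 20, 1000
A_true = 3.0                                   # hypothetical "true" A_m

# Synthetic stand-ins for the lab data and the Monte Carlo K sets.
K_nominal = np.linspace(0.5, 2.0, n_points)
sigma_y = rng.uniform(0.02, 0.3, n_points)     # reported std of each y_i
y = A_true * K_nominal + rng.normal(0.0, sigma_y)
# 1000 K sets: a common 2 % scale error as a crude stand-in for the lambdas.
K_samples = K_nominal * (1.0 + 0.02 * rng.normal(size=(n_draws, 1)))

def fit_through_origin(y, K, w):
    """Weighted LS estimate of A in y = A*K; w = 1 gives ordinary LS."""
    return np.sum(w * K * y, axis=-1) / np.sum(w * K * K, axis=-1)

w = 1.0 / sigma_y**2
# Normal draws around the observed y_i, one row per Monte Carlo draw.
y_samples = y + rng.normal(0.0, sigma_y, size=(n_draws, n_points))

A_opt1 = fit_through_origin(y, K_samples, w)            # 1) WLS, only K varies
A_opt2 = fit_through_origin(y_samples, K_samples, 1.0)  # 2) LS, y and K vary
A_opt3 = fit_through_origin(y_samples, K_samples, w)    # 3) WLS, y and K vary

for name, A in [("1", A_opt1), ("2", A_opt2), ("3", A_opt3)]:
    print(f"option {name}: mean = {A.mean():.4f}, std = {A.std(ddof=1):.4f}")
```

In this sketch the weights only decide how the points are combined into one estimate, while the spread of the 1000 results comes from the draws of yi and Ki. Note that in option 1 the yi never vary, so their uncertainty does not show up in the spread at all.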

At the time of writing, I'm doing "3)", but I wonder whether this is a bit stupid. Am I accounting for these uncertainties twice and over-estimating the uncertainty on Am?

Sorry if the post is a bit long, but I really tried to explain things in as much detail as possible. Also sorry if the question is a bit dumb/obvious, but I think I have thought about this problem a bit too much and I feel like my brain is totally biased / unable to think about this stuff anymore :D

Anyway, big thank you for your time!

 
  • #2
imsolost said:
The data yi is not a set of "exact" values though...

Now, here is the tricky part... Ki also is "uncertain"

I suggest you use total least squares regression https://en.wikipedia.org/wiki/Total_least_squares

However, the question of "what's the optimal procedure" is subjective unless you can quantify the cost or utility of results. For example, in ordinary least squares regression, we take for granted that the sum of the squared differences between the predicted y-values and the data is the "cost" of the resulting fit. Mathematics then tells us how to minimize this cost, but it doesn't justify using this cost as a measure of how "good" the fit is. The choice of how to measure the "goodness" of a fit is subjective.
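To make the "cost" concrete, here is a sketch of the two cost functions side by side, assuming the model y_i = A_m K_i (an assumption, since the attached equations are not readable) and, for the errors-in-both-variables case, independent errors with standard deviations σ_{y,i} and σ_{K,i}. The latent "true" values κ_i are notation introduced here for illustration.

```latex
% Ordinary least squares: only the y-residuals are penalised.
\begin{equation}
  C_{\mathrm{OLS}}(A_m) = \sum_i \bigl(y_i - A_m K_i\bigr)^2
\end{equation}

% Errors in both variables (weighted total least squares), with latent true
% values \kappa_i of K_i; assumes independent errors on y_i and K_i.
\begin{equation}
  C_{\mathrm{TLS}}(A_m, \kappa)
    = \sum_i \left[
        \frac{(y_i - A_m \kappa_i)^2}{\sigma_{y,i}^2}
      + \frac{(K_i - \kappa_i)^2}{\sigma_{K,i}^2}
      \right]
\end{equation}

% Minimising over each \kappa_i analytically leaves a cost in A_m alone:
\begin{equation}
  C_{\mathrm{TLS}}(A_m)
    = \sum_i \frac{(y_i - A_m K_i)^2}{\sigma_{y,i}^2 + A_m^2\,\sigma_{K,i}^2}
\end{equation}
```

If the errors on the K_i are correlated with each other, as they may be when all K_i derive from the same parameters, the independent form above no longer applies as written and a full covariance matrix would be needed.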

If your eventual goal is to publish a result in a journal, it's advisable to see what methods other published papers used. That will show which subjective choices are backed by custom and tradition.

(When I click on the attachment, I only see a title.)
 
  • #3
I agree with what you say about the "goodness" of the fit, and indeed, as you guessed, this is a study that will be published, so I need to make sure people will "adhere" to it. And I haven't found a paper where people faced the same thing.

I have already looked at "total" least squares, but I have no idea how to use it, considering that the information about the uncertainty on Ki is "contained" in the 1000 samplings of Ki, and I have the feeling this is the only way I can decently propagate the uncertainty of the λ's. So I have this (wrong?) feeling that I have to use Monte Carlo again...
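One possible way to connect the two, sketched under strong assumptions: summarise the 1000 Monte Carlo sets by a mean and a standard deviation per Ki, then minimise an errors-in-both-variables cost for the assumed model y_i = A_m K_i. The data below are synthetic, all names are hypothetical, and the errors on the Ki are treated as independent, which may not hold when every Ki comes from the same λ's.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins: 20 points, 1000 Monte Carlo K sets.
n_points, n_draws = 20, 1000
A_true = 3.0
K_true = np.linspace(0.5, 2.0, n_points)
sigma_y = rng.uniform(0.02, 0.3, n_points)
y = A_true * K_true + rng.normal(0.0, sigma_y)
K_samples = K_true + rng.normal(0.0, 0.03 * K_true, size=(n_draws, n_points))

# Summarise the Monte Carlo sets: central value and spread of each K_i.
K_hat = K_samples.mean(axis=0)
sigma_K = K_samples.std(axis=0, ddof=1)

def cost(A):
    """Errors-in-both-variables cost for y = A*K with independent errors."""
    return np.sum((y - A * K_hat) ** 2 / (sigma_y**2 + A**2 * sigma_K**2))

def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimise a scalar function assumed unimodal on [lo, hi]."""
    inv_phi = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d      # minimum lies in [a, d]
        else:
            a = c      # minimum lies in [c, b]
    return 0.5 * (a + b)

# Start from the plain WLS estimate and search in a bracket around it.
w = 1.0 / sigma_y**2
A_wls = np.sum(w * K_hat * y) / np.sum(w * K_hat**2)
A_eiv = golden_section_min(cost, 0.5 * A_wls, 1.5 * A_wls)
print(f"WLS: {A_wls:.4f}   errors-in-variables: {A_eiv:.4f}")
```

If the Ki turn out to be strongly correlated, `np.cov(K_samples, rowvar=False)` gives the full covariance of the sets, and repeating the fit once per Monte Carlo set may remain the simpler route.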
 

1. What is the purpose of using least-squares calculation with weights and Monte-Carlo?

The purpose is to estimate the parameters of a model by minimizing the (weighted) sum of squared residuals between the model and the data. The weights let measurements with a smaller reported uncertainty count for more, and the Monte Carlo sampling propagates the uncertainty of the inputs into a spread on the fitted parameters.

2. How do weights affect the least-squares calculation?

Weights are used to assign different levels of importance to each data point. This means that points with higher weights will have a larger impact on the overall solution, while points with lower weights will have a smaller impact. This allows for a more precise fit, especially when dealing with data that may have varying levels of accuracy or reliability.

3. What is the role of Monte-Carlo in the least-squares calculation?

Monte Carlo is a simulation technique that repeats a calculation many times with inputs drawn at random from their assumed distributions. In the context of a least-squares calculation, the fit is repeated for each draw, and the spread of the resulting parameter values gives an estimate of how the input uncertainties propagate to the result.

4. Can the least-squares calculation with weights and Monte-Carlo be applied to any type of data?

The method can be applied to a wide range of problems, including both linear and nonlinear models, as long as the uncertainties on the data can be quantified. It is particularly useful when the measurements have unequal uncertainties or when some inputs to the model are themselves uncertain.

5. Are there any limitations to using least-squares calculation with weights and Monte-Carlo?

While this approach can provide a more accurate and robust solution compared to traditional least-squares methods, it may also be more computationally intensive and time-consuming. Additionally, the accuracy of the solution may still be limited by the quality and quantity of the data available.
