# Advantage of having more measurements

Hello! I have some points in the plane, with errors on both the x and y coordinates. The goal of the experiment is to check whether the points are consistent with a straight line, i.e. whether they can be described by a function of the form ##y = f(x)=a+bx##, or whether some non-linearity is involved (e.g. ##y = f(x)=a+bx+cx^2##).

Assume first that we have only 3 points. In this case, the approach is to calculate the area of the triangle they form and the associated error, so we get something of the form ##A\pm dA##. If ##dA>A##, the points are consistent with linearity, and we can set a constraint (at some given confidence level) on the magnitude of a possible non-linearity (e.g. ##c<c_0##).

If we have 4 points, we can do something similar: for example, calculate the area of the triangle formed by the first 3 points (ordered by x coordinate), ##A_1\pm dA_1##, and the area of the triangle formed by the last 3 points, ##A_2\pm dA_2##, then add them and do error propagation to get ##A\pm dA##, and proceed as above. (In this experiment we expect not to see a non-linearity, so we just aim for upper bounds.)

My question is: what is the advantage of having more points? Intuitively, I expect that the more points you have, the more information you gain and hence the better you can constrain the non-linearity. But it seems like the error gets bigger and bigger, simply because we have more points and more error propagation (you can assume that the errors on x and y are the same, or at least very similar, for different measurements). So, assuming the points actually lie on the line, for 3 points we get ##0\pm dA_3## and for, say, 10 points we get ##0\pm dA_{10}## with ##dA_{10}>dA_3##, so the upper bounds we can set on the non-linearity are better (smaller) with 3 points. But intuitively that doesn't make sense. Can someone help me understand what I am doing wrong? Why is it better to have more points? Thank you!
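The area-plus-propagation procedure described above can be sketched in Python with NumPy. This is a minimal sketch under assumptions that are not in the post: the signed area is used rather than ##|A|## (so it can fluctuate around zero and is differentiable there), the errors on all coordinates are independent, and for N points the triangles are taken as consecutive triples (1,2,3), (2,3,4), ..., which generalizes the "first 3 and last 3" choice for 4 points. The function names are hypothetical.

```python
import numpy as np

def triangle_area_and_grad(x, y):
    """Signed area of the triangle with vertices (x[k], y[k]), k = 0, 1, 2,
    plus its partial derivatives with respect to every coordinate."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    area = 0.5 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    grad_x = 0.5 * np.array([y2 - y3, y3 - y1, y1 - y2])
    grad_y = 0.5 * np.array([x3 - x2, x1 - x3, x2 - x1])
    return area, grad_x, grad_y

def summed_area(x, y, sigma_x, sigma_y):
    """Sum of signed areas of consecutive triangles (i, i+1, i+2) and its
    linearly propagated error, assuming independent errors on every coordinate.

    Neighboring triangles share points, so their areas are correlated: the
    gradient is accumulated per point before it is combined with the errors.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    total = 0.0
    gx = np.zeros_like(x)
    gy = np.zeros_like(y)
    for i in range(len(x) - 2):
        a, dgx, dgy = triangle_area_and_grad(x[i:i + 3], y[i:i + 3])
        total += a
        gx[i:i + 3] += dgx
        gy[i:i + 3] += dgy
    error = np.sqrt(np.sum((gx * sigma_x) ** 2) + np.sum((gy * sigma_y) ** 2))
    return total, error

# Points exactly on y = 1 + 2x, with sigma_x = sigma_y = 0.1 (assumed values)
x = np.arange(4.0)
print(summed_area(x[:3], 1 + 2 * x[:3], 0.1, 0.1))   # 3 points
print(summed_area(x, 1 + 2 * x, 0.1, 0.1))           # 4 points
```

In this toy setup the propagated error is ##\sqrt{0.075}\approx 0.27## for 3 points and ##\sqrt{0.05}\approx 0.22## for 4 points (and stays at ##\sqrt{0.05}## for every ##N\ge 4## on this equally spaced line), whereas adding the two triangle errors in quadrature as if they were independent would give about 0.39. For ##y = cx^2## on the same grid each triangle contributes ##c##, so the summed signal grows like ##(N-2)c##.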

## Answers and Replies

anuttarasammyak
Gold Member
> My question is, what is the advantage of having more points?
I believe that the more observations or trials we do, the more we learn about the physical system, including its own disturbances, noise, and probabilistic behavior.

> I believe that the more observations or trials we do, the more we learn about the physical system, including its own disturbances, noise, and probabilistic behavior.
Well yeah, this is what I believe intuitively, but I am not sure how to show it mathematically.

anuttarasammyak
Gold Member
The law of large numbers and the central limit theorem may be of interest to you.

Dale
Mentor
2020 Award
I don’t understand the point of the areas. Why not just estimate c directly using least squares? Or even with a Bayesian estimation?

> I don’t understand the point of the areas. Why not just estimate c directly using least squares? Or even with a Bayesian estimation?
But in order to estimate c, I would need to know the functional form of the non-linearity. However, the actual form is very model dependent, so in our case we don't want to set constraints on a given model; we just want to set a constraint on any deviation from linearity, regardless of its actual form. Am I misunderstanding your point?

Basically I want to quantify how far the points are from being on a straight line. I decided to use this area as a quantifier, but I am totally open to suggestions for better ways to do it.

> The law of large numbers and the central limit theorem may be of interest to you.
I know about these in general, I am just not sure how they apply to my particular case. For example, in general the error would go as ##1/\sqrt{N}##, where N is the number of measurements, but I don't see that in my expressions above explicitly, so I am probably doing something wrong.
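The ##1/\sqrt{N}## scaling applies to averaging N independent measurements of the same quantity, not to a sum of areas. A quick simulation makes this concrete; it assumes Gaussian errors with ##\sigma = 1##, and the helper name `spread_of_mean` is hypothetical. The spread of the mean over many simulated experiments should track ##\sigma/\sqrt{N}##:

```python
import numpy as np

def spread_of_mean(n, sigma=1.0, n_repeats=20000, seed=0):
    """Standard deviation, over many simulated experiments, of the mean of n measurements."""
    rng = np.random.default_rng(seed)
    # each row is one simulated experiment with n Gaussian measurements
    samples = rng.normal(0.0, sigma, size=(n_repeats, n))
    return samples.mean(axis=1).std()

for n in (1, 4, 16, 64):
    # simulated spread of the mean next to the sigma / sqrt(n) prediction
    print(n, spread_of_mean(n), 1.0 / np.sqrt(n))
```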

> I do not know what exactly your system is, but I expect many (x,y) data plots to show dense and sparse patterns that become clearer for larger N, as the experiment video linked below shows. Link: https://www.hitachi.com/rd/research/materials/quantum/doubleslit/index.html
I don't have many points, though. Here is a paper that might explain it better (the physics is involved, but the details are not important for my question), in figure S2. In the experiments so far, people measured 3 points and got something like figure S2. What is usually done in the literature is to calculate the area formed by these 3 points and the error associated with it (by propagating the errors from each of the 3 points), and from there set a constraint on the non-linearity (so far all the areas are smaller than their uncertainties, so we have only been able to set upper limits). My question is simply: if I am able to measure a 4th point on that plot, how would that help me? (I am sure it would, as I would gain more data, but I am not sure mathematically how the error on the area is reduced by adding one more point.)

jedishrfu
Mentor
What you’re trying to do is what a linear regression does. It finds the best line through a set of points. If it looks to be a poor line after a lot of points then you must consider that there’s a different relationship.

Sometimes folks will apply a linear regression to the log values of x or y or both. This scheme can discover power-law relationships such as ##y = x^2##, because a log-log plot would show a straight line, ##\log y = 2 \log x##.
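A minimal sketch of the log-log trick with NumPy, assuming noise-free data from a made-up power law ##y = 3x^2##: a straight-line fit to ##(\log x, \log y)## returns the exponent as the slope and ##\log k## as the intercept.

```python
import numpy as np

x = np.linspace(1.0, 10.0, 20)
y = 3.0 * x**2        # assumed power law y = k * x**n with k = 3, n = 2

# Fit a straight line to (log x, log y): the slope estimates the exponent n
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
print(slope, np.exp(intercept))   # approximately 2.0 and 3.0
```

This only linearizes pure power laws ##y = kx^n##; a model like ##a + bx + cx^2## with nonzero ##a## or ##b## does not become a straight line on a log-log plot.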

Here’s more on linear regression:

https://en.wikipedia.org/wiki/Linear_regression

and this video

> What you’re trying to do is what a linear regression does. It finds the best line through a set of points. If it looks to be a poor line after a lot of points, then you must consider that there’s a different relationship.

I know what linear regression is; that is not what I am trying to do. As I said in my previous reply, the paper I linked to might explain better what I want to do, especially figure S2. There they measure 3 points, calculate the area of the triangle formed by them, and quantify the deviation from linearity based on the value of that area. I don't see how doing a linear regression on these 3 points would help me quantify that non-linearity.

anuttarasammyak
Gold Member
> There they measure 3 points, calculate the area of the triangle created by them and quantify the deviation from linearity based on the value of that area.
As I read S2, they take NL to be half the volume of a hexagonal shape with the three momentum vectors as axes, right? Do these three vectors come from the data of a single experiment? I would like to understand how, in your question, you want to add data or vectors to it.

> As I read S2, they take NL to be half the volume of a hexagonal shape with the three momentum vectors as axes, right? Do these three vectors come from the data of a single experiment? I would like to understand how, in your question, you want to add data or vectors to it.
I am not sure what you mean. What hexagonal volume are you referring to?

anuttarasammyak
Gold Member
Equation (6) and its explanation by S2.

> Equation (6) and its explanation by S2.
Equation (6) is just the area of that triangle in figure S2. In the experiment they measure the six quantities ##m\nu_i^{AA_j}## (the x and y coordinates of the three points in figure S2), and from there they calculate the area.

anuttarasammyak
Gold Member
Equation (6) seems to me to have the dimensions of a volume, ##p^3##, in momentum space,
$$|(A \times B)\cdot C|$$
not of an area.
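The volume interpretation can be checked numerically. The scalar triple product ##(A\times B)\cdot C## equals the determinant of the matrix whose columns are A, B and C, its absolute value is the volume of the parallelepiped they span, and it vanishes exactly when the three vectors are linearly dependent. The vectors below are made-up numbers, not data from the paper.

```python
import numpy as np

# Hypothetical 3-component vectors (illustrative numbers only)
A = np.array([1.0, 2.0, 3.0])
B = np.array([0.5, -1.0, 2.0])
C = np.array([2.0, 0.0, 1.0])

triple = np.dot(np.cross(A, B), C)                 # (A x B) . C
det = np.linalg.det(np.column_stack([A, B, C]))    # det of [A | B | C]
print(triple, det)                                 # both equal 12 (up to round-off)
```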

> But equation (6) seems to have dimension of volume p^3 in momentum space
> $$(A \times B)\cdot C$$
If you look just before equation (5), ##m_\mu## is just a constant, without units.

anuttarasammyak
Gold Member
I see. And the paper's statement, "Equivalently, in our geometrical picture it is the volume of the parallelepiped defined by ##\overrightarrow{m\nu}_{1,2}## and ##\overrightarrow{m\mu}##", confirms my view.

Going back to your point: what would you like to do beyond this triplet of vectors? Make a quartet by incorporating another vector? Get a set of triplets from many experiments?

jim mcnamara
Mentor
You mentioned area. Area == 0: that is how to test points for collinearity:
https://www.geeksforgeeks.org/program-check-three-points-collinear/

You can also use the distance test, if that makes any difference to you.
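The area and distance tests are closely related: the perpendicular distance of the third point from the line through the first two equals twice the triangle area divided by the distance between the first two points. Below is a minimal sketch of both ideas. The helper names and the fixed tolerance are hypothetical, and with real data the tolerance would have to come from the measurement uncertainties.

```python
import math

def distance_to_line(p1, p2, p3):
    """Perpendicular distance from p3 to the line through p1 and p2 (p1 != p2)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    twice_area = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
    return twice_area / math.hypot(x2 - x1, y2 - y1)

def is_collinear(p1, p2, p3, tol=1e-9):
    """Treat the points as collinear when p3 lies within tol of the line (tol is an assumed choice)."""
    return distance_to_line(p1, p2, p3) <= tol

print(distance_to_line((0, 0), (1, 0), (0.5, 2)))   # 2.0
print(is_collinear((0, 0), (1, 1), (2, 2)))         # True
```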

Now we are on the same page I hope.

The above is the best test when you want a yes/no answer. Alternatively, use some kind of minimum-area test if you are okay with a result that is not "perfect". What you do in this case is up to you; the threshold is arbitrary, you realize. Regression seems okay here, as others mentioned.

This is an example for "not perfect", which you already know:
https://cran.r-project.org/web/packages/olsrr/vignettes/regression_diagnostics.html

Tolerance test of multi-collinearity -- what you are asking about i.e., "more points":
https://www.statisticshowto.com/tolerance-level-statistics/

Dale
Mentor
2020 Award
> But in order to estimate c, I would need to know the functional form of the non-linearity.
Not really. You can always do a series expansion and approximate your nonlinearity as a polynomial. You only need to know the functional form if you want to make accurate predictions. But if you only want to detect nonlinearity a polynomial is fine.

> Basically I want to quantify how far the points are from being on a straight line. I decided to use this area as a quantifier, but I am totally open to suggestions for better ways to do it.
I suggest least squares regression to a polynomial.
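Here is a minimal sketch of a weighted least-squares fit of ##y=a+bx+cx^2## with NumPy. It assumes independent Gaussian errors with known ##\sigma_y## on y only. Handling the errors on x in this problem would need an errors-in-variables method such as orthogonal distance regression. The helper name `quadratic_fit` is hypothetical. The parameter covariance ##(X^T W X)^{-1}## does not depend on the measured y values, so the sketch shows directly how ##\sigma_c## changes as points are added over the same x range.

```python
import numpy as np

def quadratic_fit(x, y, sigma_y):
    """Weighted least-squares fit of y = a + b x + c x^2.

    Returns the best-fit (a, b, c) and their covariance matrix,
    assuming independent Gaussian errors sigma_y on y only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.broadcast_to(np.asarray(sigma_y, dtype=float), x.shape) ** 2
    X = np.column_stack([np.ones_like(x), x, x**2])   # design matrix
    XtW = X.T * w                                      # X^T W (W diagonal)
    cov = np.linalg.inv(XtW @ X)                       # (X^T W X)^{-1}
    params = cov @ (XtW @ y)
    return params, cov

for n in (3, 4, 10, 30):
    x = np.linspace(0.0, 1.0, n)     # assumed: same x range, more points
    _, cov = quadratic_fit(x, 1 + 2 * x, 0.1)
    print(n, np.sqrt(cov[2, 2]))     # uncertainty on c
```

With ##\sigma_y = 0.1## on ##[0,1]##, ##\sigma_c## is about 0.49 for 3 points and 0.45 for 4 points, and it keeps shrinking as points are added, roughly like ##1/\sqrt{N}## for large N.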

> My question is simply: if I am able to measure a 4th point on that plot, how would that help me? (I am sure it would, as I would gain more data, but I am not sure mathematically how the error on the area is reduced by adding one more point.)
With one more point you could fit a third order polynomial.

Stephen Tashi
Science Advisor
> in figure S2. In the experiments so far, people measured 3 points and got something like figure S2.
On the page following that figure, the paper says:

> Our procedure above applies to cases with enough experimental data. For systems lacking (sufficiently precise) measurements, we can still derive projections provided that an acceptable estimation of the ##F_{21}## constant is available from either theory calculation or hyperfine splitting data (whenever available).

So I think the three points in figure S2 are not necessarily single measurements themselves; instead, each of those points may be the mean of many measurements.

> Not really. You can always do a series expansion and approximate your nonlinearity as a polynomial. You only need to know the functional form if you want to make accurate predictions. But if you only want to detect nonlinearity a polynomial is fine.

> I suggest least squares regression to a polynomial.

> With one more point you could fit a third order polynomial.
I am not sure I understand. I do want to set very accurate bounds on the non-linearity. Basically, I want to describe my points by ##y=ax+b+g(x)##, with ##g(x) \ll ax, b##. From there I want to set constraints on ##g(x)## that are as tight as possible. If I use a polynomial, won't that influence how tight the constraints are? On a more practical note, all the papers on this topic use this area method, so I assume that if polynomials worked they would have used them. But given that the literature uses areas, I would still like to find the answer to my question for the case of using areas to define non-linearity.

> You mentioned area. Area==zero. That is how to test for collinearity of points:
> https://www.geeksforgeeks.org/program-check-three-points-collinear/
I am not sure I understand what you mean. Of course the area is 0 for collinear points. But in practice the points won't lie exactly on a straight line, because of experimental errors. So the area will be of the form ##3 \pm 5##, which is not zero but is consistent with zero within the error. My question is: if I add one more point and calculate the area formed by these 4 points, what do I gain compared to having only 3 points? Sending me links to statistics webpages doesn't help me. I know the basics; I just don't know how to apply them to my problem.

> So I think the three points in figure S2 are not necessarily single measurements themselves; instead, each of those points may be the mean of many measurements.
Oh yes, the points in figure S2 are the results of many measurements. In the experiment one measures x and y for a given point several times, then places it on the plot in S2. After measuring 3 such points, we quantify the non-linearity by calculating that area. My question is: if I measure a 4th point, with the same uncertainty as the other 3 points, do I gain anything in constraining the non-linearity compared to the 3 points alone?

> I see. And the paper's statement, "Equivalently, in our geometrical picture it is the volume of the parallelepiped defined by ##\overrightarrow{m\nu}_{1,2}## and ##\overrightarrow{m\mu}##", confirms my view.
>
> Going back to your point: what would you like to do beyond this triplet of vectors? Make a quartet by incorporating another vector? Get a set of triplets from many experiments?
I would like to add another point to figure S2. In terms of the mathematical description of the problem, the vectors will become 4D (now they are 3D).
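With a fourth point, each vector gains a component, and a 3×3 determinant of three 4D vectors is no longer defined. One natural generalization, which is an assumption here and not necessarily what the literature uses, is the volume of the parallelepiped spanned by k vectors in n dimensions: ##\sqrt{\det(M^T M)}##, where M has the vectors as columns. For three 3D vectors this reduces to ##|\det M| = |(A\times B)\cdot C|##. The helper name is hypothetical.

```python
import numpy as np

def parallelepiped_volume(*vectors):
    """k-dimensional volume spanned by k vectors in n dimensions,
    computed as sqrt(det(G)) with the Gram matrix G = M^T M."""
    M = np.column_stack(vectors)
    gram = M.T @ M
    # clip tiny negative values caused by floating-point round-off
    return np.sqrt(max(np.linalg.det(gram), 0.0))

# Same made-up 3D vectors as a triple-product check: volume 12
print(parallelepiped_volume([1, 2, 3], [0.5, -1, 2], [2, 0, 1]))
# Three 4D vectors: a unit cube embedded in 4D has volume 1
print(parallelepiped_volume([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]))
```

Like the 3D determinant, this volume is zero exactly when the vectors are linearly dependent.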