Can I Accurately Determine Uncertainty for Data Points with Varying x Values?

In summary, it seems that you would need to imagine adding another datapoint and checking how sensitive the fit is to it, in order to get a better idea of the confidence in the value of y at other x values.
  • #1
aaaa202
I have a lot of measurements of some quantity y as a function of x. No two y_i are taken at the same x_i.
I want to fit some kind of function to all these data points, but I also want an uncertainty on the y_i's. Normally, if I had, say, 10 values of y_i measured at the same x_i, I would calculate their standard deviation and use that as the uncertainty. But since all my y_i are taken at different x_i, I can't do that. How do I assign a meaningful uncertainty to the y_i?
It seems weird that 10 measurements at each of 10 different x_i (100 data points in total) should give me something much less uncertain than 100 measurements at 100 all-different x_i.
 
  • #2
If you postulate some function ##f(x)##, then you have ##f(x_i)##. Then you could consider, for example, ##\sum \left( f(x_i) - y_i \right)^2 ##.

One thing to note is that there is probably some error in ##x_i##, too.
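
For concreteness, here is a minimal sketch of that sum of squares in Python; the data and the straight-line guess for ##f## are made up for illustration, not taken from the thread:

```python
import numpy as np

# Hypothetical measurements: each y_i taken at a different x_i
x = np.array([0.5, 1.2, 2.0, 3.1, 4.4])
y = np.array([1.1, 2.3, 4.2, 6.0, 8.9])

# A postulated model f(x); here a straight line with guessed parameters
def f(x, a=2.0, b=0.1):
    return a * x + b

# Sum of squared residuals between model predictions and measurements
ssr = np.sum((f(x) - y) ** 2)
print(f"sum of squared residuals: {ssr:.3f}")
```

A least-squares fit is simply the choice of parameters that makes this sum as small as possible.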
 
  • #3
I assume that the error in x_i is negligible. The problem is that in the sum above I only have one y_i for each x_i, since all my measurements are taken at different (x_i, y_i) pairs.
 
  • #4
Again: you are fitting a function, which means you have a function. You have the points, the values of the function calculated at those points, and the measured values. That gives you the sum of squares I mentioned above.

Actually, how do you even fit the function in the first place? Aren't you using a least-squares approach?
 
  • #5
It depends whether the form of the function is known (by theory, say) or whether you are making up the form of the function based on the data.
If the form is given, then as voko indicates, there is nothing special about having multiple y values for the same x. The least squares method gives you a fit, and the R value tells you about the uncertainty in the y. It is unnecessary to evaluate the uncertainty at each x separately.
One complication that can arise is when the uncertainty in y is also a function of x (heteroscedasticity). To handle that, you also need a model for that relationship, e.g. linear with x.

If you are inventing the form to match the data, then you need model-selection theory to justify the number of free parameters in your model. Otherwise you may end up fitting a nonsense curve that passes through all the datapoints.
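
As a hedged sketch of the "fit, then read off one global uncertainty" idea (made-up data; scipy assumed available), the residual standard deviation from a least-squares fit gives a single uncertainty estimate for y:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical data: one y measurement per distinct x
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Model form assumed known from theory (here: a straight line)
def model(x, a, b):
    return a * x + b

params, cov = curve_fit(model, x, y)
residuals = y - model(x, *params)

# Residual standard deviation: one global uncertainty estimate for y,
# using N - p degrees of freedom (p = number of fitted parameters)
dof = x.size - len(params)
sigma_y = np.sqrt(np.sum(residuals ** 2) / dof)
print(f"fitted parameters: {params}, residual std: {sigma_y:.3f}")
```

If the scatter in y grows with x (heteroscedasticity), curve_fit accepts per-point uncertainties via its `sigma` argument, so a model of the spread (e.g. proportional to x) can be folded into the fit.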
 
  • #6
Oh okay, I get it now. So I fit my data to my model and it gives a standard deviation based on a least-squares approach. This standard deviation should be my error bars on y_i in the plot, right?
 
  • #8
aaaa202 said:
Oh okay, I get it now. So I fit my data to my model and it gives a standard deviation based on a least-squares approach. This standard deviation should be my error bars on y_i in the plot, right?
Error bars are at their most useful when you have some other basis for estimating different uncertainties for different datapoints.
When you have no such basis, and you are going to show all datapoints on the chart (including multiple y values for the same x value), error bars are of limited value. You could show two other curves, one a standard deviation (from the R value) above and one below. This would just create a constant width band around the mean curve. If there are places where there are multiple y values for the same (or very close together) x values, the 'local uncertainty' is visible from the spread of the y values.
To make the chart less cluttered, it is common to represent multiple y's at the same x by a single datapoint, but put an error bar around it which illustrates the spread of that cluster. (Correspondingly, they can be represented for the purposes of the curve fitting by a single datapoint and a weighting related to the spread. It should produce the same curve as weighting equally and letting each datapoint be counted separately.)
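
To check that equivalence concretely, here is a sketch with hypothetical repeated-x data (numpy only). Note that the exact equivalence with counting each raw point once holds for a count-based weight (√n per cluster); a weight based purely on the spread would give a similar but not identical fit:

```python
import numpy as np

# Hypothetical data with several y values at some x values
x = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 4.0])
y = np.array([2.1, 1.9, 2.0, 3.9, 4.1, 6.2, 7.8, 8.2])

# Straight-line fit to the raw points, each counted equally
coef_raw = np.polyfit(x, y, deg=1)

# Collapse each cluster of equal x to one point: the mean y (the spread of
# the cluster would supply the error bar for the plot)
xu = np.unique(x)
y_mean = np.array([y[x == xi].mean() for xi in xu])
counts = np.array([(x == xi).sum() for xi in xu])

# polyfit's w multiplies the residuals, so w = sqrt(n) per cluster reproduces
# the fit in which every raw point is counted once
coef_clustered = np.polyfit(xu, y_mean, deg=1, w=np.sqrt(counts))

print(coef_raw)        # slope, intercept from the raw points
print(coef_clustered)  # should agree to floating-point precision
```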

Of course, what you'd really like to show is the confidence with which the value of y at other x values can be interpolated. This suggests to me a pair of bounding curves which would not merely be parallel to the mean curve. They would be closer together in regions where the observed y values are close to the mean curve - whether the x values are identical or merely close together. I've never seen this done, but it's an interesting idea. E.g., you could imagine adding another datapoint (x, y), and seeing how sensitive the R value is to the value of y. The more sensitive, the more confident you are of the fitted curve at co-ordinate x.
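
I have not seen the sensitivity construction above implemented either, but a related, standard construction is the pointwise confidence band for a least-squares fit, which is narrower where the data pin the curve down more tightly. A minimal sketch for a straight-line fit, with made-up data (scipy assumed available):

```python
import numpy as np
from scipy import stats

# Hypothetical data: one y per x
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 25))
y = 1.5 * x + 2.0 + rng.normal(scale=0.8, size=x.size)

# Ordinary least-squares straight-line fit
slope, intercept, r_value, p_value, stderr = stats.linregress(x, y)
resid = y - (slope * x + intercept)

n = x.size
dof = n - 2
s = np.sqrt(np.sum(resid ** 2) / dof)   # residual standard deviation
t = stats.t.ppf(0.975, dof)             # 95% two-sided t quantile

# Pointwise 95% confidence band for the fitted mean curve: the half-width
# grows with distance from the mean of the observed x values
xg = np.linspace(x.min(), x.max(), 200)
se_mean = s * np.sqrt(1.0 / n + (xg - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
lower = slope * xg + intercept - t * se_mean
upper = slope * xg + intercept + t * se_mean
```

Plotting `lower` and `upper` around the fitted line gives bounding curves that are not simply parallel to it, which is roughly the behaviour described above.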
 

What is basic data analysis?

Basic data analysis is the process of organizing, cleaning, and interpreting data in order to draw meaningful insights and conclusions.

What are the steps involved in basic data analysis?

The steps involved in basic data analysis typically include: identifying the research question, collecting and organizing the data, cleaning and preparing the data for analysis, performing descriptive statistics and visualizations, and drawing conclusions or making predictions based on the data.

What are some common tools and techniques used in basic data analysis?

Common tools and techniques used in basic data analysis include spreadsheets, statistical software packages, data visualization tools, and programming languages such as Python or R. Some common techniques include regression analysis, hypothesis testing, and data mining.

What is the importance of basic data analysis in scientific research?

Basic data analysis is crucial in scientific research because it allows scientists to make sense of large amounts of data, identify patterns and relationships, and draw evidence-based conclusions. It also helps to ensure the accuracy and reliability of research findings.

How can I improve my basic data analysis skills?

To improve your basic data analysis skills, it is important to practice and familiarize yourself with various data analysis tools and techniques. You can also take online courses, attend workshops or seminars, and work on real-world data analysis projects to gain hands-on experience.
