Generating data from trendline

  • Context: Undergrad 
  • Thread starter: PixelDictator
  • Tags: Data

Discussion Overview

The discussion revolves around generating random data points that fit a specified trendline, including maintaining the same uncertainties associated with the slope and y-intercept. Participants explore methods to achieve this without original data points, focusing on both theoretical and practical approaches.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant seeks a method to generate random data points that perfectly recreate a given trendline with specified uncertainties, expressing uncertainty about the feasibility of this task without programming.
  • Another participant suggests generating random numbers and adding normally distributed noise to create data that approximates the trendline, but notes that this may not yield the exact slope and intercept.
  • A further reply clarifies the distinction between generating data that exactly matches the original line and generating data that assumes the line is the correct deterministic part, which may lead to different results when fitted.
  • One participant explains how to scale a set of values to achieve a desired mean and variance, emphasizing the importance of defining "uncertainties" clearly and noting potential differences in regression results depending on which variable is treated as independent.
  • Technical details are provided regarding the equations necessary for ensuring that linear regression applied to the generated data reproduces the desired slope and intercept.

Areas of Agreement / Disagreement

Participants express differing views on whether it is possible to generate data that perfectly matches the original trendline and uncertainties. There is no consensus on the best approach, and multiple methods are discussed.

Contextual Notes

Participants highlight the ambiguity in the term "uncertainties" and the implications of treating different variables as independent in regression analysis. The discussion includes technical details that may require further clarification or validation.

PixelDictator
Hello all,

I am trying to take a fitted line, with given standard error in slope and y-intercept, and generate sets of random data points (and corresponding uncertainties) which would give the same line with the same uncertainties.

I'm at a loss for ways to achieve this, and I'm not quite sure that it would be possible without trying to brute-force it with programming, or something equally ugly... Is there any method that would make this happen? We don't have any original data points, just the few numbers about the trendline.
 
Generate some numbers and transform them according to the equation of your line. Then just draw "noise" from a normal distribution centred at zero and add it to your data. In MATLAB, you would do something like this (here for the line y = 2x + 1):

x = rand(1,100);             % 100 x-values, uniform on [0,1]
noise = normrnd(0,1,1,100);  % Gaussian noise, mean 0, sd 1 (normrnd needs the Statistics Toolbox; randn(1,100) is equivalent)
y = 2*x + 1 + noise;         % Points on the line y = 2x + 1, plus noise
 
PixelDictator said:
... which would give the same line with the same uncertainties.

Do you mean exactly the same line, with exactly the same standard deviation for the errors, so that someone fitting a line to the generated data would get exactly the same slope and intercept?

Or do you mean you want to do what Number Nine suggested, which is to assume your line is the correct deterministic part of the equation for the data and then generate the random errors? In that case, someone fitting a line to the generated data might not get exactly the same line as you began with.
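To illustrate the second case, here is a minimal Python sketch (standard library only) that generates data the way the MATLAB snippet earlier in the thread does and then fits a least-squares line to it. The helper name fit_line, the seed, and the sample size are arbitrary choices made for this illustration, not anything taken from the thread.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares of y on x; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
true_slope, true_intercept = 2.0, 1.0  # the line y = 2x + 1 from the MATLAB snippet

# x uniform on [0, 1], y on the line plus Gaussian noise with sd 1
xs = [rng.random() for _ in range(100)]
ys = [true_slope * x + true_intercept + rng.gauss(0.0, 1.0) for x in xs]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # near 2 and 1, but in general not equal to them
```

The fitted values scatter around the line you started from, which is why this approach cannot reproduce the slope and intercept exactly.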
 
Stephen,
I'm attempting to do the former. I've set up a program to do what Number Nine suggested, which works pretty well in the meantime, but it would be a lot better if I had a way to recreate the line and uncertainties perfectly.
 
You can scale a set of values to have whatever mean and standard deviation you want by multiplying it by one constant and adding another. For example, generate a set of values E. Suppose it has mean mu and variance sigma_sq. For constants c and k, create scaled data by setting F = k E + c. The data F has mean k mu + c and variance k^2 sigma_sq. You can solve for the values of k and c that produce the mean and variance that you want.

(In this post I'm treating variances as "sample variances", computed with a denominator of n = the number of data points, not with a denominator of n-1 as in the unbiased estimator of the population variance.)
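A short Python sketch of that scaling step, using the denominator-n variance described above. The function names (mean, variance_n, rescale) and the target values are made up for this example. It takes the positive root for k; the negative root would work equally well.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def variance_n(values):
    """Sample variance with denominator n, as in the post."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def rescale(values, target_mean, target_var):
    """Return F = k*E + c with the requested mean and variance."""
    mu = mean(values)
    sigma_sq = variance_n(values)
    k = math.sqrt(target_var / sigma_sq)  # from k^2 * sigma_sq = target_var
    c = target_mean - k * mu              # from k * mu + c = target_mean
    return [k * v + c for v in values]

rng = random.Random(1)  # arbitrary seed
E = [rng.uniform(-3.0, 5.0) for _ in range(50)]  # any non-constant values will do
F = rescale(E, target_mean=0.0, target_var=4.0)
print(mean(F), variance_n(F))  # 0 and 4, up to floating-point rounding
```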

You are using the ambiguous word "uncertainties", and I can't be sure what quantity or quantities you mean by that.

One interesting technicality about linear least squares regression is that if you fit a line to (x,y) data viewing x as the independent variable, you may get a different line than if you regard y as the independent variable. If you want "artificial" data so that the procedure for linear least squares regression produces a given line when applied to that data, then you must be careful to specify which variable is treated as independent.
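A small Python sketch of that point: the same noisy data is fitted once with x as the independent variable and once with y, and the second fit is rewritten in the form y = slope * x + intercept for comparison. The data, seed, and helper name slope_intercept are assumptions chosen for illustration.

```python
import random

def slope_intercept(us, vs):
    """Least-squares fit of v on u: v = slope * u + intercept."""
    n = len(us)
    mu = sum(us) / n
    mv = sum(vs) / n
    suu = sum((u - mu) ** 2 for u in us)
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    slope = suv / suu
    return slope, mv - slope * mu

rng = random.Random(2)  # arbitrary seed
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 3.0) for x in xs]

# x as the independent variable: y = a*x + b
a, b = slope_intercept(xs, ys)

# y as the independent variable: x = c*y + d, rewritten as y = (1/c)*x - d/c
c, d = slope_intercept(ys, xs)
a_alt, b_alt = 1.0 / c, -d / c

print(a, b)          # line from regressing y on x
print(a_alt, b_alt)  # line from regressing x on y, a steeper slope here
```

The product a * c equals the squared sample correlation, so the two lines coincide only when the points are perfectly collinear.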

Assume x is the independent variable and the artificial data is (x, y) with y = A x + B + F where A and B are constants and the F are artificial "errors" from the trendline. The equations that must be satisfied in order for the linear regression to reproduce A and B when applied to the data are (as I recall):

A = cov(x, A x + B + F) / var(x)
B = mean(A x + B + F) - A mean(x)

where the means and variances involved are sample means and variances of the data.
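Expanding both right-hand sides is a quick way to check these conditions. The sketch below uses only the linearity of the sample mean and covariance, with bars denoting sample means (notation introduced here for brevity):

```latex
\begin{align*}
\frac{\operatorname{cov}(x,\, Ax + B + F)}{\operatorname{var}(x)}
  &= \frac{A\,\operatorname{var}(x) + \operatorname{cov}(x, F)}{\operatorname{var}(x)}
   = A + \frac{\operatorname{cov}(x, F)}{\operatorname{var}(x)}, \\
\overline{Ax + B + F} - A\,\bar{x}
  &= A\,\bar{x} + B + \bar{F} - A\,\bar{x}
   = B + \bar{F}.
\end{align*}
```

So the two equations hold exactly when cov(x, F) = 0 and the mean of F is 0, that is, when the artificial errors are centred and have zero sample covariance with x.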

If I'm clear on what you are trying to do, then we can check whether I got those equations right and solve them for k and c.
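One way this could be carried through is sketched below in Python, under stated assumptions. The two equations above hold exactly when F has sample mean 0 and zero sample covariance with x. Scaling alone (F = k E + c) can fix the mean but not the covariance, because cov(x, k E + c) = k cov(x, E), so the sketch first subtracts from E its own least-squares line in x and then scales by k (with c = 0). It also assumes that "uncertainties" means the usual OLS standard errors of slope and intercept, computed with an n - 2 denominator. Under those formulas, (se_B / se_A)^2 equals the mean of x^2, so the x-values are rescaled to satisfy that. The names fit_with_errors and make_data are hypothetical helpers written for this example.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def fit_with_errors(xs, ys):
    """OLS of y on x. Returns slope, intercept and their standard errors
    (residual variance estimated with the usual n - 2 denominator)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((y - slope * x - intercept) ** 2 for x, y in zip(xs, ys))
    s2 = ssr / (n - 2)
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * sum(x * x for x in xs) / (n * sxx))
    return slope, intercept, se_slope, se_intercept

def make_data(A, B, se_A, se_B, n, rng):
    """Hypothetical helper: data whose y-on-x fit returns A, B, se_A, se_B."""
    # 1. Draw x, then rescale so that mean(x^2) = (se_B / se_A)^2, the ratio
    #    that the OLS standard-error formulas impose on the x-values.
    xs = [rng.uniform(1.0, 2.0) for _ in range(n)]
    r = (se_B / se_A) / math.sqrt(mean([x * x for x in xs]))
    xs = [r * x for x in xs]
    # 2. Draw raw errors E and remove their own least-squares line in x,
    #    leaving F with mean(F) = 0 and cov(x, F) = 0.
    E = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mx, mE = mean(xs), mean(E)
    sxx = sum((x - mx) ** 2 for x in xs)
    g = sum((x - mx) * (e - mE) for x, e in zip(xs, E)) / sxx
    F = [e - mE - g * (x - mx) for x, e in zip(xs, E)]
    # 3. Scale F by k so the residual sum of squares is (n - 2) * se_A^2 * Sxx,
    #    which makes the slope standard error equal se_A.
    k = math.sqrt((n - 2) * se_A ** 2 * sxx / sum(f * f for f in F))
    ys = [A * x + B + k * f for x, f in zip(xs, F)]
    return xs, ys

rng = random.Random(3)  # arbitrary seed; any seed gives the same fitted values
xs, ys = make_data(A=2.0, B=1.0, se_A=0.1, se_B=0.3, n=30, rng=rng)
result = fit_with_errors(xs, ys)
print(result)  # 2.0, 1.0, 0.1, 0.3 up to floating-point rounding
```

Different seeds give different point clouds with the same fitted line and standard errors, which is the "sets of random data points" the original question asked for. If "uncertainties" means something else, for example per-point error bars, steps 1 and 3 would need to change.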
 
