Generating data from trendline

  • Thread starter: PixelDictator
  • Tags: Data
AI Thread Summary
Generating random data points that fit a specified trendline with given uncertainties involves creating values based on the line's equation and adding normally distributed noise. The discussion emphasizes the need to clarify whether the goal is to replicate the exact slope and intercept or to generate data that approximates the trendline with random errors. A method is suggested where one can scale a set of values to achieve desired mean and variance, allowing for the recreation of the line and uncertainties. It is also noted that the treatment of independent and dependent variables can affect the results of linear regression. Understanding these nuances is crucial for accurately generating the desired artificial data.
PixelDictator
Hello all,

I am trying to take a fitted line, with given standard error in slope and y-intercept, and generate sets of random data points (and corresponding uncertainties) which would give the same line with the same uncertainties.

I'm at a loss for ways to achieve this, and I'm not quite sure that it would be possible without trying to brute-force it with programming, or something equally ugly... Is there any method that would make this happen? We don't have any original data points, just the few numbers about the trendline.
 
Generate some x-values and transform them according to the equation of your line. Then draw "noise" from a normal distribution centred at zero and add it to the result. In MATLAB, you would do something like this...

x = rand(1,100);        % 100 x-values, uniform on [0,1]
noise = randn(1,100);   % zero-mean, unit-variance normal noise (same as normrnd(0,1,1,100), no toolbox needed)
y = 2*x + 1 + noise;    % the example line y = 2x + 1 with the noise added
 
PixelDictator said:
"...which would give the same line with the same uncertainties."

Do you mean exactly the same line with exactly the same standard deviation for the errors, so that someone fitting a line to the generated data would get exactly the same slope and intercept?

Or do you mean you want to do what Number Nine suggested, which is to assume your line is the correct deterministic part of the equation for the data and then generate the random errors? In that case, someone fitting a line to the generated data might not get exactly the same line as you began with.
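
To illustrate the second option, here is a minimal Python sketch (not from the thread; the line y = 2x + 1, the noise level, the sample size and the helper names `fit_line` and `noisy_line` are all assumptions chosen for illustration). It generates noisy points around a known line and then refits them by ordinary least squares with x as the independent variable.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares of y on x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def noisy_line(slope, intercept, sigma, n, rng):
    """n points with x uniform on [0, 1] and normal noise (s.d. sigma) added to y."""
    xs = [rng.random() for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys

rng = random.Random(0)  # fixed seed so the run is repeatable
xs, ys = noisy_line(2.0, 1.0, 1.0, 100, rng)
print(fit_line(xs, ys))  # near (2.0, 1.0), but generally not equal to it
```

The refitted slope and intercept land near the values you started from, but the random errors shift them a little, which is why this approach cannot reproduce the original line exactly.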
 
Stephen,
I'm attempting to do the former. I've set up a program to do what Number Nine suggested, which works pretty well in the meantime, but it would be a lot better if I had a way to recreate the line and uncertainties perfectly.
 
You can scale a set of values to have whatever mean and standard deviation you want by multiplying it by one constant and adding another. For example, generate a set of values E. Suppose it has mean mu and variance sigma_sq. For constants c and k, create scaled data by setting F = k E + c. The data F has mean = k mu + c and variance = k^2 (sigma_sq). You can solve for the values of k and c that produce the mean and variance that you want.

(In this post I'm talking about variances as "sample variances", which are computed with a denominator of n = the number of data points, not with a denominator of n-1, as in the unbiased estimator for population variance.)
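
The scaling step can be sketched in Python as follows. This is only an illustration: the function name `rescale` and the target values are assumptions, and `statistics.pvariance` is used because it divides by n, matching the convention stated above. Solving k^2 sigma_sq = target variance and k mu + c = target mean gives k and c directly.

```python
import random
from statistics import fmean, pvariance

def rescale(values, target_mean, target_var):
    """Return F = k*E + c with the requested sample mean and divide-by-n variance."""
    mu = fmean(values)
    sigma_sq = pvariance(values, mu)
    k = (target_var / sigma_sq) ** 0.5   # from k^2 * sigma_sq = target_var
    c = target_mean - k * mu             # from k * mu + c = target_mean
    return [k * e + c for e in values]

rng = random.Random(0)  # fixed seed so the run is repeatable
E = [rng.gauss(0.0, 1.0) for _ in range(50)]
F = rescale(E, target_mean=0.0, target_var=4.0)
print(fmean(F), pvariance(F))  # 0.0 and 4.0 up to floating-point rounding
```

Taking the positive square root for k is a choice; the negative root gives the same mean and variance with the values mirrored.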

You are using the ambiguous word "uncertainties", and I can't be sure what quantity or quantities you mean by that.

One interesting technicality about linear least squares regression is that if you fit a line to (x,y) data viewing x as the independent variable, you may get a different line than if you regard y as the independent variable. If you want "artificial" data so that the procedure for linear least squares regression produces a given line when applied to that data, then you must be careful to specify which variable is treated as independent.
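
A small Python sketch of this point, with made-up data (the line y = 2x + 1, the noise level and the helper `cov` are assumptions for illustration). It fits y on x, then fits x on y and rewrites that second line as a slope in the (x, y) plane so the two can be compared.

```python
import random

def cov(us, vs):
    """Sample covariance with a denominator of n; cov(u, u) is the sample variance."""
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / n

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.random() for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]

# x treated as independent: y = slope_y_on_x * x + ...
slope_y_on_x = cov(xs, ys) / cov(xs, xs)

# y treated as independent: x = slope_x_on_y * y + ..., then solved for y
slope_x_on_y = cov(xs, ys) / cov(ys, ys)
slope_from_x_on_y = 1.0 / slope_x_on_y

print(slope_y_on_x, slope_from_x_on_y)  # two different slopes for the same data
```

The product of the two fitted slopes, slope_y_on_x * slope_x_on_y, equals the squared sample correlation, so the two lines coincide only when the points are perfectly collinear.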

Assume x is the independent variable and the artificial data is (x, y) with y = A x + B + F where A and B are constants and the F are artificial "errors" from the trendline. The equations that must be satisfied in order for the linear regression to reproduce A and B when applied to the data are (as I recall):

A = ( cov(x,Ax + B + F))/ var(x)
B = mean of (A x + B + F) - (A)( mean of x).

where the means and variances involved are sample means and variances of the data.

If I'm clear on what you are trying to do then we can check if I got those equations right and solve them for k and c.
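
As a hedged check of the two equations above (this working is not from the thread): since cov(x, Ax + B + F) = A var(x) + cov(x, F), the first equation reduces to cov(x, F) = 0, and since mean of (A x + B + F) = A (mean of x) + B + mean of F, the second reduces to mean of F = 0. One way to build such F, sketched below in Python under those assumptions, is to draw raw noise, subtract its own least squares line in x (the residuals then have zero mean and zero sample covariance with x), and scale by k to set the residual variance. The function name `exact_line_data` and all numbers are hypothetical.

```python
import random

def mean(vs):
    return sum(vs) / len(vs)

def cov(us, vs):
    """Sample covariance with a denominator of n."""
    mean_u, mean_v = mean(us), mean(vs)
    return sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / len(us)

def exact_line_data(A, B, resid_var, xs, rng):
    """Return ys whose least squares fit on xs has slope A and intercept B
    (up to rounding), with residuals of divide-by-n variance resid_var."""
    raw = [rng.gauss(0.0, 1.0) for _ in xs]
    # Remove the part of the raw noise that a line in x could explain.
    b = cov(xs, raw) / cov(xs, xs)
    a = mean(raw) - b * mean(xs)
    resid = [e - (a + b * x) for e, x in zip(raw, xs)]
    # Scale so the errors F = k * resid have the requested variance.
    k = (resid_var / cov(resid, resid)) ** 0.5
    return [A * x + B + k * r for x, r in zip(xs, resid)]

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.uniform(0.0, 10.0) for _ in range(30)]
ys = exact_line_data(2.0, 1.0, 0.25, xs, rng)

slope = cov(xs, ys) / cov(xs, xs)
intercept = mean(ys) - slope * mean(xs)
print(slope, intercept)  # 2.0 and 1.0 up to floating-point rounding
```

Note that in this construction the additive constant c is not free, because the errors must have zero mean; only k remains to set the spread. If "uncertainties" means the standard errors of the slope and intercept, those depend on the x-values as well as on the residual variance, so matching them would also constrain how the x-values are chosen, which this sketch does not attempt.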
 