Generating data from trendline

PixelDictator · Feb 6, 2012

Hello all,

I am trying to take a fitted line, with given standard error in slope and y-intercept, and generate sets of random data points (and corresponding uncertainties) which would give the same line with the same uncertainties.

I'm at a loss for ways to achieve this, and I'm not quite sure that it would be possible without trying to brute-force it with programming, or something equally ugly... Is there any method that would make this happen? We don't have any original data points, just the few numbers about the trendline.

Number Nine · Feb 6, 2012

Generate some x-values and transform them according to the equation of your line. Then draw "noise" from a normal distribution centred at zero and add it to the transformed values. In MATLAB, for the example line y = 2x + 1, you would do something like this...

x = rand(1,100);            % 100 x-values, uniform on [0,1]
noise = normrnd(0,1,1,100); % Gaussian noise, mean 0, std 1 (Statistics Toolbox; randn(1,100) is equivalent)
y = 2*x + 1 + noise;        % Apply the line y = 2x + 1, then add the noise

Stephen Tashi · Feb 7, 2012

PixelDictator said:

Hello all, ... which would give the same line with the same uncertainties.

Do you mean exactly the same line with exactly the same standard deviation for the errors, so that someone fitting a line to the generated data would get exactly the same slope and intercept?

Or do you mean you want to do what Number Nine suggested, which is to assume your line is the correct deterministic part of the equation for the data and then generate the random errors? In that case, someone fitting a line to the generated data might not get exactly the same line as you began with.
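To illustrate the second case, here is a minimal sketch that generates data around the example line y = 2x + 1 from the earlier post and then refits it. The sample size and the use of polyfit are assumptions made for illustration. The fitted coefficients will typically be close to, but not equal to, 2 and 1, and they change on every run.

```matlab
% Sketch: generate noisy data around y = 2x + 1, then refit a line.
n = 100;                  % sample size (assumed for illustration)
x = rand(1, n);           % x-values, uniform on [0,1]
noise = randn(1, n);      % standard normal noise (base MATLAB)
y = 2*x + 1 + noise;      % deterministic line plus noise

p = polyfit(x, y, 1);     % p(1) = fitted slope, p(2) = fitted intercept
fprintf('fitted slope %.4f, fitted intercept %.4f\n', p(1), p(2));
```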

PixelDictator · Feb 13, 2012

Stephen,
I'm attempting to do the former. I've set up a program to do what Number Nine suggested, which works pretty well in the meantime, but it would be a lot better if I had a way to recreate the line and uncertainties perfectly.

Stephen Tashi · Feb 13, 2012

You can scale a set of values to have whatever mean and standard deviation you want by multiplying it by one constant and adding another. For example, generate a set of values E. Suppose it has mean mu and variance sigma_sq. For constants c and k, create scaled data by setting F = k E + c. The data F has mean = k mu + c and variance = k^2 (sigma_sq). You can solve for the values of k and c that produce the mean and variance that you want.

(In this post I'm talking about variances as "sample variances", which are computed with a denominator of n = the number of data points, not with a denominator of n-1, as in the unbiased estimator for population variance.)
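The scaling step can be sketched as follows. The target mean and variance are made-up values, and var(E,1) is used because it divides by n, matching the convention above.

```matlab
% Sketch: rescale raw values E so that F = k*E + c has a chosen
% sample mean and sample variance (denominator n).
E = randn(1, 100);               % any raw values will do
target_mean = 0;                 % hypothetical target
target_var  = 4;                 % hypothetical target

mu       = mean(E);
sigma_sq = var(E, 1);            % second argument 1 -> divide by n

k = sqrt(target_var / sigma_sq); % from k^2 * sigma_sq = target_var
c = target_mean - k * mu;        % from k*mu + c = target_mean

F = k*E + c;
fprintf('mean(F) = %.6f, var(F,1) = %.6f\n', mean(F), var(F, 1));
```

Taking the positive root for k is a choice; the negative root gives the same mean and variance.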

You are using the ambiguous word "uncertainties", and I can't be sure what quantity or quantities you mean by that.

One interesting technicality about linear least squares regression is that if you fit a line to (x,y) data viewing x as the independent variable, you may get a different line than if you regard y as the independent variable. If you want "artificial" data so that the procedure for linear least squares regression produces a given line when applied to that data, then you must be careful to specify which variable is treated as independent.
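A small sketch of that technicality, using made-up data around y = 2x + 1: fit y on x, then fit x on y and rewrite the second line in the form y = slope*x + intercept so the two can be compared. The product of the two raw regression slopes equals the squared sample correlation, so the lines coincide only when the points are perfectly collinear.

```matlab
% Sketch: regressing y on x and x on y generally gives different lines.
x = rand(1, 100);
y = 2*x + 1 + randn(1, 100);       % illustrative data only

p_yx = polyfit(x, y, 1);           % y = p_yx(1)*x + p_yx(2)
p_xy = polyfit(y, x, 1);           % x = p_xy(1)*y + p_xy(2)

% Rewrite the x-on-y fit as y = slope*x + intercept for comparison.
slope_from_xy     = 1 / p_xy(1);
intercept_from_xy = -p_xy(2) / p_xy(1);

R = corrcoef(x, y);                % R(1,2) is the sample correlation
fprintf('y on x: slope %.4f, intercept %.4f\n', p_yx(1), p_yx(2));
fprintf('x on y: slope %.4f, intercept %.4f\n', slope_from_xy, intercept_from_xy);
fprintf('slope product %.4f, r^2 %.4f\n', p_yx(1)*p_xy(1), R(1,2)^2);
```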

Assume x is the independent variable and the artificial data is (x, y) with y = A x + B + F, where A and B are constants and the F are artificial "errors" from the trendline. The equations that must be satisfied in order for the linear regression to reproduce A and B when applied to the data are (as I recall):

A = cov(x, A x + B + F) / var(x)
B = mean(A x + B + F) - A mean(x)

where the means and variances involved are sample means and variances of the data.

If I'm clear on what you are trying to do, then we can check whether I got those equations right and solve them for k and c.
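One way to read the two equations: since cov(x, A x + B + F) = A var(x) + cov(x, F), the first holds exactly when cov(x, F) = 0, and the second then holds exactly when mean(F) = 0. Scaling alone (F = k E + c) fixes the mean and variance but not the covariance with x, so the sketch below swaps in a different step: it removes from the raw noise its least squares projection onto [1, x]. It also assumes that "uncertainties" means the usual ordinary least squares standard errors of slope and intercept, which the thread does not confirm. Under that assumption the formulas force mean(x.^2) = (seB/seA)^2, so x is rescaled to satisfy it. All numeric targets are hypothetical, and per-point uncertainties are not modelled.

```matlab
% Sketch: build (x, y) whose OLS fit returns a chosen slope, intercept,
% and (under the usual OLS formulas) chosen standard errors.
A = 2;   B = 1;          % hypothetical target slope and intercept
seA = 0.3; seB = 0.2;    % hypothetical target standard errors
n = 50;                  % hypothetical number of points

% x must satisfy mean(x.^2) = (seB/seA)^2 for both errors to match.
x = rand(n, 1);
x = x * (seB/seA) / sqrt(mean(x.^2));

% Raw noise with its projection onto [1 x] removed, so that
% mean(F) = 0 and cov(x, F) = 0 up to round-off.
X = [ones(n, 1) x];
E = randn(n, 1);
F = E - X * (X \ E);

% Rescale F so the residual standard error s gives seA = s / sqrt(Sxx).
Sxx = sum((x - mean(x)).^2);
s   = seA * sqrt(Sxx);
F   = F * (s * sqrt(n - 2) / norm(F));

y = A*x + B + F;

% Check by refitting.
coef = X \ y;                                   % [intercept; slope]
res  = y - X*coef;
se   = sqrt(sum(res.^2)/(n - 2) * diag(inv(X'*X)));  % [seB; seA]
fprintf('slope %.6f (se %.6f), intercept %.6f (se %.6f)\n', ...
        coef(2), se(2), coef(1), se(1));
```

Different random draws of x and E give different data sets that all reproduce the same four numbers up to floating-point round-off, which is what the original question asked for under the stated assumptions.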
