Matlab data-analysis problems (systematic error, distributions)

lep11 · Oct 20, 2015

Homework Statement

I have a few data-analysis problems due to Thursday,

1. Assume that a sandbag is dropped at different heights and the observations are (z
_i;t_i) pairs. Physical model for a free fall is z=½gt². Assume that the height measurement z has an additive random error v. The observation model is then z=h(θ;t) +v,where θ is the unknown parameter vector. Write the observation models (withpaper and pen) for the following cases by treating systematic errors as model parameters:
a) height z has a systematic error z₀(z_obs=z_real+z₀)
b) falling time thas a systematic error t₀ (tobs=treal+t0)
c) both z and t have additive systematic errors. Simulate results by using Matlab for graphic presentation.

2. Assume that a sandbag is dropped at different heights and the observations are (zi,ti)pairs. It’s known that the clock advances p percents, but the clock can be considered precise (no random error for t). In addition, assume that the height z has a systematic error z₀. Write the (linear) observation model withpaper and pen. How would you solve a value for g by using the model?

4. Simulate with
Matlab the distributions of:
y1=sin(x)
y2=cos(x)
y3=log(x)
when x is uniformly distributed between [−π, π] in Eq. (1) and in Eq. (2), and uniformly distributed
between [1,2] in Eq. (3).

5. The second order polynomial function is of the form
y=p(1)x2+(2)x+p(3). The exact roots of the polynomial are x₁=1 and x₂=2. For some reason, there is a uniformly distributed error in the observed polynomial coefficients p(1), p2) and p(3). Here, we examine the distribution of observed roots in that case. (Matlab simulation)

a) Solve the exact polynomial coefficients p(1), p(2) and p(3) (command poly) and plot the polynomial y
in range x∈[−1,...,4] (command polyval)
b) Generate N= 5000 observations for polynomial coefficients with uniformly distributed noise.
Solve the roots for each observation (command roots).
c) Plot a histogram of observed roots x₁ and x₂

The attempt at a solution

1.
a.) z=h(θ;t) +v +z₀
b.) z=h(θ;t+t₀) +v
c.) z=h(θ;t+t₀) +v+z₀

%% Problem 1/a

g=10; % gravity constant
h_err=2; % systematic error of height
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]';

h=0.5*g*t.^2 + h_err + 10*randn(size(t));

H=[0.5*t.^2 ones(size(t))];

th=H\h

HH=[0.5*tt.^2 ones(size(tt))];

figure(1),clf
plot(t,h,'+',tt,HH*th)
xlabel 't'
ylabel 'h'

%% Problem 1/b

g=10; % gravity constant
t_err=2; % systematic error of time
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]';

h=0.5*g*(t+t_err).^2 + 10*randn(size(t));

H=[0.5*t.^2 ones(size(t))];

th=H\h;

HH=[0.5*tt.^2 ones(size(tt))];

figure(2),clf
plot(t,h,'+',tt,HH*th)
xlabel 't'
ylabel 'h'

%% Problem 1/c

g=10; % gravity constant
h_err=2; % systematic error of height
t_err=0;5; % systematic error of time
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]';

h=0.5*g*(t+t_err).^2 + h_err + 10*randn(size(t));

H=[0.5*t.^2 ones(size(t))];

th=H\h;

HH=[0.5*tt.^2 ones(size(tt))];

figure(3),clf
plot(t,h,'+',tt,HH*th)
xlabel 't'
ylabel 'h'

%% Problem 2

g=10; % gravity constant
t_err=0;01*t; % systematic error of time, e.g.clock is advancing one percent
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]';

h=0.5*g*(t+t_err).^2 + h_err;

H=[0.5*t.^2 ones(size(t))];

th=H\h

HH=[0.5*tt.^2 ones(size(tt))];

figure(4),clf
plot(t,h,'+',tt,HH*th)
xlabel 't'
ylabel 'h'

%% Problem 4

x_1=unifdist(-pi,pi,2,1);

y_1=sin (x_1);
y_2=cos (x_1);x_3=unifdist(1,2,2,1);
y_3=log (x_3);

I will appreciate any help as I am not very familiar with MATLAB yet \\ could sb please check what I've done so far
p.s. not sure if this is the right forum section for matlab-based questions

RUber · Oct 20, 2015

There is a lot in this post.
Let me start with 1a.
You have an additive random error in h, and a systematic (constant) error in h, right?
h=0.5*g*t.^2 + h_err + 10*randn(size(t));
You have 10 * randn, which gives you a normally distributed error with standard deviation of 10. Is that v? That seems really high for an additive random error, but I can work with it.

In 1b, you are using (time + time_error) to compute height. This does not take into consideration that you have data collected for z, right? Is that supposed to be the case?
---
I suppose I am confused about what your output is supposed to represent. Are the red curves your 2nd degree polynomial best fit curves for the data, z_obs, t_obs? That would make the most sense...Maybe adding a second curve to show the true relation might be helpful as well for visualizing the impact of these errors.

The way I see this is:
Given data ## (z_i, t_i), ## and model ##z =\frac12 g t^2 = h(\theta,t)## You have a model (based on accurate measurements of z and t): ##z_{obs} = \frac12 g t_{true}^2 + v## where v is your random error ##v \sim norm(0,10)##.
I used the 10 as the standard deviation based on your model.

If you add in measurement error in z_i, then your model becomes
##z_{true} =\frac12 g t^2##, and ##z_{obs} = z_{true} + v + z_0## so...
##z_{obs} = \frac12 g t_{true}^2 + v+ z_0##

If you add in measurement error in t_i, then what happens?
##z_{true} =\frac12 g t_{true}^2##
##z_{obs} = \frac12 g t_{true}^2 + v+ z_0##
But what is t_true? ##t_{obs} = t_{true}+t_0##
##z_{obs} = \frac12 g (t_{obs}-t_0 )^2 + v+ z_0##
If you expand that out, you get:
##z_{obs} = \frac12 g (t_{obs}^2-2t_{obs}t_0+t_0^2 ) + v+ z_0##
This should make sense, because your data for z should not be impacted by your error in measuring time.
Make the plot of z_obs vs. t_obs, that should be your observational model.
What should be changed is maybe your estimation of the curve, since all your times are shifted right by 2 (using your constant error of 2).

lep11 · Oct 20, 2015

RUber said:

There is a lot in this post.
Let me start with 1a.
You have an additive random error in h, and a systematic (constant) error in h, right?
h=0.5*g*t.^2 + h_err + 10*randn(size(t));
You have 10 * randn, which gives you a normally distributed error with standard deviation of 10. Is that v? That seems really high for an additive random error, but I can work with it.

Yes, v is 'random error' which is normally distributed. I can make it's deviation smaller so let it be 1.

RUber said:

In 1b, you are using (time + time_error) to compute height. This does not take into consideration that you have data collected for z, right? Is that supposed to be the case?

I think so.

RUber said:

I suppose I am confused about what your output is supposed to represent. Are the red curves your 2nd degree polynomial best fit curves for the data, z_obs, t_obs? That would make the most sense.

They should be. :oldsmile:

RUber said:

What should be changed is maybe your estimation of the curve, since all your times are shifted right by 2 (using your constant error of 2).

How should I change it?

RUber · Oct 20, 2015

Your t data set in Matlab is the true t, right?
But your observational model should be based off of observed z and observed t.
So, your observed t should b the true t plus your error.
Your observed z will not be affected by any error in t, so you can calculate that with the true t data.
Then you fit your curve to the observed data.

Specifically, I would change your line for h to give you the observed z values ... this should be (for part b)
h=0.5*g*(t).^2 + 10*randn(size(t));
See how it is the true function of t plus the sampling error in measuring the height?

Then your H data is the prediction from the model based on your observed time data...so this should be:
H=[0.5*(t+t_err).^2 ones(size(t))];
See how it is the function of the observed time?

Then in your plots, I would change the + to appear at the observed time and height.
plot(t+t_err,h,'+',tt,HH*th)

But you will notice that your reference line for the fit will end early, so then add t_err to the end of your range for tt.
tt=[0:.1:4+t_err]';

These edits should help you understand how this model is actually working. Apply that to fix 1c.

RUber · Oct 20, 2015

For problem 4, I do not have a function called unifdist in my Matlab version. What do the 2,1 at the end of the function refer to?
I normally use rand() to call a uniform distribution between 0 and 1.
You can adjust this by multiplying by the size of the interval and adding/subtracting to recenter. So for -pi to pi, you would have
x1 = rand(N,1)*2*pi - pi
N is the number of points you want to generate.

Does your unifdist function allow you to specify how many points you want to generate?
What does simulate the distribution mean to you? Is this something that you would sample from later? The hist(y) function will help to visualize the functions' effects on the uniform distribution if you generate enough points to see the pattern.

RUber · Oct 21, 2015

Have you had any luck on this set of problems?

lep11 · Oct 21, 2015

RUber said:

Does your unifdist function allow you to specify how many points you want to generate?

Yes, it is a function I created to pick N uniformly distributed numbers. But now I changed it to
x_1 = rand(N,1)*2*pi - pi
x_2= rand(N,1)*2 - 1.5

RUber said:

Have you had any luck on this set of problems?
ern.

Well, not really. I am almost giving up :/

matlab is giving error messages and only plots one graph

RUber · Oct 21, 2015

Sorry to hear you are so frustrated. I will comment more explicitly on your first two codes to help you see what to do. 1a doesn't need to be changes, I just point out what the lines are doing.
Essentially, your h is your observed z values and H is your estimated z based on t_obs. In 1a, your t_obs is accurate, so no problems. In 1b, you used t_obs in your line for h, and that was a problem.

For 1a, You have:
----
%% Problem 1/a
g=10; % gravity constant
h_err=2; % systematic error of height
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]'; %This is the range of times you want to plot the curve on

h=0.5*g*t.^2 + h_err + 10*randn(size(t)); %This is creating your Z_obs

H=[0.5*t.^2 ones(size(t))]; %This is creating your Z estimates based on t_obs

th=H\h

HH=[0.5*tt.^2 ones(size(tt))]; %This is your best fit curve for the data at points in tt

figure(1),clf
plot(t,h,'+',tt,HH*th) %+ is (observed t, observed z), line is best fit curve.
xlabel 't'
ylabel 'h'
----
I see no real issues with this one.
-----
For 1b, you have:
-----
%% Problem 1/b
g=10; % gravity constant
t_err=2; % systematic error of time
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]'; % t + t_err is your observed time
tt=[0:.1:4]'; % Add t_err to upper bound of this interval for better plot

h=0.5*g*(t+t_err).^2 + 10*randn(size(t)); % This should be 0.5*g*(t).^2 + 10*randn(size(t)); since it is calculating Z_obs

H=[0.5*t.^2 ones(size(t))]; % This should be =[0.5*(t+t_err).^2 ones(size(t))]; since it is calculating estimated z from t_obs.

th=H\h;

HH=[0.5*tt.^2 ones(size(tt))];

figure(2),clf
plot(t,h,'+',tt,HH*th) % Should be plot(t+t_err,h,'+',tt,HH*th) so that + represents (Z_obs, t_obs)
xlabel 't'
ylabel 'h'
_____Do these edits make sense? You should at least get plots from both of these.

lep11 · Oct 21, 2015

yes, now 1 a and b works

RUber · Oct 21, 2015

1c should just be adding back in the error in z_obs from 1a on top of the code for 1b.

h=0.5*g*t.^2 + h_err + 10*randn(size(t)); %This is creating your Z_obs

RUber · Oct 21, 2015

Also, in 1c, your code has the line:
t_err=0;5; % systematic error of time
That semicolon after 0 ends the line...so 5 is doing nothing.

lep11 · Oct 21, 2015

So now it looks like this...

%% Problem 1/a
g=10; % gravity constant
h_err=2; % systematic error of height
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]'; %This is the range of times I want to plot the curve on

h=0.5*g*t.^2 + h_err + 10*randn(size(t)); %This is creating Z_obs

H=[0.5*t.^2 ones(size(t))]; %This is creating Z estimates based on t_obs

th=H\h;

HH=[0.5*tt.^2 ones(size(tt))]; %This is your best fit curve for the data at points in tt

figure(1)
plot(t,h,'+',tt,HH*th) %+ is (observed t, observed z), line is best fit curve.
xlabel 't'
ylabel 'h'

%% Problem 1/b
g=10; % gravity constant
t_err=2; % systematic error of time
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]'; % t + t_err is observed time
tt=[0:.1:4]'+t_err; % Added t_err to upper bound of this interval for better plot

h=0.5*g*(t).^2 + 10*randn(size(t)); % This is calculating Z_obs

H=[0.5*(t+t_err).^2 ones(size(t))]; % This is calculating estimated z from t_obs.

th=H\h;

HH=[0.5*tt.^2 ones(size(tt))];

figure(2)
plot(t+t_err,h,'+',tt,HH*th) %+ represents (Z_obs, t_obs)
xlabel 't'
ylabel 'h'

%% Problem 1/c

g=10; % gravity constant
h_err=2; % systematic error of height
t_err=0.5; % systematic error of time
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]'+t_err; %This is the range of times I want to plot the curve on

h=0.5*g*(t).^2 + h_err + 10*randn(size(t)); %This is calculating Z_obs

H=[0.5*(t+t_err).^2 ones(size(t))]; %This is creating Z estimates based on t_obs

th=H\h;

HH=[0.5*tt.^2 ones(size(tt))];

figure(3)
plot(t+t_err,h,'+',tt,HH*th)
xlabel 't'
ylabel 'h'

%% Problem 2

g=10; % gravity constant
t_err=0.01*t; % systematic error of time, e.g. clock is advancing one percent
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]';%This is the range of times I want to plot the curve on

h=0.5*g*(t).^2 + h_err;

H=[0.5*(t+t_err).^2 ones(size(t))];

th=H\h

HH=[0.5*tt.^2 ones(size(tt))];

figure(4)
plot(t+t_err,h,'+',tt,HH*th)
xlabel 't'
ylabel 'h'

%% Problem 4

x=rand(50,1)*2*pi-pi
y_1=sin (x);
hist(y_1,50)

x=rand(50,1)*2*pi-pi
y_2=cos (x);
hist(y_2,50)

x_2=rand(50,1)*2-1.5;
y_3=log (x_2);
hist(y_3,50)

what does 'clf' do?
when i click run i get only figures 1,2,3,4 where number 4 is a histogram so there is something wrong with them?
and in problem 2 how would I solve a value for g by using the model? least squares method?

RUber · Oct 21, 2015

I have never used clf before.
If you are running all the lines at the same time, you will want to use the
figure( 5 ), figure(6), etc. before your plot command to declare which window you want to plot into. If you are doing them 1 by 1, there is no need.

In this part of 4:
x_2=rand(50,1)*2-1.5;
y_3=log (x_2);
hist(y_3,50)

Your range for x_2 is off. This will generate a uniform distribution between -1.5 and .5, you are looking for uniform on [1,2], right?
x_2=rand(50,1)
Is uniform on [0,1], so just add 1 to shift that interval to the right.
x_2=rand(50,1)+1.

50 bins for your histogram is way too many for 50 data points. Either make much larger x vectors or reduce the bins in your hist(y, bins) command.
g is one of the outputs in "th", check the other models to see.

lep11 · Oct 21, 2015

How about no. 5?

%% Problem 5/a

%y=p(1)x^2+(2)x+p(3) and two known roots are x_1=1, x_2=2
r=[1,2];
p=[2,1,0];

p=poly(r);%% Problem 5/b

p_1=1+rand(5000,1)
p_2=-3+rand(5000,1)
p_3=2+rand(5000,1)

%% Problem 5/c

RUber · Oct 22, 2015

5. The second order polynomial function is of the form
y=p(1)x2+(2)x+p(3). The exact roots of the polynomial are x1=1 and x2=2. For some reason, there is a uniformly distributed error in the observed polynomial coefficients p(1), p2) and p(3). Here, we examine the distribution of observed roots in that case. (Matlab simulation)

a) Solve the exact polynomial coefficients p(1), p(2) and p(3) (command poly) and plot the polynomial y
in range x∈[−1,...,4] (command polyval)
b) Generate N= 5000 observations for polynomial coefficients with uniformly distributed noise.
Solve the roots for each observation (command roots).
c) Plot a histogram of observed roots x1 and x2

lep11 said:

How about no. 5?

%% Problem 5/a

%y=p(1)x^2+(2)x+p(3) and two known roots are x_1=1, x_2=2
r=[1,2];
p=[2,1,0];

p=poly(r);%% Problem 5/b

p_1=1+rand(5000,1)
p_2=-3+rand(5000,1)
p_3=2+rand(5000,1)

%% Problem 5/c

You have the first step right for 5a. I don't know why you have a line that says p=[2,1,0] followed by another line defining p...this is unnecessary.
%Step 1. Input parameters
r = [1,2]; % you did this right.
p = poly(r); % you did this right.
%Step 2. Plot the polynomial
x = linspace(-1,4,500); %linspace allows you to put in endpoints...the 500 is how many points you want, if you leave this blank it will default to 100.
y = polyval(p,x); % plots polynomial with coefficients [p] on points [x].
plot(x,y)

For 5b, you are told to add uniformly distributed noise. Noise is normally centered on the actual values, so I would add 2*k*(rand()-0.5) to put it into the range [-k,k].
You can also make all your p's in one matrix like:
k = .5;
P = ones(5000,1)*p+ 2*k*(rand(5000,3)-0.5);
Make your roots collecting matrix:
Roots = zeros(5000,2); % you will fill this up using the roots command.

I don't know how to take roots on a full matrix, so you may have to use a for loop -- feed one row at a time from P into the roots function.
I found that if your k is too large, you might start to get imaginary roots, so keep it relatively small.

For 5c, the histogram is the same command you did before. Your Roots should come out in a 5000x2 matrix, so separately do hist(Roots(:,1)) and hist(Roots(:,2)).
If you have imaginary roots, shrink k or use the real command to get the nearest real value to the root...hist(real(Roots(:,1))) and hist(real(Roots(:,2))).

lep11 · Oct 24, 2015

Thank you!

Matlab data-analysis problems (systematic error, distributions)

1. What is systematic error in Matlab data analysis?

2. How can systematic error be identified and corrected in Matlab?

3. What are some common distributions used in Matlab data analysis?

4. How can the distribution of data affect the results of a Matlab analysis?

5. How can I ensure the accuracy of my data analysis in Matlab?

Similar threads

Hot Threads

Recent Insights