Matlab data-analysis problems (systematic error, distributions)

In summary: Could you please give a bit more details?Oh I see you have the errors in the wrong place.##z_{obs} = \frac12 g t_{true}^2 + v+ z_0##I think.But I am not sure. It is not clear from the problem statement.I will try again later.It is not clear from the problem statement.I will try again later.Thank you so much. I really appreciate it!
  • #1
lep11
380
7
Homework Statement

I have a few data-analysis problems due to Thursday,

1. Assume that a sandbag is dropped at different heights and the observations are (z
i;ti) pairs. Physical model for a free fall is z=½gt2. Assume that the height measurement z has an additive random error v. The observation model is then z=h(θ;t) +v,where θ is the unknown parameter vector. Write the observation models (withpaper and pen) for the following cases by treating systematic errors as model parameters:
a) height z has a systematic error z0 (zobs=zreal+z0)
b) falling time thas a systematic error t0 (tobs=treal+t0)
c) both z and t have additive systematic errors. Simulate results by using Matlab for graphic presentation.

2. Assume that a sandbag is dropped at different heights and the observations are (zi,ti)pairs. It’s known that the clock advances p percents, but the clock can be considered precise (no random error for t). In addition, assume that the height z has a systematic error z0. Write the (linear) observation model withpaper and pen. How would you solve a value for g by using the model?

4.
Simulate with
Matlab the distributions of:
y1=sin(x)
y2=cos(x)
y3=log(x)
when x is uniformly distributed between [−π, π] in Eq. (1) and in Eq. (2), and uniformly distributed
between [1,2] in Eq. (3).

5. The second order polynomial function is of the form
y=p(1)x2+(2)x+p(3). The exact roots of the polynomial are x1=1 and x2=2. For some reason, there is a uniformly distributed error in the observed polynomial coefficients p(1), p2) and p(3). Here, we examine the distribution of observed roots in that case. (Matlab simulation)

a) Solve the exact polynomial coefficients p(1), p(2) and p(3) (command poly) and plot the polynomial y
in range x∈[−1,...,4] (command polyval)
b) Generate N= 5000 observations for polynomial coefficients with uniformly distributed noise.
Solve the roots for each observation (command roots).
c) Plot a histogram of observed roots x1 and x2

The attempt at a solution

1.
a.)
z=h(θ;t) +v +z0
b.) z=h(θ;t+t0) +v
c.) z=h(θ;t+t0) +v+z0

%% Problem 1/a

g=10; % gravity constant
h_err=2; % systematic error of height
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]';

h=0.5*g*t.^2 + h_err + 10*randn(size(t));

H=[0.5*t.^2 ones(size(t))];

th=H\h

HH=[0.5*tt.^2 ones(size(tt))];

figure(1),clf
plot(t,h,'+',tt,HH*th)
xlabel 't'
ylabel 'h'

%% Problem 1/b

g=10; % gravity constant
t_err=2; % systematic error of time
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]';

h=0.5*g*(t+t_err).^2 + 10*randn(size(t));

H=[0.5*t.^2 ones(size(t))];

th=H\h;

HH=[0.5*tt.^2 ones(size(tt))];

figure(2),clf
plot(t,h,'+',tt,HH*th)
xlabel 't'
ylabel 'h'

%% Problem 1/c


g=10; % gravity constant
h_err=2; % systematic error of height
t_err=0;5; % systematic error of time
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]';

h=0.5*g*(t+t_err).^2 + h_err + 10*randn(size(t));

H=[0.5*t.^2 ones(size(t))];

th=H\h;

HH=[0.5*tt.^2 ones(size(tt))];

figure(3),clf
plot(t,h,'+',tt,HH*th)
xlabel 't'
ylabel 'h'

%% Problem 2

g=10; % gravity constant
t_err=0;01*t; % systematic error of time, e.g.clock is advancing one percent
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]';

h=0.5*g*(t+t_err).^2 + h_err;

H=[0.5*t.^2 ones(size(t))];

th=H\h

HH=[0.5*tt.^2 ones(size(tt))];

figure(4),clf
plot(t,h,'+',tt,HH*th)
xlabel 't'
ylabel 'h'

%% Problem 4

x_1=unifdist(-pi,pi,2,1);

y_1=sin (x_1);
y_2=cos (x_1);x_3=unifdist(1,2,2,1);
y_3=log (x_3);

I will appreciate any help as I am not very familiar with MATLAB yet \\ could sb please check what I've done so far
p.s. not sure if this is the right forum section for matlab-based questions
 
Last edited:
Physics news on Phys.org
  • #2
There is a lot in this post.
Let me start with 1a.
You have an additive random error in h, and a systematic (constant) error in h, right?
h=0.5*g*t.^2 + h_err + 10*randn(size(t));
You have 10 * randn, which gives you a normally distributed error with standard deviation of 10. Is that v? That seems really high for an additive random error, but I can work with it.

In 1b, you are using (time + time_error) to compute height. This does not take into consideration that you have data collected for z, right? Is that supposed to be the case?
---
I suppose I am confused about what your output is supposed to represent. Are the red curves your 2nd degree polynomial best fit curves for the data, z_obs, t_obs? That would make the most sense...Maybe adding a second curve to show the true relation might be helpful as well for visualizing the impact of these errors.

The way I see this is:
Given data ## (z_i, t_i), ## and model ##z =\frac12 g t^2 = h(\theta,t)## You have a model (based on accurate measurements of z and t): ##z_{obs} = \frac12 g t_{true}^2 + v## where v is your random error ##v \sim norm(0,10)##.
I used the 10 as the standard deviation based on your model.

If you add in measurement error in z_i, then your model becomes
##z_{true} =\frac12 g t^2##, and ##z_{obs} = z_{true} + v + z_0## so...
##z_{obs} = \frac12 g t_{true}^2 + v+ z_0##

If you add in measurement error in t_i, then what happens?
##z_{true} =\frac12 g t_{true}^2##
##z_{obs} = \frac12 g t_{true}^2 + v+ z_0##
But what is t_true? ##t_{obs} = t_{true}+t_0##
##z_{obs} = \frac12 g (t_{obs}-t_0 )^2 + v+ z_0##
If you expand that out, you get:
##z_{obs} = \frac12 g (t_{obs}^2-2t_{obs}t_0+t_0^2 ) + v+ z_0##
This should make sense, because your data for z should not be impacted by your error in measuring time.
Make the plot of z_obs vs. t_obs, that should be your observational model.
What should be changed is maybe your estimation of the curve, since all your times are shifted right by 2 (using your constant error of 2).
 
  • #3
RUber said:
There is a lot in this post.
Let me start with 1a.
You have an additive random error in h, and a systematic (constant) error in h, right?
h=0.5*g*t.^2 + h_err + 10*randn(size(t));
You have 10 * randn, which gives you a normally distributed error with standard deviation of 10. Is that v? That seems really high for an additive random error, but I can work with it.
Yes, v is 'random error' which is normally distributed. I can make it's deviation smaller so let it be 1.
RUber said:
In 1b, you are using (time + time_error) to compute height. This does not take into consideration that you have data collected for z, right? Is that supposed to be the case?
I think so.
RUber said:
I suppose I am confused about what your output is supposed to represent. Are the red curves your 2nd degree polynomial best fit curves for the data, z_obs, t_obs? That would make the most sense.
They should be. :oldsmile:

RUber said:
What should be changed is maybe your estimation of the curve, since all your times are shifted right by 2 (using your constant error of 2).
How should I change it?
 
  • #4
Your t data set in Matlab is the true t, right?
But your observational model should be based off of observed z and observed t.
So, your observed t should b the true t plus your error.
Your observed z will not be affected by any error in t, so you can calculate that with the true t data.
Then you fit your curve to the observed data.

Specifically, I would change your line for h to give you the observed z values ... this should be (for part b)
h=0.5*g*(t).^2 + 10*randn(size(t));
See how it is the true function of t plus the sampling error in measuring the height?

Then your H data is the prediction from the model based on your observed time data...so this should be:
H=[0.5*(t+t_err).^2 ones(size(t))];
See how it is the function of the observed time?

Then in your plots, I would change the + to appear at the observed time and height.
plot(t+t_err,h,'+',tt,HH*th)

But you will notice that your reference line for the fit will end early, so then add t_err to the end of your range for tt.
tt=[0:.1:4+t_err]';

These edits should help you understand how this model is actually working. Apply that to fix 1c.
 
  • #5
For problem 4, I do not have a function called unifdist in my Matlab version. What do the 2,1 at the end of the function refer to?
I normally use rand() to call a uniform distribution between 0 and 1.
You can adjust this by multiplying by the size of the interval and adding/subtracting to recenter. So for -pi to pi, you would have
x1 = rand(N,1)*2*pi - pi
N is the number of points you want to generate.

Does your unifdist function allow you to specify how many points you want to generate?
What does simulate the distribution mean to you? Is this something that you would sample from later? The hist(y) function will help to visualize the functions' effects on the uniform distribution if you generate enough points to see the pattern.
 
  • #6
Have you had any luck on this set of problems?
 
  • #7
RUber said:
Does your unifdist function allow you to specify how many points you want to generate?
Yes, it is a function I created to pick N uniformly distributed numbers. But now I changed it to
x_1 = rand(N,1)*2*pi - pi
x_2= rand(N,1)*2 - 1.5

RUber said:
Have you had any luck on this set of problems?
ern.
Well, not really. I am almost giving up :/

matlab is giving error messages and only plots one graph
 
  • #8
Sorry to hear you are so frustrated. I will comment more explicitly on your first two codes to help you see what to do. 1a doesn't need to be changes, I just point out what the lines are doing.
Essentially, your h is your observed z values and H is your estimated z based on t_obs. In 1a, your t_obs is accurate, so no problems. In 1b, you used t_obs in your line for h, and that was a problem.

For 1a, You have:
----
%% Problem 1/a
g=10; % gravity constant
h_err=2; % systematic error of height
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]'; %This is the range of times you want to plot the curve on

h=0.5*g*t.^2 + h_err + 10*randn(size(t)); %This is creating your Z_obs

H=[0.5*t.^2 ones(size(t))]; %This is creating your Z estimates based on t_obs

th=H\h

HH=[0.5*tt.^2 ones(size(tt))]; %This is your best fit curve for the data at points in tt

figure(1),clf
plot(t,h,'+',tt,HH*th) %+ is (observed t, observed z), line is best fit curve.
xlabel 't'
ylabel 'h'
----
I see no real issues with this one.
-----
For 1b, you have:
-----
%% Problem 1/b
g=10; % gravity constant
t_err=2; % systematic error of time
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]'; % t + t_err is your observed time
tt=[0:.1:4]'; % Add t_err to upper bound of this interval for better plot

h=0.5*g*(t+t_err).^2 + 10*randn(size(t)); % This should be 0.5*g*(t).^2 + 10*randn(size(t)); since it is calculating Z_obs

H=[0.5*t.^2 ones(size(t))]; % This should be =[0.5*(t+t_err).^2 ones(size(t))]; since it is calculating estimated z from t_obs.

th=H\h;

HH=[0.5*tt.^2 ones(size(tt))];

figure(2),clf
plot(t,h,'+',tt,HH*th) % Should be plot(t+t_err,h,'+',tt,HH*th) so that + represents (Z_obs, t_obs)
xlabel 't'
ylabel 'h'
_____Do these edits make sense? You should at least get plots from both of these.
 
  • #9
yes, now 1 a and b works
 
Last edited:
  • #10
1c should just be adding back in the error in z_obs from 1a on top of the code for 1b.

h=0.5*g*t.^2 + h_err + 10*randn(size(t)); %This is creating your Z_obs
 
  • #11
Also, in 1c, your code has the line:
t_err=0;5; % systematic error of time
That semicolon after 0 ends the line...so 5 is doing nothing.
 
  • #12
So now it looks like this...

%% Problem 1/a
g=10; % gravity constant
h_err=2; % systematic error of height
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]'; %This is the range of times I want to plot the curve on

h=0.5*g*t.^2 + h_err + 10*randn(size(t)); %This is creating Z_obs

H=[0.5*t.^2 ones(size(t))]; %This is creating Z estimates based on t_obs

th=H\h;

HH=[0.5*tt.^2 ones(size(tt))]; %This is your best fit curve for the data at points in tt

figure(1)
plot(t,h,'+',tt,HH*th) %+ is (observed t, observed z), line is best fit curve.
xlabel 't'
ylabel 'h'

%% Problem 1/b
g=10; % gravity constant
t_err=2; % systematic error of time
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]'; % t + t_err is observed time
tt=[0:.1:4]'+t_err; % Added t_err to upper bound of this interval for better plot

h=0.5*g*(t).^2 + 10*randn(size(t)); % This is calculating Z_obs

H=[0.5*(t+t_err).^2 ones(size(t))]; % This is calculating estimated z from t_obs.

th=H\h;

HH=[0.5*tt.^2 ones(size(tt))];

figure(2)
plot(t+t_err,h,'+',tt,HH*th) %+ represents (Z_obs, t_obs)
xlabel 't'
ylabel 'h'

%% Problem 1/c


g=10; % gravity constant
h_err=2; % systematic error of height
t_err=0.5; % systematic error of time
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]'+t_err; %This is the range of times I want to plot the curve on

h=0.5*g*(t).^2 + h_err + 10*randn(size(t)); %This is calculating Z_obs

H=[0.5*(t+t_err).^2 ones(size(t))]; %This is creating Z estimates based on t_obs

th=H\h;

HH=[0.5*tt.^2 ones(size(tt))];

figure(3)
plot(t+t_err,h,'+',tt,HH*th)
xlabel 't'
ylabel 'h'

%% Problem 2


g=10; % gravity constant
t_err=0.01*t; % systematic error of time, e.g. clock is advancing one percent
t=[0.3 0.5 0.7 1 2 2.5 3 3.7 3.8 4]';
tt=[0:.1:4]';%This is the range of times I want to plot the curve on

h=0.5*g*(t).^2 + h_err;

H=[0.5*(t+t_err).^2 ones(size(t))];

th=H\h

HH=[0.5*tt.^2 ones(size(tt))];

figure(4)
plot(t+t_err,h,'+',tt,HH*th)
xlabel 't'
ylabel 'h'

%% Problem 4


x=rand(50,1)*2*pi-pi
y_1=sin (x);
hist(y_1,50)

x=rand(50,1)*2*pi-pi
y_2=cos (x);
hist(y_2,50)

x_2=rand(50,1)*2-1.5;
y_3=log (x_2);
hist(y_3,50)

what does 'clf' do?
when i click run i get only figures 1,2,3,4 where number 4 is a histogram so there is something wrong with them?
and in problem 2 how would I solve a value for g by using the model? least squares method?
 
Last edited:
  • #13
I have never used clf before.
If you are running all the lines at the same time, you will want to use the
figure( 5 ), figure(6), etc. before your plot command to declare which window you want to plot into. If you are doing them 1 by 1, there is no need.

In this part of 4:
x_2=rand(50,1)*2-1.5;
y_3=log (x_2);
hist(y_3,50)

Your range for x_2 is off. This will generate a uniform distribution between -1.5 and .5, you are looking for uniform on [1,2], right?
x_2=rand(50,1)
Is uniform on [0,1], so just add 1 to shift that interval to the right.
x_2=rand(50,1)+1.

50 bins for your histogram is way too many for 50 data points. Either make much larger x vectors or reduce the bins in your hist(y, bins) command.
g is one of the outputs in "th", check the other models to see.
 
  • #14
How about no. 5?

%% Problem 5/a


%y=p(1)x^2+(2)x+p(3) and two known roots are x_1=1, x_2=2
r=[1,2];
p=[2,1,0];

p=poly(r);%% Problem 5/b

p_1=1+rand(5000,1)
p_2=-3+rand(5000,1)
p_3=2+rand(5000,1)

%% Problem 5/c
 
  • #15
5. The second order polynomial function is of the form
y=p(1)x2+(2)x+p(3). The exact roots of the polynomial are x1=1 and x2=2. For some reason, there is a uniformly distributed error in the observed polynomial coefficients p(1), p2) and p(3). Here, we examine the distribution of observed roots in that case. (Matlab simulation)

a) Solve the exact polynomial coefficients p(1), p(2) and p(3) (command poly) and plot the polynomial y
in range x∈[−1,...,4] (command polyval)
b) Generate N= 5000 observations for polynomial coefficients with uniformly distributed noise.
Solve the roots for each observation (command roots).
c) Plot a histogram of observed roots x1 and x2

lep11 said:
How about no. 5?

%% Problem 5/a


%y=p(1)x^2+(2)x+p(3) and two known roots are x_1=1, x_2=2
r=[1,2];
p=[2,1,0];

p=poly(r);%% Problem 5/b

p_1=1+rand(5000,1)
p_2=-3+rand(5000,1)
p_3=2+rand(5000,1)

%% Problem 5/c

You have the first step right for 5a. I don't know why you have a line that says p=[2,1,0] followed by another line defining p...this is unnecessary.
%Step 1. Input parameters
r = [1,2]; % you did this right.
p = poly(r); % you did this right.
%Step 2. Plot the polynomial
x = linspace(-1,4,500); %linspace allows you to put in endpoints...the 500 is how many points you want, if you leave this blank it will default to 100.
y = polyval(p,x); % plots polynomial with coefficients [p] on points [x].
plot(x,y)

For 5b, you are told to add uniformly distributed noise. Noise is normally centered on the actual values, so I would add 2*k*(rand()-0.5) to put it into the range [-k,k].
You can also make all your p's in one matrix like:
k = .5;
P = ones(5000,1)*p+ 2*k*(rand(5000,3)-0.5);
Make your roots collecting matrix:
Roots = zeros(5000,2); % you will fill this up using the roots command.

I don't know how to take roots on a full matrix, so you may have to use a for loop -- feed one row at a time from P into the roots function.
I found that if your k is too large, you might start to get imaginary roots, so keep it relatively small.

For 5c, the histogram is the same command you did before. Your Roots should come out in a 5000x2 matrix, so separately do hist(Roots(:,1)) and hist(Roots(:,2)).
If you have imaginary roots, shrink k or use the real command to get the nearest real value to the root...hist(real(Roots(:,1))) and hist(real(Roots(:,2))).
 
  • #16
Thank you!
 

1. What is systematic error in Matlab data analysis?

Systematic error refers to a consistent bias or inaccuracy in data analysis that occurs due to flaws in the experimental design or measurement process. This type of error can lead to incorrect conclusions and cannot be reduced by increasing the sample size.

2. How can systematic error be identified and corrected in Matlab?

To identify and correct systematic error in Matlab, it is important to carefully review the experimental setup and measurement process. This may involve checking for calibration errors, using control groups, and repeating experiments to ensure consistency. Additionally, statistical methods such as regression analysis can help identify and correct for systematic error.

3. What are some common distributions used in Matlab data analysis?

Some common distributions used in Matlab data analysis include the normal distribution, binomial distribution, and Poisson distribution. These distributions are used to model and analyze various types of data, and can help in understanding the underlying patterns and trends in the data.

4. How can the distribution of data affect the results of a Matlab analysis?

The distribution of data can greatly affect the results of a Matlab analysis. For example, if the data follows a normal distribution, certain statistical tests can be used to make accurate inferences. However, if the data is skewed or does not follow a specific distribution, different methods may need to be used to analyze the data and draw conclusions.

5. How can I ensure the accuracy of my data analysis in Matlab?

To ensure the accuracy of your data analysis in Matlab, it is important to carefully plan and execute experiments, use appropriate statistical methods, and validate results with multiple analyses. It is also helpful to have a clear understanding of the underlying assumptions and limitations of the data and analysis methods being used.

Similar threads

  • Engineering and Comp Sci Homework Help
Replies
1
Views
958
  • Engineering and Comp Sci Homework Help
Replies
2
Views
829
  • Engineering and Comp Sci Homework Help
Replies
3
Views
815
  • Engineering and Comp Sci Homework Help
Replies
3
Views
1K
  • Engineering and Comp Sci Homework Help
Replies
1
Views
890
  • Engineering and Comp Sci Homework Help
Replies
6
Views
864
  • Engineering and Comp Sci Homework Help
Replies
4
Views
1K
  • Engineering and Comp Sci Homework Help
Replies
4
Views
3K
  • Engineering and Comp Sci Homework Help
Replies
3
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
2
Views
1K
Back
Top