[Fortran90]Problem with Gaussian distribution

Aner · Jul 18, 2015

Hi,
I have a problem in my program and I cannot figure it out.
In the last post I had a problem about some arrays, I perfectly resolved it thanks to you, but now my problem is a little bit subtle. I have a subroutine(here I'll post it has a program )that generates random numbers in order to create errors so that those must have a Gaussian distribution and it is perfect, the program works perfectly, I can have a really nice Gaussian until I centred my points (that is what I must do to associated the errors to my data). After I centred the points I get a weird accumulation of point near x=1 that I can't explain.
Here is the program I'm talking about

Fortran:

Program gauss
implicit none

character(len=6)::gaussn
real,dimension(10000)::m,a,ltot,pron
REAL::x,l
INTEGER::i,n,k,p,j,fn=10000

print*,"scrivere il nome del file"

read*,gaussn   OPEN(UNIT=1,FILE=gaussn)

do f=1,n

read(1,*)m(f)

pron(f)=2*0.1*(m(f)-0.5)
end do
close (unit=1)a=0

l=0.1/700do i=1,np=int(pron(i)/l)

a(p+(n/2+1))=a(p+(n/2+1))+1
!a(p)=a(p)+1
end do

do j=1,n
ltot(j)=-(n/2.0)+j

write(unit=15,fmt=*)ltot(j),a(j)
end do

end program gauss

The file that it reads is a column of data that I generate in my main program. I printed two files, one with the already centred points and one with the not centred ones. Now I would expect those two sets of points to give a Gaussian distribution but a you can see those are pretty different.
This pron(f)=2*0.1*(m(f)-0.5) is the line that I use to centre my points, it's something that we did in class with the teacher. I even thought that maybe it was the minus sign that was bothering fortran and I did some experiments like changing it with a plus and defining m(f) as negative. In the main program if I use the centred points I have a "lack" of precision, if I expect my output to be -3 I'll get -2.993... with an error of order 10^-7 when with the not centred ones I get -3.001... with an error of order 10 ^-3.
I can't understand the problem if there is one or it is just me, maybe I'm missing something.
Thanks in advanced.

gsal · Jul 18, 2015

I don't quite get what you are trying to do...can you try again to better explain what you mean by "centering" the data? In which direction? x? y?

f is just the array index, that's fine. But if m(f) is the y value of your data, what does "-0.5" had to do with centering? It is no like the maximum value is 1.0! If you want to center the data in the y direction (and allow negative values) then, you need to subtract half the maximum value from all data points, not just "half" (0.5). You maximum value is over 200...I don't see how taking 0.5 away from it centers it. In the same line, you divide by 10...are you scaling everything down? and then multiply by 2...scaling again?

In the following loop you take pron and multiply by 7000! what is that about? Then, you take these function values and mix them up with the index variable n, in effect, you are kind of transposing the whole thing using function values as your index...is this on purpose? I am confused.

Aner · Jul 18, 2015

gsal said:

I don't quite get what you are trying to do...can you try again to better explain what you mean by "centering" the data? In which direction? x? y?

f is just the array index, that's fine. But if m(f) is the y value of your data, what does "-0.5" had to do with centering? It is no like the maximum value is 1.0! If you want to center the data in the y direction (and allow negative values) then, you need to subtract half the maximum value from all data points, not just "half" (0.5). You maximum value is over 200...I don't see how taking 0.5 away from it centers it. In the same line, you divide by 10...are you scaling everything down? and then multiply by 2...scaling again?

In the following loop you take pron and multiply by 7000! what is that about? Then, you take these function values and mix them up with the index variable n, in effect, you are kind of transposing the whole thing using function values as your index...is this on purpose? I am confused.

I am sorry for not being clear enough. In the program I posted, m(f) are random numbers with gaussian distribution. In reality I tried to obtain random numbers with a gaussian distribution centred at the origin in two ways, but I always run into the same problem. My first attempt was to compute random numbers with uniform distribution between -0.1 and 0.1 and then to use them in order to create random numbers k(f) with a gaussian distribution centred at the origin. To compute them I used the following code

Fortran:

do i=1,n

       call random_number(x(i))

end do

k(f)=sum(x)/n

where the values of x(i) are random numbers between 0 and 1. The problem is that I would like to have a gaussian distribution centred at 0 and not at 1/2, which should be the centre of the gaussian distribution generated above because it is the distribution of the mean values of random numbers with expectation value 1/2, if I'm not wrong. Moreover, even though the fortran function random_number(x) generates random numbers between 0 and 1, I would like to have them between -0.1 and 0.1. So I needed to multiply the random numbers generated by random_number(x) by 2*0.1, which is the maximum width, and after that I needed to subtract half of the maximum value, which is 0.1, so that my random numbers should be of the form randomnumber=2*0.1*(x-0.5) where x is generated with random_number(x). But in doing so I ran into the problem I described above, because if I only plotted my gaussian distribution obtained with the code above I had no problem, but if I plotted my k(f) calculated with the following code

Fortran:

do i=1,n

       call random_number(x(i)) 
       randomnumber=2*0.1*(x(i)-0.5)

end do

k(f)=sum(randomnumber)/n

i found that weird crazy point. So I tried a second attempt, which is the one I quoted in my first post. This time I computed my k(f) with the first code, not including the line randomnumber=2*0.1*(x(i)-0.5), and then I tried to shift these numbers with gaussian distribution centered at 1/2 in order to obtain random numbers with a gaussian distribution centred at the origin. I thought that the same line could be used in order to shift the data because I still wanted a maximum width of 2*0.1 and still wanted the centre to be 0, but I am not sure of this. In the program above, the m(f) correspond to k(f). However I run again into exactly the same problem.
The second part of the program is used to make the gaussian distribution. I defined "l" to be the width of a single interval, that is I wrote l=0.1/h where h is the number of intervals I want between 0 and 0.1. After that I computed p=int(pron(i)/l). In this way I should be able to compute the integer part of pron(i)/l where pron(i) are my shifted random numbers of the second attempt. Doing so I know that the p^th interval contains one of my random numbers, in this case pron(i) (rigorously speaking it should be the (p+1)^th but I don't think it should change much), and to keep track of this I add 1 to the (p+n/2+1)^th component of the array "a" that I defined to be a=0 before the do cycle. I added n/2+1 to the index of "a" so that, when p=0 I obtain that the random number is assigned to the (n/2+1)^th component. Finally, to do the plot of the gaussian distributions, I defined an index ltot(j) to be ltot(j)=-(n/2.0)+j so that, when I plot the points (a(i), ltot(i)), I should obtain a gaussian distribution centered at the origin. That was what I thought, but obviously there is a problem somewhere. I hope to have been more clear!

gsal · Jul 18, 2015

maybe you should google a bit more...you are not the first one trying to do this.

B the way, the manner to get random numbers within a range is: min + (max - min)*random_number(x) and not that way you are doing it...it is important that the multiplication (scaling) be done while all numbers are still positive as the scaling reduces the magnitude of the number towards zero.

Here is the page of some guy who is nice enough to share his experience; once in that page, search for the word "normal" (which is another name for gaussian distribution).

D H · Jul 19, 2015

gsal said:

B the way, the manner to get random numbers within a range is: min + (max - min)*random_number(x) and not that way you are doing it...it is important that the multiplication (scaling) be done while all numbers are still positive as the scaling reduces the magnitude of the number towards zero.

Here is the page of some guy who is nice enough to share his experience; once in that page, search for the word "normal" (which is another name for gaussian distribution).

I so strongly disagree. The page to which you referenced did things exactly wrong. (Prima facie evidence: That guy uses the original Numerical Recipes ran1. That's been known to be bad since sometime in the previous millennium. There is no reason to use something that isn't at least as good as Mersenne Twister nowadays.) In general, it's much better to use the following to convert a number randomly drawn from U(0,1) to a number randomly drawn from U(min,max) than your min+(max-min)*u_0_1_random_deviate:

Code:

r = u_0_1_random_deviate
u_min_max_random_deviate = max*r + min*(1.0-r)

gsal · Jul 19, 2015

D H:
Well, I am no random number expert and was simply providing a link since the OP could use some examples. I don't exactly know what you mean by Numerical Recipes ran1...the ran1 function in the referenced page does nothing but call Fortran90 instrinsic function random_number().

By the way, the min+(max-min)*u_0_1_random formula is not my formula...in any case, it is exactly the same as yours.

Other than that, forgive me if I don't share your passion, but please keep cool and don't kill the messenger, I am just trying to help.

D H · Jul 20, 2015

gsal said:

By the way, the min+(max-min)*u_0_1_random formula is not my formula...in any case, it is exactly the same as yours.

If you had infinite precision, yes, they would be exactly the same. You don't have infinite precision. Your example uses double precision.

Here's a key problem with using min+(max-min)*u_0_1_random instead of my recommended approach, max*u_0_1_random+min*(1.0d0-u_0_1_random). Suppose u_0_1_random is exactly one. Your version reduces to min+(max-min). On the other hand, my version reduces to max in this case.

There is no guarantee that min+(max-min) is equal to max. There are in fact plenty of pairs of IEEE floating point numbers min and max such that min+(max-min) > max.

gsal · Jul 20, 2015

I see, thanks for the explanation.

D H · Jul 20, 2015

Aner said:
To compute them I used the following code
Fortran:
do i=1,n

       call random_number(x(i))

end do

k(f)=sum(x)/n

Your k(f) does not have a gaussian distribution. It has a Bates distribution, which is close to normal thanks to the central limit theorem and thanks to special properties of the uniform distribution. (It takes a lot more trials for the sum of independent exponentially distributed random variables converge toward a normal distribution than it does for the sum of independent uniformly distributed random variables.)

You can use this as an approximation of a guassian distribution, but you need to account for two things:
- The mean is not zero (it is 1/2)
- The variance is not one (it is 1/(12*n)).

You would have something close to [itex]N(0,1)[/itex] (the standard normal distribution, with mean 0 and variances 1) if you had used k(f)=sqrt(12.0d0*n)*(sum(x)/n-0.5). Converting from [itex]N(0,1)[/itex] to [itex]N(\mu,\sigma)[/itex] is fairly straightforward: Multiply by sigma and add mu.

[Fortran90]Problem with Gaussian distribution

Attachments

Is A.I. more than the sum of its parts?

AI vs. Humans as Processors in an Environment

Sweetspot of data compression

Other than just FizzBuzz to test programmer candidates

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

[Fortran90]Problem with Gaussian distribution

Attachments

Similar threads