Fortran [Fortran90]Problem with Gaussian distribution

AI Thread Summary
The discussion revolves around issues with generating a Gaussian distribution in Fortran90 after centering data points. The original poster's method of centering involves subtracting 0.5 from random numbers generated between 0 and 1, which does not effectively center the data around zero. Participants suggest that the correct approach to center data is to subtract half the maximum value of the dataset instead. Additionally, there are concerns about the precision of the generated numbers and the method of scaling random numbers, with recommendations to use a more reliable formula for generating random numbers within a specified range. The conversation highlights the importance of understanding statistical properties when manipulating random distributions.
Aner
Messages
9
Reaction score
0
Hi,
I have a problem in my program and I cannot figure it out.
In the last post I had a problem about some arrays, I perfectly resolved it thanks to you, but now my problem is a little bit subtle. I have a subroutine(here I'll post it has a program )that generates random numbers in order to create errors so that those must have a Gaussian distribution and it is perfect, the program works perfectly, I can have a really nice Gaussian until I centred my points (that is what I must do to associated the errors to my data). After I centred the points I get a weird accumulation of point near x=1 that I can't explain.
Here is the program I'm talking about
Fortran:
Program gauss
implicit none

character(len=6)::gaussn
real,dimension(10000)::m,a,ltot,pron
REAL::x,l
INTEGER::i,n,k,p,j,fn=10000

print*,"scrivere il nome del file"

read*,gaussn   OPEN(UNIT=1,FILE=gaussn)

do f=1,n

read(1,*)m(f)

pron(f)=2*0.1*(m(f)-0.5)
end do
close (unit=1)a=0

l=0.1/700do i=1,np=int(pron(i)/l)

a(p+(n/2+1))=a(p+(n/2+1))+1
!a(p)=a(p)+1
end do

do j=1,n
ltot(j)=-(n/2.0)+j

write(unit=15,fmt=*)ltot(j),a(j)
end do

end program gauss
The file that it reads is a column of data that I generate in my main program. I printed two files, one with the already centred points and one with the not centred ones. Now I would expect those two sets of points to give a Gaussian distribution but a you can see those are pretty different.
This pron(f)=2*0.1*(m(f)-0.5) is the line that I use to centre my points, it's something that we did in class with the teacher. I even thought that maybe it was the minus sign that was bothering fortran and I did some experiments like changing it with a plus and defining m(f) as negative. In the main program if I use the centred points I have a "lack" of precision, if I expect my output to be -3 I'll get -2.993... with an error of order 10-7 when with the not centred ones I get -3.001... with an error of order 10 -3.
I can't understand the problem if there is one or it is just me, maybe I'm missing something.
Thanks in advanced.
 

Attachments

  • gaussnotc.png
    gaussnotc.png
    4.1 KB · Views: 753
  • gausscentred.png
    gausscentred.png
    2.3 KB · Views: 701
Technology news on Phys.org
I don't quite get what you are trying to do...can you try again to better explain what you mean by "centering" the data? In which direction? x? y?

f is just the array index, that's fine. But if m(f) is the y value of your data, what does "-0.5" had to do with centering? It is no like the maximum value is 1.0! If you want to center the data in the y direction (and allow negative values) then, you need to subtract half the maximum value from all data points, not just "half" (0.5). You maximum value is over 200...I don't see how taking 0.5 away from it centers it. In the same line, you divide by 10...are you scaling everything down? and then multiply by 2...scaling again?

In the following loop you take pron and multiply by 7000! what is that about? Then, you take these function values and mix them up with the index variable n, in effect, you are kind of transposing the whole thing using function values as your index...is this on purpose? I am confused.
 
gsal said:
I don't quite get what you are trying to do...can you try again to better explain what you mean by "centering" the data? In which direction? x? y?

f is just the array index, that's fine. But if m(f) is the y value of your data, what does "-0.5" had to do with centering? It is no like the maximum value is 1.0! If you want to center the data in the y direction (and allow negative values) then, you need to subtract half the maximum value from all data points, not just "half" (0.5). You maximum value is over 200...I don't see how taking 0.5 away from it centers it. In the same line, you divide by 10...are you scaling everything down? and then multiply by 2...scaling again?

In the following loop you take pron and multiply by 7000! what is that about? Then, you take these function values and mix them up with the index variable n, in effect, you are kind of transposing the whole thing using function values as your index...is this on purpose? I am confused.
I am sorry for not being clear enough. In the program I posted, m(f) are random numbers with gaussian distribution. In reality I tried to obtain random numbers with a gaussian distribution centred at the origin in two ways, but I always run into the same problem. My first attempt was to compute random numbers with uniform distribution between -0.1 and 0.1 and then to use them in order to create random numbers k(f) with a gaussian distribution centred at the origin. To compute them I used the following code
Fortran:
do i=1,n

       call random_number(x(i))

end do

k(f)=sum(x)/n

where the values of x(i) are random numbers between 0 and 1. The problem is that I would like to have a gaussian distribution centred at 0 and not at 1/2, which should be the centre of the gaussian distribution generated above because it is the distribution of the mean values of random numbers with expectation value 1/2, if I'm not wrong. Moreover, even though the fortran function random_number(x) generates random numbers between 0 and 1, I would like to have them between -0.1 and 0.1. So I needed to multiply the random numbers generated by random_number(x) by 2*0.1, which is the maximum width, and after that I needed to subtract half of the maximum value, which is 0.1, so that my random numbers should be of the form randomnumber=2*0.1*(x-0.5) where x is generated with random_number(x). But in doing so I ran into the problem I described above, because if I only plotted my gaussian distribution obtained with the code above I had no problem, but if I plotted my k(f) calculated with the following code
Fortran:
do i=1,n

       call random_number(x(i)) 
       randomnumber=2*0.1*(x(i)-0.5)

end do

k(f)=sum(randomnumber)/n

i found that weird crazy point. So I tried a second attempt, which is the one I quoted in my first post. This time I computed my k(f) with the first code, not including the line randomnumber=2*0.1*(x(i)-0.5), and then I tried to shift these numbers with gaussian distribution centered at 1/2 in order to obtain random numbers with a gaussian distribution centred at the origin. I thought that the same line could be used in order to shift the data because I still wanted a maximum width of 2*0.1 and still wanted the centre to be 0, but I am not sure of this. In the program above, the m(f) correspond to k(f). However I run again into exactly the same problem.
The second part of the program is used to make the gaussian distribution. I defined "l" to be the width of a single interval, that is I wrote l=0.1/h where h is the number of intervals I want between 0 and 0.1. After that I computed p=int(pron(i)/l). In this way I should be able to compute the integer part of pron(i)/l where pron(i) are my shifted random numbers of the second attempt. Doing so I know that the pth interval contains one of my random numbers, in this case pron(i) (rigorously speaking it should be the (p+1)th but I don't think it should change much), and to keep track of this I add 1 to the (p+n/2+1)th component of the array "a" that I defined to be a=0 before the do cycle. I added n/2+1 to the index of "a" so that, when p=0 I obtain that the random number is assigned to the (n/2+1)th component. Finally, to do the plot of the gaussian distributions, I defined an index ltot(j) to be ltot(j)=-(n/2.0)+j so that, when I plot the points (a(i), ltot(i)), I should obtain a gaussian distribution centered at the origin. That was what I thought, but obviously there is a problem somewhere. I hope to have been more clear!
 
maybe you should google a bit more...you are not the first one trying to do this.

B the way, the manner to get random numbers within a range is: min + (max - min)*random_number(x) and not that way you are doing it...it is important that the multiplication (scaling) be done while all numbers are still positive as the scaling reduces the magnitude of the number towards zero.

Here is the page of some guy who is nice enough to share his experience; once in that page, search for the word "normal" (which is another name for gaussian distribution).
 
gsal said:
B the way, the manner to get random numbers within a range is: min + (max - min)*random_number(x) and not that way you are doing it...it is important that the multiplication (scaling) be done while all numbers are still positive as the scaling reduces the magnitude of the number towards zero.

Here is the page of some guy who is nice enough to share his experience; once in that page, search for the word "normal" (which is another name for gaussian distribution).
I so strongly disagree. The page to which you referenced did things exactly wrong. (Prima facie evidence: That guy uses the original Numerical Recipes ran1. That's been known to be bad since sometime in the previous millennium. There is no reason to use something that isn't at least as good as Mersenne Twister nowadays.) In general, it's much better to use the following to convert a number randomly drawn from U(0,1) to a number randomly drawn from U(min,max) than your min+(max-min)*u_0_1_random_deviate:
Code:
r = u_0_1_random_deviate
u_min_max_random_deviate = max*r + min*(1.0-r)
 
D H:
Well, I am no random number expert and was simply providing a link since the OP could use some examples. I don't exactly know what you mean by Numerical Recipes ran1...the ran1 function in the referenced page does nothing but call Fortran90 instrinsic function random_number().

By the way, the min+(max-min)*u_0_1_random formula is not my formula...in any case, it is exactly the same as yours.

Other than that, forgive me if I don't share your passion, but please keep cool and don't kill the messenger, I am just trying to help.
 
gsal said:
By the way, the min+(max-min)*u_0_1_random formula is not my formula...in any case, it is exactly the same as yours.
If you had infinite precision, yes, they would be exactly the same. You don't have infinite precision. Your example uses double precision.

Here's a key problem with using min+(max-min)*u_0_1_random instead of my recommended approach, max*u_0_1_random+min*(1.0d0-u_0_1_random). Suppose u_0_1_random is exactly one. Your version reduces to min+(max-min). On the other hand, my version reduces to max in this case.

There is no guarantee that min+(max-min) is equal to max. There are in fact plenty of pairs of IEEE floating point numbers min and max such that min+(max-min) > max.
 
I see, thanks for the explanation.
 
Aner said:
To compute them I used the following code
Fortran:
do i=1,n

       call random_number(x(i))

end do

k(f)=sum(x)/n
Your k(f) does not have a gaussian distribution. It has a Bates distribution, which is close to normal thanks to the central limit theorem and thanks to special properties of the uniform distribution. (It takes a lot more trials for the sum of independent exponentially distributed random variables converge toward a normal distribution than it does for the sum of independent uniformly distributed random variables.)

You can use this as an approximation of a guassian distribution, but you need to account for two things:
- The mean is not zero (it is 1/2)
- The variance is not one (it is 1/(12*n)).

You would have something close to N(0,1) (the standard normal distribution, with mean 0 and variances 1) if you had used k(f)=sqrt(12.0d0*n)*(sum(x)/n-0.5). Converting from N(0,1) to N(\mu,\sigma) is fairly straightforward: Multiply by sigma and add mu.
 

Similar threads

Replies
5
Views
5K
Replies
4
Views
2K
Replies
8
Views
2K
Replies
12
Views
1K
Replies
20
Views
3K
Back
Top