Register to reply 
Calculating autocorrelation function 
Share this thread: 
#1
Nov2212, 01:00 AM

P: 15

Dear Sir/Madam,
I am posting this question here because my field of research is biophysics and i am doing molecular dynamics (MD) simulation on bilayer system. Well, what I want is some explanations on calculating autocorrelation function using MD simulation data. I have 4000 frames out of my MD simulation. I want to calculate correlation function (correlation constant) for a particular physical property say, x, from each frame through out all time evolution. When I look into literature, there I saw an equation in the form of C(t) = <x(0)x(t)>. 1) What does this equation really tells us? 2) How I can proceed to calculate autocorrelation function for 4000, x values from each frame? I really appreciate any explanation. Thanks 


#2
Nov2212, 07:55 AM

Sci Advisor
Thanks
PF Gold
P: 12,134

In nonmaths terms, a crosscorrelation operation shows the degree of similarity of two functions (same number of samples, of course). It is done in two stages. It first takes a function and, sample by sample, it multiplies each sample by a sample of the other (comparison) function and adds all those answers up. It then shifts one set of samples by one (putting the samples all in a loop) and repeats the last process. The crosscorrelation function is the set of samples that you get when you have shifted right round the loop. The more the two functions have in common, the bigger the 'swings' of the correlation function.
With the autocorrelation function, you do this but with the function itself, rather than a comparison function. An interesting characteristic of 'truly random noise' is that its autocorellation function is zero for all values of offset except zero (i.e. it's a single spike). It's a function that's available in most maths packages and libraries if you don't feel like writing it yourself. 


#3
Nov2212, 11:43 PM

P: 15

Thanks sophiecentaur for your reply.
Well, can I find the code for this autocorrelation in numerical recipes? Could you suggest any site giving examples? 


#4
Nov2312, 12:34 AM

Sci Advisor
P: 2,470

Calculating autocorrelation function
Are your x values for each frame ordered in some way, or random? It will make a difference in how you need to compute the expectation value.



#5
Nov2312, 12:45 AM

P: 15

Dear K^2,
I calculate angle between lipid and bilayer normal from each frame. So I think there is no random data. 


#6
Nov2312, 01:52 AM

Sci Advisor
P: 2,470

Ah, I misread something in original post. Typically, the way you'd estimate autocorrelation from series of measurements labeled x_{i}=x(t_{i}) for i running 1 to N is like this.
[tex]C(t_j) = \sum_{i=1}^{Nj}\frac{(x_i\mu)(x_{i+j}\mu)}{(Nj)\sigma^2}[/tex] Naturally, this assumes that t_{j} are evenly spaced. Keep in mind that if you are using the same data to compute μ and σ, the result is biased. The bias gets significantly worse for high autocorrelation times. So basically, you want to make sure that t_{N} is much greater than maximum t_{j} for which you want to know C(t_{j}). Hopefully, this is what you were looking for. 


#7
Nov2312, 05:43 AM

Sci Advisor
Thanks
PF Gold
P: 12,134

You could post on the Maths part of PF to get the best chance of someone knowing about where to get the software. Good luck. Easy to do for yourself with Basic, though. Just a few FOR (??) loops and some indices. It's not too hard to use the visual basic in Excel. 


#8
Nov2612, 02:00 AM

P: 15

Dear K^2,
Thanks for your reply. I have few questions to ask relating to equation [itex] C(t_j) [/itex], to correct my understanding. 1) Does for every new [itex] t_j [/itex] value (called as correlation time), which increases in step wise after each loop, the calculation needed to be done starting from frame 1 until frame N (in my case a total of 4000 frames)? Means when the [itex] t_j [/itex] increases the number of calculation (samples) will decrease? 2) Is there any limiting value for [itex] t_j [/itex]? 3) Does the [itex] μ [/itex] and [itex] σ [/itex] calcualted once only for all [itex] t_j [/itex]? 4) Does this equation normalized? 


#9
Nov2612, 04:00 AM

Sci Advisor
P: 2,470

Yes, this is the normalized form. The autocorrelation computed this way should vary mostly between 1 and 1. That's what σ² in denominator takes care of.
If you have a better way to estimate μ and σ, use it. If not, yeah, you just compute it once for all t_{j}. That will introduce a bias into your computations, but not much you can do about that. Your t_{j} values are limited by whatever is the length of time over which you took measurements. Basically, t_{N} is duration of experiment. You cannot extend the computations past that point. And yes, as you get closer to that last frame, you have fewer and fewer computation samples, so precision will get worse. It is possible to estimate the uncertainty of C(t_{j}) for each t_{j}, which might tell you how far you can get before the values become unreliable. 


#10
Nov2612, 08:10 AM

P: 610

If you have an FFT package available, the most simple and efficient way to generate an autocorrelation is:
1. Take the FFT of your data 2. Multiply the FFT by the complex conjugate of the FFT 3. Take the inverse FFT of step #2 


#11
Nov2612, 10:04 AM

Sci Advisor
P: 2,470




#12
Nov2612, 04:37 PM

P: 587

FYI,
Both correlation and FFT are freely available in Microsoft Excel (install analysis tookpak). I have used the FFT, I have not used the correlation. 


#13
Nov2712, 03:09 AM

P: 610

Because of the local nature of the data in the windows used with the FFT or DFT, periodicity of the original data is neither assumed nor implied. The values of either the original data or the extension of the Fourier series outside of each data window at each iteration are immaterial  not even evaluated. Therefore the original data need not be periodic.
I'm not sure how considerations of mean and standard deviation enter into the picture. The choice of the size of the data window would certainly affect the results and the FFT approach would probably be exactly numerically equivalent to the original approach only in the case where the data window is the entire data sample. I do know though, that the results of the autocorrelation (implemented by any means) give separate deviation values for different parts of the signal or data. Broadband noise has its own deviation value shown by the height of the spike at the FFT or autocorrelation center point compared to the average height of all other parts of the autocorrelation signal. That allows you to determine a signaltonoise ratio. (Now I see sophiecentaur already mentioned that) 


#14
Nov2712, 12:18 PM

Sci Advisor
P: 2,470

[tex]F_k G_k = \sum_{m=0}^{N1}f_m e^{2i\pi \frac{k m}{N}} \sum_{n=0}^{N1}g_n e^{2i\pi \frac{k n}{N}} = \sum_{n=o}^{N1}\sum_{m=0}^{N1}f_m g_n e^{2i\pi \frac{k (m+n)}{N}}[/tex] The order in which the sum over n is performed is irrelevant, so I can shift that sum. [tex]\sum_{n=o}^{N1}\sum_{m=0}^{N1}f_m g_{nm\hspace{4px}mod N}e^{2i\pi \frac{k (m+nm)}{N}} = \sum_{n=o}^{N1}\sum_{m=0}^{N1}f_m g_{nm\hspace{4px}mod N}e^{2i\pi \frac{k n}{N}}[/tex] Which is a DFT of a circular convolution f*g defined as bellow. [tex](f*g)_n = \sum_{m=0}^{N1}f_m g_{nm\hspace{4px}mod N}[/tex] Note that the convolution theorem for discrete case only works if you can take that modulus. And that means at least one of the two functions, f or g, has to be periodic. The reason you can use convolution theorem with autocorrelations comes from the following definition of the autocorrelation. [tex]C_n = \frac{1}{N}\sum_{m=0}^{N1}f_m \bar{f}_{mn\hspace{4px}mod N}[/tex] So with substitution [itex]g_j = \bar{f}_{j\hspace{4px}mod N}[/itex] you have that inverse DFT of product of DFT of f and complex conjugate of DFT of f is equal to the above defintion of autocorrelation. Note that both the convolution and autocorrelation in this derrivation have to be cyclic, or the entire thing simply does not work. Furthermore, notice that autocorrelation is taken relative to the mean of zero and is not normalized. These are all perfectly good assumptions in signal processing. I assume your background is in DSP based on your name, so it makes perfect sense that you are used to dealing with autocorrelation in this way. The mean is going to be zero to within a DC offset for any signal. Furthermore, even if instead of a periodic signal you are analyzing a pulse, if you take a sufficiently large chunk of time around the pulse, it might as well be periodic. These things don't get in the way of you analyzing a signal. However, these assumptions are not true for an arbitrary statistical random variable you might want to consider for autocorrelation analysis. Since OP is looking at orientations of the lipids, the true mean of the data might have relevance, and it is certainly not cyclic in general. He might be looking at response to a pulse of some sort in which case the cyclic approximation is still valid, but we don't know that. And we don't know whether he has a large enough window for that. So the FFT method might produce erroneous results. 


#15
Nov2812, 02:56 AM

P: 610

I'd like to find a very high level overview of how the statistical considerations can be analyzed through the method of moments, higher order moments, cumulants, correlation matrices, covariance matrices and regression. The baffling thing there is that multidimensional FFT's can be used to generate the higher order moment information. 


#16
Nov2812, 04:25 AM

Sci Advisor
P: 2,470

Ergodicity is a big assumption here, yes. But it's present in both treatments of the problem. Otherwise, instead of having a common mean and variance, each data point has its own. The autocorrelation, then, is not a vector, but a tensor. [tex]C_{ij} = \frac{<(F_i\mu_i)(F_j\mu_j)>}{\sigma_i \sigma_j}[/tex] Here, by F_{i} I mean the random variable that will give me possible values of f_{i}. Naturally, this is a mess, and probably won't tell OP anything useful. Once we assume that all F_{i} are the same, but not necessarily independent, we can simplify this autocorrelation to a vector, with statistics for expectation values being gathered directly from data points of a single run. And like I mentioned earlier, if μ and σ are gathered from the same run, that naturally introduces a bias. But there might be no way around that. Once all these things are given, your approach is entirely valid. You can simply compute f_{i}μ, get autocorrelation from that, and normalize by σ² in the end. And if you were really dealing with a cyclic data set or a short pulse, whose duration is much shorter than your window, then you could do this with FFTs. But we don't know that from OP's description. Either way, the computations required here aren't that heavy. He's not losing much by doing this autocorrelation the brute force way. So it's safer just to do that. 


#17
Dec712, 10:15 PM

P: 15

Dear K^2,
When I calculate the autocorrelation function for 4000 frames with 1800 (tcorr) as for correlation time. I get negative values after about 600 frame and the graph approach towards tcorr=0. In the literature it is stated as 'caging' effect. What is really means by caging effect? Since my simulation is in crystal state ( not liquid crystal as most of the work published), can I accept the negative value is due to the 'caging' effect? Regards 


Register to reply 
Related Discussions  
Autocorrelation function in Excel  Computers  1  
Determining the autocorrelation function  Calculus & Beyond Homework  4  
Autocorrelation function of a signal  Calculus & Beyond Homework  0  
Study the autocorrelation function  Set Theory, Logic, Probability, Statistics  33  
A study for autocorrelation function  Electrical Engineering  0 