Gaussian process with linear correlation

In summary, a stationary Gaussian process can be simulated approximately by summing independent uniform random variables over overlapping index windows. The covariance of two such sums is determined by the variables they share, and dividing it by the variance gives the autocorrelation.
  • #1
gordonc4513
Hi
I have a stationary Gaussian process { X_t } with correlation function p(t) = Corr(X_s, X_(s+t)) = 1 - t/m, where X_t is in {1,2,..,m}, and I want to simulate this process.
The main idea is to write X_t as the sum of V_i for i from 1 to N, X_(t+1) as the sum of V_i for i from p+1 to N+p, and so on, where the V_i are uniform random variables on [-a, a]. From this point I will use the central limit theorem to show that X_t is approximately normal. My question is: using this notation, how can I find the correlation and covariance of X_t and X_(t+k), for example, and the values of a and p?
 
  • #2
I don't know why you chose sums of independent uniform random variables. Why not use sums of independent normal random variables? Then you know the sums are normally distributed instead of approximately normally distributed.

A specific example of your question is:
Let [itex] u_1,u_2,u_3, u_4 [/itex] each be independent uniformly distributed random variables on [0,1].

Let [itex] S_1 = u_1 + u_2 + u_3 [/itex]
Let [itex] S_2 = u_2 + u_3 + u_4 [/itex]

Find [itex] COV(S_1,S_2) [/itex].
Find the correlation of [itex] S_1 [/itex] with [itex] S_2 [/itex].

Well, the covariance of two sums of random variables should be easy enough. See
http://mathworld.wolfram.com/Covariance.html equation 21.
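As a hedged worked check of the example above: for sums of independent variables, the covariance reduces to the variance of one variable times the number of variables the two sums share. The sketch below uses exact fractions and the known variance 1/12 of a uniform variable on [0,1]. The helper name `cov_of_sums` and the index-set representation are illustrative assumptions, not something from the thread.

```python
from fractions import Fraction

# Variance of a single uniform random variable on [0, 1].
VAR_U = Fraction(1, 12)

def cov_of_sums(idx_a, idx_b, var=VAR_U):
    """Covariance of sum(u_i for i in idx_a) and sum(u_i for i in idx_b)
    for independent u_i that all have variance `var`."""
    shared = set(idx_a) & set(idx_b)
    return len(shared) * var

s1 = {1, 2, 3}   # S_1 = u_1 + u_2 + u_3
s2 = {2, 3, 4}   # S_2 = u_2 + u_3 + u_4

cov = cov_of_sums(s1, s2)
var_s1 = cov_of_sums(s1, s1)
# Var(S_1) == Var(S_2) here, so the correlation is simply cov / var_s1.
corr = cov / var_s1
print(cov, corr)   # prints: 1/6 2/3
```

Only u_2 and u_3 are shared, so the covariance is 2/12 and the correlation is 2/3.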

To force the autocorrelation to have a particular shape, you could use weighted sums. That would be another argument for using normal random variables for [itex] V_i [/itex].
 
  • #3
Thank you for the answer.

Can you explain in detail how I can use weighted sums to force the autocorrelation, and how I can find all the parameters I need?
 
  • #4
Whether I can explain it depends on your background in mathematics, including probability theory, and on whether I have time! It isn't a good idea to write computer simulations for any serious purpose without having the mathematical background to understand how they work. If you are doing this for recreation, a board game etc., then perhaps it's OK.

To start with, I suggest that you consider the weights represented by constants [itex] k_1, k_2, k_3 [/itex].

Let [itex] S_j = k_1 x_j + k_2 x_{j+1} + k_3 x_{j+2} [/itex]

Compute [itex] Cov(S_j, S_{j+1}), Cov(S_j,S_{j+2}) [/itex] etc. and see how the constants determine those covariances.
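A minimal sketch of that computation, assuming the [itex] x_j [/itex] are independent with a common variance: only terms that multiply the same [itex] x [/itex] survive, so the covariance at a given lag is the variance times a sum of products of shifted weights. The function name `weighted_cov` and the numeric weights are hypothetical choices for illustration.

```python
def weighted_cov(weights, lag, var=1.0):
    """Cov(S_j, S_{j+lag}) where S_j = sum_i weights[i] * x_{j+i}
    and the x's are independent with common variance `var`."""
    lag = abs(lag)
    n = len(weights)
    # x_{j+i} carries weight weights[i] in S_j and weights[i - lag] in
    # S_{j+lag}, so only the overlapping indices contribute.
    return var * sum(weights[i] * weights[i - lag] for i in range(lag, n))

k = [1.0, 2.0, 3.0]   # hypothetical values for k_1, k_2, k_3
for h in range(4):
    print(h, weighted_cov(k, h))
```

With these weights the lag-1 covariance is k_1 k_2 + k_2 k_3 and the lag-2 covariance is k_1 k_3, which shows how the choice of constants shapes the autocorrelation.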
 
  • #5
Yes, I understand what I have to calculate. What I don't understand is a result from a book I read. The idea is:
[itex]X_t[/itex] is a stationary Gaussian process. I have m states.
[itex]p(t) = 1 - t/m[/itex] is the autocorrelation function.
So, [itex] p(t) = Cov(X_s, X_{s+t}) / Var(X_t) [/itex].
To simulate it, uniform random values [itex]V_i[/itex] in [itex] [-a,a][/itex] are generated.
Then [itex]X_t = V_1 + V_2 + \dots + V_N, X_{t+1} = V_{p+1} + V_{p+2} + \dots + V_{N+p}[/itex] and so on. N is large enough to use the central limit theorem.
My real problem is that the book states that [itex]Cov[X_t , X_{t+k}] = E[(X_t - X_{t+k})^2] = 2No -Cov(X_t,X_{t+k}) [/itex], where o is the variance of [itex]V_t[/itex]. It then states that for [itex]k < N/p, Cov[X_t , X_{t+k}] = E[(X_t - X_{t+k})^2] = 2kp^2[/itex]. I can't reach these results and I can't prove them. If you can give me some ideas or links to read about this approximation, it would be great. Thanks again.
 
  • #6
gordonc4513 said:
[itex]Cov[X_t , X_{t+k}] = E[(X_t - X_{t+k})^2] = 2No -Cov(X_t,X_{t+k}) [/itex], where o is the variance for [itex]V_t[/itex].

[tex] E[(X_t - X_{t+k})^2] = E( X_t^2 - 2 X_t X_{t+k} + X_{t+k}^2) = E(X_t^2) - 2 E(X_t X_{t+k}) + E(X_{t+k}^2) = N o - 2 E(X_t X_{t+k}) + N o [/tex]

Next is stated that for [itex]k < N/p, Cov[X_t , X_{t+k}] = E[(X_t - X_{t+k})^2] = 2kp^2[/itex].

That could be a mess to write out, but the way I would start it is:

[tex] COV( X_t, X_{t+k}) = COV( \sum_A V_i + \sum_B V_i , \sum_B V_i + \sum_C V_i ) [/tex]

Where [itex] A [/itex] are the [itex] V_i [/itex] unique to [itex] X_t [/itex], [itex] B [/itex] are the [itex] V_i [/itex] common to both [itex] X_t [/itex] and [itex] X_{t+k} [/itex] and [itex] C [/itex] are the [itex] V_i [/itex] unique to [itex] X_{t+k} [/itex].

As I recall, the covariance function obeys a distributive law that would give:

[tex] = COV( \sum_A V_i ,\sum_B V_i) + COV(\sum_A V_i ,\sum_C V_i) + COV(\sum_B V_i ,\sum_B V_i) + COV(\sum_B V_i ,\sum_C V_i) [/tex]

[tex] = 0 + 0 + COV(\sum_B V_i, \sum_B V_i) + 0 [/tex]

So you must compute the variance of [itex] \sum_B V_i [/itex], which will be a function of the variance of one particular [itex] V_i [/itex] and the number of the [itex] V_i [/itex] in the set [itex] B [/itex].
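A rough numerical check of this result, under the assumption that [itex] X_t [/itex] uses [itex] V_{tp+1}, \dots, V_{tp+N} [/itex] with [itex] V_i [/itex] uniform on [itex] [-a,a] [/itex]: the two sums then share N - kp terms when kp < N, and each term has variance a^2/3. The function names, the parameter values and the replication count below are all illustrative choices, not taken from the book being discussed.

```python
import random

def simulate_path(n_steps, N, p, a, rng):
    """X_t = V_{t*p + 1} + ... + V_{t*p + N} with V_i ~ Uniform(-a, a)."""
    v = [rng.uniform(-a, a) for _ in range((n_steps - 1) * p + N)]
    return [sum(v[t * p : t * p + N]) for t in range(n_steps)]

def theoretical_cov(k, N, p, a):
    """|B| * Var(V_i): the two sums share N - k*p terms when k*p < N."""
    overlap = max(N - k * p, 0)
    return overlap * a * a / 3.0   # variance of Uniform(-a, a) is a^2 / 3

rng = random.Random(0)          # fixed seed so the run is repeatable
N, p, a, k = 12, 2, 1.0, 3      # hypothetical parameter values
n_rep = 20000

# Estimate Cov(X_0, X_k) from independent replications; the mean is 0,
# so the covariance is just the average of the products.
acc = 0.0
for _ in range(n_rep):
    x = simulate_path(k + 1, N, p, a, rng)
    acc += x[0] * x[k]
empirical_cov = acc / n_rep

print(empirical_cov, theoretical_cov(k, N, p, a))
```

Dividing by Var(X_t) = N a^2/3 gives a correlation of 1 - kp/N under these assumptions, so matching the target 1 - t/m amounts to choosing N/p = m.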
 
  • #7
I have one more question. Maybe it is obvious, but I don't see it. Why is the relation [itex]Cov(X_t,X_{t+k}) = E[(X_t - X_{t+k})^2][/itex] true?
 
  • #8
gordonc4513 said:
I have one more question. Maybe it is obvious, but I don't see it. Why is the relation [itex]Cov(X_t,X_{t+k}) = E[(X_t - X_{t+k})^2][/itex] true?

I don't have time to think about that now.

Is [itex] X_t [/itex] assumed to have mean = 0 as part of the terminology "gaussian"?

Also, I notice that I got [itex] 2 N o - 2 E(X_t X_{t+k}) [/itex] instead of what you wrote.

I'll be back this evening.
 
  • #9
Yes, the mean is 0. Also, from what I calculated, it should be [itex]kpo[/itex] instead of [itex]kp^2[/itex].
 
  • #10
gordonc4513 said:
Also, from what I calculated, it should be [itex]kpo[/itex] instead of [itex]kp^2[/itex].

The notation is getting confusing. We have p() for the correlation function and p in the index that defines [itex] X_t [/itex] as a sum of the [itex] V_i [/itex]. What is "po"?

Anyway, let's say that the cardinality of the set [itex] B [/itex] is b.

[tex] COV( \sum_B V_i, \sum_B V_i) = VAR( \sum_B V_i) [/tex]

[tex] =\sum_B ( VAR(V_i)) = b (VAR(V_i)) [/tex]

versus:

[tex] E( (X_t - X_{t+k})^2) = E ( ( \sum_A V_i - \sum_C V_i)^2) [/tex]

[tex] = E( (\sum_A V_i)^2 + (\sum_C V_i)^2 - 2 \sum_A V_i \sum_C V_i ) [/tex]

[tex] = E( (\sum_A V_i)^2 ) + E( (\sum_C V_i)^2 ) - 2 E( \sum_A V_i ) E( \sum_C V_i ) [/tex]

I gather we are assuming [itex] E(V_i) = 0 [/itex] so [itex] E( (\sum_A V_i)^2 ) = VAR( \sum_A V_i) [/itex] etc.

So the above is

[tex] = VAR( \sum_A V_i) + VAR(\sum_C V_i) - 2 (0)(0) [/tex]

Let a = the cardinality of [itex] A [/itex] and c = the cardinality of [itex] C [/itex]

[tex] = (a + c) VAR(V_i) [/tex]

Is there an argument that (a+c) = b ?
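One way to explore that question is to count the sets directly. The sketch below assumes the window layout from post #5, with [itex] X_t [/itex] using [itex] V_1, \dots, V_N [/itex] and [itex] X_{t+k} [/itex] using [itex] V_{kp+1}, \dots, V_{kp+N} [/itex]. The function name `index_sets` and the values of N and p are hypothetical.

```python
def index_sets(k, N, p):
    """Index sets A, B, C for X_t (taking t = 0) and X_{t+k}."""
    first = set(range(1, N + 1))                    # V_1 .. V_N
    second = set(range(k * p + 1, k * p + N + 1))   # V_{kp+1} .. V_{kp+N}
    A = first - second    # unique to X_t
    B = first & second    # common to both
    C = second - first    # unique to X_{t+k}
    return A, B, C

N, p = 12, 2   # hypothetical values
for k in range(0, 7):
    A, B, C = index_sets(k, N, p)
    print(k, len(A), len(B), len(C))
```

Under this layout and for kp <= N, the counts are a = c = kp and b = N - kp, so a + c = b holds only in the special case 3kp = N. In general the two quantities differ: the covariance is (N - kp) VAR(V_i), while the mean squared difference is 2kp VAR(V_i).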
 
  • #11
Thank you for the help, I figured it out finally. I have one more question. Now I have the algorithm for simulating the process and I want to validate it. Can you give some hints how it must be done?
 
  • #12
Do you mean you want to validate it as a computer program by running it and looking at the data? Or do you mean you want to validate whether the algorithm implements the procedure in the book by reading the code of the algorithm?
 
  • #13
I meant: given the values the computer program generates, what tests could I use to check that it is working correctly? I am thinking of checking whether the average of the values estimates the mean and whether the values follow the autocorrelation function, and I was wondering whether there are more things I could test.
 
  • #14
I'd have to think about this in detail to give good advice. On the spur of the moment, I'd say to also check whether the X_i are normally distributed.

In the history of simulations, there are examples where the random number generating functions were flawed in certain software packages. Don't assume that the random number generator in your software really works. Test it or at least find some documentation that someone else did. A quick test is to do a 2D plot of (X,Y,color) where each component is selected by a uniform distribution on some scale. See if any pronounced visual patterns appear. (Some random number generators misbehave only on particular seeds.)
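The checks mentioned in this part of the thread (sample mean, autocorrelation, normality) could be sketched roughly as follows. The stand-in data are moving sums of m uniform variables, for which the target autocorrelation is 1 - lag/m. The function names, the seed and the choice of skewness and excess kurtosis as a crude normality check are assumptions made for illustration. A formal test would need to account for the dependence between successive values.

```python
import random

def sample_autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

def skew_and_excess_kurtosis(x):
    """Both values should be near 0 for normally distributed data."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((xi - mean) ** 2 for xi in x) / n
    m3 = sum((xi - mean) ** 3 for xi in x) / n
    m4 = sum((xi - mean) ** 4 for xi in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Stand-in data: each X_t is the sum of m consecutive uniforms on [-1, 1].
rng = random.Random(1)
m, n_steps = 10, 50000
v = [rng.uniform(-1.0, 1.0) for _ in range(n_steps + m - 1)]
x = [sum(v[t : t + m]) for t in range(n_steps)]

print("sample mean:", sum(x) / len(x))
for lag in (1, 5, 10):
    print("lag", lag, "estimated:", sample_autocorr(x, lag), "target:", 1 - lag / m)
print("skewness, excess kurtosis:", skew_and_excess_kurtosis(x))
```

Because the values are correlated, the estimates fluctuate more than they would for independent samples of the same length, so tolerances should be set generously.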
 

FAQ: Gaussian process with linear correlation

1. What is a Gaussian process with linear correlation?

In this thread, a Gaussian process with linear correlation is a stationary Gaussian process whose autocorrelation function decreases linearly with the lag: p(t) = Corr(X_s, X_(s+t)) = 1 - t/m. Every finite set of its values is jointly normally distributed, and the correlation between two values depends only on how far apart they are in time.

2. How does a Gaussian process with linear correlation differ from other Gaussian processes?

It differs from other stationary Gaussian processes in the shape of its autocorrelation function. Here the correlation falls along a straight line and reaches zero at lag m, whereas other common choices, such as exponential or squared-exponential correlation, decay smoothly and never reach exactly zero.

3. What are the advantages of using a Gaussian process with linear correlation?

One advantage of using a Gaussian process with linear correlation is that it can capture linear relationships between variables, making it useful for modeling real-world phenomena that exhibit this type of correlation. It can also be easier to interpret and explain the results of a Gaussian process with linear correlation compared to other types of Gaussian processes.

4. How is a Gaussian process with linear correlation typically implemented?

A Gaussian process with linear correlation is typically implemented using a kernel function, which is a mathematical function that describes the relationship between the variables. This kernel function is used to calculate the covariance matrix, which is then used to generate a distribution of possible functions that can be fit to the data.
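A minimal sketch of the kernel route for the correlation discussed in the thread, p(t) = 1 - t/m. It assumes integer time points, an integer m, unit variance, and a correlation truncated to zero beyond lag m. Under those assumptions the covariance matrix is that of scaled moving sums of m independent variables, so it is positive definite and a Cholesky factor exists. All function names are hypothetical, and this is an alternative to the summed-uniform construction in the thread rather than the method from the book.

```python
import math
import random

def linear_kernel(s, t, m):
    """Correlation 1 - |s - t| / m, truncated at zero for lags beyond m."""
    return max(0.0, 1.0 - abs(s - t) / m)

def cholesky(c):
    """Lower-triangular L with L L^T = c (c symmetric positive definite)."""
    n = len(c)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][q] * L[j][q] for q in range(j))
            if i == j:
                L[i][j] = math.sqrt(c[i][i] - s)
            else:
                L[i][j] = (c[i][j] - s) / L[j][j]
    return L

def sample_path(n, m, rng):
    """One draw of n consecutive values with unit variance and the kernel above."""
    c = [[linear_kernel(i, j, m) for j in range(n)] for i in range(n)]
    L = cholesky(c)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]   # independent standard normals
    # X = L z has covariance L L^T = c.
    return [sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(n)]

path = sample_path(n=8, m=5, rng=random.Random(2))
print(path)
```

This produces exactly normal values rather than approximately normal ones, at the cost of factoring an n-by-n matrix.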

5. What are some common applications of a Gaussian process with linear correlation?

A Gaussian process with linear correlation can be applied in various fields, such as finance, biology, and engineering. Some specific applications include time series analysis, predicting stock market trends, modeling gene expression patterns, and analyzing spatial data. It can also be used for regression and classification tasks.
