# What is covariance? (with both random and deterministic variables)

1. Dec 19, 2011

### weetabixharry

I understand the concept of covariance, relating two complex random (scalar) variables. However, I get confused when I have both deterministic and random variables. Therefore, what I write might make very little sense -- I'm really only looking for any general advice on where to start reading. [In all my work, all variables are zero-mean and random variables are complex Gaussian].

For example, let's say I have two (zero-mean) independent random variables, $r_1$ and $r_2$ which vary as a function of time with known second order statistics. Similarly, I have two (zero-mean) deterministic variables $d_1$ and $d_2$ which vary in a known manner as a function of time.

Firstly, I don't even know if the term 'covariance' (or, more generally, 'expectation') can be applied to the deterministic variables. I don't see why I couldn't design, for example, $d_1$ and $d_2$ such that:

$\mathcal{E}\{d_1d_2^*\}=\rho$

for some suitable complex scalar, $\rho$. If such things are possible, then can I say my two (zero-mean) deterministic variables are independent? That is:

$\mathcal{E}\{d_1d_2^*\}= \mathcal{E}\{d_1\}\mathcal{E}\{d_2^*\}=0$

Even if all that is possible, I get particularly confused when random variables are combined with deterministic variables. Specifically, I feel like there must be certain statistical properties of the random variables which are sort of 'unchangeable' after multiplication by deterministic variables (or, at least, ones which don't do anything weird like take zero values all the time). For example, I'd like to be able to write something along the lines of:

$\mathcal{E}\{d_1d_2^*r_1r_2^*\}=\mathcal{E}\{d_1d_2^*\}\mathcal{E}\{r_1r_2^*\}$

which would somehow illustrate that the behaviour of the random variables can be separated from the known behaviour of the deterministic variables.

How can I begin to approach problems such as this, which involve both random and deterministic variables? Any advice is greatly appreciated!

2. Dec 19, 2011

### SW VandeCarr

If you have no measurement error (only calculated or theoretical values), it's clear that $(\bar X - x)$ and $(\bar Y - y)$ must both be zero, since the "observed" value is always equal to the expected value for all values of fully deterministic variables. For measured values, there will likely always be some nonzero covariance due to random, if unbiased, measurement error, even if the variables are considered non-random.

Last edited: Dec 19, 2011
3. Dec 19, 2011

### weetabixharry

I'm not totally sure what $\bar X, x, \bar Y$ and $y$ are in your example. However, in my work, measurement error is modelled separately, so let's assume we're talking about theoretical values.

4. Dec 19, 2011

### torquil

Perhaps it gets clearer by considering the definition of an expectation value by integrating over the probability space? The probability space is a triplet $(P, \Sigma, \Omega)$, i.e. a probability law, a $\sigma$-algebra and a sample space.

An $\mathbb{R}$-valued random variable is a $\Sigma$-measurable function $\Omega\rightarrow\mathbb{R}$. Its expectation is an integral over $\Omega$ using the probability law $P$ as an integration measure:

$$E[r] := \int_\Omega r(\omega) dP(\omega)$$

So the covariance of two zero-mean, real-valued random variables $r_1$ and $r_2$ is simply

$$E[r_1r_2] = \int_\Omega r_1(\omega)r_2(\omega)dP(\omega)$$

Any deterministic function $d_1$ is by definition constant on $\Omega$, so it can be taken out of the integral over $\Omega$. Therefore,

$$E[d_1] = d_1 \int_\Omega dP(\omega) = d_1$$

Therefore, you always have for deterministic functions $d_1,d_2$ that

$$E[d_1d_2] = d_1d_2 = E[d_1]E[d_2]$$

We also have:

$$E[d_1r_1r_2] = d_1 \int_\Omega r_1(\omega)r_2(\omega)dP(\omega) = d_1E[r_1r_2]$$
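The last identity can be checked numerically at a single fixed time. The following is only a sketch under assumptions that are not in the thread: $r_1$ and $r_2$ are built as correlated unit-variance circular complex Gaussians with a hypothetical correlation `rho` (so that the expectation is nonzero), the deterministic value is `d = sin(t)` at an arbitrary `t = 0.7`, and the expectation over $\Omega$ is replaced by an average over many independent draws.

```python
import math
import random

def complex_gauss(rng):
    """One draw of a zero-mean, unit-variance circular complex Gaussian."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def sample_products(rho, n, seed=0):
    """Draw n samples of r1 * conj(r2), where E[r1 * conj(r2)] = conj(rho)."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - abs(rho) ** 2)
    products = []
    for _ in range(n):
        z1, z2 = complex_gauss(rng), complex_gauss(rng)
        r1 = z1
        r2 = rho * z1 + scale * z2  # correlated with r1 by construction
        products.append(r1 * r2.conjugate())
    return products

def mean(values):
    return sum(values) / len(values)

rho = 0.6 + 0.3j           # hypothetical correlation, chosen for illustration
t = 0.7                    # one fixed instant of time
d = math.sin(t)            # deterministic: the same number in every draw
products = sample_products(rho, 50_000)

lhs = mean([d * p for p in products])  # estimate of E[d r1 r2*]
rhs = d * mean(products)               # d times the estimate of E[r1 r2*]
print(lhs, rhs, d * rho.conjugate())
```

The first two printed values agree up to rounding because `d` is the same in every draw and factors out of the average; the third is the theoretical value they both approximate. The result still depends on `t` through `d`, which is the point raised in the next post.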

5. Dec 19, 2011

### SW VandeCarr

I thought you knew something about the concept of random variables and covariance. Let's assume the speed of light c is constant in a vacuum as most physicists do. Then c is not a random variable. It is a physical constant. A random variable will likely exhibit different values with different observations from which an expected value may be calculated. Each observation will likely have a different value than the expected value. The cumulative difference between the observed value $x_i$ and the expected value $\bar X$ is used in the calculation of the covariance of (usually) two variables. For the speed of light in a vacuum the difference between the expected and the "observed" is always zero by definition if we don't consider measurement error.

Last edited: Dec 19, 2011
6. Dec 19, 2011

### weetabixharry

I must be misunderstanding this part. In my mind, $d_1$ and $d_2$ are functions of time but $E[d_1d_2]$ is not. Therefore, I cannot make sense of the first equality.

Then, for example, I could take $d_1=d_2=\sin(t)$ as my deterministic functions of time. In this case, none of the equalities hold.

What am I doing wrong?

7. Dec 19, 2011

### weetabixharry

Yes, that makes sense to me. In my work, the measured values would be modelled something like $$\hat{c}=c+n(t)$$ where $c$ is a constant and $n(t)$ is a random variable. Therefore, $c$ is assumed to have a perfect, theoretical value and we would characterise the additive random perturbation in terms of its second order statistics.

However, what I want to look at here is deterministic functions that vary as a function of time (speed of light, $c$, is just a constant under my modelling). Maybe it doesn't change anything? It's not clear to me at the moment.

8. Dec 19, 2011

### SW VandeCarr

The term "covariance", as far as I know, is strictly defined only in terms of random variables. If there is no variance, there can't be any covariance. If your variable is a strict function of time ($f(t)= y$), then the term "covariance" does not apply to these two variables. Any value that t takes maps to a predictable and precise value of y, at least in theory.

Last edited: Dec 19, 2011
9. Dec 20, 2011

### Stephen Tashi

Let's see, "mean", "variance" and "covariance" can be defined for samples of random variables or for the random variables themselves. When they are defined for the random variables themselves, they are computed from the distribution functions for the random variables by integrations (counting summation as a type of integration). So these procedures are performed on a specific function (which is what the original post means by a "deterministic" function, I suppose). The same sort of integrations can be applied to a function that is not a distribution function.

Perhaps the original post concerns whether this is ever done and whether the things that are computed this way are still called "mean", "variance" and "covariance".

I think a function that is not a distribution can have a "first moment" which is computed like a mean. It can have an "L2-norm" which is like the square root of the sum of the squares. So I think the same sorts of integrations are indeed done on functions that are not distribution functions. Whether a given field (like signal processing) uses the terminology "mean", "variance" and "covariance" for the things computed, I don't know.

10. Dec 20, 2011

### weetabixharry

Yes, part of my question was really just about nomenclature. I'm glad you agree that analogous operations must make sense for deterministic functions; I'll continue to abuse the terminology until someone tells me not to. For example, we could say: $$\operatorname{var}\{\sin(t)\}\triangleq E\{\sin^2(t)\} = 0.5E\{1 - \cos(2t)\} = 0.5$$ However, I'm still confused about what happens when random and deterministic functions are multiplied together (see end of original post). I feel like functions of random variables must always be independent of deterministic functions, but I don't know where to start in terms of justifying that.

Any ideas?!
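The value 0.5 depends on which interval the average is taken over. As a rough sketch, suppose the "expectation" of a deterministic function is read as the time average $(1/T)\int_0^T f(t)\,dt$. That reading, and the helper name `time_average`, are assumptions made here for illustration and not an established definition from the thread.

```python
import math

def time_average(f, T, n=100_000):
    """Midpoint-rule approximation of (1/T) * integral of f over [0, T]."""
    dt = T / n
    return sum(f((k + 0.5) * dt) for k in range(n)) * dt / T

def sin_squared(t):
    return math.sin(t) ** 2

# Exact value for comparison: 0.5 - sin(2T) / (4T)
for T in (1.0, math.pi, 10.0, 1000.0):
    print(T, time_average(sin_squared, T))
```

Over a whole number of half-periods the average is exactly 0.5. Over other intervals it differs by $\sin(2T)/(4T)$, which shrinks as $T$ grows, so 0.5 is recovered only in the long-interval limit.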

11. Dec 20, 2011

### Stephen Tashi

Well, that's one Philosophy of Life. Another one would be not to abuse the terminology until you found some field of study that did this.

You have to be cautious with integrations. Certain integrals may not exist. In your example, you aren't making it clear what the range of the integration is. Do you mean it to be a complete period? The expected value of a probability density f(x) is an integral involving x f(x), not simply f(x). I think you haven't defined your terminology precisely.

You aren't stating a clear question. Mathematics (from the modern point of view) is not about some objective reality where something must "happen" when you multiply two functions together. If you want to apply mathematics to the real world, you have to define how the things involved correspond to things in the real world. If you do that, perhaps what happens will have some interpretation.

You have several types of functions involved in your post. There are "functions of a random variable", distribution functions ("probability density functions") and functions without either of those restrictions. You aren't making it clear which you wish to multiply together, and I'm not sure whether you are using the word "independent" as it is used in probability theory.

Try to formulate a precise question. Don't mix-in a lot of terminology without considering whether the phrases have a definite meaning.

Perhaps your question is how to deal with the properties of a random process ("random process", not "random variable") that consists of a deterministic component c(t) and a stochastic process n(t).

Last edited: Dec 20, 2011
12. Dec 21, 2011

### weetabixharry

I had thought that expectation involved an integration over all time, $-\infty$ to $\infty$. (I see this as taking a sample mean over infinitely many values). In that case, I would assume that any incomplete period could be neglected. I'm not sure what the meaning would be of an expectation evaluated over a finite interval -- I don't feel that this would be applicable in my work.

I have a very limited understanding of probability theory. The property I was interested in was $E\{ab\}=E\{a\}E\{b\}$

My question is, is the following expression correct:$$E\{\textbf{A}(t)\underline{b}(t)\underline{b}^H(t)\textbf{A}^H(t)\} = E\{\textbf{A}(t)E\{\underline{b}(t)\underline{b}^H(t)\}\textbf{A}^H(t)\}$$given that $\textbf{A}(t)$ is a complex matrix whose elements vary as a function of time in a known, deterministic manner. Meanwhile, elements of the complex column vector $\underline{b}(t)$ vary as a function of time in a random manner. All terms are zero-mean and (in case it simplifies matters) we can assume $\underline{b}(t)$ is complex Gaussian with known covariance matrix $\textbf{R}_{bb}\triangleq E\{\underline{b}(t)\underline{b}^H(t)\}$.
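One hedged way to read this expression uses the ensemble definition from post 4. Write $E_\Omega$ for the expectation over the probability space at a fixed $t$, and $\langle\cdot\rangle_t$ for an average over time; both symbols are notation assumed here, not taken from the thread. At a fixed $t$, $\textbf{A}(t)$ is constant on $\Omega$ and comes out of the expectation by linearity:

$$E_\Omega\{\textbf{A}(t)\underline{b}(t)\underline{b}^H(t)\textbf{A}^H(t)\} = \textbf{A}(t)\,E_\Omega\{\underline{b}(t)\underline{b}^H(t)\}\,\textbf{A}^H(t) = \textbf{A}(t)\,\textbf{R}_{bb}\,\textbf{A}^H(t)$$

If $E\{\cdot\}$ in the question is taken to mean the ensemble expectation followed by a time average, and $\textbf{R}_{bb}$ does not depend on $t$ (stationarity, assumed), then

$$\langle E_\Omega\{\textbf{A}(t)\underline{b}(t)\underline{b}^H(t)\textbf{A}^H(t)\}\rangle_t = \langle \textbf{A}(t)\,\textbf{R}_{bb}\,\textbf{A}^H(t)\rangle_t$$

which has the same form as the right-hand side of the expression above. Whether the time average of a single realization also equals this quantity is a separate question about ergodicity, and this sketch does not settle it.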

13. Dec 21, 2011

### Stephen Tashi

You are dealing with a continuous stochastic process and in order to define the process precisely, you have to describe precisely how the elements of $\underline{b}(t)$ are random. A stochastic process is an indexed set of random variables. I suspect your idea is that at each point $t$ in time, $\underline{b}(t)$ will be an independent realization from some given multivariate Gaussian random variable. I know of no practical situation where this is a useful model.

There are many examples where it is useful to represent the values of a process at a discrete series of time steps $t_0, t_1,...$ by using a multivariate Gaussian random variable realized at these discrete times. However, if you try to extend this model to one that involves independent realizations from the same zero-mean Gaussian multivariate random variable at every instant of time, you get a model with peculiar properties.

The usual way to extend the idea of a Gaussian random variable to continuous time is to use Brownian motion (in the technical mathematical sense of that term).

You are attempting to define the expectation of a stochastic process as a simple integration of it over time. This doesn't make sense for a function that is actually "random". A particular realization of a random function f(t) will have a particular integral over a given time interval. A different realization of it will be a different function g(t) and its integral will have a different value. To define an "average" value of the integral, you need mathematics that, in a manner of speaking, adds up all the possible integrals of an f(t), each weighted by the probability of that f(t) occurring.

Perhaps your idea is to avoid this complication by integrating over an infinite interval and assuming that this is the same as integrating the random function over many different realizations. If that is what you want then you should look into "stationary random functions".
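To make the two kinds of average concrete, here is a small sketch that compares them for a discrete-time stationary process. The process is an assumption chosen only for illustration (a first-order autoregression $x_n = a\,x_{n-1} + w_n$ with unit-variance Gaussian $w_n$), not the model under discussion. Its stationary variance is $1/(1-a^2)$.

```python
import math
import random

A_COEF = 0.5                                  # assumed AR(1) coefficient
STATIONARY_VAR = 1.0 / (1.0 - A_COEF ** 2)    # theoretical E[x^2]

def time_average_power(n_samples, seed=1):
    """Average of x^2 along ONE long realization."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, math.sqrt(STATIONARY_VAR))  # start in the stationary law
    total = 0.0
    for _ in range(n_samples):
        x = A_COEF * x + rng.gauss(0.0, 1.0)
        total += x * x
    return total / n_samples

def ensemble_average_power(n_realizations, n_steps=10, seed=2):
    """Average of x^2 at ONE fixed time index across many realizations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_realizations):
        x = rng.gauss(0.0, math.sqrt(STATIONARY_VAR))
        for _ in range(n_steps):
            x = A_COEF * x + rng.gauss(0.0, 1.0)
        total += x * x
    return total / n_realizations

print(STATIONARY_VAR, time_average_power(100_000), ensemble_average_power(20_000))
```

For this particular process the two estimates land near the same number, which is the behaviour usually called ergodicity. That agreement is a property of the chosen process and has to be established separately for any other model.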

14. Dec 21, 2011

### Stephen Tashi

..and by the way, one of the peculiar characteristics of a model that uses a random realization of the same multivariate Gaussian distribution at each point in time is that the component functions realized by this random vector will be unbounded (with probability 1) on any finite interval of time. For example, if a component function has a normal distribution with mean 0 and standard deviation 1 at each instant of time, then there is only a small chance that a random realization of that variable will be +1000 or -1000. However, in any finite time interval, you have an uncountable infinity of chances to realize such a value. So the normal sort of integral of that function doesn't exist.
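A finite simulation cannot reach the continuous-time limit, but it can hint at the trend. The sketch below draws independent standard normal values, as if sampling a fixed interval on finer and finer grids, and records the largest magnitude seen so far. The function name and checkpoint sizes are arbitrary choices made here.

```python
import random

def running_max_abs(n_total, checkpoints, seed=0):
    """Largest |x| among the first k independent N(0, 1) draws, for each checkpoint k."""
    rng = random.Random(seed)
    wanted = set(checkpoints)
    results = {}
    current = 0.0
    for k in range(1, n_total + 1):
        current = max(current, abs(rng.gauss(0.0, 1.0)))
        if k in wanted:
            results[k] = current
    return results

checkpoints = [10, 1_000, 100_000]
for n, m in running_max_abs(100_000, checkpoints).items():
    print(n, round(m, 3))
```

The running maximum never decreases as more points are added, and for independent standard normal draws it is known to grow roughly like $\sqrt{2\ln n}$. That growth is slow but has no upper limit, which is consistent with the unboundedness claim above.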

15. Dec 22, 2011

### weetabixharry

Yes, this is correct.
Yes - I absolutely assume stationarity in everything I do. I think I would find everything slightly terrifying without this assumption.
This sounds like ergodicity to me. To the best of my knowledge, $\underline{b}(t)$ is ergodic.
I feel like this "small chance" of taking a value of precisely +1000.000... or -1000.000... is zero (i.e. the probability of finding a value within an infinitesimally small range is zero). I'd then be in unfamiliar territory of multiplying infinity by zero, so I don't know what the consequences are.

16. Dec 22, 2011

### Stephen Tashi

I think your model is doomed if you are trying to model "white noise". For that, you should study Brownian motion, the Wiener process, etc.

You can't say it's ergodic until you show you can integrate it on a finite interval.

The probability of any exact value is 0, even the probability of getting exactly the mean value is 0. That fact won't save you. The probability of getting a value equal or greater than 1000.0 is not zero.

As I interpret your thinking, since you define the process $\underline{b}(t)$ to be an independent realization of the same multivariate Gaussian distribution at each instant of time t, then you want to define the properties of the process (variance, etc.) to be the parameters of that single multivariate Gaussian random variable. It isn't correct to assume that the definitions you create this way match the standard definitions for those things.

There are many examples where people model a phenomenon at discrete time intervals by assuming that at each time there is an independent realization of the same random variable. However, I have not seen any examples where people assume that an independent realization of the same variable takes place at each instant of time. If you can point out such an example then perhaps it will clarify your approach.

17. Dec 22, 2011

### weetabixharry

It's not clear to me why the model is doomed, nor what a model should look like to prevent it from being doomed.
I'm not sure what standard definitions you're talking about.
Having read this a dozen times, I still don't follow.

I'm getting progressively more confused and don't feel as though we're getting any closer to understanding the question.

18. Dec 22, 2011

### THeCatFRoMCaN

A Different View

I see: covariance since creation.
I see, the chance encounter:
a random man and woman,
interaction and codependence

And the actual time "in contact"
Two forces acting in and out
each other
Push and pull,
and how much they both
grow and do grow,
everafter.

19. Dec 23, 2011

### Stephen Tashi

My analysis of that is
1) Assuming that a process is modeled by an independent draw from the same random variable at each instant of time does have a remarkable conceptual simplicity and thus you have been able to deal with this model by thinking only about the distribution of this single random variable. You haven't had to worry about the distinction between a "continuous time stochastic process" and "a random variable".

2) If models like yours were useful in the real world, then everybody would use them since they are conceptually much simpler than the stuff that people do use.

3) People do attempt something like what you are doing. They pick a fixed interval of time and they make a model for the process that is defined on discrete time steps. However, when they do this they do NOT assert that their model can be "refined" by cutting the time step in half and drawing from the same random variable at each half-interval of time.

Saying the same thing with more detail, suppose the researcher invents the model and fits the model to some observed data. He models the process X(t) as draws from, say, a normal distribution with mean 0 and standard deviation 3.24 at times t = 1,2,3,.... If he wanted to refine the model to represent the process at times t = 0.5, 1.0, 1.5, 2.0, 2.5,... it usually turns out that drawing from a normal distribution with mean 0 and standard deviation 3.24 at all those times doesn't fit the data that is observed at this smaller time increment.

To model "white noise" in the standard way, when you decrease the time interval by half, you must change the standard deviation of the normal distribution used in the model.

It is often the case that the additive effect of "noise" must be represented by a model. For example, if X(t) is "noise" representing jumps in a stock price, then the current price of the stock depends on summing these jumps. Let D(t) = X(t) + X(t+1). Let S(t) = X(t) + X(t+0.5) + X(t+1). If we model X by independent draws from the same normal distribution, then D(t) and S(t) have different distributions, since D is a sum of 2 random variables and S is a sum of 3 random variables.

The idea behind the usual modeling of white noise is to change the standard deviation of the normal distribution when we decrease the time interval. It is done in such a manner that D(t) and S(t) have the same variance.
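The effect of that rescaling can be sketched numerically. The code below is an illustration under assumptions made here: a unit time interval is cut into `n_steps` equal pieces, each piece contributes one independent zero-mean normal draw, and "rescaling" means multiplying the standard deviation by the square root of the step length (the Brownian-motion convention). The value 3.24 is borrowed from the example above.

```python
import math
import random

def step_std(sigma, n_steps, rescale):
    """Per-step standard deviation when [0, 1] is cut into n_steps pieces."""
    dt = 1.0 / n_steps
    return sigma * math.sqrt(dt) if rescale else sigma

def simulated_sum_variance(sigma, n_steps, rescale, n_trials=20_000, seed=0):
    """Sample variance of the sum of n_steps independent zero-mean draws."""
    rng = random.Random(seed)
    s = step_std(sigma, n_steps, rescale)
    sums = [sum(rng.gauss(0.0, s) for _ in range(n_steps)) for _ in range(n_trials)]
    mean = sum(sums) / n_trials
    return sum((x - mean) ** 2 for x in sums) / (n_trials - 1)

sigma = 3.24
for n_steps in (1, 2, 4):
    fixed = simulated_sum_variance(sigma, n_steps, rescale=False)
    scaled = simulated_sum_variance(sigma, n_steps, rescale=True)
    print(n_steps, round(fixed, 2), round(scaled, 2))
```

With a fixed standard deviation, the variance accumulated over the interval grows in proportion to the number of steps. With the rescaled standard deviation it stays near $3.24^2 \approx 10.5$ however finely the interval is divided.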

Last edited: Dec 23, 2011
20. Dec 23, 2011

### Stephen Tashi

weetabixharry,
Read the edited version of my previous post on the forum instead of the one that is probably emailed to you! I did a major revision.