Finding the error/correlation between two functions

In summary, the author wants to find the error correlation between a function evaluated at two nearby points. For a smooth function, F[0,a,b] and F[0.0001,a,b] should have highly correlated errors (nearly 1) when treated as different functions/inputs. Since the functions are deterministic and the errors come only from the input parameters, the correlation can be found by propagating the known parameter errors and their correlations through the function.
  • #1
Hepth
Gold Member
I have a function
$$F[x,a,b]$$
I am trying to find the error correlation between the function at one point and another.
For example, if I have the function at F[0,a,b] and at F[0.0001,a,b], the errors should be highly correlated (nearly 1) when treated as different functions/inputs, since the function is smooth in "x" (which it is). "a" and "b" have a known error, and I believe I can find the correlation between "a" and "b" by external means. (I have a LOT of functions to use for that.)

So if I define two functions
$$F_1[a,b] = F[0,a,b]$$
$$F_2[a,b] = F[0.1,a,b]$$

How can I find the correlation ##\sigma_{F_1 F_2}##? Even approximately would be fine. The function isn't algebraic, but programmatic in nature, though numerical derivatives are well-behaved due to its smoothness.

In the end the function is really going to be ##F[x,a_1,a_2,\dots,a_{11}]##, but I assume that if I can do it with 2 variables I can do it with more. Again, the variables "a", "b", "##a_n##" are not independent.
 
  • #2
Hepth said:
For example, if I have the function at F[0,a,b] and at F[0.0001,a,b], the errors should be highly correlated (nearly 1) when treated as different functions/inputs
That depends on the scale of your input parameter x.

Where does your function (or your knowledge about it) come from?
 
  • #3
It is a long process: a lot of numerical integrals, different cuts, some Fortran code, etc. Some parts of it come from numerical fits done previously.

Basically it is a function integrated over 3 phase space variables, with "x" being one of the variables, and with about 11 parameters that have to be input before the function can be evaluated. It is not something that can be manipulated, BUT it's nearly linear in all of the variables. So ##F \approx \sum_i c_i a_i## for the most part, and I can separate out building blocks by setting all parameters to zero but one, etc. (see the sketch at the end of this post).

The function is well behaved in that it is slightly parabolic, with F[0,a] and F[0.1,a] differing by less than a few percent at any given point in "a".
"x" can go from 0..1 is fine. Scale doesn't really matter, as I can adjust that. By "correlation" close to "1" I mean such that ##\sigma_{1,2} = \rho_{1,2} \sigma_1 \sigma_2 \approx (0.8-1.0) \sigma_1 \sigma_2##
 
  • #4
We must clarify whether you are dealing with a problem that involves probability (random variables, correlation in the statistical sense) or whether you are talking about a "correlation" that is defined between two deterministic functions. (If the functions are deterministic, what definition are you using for "correlation"?) Are you talking about "error" as a random variable or are you talking about it as error in approximation between a deterministic function and another deterministic function that approximates it?
 
  • #5
The functions are deterministic, in that they are a theoretical prediction with error only appearing in the input parameters, ##a_i##.
There are corresponding experimental data for these functions, some of which are F[0], F[0.1], etc. Experimentally these have an error correlation, and they should on the theoretical side as well. In the end, the theoretical functions will be used to fit all ##a_i## parameters.
So I have errors on the ##a_i## for determining my theoretical points. I have functions F, G, H, I, J. These are uncorrelated. But I also have each evaluated at different points in ##x##. I can construct both an experimental and a theoretical correlation matrix relating each of these datapoints: {F[0], F[0.1], F[0.2], G[0], etc.}.
There should be some correlation in the error of the theoretical F[0] and F[0.1]. The errors come strictly from the input parameters. F,G,H,I,J are independent of each other, but F[0], F[0.1] are not.

Does that clear anything up?
I tried something like
##F_{+} \equiv F[0]+F[0.1]##
##F_{-} \equiv F[0]-F[0.1]##
##\sigma^2(F_{+}) - \sigma^2(F_{-}) = 4 \sigma_{F[0],F[0.1]} \approx \sum_{i,j} \partial_{a_i} F_{+}\, \partial_{a_j} F_{+}\, \sigma_{a_i} \sigma_{a_j} \rho_{i,j} - \sum_{i,j} \partial_{a_i} F_{-}\, \partial_{a_j} F_{-}\, \sigma_{a_i} \sigma_{a_j} \rho_{i,j}##

But I am not sure if that is the correct way to progress.
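
As a sanity check on the algebra: the identity ##\sigma^2(A+B) - \sigma^2(A-B) = 4\,\sigma_{A,B}## is exact for any pair of random variables; only the derivative expansion on the right is approximate. A quick numerical check with a toy linear model (the stand-ins for F[0] and F[0.1] and the parameter covariance are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented covariance for correlated parameters (a, b)
cov_ab = np.array([[0.04, 0.01],
                   [0.01, 0.09]])
ab = rng.multivariate_normal([1.0, 2.0], cov_ab, size=200_000)

# Invented linear stand-ins for F[0] and F[0.1]
F1 = 1.0 * ab[:, 0] + 0.5 * ab[:, 1]
F2 = 0.9 * ab[:, 0] + 0.6 * ab[:, 1]

lhs = (np.var(F1 + F2) - np.var(F1 - F2)) / 4.0
print(lhs, np.cov(F1, F2)[0, 1])  # both ~ 0.0735
```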
 
  • #6
In your example:

Hepth said:
since the function is smooth in "x" (which it is). "a" and "b" have a known error, and I believe I can find the correlation between "a" and "b" by external means. (I have a LOT of functions to use for that.)

So if I define two functions
$$F_1[a,b] = F[0,a,b]$$
$$F_2[a,b] = F[0.1,a,b]$$

How can I find the correlation ##\sigma_{F_1 F_2}##?

If the parameters [itex]a[/itex] and [itex]b[/itex] are random variables, this makes [itex]F_1, F_2[/itex] random variables that are functions of [itex]a[/itex] and [itex]b[/itex]. To compute the correlation between [itex]F_1[/itex] and [itex]F_2[/itex] you need to know the joint distribution of [itex]F_1, F_2[/itex] or the joint distribution of [itex]a, b[/itex]. To estimate the correlation between [itex]F_1, F_2[/itex] you need data that contains several random realizations of [itex]F_1, F_2[/itex] or data that contains several random realizations of [itex]a, b[/itex].

If you want to treat [itex] a, b [/itex] as fixed constants instead of random variables, then it isn't clear to me what you mean by a "correlation" between [itex] F_1 [/itex] and [itex] F_2 [/itex] because [itex] F_1, F_2 [/itex] would be fixed constants.
 
  • #7
So you have a scalar function with error-free input [itex]\vec x[/itex] as well as the input [itex]\vec a[/itex], whose correlated errors [itex]\vec\epsilon[/itex] I'll assume form a random vector. Let the function be [itex]f(\vec x, \vec a)[/itex], and the covariance between two errors of the parameters [itex]cov(\epsilon_i, \epsilon_j) = \sigma_{i,j}[/itex].

At any point, for a reasonably "nice" function [itex]f[/itex], the function can be approximated with the first-order Taylor expansion around the point [itex](\hat{\vec x}, \hat{\vec a})[/itex] as
$$f(\vec x, \vec a) \approx f(\hat{\vec x}, \hat{\vec a}) + \sum_{r=1}^m \left (\frac{\partial f}{\partial x_r} \right )_{(\hat{\vec x}, \hat{\vec a})}(x_r - \hat{x}_r) + \sum_{s=1}^n \left ( \frac{\partial f}{\partial a_s} \right )_{(\hat{\vec x}, \hat{\vec a})}(a_s - \hat{a}_s)$$

For the sake of my sanity, let the nominal value [itex]\vec p = \begin{bmatrix}\vec x \\ \vec a\end{bmatrix}[/itex] and the error be [itex]\vec p_\epsilon = \begin{bmatrix}\vec 0 \\ \vec \epsilon\end{bmatrix}[/itex]. If we then evaluate the function and include input error, I can then rewrite the approximation as [itex]f(\vec p + \vec p_\epsilon) \approx f(\hat{\vec p}) + \vec J_f(\hat{\vec p})(\vec p + \vec p_\epsilon - \hat{\vec p})[/itex] where [itex]\vec J_f(\hat{\vec p})[/itex] is the Jacobian evaluated at the expansion point. Let's also assume that [itex]\hat{\vec p}[/itex] is error-free. Therefore, the error of the function is [itex]\vec J_f(\hat{\vec p})\vec p_\epsilon[/itex]. We can interpret this as a linear function of a random vector.

Let the error function be [itex]\delta(\vec t) = \vec J_f(\hat{\vec p})\vec t[/itex]. It can then be seen that
$$E(\delta(\vec p_\epsilon)) = \vec J_f(\hat{\vec p})E(\vec p_\epsilon) \\
\Sigma(\delta(\vec p_\epsilon)) = \vec J_f (\hat{\vec p})\Sigma(\vec p_\epsilon)\vec J_f(\hat{\vec p})^\text{T}$$
where [itex]\Sigma[/itex] is the covariance matrix. Note that since the function is a scalar function, the error function as well as these values are scalars as well.
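
As a minimal sketch of that single-point propagation (the toy function, nominal values, and parameter covariance below are invented; only the parameters carry error, since the x input is error-free):

```python
import numpy as np

def f(x, a):
    """Invented smooth toy function standing in for the real one."""
    return (1.0 + 0.2 * x) * a[0] + np.sin(a[1]) * x**2 + a[1]

def jacobian_a(x, a, h=1e-6):
    """Numerical Jacobian of f with respect to the parameters a."""
    J = np.empty(len(a))
    for i in range(len(a)):
        ap, am = a.copy(), a.copy()
        ap[i] += h
        am[i] -= h
        J[i] = (f(x, ap) - f(x, am)) / (2 * h)  # central difference
    return J

a_hat = np.array([1.0, 2.0])
Sigma_a = np.array([[0.04, 0.01],   # invented parameter covariance
                    [0.01, 0.09]])

J = jacobian_a(0.0, a_hat)
print(J @ Sigma_a @ J)  # scalar variance of the function's error at x = 0
```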

If we now evaluate at two points, each one having their own identically distributed errors [itex]\vec p_{\epsilon, 1}[/itex] and [itex]\vec p_{\epsilon, 2}[/itex], you want to find the correlation between the errors of the function, [itex]\delta_1 = \delta(\vec p_{\epsilon, 1})[/itex] and [itex]\delta_2 = \delta(\vec p_{\epsilon, 2})[/itex].

The correlation is [itex]\rho_{\delta_1, \delta_2} = \frac{cov(\delta_1, \delta_2)}{\sqrt{\Sigma(\delta_1)\Sigma(\delta_2)}}[/itex]. From what I can tell, you should be able to calculate the denominator. The numerator is the tricky one. Apart from estimating it from experimental data, I don't know of a good way of doing it without knowing more about the errors of the parameters themselves. I suppose if you know their distributions, a quick and dirty way is to use a Monte-Carlo approach.
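
A minimal sketch of that Monte Carlo route (assuming Gaussian parameter errors with a known covariance; the toy function and the numbers are invented, and the same parameter draw is reused at both x points since the parameters are shared between them):

```python
import numpy as np

def f(x, a):
    """Invented smooth toy function standing in for the real one."""
    return (1.0 + 0.2 * x) * a[0] + np.sin(a[1]) * x**2 + a[1]

rng = np.random.default_rng(1)
a_hat = np.array([1.0, 2.0])
Sigma_a = np.array([[0.04, 0.01],   # invented, assumed-known covariance
                    [0.01, 0.09]])

# Correlated parameter samples, shared between the two evaluation points
a_samples = rng.multivariate_normal(a_hat, Sigma_a, size=100_000)

F1 = np.array([f(0.0, a) for a in a_samples])
F2 = np.array([f(0.1, a) for a in a_samples])

print(np.corrcoef(F1, F2)[0, 1])  # estimated error correlation of F[0], F[0.1]
```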
 
  • #8
Sorry for the delayed response. That is exactly what I was looking for. I think in the end I will have to use some random variation (MC) of the parameters to determine the covariance. I am trying to avoid using any experimental information, if I can, until I begin the fitting procedure; I would like a theoretical estimate of the correlation and errors beforehand, hence this problem. I think I'll just do an MC run on it, hoping that I have enough information about the errors and their correlation to each other.

Thanks!
 

1. What is the purpose of finding the error/correlation between two functions?

The purpose of finding the error/correlation between two functions is to measure the degree of relationship or similarity between the two functions. This allows us to determine if there is a consistent pattern or trend between the two functions, and how closely they are related.

2. How is the error between two functions calculated?

The error between two functions is typically calculated by finding the difference between the actual value and the predicted value at each data point, squaring those differences, and then averaging them. This is known as the mean squared error and is a common method for measuring the error between functions.
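
For instance, a minimal sketch with invented numbers:

```python
import numpy as np

actual = np.array([1.0, 2.0, 3.0, 4.0])       # invented data
predicted = np.array([1.1, 1.9, 3.2, 3.8])

mse = np.mean((actual - predicted) ** 2)
print(mse)  # 0.025
```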

3. What does a high correlation between two functions indicate?

A high correlation between two functions indicates a strong relationship between the two variables. This means that as one function changes, the other function is likely to change in a consistent and predictable manner. A correlation of 1 indicates a perfect positive relationship, while a correlation of -1 indicates a perfect negative relationship.

4. Can the correlation between two functions be used to determine causation?

No, correlation does not imply causation. While a high correlation between two functions may suggest a relationship, it does not necessarily mean that one function causes the other. Other factors and variables may be at play, and further research and analysis are needed to establish causation.

5. What are some limitations of finding the error/correlation between two functions?

One limitation is that it only measures the relationship between two specific variables, and may not account for other factors or variables that could also be influencing the data. Additionally, correlation does not necessarily indicate a causal relationship, and it is important to consider other evidence and factors when drawing conclusions from the results.
