Finding the error/correlation between two functions

In summary, the author wants to find the error correlation between a function evaluated at two nearby points. For a smooth function, F[0,a,b] and F[0.0001,a,b] should have highly correlated errors (nearly 1) when treated as different functions/inputs. Since the functions are deterministic and the errors come only from the input parameters, the correlation can be found by propagating the known parameter errors and their correlations through the function.
  • #1
Hepth
Gold Member
I have a function
$$F[x,a,b]$$
I am trying to find the error correlation between the function at one point and another.
For example, if I have the function at F[0,a,b] and at F[0.0001,a,b], the errors should be highly correlated (nearly 1) when treated as different functions/inputs, since the function is smooth in "x" (which it is). "a" and "b" have a known error, and I believe I can find the correlation between "a" and "b" by external means. (I have a LOT of functions to use for that.)

So if I define two functions
$$F_1[a,b] = F[0,a,b]$$
$$F_2[a,b] = F[0.1,a,b]$$

How can I find the correlation ##\sigma_{F_1 F_2}##? Even approximately would be fine. The function isn't algebraic, but programmatic in nature, though numerical derivatives are well-behaved due to its smoothness.

In the end the function is really going to be ##F[x,a_1,a_2,\dots,a_{11}]##, but I assume that if I can do it with 2 variables I can do it with more. Again, the variables "a", "b", "##a_n##" are not independent.
 
  • #2
Hepth said:
For example, if I have the function at F[0,a,b] and at F[0.0001,a,b], the errors should be highly correlated (nearly 1) when treated as different functions/inputs
That depends on the scale of your input parameter x.

Where does your function (or your knowledge about it) come from?
 
  • #3
It is a long process: a lot of numerical integrals, different cuts, some Fortran code, etc. Some parts of it come from numerical fits done previously.

Basically it is a function integrated over 3 phase space variables, with "x" being one of the variables, and with about 11 parameters that have to be input before the function can be evaluated. It is not something that can be manipulated, BUT it's nearly linear in all of the variables. So ##F \approx \sum_i c_i a_i## for the most part, and I can separate out building blocks by setting all parameters to zero but one, etc. (see the sketch at the end of this post).

The function is well behaved in that it is slightly parabolic, with F[0,a] and F[0.1,a] differing by less than a few percent at any given point in "a".
"x" can go from 0..1 is fine. Scale doesn't really matter, as I can adjust that. By "correlation" close to "1" I mean such that ##\sigma_{1,2} = \rho_{1,2} \sigma_1 \sigma_2 \approx (0.8-1.0) \sigma_1 \sigma_2##
 
  • #4
We must clarify whether you are dealing with a problem that involves probability (random variables, correlation in the statistical sense) or whether you are talking about a "correlation" that is defined between two deterministic functions. (If the functions are deterministic, what definition are you using for "correlation"?) Are you talking about "error" as a random variable or are you talking about it as error in approximation between a deterministic function and another deterministic function that approximates it?
 
  • #5
The functions are deterministic, in that they are a theoretical prediction with error only appearing in the input parameters, ##a_i##.
There are corresponding experimental data for these functions, some of which are F[0], F[0.1], etc. Experimentally these have an error correlation, and they should on the theoretical side as well. In the end, the theoretical functions will be used to fit all ##a_i## parameters.
So I have errors on the ##a_i## for determining my theoretical points. I have functions F, G, H, I, J. These are uncorrelated. But I also have each evaluated at different points in ##x##. I can construct both an experimental and a theoretical correlation matrix relating each of these datapoints: {F[0], F[0.1], F[0.2], G[0], etc.}.
There should be some correlation in the error of the theoretical F[0] and F[0.1]. The errors come strictly from the input parameters. F,G,H,I,J are independent of each other, but F[0], F[0.1] are not.

Does that clear anything up?
I tried something like
##F_{+} \equiv F[0]+F[0.1]##
##F_{-} \equiv F[0]-F[0.1]##
##\sigma^2(F_{+}) - \sigma^2(F_{-}) = 4 \sigma_{F[0],F[0.1]} \approx \sum_{i,j} \partial_{a_i} F_{+}\, \partial_{a_j} F_{+}\, \sigma_{a_i} \sigma_{a_j} \rho_{i,j} - \sum_{i,j} \partial_{a_i} F_{-}\, \partial_{a_j} F_{-}\, \sigma_{a_i} \sigma_{a_j} \rho_{i,j}##

But I am not sure if that is the correct way to progress.
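
As a sanity check on the algebra: the identity ##\sigma^2(A+B) - \sigma^2(A-B) = 4\,\sigma_{A,B}## is exact for any pair of random variables; only the derivative expansion on the right is approximate. A quick numerical check with a toy linear model (the stand-ins for F[0] and F[0.1] and the parameter covariance are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented covariance for correlated parameters (a, b)
cov_ab = np.array([[0.04, 0.01],
                   [0.01, 0.09]])
ab = rng.multivariate_normal([1.0, 2.0], cov_ab, size=200_000)

# Invented linear stand-ins for F[0] and F[0.1]
F1 = 1.0 * ab[:, 0] + 0.5 * ab[:, 1]
F2 = 0.9 * ab[:, 0] + 0.6 * ab[:, 1]

lhs = (np.var(F1 + F2) - np.var(F1 - F2)) / 4.0
print(lhs, np.cov(F1, F2)[0, 1])  # both ~ 0.0735
```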
 
  • #6
In your example:

Hepth said:
since the function is smooth in "x" (which it is). "a" and "b" have a known error, and I believe I can find the correlation between "a" and "b" by external means. (I have a LOT of functions to use for that.)

So if I define two functions
$$F_1[a,b] = F[0,a,b]$$
$$F_2[a,b] = F[0.1,a,b]$$

How can I find the correlation ##\sigma_{F_1 F_2}##?

If the parameters [itex]a[/itex] and [itex]b[/itex] are random variables, this makes [itex]F_1, F_2[/itex] random variables that are functions of [itex]a[/itex] and [itex]b[/itex]. To compute the correlation between [itex]F_1[/itex] and [itex]F_2[/itex] you need to know the joint distribution of [itex]F_1, F_2[/itex] or the joint distribution of [itex]a, b[/itex]. To estimate the correlation between [itex]F_1, F_2[/itex] you need data that contains several random realizations of [itex]F_1, F_2[/itex] or data that contains several random realizations of [itex]a, b[/itex].

If you want to treat [itex] a, b [/itex] as fixed constants instead of random variables, then it isn't clear to me what you mean by a "correlation" between [itex] F_1 [/itex] and [itex] F_2 [/itex] because [itex] F_1, F_2 [/itex] would be fixed constants.
 
  • #7
So you have a scalar function with error-free input [itex]\vec x[/itex] as well as the input [itex]\vec a[/itex], whose correlated errors [itex]\vec\epsilon[/itex] I'll assume form a random vector. Let the function be [itex]f(\vec x, \vec a)[/itex], and the covariance between two errors of the parameters [itex]cov(\epsilon_i, \epsilon_j) = \sigma_{i,j}[/itex].

At any point, for a reasonably "nice" function [itex]f[/itex], the function can be approximated with the first-order Taylor expansion around the point [itex](\hat{\vec x}, \hat{\vec a})[/itex] as
$$f(\vec x, \vec a) \approx f(\hat{\vec x}, \hat{\vec a}) + \sum_{r=1}^m \left (\frac{\partial f}{\partial x_r} \right )_{(\hat{\vec x}, \hat{\vec a})}(x_r - \hat{x}_r) + \sum_{s=1}^n \left ( \frac{\partial f}{\partial a_s} \right )_{(\hat{\vec x}, \hat{\vec a})}(a_s - \hat{a}_s)$$

For the sake of my sanity, let the nominal value [itex]\vec p = \begin{bmatrix}\vec x \\ \vec a\end{bmatrix}[/itex] and the error be [itex]\vec p_\epsilon = \begin{bmatrix}\vec 0 \\ \vec \epsilon\end{bmatrix}[/itex]. If we then evaluate the function and include input error, I can then rewrite the approximation as [itex]f(\vec p + \vec p_\epsilon) \approx f(\hat{\vec p}) + \vec J_f(\hat{\vec p})(\vec p + \vec p_\epsilon - \hat{\vec p})[/itex] where [itex]\vec J_f(\hat{\vec p})[/itex] is the Jacobian evaluated at the expansion point. Let's also assume that [itex]\hat{\vec p}[/itex] is error-free. Therefore, the error of the function is [itex]\vec J_f(\hat{\vec p})\vec p_\epsilon[/itex]. We can interpret this as a linear function of a random vector.

Let the error function be [itex]\delta(\vec t) = \vec J_f(\hat{\vec p})\vec t[/itex]. It can then be seen that
$$E(\delta(\vec p_\epsilon)) = \vec J_f(\hat{\vec p})E(\vec p_\epsilon) \\
\Sigma(\delta(\vec p_\epsilon)) = \vec J_f (\hat{\vec p})\Sigma(\vec p_\epsilon)\vec J_f(\hat{\vec p})^\text{T}$$
where [itex]\Sigma[/itex] is the covariance matrix. Note that since the function is a scalar function, the error function as well as these values are scalars as well.
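
As a minimal sketch of that single-point propagation (the toy function, nominal values, and parameter covariance below are invented; only the parameters carry error, since the x input is error-free):

```python
import numpy as np

def f(x, a):
    """Invented smooth toy function standing in for the real one."""
    return (1.0 + 0.2 * x) * a[0] + np.sin(a[1]) * x**2 + a[1]

def jacobian_a(x, a, h=1e-6):
    """Numerical Jacobian of f with respect to the parameters a."""
    J = np.empty(len(a))
    for i in range(len(a)):
        ap, am = a.copy(), a.copy()
        ap[i] += h
        am[i] -= h
        J[i] = (f(x, ap) - f(x, am)) / (2 * h)  # central difference
    return J

a_hat = np.array([1.0, 2.0])
Sigma_a = np.array([[0.04, 0.01],   # invented parameter covariance
                    [0.01, 0.09]])

J = jacobian_a(0.0, a_hat)
print(J @ Sigma_a @ J)  # scalar variance of the function's error at x = 0
```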

If we now evaluate at two points, each one having their own identically distributed errors [itex]\vec p_{\epsilon, 1}[/itex] and [itex]\vec p_{\epsilon, 2}[/itex], you want to find the correlation between the errors of the function, [itex]\delta_1 = \delta(\vec p_{\epsilon, 1})[/itex] and [itex]\delta_2 = \delta(\vec p_{\epsilon, 2})[/itex].

The correlation is [itex]\rho_{\delta_1, \delta_2} = \frac{cov(\delta_1, \delta_2)}{\sqrt{\Sigma(\delta_1)\Sigma(\delta_2)}}[/itex]. From what I can tell, you should be able to calculate the denominator. The numerator is the tricky one. Apart from estimating it from experimental data, I don't know of a good way of doing it without knowing more about the errors of the parameters themselves. I suppose if you know their distributions, a quick and dirty way is to use a Monte-Carlo approach.
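
A minimal sketch of that Monte Carlo route (assuming Gaussian parameter errors with a known covariance; the toy function and the numbers are invented, and the same parameter draw is reused at both x points since the parameters are shared between them):

```python
import numpy as np

def f(x, a):
    """Invented smooth toy function standing in for the real one."""
    return (1.0 + 0.2 * x) * a[0] + np.sin(a[1]) * x**2 + a[1]

rng = np.random.default_rng(1)
a_hat = np.array([1.0, 2.0])
Sigma_a = np.array([[0.04, 0.01],   # invented, assumed-known covariance
                    [0.01, 0.09]])

# Correlated parameter samples, shared between the two evaluation points
a_samples = rng.multivariate_normal(a_hat, Sigma_a, size=100_000)

F1 = np.array([f(0.0, a) for a in a_samples])
F2 = np.array([f(0.1, a) for a in a_samples])

print(np.corrcoef(F1, F2)[0, 1])  # estimated error correlation of F[0], F[0.1]
```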
 
  • #8
Sorry for the delayed response. That is exactly what I was looking for. I think in the end I will have to use some random variation (MC) of the parameters to determine the covariance. I am trying to avoid using any experimental information, if I can, until I begin the fitting procedure; I would like a theoretical estimate of the correlation and errors beforehand, hence this problem. I think I'll just do an MC run on it, hoping that I have enough information about the errors and their correlation to each other.

Thanks!
 

1. What is the purpose of finding the error/correlation between two functions?

The purpose of finding the error/correlation between two functions is to measure the degree of relationship or similarity between the two functions. This allows us to determine if there is a consistent pattern or trend between the two functions, and how closely they are related.

2. How is the error between two functions calculated?

The error between two functions is typically calculated by finding the difference between the actual value and the predicted value at each data point, squaring those differences, and then averaging them. This is known as the mean squared error and is a common method for measuring the error between functions.
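
For instance, a minimal sketch with invented numbers:

```python
import numpy as np

actual = np.array([1.0, 2.0, 3.0, 4.0])       # invented data
predicted = np.array([1.1, 1.9, 3.2, 3.8])

mse = np.mean((actual - predicted) ** 2)
print(mse)  # 0.025
```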

3. What does a high correlation between two functions indicate?

A high correlation between two functions indicates a strong relationship between the two variables. This means that as one function changes, the other function is likely to change in a consistent and predictable manner. A correlation of 1 indicates a perfect positive relationship, while a correlation of -1 indicates a perfect negative relationship.

4. Can the correlation between two functions be used to determine causation?

No, correlation does not imply causation. While a high correlation between two functions may suggest a relationship, it does not necessarily mean that one function causes the other. Other factors and variables may be at play, and further research and analysis are needed to establish causation.

5. What are some limitations of finding the error/correlation between two functions?

One limitation is that it only measures the relationship between two specific variables, and may not account for other factors or variables that could also be influencing the data. Additionally, correlation does not necessarily indicate a causal relationship, and it is important to consider other evidence and factors when drawing conclusions from the results.
