# Probability Density of a Constrained Chi-Square

Soveraign
Hello PF! It's been a while. How are things?

In my research I'm faced with determining a probability distribution from a function built as follows:

Perform three measurements X, Y, Z that have normally distributed errors.

Impose a constraint and variable change that allows me to reduce the dimensionality to 2.

My question is: Can I assume the resulting function is a chi-square with 2 dof and therefore write my pdf (up to normalization) as

$$\exp(- \chi^2 / 2)$$

The long version with specifics:

I am measuring the energies and opening angle of two photons with a common point of origin, and I wish to determine the probability density of the true energies and angle from this single measurement. For simplicity I'm assuming Gaussian errors on the measurements. The opening angle is transformed a bit to make the calculations easier, and I start with an initial chi-square of (subscript "m" denotes my measured value and "z" is my transformed angle measurement):

$$\chi^2 = \frac {(E_1 - E_{1m})^2} {\sigma_{E1}^2} + \frac {(E_2 - E_{2m})^2} {\sigma_{E2}^2} + \frac {(z - z_{m})^2} {\sigma_{z}^2}$$
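This chi-square can be written as a small function; the numerical values below are made-up placeholders, not values from the actual experiment.

```python
def chi_square(E1, E2, z, E1m, E2m, zm, sE1, sE2, sz):
    """Sum of squared, normalized residuals for the three measurements."""
    return ((E1 - E1m) / sE1) ** 2 + ((E2 - E2m) / sE2) ** 2 + ((z - zm) / sz) ** 2

# Hypothetical numbers: true values equal to measured values give chi^2 = 0.
print(chi_square(1.0, 2.0, 0.5, 1.0, 2.0, 0.5, 0.1, 0.1, 0.05))  # -> 0.0

# A one-sigma deviation in a single variable contributes 1 to chi^2.
print(chi_square(1.1, 2.0, 0.5, 1.0, 2.0, 0.5, 0.1, 0.1, 0.05))  # -> approximately 1.0
```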

The photons are produced by a common particle and therefore I can impose the constraint that the invariant mass of these photons is a specific value "M" (p is four-momentum).

$$C = (\mathbf p_{\gamma 1} + \mathbf p_{\gamma 2})^2 - M^2 = 0$$

This allows me to reduce the variables from 3 to 2, but in a fairly non-linear way. My final chi-square is a function of energy of the original common particle and the cosine of the center of momentum decay angle of the photons:

$$\chi^2 = f(E, \cos{\theta^*})$$

As one would expect the transformations are quite non-linear, but in practice they are frequently "close" to linear for the actual values being considered. I don't want to further burden this post with the ugly transformation details, but I'd happily provide them if needed.

So the long version of the question is: Can I assume the above expression is still a chi-square? Is the dof 2? Does the fact that E and cos(th*) are not independent play a role in determining the proper dof?

Many, many thanks to anyone that can help. I am especially interested in sources I can reference so I know I'm standing on strong theoretical grounds.

## Answers and Replies

Hey Soveraign.

One recommendation I have (and this applies to any situation similar to yours) is to use simulation as a verification tool whenever you need to check a distribution against gut instinct or theoretical verification.

If you have dependencies then you will either need to find a joint distribution and the entangled limits (since they are dependent) or you will need to express one variable as a function of another.

The theory for things like this includes transformation theorems for functions of random variables, probability transforms like the characteristic function, and a variety of other results.

Given that you have complex constraints, my first suggestion is to resort to simulation. You can use a package like R, or if you have large, complex conditional distributions, you might want something like WinBUGS. Both are free and R is open source.

After you do a simulation with enough data points (say 10,000 - 100,000) you can then plot the distribution, calculate its moments, and even do a goodness of fit test against the chi-square distribution with two degrees of freedom.
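As a minimal sketch of that workflow (in Python with NumPy/SciPy rather than R; the sample size and seed are arbitrary), here is the unconstrained three-variable case checked against a chi-square with 3 dof. The constrained version would simply replace the pull calculation with the actual transformation and test against df=2:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100_000

# Unconstrained case: three independent standard-normal pulls per event.
pulls = rng.standard_normal((n, 3))
chi2_vals = (pulls ** 2).sum(axis=1)

# Moments: a chi-square with k dof has mean k and variance 2k.
print(chi2_vals.mean())  # should be close to 3
print(chi2_vals.var())   # should be close to 6

# Goodness of fit against chi-square with 3 dof (Kolmogorov-Smirnov test).
ks = stats.kstest(chi2_vals, stats.chi2(df=3).cdf)
print(ks.pvalue)         # a large p-value is consistent with chi2(3)
```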

Soveraign
Thanks for the reply. I do check assumptions about distributions with high-stat simulations to try and catch mistakes. In the situation I'm describing above, it is the high-stat sims that show I must be doing something wrong... but not horribly so. I have both a bias in my final results as well as a suspect chi-square/dof when trying to combine multiple events from the simulations.

While giving an exam I was thinking this over and began to wonder if I should be approaching my constraint as more of a "condition" like so:

$$P(E1=a, E2=b, Z=c | M=m) = P(M=m | E1=a, E2=b, Z=c) P(E1=a, E2=b, Z=c) / P(M=m)$$

The P(E1=a, E2=b, Z=c) would be trivariate normal and the P(M=m | E1=a, E2=b, Z=c) might end up as a chi-square with dof 1. But then I realized the constraint imposes zero probability on a large range of E1, E2, Z (which is why I am able to reduce to the two variables mentioned in the first post) and might be the wrong approach.
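One quick way to sanity-check that conditioning idea numerically is rejection sampling: draw from the unconstrained trivariate normal and keep only draws where the constrained quantity falls within a small tolerance of its target. Everything below is a stand-in (independent normals, a product constraint, arbitrary numbers), not the actual invariant-mass constraint:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0, 0.5])      # hypothetical means
sigma = np.array([0.1, 0.2, 0.05])  # hypothetical (independent) sigmas

draws = rng.normal(mu, sigma, size=(1_000_000, 3))

# Stand-in constraint: a function g of the variables should equal a target C.
g = draws.prod(axis=1)              # here g = x1 * x2 * x3
C, eps = 1.0, 0.01                  # shrinking eps approaches the surface g = C
kept = draws[np.abs(g - C) < eps]

print(kept.shape[0], "accepted samples approximate p(x | g(x) = C)")
```

The accepted sample approximates the conditional distribution concentrated near the constraint surface, which is the same zero-probability-off-the-surface behavior you noticed.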

Do you recommend any links/books that would go into some detail about joint distributions for non-independent variables (and what you mean by entangled limits)?

Am meeting with some stats people tomorrow about it, hoping for some more insight.

I'd recommend getting a basic book on probability that covers the various kinds of transforms. These include finding the distribution of a function of a random variable, characteristic and probability-generating transforms, moment generating functions, and results to do with sums, products, and quotients of random variables.

I'd also suggest understanding how to take conditional and marginal distributions to get the joint distribution, which is what you were getting at in the post above.

If you are dealing with normal random variables then you should be looking at how to estimate the covariance matrix. If you can derive the covariance matrix from your constraints (which you should be able to), then you can get the final joint distribution. Also, any linear combination of normals is normal, and there are theorems that allow you to get the conditional means and variances of any compound normal against another compound normal.

Entangled limits are just limits that depend on other variables and not constants. An example is say 0 < x < y as opposed to 0 < x < 1. If you have any kind of entanglement then you have dependencies between the random variables that have the entangled limits.
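A tiny illustration of such entangled limits (a uniform distribution on the triangle 0 < x < y < 1): the marginal of x is no longer uniform, which is exactly the dependence those limits encode.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample uniformly on the triangle 0 < x < y < 1 by rejection.
pts = rng.uniform(0, 1, size=(200_000, 2))
x, y = pts[:, 0], pts[:, 1]
keep = x < y
x, y = x[keep], y[keep]

# The marginal density of x on the triangle is 2(1 - x): small x is twice as
# likely as in the uniform case, and x near 1 is rare.
lo = np.mean(x < 0.25)   # theory: integral of 2(1-x) over [0, 0.25] = 0.4375
hi = np.mean(x > 0.75)   # theory: integral over [0.75, 1] = 0.0625
print(lo, hi)
```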

If you can find the entangled limits, you can use that to specify the distribution but I don't know of a lot of solid theories or results to do it in general.

Soveraign
Thought I would follow up here and try to more succinctly describe the problem. At its core, I have three random variables that are normally distributed:

$$x_1 \sim N(\mu_1, \sigma_1) \\ x_2 \sim N(\mu_2, \sigma_2) \\ x_3 \sim N(\mu_3, \sigma_3)$$

I also know there is a relationship among the distributions:

$$\mu_1 \mu_2 \mu_3 = C$$

where "C" is a constant. Given exactly one sample from each distribution, I want to determine the probability density over hypothesized values of the means. Specifically I need:

$$p(\bar \mu \vert \bar x)$$

My latest attempt to work this out has taken me to a Bayes style of looking at it:

$$p(\bar \mu \vert \bar x) = \frac {p(\bar x \vert \bar \mu) p(\bar \mu)} {p(\bar x)}$$

where:

$$p(\bar x \vert \bar \mu)$$

is simply the joint density of the three normals evaluated at the specific x's. The relationship among the means certainly seems to be prior information, and my instinct is to model it as:

$$p(\bar \mu) = \begin{cases} A, & \mu_1\mu_2\mu_3 = C \\ 0, & \text{otherwise} \end{cases}$$

where "A" is just a constant. This enforces zero probability when the constraint is not met. Finally, p(x) is just a normalization. The final posterior distributions look "ok" but I'm unsure if I'm doing this right.
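A minimal sketch of that posterior, with made-up numbers throughout: parametrize the constraint surface by (mu1, mu2), set mu3 = C / (mu1 * mu2), evaluate the Gaussian likelihood on a grid, and normalize. The grid ranges, sigmas, and observed x's are placeholders.

```python
import numpy as np
from scipy import stats

# Hypothetical single measurement and (known) sigmas.
x = np.array([1.05, 1.95, 0.52])
sigma = np.array([0.1, 0.2, 0.05])
C = 1.0  # constraint: mu1 * mu2 * mu3 = C

# Parametrize the constraint surface by (mu1, mu2); mu3 is then determined.
mu1 = np.linspace(0.7, 1.4, 200)
mu2 = np.linspace(1.4, 2.6, 200)
M1, M2 = np.meshgrid(mu1, mu2)
M3 = C / (M1 * M2)

# Likelihood p(x | mu) is the product of the three normal densities; with a
# flat prior on the surface, the posterior is proportional to the likelihood.
logL = (stats.norm.logpdf(x[0], M1, sigma[0])
        + stats.norm.logpdf(x[1], M2, sigma[1])
        + stats.norm.logpdf(x[2], M3, sigma[2]))
post = np.exp(logL - logL.max())
post /= post.sum()  # normalize over the grid (p(x) is just this sum)

# Posterior mode on the grid:
i, j = np.unravel_index(post.argmax(), post.shape)
print(M1[i, j], M2[i, j], M3[i, j])
```

One subtlety worth flagging: "flat" here means flat in the (mu1, mu2) parametrization, so the constant A implicitly depends on how the constraint surface is parametrized.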

Thoughts anyone? Thanks!