Solving for P_x(X) given P_y(Y) and P_(x+y)(X+Y)

  • Thread starter Boorglar
  • Start date
In summary, the chi-squared statistic is a function that measures the error between predicted values and the observed values, and can be used to find values for the unknowns that make predictions that closely fit the observed data. The problem can be viewed as curve fitting - find the values of the unknowns that make predictions that closely fit the observed data. The chi-squared statistic can be used to find values for the unknowns that minimize error of the prediction subject to contraints on the variables that make them legitimate probabilities. The optimization problem would be to find values of the unknowns that minimize this function subject to the given constraints.
  • #1
Boorglar
210
10
I am considering two discrete random variables, X and Y, with values in {0,1,...,9}.
From my data, I can obtain an estimate for the distribution of Y (the function [itex]P_{Y}(y) = P(Y=y)[/itex] for y in {0,1,...,9}).
I can also obtain experimentally the distribution of the sum [itex]P_{X+Y}(s)[/itex] (that is, P(X+Y = s) for s in {0, 1, ..., 9} (the sum is modulo 10).

From that, I want to solve for distribution function of X ([itex]P_{X}(x) = [/itex] probablility that X = x for x in {0,1,...,9}).

I furthermore assume that X and Y are independent variables, so that their joint probability is the product of their individual probabilities.

From probability theory, I know I can write [itex]P_{X+Y}(s) = \sum_{x = 0}^{9}P_{Y}((s-x) \mod 10) P_{X}(x)[/itex]
Since I want to solve for [itex]P_{X}(x)[/itex], I need to solve a 10x10 system of equations with coefficient matrix given by the values [itex]P_{Y}(s-x)[/itex] and the vector of all values of [itex]P_{X+Y}(s)[/itex] for s in {0,...,9}.

Unfortunately, perhaps due to the estimates not being perfectly accurate, the solution of this system of equations (done numerically) gives me negative values for some of the probabilities in [itex]P_X(x)[/itex].

So is there another way of obtaining an estimate for the values of [itex]P_X(x)[/itex]?

Thank you.
 
Physics news on Phys.org
  • #2
One of my professors said "All problems can be viewed as optimization problems".

You can view the problem to be something like curve fitting - find the values of the unknowns that make predictions that closely fit the observed data.

Treat both [itex]P_X(i) [/itex] and [itex] P_Y(i) [/itex] as unknowns, for all [itex] i [/itex]. Define a function of those variables that predicts some aspects of the observed data. Define a function that measures the "error" between the predicted aspects and the observed aspects. Treat the the problem as finding values of the unknowns that minimize "error" of the prediction subject to contraints on the variables that make them legitimate probabilities (e.g. [itex] 0 \le P_X(i) \le 1 [/itex] , [itex] \sum_{i=0}^9 P_X(i) = 1 [/itex] , [itex] 0 \le P_Y(i) \le 1[/itex] , [itex] \sum_{i=0}^9 P_Y(i) = 1 [/itex] ).

For example, look at the definition of the [itex] \chi^2 [/itex] statistic. http://en.wikipedia.org/wiki/Pearson's_chi-squared_test Forget about [itex] \chi^2 [/itex] being used in a statistical test and just consider it as a function that measures the error between predicted values (the expected values) and the observed values. In your data you have two types of "bins", Y-bins and X+Y bins.

In the "Y-bins" you have counts of how many times particular Y-values were observed. For example if you ran 1000 experiments and observed Y = 3 happened 88 times, the expected value [itex] E_3 = (1000)P_Y(3) [/itex] and the observed value [itex] O_3 = 88 [/itex].

In the "X+Y" bins, you may have run a different number of experiments, let's say it was 2000. Then, for example, the expected number in the bin for X+Y=3 is [itex] E_3 = 2000( P_X(0)P_Y(3) + P_X(1)P_Y(2) + P_X(2)P_Y(1) + ...P_X(9)P_Y(4) + ..) [/itex] (corresponding to 2000 mutiplied by expression that appears on one of your ten equations.).

The formula for [itex] \chi^2 [/itex] defines a function in the unknowns [itex] P_X(i), P_Y(i) [/itex]. Values of the unknowns that make this function small produce a better fit to the observed data. Solve for the values of the unknowns that minimize [itex] \chi^2 [/itex], subject to the given constraints.

---
There are different methods for doing statistical estimation and they may not give the same answers. Which method is best? That is a complicated question. Defining "best" involves technicalities and gets into the fact that estimation procedures that are best in one respect may not be best in another respect.
 
  • #3
Hello,
I see, this seems like a good solution. So if I use the chi-squared function as my error measurement, I would compute [itex]\sum_{y=0}^{9}\frac{(E_Y(y) - O_Y(y))^2}{E_Y(y)} + \sum_{s=0}^{9}\frac{(E_{X+Y}(s) - O_{X+Y}(s))^2}{E_{X+Y}(s)}[/itex] where [itex]E_{X+Y}(s)[/itex] is replaced by the convolution sum in terms of [itex]E_X[/itex] and [itex]E_Y[/itex].
And my optimization would run on the space of values for [itex]E_X(x), E_Y(y)[/itex] for x, y in {0, 1, ..., 9}, constrained by [itex]\frac{1}{|X|}\sum_{x=0}^{9}E_X(x) = \frac{1}{|Y|}\sum_{y=0}^{9}E_Y(y) = 1[/itex] and values between 0 and 1.

Any suggestions on how to perform the optimization? First thing that came to mind was using the zero-gradient test, and Lagrange multipliers. But I don't like these unknowns occurring at the denominators...
 
  • #4
If the form of the probability model for the data is perfect then when the expected value associated with a given bin is zero, no observations should appear in that bin. (Since the random variable involved is a non-negative count, it can't have a zero mean except by being zero with probability 1.) So you could define the function [itex] \frac{ (O(s) - E(s))^2}{E(s)} [/itex] to be zero when [itex] E(s) = 0 [/itex].

If we acknowledge the probability model is only an imperfect approximation to how the data is generated, then we'll have to think about a different error function to minimize. (Perhaps, a better term than "error function" would be "penalty function".)

If I had to write a computer program this morning to do the problem, I wouldn't do anything that depended on finding the symbolic expression for the gradient of the penalty function or lagrange multipliers. Someone who was good at using computer symbolic algebra software might be able to do that. I would rely on approximating the gradient numerically. To overcome local minima, I'd pick a large number of initial points and use approximate gradient following to see where they lead.

I don't know how much effort you want to spend on the problem. If it is an important problem then I suggest the following:

1) Define one or more utility functions that measure the utility (or dis-utility, if you prefer) of a wrong estimate. If the actual value of the unknowns is A and your estimate of the unknowns is B then find a function F(A,B) that measures the utility of the situation. ( F(A,B) may or may not be related to the function you minimize to make the estimate B. )

2) Do a Monte-Carlo simulation where you generate some random "true" values of A and apply your estimation method to obtain a B. Look at the distribution of F(A,B) as a random variable. That indicates how well your estimation method works "in general".
 
  • #5
Thank you for the advice. For the optimization, I found a fast way would be to use MATLAB. I found it has a function called "fmincon" which can solve nonlinear optimization problems with linear constraints (in this case, the requirement that probabilities add up to 1 and that they are between 0 and 1). As for the penalty/error function, I am not sure if I should be minimizing the sum of the chi-squared values of [itex]P_Y[/itex] and [itex] P_{X+Y}[/itex], or something else.

I am not sure what you mean by the probability model, but to give you a concrete example of the data I am considering, Y would be the difference between two consecutive plaintext letters using some alphabet, and X would be the difference between two consecutive key values (this is in the context of Vigenere / Running Key cryptograms). I am basically studying some properties of my plaintext and ciphertext distributions, to try to infer something about the key distribution. (My variables X, Y would actually take values in a set as large as the alphabet used). Note, the plaintext distribution does not come from the plaintext being enciphered, but from typical english language data. I may not know the content of the encrypted data, except that it is in english, and of reasonable length (thousands of letters).
 
Last edited:
  • #6
Boorglar said:
I am not sure what you mean by the probability model,

I mean the assumption that the realizations of [itex] X [/itex] and [itex] Y [/itex] are actually independent realizations from the probability distributions [itex] P_X(i),\ P_Y(i) [/itex]. If that isn't exactly true then it might happen that the "best" fit of such a model to the data would have (for example) [itex] P_{X+Y}( 4) = 0 [/itex] even though the actual data might have (say) one occurrence of [itex] X + Y = 4 [/itex] in several thousand realizations.
 

1. How do I solve for Px(X) given Py(Y) and Px+y(X+Y)?

To solve for Px(X), you can use the formula Px(X) = Px+y(X+Y) / Py(Y). This formula uses the fact that P(X) and P(Y) are independent events, so their probabilities can be multiplied together to find the probability of their intersection.

2. Can I use this formula if Px+y(X+Y) is not provided?

No, you need to have all three probabilities (Px(X), Py(Y), and Px+y(X+Y)) in order to solve for Px(X). If one of the probabilities is missing, you will not be able to find the solution.

3. Is there a different formula for solving for Px(X) if Py(Y) and Px+y(X+Y) are not independent?

Yes, if Py(Y) and Px+y(X+Y) are not independent, you will need to use the formula Px(X) = Px+y(X+Y) - Py(Y). This formula takes into account the overlap between the two events and subtracts it from the total probability.

4. Can this formula be used for more than two events?

No, this formula is specifically designed for solving for Px(X) given two events: Py(Y) and Px+y(X+Y). If you have more than two events, you will need to use a different approach and possibly use the general rule of conditional probability.

5. Are there any assumptions or limitations to using this formula?

Yes, this formula assumes that the events are independent and that the probabilities are known. It also assumes that the events are mutually exclusive, meaning that they cannot occur at the same time. If these assumptions are not met, the formula may not give an accurate solution.

Similar threads

  • Precalculus Mathematics Homework Help
Replies
6
Views
1K
  • Precalculus Mathematics Homework Help
Replies
1
Views
502
Replies
39
Views
479
Replies
4
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
1
Views
742
Replies
3
Views
1K
  • Engineering and Comp Sci Homework Help
Replies
2
Views
1K
  • Calculus and Beyond Homework Help
Replies
2
Views
542
  • Set Theory, Logic, Probability, Statistics
Replies
6
Views
1K
  • Precalculus Mathematics Homework Help
Replies
2
Views
512
Back
Top