# Pathological PDFs. eg: ratio of normals including Cauchy.

1. Jul 30, 2012

### andrewr

Hi all,
I've been having a discussion about doing calculations on data which is supposedly Gaussian.
And (of course) there is a problem: once operations are performed on the measurements, such as taking a ratio of one kind of measurement to another, the result is often no longer Gaussian. In particular, I'd like to explore in this thread the problem of ratios of Gaussians.

Stephen Tashi made some excellent comments, and provided some links that I think describe the pathological nature of this distribution well -- but also see, especially, the paper by Marsaglia.
Background information

What I am going to present in this thread is an analysis of the properties of ratios of Gaussians (which ends in the very pathological Cauchy). I wish to study the mean (hopefully exact) and a quasi standard deviation (quasi because many of these distributions won't have a finite one...).

Based on symmetry arguments, I would say even the Cauchy distribution has a real mean, or no mean -- in the sense that it is a definite number, 0. The Cauchy, and I think many others with mu's very close to zero *do* have means.

For many ratios of Gaussians -- especially those ratios where the numerator and denominators have respective μ >> σ -- doing numerical experiments (eg: sampling approximations) I get repeatable results for both the sample mean and deviation of the experiment. Occasionally I will get a catastrophic failure... and this happens much more often as the mean of the denominator approaches zero.
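The kind of numerical experiment described above can be sketched as follows. This is a minimal illustration, not the code behind the attached graph: the seed, the batch size, the number of batches and the helper name `batch_means` are all assumptions chosen for the sketch.

```python
import random
import statistics

def batch_means(a, b, n_samples, n_batches, rng):
    """Sample mean of N(a,1)/N(b,1), computed separately for several batches."""
    means = []
    for _ in range(n_batches):
        total = 0.0
        for _ in range(n_samples):
            total += rng.gauss(a, 1.0) / rng.gauss(b, 1.0)
        means.append(total / n_samples)
    return means

rng = random.Random(2012)                      # fixed seed so the run is repeatable
far = batch_means(1.0, 5.0, 2000, 20, rng)     # denominator mean 5 sigma away from zero
near = batch_means(1.0, 0.5, 2000, 20, rng)    # denominator mean close to zero

spread_far = max(far) - min(far)
spread_near = max(near) - min(near)
print("b=5.0: median batch mean %.4f, spread %.4f" % (statistics.median(far), spread_far))
print("b=0.5: median batch mean %.4f, spread %.4f" % (statistics.median(near), spread_near))
```

With the denominator's mean far from zero the batch means should cluster tightly. With b = 0.5, individual batches can land far from the rest, which is the "catastrophic failure" behaviour.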

I'd like to derive formulas for both the mean and the probable *sample* deviation, and also the confidence intervals that a sample will avoid a "catastrophic" sub-sampling;

and I'll have to explain some of this later on in the thread.

For now, I'd like to verify my derivation for the mean of a ratio of Gaussians. Attached to the bottom of this post is a graph showing (the red line) what my derivation produced as a final result. Also on the graph are 6 locations that I did numerical experiments on and received results in agreement with the derivation often. I don't think there is a question of whether the result is correct or not -- I'm confident it is correct; there is just more to the problem...

Notice, the graph has a numerator of N(1,0); a constant; but see the derivation itself to understand why it is sufficient for calculating mu of N(a,1) / N(b,1).

The formula I came up with is (drumroll please!):
$$\mu = a \sqrt { 2 } \int \limits _{ 0 } ^{ b \over \sqrt { 2 } } e ^{ t ^ { 2 } - { b ^{ 2 } \over 2 } } dt$$
Or, alternately,
$$\mu = a \sqrt { \pi \over 2 } e ^{ - { b ^{ 2 } \over 2 } } \times erfi \left ( { b \over \sqrt { 2 } } \right )$$
where
$$erfi(t) = \sqrt { 4 \over \pi } \int \limits _{ 0 } ^{ t } e ^{ s ^ { 2 } } ds$$
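For anyone who wants numbers out of the formula, here is a minimal sketch that evaluates it with the standard library only. It relies on the fact that the integral form is $a \sqrt{2}$ times Dawson's integral $D(x) = e^{-x^2} \int_0^x e^{t^2} dt$ at $x = b/\sqrt{2}$. The function names `dawson` and `ratio_mean` and the Simpson step count are assumptions made for this illustration.

```python
import math

def dawson(x, n=2000):
    """Dawson's integral D(x) = exp(-x^2) * int_0^x exp(t^2) dt, by Simpson's rule.

    The factor exp(-x^2) is folded into the integrand so nothing overflows.
    """
    if x == 0.0:
        return 0.0
    h = x / n                                  # n must be even for Simpson's rule
    total = math.exp(-x * x) + 1.0             # integrand at the endpoints t=0 and t=x
    for k in range(1, n):
        t = k * h
        total += (4.0 if k % 2 else 2.0) * math.exp(t * t - x * x)
    return total * h / 3.0

def ratio_mean(a, b):
    """Candidate mean of N(a,1)/N(b,1): a * sqrt(2) * D(b / sqrt(2))."""
    return a * math.sqrt(2.0) * dawson(b / math.sqrt(2.0))

for b in (0.0, 0.5, 1.0, 2.0, 4.0, 10.0):
    print("b=%5.1f  mu/a=%.6f" % (b, ratio_mean(1.0, b)))
```

For large b the value should approach a/b, as one would expect when the denominator is almost never near zero.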

I will give the derivation in the next post, which needs some cleaning up. I'd appreciate some pointers on how to improve the derivation's quality -- as that will undoubtedly help me work out (clearly) the issues about higher moments...

#### Attached Files:

• ###### invertNormal.png
Last edited: Jul 31, 2012
2. Jul 31, 2012

### andrewr

How to calculate the mean of a ratio of Gaussians; a result.

The probability of each ratio element is:

$$p(X,Y)={ 1 \over 2\pi } ( e^{-{ 1 \over 2}(x-a)^2} e^{-{ 1 \over 2}(y-b)^2} )$$

The elements of the mean of the ratio are:

$$\mu_{e} ( X,Y )=x/y$$

Which makes the weighted mean's elements:

$$p(X,Y) \mu_{e} ( X,Y ) = { 1 \over 2\pi } x/y( e^{-{ 1 \over 2}(x-a)^2} e^{-{ 1 \over 2}(y-b)^2} )$$

Combining yields:

$$\mu = \int \limits _{ - \infty }^{ \infty } \int \limits_{ - \infty }^{ \infty }{ 1 \over 2 \pi } x/y( e^{-{ 1 \over 2 }(x-a)^2} e^{-{ 1 \over 2 }(y-b)^2} ) dxdy$$

Changing to polar coordinates to allow fixed radii vs probability pairs:

$r^{2}=(x-a)^{2}+(y-b)^{2}$ and $x-a=cos( \theta )r$ and $y-b=sin( \theta )r$

$$dxdy=r \times d \theta dr$$

$$\mu = \int \limits_{ 0 }^{ \infty } \int \limits _{0}^{ 2 \pi } { 1 \over 2 \pi } (cos( \theta )r+a)/(sin( \theta )r+b) e^{-{ 1 \over 2}r^2} (r \times d \theta dr)$$

The ratio weight's numerator has two parts summed, the angular integral will also have two parts:
The portion caused by $cos( \theta )r \over sin( \theta )r+b$ is just $log \left ( sin( \theta ) r + b \right )$

Since there are no discontinuities in the log() when r<b, this term vanishes over a full period and may be ignored there. When r>=b, an imaginary solution results -- which I will not treat, for I can show the result must be zero based on symmetry: the numerator is equally distributed across the zero point, so it must contribute exactly equal positives and negatives, with equal probability, after division; thus cancelling. I do treat a similar problem below, though, which may need to be improved.

The problem is now reduced to integrating $a \over sin( \theta )r+b$
The function, being periodic, has the same area if cos() is substituted for sin(), and the results are easier to work with...

The sub-problem to solve is now:

$$\int \limits _{0}^{ 2 \pi } f( \theta ) d \theta = \int \limits _{0}^{ 2 \pi } { a \over cos( \theta )r+b } d \theta$$
$$= - { 2a \over \sqrt { r ^{ 2 } - b^{ 2 } } } tanh ^{-1 } \left ( (b-r)tan( \theta / 2 ) \over \sqrt { r ^{ 2 } - b^{ 2 } } \right ) | _{ ? } ^ { ? }$$

$$= { 2a \over \sqrt { r ^{ 2 } - b^{ 2 } } } tanh ^{-1 } \left ( \sqrt { r-b \over r+b } tan( \theta / 2 ) \right ) | _{ ? } ^ { ? }$$

However, there are three regions to consider in order to be able to compute the integral everywhere:

Region 1: $r \in [ 0 , b ]$

Region 2: $r \in ( b , \infty )$ and $\theta \in ( 0, \theta _{ d } )$

Region 3: $r \in ( b , \infty )$ and $\theta \in ( \theta _{ d }, \pi )$

Where $\theta _{ d } = cos ^{ -1 } ( -{ b \over r } )$ is the angle of the first f() discontinuity when it exists.
As a convenience, set $t _{d} = tan \left ( { cos ^{ -1 } (- { b \over r } ) \over 2} \right ) = \sqrt { r+b \over r-b }$

In Region 1:
Note: the anti-derivative has a jump discontinuity at $\theta = \pi$, so the limits are taken on either side of it.

$$\int \limits _{0}^{ 2 \pi } f( \theta ) = - { 2a \over \sqrt { r ^{ 2 } - b^{ 2 } } } tanh ^{-1 } \left ( { (b-r)tan( \theta / 2 ) \over \sqrt { r ^{ 2 } - b^{ 2 } } } \right ) | _{ \pi ^{ + } } ^ { \pi ^ { - } }$$

$$= i { 2a \over \sqrt { b ^{ 2 } - r^{ 2 } } } tanh ^{-1 } \left ( -i { (b-r)tan( \theta / 2 ) \over \sqrt { b^{ 2 } - r ^{ 2 } } } \right ) | _{ \pi ^{ + } } ^ { \pi ^ { - } }$$

Identity: $\tan ^{ -1 } (x) = \tanh ^ {-1} (-ix)i$

$$= { 2a \over \sqrt { b ^{ 2 } - r^{ 2 } } } tan ^{-1 } \left ( { (b-r)tan( \theta / 2 ) \over \sqrt { b^{ 2 } - r ^{ 2 } } } \right ) | _{ \pi ^{ + } } ^ { \pi ^ { - } }$$

$$= { 2 \pi a \over \sqrt { b ^{ 2 } - r^{ 2 } } }$$
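The Region 1 result is easy to check numerically, since for r < b the integrand has no singularity. The sketch below compares a plain midpoint rule against $2 \pi a / \sqrt{b^2 - r^2}$. The values of a, b, r and the name `angular_integral` are picked for illustration only.

```python
import math

def angular_integral(a, b, r, n=4000):
    """Midpoint-rule estimate of int_0^{2*pi} a/(r*cos(theta)+b) d(theta), for r < b."""
    h = 2.0 * math.pi / n
    return h * sum(a / (r * math.cos((k + 0.5) * h) + b) for k in range(n))

a, b = 1.0, 2.0
for r in (0.0, 0.5, 1.0, 1.9):
    closed = 2.0 * math.pi * a / math.sqrt(b * b - r * r)
    print("r=%.1f  numeric=%.8f  closed form=%.8f" % (r, angular_integral(a, b, r), closed))
```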

Regions 2 and 3: Because of symmetry around $\theta = \pi$, computing the area for $\theta \in [ 0 , \pi ]$ gives exactly half the total desired area. In Region 2 (negative values of f()) , the anti-derivative is purely real and presents no difficulty, but in Region 3 (positive values of f() ) , the anti-derivative is complex because the hyperbolic arc-tangent receives values above one.
But notice that if $|x| \in (1,\infty)$ then $tanh ^{ -1 } ( x ) = { ln( 1+x ) - ln (x-1) \over 2 } + i { \pi \over 2 }$, which has a constant imaginary part. Hence, computing a definite integral within that region will always cancel the imaginary part. Taking a numerical derivative of the integral produced in this way gives the correct values -- validating that the imaginary constant may be ignored.

Since the real anti-derivatives at the extreme ends of the regions of interest $\theta \in [0, \theta _{d}] \cup [\theta _{ d}, \pi ]$ are both exactly 0; The only part contributing to the area, then, is the *difference* in the left and right handed limits at $\theta = \theta _{d}$. To find this area:

set $z = \sqrt { r-b \over r+b } tan( { \theta \over 2 } )$ and notice z=1 at the discontinuity point.
So, I'd like to compute the area measured with a small distance omitted around the discontinuity point.

$\int \limits _{0}^{ \theta_d - d ^{ - } } { a \over cos( \theta )r+b } d \theta + \int \limits _{\theta_d + d ^{ + } }^{ \pi } { a \over cos( \theta )r+b } d \theta$

$={ 2a \over \sqrt { r ^{2} - b ^{2} } } \left ( \lim \limits _{ z \to 1 ^{ - } } tanh ^{ -1 } (z) - \lim \limits _{ z \to 1 ^{ + } } tanh ^{ -1 } (z) \right )$

$={ 2a \over \sqrt { r ^{2} - b ^{2} } } \left ( \lim \limits _{ z \to 1 ^{ - } } \left ( { ln(1+z) - ln(1-z) \over 2 } \right ) - \lim \limits _{ z \to 1 ^{ + } } \left ( { ln(1+z) - ln(z-1) + i \pi \over 2 } - i{ \pi \over 2 } \right ) \right )$

It doesn't matter what the size of the omission is, so long as both sides shrink to nothing together. So, I'm going to do a change of variables for the right hand limit, and replace z with 2 - z; allowing the omissions to shrink to nothing simultaneously.

$$={ 2a \over \sqrt { r ^{2} - b ^{2} } } \left ( \lim \limits _{ z \to 1 ^{ - } } \left ( { ln(1+z) - ln(1-z) \over 2 } - { ln(1+ (2-z) ) - ln( (2-z)-1) \over 2 } \right ) \right )$$

$$={ a \over \sqrt { r ^{2} - b ^{2} } } \left ( \lim \limits _{ z \to 1 ^{ - } } \left ( ln(1+z) - ln(1-z) - ln(3-z) + ln( 1-z ) \right ) \right ) = 0$$

I am not sure if the way I did this is formal enough to be convincing to the general reader; so if I need to use L'Hôpital's rule, or something similar -- I'd appreciate anyone suggesting how to set that up properly. (Also, if I need to do the same for the log() problem earlier, which could be solved using the same logic, I'd like to know that also...)
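One possible way to set the limit up more carefully, offered as a sketch rather than a finished proof, is to fix the omission as a symmetric interval of half-width $\epsilon$ in $\theta$ and track what that does to z. The symbols $F$, $\epsilon$, $\delta_1$ and $\delta_2$ are introduced here for illustration and are not part of the derivation above.

Take the real anti-derivative on $[0, \pi]$, which is valid on both sides of $\theta_d$:

$$F( \theta ) = { a \over \sqrt { r ^{2} - b ^{2} } } ln \left | { 1+z \over 1-z } \right | , \qquad z( \theta ) = \sqrt { r-b \over r+b } tan( \theta / 2 )$$

It satisfies $F(0) = 0$ and $F( \pi ^{-} ) = 0$, because $z \to \infty$ makes the ratio inside the log tend to 1. Writing $z( \theta_d - \epsilon ) = 1 - \delta_1$ and $z( \theta_d + \epsilon ) = 1 + \delta_2$, the area with the omission removed is

$$F( \theta_d - \epsilon ) - F( \theta_d + \epsilon ) = { a \over \sqrt { r ^{2} - b ^{2} } } ln \left ( { 2 - \delta_1 \over 2 + \delta_2 } \cdot { \delta_2 \over \delta_1 } \right )$$

Since $z( \theta )$ is differentiable at $\theta_d$ with $z'( \theta_d ) > 0$, both offsets are $\delta_{1,2} = z'( \theta_d ) \epsilon + O( \epsilon ^{2} )$, so $\delta_2 / \delta_1 \to 1$ and the log tends to $ln(1) = 0$ as $\epsilon \to 0$. The substitution of z with 2 - z corresponds to taking $\delta_1 = \delta_2$ exactly, which agrees with the symmetric omission in $\theta$ to first order.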

The important result is that *ONLY* Region 1 affects the mean. All Regions 2 and 3, at each radius, cancel to zero. I can't actually see why this is true by visualization or symmetry -- but the math does work out -- and simulations do agree.
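The cancellation for r > b can also be checked numerically. The sketch below estimates the principal value over $[0, \pi]$ by adding the integrand at equal distances on either side of $\theta_d$ before integrating, so the two divergent sides cancel pairwise. It assumes b > 0, and the name `pv_half_circle` and the step count are choices made for this illustration.

```python
import math

def pv_half_circle(a, b, r, n=20000):
    """Principal value of int_0^pi a/(r*cos(theta)+b) d(theta) for r > b > 0."""
    def f(theta):
        return a / (r * math.cos(theta) + b)

    theta_d = math.acos(-b / r)        # pole of the integrand, lies in (pi/2, pi)
    m = math.pi - theta_d              # widest symmetric window that stays inside [0, pi]
    h = m / n
    # Midpoint rule on the paired integrand f(theta_d - s) + f(theta_d + s), 0 < s < m.
    paired = h * sum(f(theta_d - (k + 0.5) * h) + f(theta_d + (k + 0.5) * h)
                     for k in range(n))
    # What remains is an ordinary integral over [0, theta_d - m], which has no pole.
    g = (theta_d - m) / n
    rest = g * sum(f((k + 0.5) * g) for k in range(n))
    return paired + rest

for r in (1.5, 2.0, 5.0):
    print("b=1.0  r=%.1f  PV=%.3e" % (r, pv_half_circle(1.0, 1.0, r)))
```

Each printed value should be zero to within the accuracy of the quadrature, in agreement with the claim that Regions 2 and 3 cancel at every radius.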

Continuing the original calculation, then:

$$\mu = \int \limits _{ 0 } ^{ \infty } { 1 \over 2 \pi } { 2 \pi a \over \sqrt { b ^{ 2 } - r^{ 2 } } } re^{-{ 1 \over 2}r^2} dr$$

Since the area for $r>b$ is already known to be zero, a reduced limits integral is sufficient.

$$\mu = a \int \limits _{ 0 } ^{ b } { r \over \sqrt { b ^{ 2 } - r^{ 2 } } } e^{-{ 1 \over 2}r^2} dr$$

$$\mu = a \sqrt { \pi \over 2 } e ^{ - { b ^{ 2 } \over 2 } } \times erfi \left ( { b \over \sqrt { 2 } } \right )$$

Q.E.D.
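The last step, from the radial integral to the erfi form, can be checked numerically as well. In this sketch the substitution $r = b \, sin( \phi )$ removes the square-root singularity at r = b. Since `math` has no erfi, the right-hand side is evaluated through the equivalent integral $e^{-b^2/2} \int_0^b e^{u^2/2} du$. Both function names are assumptions made for the illustration.

```python
import math

def radial_integral(b, n=2000):
    """int_0^b r/sqrt(b^2-r^2) * exp(-r^2/2) dr, using r = b*sin(phi) and the midpoint rule."""
    h = (math.pi / 2.0) / n
    total = 0.0
    for k in range(n):
        s = math.sin((k + 0.5) * h)
        total += b * s * math.exp(-0.5 * (b * s) ** 2)
    return total * h

def erfi_form(b, n=2000):
    """sqrt(pi/2)*exp(-b^2/2)*erfi(b/sqrt(2)), written as exp(-b^2/2)*int_0^b exp(u^2/2) du."""
    h = b / n
    return h * sum(math.exp(0.5 * ((k + 0.5) * h) ** 2 - 0.5 * b * b) for k in range(n))

for b in (0.5, 1.0, 3.0):
    print("b=%.1f  radial=%.6f  erfi form=%.6f" % (b, radial_integral(b), erfi_form(b)))
```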

What I have discovered.
1. Change of variables naturally separates the numerator into a N(0,1), and a constant a.
2. All division problems with std=1, can be reduced to computing 1/N(b,1) and scaling by a.
3. There is a definite mean for every ratio of a and b
4. Although the Cauchy distribution (a=0, b=0) lies on a boundary that I don't believe is computed directly, nonetheless the solution clearly goes to zero in the limit as b approaches 0.

Last edited: Jul 31, 2012
3. Jul 31, 2012

### chiro

Hey andrewr.

I'm curious for one part involving $$\frac{a}{sin(θ)r+b}$$ where you change the sin(θ) to a cos(θ): can you explain how you justify this step in a little more detail?

4. Jul 31, 2012

### DrDu

Change integration variable from $\theta \to \theta-\pi/2$.

5. Jul 31, 2012

### chiro

Yes, but the limits remain unchanged in the above post. Under a change like that, you would update the limits as -pi/2 to 3pi/2 and they have not changed. I'll wait for the OP's response.

6. Jul 31, 2012

### haruspex

It's the integral around a complete 2pi either way. Does it matter where it starts/ends?

7. Jul 31, 2012

### chiro

Don't worry I just realized what is going on (for some reason I was considering that the r variable was in some way dependent in the double integral).

8. Aug 1, 2012

### Stephen Tashi

(I've changed the letters used in the notation. His z and w have become our X and Y. He deals with possibly correlated normal random variables. I assume his $\rho = 0$.)

Let X and Y be uncorrelated Normal random variables. Using the usual notation for their means and variances let

$r = \frac{\sigma_Y}{\sigma_X}$

$a = \frac{\mu_X}{\sigma_X}$

$b = \frac{\mu_Y}{\sigma_Y}$

$\frac{X}{Y}$ is distributed as $\frac{1}{r} \frac{a + N_x}{b + N_y}$
Where $N_x,N_y$ are independent standard Normal random variables (mean 0, variance 1).

In section 4, he says that none of the moments of
$T = \frac{a + N_x}{b + N_y}$ exist.

If we restrict ourselves to the portion of the distribution where the denominator is greater than -4, he offers this approximation for the mean of T:

$\mu_T = \frac{a}{1.01 b - .2713}$

Since this is the math section of the forum, it doesn't do any good to claim that non-existing mathematical things really exist. If you want to talk about a thing that embodies some of the intuitive aspects of a mean then you should call it a pseudo-mean and (eventually) define it precisely.

9. Aug 1, 2012

### andrewr

Hi Stephen, Yes -- I read the exact same thing in the Marsaglia paper; which means that my proof must have a mistake in it? Would you please locate the mistake? (I am asking the general community, and so far -- no complaints...)

In all locations where the integral has the potential to produce infinite area, there was (for computing the mean, only) a portion which *exactly* balances it in the negative direction.
The Cauchy -- as a distribution -- is symmetrical; that alone is enough to show the same point -- the variance is a different issue. (That doesn't exist!)

If there is a mathematical reason that I need to call it the "mode" or something else, I'd like to understand that now -- in terms of *what* I am looking for when doing integration -- before progressing on to the pseudo-higher moments. eg: What step of my derivation is illegitimate?

Thanks!

10. Aug 1, 2012

### Stephen Tashi

I hope the general community is the one that wades through all that.

The mean of a probability density f(x) defined on the real number line is $\int_{-\infty}^{\infty} x f(x) dx$ and I'm sure we can look up a definition that tells us what it means for such an "improper" integral to exist. (There is nothing in the definition of mean value that defines it in terms of balancing things out etc.) You are taking the view that if $f(-x) = f(x)$, so that the integrand $x f(x)$ is odd, then the integral for mean value must exist. This is not true. Look at the various types of Cauchy principal value integrals (http://en.wikipedia.org/wiki/Cauchy_principal_value). That's what you are effectively doing. So you are defining a pseudo-mean value in terms of that type of integral.
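A small sketch of why symmetry alone is not enough, using the standard Cauchy density $1/( \pi (1+x^2))$ as the example. The truncated integral of $x f(x)$ has the closed form $ln((1+B^2)/(1+A^2)) / (2 \pi )$ on $[-A, B]$, so the answer depends on how the two limits are sent to infinity. The function name and the choice B = 2A are assumptions for the illustration.

```python
import math

def cauchy_partial_mean(lower, upper):
    """int_lower^upper x/(pi*(1+x^2)) dx for the standard Cauchy density, in closed form."""
    return (math.log1p(upper * upper) - math.log1p(lower * lower)) / (2.0 * math.pi)

for A in (1e2, 1e4, 1e6):
    symmetric = cauchy_partial_mean(-A, A)         # both limits grow together
    lopsided = cauchy_partial_mean(-A, 2.0 * A)    # upper limit grows twice as fast
    print("A=%.0e  symmetric=%.6f  lopsided=%.6f" % (A, symmetric, lopsided))
```

The symmetric truncation gives exactly 0 for every A, which is the principal value. The lopsided one tends to $ln(2)/ \pi$ instead, so the improper integral has no single value.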

11. Aug 2, 2012

### andrewr

They clearly have, and commented on it... see previous posts!

Not in words, but anti-symmetric integrals are routinely held to be zero. This is standard operating procedure in Engineering, for example, when doing Fourier series. Besides, the very page you quote did the exact same thing as I did...

I am taking the view that all samples of finite values are equal and opposite; so they *must* add up to zero; yes. I took a limit where the function went infinite to see if it canceled, and it did...

This function isn't one that switches between 0 and 1 infinitely many times, or anything.
As far as I know, it's compactly supported; The tangent, even at the discontinuity, is the same for both sides of the limit --- (eg: This is also true for 1/x , which is a hyperbola -- and historically that's the very poster equation of smoothness from geometry!). So, it has the qualification the Wikipedia article requires... am I mistaken?

I am looking at the page, and it shows exactly what I did in the derivation; I took a limit, and it produced a single number. According to that page, what I have is a Cauchy principal value which is a finite number. So, what's wrong with that? I don't get why that means it doesn't exist.

It acts like a mean, it numerically simulates like a mean... but we need to call it a duck or something?

In undergraduate calculus classes, I was taught to do this very thing whenever "point" discontinuities happened -- and I was never told the answer didn't exist.

I'm not trying to be a jerk -- I just don't understand the complaint.

12. Aug 2, 2012

### Stephen Tashi

It doesn't matter what is routinely done in engineering. If you're going to talk math, you'll have to use the definitions mathematicians use.

The complaint is about the proper use of mathematical terminology. What you are doing is interesting, but you'll undermine your credibility if you declare that you know what the mean value of a random variable is and that all the mathematical statistics experts are wrong.

A good place to discuss the existence of improper integrals would be in the Calculus & Analysis section of the forum. Of course, the existence of such integrals is germane to what you are doing, but I think a lot of the Calculus and Analysis experts don't like to read the statistics threads.

13. Aug 2, 2012

### chiro

I'm going to check out the paper referenced by Stephen Tashi above, but as a question for andrewr: have you looked at the paper and if so do you agree/disagree (and if you disagree, then with what part)?

14. Aug 2, 2012

### Stephen Tashi

I think the only issue between andrewr and Marsaglia is the question:

When statisticians define the mean of the density $f(x)$ to be $\int_{-\infty}^{\infty} x f(x) dx$ does this imply that we are to use the Cauchy principal value of the integral when the ordinary improper integral doesn't exist?

The answer to that is no according to mathematical statistics, which is why texts say that the mean of the Cauchy distribution doesn't exist. Andrewr says it does. I suggest that andrewr invent a new term for what he is computing.

When you use the Cauchy principal value to define integrals, you lose important properties of ordinary integration. So it's hard (for me, at least) to scrutinize a long calculation that does this.

15. Aug 2, 2012

### andrewr

I'm going to give a thorough answer -- if you want shorter ones in the future, just let me know. I aim to please.

----

With what he said under the particular conditions he said it, I agree; but there were many things he left unsaid and several which must be examined contextually; Let me demonstrate the circumspect way he talks about the problem.

In the Abstract, and in the text body it is said:

Nowhere in the paper does Marsaglia explicitly say he agrees with the theory -- rather, he says several things that lead me to suspect his attitude is ambivalent.

Consider:
pp.2
In theory, there *are* no ratios of Gaussians where the denominator is never zero. Hence, in many places -- Marsaglia is ignoring theory or is implying a probable outcome of an experiment by "conditioning".

The most serious objection is in section 4, which Stephen Tashi quotes -- BUT again, he says -- "Yet practical applications, in which the denominator ... is not expected to approach zero"

Here he is not talking about conditioning, but just expecting?
But that's nonsense in the case of mu=4 and sigma=1, ZERO is at the four sigma mark! It isn't nonsense when numeric integration or sampling is involved and boundary conditions can guarantee the zero is missed....

And throughout the whole discussion -- notice: *Marsaglia* never changes the word "mean" into "quasi-mean" in his paper; rather he says earlier in the document
DIFFICULT???! (Impossible theoretically!)

So, it is traditional to call these non-existing means -- "means";

( Please Note: It is highly advisable to choose an estimator of the mean of samples which is traditionally less efficient than the direct application of the simple arithmetic mean; one that is resistant to outliers is highly advisable. Truncation of data, however, is *not* necessary. )

Again, Let's actually read Marsaglia's remark in section 4 -- as a lawyer (which all mathematicians ought to be...)

Marsaglia says:
But Marsaglia is quite tricky to disagree with here. He doesn't say all integrals are infinite -- and if the mean is zero, then that moment isn't an infinite integral -- so clearly he's right either way. Again, if he meant all "the" integrals which followed -- he ought to have used at least the definite article "the" integrals; Infinity does not exist as a real number, and a definite number is not infinity. He can't go wrong! (real is not super-real)

(Not to mention that there is no way to tell if he means the numeric integral, or if he has ever even considered a Cauchy value- or ????? ... Perhaps he quit early and went home that day.)

With this in mind, re-read my OP comment.

I am simply following Marsaglia's notation as best I understand it.

I loved Marsaglia's words "in the sense", that's why I mimicked him;
He calls what he is seeking a mean; and at the same time doesn't dispute that there is no mean in some undefined sense or another.

Marsaglia is not concerned with finding the most exacting formula, anyway. He is only interested in finding a simple computational approximation; His entire paper is about an empirical approach, and crude approximations. (even at 50 digits, only using 2 points guaranteed inaccuracy) My approach is interested in more accuracy and a closer examination of the theory.

In Algebra, there are problems where symbolic variables can hide the fact that one is doing a divide by zero; How one approaches the problem, then, can affect whether the result is incorrect in one location or in all locations. So, that, if one attempts a problem in a certain way -- there is no valid result. But, if one separates the problem or refuses to do certain steps in a certain order -- there *can* be a result.

In the problem I posed and attempted, I didn't actually solve the problem with a Gaussian in the numerator.

Rather, I found the problem nicely separated into two parts by about the fourth step of the derivation -- eg: one which is a Cauchy, one of which is not. (See opening posts graph, I explicitly noted that.) And I left the Cauchy alone, assuming the symmetry argument was enough to show it could at most randomly affect the mean in actual experiments and could not produce any *consistent* bias toward either positive or negative infinity ( I am speaking about the mean, only ).

There are two other (minor) reasons to read Marsaglia with care; He notes that his work was associated with Military installations of Radar at one time (Cold war). Much of this work was classified, and in spite of theory -- not everything that is classified is declassified in a timely manner... (Bureaucracy is king!). He may not be telling everything he actually knows -- and there is evidence of this.

(2nd and in explanation), Marsaglia makes an explicit allusion in his article to a conflict between him and another mathematician, namely Hinkley, which caused him to avoid including things out of "embarrassment" earlier. Hinkley, note, is WRONG according to Marsaglia;
Nowhere does Marsaglia indicate he is giving the full account now;

With this in mind, notice who is at the forefront of characterizing the distribution of ratios of Gaussians? Hinkley is the one cited on the Wikipedia article given to me by Stephen Tashi for background information. I am open, of course, to someone giving me more detailed documentation -- but the quality so far isn't solving the problem.

Regarding Marsaglia, Given the context of his document -- I have no reason to disagree with him. So the simple answer is "yes" I agree with him as I understand him.

Thanks for the question, Chiro. I'm glad to have clarified some of that.
Also, I took your earlier remark about the change of integration limits to heart -- and will try to change them even when it doesn't matter in the future. I do want to make my work easier to read -- and not just correct.

Last edited: Aug 2, 2012
16. Aug 2, 2012

### andrewr

I don't know what texts you refer to;
I would like a link to a reputable statistics site that says one can't use a Cauchy principal *number*, ever, and goes into detail about why. There is a difference between saying statisticians must not use a Cauchy principal value, and saying that it is not always reliable to use one.

Suggestion noted:

Marsaglia, whose paper you gave to me, used the word "mean" for something which is not a mean.
Hence, it is traditional to call it a "mean" -- but if it helps -- I would be happy to call it a "mean sub duck" to distinguish it.

$$\mu _{duck}$$

I wouldn't know how that issue affects the question at hand. Do you have a link to these properties? I'm willing to study it.

My Engineering professor (as all who teach Fourier analysis, PDE's, ODE's, etc.) had his doctorate in Mathematics and a minor in Engineering. Engineers don't intrinsically fail to use mathematics properly. The degrees are required for teachers; and students had to use standard notation and terms.

But in any event, none of these mathematics profs, ever made mention that these integrals always invalidated a numerical result; However, on more reflection and review of notes, they did mention something of a caveat:

Sometimes an equation or system has more than one solution; in such a case, we are forced to examine the equation to see if some solutions are spurious or if they are valid.

Now, I myself am thinking: In the case of *any* sampling distribution, with real numbers as the only accepted data points, given N points the arithmetic mean is finite.
This must be true: $|\mu _{x}| \le {\rm max} ( |x _i| )$; AND the same is still true when N is taken as a limit toward infinity; the mean will still be a finite number.

In the Case of the Cauchy, then, considering the distribution is historically a "limiting" PDF (eg: Just as Binomial in the limit infinite is *defined* to be Gaussian, etc.) -- the mean of any such (real number) distribution is still finite.

So, I know we can eliminate any integral methods that produce infinite means.
But -- that's the typical answer given on sites as to why the mean *doesn't* exist;
However, such reasoning isn't about a real distribution whose limit is the Cauchy -- it's about the indeterminacy of the integration method.
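A quick experiment helps separate "every finite sample has a finite mean" from "the sample mean settles down". For standard Cauchy draws, the mean of N samples is itself standard Cauchy distributed, so its spread should not shrink as N grows. The sketch below measures that spread by the interquartile range over repeated trials. The seed, trial count and helper names are assumptions made for the illustration.

```python
import random
import statistics

def cauchy_sample_mean(n, rng):
    """Arithmetic mean of n standard Cauchy draws (ratio of two independent N(0,1))."""
    return sum(rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0) for _ in range(n)) / n

def iqr(values):
    """Interquartile range, a spread measure that tolerates extreme outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

rng = random.Random(16)            # fixed seed so the run is repeatable
iqr_by_n = {}
for n in (1, 10, 100, 1000):
    means = [cauchy_sample_mean(n, rng) for _ in range(400)]
    iqr_by_n[n] = iqr(means)
    print("N=%4d  interquartile range of the sample mean: %.2f" % (n, iqr_by_n[n]))
```

For a distribution with a finite variance the printed range would fall roughly like $1/ \sqrt{N}$. Here it should stay near 2, the interquartile range of a single standard Cauchy draw.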

I found this very helpful; counterexamples, Cauchy has no mean.

There are a couple of things that are bothering me, though: first, the Cauchy distribution has a CDF; and that alone suggests we can calculate a worst case deviation from 0 that the mean could have in a probabilistic way.

Secondly: There obviously must be a reason that numerical experiment agrees with the Cauchy Principal value so closely....

Something important is clearly being overlooked.
I'll think about it a while before making any rash statements.

Last edited: Aug 3, 2012
17. Aug 3, 2012

### Stephen Tashi

Reputable math texts don't bother to mention all the things that are NOT true about a mathematical topic. Engineering texts may have their own conventions about integrals, but "reputable" math texts that talk about integrals of real valued functions are referring to Riemann or Lebesgue integration, not Cauchy principal value integration. If they wish to treat the Cauchy principal value type integration, they would feel obligated to mention it explicitly.

Marsaglia provides results for the case when the denominator is conditioned by a restriction that prevents it from being zero. The standard interpretation of that would be that he replaces the distribution in the denominator by another distribution, a "conditional" distribution. I see nowhere in the paper that he claims that the moments exist without that replacement.

You aren't going to get around the fact that by the standard definition of the mean of a distribution, the mean value of the ratio of gaussian distributions doesn't exist. Talking about what happens in numerical approximations and simulations doesn't weasel around it. If numerical approximations and simulations don't reproduce a proven theoretical result, it shows the numerical approximations and simulations are inadequate, not that the theoretical result is wrong. The legalistic nature of mathematics is the price one must pay for keeping statements unambiguous and not proving claims that are false.

A problem with integrating the ratio of gaussians is not merely that the denominator might be exactly zero. There is no bound to the ratio even if we exclude the possibility that the denominator is exactly zero. If you do a numerical approximation method that establishes a bound for the ratio, it's the numerical approximation that's wrong. If you observe that denominators very close to zero never showed up in a simulation then it didn't explore all the possibilities of a gaussian denominator.

I don't understand your resistance to stating your results in terms of conditional gaussian distributions. Suppose we are using Cauchy principal value integration and computing the duck mean of a density $f(x)$

duck mean $= \lim_{h \rightarrow 0+} \left( \int_{-\infty}^{a-h} x f(x) dx + \int_{a+h}^{\infty} x f(x) dx \right)$

For $h > 0$ define $K(h) = \int_{-\infty}^{a-h} f(x) dx + \int_{a+h}^{\infty} f(x) dx$

Define the family of probability densities $g(h,x)$ by
$g(h,x) =\frac{ f(x)}{K(h)}$ for $x < a-h$ or $a+h < x$
and
$g(h,x) = 0$ otherwise

Then the duck mean is also equal to

duck mean $= \lim_{h \rightarrow 0+}$ (mean of $g(h,x)$)

so the duck mean could be written as the limit of means.
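The "limit of means" idea can be tried out on the ratio itself. The sketch below is a variation on the construction above: instead of excising a window in the ratio, it conditions the denominator Y ~ N(b,1) on |Y| > h, computes the ordinary mean of 1/Y for that conditional distribution, and lets h shrink. That choice of excision, the cutoff `upper`, and all function names are assumptions made for the illustration. The limiting value it is compared against is $e^{-b^2/2} \int_0^b e^{u^2/2} du$, the a = 1 case of the formula derived earlier in the thread.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def conditional_mean_inverse(b, h, n=20000):
    """Ordinary mean of 1/Y for Y ~ N(b,1) conditioned on |Y| > h (midpoint rule)."""
    upper = abs(b) + 10.0                     # Gaussian tails beyond this are negligible
    step = (upper - h) / n
    total = 0.0
    for k in range(n):
        y = h + (k + 0.5) * step
        # +y and -y taken together: phi(y-b)/y + phi(-y-b)/(-y)
        total += (phi(y - b) - phi(y + b)) / y
    K = 1.0 - (Phi(h - b) - Phi(-h - b))      # probability that |Y| > h
    return total * step / K

def limit_value(b, n=20000):
    """exp(-b^2/2) * int_0^b exp(u^2/2) du by the midpoint rule."""
    step = b / n
    return step * sum(math.exp(0.5 * ((k + 0.5) * step) ** 2 - 0.5 * b * b) for k in range(n))

b = 1.0
for h in (0.5, 0.1, 0.01, 0.001):
    print("h=%.3f  conditional mean=%.6f" % (h, conditional_mean_inverse(b, h)))
print("candidate limit  =%.6f" % limit_value(b))
```

Every mean in the sequence is an honest mean of a conditional distribution. Only the limit is the quantity that needs a different name.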

It's no surprise in mathematics when a limit of things of one type is not of the same type. ( For example, the limit of a sequence of continuous functions need not be a continuous function. The limit of a sequence of rational numbers need not be a rational number etc. ) So it's OK when the limit of means isn't a mean. You can always say the duck mean is infinitely close to an actual mean if that would make the NIST happy.

I don't have a reference that tells how to employ common integration techniques with Cauchy principal value integration. I'm also curious about that. I started a thread about it in the Calculus & Analysis section.

Ask your engineering professor about whether the "default" interpretation of integration in a mathematics text includes Cauchy principal value integration. Ask him if the online documents and internet strangers who say the Cauchy distribution has no mean are wrong.


Last edited: Aug 3, 2012
18. Aug 3, 2012

### andrewr

Chiro,

I made an out of context remark regarding Marsaglia. He is quite justified in going to the trouble of using a 50 digit number -- My statement that his approximations were "crude" was based on the general notion of approximation vs. exact integrals...-- I didn't mean to give an impression of his work being garbage, and after re-reading my quick remark; I just realized you might have taken my comment that way.

Until I actually give my own approximation, I have no idea whether or not I can trump his.

Cheers.
--Andrew.

19. Aug 3, 2012

### andrewr

I don't expect that Engineering texts are different. In the derivation I gave, I specifically asked regarding the appropriateness of the way I did the limit; My point is that Cauchy principal value did come up in a problem regarding Fourier transforms, and we were to use it under a certain set of conditions.

If you go back to the earlier thread from which this branched out, and you look for Chiro giving me a suggestion on computing the integral -- that's where Fourier came into the discussion...

Yes, and his particular restriction seems arbitrary.

I realize that; But I also need to understand how real means which do exist -- for samples -- are related to none for the distribution as a whole.

What is the theoretical result? eg: Infinite is not the THEORETICAL result even though Marsaglia mentions integrals going infinite (he is perhaps numerically speaking, or half integral speaking?)

All distributions using Gaussian randoms implicitly come from a limiting approach based on samples. The Gaussian was discovered by asking the question, if a measurement (discrete) is repeated what is the probability of the error.

To answer the question; Gauss took a limit for infinite repetitions, and the bell curve was arrived at.

There is, then, in each distribution (& Cauchy) -- an implicit limit going from the discrete/finite to the continuous.

If the continuous integrals for the mean had always diverged toward positive infinity or negative infinity, there would be no question of what theory said/predicted... But the moment theory can arrive at infinity minus infinity -- that isn't a result of any kind. If that is true, then an estimate of 0 is as good as any other estimate -- they are all *equally* bad.

BUT: In a real experiment, there is going to be a sampling granularity -- and an actual finite value for the mean.

When I hear the words that the integral does not converge, then I understand the QUESTION was somehow asked wrong -- and the boundary conditions need to be looked at to decide why.

Bravo -- I'll clap my hands. I think that's wonderful.

Simulations are a way to look for problems, and inconsistencies -- because they indicate the results a real measurement would report. The problem may be resolved in many different ways -- without changing the theory itself.

But, I need *quantitative* models of how simulation of Cauchy distribution using discrete samples is going to be different from the continuous case.

I agree that removing the zero point is insufficient. Your analysis, here, however isn't clear to me yet. Since Cauchy is a limit of a ratio of binomials taken to n-->infinity; The correlation between the character of the discrete distribution -- and the continuous one changed its properties somewhere -- and that bothers me.

Of course! I think you have gotten the wrong impression about my numerical simulations. My apology belatedly...

I think (intuition) what we're missing is the idea of a confidence interval. How likely an experiment will "hit" the div/zero discontinuity close enough to disturb an otherwise tranquil mean...

(Don't answer that! it's in English -- and I need to quantify it)

You're just moving through the problem more quickly than I can possibly do;
I am looking at your equations, and I don't find them objectionable -- but I'm not sure they give the same result that I have... (I'll comment later).

The question, to me, is not about whether or not we condition the distributions -- the question is; how, and on what basis, and why.

As I said in the OP -- I'll have to explain some of this later in the thread -- (I'm working on it, hard...)

When I wrote the OP -- I was expecting corrections inside the derivation, your approach is unexpected, and that's difficult for me to adjust to. I'm trying.

Last edited: Aug 3, 2012
20. Aug 3, 2012

### andrewr

Yes, but earlier you made a strong impression on me that a distribution may not be called the same if it were modified in any way. Thus, the title of Marsaglia's paper (for this reason) appeared to me to be purporting something to be a Gaussian -- which is actually not a Gaussian, and computing the mean of a ratio of ????.

In contrast, if I now take this new view you are giving:
I implicitly gave the information about how the Gaussians were conditioned in my derivation, even if I didn't know I was doing it.

At most it's a notational issue... and I asked people for help about formalizing the derivation; intentionally attempting to uncover issues like this one, directly.

Hindsight, of course... Hopefully, tomorrow will be better.

Last edited: Aug 3, 2012
21. Aug 3, 2012

### andrewr

Stephen, I think this will be a simple idea...

The Cauchy has a CDF of ${1 \over 2} + {1 \over \pi} \tan^{-1}(x)$;
When estimating a mean, I might think like this: (assume purely positive numbers).

The mean must be between the lowest value in the sample and the highest; If I assume the highest value takes the whole weight of the average (n samples times highest value / n), Then the mean must be lower or equal to that value. Hence, I can ask -- what is the greatest distance from 0 that the mean could be?

90% of the time, the largest sample will be no higher than: 6.31376; hence the weight of that is 5.68... ( 6.31376 * 0.9 )

99% of the time, the largest sample will be no more than 63.65675; hence the weight of that is 5.72... ( 63.65675 * 0.09 )

99.9% of the time -- 636.61925 and weight = 5.72... (636 * 0.009)

So, I can expect with some confidence -- that a mean of no more than 18 will be computed at least a certain definite percentage of the time; depending on the number of samples taken. Hence, it doesn't seem right to say the mean is totally unbounded -- but there must be some kind of relationship between sample size and a typical mean.

If there were no rhyme nor reason to the mean, all values would be equally likely. But that's not the case...
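The quantiles quoted above can be checked directly. Restricting to positive values (as the post assumes), the magnitude of a single standard Cauchy draw has CDF $(2/\pi) \tan^{-1}(x)$, so its p-quantile is $\tan(\pi p / 2)$. The following is a minimal sketch under that reading; the function name and the `bands` list are illustrative, not part of the original post.

```python
import math

def half_cauchy_quantile(p):
    """p-quantile of |X| for X standard Cauchy: solves (2/pi)*atan(x) = p."""
    return math.tan(math.pi * p / 2.0)

# (confidence level, probability weight the post assigns to that band)
bands = [(0.9, 0.9), (0.99, 0.09), (0.999, 0.009)]

total = 0.0
for level, weight in bands:
    q = half_cauchy_quantile(level)
    total += weight * q
    print(f"{level:>6}: quantile = {q:.5f}, weighted = {weight * q:.3f}")

# The post's rough upper bound on the mean is the sum of the weighted bands.
print(f"sum of weighted bounds = {total:.2f}")
```

Each extra "9" of confidence multiplies the quantile by roughly 10 while dividing the weight by 10, which is why every band contributes about the same amount to the bound.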

Last edited: Aug 3, 2012
22. Aug 3, 2012

### chiro

Basically, it looks like what you are doing is talking about a different distribution with each level of 'confidence'.

One suggestion I have is that for the X/Y problem, you should modify Y so that you exclude a region of a neighbourhood around 0 (i.e. you censor this region where P(-e < Y < e) = 0 for some epsilon e) and then recompute the density function.

The idea of using a pure Gaussian for the denominator, even for something like NIST is absolutely stupid and if they want to use X/Y without any modification, then they are going to deal with the case of no moments existing.

Since all you are doing is effectively changing the distribution for each level of confidence, you are probably IMO better off in just creating the distribution you intended and then calculating the mean in the way that it is calculated rather than trying to fudge the calculation of the mean for a distribution where it does not exist.

This way, you'll keep to the definitions (which are there for a reason because they work both theoretically and practically) and you will be able to clarify your assumptions by the nature of the definition of the actual distribution (for example censor the region around 0 is due to getting rid of dividing by numbers close to zero). (Also remember if you do censoring, you have to normalize the distribution to make sure it integrates to 1).

23. Aug 4, 2012

### andrewr

Believe me -- I'd love to throw it out. :rofl:
I'm not a glutton for punishment...

But consider a realistic case; There is a wall 2000(1) mm away. Divide that distance by a *SLOPPY* 1 meter scale 1000(1) mm long. What are the results?

Well, the result is obviously about 2 (on mean).
However, this still factors into a form which Marsaglia treats as having a Cauchy.
( I forget exactly what he said about that -- my eyes glazed over...)

N(2000,1) / N(1000,1)
= (2000 + N(0,1)) / (1000 + N(0,1))
= 2000/(1000+N(0,1)) + N(0,1)/(1000+N(0,1))

My solution so far, is just for the first part -- and even that is supposedly invalid because I used a Cauchy principal value...

Yet the second part is the only thing really Cauchy distribution like in the problem, so I assume that's a Cauchy with the mode offset in one direction or the other....

But in any event, it's not possible to avoid having that second part in the equation -- and if its mean can be anything -- then the sum of the two means, can be anything -- and well, the *very* practical problem has just become theoretically impossible when ACTUALLY using the theory and not faking it.

I find that quite perplexing. Gosh! the odds of hitting the zero point are 1 in 1000 sigmas. It *aint* gonna happen... BUT -- it's still a Cauchy?!!!!!
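The wall example can be simulated directly to see why the numerical experiments look so well behaved. This is only a sketch: it assumes the numerator and denominator errors are independent, and the function name, sample size and seed are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def ratio_samples(n, mu_num=2000.0, mu_den=1000.0, sigma=1.0):
    """Draw n ratios N(mu_num, sigma) / N(mu_den, sigma), parts independent."""
    return [random.gauss(mu_num, sigma) / random.gauss(mu_den, sigma)
            for _ in range(n)]

samples = ratio_samples(100_000)
sample_mean = statistics.fmean(samples)
sample_sd = statistics.stdev(samples)
print(f"sample mean = {sample_mean:.6f}")
print(f"sample std  = {sample_sd:.6f}")
```

With the denominator 1000 standard deviations away from zero, no finite run of this kind will ever draw a value near the singularity, so the sample mean sits near 2 even though the formal mean of the ratio does not exist.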

Now, let's talk about the choice of "theory". There is no reason both of these lengths couldn't be repetitively measured over and over -- so a Gaussian is the most appropriate distribution.

But once we do a ratio, we are going to have a Cauchy and a 1/Gaussian.

WOW.

I have a formula that simulates well -- if I just knew what the distribution was that made that shape, I'd be done. Funny, I have an answer looking for a distribution...

Ummm ... I'd like to see that practical part...

But,
Yes, that sounds possible -- although I need to start with the confidence interval: eg: I need someone to be able to tell me they want my result 99.9% certain, and then I need to compute how much of the zero divide to censor... THEN, I can do it.

I'm going to sleep on it tonight... Dunno....

Last edited: Aug 4, 2012
24. Aug 4, 2012

### chiro

For the practical part, the first thing to focus on is getting the distribution for 1/X where X is censored and then look at Y/X after you get the censored distribution for 1/X.

It's best if you leave the 1/X distribution in terms of the e mentioned above so that later you can see how this e affects the calculation of the mean of Y/X. This solves your problem of analyticity, and you can use it to compare how many standard deviations you need to get a mean of a particular value by looking at how the epsilon affects the final calculation of E[Y/X], where Y is Gaussian and 1/X is the inverse of your censored variable.

You can simulate this extremely easily by using a method to simulate from a censored distribution (an MCMC approach will do this) and then simply simulating from the Gaussian giving a simulation for Y/X.
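As a rough illustration of that simulation, here is a sketch that swaps the MCMC approach for plain rejection sampling, which is enough for a one-dimensional censored Gaussian. The function names and the parameter values (means 4 and 2, unit sigma, the three `eps` values) are assumptions chosen only to make the effect visible.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def censored_gauss(mu, sigma, eps):
    """Draw from N(mu, sigma) conditioned on |x| >= eps (rejection sampling)."""
    while True:
        x = random.gauss(mu, sigma)
        if abs(x) >= eps:
            return x

def simulate_ratio_mean(n, mu_y, mu_x, sigma, eps):
    """Sample mean of Y/X with Y Gaussian and X a censored Gaussian."""
    return statistics.fmean(
        random.gauss(mu_y, sigma) / censored_gauss(mu_x, sigma, eps)
        for _ in range(n)
    )

for eps in (0.5, 0.1, 0.02):
    m = simulate_ratio_mean(50_000, mu_y=4.0, mu_x=2.0, sigma=1.0, eps=eps)
    print(f"eps = {eps}: sample mean of Y/X = {m:.3f}")
```

Rejection sampling renormalizes automatically, because discarding draws inside (-eps, eps) is the same as conditioning on the complement. Shrinking `eps` lets larger values of 1/X through, so the run-to-run scatter of the sample mean grows.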

The assumption of censoring is one that can be quantified in the context of more general assumptions in the domain (i.e. engineering) by considering the nature of what is being calculated (i.e. scales of things, what these things are) in relation to the epsilon used in the censoring process.

I think that the above suggestion will help you not only derive a distribution and ultimately a mean using censorship around 0 for the denominator RV, but also to actually quantify the characteristics and how the epsilon changes the value of not only the mean, but also the other moments as well.

25. Aug 4, 2012

### Stephen Tashi

As a very general observation (general enough to apply to the whole of mathematical society, not particularly to yourself), there is always a mental contest between formal mathematics vs the philosophy of mathematics that I would call Mathematical Platonism. Wikipedia distinguishes quite a number of species of Mathematical Platonism, but to me, the common element in this philosophy is the belief that things with mathematical definitions have a reality that exists apart from the definition. In your particular case, you believe that the concept of "mean value of a distribution" has a reality beyond the formal definition, so you allow yourself to reason about this reality and reach conclusions based on your private vision of it. I think almost everybody does this to some degree.

Sometimes Mathematical Platonism leads nowhere. For example, if you look at threads on the forum that are invitations to Mathematical Platonists, such as "Is multiplication repeated addition?", "Is dy/dx a ratio?", you find that many of the posts with a Platonic slant are opinionated and unimaginative. But sometimes you do find Platonic outlooks that are very helpful intuitive ways to think about mathematical ideas.

Physicists and engineers often take the Platonic view of mathematics and I suspect the reason that physics and engineering are able to cruise along with the Platonists on board is that most concepts they deal with don't depend on a legalistic and precise application of logic. On the other hand, mathematics gets into a mess if it tries to develop results based on Platonic arguments. There are simply too many different contradictory private concepts of things like "limits", "infinity", "probability" etc. among human beings. The only way to arrive at definite results is to have formal definitions and develop arguments based on those definitions, not people's private visions of what things are.

I don't want to discourage you from Platonic reasoning. I just want to make the point that whatever conclusions you reach by that reasoning have to be reconciled with the formal mathematical definitions and presented in those terms in order for them to be accepted as mathematics.

----

As to the observations about the sample mean:

The sample mean of the Cauchy distribution is a statistic that does have a distribution. For a sample size of 1, it is obviously just the Cauchy distribution. It's an interesting question what the distribution is for larger size samples. The Central Limit Theorem (that the mean of an independent sample of size n is approximately normally distributed for large n) doesn't apply to the Cauchy distribution since that theorem requires that the distribution being sampled have a (finite) variance.
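A standard result (stated here as background, not something derived in this thread) is that the mean of n independent standard Cauchy variates is again standard Cauchy, so averaging does not narrow the distribution at all. The sketch below checks this empirically by tracking the median of |sample mean|, which should stay near 1 for every n; the function names, trial count and seed are illustrative choices.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

def cauchy():
    """Standard Cauchy variate via the inverse CDF tan(pi*(u - 1/2))."""
    return math.tan(math.pi * (random.random() - 0.5))

def median_abs_sample_mean(n, trials=2000):
    """Median of |sample mean| over many independent samples of size n."""
    means = [statistics.fmean(cauchy() for _ in range(n)) for _ in range(trials)]
    return statistics.median(abs(m) for m in means)

spread = {n: median_abs_sample_mean(n) for n in (1, 10, 100)}
for n, s in spread.items():
    print(f"n = {n:>3}: median |sample mean| = {s:.3f}")
```

For a Gaussian the same statistic would shrink like 1/sqrt(n). Here it does not shrink, which is one way to see quantitatively that larger samples do not pin the mean down any better.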

(This sets me wondering about such things as: Are there distributions whose k-th moment doesn't exist, but the k-th moment of the sample distribution (for sample size > 1) exists? Are there distributions whose k-th moment doesn't exist, but such that the k-th moment of the limiting distribution of their sampling distribution (as sample size approaches infinity) exists? If no kind soul happens to tell me, I may start a thread with such questions someday.)

The non-existence of the mean of the Cauchy distribution involves (according to the formal definition) the non-existence of an integral that is done using the distribution. Thus the entire distribution is considered when doing the integration. The fact that a particular large value of a Cauchy random variable is unlikely in a sample doesn't mean that you can leave that value out when you do the integral. The problem with the (formal) existence of the integral depends on how you define integrals (Riemann vs Lebesgue - either way, there's a problem). Again, the people who frequent the Calculus & Analysis section can probably give an authoritative answer about that.

I'm sure you've studied integrals involving infinite limits and various theorems about when they exist or not. Some functions die off quickly enough so that the integral from 0 to infinity exists; others die off, but not quickly enough. That's the type of thing involved in the integral for the mean of the Cauchy. In the integral for the mean of the ratio of Gaussians, the problem is that the integrand is unbounded.
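For the standard Cauchy density the "not quickly enough" case can be written out explicitly. As a sketch, each half of the mean integral grows logarithmically:

$$\int_0^R \frac{x}{\pi (1+x^2)}\,dx = \frac{1}{2\pi}\ln(1+R^2) \to \infty \quad (R \to \infty),$$

so the integral over the whole line does not exist when the two limits are taken independently. The symmetric (principal value) limit is

$$\lim_{R \to \infty} \int_{-R}^{R} \frac{x}{\pi (1+x^2)}\,dx = 0,$$

but letting the upper limit grow as $aR$ for some fixed $a > 0$ gives

$$\lim_{R \to \infty} \int_{-R}^{aR} \frac{x}{\pi (1+x^2)}\,dx = \lim_{R \to \infty} \frac{1}{2\pi}\ln\frac{1+a^2R^2}{1+R^2} = \frac{\ln a}{\pi},$$

which can be made equal to any real number by choosing $a$. The value 0 therefore reflects the symmetric way of taking the limit, not the density alone.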

I don't think you should give up on using the Cauchy principal value in your calculations. I merely suggest that you rephrase the claim about what you are calculating. My (Platonic) view is that you are calculating a limit of the means of distributions that are "conditioned" by setting them equal to zero on parts of the real line. (A density f(x) defined on the real line can be used to define another density g(x) that leaves out intervals of the real line. On the part that is not left out, define the modified density to be g(x) = f(x)/(1 - P), where P is the probability of the left-out part. On the part that is left out, you define g(x) = 0.)
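The conditioning just described can be tried numerically for the mean of 1/X with X Gaussian. The sketch below is one possible reading, not the derivation from the opening post: it leaves out the symmetric interval (-eps, eps), renormalizes by the remaining probability, and integrates with a simple midpoint rule. The function names, the choice mu = 2 and sigma = 1, and the integration cutoff `upper` are all assumptions made for illustration.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def conditioned_mean_of_inverse(mu, sigma, eps, upper=12.0, steps=200_000):
    """Mean of 1/X under g(x) = f(x)/(1 - P), with g = 0 on (-eps, eps).

    Midpoint rule on [eps, upper]; the two tails are folded together so the
    integrand (f(x) - f(-x))/x stays bounded as eps shrinks.
    """
    h = (upper - eps) / steps
    numer = 0.0   # integral of (1/x) * f(x) over |x| >= eps
    kept = 0.0    # 1 - P: probability mass outside (-eps, eps)
    for i in range(steps):
        x = eps + (i + 0.5) * h
        fp, fm = gauss_pdf(x, mu, sigma), gauss_pdf(-x, mu, sigma)
        numer += (fp - fm) / x * h
        kept += (fp + fm) * h
    return numer / kept

eps_values = (0.1, 0.01, 0.001)
means = [conditioned_mean_of_inverse(2.0, 1.0, eps) for eps in eps_values]
for eps, m in zip(eps_values, means):
    print(f"eps = {eps}: mean of 1/X = {m:.5f}")
```

Each conditioned distribution has a perfectly well-defined mean, and those means settle toward a limit as eps shrinks. That limit is what the principal value computes, and it depends on the left-out interval being symmetric about zero.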