How to Estimate the Intrinsic Distribution

In summary, this thread presents a derivation of the normal distribution and shows that it is just one member of a family of similar distributions. The poster originally intended to submit it to a different forum but was directed here for feedback and potential collaboration; he describes himself as a recreational mathematician whose technical writing skills are limited. The main focus is on how the normal distribution can be written directly in terms of the parameters mean and variance, which is not the case for most distributions, and the thread includes the equations and integrals supporting the derivation of the Intrinsic Distribution and the normal distribution.
  • #1
Watts
The following is a crude derivation demonstrating how a distribution such as the normal distribution is simply one member of a family of similar distributions. I originally was going to post this in the new Independent Research forum, but the moderator thought it was better suited to be posted here instead. I am looking for feedback and thoughts from viewers of this forum, and I would like to find someone to help me coauthor a paper on this subject. My technical writing skills aren't that great, and I don't claim to be a professional mathematician, only a recreational one; I do this stuff for fun. I am not an academic nor a professor, so I am not under the gun to have papers published on a routine basis, and I have a lot more work done than what I have shown here.

The intent here is also to show how several common distributions can be manipulated into a form whose parameters are the mean and variance. Very few distributions exist in a form that takes the mean and variance as parameters; most distributions are written in terms of generic adjustment constants. From a practical perspective this is inconvenient, because generic constants require a certain amount of trial and error to fit a distribution to a given data set. Distributions such as the normal distribution require only the mean and variance, which are easily estimated from a data set.

Derivation of the Normal Intrinsic Distribution

From the equation
[itex]\frac{1}{{\sqrt {2 \cdot \pi } \cdot \sigma }} = \sqrt {\frac{1}{{2 \cdot \pi \cdot \sigma ^2 }}} [/itex]
the normal distribution can be written in the form given here.
[itex]P(q) = \frac{1}{{\sqrt {2 \cdot \pi } \cdot \sigma }} \cdot e^{ - \frac{1}{{2 \cdot \sigma ^2 }} \cdot (q - \mu )^2 } = \sqrt {\frac{1}{{2 \cdot \pi \cdot \sigma ^2 }}} \cdot e^{ - \pi \cdot \left( {\sqrt {\frac{1}{{2 \cdot \pi \cdot \sigma ^2 }}} } \right)^2 \cdot (q - \mu )^2 } [/itex]
By letting
[itex]P_{q_1 } = \sqrt {\frac{1}{{2 \cdot \pi \cdot \sigma ^2 }}} [/itex]
the normal distribution takes the form.
[itex]P(q) = P_{q_1 } \cdot e^{ - \pi \cdot \left(P_{q_1 }\right)^{2} \cdot (q - \mu )^2 } [/itex]
Using the integral given here
[itex]\alpha = \int\limits_{ - \infty }^\infty {e^{ - x^{2 \cdot k} } dx} = \frac{1}{k} \cdot \Gamma (\frac{1}{{2 \cdot k}}),k = 1,2,3,...,\infty[/itex]
and evaluating k=1 produces the integral.
[itex]\int\limits_{ - \infty }^\infty {e^{ - q^2 } dq} = \sqrt \pi [/itex]
The relationship between the constant pi and the integral can now be seen:
[itex]P(q) = P_{q_1 } \cdot e^{ - (\sqrt \pi \cdot P_{q_1 } )^2 \cdot (q - \mu )^2 } = P_{q_1 } \cdot e^{ - \pi \cdot \left(P_{q_1}\right)^{2} \cdot (q - \mu )^2 } [/itex]
Multiplying both sides of the unevaluated integral by [itex]N[/itex] and substituting the solution
[itex]N\cdot \alpha = N\cdot \int\limits_{ - \infty }^\infty {e^{ - x^{2 \cdot k} } dx} = N\cdot \frac{1}{k} \cdot \Gamma (\frac{1}{{2 \cdot k}}),k = 1,2,3,...,\infty[/itex]
produces the equation
[itex]P(q) = P_{q_1 } \cdot e^{ - \left[ {\frac{1}{k} \cdot \Gamma (\frac{1}{{2 \cdot k}}) \cdot N \cdot P_{q_1 } \cdot (q - \mu )} \right]^{2 \cdot k} } ,k = 1,2,3,...,\infty . [/itex]
For the case N=1 and [itex]k = 1,2,3,...,\infty[/itex]
[itex]\int\limits_{ - \infty }^\infty {P(q)dq = 1} [/itex]
and for the case N=2 and [itex]k = 1,2,3,...,\infty[/itex]
[itex]\int\limits_{ - \infty }^\infty {P(q)dq = \frac{1}{2}} [/itex]
hence the Intrinsic Distribution can be written below for all cases of [itex]N=1,2,3,...,\infty[/itex]
[itex]P(q) = \sum\limits_{i = 1}^N {P_{q_i } \cdot e^{ - \left[ {\frac{1}{k} \cdot \Gamma (\frac{1}{{2 \cdot k}}) \cdot N \cdot P_{q_i } \cdot (q - q_i )} \right]^{2 \cdot k} } } = \sum\limits_{i = 1}^N {P_{q_i } \cdot e^{ - \left[ {\alpha \cdot N \cdot P_{q_i } \cdot \left( {q - q_i } \right)} \right]^{2 \cdot k} } } ,k = 1,2,3,...,\infty[/itex]
with normalization
[itex]\int\limits_{ - \infty }^\infty {P(q)\,dq} = 1 [/itex]
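As a quick numerical sanity check of these normalization claims, here is a short Python sketch (the function and variable names are mine, purely illustrative):
[code]
import numpy as np
from math import gamma
from scipy.integrate import quad

def intrinsic_pdf(q, centers, heights, k):
    # Sum of N components with centers q_i and peak heights P_{q_i};
    # alpha = Gamma(1/(2k)) / k is the integral of exp(-x^(2k)).
    alpha = gamma(1.0 / (2 * k)) / k
    N = len(centers)
    return sum(p * np.exp(-(alpha * N * p * (q - c)) ** (2 * k))
               for c, p in zip(centers, heights))

# One component (N=1): the integral should be 1 for every k
for k in (1, 2, 3):
    total, _ = quad(intrinsic_pdf, -np.inf, np.inf, args=([0.0], [1.0], k))
    print(f"N=1, k={k}: integral = {total:.6f}")   # ~1.000000

# Two components (N=2): each term contributes 1/N, so the total is still 1
total, _ = quad(intrinsic_pdf, -np.inf, np.inf, args=([0.0, 5.0], [1.0, 0.5], 1))
print(f"N=2, k=1: integral = {total:.6f}")         # ~1.000000
[/code]
Each component integrates to 1/N regardless of its peak height [itex]P_{q_i }[/itex], which is why the total remains 1 for any choice of heights.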

Derivation of the Normal Distribution

For the case N=1 and k=1 the distribution
[itex]P(q) = \sum\limits_{i = 1}^N {P_{q_i } \cdot e^{ - \left[ {\alpha \cdot N \cdot P_{q_i } \cdot \left( {q - q_i } \right)} \right]^{2 \cdot k} } } ,k = 1,2,3,...,\infty[/itex]
reduces to
[itex]P(q) = P_{q_1 } \cdot e^{ - \left(\alpha \cdot P_{q_1 }\right)^{2} \cdot (q - q_1 )^2 } [/itex]
Using the integral
[itex]\alpha = \int\limits_{ - \infty }^\infty {e^{ - x^{2 \cdot k} } dx} = \frac{1}{k} \cdot \Gamma (\frac{1}{{2 \cdot k}}),k = 1,2,3,...,\infty[/itex]
evaluated at k=1, so that [itex]\alpha = \sqrt \pi[/itex], the equation takes the form
[itex]P(q) = P_{q_1 } \cdot e^{ - \pi \cdot \left(P_{q_1 }\right)^{2} \cdot (q - q_1 )^2 }[/itex]
Using the equation
[itex]\mu = \int\limits_{ - \infty }^\infty {P(q) \cdot q \cdot dq = q_1 } [/itex]
it can be seen that
[itex]P(q) = P_{q_1 } \cdot e^{ - \pi \cdot \left(P_{q_1 }\right)^{2} \cdot (q - \mu )^2 } [/itex]
and from
[itex]\sigma ^2 = \int\limits_{ - \infty }^\infty {P(q) \cdot (q - \mu )^2 \cdot dq} [/itex]
the following table is generated.
[itex]\begin{array}{*{20}c}
{P_{q_1 } = 1} & {\sigma ^2 = \frac{1}{{2 \cdot \pi }}} & {\sigma ^2 = \frac{1}{{2 \cdot 1^2 \cdot \pi }}} \\
{P_{q_1 } = 2} & {\sigma ^2 = \frac{1}{{8 \cdot \pi }}} & {\sigma ^2 = \frac{1}{{2 \cdot 2^2 \cdot \pi }}} \\
{P_{q_1 } = 3} & {\sigma ^2 = \frac{1}{{18 \cdot \pi }}} & {\sigma ^2 = \frac{1}{{2 \cdot 3^2 \cdot \pi }}} \\
{P_{q_1 } = 4} & {\sigma ^2 = \frac{1}{{32 \cdot \pi }}} & {\sigma ^2 = \frac{1}{{2 \cdot 4^2 \cdot \pi }}} \\
{P_{q_1 } = P_{q_1 } } & \Rightarrow & {\sigma ^2 = \frac{1}{{2 \cdot P_{q_1 } ^2 \cdot \pi }}} \\
\end{array} [/itex]
From this table the following equation is found.
[itex]\sigma ^2 = \frac{1}{{2 \cdot \pi \cdot (P_{q_1 } )^2 }} [/itex]
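The table can be reproduced numerically; a minimal sketch in Python using scipy's quad (names are illustrative):
[code]
import numpy as np
from scipy.integrate import quad

# Variance of P(q) = P1 * exp(-pi * P1^2 * q^2) for several peak heights P1,
# compared with the closed form sigma^2 = 1 / (2 * pi * P1^2) read off the table.
for P1 in (1.0, 2.0, 3.0, 4.0):
    var, _ = quad(lambda q: q**2 * P1 * np.exp(-np.pi * P1**2 * q**2),
                  -np.inf, np.inf)
    print(f"P1={P1:.0f}: numeric {var:.6f}, closed form {1/(2*np.pi*P1**2):.6f}")
[/code]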
Solving the equation for [itex]P_{q_1 }[/itex] gives
[itex]P_{q_1 } = \sqrt {\frac{1}{{2 \cdot \pi \cdot \sigma ^2 }}} = \frac{1}{{\sqrt {2 \cdot \pi } \cdot \sigma }}[/itex]
and substituting into the equation
[itex] P(q) = P_{q_1 } \cdot e^{ - \pi \cdot \left(P_{q_1 }\right)^{2} \cdot (q - \mu )^2 } [/itex]
produces the normal distribution.
[itex]P(q) = \frac{1}{{\sqrt {2 \cdot \pi } \cdot \sigma }} \cdot e^{ - \frac{1}{{2 \cdot \sigma ^2 }} \cdot \left( {q - \mu } \right)^2 } [/itex]
Multiple distributions can be generated in like manner from multiple values of k, and multimodal distributions can be generated from different values of N. For example, for N=2 and k=1 a true bimodal normal distribution could be generated. You are not restricted to Gaussian distributions only: by using the same methods as above you can generate several different types of Intrinsic Distribution. An example shown here is the Cauchy Intrinsic Distribution
[itex]P(q) = \sum\limits_{i = 1}^N {\frac{{P_{q_i } }}{{((\left( {q - q_i } \right) \cdot P_{q_i } \cdot N \cdot \varsigma )^{2 \cdot k} + 1)^k }}} ,\varsigma = \int\limits_{ - \infty }^\infty {\frac{1}{{(x^{2 \cdot k} + 1)^k }}} dx = \frac{1}{k} \cdot \beta (\frac{1}{{2 \cdot k}},\frac{{2 \cdot k^2 - 1}}{{2 \cdot k}}),k = 1,2,3,...,\infty [/itex]
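Both the beta-function constant [itex]\varsigma[/itex] and the normalization of a single Cauchy component can be checked numerically; a sketch (assuming scipy):
[code]
import numpy as np
from scipy.integrate import quad
from scipy.special import beta

# zeta = integral of 1/(x^(2k)+1)^k should equal (1/k) * B(1/(2k), (2k^2-1)/(2k))
for k in (1, 2, 3):
    numeric, _ = quad(lambda x: 1.0 / (x**(2 * k) + 1.0)**k, -np.inf, np.inf)
    closed = beta(1 / (2 * k), (2 * k**2 - 1) / (2 * k)) / k
    print(f"k={k}: numeric {numeric:.6f}, closed form {closed:.6f}")

# A single Cauchy component (N=1, k=1, zeta=pi) integrates to 1 for any
# peak height P1 and center c
P1, c = 0.7, 2.0
total, _ = quad(lambda q: P1 / ((np.pi * (q - c) * P1)**2 + 1.0), -np.inf, np.inf)
print(f"Cauchy component integral = {total:.6f}")   # ~1.000000
[/code]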
Many different distributions are incomplete and can be finished by putting them into a form that contains measurable quantities such as the mean and variance. You can use the same method shown above to complete many different distributions, such as the logistic distribution shown here. The logistic distribution typically takes the form
[itex]P(q) = \frac{{e^{ - (q - m)/b} }}{{b \cdot \left[ {1 + e^{ - (q - m)/b} } \right]^2 }} [/itex]
and its distribution function
[itex]D(q) = \frac{1}{{1 + e^{ - (q - m)/b} }} [/itex]
The complete form is
[itex]P(q) = 4 \cdot \sqrt {\frac{{\pi ^2 }}{{\sigma ^2 \cdot 48}}} \cdot \left( {\frac{{e^{\left( {\left( {q - \mu } \right) \cdot 4 \cdot \sqrt {\frac{{\pi ^2 }}{{\sigma ^2 \cdot 48}}} } \right)} }}{{\left( {1 + e^{\left( {\left( {q - \mu } \right) \cdot 4 \cdot \sqrt {\frac{{\pi ^2 }}{{\sigma ^2 \cdot 48}}} } \right)} } \right)^2 }}} \right) [/itex]
and its complete distribution function
[itex]D(q) = 1 - \frac{1}{{1 + e^{\left( {\frac{1}{3} \cdot \left( {q - \mu } \right) \cdot \sqrt 3 \cdot \pi \cdot \sqrt {\frac{1}{{\sigma ^2 }}} } \right)} }}[/itex]
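A numeric sanity check (my own sketch) that this complete form integrates to 1 with mean [itex]\mu[/itex] and variance [itex]\sigma ^2[/itex]; the density is rewritten with exp(-|x|) purely for floating-point stability:
[code]
import numpy as np
from scipy.integrate import quad

mu, sigma = 1.5, 2.0
a = 4.0 * np.sqrt(np.pi**2 / (48.0 * sigma**2))   # the coefficient in the exponent

def logistic_pdf(q):
    # a*e^(a(q-mu)) / (1+e^(a(q-mu)))^2, rewritten symmetrically for stability
    z = np.exp(-np.abs(a * (q - mu)))
    return a * z / (1.0 + z)**2

total, _ = quad(logistic_pdf, -np.inf, np.inf)
mean, _ = quad(lambda q: q * logistic_pdf(q), -np.inf, np.inf)
var, _ = quad(lambda q: (q - mu)**2 * logistic_pdf(q), -np.inf, np.inf)
print(f"integral {total:.6f}, mean {mean:.6f}, variance {var:.6f}")
# expect ~1.0, ~1.5, ~4.0
[/code]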

Thousands of different distributions that have never been seen or studied before can be generated using techniques similar to those shown above. One example is the Hyperbolic Distribution below.
[itex]P(q) = \sqrt {\frac{{\pi ^2 }}{{48 \cdot \sigma ^2 }}} \cdot \cosh \left( {2 \cdot \sqrt {\frac{{\pi ^2 }}{{48 \cdot \sigma ^2 }}} \cdot (q - \mu )} \right)^{ - 2} [/itex]
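Reading this as a sech-squared density, a quick sketch confirms unit mass and variance [itex]\sigma ^2[/itex]:
[code]
import numpy as np
from scipy.integrate import quad

mu, sigma = 0.0, 1.3
a = np.sqrt(np.pi**2 / (48.0 * sigma**2))

def sech2_pdf(q):
    # a / cosh(2a(q-mu))^2, written via exp(-|y|) to avoid overflow
    z = np.exp(-2.0 * np.abs(2.0 * a * (q - mu)))
    return 4.0 * a * z / (1.0 + z)**2

total, _ = quad(sech2_pdf, -np.inf, np.inf)
var, _ = quad(lambda q: (q - mu)**2 * sech2_pdf(q), -np.inf, np.inf)
print(f"integral {total:.6f}, variance {var:.6f}")   # ~1.0, ~1.69
[/code]
Up to reparametrization this is the same density as the complete logistic form above, since [itex]\frac{1}{4} \cdot \operatorname{sech} ^2 (x/2) = \frac{{e^x }}{{(1 + e^x )^2 }}[/itex].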
Multivariate Intrinsic Distributions are also achievable. The Multivariate Cauchy Distribution is shown below.
[itex]P(q_1 ,q_2 ,q_3 ,...,q_m ) = \prod\limits_{j = 1}^m {\sum\limits_{i = 1}^N {\frac{{P_{q_{i,j} } }}{{((N^{\frac{1}{m}} \cdot \varsigma \cdot (q_j - q_{i,j} ) \cdot P_{q_{i,j} } )^{2 \cdot k} + 1)^k }}} } ,k = 1,2,3,...,\infty [/itex]

[itex]\varsigma = \int\limits_{ - \infty }^\infty {\frac{1}{{(x^{2 \cdot k} + 1)^k }}} dx = \frac{1}{k} \cdot \beta (\frac{1}{{2 \cdot k}},\frac{{2 \cdot k^2 - 1}}{{2 \cdot k}}),k = 1,2,3,...,\infty [/itex]
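As a sanity check of the multivariate form, here is a sketch for m=2 dimensions with a single component (N=1, k=1, so [itex]\varsigma = \pi[/itex] and [itex]N^{1/m} = 1[/itex]); the density then factors into one Cauchy term per dimension:
[code]
import numpy as np
from scipy.integrate import dblquad

zeta = np.pi
centers = (0.0, 1.0)    # q_{1,1}, q_{1,2}
heights = (1.0, 0.5)    # P_{q_{1,1}}, P_{q_{1,2}}

def pdf(q2, q1):
    # product over dimensions of the single Cauchy component
    out = 1.0
    for q, c, p in zip((q1, q2), centers, heights):
        out *= p / ((zeta * (q - c) * p)**2 + 1.0)
    return out

total, err = dblquad(pdf, -np.inf, np.inf, -np.inf, np.inf)
print(f"integral = {total:.4f}")   # ~1.0000
[/code]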
 
  • #2
Would it be fair to say that the main thrust of your work here is to rewrite various probability density functions and cumulative distribution functions in a form such that the free variables are the mean and variance?


And beyond that, you have generalized slightly to an expression that represents an equally weighted combination of the individual distributions?

An example of what I mean by that is your "bimodal" normal distribution: it would correspond to an experiment like:

Flip a coin.
If heads, generate a normal variable with mean 5 and variance 1
If tails, generate a normal variable with mean 3 and variance 2

but not be applicable to

Roll a 6-sided die.
If 1, generate a normal variable with mean 5 and variance 1
If 2-6, generate a normal variable with mean 3 and variance 2
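The difference between the two experiments can be seen in a quick simulation (a sketch; note that normal() takes the standard deviation, so variance 2 becomes scale sqrt(2)):
[code]
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Coin flip: equal 1/2-1/2 weights -- the case the N=2 intrinsic form encodes
coin = rng.integers(0, 2, size=n)
equal_mix = np.where(coin == 0, rng.normal(5, 1, n), rng.normal(3, np.sqrt(2), n))

# Die roll: 1/6-5/6 weights -- not expressible with equal component weights
die = rng.integers(1, 7, size=n)
unequal_mix = np.where(die == 1, rng.normal(5, 1, n), rng.normal(3, np.sqrt(2), n))

print(equal_mix.mean(), unequal_mix.mean())   # ~4.00 vs ~3.33
[/code]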


And you've introduced a new parameter (k) that defines some apparent relative of the normal distribution.



Anyways, your opening remarks turned me off somewhat -- I figure it would be good to know the reasons for that.

The parameters to the various distributions were not arbitrarily chosen -- many (all?) are highly convenient parameters, and often the formulae are simpler in the original variables.

For example, I would generally use the chi-squared distribution with n degrees of freedom because I have a theoretical reason for n degrees of freedom, not because I want a distribution with mean n and variance 2n. (Ick, I hope I have them right from memory)

It almost sounds like you're suggesting that the "best" way to select the parameters for a given distribution is to pick the parameters for which the mean and variance are equal to the sample mean and sample variance -- while that sounds good in theory, such idealistic approaches are rarely optimal in the real world. I'll try and remember to ping a statistician at work tomorrow to get a more informed opinion on this. :smile:
 
  • #3
Thoughts

Hurkyl said:
Would it be fair to say that the main thrust of your work here is to rewrite various probability density functions and cumulative distribution functions in a form such that the free variables are the mean and variance?

Not always. I work on a lot of other things as well (nonlinear regression, time series analysis and forecasting, statistical physics, among others).

And beyond that, you have generalized slightly to an expression that represents an equally weighted combination of the individual distributions?

Yes - I have basically written a distribution as the sum of several other distributions.

An example of what I mean by that is your "bimodal" normal distribution: it would correspond to an experiment like:

Flip a coin.
If heads, generate a normal variable with mean 5 and variance 1
If tails, generate a normal variable with mean 3 and variance 2

Yes

But not be applicable to

Roll a 6-sided die.
If 1, generate a normal variable with mean 5 and variance 1
If 2-6, generate a normal variable with mean 3 and variance 2

Not for this particular case

And you've introduced a new parameter (k) that defines some apparent relative of the normal distribution.

Yes - it was bloody, I know, but the intent was to demonstrate its relevance.

Anyways, your opening remarks turned me off somewhat -- I figure it would be good to know the reasons for that.

I realize that parameters are not arbitrarily chosen; they are the result of some mathematical consequence. n degrees of freedom means something to me, but, for example, in the Weibull distribution [itex]P(x) = \alpha \cdot \beta ^{ - \alpha } \cdot x^{\alpha - 1} \cdot e^{ - (x/\beta )^\alpha } [/itex] the parameters alpha and beta don't mean much to me other than that they are constants.
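As an illustration of why such parameters are awkward: matching the Weibull's mean and variance to moment estimates has no closed form, and alpha must be found numerically. A sketch, assuming the parametrization above:
[code]
import numpy as np
from math import gamma
from scipy.optimize import brentq

# For the Weibull above: mean = beta*Gamma(1+1/alpha) and
# variance = beta^2*(Gamma(1+2/alpha) - Gamma(1+1/alpha)^2), so matching
# moments means solving one nonlinear equation for alpha.
def weibull_from_moments(m, v):
    cv2 = v / m**2   # squared coefficient of variation; depends on alpha only
    f = lambda a: gamma(1 + 2 / a) / gamma(1 + 1 / a)**2 - 1 - cv2
    alpha = brentq(f, 0.1, 50.0)       # bracket chosen for typical data
    beta = m / gamma(1 + 1 / alpha)
    return alpha, beta

print(weibull_from_moments(2.0, 1.0))  # alpha ~2.1, beta ~2.26
[/code]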

The parameters to the various distributions were not arbitrarily chosen -- many (all?) are highly convenient parameters, and often the formulae are simpler in the original variables.

For example, I would generally use the chi-squared distribution with n degrees of freedom because I have a theoretical reason for n degrees of freedom, not because I want a distribution with mean n and variance 2n. (Ick, I hope I have them right from memory)

I agree

It almost sounds like you're suggesting that the "best" way to select the parameters for a given distribution is to pick the parameters for which the mean and variance are equal to the sample mean and sample variance -- while that sounds good in theory, such idealistic approaches are rarely optimal in the real world. I'll try and remember to ping a statistician at work tomorrow to get a more informed opinion on this. :smile:

No - the sample mean and variance are only estimates of the distribution's mean and variance.
 
  • #4
Watts said:
Most distributions are written in terms of generic adjustment constants. From a practical perspective this is inconvenient, because generic constants require a certain amount of trial and error to fit a distribution to a given data set. Distributions such as the normal distribution require only the mean and variance, which are easily estimated from a data set.

Then I don't understand this passage at all.

What is the benefit of writing, say, the gamma distribution in terms of its mean and variance, instead of in terms of α and β?

And what is the point of mentioning that the mean and variance are easily obtained from a data set?
 
  • #5
Poor Terminology

Hurkyl

I am sorry for the incorrect usage of the terms mean and variance. The statement should have read that the mean and variance can be estimated from the sample mean and sample variance. The intent of this posting was to have things like that brought to my attention. The gamma distribution typically describes waiting times for events occurring according to a Poisson process. If I had a collection of data from a process that I knew was best described by the gamma distribution, would it not be simpler to estimate the mean and variance of the data from its sample mean and sample variance and simply put those numbers into the gamma distribution, rather than trying to determine the parameters alpha and beta? I may be missing something here; I have not worked with the gamma distribution much, and I cannot recall whether the parameters alpha and beta can be directly related to a data set through some mathematical relationship such as an equation. I can easily estimate the mean from [itex]m = \frac{1}{N} \cdot \sum\limits_{i = 1}^N {x_i }[/itex], but how do you estimate or determine alpha directly through the use of an equation? One last thing: it is not always practical to use an estimate of the mean and variance. For example, the Cauchy distribution has an infinite variance. I appreciate your feedback; keep it coming.
 
  • #6
What you have "reinvented" is the method of parameter estimation known as "matching of moments."
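For the gamma distribution raised earlier in the thread, moment matching does give closed-form estimates, since the mean is [itex]\alpha \cdot \beta[/itex] and the variance is [itex]\alpha \cdot \beta ^2[/itex] in the shape-scale parametrization; a sketch:
[code]
import numpy as np

def gamma_from_moments(data):
    # mean = alpha*beta, variance = alpha*beta^2  =>
    # alpha = m^2/v (shape), beta = v/m (scale)
    m, v = np.mean(data), np.var(data)
    return m**2 / v, v / m

# Example: recover known parameters from simulated data
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.5, scale=1.8, size=100_000)
print(gamma_from_moments(data))   # roughly (2.5, 1.8)
[/code]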
 

1. What is an intrinsic distribution?

An intrinsic distribution is a statistical term used to describe the inherent or natural distribution of a particular variable within a population. This distribution is not influenced by external factors or interventions, but rather reflects the underlying characteristics of the population itself.

2. How is an intrinsic distribution different from an extrinsic distribution?

An extrinsic distribution, also known as an observed distribution, is the result of external influences such as interventions or experimental manipulations. In contrast, an intrinsic distribution is not affected by these external factors and represents the underlying distribution of the population.

3. What types of variables are commonly represented by intrinsic distributions?

Intrinsic distributions are commonly used to describe continuous variables such as height, weight, and blood pressure, as well as categorical variables such as gender or race. These variables are typically inherent characteristics of an individual and are not easily influenced by external factors.

4. How is an intrinsic distribution represented graphically?

An intrinsic distribution can be represented graphically using a histogram or a probability density function (PDF). These graphs show the frequency or probability of each value occurring within the population, allowing for a visual representation of the shape and characteristics of the distribution.

5. Can an intrinsic distribution change over time?

Intrinsic distributions are generally considered stable and consistent over time, as they represent the inherent characteristics of a population. However, external factors and interventions can potentially alter the distribution, leading to changes over time. In these cases, it is important to carefully consider the effects of these factors to accurately interpret changes in the distribution.
