# Sum independent normal variables

Summary
The sum of normally distributed random variables is normal
(I know how to prove it). Prove that a finite sum of independent normal random variables is normal. I suspect that independence may not be necessary.

## Answers and Replies

Mentor
2021 Award
Independence is necessary. Suppose, for example, that ##X_1 \sim \mathcal{N}(\mu=0,\sigma=1)## and ##X_2 = -X_1##. Then ##X_2 \sim \mathcal{N}(\mu=0,\sigma=1)##, but ##X_1+X_2=0##, which is not normally distributed.

• WWGD
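As a quick numerical sanity check of this counterexample (a sketch, assuming NumPy): each marginal is standard normal, but the sum is identically zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# X1 ~ N(0, 1); X2 = -X1 is also N(0, 1), but the two are perfectly dependent.
x1 = rng.standard_normal(100_000)
x2 = -x1

# Each marginal looks standard normal...
print(x1.mean(), x1.std())   # ≈ 0, ≈ 1
print(x2.mean(), x2.std())   # ≈ 0, ≈ 1

# ...but the sum is identically zero: a point mass, not a normal distribution.
s = x1 + x2
print(s.min(), s.max())      # 0.0 0.0
```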
> Independence is necessary. Suppose, for example, that ##X_1 \sim \mathcal{N}(\mu=0,\sigma=1)## and ##X_2 = -X_1##. Then ##X_2 \sim \mathcal{N}(\mu=0,\sigma=1)##, but ##X_1+X_2=0##, which is not normally distributed.

I was wondering about random variables which are correlated but with ##|\rho| \lt 1##, or in general as long as they are linearly independent. In that case they could be transformed into a set of independent random variables.

Gold Member
Your original claim is true iff they are jointly normally distributed. This is implied by the standard characterization that a random n-vector is multivariate Gaussian iff every linear combination of its components (every one-dimensional projection) is a normal r.v.

It depends on pedantry, but Dale's counterexample actually isn't one: it is just a case of a "normal random variable" with mean zero and variance zero. Not particularly satisfying, but it is useful as a vacuous truth.

Gold Member
IIRC, you can use generating functions for the proof.

BWV
> IIRC, you can use generating functions for the proof.

A proof with MGFs would have the same form as the characteristic function proof in the link.
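For two independent normals the characteristic function argument is one line. Writing ##\varphi_X(t) = e^{i\mu_1 t - \sigma_1^2 t^2/2}## and ##\varphi_Y(t) = e^{i\mu_2 t - \sigma_2^2 t^2/2}##, independence gives

$$\varphi_{X+Y}(t) = \varphi_X(t)\,\varphi_Y(t) = e^{i(\mu_1+\mu_2)t - \tfrac{1}{2}(\sigma_1^2+\sigma_2^2)t^2},$$

which is the characteristic function of ##\mathcal{N}\!\left(\mu_1+\mu_2,\ \sqrt{\sigma_1^2+\sigma_2^2}\right)##. Independence is used exactly in the first equality; without it, the CF of the sum need not factor.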

Are there theorems that deal with the general question? Given a set of random variables ##\{X_1,X_2,..,X_M\}##, when does there exist some subspace ##S## of the vector space of random variables such that ##S## contains each ##X_i## and ##S## has a basis of mutually independent random variables?

> I was wondering about random variables which are correlated but with ##|\rho| \lt 1##, or in general as long as they are linearly independent.

"Linearly independent" presumably implies we are considering a set of random variables to be a vector space under the operations of multiplication by scalars and addition of the random variables.

> In that case they could be transformed into a set of independent random variables.

Suppose we take "transformed" to mean transformed by a linear transformation in a vector space. If the vector space containing the random variables ##\{X_1,X_2,\dots,X_M\}## has a finite basis ##B_1,B_2,\dots,B_n## consisting of mutually independent random variables, then (trivially) for each ##X_k## there exists a possibly non-invertible linear transformation ##T## that transforms some linear combination of the ##B_i## into ##X_k##.

If the smallest vector space containing the ##X_i## is infinite dimensional (e.g. the vector space of all measurable functions on the real line), I don't know what happens.

I don't recall any texts that focus on vector spaces of random variables. Since the product of random variables is also a random variable, the topic for textbooks seems to be the algebra of random variables. But that approach downplays the concept of probability distributions.

By the CLT, the normalized sum of i.i.d. random variables from any distribution with finite variance converges to normal as the number of terms grows; a finite sum is only approximately normal.

On the sum of two normals, there are some proofs on Wikipedia:

https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables
My original question was specifically for finite sums. The Wiki article gives the answer for a pair of correlated variables, which then implies it holds for any finite sum.

BWV
> The Wiki article gives the answer for a pair of correlated variables, which then implies it holds for any finite sum.

Yes; if that did not work, then all of Modern Portfolio Theory would fail.

Staff Emeritus
Gold Member
2021 Award
> Are there theorems that deal with the general question? Given a set of random variables ##\{X_1,X_2,..,X_M\}##, when does there exist some subspace ##S## of the vector space of random variables such that ##S## contains each ##X_i## and ##S## has a basis of mutually independent random variables?

I'm not sure this is the right question. Given a single random variable, it plus itself is just 2 times itself, since you're not adding two independent copies of itself. So any one-dimensional subspace works, for example.

If you are wondering about sets of random variables that are stable under adding independent copies, then I think it's just the normal random variables, since if you repeatedly add independent copies of a random variable you get something that slowly deforms into a Gaussian. I guess the space of all things deforming into Gaussians also works.

Gold Member
> I'm not sure this is the right question. Given a single random variable, it plus itself is just 2 times itself, since you're not adding two independent copies of itself. So any one-dimensional subspace works, for example.
>
> If you are wondering about sets of random variables that are stable under adding independent copies, then I think it's just the normal random variables, since if you repeatedly add independent copies of a random variable you get something that slowly deforms into a Gaussian. I guess the space of all things deforming into Gaussians also works.

Don't mean to go far OT, but I am kind of curious about transformations that preserve normality.

An interesting thing is that covariance defines an inner product on the space of mean-zero, finite-variance random variables.
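That identification can be made concrete with samples (a sketch, assuming NumPy): for centered sample vectors, the sample covariance is exactly the Euclidean inner product scaled by ##1/n##, so symmetry and bilinearity carry over directly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Two correlated random variables, sampled and then centered (mean removed).
x = rng.standard_normal(n)
y = 0.5 * x + rng.standard_normal(n)   # Cov(x, y) = 0.5 by construction
x = x - x.mean()
y = y - y.mean()

# Sample covariance ...
cov = np.cov(x, y, bias=True)[0, 1]
# ... equals the scaled Euclidean inner product of the sample vectors.
dot = np.dot(x, y) / n

print(cov, dot)   # the two agree to floating-point precision
```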

BWV
> Don't mean to go far OT, but I am kind of curious about transformations that preserve normality.
>
> An interesting thing is that covariance defines an inner product on the space of random variables.

Wouldn't any linear or affine transformation of a normal RV preserve normality?
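A quick numerical illustration of that (a sketch, assuming NumPy): if ##X \sim \mathcal{N}(\mu, \sigma)##, then ##aX + b## has mean ##a\mu + b## and standard deviation ##|a|\sigma##, and its standardized third and fourth moments stay at the normal values of 0 and 3.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, a, b = 1.0, 2.0, -3.0, 5.0

x = rng.normal(mu, sigma, 200_000)
y = a * x + b  # affine transform of a normal RV

print(y.mean())  # ≈ a*mu + b = 2.0
print(y.std())   # ≈ |a|*sigma = 6.0

# Standardized third and fourth central moments: 0 and 3 for a normal.
z = (y - y.mean()) / y.std()
print((z**3).mean())  # skewness ≈ 0
print((z**4).mean())  # kurtosis ≈ 3
```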

> The Wiki article gives the answer for a pair of correlated variables

Yes, the article gives an answer for a pair of correlated normal random variables that have a joint bivariate normal distribution, but not for the more general case of correlated normal random variables whose joint distribution is not specified.

BWV
> Yes, the article gives an answer for a pair of correlated normal random variables that have a joint bivariate normal distribution, but not for the more general case of correlated normal random variables whose joint distribution is not specified.

So if you have the distribution of each component and the covariance, don't you also have the joint distribution?

> Wouldn't any linear or affine transformation of a normal RV preserve normality?

I'd say yes.

The topic

> but I am kind of curious about transformations that preserve normality.

could include all transformations that preserve normality, so perhaps it should be specialized to particular kinds of transformations to eliminate relatively trivial ones. For example, if ##X## has a normal distribution with mean zero and we define the transformation ##T(x) = x## except for ##x = 3## or ##-3##, where we define ##T(x) = -x##, have we transformed ##X## to a (technically) different normally distributed random variable?

• WWGD
> I'm not sure this is the right question. Given a single random variable, it plus itself is just 2 times itself, since you're not adding two independent copies of itself. So any one-dimensional subspace works, for example.

The question I propose is whether each ##X_i## is an element of the same subspace ##S##. So the fact that each ##X_i## can be regarded as being in the one-dimensional subspace generated by itself doesn't answer that.

It appears that the joint distribution of two dependent normal variables may not be bivariate normal. However, it is not clear to me whether or not the sum has a normal distribution.

Gold Member
> It appears that the joint distribution of two dependent normal variables may not be bivariate normal. However, it is not clear to me whether or not the sum has a normal distribution.

Just use, e.g., for ##X \sim \mathcal{N}(0,1)##, the sum ##X + (-X)##.

> Just use, e.g., for ##X \sim \mathcal{N}(0,1)##, the sum ##X + (-X)##.

This is an artificial special case where the variance is 0. It could be called a normal distribution.

• FactChecker
BWV
Seems bogus to me, as ##\eta## contains a discontinuous function ##\sigma##, and is only "normal" because the Bernoulli distribution is hidden until you add it to ##\xi##. I think if you wrote out the characteristic functions for the example, like in the proof here: https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables, it would be obvious, as you would have the composition of the Bernoulli CF in the Gaussian integral, and they are really not two normal distributions.
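The example under discussion is not written out above, so here is a numerical sketch under the usual statement of it (assuming NumPy): take ##\xi \sim \mathcal{N}(0,1)##, ##\sigma = \pm 1## with probability 1/2 each, independent of ##\xi##, and ##\eta = \sigma\xi##. Then ##\eta## is marginally standard normal, yet ##\xi + \eta## has an atom at 0 of probability 1/2.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# xi ~ N(0,1); sigma = +/-1 (Rademacher), independent of xi; eta = sigma * xi.
xi = rng.standard_normal(n)
sigma = rng.choice([-1.0, 1.0], size=n)
eta = sigma * xi

# eta is marginally standard normal: a symmetric sign flip changes nothing.
print(eta.mean(), eta.std())          # ≈ 0, ≈ 1

# But xi + eta is 2*xi when sigma = +1 and exactly 0 when sigma = -1:
s = xi + eta
print(np.mean(s == 0))                # ≈ 0.5 -- an atom at zero, so s is not normal
```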

Gold Member
It seems we can choose the copula in such a way that the sum violates normality. I will give it a try; should be a nice exercise. Never tried it before, but it should work.

• Stephen Tashi
> Seems bogus to me, as ##\eta## contains a discontinuous function ##\sigma##

How would we state a theorem that avoids that counterexample?

If we say "Let ##\eta## be a normally distributed random variable that does not contain a discontinuous function", what would that mean?

One attempt to define "##\eta## does not contain a discontinuous function" might be: there does not exist a finite set of random variables ##\{X_1, X_2,...,X_n\}## such that at least one of the ##X_i## does not have a continuous distribution and such that for some function ##f## we have ##\eta = f(X_1,X_2,...,X_n)##. However, given such great freedom for people to choose ##f## and the ##X_i##, we might eliminate all normally distributed ##\eta## from consideration.

In practical situations the evidence about how ##\eta## can be represented as functions of other random variables would come from joint measurements of ##\eta## and those random variables or functions of those random variables. So perhaps the definition would need to be of the form "##\eta## does not contain a discontinuous function with respect to the random variable ##X## means that ......".

> It seems we can choose the copula in such a way that the sum violates normality. I will give it a try; should be a nice exercise.

Be careful! https://en.wikipedia.org/wiki/David_X._Li

• WWGD
BWV
> How would we state a theorem that avoids that counterexample?
>
> If we say "Let ##\eta## be a normally distributed random variable that does not contain a discontinuous function", what would that mean?
>
> One attempt to define "##\eta## does not contain a discontinuous function" might be: there does not exist a finite set of random variables ##\{X_1, X_2,...,X_n\}## such that at least one of the ##X_i## does not have a continuous distribution and such that for some function ##f## we have ##\eta = f(X_1,X_2,...,X_n)##. However, given such great freedom for people to choose ##f## and the ##X_i##, we might eliminate all normally distributed ##\eta## from consideration.
>
> In practical situations the evidence about how ##\eta## can be represented as functions of other random variables would come from joint measurements of ##\eta## and those random variables or functions of those random variables. So perhaps the definition would need to be of the form "##\eta## does not contain a discontinuous function with respect to the random variable ##X## means that ......".

Just also define the generating process for the RV: perhaps the sum of normally distributed RVs where each can be completely described by the Gaussian MGF or CF.

> Just also define the generating process for the RV: perhaps the sum of normally distributed RVs where each can be completely described by the Gaussian MGF or CF.

However, the existence of one sort of generating process for a random variable doesn't rule out the existence of a different generating process for it.

I think the goal is to prove a theorem of the form:

If ##X## and ##Y## are normally distributed random variables with joint distribution ##J(X,Y)## and ... some conditions ..., then ##X + Y## is normally distributed.

To get anything interesting the "some conditions" must be conditions that aren't trivially equivalent to assuming ##J(X,Y)## is a bivariate normal distribution.

BWV
Also, if ##D## is the Dirichlet function and ##A## is a standard normal, is ##B = D(A)\cdot A## normal? It seems it would pass the measure-theoretic criteria; it is L-integrable, for example. But if ##C## is another standard normal, then ##C - B## has measure zero.

BWV
> However, the existence of one sort of generating process for a random variable doesn't rule out the existence of a different generating process for it.
>
> I think the goal is to prove a theorem of the form:
>
> If ##X## and ##Y## are normally distributed random variables with joint distribution ##J(X,Y)## and ... some conditions ..., then ##X + Y## is normally distributed.
>
> To get anything interesting, the "some conditions" must be conditions that aren't trivially equivalent to assuming ##J(X,Y)## is a bivariate normal distribution.

But you have to define normal by some generating function. If ##A## is the sum of 100 uniform RVs, is ##A## normal? In practice it would be treated as such, but then there is some arbitrary number of uniform RVs where the sum is not normal. If you use the full limit and ##A## is the infinite sum of appropriately scaled uniform RVs, then you could subtract some right-hand unbounded interval of the sum, which would also be normal, and get a non-normal RV.
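The first point can be seen numerically (a sketch, assuming NumPy): the sum of 100 i.i.d. uniforms matches the CLT's normal approximation closely in its low moments, but its support is bounded (here ##[0, 100]##), so it cannot be exactly normal.

```python
import numpy as np

rng = np.random.default_rng(4)

# 200,000 samples of the sum of 100 i.i.d. Uniform(0, 1) variables.
a = rng.random((200_000, 100)).sum(axis=1)

# CLT prediction: mean = 100 * 1/2 = 50, std = sqrt(100 * 1/12) ≈ 2.887.
print(a.mean(), a.std())

# The moments look normal, but the support is bounded: a true normal
# with this mean and std would exceed 100 with tiny but nonzero
# probability, while this sum never can.
print(a.min() >= 0 and a.max() <= 100)  # True
```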

> but you have to define normal by some generating function.

A normal random variable ##\eta## can, in many different ways, be represented as the sum of two other independent normal random variables.

I don't understand how you want to restrict the types of representations of ##\eta## that are permitted.

BWV
> Added note: These random variables are uncorrelated.

Is that correct? The intention of the ##\eta = \sigma\xi## example is to randomly flip the sign on half of ##\xi##, which does not change the distribution, but half of ##\eta## is perfectly negatively correlated with ##\xi##, and this information is recovered in ##P(\xi+\eta=0) = P(\sigma=-1) = 1/2##.

ISTM there are problems with the construction here, but they are above my pay grade. ##\xi## is a function on ##\mathbb{R}##, while ##\sigma## is discrete. How would you actually flip the sign randomly on half of ##\mathbb{R}##? Any countable number of flips in ##\sigma## would not change the distribution, and if the flipped set is uncountable, then how is ##P(\xi+\eta=0)## not 0?

> How would you actually flip the sign randomly on half of ##\mathbb{R}##?

Why would that be necessary? If we grant that you can take a random sample from ##\xi##, then in order to realize a sample of ##\sigma\xi## you only need to decide whether to flip the sign of that particular sample by using one realization of ##\sigma##.

The probability of any particular realization of a sample from a normally distributed random variable is zero, so if zero probability events are going to be a conceptual problem, they are already present in the idea of taking a sample from ##\xi##.