[Undergrad] Re-scaling of exponentially distributed numbers

The discussion centers on the transformation of exponentially distributed random numbers into a nearly uniform distribution after re-scaling. The re-scaling, which divides each number by the sum of its row, alters the original distribution because it makes the numbers within a row dependent on each other. The intent behind the re-scaling is clarified as generating pairs of random variables that sum to one, rather than performing a simple linear transformation. There is a lack of consensus on the precise mathematical explanation for the resulting distribution shape, indicating a need for clearer procedural definitions. Ultimately, the row-wise re-scaling does lead to uniformly distributed results, which is consistent with the operation performed.
roam
TL;DR
I am trying to generate ##M## random numbers which are exponentially distributed and whose sum adds up to ##N##. However, the re-scaling always causes the numbers to become uniformly distributed.
For simplicity, let ##N=1##. The following histograms show my results. The generated random numbers are initially exponentially distributed. But after re-scaling they become almost uniformly distributed.

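[Image: histograms of X(:,1) before re-scaling (left, exponential) and after re-scaling (right, almost uniform)]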


What is the cause of that, and is there a solution?

P.S. Here is my code in Matlab:

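```matlab
subplot(121)
samples = 10000;
lambda = 1;
X = -log(rand(samples,2))/lambda;  % samples-by-2 matrix of Exp(lambda) draws (inverse transform)
hist(X(:,1),100)                   % first column, before re-scaling
subplot(122)
X = X./sum(X,2);                   % re-scaling: divide each row by its own sum, so every row sums to 1
hist(X(:,1),100)                   % first column, after re-scaling
```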
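For readers without MATLAB, here is a minimal plain-Python sketch of the same experiment (standard library only, text bin counts instead of plots). The helper names are hypothetical; the two key steps mirror `X = -log(rand(samples,2))/lambda` and `X./sum(X,2)`.

```python
import math
import random

def exponential_pairs(samples, lam=1.0, seed=0):
    """Hypothetical helper: a samples-by-2 table of Exp(lam) draws by
    inverse transform, like -log(rand(samples,2))/lambda in MATLAB."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so log() never receives 0
    return [[-math.log(1.0 - rng.random()) / lam for _ in range(2)]
            for _ in range(samples)]

def rescale_rows(table):
    """Hypothetical helper: divide each row by its own sum, like X./sum(X,2)."""
    return [[x / sum(row) for x in row] for row in table]

def bin_counts(values, bins, lo, hi):
    """Text-mode histogram: counts in equal-width bins on [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = int((v - lo) / width)
        if 0 <= k < bins:
            counts[k] += 1
    return counts

X = exponential_pairs(10000)
W = rescale_rows(X)
before = bin_counts([row[0] for row in X], 10, 0.0, 5.0)
after = bin_counts([row[0] for row in W], 10, 0.0, 1.0)
print("before:", before)  # falls off steeply from the first bin
print("after: ", after)   # roughly flat
```

With 10 bins on [0, 1], each of the after-re-scaling counts should hover around 1000, while the before counts drop off quickly, which is the same contrast the two MATLAB histograms show.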
 
roam said:
The generated random numbers are initially exponentially distributed. But after re-scaling they become almost uniformly distributed.
Yes, of course they do. That's the way math works. If you have a log distribution on a log scale, that's the same as a flat distribution in a linear scale. I don't know why you would expect otherwise. There IS no "solution".
 
roam said:
Summary: However, the re-scaling always causes the numbers to become uniformly distributed.
The mathematical question is more likely to be answered if it is stated precisely. As I read the MATLAB code, the mathematical question is:

##X## and ##Y## are independent random variables and each is uniformly distributed on [0,1]. What is the distribution of ##W = \frac{ \log(X)}{ \log(X) + \log(Y)}## ?
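A short derivation that answers this question, sketched under the reading above. Put ##A = -\log X## and ##B = -\log Y##; these are independent Exp(1) variables (the factor ##1/\lambda## cancels in the ratio, so taking ##\lambda = 1## loses nothing), and ##W = A/(A+B)##. For ##0 < w < 1##, ##W \le w## is equivalent to ##A \le \frac{w}{1-w} B##, so

$$
\begin{aligned}
P(W \le w) &= \int_0^\infty e^{-b}\, P\!\left(A \le \tfrac{w}{1-w}\, b\right) db
= \int_0^\infty e^{-b}\left(1 - e^{-\frac{w}{1-w} b}\right) db \\
&= 1 - \frac{1}{1 + \frac{w}{1-w}} = 1 - (1-w) = w .
\end{aligned}
$$

So ##W## is exactly uniform on ##[0,1]##, not merely approximately, which matches the second histogram. This is a property of the operation itself, not a numerical artifact.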
 
I don't quite understand why you are scaling by sum(X,2). That will rescale each row by the sum of the numbers on that row (2 numbers). I changed it to sum(X,1), which sums the 10000 numbers along index 1 (sums each column), and got what I think you were expecting. I like to keep it simple and see the intermediate calculations so that I can make sure it is doing what I expected:
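```matlab
% X is the samples-by-2 matrix of exponential draws, before any row re-scaling
X = -log(rand(10000,2));
N = 10;
S = sum(X,1);          % column sums: each column's 10000 numbers added up
Y = N*X(:,1)/S(1,1);   % linear re-scaling: column 1 now sums to N
hist(Y,100)            % the shape stays exponential
```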
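A sketch of this column-wise (linear) re-scaling in plain Python, assuming the goal is only that the numbers sum to ##N##. Multiplying every value by the same positive constant cannot change the shape of the histogram; the hypothetical `fraction_below_mean` statistic makes that concrete (about ##1 - e^{-1} \approx 0.632## for exponential data, and unchanged by the scaling).

```python
import math
import random

def exponential_column(samples, lam=1.0, seed=1):
    """Hypothetical helper: Exp(lam) draws by inverse transform."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / lam for _ in range(samples)]

def rescale_column(values, total):
    """Linear re-scaling: multiply every value by the same constant
    total / sum(values), like Y = N*X(:,1)/S(1,1)."""
    factor = total / sum(values)
    return [v * factor for v in values]

def fraction_below_mean(values):
    """Hypothetical shape statistic that a positive constant factor
    cannot change; roughly 1 - exp(-1) for exponential data."""
    mean = sum(values) / len(values)
    return sum(v < mean for v in values) / len(values)

x = exponential_column(10000)
y = rescale_column(x, total=10)
print(sum(y))                                          # 10 up to rounding
print(fraction_below_mean(x), fraction_below_mean(y))  # the two fractions agree
```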
 
roam said:
Summary: I am trying to generate ##M## random numbers which are exponentially distributed and whose sum adds up to ##N##.
If their sum adds up to a given N then their distributions are not independent and so they cannot individually be exponentially distributed. Take the case ##M = 2##: if the first number ##n_0## is exponentially distributed in the range ##[0,N]##, the second number must always equal ##N-n_0##.
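For ##M = 2## and ##N = 1## the dependence is easy to check numerically. A minimal sketch, assuming the pairs are built as in the original MATLAB code (each row divided by its own sum); the helper names are hypothetical.

```python
import math
import random

def normalized_pair(rng, lam=1.0):
    """Hypothetical helper: two Exp(lam) draws divided by their sum."""
    a = -math.log(1.0 - rng.random()) / lam
    b = -math.log(1.0 - rng.random()) / lam
    return a / (a + b), b / (a + b)

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2)
pairs = [normalized_pair(rng) for _ in range(10000)]
wa = [p[0] for p in pairs]
wb = [p[1] for p in pairs]
# The second number is fully determined by the first: wb == 1 - wa
print(max(abs(x + y - 1.0) for x, y in zip(wa, wb)))  # rounding error only
print(correlation(wa, wb))                            # -1 up to rounding
```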
roam said:
However, the re-scaling always causes the numbers to become uniformly distributed.
Yes, of course, because as @StephenTashi points out, your 'rescaling' creates a completely different distribution.
 
pbuk said:
If their sum adds up to a given N then their distributions are not independent
He was trying to generate the data and do a simple linear rescaling of the data afterward. It should not have changed the general shape of the distribution. He had a MATLAB coding error.
 
FactChecker said:
He was trying to generate the data and do a simple linear rescaling of the data afterward. It should not have changed the general shape of the distribution.

As to rescaling, I think the goal is to take pairs of random variables ##X_a, X_b##, and from each pair, create the pair ##W_a = X_a/(X_a + X_b),\ W_b = X_b/(X_a + X_b)##. Then we look at the distribution of ##W_a##. So the intent is not to do a linear rescaling. The intent is to create pairs of random variables ##W_a,\ W_b## that sum to 1.
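A sketch of exactly this pair construction, using the Kolmogorov-Smirnov distance to the Uniform(0,1) CDF as a one-number summary of how flat the histogram of ##W_a## is. The helper name is hypothetical, and the sketch uses ##\lambda = 1## because ##\lambda## cancels in ##W_a##.

```python
import math
import random

def ks_distance_to_uniform(values):
    """Kolmogorov-Smirnov distance sup_w |F_n(w) - w| between the empirical
    CDF of the values and the Uniform(0,1) CDF (hypothetical helper)."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(3)
w_a = []
for _ in range(10000):
    x_a = -math.log(1.0 - rng.random())  # Exp(1); lambda cancels in the ratio
    x_b = -math.log(1.0 - rng.random())
    w_a.append(x_a / (x_a + x_b))        # W_a = X_a / (X_a + X_b)

d = ks_distance_to_uniform(w_a)
print(d)  # small, of order 1/sqrt(10000)
```

For uniform data a distance of order ##1/\sqrt{n}##, about 0.01 for ##n = 10000##, is typical.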
 
Stephen Tashi said:
As to rescaling, I think the goal is to take pairs of random variables ##X_a, X_b##, and from each pair, create the pair ##W_a = X_a/(X_a + X_b),\ W_b = X_b/(X_a + X_b)##. Then we look at the distribution of ##W_a##. So the intent is not to do a linear rescaling. The intent is to create pairs of random variables ##W_a,\ W_b## that sum to 1.
That is what his original code did. I don't know what the real intention was. He got a valid answer to either case in this thread. I didn't see anything about "pairs" in the description. I still think that my assumption fits the original description better (or at least as well).
 
FactChecker said:
I don't know what the real intention was. He got a valid answer to either case in this thread.

We don't yet have a good mathematical explanation for the shape of the second histogram. I agree that we don't yet have a clear statement of a mathematical question!
roam said:
Summary: I am trying to generate ##M## random numbers which are exponentially distributed and whose sum adds up to ##N##.

The generated random numbers are initially exponentially distributed. But after re-scaling they become almost uniformly distributed.

It's often hard to translate a procedure into a question about random variables. The first step is to describe the procedure clearly.

As a procedure, one interpretation of what you want to do, in general, is to generate one set of ##M## random numbers that sum to ##N## and then make a histogram of all those ##M## random numbers, i.e. all ##M## of the numbers contribute to the histogram. Your claim is that most such histograms are approximately uniform distributions.
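A sketch of this first interpretation (hypothetical helper, ##\lambda = 1##): one set of ##M## exponentials re-scaled to sum to ##N##, with all ##M## values going into one histogram. For large ##M## the divisor is close to ##M/\lambda## by the law of large numbers, so the re-scaling is nearly a constant factor and the histogram stays close to exponential rather than becoming flat.

```python
import math
import random

def exponential_set_summing_to(M, N, lam=1.0, seed=4):
    """Hypothetical helper: one set of M Exp(lam) draws, divided by their
    total and multiplied by N, so the set sums to N."""
    rng = random.Random(seed)
    xs = [-math.log(1.0 - rng.random()) / lam for _ in range(M)]
    total = sum(xs)
    return [N * x / total for x in xs]

M, N = 10000, 1.0
values = exponential_set_summing_to(M, N)
# Share of the M values below their mean N/M: about 1 - exp(-1) for an
# exponential shape, versus 0.5 for a flat histogram on [0, 2N/M].
frac = sum(v < N / M for v in values) / M
print(sum(values))  # N up to rounding
print(frac)         # near 0.632 rather than 0.5
```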

A different interpretation is that you want to generate many sets of ##M## random numbers, each set being one where the ##M## numbers sum to the same ##N##. Then you want to make a histogram by using one number from each of the sets. For example, if you generate 100 sets of ##M= 20## numbers, you might make a histogram by picking the first number from each of the 100 sets of numbers. The histogram would involve 100 values.
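For this second interpretation a standard result can be checked numerically: dividing ##M## independent exponentials by their sum gives the flat Dirichlet distribution on the simplex, so any single re-scaled number (with ##N = 1##) follows Beta##(1, M-1)##, with CDF ##P(W \le w) = 1 - (1-w)^{M-1}##. For ##M = 2## this is ##w##, the uniform distribution of the second histogram. A sketch comparing the two at ##w = 1/M## (helper names hypothetical, more sets than in the example so the estimate is stable):

```python
import math
import random

def first_of_each_set(num_sets, M, lam=1.0, seed=5):
    """Hypothetical helper: num_sets independent sets of M Exp(lam) draws,
    each divided by its own sum; keep only the first number of each set."""
    rng = random.Random(seed)
    firsts = []
    for _ in range(num_sets):
        xs = [-math.log(1.0 - rng.random()) / lam for _ in range(M)]
        firsts.append(xs[0] / sum(xs))
    return firsts

def beta_1_m_cdf(w, M):
    """CDF of Beta(1, M-1): P(W <= w) = 1 - (1 - w)**(M - 1).
    For M = 2 this equals w, i.e. the uniform distribution."""
    return 1.0 - (1.0 - w) ** (M - 1)

M = 20
firsts = first_of_each_set(20000, M)
w = 1.0 / M
empirical = sum(v <= w for v in firsts) / len(firsts)
print(empirical, beta_1_m_cdf(w, M))  # the two numbers should be close
```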

Or, to make a hybrid of the previous procedures, perhaps you want to generate, say, ##100## sets of ##M = 20## random numbers such that the sum of the numbers in each set is ##N##. Then you want to histogram all 2000 of the numbers.
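A sketch of this hybrid procedure with the paragraph's numbers (100 sets, ##M = 20##) and an assumed ##N = 10##. By symmetry every position within a set has the same marginal, ##N## times a Beta##(1, M-1)## variable (the standard Dirichlet result for normalized exponentials), so the pooled histogram of all 2000 values has that shape. The check compares one point of the pooled empirical CDF with it; the helper name is hypothetical.

```python
import math
import random

def pooled_sets(num_sets, M, N, lam=1.0, seed=6):
    """Hypothetical helper: num_sets sets of M Exp(lam) draws, each set
    re-scaled to sum to N, with all num_sets * M numbers pooled in one list."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(num_sets):
        xs = [-math.log(1.0 - rng.random()) / lam for _ in range(M)]
        total = sum(xs)
        pooled.extend(N * x / total for x in xs)
    return pooled

num_sets, M, N = 100, 20, 10.0
pooled = pooled_sets(num_sets, M, N)
w = N / M
empirical = sum(v <= w for v in pooled) / len(pooled)
predicted = 1.0 - (1.0 - w / N) ** (M - 1)   # CDF of N * Beta(1, M-1) at w
print(len(pooled), sum(pooled) / len(pooled))  # 2000 values with mean N/M
print(empirical, predicted)
```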
 
