Average of prime factors growth

Click For Summary

Discussion Overview

The discussion revolves around the growth of the average number of prime factors of integers, exploring the mathematical relationships and models that describe this phenomenon. Participants share their findings from numerical simulations and theoretical insights, focusing on the implications of the Prime Number Theorem and related concepts.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant shares their experience plotting the number of prime factors for integers up to one million and suggests that averaging over batches of numbers reveals a curve that may relate to log(log(x)).
  • Another participant references the Prime Number Theorem to discuss the probability of a number being prime and its implications for the expected number of unique prime factors.
  • Some participants note that differences arise when counting all factors versus just unique prime factors, prompting a request for additional theorems that account for multiplicities.
  • A mathematical expression involving integrals is proposed to estimate the average number of prime factors, with some participants discussing the impact of constants and the distribution of primes on their results.
  • One participant expresses uncertainty about the fit of their model and wonders if it is limited to the range they simulated.
  • Another participant shares specific numerical results and deviations observed when extending their analysis to larger ranges, indicating that while the fit is close, it is not perfect.
  • Discussions about the choice of constants in their models lead to speculation about the influence of the Euler-Mascheroni constant on their findings.
  • Participants engage in a collaborative effort to refine their models and clarify discrepancies in their results, leading to improved agreement in their findings.

Areas of Agreement / Disagreement

Participants express a mix of agreement and disagreement regarding the mathematical models and constants used in their analyses. While some findings align, there are notable differences in interpretations and results, indicating that the discussion remains unresolved.

Contextual Notes

Limitations include the dependence on specific ranges of integers and the potential impact of the distribution of prime numbers on the accuracy of the proposed models. Some participants acknowledge that their results may vary with different ranges or assumptions.

Who May Find This Useful

This discussion may be of interest to those studying number theory, mathematical modeling, or anyone curious about the properties of prime factors and their distributions.

diegzumillo
Messages
180
Reaction score
20
TL;DR
Average of number of prime factors have a curious fit, and I don't know what it means.
Hi there, bored physicist here. I wondered about factorization so I made some plots and a model fit to satiate my curiosity, but it only made me more curious. Now I just want the answers! I didn't know which prefix to use. "Hobbyist" seemed more appropriate but we don't have that so I went with "I" (undergrad level). Hope that works.

So I started calculating how many prime factors a number has, then plotted this number for all the numbers up to some arbitrary high number (I went up to a million). The plot was not very informative. (this one has 10000, for higher numbers the plot becomes an unreadable series of lines)
factors.png

This looks like a random function. But, obviously we will start seeing higher number of factors the higher the number, so maybe if I take the average of the number of factors every batch of numbers that will tell me something interesting. So I experimented with different batch sizes (steps) to see if that influenced the result much, and it doesn't. It just narrows the curve a little, which is nice.
meanvalues.png

Bigger step/batch:
meanvalues_bigstep.png

Naturally I wanted to know what curve this is, so I tried some naive fits. It looks like a square root of x so I tried a x^b. Close but definitely not it. Tried log, also close but not it. Then I went to wikipedia and, even though most of those number theory articles flew over my head, I did notice a recurrence of log(log(x)), so I tried that and got 1.27568 log ( log (815024. x - 600100.))
fit.png


All right! that looks spot on!

but why? What are these numbers? Changing the step makes no difference on this fit, by the way.
 
Mathematics news on Phys.org
This proves something using the Prime Number Theorem.
 
  • Like
Likes   Reactions: diegzumillo and jim mcnamara
If you average over a sufficiently large range for sufficiently large numbers, a number n has a 1/(log(n)+1) chance to be prime, and a 1/n chance to be a divisor of a larger N. This makes our expected number of unique prime factors
$$\sum_{i=2}^{N} \frac{1}{n(\log(n)+1)} \approx \int_{x=2}^{N} \frac{1}{x(\log(x)+1)} =
\left[ \log(\log(x)+1)\right]_2^N = \log(\log N+1) + c $$ neglecting some edge effects.
 
  • Like
  • Informative
Likes   Reactions: diegzumillo and Keith_McClary
Nice! The expression is definitely recognizable. The differences come from counting all factors, not just the unique prime factors.
Do you have any more theorems in that hat of yours that could include multiplicities as well?
 
If we count primes with their powers, then a prime n will contribute 1/n + 1/n2 + ... = 1/(n-1)

$$\int_{x=2}^{N} \frac{1}{(x-1)(\log(x)+1)} = \int_{x=1}^{N-1} \frac{1}{(x)(\log(x+1)+1)} \approx \int_{x=1}^{N} \frac{1}{x(\log(x)+1)} = \log(\log N+1) + c' $$
Same result just with a different constant*. You get an additional 1/2 from 2 (1/4 chance to have a second power of 2, 1/8 to have a third, ...), you increase the contribution from 3 from 1/3 to 1/2, but after that the changes get small quickly as higher prime powers are very rare.

*that constant is distorted by the sum to integral approximation and the primes being not evenly distributed, so it doesn't help to evaluate the integral for an estimate. You can take the sum for the first 1000 (or whatever) primes and use the integral from there on to get a better estimate of the constant.
 
  • Like
Likes   Reactions: diegzumillo
That's unexpected. I got some really specific numbers in there and I can't find a good fit with just that integration constant. I wonder if that fit I got only works for that range I used.
 
What is the range you simulate?
 
(I had a gif of Mike Myers saying "ONE MILLION DOLLARS" but it was causing problems to post it here.)
Anyway, a million.
I know, I know, in the realm of number theory this must be laughably small.
 
log(log(N)+1)+1.005 looks spot-on if I take the bin centers and ignore the first few bins where the numbers vary significantly within the bins.

I use log for the natural logarithm here, if you don't use that then you need some additional factors.

primes.png


Edit: Did some spot checks up to 10 million and there are some deviations, but the deviations are still below 0.2%.
Edit2: Reaching 0.23% deviation at 100 million. It's not perfect, but it's pretty close throughout. And that's a constant that I chose based on the first million. It's probably better to choose a constant based on the larger values.

primesratio.png


Slightly broken formatting, but here are my numbers:
Code:
N	prime factors	logs	logs+c	diff	ratio
995000	3.7022	2.69533625473261	3.70033625473261	-0.001863745267388	0.999496584391068
3005000	3.7738	2.76731158045996	3.77231158045996	-0.001488419540043	0.999605591303184
5005000	3.8079	2.79886227781825	3.80386227781825	-0.004037722181753	0.998939645951377
7005000	3.8283	2.81912245793061	3.8241224579306	-0.004177542069395	0.998908773588957
10005000	3.8509	2.84016533031566	3.84516533031566	-0.005734669684338	0.998510823525841
20005000	3.8908	2.87984391188412	3.88484391188412	-0.005956088115885	0.998469186769846
30005000	3.9141	2.90234833992804	3.90734833992804	-0.00675166007196	0.998275041498184
40005000	3.9288	2.91801475659254	3.92301475659254	-0.005785243407455	0.998527478261185
50005000	3.9432	2.93000016348676	3.93500016348676	-0.008199836513243	0.997920512144136
60005000	3.9534	2.939687627776	3.944687627776	-0.008712372224002	0.997796233059138
70005000	3.9622	2.94780579067127	3.95280579067127	-0.00939420932873	0.997629042115812
80005000	3.9704	2.95478525510842	3.95978525510842	-0.01061474489158	0.997326530099844
90005000	3.974	2.96090145139401	3.96590145139401	-0.008098548605987	0.997962116606445
100005000	3.9804	2.96634109051274	3.97134109051274	-0.009058909487262	0.997724120820203
N	prime factors	logs	logs+c	diff	ratio
And here is the python code that produced the second column:
Python:
import math

def primefactors(num,f=0):
    if num<4:
        return f+1
    for i in range(2,int(math.sqrt(num))+1):
        if num%i == 0:
            return f+1+primefactors(num/i)
    return f+1

for k in range(4000,10001,1000):
    sum=0.
    for i in range(10000*k,10000*(k+1)):
        sum+=primefactors(i)
    print(sum/10000)
 
Last edited:
  • #10
Look what I got with a bit more curve-fitting: log(log(N)+0.57)+1.036
From the range 60000-70000 to the range 100 million to 100 million+10000 (using bins of 10000 to 1 million, and then the numbers from above) the maximal deviation is 0.0023, or 0.06%.
In the 500 million bin the real number is 4.0614 and the approximation is 4.06130. That's corresponding to an error of just 1 prime factor in 10,000 numbers.

Why 0.57? I don't know. Maybe a better estimate for the prime density somehow. Could also be some other approximation I made.
 
  • #11
mfb said:
Why 0.57? I don't know. Maybe a better estimate for the prime density somehow. Could also be some other approximation I made.
Just a wild guess, but the Euler-Mascheroni constant is around ~0.577 and it appears in asymptotic formulas for some number theoretical functions.
 
  • #12
Sigh. Sometimes I hate my ADD-riddled brain. I'm staring at your code, then at my code, then at my plot, then at your data, and I could not understand why we were getting different results. In my laziness I did the fit without using the actual numbers, but their indices instead! since I was using steps of 1000 or even bigger, the fit was trying to work with 1,2,3,4... instead of the center of the bin, 1000, 2000,... or whatever the number of steps.

But I'm on the same page now! I have very similar numbers :D

This was pretty fun.
 
  • #13
Small improvement in the prime density. ##\pi(x) = \frac x {\log x}##, therefore ##\pi'(x) = \frac {\log x-1} {\log^2 x} = \frac{1}{\log x} - \frac{1}{\log^2 x}##.
Compare this to ##\frac{1}{\log (x) + 1} = \frac{1}{\log x} - \frac{1}{\log^2 x} + \frac{1}{\log^3 x} \pm ...## which has extra terms.

The new expression leads to an integral of log(log(x))+1/log(x)+c using the same shift trick and approximation as before.
That fits worse than the empirical function before, however. log(log(x)-0.44)+1/log(x)+1.036 leads to a great fit again.
 
  • Like
Likes   Reactions: diegzumillo

Similar threads

  • · Replies 3 ·
Replies
3
Views
1K
  • · Replies 3 ·
Replies
3
Views
2K
Replies
9
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 9 ·
Replies
9
Views
3K
  • · Replies 6 ·
Replies
6
Views
3K
  • · Replies 2 ·
Replies
2
Views
3K
  • · Replies 6 ·
Replies
6
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 2 ·
Replies
2
Views
2K