Discrete power law distributions

Old Guy
I'm not a mathematician, but I want to understand how a mathematician would view this issue.

I'm working primarily with degree distributions for finite graphs, and when I make a log log plot of the frequency distribution the data points form a nice straight line (at least for low degree values). To be specific, the x-axis is the degree (number of edges per vertex) and the y-axis is the number of vertices that have a particular degree.
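As a minimal sketch of the frequency distribution described above, the counts can be built directly from an edge list. The function name `degree_histogram` and the toy edge list are illustrative assumptions, not part of the original question; isolated vertices (degree 0) do not appear in an edge list, so they are not counted here.

```python
from collections import Counter

def degree_histogram(edges):
    """Map each degree to the number of vertices having that degree.

    `edges` is an iterable of (u, v) pairs for an undirected graph.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count how many vertices share each degree value.
    return Counter(degree.values())

# Toy graph: vertex 0 has degree 3, vertices 1 and 2 have degree 2,
# and vertex 3 has degree 1.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
hist = degree_histogram(edges)
```

Plotting log(degree) against log(count) for the entries of `hist` gives the log-log plot; degrees that never occur are simply absent from the dictionary.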

Question 1: What is the convention for dealing with gaps in the data? This has two parts. First, there will obviously be gaps at the high end of the degree scale. How are these handled when, for example, estimating the power law exponent? Second, what about gaps throughout the data, such as when every vertex is constrained to have an even degree?

Question 2: A. Clauset, C. Shalizi, and M. Newman, SIAM Rev. 51, 661 (2009) discuss calculating the exponent for power law distributions and propose an MLE method and a KS test for goodness of fit. It seems to me that this won't work for graphs where the exponent is less than 2 because the zeta function blows up. Can anyone suggest an alternative? Or is there a mathematical basis for saying that these distributions MUST be something other than a power law?
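For reference, here is a rough sketch of a discrete power law MLE in the spirit of that paper, assuming the model p(k) = k^(-alpha) / zeta(alpha, kmin) for k >= kmin. The function names, the search interval, and the synthetic histogram are assumptions made for illustration. The Hurwitz zeta normalizer is approximated by a direct sum plus an Euler-Maclaurin tail correction; it is finite for any alpha > 1 and diverges only at alpha <= 1 (it is the mean of the distribution that diverges for alpha <= 2).

```python
import math

def hurwitz_zeta(a, kmin=1, n_terms=1000):
    """Approximate sum_{k>=kmin} k**(-a) for a > 1."""
    big = kmin + n_terms
    head = sum(k ** -a for k in range(kmin, big))
    # Euler-Maclaurin tail: integral from `big` to infinity plus half the first term.
    return head + big ** (1 - a) / (a - 1) + 0.5 * big ** -a

def fit_alpha(hist, kmin=1, lo=1.05, hi=6.0, tol=1e-6):
    """Maximum likelihood exponent from a {degree: count} histogram."""
    n = sum(c for k, c in hist.items() if k >= kmin)
    s = sum(c * math.log(k) for k, c in hist.items() if k >= kmin)

    def nll(a):
        # Negative log-likelihood of the discrete power law.
        return n * math.log(hurwitz_zeta(a, kmin)) + a * s

    # Golden-section search; the negative log-likelihood is convex in a.
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if nll(c) < nll(d):
            b = d
        else:
            a = c
    return (a + b) / 2

# Synthetic histogram with counts proportional to k**-2.5 (illustrative only).
hist = {k: round(1e6 * k ** -2.5) for k in range(1, 200)}
alpha_hat = fit_alpha(hist)  # should land close to 2.5
```

Degrees with zero count contribute nothing to either sum, so gaps in the histogram need no special treatment in this likelihood.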

Question 3: A simplistic approach would be to base the power law exponent on the data that fits it. This would essentially ignore the zero values and (potentially) some of the high degree tail of the distribution. On the other hand, it would be (I think) a useful description of the behavior of the bulk of the system. What are the risks here?
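The simplistic approach in Question 3 could be sketched as an ordinary least-squares line through the nonzero points of the log-log plot, restricted to a chosen degree range. The name `loglog_slope`, the `kmin`/`kmax` parameters, and the toy data are assumptions for illustration. One known risk, argued in the Clauset, Shalizi, and Newman paper cited above, is that regression on a log-log histogram can give biased exponent estimates, so this should be read as a descriptive fit rather than a proper estimator.

```python
import math

def loglog_slope(hist, kmin=1, kmax=None):
    """Least-squares slope of log(count) vs log(degree), skipping empty bins."""
    pts = [
        (math.log(k), math.log(c))
        for k, c in hist.items()
        if c > 0 and k >= kmin and (kmax is None or k <= kmax)
    ]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx

# Toy histogram following count = 10000 * k**-2 exactly, with one empty bin.
hist = {1: 10000, 2: 2500, 3: 0, 5: 400, 10: 100}
slope = loglog_slope(hist)  # the power law exponent is -slope
```

Changing `kmin` and `kmax` shows how sensitive the fitted exponent is to the choice of which points count as "the data that fits".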

Thanks!
 
The gaps between the extreme values aren't really a problem for most estimation methods.

Also, as an alternative to the zeta distribution, there's the Zipf distribution, which is similar but has a finite maximum; that would solve the blowup problem. MLE could be used for the fit, but it tends to set the upper bound to the largest data point, so perhaps MVUEs could be used instead (though I don't know the details; look up the German Tank Problem for an example).
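A hedged sketch of both ideas follows, assuming a truncated Zipf model p(k) = k^(-s) / sum_{j=1..N} j^(-s) on 1..N. With a finite N the normalizer is a finite sum, so any exponent s >= 0 is allowed. The function names and sample data are illustrative assumptions. The German Tank estimator shown is the classical one for uniform sampling without replacement; degrees drawn from a Zipf law are not uniform, so it appears here only to illustrate the kind of bound correction being suggested.

```python
import math

def fit_truncated_zipf(hist, n_max, lo=0.0, hi=6.0, tol=1e-6):
    """MLE of the exponent s for a Zipf law on 1..n_max, with n_max held fixed."""
    n = sum(hist.values())
    log_sum = sum(c * math.log(k) for k, c in hist.items())

    def nll(s):
        # Finite normalizer: no divergence for small s.
        norm = sum(k ** -s for k in range(1, n_max + 1))
        return n * math.log(norm) + s * log_sum

    # Golden-section search on the convex negative log-likelihood.
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if nll(c) < nll(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def german_tank_estimate(sample):
    """Classical MVUE of the population maximum: m + m/n - 1 (uniform sampling)."""
    m, n = max(sample), len(sample)
    return m + m / n - 1

# Synthetic counts proportional to k**-1.5 on 1..50 (illustrative only).
hist = {k: round(1e5 * k ** -1.5) for k in range(1, 51)}
s_hat = fit_truncated_zipf(hist, n_max=50)  # should land close to 1.5

upper = german_tank_estimate([19, 40, 42, 60])  # 60 + 60/4 - 1 = 74.0
```

In this sketch the upper bound is supplied by the caller rather than fitted jointly, which sidesteps the tendency of the MLE to pin the bound at the largest observed degree.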
 