Discrete power law distributions

Old Guy · Aug 17, 2011

I'm not a mathematician, but I want to understand how a mathematician would view this issue.

I'm working primarily with degree distributions for finite graphs, and when I make a log-log plot of the frequency distribution, the data points form a nice straight line (at least for low degree values). To be specific, the x-axis is the degree (number of edges per vertex) and the y-axis is the number of vertices that have that degree.

Question 1: What is the convention for dealing with gaps in the data? This has two parts. First, it is obvious that there will be gaps at the high end of the degree scale. How is this dealt with when, for example, trying to find the power law exponent? Second, what about the case where there are gaps throughout the data, such as when the vertices are constrained to have only an even number of edges?

Question 2: A. Clauset, C. Shalizi, and M. Newman, SIAM Rev. 51, 661 (2009) discuss calculation of the exponent for power law distributions, and propose an MLE method and a KS test for goodness of fit. It seems to me that this won't work for graphs where the exponent is less than 2 because the zeta function blows up. Can anyone suggest an alternative? Or is there a mathematical basis for saying that these distributions MUST be something other than a power law?
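
For reference, here is the discrete model from that paper as I read it. The notation is mine to the extent it differs from theirs: x_min is the lowest degree included in the fit, n is the number of vertices with degree at least x_min, and the x_i are their degrees.

```latex
p(x) = \frac{x^{-\alpha}}{\zeta(\alpha, x_{\min})}, \qquad
\zeta(\alpha, x_{\min}) = \sum_{k=0}^{\infty} (k + x_{\min})^{-\alpha}, \qquad
\ln L(\alpha) = -n \ln \zeta(\alpha, x_{\min}) - \alpha \sum_{i=1}^{n} \ln x_i
```

The normalizing sum converges only for \alpha > 1, so the convergence of the zeta function is what limits the range of exponents this likelihood can handle.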

Question 3: A simplistic approach would be to base the power law exponent on the data that fits it. This would essentially ignore the zero values and (potentially) some of the high degree tail of the distribution. On the other hand, it would be (I think) a useful description of the behavior of the bulk of the system. What are the risks here?

Thanks!

bpet · Aug 18, 2011

The gaps between the extreme values aren't really a problem for most estimation methods.
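
A rough way to see this: in a likelihood-based fit, degrees that never occur contribute nothing to the sum. The sketch below is only an illustration under my own assumptions: a power law on a finite set of allowed degrees, a made-up function name `log_likelihood`, and invented counts. Passing `support` explicitly also means a constraint such as "even degrees only" could be handled by normalizing over the allowed values, though that is a modelling choice and not an established convention.

```python
import math

def log_likelihood(counts, alpha, support):
    """Log-likelihood of p(k) proportional to k**-alpha on the given support.

    counts  : dict mapping degree -> number of vertices with that degree
    support : iterable of allowed degrees (assumed finite here)
    """
    # Normalizing constant summed over the allowed degrees only
    log_norm = math.log(sum(k ** -alpha for k in support))
    n = sum(counts.values())
    # Degrees with a count of zero add nothing to this sum
    weighted_logs = sum(c * math.log(k) for k, c in counts.items())
    return -alpha * weighted_logs - n * log_norm

# Invented example data: degrees 4 and 6..11 are never observed
sparse = {1: 60, 2: 15, 3: 7, 5: 2, 12: 1}
# The same data with the gaps written out as explicit zeros
padded = {k: sparse.get(k, 0) for k in range(1, 13)}

ll_sparse = log_likelihood(sparse, 2.5, range(1, 13))
ll_padded = log_likelihood(padded, 2.5, range(1, 13))
```

Both calls give the same value, so filling in or leaving out the empty degrees makes no difference to the fit.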

Also, as an alternative to the zeta distribution, there's the Zipf distribution, which is similar but has a finite maximum; that would solve the blowup problem. MLE could be used for the fit, but it tends to set the upper bound to the largest data point, so perhaps MVUEs could be used instead (though I don't know the details; look up the German Tank Problem for an example).
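
Here is a minimal sketch of what the MLE for the exponent could look like with a finite maximum. It assumes the upper bound `k_max` is fixed in advance and not estimated (for a simple graph with N vertices, N - 1 is a natural choice), which sidesteps the upper-bound issue above. The function name, the search interval, and the sample data are all made up for illustration.

```python
import math

def truncated_zipf_mle(degrees, k_max, lo=0.01, hi=6.0, tol=1e-6):
    """Estimate alpha for p(k) = k**-alpha / sum_{j=1..k_max} j**-alpha.

    degrees : list of observed degrees, each between 1 and k_max
    k_max   : assumed known upper bound on the degree
    """
    n = len(degrees)
    sum_log = sum(math.log(k) for k in degrees)
    log_k = [math.log(k) for k in range(1, k_max + 1)]

    def neg_log_lik(alpha):
        # The normalizing sum is finite for every alpha because k_max is finite
        norm = sum(math.exp(-alpha * lk) for lk in log_k)
        return alpha * sum_log + n * math.log(norm)

    # Golden-section search; the negative log-likelihood is convex in alpha
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if neg_log_lik(c) < neg_log_lik(d):
            b = d
        else:
            a = c
    return (a + b) / 2

# Invented degree sample: 60 vertices of degree 1, 15 of degree 2, and so on
sample = [1] * 60 + [2] * 15 + [3] * 7 + [5] * 2 + [12]
alpha_hat = truncated_zipf_mle(sample, k_max=12)
```

Nothing in this search requires the exponent to be above 2, or even above 1, since the normalization never diverges. The price is that the estimate depends on the choice of `k_max`.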
