- #1
Old Guy
I'm not a mathematician, but I want to understand how a mathematician would view this issue.
I'm working primarily with degree distributions for finite graphs, and when I make a log-log plot of the frequency distribution, the data points form a nice straight line (at least for low degree values). To be specific, the x-axis is the degree (number of edges per vertex) and the y-axis is the number of vertices that have that degree.
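For concreteness, here is a minimal sketch of the frequency distribution described above, assuming the graph is available as an undirected edge list. The function names and the toy graph are hypothetical, made up for illustration only. Note that degrees with a count of zero never appear in the result, which is exactly where the gaps on a log-log plot come from.

```python
from collections import Counter
import math

def degree_histogram(edges):
    """Map each degree to the number of vertices with that degree.

    Assumes an undirected edge list; isolated vertices (degree 0) do not
    appear in an edge list and are therefore not counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())

def loglog_points(hist):
    """Sorted (log10 degree, log10 count) pairs for the observed degrees only."""
    return [(math.log10(k), math.log10(n)) for k, n in sorted(hist.items())]

# Hypothetical toy graph: a star with four leaves plus one extra edge.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
hist = degree_histogram(edges)
points = loglog_points(hist)
```

In this toy graph no vertex has degree 3, so that degree is simply missing from `hist` rather than stored as a zero.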
Question 1: What is the convention for dealing with gaps in the data? This has two parts. First, it is obvious that there will be gaps at the high end of the degree scale. How is this dealt with when, for example, trying to find the power law exponent? Second, what about the case where there are gaps throughout the data, such as when every vertex is constrained to have an even degree?
Question 2: A. Clauset, C. Shalizi, and M. Newman, SIAM Rev. 51, 661 (2009) discuss calculating the exponent for power law distributions, and propose a maximum likelihood (MLE) method and a Kolmogorov-Smirnov (KS) test for goodness of fit. It seems to me that this won't work for graphs where the exponent is less than 2 because the zeta function blows up. Can anyone suggest an alternative? Or is there a mathematical basis for saying that these distributions MUST be something other than a power law?
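To make the MLE idea concrete, here is a rough sketch of the closed-form approximation to the discrete maximum likelihood estimate given in that paper, alpha ≈ 1 + n / Σ ln(k_i / (k_min - 1/2)), applied to the degrees at or above a chosen cutoff. This is not the full procedure (no numerical maximization over the zeta-normalized likelihood, no KS test, no selection of the cutoff), and as I understand it the paper recommends the approximation only when the cutoff is not too small. The function name, the sample degrees, and the cutoff of 6 are assumptions for illustration.

```python
import math

def approx_discrete_alpha(degrees, k_min):
    """Approximate discrete power-law MLE for the tail k >= k_min.

    Uses alpha ~ 1 + n / sum(ln(k / (k_min - 0.5))). Works on the raw list
    of vertex degrees, so empty histogram bins play no role at all.
    Returns (alpha, number of tail observations).
    """
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum, len(tail)

# Hypothetical degree sequence; values below k_min are ignored by the fit.
degrees = [1, 2, 3, 6, 6, 7, 8, 10, 15, 30]
alpha, n_tail = approx_discrete_alpha(degrees, k_min=6)
```

Because the estimate is computed from the individual degrees rather than from binned counts, gaps in the histogram do not enter the calculation.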
Question 3: A simplistic approach would be to base the power law exponent only on the part of the data that fits a straight line. This would essentially ignore the degrees with zero counts and (potentially) some of the high degree tail of the distribution. On the other hand, it would be (I think) a useful description of the behavior of the bulk of the system. What are the risks here?
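The simplistic approach in Question 3 could be sketched as an ordinary least-squares line through the log-log points, restricted to a chosen degree range and skipping degrees with no vertices. This is only an illustration of what is being asked about, not a recommended estimator; the function name, the range limits, and the toy histogram are hypothetical.

```python
import math

def loglog_slope(hist, k_lo, k_hi):
    """Least-squares slope and intercept of ln(count) against ln(degree).

    Only degrees in [k_lo, k_hi] with a nonzero count are used, so gaps
    and the sparse high-degree tail are silently dropped.
    """
    pts = [(math.log(k), math.log(n))
           for k, n in hist.items()
           if k_lo <= k <= k_hi and n > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points in the chosen range")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical histogram: counts follow 1024 * k**-2 at k = 1, 2, 4, 8,
# with gaps in between and one stray high-degree vertex at k = 50.
hist = {1: 1024, 2: 256, 4: 64, 8: 16, 50: 1}
slope, intercept = loglog_slope(hist, k_lo=1, k_hi=8)
exponent = -slope
```

The result depends on the choice of `k_lo` and `k_hi`; widening the range to include the stray point at degree 50 changes the fitted slope.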
Thanks!