I would like to know whether the output of my script for plotting a degree distribution is correct.
The script is below. The vector x holds the degrees of all my vertices: vertex 1 has degree 7, vertex 2 has degree 9, and so on.
x <- v2
x
[1] 7 9 8 5 6 2 8 9 7 5 2 4 6 9 2 6 10 8
summary(x)
library(igraph)
split.screen(c(1,2))
screen(1)
plot (tabulate(x), log = "xy", ylab = "Frequency (log scale)", xlab = "Degree (log scale)", main = "Log-log plot of degree distribution")
screen(2)
y <- (length(x) - rank(x, ties.method = "first"))/length(x)
plot(x, y, log = "xy", ylab = "Fraction with min. degree k (log scale)", xlab = "Degree (k) (log scale)", main = "Cumulative log-log plot of degree distribution")
close.screen(all = TRUE)
power.law.fit(x, xmin = 50)
My problem is that the log-log plot seems to be incorrect. For instance, degree 7 occurs 8 times overall, so shouldn't that point appear at (0.845, 0.903) on a log-log plot, i.e. (log10(7), log10(8)) as (x, y)?
Moreover, can somebody tell me how to fit a line (the power law on the log-log scale) to the plot in screen 2?
I'm not familiar with the igraph package, so I can't help you with that specific package. However, here is some code for plotting distributions on a log-log plot. First, some data:
set.seed(1)
x = ceiling(rlnorm(1000, 4))
Then we need to rearrange the data to get the inverse CDF (that is, the fraction of values greater than or equal to each x):
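##Count how often each distinct value occurs
occur = as.vector(table(x))
##Convert the counts to probabilities
p = occur/sum(occur)
##Reverse cumulative sum: P(X >= x) for each distinct value
y = rev(cumsum(rev(p)))
##Replace x by its distinct values (sorted ascending)
x = as.numeric(names(table(x)))
plot(x, y, log="xy", type="l")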
This plots the fraction of values greater than or equal to x against x, on log-log axes.
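As a rough illustration, the same steps can be applied to the degree vector from the question. This is only a sketch: the names `deg`, `k` and `ccdf` are introduced here for the example and are not part of the original script.

```r
## Degree vector copied from the question
deg = c(7, 9, 8, 5, 6, 2, 8, 9, 7, 5, 2, 4, 6, 9, 2, 6, 10, 8)

tab = table(deg)                 ## count of each distinct degree
k = as.numeric(names(tab))       ## distinct degrees, sorted ascending
p = as.vector(tab) / sum(tab)    ## fraction of vertices with degree exactly k
ccdf = rev(cumsum(rev(p)))       ## fraction of vertices with degree >= k

plot(k, ccdf, log = "xy", type = "b",
     xlab = "Degree k (log scale)",
     ylab = "Fraction with degree >= k (log scale)")
```

Because every value of `ccdf` is strictly positive, none of the points is lost when the y axis is put on a log scale.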
Regarding your fitting question, I think the discrepancy arises because igraph uses the MLE whereas you are doing simple linear regression (which is not recommended).
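To make the difference between the two approaches concrete, here is a small base-R sketch on simulated data. It assumes a continuous power-law tail with a known cut-off and uses the standard continuous-case maximum-likelihood formula; it is not necessarily what igraph does internally, and the values `alpha_true`, `xmin` and `n` are chosen purely for illustration.

```r
set.seed(42)
alpha_true = 2.5    ## assumed exponent for the simulation
xmin = 1            ## assumed lower cut-off
n = 10000

## Inverse-transform sampling from a continuous power law (Pareto tail)
u = runif(n)
s = xmin * (1 - u)^(-1 / (alpha_true - 1))

## Maximum-likelihood estimate of the exponent (continuous case)
alpha_mle = 1 + n / sum(log(s / xmin))

## Least-squares line through the log-log empirical CCDF, for comparison
s_sorted = sort(s)
ccdf = (n:1) / n                       ## P(X >= s_sorted[i])
fit = lm(log(ccdf) ~ log(s_sorted))
alpha_ls = 1 - unname(coef(fit)[2])    ## the CCDF slope is -(alpha - 1)

c(mle = alpha_mle, least_squares = alpha_ls)
```

Both numbers estimate the same exponent, but the regression treats the strongly dependent points of the CCDF as if they were independent observations, which is one reason it is usually discouraged for this purpose.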
As a bit of a plug, I've started work on a package for fitting and plotting power laws. Using this package, you get:
library(poweRlaw)
##Create a displ object
m = displ$new(x)
##Estimate the cut-off
estimate_xmin(m)
m$setXmin(105); m$setPars(2.644)
##Plot the data and the PL line
plot(m)
lines(m, col=2)
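Rather than typing the estimates in by hand, the result of estimate_xmin can be passed straight back to the model. The sketch below assumes the poweRlaw package is installed and regenerates the sample under a separate name, `raw` (a name introduced here), because `x` was overwritten with its distinct values in the plotting step above.

```r
if (requireNamespace("poweRlaw", quietly = TRUE)) {
  library(poweRlaw)

  set.seed(1)
  raw = ceiling(rlnorm(1000, 4))   ## hypothetical name for the raw sample

  ##Create a displ object from the raw sample
  m = displ$new(raw)

  ##Estimate the cut-off and the exponent, then store both in the model
  est = estimate_xmin(m)
  m$setXmin(est)

  ##Plot the data and the fitted power-law line
  plot(m)
  lines(m, col = 2)
}
```

The estimates printed by this sketch need not match the 105 and 2.644 quoted above, since it is fitted to the raw sample rather than to the distinct values.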