Correct way to standardize/scale/normalize multiple variables following power law distribution for use in linear combination

graph normalize linear-equation rescale power-law

Jacob Rigby · Apr 1, 2009

I'd like to combine a few metrics of nodes in a social network graph into a single value for rank ordering the nodes:

in_degree + betweenness_centrality = informal_power_index

The problem is that in_degree and betweenness_centrality are measured on different scales, say 0-15 vs 0-35000, and both follow a power-law distribution (at any rate, definitely not a normal distribution).
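To make the dominance concrete, here is a minimal Python sketch with made-up values for five hypothetical nodes named a through e. The raw sum simply reproduces the betweenness ordering, even though node `a` has the highest in-degree.

```python
# Hypothetical per-node metrics (made-up values, not from the question):
# in_degree spans roughly 0-15, betweenness roughly 0-35000.
nodes = ["a", "b", "c", "d", "e"]
in_degree = {"a": 15, "b": 2, "c": 7, "d": 1, "e": 0}
betweenness = {"a": 120.0, "b": 35000.0, "c": 900.0, "d": 4000.0, "e": 10.0}

# Naive linear combination on the raw scales.
raw_index = {n: in_degree[n] + betweenness[n] for n in nodes}

rank_by_raw = sorted(nodes, key=raw_index.get, reverse=True)
rank_by_betweenness = sorted(nodes, key=betweenness.get, reverse=True)
rank_by_in_degree = sorted(nodes, key=in_degree.get, reverse=True)

print(rank_by_raw)                         # ['b', 'd', 'c', 'a', 'e']
print(rank_by_raw == rank_by_betweenness)  # True: in_degree does not change the order here
print(rank_by_in_degree)                   # ['a', 'c', 'b', 'd', 'e']
```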

Is there a good way to rescale the variables so that one won't dominate the other in determining the informal_power_index?

Three obvious approaches are:

1. Standardizing the variables (subtract the mean and divide by the standard deviation). It seems this would squash the distribution too much, hiding the massive difference between a value in the long tail and one near the peak.
2. Rescaling the variables to the range [0, 1] by subtracting min(variable) and dividing by max(variable) - min(variable). This seems closer to fixing the problem since it won't change the shape of the distribution, but maybe it won't really address the issue? In particular, the means will still differ.
3. Equalizing the means by dividing each value by mean(variable). This won't equalize the spread of the two variables, but perhaps the mean values are more important for the comparison?
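The three approaches above can be sketched in plain Python as follows. The helper names and the sample values are hypothetical, the standard deviation used is the population one (`statistics.pstdev`), and none of the helpers guard against a constant variable, which would make a denominator zero.

```python
import statistics

# Hypothetical helpers, one per approach listed above.

def standardize(xs):
    """Approach 1: z-scores, (x - mean) / standard deviation."""
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs)  # population standard deviation
    return [(x - mu) / sd for x in xs]

def min_max(xs):
    """Approach 2: map onto [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def divide_by_mean(xs):
    """Approach 3: divide by the mean, so each variable ends up with mean 1."""
    mu = statistics.mean(xs)
    return [x / mu for x in xs]

# Made-up sample values for five nodes (same node order in both lists).
in_degree = [15, 2, 7, 1, 0]
betweenness = [120.0, 35000.0, 900.0, 4000.0, 10.0]

for name, rescale in [("z-score", standardize), ("min-max", min_max), ("mean", divide_by_mean)]:
    combined = [a + b for a, b in zip(rescale(in_degree), rescale(betweenness))]
    print(name, [round(c, 3) for c in combined])
```

All three transforms are linear, so they change the location and scale of each variable but not the shape of its distribution; the long tail of betweenness_centrality survives every one of them.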

Any other ideas?

Answer

You seem to have a strong sense of the underlying distributions. A natural rescaling is to replace each variate with its cumulative probability, i.e. the value of its CDF (the probability integral transform), which maps every variable onto [0, 1]. Or, if your model is incomplete, choose a transformation that approximately achieves that.

Failing that, here's a related approach: if you have enough univariate data to build a histogram of each variate, you could convert each one to a 10-point scale based on whether it falls in the 0-10th percentile, the 10-20th percentile, ..., or the 90-100th percentile. These transformed variates have, by construction, a uniform distribution on 1, 2, ..., 10 (approximately, if there are many tied values), and you can combine them however you wish.
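Both suggestions can be sketched in plain Python. Since no fitted distribution is given, `empirical_cdf` uses the empirical CDF (the fraction of observations at or below each value) in place of a model CDF, and `decile_score` assigns each value its 1 to 10 percentile band. Both helper names and all data values are hypothetical. Tied values, which are common for small integer in-degrees, share a band, so the scores are only approximately uniform: in the example, the two nodes with in-degree 1 both land in band 3 and band 2 goes unused.

```python
import bisect

def empirical_cdf(xs):
    """Replace each value with the fraction of observations <= it."""
    s = sorted(xs)
    n = len(s)
    return [bisect.bisect_right(s, x) / n for x in xs]

def decile_score(xs):
    """Score each value 1..10 by the 10%-wide percentile band it falls in."""
    s = sorted(xs)
    n = len(s)
    scores = []
    for x in xs:
        r = bisect.bisect_right(s, x)         # number of observations <= x (1..n)
        scores.append((10 * r + n - 1) // n)  # integer ceil(10 * r / n), in 1..10
    return scores

# Hypothetical metrics for ten nodes (illustrative values only).
in_degree = [0, 1, 1, 2, 3, 4, 6, 9, 12, 15]
betweenness = [0.0, 5.0, 40.0, 10.0, 300.0, 80.0, 1200.0, 35000.0, 2500.0, 9000.0]

print(empirical_cdf(in_degree))  # [0.1, 0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# Both transformed variables now share the same 1..10 scale,
# so an equal-weight sum no longer lets betweenness dominate.
index = [a + b for a, b in zip(decile_score(in_degree), decile_score(betweenness))]
print(index)  # [2, 5, 7, 7, 11, 11, 14, 18, 17, 19]
```

Summing the `empirical_cdf` values instead of the decile scores gives a finer-grained version of the same idea, with each variable contributing a value in (0, 1].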
