How to make a pandas crosstab with percentages?

Question 1

How to make a pandas crosstab with percentages?

python pandas crosstab

Brian Keegan · Jan 21, 2014 · Viewed 68.1k times · Source

Answer

Answer

From Pandas 0.18.1 onwards, there's a normalize option:

In [1]: pd.crosstab(df.A,df.B, normalize='index')
Out[1]:

B              A           B           C
A           
one     0.333333    0.333333    0.333333
three   0.333333    0.333333    0.333333
two     0.333333    0.333333    0.333333

Where you can normalise across either all, index (rows), or columns.

More details are available in the documentation.

Question 2

Given a dataframe with different categorical variables, how do I return a cross-tabulation with percentages instead of frequencies?

df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 6,
                   'B' : ['A', 'B', 'C'] * 8,
                   'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                   'D' : np.random.randn(24),
                   'E' : np.random.randn(24)})


pd.crosstab(df.A,df.B)


B       A    B    C
A               
one     4    4    4
three   2    2    2
two     2    2    2

Using the margins option in crosstab to compute row and column totals gets us close enough to think that it should be possible using an aggfunc or groupby, but my meager brain can't think it through.

B       A     B    C
A               
one     .33  .33  .33
three   .33  .33  .33
two     .33  .33  .33

How to make a pandas crosstab with percentages?

Answer

Related questions