I wrote a function to plot the distribution of values for variables in a pie chart, as shown below.
def draw_piecharts(df, variables, n_rows, n_cols):
    df[variables].value_counts.plot(kind='pie', layout=(n_rows,n_cols), subplots=True)
    plt.show()
def main():
    util.draw_piecharts(df, [ 'TARGET', 'BanruptcyInd'], 1,2)

if __name__ == "__main__":
    main()
Unfortunately, my function fails because DataFrames have no attribute value_counts(), and value_counts is the only way I know to get the distribution plotted in a pie chart.
Here's a sample of the variables being plotted:
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 1
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0
21 1
22 0
23 0
24 1
25 0
26 1
27 0
28 0
29 0
Name: TARGET, dtype: int64
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 0
26 0
27 0
28 0
29 0
While value_counts is a Series method, it is easily applied to each Series inside a DataFrame by using DataFrame.apply. In your case, for example:
df[variables].apply(pd.value_counts).plot(kind='pie', layout=(n_rows,n_cols), subplots=True)
(assuming pandas has been imported as pd).
For a complete example:
import pandas as pd
a = pd.DataFrame({'a': [1,0,0,0,1,1,0,0,1,0,1,1,1],'b': [1,0,0,0,1,1,0,0,1,0,0,0,0]})
a.apply(pd.value_counts).plot.pie(subplots=True)
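Putting this back into the function from the question, a minimal sketch might look like the following. It is not a definitive implementation: the helper name value_count_table and the small DataFrame are made up for illustration, and the lambda calling Series.value_counts is swapped in for pd.value_counts, on the assumption that you are running a recent pandas release where the top-level function is deprecated.

```python
import pandas as pd
import matplotlib.pyplot as plt


def value_count_table(df, variables):
    """Return one column of value counts per variable (hypothetical helper).

    The rows are the distinct values found across the selected columns.
    """
    # Series.value_counts is applied column by column. A value that never
    # occurs in a given column comes back as NaN, so replace it with 0.
    return df[variables].apply(lambda s: s.value_counts()).fillna(0)


def draw_piecharts(df, variables, n_rows, n_cols):
    """Draw one pie chart per variable on an n_rows x n_cols grid."""
    counts = value_count_table(df, variables)
    # subplots=True gives each column of the count table its own pie.
    return counts.plot(kind='pie', subplots=True, layout=(n_rows, n_cols))


# Made-up data standing in for the real columns from the question.
df = pd.DataFrame({'TARGET': [0, 0, 1, 0, 1],
                   'BanruptcyInd': [0, 0, 0, 0, 0]})
axes = draw_piecharts(df, ['TARGET', 'BanruptcyInd'], 1, 2)
```

Call plt.show() afterwards to display the figure. The fillna(0) step matters for a column like BanruptcyInd in your sample, where only one of the two values ever occurs.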