How to make a Pareto chart in Python?

user8864088 · Dec 2, 2018 · Viewed 35.7k times

The Pareto chart is a very popular diagram in Excel and Tableau. In Excel we can easily draw a Pareto diagram, but I found no easy way to draw one in Python.

I have a pandas dataframe like this:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({'country': [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0]})
df.index = ['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark']
print(df)

         country
USA        177.0
Canada       7.0
Russia       4.0
UK           2.0
Belgium      2.0
Mexico       1.0
Germany      1.0
Denmark      1.0

How can I draw the Pareto diagram, perhaps using pandas, seaborn, or matplotlib?

So far I have been able to make a bar chart sorted in descending order, but I still need to overlay a cumulative-sum line plot on top of the bars.

My attempt: df.sort_values(by='country',ascending=False).plot.bar()
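One way to finish this attempt without leaving pandas plotting is to keep the Axes returned by `.plot.bar()` and draw the cumulative percentage on a twin y-axis. This is a minimal sketch that assumes pandas' default bar layout, where bar `i` is centred at x = `i`, so a line plotted at the integer positions 0 to n-1 lines up with the bars. The names `s` and `cum_pct` are illustrative, and the `Agg` backend line is only there so the sketch runs headless; drop it and call `plt.show()` for an interactive window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'country': [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0]})
df.index = ['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark']

# The original attempt: bars sorted in descending order
s = df['country'].sort_values(ascending=False)
ax = s.plot.bar(color="C0")

# Assumption: pandas places the bars at x = 0, 1, ..., n-1, so plotting the
# line at the same integer positions on a twin axis aligns it with the bar centres
ax2 = ax.twinx()
cum_pct = s.cumsum() / s.sum() * 100
ax2.plot(np.arange(len(s)), cum_pct.to_numpy(), color="C1", marker="D")
ax2.set_ylim(0, 105)  # leave a little headroom above the final 100 % point
ax2.set_ylabel("cumulative %")
```

The twin axis shares the x-axis with the bars, so the category labels that pandas put on the bar axis also apply to the line.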

Required plot:

Answer

ImportanceOfBeingErnest · Dec 2, 2018

You would probably want to create a new column containing the cumulative percentage, then plot the original column as a bar chart and the new column as a line chart on twin axes (two y-axes sharing one x-axis).
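The new column is the running total of the sorted counts divided by the grand total. Writing the sorted counts as x_1 ≥ x_2 ≥ … ≥ x_n, and using the question's data (grand total 177 + 7 + 4 + 2 + 2 + 1 + 1 + 1 = 195), the first few values work out as:

```latex
c_k = 100 \cdot \frac{\sum_{i=1}^{k} x_i}{\sum_{i=1}^{n} x_i},
\qquad
c_1 = 100 \cdot \frac{177}{195} \approx 90.8\,\%,
\qquad
c_2 = 100 \cdot \frac{177 + 7}{195} \approx 94.4\,\%,
\qquad
c_n = 100\,\%.
```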

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

df = pd.DataFrame({'country': [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0]})
df.index = ['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark']
df = df.sort_values(by='country', ascending=False)
# Running total of the sorted counts as a percentage of the grand total
df["cumpercentage"] = df["country"].cumsum()/df["country"].sum()*100

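fig, ax = plt.subplots()
ax.bar(df.index, df["country"], color="C0")  # counts as bars on the left y-axis
ax2 = ax.twinx()  # second y-axis that shares the same x-axis
ax2.plot(df.index, df["cumpercentage"], color="C1", marker="D", ms=7)
ax2.yaxis.set_major_formatter(PercentFormatter())  # label right-axis ticks as percentages

# Colour each y-axis to match the data it describes
ax.tick_params(axis="y", colors="C0")
ax2.tick_params(axis="y", colors="C1")
plt.show()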
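If you need this chart more than once, the same bar-plus-twin-axis approach can be wrapped in a small function. This is a sketch only: `pareto_chart` is a hypothetical helper (not part of pandas or matplotlib), it assumes the input is a pandas Series of non-negative counts indexed by category, and the output filename `pareto.png` is just an example. The `Agg` backend line lets it run headless; remove it to use `plt.show()` instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import PercentFormatter


def pareto_chart(values, ax=None):
    """Hypothetical helper: draw a Pareto chart for a Series of counts.

    Returns the bar Axes, the twin (percentage) Axes and the cumulative
    percentages in sorted order.
    """
    s = values.sort_values(ascending=False)
    cum_pct = s.cumsum() / s.sum() * 100

    if ax is None:
        _, ax = plt.subplots()
    ax.bar(s.index, s.to_numpy(), color="C0")  # counts on the left axis
    ax2 = ax.twinx()  # right axis shares the category x-axis
    ax2.plot(s.index, cum_pct.to_numpy(), color="C1", marker="D", ms=7)
    ax2.set_ylim(0, 105)  # keep the 100 % point clear of the top edge
    ax2.yaxis.set_major_formatter(PercentFormatter())

    # Colour each y-axis like the artist it describes
    ax.tick_params(axis="y", colors="C0")
    ax2.tick_params(axis="y", colors="C1")
    return ax, ax2, cum_pct


counts = pd.Series(
    [177.0, 7.0, 4.0, 2.0, 2.0, 1.0, 1.0, 1.0],
    index=['USA', 'Canada', 'Russia', 'UK', 'Belgium', 'Mexico', 'Germany', 'Denmark'],
)
ax, ax2, cum_pct = pareto_chart(counts)
ax.figure.savefig("pareto.png")  # example output filename
```

Accepting an optional `ax` lets the chart be placed inside an existing figure, for example one subplot of a larger grid.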
