Boxplots in matplotlib: Markers and outliers

Amelio Vazquez-Reina · Jul 18, 2013

I have some questions about boxplots in matplotlib:

Question A. What do the markers that I highlighted below as Q1, Q2, and Q3 represent? I believe Q1 marks the maximum and Q3 marks outliers, but what is Q2?

[Image: the asker's boxplot, with markers highlighted as Q1, Q2, and Q3]

Question B. How does matplotlib identify outliers? (i.e., how does it know that they are not the true maximum and minimum values?)

Answer

Amelio Vazquez-Reina · Apr 27, 2014

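A picture is worth a thousand words. Note that the outliers (the + markers in your plot) are simply the points that fall outside the range [Q1 - 1.5 IQR, Q3 + 1.5 IQR] shown below, where Q1 and Q3 are the first and third quartiles (not the labels in your question) and IQR = Q3 - Q1 is the interquartile range. The whiskers stop at the most extreme data points inside that range, which is why they need not reach the true minimum and maximum.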

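[Image: boxplot of a normally distributed data set, showing the quartiles Q1 and Q3, the IQR, and the range from Q1 - 1.5 IQR to Q3 + 1.5 IQR]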
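As a minimal sketch of the rule above, the following plain-Python code computes the box, whiskers, and fliers for a sample. It assumes the linear-interpolation quartiles that numpy.percentile uses by default, which is the convention matplotlib's boxplot statistics rely on. The names percentile and boxplot_summary are hypothetical helpers for illustration, not matplotlib API; whis plays the same role as the whis argument of matplotlib's boxplot (default 1.5).

```python
import math


def percentile(sorted_data, p):
    """Percentile by linear interpolation between order statistics
    (the same convention as numpy.percentile's default method)."""
    n = len(sorted_data)
    pos = (n - 1) * p / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, n - 1)
    return sorted_data[lo] + (sorted_data[hi] - sorted_data[lo]) * (pos - lo)


def boxplot_summary(data, whis=1.5):
    """Hypothetical helper: the quantities a boxplot draws, using the
    [Q1 - whis*IQR, Q3 + whis*IQR] outlier rule."""
    xs = sorted(data)
    q1 = percentile(xs, 25)
    med = percentile(xs, 50)
    q3 = percentile(xs, 75)
    iqr = q3 - q1
    lo_fence = q1 - whis * iqr
    hi_fence = q3 + whis * iqr
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        # whiskers end at the most extreme data points still inside the range
        "whislo": min(inside),
        "whishi": max(inside),
        # everything beyond the range is drawn as an individual flier marker
        "fliers": [x for x in xs if x < lo_fence or x > hi_fence],
    }


summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(summary)
```

For the sample [1, 2, ..., 9, 40] this gives Q1 = 3.25, median 5.5, Q3 = 7.75, whiskers at 1 and 9, and 40 as the only flier: the true maximum (40) is not where the upper whisker ends.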

However, the picture is only an example for a normally distributed data set. It is important to understand that matplotlib does not first fit a normal distribution to your data and then derive the quartiles from the fitted parameters, as was done for the picture.
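For reference, the theoretical numbers behind a picture like this can be reproduced for an ideal normal population, which is exactly the assumption matplotlib does not make. This sketch uses Python's statistics.NormalDist (standard library, Python 3.8+); the function name normal_boxplot_limits is a hypothetical helper. For a standard normal, the quartiles sit at about ±0.674σ, the 1.5 IQR limits at about ±2.698σ, and roughly 0.7% of the population falls outside them.

```python
from statistics import NormalDist


def normal_boxplot_limits(mu=0.0, sigma=1.0, whis=1.5):
    """Hypothetical helper: theoretical boxplot limits for an ideal
    N(mu, sigma^2) population (not what matplotlib computes from data)."""
    dist = NormalDist(mu, sigma)
    q1 = dist.inv_cdf(0.25)
    q3 = dist.inv_cdf(0.75)
    iqr = q3 - q1
    lo_fence = q1 - whis * iqr
    hi_fence = q3 + whis * iqr
    # probability mass beyond the limits, i.e. the expected share of outliers
    p_outside = dist.cdf(lo_fence) + (1.0 - dist.cdf(hi_fence))
    return q1, q3, lo_fence, hi_fence, p_outside


q1, q3, lo, hi, p = normal_boxplot_limits()
print(f"Q1={q1:.4f} Q3={q3:.4f} limits=[{lo:.4f}, {hi:.4f}] outside={p:.2%}")
```

With a finite sample, the empirical quartiles scatter around these theoretical values, so the observed share of outliers will vary from sample to sample.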

matplotlib instead calculates the median and the quartiles directly from the data. Thus, your boxplot may look different depending on the distribution of your data and the size of the sample, e.g., asymmetric and with more or fewer outliers.
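To see the effect of computing the quartiles directly from the data, here is a sketch using statistics.quantiles with method="inclusive" (Python 3.8+), which interpolates linearly between order statistics in the same way as numpy.percentile's default. The helper describe_box is hypothetical, and the sample values are made up for illustration.

```python
from statistics import quantiles


def describe_box(data, whis=1.5):
    """Hypothetical helper: box halves, whiskers and outliers of a sample."""
    xs = sorted(data)
    # quartiles come straight from the sample, no distribution is fitted
    q1, med, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    return {
        "lower_half_of_box": med - q1,
        "upper_half_of_box": q3 - med,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "low_outliers": [x for x in xs if x < lo_fence],
        "high_outliers": [x for x in xs if x > hi_fence],
    }


# A right-skewed sample (made-up values for illustration)
skewed = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 15, 30]
print(describe_box(skewed))
```

For this sample the quartiles are 2, 3, and 7, so the upper half of the box is four times as long as the lower half, and both outliers (15 and 30) lie above the box while none lie below it.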