I have some questions about boxplots in matplotlib:
Question A. What do the markers that I highlighted below with Q1, Q2, and Q3 represent? I believe Q1 is the maximum and Q3 marks the outliers, but what is Q2?
Question B. How does matplotlib identify outliers (i.e. how does it know that they are not the true max and min values)?
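A picture is worth a thousand words. Note that the outliers (the + markers in your plot) are simply points outside the wide [(Q1 - 1.5 IQR), (Q3 + 1.5 IQR)] margin below, where Q1 and Q3 denote the first and third quartiles and IQR = Q3 - Q1 is the interquartile range.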
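To make the rule concrete, here is a minimal sketch that applies it by hand to a small made-up sample. It uses only the standard library; `statistics.quantiles(..., method="inclusive")` is assumed here as a stand-in for linear-interpolation percentiles, and the helper name `box_stats` and the sample values are purely illustrative, not part of matplotlib.

```python
import statistics

def box_stats(data, whis=1.5):
    """Quartiles, whisker ends, and outliers under the whis * IQR rule (illustrative helper)."""
    xs = sorted(data)
    # Quartiles are computed directly from the sorted data.
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence = q1 - whis * iqr
    hi_fence = q3 + whis * iqr
    # Whiskers end at the most extreme data points that are still inside the fences.
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    # Everything beyond the fences is flagged as an outlier.
    outliers = [x for x in xs if x < lo_fence or x > hi_fence]
    return {
        "q1": q1, "med": med, "q3": q3, "iqr": iqr,
        "lo_fence": lo_fence, "hi_fence": hi_fence,
        "whislo": inside[0], "whishi": inside[-1],
        "fliers": outliers,
    }

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up data with one extreme value
stats = box_stats(sample)
print(stats)
```

For this sample, Q1 = 3.25 and Q3 = 7.75, so the fences are -3.5 and 14.5. The value 30 lies above the upper fence and is flagged, while the upper whisker stops at 9, the largest value still inside the fence.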
However, the picture only illustrates the case of a normally distributed data set. It is important to understand that matplotlib does not first estimate a normal distribution and then calculate the quartiles from the estimated distribution parameters, as shown above.
Instead, the median and the quartiles are calculated directly from the data. Thus, your boxplot may look different depending on the distribution of your data and the size of the sample, e.g., asymmetric and with more or fewer outliers.
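The effect of the distribution can be checked with a small experiment. The sketch below counts the points beyond the 1.5 IQR fences for a roughly symmetric sample and for a right-skewed one. The helper `count_outliers`, the seed, and the sample sizes are assumptions chosen for illustration, and the quartiles again come from `statistics.quantiles` rather than from matplotlib itself.

```python
import random
import statistics

def count_outliers(data, whis=1.5):
    """Return (low, high) counts of points beyond the whis * IQR fences."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low = sum(x < q1 - whis * iqr for x in data)
    high = sum(x > q3 + whis * iqr for x in data)
    return low, high

rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]      # symmetric
skewed_sample = [rng.expovariate(1.0) for _ in range(1000)]     # right-skewed, non-negative

print("normal (low, high):", count_outliers(normal_sample))
print("skewed (low, high):", count_outliers(skewed_sample))
```

For the exponential sample the lower fence typically falls below zero, so all flagged points end up on the upper side, whereas the normal sample tends to produce a few on each side. If you want to inspect the numbers matplotlib itself would draw, `matplotlib.cbook.boxplot_stats` returns comparable fields (`q1`, `q3`, `whislo`, `whishi`, `fliers`).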