In a boxplot I've set the option outline=FALSE to remove the outliers.
Now I'd like to include points that show the mean in the boxplot. Obviously, the means calculated using mean() include the outliers.
How can the very same outliers be removed from a dataframe so that the calculated mean corresponds to the data shown in the boxplot?
I know how outliers can be removed, but which settings does the outline option of boxplot() use internally? Unfortunately, the manual does not clarify this.
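To answer the second part of your question, about how the outliers are chosen, it helps to recall how the boxplot is constructed. The box runs from the lower to the upper hinge (roughly the first and third quartiles), and each whisker extends to the most extreme data point that lies no more than 1.5 times the box length (the interquartile range) beyond the box. Anything farther out is treated as an outlier; the factor 1.5 is the default value of the range argument of boxplot().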
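As a minimal sketch of that rule, the fences can be computed by hand and compared with what boxplot.stats() reports as outliers (boxplot() calls this function internally). The vector x is made-up illustration data, not taken from the question:

```r
# Hypothetical example data with two obvious extreme values
x <- c(2, 3, 4, 5, 5, 6, 7, 8, 9, 30, -20)

# Hinges (lower and upper edge of the box) as boxplot() computes them
hinges <- fivenum(x)[c(2, 4)]

# Fences: 1.5 times the box length beyond each hinge
# (1.5 is the default of boxplot()'s range argument)
coef <- 1.5
fences <- hinges + c(-1, 1) * coef * diff(hinges)

# Points beyond the fences are the ones outline = FALSE hides
manual_out <- x[x < fences[1] | x > fences[2]]

# boxplot.stats() applies the same rule
stats_out <- boxplot.stats(x, coef = coef)$out
```

Both manual_out and stats_out should contain the same values here, which suggests that boxplot.stats(x)$out is a convenient way to get exactly the points the plot leaves out.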
If you assume that your data are normally distributed, the fraction of the data lying beyond each whisker is:
1-pnorm(qnorm(0.75)+1.5*2*qnorm(0.75))
which is about 0.0035. Therefore, counting both whiskers, a normal variable has about 0.7% "boxplot outliers".
But this is not a very reliable way to detect outliers; there are packages specifically designed for this.
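Coming back to the first part of the question, one way to get means that match the plot is to drop, per group, exactly the points boxplot.stats() flags and average the rest. This is only a sketch, assuming a hypothetical data frame df with a numeric column value and a grouping column group (all of these names, and the helper mean_no_out, are made up for illustration):

```r
# Hypothetical data: two groups, each with one extreme value
df <- data.frame(
  group = rep(c("a", "b"), each = 6),
  value = c(1, 2, 3, 4, 5, 50, 10, 11, 12, 13, 14, -40)
)

# Mean of a vector after dropping the points boxplot() would treat as outliers
mean_no_out <- function(v, coef = 1.5) {
  out <- boxplot.stats(v, coef = coef)$out
  mean(v[!v %in% out])
}

# One mean per group, in the same order as the boxes
means <- tapply(df$value, df$group, mean_no_out)

# Draw the boxplot without outliers and overlay the means
boxplot(value ~ group, data = df, outline = FALSE)
points(seq_along(means), means, pch = 19, col = "red")
```

If you change the range argument in the boxplot() call, pass the same value as coef so both use the same rule. Also note that boxplot.stats() works with the hinges from fivenum(), which can differ slightly from quantile(v, c(0.25, 0.75)), so a filter built on quantile() may not match the plot exactly.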