Looking at the documentation about violin plots in seaborn, I would like to know how to plot two series of violin plots on the same axes (point 1) and make them comparable (point 2).
For point 1, I would like to reproduce this plot for each sex:
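import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset("tips")  # seaborn's example dataset (downloaded and cached on first use)
fig, ax = plt.subplots()
sns.violinplot(x="day", y="total_bill", hue="smoker",
               data=tips, split=True, ax=ax)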
I can do it on two subplots:
fig = plt.figure(figsize=(10, 8))
# Top subplot: female customers only
ax = fig.add_subplot(211)
sns.violinplot(x="day", y="total_bill", hue="smoker",
               data=tips[tips.sex == "Female"], split=True, ax=ax)
# Bottom subplot: male customers only
ax = fig.add_subplot(212)
sns.violinplot(x="day", y="total_bill", hue="smoker",
               data=tips[tips.sex == "Male"], split=True, ax=ax)
I would like to plot the two violin plot series on the same matplotlib axes.
My second point is about the width of the violins. I do not clearly understand whether the violins are normalized, and if so, how. I assume the width is computed for each plot: in the example above, it is computed over the female data in the first subplot and over the male data in the second. So can I compare the densities directly? I suppose I can compare the shapes, but, for example, I cannot compare the number of male smokers on Thursday with the number of female smokers on Thursday? Is there a way to control how the violins are normalized?
For your first point, there is no built-in way of doing this in Seaborn. Check out my comments for possible workarounds, but in brief, I don't think the time spent would be worthwhile.
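One possible workaround, sketched here under the assumption that giving every (day, sex) pair its own x position is acceptable, is to build a combined categorical column so that a single violinplot call with hue="smoker" and split=True draws both sexes on the same axes. combine_categories is a hypothetical helper written for this illustration; it is not a seaborn or pandas function.

```python
import pandas as pd


def _levels(s):
    """Category order of a column: categorical order if any, else first appearance."""
    if isinstance(s.dtype, pd.CategoricalDtype):
        return list(s.cat.categories)
    return list(pd.unique(s))


def combine_categories(df, outer, inner, new_col="combined"):
    """Hypothetical helper: add a column "<outer> <inner>" to a copy of df.

    Returns (new_df, labels). labels lists every combination, grouped by the
    outer variable, and is also used as the category order of the new column.
    """
    out = df.copy()
    labels = [f"{o} {i}" for o in _levels(df[outer]) for i in _levels(df[inner])]
    combined = df[outer].astype(str) + " " + df[inner].astype(str)
    out[new_col] = pd.Categorical(combined, categories=labels)
    return out, labels
```

On the tips data this could be used as both, order = combine_categories(tips, "day", "sex", "day_sex") followed by sns.violinplot(x="day_sex", y="total_bill", hue="smoker", data=both, order=order, split=True, ax=ax). Because all violins then live in one plot, scale='count' with scale_hue=False normalizes them against a single maximum count, which also helps with point 2.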
For your second question, the scale and scale_hue parameters to violinplot control how the violin patch is normalized/scaled (seaborn 0.13 deprecated them in favor of density_norm and common_norm):
scale : {“area”, “count”, “width”}, optional
The method used to scale the width of each violin. If area, each violin will have the same area. If count, the width of the violins will be scaled by the number of observations in that bin. If width, each violin will have the same width.
scale_hue : bool, optional
When nesting violins using a hue variable, this parameter determines whether the scaling is computed within each level of the major grouping variable (scale_hue=True) or across all the violins on the plot (scale_hue=False).
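To make the quoted behavior concrete, here is a simplified NumPy sketch of what the three scale options and scale_hue do to the KDE curves before they are drawn. It assumes the densities have already been evaluated on a grid and that every violin has a positive peak; scale_violins is a hypothetical helper written from the documentation above, not seaborn's actual code.

```python
import numpy as np


def scale_violins(densities, counts, scale="area", scale_hue=True):
    """Hypothetical sketch: rescale KDE curves to peak heights in [0, 1].

    densities: shape (n_groups, n_hue, n_grid), KDE values on a grid
               (each curve assumed to integrate to 1 and have a positive peak)
    counts:    shape (n_groups, n_hue), number of observations per violin
    The result is meant to be multiplied by half the violin width.
    """
    d = np.asarray(densities, dtype=float)
    counts = np.asarray(counts, dtype=float)
    peaks = d.max(axis=2)  # peak density of each violin, shape (n_groups, n_hue)

    if scale == "width":
        # Every violin reaches the same maximum width
        return d / peaks[..., None]
    if scale == "area":
        # One shared divisor keeps relative heights, so equal areas stay equal;
        # the divisor is the largest peak within each x group (scale_hue=True)
        # or across the whole plot (scale_hue=False)
        ref = peaks.max(axis=1)[:, None, None] if scale_hue else peaks.max()
        return d / ref
    if scale == "count":
        # Give each curve a peak of 1, then shrink it by count / largest count,
        # taken within each x group or across the whole plot
        ref = counts.max(axis=1)[:, None] if scale_hue else counts.max()
        return d / peaks[..., None] * (counts / ref)[..., None]
    raise ValueError(f"unknown scale: {scale!r}")
```

With scale_hue=False the same divisor is shared by every violin on the axes, which is why the comparison only holds within one plot.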
The defaults are 'area' and True. You can see below how varying the scale parameter affects the violins. For example, if you want to compare between plots and represent the absolute counts truthfully, you could set scale='count' and scale_hue=False. Note that the violins would still be scaled to the maximum count within the plot (and not in the dataset), so in our case, the biggest violin for males would represent ~40 observations while the biggest violin for females would represent ~25 observations.
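To check which count a full-width violin stands for in each subplot, a small pandas sketch can group the data the same way the plots do. reference_counts and comparable_width_factor are hypothetical helpers. The second one assumes that, with scale='count' and scale_hue=False, the violin with the largest count in a plot is drawn at the full width value, so passing width=0.8 * factor (0.8 being the default width) to each subplot's violinplot call would put the subplots on one common count scale.

```python
import pandas as pd


def reference_counts(df, panel, x, hue):
    """Hypothetical helper: per panel, the size of the largest (x, hue) group.

    With scale='count' and scale_hue=False, this is the number of observations
    that maps to the maximum violin width in that panel's plot.
    """
    sizes = df.groupby([panel, x, hue], observed=True).size()
    return sizes.groupby(level=panel, observed=True).max()


def comparable_width_factor(df, panel, x, hue):
    """Hypothetical helper: per-panel multiplier for the violin width so that
    every panel is scaled against the largest group in the whole dataset."""
    ref = reference_counts(df, panel, x, hue)
    return ref / ref.max()
```

Calling reference_counts(tips, "sex", "day", "smoker") lets you confirm the per-sex maxima yourself before relying on them.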
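import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset("tips")
# Rows: raw counts, then one row per scale option; one column per sex
fig, axes = plt.subplots(4, 2, figsize=(10, 16), sharey='row')
axes_cols = (axes.flatten()[::2], axes.flatten()[1::2])  # left and right columns
for (sex_name, sex), axes_col in zip(tips.groupby('sex'), axes_cols):
    # Top row: number of observations per day and smoker status for this sex
    sns.countplot(x="day", hue="smoker", data=sex, ax=axes_col[0])
    axes_col[0].set_title(sex_name)  # label the column with the sex it shows
    for scale, ax in zip(['area', 'count', 'width'], axes_col[1:]):
        # scale_hue is left at its default here
        sns.violinplot(x="day", y="total_bill", hue="smoker",
                       data=sex, split=True, ax=ax, scale=scale)
        ax.set_title('scale = {}'.format(scale), y=0.95)
sns.despine()
fig.tight_layout()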