What to do with missing values when plotting with seaborn?

datavinci · Oct 2, 2015 · Viewed 26.8k times

I replaced the missing values with NaN using the following lambda function:

data = data.applymap(lambda x: np.nan if isinstance(x, basestring) and x.isspace() else x)

where data is the DataFrame I am working on.

Afterwards, I tried to plot one of its attributes, alcconsumption, using seaborn.distplot as follows:

seaborn.distplot(data['alcconsumption'],hist=True,bins=100)
plt.xlabel('AlcoholConsumption')
plt.ylabel('Frequency(normalized 0->1)')

It's giving me the following error:

AttributeError: max must be larger than min in range parameter.

Answer

jtlz2 · Sep 10, 2018

This is a known issue with matplotlib/pylab histograms!

See e.g. https://github.com/matplotlib/matplotlib/issues/6483

Various workarounds are suggested there. Two favourites (see for example https://stackoverflow.com/a/19090183/1021819) are:

import numpy as np
nbins=100
A=data['alcconsumption']
Anan=A[~np.isnan(A)] # Remove the NaNs

seaborn.distplot(Anan,hist=True,bins=nbins)
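One caveat: np.isnan only works on a numeric column. If the column still holds strings after the whitespace replacement (an assumption about the data, not something stated in the question), a pandas-based cleanup may be more robust. A minimal sketch, where clean_numeric and sample are hypothetical names used only for illustration:

```python
import numpy as np
import pandas as pd

def clean_numeric(series):
    """Coerce a column to numbers and drop whatever could not be parsed."""
    # errors="coerce" turns unparseable entries (e.g. whitespace) into NaN
    numeric = pd.to_numeric(series, errors="coerce")
    # dropna() removes the NaNs so the histogram range is well defined
    return numeric.dropna()

# Hypothetical stand-in for data['alcconsumption']
sample = pd.Series(["1.5", " ", 2.0, np.nan, "0.25"])
cleaned = clean_numeric(sample)
```

The cleaned series could then be passed to seaborn.distplot in place of Anan.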

Alternatively, specify the bin edges explicitly (here still using Anan to find the range):

Amin=Anan.min()
Amax=Anan.max()
# nbins bins need nbins+1 edges
seaborn.distplot(A,hist=True,bins=np.linspace(Amin,Amax,nbins+1))
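The range can also be computed without building a filtered copy first, using NumPy's NaN-aware reductions. A small sketch, assuming the values are already floats; nan_safe_edges and sample are hypothetical names, and np.histogram stands in for the plotting call so the result can be checked without a figure:

```python
import numpy as np

def nan_safe_edges(values, nbins):
    """Return nbins + 1 evenly spaced bin edges, ignoring NaNs."""
    values = np.asarray(values, dtype=float)
    # nanmin/nanmax skip NaNs, so the range stays finite
    lo, hi = np.nanmin(values), np.nanmax(values)
    return np.linspace(lo, hi, nbins + 1)

# Hypothetical stand-in for data['alcconsumption']
sample = np.array([0.0, 1.0, np.nan, 4.0, 2.5])
edges = nan_safe_edges(sample, 4)

# Count the non-NaN values per bin using the explicit edges
counts, _ = np.histogram(sample[~np.isnan(sample)], bins=edges)
```

The edges array could be passed as the bins argument in the call above.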