How to visualize a nonlinear relationship in a scatter plot

tobip · May 21, 2014 · Viewed 11.6k times

I want to visually explore the relationship between two variables. The functional form of the relationship is not visible in dense scatter plots like this:

scatter plot

How can I add a lowess smooth to the scatter plot in Python?

Or do you have any other suggestions to visually explore non-linear relationships?

I tried the following but it didn't work properly (drawing on an example from Michiel de Hoon):

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess
x = np.arange(0,10,0.01)
ytrue = np.exp(-x/5.0) + 2*np.sin(x/3.0)

# add random errors with a normal distribution                      
y = ytrue + np.random.normal(size=len(x))
plt.scatter(x,y,color='cyan')

# calculate a smooth curve through the scatter plot
ys = lowess(x, y)
_ = plt.plot(x,ys,'red',linewidth=1)

# draw the true values for comparison
plt.plot(x,ytrue,'green',linewidth=3)

lowess

The lowess smoother (red lines) is strange.

EDIT:

The following scatter plot matrix also includes lowess smoothers (taken from this question on Cross Validated):

Does someone have the code for such a graph?

Answer

mwaskom · May 21, 2014

You could also use seaborn:

import numpy as np
import seaborn as sns

x = np.arange(0, 10, 0.01)
ytrue = np.exp(-x / 5) + 2 * np.sin(x / 3)
y = ytrue + np.random.normal(size=len(x))

# pass x and y by keyword; recent seaborn releases no longer accept them positionally
sns.regplot(x=x, y=y, lowess=True)

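The statsmodels attempt in the question can likely be repaired with two changes, sketched below. First, `lowess` takes the response (`endog`) as its first argument and the predictor (`exog`) as its second, so the call should be `lowess(y, x)`. Second, with the default `return_sorted=True` it returns a two-column array of sorted x values and fitted values, not a single vector, so the columns have to be plotted against each other. The `frac=0.3` span, the fixed seed, and the marker size are illustrative assumptions, not values from the original post.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = np.arange(0, 10, 0.01)
ytrue = np.exp(-x / 5.0) + 2 * np.sin(x / 3.0)
y = ytrue + rng.normal(size=len(x))

# Response first, predictor second. frac is the fraction of points used
# for each local fit; 0.3 is an arbitrary choice for this example.
smoothed = lowess(y, x, frac=0.3)

# Column 0 holds the sorted x values, column 1 the fitted y values.
xs, ys = smoothed[:, 0], smoothed[:, 1]

fig, ax = plt.subplots()
ax.scatter(x, y, color="cyan", s=5)
ax.plot(xs, ys, "red", linewidth=1)      # lowess smooth
ax.plot(x, ytrue, "green", linewidth=3)  # true curve for comparison
```

A smaller `frac` follows the data more closely and gives a wigglier curve, while a larger one gives a smoother curve.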