Multiple data in scatter matrix

Manuel · Jan 15, 2014 · Viewed 8.2k times

Is it possible to plot multiple groups of data in a pandas.tools.plotting.scatter_matrix and assign a color to each group?

I'd like the scatter plots to show the data points of one group in, say, green and those of the other group in red, all in the very same scatter matrix. The same should apply to the density plots on the diagonal. I know that this is possible with matplotlib's scatter function, but that does not give me a scatter matrix.

The documentation of pandas is mum on that.

Answer

neone4373 · Dec 20, 2014

The short answer: determine the color of each dot in the scatter plot, roll those colors into an array, and pass that array as the color argument.

Example:

import pandas as pd
from pandas.plotting import scatter_matrix  # was pandas.tools.plotting before pandas 0.20
from sklearn import datasets

# Load the iris measurements and attach the class label (0, 1 or 2) to each row.
iris = datasets.load_iris()
iris_data = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])
iris_data["target"] = iris['target']

# One color per class label.
color_wheel = {0: "#0392cf",
               1: "#7bc043",
               2: "#ee4035"}
# Look up a color for every row, then pass the result as the color argument.
colors = iris_data["target"].map(color_wheel)
ax = scatter_matrix(iris_data, color=colors, alpha=0.6, figsize=(15, 15), diagonal='hist')

[Image: Iris Dataset]
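The question also asks about the diagonal. As far as I can tell, the color argument is only forwarded to the off-diagonal scatter calls, so the diagonal histograms stay a single color. One possible workaround is to redraw each diagonal panel with one histogram per group. The following is only a minimal sketch under stated assumptions: pandas 0.20 or later (where scatter_matrix lives in pandas.plotting), and a made-up two-group DataFrame. The names df, features, palette and the "group" column are hypothetical and not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical two-group data; column and group names are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 50)]),
    "y": np.concatenate([rng.normal(0, 1, 50), rng.normal(2, 1, 50)]),
    "group": ["a"] * 50 + ["b"] * 50,
})
palette = {"a": "green", "b": "red"}
features = ["x", "y"]

# One color per row, in row order, for the off-diagonal scatter plots.
point_colors = df["group"].map(palette).tolist()
axes = scatter_matrix(df[features], color=point_colors, alpha=0.6,
                      figsize=(6, 6), diagonal="hist")

# Replace the single-color bars on each diagonal panel with one
# semi-transparent histogram per group, using shared bin edges.
for i, col in enumerate(features):
    diag = axes[i, i]
    for patch in list(diag.patches):
        patch.remove()
    bins = np.histogram_bin_edges(df[col], bins=15)
    for name, sub in df.groupby("group"):
        diag.hist(sub[col], bins=bins, color=palette[name], alpha=0.5)

axes[0, 0].get_figure().savefig("scatter_matrix_groups.png")
```

Removing the existing bars rather than clearing the whole panel keeps the axis limits and labels that scatter_matrix already set up. Note that the y tick labels on the top-left panel are relabeled by pandas to show data values rather than counts, so the bar heights on the diagonal are best read as relative.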