Principal components analysis using pandas dataframe

user3362813 · Apr 25, 2014 · Viewed 39.4k times

How can I run a principal components analysis (PCA) on data in a pandas DataFrame?

Answer

Akavall · Apr 25, 2014

Most sklearn objects work with pandas DataFrames just fine. Would something like this work for you?

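import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Example data: 20 rows (samples) by 10 columns (features)
df = pd.DataFrame(data=np.random.normal(0, 1, (20, 10)))

# Fit a PCA that keeps the first 5 principal components
pca = PCA(n_components=5)
pca.fit(df)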
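
If what you want is the data projected onto the components, rather than just the fitted model, something along these lines might help. It is only a sketch: the fixed seed, the `scores_df` name, and the "PC1".."PC5" column labels are my own assumptions for illustration, not anything sklearn produces.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)  # fixed seed: an assumption, only for repeatability
df = pd.DataFrame(rng.normal(0, 1, (20, 10)))

pca = PCA(n_components=5)
scores = pca.fit_transform(df)  # plain NumPy array, one row per sample

# Wrap the result in a DataFrame so the original row index is kept.
# The "PC1".."PC5" labels are illustrative names, not sklearn output.
scores_df = pd.DataFrame(
    scores,
    index=df.index,
    columns=[f"PC{i + 1}" for i in range(pca.n_components_)],
)
print(scores_df.shape)
```

`fit_transform` returns an array by default, so rebuilding the DataFrame by hand is one way to keep the row labels lined up with the original data.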

You can access the components themselves with:

pca.components_
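
`pca.components_` is a plain array with one row per component and one column per original feature, so it can be handy to put the labels back. A rough sketch, assuming the same kind of random data as above (the seed, `components_df`, `explained`, and the "PC" labels are hypothetical names of mine):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)  # fixed seed: an assumption, only for repeatability
df = pd.DataFrame(rng.normal(0, 1, (20, 10)))

pca = PCA(n_components=5).fit(df)

# Illustrative row labels for the components (not produced by sklearn).
pc_labels = [f"PC{i + 1}" for i in range(pca.n_components_)]

# components_ has shape (n_components, n_features).
components_df = pd.DataFrame(pca.components_, index=pc_labels, columns=df.columns)

# Fraction of the total variance captured by each kept component.
explained = pd.Series(pca.explained_variance_ratio_, index=pc_labels)

print(components_df.shape)
print(explained)
```

Each row of `components_df` is a unit-length direction in the original feature space, and `explained` shows how much of the variance each one accounts for, which is usually what you look at when deciding how many components to keep.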