PCA inverse transform manually

Baron Yugovich · Sep 24, 2015

I am using scikit-learn. In my application the fitting is done offline, and online (on the fly) I can only use the resulting coefficients to calculate various objectives manually.

The transform is simple: it is just data * pca.components_, i.e. a simple dot product. However, I have no idea how to perform the inverse transform. Which field of the pca object contains the relevant coefficients for the inverse transform? How do I calculate the inverse transform?

Specifically, I am referring to the inverse_transform() method of the sklearn.decomposition.PCA class: how can I manually reproduce its functionality using the coefficients calculated by the PCA?

Answer

yangjie · Sep 24, 2015

1) transform is not data * pca.components_.

Firstly, * is not the dot product for NumPy arrays; it is element-wise multiplication. To perform a dot product, you need to use np.dot.
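A small sketch of the difference, using two made-up 2x2 arrays (a and b are illustrative names, not from the question):

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

elementwise = a * b             # multiplies matching entries
matrix_product = np.dot(a, b)   # row-by-column matrix product (same as a @ b)

print(elementwise)      # [[ 5. 12.] [21. 32.]]
print(matrix_product)   # [[19. 22.] [43. 50.]]
```

The two results only coincide in special cases, so the operator choice matters here.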

Secondly, the shape of pca.components_ is (n_components, n_features), while the shape of the data to transform is (n_samples, n_features), so you need to transpose pca.components_ to perform the dot product.

Moreover, the first step of transform is to subtract the mean, so if you do it manually, you also need to subtract the mean first.

The correct way to transform is

data_reduced = np.dot(data - pca.mean_, pca.components_.T)
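One way to convince yourself of this is to compare the manual result with pca.transform. The sketch below uses synthetic data (the array shape and the random seed are arbitrary choices) and assumes the default whiten=False:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data with non-zero column means, so the centering step matters
data = rng.normal(size=(200, 6)) + 5.0

pca = PCA(n_components=3)  # default whiten=False
pca.fit(data)

# Center with the fitted mean, then project onto the components
manual = np.dot(data - pca.mean_, pca.components_.T)

# Compare with scikit-learn's own result
matches = np.allclose(manual, pca.transform(data))
print(manual.shape, matches)
```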

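2) inverse_transform just reverses the steps of transform: project back with pca.components_, then add the mean. If fewer components than features were kept, the result is an approximation of the original data, not an exact copy.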

data_original = np.dot(data_reduced, pca.components_) + pca.mean_
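The same kind of check works for the inverse direction. This is a sketch on synthetic data (shape and seed are arbitrary), again assuming the default whiten=False. It also shows that the result is a reconstruction rather than an exact copy when components are dropped:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 6)) + 5.0  # synthetic data, 6 features

pca = PCA(n_components=3).fit(data)     # keep 3 of 6 components
data_reduced = pca.transform(data)

# Manual inverse: back-project, then add the mean that transform removed
manual_inverse = np.dot(data_reduced, pca.components_) + pca.mean_
same_as_sklearn = np.allclose(manual_inverse, pca.inverse_transform(data_reduced))

# Mean squared difference between the input and its reconstruction;
# it is non-zero because three components were discarded.
reconstruction_error = np.mean((data - manual_inverse) ** 2)
print(same_as_sklearn, reconstruction_error)
```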

If your data already has zero mean in each column, pca.mean_ is (numerically) zero and you can leave it out of the formulas above, for example:

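import numpy as np
from sklearn.decomposition import PCA

# Example data, centered so that every column has zero mean
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))
data -= data.mean(axis=0)

pca = PCA(n_components=3)
pca.fit(data)

data_reduced = np.dot(data, pca.components_.T)  # transform
data_original = np.dot(data_reduced, pca.components_)  # inverse_transform (approximate reconstruction)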
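For the offline/online setup described in the question, a minimal sketch could export pca.mean_ and pca.components_ once and then use only NumPy afterwards. The helper names manual_transform and manual_inverse_transform are hypothetical, the data is synthetic, and an in-memory buffer stands in for a file on disk:

```python
import io
import numpy as np
from sklearn.decomposition import PCA

# Offline: fit and export the two arrays needed later.
rng = np.random.default_rng(2)
train = rng.normal(size=(300, 5)) + 2.0  # synthetic stand-in for real data
pca = PCA(n_components=2).fit(train)     # default whiten=False

buffer = io.BytesIO()                    # stand-in for a file on disk
np.savez(buffer, mean=pca.mean_, components=pca.components_)
buffer.seek(0)

# Online: only NumPy and the stored arrays are used from here on.
coeffs = np.load(buffer)
mean, components = coeffs["mean"], coeffs["components"]

def manual_transform(x):
    """Project samples of shape (n_samples, n_features) onto the components."""
    return np.dot(x - mean, components.T)

def manual_inverse_transform(z):
    """Map reduced samples of shape (n_samples, n_components) back to feature space."""
    return np.dot(z, components) + mean

new_sample = rng.normal(size=(1, 5)) + 2.0
reduced = manual_transform(new_sample)
restored = manual_inverse_transform(reduced)
print(reduced.shape, restored.shape)
```

This sketch does not cover whiten=True. In that case transform additionally divides by the square root of pca.explained_variance_, and the inverse multiplies by it, so that array would need to be exported as well.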