I am using scikit-learn. The nature of my application is such that I do the fitting offline and can then only use the resulting coefficients online (on the fly) to manually calculate various objectives.
The transform is simple: it is just data * pca.components_, i.e. a simple dot product. However, I have no idea how to perform the inverse transform. Which field of the pca object contains the relevant coefficients for the inverse transform? How do I calculate the inverse transform?
Specifically, I am referring to the PCA.inverse_transform() method of sklearn.decomposition.PCA: how can I manually reproduce its functionality using the coefficients calculated by the PCA?
1) transform is not data * pca.components_.
Firstly, * is not the dot product for NumPy arrays; it is element-wise multiplication. To compute a dot product, you need to use np.dot.
Secondly, the shape of PCA.components_ is (n_components, n_features), while the shape of the data to transform is (n_samples, n_features), so you need to transpose PCA.components_ to perform the dot product.
Moreover, the first step of transform is to subtract the mean, so if you do it manually you also need to subtract pca.mean_ first.
The correct way to transform is:
data_reduced = np.dot(data - pca.mean_, pca.components_.T)
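As a sanity check, here is a minimal sketch that compares the manual formula with pca.transform. The random data is made up purely for illustration, and it assumes the default whiten=False (with whiten=True, scikit-learn additionally rescales the result, so this formula alone would not match).

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data with a nonzero mean, so that centering actually matters.
data = rng.normal(loc=10.0, scale=2.0, size=(100, 5))

pca = PCA(n_components=3)  # whiten=False by default
pca.fit(data)

# Element-wise * does not even run here: shapes (100, 5) and (3, 5)
# cannot be broadcast against each other.
try:
    data * pca.components_
    elementwise_failed = False
except ValueError:
    elementwise_failed = True

# Manual transform: center, then project onto the transposed components.
manual = np.dot(data - pca.mean_, pca.components_.T)
reference = pca.transform(data)

print(elementwise_failed)
print(manual.shape)
print(np.allclose(manual, reference))
```

If the last line reports agreement, the manual projection reproduces pca.transform up to floating-point error.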
2) inverse_transform is just the inverse process of transform:
data_original = np.dot(data_reduced, pca.components_) + pca.mean_
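The same kind of check works for the inverse direction. This is a sketch on made-up random data, again assuming whiten=False. Keep in mind that when n_components is smaller than n_features, the result is only an approximation of the original data, because the discarded components are lost.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=(100, 5))  # illustrative data

pca = PCA(n_components=3)
pca.fit(data)
data_reduced = pca.transform(data)

# Manual inverse: map back to feature space, then add the mean back.
restored_manual = np.dot(data_reduced, pca.components_) + pca.mean_
restored_reference = pca.inverse_transform(data_reduced)

print(np.allclose(restored_manual, restored_reference))
# Mean squared reconstruction error; nonzero because 2 of 5 components were dropped.
print(np.mean((data - restored_manual) ** 2))
```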
If each column of your data already has zero mean, you can omit pca.mean_ from both formulas above. For example:
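import numpy as np
from sklearn.decomposition import PCA

# Placeholder input: 100 samples, 5 features, centered so each column has zero mean
data = np.random.rand(100, 5)
data = data - data.mean(axis=0)

pca = PCA(n_components=3)
pca.fit(data)
data_reduced = np.dot(data, pca.components_.T)  # transform
data_original = np.dot(data_reduced, pca.components_)  # inverse_transform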
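For the offline/online split described in the question, one possible arrangement is to export pca.mean_ and pca.components_ after fitting and use only NumPy at run time. The sketch below is an illustration under assumptions: the helper names pca_transform and pca_inverse_transform are hypothetical, the in-memory buffer stands in for whatever storage you actually use, and whiten=False is assumed.

```python
import io
import numpy as np
from sklearn.decomposition import PCA

# --- offline: fit and export the coefficients ---
rng = np.random.default_rng(1)
train = rng.normal(loc=3.0, size=(200, 6))  # made-up training data
pca = PCA(n_components=2).fit(train)

buffer = io.BytesIO()  # stands in for a file on disk
np.savez(buffer, mean=pca.mean_, components=pca.components_)
buffer.seek(0)

# --- online: NumPy only, no scikit-learn needed from here on ---
params = np.load(buffer)
mean, components = params["mean"], params["components"]

def pca_transform(x, mean, components):
    """Hypothetical helper: project x (one sample or a batch) onto the components."""
    return np.dot(x - mean, components.T)

def pca_inverse_transform(z, mean, components):
    """Hypothetical helper: map reduced coordinates back to feature space."""
    return np.dot(z, components) + mean

sample = train[0]                                   # a single sample, shape (6,)
reduced = pca_transform(sample, mean, components)   # shape (2,)
approx = pca_inverse_transform(reduced, mean, components)  # shape (6,)
```

Both helpers accept either a single sample of shape (n_features,) or a batch of shape (n_samples, n_features), since np.dot and the mean subtraction handle both cases.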