How to use Robust PCA output as principal-component (eigen)vectors from traditional PCA

Question 1

How to use Robust PCA output as principal-component (eigen)vectors from traditional PCA

python pca

AstroBen · Nov 21, 2016 · Viewed 8.4k times · Source

Answer

Answer

The robust-pca code factors the data matrix D into two matrices, L and S which are "low-rank" and "sparse" matrices (see the paper for details). L is what's mostly constant between the various observations, while S is what varies. Figures 2 and 3 in the paper give a really nice example from a couple of security cameras, picking out the static background (L) and variability such as passing people (S).

If you just want the eigenvectors, treat the S as junk (the "large outliers" you're wanting to clip out) and do an eigenanalysis on the L matrix.

Here's an example using the robust-pca code:

    L, S = RPCA(data).fit()
    rcomp, revals, revecs = pca(L)
    print("Normalised robust eigenvalues: %s" % (revals/np.sum(revals),))

Here, the pca function is:

def pca(data, numComponents=None):
    """Principal Components Analysis

    From: http://stackoverflow.com/a/13224592/834250

    Parameters
    ----------
    data : `numpy.ndarray`
        numpy array of data to analyse
    numComponents : `int`
        number of principal components to use

    Returns
    -------
    comps : `numpy.ndarray`
        Principal components
    evals : `numpy.ndarray`
        Eigenvalues
    evecs : `numpy.ndarray`
        Eigenvectors
    """
    m, n = data.shape
    data -= data.mean(axis=0)
    R = np.cov(data, rowvar=False)
    # use 'eigh' rather than 'eig' since R is symmetric,
    # the performance gain is substantial
    evals, evecs = np.linalg.eigh(R)
    idx = np.argsort(evals)[::-1]
    evecs = evecs[:,idx]
    evals = evals[idx]
    if numComponents is not None:
        evecs = evecs[:, :numComponents]
    # carry out the transformation on the data using eigenvectors
    # and return the re-scaled data, eigenvalues, and eigenvectors
    return np.dot(evecs.T, data.T).T, evals, evecs

Question 2

I am using PCA to reduce the dimensionality of a N-dimensional dataset, but I want to build in robustness to large outliers, so I've been looking into Robust PCA codes.

For traditional PCA, I'm using python's sklearn.decomposition.PCA which nicely returns the principal components as vectors, onto which I can then project my data (to be clear, I've also coded my own versions using SVD so I know how the method works). I found a few pre-coded RPCA python codes out there (like https://github.com/dganguli/robust-pca and https://github.com/jkarnows/rpcaADMM).

The 1st code is based on the Candes et al. (2009) method, and returns low rank L and sparse S matrices for a dataset D. The 2nd code uses the ADMM method of matrix decomposition (Parikh, N., & Boyd, S. 2013) and returns X_1, X_2, X_3 matrices. I must admit, I'm having a very hard time figuring out how to connect these to the principal axes that are returned by a standard PCM algorithm. Can anyone provide any guidance?

Specifically, in one dataset X, I have a cloud of N 3-D points. I run it through PCA:

pca=sklean.decompose.PCA(n_components=3)
pca.fit(X)
comps=pca.components_

and these 3 components are 3-D vectors define the new basis onto which I project all my points. With Robust PCA, I get matrices L+S=X. Does one then run pca.fit(L)? I would have thought that RPCA would have given me back the eigenvectors but have internal steps to throw out outliers as part of building the covariance matrix or performing SVD.

Maybe what I think of as "Robust PCA" isn't how other people are using/coding it?

How to use Robust PCA output as principal-component (eigen)vectors from traditional PCA

Answer

Related questions