For example, I have 9 variables and 362 cases. I've made PCA calculation, and found out that first 3 PCA coordinates are enough for me.
Now, I have new point in my 9-dimensional structure, and I want to project it to principal component system coordinate. How to get its new coordinates?
%# here is data (362x9)
load SomeData
[W, Y] = pca(data, 'VariableWeights', 'variance', 'Centered', true);
%# orthonormal coefficient matrix
W = diag(std(data))\W;
% Getting mean and weights of data (for future data)
[data, mu, sigma] = zscore(data);
sigma(sigma==0) = 1;
%# New point in original 9dim system
%# For example, it is the first point of our input data
x = data(1,:);
x = bsxfun(@minus,x, mu);
x = bsxfun(@rdivide, x, sigma);
%# New coordinates as principal components
y0 = Y(1,:); %# point we should get in result
y = (W*x')'; %# our result
%# error
sum(abs(y0 - y)) %# 142 => they are not the same point
%# plot
figure()
plot(y0,'g'); hold on;
plot(y,'r');
How to get coordinates of a new point projected to new principal component basis?
Main fallacy was in operation that converts points to new basis:
y = (W*x')';
Wikipedia says:
The projected vectors are the columns of the matrix
Y = W*·Z,
where
Y is L×N, W is M×L, Z is M×N
,
but pca()
returns W
of size L×M
and Y
of size NxL
so, correct equation in Matlab is:
y = x*W
Below is the corrected code:
[W, Y] = pca(data, 'VariableWeights', 'variance', 'Centered', true);
W = diag(std(data))\W;
%# Getting mean and weights of data (for future data)
[~, mu, we] = zscore(data);
we(we==0) = 1;
%# New point in original 9dim system
%# For example, it is the first point of our input data
x = data(1,:);
x = bsxfun(@minus,x, mu);
x = bsxfun(@rdivide, x, we);
%# New coordinates as principal components
y = x*W;
y0 = Y(1,:);
sum(abs(y0 - y)) %# 4.1883e-14 ~= 0