Using the EM algorithm, I want to train a Gaussian Mixture model with four components on a given dataset. The set is three dimensional and contains 300 samples.
The problem is that after about 6 rounds of the EM algorithm, the covariance matrices sigma become close to singular according to matlab (rank(sigma) = 2
instead of 3). This in turn leads to undesired results like complex values evaluating the gaussian distribution gm(k,i)
.
Furthermore I used the log of the gaussian to account for underflow troubles - see E-step. I am not sure if this is correct and if I have to take the exp of the responsibilites p(w_k | x^(i), theta) somewhere else?
Can you tell me if my implementation of the EM algorithm is correct so far? And how to account for the problem with the close to singular covariance sigma?
Here is my implementation of the EM algorithm:
First I initialized the means and the covariance of the components using kmeans:
load('data1.mat');
X = Data'; % 300x3 data set
D = size(X,2); % dimension
N = size(X,1); % number of samples
K = 4; % number of Gaussian Mixture components
% Initialization
p = [0.2, 0.3, 0.2, 0.3]; % arbitrary pi
[idx,mu] = kmeans(X,K); % initial means of the components
% compute the covariance of the components
sigma = zeros(D,D,K);
for k = 1:K
sigma(:,:,k) = cov(X(idx==k,:));
end
For the E-step I am using the following formula to calculate the responsibilities.
w_k are the k gaussian components.
x^(i) is a single datapoint (sample)
theta stands for the parameters of the gaussian mixture model: mu, Sigma, pi.
Here is the corresponding code:
% variables for convergence
converged = 0;
prevLoglikelihood = Inf;
prevMu = mu;
prevSigma = sigma;
prevPi = p;
round = 0;
while (converged ~= 1)
round = round +1
gm = zeros(K,N); % gaussian component in the nominator
sumGM = zeros(N,1); % denominator of responsibilities
% E-step: Evaluate the responsibilities using the current parameters
% compute the nominator and denominator of the responsibilities
for k = 1:K
for i = 1:N
Xmu = X-mu;
% I am using log to prevent underflow of the gaussian distribution (exp("small value"))
logPdf = log(1/sqrt(det(sigma(:,:,k))*(2*pi)^D)) + (-0.5*Xmu*(sigma(:,:,k)\Xmu'));
gm(k,i) = log(p(k)) * logPdf;
sumGM(i) = sumGM(i) + gm(k,i);
end
end
% calculate responsibilities
res = zeros(K,N); % responsibilities
Nk = zeros(4,1);
for k = 1:K
for i = 1:N
% I tried to use the exp(gm(k,i)/sumGM(i)) to compute res but this leads to sum(pi) > 1.
res(k,i) = gm(k,i)/sumGM(i);
end
Nk(k) = sum(res(k,:));
end
Nk(k)
is computed using the formula given in the M-step and is used in the M-step to calculate the new probabilities p(k)
.
M-step
% M-step: Re-estimate the parameters using the current responsibilities
for k = 1:K
for i = 1:N
mu(k,:) = mu(k,:) + res(k,i).*X(k,:);
sigma(:,:,k) = sigma(:,:,k) + res(k,i).*(X(k,:)-mu(k,:))*(X(k,:)-mu(k,:))';
end
mu(k,:) = mu(k,:)./Nk(k);
sigma(:,:,k) = sigma(:,:,k)./Nk(k);
p(k) = Nk(k)/N;
end
Now in order to check for convergence the log-likelihood is computed using this formula:
% Evaluate the log-likelihood and check for convergence of either
% the parameters or the log-likelihood. If not converged, go to E-step.
loglikelihood = 0;
for i = 1:N
loglikelihood = loglikelihood + log(sum(gm(:,i)));
end
% Check for convergence of parameters
errorLoglikelihood = abs(loglikelihood-prevLoglikelihood);
if (errorLoglikelihood <= eps)
converged = 1;
end
errorMu = abs(mu(:)-prevMu(:));
errorSigma = abs(sigma(:)-prevSigma(:));
errorPi = abs(p(:)-prevPi(:));
if (all(errorMu <= eps) && all(errorSigma <= eps) && all(errorPi <= eps))
converged = 1;
end
prevLoglikelihood = loglikelihood;
prevMu = mu;
prevSigma = sigma;
prevPi = p;
end % while
Is there something wrong with my Matlab implementation of EM algorithm for Gaussian Mixture Models?
Previous troubles:
The problem is that I cannot check for convergence using the log-likelihood because it is -Inf
. This results from rounded zero values while evaluating the gaussian in the formula of the responsibilities (see E-step).
Can you tell me if my implementation of the EM algorithm is correct so far? And how to account for the problem with the rounded zero values?
Here is my implementation of the EM algorithm:
First I initialized the means and the covariance of the components using kmeans:
load('data1.mat');
X = Data'; % 300x3 data set
D = size(X,2); % dimension
N = size(X,1); % number of samples
K = 4; % number of Gaussian Mixture components
% Initialization
p = [0.2, 0.3, 0.2, 0.3]; % arbitrary pi
[idx,mu] = kmeans(X,K); % initial means of the components
% compute the covariance of the components
sigma = zeros(D,D,K);
for k = 1:K
sigma(:,:,k) = cov(X(idx==k,:));
end
For the E-step I am using the following formula to calculate the responsibilities
Here is the corresponding code:
% variables for convergence
converged = 0;
prevLoglikelihood = Inf;
prevMu = mu;
prevSigma = sigma;
prevPi = p;
round = 0;
while (converged ~= 1)
round = round +1
gm = zeros(K,N); % gaussian component in the nominator -
% some values evaluate to zero
sumGM = zeros(N,1); % denominator of responsibilities
% E-step: Evaluate the responsibilities using the current parameters
% compute the nominator and denominator of the responsibilities
for k = 1:K
for i = 1:N
% HERE values evalute to zero e.g. exp(-746.6228) = -Inf
gm(k,i) = p(k)/sqrt(det(sigma(:,:,k))*(2*pi)^D)*exp(-0.5*(X(i,:)-mu(k,:))*inv(sigma(:,:,k))*(X(i,:)-mu(k,:))');
sumGM(i) = sumGM(i) + gm(k,i);
end
end
% calculate responsibilities
res = zeros(K,N); % responsibilities
Nk = zeros(4,1);
for k = 1:K
for i = 1:N
res(k,i) = gm(k,i)/sumGM(i);
end
Nk(k) = sum(res(k,:));
end
Nk(k)
is computed using the formula given in the M-step.
M-step
% M-step: Re-estimate the parameters using the current responsibilities
mu = zeros(K,3);
for k = 1:K
for i = 1:N
mu(k,:) = mu(k,:) + res(k,i).*X(k,:);
sigma(:,:,k) = sigma(:,:,k) + res(k,i).*(X(k,:)-mu(k,:))*(X(k,:)-mu(k,:))';
end
mu(k,:) = mu(k,:)./Nk(k);
sigma(:,:,k) = sigma(:,:,k)./Nk(k);
p(k) = Nk(k)/N;
end
Now in order to check for convergence the log-likelihood is computed using this formula:
% Evaluate the log-likelihood and check for convergence of either
% the parameters or the log-likelihood. If not converged, go to E-step.
loglikelihood = 0;
for i = 1:N
loglikelihood = loglikelihood + log(sum(gm(:,i)));
end
% Check for convergence of parameters
errorLoglikelihood = abs(loglikelihood-prevLoglikelihood);
if (errorLoglikelihood <= eps)
converged = 1;
end
errorMu = abs(mu(:)-prevMu(:));
errorSigma = abs(sigma(:)-prevSigma(:));
errorPi = abs(p(:)-prevPi(:));
if (all(errorMu <= eps) && all(errorSigma <= eps) && all(errorPi <= eps))
converged = 1;
end
prevLoglikelihood = loglikelihood;
prevMu = mu;
prevSigma = sigma;
prevPi = p;
end % while
After the first round the loglikelihood
is around 700.
In the second round it is -Inf
because some gm(k,i)
values in the E-step are zero. Therefore the log is obviously negative infinity.
The zero values also lead to sumGM
equals to zero and therefore leading to all NaN entries inside the mu
and sigma
matrices.
How can I solve this problem? Can you tell me if there is something wrong with my implementation? Could it be solved by increasing Matlab's precision somehow?
EDIT:
I added a scaling for the exp() term in gm(k,i). Unfortunately this doesn't help much. After some more rounds I still get the underflow problem.
scale = zeros(N,D);
for i = 1:N
max = 0;
for k = 1:K
Xmu = X(i,:)-mu(k,:);
if (norm(scale(i,:) - Xmu) > max)
max = norm(scale(i,:) - Xmu);
scale(i,:) = Xmu;
end
end
end
for k = 1:K
for i = 1:N
Xmu = X(i,:)-mu(k,:);
% scale gm to prevent underflow
Xmu = Xmu - scale(i,:);
gm(k,i) = p(k)/sqrt(det(sigma(:,:,k))*(2*pi)^D)*exp(-0.5*Xmu*inv(sigma(:,:,k))*Xmu');
sumGM(i) = sumGM(i) + gm(k,i);
end
end
Further I noticed that kmeans initializes the means completely different compared to the following rounds where the means are computed in the M-step.
kmeans:
mu = 13.500000000000000 0.026602138870044 0.062415945993735
88.500000000000000 -0.009869960132085 -0.075177888210981
39.000000000000000 -0.042569305020309 0.043402772876513
64.000000000000000 -0.024519281362918 -0.012586980924762
after M-step:
round = 2
mu = 1.000000000000000 0.077230046948357 0.024498886414254
2.000000000000000 0.074260118474053 0.026484346404660
3.000000000000002 0.070944016105476 0.029043085983168
4.000000000000000 0.067613431480832 0.031641849205021
In the next rounds mu
doesn't change at all. It stays the same as in round 2.
I guess this is caused because of the underflow in gm(k,i)? Either my implementation of the scaling is incorrect or the whole implementation of the algorithm is wrong somewhere :(
EDIT 2
After four rounds I got NaN
values and looked into gm in more detail. Looking at only one sample (and without the 0.5 factor), gm
becomes zero in all components. Put in matlab gm(:,1) = [0 0 0 0]
. This in turn leads to sumGM equal to zero -> NaN because I divided by zero. I have given more details in
round = 1
mu = 62.0000 -0.0298 -0.0078
37.0000 -0.0396 0.0481
87.5000 -0.0083 -0.0728
12.5000 0.0303 0.0614
gm(:,1) = [11.7488, 0.0000, 0.0000, 0.0000]
round = 2
mu = 1.0000 0.0772 0.0245
2.0000 0.0743 0.0265
3.0000 0.0709 0.0290
4.0000 0.0676 0.0316
gm(:,1) = [0.0000, 0.0000, 0.0000, 0.3128]
round = 3
mu = 1.0000 0.0772 0.0245
2.0000 0.0743 0.0265
3.0000 0.0709 0.0290
4.0000 0.0676 0.0316
gm(:,1) = [0, 0, 0.0000, 0.2867]
round = 4
mu = 1.0000 0.0772 0.0245
NaN NaN NaN
3.0000 0.0709 0.0290
4.0000 0.0676 0.0316
gm(:,1) = 1.0e-105 * [0, NaN, 0, 0.5375]
First of all the means doesn't seem to change and are completely different compared to the initializaiton of kmeans.
And every sample (not just for the first one like here) corresponds only to one gaussian component according to the output of gm(:,1)
. Shouldn't the sample be "partially distributed" among every gaussian component?
EDIT3:
So I guess the problem with mu not changing was the first line in the M-step: mu = zeros(K,3);
.
To account the underflow problem I am currently trying to use the log of the gaussian:
function logPdf = logmvnpdf(X, mu, sigma, D)
Xmu = X-mu;
logPdf = log(1/sqrt(det(sigma)*(2*pi)^D)) + (-0.5*Xmu*inv(sigma)*Xmu');
end
The new problem is the covariance matrix sigma. Matlab claims: Warning: Matrix is close to singular or badly scaled. Results may be inaccurate.
After 6 rounds I get imaginary values for gm (gaussian distribution).
The updated E-Step looks like this now:
gm = zeros(K,N); % gaussian component in the nominator
sumGM = zeros(N,1); % denominator of responsibilities
for k = 1:K
for i = 1:N
%gm(k,i) = p(k)/sqrt(det(sigma(:,:,k))*(2*pi)^D)*exp(-0.5*Xmu*inv(sigma(:,:,k))*Xmu');
%gm(k,i) = p(k)*mvnpdf(X(i,:),mu(k,:),sigma(:,:,k));
gm(k,i) = log(p(k)) + logmvnpdf(X(i,:), mu(k,:), sigma(:,:,k), D);
sumGM(i) = sumGM(i) + gm(k,i);
end
end
It looks like you should be able to use a scale factor scale(i) to bring gm(k, i) into a representable range, because if you multiply gm(k, i) by scale(i) this will end up multiplying sumGM(i) as well, and be cancelled away when you work out res(k, i) = gm(k, i) / sumGM(i).
I would make scale(i) = 1 / max_k(exp(-0.5*(X(i,:)-mu(k,:))) in theory, and actually calculate it without doing the exponentiation, so you end up dealing with its log, max_k(-0.5*(X(i,:)-mu(k,:)) - this gives you a common term you can add to each -0.5*(X(i,:)-mu(k,:) before using exp() and will keep at least the maximum within a representable range - anything that still underflows to zero after this correction you don't care about anyway, because it is vanishingly small compared to the other contributions.