I have a cell array in which each cell contains a sequence of values as a row vector. The sequences contain some missing values represented by NaN.
I would like to replace all NaNs using some sort of interpolation method. How can I do this in MATLAB? I am also open to other suggestions on how to deal with these missing values.
Consider this sample data to illustrate the problem:
seq = {randn(1,10); randn(1,7); randn(1,8)};
for i=1:numel(seq)
    %# simulate some missing values
    ind = rand( size(seq{i}) ) < 0.2;
    seq{i}(ind) = nan;
end
The resulting sequences:
seq{1}
ans =
-0.50782 -0.32058 NaN -3.0292 -0.45701 1.2424 NaN 0.93373 NaN -0.029006
seq{2}
ans =
0.18245 -1.5651 -0.084539 1.6039 0.098348 0.041374 -0.73417
seq{3}
ans =
NaN NaN 0.42639 -0.37281 -0.23645 2.0237 -2.2584 2.2294
Edit:
Based on the responses, I think there has been some confusion: obviously I'm not working with random data; the code shown above is simply an example of how the data is structured.
The actual data is some form of processed signals. The problem is that my analysis fails if the sequences contain missing values, hence the need for filtering/interpolation. (I already considered using the mean of each sequence to fill the blanks, but I am hoping for something more powerful.)
Well, if you're working with time-series data, then you can use MATLAB's built-in interpolation function, interp1.
Something like this should work for your situation, but you'll need to tailor it a little; i.e., if you don't have equally spaced sampling, you'll need to modify the times line.
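nseq = cell(size(seq));
for i = 1:numel(seq)
    times = 1:length(seq{i});    % sample positions; assumes equal spacing
    mask = ~isnan(seq{i});       % true where a value is present
    nseq{i} = seq{i};
    % fill only the missing entries, interpolating from the known ones
    nseq{i}(~mask) = interp1(times(mask), seq{i}(mask), times(~mask));
end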
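If the samples are not equally spaced, the times vector can hold the actual sample times instead of 1:length(...). A minimal sketch, assuming you have the timestamps available; the values of t and x here are made up for illustration:

```matlab
% Hypothetical, unequally spaced sample times and matching values.
t = [0 0.5 2 3 7];
x = [1 NaN 5 NaN 15];

known = ~isnan(x);           % true where a value is present
filled = x;
% Interpolate linearly at the real timestamps of the missing samples.
filled(~known) = interp1(t(known), x(known), t(~known));
```

Here the gap at t = 0.5 is filled from the segment between t = 0 and t = 2, so the spacing of the timestamps, not the index position, determines the interpolated value.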
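You'll need to play around with the options of interp1 to figure out which ones work best for your situation. Note that with the default linear method, NaNs at the very start or end of a sequence (as in seq{3}) lie outside the range of known points and stay NaN unless you also request extrapolation.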
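A minimal sketch of one such option, assuming linear extrapolation is acceptable for your signals: passing 'linear' and 'extrap' to interp1 so that NaNs before the first or after the last known sample also get filled. The sample data and the name filled are made up for illustration.

```matlab
% Hypothetical sample data: NaNs at the start, in the middle, and at the end.
seq = {[NaN NaN 1 2 NaN 4], [5 NaN 7 NaN NaN]};

filled = cell(size(seq));
for i = 1:numel(seq)
    x = seq{i};
    t = 1:numel(x);              % assumes equally spaced samples
    known = ~isnan(x);           % true where a value is present
    filled{i} = x;
    if nnz(known) >= 2           % interp1 needs at least two sample points
        % Linear inside the known range; 'extrap' extends the end segments
        % to cover leading and trailing NaNs.
        filled{i}(~known) = interp1(t(known), x(known), t(~known), ...
                                    'linear', 'extrap');
    end
end
```

Extrapolated values can drift far from the data when many samples are missing at an edge, so it is worth inspecting the filled sequences before relying on them.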