MATLAB: Using interpolation to replace missing values (NaN)

Dave · Sep 2, 2010 · Viewed 35.2k times

I have a cell array in which each cell contains a sequence of values as a row vector. The sequences contain some missing values represented by NaN.

I would like to replace all NaNs using some sort of interpolation method. How can I do this in MATLAB? I am also open to other suggestions on how to deal with these missing values.

Consider this sample data to illustrate the problem:

seq = {randn(1,10); randn(1,7); randn(1,8)};
for i=1:numel(seq)
    %# simulate some missing values
    ind = rand( size(seq{i}) ) < 0.2;
    seq{i}(ind) = nan;
end

The resulting sequences:

seq{1}
ans =
     -0.50782     -0.32058          NaN      -3.0292     -0.45701       1.2424          NaN      0.93373          NaN    -0.029006
seq{2}
ans =
      0.18245      -1.5651    -0.084539       1.6039     0.098348     0.041374     -0.73417
seq{3}
ans =
          NaN          NaN      0.42639     -0.37281     -0.23645       2.0237      -2.2584       2.2294

Edit:

Based on the responses, I think there has been some confusion: I am obviously not working with random data; the code shown above is simply an example of how the data is structured.

The actual data is some form of processed signals. The problem is that my analysis fails if the sequences contain missing values, hence the need for filtering/interpolation (I already considered using the mean of each sequence to fill the blanks, but I am hoping for something more powerful).

Answer

JudoWill · Sep 2, 2010

Well, if you're working with time-series data then you can use MATLAB's built-in interpolation function, interp1.

Something like this should work for your situation, but you'll need to tailor it a little; i.e., if your samples are not equally spaced, you'll need to modify the times line.

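nseq = cell(size(seq));
for i = 1:numel(seq)
    times = 1:length(seq{i});    %# sample positions (assumes equal spacing)
    mask = ~isnan(seq{i});       %# true where a value is present
    nseq{i} = seq{i};
    %# fill the gaps by linear interpolation between the known samples;
    %# NaNs before the first or after the last known sample remain NaN
    nseq{i}(~mask) = interp1(times(mask), seq{i}(mask), times(~mask));
end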
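
For unequally spaced samples, the index vector can be swapped for the real sample times. Here is a minimal sketch of that change; the timestamp vector t and the values in x are made up for illustration and are not part of the question's data:

```matlab
%# Hypothetical, unevenly spaced sample times and values (illustration only)
t = [0 1 2.5 3 5 8];
x = [1 2 NaN 4 NaN 9];

mask = ~isnan(x);      %# true where a value is present
xi = x;                %# copy, then fill only the gaps
%# interpolate at the times of the missing samples (linear by default)
xi(~mask) = interp1(t(mask), x(mask), t(~mask));
%# xi is now [1 2 3.5 4 6 9]
```

The only difference from the loop above is that the real sample times take the place of 1:length(seq{i}), so each gap is weighted by its actual distance to the neighbouring samples.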

You'll need to play around with the options of interp1 to figure out which ones work best for your situation.
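
To give an idea of what those options change, here is a small sketch on a made-up sequence (not the question's data) with one leading gap and one interior gap. It compares the default call with two of the method/extrapolation combinations that interp1 accepts; which one suits real signals is something you would have to judge from your data:

```matlab
%# Made-up sequence with a leading gap and an interior gap (illustration only)
x = [NaN 3 5 NaN NaN 11];
t = 1:numel(x);
mask = ~isnan(x);

%# Default (linear, no extrapolation): the leading NaN is left in place
a = x;  a(~mask) = interp1(t(mask), x(mask), t(~mask));
%# a is [NaN 3 5 7 9 11]

%# Linear interpolation plus linear extrapolation beyond the known samples
b = x;  b(~mask) = interp1(t(mask), x(mask), t(~mask), 'linear', 'extrap');
%# b is [1 3 5 7 9 11]

%# Nearest-neighbour fill: each gap takes the value of the closest known sample
c = x;  c(~mask) = interp1(t(mask), x(mask), t(~mask), 'nearest', 'extrap');
%# c is [3 3 5 5 11 11]
```

Note that interp1 needs at least two known samples, so a sequence that is almost entirely NaN has to be handled separately (for example by skipping it).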