I have been struggling to interpolate the data for "empty" pixels in my 2D matrix. I have a basic (but not deep) understanding of interpolation techniques such as Inverse Distance Weighting, Kriging, and bicubic interpolation, but I don't know exactly where to start, either in formulating the problem or in the Python implementation.
The problem definition: I have an MxN matrix (regular grid) in which each pixel represents a certain measurement value (the figure below; the data used in this figure is here). I want to interpolate the data for the "question mark" areas (white space, which also consists of equally sized but empty pixels) using the existing data shown as blue pixels.
My question:
1) How can I interpolate this data? Could anyone give me a simple example (e.g. a 3x3 matrix) to understand it clearly?
2) Could anyone guide me through the steps towards a solution in a Python environment?
3) How can I compare the interpolation techniques in terms of accuracy using Python?
4) Do you think it is a good idea to use different interpolation methods depending on the density of the data?
I would appreciate your answers and suggestions.
What counts as a sensible solution largely depends on what questions you're trying to answer with the interpolated pixels. Caveat emptor: extrapolating over missing data can lead to very misleading answers!
Radial Basis Function Interpolation / Kernel Smoothing
In terms of practical solutions available in Python, one way to fill those pixels in would be to use Scipy's implementation of Radial Basis Function interpolation (see here) which is intended for the smoothing/interpolation of scattered data.
Given your matrix M and underlying 1D coordinate arrays r and c (such that M.shape == (r.size, c.size)), where missing entries of M are set to nan, this seems to work fairly well with a linear RBF kernel as follows:
import numpy as np
import scipy.interpolate as interpolate

# Load the measurements; missing pixels are recorded as 'nan' in the file.
with open('measurement.txt') as fh:
    M = np.vstack([[float(x) for x in line.split()] for line in fh])

# Normalised coordinates for every row and column of the grid.
r = np.linspace(0, 1, M.shape[0])
c = np.linspace(0, 1, M.shape[1])
rr, cc = np.meshgrid(r, c, indexing='ij')  # 'ij' keeps shapes aligned with M

# Fit the RBF on the known pixels only, then evaluate over the full grid.
vals = ~np.isnan(M)
f = interpolate.Rbf(rr[vals], cc[vals], M[vals], function='linear')
interpolated = f(rr, cc)
Applied to the data you've linked to above, this yields a reasonable-looking interpolation, although it does highlight how unfavourable the ratio of missing samples to real data is.
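As for the simple example requested in question 1, here is a minimal sketch on a 3x3 matrix with made-up values, where only the centre pixel is missing:

import numpy as np
import scipy.interpolate as interpolate

# Hypothetical 3x3 example: eight known pixels, centre pixel missing.
toy = np.array([[1., 2., 3.],
                [4., np.nan, 6.],
                [7., 8., 9.]])
rows = np.linspace(0, 1, 3)
cols = np.linspace(0, 1, 3)
rr3, cc3 = np.meshgrid(rows, cols, indexing='ij')
known = ~np.isnan(toy)

# Fit on the eight known pixels and evaluate at the missing centre.
f3 = interpolate.Rbf(rr3[known], cc3[known], toy[known], function='linear')
print(f3(0.5, 0.5))  # close to 5.0, by symmetry of the surrounding values

The same three steps (mask the nans, fit on the known pixels, evaluate wherever you like) carry over unchanged to the full-size matrix.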
Gaussian Process Regression / Kriging
Kriging interpolation is available via the Gaussian process regression implementation in the scikit-learn library (older releases exposed a GaussianProcess class based on the DACE Kriging toolbox for Matlab; modern releases provide GaussianProcessRegressor instead). This could be invoked as follows:
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# An RBF kernel plus a small noise term (the analogue of the old 'nugget').
kernel = 1.0 * RBF(length_scale=0.1) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel)

# Fit on the known pixels, then predict over the whole grid.
gp.fit(X=np.column_stack([rr[vals], cc[vals]]), y=M[vals])
rr_cc_as_cols = np.column_stack([rr.ravel(), cc.ravel()])
interpolated = gp.predict(rr_cc_as_cols).reshape(M.shape)
This yields a very similar interpolation to the Radial Basis Function example above. In both cases there are a lot of parameters to explore, and the choice of these largely hinges on the assumptions you can make about the data. (One advantage of the linear kernel used in the RBF example above is that it has no free parameters.)
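Regarding question 3 (comparing techniques for accuracy), a simple approach is hold-out validation: hide a random subset of the known pixels, interpolate without them, and measure how well they are reconstructed. Here is a minimal sketch reusing rr, cc, M and vals from above; the 20% split and the RMSE metric are arbitrary choices:

import numpy as np
import scipy.interpolate as interpolate

rng = np.random.default_rng(0)

# Split the known pixels into a training set and a hidden test set.
known_r, known_c, known_v = rr[vals], cc[vals], M[vals]
test = rng.random(known_v.size) < 0.2

# Fit on the training pixels only and predict the hidden ones.
f_cv = interpolate.Rbf(known_r[~test], known_c[~test], known_v[~test],
                       function='linear')
pred = f_cv(known_r[test], known_c[test])
print('hold-out RMSE:', np.sqrt(np.mean((pred - known_v[test]) ** 2)))

Repeating this for each candidate interpolator gives directly comparable error numbers, and (touching on question 4) the scores could also inform which method to prefer in densely versus sparsely sampled regions.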
Inpainting
As a final aside, an entirely visually motivated solution would be to use OpenCV's inpainting functionality, although this assumes 8-bit arrays (0-255) and does not have a straightforward mathematical interpretation.
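Here is a minimal sketch of that route, assuming OpenCV's Python bindings (cv2) are installed; the inpainting radius of 3 pixels is an arbitrary choice:

import numpy as np
import cv2

# Rescale the data to 0-255 and build a mask marking the missing pixels.
lo, hi = np.nanmin(M), np.nanmax(M)
img = np.where(np.isnan(M), 0, (M - lo) / (hi - lo) * 255).astype(np.uint8)
mask = np.isnan(M).astype(np.uint8)

# Inpaint the masked pixels, then map the result back to the data's range.
filled = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)
interpolated = filled.astype(float) / 255 * (hi - lo) + lo

Note that the round trip through uint8 quantises the data to 256 levels, which is usually acceptable for visualisation but not for further numerical analysis.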