I was inspired by this answer by @James to see how griddata
and map_coordinates
might be used. In the examples below I'm showing 2D data, but my interest is in 3D. I noticed that griddata
only provides splines for 1D and 2D, and is limited to linear interpolation for 3D and higher (probably for very good reasons). However, map_coordinates seems to be fine with 3D using higher order (smoother than piece-wise linear) interpolation.
My primary question: if I have random, unstructured data (where I can not use map_coordinates) in 3D, is there some way to get smoother than piece-wise linear interpolation within the NumPy SciPy universe, or at least nearby?
My secondary question: is spline for 3D not available in griddata
because it is difficult or tedious to implement, or is there a fundamental difficulty?
The images and horrible python below show my current understanding of how griddata and map_coordinates can or can't be used. Interpolation is done along the thick black line.
STRUCTURED DATA:
UNSTRUCTURED DATA:
Horrible python:
import numpy as np
import matplotlib.pyplot as plt
def g(x, y):
return np.exp(-((x-1.0)**2 + (y-1.0)**2))
def findit(x, X): # or could use some 1D interpolation
fraction = (x - X[0]) / (X[-1]-X[0])
return fraction * float(X.shape[0]-1)
nth, nr = 12, 11
theta_min, theta_max = 0.2, 1.3
r_min, r_max = 0.7, 2.0
theta = np.linspace(theta_min, theta_max, nth)
r = np.linspace(r_min, r_max, nr)
R, TH = np.meshgrid(r, theta)
Xp, Yp = R*np.cos(TH), R*np.sin(TH)
array = g(Xp, Yp)
x, y = np.linspace(0.0, 2.0, 200), np.linspace(0.0, 2.0, 200)
X, Y = np.meshgrid(x, y)
blob = g(X, Y)
xtest = np.linspace(0.25, 1.75, 40)
ytest = np.zeros_like(xtest) + 0.75
rtest = np.sqrt(xtest**2 + ytest**2)
thetatest = np.arctan2(xtest, ytest)
ir = findit(rtest, r)
it = findit(thetatest, theta)
plt.figure()
plt.subplot(2,1,1)
plt.scatter(100.0*Xp.flatten(), 100.0*Yp.flatten())
plt.plot(100.0*xtest, 100.0*ytest, '-k', linewidth=3)
plt.hold
plt.imshow(blob, origin='lower', cmap='gray')
plt.text(5, 5, "don't use jet!", color='white')
exact = g(xtest, ytest)
import scipy.ndimage.interpolation as spndint
ndint0 = spndint.map_coordinates(array, [it, ir], order=0)
ndint1 = spndint.map_coordinates(array, [it, ir], order=1)
ndint2 = spndint.map_coordinates(array, [it, ir], order=2)
import scipy.interpolate as spint
points = np.vstack((Xp.flatten(), Yp.flatten())).T # could use np.array(zip(...))
grid_x = xtest
grid_y = np.array([0.75])
g0 = spint.griddata(points, array.flatten(), (grid_x, grid_y), method='nearest')
g1 = spint.griddata(points, array.flatten(), (grid_x, grid_y), method='linear')
g2 = spint.griddata(points, array.flatten(), (grid_x, grid_y), method='cubic')
plt.subplot(4,2,5)
plt.plot(exact, 'or')
#plt.plot(ndint0)
plt.plot(ndint1)
plt.plot(ndint2)
plt.title("map_coordinates")
plt.subplot(4,2,6)
plt.plot(exact, 'or')
#plt.plot(g0)
plt.plot(g1)
plt.plot(g2)
plt.title("griddata")
plt.subplot(4,2,7)
#plt.plot(ndint0 - exact)
plt.plot(ndint1 - exact)
plt.plot(ndint2 - exact)
plt.title("error map_coordinates")
plt.subplot(4,2,8)
#plt.plot(g0 - exact)
plt.plot(g1 - exact)
plt.plot(g2 - exact)
plt.title("error griddata")
plt.show()
seed_points_rand = 2.0 * np.random.random((400, 2))
rr = np.sqrt((seed_points_rand**2).sum(axis=-1))
thth = np.arctan2(seed_points_rand[...,1], seed_points_rand[...,0])
isinside = (rr>r_min) * (rr<r_max) * (thth>theta_min) * (thth<theta_max)
points_rand = seed_points_rand[isinside]
Xprand, Yprand = points_rand.T # unpack
array_rand = g(Xprand, Yprand)
grid_x = xtest
grid_y = np.array([0.75])
plt.figure()
plt.subplot(2,1,1)
plt.scatter(100.0*Xprand.flatten(), 100.0*Yprand.flatten())
plt.plot(100.0*xtest, 100.0*ytest, '-k', linewidth=3)
plt.hold
plt.imshow(blob, origin='lower', cmap='gray')
plt.text(5, 5, "don't use jet!", color='white')
g0rand = spint.griddata(points_rand, array_rand.flatten(), (grid_x, grid_y), method='nearest')
g1rand = spint.griddata(points_rand, array_rand.flatten(), (grid_x, grid_y), method='linear')
g2rand = spint.griddata(points_rand, array_rand.flatten(), (grid_x, grid_y), method='cubic')
plt.subplot(4,2,6)
plt.plot(exact, 'or')
#plt.plot(g0rand)
plt.plot(g1rand)
plt.plot(g2rand)
plt.title("griddata")
plt.subplot(4,2,8)
#plt.plot(g0rand - exact)
plt.plot(g1rand - exact)
plt.plot(g2rand - exact)
plt.title("error griddata")
plt.show()
Good question! (and nice plots!)
For unstructured data, you'll want to switch back to functions meant for unstructured data. griddata
is one option, but uses triangulation with linear interpolation in between. This leads to "hard" edges at triangle boundaries.
Splines are radial basis functions. In scipy terms, you want scipy.interpolate.Rbf
. I'd recommend using function="linear"
or function="thin_plate"
over cubic splines, but cubic is available as well. (Cubic splines will exacerbate problems with "overshooting" compared to linear or thin-plate splines.)
One caveat is that this particular implementation of radial basis functions will always use all points in your dataset. This is the most accurate and smooth approach, but it scales poorly as the number of input observation points increases. There are several ways around this, but things will get more complex. I'll leave that for another question.
At any rate, here's a simplified example. We'll generate random data and then interpolate it at points that are on a regular grid. (Note that the input is not on a regular grid, and the interpolated points don't need to be either.)
import numpy as np
import scipy.interpolate
import matplotlib.pyplot as plt
np.random.seed(1977)
x, y, z = np.random.random((3, 10))
interp = scipy.interpolate.Rbf(x, y, z, function='thin_plate')
yi, xi = np.mgrid[0:1:100j, 0:1:100j]
zi = interp(xi, yi)
plt.plot(x, y, 'ko')
plt.imshow(zi, extent=[0, 1, 1, 0], cmap='gist_earth')
plt.colorbar()
plt.show()
I chose "thin_plate"
as the type of spline. Our input observations points range from 0 to 1 (they're created by np.random.random
). Notice that our interpolated values go slightly above 1 and well below zero. This is "overshooting".
Linear splines will completely avoid overshooting, but you'll wind up with "bullseye" patterns (nowhere near as severe as with IDW methods, though). For example, here's the exact same data interpolated with a linear radial basis function. Notice that our interpolated values never go above 1 or below 0:
Higher order splines will make trends in the data more continuous but will overshoot more. The default "multiquadric"
is fairly similar to a thin-plate spline, but will make things a bit more continuous and overshoot a bit worse:
However, as you go to even higher order splines such as "cubic"
(third order):
and "quintic"
(fifth order)
You can really wind up with unreasonable results as soon as you move even slightly beyond your input data.
At any rate, here's a simple example to compare different radial basis functions on random data:
import numpy as np
import scipy.interpolate
import matplotlib.pyplot as plt
np.random.seed(1977)
x, y, z = np.random.random((3, 10))
yi, xi = np.mgrid[0:1:100j, 0:1:100j]
interp_types = ['multiquadric', 'inverse', 'gaussian', 'linear', 'cubic',
'quintic', 'thin_plate']
for kind in interp_types:
interp = scipy.interpolate.Rbf(x, y, z, function=kind)
zi = interp(xi, yi)
fig, ax = plt.subplots()
ax.plot(x, y, 'ko')
im = ax.imshow(zi, extent=[0, 1, 1, 0], cmap='gist_earth')
fig.colorbar(im)
ax.set(title=kind)
fig.savefig(kind + '.png', dpi=80)
plt.show()