How can I interpolate station data with Kriging in Python?

lanadaquenada picture lanadaquenada · Jul 18, 2017 · Viewed 15.6k times · Source

Browsing the web I've found that some tools to use Kriging in Python are pyKriging and Gaussian Process Regression. However, I couldn't make any of them to work. The first one doesn't work for me (can't even import it):

import pyKriging

  File "~/python3.6/site-packages/pyKriging/krige.py", line 142
    except Exception, err: 
                    ^
  SyntaxError: invalid syntax

and the second one I don't understand how to use it. I couldn't find a simple working example (this rroowwllaanndd answer for instance is great but sadly the data is no longer available to download)

So my question is, how could I interpolate my data using Kriging? I have several station data saved in numpy arrays that look like this:

2000      1         1         5.0
2000      1         2         3.4
2000      1         3         0.2

and the columns are Year - Month - Day - Precipitation. I have several of these data arrays (st1, st2, st3), and another array which contains the ID of each station and the coordinates at which each station is located (stid, so station 1 is located in longitude 15.6865, latitude 62.6420, and so on).

import numpy as np
st1 = np.array([[2000,1,1,5.0],[2000,1,2,3.4],[2000,1,3,0.2]])
st2 = np.array([[2000,1,1,8.2],[2000,1,2,2.5],[2000,1,3,0.0]])
st3 = np.array([[2000,1,1,np.nan],[2000,1,2,4.5],[2000,1,3,1.2]])

stid = np.array([[1,15.6865,62.6420],[2,15.7325,62.1254],[3,16.1035,61.1449]])

What I need is an array per day (or a 3D array) which contains the data of all stations interpolated with Kriging in a grid like this for each day:

y = np.arange(61,63,0.125)
x = np.arange(14,17,0.125)
X,Y = np.meshgrid(x,y)

Any help is appreciated.

Answer

Pop picture Pop · Jul 28, 2017

It is good to know to find interesting documentation, packages, etc. that kriging is often called "Gaussian Process Regression".

In python, a good implementation with many examples is the one of the well-known machine learning package scikit-learn. It is based on the well-known DACE matlab implementation.

Documentation on the Gaussian Process Regression implementation can be found on this page and in the links therein. You can find 5 tutorials at the bottom of this page: enter link description here. A list of available kernels can be found here.

With the data you provided, you just have to do the following to fit a simple model with the kernel of your choice:

import sklearn
gp = sklearn.gaussian_process.GaussianProcessRegressor(kernel=your_chosen_kernel)
gp.fit(X, y)