Pandas interpolate within a groupby

R. W. · May 5, 2016 · Viewed 9.1k times

I've got a dataframe with the following information:

    filename    val1    val2
t                   
1   file1.csv   5       10
2   file1.csv   NaN     NaN
3   file1.csv   15      20
6   file2.csv   NaN     NaN
7   file2.csv   10      20
8   file2.csv   12      15

I would like to interpolate the values in the dataframe based on the indices, but only within each file group.

To interpolate, I would normally do

df = df.interpolate(method="index")

And to group, I do

grouped = df.groupby("filename")
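Interpolating the whole frame at once is not enough, because nothing stops the gap at t = 6 from being bridged across the file boundary. The minimal sketch below rebuilds the example frame from the table above and interpolates only the numeric columns, without grouping. Passing only `val1` and `val2` to `interpolate` is an assumption made here to keep the string column out of the calculation.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (index named "t").
df = pd.DataFrame(
    {
        "filename": ["file1.csv"] * 3 + ["file2.csv"] * 3,
        "val1": [5, np.nan, 15, np.nan, 10, 12],
        "val2": [10, np.nan, 20, np.nan, 20, 15],
    },
    index=pd.Index([1, 2, 3, 6, 7, 8], name="t"),
)

# Interpolate the numeric columns over the whole frame, ignoring "filename".
ungrouped = df[["val1", "val2"]].interpolate(method="index")

# t = 6 is now filled from t = 3 (file1.csv) and t = 7 (file2.csv):
# val1 = 15 + (6 - 3) / (7 - 3) * (10 - 15) = 11.25
print(ungrouped.loc[6])
```

The t = 6 row gets values even though it is the first row of file2.csv, which is exactly the cross-file leakage the grouping is meant to prevent.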

I would like the interpolated dataframe to look like this:

    filename    val1    val2
t                   
1   file1.csv   5       10
2   file1.csv   10      15
3   file1.csv   15      20
6   file2.csv   NaN     NaN
7   file2.csv   10      20
8   file2.csv   12      15

The NaNs remain at t = 6 because they are the first rows of the file2.csv group, so there is no earlier value in that group to interpolate from.
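Keeping that leading NaN is pandas' default behaviour: `interpolate` uses `limit_direction="forward"`, which leaves NaNs that have no earlier valid value untouched. A small sketch on just the file2.csv slice of `val1` (the variable names are illustrative) contrasts this with `limit_direction="both"`:

```python
import numpy as np
import pandas as pd

# The file2.csv slice of val1: its first value (t = 6) is missing.
file2_val1 = pd.Series([np.nan, 10.0, 12.0], index=pd.Index([6, 7, 8], name="t"))

# Default limit_direction="forward": a NaN with no earlier valid value stays NaN.
forward_only = file2_val1.interpolate(method="index")

# limit_direction="both" also fills the leading NaN, copying the nearest
# valid value (10.0) rather than extrapolating a trend.
both_ways = file2_val1.interpolate(method="index", limit_direction="both")

print(forward_only)
print(both_ways)
```

So the desired output needs no extra options; the default already matches it once the interpolation is restricted to each group.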

I suspect I need to use "apply", but haven't been able to figure out exactly how...

grouped.apply(interp1d)
...
TypeError: __init__() takes at least 3 arguments (2 given)

Any help would be appreciated.

Answer

Alexander · May 5, 2016
>>> df.groupby('filename').apply(lambda group: group.interpolate(method='index'))
    filename  val1  val2
t                       
1  file1.csv     5    10
2  file1.csv    10    15
3  file1.csv    15    20
6  file2.csv   NaN   NaN
7  file2.csv    10    20
8  file2.csv    12    15
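On recent pandas releases (2.x), the one-liner above may not return this exact frame. `apply` now prepends the group keys to the index for transform-like results unless `group_keys=False` is passed. Sending the `filename` column through `apply` and `interpolate` may also emit deprecation warnings. A hedged alternative, assuming only `val1` and `val2` need filling, selects those columns and uses `transform`. `transform` returns rows aligned with the original index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "filename": ["file1.csv"] * 3 + ["file2.csv"] * 3,
        "val1": [5, np.nan, 15, np.nan, 10, 12],
        "val2": [10, np.nan, 20, np.nan, 20, 15],
    },
    index=pd.Index([1, 2, 3, 6, 7, 8], name="t"),
)

value_cols = ["val1", "val2"]  # numeric columns to fill (assumed)
result = df.copy()

# Interpolate on the "t" index separately inside each file; transform keeps
# the original row order and index, so the result can be assigned back.
result[value_cols] = (
    df.groupby("filename")[value_cols]
    .transform(lambda s: s.interpolate(method="index"))
)
print(result)
```

Because `filename` is never passed to `interpolate`, the string column is left exactly as it was, and the leading NaNs of file2.csv stay in place as in the answer's output.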