KeyError: Not in index, using keys generated from a Pandas DataFrame on itself

Jason · Jun 11, 2014 · Viewed 7.1k times

I have two columns in a Pandas DataFrame that has datetime as its index. The two columns contain data measuring the same parameter, but neither column is complete (some rows have no data at all, some rows have data in both columns, and others have data only in column 'a' or 'b').

I've written the following code to find gaps in the columns, generate a list of the index dates where these gaps appear, and use this list to find and replace the missing data. However, I get a KeyError: Not in index on line 3, which I don't understand because the keys I'm using to index came from the DataFrame itself. Could somebody explain why this is happening and what I can do to fix it? Here's the code:

def merge_func(df):
    null_index = df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
    df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
    notnull_index = df[(df['DOC_mg/L'].isnull() == True) & (df['TOC_mg/L'].isnull() == False)].index
    df['DOC_mg/L'][notnull_index] = df[notnull_index]['TOC_mg/L']

    df.insert(len(df.columns), 'Mean_mg/L', 0.0)
    df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
    return df

merge_func(sve)

Answer

EdChum · Jun 11, 2014

Whenever you assign to a subset of a DataFrame, you should use .loc:

df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']

The error in your original code is the ordering of the subscript values for the index lookup:

df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']

will produce an index error, because df[null_index] looks up the values in null_index among the column labels rather than the row labels. On a toy dataset I get: IndexError: indices are out-of-bounds
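To make the column-versus-row lookup concrete, here is a minimal sketch on a made-up three-row frame (the data and the `lookup_failed` flag are assumptions for illustration, not the asker's data). The exact exception type and message depend on the pandas version, so the sketch catches both KeyError and IndexError:

```python
import pandas as pd

# Hypothetical toy frame standing in for the asker's data.
df = pd.DataFrame(
    {"DOC_mg/L": [1.0, None, 3.0], "TOC_mg/L": [None, 2.0, 3.5]},
    index=pd.date_range("2014-01-01", periods=3, freq="D"),
)

# Row labels where DOC has a value but TOC is missing.
null_index = df[df["DOC_mg/L"].notnull() & df["TOC_mg/L"].isnull()].index

try:
    # A list-like key inside df[...] is matched against the column labels,
    # so the dates in null_index are not found.
    df[null_index]
    lookup_failed = False
except (KeyError, IndexError):
    lookup_failed = True

# Selecting rows by label goes through .loc instead.
rows = df.loc[null_index]
```

Here `df.loc[null_index]` returns the single row dated 2014-01-01, while `df[null_index]` fails because no column is named after a date.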

If you change the order to this, it will probably work:

df['TOC_mg/L'][null_index] = df['DOC_mg/L'][null_index]

However, this is chained assignment and should be avoided; see the online docs.

So you should use loc:

df.loc[null_index, 'TOC_mg/L'] = df['DOC_mg/L']
df.loc[notnull_index, 'DOC_mg/L'] = df['TOC_mg/L']

Note that it is not necessary to use the same index for the right-hand side, as it will align correctly.
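Putting the two .loc assignments back into the original function might look like the sketch below. The four-row `sve` frame is invented toy data (only the name comes from the question), and this is one possible rewrite under that assumption rather than the only way to fill the gaps:

```python
import pandas as pd

def merge_func(df):
    # Row labels where only DOC has a value.
    null_index = df[df["DOC_mg/L"].notnull() & df["TOC_mg/L"].isnull()].index
    # The right-hand side is the full column; pandas aligns it on the index,
    # so only the rows named in null_index are written.
    df.loc[null_index, "TOC_mg/L"] = df["DOC_mg/L"]

    # Row labels where only TOC has a value.
    notnull_index = df[df["DOC_mg/L"].isnull() & df["TOC_mg/L"].notnull()].index
    df.loc[notnull_index, "DOC_mg/L"] = df["TOC_mg/L"]

    # Rows missing in both columns stay NaN in the mean.
    df["Mean_mg/L"] = (df["DOC_mg/L"] + df["TOC_mg/L"]) / 2
    return df

# Hypothetical toy data: one gap in each column, one complete row, one empty row.
sve = pd.DataFrame(
    {"DOC_mg/L": [1.0, None, 3.0, None], "TOC_mg/L": [None, 2.0, 5.0, None]},
    index=pd.date_range("2014-01-01", periods=4, freq="D"),
)

result = merge_func(sve)
```

On this toy frame the first row's TOC value is filled from DOC, the second row's DOC value is filled from TOC, and the last row stays empty because neither column has data for it.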