Python / Pandas - KeyError merging dataframes

abutremutante picture abutremutante · Jul 23, 2017 · Viewed 7.2k times · Source

I have two dataframes I'm trying to merge:

target:

    version  city_id            code
id                                                                            
4         2        4   5736201000175   
26        2        3   8290265000183   
27        3        3   9529184000156   
30        3        3   9263064000150   
34        2        3   9312770000144   
54        1        3   8407830000140   
55        1        3   5590100000139   



city:

    federation_unit_id       name
id                                        
3                    8  SAO PAULO
4                    8   CAMPINAS
7                    8    BARUERI
8                    8  BEBEDOURO
9                    8     SANTOS

I want to merge them combining target's "city_id" with city's "id", in a way that the final dataframe looks like this:

target:

    version  city_id            code    federation_unit_id       name
id                                                                            
4         2        4   5736201000175                     8   CAMPINAS
26        2        3   8290265000183                     8  SAO PAULO
27        3        3   9529184000156                     8  SAO PAULO
30        3        3   9263064000150                     8  SAO PAULO
34        2        3   9312770000144                     8  SAO PAULO
54        1        3   8407830000140                     8  SAO PAULO
55        1        3   5590100000139                     8  SAO PAULO

To achieve that, I'm trying to use the following code:

target=target.merge(city, left_on='city_id', right_on='id')

However it keeps getting me the following KeyError:

Traceback (most recent call last):
  File "/file.py", line 12, in <module>
target=target.merge(city, left_on='index', right_on='city_id')
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 4437, in merge
copy=copy, indicator=indicator)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/tools/merge.py", line 38, in merge
copy=copy, indicator=indicator)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/tools/merge.py", line 210, in __init__
self.join_names) = self._get_merge_keys()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/tools/merge.py", line 434, in _get_merge_keys
right_keys.append(right[rk]._values)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 1997, in __getitem__
return self._getitem_column(key)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/internals.py", line 3290, in get
loc = self.items.get_loc(item)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/indexes/base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
  File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
  File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
  File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'id'

I can't find out what am I doing wrong :/ Can someone help on that?

Answer

piRSquared picture piRSquared · Jul 23, 2017

You can use join

target.join(city, on='city_id')

join is inherently index oriented. However, you can specify an alternative column to join on in the dataframe that constitutes the left side. If we call the join method on target then we want to specify 'city_id' as that alternative column. The city dataframe already has the appropriate index.