I have two dataframes I'm trying to merge:
target:
version city_id code
id
4 2 4 5736201000175
26 2 3 8290265000183
27 3 3 9529184000156
30 3 3 9263064000150
34 2 3 9312770000144
54 1 3 8407830000140
55 1 3 5590100000139
city:
federation_unit_id name
id
3 8 SAO PAULO
4 8 CAMPINAS
7 8 BARUERI
8 8 BEBEDOURO
9 8 SANTOS
I want to merge them combining target
's "city_id" with city
's "id", in a way that the final dataframe looks like this:
target:
version city_id code federation_unit_id name
id
4 2 4 5736201000175 8 CAMPINAS
26 2 3 8290265000183 8 SAO PAULO
27 3 3 9529184000156 8 SAO PAULO
30 3 3 9263064000150 8 SAO PAULO
34 2 3 9312770000144 8 SAO PAULO
54 1 3 8407830000140 8 SAO PAULO
55 1 3 5590100000139 8 SAO PAULO
To achieve that, I'm trying to use the following code:
target=target.merge(city, left_on='city_id', right_on='id')
However it keeps getting me the following KeyError:
Traceback (most recent call last):
File "/file.py", line 12, in <module>
target=target.merge(city, left_on='index', right_on='city_id')
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 4437, in merge
copy=copy, indicator=indicator)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/tools/merge.py", line 38, in merge
copy=copy, indicator=indicator)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/tools/merge.py", line 210, in __init__
self.join_names) = self._get_merge_keys()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/tools/merge.py", line 434, in _get_merge_keys
right_keys.append(right[rk]._values)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 1997, in __getitem__
return self._getitem_column(key)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/internals.py", line 3290, in get
loc = self.items.get_loc(item)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/indexes/base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'id'
I can't find out what am I doing wrong :/ Can someone help on that?
You can use join
target.join(city, on='city_id')
join
is inherently index oriented. However, you can specify an alternative column to join on in the dataframe that constitutes the left side. If we call the join
method on target
then we want to specify 'city_id'
as that alternative column. The city
dataframe already has the appropriate index.