I am having issues accessing data inside a dictionary.
Sys: Macbook 2012
Python: Python 3.5.1 :: Continuum Analytics, Inc.
I am working with a dask.dataframe created from a csv.
Assume I start out with a Pandas Series:
df.Coordinates
130 {u'type': u'Point', u'coordinates': [-43.30175...
278 {u'type': u'Point', u'coordinates': [-51.17913...
425 {u'type': u'Point', u'coordinates': [-43.17986...
440 {u'type': u'Point', u'coordinates': [-51.16376...
877 {u'type': u'Point', u'coordinates': [-43.17986...
1313 {u'type': u'Point', u'coordinates': [-49.72688...
1734 {u'type': u'Point', u'coordinates': [-43.57405...
1817 {u'type': u'Point', u'coordinates': [-43.77649...
1835 {u'type': u'Point', u'coordinates': [-43.17132...
2739 {u'type': u'Point', u'coordinates': [-43.19583...
2915 {u'type': u'Point', u'coordinates': [-43.17986...
3035 {u'type': u'Point', u'coordinates': [-51.01583...
3097 {u'type': u'Point', u'coordinates': [-43.17891...
3974 {u'type': u'Point', u'coordinates': [-8.633880...
3983 {u'type': u'Point', u'coordinates': [-46.64960...
4424 {u'type': u'Point', u'coordinates': [-43.17986...
The problem is that this is not really a Series of dictionaries. Instead, it's a column full of strings that LOOK like dictionaries. Running this shows it:
df.Coordinates.apply(type)
130 <class 'str'>
278 <class 'str'>
425 <class 'str'>
440 <class 'str'>
877 <class 'str'>
1313 <class 'str'>
1734 <class 'str'>
1817 <class 'str'>
1835 <class 'str'>
2739 <class 'str'>
2915 <class 'str'>
3035 <class 'str'>
3097 <class 'str'>
3974 <class 'str'>
3983 <class 'str'>
4424 <class 'str'>
My Goal: Access the coordinates key and value in the dictionary. That's it. But each value is a str, so I converted the strings to dictionaries using eval.
new = df.Coordinates.apply(eval)
130 {'coordinates': [-43.301755, -22.990065], 'typ...
278 {'coordinates': [-51.17913026, -30.01201896], ...
425 {'coordinates': [-43.17986794, -22.91000096], ...
440 {'coordinates': [-51.16376782, -29.95488677], ...
877 {'coordinates': [-43.17986794, -22.91000096], ...
1313 {'coordinates': [-49.72688407, -29.33757253], ...
1734 {'coordinates': [-43.574057, -22.928059], 'typ...
1817 {'coordinates': [-43.77649254, -22.86940539], ...
1835 {'coordinates': [-43.17132318, -22.90895217], ...
2739 {'coordinates': [-43.1958313, -22.98755333], '...
2915 {'coordinates': [-43.17986794, -22.91000096], ...
3035 {'coordinates': [-51.01583481, -29.63593292], ...
3097 {'coordinates': [-43.17891379, -22.96476163], ...
3974 {'coordinates': [-8.63388008, 41.14594453], 't...
3983 {'coordinates': [-46.64960938, -23.55902666], ...
4424 {'coordinates': [-43.17986794, -22.91000096], ...
Next I test the type of each object and get:
130 <class 'dict'>
278 <class 'dict'>
425 <class 'dict'>
440 <class 'dict'>
877 <class 'dict'>
1313 <class 'dict'>
1734 <class 'dict'>
1817 <class 'dict'>
1835 <class 'dict'>
2739 <class 'dict'>
2915 <class 'dict'>
3035 <class 'dict'>
3097 <class 'dict'>
3974 <class 'dict'>
3983 <class 'dict'>
4424 <class 'dict'>
If I try to access my dictionaries with new.apply(lambda x: x['coordinates']), I get:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-71-c0ad459ed1cc> in <module>()
----> 1 dfCombined.Coordinates.apply(coord_getter)
/Users/linwood/anaconda/envs/dataAnalysisWithPython/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
2218 else:
2219 values = self.asobject
-> 2220 mapped = lib.map_infer(values, f, convert=convert_dtype)
2221
2222 if len(mapped) and isinstance(mapped[0], Series):
pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:62658)()
<ipython-input-68-748ce2d8529e> in coord_getter(row)
1 import ast
2 def coord_getter(row):
----> 3 return (ast.literal_eval(row))['coordinates']
TypeError: 'bool' object is not subscriptable
It's some type of class, because when I run dir I get this for one object:
new.apply(lambda x: dir(x))[130]
130 __class__
130 __contains__
130 __delattr__
130 __delitem__
130 __dir__
130 __doc__
130 __eq__
130 __format__
130 __ge__
130 __getattribute__
130 __getitem__
130 __gt__
130 __hash__
130 __init__
130 __iter__
130 __le__
130 __len__
130 __lt__
130 __ne__
130 __new__
130 __reduce__
130 __reduce_ex__
130 __repr__
130 __setattr__
130 __setitem__
130 __sizeof__
130 __str__
130 __subclasshook__
130 clear
130 copy
130 fromkeys
130 get
130 items
130 keys
130 pop
130 popitem
130 setdefault
130 update
130 values
Name: Coordinates, dtype: object
My Problem: I just want to access the dictionary, but the object is <class 'dict'>. How do I convert this to a regular dict or just access the key:value pairs?
Any ideas??
My first instinct is to use json.loads to turn the strings into dicts. But the example you've posted does not follow the JSON standard, since it uses single quotes instead of double quotes and carries Python 2 u prefixes. So you have to convert the strings first.
A second option is to just use regex to parse the strings. If the dict strings in your actual DataFrame do not exactly match my examples, I expect the regex method to be more robust since lat/long coords are fairly standard.
import json
import re
import pandas as pd
df = pd.DataFrame(data={'Coordinates':["{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}",
"{u'type': u'Point', u'coordinates': [-51.17913, 123.45]}"],
'idx': [130, 278]})
##
# Solution 1- use json.loads
##
def string_to_dict(dict_string):
    # Convert to proper JSON: double quotes, no Python 2 u prefixes
    dict_string = dict_string.replace("'", '"').replace('u"', '"')
    return json.loads(dict_string)

# Bracket assignment creates a real column; attribute assignment does not
df['CoordDicts'] = df.Coordinates.apply(string_to_dict)
df.CoordDicts[0]['coordinates']
#>>> [-43.30175, 123.45]
##
# Solution 2 - use regex
##
def get_lat_lon(dict_string):
    # Get the coordinates string with regex (raw string keeps the backslashes literal)
    rs = re.search(r"(-?\d+(\.\d+)?),\s*(-?\d+(\.\d+)?)", dict_string).group()
    # Cast to floats
    coords = [float(x) for x in rs.split(',')]
    return coords

# Bracket assignment creates a real column; attribute assignment does not
df['Coords'] = df.Coordinates.apply(get_lat_lon)
df.Coords[0]
#>>> [-43.30175, 123.45]
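
A third option is ast.literal_eval, which your traceback already uses. The sketch below rests on an assumption I can't verify from the post: that the "'bool' object is not subscriptable" error comes from some cells holding a bare literal such as "False" rather than a dict string, since literal_eval would return a plain bool for those. The helper name parse_coordinates is made up for illustration, and it uses only the standard library so it can be tried without pandas.

```python
import ast

def parse_coordinates(value):
    """Return the 'coordinates' list from a dict-like string, or None."""
    # Non-string cells (None, NaN, bool, ...) cannot be parsed as literals
    if not isinstance(value, str):
        return None
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # The text is not a valid Python literal
        return None
    # literal_eval also accepts non-dict literals such as False or 42
    if not isinstance(parsed, dict):
        return None
    return parsed.get('coordinates')

# Hypothetical sample cells: one good row and three problem rows
samples = [
    "{u'type': u'Point', u'coordinates': [-43.30175, 123.45]}",
    "False",
    "not a dict",
    None,
]
results = [parse_coordinates(s) for s in samples]
print(results)
```

If this matches your data, df.Coordinates.apply(parse_coordinates) should give a column of coordinate lists with None wherever a row could not be parsed, and you can then inspect those rows separately.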