Arff Loader: AttributeError: 'dict' object has no attribute 'data'

Erdnase · Mar 10, 2015

I am trying to load an .arff file into a NumPy array using the liac-arff library. (https://github.com/renatopp/liac-arff)

This is my code.

import arff, numpy as np
dataset = arff.load(open('mydataset.arff', 'rb'))
data = np.array(dataset.data)

When I run it, I get the following error:

ArffLoader.py", line 8, in <module>
data = np.array(dataset.data)
AttributeError: 'dict' object has no attribute 'data'

I have seen similar threads, such as "Smartsheet Data Tracker: AttributeError: 'dict' object has no attribute 'append'", but I am new to Python and am not able to resolve this issue. How can I fix this?

Answer

TheBlackCat · Mar 10, 2015

Short version

dataset is a dict. For a dict, you access the values using Python's indexing notation, dataset[key], where key can be a string, integer, float, tuple, or any other immutable data type (it is a bit more complicated than that; see below if you are interested).
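A minimal sketch of dict indexing with the key types listed above. The dict d and its contents are hypothetical, made up only for illustration; the last part shows why attribute access fails even when the key exists:

```python
# A hypothetical dict mixing several immutable key types
d = {
    'data': [[1.0, 2.0], [3.0, 4.0]],  # string key
    3: 'three',                          # integer key
    2.5: 'two and a half',               # float key
    ('row', 0): 'first row',             # tuple key
}

# Values are looked up with square brackets, not attribute access
print(d['data'])      # [[1.0, 2.0], [3.0, 4.0]]
print(d[3])           # three
print(d[2.5])         # two and a half
print(d[('row', 0)])  # first row

# d.data looks for an attribute on the dict object itself,
# not for a key, so it fails even though 'data' is a key
try:
    d.data
except AttributeError as err:
    print(err)  # 'dict' object has no attribute 'data'
```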

In your case, the key is the string 'data'. To access the value, pass that string as the index, like so:

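import arff
import numpy as np

# Open in text mode: on Python 3, liac-arff parses str lines, not bytes
with open('mydataset.arff') as f:
    dataset = arff.load(f)  # returns a dict, not an object with attributes
data = np.array(dataset['data'])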

(You also shouldn't put multiple imports on the same line, although this is just a readability issue; PEP 8 recommends one import per line.)
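To see why indexing with 'data' works, it helps to look at the shape of what arff.load returns. The sketch below does not read a file; it uses a hand-written dict that is assumed to mirror liac-arff's output format (keys 'description', 'relation', 'attributes' and 'data'), with invented contents:

```python
# Hand-written stand-in for the dict returned by arff.load();
# the keys follow liac-arff's output format, the values are invented.
dataset = {
    'description': '',
    'relation': 'example',
    'attributes': [('width', 'NUMERIC'), ('height', 'NUMERIC')],
    'data': [[5.0, 3.25], [4.5, 3.75]],
}

# List the available keys to see what can be indexed
print(sorted(dataset.keys()))  # ['attributes', 'data', 'description', 'relation']

# Index with the string key, exactly as with the real return value
rows = dataset['data']
print(len(rows))  # 2

# Column names, taken from the (name, type) pairs in 'attributes'
names = [name for name, _ in dataset['attributes']]
print(names)  # ['width', 'height']
```

For purely numeric data like this, passing rows to np.array gives a 2-D array with one row per instance and one column per attribute.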

More detailed explanation

dataset is a dict, which in some languages is called a map or hashtable. In a dict, you access values in a similar way to how you index into a list or array, except that the "index" can be any data type that is "hashable" (hashing produces, ideally, a unique identifier for each possible value). This "index" is called a "key". In practice, at least for built-in types and most major packages, only immutable data types are hashable, but there is no actual rule that requires this to be the case.
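A short sketch of the hashability rule: an immutable built-in such as a tuple works as a key, a mutable list does not, and a user-defined class is hashable by default even though its instances are mutable, which is the "no actual rule" caveat above. The Point class is hypothetical:

```python
# Tuples are immutable and hashable, so they can be keys
grid = {(0, 0): 'origin'}
print(grid[(0, 0)])  # origin

# Lists are mutable and unhashable, so using one as a key raises TypeError
try:
    grid[[0, 0]] = 'origin'
except TypeError as err:
    print('TypeError:', err)


class Point:
    """Hypothetical mutable class; it inherits object's identity-based hash."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
labels = {p: 'a point'}  # allowed: instances are hashable by default
p.x = 10                 # mutating the instance does not change its hash
print(labels[p])  # a point
```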

Do you come from MATLAB? If so, you are probably trying to use MATLAB's struct field access (dataset.data). You could think of a dict as a much faster, more flexible struct, but the syntax for accessing values is different.
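If you really prefer struct-like dot access, one option (a standard-library alternative, not something liac-arff does for you) is to copy the dict into a types.SimpleNamespace. A minimal sketch, using a small hand-written dict with invented contents in place of the real arff.load result:

```python
from types import SimpleNamespace

# Hand-written stand-in for arff.load()'s return value (contents invented)
dataset = {'relation': 'example', 'data': [[5.0, 3.25], [4.5, 3.75]]}

# Turn keys into attributes; this needs string keys, and dot access
# only works for keys that are valid Python identifiers
ns = SimpleNamespace(**dataset)
print(ns.data)      # [[5.0, 3.25], [4.5, 3.75]]
print(ns.relation)  # example

# The dict itself is unchanged and still needs square brackets;
# both refer to the same underlying list object
print(dataset['data'] is ns.data)  # True
```

This is only a convenience wrapper; for most code, plain dataset['data'] indexing is the idiomatic choice.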