How to rank features by their importance in a Weka classifier?

khadre · Jan 21, 2014 · Viewed 11.2k times

I have successfully built a classifier with Weka. I would now like to evaluate how effective or important my features are. For this I use AttributeSelection, but I don't know how to output the different features with their corresponding importance. I simply want to list the features in decreasing order of their information gain scores!

Answer

Chthonic Project · Jan 21, 2014

There are many ways of scoring the features, which are called attributes, in Weka. These methods are available as subclasses of weka.attributeSelection.ASEvaluation.

Any of these evaluation classes will give you a score for each attribute. If you use information gain for scoring, for example, you will be using the class InfoGainAttributeEval. The helpful methods are

  • InfoGainAttributeEval.buildEvaluator(Instances), which computes the scores from your data, and
  • InfoGainAttributeEval.evaluateAttribute(int), which returns the score of the attribute at the given index.

The other types of feature scoring (gain ratio, correlation, etc.) have the same methods for scoring. Using any of these, you can rank all your features.
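To make the score itself concrete, here is a minimal sketch of what an information gain score measures, written in plain Java without Weka. It is an illustration under assumptions, not Weka's implementation: the names InfoGainSketch, entropy and infoGain are made up for this example, it handles only nominal values held in parallel arrays, and it measures entropy in bits.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: information gain of one nominal attribute, computed by hand. */
final class InfoGainSketch {

    /** Shannon entropy, in bits, of a distribution given as raw counts. */
    static double entropy(Iterable<Integer> counts) {
        double total = 0;
        for (int c : counts) total += c;
        double h = 0;
        for (int c : counts) {
            if (c > 0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** H(class) - H(class | attribute) for parallel arrays of attribute and class values. */
    static double infoGain(String[] attributeValues, String[] classValues) {
        Map<String, Integer> classCounts = new HashMap<>();
        Map<String, Map<String, Integer>> countsPerValue = new HashMap<>();
        for (int i = 0; i < classValues.length; i++) {
            classCounts.merge(classValues[i], 1, Integer::sum);
            countsPerValue.computeIfAbsent(attributeValues[i], k -> new HashMap<>())
                          .merge(classValues[i], 1, Integer::sum);
        }
        // Weighted average of the class entropy inside each attribute value.
        double conditional = 0;
        for (Map<String, Integer> counts : countsPerValue.values()) {
            int subset = 0;
            for (int c : counts.values()) subset += c;
            conditional += (subset / (double) classValues.length) * entropy(counts.values());
        }
        return entropy(classCounts.values()) - conditional;
    }

    public static void main(String[] args) {
        String[] label   = {"yes", "yes", "no", "no"};
        String[] perfect = {"a", "a", "b", "b"};   // determines the class exactly
        String[] useless = {"a", "b", "a", "b"};   // independent of the class
        System.out.println("perfect: " + infoGain(perfect, label));
        System.out.println("useless: " + infoGain(useless, label));
    }
}
```

An attribute that determines the class exactly earns the full class entropy as its gain, while an attribute that is independent of the class earns a gain of zero. Ranking by this score therefore puts the most class-predictive attributes first.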

The ranking itself is independent of Weka. Of the many ways of doing it, this is one:

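// Assumes `evaluation` is an InfoGainAttributeEval on which buildEvaluator(instances) has already been called.
Map<Attribute, Double> infogainscores = new HashMap<Attribute, Double>();
for (int i = 0; i < instances.numAttributes(); i++) {
    if (i == instances.classIndex()) continue; // the class attribute is not a feature
    Attribute t_attr = instances.attribute(i);
    double infogain  = evaluation.evaluateAttribute(i); // declared to throw Exception
    infogainscores.put(t_attr, infogain);
}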

Now you have a map which needs to be sorted by value. Here is some generic code to do that:

/**
 * Provides a {@code SortedSet} of {@code Map.Entry} objects. The sorting is in ascending order if {@code order} > 0
 * and descending order if {@code order} <= 0.
 * @param map   The map to be sorted.
 * @param order The sorting order (positive means ascending, non-positive means descending).
 * @param <K>   Keys.
 * @param <V>   Values need to be {@code Comparable}.
 * @return      A sorted set of {@code Map.Entry} objects.
 */
static <K,V extends Comparable<? super V>> SortedSet<Map.Entry<K,V>>
entriesSortedByValues(Map<K,V> map, final int order) {
    SortedSet<Map.Entry<K,V>> sortedEntries = new TreeSet<>(
        new Comparator<Map.Entry<K,V>>() {
            @Override
            public int compare(Map.Entry<K,V> e1, Map.Entry<K,V> e2) {
                // Swapping the arguments reverses the order.
                return (order > 0)
                    ? compareToRetainDuplicates(e1.getValue(), e2.getValue())
                    : compareToRetainDuplicates(e2.getValue(), e1.getValue());
            }
        }
    );
    sortedEntries.addAll(map.entrySet());
    return sortedEntries;
}

and finally,

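private static <V extends Comparable<? super V>> int compareToRetainDuplicates(V v1, V v2) {
    // compareTo() may return any negative number, not just -1. Equal values deliberately
    // compare as "greater" so that the TreeSet keeps them instead of dropping them as duplicates.
    return (v1.compareTo(v2) < 0) ? -1 : 1;
}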

Now you have a sorted set of entries, ordered by value (ascending or descending, as you wish). Go crazy with it!

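Please note that you should handle the case where more than one attribute has the same information gain. A TreeSet discards any element that its comparator reports as equal to one already present, so a plain comparison by value would silently drop tied attributes. That is why I went through the process of sorting by values while retaining duplicates. Because that comparator never returns 0, only iterate over the resulting set; lookups such as contains() and remove() will not find anything in it.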
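If the sorted set feels too fragile, one alternative (swapped in here, not the approach above) is to copy the entries into a list and sort that; a list keeps tied entries without any comparator tricks. The sketch below uses only the JDK (Java 8 or later). String keys stand in for Weka's Attribute objects, the scores are made-up numbers, and the names RankByScoreSketch, rankDescending and sizeInValueOrderedTreeSet are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class RankByScoreSketch {

    /** Returns the map's entries as a list sorted by value, highest first; ties are all kept. */
    static <K, V extends Comparable<? super V>> List<Map.Entry<K, V>> rankDescending(Map<K, V> scores) {
        List<Map.Entry<K, V>> ranked = new ArrayList<>(scores.entrySet());
        // List.sort is stable, so tied entries keep the order in which the map iterates them.
        ranked.sort(Map.Entry.<K, V>comparingByValue().reversed());
        return ranked;
    }

    /** For contrast: how many entries survive in a TreeSet ordered by value alone. */
    static <K, V extends Comparable<? super V>> int sizeInValueOrderedTreeSet(Map<K, V> scores) {
        TreeSet<Map.Entry<K, V>> set = new TreeSet<>(Map.Entry.<K, V>comparingByValue());
        set.addAll(scores.entrySet()); // entries whose value is already present are rejected
        return set.size();
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("outlook", 0.25);      // made-up scores, for illustration only
        scores.put("humidity", 0.15);
        scores.put("windy", 0.15);        // ties with "humidity"
        scores.put("temperature", 0.03);

        for (Map.Entry<String, Double> e : rankDescending(scores)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
        System.out.println("entries kept by a value-ordered TreeSet: "
                + sizeInValueOrderedTreeSet(scores));
    }
}
```

The list holds all four entries, with the two tied attributes next to each other, whereas the value-ordered TreeSet ends up one entry short. The same rankDescending call would work on the Map<Attribute, Double> built earlier.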