find important features for classification

scikit-learn feature-selection

Mads Jensen · Apr 3, 2013 · Viewed 19.2k times · Source

I'm trying to classify some EEG data using a logistic regression model (this seems to give the best classification of my data). The data I have is from a multichannel EEG setup so in essence I have a matrix of 63 x 116 x 50 (that is channels x time points x number of trials (there are two trial types of 50), I have reshaped this to a long vector, one for each trial.

What I would like to do is after the classification to see which features were the most useful in classifying the trials. How can I do that and is it possible to test the significance of these features? e.g. to say that the classification was drive mainly by N-features and these are feature x to z. So I could for instance say that channel 10 at time point 90-95 was significant or important for the classification.

So is this possible or am I asking the wrong question?

any comments or paper references are much appreciated.

Answer

Scikit-learn includes quite a few methods for feature ranking, among them:

Univariate feature selection (http://scikit-learn.org/stable/auto_examples/feature_selection/plot_feature_selection.html)
Recursive feature elimination (http://scikit-learn.org/stable/auto_examples/feature_selection/plot_rfe_digits.html)
Randomized Logistic Regression/stability selection (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RandomizedLogisticRegression.html)

(see more at http://scikit-learn.org/stable/modules/feature_selection.html)

Among those, I definitely recommend giving Randomized Logistic Regression a shot. In my experience, it consistently outperforms other methods and is very stable. Paper on this: http://arxiv.org/pdf/0809.2932v2.pdf

Edit: I have written a series of blog posts on different feature selection methods and their pros and cons, which are probably useful for answering this question in more detail:

find important features for classification

Answer

Related questions