Converting CSV file to LIBSVM compatible data file using python

user3378649 · Apr 19, 2014 · Viewed 13.2k times

I am working on a project that uses libsvm, and I am preparing my data for the library. How can I convert a CSV file to LIBSVM-compatible data?

CSV File: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/data/iris.csv

The LIBSVM FAQ (frequently asked questions) covers this:

How to convert other data formats to LIBSVM format?

It depends on your data format. A simple way is to use libsvmwrite in the libsvm matlab/octave interface. Take a CSV (comma-separated values) file in UCI machine learning repository as an example. We download SPECTF.train. Labels are in the first column. The following steps produce a file in the libsvm format.

matlab> SPECTF = csvread('SPECTF.train'); % read a csv file
matlab> labels = SPECTF(:, 1); % labels from the 1st column
matlab> features = SPECTF(:, 2:end); 
matlab> features_sparse = sparse(features); % features must be in a sparse matrix
matlab> libsvmwrite('SPECTFlibsvm.train', labels, features_sparse);
The transformed data are stored in SPECTFlibsvm.train.
Alternatively, you can use convert.c to convert CSV format to libsvm format.

But I don't want to use MATLAB; I use Python.

I also found a solution that uses Java.

Can anyone recommend a way to tackle this problem?

Answer

emeth · Apr 19, 2014

You can use csv2libsvm.py to convert CSV data to the libsvm format:

python csv2libsvm.py iris.csv libsvm.data 4 True

Here 4 is the zero-based index of the target (label) column, and True means the CSV file has a header row that should be skipped.

Finally, you can get libsvm.data as

0 1:5.1 2:3.5 3:1.4 4:0.2
0 1:4.9 2:3.0 3:1.4 4:0.2
0 1:4.7 2:3.2 3:1.3 4:0.2
0 1:4.6 2:3.1 3:1.5 4:0.2
...

from iris.csv

150,4,setosa,versicolor,virginica
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
4.7,3.2,1.3,0.2,0
4.6,3.1,1.5,0.2,0
...
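
If you would rather not depend on an external script, the same conversion can be sketched with only the standard library. This is a minimal sketch, not the code of csv2libsvm.py: the function name csv_to_libsvm and its parameters are made up for illustration, and it assumes every feature value is numeric and that the label column index is zero-based, as in the command above.

```python
import csv


def csv_to_libsvm(csv_path, out_path, label_index=0, skip_header=False):
    """Write each CSV row as '<label> 1:<v1> 2:<v2> ...'.

    Hypothetical helper for illustration. Feature indices in the
    output start at 1, as in the libsvm.data sample above.
    """
    with open(csv_path, newline="") as src, open(out_path, "w") as dst:
        reader = csv.reader(src)
        if skip_header:
            next(reader, None)  # drop the header row, if any
        for row in reader:
            if not row:
                continue  # ignore blank lines
            label = row[label_index]
            # Every column except the label is treated as a feature.
            features = row[:label_index] + row[label_index + 1:]
            pairs = []
            for i, value in enumerate(features, start=1):
                # Assumption: values are numeric. Zeros are left out,
                # since the libsvm format is sparse.
                if float(value) != 0.0:
                    pairs.append(f"{i}:{value}")
            dst.write(" ".join([label] + pairs) + "\n")
```

Calling csv_to_libsvm("iris.csv", "libsvm.data", label_index=4, skip_header=True) should write lines in the same form as the libsvm.data sample above. Values are copied through as text, so 3.0 stays 3.0 rather than being reformatted.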