Scikit-learn: How to run KMeans on a one-dimensional array?

Irene picture Irene · Feb 9, 2015 · Viewed 23.8k times · Source

I have an array of 13.876(13,876) values between 0 and 1. I would like to apply sklearn.cluster.KMeans to only this vector to find the different clusters in which the values are grouped. However, it seems KMeans works with a multidimensional array and not with one-dimensional ones. I guess there is a trick to make it work but I don't know how. I saw that KMeans.fit() accepts "X : array-like or sparse matrix, shape=(n_samples, n_features)", but it wants the n_samples to be bigger than one

I tried putting my array on a np.zeros() matrix and run KMeans, but then is putting all the non-null values on class 1 and the rest on class 0.

Can anyone help in running this algorithm on a one-dimensional array? Thanks a lot!

Answer

ryanpattison picture ryanpattison · Feb 9, 2015

You have many samples of 1 feature, so you can reshape the array to (13,876, 1) using numpy's reshape:

from sklearn.cluster import KMeans
import numpy as np
x = np.random.random(13876)

km = KMeans()
km.fit(x.reshape(-1,1))  # -1 will be calculated to be 13876 here