Java, Weka: How to predict numeric attribute?

Anton Ashanin · Apr 25, 2013 · Viewed 15.4k times

I was trying to use the NaiveBayesUpdateable classifier from Weka. My data contains both nominal and numeric attributes:

  @relation cars
  @attribute country {FR, UK, ...}
  @attribute city {London, Paris, ...}
  @attribute car_make {Toyota, BMW, ...}
  @attribute price numeric   %% car price 
  @attribute sales numeric   %% number of cars sold

I need to predict the number of sales (numeric!) based on other attributes.

I understand that I cannot use a numeric attribute as the class for Bayes classification in Weka. One technique is to split the range of the numeric attribute into N intervals of length k and use a nominal attribute instead, where each interval number is a class name, like this: @attribute class {1,2,3,...N}.
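To make the interval idea concrete, here is a minimal sketch in plain Java (no Weka involved). The class name `EqualWidthBinning`, the method `binIndex`, and the choice of 10 bins are hypothetical, picked only for illustration; only the 0 to 1 000 000 range comes from my data.

```java
/** Equal-width binning of a numeric value; an illustrative sketch, not Weka code. */
class EqualWidthBinning {

    /** Maps a value in [min, max] to a zero-based bin index in 0..bins-1. */
    static int binIndex(double value, double min, double max, int bins) {
        if (bins <= 0 || max <= min) {
            throw new IllegalArgumentException("need bins > 0 and max > min");
        }
        double width = (max - min) / bins;               // interval length k
        int idx = (int) Math.floor((value - min) / width);
        // Clamp so that value == max lands in the last bin and
        // out-of-range values fall into the nearest edge bin.
        return Math.max(0, Math.min(bins - 1, idx));
    }

    public static void main(String[] args) {
        // Sales range 0..1,000,000; 10 bins is an arbitrary choice.
        System.out.println(binIndex(0, 0, 1_000_000, 10));         // prints 0
        System.out.println(binIndex(250_000, 0, 1_000_000, 10));   // prints 2
        System.out.println(binIndex(1_000_000, 0, 1_000_000, 10)); // prints 9
    }
}
```

With 10 bins each interval is 100 000 wide, so a predicted class only narrows sales down to that resolution.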

Yet the numeric attribute that I need to predict ranges from 0 to 1 000 000. Creating 1 000 000 classes makes no sense at all. How can I predict a numeric attribute with Weka, or what algorithms should I look for in case Weka has no tools for this task?

Answer

Sentry · Apr 28, 2013

What you want to do is regression, not classification. The difference is exactly what you describe/want:

  • Classification has discrete classes/labels; any nominal attribute could be used as the class here.
  • Regression has continuous labels; "classes" would be the wrong term here.
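To illustrate what "continuous labels" means in practice, here is a rough sketch of ordinary least squares with a single predictor, written in plain Java. It is not how any particular WEKA scheme is implemented, and the class name `SimpleLeastSquares` and the price/sales numbers are made up for the example.

```java
/** One-variable least-squares regression; an illustrative sketch, not WEKA code. */
class SimpleLeastSquares {

    /** Returns {intercept, slope} minimizing the squared error of y ~ intercept + slope * x. */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = sxy / sxx;
        return new double[] { meanY - slope * meanX, slope };
    }

    /** Predicts a continuous value; no classes are involved. */
    static double predict(double[] model, double x) {
        return model[0] + model[1] * x;
    }

    public static void main(String[] args) {
        // Made-up data: price (in thousands) vs. number of cars sold.
        double[] price = { 10, 20, 30, 40 };
        double[] sales = { 900, 700, 500, 300 };
        double[] model = fit(price, sales);
        System.out.println(predict(model, 25)); // prints 600.0
    }
}
```

The prediction is a number on a continuous scale, so a target range of 0 to 1 000 000 poses no problem the way a million classes would.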

Most regression-based techniques can be turned into binary classification by defining a threshold: the class is determined by whether the predicted value is above or below this threshold.
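As a small sketch of that thresholding step (plain Java; the class name `ThresholdClassifier`, the labels "high" and "low", and the cut-off of 500 are hypothetical):

```java
/** Turns a regression output into a binary label; an illustrative sketch. */
class ThresholdClassifier {

    /** Returns "high" if the predicted value is strictly above the threshold, else "low". */
    static String classify(double predicted, double threshold) {
        return predicted > threshold ? "high" : "low";
    }

    public static void main(String[] args) {
        System.out.println(classify(600.0, 500.0)); // prints high
        System.out.println(classify(300.0, 500.0)); // prints low
    }
}
```

Which side a value exactly equal to the threshold falls on is a convention; here it counts as "low".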

I don't know all of WEKA's classifiers that offer regression, but you can start by looking at those two:

You might have to use the NominalToBinary filter to convert your nominal attributes to numerical (binary) ones.
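Roughly, the idea behind such a conversion is to replace one nominal attribute with one 0/1 indicator per value. The plain-Java sketch below shows only that idea; it is not WEKA's filter and may differ from its exact behavior (for example in how two-valued attributes are handled). The class name `OneHotEncoder` and the extra country code "DE" are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Encodes a nominal value as 0/1 indicators; an illustrative sketch, not WEKA's filter. */
class OneHotEncoder {

    /** Returns an array with 1.0 at the position of the value and 0.0 elsewhere. */
    static double[] encode(String value, List<String> categories) {
        int idx = categories.indexOf(value);
        if (idx < 0) {
            throw new IllegalArgumentException("Unknown category: " + value);
        }
        double[] indicators = new double[categories.size()]; // initialized to 0.0
        indicators[idx] = 1.0;
        return indicators;
    }

    public static void main(String[] args) {
        // FR and UK appear in the question's data; DE is added for illustration.
        List<String> countries = Arrays.asList("FR", "UK", "DE");
        System.out.println(Arrays.toString(encode("UK", countries))); // prints [0.0, 1.0, 0.0]
    }
}
```

After this step every input is numeric, which is what many regression methods expect.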