I want to get the coefficients of a weighted linear regression of an x-y pair represented by two arrays in Java. I have zeroed in on Weka, but its LinearRegression class expects an Instances object. As far as I can tell, creating an Instances object requires an ARFF file containing the data. I have come across solutions that use the FastVector class, but that is deprecated in the latest Weka version. How do I create an ARFF file for the x-y pair and the corresponding weights, all of which are held in Java arrays?
Here's my code based on Baz's answer. It throws an exception on the last line, lr.buildClassifier(newDataset):
Thread [main] (Suspended (exception UnassignedClassException))
Capabilities.testWithFail(Instances) line: 1302
Here's the code:
public static void test() throws Exception
{
    double[][] data = {
        {4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0},
        {19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0}
    };
    int numInstances = data[0].length;
    ArrayList<Attribute> atts = new ArrayList<Attribute>();
    List<Instance> instances = new ArrayList<Instance>();
    for(int dim = 0; dim < 2; dim++)
    {
        Attribute current = new Attribute("Attribute" + dim, dim);
        if(dim == 0)
        {
            for(int obj = 0; obj < numInstances; obj++)
            {
                instances.add(new SparseInstance(numInstances));
            }
        }
        for(int obj = 0; obj < numInstances; obj++)
        {
            instances.get(obj).setValue(current, data[dim][obj]);
            //instances.get(obj).setWeight(weights[obj]);
        }
        atts.add(current);
    }
    Instances newDataset = new Instances("Dataset", atts, instances.size());
    for(Instance inst : instances)
        newDataset.add(inst);
    LinearRegression lr = new LinearRegression();
    lr.buildClassifier(newDataset);
}
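The UnassignedClassException comes from the dataset never being told which attribute is the regression target: Weka classifiers need a class index set on the Instances object before buildClassifier is called. Below is a minimal sketch of one way to do that, assuming Weka 3.7 or later (for DenseInstance and the ArrayList-based Instances constructor). The class name WeightedRegressionSketch, the method fit and the sample numbers are made up for illustration. It also assumes that coefficients() is indexed by attribute position with the intercept in the last slot, which is worth checking against your Weka version.

```java
import java.util.ArrayList;

import weka.classifiers.functions.LinearRegression;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

// Hypothetical helper class, not part of Weka.
class WeightedRegressionSketch {

    /** Returns {slope, intercept} of a weighted fit of y on x. */
    static double[] fit(double[] x, double[] y, double[] weights) throws Exception {
        ArrayList<Attribute> atts = new ArrayList<Attribute>();
        atts.add(new Attribute("x"));
        atts.add(new Attribute("y"));

        Instances dataset = new Instances("Dataset", atts, x.length);
        for (int i = 0; i < x.length; i++) {
            // DenseInstance(weight, values) carries the per-instance weight.
            dataset.add(new DenseInstance(weights[i], new double[] {x[i], y[i]}));
        }

        // Mark the last attribute (y) as the target. Leaving this out is
        // what triggers UnassignedClassException in buildClassifier.
        dataset.setClassIndex(dataset.numAttributes() - 1);

        LinearRegression lr = new LinearRegression();
        lr.buildClassifier(dataset);

        // Assumption: entry 0 belongs to attribute x, the last entry is the intercept.
        double[] coef = lr.coefficients();
        return new double[] {coef[0], coef[coef.length - 1]};
    }

    public static void main(String[] args) throws Exception {
        double[] x = {1.0, 2.0, 3.0, 4.0, 5.0};
        double[] y = {3.1, 4.9, 7.2, 8.8, 11.0};
        double[] w = {1.0, 1.0, 2.0, 2.0, 1.0};
        double[] result = fit(x, y, w);
        System.out.println("slope = " + result[0] + ", intercept = " + result[1]);
    }
}
```

Note that LinearRegression applies attribute selection and a small ridge penalty by default, so the numbers can differ slightly from a textbook weighted least-squares fit. Also, the SparseInstance constructor argument in the code above should probably be the number of attributes (2), not numInstances.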
I think this might help you:
import java.util.ArrayList;
import java.util.List;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;

// Assumes double[][] data is indexed as data[dim][obj], with
// numDimensions rows (attributes) and numInstances columns (objects).
FastVector atts = new FastVector();
List<Instance> instances = new ArrayList<Instance>();
for(int dim = 0; dim < numDimensions; dim++)
{
    // Create new attribute / dimension
    Attribute current = new Attribute("Attribute" + dim, dim);
    // Create an instance for each data object
    if(dim == 0)
    {
        for(int obj = 0; obj < numInstances; obj++)
        {
            instances.add(new SparseInstance(numDimensions));
        }
    }
    // Fill the value of dimension "dim" into each object
    for(int obj = 0; obj < numInstances; obj++)
    {
        instances.get(obj).setValue(current, data[dim][obj]);
    }
    // Add attribute to total attributes
    atts.addElement(current);
}
// Create new dataset
Instances newDataset = new Instances("Dataset", atts, instances.size());
// Fill in data objects
for(Instance inst : instances)
    newDataset.add(inst);
Afterwards, newDataset (an Instances object) is your dataset.
Note: The current version (3.6.8) of Weka did not complain, even though I used FastVector.
However, for the developer version (3.7.7), where FastVector is deprecated, use an ArrayList<Attribute> instead:
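import java.util.ArrayList;
import java.util.List;
import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;

// Same assumptions as above: data[dim][obj], numDimensions, numInstances.
ArrayList<Attribute> atts = new ArrayList<Attribute>();
List<Instance> instances = new ArrayList<Instance>();
for(int dim = 0; dim < numDimensions; dim++)
{
    // Create new attribute / dimension
    Attribute current = new Attribute("Attribute" + dim, dim);
    // Create an instance for each data object
    if(dim == 0)
    {
        for(int obj = 0; obj < numInstances; obj++)
        {
            instances.add(new SparseInstance(numDimensions));
        }
    }
    // Fill the value of dimension "dim" into each object
    for(int obj = 0; obj < numInstances; obj++)
    {
        instances.get(obj).setValue(current, data[dim][obj]);
    }
    // Add attribute to total attributes
    atts.add(current);
}
// Create new dataset and fill in the data objects
Instances newDataset = new Instances("Dataset", atts, instances.size());
for(Instance inst : instances)
    newDataset.add(inst);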
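If you still want an actual ARFF file, as the question originally asked, it is plain text and can be written without Weka at all. The following is only a sketch: the class name ArffWriterSketch, the method toArff, the attribute names x and y, and the output file name xy.arff are all made up for illustration. It relies on the ARFF convention of appending an instance weight in curly braces at the end of a data row, which, as far as I know, Weka's ARFF reader accepts in recent versions. The relation name is assumed to contain no spaces (otherwise it would need quoting).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of Weka.
class ArffWriterSketch {

    /** Builds ARFF text with numeric attributes x and y, one weighted row per point. */
    static String toArff(String relation, double[] x, double[] y, double[] weights) {
        if (x.length != y.length || x.length != weights.length) {
            throw new IllegalArgumentException("x, y and weights must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append('\n');
        sb.append("@attribute x numeric\n");
        sb.append("@attribute y numeric\n");
        sb.append("@data\n");
        for (int i = 0; i < x.length; i++) {
            // A trailing {w} on a data row is the ARFF notation for an instance weight.
            sb.append(x[i]).append(',').append(y[i])
              .append(",{").append(weights[i]).append("}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {3.0, 5.0, 7.5};
        double[] w = {1.0, 0.5, 2.0};
        Path out = Paths.get("xy.arff");
        Files.write(out, toArff("xy", x, y, w).getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

When loading such a file you would still have to set the class index on the resulting Instances object before building the regression, just as with a dataset built in memory.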