I'm using this library to implement a learning agent.
I have generated the training cases, but I don't know for sure what the validation and test sets are.
The teacher says:
70% should be train cases, 10% will be test cases and the rest 20% should be validation cases.
edit
I have this code for training, but I have no idea when to stop training.
def train(self, train, validation, N=0.3, M=0.1):
# N: learning rate
# M: momentum factor
accuracy = list()
while(True):
error = 0.0
for p in train:
input, target = p
self.update(input)
error = error + self.backPropagate(target, N, M)
print "validation"
total = 0
for p in validation:
input, target = p
output = self.update(input)
total += sum([abs(target - output) for target, output in zip(target, output)]) #calculates sum of absolute diference between target and output
accuracy.append(total)
print min(accuracy)
print sum(accuracy[-5:])/5
#if i % 100 == 0:
print 'error %-14f' % error
if ? < ?:
break
edit
I can get an average error of 0.2 with validation data, after maybe 20 training iterations, that should be 80%?
average error = sum of absolute difference between validation target and output, given the validation data input/size of validation data.
1
avg error 0.520395
validation
0.246937882684
2
avg error 0.272367
validation
0.228832420879
3
avg error 0.249578
validation
0.216253590304
...
22
avg error 0.227753
validation
0.200239244714
23
avg error 0.227905
validation
0.199875013416
The training and validation sets are used during training.
for each epoch
for each training data instance
propagate error through the network
adjust the weights
calculate the accuracy over training data
for each validation data instance
calculate the accuracy over the validation data
if the threshold validation accuracy is met
exit training
else
continue training
Once you're finished training, then you run against your testing set and verify that the accuracy is sufficient.
Training Set: this data set is used to adjust the weights on the neural network.
Validation Set: this data set is used to minimize overfitting. You're not adjusting the weights of the network with this data set, you're just verifying that any increase in accuracy over the training data set actually yields an increase in accuracy over a data set that has not been shown to the network before, or at least the network hasn't trained on it (i.e. validation data set). If the accuracy over the training data set increases, but the accuracy over the validation data set stays the same or decreases, then you're overfitting your neural network and you should stop training.
Testing Set: this data set is used only for testing the final solution in order to confirm the actual predictive power of the network.