Using NNET for classification

Kharoof · Nov 16, 2013 · Viewed 12.8k times

I am new to neural networks and I have a question about classification with the nnet package.

I have data which is a mixture of numeric and categorical variables. I wanted to make a win/lose prediction using nnet with a function call such as

nnet(WL~., data=training, size=10) 

but this gives a different result than if I use a dataframe with only numeric versions of the variables (i.e. convert all the factors to numeric (except my prediction WL)).

Can someone explain to me what is happening here? I guess nnet is interpreting the variables differently, but I would like to understand how. I appreciate it's difficult without any data to recreate the problem, but I am just looking for a high-level explanation of how neural networks are fitted using nnet. I can't find this anywhere. Many thanks.

str(training)
'data.frame':   1346 obs. of  9 variables:
 $ WL                   : Factor w/ 2 levels "win","lose": 2 2 1 1 NA 1 1 2 2 2 ...
 $ team.rank            : int  17 19 19 18 17 16 15 14 14 16 ...
 $ opponent.rank        : int  14 12 36 16 12 30 11 38 27 31 ...
 $ HA                   : Factor w/ 2 levels "A","H": 1 1 2 2 2 2 2 1 1 2 ...
 $ comp.stage           : Factor w/ 3 levels "final","KO","league": 3 3 3 3 3 3 3 3 3 3 ...
 $ days.since.last.match: num  132 9 5 7 14 7 7 7 14 7 ...
 $ days.to.next.match   : num  9 5 7 14 7 9 7 9 7 8 ...
 $ comp.last.match      : Factor w/ 5 levels "Anglo-Welsh Cup",..: 5 5 5 5 5 5 3 5 3 5 ...
 $ comp.next.match      : Factor w/ 4 levels "Anglo-Welsh Cup",..: 4 4 4 4 4 3 4 3 4 3 ...

vs

str(training.nnet)
'data.frame':   1346 obs. of  9 variables:
 $ WL                   : Factor w/ 2 levels "win","lose": 2 2 1 1 NA 1 1 2 2 2 ...
 $ team.rank            : int  17 19 19 18 17 16 15 14 14 16 ...
 $ opponent.rank        : int  14 12 36 16 12 30 11 38 27 31 ...
 $ HA                   : num  1 1 2 2 2 2 2 1 1 2 ...
 $ comp.stage           : num  3 3 3 3 3 3 3 3 3 3 ...
 $ days.since.last.match: num  132 9 5 7 14 7 7 7 14 7 ...
 $ days.to.next.match   : num  9 5 7 14 7 9 7 9 7 8 ...
 $ comp.last.match      : num  5 5 5 5 5 5 3 5 3 5 ...
 $ comp.next.match      : num  4 4 4 4 4 3 4 3 4 3 ...

Answer

musically_ut · Nov 16, 2013

The difference you are looking for can be explained with a very small example:

library(nnet)

fit.factors <- nnet(y ~ x, data.frame(y=c('W', 'L', 'W'), x=c('1', '2', '3')), size=1)
fit.factors
# a 2-1-1 network with 5 weights
# inputs: x2 x3 
# output(s): y 
# options were - entropy fitting 

fit.numeric <- nnet(y ~ x, data.frame(y=c('W', 'L', 'W'), x=c(1, 2, 3)), size=1)
fit.numeric
# a 1-1-1 network with 4 weights
# inputs: x 
# output(s): y 
# options were - entropy fitting 

When fitting models in R, factor variables are split into several indicator (dummy) variables.

Hence, a factor variable x = c('1', '2', '3') is actually split into three indicator variables x1, x2, x3, exactly one of which holds the value 1 while the others hold 0. Moreover, since the levels {1, 2, 3} are exhaustive, one (and only one) of x1, x2, x3 must be 1. Hence, the variables x1, x2, x3 are not independent: x1 + x2 + x3 = 1. So we can drop the first variable x1, keep only x2 and x3 in the model, and conclude that the level is 1 whenever both x2 == 0 and x3 == 0.
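You can inspect this expansion directly with base R's model.matrix, which is what formula-based fitters use under the hood (a small sketch, not tied to the question's data):

```r
# Expand a three-level factor into dummy variables, the same way
# formula-based model fitters do internally.
x <- factor(c('1', '2', '3'))
mm <- model.matrix(~ x)
mm
#   (Intercept) x2 x3
# 1           1  0  0
# 2           1  1  0
# 3           1  0  1
```

Note there is no x1 column: the reference level '1' is encoded implicitly as x2 == 0 and x3 == 0, exactly as described above.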

That is what you see in the output of nnet: when x is a factor, there are actually length(levels(x)) - 1 inputs to the neural network, and if x is numeric, there is only one input, x itself.

Most R regression functions (nnet, randomForest, glm, gbm, etc.) do this mapping from factor levels to dummy variables internally, so as a user one doesn't usually need to be aware of it.
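For instance, lm shows the same behaviour: with a factor predictor, the fitted coefficients are named after the non-reference levels, while a numeric predictor gets a single slope (an illustrative sketch with made-up data):

```r
# A factor predictor yields one coefficient per non-reference level...
d <- data.frame(y = c(1, 2, 3, 4), x = factor(c('a', 'b', 'c', 'a')))
names(coef(lm(y ~ x, data = d)))
# "(Intercept)" "xb" "xc"

# ...while the same predictor converted to numeric yields a single slope.
d$x.num <- as.numeric(d$x)
names(coef(lm(y ~ x.num, data = d)))
# "(Intercept)" "x.num"
```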


Now it should be clear what the difference is between using a dataset with factors and one with numbers replacing the factors. If you do the conversion to numbers, then you are:

  1. Losing the unique properties of each level and quantising the differences between them.
  2. Enforcing an ordering between the levels.

This does result in a slightly simpler model (with fewer variables, as we do not need dummy variables for each level), but it is often not the correct thing to do.
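The ordering problem can be seen with one of the question's own variables, comp.stage: converting it to numbers replaces each level by its integer code (a sketch; the level order is fixed explicitly here so the codes are reproducible):

```r
# Converting a factor to numeric replaces each level by its integer code,
# which implies 'league' is two units from 'final' but only one from 'KO' --
# distances that have no meaning for these categories.
comp.stage <- factor(c('league', 'KO', 'final'),
                     levels = c('final', 'KO', 'league'))
as.numeric(comp.stage)
# 3 2 1
```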