I need to train a neural network with 2-4 hidden layers; I'm not sure yet about the structure of the actual net. I was thinking of training it using Hadoop MapReduce (a cluster of 12 PCs) or a GPU in order to get faster results. Which do you think would be better? Also, are there any available libraries that already have these implemented? Thanks
I've been lucky to work in a lab that has dabbled in both of these methods for training networks, and while both are useful in very computationally expensive settings, the location of the computational bottleneck usually determines which method to use.
Training a network using a distributed system (e.g., Hadoop)
This is useful when your network is large enough that the matrix multiplications involved in training become unwieldy on a traditional PC. This problem is particularly prevalent when you have harsh time constraints (e.g., online training), since otherwise the hassle of a Hadoop implementation isn't worth it (just run the network overnight). If you're thinking about Hadoop because you want to fiddle with network parameters and not have to wait a day before fiddling some more (frequently the case in my lab), then simply run multiple instances of the network with different parameters on different machines, as in the sketch below. That way you can make use of your cluster without dealing with actual distributed computation.
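A parameter sweep of that kind can be as simple as launching independent jobs over SSH. This is a minimal sketch, not distributed training; the script name `train_net.py`, the hostnames, and the hyperparameter values are all placeholders I'm assuming for illustration:

```python
# Launch one independent training run per machine, each with its own
# hyperparameters. Assumes a hypothetical "train_net.py" that accepts these
# flags, and passwordless SSH access to the (placeholder) cluster nodes.
import subprocess

hosts = ["node01", "node02", "node03"]  # placeholder machine names
configs = [
    {"hidden_layers": 2, "learning_rate": 0.1},
    {"hidden_layers": 3, "learning_rate": 0.05},
    {"hidden_layers": 4, "learning_rate": 0.01},
]

for host, cfg in zip(hosts, configs):
    # No distributed computation here: just a different parameter setting
    # running in the background on each node, with output logged locally.
    cmd = (
        f"nohup python train_net.py "
        f"--hidden-layers {cfg['hidden_layers']} "
        f"--learning-rate {cfg['learning_rate']} "
        f"> train_{host}.log 2>&1 &"
    )
    subprocess.run(["ssh", host, cmd], check=True)
```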
Example:
You're training a network to find the number of people in images. Instead of a predefined set of training examples (pairs of images and the number of people in them), you decide to have the program pull random images from Google. While the network is processing an image, you view it and provide feedback on how many people are actually in it. Since this is image processing, your network size is probably on the scale of millions of units, and since you're providing the feedback in real time, the speed of the network's computations matters. Thus, you should probably invest in a distributed implementation; a conceptual sketch of what that pattern looks like follows below.
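To make "distributed implementation" concrete, here is a conceptual, pure-Python sketch of the MapReduce pattern for training: each mapper computes the gradient on its shard of the data, the reducer averages them, and the driver applies the weight update. The tiny linear model, squared-error loss, shard count, and learning rate are stand-ins I've assumed for illustration, not the image network from the example and not actual Hadoop code:

```python
# Conceptual MapReduce-style batch gradient descent (no real Hadoop involved).
import numpy as np

def map_gradient(w, X_shard, y_shard):
    """Mapper: gradient of mean squared error on one data shard."""
    preds = X_shard @ w
    return X_shard.T @ (preds - y_shard) / len(y_shard)

def reduce_gradients(grads):
    """Reducer: average the per-shard gradients."""
    return np.mean(grads, axis=0)

# Synthetic placeholder data: 10,000 examples, 50 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 50))
true_w = rng.normal(size=50)
y = X @ true_w + 0.1 * rng.normal(size=10_000)

w = np.zeros(50)
shards = np.array_split(np.arange(len(y)), 4)  # pretend each shard lives on a different machine
for step in range(100):
    grads = [map_gradient(w, X[idx], y[idx]) for idx in shards]  # would run in parallel on the cluster
    w -= 0.01 * reduce_gradients(grads)                          # driver applies one update per pass
```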
Training a network on a GPU
This is the right choice if the major computational bottleneck isn't the network size but the size of the training set (though the networks are still generally quite large). Since GPUs excel at applying the same vector/matrix operation across a large number of data points, they are mainly used when you can do batch training with a very large batch size.
Example:
You're training a network to answer questions posed in natural language. You have a huge database of question-answer pairs and don't mind the network only updating its weights every 10,000 questions. With such a large batch size, and presumably a rather large network as well, a GPU-based implementation would be a good idea.
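As an illustration, here is a minimal large-batch GPU training sketch using PyTorch (my choice of library, not something from the question); the layer sizes, the 10,000-example synthetic batch, and the learning rate are all placeholder assumptions:

```python
# Minimal sketch of large-batch training on a GPU with PyTorch.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder architecture: 1000-dimensional input features, 500 possible answers.
model = nn.Sequential(
    nn.Linear(1000, 2048), nn.ReLU(),
    nn.Linear(2048, 2048), nn.ReLU(),
    nn.Linear(2048, 500),
).to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Synthetic stand-in for a batch of 10,000 encoded question/answer pairs.
questions = torch.randn(10_000, 1000, device=device)
answers = torch.randint(0, 500, (10_000,), device=device)

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(questions), answers)  # one forward pass over the whole batch on the GPU
    loss.backward()                            # gradients computed on the GPU
    optimizer.step()                           # a single weight update per 10,000 examples
```

The point of the sketch is the shape of the work: one big matrix operation per layer over the entire batch, which is exactly the kind of computation a GPU handles well.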