What is the difference between dense SIFT and regular SIFT? What are the advantages and disadvantages of each? I'm asking in particular about the VLFeat implementations.
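The obvious difference is that with dense SIFT you get a SIFT descriptor at every location of a regular grid (at every pixel if the grid step is 1), while with normal SIFT you get SIFT descriptors only at the keypoint locations determined by Lowe's detector (extrema of a difference-of-Gaussians scale space).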
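To make the sampling difference concrete, here is a minimal sketch of where a dense extractor places its descriptors. The helper `dense_grid` is hypothetical (it is not a VLFeat function); its `step` is only loosely analogous to the step option of VLFeat's `vl_dsift`, and it assumes each descriptor covers a square patch that must lie fully inside the image.

```python
def dense_grid(width, height, step, patch_size):
    """Centres of patch_size x patch_size patches laid on a regular grid.

    Hypothetical helper: every patch lies fully inside the image and
    neighbouring patches are `step` pixels apart. The number of locations
    depends only on the image size and the step, never on image content.
    """
    centres = []
    for top in range(0, height - patch_size + 1, step):
        for left in range(0, width - patch_size + 1, step):
            # Store the patch centre as (x, y).
            centres.append((left + patch_size / 2, top + patch_size / 2))
    return centres


# A 16x16 image with 8x8 patches every 4 pixels gives a 3x3 grid of centres.
print(dense_grid(16, 16, 4, 8))
```

By contrast, the number and position of keypoints returned by a detector such as Lowe's depend on the image content: a flat image may yield almost none, while a textured one yields many.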
There are many applications where you need non-dense SIFT; a great example is Lowe's original work.
There are also plenty of applications where good results have been obtained by computing a descriptor everywhere (densely); one such example is this. A descriptor similar to dense SIFT is HOG (or DHOG). They are technically not the same thing, but both are conceptually based on histograms of gradients and are very similar.
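The shared idea behind SIFT and HOG, magnitude-weighted histograms of gradient orientations collected over a grid of cells, can be sketched as follows. This is a simplified illustration, not VLFeat's implementation: the function name is hypothetical, and it deliberately omits Gaussian weighting, trilinear interpolation between bins, orientation normalisation and clipping. With the default 4x4 cells and 8 orientation bins it produces 128 values, the same layout as a SIFT descriptor.

```python
import math


def gradient_histograms(patch, cells=4, num_bins=8):
    """Simplified SIFT/HOG-style descriptor for a square grey-level patch.

    patch: list of rows of grey values; its size should be divisible by `cells`.
    Returns cells * cells * num_bins values, L2-normalised.
    Hypothetical sketch: no Gaussian weighting, no interpolation, no rotation.
    """
    size = len(patch)
    cell_size = size // cells
    desc = [0.0] * (cells * cells * num_bins)
    # Skip the border: central differences need both neighbours.
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(dx, dy)
            if mag == 0.0:
                continue
            angle = math.atan2(dy, dx) % (2 * math.pi)  # orientation in [0, 2*pi)
            b = min(int(angle / (2 * math.pi) * num_bins), num_bins - 1)
            cell = (y // cell_size) * cells + (x // cell_size)
            desc[cell * num_bins + b] += mag  # magnitude-weighted vote
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc


# A horizontal intensity ramp: every gradient points along +x (orientation bin 0).
ramp = [[float(x) for x in range(16)] for _ in range(16)]
print(len(gradient_histograms(ramp)))
```

Dense SIFT evaluates a descriptor like this at every grid location, whereas keypoint-based SIFT evaluates it only around detected keypoints, at the keypoint's scale and dominant orientation.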