I'm trying to find the most probable path (i.e. a sequence of states) on an HMM using the Viterbi algorithm. However, I don't know the transition and emission matrices, which I need to estimate from the observations (data).
To estimate these matrices, which algorithm should I use: the Baum-Welch or the Viterbi training algorithm? Why?
In case I should use the Viterbi training algorithm, can anyone provide me a good pseudocode (it's not easy to find)?
Given enough resources, you should probably use the Baum-Welch (forward-backward) algorithm over the Viterbi training algorithm (a.k.a. segmental k-means algorithm), which is an alternative parameter estimation process that sacrifices some of Baum-Welch's generality for computational efficiency. In general, the Baum-Welch algorithm will give parameters that lead to better performance, although there are cases where this is not the case. Here is a nice comparative study.
Furthermore, note that you should use the Baum-Welch algorithm to estimate the parameters of the model. This sets the emission probability and transmission probabilities using something similar to the EM algorithm. After you have trained the HMM, you would then use the Viterbi decoding algorithm to compute the most likely sequence of states which could have generated your observations.
Reference-wise I would recommend Speech and Language Processing, Artificial Intelligence a Modern Approach or this paper