Hybrid Techniques for Training HMM POS Taggers.

Ted Briscoe, Greg Grefenstette
We describe and experimentally evaluate a hybrid technique for training part-of-speech taggers that combines training on small quantities of unambiguously tagged material with maximum likelihood re-estimation over the untagged target corpus. Unlike previous approaches employing re-estimation, this one requires neither skilled manipulation of the model's initial parameters nor sophisticated models of suffix-tag probabilities derived from unambiguously tagged material. We conclude that this technique can yield usefully accurate taggers for several languages, but that the conditions required for success are difficult to state precisely.
Technical report MLTT-007 (May 1994)