Pseudo-lemmatization in Croatian-English SMT
Last modified: 2014-07-16
Abstract
One of the first difficulties in conducting a thorough analysis of statistical machine translation involving Croatian, a morphologically rich and resource-poor language, is the lack of quality language resources. This paper presents the results of two standard fourteen-feature Croatian-English phrase-based statistical machine translation systems. Prior to building the second system, the Croatian parts of the training and test sets are partially pseudo-lemmatized in an attempt to simplify the translation process. In addition to automatic evaluation, manual evaluation is conducted to gain insight into the nature of the translation differences between the two systems.