Analysis of the Impact of Machine Translation Evaluation Metrics for Semantic Textual Similarity
Simone Magnolini, Ngoc-Phuoc-An Vo, Octavian Popescu
We evaluate the hypothesis that automatic evaluation
metrics developed for Machine Translation (MT) systems have a significant
impact on predicting semantic similarity scores in the Semantic Textual Similarity
(STS) task, in light of their use for paraphrase identification. We show
that different metrics may have different behaviors and significance along the
semantic scale [0-5] of the STS task. In addition, we compare several classification
algorithms using a combination of different MT metrics to build an
STS system; consequently, we show that although this approach obtains remarkable
results in the paraphrase identification task, it is insufficient to achieve
the same results in STS. We show that this problem is due to excessive
adaptation of some algorithms to the dataset domain, and finally we present a way to
mitigate or avoid this issue.
15th International Conference of the Italian Association for Artificial Intelligence, Genoa, Italy, November 28-December 1, 2016.