Although this may intuitively seem a good method of evaluation, it has been shown that round-trip translation is a “poor predictor of quality”. The reason is reasonably intuitive: a round-trip translation tests not one system but two, namely the engine's language pair for translating into the target language and the language pair for translating back from the target language.
Consider the following examples of round-trip translation, one from English to Italian and one from English to Portuguese, taken from Somers (2005):
Original text: Select this link to look at our home page.
Translated: Selezioni questo collegamento per guardare il nostro Home Page.
Translated back: Selections this connection in order to watch our Home Page.

Original text: Tit for tat
Translated: Melharuco para o tat
Translated back: Tit for tat
In the first example, where the text is translated into Italian and then back into English, although the English text is significantly garbled, the Italian is a serviceable translation. In the second example, although the text that is translated back into English is perfect, the Portuguese translation is meaningless.
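The two cases can be replayed in a minimal Python sketch. The lookup tables below are hypothetical stand-ins for the forward and backward engines and contain only the strings quoted from Somers (2005); the function names `round_trip` and `round_trip_matches` are illustrative assumptions, not part of any real translation system. The sketch shows that a check comparing the back-translation with the original judges the composition of two engines and says nothing directly about the intermediate translation.

```python
# Toy stand-ins for two MT engines. The tables only replay the examples
# quoted from Somers (2005); they are not real translation systems.
FORWARD = {
    ("en", "it"): {
        "Select this link to look at our home page.":
            "Selezioni questo collegamento per guardare il nostro Home Page.",
    },
    ("en", "pt"): {
        "Tit for tat": "Melharuco para o tat",
    },
}

BACKWARD = {
    ("it", "en"): {
        "Selezioni questo collegamento per guardare il nostro Home Page.":
            "Selections this connection in order to watch our Home Page.",
    },
    ("pt", "en"): {
        "Melharuco para o tat": "Tit for tat",
    },
}


def round_trip(text, src, tgt):
    """Return (forward translation, back-translation) for the toy engines."""
    forward = FORWARD[(src, tgt)][text]      # system 1: source -> target
    back = BACKWARD[(tgt, src)][forward]     # system 2: target -> source
    return forward, back


def round_trip_matches(text, src, tgt):
    """Naive round-trip check: does the back-translation equal the original?"""
    _, back = round_trip(text, src, tgt)
    return back == text


for text, tgt in [("Select this link to look at our home page.", "it"),
                  ("Tit for tat", "pt")]:
    forward, back = round_trip(text, "en", tgt)
    print(tgt, "| forward:", forward, "| back:", back,
          "| match:", round_trip_matches(text, "en", tgt))
```

Under this naive check the serviceable Italian translation fails and the meaningless Portuguese translation passes, which is the opposite of the actual quality of the two forward translations.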
While round-trip translation may be useful for generating a “surplus of fun”, the methodology is deficient for any serious study of the quality of machine translation output.