Quasi instant voice translation

Real-time translation (in practice, “near real-time”) performed by specialized software is a process that is not yet operational. It requires mastering three main steps at once, each carried out for every group of spoken words within a few tenths of a second and with high quality:

  1. voice recognition and discourse analysis,

  2. translation (with use of a translation memory and rules)

  3. rendering through speech synthesis, possibly drawing on automatic text generation techniques (some metaphors, comparisons or expressions have a clear meaning in one language but no possible literal translation into others).
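The three steps above can be sketched as a small pipeline. This is only a toy illustration, not a working translator: the function names, the tiny `TRANSLATION_MEMORY` dictionary and the 0.5 second budget are all assumptions made for the example, and each stage is a stand-in that treats text bytes as if they were audio.

```python
import time
from typing import Dict, Tuple

# Hypothetical translation memory: segments that were translated before.
TRANSLATION_MEMORY: Dict[str, str] = {
    "good morning": "bonjour",
    # An idiom with no literal translation is stored as a whole segment.
    "it is raining cats and dogs": "il pleut des cordes",
}


def recognize(audio: bytes) -> str:
    """Stand-in for step 1: pretend the audio bytes are UTF-8 text."""
    return audio.decode("utf-8").strip().lower()


def translate(text: str) -> str:
    """Stand-in for step 2: look up the memory and flag unknown segments."""
    return TRANSLATION_MEMORY.get(text, f"[untranslated: {text}]")


def synthesize(text: str) -> bytes:
    """Stand-in for step 3: pretend UTF-8 bytes are synthesized speech."""
    return text.encode("utf-8")


def translate_chunk(audio: bytes, budget_seconds: float = 0.5) -> Tuple[bytes, float, bool]:
    """Run the three steps on one group of words and time the whole chain.

    Returns the output "speech", the elapsed time in seconds, and whether
    the chunk was processed within the (assumed) latency budget.
    """
    start = time.perf_counter()
    speech = synthesize(translate(recognize(audio)))
    elapsed = time.perf_counter() - start
    return speech, elapsed, elapsed <= budget_seconds


if __name__ == "__main__":
    speech, elapsed, on_time = translate_chunk(b"It is raining cats and dogs")
    print(speech.decode("utf-8"), f"{elapsed:.6f}s", on_time)
```

Timing the whole chain rather than each stage reflects the constraint stated above: what matters to the listener is the total delay per group of words.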

In real time (or near real time), the system should ideally also report an estimate of the computed quality of the translation. This is measured by the word error rate (WER), the standard unit for measuring the performance of a speech recognition system.
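As an illustration, WER is commonly computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, using a word-level edit distance. The sketch below assumes simple whitespace tokenization and exact, case-sensitive word matching; the function name is chosen for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One word ("the") is missing from the hypothesis: 1 error over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.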

When in doubt, a spoken word can be overlaid on the output to tell the listener that several translations are possible. An audible or visual signal can indicate that a translation is likely to be good.
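One possible way to drive such a signal is sketched below. It assumes the translation step returns candidate translations with probabilities; the thresholds (`high`, `margin`) and the three signal names are arbitrary values chosen for illustration, not part of any described system.

```python
from typing import Dict, List, Tuple


def confidence_signal(
    candidates: Dict[str, float],
    high: float = 0.8,
    margin: float = 0.2,
) -> Tuple[str, str, List[str]]:
    """Pick the best candidate and decide which signal to give the listener.

    Returns (best translation, signal, close alternatives), where signal is
    "good", "ambiguous" (several translations are plausible) or "uncertain".
    """
    if not candidates:
        raise ValueError("at least one candidate translation is required")

    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    best_text, best_prob = ranked[0]

    # Alternatives whose probability is within `margin` of the best one.
    alternatives = [text for text, prob in ranked[1:] if best_prob - prob < margin]

    if alternatives:
        signal = "ambiguous"
    elif best_prob >= high:
        signal = "good"
    else:
        signal = "uncertain"
    return best_text, signal, alternatives


if __name__ == "__main__":
    # English "bank" could be rendered in French as "banque" or "rive".
    print(confidence_signal({"banque": 0.45, "rive": 0.40, "banc": 0.15}))
```

An "ambiguous" result would correspond to the spoken overlay, while "good" would trigger the audible or visual indication of a probable good translation.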

Real-time translation was long regarded as technically impossible, given the computing technology and software available.


Accents, individual variations in pronunciation, dialects, idioms, nicknames, individual neologisms, ellipses, continuous speech in a monotone or sung, stuttering, and other idiosyncrasies were previously considered insurmountable barriers to good quasi-instant translation of a voice exchange, even with the best tools in computational linguistics and machine translation.

The process can consume a great deal of computing power, requiring new data centers to process the flow of voice data and increasing the consumption of electricity and bandwidth.

Ongoing projects

In early 2009, a project was underway in Japan to equip a mobile phone with an automatic multilingual translator. The project initially aims to display on the phone screen, within seconds, the translation of simple words and sentences spoken in Japanese or other languages, and to do so autonomously, that is, without depending on a server.

On February 7, 2010, Google announced an almost instant voice translation application (speech-to-speech translation). According to a Times article, Google is preparing to integrate into a mobile phone a voice recognition system coupled with automatic translation. Franz Och, head of translation at Google, nevertheless said that the system should only work properly in a few years. He believes the mobile phone is well suited to producing the translation “output” because it is a priori more likely to recognize, and eventually “learn”, the voice and language of its owner or frequent speaker (as long as that speaker does not have a heavy cold, is not drunk or injured, and is not drowned out by ambient noise). Google benefits from the experience of its online translation service (which in early 2010 translated written text, more or less well, for 52 languages). Google could also record and characterize the voices of laptop users when they make voice queries on its search engine, which would make it easier for the translation system to understand the speaker. It would even be theoretically possible to imitate the tone of voice, or the feelings it expresses (anger, for example), when rendering the translation through speech synthesis. Google is also well placed to draw on its huge database of web data and translated documents.


The perfect universal translator is still a long way off, but various direct or derived uses of voice translators seem plausible for the years and decades to come, especially with collaborative improvement techniques that could ease their integration into office suites:

  • Live subtitling (for the deaf and hard of hearing, on television or film, or in special glasses, for example).
  • Translated subtitling generated from the soundtrack of a video recording or from a dictaphone recording.
  • In the same room, or during a guided tour, different listeners could each hear the same speaker or commentator in their own language through a headset or headphones.
  • Speech assistance (via a mobile phone or a dedicated translator for people with speech disabilities).
  • Possible malicious use: ultimately, there is a risk that a person’s timbre, tone and voice could be reconstructed well enough to simulate that voice and misuse it.
  • Depending on how it is used, such a translator could both curb and facilitate language learning, and possibly encourage the persistence of rare and ancient indigenous languages (provided the translator can handle them; some of these languages have been relatively well studied by ethnologists and linguists). A first positive effect could be better diction and better-constructed sentences from users who want their translation software to make as few mistakes as possible.
  • One can imagine hearing, in one’s own language and nearly live, a text written in a dead language (Latin and Greek in particular), provided it can be “read” by handwriting recognition software.
