Microsoft’s real-time speech translation

At the inaugural Code Conference in California, CEO Satya Nadella revealed that Microsoft’s real-time speech translation technology will finally make the jump from the mystical, bottomless pit of its R&D department to a consumer product: Skype.

On stage at the conference, Nadella demoed a beta version of Skype Translator, which performed real-time translation of English to German speech, and vice versa. Skype Translator isn’t perfect, but it’s tantalizingly close to the creation of a Star Trek-like universal translator — or Babel fish if you prefer — that allows everyone in the world to communicate, even if they don’t share a common language.
We first saw Microsoft’s speech translation tech back in 2012, when Microsoft Research’s Rick Rashid translated his own English speech into Spanish, Italian, and Mandarin. We saw it again in November of that year, but since then Microsoft has been fairly quiet. Now we know why: Microsoft has been trying to squeeze the technology into Skype.
When real-time speech translation was first demoed in 2012, it actually used the speaker’s voice in the translations: it would convert English into German but keep the speaker’s accent, timbre, and intonation. This was some seriously impressive tech that essentially reverse engineered your voice into a series of phonemes (individual sounds), and then used that information to reconstruct your voice in a new language, in near-real-time. Presumably this technique required too much processing power, and so now we just get generic computer speech in the style of Microsoft Sam and Microsoft Anna.
While the Skype Translator demo wasn’t quite as awesome as we’d hoped, the lack of accent and timbre is only a minor quibble. The potential for real-time speech translation in education, business, diplomacy, and multilingual families is huge. Just by downloading a new version of Skype, Western companies could start doing business with companies in China and other huge growth markets. And there’s no reason Microsoft would reserve this tech just for Skype: a real-time speech translation app for Windows Phone would be pretty useful for travel…