I was reading a promo for an upcoming Speech Technology conference that was positively gushing about recent developments in spontaneous speech transcription. One application is voice mail transcription, in which voice mail messages are transformed into text and sent to subscribers as an SMS. Another is transcribing multimedia audio content for indexing to form the basis of a search application.
Along with the enthusiasm there was also this disclaimer: “spontaneous speech includes a very large vocabulary of words, hesitations and unstructured speech. As such, speech recognition performance is lower than domain-specific applications in which the vocabulary size is lower and language models can be used.”
So, I wondered, where is the advancement? It seems to me that despite all of the gains made in processing power in the last decade, we are not much further along than we were when speech recognition products such as ViaVoice and NaturallySpeaking first appeared more than ten years ago.
What do you think? When is speech technology going to break through as a practical and reliable workaday tool?