Google Creates a Text-to-Speech AI System That Sounds Like a Human Voice
Posted By : Manisha Jangwal | 12-Jan-2018

Google has taken a big step towards its ‘AI-first’ dream. The tech giant has developed a text-to-speech system with almost human-like articulation. The AI system, called "Tacotron 2", can deliver computer-generated speech in a human-sounding voice. Google researchers mentioned in a blog post that the new approach does not use complex linguistic and acoustic features as input. Instead, it generates human-like speech from text using neural networks trained only on speech examples and their corresponding text transcripts.
Google’s CEO Sundar Pichai announced at the Google I/O 2017 developers conference that the company would be shifting its focus from mobile-first to AI-first. In line with that shift, it has come up with many products and features, including Google Lens, Smart Reply for Gmail and Google Assistant for iPhone.
According to a paper published on arXiv.org, the human-like text-to-speech system first generates a spectrogram from the text, a graphical representation of how the speech should sound. Next, this visual is passed through Google’s WaveNet algorithm, which uses it to produce the audio and brings AI closer than ever to imitating human speech. The system can also learn different human voices and even produce artificial breaths. Google’s researchers also assessed the voices produced. For the evaluation, they asked human listeners to rate the generated speech on how natural it sounded. The system received a mean opinion score (MOS) of 4.53, comparable to the MOS of 4.58 given to professional recordings.
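To make the rating step described above more concrete, here is a minimal Python sketch of how a mean opinion score can be computed from listener ratings. It assumes ratings on a 1-to-5 scale and uses a simple normal-approximation confidence interval. The function name `mean_opinion_score` and the sample ratings are made up for illustration; they are not taken from Google's paper or its data.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% confidence half-width) for listener ratings on a 1-5 scale."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mos = mean(ratings)
    # Normal-approximation 95% interval: 1.96 times the standard error of the mean.
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for illustration only; these are NOT the study's data.
sample_ratings = [5, 4, 5, 4.5, 4, 5, 4.5, 5]
mos, half_width = mean_opinion_score(sample_ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

A MOS is simply the average of many such ratings, which is why two systems with scores as close as 4.53 and 4.58 are described as comparable: a small gap can fall within the spread of the listeners' ratings.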
Over the past few years, research into text-to-speech technology has advanced considerably, and several companies are working on it. With its audio samples, Google showed that "Tacotron 2" can work out from context the difference between the noun "desert" and the verb "desert," and change its pronunciation accordingly.
According to the paper, the system can put emphasis on capitalised words and apply the proper inflection when asking a question rather than making a statement. The company’s engineers did not share many further details, but they gave developers hints about how far they have come in building this framework.
The system has some problems too. It has trouble pronouncing complex words such as ‘decorum’ and ‘merlot’. In rare cases, it can randomly generate strange noises. The system also has difficulty generating audio in real time.
About Author
Manisha Jangwal
Manisha works as a Content Writer. She enjoys elaborating on minor details with a plethora of information. Her hobbies are going out, exploring new things and listening to music.