sofer.com » blog » tech
22 February 2009

Text to voice from a web page

Text-to-voice software and synthesised voices are getting to the point where they are really quite impressive and certainly usable. I recently discovered Acapela’s range of multilingual voices and I want to use them. However, accessing a computer’s voices from within a web page is not entirely trivial.

My first attempt to do so involved playing with Fire Vox and CliCk, Speak, developed by Charles Chen, but these were no good for my purposes, because I need control of what is spoken and when. These two Firefox extensions are designed for web users, not web developers.

I then discovered outfox, which is “a Firefox extension that allows in-page JavaScript to access local platform services and devices such as text-to-speech synthesis”. Exactly what I want. However, as of February 2009 it is still unstable and accented characters cause it to crash.

Eventually, outfox is probably going to be a good solution, but another way of doing is to create a web service. So, instead of using the browser to control the desktop directly, I could send JSON requests from within the application to an HTTP server sitting on my computer, that would then handle the text-to-speech stuff.

So, I created a server using WEBrick and ruby along with jQuery (specifically, a JSONP callback using jQuery.ajax). The ruby script which acts as the server is only a few lines of code and the javascript in my web application is even shorter. this setup gives me full access to all the intalled system voices on my Mac via a system call to the OS X say command.

This is now working quite nicely.

The trickiest bit was getting jQuery to pick up a response from my script. Because I was not familiar with jsonp and the way it is implemented in jQuery, I didn’t understand that I needed to wrap my json response in a callback that looked something like json123456789(...), the exact value of which can be got from the ‘callback’ parameter in the query sent from jQuery. It was confusing at first.

Although the script is specific both to Macs in general and my installed voices in particular, it would be easy enough to implement something similar on any platform with Ruby installed.

The one piece missing from this jigsaw is that on OS X it does not appear to be possible to associate a language with a voice, even though the System Voices in the System Preferences are associated with the appropriate national flag.

It would be nice to be able to type say -l french bonjour in a console window, instead of say -v Julie bonjour.