2012-01-18

Embedding text-to-speech into HTML5 games

I admit that the title is a bit misleading, but it better describes the aim of my article. This write-up offers one solution, with code snippets, to the problem of embedding Google's TTS service into an HTML5 page.

The unofficial Google TTS "API" lets you turn text to speech in well over 20 languages. It is a part of Google Translate, and works with text less than 100 characters long. So - how can you technically make use of this service to provide a better user experience?
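Because of the length limit, longer text has to be cut into pieces before it is sent. Here is a minimal sketch of one way to do that in JavaScript. The function name splitForTts and the default limit are my own assumptions for illustration, not part of the service; it breaks at spaces where it can and hard-cuts any single word that is too long.

```javascript
// Hypothetical helper: split text into pieces shorter than maxLen characters.
// The default of 100 reflects the limit mentioned above; adjust it if needed.
function splitForTts(text, maxLen = 100) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  const chunks = [];
  let current = "";
  for (let word of words) {
    // A single word that is too long on its own is cut into hard slices.
    while (word.length >= maxLen) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(word.slice(0, maxLen - 1));
      word = word.slice(maxLen - 1);
    }
    if (!word) continue;
    // Try to append the word to the piece being built.
    const candidate = current ? current + " " + word : word;
    if (candidate.length < maxLen) {
      current = candidate;
    } else {
      // The piece is full: store it and start a new one with this word.
      chunks.push(current);
      current = word;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each returned piece can then be sent as its own request and the resulting clips played one after another.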

I'll do my best not to ponder the game design aspects of using TTS in games in general, but I will say a few words about whether the API in question is meant for public use at all - regardless of it being reachable.

It's at http://translate.google.com/translate_tts if you haven't tried it yet. Try clicking here to NOT hear it in action: (Update: You sometimes do! It seems inconsistent so far.)
Now try copy/pasting the same URL into your browser:
http://translate.google.com/translate_tts?tl=en&q=the%20brown%20fox%20jumped%20over%20the%20lazy%20dog.

There is a reason why you (probably) don't hear the speech when clicking on the link. I'll get to that in a minute. First a recap of the service and its dubious availability for public use.

Google has not been very strict about protecting this access point. I recall reading a blog post about a TTS API for Android that referred to people using this service for their mobile TTS apps as "clever". But then again, there is no official API and the service is clearly intended for internal use only. Although Google has so far not acted against developers making use of this technology, the legal aspects of doing so remain unclear.

Decide for yourself whether you want to give it a try. One thing is for certain - without a public API there is no guarantee that future versions of the service will stay compatible with the current one, nor that it will be available at all tomorrow. Client-side applications should be especially wary of this, and mobile apps even more so: if your app breaks because Google changes, e.g., what the request must look like, getting an update out can take several days.

Google actually did do something that keeps most hotlinkers away, which brings me back to why you can't hear the speech when clicking a link. If your browser sends a Referer header with any value other than an empty string (meaning it tells the service which page you clicked the link on), the service returns a 404 (Not Found) HTTP error. This does not happen when you click a Favorites link or copy/paste the link into your browser, because there is no referring page for that request. This might be a good indication of what Google's original intent with this service was. I also wouldn't be surprised if the service turned out to be available from domains under Google's control.

Either way - the tricky part in getting this to work is that while JavaScript can set headers on AJAX requests, browsers will always override your Referer setting. To the best of my knowledge there is currently no client-side-only solution. However, what you can do is set up a gateway service that does nothing but clear the Referer header and forward the content to your HTML5 page.

So just to be clear, let's see some code. Create a new HTML5 page with the source:

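<!DOCTYPE html>
<html>
<head>
   <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
</head>
<body>
   <audio controls="controls" autoplay="autoplay" style="display:none;">
      <source src="http://translate.google.com/translate_tts?tl=en&q=the%20brown%20fox%20jumped%20over%20the%20lazy%20dog." type="audio/mpeg" />
   </audio>
</body>
</html>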

The doctype is the official HTML5 doctype. Since IE9 can still fall back to a compatibility mode on some pages, I also included a meta tag that tells IE we're expecting it to use its most recent rendering mode instead. In the body there are the audio and source tags - the new, convenient way to play audio embedded into a web page. This usually comes with a browser-specific player, which I hid by hiding the audio control itself via CSS.

The source tag defines what sound we want to load and play. I am passing the URL discussed above and setting the content type of the source to MPEG audio.

Put this code into an HTML file and open it. You won't hear a thing. If you check the error codes through the debugging tools of your browser, you will see that the request for the URL fails as if the file were not there.

Now, consider the following PHP script:

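<?php
// Gateway: fetch speech from the TTS endpoint with an empty Referer and relay it.

// Read the language and text from the query string, with safe defaults.
$tl = isset($_GET["tl"]) ? $_GET["tl"] : "en";
$q  = isset($_GET["q"]) ? $_GET["q"] : "";

$qs = http_build_query(array("ie" => "utf-8", "tl" => $tl, "q" => $q));
// Send the request with an empty Referer header.
$ctx = stream_context_create(array("http" => array("method" => "GET", "header" => "Referer: \r\n")));
$soundfile = file_get_contents("http://translate.google.com/translate_tts?" . $qs, false, $ctx);

// Report a gateway error if the upstream request failed.
if ($soundfile === false) {
    header("HTTP/1.1 502 Bad Gateway");
    exit;
}

header("Content-Type: audio/mpeg");
header("Content-Transfer-Encoding: binary");
header("Pragma: no-cache");
header("Expires: 0");

echo $soundfile;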

Put that on a server that runs PHP. I'll refer to it as http://www.example.com/ttsgateway.php. The script fetches the audio from the access point and passes it straight on to whoever requested it. But before it requests the data using file_get_contents, the Referer header is set to an empty string.

Now if you change the original source URL in our HTML audio element to http://www.example.com/ttsgateway.php?tl=en&q=the%20brown%20fox%20jumped%20over%20the%20lazy%20dog. then thanks to autoplay you should hear the speech right after loading the page. (I used a fictitious address here, so copy/pasting this won't work.)
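In a game you will usually want to trigger speech from script rather than from static markup. The following is a rough sketch of how that could look, assuming the gateway lives at the fictitious address used above. The names GATEWAY_URL, buildGatewayUrl and speak are made up for this example.

```javascript
// Assumed gateway location; replace with wherever your script actually lives.
const GATEWAY_URL = "http://www.example.com/ttsgateway.php";

// Build the gateway URL for a language code and a piece of text.
// encodeURIComponent escapes spaces and other reserved characters.
function buildGatewayUrl(lang, text, gateway = GATEWAY_URL) {
  return gateway + "?tl=" + encodeURIComponent(lang) + "&q=" + encodeURIComponent(text);
}

// Play the speech in a browser. Where there is no Audio constructor
// (for example outside a browser) this only returns the URL.
function speak(lang, text) {
  const url = buildGatewayUrl(lang, text);
  if (typeof Audio !== "undefined") {
    const player = new Audio(url);
    const result = player.play();
    // Newer browsers return a promise that rejects if playback is blocked.
    if (result && typeof result.catch === "function") {
      result.catch(function () { /* playback was refused; ignore here */ });
    }
  }
  return url;
}
```

Calling speak("en", "the brown fox jumped over the lazy dog.") requests the same gateway URL as the one shown above. Keep in mind that some browsers only allow playback in response to a user action such as a click.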

If you use this service for one thing or another, please be fair. Google seems to be fair by leaving it out in the open for the wolves, so I think it's only fair that we don't overuse it. Try caching your audio files and be reasonable about the quantity of content you want to turn from text to speech. If your use case allows it, perhaps save the generated files on the server where your gateway resides and reuse those instead of making a new query to the service for the same text each time.
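Caching can also happen on the client, so that a phrase the game repeats is only requested once per session. This is just a sketch of the idea, not a replacement for caching on the server. Both createTtsCache and the fetchAudio callback are hypothetical names; fetchAudio stands for whatever function you use to load the audio for a given language and text.

```javascript
// Hypothetical in-memory cache with one entry per language/text pair.
// fetchAudio is supplied by the caller; it receives (lang, text) and
// returns the audio, or a promise for it.
function createTtsCache(fetchAudio) {
  const cache = new Map();
  return function getAudio(lang, text) {
    // JSON.stringify gives an unambiguous key for the pair.
    const key = JSON.stringify([lang, text]);
    if (!cache.has(key)) {
      cache.set(key, fetchAudio(lang, text));
    }
    return cache.get(key);
  };
}
```

If fetchAudio returns a promise, the promise itself is what gets stored, so two requests for the same phrase made in quick succession still result in a single query.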

If you found this useful or have any thoughts on the topic please leave a comment. Thanks!
