Generating thousands of dialogue files
This post describes how we generated thousands of voice files to test Alto, the new audio localization tool released by Tsugi.
So many files, so little time
With the explosion of content in AAA games, it is not rare to end up having thousands or even tens of thousands of dialog lines in a project once they are all translated. Some lines may be recorded internally, others some by external studios or localization agencies. But how do you manage quality control? Comparing all these dialog lines manually would be cumbersome, error-prone, and actually pretty much impossible without an army of little elves with multilingual skills.
Enter Alto. Alto is an audio localization tool (see how clever?) that compares these thousands of dialogue lines in several languages automatically, finding missing or misnamed files, checking their audio format (sample rate, channels and bit depth), their duration (including leading and trailing silence), their levels (average and peak) or even their timbre. Alto can get the data directly from your folders (with files organized in subfolders, having prefixes or suffixes…) or by reading the files referenced by an Excel sheet or even by importing a game audio middleware project (ADX2, FMOD and Wwise are all supported). Here are a couple of screenshots:
Once the analysis done, you can give a nice PDF / Excel / HTML report to your manager or client, showing the quality of you work. Or go back to check all the errors You can also get an XML file so third party tools could read it and automatically mark dialogue as final or placeholder for example.
It’s incredible the number of small – and sometimes bigger- errors the studios who beta-tested Alto have found in their older projects. That does not mean their work was bad, but simply that it is nearly impossible to get perfect localized dialogue in each language on big projects without some kind of automated process to test everything.
Testing Alto meant that we needed thousands of audio files in many languages. This kind of data is quite hard to get. Asking files from old projects to some of our clients would have taken time due to IP concerns. Of course, we planned to have beta-testing with game studios, but we at least needed something to help us develop the program, before it even reached the beta-testing stage.
It’s not that uncommon in the game industry to generate placeholder dialogue lines with a speech synthesizer. So I decided to do the same with the test files and to write an application which would create all these files for us.
This application takes a text file as input. It starts by segmenting it into sentences based on the punctuation marks. Then I use the Microsoft Speech Synthesizer to generate the audio files for the reference language. The next step is to send the sentences to the Google Translate API to be translated in order to synthesize them in other languages. It turns out that the Google Translate API is not free anymore (alternatives are not free either or of lesser quality). Even for a few files, you need to register and enter your credit card information, you need an API key etc… However, Google’s interactive translation web page still makes callbacks using public access without an API key. By giving a look to these AJAX calls and sending them directly from code it is possible to get your translation done (you just need to parse out the JSON returned).
Since we are sending the text sentence by sentence and not as a big chunk of text, we don’t have a problem with the size limit imposed by Google’s translation web page. Of course this also means that we make many more calls, which could be a problem speed-wise. Fortunately, it ends up being very fast. Having now translated the text in various languages, I sent it to the corresponding voices I downloaded for the Microsoft Speech Synthesizer. You can find 27 of them here! (On that topic, make sure that the voices you are downloading are for the version of the synthesizer you have installed).
Finally, I needed to introduce some errors to make sure Alto was doing its job correctly. The Microsoft Speech API allows us to choose different types of voices as well as different rates and volumes of speech, which was an easy starting point to add some variations. In addition, the program re-opens some of the files generated and saves them in a different format, with a different number of channels, sample rate or bit-depth. Some files are also renamed or deleted to simulate missing and misnamed files. I used some existing Tsugi libraries to add some simple audio processing to stretch the sounds, add silence at the beginning or at the end, as well as to insert random peaks. To test the spectral centroid, the application also boasts a few filters.
The user interface of our little tool allows us to select the voices (and therefore the corresponding languages) we want to generate dialogue for, as well as a percentage for each type of error we want to insert. Playing with the percentages allows us to test various cases from one or two tiny mistakes lost in thousands of files to all files having multiple errors. The program also generates a log file so we can see which errors were added to which files and check that they are actually detected in Alto.
Here is a picture of AltoTestGen:
Not very pretty but it did the job! All in all, after just a couple of hours of programming, we had thousands of lines in about 20 languages and could generate thousands of new ones in a matter of minutes. Although the voice acting is definitely not AAA , these files are easy to recognize and perfect for our tests: mission accomplished!