Automating the localization process

June 6, 2000
by Christopher Schmidt

Anyone who has ever had the pleasure of localizing a large multimedia production into several languages has either been transformed into a mindless cut-and-paste drone, or been quickly frustrated at the tediousness of the process and sought out an automated solution to the problem.

There is, I'm sure, more than one way to skin the localization cat. In the end, it boils down to collecting all text used in the project, passing it on to translators, and then replacing the original text with the translated text. This article describes the approach I took to tackling the problem, including the programming strategy and the Xtra and database used. As the Director sample movie and the database described in the article are available for download, I've included very little of the Lingo used in the project in the article.

Overview

Before getting started, an overview of the localization procedure might be helpful. Knowledge or information transfer in a multimedia production occurs, by definition, through several types of media -- graphic, text, audio and video. With the exception of text, most media can be decoupled from the localization process. Graphics and video need not contain any language-specific elements. The audio aspect should require no more effort than swapping external audio files. And with a relatively small presentation, or a production in which the text members contain little or no formatting, localizing text members need not involve more than switching to a cast containing the members for the given language.

When confronted with dozens of Director movies, numerous external casts, and hundreds or thousands of tediously formatted text members, the thought of spending a bit of time devising a method that exports all text members to a database format easily handled by a translator and imports the translated text back into the respective locations, retaining all formatting in the process, becomes very attractive.

The approach discussed here involves creating, in Peter Small's words, a couple of "intelligent agents" to handle the dirty work. One is used to seek out text members and, once found, write the content to a database. The other also seeks out text members, but reads the database and replaces the original text with the respective translation. This takes place, logically enough, in two phases. The objective of the first phase is to collect all text used in the project and store it in a database. In addition to the text, several other pieces of information need to be written to the database in order for the second agent to be able to place the translated texts in the respective text members with the appropriate formatting.

The method should work with any Director project, provided that Director 7 text cast members (not to be confused with field-type cast members) are used, and that each text member is named and is named uniquely.

Whether the text members are in a single external cast or scattered about in numerous internal and external casts plays no role in the method presented here. However, it is advantageous from an organizational standpoint to keep the text members in as few external casts as possible -- ideally one external cast.

Choice of Database and Xtra

There are several powerful databases and database Xtras available on the market, and each has its merits. Because the database to be created here needs to double as an interface for the translators, I have chosen the Access database and, obviously, the DataGrip Xtra. Access is readily available in most translation agencies and offers a comfortable environment in which to translate text. Should Access not be available to the translator, an equivalent Excel table can easily be created. From a programming standpoint, DataGrip is able to handle all read and write operations to and from the database used by our agents with just a few lines of code. The database itself contains a single table with several fields and a few SQL queries to handle the database operations.

The discussion that follows is based on the code contained in the sample movie. While this means putting any hope of presenting a more general localization concept on the back burner, I hope that the approach is general enough to be easily adapted to various applications.

About the database

Before the movie can start, we need to have a place to put the text once we've found it. As mentioned above, an Access database, textMaster.mdb, was created for this purpose. The database is simple, containing a single table with 8 fields:

id_memberKey: the master key for the database;
d_movieName: if the member is located in an internal cast, the name of the movie in which the cast is contained is entered. If the given text member is located in an external cast, the value "common" is entered instead;
d_memberNum: member number of the given text member;
d_castName: name of the cast in which the text member is located;
d_status: Boolean flag indicating whether the translated text has been read out of the database and the respective text member overwritten with the new contents;
d_textMemberName: name of the given text member;
d_text: text to be translated;
d_ubersetzung: field in which the translation is to be performed. This is also the field that contains the text that is read out following translation of the database, and which will replace the original text in the Director movies.
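For readers who want to experiment with the table layout without Access, the following is a minimal sketch in Python using SQLite as a stand-in. The column names are taken from the field list above and from the d_status field described in Step 3; the table name text_master, the column types and the sample row are assumptions made for the sketch, not the contents of textMaster.mdb.

```python
import sqlite3

# Stand-in for the single table in textMaster.mdb. Column names follow the
# article; the table name, the types and the constraints are assumptions.
SCHEMA = """
CREATE TABLE text_master (
    id_memberKey     INTEGER PRIMARY KEY AUTOINCREMENT,
    d_movieName      TEXT NOT NULL,               -- movie name, or 'common' for external casts
    d_memberNum      INTEGER NOT NULL,
    d_castName       TEXT NOT NULL,
    d_status         INTEGER NOT NULL DEFAULT 0,  -- 0 = translation not yet written back
    d_textMemberName TEXT NOT NULL UNIQUE,        -- the method relies on unique member names
    d_text           TEXT NOT NULL,
    d_ubersetzung    TEXT                         -- filled in by the translator
)
"""

def create_database(path=":memory:"):
    """Create an empty text database and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

conn = create_database()
# Hypothetical sample record for a text member in an external cast.
conn.execute(
    "INSERT INTO text_master "
    "(d_movieName, d_memberNum, d_castName, d_textMemberName, d_text) "
    "VALUES (?, ?, ?, ?, ?)",
    ("common", 12, "texts", "intro_headline", "Welcome"),
)
```

The UNIQUE constraint on d_textMemberName simply makes the naming requirement stated earlier explicit at the database level.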

The sample movie communicates with the database using the DataGrip Xtra and four SQL statements that are contained in the Access database:

q_CastName: used to search records by cast name, e.g. when checking whether a given cast member has been registered in the database;
qNewText: used when writing new records to the database;
q_dg_status: used to check whether a given text has been read out of the database and written to the appropriate Director movie (for text members located in internal casts);
q_dg_status_ext: used to check whether a given text has been read out of the database and written to the appropriate Director movie (for text members located in external casts).

Step 1: Text search

Once the final version of the Director project has been created, the agents can go to work. The sample movie contains the code for instantiating the objects and communicating with the database. This file needs to be copied into the folder containing the open Director files to be inspected. Also required is the database into which the texts are to be written. An empty copy of the Access database is also available here for downloading. In addition, the DataGrip Xtra, available from www.datagrip.com, needs to be installed on the computer. As the DataGrip Xtra is only available for Windows, the sample movie will not work on a Mac.

All Director movies contained in the given directory as well as all external casts linked to those movies will be searched for text members. If, for whatever reason, Director movies are located in different layers in the directory structure, the procedure described here should be performed at each directory level, i.e. a separate database should be created for each level.

To begin, open the sample movie. The user interface consists of two buttons: SEARCH, for searching the Director files for text cast members and writing the information to the database, and REPLACE, for reading the database and writing the translated texts back to their respective locations in the Director files.

Overview of the search routine

When the program, which is run from authoring mode, is started for the first time, a list of all Director movies contained in the given directory is created. In addition, two objects are instantiated: one for searching the movies and one for replacing text members. The objects are "set free" by clicking on the respective buttons.

When the SEARCH button is clicked, the search object begins working its way through the list of Director movies created when the program was started. Using Director's go movie command, the object reads the first movie name out of the list and takes the show on the road, opening each movie in succession. Once inside a movie, the object assembles a property list of the casts linked to the movie, where the property here is the name of the cast and the value specifies whether the given cast is external or internal:

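-- Property list: cast name -> "INTERNAL" or "EXTERNAL"
returnList = [:]

repeat with x = 1 to the number of castLibs

  -- A cast is treated as internal if its file name is empty or is the movie file itself
  case (the fileName of castLib x) of
    EMPTY, the moviePath & the movieName:
      returnList.addProp(string(castLib(x).name),"INTERNAL")
    otherwise
      returnList.addProp(string(castLib(x).name),"EXTERNAL")
  end case

end repeat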

The SEARCH object then works its way through the list of cast names to search for text members that have not yet been written to the database. Each time an external cast is searched, the name of that cast is appended to a list, pCheckedCast. Before a cast is searched for new text members, a check is performed to determine whether or not this cast has already been inspected. In this way, it is ensured that each external cast is inspected only once. As internal casts are only visible to one specific Director movie, not to mention the fact that the names of internal casts within a Director project are often duplicated, the names of these casts are not appended to the pCheckedCast list.
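The inspect-once rule can be sketched as follows. This is Python rather than the Lingo in the sample movie, and the function name casts_to_inspect and the sample cast names are hypothetical; only pCheckedCast and the INTERNAL/EXTERNAL values come from the article.

```python
def casts_to_inspect(cast_list, checked_casts):
    """Return the cast names that should be searched in the current movie.

    cast_list maps cast name -> "INTERNAL" or "EXTERNAL", like the property
    list built for each movie. checked_casts plays the role of pCheckedCast
    and is updated in place.
    """
    todo = []
    for name, kind in cast_list.items():
        if kind == "EXTERNAL":
            if name in checked_casts:
                continue  # already searched while visiting an earlier movie
            checked_casts.append(name)
        # Internal casts are always searched and never recorded, because
        # their names may repeat from movie to movie.
        todo.append(name)
    return todo

checked = []
movie_casts = {"Internal": "INTERNAL", "texts": "EXTERNAL"}
first_movie = casts_to_inspect(movie_casts, checked)
second_movie = casts_to_inspect(movie_casts, checked)
```

With two movies that share the external cast "texts", the second pass searches only the internal cast.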

Assuming the first cast name is that of an internal cast or that of an external cast that has not yet been inspected, a handler contained in the object (mCheckCast) is called. This handler searches the first 1000 cast members for members of type #text. Each time a text member is encountered, a call is made to the database to check whether a text member with this name has been added to the database. Note: had cast numbers and cast names been used as identifiers instead of only cast member names, this step would not have been necessary.
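The scan itself amounts to a bounded loop with two tests. The sketch below is an illustration in Python, not the mCheckCast handler: the dictionary of members and the set of registered names are hypothetical stand-ins for the cast and for the database lookup.

```python
MAX_MEMBERS = 1000  # the handler described above looks at the first 1000 members

def find_new_text_members(members, registered_names):
    """Return (member number, member name) for text members not yet in the database.

    members maps member number -> (member type, member name).
    registered_names is the set of member names already stored.
    """
    found = []
    for num in range(1, MAX_MEMBERS + 1):
        member = members.get(num)
        if member is None:
            continue  # empty slot in the cast
        member_type, name = member
        if member_type == "text" and name not in registered_names:
            found.append((num, name))
    return found

cast = {
    1: ("bitmap", "logo"),
    2: ("text", "intro_headline"),
    3: ("field", "score_display"),   # fields are not text members
    4: ("text", "intro_body"),
    1001: ("text", "out_of_range"),  # beyond the scanned range
}
new_members = find_new_text_members(cast, {"intro_headline"})
```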

The next step is to gather formatting information for the current text member. Font attributes, such as font type, color and size, are intrinsic properties of each text member. In other words, when the translated texts are read out of the database and written to their respective cast members, the font color, size, etc., are retained. This information therefore does not need to be written to the database. However, two properties that are important and are not retained are TABs and RETURNs. By storing this information, the amount of time spent reformatting the translated text members can be reduced considerably. This is accomplished through the use of a pair of simple tags: <TAB> and <RETURN>. Before a new record is written to the database, the text to be written is inspected for TAB and RETURN characters. TABs are replaced with the <TAB> tag and RETURNs with the <RETURN> tag. After the TABs and RETURNs have been extracted and replaced with the appropriate tags, a typical text would appear as follows:

1/1<TAB>Sally sells seashells by the<RETURN><TAB>seashore
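The tagging step is a plain character substitution. A minimal sketch in Python (the function name encode_tags is hypothetical; Director's RETURN constant is a carriage return, written here as "\r"):

```python
TAB = "\t"
RETURN = "\r"  # Director's RETURN constant is a carriage return

def encode_tags(text):
    """Replace TAB and RETURN characters with the visible <TAB> and <RETURN> tags."""
    return text.replace(TAB, "<TAB>").replace(RETURN, "<RETURN>")

encoded = encode_tags("1/1\tSally sells seashells by the\r\tseashore")
```

Applied to the sample text, this yields the tagged string shown above.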

A problem lurks in the bushes here: formatting is retained only as long as the tags are not overwritten by the translator. It is also important that the translator make an effort to enter approximately the same number of characters between each tag. As most languages require more characters than English -- some over 20% more -- some reformatting is certain to be required.
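One way to catch overwritten tags before the texts are written back would be to compare the tag sequence of the original with that of the translation. The sample movie does not do this; the check below is a hypothetical addition, sketched in Python.

```python
import re

TAG_PATTERN = re.compile(r"<TAB>|<RETURN>")

def tags_preserved(source, translation):
    """True if the translation contains the same tags in the same order as the source."""
    return TAG_PATTERN.findall(source) == TAG_PATTERN.findall(translation)

source = "1/1<TAB>Sally sells seashells by the<RETURN><TAB>seashore"
good = "1/1<TAB>Sally verkauft Muscheln am<RETURN><TAB>Strand"
bad = "1/1 Sally verkauft Muscheln am<RETURN><TAB>Strand"  # first <TAB> was overwritten
```

Records that fail the check could be returned to the translator instead of being imported.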

After the formatting information has been added to the text, the record can be written to the database. For text members located in external casts, the cast name, member number, member name and text are written to the database, with the value "common" in place of a movie name. For those located in internal casts, the same information plus the movie name are written to the database.
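Following the description of d_movieName in the field list, the movie name is recorded for members of internal casts and the fixed value "common" for members of external casts. A sketch of assembling such a record in Python (the function build_record and the sample values are hypothetical; the actual write goes through the qNewText query, whose contents are not shown in the article):

```python
def build_record(cast_kind, movie_name, cast_name, member_num, member_name, tagged_text):
    """Assemble the field values for one new database record.

    cast_kind is "INTERNAL" or "EXTERNAL", as in the cast list built per movie.
    """
    return {
        # Internal casts belong to a single movie; external casts are shared,
        # so the field list prescribes the value "common" for them.
        "d_movieName": movie_name if cast_kind == "INTERNAL" else "common",
        "d_castName": cast_name,
        "d_memberNum": member_num,
        "d_textMemberName": member_name,
        "d_text": tagged_text,
        "d_status": False,  # nothing has been written back yet
    }

internal = build_record("INTERNAL", "intro.dir", "Internal", 4, "intro_body", "Hello<RETURN>there")
external = build_record("EXTERNAL", "intro.dir", "texts", 12, "intro_headline", "Welcome")
```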

This procedure is repeated for all of the movies contained in the movie list assembled when the movie was started. Depending on the number of text members contained in the project, the process may take many minutes. There are no bells and whistles indicating that Director is chugging along and has not crashed. As the agent works its way through the list of movies, each is opened and the name output to the Message window. If the process seems to be taking an eternity, an ear pressed against the tender casing of your hard drive should alleviate any doubts as to the program's progress.

Once all of the movies in the list have been inspected, the agent returns control to the movie.

Step 2: Have text, will travel

The Access database can now be passed on to a translation agency. We're assuming here, of course, that the translation agency has Access and that the translator is familiar enough with the program to overwrite the text in the appropriate column. In any case, including a brief description of how the database is to be translated wouldn't hurt. Of the eight columns in the database, only that farthest to the right, d_ubersetzung, is to be edited by the translator.

If the translation agency does not have Access (97), the relevant columns can be copied from Access and pasted into an Excel table. These columns are the two farthest to the right: d_text and d_ubersetzung. Once translated, these columns can be copied from Excel back into the Access table. Note: if your project includes relatively long text members, make certain that the texts are not truncated during the copy and paste operations.

Step 3: Inserting the translated texts

After you've received the translated database, you can put the localizer agents back to work. The basic structure of this step of the process is very similar to that used in the first step: a list of all movies contained in the current directory is created on program start-up. When the REPLACE button, located to the right in the movie, is pressed, this list is cycled through. As each movie is opened, a list of all casts linked to the movie is created. Again, this list is cycled through and cast members of type #text are located. Instead of writing text to the database when a text member is found, however, text is read from the database, reformatted and written back to the appropriate text member.

Each time a text member is encountered, a call is made to the database and the status of a Boolean field (d_status) contained in the record for the current text member is queried. If the value is FALSE, the text member has not yet been read from the database. In this case, the data for the record are read and returned to the agent object and the d_status Boolean for the record is set to TRUE.
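The read-once behavior of d_status can be sketched as follows, again with SQLite in Python standing in for Access and DataGrip. The table name text_master, the function fetch_translation and the SQL are assumptions; the article's q_dg_status and q_dg_status_ext queries, which distinguish internal from external casts, are not reproduced here.

```python
import sqlite3

def fetch_translation(conn, member_name):
    """Return the translation the first time a member is requested, then None.

    A record with d_status = 0 has not been read yet; reading it sets d_status to 1.
    """
    row = conn.execute(
        "SELECT id_memberKey, d_ubersetzung FROM text_master "
        "WHERE d_textMemberName = ? AND d_status = 0",
        (member_name,),
    ).fetchone()
    if row is None:
        return None  # unknown member, or already written back
    key, translation = row
    conn.execute("UPDATE text_master SET d_status = 1 WHERE id_memberKey = ?", (key,))
    conn.commit()
    return translation

# Minimal table with only the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE text_master ("
    "id_memberKey INTEGER PRIMARY KEY, d_textMemberName TEXT, "
    "d_ubersetzung TEXT, d_status INTEGER DEFAULT 0)"
)
conn.execute(
    "INSERT INTO text_master (d_textMemberName, d_ubersetzung) VALUES (?, ?)",
    ("intro_headline", "Willkommen"),
)
first = fetch_translation(conn, "intro_headline")
second = fetch_translation(conn, "intro_headline")
```

Because the flag is set on the first read, a second pass over the same member leaves it untouched.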

Before the translated text can replace the original text, the temporary formatting inserted during the creation of the database needs to be replaced with the Director equivalents. Each text is sent to a handler (mTagCount) one time for each tag, i.e., once for <TAB> and once for <RETURN>. The text is repeatedly searched until all occurrences of the given tag have been replaced.
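The repeated search can be sketched as a loop that replaces one occurrence per pass. This is an illustration in Python, not the mTagCount handler; the function names replace_tag and decode_tags are hypothetical.

```python
def replace_tag(text, tag, char):
    """Replace every occurrence of tag with char, one occurrence per pass.

    Returns the new text and the number of replacements made.
    """
    count = 0
    pos = text.find(tag)
    while pos != -1:
        text = text[:pos] + char + text[pos + len(tag):]
        count += 1
        # Continue searching just past the character that was inserted.
        pos = text.find(tag, pos + len(char))
    return text, count

def decode_tags(text):
    """Turn <TAB> and <RETURN> tags back into TAB and RETURN characters."""
    text, _ = replace_tag(text, "<TAB>", "\t")
    text, _ = replace_tag(text, "<RETURN>", "\r")  # Director's RETURN is a carriage return
    return text

restored = decode_tags("1/1<TAB>Sally sells seashells by the<RETURN><TAB>seashore")
```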

The agent now overwrites the original text with the reformatted, translated text.

As in Step 1, this process is repeated until all text members in all casts in all movies in the current directory have been replaced. Again, this procedure may take several minutes, so put on your patience cap before pushing the REPLACE button.

Shortcomings and ideas for improvement

The Director/Access tag-team method presented here, though effective, isn't without room for improvement. One glaring weak point, as mentioned earlier, is the fact that cast member names are used as identifiers. This means that, should a cast member name occur twice, whether in an internal or an external cast, only the first occurrence will be replaced when it comes time to insert the translated text. This could be avoided by using the movie name, cast name and member number as the identifier. This would also eliminate the requirement that the text members be uniquely named.
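A composite identifier of this kind might look like the sketch below. It is written in Python, it is not part of the sample movie, and the function name member_key and the sample data are hypothetical. The value "common" for external casts follows the d_movieName convention described earlier.

```python
def member_key(cast_kind, movie_name, cast_name, member_num):
    """Build an identifier that does not depend on the member name.

    External casts are shared between movies, so they are keyed under
    "common" rather than under the movie that happened to open them.
    """
    owner = movie_name if cast_kind == "INTERNAL" else "common"
    return (owner, cast_name, member_num)

# Two movies each have an internal cast named "Internal" with a text member in slot 4.
translations = {
    member_key("INTERNAL", "intro.dir", "Internal", 4): "Willkommen",
    member_key("INTERNAL", "menu.dir", "Internal", 4): "Hauptmenü",
    member_key("EXTERNAL", "intro.dir", "texts", 12): "Weiter",
}
```

The two internal members no longer collide, and the shared external member resolves to the same key from either movie.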

Another area that could be improved is the text formatting, in particular text member width. Due to time constraints, I was not able to add a routine to adjust the widths of the translated text members. As I mentioned, some languages require considerably more text than English. This means that, although the RETURNs and TABs are inserted correctly into the translated text members, the members themselves may be too narrow to display the text correctly. A handler to correct this would be very simple if not for the fact that Director's lineCount() function, which returns the actual number of displayed lines and not necessarily the number of line breaks, applies only to field cast members. Text members only have the number of lines function, which returns the number of lines as forced by line returns. If, for example, a text member was two lines high before translation, and three lines high after translation, testing the number of lines of the translated text will still return two lines. To work around this, you might insert the text into a dummy field, and increase the width until the lineCount() returns the correct number of lines, here two. You would then set the width of the text member to the width of this dummy field. And presto, the formatting of the original text and the translated text should be identical.
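The widening loop suggested here can be sketched outside Director as follows. This is a rough model in Python, not working Lingo: widths are counted in characters rather than pixels, and textwrap's word wrap stands in for the dummy field's lineCount(). The function names and the step and limit values are hypothetical.

```python
import textwrap

def wrapped_line_count(text, width_chars):
    """Stand-in for lineCount() of a dummy field: displayed lines after word wrap."""
    paragraphs = text.split("\r")  # RETURN characters force a new line
    return sum(max(1, len(textwrap.wrap(p, width_chars))) for p in paragraphs)

def fit_width(text, start_width, target_lines, step=1, max_width=500):
    """Widen the member until the text wraps to no more than target_lines."""
    width = start_width
    while wrapped_line_count(text, width) > target_lines and width < max_width:
        width += step
    return width

original = "Sally sells seashells"
translated = "Sally verkauft Muscheln am Strand"
target = wrapped_line_count(original, 12)   # lines used by the original at its width
new_width = fit_width(translated, 12, target)
```

The max_width limit keeps the loop from running indefinitely when a text contains more forced line breaks than the target line count allows.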

Conclusion

The method I've described in this article provides an efficient way to extract all text from a Director project and import it into an Access database using the DataGrip Xtra. In this way, text can, for the most part, be collected, passed on to a translator for translation, and reimported to the Director project, all with minimal manual changes to the text members. Though not without flaws and room for improvement, I hope the technique will be of assistance to someone, somewhere, faced with localizing a text-heavy Director project.

Christopher Schmidt left his dreams of being a lightning scientist behind him for the nearly as electrifying world of multimedia development. In addition to freelance Director and Web programming, Christopher also performs German-English technical translations to support his bicycle racing habit.