
Java Localization with the TMX Standard


Nicola Asuni


Foreword

One of the main concerns of internationalization is separating the source code from the texts, labels, messages and all other objects that depend on the language in use. This simplifies translation, because all the resources tied to the local language context are clearly identified and kept apart.

Since JDK 1.1, Java has offered extensive support for internationalization (i18n) through several tools and facilities, such as Unicode 2.0 support, multilingual environments and object localization, to mention just a few.

However, these tools may not be sufficient when targeting a global market, where the cost of translating and updating texts (labels, messages, menu items and so on) can easily become quite high.

This is where the TMX standard helps: it applies the concepts of reuse, greater consistency and a shorter production cycle to the translation and management of these texts, with the added bonus of lower development costs.

TMX - Translation Memory eXchange

http://www.lisa.org/tmx/

TMX is an open standard that uses XML to store and exchange Translation Memories (TM). These memories are created with dedicated translation and localization software known as CAT (Computer Aided Translation) tools.

TMX is the result of a project developed by one of the Special Interest Groups of LISA, known as OSCAR (Open Standards for Container/Content Allowing Re-use).

The goal of TMX is to provide a neutral format for exchanging data between different translation systems while minimizing or eliminating the loss of critical data. TMX is supported by most of the translation software on the market today.

The TMX specification is freely available on the website http://www.lisa.org/tmx/, together with several related links, documents, articles and software tools.

TMX file example


sample_tmx.xml

<?xml version="1.0" ?>
<tmx version="1.4">
  <header
    creationtool="XYZTool"
    creationtoolversion="1.01-023"
    datatype="PlainText"
    segtype="sentence"
    adminlang="en-us"
    srclang="EN"
    o-tmf="ABCTransMem">
  </header>
  <body>
    <tu tuid="hello" datatype="plaintext">
      <tuv xml:lang="en">
        <seg>hello</seg>
      </tuv>
      <tuv xml:lang="it">
        <seg>ciao</seg>
      </tuv>
    </tu>
    <tu tuid="world" datatype="plaintext">
      <tuv xml:lang="en">
        <seg>world</seg>
      </tuv>
      <tuv xml:lang="it">
        <seg>mondo</seg>
      </tuv>
    </tu>
  </body>
</tmx>

where:

tu - translation unit
parent element of every item to be translated; it can carry a unique identifier (tuid).

tuv - translation unit variant
contains the text in one language, identified by its language code (xml:lang).

seg - segment
contains the actual text of the translation.

TM - Translation Memory
http://www.opentag.com/tm.htm

A Translation Memory (TM), also known as a translation database, is a database in which sentences written in a reference language are linked to their translations in one or more languages.

A reference sentence together with its translations is called a translation memory unit (a record in the database).

Applications that use TMs are aids for translators: they are intended to improve the quality and efficiency of the human translation process, not to replace it.

Whenever a new sentence is entered into the TM application, the application searches for it among the reference sentences in the database and computes a score that expresses how closely they match (the matching value).

When the matching value is 100% (an exact match), the translation found in the database is assumed to be correct and is used directly to build the translated text. When the matching value is below 100% but above a certain threshold (a fuzzy match), the translation found in the database is proposed to a human translator, who judges it and corrects it if necessary. Sentences whose score falls below the threshold get no proposed translation and must be translated entirely by hand. New sentences, once translated, are stored in the database and used for future searches.
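The lookup logic described above can be sketched in a few lines of Java. This is a minimal illustration, not how any particular CAT tool works: it assumes the matching value is a normalized Levenshtein edit distance (real products use their own, usually more elaborate, scoring), and the class name TranslationMemoryDemo and the 0.75 threshold are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a translation memory lookup (not from any real CAT tool).
// The matching value is a normalized Levenshtein similarity in [0, 1].
class TranslationMemoryDemo {

    // reference sentence -> stored translation (one target language for brevity)
    private final Map<String, String> memory = new LinkedHashMap<>();
    private final double threshold; // minimum score for a fuzzy match

    TranslationMemoryDemo(double threshold) {
        this.threshold = threshold;
    }

    // Stores a translated sentence for future searches.
    void store(String source, String translation) {
        memory.put(source, translation);
    }

    // Edit distance: minimum number of single-character insertions,
    // deletions and substitutions that turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows of the DP table
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Matching value: 1.0 for identical strings, lower as they diverge.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return (maxLen == 0) ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Exact match: reuse it; fuzzy match: propose it; otherwise translate by hand.
    String lookup(String sentence) {
        String best = null;
        double bestScore = -1.0;
        for (String source : memory.keySet()) {
            double score = similarity(sentence, source);
            if (score > bestScore) {
                bestScore = score;
                best = source;
            }
        }
        if (best == null || bestScore < threshold) {
            return "NO MATCH";
        }
        if (bestScore == 1.0) {
            return "EXACT: " + memory.get(best);
        }
        return "FUZZY (" + Math.round(bestScore * 100) + "%): " + memory.get(best);
    }

    public static void main(String[] args) {
        TranslationMemoryDemo tm = new TranslationMemoryDemo(0.75);
        tm.store("Open the file", "Apri il file");
        System.out.println(tm.lookup("Open the file"));    // EXACT: Apri il file
        System.out.println(tm.lookup("Open the files"));   // FUZZY (93%): Apri il file
        System.out.println(tm.lookup("Close the window")); // NO MATCH
    }
}
```

"Open the files" differs from the stored sentence by one inserted character out of 14, so it scores about 93% and is only proposed, not reused automatically.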

Several software vendors offer sophisticated commercial products built on these concepts.

LISA - Localization Industry Standards Association

http://www.lisa.org

Founded in 1990, LISA is the premier non-profit worldwide organization for GILT (Globalization, Internationalization, Localization and Translation). Its members include individuals, businesses, associations and organizations involved in languages, language technologies and language standards.

Over 400 leading IT manufacturers and services providers, along with industry professionals representing corporations with an international business focus, have helped establish LISA's best practice guidelines and language-technology standards for enterprise globalization.

LISA serves as a nexus between the many organizations engaged in helping businesses to become global enterprises. This includes customers, governments, technical and industry-specific standards organizations, research and consulting firms, language technology developers and service providers.

LISA offers services in the form of standards initiatives, Special Interest Groups, conferences and training programs to provide GILT support to businesses.

LISA partners and affiliate groups include the International Organization for Standardization (ISO Liaison Category A Members of TC 37 and TC 46), The World Bank, OASIS, IDEAlliance, AIIM, The Advisory Council (TAC), Fort-Ross, €TTEC, the Japan Technical Communicators Association, the Society of Automotive Engineers (SAE), the European Union, the Canadian Translation Bureau, TermNet, the American Translators Association (ATA), IWIPS, Fédération Internationale des Traducteurs (FIT), Termium, JETRO, the Institute of Translating and Interpreting (ITI), The Unicode Consortium, OpenI18N, and other professional and trade organizations.

LISA members and co-founders include some of the largest and best-known companies in the world, including Adobe, Avaya, Cisco Systems, CLS Communication, EMC, Hewlett Packard, IBM, Innodata Isogen, Fuji Xerox, Microsoft, Oracle, Nokia, Logitech, SAP, Siebel Systems, Standard Chartered Bank, FileNet, LionBridge Technologies, Lucent, Sun Microsystems, WH&P, PeopleSoft, Philips Medical Systems, Rockwell Automation, The RWS Group, Xerox Corporation and Canon Research, among others.

TMX Java Bridge

With the java.util.ResourceBundle class, Java provides a useful solution for localization: it lets us move the textual elements out of the source code and isolate them in a resource bundle, for example a ListResourceBundle subclass or a properties file (PropertyResourceBundle).
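As a point of comparison, here is a minimal sketch of the standard ListResourceBundle approach, using the same two keys as the sample TMX file. The class names MessagesIt and ListBundleDemo are hypothetical; in a real application the bundle would typically be a public class named with a locale suffix (for example Messages_it) and loaded with ResourceBundle.getBundle, which this sketch skips by instantiating the bundle directly.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical Italian bundle: texts live here as key/value pairs, not in the code.
// In a real application this would be a public class such as Messages_it,
// loaded with ResourceBundle.getBundle("Messages", Locale.ITALIAN).
class MessagesIt extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        return new Object[][] {
            {"hello", "ciao"},
            {"world", "mondo"},
        };
    }
}

class ListBundleDemo {
    public static void main(String[] args) {
        ResourceBundle res = new MessagesIt();
        // The application refers only to keys; the wording lives in the bundle.
        System.out.println(res.getString("hello") + " " + res.getString("world"));
        try {
            res.getString("goodbye");
        } catch (MissingResourceException e) {
            System.out.println("missing key: " + e.getKey());
        }
    }
}
```

Running ListBundleDemo prints "ciao mondo" followed by "missing key: goodbye". Note that the translations are Java source code, which is exactly what makes this format awkward for translators.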

These solutions offer several advantages to the programmer, but they can become very complicated for the translator, especially when it comes to reusing translations.

A better option is to store the textual resources in the TMX exchange format (an XML file). Translators can then export and import the translations to and from their preferred translation tools (many of which support TMX), completely independently of the programming language used.

As suggested by Masaki Itagaki in his article Use XML as a Java Localization Solution, the best way to implement the TMX standard in Java applications is to extend the ResourceBundle class so that it reads data directly from XML files that comply with the TMX standard:

Java class ==> TMX file <== translation program

This lets us take advantage of all the features of the ResourceBundle class and simplifies data exchange with external TMX applications.

The main disadvantages of this technique are the time and memory needed to load the entire TMX file.
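One possible way to reduce that cost, shown here as a sketch that is not part of the original TMX Java Bridge, is to stream the file with the StAX API (javax.xml.stream, part of the JDK since Java 6) and keep only the segments of the requested language, instead of building a DOM tree of the whole document first. The class name TmxStreamLoader is hypothetical, and the sketch assumes plain-text seg elements without inline markup such as ph or bpt.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical streaming loader (not part of TMX Java Bridge): keeps only the
// <seg> text of one language, keyed by tuid, without building a DOM tree.
// Assumes plain-text segments: getElementText() fails on nested inline elements.
class TmxStreamLoader {

    static Map<String, String> load(Reader input, String language) throws XMLStreamException {
        Map<String, String> result = new HashMap<>();
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(input);
        String tuid = null;        // tuid of the enclosing <tu>, if any
        boolean wantedTuv = false; // inside a <tuv> in the requested language?
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("tu")) {
                        tuid = r.getAttributeValue(null, "tuid");
                    } else if (name.equals("tuv")) {
                        wantedTuv = language.equalsIgnoreCase(xmlLang(r));
                    } else if (name.equals("seg") && wantedTuv && tuid != null) {
                        result.put(tuid, r.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String name = r.getLocalName();
                    if (name.equals("tuv")) {
                        wantedTuv = false;
                    } else if (name.equals("tu")) {
                        tuid = null;
                    }
                }
            }
        } finally {
            r.close();
        }
        return result;
    }

    // Value of the xml:lang attribute of the current start element, or null.
    private static String xmlLang(XMLStreamReader r) {
        for (int i = 0; i < r.getAttributeCount(); i++) {
            if ("lang".equals(r.getAttributeLocalName(i))
                    && (XMLConstants.XML_NS_URI.equals(r.getAttributeNamespace(i))
                        || "xml".equals(r.getAttributePrefix(i)))) {
                return r.getAttributeValue(i);
            }
        }
        return null;
    }

    public static void main(String[] args) throws XMLStreamException {
        String tmx = "<tmx version=\"1.4\"><header/><body>"
                + "<tu tuid=\"hello\"><tuv xml:lang=\"en\"><seg>hello</seg></tuv>"
                + "<tuv xml:lang=\"it\"><seg>ciao</seg></tuv></tu>"
                + "<tu tuid=\"world\"><tuv xml:lang=\"en\"><seg>world</seg></tuv>"
                + "<tuv xml:lang=\"it\"><seg>mondo</seg></tuv></tu>"
                + "</body></tmx>";
        Map<String, String> texts = load(new StringReader(tmx), "it");
        System.out.println(texts.get("hello") + " " + texts.get("world")); // ciao mondo
    }
}
```

The trade-off is that only one language is kept per pass, so loading a second language means reading the file again.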

To keep the explanation simple, we will consider only the TMX elements needed to translate a simple text (see sample_tmx.xml):

tu - translation unit
parent element of every item to be translated; it can carry a unique identifier (tuid).

tuv - translation unit variant
contains the text in one language, identified by its language code (xml:lang).

seg - segment
contains the actual text of the translation.

for example:

<tu tuid="hello" datatype="plaintext">
  <tuv xml:lang="en">
    <seg>hello</seg>
  </tuv>
  <tuv xml:lang="it">
    <seg>ciao</seg>
  </tuv>
</tu>

TMXResourceBundle.java Class

Instantiating the TMXResourceBundle class works much like instantiating the PropertyResourceBundle class.

The constructor takes the name and path of the TMX file that contains the translations, and the ISO code of the language to load.

When the class is instantiated, the constructor calls the parseXmlFile method, which uses the XML parser bundled with the JDK (javax.xml.parsers.DocumentBuilder.parse) to load the TMX data into an org.w3c.dom.Document object (DOM - Document Object Model).

The document nodes are then examined and the key-value pairs are added to hashcontents, a java.util.Hashtable. Each key is the tuid attribute of a tu element; its value is the content of the seg node inside the tuv node whose xml:lang attribute matches the specified language.

Extending ResourceBundle requires overriding the abstract methods handleGetObject and getKeys. Once they are implemented, the element corresponding to a particular key can be retrieved with the methods inherited from ResourceBundle: getObject(String key), getString(String key) and getStringArray(String key) (the latter only works for values stored as String arrays, whereas this class stores plain strings).

The getString(String key, String def) method overloads ResourceBundle's getString(String key): it returns the string associated with a key, or the default value if the key is missing or its value is empty.
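The contract described in the two paragraphs above can be sketched with a plain Map instead of a TMX file. The class names MapResourceBundle and MapBundleDemo are hypothetical and only mirror the structure of TMXResourceBundle: handleGetObject returns null for unknown keys, so the inherited getString throws MissingResourceException, and the two-argument getString turns that into a default value.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical Map-backed bundle with the same shape as TMXResourceBundle.
class MapResourceBundle extends ResourceBundle {
    private final Map<String, String> contents;

    MapResourceBundle(Map<String, String> contents) {
        this.contents = new HashMap<>(contents);
    }

    // Called by the inherited getObject/getString; null means "not found here".
    @Override
    protected Object handleGetObject(String key) {
        if (key == null) {
            throw new NullPointerException("key");
        }
        return contents.get(key);
    }

    // Lists the keys this bundle can resolve.
    @Override
    public Enumeration<String> getKeys() {
        return Collections.enumeration(contents.keySet());
    }

    // Overload of the final getString(String): default for missing or empty values.
    public String getString(String key, String def) {
        try {
            String value = getString(key);
            return (value != null && !value.isEmpty()) ? value : def;
        } catch (MissingResourceException e) {
            return def;
        }
    }
}

class MapBundleDemo {
    public static void main(String[] args) {
        Map<String, String> it = new HashMap<>();
        it.put("hello", "ciao");
        MapResourceBundle res = new MapResourceBundle(it);
        System.out.println(res.getString("hello"));          // ciao
        System.out.println(res.getString("missing", "n/a")); // n/a
    }
}
```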

Source Code


TMXResourceBundle.java

package com.tecnick.tmxjavabridge;

import java.io.*;
import java.util.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;

/**
 * <p>
 * Reads resource text data directly from a TMX (XML) file.
 * </p>
 * <p>
 * First, the TMXResourceBundle class instantiates itself with two
 * parameters: a TMX file name and a target language name. Then, using a DOM
 * parser, it reads all of a translation unit's properties for the key
 * information and specified language data and populates a hashtable with them.
 * </p>
 * <p>
 * <b>TMX info: </b> http://www.lisa.org/tmx/
 * </p>
 *
 * <h4>Implementation notes</h4>
 * <p>
 * You instantiate the TMXResourceBundle class in a program to read data from
 * a TMX file. Once the class is instantiated, it reads all the data in a TMX
 * file and loads it into a DOM tree. Then it populates a hashtable so the
 * handleGetObject() method can be called to find text information based on a
 * key just as a standard ResourceBundle class does. <br>
 * Instantiating the TMXResourceBundle class is the same as instantiating the
 * PropertyResourceBundle class. First you obtain a system language code (e.g.:
 * from a locale's information). In TMX the value of the xml:lang attribute must be one
 * of the ISO language identifiers (a two- or three-letter code) or one of the
 * standard locale identifiers (a two- or three-letter language code, a dash,
 * and a two-letter region code).
 * </p>
 *
 * Copyright (c) 2004-2005
 * Tecnick.com S.r.l (www.tecnick.com)
 * Via Ugo Foscolo n.19 - 09045 Quartu Sant'Elena (CA) - ITALY
 * www.tecnick.com - info@tecnick.com <br/>
 * Project homepage: <a href="http://tmxjavabridge.sourceforge.net" target="_blank">http://tmxjavabridge.sourceforge.net</a><br/>
 * License: http://www.gnu.org/copyleft/lesser.html LGPL
 *
 * @author Nicola Asuni [www.tecnick.com].
 * @version 1.1.005
 */

public class TMXResourceBundle extends ResourceBundle {

    /**
     * The hashtable that will contain data loaded from XML
     */
    protected Hashtable<String, String> hashcontents = null;

    /**
     * Number of translation units (tu) items
     */
    protected int numberOfItems = 0;

    /**
     * Vector to store tu items keys (never null, so getKeys() is always safe)
     */
    protected Vector<String> vectOfItems = new Vector<String>();

    /**
     * TMX to Hashtable conversion. Reads XML and stores data in a Hashtable.
     *
     * @param xmlfile the TMX (XML) file to read, supports also URI resources or JAR resources
     * @param language ISO language identifier (a two- or three-letter code)
     */
    public TMXResourceBundle(String xmlfile, String language) {
        String temp_key = null; // store hashtable key names
        String temp_value = null; // store hashtable values
        NamedNodeMap temp_list = null; // list of <tu> or <tuv> attributes
        Attr temp_attr = null; // <tu> or <tuv> attribute
        NodeList listOfTUVs = null; // list of <tuv> elements
        NodeList listOfSEG = null; // list of <seg> elements
        Element TUVElement = null; // current <tuv> element
        int numberOfTUVs = 0; // number of <tuv> elements

        // start with empty containers, so that getKeys() and handleGetObject()
        // also work when the file cannot be read
        vectOfItems = new Vector<String>();
        hashcontents = new Hashtable<String, String>();

        // Create Document with parser
        Document document = parseXmlFile(xmlfile, false);
        // handle document error
        if (document == null) {
            return;
        }

        // Make a list of Translation Units and count the number of items
        NodeList listOfTermUnits = document.getElementsByTagName("tu");
        numberOfItems = listOfTermUnits.getLength();

        for (int i = 0; i < numberOfItems; i++) {
            temp_value = null;
            // set a key (skip translation units without a tuid attribute)
            temp_list = listOfTermUnits.item(i).getAttributes();
            temp_attr = (Attr) temp_list.getNamedItem("tuid");
            if (temp_attr == null) {
                continue;
            }
            temp_key = temp_attr.getValue();
            // Make a TUV list => "listOfTUVs"
            // (getElementsByTagName() only returns Element nodes)
            listOfTUVs = ((Element) listOfTermUnits.item(i)).getElementsByTagName("tuv");
            numberOfTUVs = listOfTUVs.getLength();
            // Check each TUV. If it's the specified lang, then get a SEG value
            for (int j = 0; j < numberOfTUVs; j++) {
                temp_list = listOfTUVs.item(j).getAttributes();
                temp_attr = (Attr) temp_list.getNamedItem("xml:lang");
                if ((temp_attr == null) || !temp_attr.getValue().equalsIgnoreCase(language)) {
                    continue;
                }
                // -- Get a SEG value (getTextContent() also includes text nested in inline elements)
                TUVElement = (Element) listOfTUVs.item(j);
                listOfSEG = TUVElement.getElementsByTagName("seg");
                if (listOfSEG.getLength() > 0) {
                    temp_value = listOfSEG.item(0).getTextContent();
                }
                if ((temp_value == null) || (temp_value.length() == 0)) {
                    // missing or empty <seg>: print an error message and use a void string
                    System.err.println(this.getClass().getName() + "(\""
                            + xmlfile + "\", \"" + language + "\") :: "
                            + "Void <seg> value on <tu tuid=\"" + temp_key + "\"> key");
                    temp_value = "";
                }
                break; // use the first <tuv> in the requested language
            }
            // Populate hashtable and key vector only when the language was found,
            // so that getKeys() returns exactly the keys handleGetObject() resolves
            if (temp_value != null) {
                hashcontents.put(temp_key, temp_value);
                vectOfItems.add(temp_key);
            }
        } // for loop
    } // constructor

    /**
     * Parses an XML file and returns a DOM document.
     * The file is searched, in order, as a class path (JAR) resource, as a URI,
     * as a local file and as a path relative to the first class path entry.
     *
     * @param filename the name of XML file
     * @param validating If true, the contents is validated against the DTD specified in the file.
     * @return the parsed document, or null in case of error
     */
    public Document parseXmlFile(String filename, boolean validating) {
        Document doc = null;
        DocumentBuilderFactory factory = null;
        // Create a builder factory
        try {
            factory = DocumentBuilderFactory.newInstance();
        } catch (FactoryConfigurationError e) {
            System.err.println(e);
            return null;
        }
        factory.setValidating(validating);
        // Create the builder and parse the file
        try {
            // try to get the file from jar
            // (a relative name is resolved against this class's package)
            InputStream instream = getClass().getResourceAsStream(filename);
            if (instream != null) {
                try {
                    return factory.newDocumentBuilder().parse(instream);
                } finally {
                    instream.close();
                }
            }
            try {
                // try to get the file as external URI
                return factory.newDocumentBuilder().parse(filename);
            } catch (IOException euri) {
                // not reachable as URI: try the next location
            }
            try {
                // try to get the file as local filename
                return factory.newDocumentBuilder().parse(new File(filename));
            } catch (IOException efile) {
                // not a readable local file: try the next location
            }
            // try to resolve the path as relative to the first class path entry
            // (File.pathSeparator is ";" on Windows and ":" elsewhere)
            String[] classPath = System.getProperty("java.class.path", ".").split(File.pathSeparator);
            String newpath = classPath[0] + File.separator + filename;
            doc = factory.newDocumentBuilder().parse(new File(newpath));
        } catch (IOException e) {
            // unable to get the input file
            System.err.println("[" + filename + "] IOException:" + e);
        } catch (ParserConfigurationException e) {
            System.err.println("[" + filename + "] ParserConfigurationException:" + e);
        } catch (SAXException e) {
            System.err.println("[" + filename + "] SAXException:" + e);
        }
        return doc;
    }

    /**
     * Get key value, return default if void.
     *
     * @param key name of key
     * @param def default value
     * @return parameter value or default
     */
    public String getString(String key, String def) {
        try {
            String param_value = this.getString(key);
            if ((param_value != null) && (param_value.length() > 0)) {
                return param_value;
            }
        } catch (Exception e) {
            // for any exception (e.g. MissingResourceException) return the default value
        }
        return def;
    }

    /**
     * handleGetObject implementation
     *
     * @param key the resource key
     * @return the content associated with the specified key, or null if not found
     */
    public final Object handleGetObject(String key) {
        return hashcontents.get(key);
    }

    /**
     * Returns the number of translation units
     *
     * @return number of Items
     */
    public int getNumberOfItems() {
        return numberOfItems;
    }

    /**
     * Define getKeys method
     *
     * @return an enumeration of the keys loaded from the TMX file
     */
    public Enumeration<String> getKeys() {
        return vectOfItems.elements();
    }
}

Source Code

This class shows how to instantiate TMXResourceBundle with the example TMX file shown above.

In this example the language codes (en = English, it = Italian) are specified explicitly, but they can also be obtained from a locale's information.

TMXJBSample.java

package com.tecnick.tmxjavabridge.sample;

import com.tecnick.tmxjavabridge.TMXResourceBundle;

/**
 * Sample class for TMXResourceBundle class.
 * <br/><br/>
 * Copyright (c) 2004-2005
 * Tecnick.com S.r.l (www.tecnick.com)
 * Via Ugo Foscolo n.19 - 09045 Quartu Sant'Elena (CA) - ITALY
 * www.tecnick.com - info@tecnick.com<br/>
 * License: http://www.gnu.org/copyleft/lesser.html LGPL
 *
 * @author Nicola Asuni [www.tecnick.com].
 * @version 1.1.005
 */
public class TMXJBSample {

    /**
     * loads TMX data for English and Italian
     */
    static final TMXResourceBundle res_en = new TMXResourceBundle("tmx/sample_tmx.xml", "en");
    static final TMXResourceBundle res_it = new TMXResourceBundle("tmx/sample_tmx.xml", "it");

    /**
     * Prints the English and the Italian strings on System.out
     * (hello, world, ciao, mondo)
     * @param args String[]
     */
    public static void main(String[] args) {
        System.out.println(res_en.getString("hello", ""));
        System.out.println(res_en.getString("world", ""));
        System.out.println(res_it.getString("hello", ""));
        System.out.println(res_it.getString("world", ""));
    }
}
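As mentioned above, the language code does not have to be hard-coded. The following sketch derives candidate codes from a java.util.Locale, most specific first, following the two identifier forms listed in the class documentation (a language code, or a language code, a dash and a region code). The helper TmxLanguage.candidates is hypothetical; a caller could pass one of the returned codes as the second argument of the TMXResourceBundle constructor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: candidate TMX language codes for a Locale,
// most specific first (e.g. "it-IT", then "it").
class TmxLanguage {

    static List<String> candidates(Locale locale) {
        List<String> codes = new ArrayList<>();
        String language = locale.getLanguage(); // ISO 639 code, "" if undefined
        String country = locale.getCountry();   // ISO 3166 code, "" if undefined
        if (!language.isEmpty() && !country.isEmpty()) {
            codes.add(language + "-" + country); // language code, dash, region code
        }
        if (!language.isEmpty()) {
            codes.add(language);                 // language code alone
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(candidates(Locale.getDefault()));
        System.out.println(candidates(Locale.ITALY)); // [it-IT, it]
    }
}
```

Since TMXResourceBundle compares xml:lang values with equalsIgnoreCase, "it-IT" would also match an attribute written as "it-it".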

About the Author

Nicola Asuni is the founder and president of Tecnick.com S.r.l., a leading provider of award-winning web software.
He has been a freelance programmer since 1993 and has actively contributed to several web-related open-source projects.
He is the founder of the Technick.net website, since 1998 the largest connector and cable pinout archive on the web.
He is also a member and co-founder of Java User Group Sardegna Onlus, and a member of GULCh - Gruppo Utenti Linux Cagliari.

For a complete Curriculum Vitae please visit: http://nicolaasuni.tecnick.com


Evolt.org is an all-volunteer resource for web developers made up of a discussion list, a browser archive, and member-submitted articles. This article is the property of its author; please do not redistribute or use it elsewhere without checking with the author.