Java Localization with TMX Standard
Foreword
One of the main concerns of internationalization is separating the main source code from the texts, labels, messages and all other objects tied to the specific language in use. This eases the translation process, because all the resources related to the local language context are clearly identified and kept apart.
Since JDK 1.1, Java has provided strong support for internationalization (i18n) through several instruments and tools, such as support for Unicode 2.0, multilingual environments and object localization, to mention just a few.
However, these instruments may not be sufficient when we target a global market, where the cost of translating and updating the texts (labels, messages, menu elements and so on) can easily become quite high.
This is where the TMX standard helps: it brings reuse, greater consistency and a shorter production cycle to the translation and management of these texts, with the added bonus of cutting development costs.
TMX - Translation Memory eXchange
TMX is an open standard that uses XML for archiving and exchanging Translation Memories (TM). These memories are created with specific translation and localization software called CAT (Computer Aided Translation) software.
TMX is the result of a project developed by OSCAR (Open Standards for Container/Content Allowing Re-use), one of the Special Interest Groups of LISA. The goal of TMX is to provide a neutral way to exchange data between different translation systems while minimizing or eliminating the loss of critical data. The TMX format is supported by the majority of the translation software on the market today. The TMX specification is freely available on the website http://www.lisa.org/tmx/, together with several related links, documents, articles and software tools.
TMX file example
Download Source Code: sample_tmx.xml
<?xml version="1.0" ?>
<tmx version="1.4">
  <header creationtool="XYZTool" creationtoolversion="1.01-023" datatype="PlainText" segtype="sentence" adminlang="en-us" srclang="EN" o-tmf="ABCTransMem">
  </header>
  <body>
    <tu tuid="hello" datatype="plaintext">
      <tuv xml:lang="en">
        <seg>hello</seg>
      </tuv>
      <tuv xml:lang="it">
        <seg>ciao</seg>
      </tuv>
    </tu>
    <tu tuid="world" datatype="plaintext">
      <tuv xml:lang="en">
        <seg>world</seg>
      </tuv>
      <tuv xml:lang="it">
        <seg>mondo</seg>
      </tuv>
    </tu>
  </body>
</tmx>
where:
- tu - translation unit: the parent element of every item to be translated. It can carry a unique identifier (tuid).
- tuv - translation unit variant: one language version of the unit. Its xml:lang attribute holds the language code of the translation.
- seg - segment: contains the translated text.
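To make the structure concrete, here is a minimal sketch that reads a small TMX string with the standard JAXP DOM parser and collects, for one language, the tuid of each tu together with the text of the matching seg. The class name TmxPeek and the helper segmentsFor are hypothetical names chosen for this illustration; they are not part of any TMX library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TmxPeek {

    // Hypothetical helper: maps each tuid to the <seg> text of the requested language.
    static Map<String, String> segmentsFor(String tmx, String lang) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(tmx.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input could not be parsed as XML", e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        NodeList units = doc.getElementsByTagName("tu");
        for (int i = 0; i < units.getLength(); i++) {
            Element tu = (Element) units.item(i);
            NodeList variants = tu.getElementsByTagName("tuv");
            for (int j = 0; j < variants.getLength(); j++) {
                Element tuv = (Element) variants.item(j);
                // Only the variant whose xml:lang matches the requested language is kept.
                if (lang.equalsIgnoreCase(tuv.getAttribute("xml:lang"))) {
                    Element seg = (Element) tuv.getElementsByTagName("seg").item(0);
                    if (seg != null) {
                        result.put(tu.getAttribute("tuid"), seg.getTextContent());
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String tmx = "<tmx version=\"1.4\"><header/><body>"
                + "<tu tuid=\"hello\"><tuv xml:lang=\"en\"><seg>hello</seg></tuv>"
                + "<tuv xml:lang=\"it\"><seg>ciao</seg></tuv></tu>"
                + "</body></tmx>";
        System.out.println(segmentsFor(tmx, "it")); // prints {hello=ciao}
    }
}
```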
TM - Translation Memory
http://www.opentag.com/tm.htm
Translation Memories (TM), also known as Translation Databases, consist of a database in which sentences written in a reference language are linked to their translations in one or more languages. Each such entry is called a translation memory unit (a record of the database).
Applications that use TMs are tools that support language translation. They are intended to improve the quality and efficiency of the human translation process, not to replace it.
Whenever a new sentence is entered, the TM application searches for it among the reference sentences in the database and calculates a score for the closest match (the matching value).
When the matching value is 100% (an exact match), the corresponding translation found in the database is assumed to be correct and is used directly to build the translated text. When the matching value is below 100% but above a certain threshold (a fuzzy match), the corresponding translation is proposed to a human translator, who judges it and corrects it if needed. Sentences whose score falls under the threshold receive no proposed translation and have to be translated entirely by hand. Every new sentence for which a translation has been entered is stored in the database and used in future searches. Several software houses offer complex commercial products that work along these lines.
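The lookup logic described above can be sketched as follows. This is only an illustration under stated assumptions: the matching value is taken to be 1 minus the Levenshtein edit distance divided by the longer string's length, and the threshold of 0.7 is arbitrary. Real CAT tools use their own, usually more sophisticated, scoring. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TmLookupSketch {

    // Classic two-row dynamic-programming Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed matching value: 1.0 for identical strings, lower as they diverge.
    static double matchingValue(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / max;
    }

    // Classifies a new sentence against the reference sentences (the map keys).
    static String classify(Map<String, String> memory, String sentence, double threshold) {
        double best = 0.0;
        for (String reference : memory.keySet()) {
            best = Math.max(best, matchingValue(reference, sentence));
        }
        if (best == 1.0) {
            return "EXACT"; // reuse the stored translation directly
        }
        return best > threshold ? "FUZZY" : "NONE"; // propose to a translator, or translate by hand
    }

    public static void main(String[] args) {
        Map<String, String> memory = new LinkedHashMap<>();
        memory.put("hello world", "ciao mondo");
        System.out.println(classify(memory, "hello world", 0.7)); // EXACT
        System.out.println(classify(memory, "hello word", 0.7));  // FUZZY
        System.out.println(classify(memory, "goodbye", 0.7));     // NONE
    }
}
```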
LISA - Localization Industry Standards Association
Founded in 1990, LISA is the premier worldwide non-profit organization for GILT (Globalization, Internationalization, Localization and Translation). Its membership spans individuals, businesses, associations and organizations involved in languages, language technologies and language standards.
Over 400 leading IT manufacturers and service providers, along with industry professionals representing corporations with an international business focus, have helped establish LISA's best practice guidelines and language-technology standards for enterprise globalization. LISA serves as a nexus between the many organizations engaged in helping businesses become global enterprises. These include customers, governments, technical and industry-specific standards organizations, research and consulting firms, language technology developers and service providers. LISA offers services in the form of standards initiatives, Special Interest Groups, conferences and training programs to provide GILT support to businesses.
LISA partners and affiliate groups include the International Organization for Standardization (ISO Liaison Category A Members of TC 37 and TC 46), The World Bank, OASIS, IDEAlliance, AIIM, The Advisory Council (TAC), Fort-Ross, €TTEC, the Japan Technical Communicators Association, the Society of Automotive Engineers (SAE), the European Union, the Canadian Translation Bureau, TermNet, the American Translators Association (ATA), IWIPS, Fédération Internationale des Traducteurs (FIT), Termium, JETRO, the Institute of Translating and Interpreting (ITI), The Unicode Consortium, OpenI18N, and other professional and trade organizations. LISA members and co-founders include some of the largest and best-known companies in the world, including Adobe, Avaya, Cisco Systems, CLS Communication, EMC, Hewlett Packard, IBM, Innodata Isogen, Fuji Xerox, Microsoft, Oracle, Nokia, Logitech, SAP, Siebel Systems, Standard Chartered Bank, FileNet, LionBridge Technologies, Lucent, Sun Microsystems, WH&P, PeopleSoft, Philips Medical Systems, Rockwell Automation, The RWS Group, Xerox Corporation and Canon Research, among others.
TMX Java Bridge
With the java.util.ResourceBundle class, Java provides a useful solution for localization. The methods of this class let us extract the textual elements from the original source code by isolating them in a component named ResourceBundle, for example a ListResourceBundle class or a properties file.
This solution offers several advantages to the programmer but can become very complicated for the translator, especially when it comes to reusing translations.
A better option is to archive the textual resources in the TMX exchange format (an XML file). This lets translators export and import the translations to and from their preferred translation tools (several are compatible with TMX), completely independently of the programming language used. As suggested by Masaki Itagaki in his article Use XML as a Java Localization Solution, the best way to implement the TMX standard in Java applications is to extend the ResourceBundle class so that it can read data directly from XML files that comply with the TMX standard:
Java class ==> TMX file <== translation program
<tu tuid="hello" datatype="plaintext">
  <tuv xml:lang="en">
    <seg>hello</seg>
  </tuv>
  <tuv xml:lang="it">
    <seg>ciao</seg>
  </tuv>
</tu>
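For comparison, here is a minimal sketch of the plain ListResourceBundle approach mentioned above. The class names are hypothetical. It shows why this style is awkward for translators: the Italian texts live inside Java source code rather than in a file a translation tool can import.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

public class ListBundleSketch {

    // Hypothetical bundle: the translated texts are hard-coded in a Java class.
    public static class LabelsIt extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"hello", "ciao"},
                {"world", "mondo"},
            };
        }
    }

    public static void main(String[] args) {
        ResourceBundle labels = new LabelsIt();
        // prints "ciao mondo"
        System.out.println(labels.getString("hello") + " " + labels.getString("world"));
    }
}
```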
TMXResourceBundle.java Class
Instantiating the TMXResourceBundle class is similar to instantiating the PropertyResourceBundle class. In the constructor we specify the name and path of the TMX file that contains the translations, and the ISO code of the language to load. Once the class has been instantiated, the parseXmlFile method (which uses the JAXP parser method javax.xml.parsers.DocumentBuilder.parse) loads the TMX data into an object of type org.w3c.dom.Document (DOM - Document Object Model).
At this point the nodes of the document are examined and the key-value pairs are added to hashcontents, an object of type java.util.Hashtable. Each pair consists of the tuid attribute of a tu element and the value of the seg node contained in the tuv node whose xml:lang attribute matches the specified language.
Extending the ResourceBundle class requires implementing the abstract methods handleGetObject and getKeys, so that the element corresponding to a particular key can be extracted. This is done through the methods inherited from ResourceBundle: getObject(String key), getString(String key) and getStringArray(String key).
The getString(String key, String def) method, an overload of ResourceBundle's getString(String key), returns the string associated with a particular key, or a default value in case of errors.
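The extension mechanism can be illustrated in isolation with a minimal sketch, assuming the key-value pairs are already available in a Map instead of being read from a TMX file. The class name MapBundleSketch is hypothetical. It shows the two abstract methods and a getString overload with a default value.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

public class MapBundleSketch extends ResourceBundle {

    private final Map<String, String> contents = new HashMap<>();

    public MapBundleSketch(Map<String, String> source) {
        contents.putAll(source);
    }

    @Override
    protected Object handleGetObject(String key) {
        // Returning null tells ResourceBundle that this bundle has no such key.
        return contents.get(key);
    }

    @Override
    public Enumeration<String> getKeys() {
        return Collections.enumeration(contents.keySet());
    }

    // Overload with a fallback: returns def when the key is missing or the value is empty.
    public String getString(String key, String def) {
        try {
            String value = getString(key);
            return value.isEmpty() ? def : value;
        } catch (MissingResourceException e) {
            return def;
        }
    }

    public static void main(String[] args) {
        Map<String, String> italian = new HashMap<>();
        italian.put("hello", "ciao");
        MapBundleSketch bundle = new MapBundleSketch(italian);
        System.out.println(bundle.getString("hello", "?"));   // ciao
        System.out.println(bundle.getString("missing", "?")); // ?
    }
}
```

The inherited getString(String key) throws MissingResourceException when handleGetObject returns null and no parent bundle is set, which is why the overload catches that exception.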
Source Code
Download source code: TMXResourceBundle.java
package com.tecnick.tmxjavabridge;

import java.io.*;
import java.util.*;

import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;

/**
 * <p>
 * Reads resource text data directly from a TMX (XML) file.
 * </p>
 * <p>
 * First, the TMXResourceBundle class instantiates itself with two
 * parameters: a TMX file name and a target language name. Then, using a DOM
 * parser, it reads all of a translation unit's properties for the key
 * information and specified language data and populates a hashtable with them.
 * </p>
 * <p>
 * <b>TMX info: </b> http://www.lisa.org/tmx/
 * </p>
 *
 * <h4>Implementation notes</h4>
 * <p>
 * You instantiate the TMXResourceBundle class in a program to read data from
 * a TMX file. Once the class is instantiated, it reads all the data in a TMX
 * file and loads it into a DOM tree. Then it populates a hashtable so the
 * handleGetObject() method can be called to find text information based on a
 * key just as a standard ResourceBundle class does. <br>
 * Instantiating the TMXResourceBundle class is the same as instantiating the
 * PropertyResourceBundle class. First you obtain a system language code (e.g.:
 * from a locale's information). In TMX the value of the attribute must be one
 * of the ISO language identifiers (a two- or three-letter code) or one of the
 * standard locale identifiers (a two- or three-letter language code, a dash,
 * and a two-letter region code).
 * </p>
 *
 * Copyright (c) 2004-2005
 * Tecnick.com S.r.l (www.tecnick.com)
 * Via Ugo Foscolo n.19 - 09045 Quartu Sant'Elena (CA) - ITALY
 * www.tecnick.com - info@tecnick.com <br/>
 * Project homepage: <a href="http://tmxjavabridge.sourceforge.net" target="_blank">http://tmxjavabridge.sourceforge.net</a><br/>
 * License: http://www.gnu.org/copyleft/lesser.html LGPL
 *
 * @author Nicola Asuni [www.tecnick.com].
 * @version 1.1.005
 */
public class TMXResourceBundle extends ResourceBundle {

    /**
     * The hashtable that will contain data loaded from XML
     */
    protected Hashtable<String, String> hashcontents = null;

    /**
     * Number of translation units (tu) items
     */
    protected int numberOfItems = 0;

    /**
     * Vector to store tu items keys
     */
    protected Vector<String> vectOfItems;

    /**
     * TMX to Hashtable conversion. Reads XML and stores data in a Hashtable.
     *
     * @param xmlfile the TMX (XML) file to read, supports also URI resources or JAR resources
     * @param language ISO language identifier (a two- or three-letter code)
     */
    public TMXResourceBundle(String xmlfile, String language) {
        String temp_key = null; // store hashtable key names
        String temp_value = null; // store hashtable values
        NamedNodeMap temp_list = null; // list of <tu> attributes
        Attr temp_attr = null; // <tu> attribute
        NodeList listOfTUVs = null; // list of <tuv> elements
        NodeList listOfSEG = null; // list of <seg> elements
        Element SEGElements = null; // <seg> element
        int numberOfTUVs = 0; // number of <tuv> elements

        // Create Document with parser
        Document document = parseXmlFile(xmlfile, false);

        // handle document error
        if (document == null) {
            hashcontents = new Hashtable<String, String>(); // initialize an empty hashtable
            vectOfItems = new Vector<String>(); // keep getKeys() usable after a failed load
            return;
        }

        // Make a list of Term Units and count the number of items
        NodeList listOfTermUnits = document.getElementsByTagName("tu");
        numberOfItems = listOfTermUnits.getLength();

        // set tu keys vector size
        vectOfItems = new Vector<String>(numberOfItems);

        // set hash size
        hashcontents = new Hashtable<String, String>(numberOfItems);

        for (int i = 0; i < numberOfItems; i++) {
            temp_value = null;
            numberOfTUVs = 0; // do not reuse the count from the previous <tu>

            // set a key
            temp_list = listOfTermUnits.item(i).getAttributes();
            temp_attr = (Attr) temp_list.getNamedItem("tuid");
            if (temp_attr == null) {
                continue; // skip <tu> elements without a tuid attribute
            }
            temp_key = temp_attr.getValue();
            vectOfItems.add(temp_key); // store key on vector

            // get a value
            // Make a TUV list => "listOfTUVs"
            Node TUVs = listOfTermUnits.item(i);
            if (TUVs.getNodeType() == Node.ELEMENT_NODE) {
                Element TUVElements = (Element) TUVs;
                listOfTUVs = TUVElements.getElementsByTagName("tuv");
                numberOfTUVs = listOfTUVs.getLength();
            }

            // Check each TUV. If it's a specified lang, then get a SEG value
            for (int j = 0; j < numberOfTUVs; j++) {
                temp_list = listOfTUVs.item(j).getAttributes();
                temp_attr = (Attr) temp_list.getNamedItem("xml:lang");
                if ((temp_attr != null) && temp_attr.getValue().equalsIgnoreCase(language)) {
                    // -- Get a SEG value
                    SEGElements = (Element) listOfTUVs.item(j);
                    listOfSEG = SEGElements.getElementsByTagName("seg");
                    try {
                        temp_value = listOfSEG.item(0).getFirstChild().getNodeValue();
                    } catch (Exception e) {
                        // in case of error print error message and set value to
                        // an empty string
                        System.err.println(this.getClass().getName() + "(\"" + xmlfile + "\", \"" + language + "\") :: "
                                + "Void <seg> value on <tu tuid=\"" + temp_key + "\"> key");
                        temp_value = "";
                    }
                }
            }

            // Populate hashtable
            if ((temp_key != null) && (temp_value != null)) {
                hashcontents.put(temp_key, temp_value);
            }
        } // for loop
    } // constructor

    /**
     * Parses an XML file and returns a DOM document.
     *
     * @param filename the name of XML file
     * @param validating If true, the contents is validated against the DTD specified in the file.
     * @return the parsed document, or null if it could not be loaded
     */
    public Document parseXmlFile(String filename, boolean validating) {
        Document doc = null;
        DocumentBuilderFactory factory = null;

        // Create a builder factory
        try {
            factory = DocumentBuilderFactory.newInstance();
        } catch (FactoryConfigurationError e) {
            System.err.println(e);
            return null;
        }
        factory.setValidating(validating);

        // Create the builder and parse the file
        try {
            try {
                // try to get the file from jar
                InputStream instream = getClass().getResourceAsStream(filename);
                doc = factory.newDocumentBuilder().parse(instream);
            } catch (Exception ejar) {
                try {
                    // try to get the file as external URI
                    doc = factory.newDocumentBuilder().parse(filename);
                } catch (IOException euri) {
                    try {
                        // try to get the file as local filename
                        doc = factory.newDocumentBuilder().parse(new File(filename));
                    } catch (IOException efile) {
                        try {
                            // try to resolve the path as relative to local class folder
                            // (File.pathSeparator is ";" on Windows and ":" on Unix-like systems)
                            String[] classPath = System.getProperties().getProperty("java.class.path", ".").split(File.pathSeparator);
                            String newpath = classPath[0] + "/" + filename;
                            doc = factory.newDocumentBuilder().parse(new File(newpath));
                        } catch (IOException epath) {
                            // unable to get the input file
                            System.err.println("IOException:" + epath);
                        }
                    }
                }
            }
        } catch (ParserConfigurationException e) {
            System.err.println("[" + filename + "] ParserConfigurationException:" + e);
        } catch (SAXException e) {
            System.err.println("[" + filename + "] SAXException:" + e);
        }
        return doc;
    }

    /**
     * Get key value, return default if void.
     *
     * @param key name of key
     * @param def default value
     * @return parameter value or default
     */
    public String getString(String key, String def) {
        String param_value = "";
        try {
            param_value = this.getString(key);
            if ((param_value != null) && (param_value.length() > 0)) {
                return param_value;
            }
        } catch (Exception e) {
            // for any exception return the default value
            return def;
        }
        return def;
    }

    /**
     * handleGetObject implementation
     *
     * @param key the resource key
     * @return the content associated to the specified key
     * @throws MissingResourceException
     */
    public final Object handleGetObject(String key) throws MissingResourceException {
        return hashcontents.get(key);
    }

    /**
     * Returns the number of translation units
     *
     * @return number of Items
     */
    public int getNumberOfItems() {
        return numberOfItems;
    }

    /**
     * Define getKeys method
     *
     * @return item elements
     */
    public Enumeration<String> getKeys() {
        return vectOfItems.elements();
    }
}
Source Code
This class shows how to instantiate the TMXResourceBundle class with the example XML file quoted above. In this example the language codes (en = English, it = Italian) are specified explicitly, but they can also be obtained from a locale's information.
Download Source Code: tmxtest.java
package com.tecnick.tmxjavabridge.sample;

import com.tecnick.tmxjavabridge.TMXResourceBundle;

/**
 * Sample class for TMXResourceBundle class.
 * <br/><br/>
 * Copyright (c) 2004-2005
 * Tecnick.com S.r.l (www.tecnick.com)
 * Via Ugo Foscolo n.19 - 09045 Quartu Sant'Elena (CA) - ITALY
 * www.tecnick.com - info@tecnick.com<br/>
 * License: http://www.gnu.org/copyleft/lesser.html LGPL
 *
 * @author Nicola Asuni [www.tecnick.com].
 * @version 1.1.005
 */
public class TMXJBSample {

    /**
     * loads TMX data
     */
    final static TMXResourceBundle res_en = new TMXResourceBundle("tmx/sample_tmx.xml", "en");
    final static TMXResourceBundle res_it = new TMXResourceBundle("tmx/sample_tmx.xml", "it");

    /**
     * Prints the two sample strings in English and in Italian on System.out
     * @param args String[]
     */
    public static void main(String[] args) {
        System.out.println(res_en.getString("hello", ""));
        System.out.println(res_en.getString("world", ""));
        System.out.println(res_it.getString("hello", ""));
        System.out.println(res_it.getString("world", ""));
    }
}
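As a small illustration of obtaining the language code from a locale rather than hard-coding it, the sketch below uses only java.util.Locale. The class and method names are hypothetical. Which form to pass depends on how the xml:lang attributes in your TMX file are written, since the bundle compares the whole string.

```java
import java.util.Locale;

public class LocaleCodeSketch {

    // Two-letter (or three-letter) ISO language code, e.g. "it".
    static String languageCode(Locale locale) {
        return locale.getLanguage();
    }

    // Language and region joined by a dash, e.g. "it-IT".
    static String languageRegionCode(Locale locale) {
        return locale.toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(languageCode(Locale.ITALY));       // it
        System.out.println(languageRegionCode(Locale.ITALY)); // it-IT
        // The value for the running system depends on its configuration.
        System.out.println(languageCode(Locale.getDefault()));
    }
}
```

With the sample file above, whose variants are tagged "en" and "it", the result of languageCode(Locale.getDefault()) could be passed as the second constructor argument of TMXResourceBundle.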
References
- Asuni N, Java Localization with TMX standard [online] 2004-10-14, http://evolt.org/Java-Localization-with-TMX-standard.
- Asuni N, TMXResourceBundle - TMX PHP Bridge [online] 2005-01-08, http://tmxphpbridge.sourceforge.net.
- Asuni N, TMXResourceBundle - TMX Java Bridge [online] 2005-01-08, http://tmxjavabridge.sourceforge.net.
- Itagaki M, Use XML as a Java Localization Solution [online] 2000-11-10, http://www.ftponline.com/javapro/archives/mi0011/default.asp.
- O'Conner J, Java Internationalization: Localization with ResourceBundles [online] 1998-10-01, http://java.sun.com/developer/technicalArticles/Intl/ResourceBundles/.
- OSCAR - LISA, TMX - Translation Memory eXchange [online] 2004-10-01, http://www.lisa.org/standards/tmx.
- OSCAR - LISA, TMX 1.4b Specification [online] 2005-03-26, http://www.lisa.org/standards/tmx/tmx.html.
- W3C, Extensible Markup Language (XML) [online] 2005-08-02, http://www.w3.org/XML/.