[jsword-devel] i18n basic

DM Smith jsword-devel@crosswire.org
Mon, 05 Apr 2004 09:34:30 -0400


The first thing anyone thinks of when internationalizing a program is the 
text a user sees.

I consider this i18n basic. The following is an analysis of it in JSword. 
The upshot is that I think there are some opportunities for improvement. I 
have specific proposals for these opportunities. If you like, I will work on 
it.

For Java, the key mechanisms are ResourceBundle and MessageFormat: the 
former locates the translations and the latter formats composite text.
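
As a rough illustration of how the two fit together, here is a minimal sketch. The class name, the key "greeting" and the message text are all made up for the example, and the bundle is built from an in-memory string only so that the sketch runs without any files; a real program would normally locate the bundle with ResourceBundle.getBundle(baseName, locale).

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class I18nBasicsSketch {
    // Hypothetical property file content, kept in memory for the example.
    static final String PROPS = "greeting=Hello, {0}. You have {1} new notes.\n";

    static String greet(String name, int count) {
        try {
            // ResourceBundle supplies the (translated) pattern for a key.
            ResourceBundle bundle = new PropertyResourceBundle(new StringReader(PROPS));
            String pattern = bundle.getString("greeting");
            // MessageFormat fills the numbered placeholders in the pattern.
            MessageFormat format = new MessageFormat(pattern, Locale.ENGLISH);
            return format.format(new Object[] { name, Integer.valueOf(count) });
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(greet("Ruth", 3));
    }
}
```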

For JSword the key design already used within the program is:
   1) Resources for a class can be in three different areas:
          a) In the same package as the classes using the resources.
          b) In the resources jar, in one of the following locations:
                 i) in a file in the root, with '.' between parts of the
                    package name, e.g.
                    org.crosswire.common.util.MyResources_de_CH.properties
                 ii) in a file with '/' between parts of the package name,
                    e.g. org/crosswire/common/util/MyResources.properties
          c) In ~/jsword, in the same kinds of locations as those stored in
             resource.jar.
   2) Externalized strings are represented in an Enum (currently from
      Apache). This allows for the explicit cataloging of the externalized
      strings.
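
To make the two naming forms in b) concrete, here is a small sketch that builds both candidate file names from a base name and a locale suffix. The class and method names are invented for the illustration and are not part of JSword; it only mirrors the two examples given above.

```java
class ResourceNameSketch {
    // Form i): a file in the root, with '.' between parts of the package name.
    static String dottedName(String baseName, String localeSuffix) {
        return baseName + localeSuffix + ".properties";
    }

    // Form ii): a file with '/' between parts of the package name.
    static String slashedName(String baseName, String localeSuffix) {
        return baseName.replace('.', '/') + localeSuffix + ".properties";
    }

    public static void main(String[] args) {
        String base = "org.crosswire.common.util.MyResources";
        System.out.println(dottedName(base, "_de_CH"));
        System.out.println(slashedName(base, ""));
    }
}
```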

The current implementation uses MsgBase, LucidException and EventException 
to represent the translations. (I have supplied an additional mechanism, 
ActionFactory.) LogicError does not internationalize its messages.

EventException does its lookup in a resource bundle called Exception. There 
are a few problems with the implementation (not a big deal, since it is only 
used once):
   1) It assumes that the message passed to it is a key in the resource
      bundle.
   2) This resource does not exist.
   3) This resource is required to be in location a), the same package as
      the class.
This means that every change to the program requires.

MsgBase is designed to be subclassed in every package that has strings that 
need to be externalized. By convention the derived class is always called 
Msg. MsgBase extends Apache's Enum, and each message is a member of the 
enum. The ResourceBundle lookup is for Msg, which finds Msg.class and loads 
it as the resource itself.

There are a few problems/weaknesses with this implementation.
   1) The resource is required to be in the same package as the derived Msg
      class.
   2) Access is protected for the MsgBase constructor and private for each
      derived Msg class.
           One cannot create a Msg_de.java (which probably is a good thing).
           This means that translations must be put into property files in
           the same package as the derived Msg class.
   3) The Msg objects are enumerated explicitly as Msg objects and the
      literal text of the Msg is used as a key to look up the resource.
       a) If a spelling error is fixed or some other change is made to the
          text, every property file with its translation needs to be
          modified.
       b) The keys have spaces in them and need to be escaped:
              I\ think\ that\ it\ looks\ bad=Don't you?
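
A quick way to see why the escaping matters is to load the line with java.util.Properties, with and without the backslashes. This is only a sketch; the class name is made up, and the property text is the example above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class SpacedKeySketch {
    // Parse property-file text held in memory.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Escaped spaces: the whole sentence is the key.
        Properties escaped = parse("I\\ think\\ that\\ it\\ looks\\ bad=Don't you?\n");
        System.out.println(escaped.getProperty("I think that it looks bad"));

        // Unescaped spaces: the key ends at the first space, so the key is
        // just "I" and the rest of the line becomes the value.
        Properties unescaped = parse("I think that it looks bad=Don't you?\n");
        System.out.println(unescaped.getProperty("I"));
    }
}
```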

Here are the changes that I think would solve these problems:
1) Keys are independent from their messages and are of the form:
           public class MsgKey extends Enum
      and
          public class MsgKeyImpl extends MsgKey
   (It does not matter to me what the class names are. I always struggle 
with finding good class names).
2) Resources are allowed to be in all the locations but will be held in 
property files just like the others in resource. This can be done with the 
new CWClassLoader.
3) MsgBase derives from Object and uses MsgKey to do the lookup. With a 
little bit of magic it does not require init to do the resource loading:
          ResourceBundle resources =
                 ResourceBundle.getBundle(
                       getClass().getName(),   // load the derived class's resources by class name
                       Locale.getDefault(),    // for the user's locale
                       new CWClassLoader());   // looking in all the right places
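
To show what change 1) buys us, here is a minimal sketch of a key that is independent of its message. The Key class is a hand-written stand-in for an Apache Enum subclass such as MsgKey, and the key name "looks_bad" and both bundles are invented for the example. The point is only that the message text can change, or be translated, without touching the key.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class KeyedMsgSketch {
    // Stand-in for a typesafe enum key; the name has no spaces to escape.
    static final class Key {
        static final Key LOOKS_BAD = new Key("looks_bad");

        private final String name;

        private Key(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Build a bundle from in-memory text so the sketch needs no files.
    static ResourceBundle bundleOf(String text) {
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // The lookup uses the key's name, never the message text.
    static String lookup(ResourceBundle bundle, Key key) {
        return bundle.getString(key.getName());
    }

    public static void main(String[] args) {
        ResourceBundle en = bundleOf("looks_bad=I think that it looks bad\n");
        ResourceBundle de = bundleOf("looks_bad=Ich finde, es sieht schlecht aus\n");
        System.out.println(lookup(en, Key.LOOKS_BAD));
        System.out.println(lookup(de, Key.LOOKS_BAD));
    }
}
```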

I propose that EventException and LogicError be renamed to 
LucidRuntimeException and LucidError, and that the common code be factored 
into static methods in LucidException or a new class, LucidUtil. Since we 
are at Java 1.4, I suggest a further change to use the chaining of 
throwables in these classes.
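
For reference, throwable chaining in Java 1.4 amounts to passing the cause to the superclass constructor and reading it back with getCause(). The sketch below uses a made-up exception class and message, not the actual lucid classes.

```java
class ChainingSketch {
    // Hypothetical exception that keeps the original failure as its cause.
    static class SketchException extends Exception {
        SketchException(String message, Throwable cause) {
            super(message, cause); // Throwable(String, Throwable), new in Java 1.4
        }
    }

    // Wrap a low-level failure in the higher-level exception.
    static SketchException wrap(Throwable cause) {
        return new SketchException("Could not read the module", cause);
    }

    public static void main(String[] args) {
        Throwable low = new RuntimeException("disk error");
        SketchException high = wrap(low);
        // The original failure is still available to callers and loggers.
        System.out.println(high.getMessage() + " <- " + high.getCause().getMessage());
    }
}
```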

I think that these classes don't need to do the i18n themselves. This can be 
done in the class that creates and throws the new lucid exception, using its 
own Msg class. If we want to separate error messages from other messages, we 
could have a convention:
          class ExceptionMsg extends MsgBase
in each pkg using lucid exceptions.
The advantage of reusing MsgBase is that we reuse code.
