[jsword-devel] i18n basic

DM Smith jsword-devel@crosswire.org
Mon, 05 Apr 2004 16:36:43 -0400


Joe, Thanks for your explanation. It makes sense. I did not mean to 
criticize the existing
implementation in a way that put you on the defensive. So I don't take your 
response as defensive. A little bit of professional contention provides for 
better design and code. At least for me, it does.

I will try to clarify some of my points below and comment on your reply.

First, I find the design of MsgBase and Msg to be solid.

Joe Walker wrote:
>
>Quick outline of the thinking behind Msg i18n.
>
>Problems with the pure Java way of doing things:
>1. resources can get lost. when you look at a resource file it isn't easy 
>to know if everything is needed and what has been orphaned. over time this 
>file becomes a nightmare to maintain and translate

I have always found that programmers want everything in the code. Which is 
generally where I want it. I agree that it is easier to maintain explicit 
references.

>2. it hinders re-factoring because there is a central point on which 
>everything depends moving things around is made harder

I am not sure how having keyed strings in a property file makes it difficult 
to refactor. I find the refactoring support in Eclipse to be outstanding. I 
think I saw that it will even refactor in non-java files if you wish. Though 
that requires preview, which makes sense since it is not a programmatic 
analysis but just a brute force text substitution.

>3. many programmers are lazy and don't bother to i18n their strings and 
>need some "encouragement" to do so.

I like the design of having LucidException require an object i.e. MsgBase 
since it does enforce the issue. That is if the constructor that takes 
strings is removed.

>4. it is fairly expensive to consider when the market for an i18n version 
>of your program is unknown. To put it another way: Given the effort, should 
>I do this now or later?
>
>The idea behind Msg is to use the compiler to solve some of these problems.
>

And for that reason, I think it or something like it should continue.

>1. To find the uses of a resource just click on it in eclipse and press 
>ctrl+shift+G. If there are no matches, the resource can be deleted.

Learn something new every day! Thanks. I was doing this by brute force 
searching.

>2. Since each Msg class lives with the classes it serves it can be moved 
>with them very easily

On one project which I had to internationalize existing code, we found it 
easier to have an English property file translated than have code 
translated.

Turns out that the people who did the translation were not programmers.

Having the class be the resource bundle/property set would have confused our 
translators.

Having those property files in the same package should not be a problem.

>3. You fairly much can't throw an exception without i18n enabling your 
>message. It would be good if we could i18n enforce swing in the same way
>4. Is the justification for basing things on an English string. The barrier 
>to entry is lower for English speaking programmers, but the same for 
>others. It is fragile, but easier for most.
>

I do think that the keys should be English. I don't know that the key should 
be the English message. Since the string is not used but the member 
variable's name it should not matter what is the implementation behind that 
symbolic constant.

>That's the thinking behind it anyway. I'm not saying it has to stay as it 
>is because I'm not convinced it is totally right, but it might help to 
>explain why things are as they are.
>I've noted some of the flaws that I know of below.
>
>Where I've explained things below don't take me as being defensive against 
>criticism, just anticipating comments and explaining! Feel free to 
>challenge.
>
>
>DM Smith wrote:
>>The first thing anyone thinks of when internationalizing a program is the 
>>text a user sees.
>>
>>I consider this i18n basic. The following is an analysis of it in JSword. 
>>The upshot is that I think there are some opportunities for improvement. I 
>>have specific proposals for these opportunities. If you like, I will work 
>>on it.
>>
>>For Java the key mechanisms for it are ResourceBundles and MessageFormat. 
>>The former to locate the translations and the latter for formatting 
>>composited text.
>
>Our I18N system builds on these.
>
>>For JSword the key design used with in the program already is:
>>1) That resources for a class can be in three different areas.
>>   a) In the same package as the classes using the resources.
>>   b) In the resources jar stored in one of the following locations
>>     i) stored in a file in the root w/ '.' between parts of the pkg name
>>       e.g. org.crosswire.common.util.MyResources_de_CH.properties
>>     ii) stored in a file with '/' between parts of the pkg name
>>       e.g. org/crosswire/common/util/MyResources.properties
>>   c) In ~/jsword in the same kind of locations as those stored in 
>>resource.jar
>>2) Externalized Strings are represented in an Enum (currently from 
>>Apache).
>>        This allows for the explicit cataloging of the externalized 
>>strings.
>
>The only place that we are using a) is for JAXB, over which we have no
>control, I think???

The current implementation of Msg requires a) for translations. Currently we 
are using the Msg class for the ResourceBundle.

>I like it that we can ship resources in b) to be overridden in c). This 
>would seem to be useful to me. However I'm not sure that having i) and ii) 
>are such a good idea.
>
>I think Enum isn't a good idea and we could easily strip it out and replace 
>with something lighter-weight.
>
>>The current implementation uses MsgBase, LucidException and EventException 
>>to represent the translations. (I have supplied an additional mechanism 
>>ActionFactory). LogicError does not internationalize its messages.
>
>I don't see the Exception classes as a big part of this, more something 
>that converts Msgs to Strings. The fact that getMessage() in these classes 
>is more than Msg.toString is poor, and I'm not sure why it is like that.
>LogicError should probably be replaced with assert, because that is what it 
>is supposed to be used for. Hence no i18n needed.
>
I think that the program should be able to handle every error it can without 
exiting abruptly. Even if it is a message that states that the program must 
shut down because of it.

If an Assert abruptly shuts down the application or can't be trapped to put 
up a friendly message, then I think that a LogicError is in order. If it can 
be used and still have nice shutdown behavior, then use asserts.

I have not had experience with asserts to know whether they are a good fit. 
So let us know what you find.

>>EventException does its lookup for a resource bundle called Exception. 
>>There are a few problems with the implementation. (it is not a big deal 
>>since it is only used once)
>>   1) It assumes that the message passed to it is a key in the resource 
>>bundle.
>>   2) This resource does not exist.
>>   3) This resource is required to be in a) the same package as the class. 
>>This means that every change to the program requires.
>
>I think EventException has a bad name, RuntimeLucidException would be 
>better and as you say it is poorly implemented.

Below I suggest LucidRuntimeException so that it sorts with LucidException.

>
>>MsgBase is designed to be subclassed in every package that has strings 
>>that need to be externalized. By practice the derived class is always 
>>called Msg.  MsgBase extends apache's Enum. And each Msg is a member of 
>>the enum. The ResourceBundle lookup is for Msg which finds Msg.class and 
>>loads it as the resource itself.
>
>The Msg convention is just to keep a much repeated string short to avoid 
>much code bloat. As soon as you know what it is about, a long name just 
>gets in the way.
>

I agree that we should use symbolic constants or the enum pattern to hide 
the actual string.

>>There are a few problems/weaknesses with this implementation.
>>   1) The resource is required to be in the same package as the derived 
>>Msg class.
>
>There is nothing to stop you from declaring your resources public, and that 
>does highlight people as to their wider use.
>It might be a pain to use resources from more than one package because 
>everything is called Msg, so you would need to fully qualify the Msg class. 
>However I've not found a time when I've needed to do this so far.

I don't think that we should collect resources from different packages into 
one property file.
I think we should either have property files scoped by the package to which 
they refer.

What I meant here is that the property file is required to be physically 
present in the same directory as the class or in another jar with the same 
path. In otherwords, it cannot be overridden or as a file like 
org.crosswire.common.swing.Msg.properties in the resource folder.

>
>>   2) Access protection is protected for MsgBase constructor and private 
>>for each derived Msg class.
>>     One cannot create a Msg_de.java (which probably is a good thing).
>>     This means that translations must be put into property files       in 
>>the same package as the derived Msg class.
>
>I don't understand the second comment?

I have thought again about this. You can have a Msg_de.java derived from 
MsgBase.java.
If derivation had to be from Msg.java, it could not have been done, which 
would have forced using files instead of classes.

I think that even with the "out of sight, out of mind" maintenance problem, 
it is still worth having files over classes for resource bundles.

>
>>   3) The Msg objects are enumerated explicitly as Msg objects and the 
>>literal text of the Msg is used as a key to lookup the resource.
>>       a) If a spelling error or some other change happens to the text, 
>>every property file with its translation need to be modified.
>
>Agreed, and as things get more professional we will probably need to run a 
>macro across the code to create lots of English language property files and 
>introduce keys based on the constant names.

Eclipse has "Externalize Strings..." which looks promising. It will let you 
pick which strings need to be internationalized and which ones don't. You 
can name the keys. It will generate an implementation which may not be what 
we want, but at least it will create the property file correctly.

For the strings that don't need to be internationalized, it puts 
//$NO-NLS-1$ at the end of the line.
These are ignored for subsequent passes of "Externalize Strings".

Further once this is done for the whole program, I think it is possible to 
warn (maybe error, too) of strings that are not externalized. (I used 
CodePro from Instantiations to do this at work, but I think it may be in 
Eclipse 3 by now) This would help enforce the practice.

>
>>       b) the keys have spaces in them and need to be escaped
>>              I\ think\ that\ it\ looks\ bad=Don't you?
>
>Yes it does look bad.
>We could probably re-write the Properties parser to make this better??

Not sure that we can plug in a new Properties parser into ResourceBundle.
Not sure that we would want to even if we could.

>
>>Here are the changes that I think would solve these problems:
>>1) Keys are independent from their messages and are of the form:
>>           public class MsgKey extends Enum
>>      and
>>          public class MsgKeyImpl extends MsgKey
>>   (It does not matter to me what the class names are. I always struggle 
>>with finding good class names).
>
>I'm not sure I understand this, how do these relate to Msg and MsgBase?

Background (which you probably already know).
Currently, when calling ResourceBundle.getResource("MyResource") it will 
first look for a class name MyResource and if found will load that, if it is 
assignment compatable with Resource (I guess that this means derived from 
ResourceBundle or its descendants). If not it looks for 
MyResource.properties.
Then it looks for "MyResource_lang.class" or "MyResource_lang.properties". 
And loads this overtop of what it already found.
Then it looks for "MyResource_lang_country.class" and 
"MyResource_lang_country.properties" and again loads what it finds.
Then if the locale defines a dialect, it goes for that.

I misspoke, I thought that Msg and MsgBase were acting as a ResourceBundle. 
They are not. They require property files to provide the text. These classes 
are written such that if they do not find the key in the loaded resource 
bundle, they return their text.

My thought was that we would need to separate the enumeration of the key 
from the class itself.

>
>>2) Resources are allowed to be in all the locations but will be held in 
>>property files just like the others in resource. This can be done with the 
>>new CWClassLoader.
>
>This sounds like a good idea. Since J-Sword has never been translated, I'd 
>not given any though to where other translations sit. I think I once got a 
>basic French translation working, but since my French is very poor, I 
>deleted it in case anyone saw! However this was a long time ago.

Once we have the English property files we can request translations from 
non-programmers.

>
>>3) MsgBase derives from Object and uses MsgKey to do the lookup. With a 
>>little bit of magic it does not require init to do the resource loading:
>>          ResourceBundle resources =
>>                 ResourceBundle.getResources(
>>                       getClass().getName(),   // load the derived classes 
>>resources by class name
>>                       Locale.getDefault(),      // for the user's locale
>>                       new CWClassLoader());  // looking in all the right 
>>places
>
>So MsgKey is a utility class to help with resource lookup?

We can ignore MsgKey. The idea was to split the keys to a separate 
enumeration.

>
>>EventException and LogicError are renamed to LucidRuntimeExceptoin and 
>>LucidError. And that the common code is factored into a static methods in 
>>LucidException or a new class LucidUtil.
>
>Mostly my thoughts from above.
>I wonder if I could easily kill LogicError? I'll investigate.
>
>>Since we are at Java 1.4, I suggest a further change to use the chaining 
>>of throwables in the class.
>
>Definitely

The change to the classes will be trivial, however it will create an 
opportunity for Reporter to show the chain.

>
>>I think that these classes don't need to do the i18n themselves. This can 
>>be done in the class that is doing the creation of the object throwing a 
>>new lucid exception using its Msg class.
>
>i18n in the exception is a bad idea. If the i18n is done in a utility then 
>we can use the compiler to enforce i18n.

Currently, the LucidException takes either a MsgBase or a string. The former 
enforces internationalization but the latter circumvents it and should be 
removed. I presume it is there to facilitate a migration to Msg objects.

My recommendation here was to eliminate the 
ResourceBundle.getResource("Exception") in EventException.

There is nothing but a gatekeeper to prevent a developer from using other 
kinds of exceptions. Perhaps, directions should be added to the coding 
standards page to reflect what should be done.

I find that there are two messages that are needed in an exception when an 
exception is used as the vehicle for transporting a message from the point 
of infraction to the point where it can be reported. First is a message that 
the developers can understand and is normally hidden from the user (as is 
done with the Details button on the Reporter). Second is a message that the 
end users can understand.

In the application that we internationalized, we attempted to separate these 
messages. The stack traces with the internal messages remained in English so 
that they could be e-mailed to the developers (via cut and paste). By having 
it in English we did not need to have to re-translate it.

>
>>If we want to separate error messages from other messages, we could have a 
>>convention:
>>          class ExceptionMsg extends MsgBase
>>in each pkg using lucid exceptions.
>>The advantage of reusing MsgBase is that we reuse code.
>
>Comments welcome.
>
>Joe.

_________________________________________________________________
Limited-time offer: Fast, reliable MSN 9 Dial-up Internet access FREE for 2 
months! 
http://join.msn.com/?page=dept/dialup&pgmarket=en-us&ST=1/go/onm00200361ave/direct/01/