[jsword-devel] Languages
DM Smith
dmsmith at crosswire.org
Mon Feb 11 09:55:57 MST 2013
I've been looking at the issues regarding the Language and Languages classes; BCP 47 (the standard that defines how a Locale is designated); Java 7's support for it in Locale; and what SWORD has defined.
First, the basic purpose of the JSword classes is to provide a friendly name for the language code. It is not meant as Locale support.
There are 7000+ languages, so JSword splits these into two parts.
First Part:
Those languages that are in use by SWORD modules on the CrossWire server and those found on the CrossWire wiki under http://www.crosswire.org/wiki/Localized_Language_Names.
These are in the iso639.properties property files. When we started out, this name was appropriate, but the contents have since morphed into the names in the above list. So really it is a subset of BCP 47.
These files can be localized. The default file has localized names from the wiki and, failing that, from www.sil.org/iso639-3.
The iso639_en.properties file is similar to the default file, but has the localized name in parens following the English name.
So the default is not English.
Second Part:
As a fall back to the First Part, if there is a new SWORD module for a language that is not covered by the first part, we can do one of two things:
a) Just show the code as the language name.
b) Show the name as defined by SIL's iso639-3 files.
The property file iso639full.properties is a map of 2 and 3 letter language codes to the name of the language from SIL's files. This is a huge, slow property file.
Since this is a fall back, this file has no need to be internationalized.
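To make the two parts concrete, here is a minimal sketch of the tiered lookup. The class and method names are hypothetical, not JSword's actual code, and the property data is passed in as strings rather than read from iso639.properties and iso639full.properties. It assumes option (b) is tried first and option (a) is the last resort, and it loads the large list only when the small one misses.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Properties;

/** Hypothetical sketch of the two-part name lookup; not JSword's actual code. */
class LanguageNames {
    private final Properties common;  // stands in for iso639.properties
    private final String fullSource;  // stands in for iso639full.properties
    private Properties full;          // loaded lazily because the real file is large

    LanguageNames(String commonSource, String fullSource) {
        this.common = load(commonSource);
        this.fullSource = fullSource;
    }

    /** Returns a friendly name, falling back to the full list and then to the code itself. */
    String getName(String code) {
        String key = code.toLowerCase(Locale.ROOT);
        String name = common.getProperty(key);
        if (name != null) {
            return name;
        }
        if (full == null) {
            full = load(fullSource); // only pay for the big file when it is needed
        }
        name = full.getProperty(key);
        return name != null ? name : code;
    }

    private static Properties load(String source) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(source));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }
}
```

With sample data such as "he=Hebrew" in the first part, getName("he") is answered without touching the full list at all.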
BCP-47 and SWORD:
This standard far exceeds what SWORD allows in a Lang field. SWORD defines the field as having a required part and two optional parts: LL-SSSS-CC
Where LL is required and a 2 or 3 letter language code.
Where -SSSS is the optional, 4 character script.
Where -CC is the optional, 2 character region code.
The following are valid combinations:
LL
LL-SSSS
LL-CC
LL-SSSS-CC
The parts are case insensitive.
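A parser for this field could look roughly like the following sketch. LangTag is a hypothetical name, not an existing JSword class. Normalizing the case (lower-case LL, title-case SSSS, upper-case CC) is my assumption, borrowed from the usual BCP 47 convention; SWORD only says the parts are case insensitive.

```java
import java.util.Locale;

/** Hypothetical holder for a parsed SWORD Lang value; not part of JSword. */
class LangTag {
    final String language; // LL, lower-cased
    final String script;   // SSSS in title case, or null if absent
    final String country;  // CC, upper-cased, or null if absent

    private LangTag(String language, String script, String country) {
        this.language = language;
        this.script = script;
        this.country = country;
    }

    /** Parses LL, LL-SSSS, LL-CC or LL-SSSS-CC; anything else is rejected. */
    static LangTag parse(String lang) {
        String[] parts = lang.split("-", -1);
        String ll = parts[0];
        if (parts.length > 3 || !isAlpha(ll) || ll.length() < 2 || ll.length() > 3) {
            throw new IllegalArgumentException("Not a SWORD Lang value: " + lang);
        }
        String script = null;
        String country = null;
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i];
            if (isAlpha(p) && p.length() == 4 && script == null && country == null) {
                // A script is only allowed directly after the language.
                script = p.substring(0, 1).toUpperCase(Locale.ROOT)
                        + p.substring(1).toLowerCase(Locale.ROOT);
            } else if (isAlpha(p) && p.length() == 2 && country == null) {
                country = p.toUpperCase(Locale.ROOT);
            } else {
                throw new IllegalArgumentException("Not a SWORD Lang value: " + lang);
            }
        }
        return new LangTag(ll.toLowerCase(Locale.ROOT), script, country);
    }

    private static boolean isAlpha(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))) {
                return false;
            }
        }
        return true;
    }
}
```

Because the script is distinguished from the region by length alone, the parser does not need a list of known scripts or regions.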
According to Chris Little's recent email to the sword-devel list, the lookup algorithm should be:
Given LL, look for LL.
Given LL-SSSS, look for LL-SSSS and failing that look for LL.
Given LL-CC, look for LL-CC and failing that look for LL.
Given LL-SSSS-CC, look for LL-SSSS-CC, LL-SSSS, LL-CC and lastly LL.
In the last one SSSS is prioritized over CC because the script has more impact on the representation of the name than the region does.
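The lookup order above can be sketched as a list of candidate keys, tried from most to least specific. LangLookup and its method names are hypothetical, and the names are held in a plain Map purely for illustration. The already-separated parts are passed in, with null meaning "not given".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the fallback order; not JSword's actual code. */
class LangLookup {
    /** Builds the keys to try, most specific first. ssss and cc may be null. */
    static List<String> candidates(String ll, String ssss, String cc) {
        List<String> keys = new ArrayList<String>();
        if (ssss != null && cc != null) {
            keys.add(ll + "-" + ssss + "-" + cc);
        }
        if (ssss != null) {
            keys.add(ll + "-" + ssss); // script is tried before region
        }
        if (cc != null) {
            keys.add(ll + "-" + cc);
        }
        keys.add(ll);
        return keys;
    }

    /** Returns the first name found for any candidate key, or null if none match. */
    static String findName(Map<String, String> names, String ll, String ssss, String cc) {
        for (String key : candidates(ll, ssss, cc)) {
            String name = names.get(key);
            if (name != null) {
                return name;
            }
        }
        return null;
    }
}
```

All four cases fall out of the same method: the key for a missing part is simply never added to the list.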
JSword does not properly support this. I thought it did.
Java Locale:
Java Locale has a fatal flaw: if given "he" (Hebrew), "yi" (Yiddish), "id" (Indonesian) or "no-NO-NY" (Norwegian, Nynorsk variant), it changes these to "iw", "ji", "in" and "nn-NO" and does not remember what it was given. There are a few other rewrite exceptions too. The Hebrew, Indonesian and Nynorsk cases affect our users.
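One way to live with this is to keep the original code next to the Locale instead of asking the Locale for it afterwards. The class below is a hypothetical sketch, not an existing JSword or JDK class. Note that the rewrite itself depends on the JDK: the versions discussed here report the old codes, while JDK 17 and later report the current ones by default, so the sketch detects the rewrite rather than assuming it.

```java
import java.util.Locale;

/** Hypothetical wrapper that remembers the code it was given. */
class RememberedLocale {
    final String given;  // the language code exactly as supplied
    final Locale locale; // what Java made of it

    @SuppressWarnings("deprecation") // the constructor is deprecated on recent JDKs
    RememberedLocale(String given) {
        this.given = given;
        this.locale = new Locale(given);
    }

    /** True if Locale reports a different language code than the one supplied. */
    boolean wasRewritten() {
        return !given.equalsIgnoreCase(locale.getLanguage());
    }

    public static void main(String[] args) {
        RememberedLocale he = new RememberedLocale("he");
        // The reported code is JDK-dependent; the given code never changes.
        System.out.println(he.given + " -> " + he.locale.getLanguage());
    }
}
```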
Java 7:
It introduces support for script and a parser via Locale.forLanguageTag(). But given the flaw above, we'd have to write a workaround for what it does. Also, it will be quite a while before we get to Java 7.
JSword's Languages and Language classes:
The Languages class is meant to support the lookup of a SWORD Lang field and provide a friendly name for it.
The Language class is meant to be a holder of the result of that lookup.
Currently this needs some love to get to do what it needs to do. I'm trying to provide it.
For example, Languages does not expect - between parts but _ (I thought it was the other way around), and it ignores the _ and everything that follows. This needs to be replaced with proper parsing of LL, SSSS and CC.
The lookup currently is for only the LL. It needs to change to follow the full lookup order given above.
Language needs to change to include SSSS. Currently, it does not store the value used to do the lookup, but rather what was effective in doing the lookup.
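As an illustration of what the holder might carry, here is a minimal sketch that stores both the value given and the key that was effective in the lookup. LanguageResult and its accessors are hypothetical names, not the current Language API.

```java
/** Hypothetical value class; names are illustrative, not JSword's. */
final class LanguageResult {
    private final String given; // the Lang value as supplied, e.g. "zh-Hans-CN"
    private final String found; // the key that actually matched, e.g. "zh-Hans"
    private final String name;  // the friendly name for the matched key

    LanguageResult(String given, String found, String name) {
        this.given = given;
        this.found = found;
        this.name = name;
    }

    String getGivenCode() {
        return given;
    }

    String getFoundCode() {
        return found;
    }

    String getName() {
        return name;
    }

    /** True when no fallback was needed to find the name. */
    boolean isExactMatch() {
        return given.equalsIgnoreCase(found);
    }
}
```

Keeping both values lets a caller show the friendly name while still reporting the exact Lang value a module declared.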
There are a number of issues open on the problem and I hope to resolve them all, but may need some help in reproducing them.
In His Service,
DM