[jsword-devel] Languages

DM Smith dmsmith at crosswire.org
Mon Feb 11 11:07:38 MST 2013


I checked a long time ago and it did not support many of the languages that SWORD has modules in.

We just added a module with a language of ckb (Central Kurdish); the result of System.out.println(new Locale("ckb").getDisplayLanguage()); is ckb, i.e. the code itself rather than a name.

From what I can tell, it supports most, if not all, 2-letter codes.

Some 3-letter codes for which there is a 2-letter code are not supported; e.g. azb should return the same as az.

The coverage for what we have in SWORD modules is spotty.
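A minimal Java sketch of the check described above, assuming that "supported" means getDisplayLanguage() returns something other than the code itself (the behaviour seen with ckb). The class and method names are hypothetical, and the results depend on the JDK version and the locale data it ships with.

```java
import java.util.Locale;

// Hypothetical helper for probing which language codes the running JDK can name.
final class LocaleCoverage {
    private LocaleCoverage() {}

    // True if the JDK supplies an English display name for the code. When it has no
    // name, getDisplayLanguage() hands back the code itself (as with "ckb" above).
    static boolean hasJavaName(String code) {
        String name = new Locale(code).getDisplayLanguage(Locale.ENGLISH);
        return !name.isEmpty() && !name.equalsIgnoreCase(code);
    }

    public static void main(String[] args) {
        // Sample codes from this thread; output varies with the JDK's locale data.
        for (String code : new String[] {"en", "az", "azb", "ckb"}) {
            System.out.println(code + " -> " + (hasJavaName(code) ? "named" : "code only"));
        }
    }
}
```

Running this over the Lang values of the modules on the CrossWire server would give a concrete measure of how spotty the coverage is.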

On Feb 11, 2013, at 12:22 PM, Martin Denham <mjdenham at gmail.com> wrote:

> I had been wondering if we actually needed the list of language names at all.  The Locale class provides many functions such as getDisplayName, getDisplayLanguage(Locale inLocale), getISOLanguages, getISOCountries, etc.
> 
> Martin
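Martin's suggestion can be sketched with the standard java.util.Locale calls he names. This is a minimal illustration, not JSword code; the class name is hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch: let java.util.Locale supply language names directly.
final class LocaleNames {
    private LocaleNames() {}

    // Name of the language 'code', written in the language of 'viewer'.
    static String nameIn(String code, Locale viewer) {
        return new Locale(code).getDisplayLanguage(viewer);
    }

    public static void main(String[] args) {
        System.out.println(nameIn("de", Locale.ENGLISH)); // German
        System.out.println(nameIn("de", Locale.GERMAN));  // Deutsch
        // getISOLanguages() lists the ISO 639 codes this JDK knows about.
        System.out.println(Locale.getISOLanguages().length + " ISO 639 codes known to this JDK");
    }
}
```

The catch, as the reply points out, is coverage: for codes the JDK has no data for, the "name" is just the code.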
> 
> On 11 February 2013 16:55, DM Smith <dmsmith at crosswire.org> wrote:
> I've been looking at the issues regarding the Language and Languages classes; BCP 47 (the standard that defines language tags, as used by Locale); Java 7's support for it in Locale; and what SWORD has defined.
> 
> First, the basic purpose of the JSword classes is to provide a friendly name for the language code. It is not meant as Locale support.
> 
> There are 7000+ languages, so JSword splits the names into two parts.
> First Part:
> Those languages that are in use by SWORD modules on the CrossWire server and those found on the CrossWire wiki under http://www.crosswire.org/wiki/Localized_Language_Names.
> 
> These are in the iso639.properties property files. When we started out, this name was appropriate, but the contents have since grown to cover the names in the above list. So really it is a subset of BCP 47.
> 
> These files can be localized. The default file has localized names from the wiki; failing that, it uses names from www.sil.org/iso639-3.
> 
> The iso639_en.properties file is similar to the default file, but has the localized name in parens following the English name.
> 
> So the default is not English.
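One way to picture the layering above (a localized file such as iso639_en.properties consulted first, then the default iso639.properties) is java.util.Properties' built-in defaults chain. This is a minimal sketch under that assumption; the entries are illustrative sample data, not the real file contents, and whether JSword loads the files this way is not stated here.

```java
import java.util.Properties;

// Hypothetical sketch of "localized file over default file" using Properties defaults.
final class LayeredNames {
    private LayeredNames() {}

    static Properties build() {
        // Stands in for the default iso639.properties: names in their own language.
        Properties base = new Properties();
        base.setProperty("de", "Deutsch");
        base.setProperty("nl", "Nederlands");
        base.setProperty("sw", "Kiswahili");

        // Stands in for iso639_en.properties: English name, localized name in parens.
        Properties english = new Properties(base);
        english.setProperty("de", "German (Deutsch)");
        english.setProperty("nl", "Dutch (Nederlands)");
        return english;
    }

    public static void main(String[] args) {
        Properties names = build();
        System.out.println(names.getProperty("de")); // German (Deutsch)
        System.out.println(names.getProperty("sw")); // Kiswahili, from the default layer
    }
}
```

getProperty() falls through to the defaults only when the localized layer has no entry, which matches the "default is not English" arrangement.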
> 
> Second Part:
> As a fall back to the First Part, if there is a new SWORD module for a language that is not covered by the first part, we can do one of two things:
> a) Just show the code as the language name.
> b) Show the name as defined by SIL's iso639-3 files.
> 
> The property file iso639full.properties is a map of 2 and 3 letter language codes to the name of the language from SIL's files. This is a huge, slow property file.
> 
> Since this is a fall back, this file has no need to be internationalized.
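A minimal sketch of the two-part fallback described above: curated names first, then the SIL-derived names (option b), then the bare code (option a). Plain maps stand in for the property files, and because iso639full.properties is described as huge and slow, this sketch assumes it is loaded lazily on the first miss. All names here are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical two-tier name lookup; not JSword's actual Languages class.
final class TwoPartNames {
    private final Map<String, String> curated;               // First Part: iso639*.properties
    private final Supplier<Map<String, String>> fullLoader;  // Second Part: iso639full.properties
    private Map<String, String> full;                        // loaded on the first miss only

    TwoPartNames(Map<String, String> curated, Supplier<Map<String, String>> fullLoader) {
        this.curated = curated;
        this.fullLoader = fullLoader;
    }

    // Not thread-safe as written; concurrent callers would need synchronization.
    String nameFor(String code) {
        String key = code.toLowerCase(Locale.ROOT);
        String name = curated.get(key);
        if (name != null) {
            return name;
        }
        if (full == null) {
            full = fullLoader.get();       // pay the load cost only when actually needed
        }
        name = full.get(key);
        return name != null ? name : code; // option (a): show the code as the name
    }
}
```

Deferring the load means users whose modules are all covered by the curated files never pay for the full list.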
> 
> BCP-47 and SWORD:
> This standard far exceeds what SWORD allows in a Lang field. SWORD defines the field as having a required part and two optional parts: LL-SSSS-CC
> LL is the required 2- or 3-letter language code.
> -SSSS is the optional 4-character script code.
> -CC is the optional 2-character region code.
> The following are valid combinations:
> LL
> LL-SSSS
> LL-CC
> LL-SSSS-CC
> 
> The parts are case insensitive.
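A minimal parser sketch for the LL[-SSSS][-CC] grammar above, assuming "-" as the separator and normalising case to the usual BCP 47 conventions (lower-case language, title-case script, upper-case region). The class and field names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for a SWORD Lang field: LL, LL-SSSS, LL-CC or LL-SSSS-CC.
final class SwordLang {
    private static final Pattern LANG = Pattern.compile(
            "([a-z]{2,3})(?:-([a-z]{4}))?(?:-([a-z]{2}))?", Pattern.CASE_INSENSITIVE);

    final String language; // LL, lower case
    final String script;   // SSSS, title case, or null if absent
    final String region;   // CC, upper case, or null if absent

    private SwordLang(String language, String script, String region) {
        this.language = language;
        this.script = script;
        this.region = region;
    }

    static SwordLang parse(String field) {
        Matcher m = LANG.matcher(field.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not LL[-SSSS][-CC]: " + field);
        }
        String ll = m.group(1).toLowerCase(Locale.ROOT);
        String s = m.group(2);
        String ssss = s == null ? null
                : s.substring(0, 1).toUpperCase(Locale.ROOT) + s.substring(1).toLowerCase(Locale.ROOT);
        String cc = m.group(3) == null ? null : m.group(3).toUpperCase(Locale.ROOT);
        return new SwordLang(ll, ssss, cc);
    }
}
```

The 4-letter and 2-letter lengths alone are enough to tell a script from a region, so no lookup tables are needed at parse time.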
> 
> According to Chris Little's recent email to the sword-devel list, the lookup algorithm should be:
> Given LL, look for LL.
> Given LL-SSSS, look for LL-SSSS and failing that look for LL.
> Given LL-CC, look for LL-CC and failing that look for LL.
> Given LL-SSSS-CC, look for LL-SSSS-CC, LL-SSSS, LL-CC and lastly LL.
> 
> In the last one SSSS is prioritized over CC because the script has more impact on the representation of the name than the region does.
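The lookup order above can be sketched as a function that turns the parsed parts into an ordered list of keys to try. This is a minimal illustration with hypothetical names; keys are lower-cased because the parts are case insensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the fallback order: LL-SSSS-CC, LL-SSSS, LL-CC, LL.
final class LangFallback {
    private LangFallback() {}

    // script and region may be null when the Lang field omits them.
    static List<String> candidates(String ll, String script, String region) {
        String l = ll.toLowerCase(Locale.ROOT);
        String s = script == null ? null : script.toLowerCase(Locale.ROOT);
        String c = region == null ? null : region.toLowerCase(Locale.ROOT);
        List<String> keys = new ArrayList<>();
        if (s != null && c != null) {
            keys.add(l + "-" + s + "-" + c);
        }
        if (s != null) {
            keys.add(l + "-" + s); // script before region: it matters more for the name
        }
        if (c != null) {
            keys.add(l + "-" + c);
        }
        keys.add(l);               // LL is always the last resort
        return keys;
    }
}
```

For example, candidates("sr", "Latn", "RS") yields sr-latn-rs, sr-latn, sr-rs and finally sr.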
> 
> JSword does not properly support this. I thought it did.
> 
> Java Locale:
> Java Locale has a fatal flaw: if given "he" (Hebrew), "yi" (Yiddish), "id" (Indonesian) or no-NO-NY (Norwegian Nynorsk), it changes these to "iw", "ji", "in" and "nn-NO" respectively and does not remember what it was given. There are a few other rewrite exceptions too. The Hebrew, Indonesian and Norwegian cases affect our users.
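One possible workaround, sketched under the assumption that the original SWORD code is kept and Locale is never trusted as the source of the code: if a Locale-derived code is all that is available, map the legacy forms back. The table covers only the three language rewrites named above (the Norwegian case involves a variant and is not handled), and the names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that undoes Locale's legacy language-code rewrites.
final class LegacyCodes {
    private static final Map<String, String> LEGACY_TO_CURRENT = new HashMap<>();
    static {
        LEGACY_TO_CURRENT.put("iw", "he"); // Hebrew
        LEGACY_TO_CURRENT.put("ji", "yi"); // Yiddish
        LEGACY_TO_CURRENT.put("in", "id"); // Indonesian
    }

    private LegacyCodes() {}

    static String restore(String code) {
        String key = code.toLowerCase(Locale.ROOT);
        String current = LEGACY_TO_CURRENT.get(key);
        return current != null ? current : key;
    }

    public static void main(String[] args) {
        // Depending on the JDK version, getLanguage() reports "iw" or "he" here;
        // restore() yields "he" either way.
        String fromLocale = new Locale("he").getLanguage();
        System.out.println(fromLocale + " -> " + restore(fromLocale));
    }
}
```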
> 
> Java 7:
> Introduces support for scripts, and a parser via Locale.forLanguageTag(). But given the flaw above, we'd have to write a workaround for what it does. Also, it will be quite a while before we get to Java 7.
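For reference, a minimal sketch of the Java 7 API mentioned above: Locale.forLanguageTag() parses a hyphen-separated BCP 47 tag, and getScript() exposes the script subtag (an empty string when there is none). The class name is hypothetical.

```java
import java.util.Locale;

// Requires Java 7 or later (forLanguageTag and getScript were added then).
final class Java7Tags {
    private Java7Tags() {}

    // Returns "language|script|country" for a tag; empty parts stay empty.
    static String describe(String tag) {
        Locale locale = Locale.forLanguageTag(tag);
        return locale.getLanguage() + "|" + locale.getScript() + "|" + locale.getCountry();
    }

    public static void main(String[] args) {
        System.out.println(describe("sr-Latn-RS")); // sr|Latn|RS
        System.out.println(describe("en-US"));      // en||US
    }
}
```

Note that getLanguage() on the result is still subject to the legacy-code rewriting described above, so the original tag would need to be kept alongside the Locale.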
> 
> JSword's Languages and Language classes:
> The Languages class is meant to support the lookup of a SWORD Lang field and provide a friendly name for it.
> The Language class is meant to be a holder of the result of that lookup.
> 
> Currently these classes need some love to do what they need to do. I'm trying to provide it.
> 
> For example, Languages expects _ between parts rather than - (I thought it was the other way around). It ignores the _ and everything that follows. This needs to be replaced with proper parsing of LL, SSSS and CC.
> Currently the lookup uses only LL. It needs to change to do all the lookup steps.
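An end-to-end sketch of the change described above, assuming the lookup should tolerate either "_" or "-" as the separator and then try keys in the order from Chris Little's algorithm. The map stands in for the loaded property data (lower-case keys); the class name and sample names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup: accept "_" or "-", parse LL/SSSS/CC, try the full fallback chain.
final class LangLookup {
    private LangLookup() {}

    static String lookup(String lang, Map<String, String> names) {
        String[] parts = lang.trim().toLowerCase(Locale.ROOT).replace('_', '-').split("-");
        String ll = parts[0];
        String ssss = null;
        String cc = null;
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].length() == 4) {
                ssss = parts[i];   // script
            } else if (parts[i].length() == 2) {
                cc = parts[i];     // region; other lengths are ignored in this sketch
            }
        }
        String[] keys = {
            ssss != null && cc != null ? ll + "-" + ssss + "-" + cc : null,
            ssss != null ? ll + "-" + ssss : null,
            cc != null ? ll + "-" + cc : null,
            ll
        };
        for (String key : keys) {
            if (key != null && names.containsKey(key)) {
                return names.get(key);
            }
        }
        return null; // no match: the caller falls back, e.g. to the code itself
    }
}
```

With entries for sr and sr-latn, both sr_Latn_RS and sr-Latn-RS resolve to the sr-latn name, while sr-RS falls through to sr.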
> 
> Language needs to change to include SSSS. Currently, it does not store the value used to do the lookup, but rather the value that actually produced the match.
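A minimal sketch of a result holder along the lines described above: it keeps the code as given, the key that actually matched, and the parsed script. The class and accessor names are hypothetical; this is not JSword's actual Language class.

```java
// Hypothetical result holder for a Lang lookup.
final class LanguageResult {
    private final String requestedCode; // the Lang value as given, e.g. "sr-Latn-RS"
    private final String matchedKey;    // key that produced the name, e.g. "sr-latn"; null if none
    private final String script;        // the parsed SSSS part, e.g. "Latn"; null if absent
    private final String name;          // friendly name, falling back to the requested code

    LanguageResult(String requestedCode, String matchedKey, String script, String name) {
        this.requestedCode = requestedCode;
        this.matchedKey = matchedKey;
        this.script = script;
        this.name = name != null ? name : requestedCode;
    }

    String getRequestedCode() { return requestedCode; }
    String getMatchedKey() { return matchedKey; }
    String getScript() { return script; }
    String getName() { return name; }

    // True when nothing matched and the "name" is only the code itself.
    boolean isFallback() { return matchedKey == null; }
}
```

Keeping both codes means the original value survives even when the match came from a shorter fallback key, which also sidesteps the Locale rewriting problem.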
> 
> There are a number of issues open on the problem and I hope to resolve them all, but may need some help in reproducing them.
> 
> In His Service,
>         DM
> 
> 
> 
> _______________________________________________
> jsword-devel mailing list
> jsword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/jsword-devel
> 
