[jsword-devel] Languages

Chris Burrell chris at burrell.me.uk
Mon Feb 11 11:05:52 MST 2013


Yes, and off the top of my head I thought it was possible to tell Java to
extend the known set by some means, without having to implement much
ourselves.
On 11 Feb 2013 17:23, "Martin Denham" <mjdenham at gmail.com> wrote:

> I had been wondering if we actually needed the list of language names at
> all.  The Locale class provides many functions such as getDisplayName,
> getDisplayLanguage(Locale inLocale), getISOLanguages, getISOCountries, etc.
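
For reference, a minimal sketch of what the Locale calls mentioned above give back. The class and method names (LocaleNamesDemo, englishName) are made up for illustration and are not part of JSword. Note that the JDK documents getISOLanguages() as returning the two-letter ISO 639 codes, so on its own it would probably not cover the three-letter codes.

```java
import java.util.Locale;

/** Hypothetical demo class, not part of JSword. */
final class LocaleNamesDemo {
    /** English display name of a language code, as reported by Locale. */
    static String englishName(String code) {
        return new Locale(code).getDisplayLanguage(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // Friendly name for a well-known two-letter code.
        System.out.println(englishName("fr"));
        // Number of two-letter codes the runtime knows about.
        System.out.println(Locale.getISOLanguages().length + " two-letter codes");
    }
}
```

What Locale returns for a code it has no name for depends on the locale data shipped with the runtime, so that case would need checking separately.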
>
> Martin
>
> On 11 February 2013 16:55, DM Smith <dmsmith at crosswire.org> wrote:
>
>> I've been looking at the issues regarding the Language and Languages
>> classes; BCP-47 (the standard that defines the designation of Locale);
>> Java 7's support for it in Locale; and what SWORD has defined.
>>
>> First, the basic purpose of the JSword classes is to provide a friendly
>> name for the language code. It is not meant as Locale support.
>>
>> There are 7000+ languages, so JSword splits these into two.
>> First Part:
>> Those languages that are in use by SWORD modules on the CrossWire server
>> and those found on the CrossWire wiki under
>> http://www.crosswire.org/wiki/Localized_Language_Names.
>>
>> These are in the iso639.properties property files. When we started out,
>> this name was appropriate, but the contents have since morphed into the
>> names in the above list. So really it is a subset of BCP-47.
>>
>> These files can be localized. The default file has localized names from
>> the wiki or, failing that, from www.sil.org/iso639-3.
>>
>> The iso639_en.properties file is similar to the default file, but has the
>> localized name in parens following the English name.
>>
>> So the default is not English.
>>
>> Second Part:
>> As a fallback to the First Part, if there is a new SWORD module for a
>> language that is not covered by the First Part, we can do one of two things:
>> a) Just show the code as the language name.
>> b) Show the name as defined by SIL's iso639-3 files.
>>
>> The property file iso639full.properties is a map of 2 and 3 letter
>> language codes to the name of the language from SIL's files. This is a
>> huge, slow property file.
>>
>> Since this is a fallback, this file has no need to be internationalized.
>>
>> BCP-47 and SWORD:
>> This standard far exceeds what SWORD allows in a Lang field. SWORD
>> defines the field as having a required part and two optional parts:
>> LL-SSSS-CC
>> Where LL is required and a 2 or 3 letter language code.
>> Where -SSSS is the optional, 4 character script.
>> Where -CC is the optional, 2 character region code.
>> The following are valid combinations:
>> LL
>> LL-SSSS
>> LL-CC
>> LL-SSSS-CC
>>
>> The parts are case insensitive.
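
The LL-SSSS-CC layout described above could be parsed along these lines. This is only a sketch: SwordLangTag and its fields are hypothetical names, not existing JSword classes, and the normalization (lowercase language, title-case script, uppercase region) is an assumption borrowed from the usual BCP-47 convention rather than something SWORD requires.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for a SWORD Lang value of the form LL[-SSSS][-CC]. */
final class SwordLangTag {
    // LL: 2 or 3 letters, SSSS: 4 letters, CC: 2 letters; case insensitive.
    private static final Pattern TAG = Pattern.compile(
            "([a-z]{2,3})(?:-([a-z]{4}))?(?:-([a-z]{2}))?",
            Pattern.CASE_INSENSITIVE);

    final String language; // required
    final String script;   // null when absent
    final String region;   // null when absent

    private SwordLangTag(String language, String script, String region) {
        this.language = language;
        this.script = script;
        this.region = region;
    }

    static SwordLangTag parse(String tag) {
        Matcher m = TAG.matcher(tag.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not LL[-SSSS][-CC]: " + tag);
        }
        String lang = m.group(1).toLowerCase(Locale.ROOT);
        String script = null;
        if (m.group(2) != null) {
            // Assumed normalization: "hans" and "HANS" both become "Hans".
            String s = m.group(2);
            script = s.substring(0, 1).toUpperCase(Locale.ROOT)
                    + s.substring(1).toLowerCase(Locale.ROOT);
        }
        String region = m.group(3) == null
                ? null : m.group(3).toUpperCase(Locale.ROOT);
        return new SwordLangTag(lang, script, region);
    }
}
```

With this, SwordLangTag.parse("ZH-hans-cn") would yield language "zh", script "Hans" and region "CN", while anything outside the four valid combinations is rejected.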
>>
>> According to Chris Little's recent email to the sword-devel list, the
>> lookup algorithm should be:
>> Given LL, look for LL.
>> Given LL-SSSS, look for LL-SSSS and failing that look for LL.
>> Given LL-CC, look for LL-CC and failing that look for LL.
>> Given LL-SSSS-CC, look for LL-SSSS-CC, LL-SSSS, LL-CC and lastly LL.
>>
>> In the last one SSSS is prioritized over CC because the script has more
>> impact on the representation of the name than the region does.
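
The fallback order above can be sketched as a list of candidate keys tried in turn. The names here (LangNameLookup, candidates, friendlyName) are hypothetical, and the Map simply stands in for whatever the property files provide. Returning null when nothing matches is an assumption; the caller could then show the bare code or consult the full ISO 639-3 list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical lookup helper, not an existing JSword class. */
final class LangNameLookup {
    /** Keys to try, most specific first; script outranks region. */
    static List<String> candidates(String ll, String ssss, String cc) {
        List<String> keys = new ArrayList<String>();
        if (ssss != null && cc != null) {
            keys.add(ll + "-" + ssss + "-" + cc);
        }
        if (ssss != null) {
            keys.add(ll + "-" + ssss);
        }
        if (cc != null) {
            keys.add(ll + "-" + cc);
        }
        keys.add(ll);
        return keys;
    }

    /** First name found for any candidate key, or null if none is known. */
    static String friendlyName(Map<String, String> names,
                               String ll, String ssss, String cc) {
        for (String key : candidates(ll, ssss, cc)) {
            String name = names.get(key);
            if (name != null) {
                return name;
            }
        }
        return null;
    }
}
```

Pass null for an absent script or region; for example, candidates("zh", "Hans", "CN") gives zh-Hans-CN, zh-Hans, zh-CN, zh in that order.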
>>
>> JSword does not properly support this. I thought it did.
>>
>> Java Locale:
>> Java Locale has a fatal flaw in that if given "he" (Hebrew), "yi"
>> (Yiddish), "id" (Indonesian) or "no-NO-NY" (Norwegian Nynorsk) it will
>> change these to "iw", "ji", "in" and "nn-NO" and not remember what it was
>> given. There are a few other rewrite exceptions too. The Hebrew,
>> Indonesian and Nynorsk cases affect our users.
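
One way around the rewriting would be to keep the code as given next to the Locale built from it. A rough sketch, where RememberedLocale is a made-up name and not an existing JSword class. What getLanguage() reports for "he" depends on the Java runtime in use (older ones give "iw", newer ones may not), so no particular output is assumed here.

```java
import java.util.Locale;

/** Hypothetical wrapper that remembers the code it was given. */
final class RememberedLocale {
    private final String given;  // exactly what the module declared
    private final Locale locale; // what Java made of it

    RememberedLocale(String code) {
        this.given = code;
        this.locale = new Locale(code);
    }

    /** The code as supplied, untouched by Locale's rewriting. */
    String originalCode() {
        return given;
    }

    /** The code as Locale reports it; may be a legacy form such as "iw". */
    String javaCode() {
        return locale.getLanguage();
    }

    public static void main(String[] args) {
        RememberedLocale hebrew = new RememberedLocale("he");
        System.out.println(hebrew.originalCode() + " -> " + hebrew.javaCode());
    }
}
```

Lookups of the friendly name would then use originalCode(), leaving Locale only for things that genuinely need one.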
>>
>> Java 7:
>> Java 7 introduces support for script, and it introduces a parser via
>> Locale.forLanguageTag(). But given the flaw above, we'd have to write a
>> workaround for what it does. Also, it will be quite a while before we get
>> to Java 7.
>>
>> JSword's Languages and Language classes:
>> The Languages class is meant to support the lookup of a SWORD Lang
>> field and provide a friendly name for it.
>> The Language class is meant to be a holder of the result of that lookup.
>>
>> Currently this needs some love to get it to do what it needs to do. I'm
>> trying to provide it.
>>
>> For example, Languages does not expect "-" between parts but "_" (I
>> thought it was the other way around). It ignores the "_" and everything
>> that follows.
>> This needs to be replaced with a proper parsing of LL, SSSS and CC.
>> The lookup currently is for only the LL. It needs to change to do all the
>> lookup aspects.
>>
>> Language needs to change to include SSSS. Currently, it does not store
>> the value used to do the lookup, but rather what was effective in doing the
>> lookup.
>>
>> There are a number of issues open on the problem and I hope to resolve
>> them all, but may need some help in reproducing them.
>>
>> In His Service,
>>         DM
>>
>>
>>
>> _______________________________________________
>> jsword-devel mailing list
>> jsword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/jsword-devel
>>