[sword-devel] support for locale codes with region/script subtags

DM Smith dmsmith at crosswire.org
Sun Feb 10 16:23:54 MST 2013


I didn't mean for this to become a JSword thread. We're using Java 5, which has no notion of script, so we roll our own and will replace it when we get to Java 7.

The question still remains: is the _ intended in the module conf? If so, we'll change the JSword code to handle it.
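
For illustration only, here is a minimal sketch of how a front end might accept either separator and apply the right-most-subtag fallback Chris Little describes in the quoted message below. This is not the existing JSword code: the class name LangTagFallback, its methods, and the "known" set of tags are all hypothetical, and it sticks to Java 5 syntax. It also treats tags as case-sensitive, which a real implementation would probably want to relax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of JSword or the SWORD library.
public class LangTagFallback {

    /** Accept "ur_Deva" as well as "ur-Deva" by mapping '_' to '-'. */
    public static String normalize(String tag) {
        return tag.trim().replace('_', '-');
    }

    /** Candidates from most to least specific, e.g. en-Latn-US, en-Latn, en. */
    public static List<String> fallbackChain(String tag) {
        List<String> chain = new ArrayList<String>();
        String current = normalize(tag);
        while (current.length() > 0) {
            chain.add(current);
            // Drop the right-most subtag; stop once no separator is left.
            int cut = current.lastIndexOf('-');
            current = cut < 0 ? "" : current.substring(0, cut);
        }
        return chain;
    }

    /** First candidate found in the known set, or null if none matches. */
    public static String resolve(String tag, Set<String> known) {
        for (String candidate : fallbackChain(tag)) {
            if (known.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed set of tags the front end has names for.
        Set<String> known = new HashSet<String>(Arrays.asList("en", "ur", "ur-Deva"));
        System.out.println(fallbackChain("en-Latn-US")); // [en-Latn-US, en-Latn, en]
        System.out.println(resolve("ur_Deva", known));    // ur-Deva
        System.out.println(resolve("en-Latn-US", known)); // en
    }
}
```

With a known set that lacks ur-Deva, resolve("ur_Deva", known) would fall back to plain ur, which is the "better, but confusing" behavior from the original post.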

In Him,
	DM

On Feb 10, 2013, at 3:26 PM, Chris Burrell <chris at burrell.me.uk> wrote:

> Hi DM/Chris
> 
> The standard is defined in BCP 47, which only supports '-' as the subtag separator (http://tools.ietf.org/html/bcp47).
> 
> As documented for Java here: http://docs.oracle.com/javase/7/docs/api/java/util/Locale.html#def_variant, Java seems to support both a dash and an underscore.
> 
> DM, we should ideally be using the Java functionality which supports both, rather than implementing our own decoding scheme. Not sure what we do/don't do here.
> Chris
> 
> 
> 
> On 10 February 2013 20:09, DM Smith <dmsmith at crosswire.org> wrote:
> Chris,
> We've had this in JSword (not sure it works) for a while now, slated for the next release. We used the codes as you've given them here, but in the conf file you have ur_Deva. We're expecting a -, not an _. We can change the code. Please advise.
> 
> In Him,
>         DM
> 
> On Feb 10, 2013, at 5:56 AM, Chris Little <chrislit at crosswire.org> wrote:
> 
> > Just a quick heads up:
> >
> > In general, locale codes (the Lang= field of .confs) can have subtags that indicate region, script, etc. Ideally these should be dealt with in some fashion by front ends since they identify important distinctions (in the eyes of the module maker or publisher at least).
> >
> > When unknown subtags are encountered, it's probably best to recursively fall back to the tag minus its right-most subtag. For example, if 'en-Latn-US' is unknown, fall back to 'en-Latn'. If that is unknown, fall back to 'en'. (Hopefully nearly all language subtags are known.)
> >
> > We should handle this in the library, but currently don't. :(
> >
> >
> > As a specific case in point:
> > We now have two Urdu translations. They're the same translation and differ in their script (one is Arabic, the other Devanagari). Their language codes (as of the 1.2.1 release just made, which corrected the code for the Devanagari version) are: ur (Urdu in Arabic script--the usual script for Urdu) and ur-Deva (Urdu in Devanagari script).
> >
> > Possible behaviors are to categorize the ur-Deva module as belonging to an unknown language (bad), to fall back and categorize it as simply Urdu (better, but certainly confusing if the language name is written in Arabic and the module is itself written in Devanagari), or to categorize it separately as Urdu written in Devanagari (best).
> >
> > For implementers who localize the language name, Urdu written in Arabic is written "اردو". Urdu written in Devanagari is written "उर्दू".
> >
> > --Chris
> >
> > _______________________________________________
> > sword-devel mailing list: sword-devel at crosswire.org
> > http://www.crosswire.org/mailman/listinfo/sword-devel
> > Instructions to unsubscribe/change your settings at above page
