[sword-devel] LANG values in sword?
Chris Little
sword-devel@crosswire.org
Sat, 06 Dec 2003 17:24:55 -0600
Hugo van der Kooij wrote:
> Hi,
>
> I know I reported that sword is not handling longer versions of the LANG
> environment variable.
>
> Could someone point me to the correct URL where the usage of the LANG
> variable is defined as only two characters?
The system for assigning lang values used by Sword files was essentially
designed by me and is more or less what we adopted for OSIS. (There are
a few differences that will be fixed, but they only affect minority
languages that none of you can speak or read.) I need to do a write-up
for assigning them, but basically the system is this:
Any language should be represented by a single unique code.
Its format should match that described by IETF RFC 3066. (So ISO 639-1 &
ISO 639-2 codes and IANA registered codes are all valid. Plus you can
use SIL Ethnologue codes or LINGUIST List codes if you preface them with
"x-SIL-" and "x-LINGUIST-" respectively. Also, country codes are
permissible, when they are applicable.)
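
As a rough illustration of the format rule above, here is a small Python
sketch that checks only the syntax RFC 3066 defines for a tag: a primary
subtag of 1 to 8 letters, followed by optional hyphen-separated subtags of
1 to 8 letters or digits. The function name is hypothetical and is not part
of Sword, and "x-SIL-XYZ" is a made-up placeholder rather than a real
Ethnologue code. The check says nothing about whether a code is actually
registered or is the best choice for a language.

```python
import re

# RFC 3066 syntax: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)
_RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def looks_like_rfc3066(tag):
    """Return True if `tag` is syntactically a valid RFC 3066 language tag."""
    return _RFC3066_TAG.fullmatch(tag) is not None


if __name__ == "__main__":
    # "x-SIL-XYZ" is a placeholder private-use tag, not a real code.
    for candidate in ["grc", "el", "en-US", "x-SIL-XYZ", "nl_NL.UTF-8"]:
        print(candidate, looks_like_rfc3066(candidate))
```

A POSIX locale string such as nl_NL.UTF-8 fails this check because of the
underscore and the dot, which is the mismatch discussed further down.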
Since these code systems have considerable overlap, you should choose
the shortest code that describes the language with the greatest
specificity. (Hence, Ancient Greek would be "grc", not "el", which is
Modern Greek. And there might be instances where a group of languages
is covered by an ISO 639-2 code, in which case a more specific SIL code
would probably be better.)
Country codes are almost never necessary. The only instances where they
are relevant are to distinguish English as spoken in the US, UK, etc. and
to distinguish Chinese as written in Taiwan from that of mainland China.
> From my reading on
> http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html
>
> I can only conclude that nl_NL.UTF-8 is a valid variable and should be
> handled by sword in such a way that it would point me to the Dutch names
> as would nl_NL or just nl.
It's a valid variable according to some other standard, but not IETF RFC
3066. The format described in the page you cite is specific to POSIX
locales. Our language codes are used in all books and on non-POSIX systems.
I think we're in agreement that Sword should convert POSIX locales to
IETF format and then match the most similar available locale, which is
why I put this feature request into our database the first time you
brought it up. But I don't know of anyone who has had time to work on
it since then. If anyone has the time, patches are always welcome.
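
For anyone who wants to pick this up, the conversion and matching described
above could look roughly like the following Python sketch. This is not
Sword code: the function names are hypothetical, and the fallback strategy
(drop trailing subtags until something matches) is an assumption about
what "most similar" should mean, not something Sword or RFC 3066 mandates.

```python
def posix_to_ietf(posix_locale):
    """Convert a POSIX locale such as 'nl_NL.UTF-8' to an IETF-style tag.

    Returns None for empty values and for the 'C' / 'POSIX' locales,
    which do not name a language.
    """
    if not posix_locale or posix_locale in ("C", "POSIX"):
        return None
    # Drop the optional '@modifier' and '.codeset' parts.
    base = posix_locale.split("@", 1)[0].split(".", 1)[0]
    parts = base.split("_", 1)
    language = parts[0].lower()
    if len(parts) == 2 and parts[1]:
        return language + "-" + parts[1].upper()
    return language


def best_match(tag, available):
    """Pick the closest entry in `available` for `tag`, or None.

    Tries the full tag first, then removes trailing subtags one at a
    time (assumed fallback rule), comparing case-insensitively.
    """
    candidate = tag
    while candidate:
        for name in available:
            if name.lower() == candidate.lower():
                return name
        if "-" not in candidate:
            break
        candidate = candidate.rsplit("-", 1)[0]
    return None


if __name__ == "__main__":
    tag = posix_to_ietf("nl_NL.UTF-8")
    print(tag, best_match(tag, ["en", "de", "nl"]))
```

With this approach nl_NL.UTF-8, nl_NL and nl would all resolve to the same
Dutch locale when only "nl" is installed, and a caller would fall back to
its default when best_match returns None.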
--Chris