[sword-devel] Lang tags

Chris Little sword-devel@crosswire.org
Tue, 14 Aug 2001 11:00:51 -0700



> Chris, I see that you added Lang= tags to all modules now. Just so that I
> understand: Which scheme/system did you use? You talked about the
> lang/country (e.g. en_US) scheme, but you didn't use it, right? How are
> the strange languages (e.g. xx_KET) covered now?

They follow the scheme I described many months ago.  The following is
copied from the module-making docs:

Lang is the primary language code of the module. ISO 639-1 codes are the
preferred codes (e.g. en for English). If there is none for the given
language, use an ISO 639-2/T code (e.g. ceb for Cebuano). See
http://lcweb.loc.gov/standards/iso639-2/englangn.html for ISO 639-1 and
639-2/T codes. In cases where no ISO 639 code is available, use "xx_"
followed by the SIL Ethnologue code for the language (e.g. xx_KEK for
Ketchi). If a text is country specific, such as the Anglicized NIV,
include the ISO 3166-1 country code after the language code and an
underscore (e.g. en_GB for UK English). See
http://www.din.de/gremien/nas/nabd/iso3166ma/codlstp1/en_listp1.html for
ISO 3166-1 codes.

I realize now I forgot to add a link to the Ethnologue, but it can be
found at http://www.ethnologue.com/web.asp.  It's a very cool resource
that I also hope to make into a dictionary-type module some day.

This system is not exhaustive, but I believe it covers all living
languages.  A number of ancient languages, like Ugaritic, are unhandled
by this system, so for those we can derive our own codes like yy_UGA.
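As a rough illustration of the scheme above, here is a minimal Python sketch of a Lang value parser. It is not part of SWORD; the names parse_lang and LangTag and the scheme labels are hypothetical. It assumes Ethnologue codes and locally derived yy_ codes are three uppercase letters (as in xx_KEK and yy_UGA), and that a country suffix only follows an ISO 639 code, since that is the only case the docs describe.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper, not a SWORD API.
# ISO 639-1 (2 letters) or ISO 639-2/T (3 letters), lowercase, optionally
# followed by "_" and an ISO 3166-1 alpha-2 country code (e.g. en_GB).
ISO_RE = re.compile(r"([a-z]{2,3})(?:_([A-Z]{2}))?")
# Assumption: "xx_" + SIL Ethnologue code, or "yy_" + a locally derived
# code for unhandled ancient languages, each three uppercase letters.
LOCAL_RE = re.compile(r"(xx|yy)_([A-Z]{3})")


class LangTag(NamedTuple):
    scheme: str             # "iso639", "ethnologue", or "local"
    language: str           # e.g. "en", "ceb", "KEK", "UGA"
    country: Optional[str]  # e.g. "GB", or None if not country specific


def parse_lang(value: str) -> LangTag:
    """Split a Lang= value into its parts, or raise ValueError."""
    m = LOCAL_RE.fullmatch(value)
    if m:
        scheme = "ethnologue" if m.group(1) == "xx" else "local"
        return LangTag(scheme, m.group(2), None)
    m = ISO_RE.fullmatch(value)
    # "xx" and "yy" are reserved prefixes here, not ISO 639 codes.
    if m and m.group(1) not in ("xx", "yy"):
        return LangTag("iso639", m.group(1), m.group(2))
    raise ValueError(f"unrecognized Lang value: {value!r}")


print(parse_lang("en_GB"))   # LangTag(scheme='iso639', language='en', country='GB')
print(parse_lang("xx_KEK"))  # LangTag(scheme='ethnologue', language='KEK', country=None)
```

Checking the xx_/yy_ form first keeps "xx" from being mistaken for a two-letter ISO code.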

> Can moddsp.jsp now be modified to sort the modules?

It can, assuming someone has both the skills and motivation to do so. :)
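Once every module carries a Lang value, the sorting itself is simple. moddsp.jsp is a JSP page and its real data structures are not shown here, so the following is only a minimal Python sketch of the logic; group_by_lang and the module names in the example are hypothetical.

```python
from itertools import groupby


def group_by_lang(modules):
    """Group (module_name, lang) pairs by Lang value for display.

    Hypothetical helper. Modules are sorted by Lang, then case-insensitively
    by name. Plain string ordering keeps country-specific variants such as
    "en_GB" alongside plain "en".
    """
    ordered = sorted(modules, key=lambda m: (m[1], m[0].lower()))
    return [(lang, [name for name, _ in group])
            for lang, group in groupby(ordered, key=lambda m: m[1])]


mods = [("UKText", "en_GB"), ("KekchiText", "xx_KEK"), ("EnglishText", "en"),
        ("CebuanoText", "ceb"), ("AnotherEnglish", "en")]
for lang, names in group_by_lang(mods):
    print(lang, names)
# ceb ['CebuanoText']
# en ['AnotherEnglish', 'EnglishText']
# en_GB ['UKText']
# xx_KEK ['KekchiText']
```

groupby only merges adjacent items, which is why the list is sorted on the same Lang key first.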

> BTW, how can we handle modules with several languages such as 
> dictionaries? Maybe a list of languages with the main language first?

I had considered that.  Maybe it would make sense to list just the major
languages.  For example, most commentaries include lots of Greek/Hebrew
text, but it wouldn't make much sense to list those also.  The system I
used was to list only the primary language of the people who would be
using the text.  So, dictionaries were all marked as "en" since they
translate into English.  And the IGNT was marked "en" since it is
primarily for English speakers.

--Chris