[osis-core] User manual bug list - 23

Chris Little osis-core@bibletechnologieswg.org
Mon, 16 Feb 2004 15:49:44 -0800


Todd Tillinghast wrote:

>>23. Preference or equivalency for language declarations?
>>Section 7.3.6 - Given the number of different ways that I can specify
>>the language to be English, do we need to have some type of equivalency
>>tables? Or is there an order of preference for using the different
>>types? Submitted by: Jim Schaad, jimsch@nwlink.com
> 
> Resolution: 
> I think we should require a preference order.
> 
> My suggestion, ISO 2 letter, ISO 3 letter, Ethnologue language codes.
> 
> Todd

Basically, I agree (with IANA codes being preferred over Ethnologue and 
LINGUIST codes following Ethnologue in the precedence).  Unfortunately, 
the whole issue of "what is a valid IETF language code" is currently in 
flux.  IETF is creating a replacement for RFC 3066.  ISO & SIL are 
working on the next update to ISO 639.  And OLAC has put a hold on its 
recommendation for using RFC 3066 to express Ethnologue & LINGUIST codes.

My draft recommendation for our statement of best practice follows, 
based on what we discussed at previous meetings and on OLAC's 
recommendation (available at 
www.language-archives.org/REC/language.html).  Notable changes from our 
existing practices and from what we decided at previous meetings are 
that "sil" is substituted for "SIL" and "ll" is substituted for 
"LINGUIST".  These follow from OLAC's recommendation.  The "SIL" to 
"sil" change is not a big deal since IETF language codes are explicitly 
case-independent.

The replacement for IETF RFC 3066 has some nice additions for private 
use, has a much more robust syntax, and incorporates script codes (so we 
can deprecate that script attribute, eventually).

--Chris

-----

Language codes are used in three locations in an OSIS document:
1) The <language> element of the <header> element.
2) The xml:lang attribute, optional on most elements and required on the 
<osisText> element.
3) Within the value of the <identifier type="osis"> element.

The first of these, the <language> element, offers the following options 
for its type attribute:
     'IETF' (codes according to RFC 3066 or that which obsoletes it)
	http://www.ietf.org/rfc/rfc3066.txt
     'ISO-639-1' (ISO 639-1 codes)
	http://www.loc.gov/standards/iso639-2/englangn.html	
     'ISO-639-2' (ISO 639-2 codes)
	http://www.loc.gov/standards/iso639-2/englangn.html
     'ISO-639-2-B' (ISO 639-2/B codes)
	http://www.loc.gov/standards/iso639-2/englangn.html
     'ISO-639-2-T' (ISO 639-2/T codes)
	http://www.loc.gov/standards/iso639-2/englangn.html
     'IANA' (registered IANA values)
	http://www.iana.org/assignments/language-tags
     'SIL' (SIL Ethnologue codes)
	http://www.ethnologue.com/
     'LINGUIST' (LINGUIST List codes)
	http://linguistlist.org/ancientlgs.html
	http://linguistlist.org/constructedlgs.html
     'other'
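For anyone writing tooling against the schema, the enumeration above can be captured as a lookup table. This is only a sketch: the names LANGUAGE_TYPES and is_valid_language_type are hypothetical, and treating the type attribute as case-sensitive is an assumption based on how XML Schema string enumerations normally behave (the language codes themselves are case-independent, the attribute values probably are not).

```python
# Hypothetical lookup table: type values accepted on <language>, each
# mapped to the reference(s) listed above ('other' has none).
LANGUAGE_TYPES = {
    "IETF": ("http://www.ietf.org/rfc/rfc3066.txt",),
    "ISO-639-1": ("http://www.loc.gov/standards/iso639-2/englangn.html",),
    "ISO-639-2": ("http://www.loc.gov/standards/iso639-2/englangn.html",),
    "ISO-639-2-B": ("http://www.loc.gov/standards/iso639-2/englangn.html",),
    "ISO-639-2-T": ("http://www.loc.gov/standards/iso639-2/englangn.html",),
    "IANA": ("http://www.iana.org/assignments/language-tags",),
    "SIL": ("http://www.ethnologue.com/",),
    "LINGUIST": (
        "http://linguistlist.org/ancientlgs.html",
        "http://linguistlist.org/constructedlgs.html",
    ),
    "other": (),
}


def is_valid_language_type(value: str) -> bool:
    """Check a type attribute value against the enumeration.

    Assumption: the comparison is case-sensitive, as XML Schema
    enumerations on strings usually are.
    """
    return value in LANGUAGE_TYPES
```

With this sketch, is_valid_language_type("IETF") is true while "ietf" is rejected.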

A <language> element with type="IETF" is recommended for all OSIS 
documents.  Additional <language> elements with other type values are 
optional.

The values of <language type="IETF">, xml:lang, and <identifier 
type="osis"> should match whenever possible, and should follow the 
system below, which is based on IETF RFC 3066.  Each language should 
have a single unique code.
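A consistency check for the three locations might look roughly like the following. It is a hypothetical sketch, not part of any OSIS tool: it assumes unnamespaced elements (real OSIS documents use the OSIS namespace), and it assumes the OSIS identifier is dot-separated with the language code in its second field (e.g. "Bible.en.KJV"), which should be confirmed against the manual. Codes are compared case-insensitively, since IETF language codes are case-independent.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# xml:lang lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_codes(osis_text: ET.Element) -> Dict[str, Optional[str]]:
    """Return the language code found in each of the three locations."""
    codes: Dict[str, Optional[str]] = {
        "xml:lang": osis_text.get(XML_LANG),
        "language": None,
        "identifier": None,
    }
    lang = osis_text.find(".//language[@type='IETF']")
    if lang is not None and lang.text:
        codes["language"] = lang.text.strip()
    ident = osis_text.find(".//identifier[@type='osis']")
    if ident is not None and ident.text:
        # Assumed identifier layout: "Bible.<language>.<work>..."
        fields = ident.text.strip().split(".")
        if len(fields) > 1:
            codes["identifier"] = fields[1]
    return codes


def codes_match(codes: Dict[str, Optional[str]]) -> bool:
    """True if every code that is present agrees, ignoring case."""
    present = {code.lower() for code in codes.values() if code}
    return len(present) <= 1
```

For example, an <osisText xml:lang="en"> whose header holds <language type="IETF">en</language> and <identifier type="osis">Bible.EN.KJV</identifier> yields the codes "en", "en", and "EN", which this sketch treats as matching.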

1) Languages with an ISO 639-1 code should be represented by that code.

    For example, English is "en", Hebrew is "he", and Modern Greek 
(since 1453) is "el".

2) Otherwise, languages with an ISO 639-2 code should be represented by 
that code.  (Every language whose ISO 639-2/B and ISO 639-2/T codes 
differ also has an ISO 639-1 code, so the B/T distinction never arises 
at this step.)

    For example, Ancient Greek (to 1453) is "grc", Anglo-Saxon is "ang", 
and Aramaic is "arc".

3) Otherwise, languages with a registered IANA code should be 
represented by that code.

    For example, Klingon is "i-klingon" and Scouse is "en-scouse".

4) Otherwise, languages with SIL Ethnologue codes should be represented 
by their SIL Ethnologue code, preceded by "x-sil-".

    For example, ...

5) Otherwise, languages with LINGUIST List codes should be represented 
by their LINGUIST List codes, preceded by "x-ll-".

    For example, ...

6) Otherwise, any private use code may be used, provided that it starts 
with "x-" and this is not immediately followed by either "sil-" or "ll-".

7) If necessary, the language portion of the code may be followed by an 
ISO 3166 country code (according to 
http://www.oasis-open.org/cover/country3166.html).

8) Additional variant subtags may follow this.
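Rules 1 through 8 amount to a simple first-match precedence, which might be sketched as below. The function osis_language_tag and its parameters are hypothetical: the caller supplies whatever codes are already known for a language, and no ISO, IANA, Ethnologue, or LINGUIST tables are consulted. The sketch also appends a country code and variants literally, without checking whether that makes sense after an IANA or private-use code.

```python
from typing import Optional, Sequence


def osis_language_tag(
    iso639_1: Optional[str] = None,
    iso639_2: Optional[str] = None,
    iana: Optional[str] = None,
    sil: Optional[str] = None,
    linguist: Optional[str] = None,
    private: Optional[str] = None,
    country: Optional[str] = None,
    variants: Sequence[str] = (),
) -> str:
    """Pick one code by the precedence of rules 1-6, then append an
    optional ISO 3166 country code (rule 7) and variant subtags (rule 8).

    Hypothetical helper: no code tables are looked up here.
    """
    if iso639_1:                      # rule 1: ISO 639-1
        base = iso639_1
    elif iso639_2:                    # rule 2: ISO 639-2
        base = iso639_2
    elif iana:                        # rule 3: registered IANA code
        base = iana
    elif sil:                         # rule 4: SIL Ethnologue code
        base = "x-sil-" + sil
    elif linguist:                    # rule 5: LINGUIST List code
        base = "x-ll-" + linguist
    elif private:                     # rule 6: other private use
        lowered = private.lower()     # tags are case-independent
        if not lowered.startswith("x-") or lowered.startswith(("x-sil-", "x-ll-")):
            raise ValueError(f"invalid private-use code: {private!r}")
        base = private
    else:
        raise ValueError("no code supplied for this language")

    parts = [base]
    if country:                       # rule 7
        parts.append(country)
    parts.extend(variants)            # rule 8
    return "-".join(parts)
```

Under these assumptions, osis_language_tag(iso639_1="en", iso639_2="eng") returns "en" because rule 1 wins, and osis_language_tag(iso639_1="en", country="US") returns "en-US".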