[osis-core] User manual bug list - 23
Chris Little
osis-core@bibletechnologieswg.org
Mon, 16 Feb 2004 15:49:44 -0800
Todd Tillinghast wrote:
>>23. Preference or equivalency for language declarations?
>>Section 7.3.6 - Given the number of different ways that I can specify
>>the language to be English, do we need to have some type of equivalency
>>tables? Or is there an order of preference for using the different
>>types? Submitted by: Jim Schaad, jimsch@nwlink.com
>
> Resolution:
> I think we should require a preference order.
>
> My suggestion, ISO 2 letter, ISO 3 letter, Ethnologue language codes.
>
> Todd
Basically, I agree (with IANA codes being preferred over Ethnologue and
LINGUIST codes following Ethnologue in the precedence). Unfortunately,
the whole issue of "what is a valid IETF language code" is currently in
flux. IETF is creating a replacement for RFC 3066. ISO & SIL are
working on the next update to ISO 639. And OLAC has put a hold on its
recommendation for using RFC 3066 to express Ethnologue & LINGUIST codes.
My draft of my recommendation for our statement of best practice
follows, based on what we had discussed at previous meetings and on
OLAC's recommendation (available at
www.language-archives.org/REC/language.html). Notable changes from our
existing practices and what we decided at previous meetings are that
"sil" is substituted for "SIL" and "ll" is substituted for "LINGUIST".
These follow from OLAC's recommendation. The "SIL" to "sil" change is
not a big deal since IETF language codes are explicitly case-independent.
The replacement for IETF RFC 3066 has some nice additions for private
use, has a much more robust syntax, and incorporates script codes (so we
can deprecate the script attribute, eventually).
--Chris
-----
Language codes are used in three locations in an OSIS document:
1) The <language> element of the <header> element.
2) The xml:lang attribute, optional on most elements and required on the
<osisText> element.
3) Within the value of the <identifier type="osis"> element.
The first of these, the <language> element, offers the following options
for its type attribute:
'IETF' (codes according to RFC 3066 or that which obsoletes it)
http://www.ietf.org/rfc/rfc3066.txt
'ISO-639-1' (ISO 639-1 codes)
http://www.loc.gov/standards/iso639-2/englangn.html
'ISO-639-2' (ISO 639-2 codes)
http://www.loc.gov/standards/iso639-2/englangn.html
'ISO-639-2-B' (ISO 639-2/B codes)
http://www.loc.gov/standards/iso639-2/englangn.html
'ISO-639-2-T' (ISO 639-2/T codes)
http://www.loc.gov/standards/iso639-2/englangn.html
'IANA' (registered IANA values)
http://www.iana.org/assignments/language-tags
'SIL' (SIL Ethnologue codes)
http://www.ethnologue.com/
'LINGUIST' (LINGUIST List codes)
http://linguistlist.org/ancientlgs.html
http://linguistlist.org/constructedlgs.html
'other'
A <language> element with type="IETF" is recommended for all OSIS
documents. Additional <language> elements with other type values are
optional.
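As a rough illustration of the recommendation above, the following Python
sketch checks whether a header declares a <language> element with
type="IETF". The sample fragment is a simplified assumption (no namespace,
no other metadata), and the function names are hypothetical, not part of
any OSIS tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified header fragment: a real OSIS document declares
# a namespace and carries more metadata than is shown here.
SAMPLE_HEADER = """
<header>
  <language type="IETF">grc</language>
  <language type="ISO-639-2">grc</language>
</header>
"""


def languages_by_type(header_xml):
    """Map each <language> type attribute to the list of codes declared."""
    root = ET.fromstring(header_xml)
    found = {}
    for elem in root.iter("language"):
        code = (elem.text or "").strip()
        found.setdefault(elem.get("type"), []).append(code)
    return found


def has_ietf_language(header_xml):
    """True if at least one <language type="IETF"> element is present."""
    return "IETF" in languages_by_type(header_xml)
```

A checker built along these lines would only warn when the IETF entry is
missing, since the recommendation is a "should" and the other types are
optional.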
The values of <language type="IETF">, xml:lang, and <identifier
type="osis"> should match whenever possible, and should follow the
system below, which is based on IETF RFC 3066. For any language, there
should be a single unique code.
1) Languages with an ISO 639-1 code should be represented by that code.
For example, English is "en", Hebrew is "he", and Modern Greek
(since 1453) is "el".
2) Otherwise, languages with an ISO 639-2 code should be represented by
that code. (There are no languages for which ISO 639-2/B and ISO
639-2/T codes are different that have not been assigned an ISO 639-1 code.)
For example, Ancient Greek (to 1453) is "grc", Anglo-Saxon is "ang",
and Aramaic is "arc".
3) Otherwise, languages with a registered IANA code should be
represented by that code.
For example, Klingon is "i-klingon" and Scouse is "en-scouse".
4) Otherwise, languages with SIL Ethnologue codes should be represented
by their SIL Ethnologue code, preceded by "x-sil-".
For example, ...
5) Otherwise, languages with LINGUIST List codes should be represented
by their LINGUIST List codes, preceded by "x-ll-".
For example, ...
6) Otherwise, any private use code may be used, provided that it starts
with "x-" and this is not immediately followed by either "sil-" or "ll-".
7) If necessary, the language portion of the code may be followed by
an ISO 3166 country code (according to
http://www.oasis-open.org/cover/country3166.html).
8) Additional variant subtags may follow this.
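
The order of preference in rules 1 through 8 can be sketched in Python
roughly as follows. This is only an illustration of the draft, not an
implementation from any OSIS tool: the function name and its parameters are
hypothetical, and the caller is assumed to have already looked up whichever
codes exist for the language.

```python
def osis_language_code(iso639_1=None, iso639_2=None, iana=None,
                       sil=None, linguist=None, private=None,
                       country=None, variants=()):
    """Pick the single preferred code for a language (hypothetical helper).

    Each argument is the code the language has in that scheme, or None.
    """
    if iso639_1:                      # rule 1
        base = iso639_1
    elif iso639_2:                    # rule 2
        base = iso639_2
    elif iana:                        # rule 3: registered IANA tag
        base = iana
    elif sil:                         # rule 4: SIL Ethnologue code
        base = "x-sil-" + sil
    elif linguist:                    # rule 5: LINGUIST List code
        base = "x-ll-" + linguist
    elif private:                     # rule 6: any other private-use code
        lowered = private.lower()     # codes are case-independent
        if not lowered.startswith("x-"):
            raise ValueError("private-use codes must start with 'x-'")
        if lowered.startswith(("x-sil-", "x-ll-")):
            raise ValueError("'x-sil-' and 'x-ll-' are reserved prefixes")
        base = private
    else:
        raise ValueError("no code supplied for this language")

    parts = [base]
    if country:                       # rule 7: optional ISO 3166 country code
        parts.append(country)
    parts.extend(variants)            # rule 8: optional variant subtags
    return "-".join(parts)
```

With the examples given above, osis_language_code(iso639_1="en",
iso639_2="eng") yields "en", osis_language_code(iso639_2="grc") yields
"grc", and osis_language_code(iana="i-klingon") yields "i-klingon".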