[osis-core] Schema: type on language

Sun, 19 Oct 2003 15:44:50 -0600

Chris,

That all seems sensible.  Now we just need to carefully document it.

I would say that if SIL wants to use a more specific language code than
sq, they should put what they mean in xml:lang.  (Use the x-SIL-3letter
code.)  As you indicated, if it really is xml:lang="sq" AND it covers
all four of the SIL codes, then it is a good idea to put all that apply
in the <work> element.

Todd

> -----Original Message-----
> From: osis-core-admin@bibletechnologieswg.org 
> [mailto:osis-core-admin@bibletechnologieswg.org] On Behalf Of 
> Chris Little
> Sent: Sunday, October 19, 2003 10:59 AM
> To: osis-core@bibletechnologieswg.org
> Subject: RE: [osis-core] Schema: type on language
> 
> 
> Todd,
> 
> On Sun, 19 Oct 2003, Todd Tillinghast wrote:
> 
> > Chris,
> > 
> > Are you saying that you will not be able to sort out which of the
> > many forms allowed in IETF/xml:lang has been used, and that you would
> > like to use <language type="...">language code</language> to help
> > sort out which case has been encoded, but that the values for
> > <language> and xml:lang would be identical?
> > 
> > That seems reasonable.
> 
> Almost.  I'm saying it would be reasonable for an organization like
> SIL to encode:
> <language use="base" type="IETF">sq</language>
> <language use="base" type="SIL">ALS</language>
> 
> That is, they should be able to identify the language according to a
> common form, used by all documents & organizations and identical to
> the form used for xml:lang (the IETF form).  But they should also be
> able to use a form of their own for in-house categorization.
> 
> Using values like "x-ISO-639-1-sq" might be valid, but to be of any
> use, they would have to be parsed as strings and cut into chunks.  I
> say, why not just use type and be more explicit?
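To make the trade-off concrete, here is a minimal Python sketch of what "parsed as a string and cut into chunks" might involve for Todd's private-use values. The function name `split_private_tag` is hypothetical, and the sketch assumes the language code is always the last hyphen-separated chunk and contains no hyphen itself, which holds for the examples in this thread. Reading an explicit type attribute, by contrast, needs no string surgery.

```python
import xml.etree.ElementTree as ET


def split_private_tag(tag: str) -> tuple[str, str]:
    """Split a hypothetical 'x-<STANDARD>-<code>' value into (standard, code).

    Assumes the code is the final hyphen-separated chunk and contains
    no hyphen itself (true for every example in this thread).
    """
    if not tag.lower().startswith("x-"):
        raise ValueError(f"not a private-use tag: {tag!r}")
    standard, sep, code = tag[2:].rpartition("-")
    if not sep or not standard or not code:
        raise ValueError(f"cannot split {tag!r}")
    return standard, code


# Todd's encoding: the standard must be recovered from the string itself.
print(split_private_tag("x-ISO-639-2-T-sqi"))  # ('ISO-639-2-T', 'sqi')

# Chris's encoding: the standard is simply an attribute.
el = ET.fromstring('<language type="SIL">ALS</language>')
explicit = (el.get("type"), el.text)
print(explicit)  # ('SIL', 'ALS')
```

The string-splitting version also silently depends on a naming convention that would itself need documenting, which is part of the argument for type="...".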
> 
> > It also seems unfortunate that the XML/ISO standards bodies have
> > made it difficult to tell which standard is being used.  (I am sure
> > that with an enumeration of all possible values you could derive
> > which standard a value comes from.)
> 
> The only real ambiguity comes with discerning between ISO 639-2/T and
> /B.  Besides that, 2-letter codes are ISO 639-1, 3-letter codes are
> from one of the 639-2 standards, those starting with i- are IANA, and
> everything starting with x- is private use, so its source is
> officially unknown.
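The rules of thumb above can be sketched roughly as follows in Python. `guess_standard` is a hypothetical helper, and the sketch assumes RFC 3066-style tags where only the primary subtag matters. As noted, it cannot separate ISO 639-2/T from /B without a lookup table of the codes that differ (such as sqi and alb).

```python
def guess_standard(tag: str) -> str:
    """Guess which code list a tag's primary subtag comes from.

    Follows the rules of thumb in the message above; it is a heuristic,
    not a validator.
    """
    primary = tag.split("-", 1)[0].lower()
    if primary == "i":
        return "IANA"
    if primary == "x":
        return "private use"  # source is officially unknown
    if primary.isascii() and primary.isalpha():
        if len(primary) == 2:
            return "ISO 639-1"
        if len(primary) == 3:
            # The string alone cannot distinguish 639-2/T from 639-2/B.
            return "ISO 639-2 (T or B)"
    return "unknown"


for t in ["sq", "sqi", "alb", "i-klingon", "x-SIL-ALN"]:
    print(t, "->", guess_standard(t))
```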
> 
> > I am not sure why you want to add "French", "English", and "native".
> > This would seem to further confuse the situation.  Maybe I don't
> > understand how you would use them.
> 
> My thought was to add them as a convenience to those who might wish
> to use them.  Rather than forcing lookups in a table that maps codes
> to language names, the name would be held in the document.  The
> reason for choosing English & French is that they are the
> international languages used by ISO & SIL for their code databases.
> 
> If you think it would be better to leave this out, I'm okay with that.
> 
> > Relative to people using codes like "Austronesian (Other)", I think
> > the documentation should recommend a "concrete" language code for
> > xml:lang, and that a <language> entry for "Austronesian (Other)"
> > would be fine to use within <work> in addition to the "concrete"
> > language code.
> 
> I'm in agreement here.  I think the value for xml:lang should match
> the one chosen for the IETF type, and should identify the most
> specific language code that makes the encoder happy.
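One way an encoder or validator might check the rule that xml:lang should match the IETF-typed <language> entry is sketched below in Python. The element layout is a simplified assumption (no OSIS namespace, <language> elements somewhere under <osisText>), and `xml_lang_matches_ietf` is a hypothetical name, not part of any OSIS tooling. The comparison is case-insensitive because language tags are case-insensitive.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def xml_lang_matches_ietf(doc: str) -> bool:
    """Return True if osisText's xml:lang equals a <language type="IETF"> value.

    Simplified layout: no OSIS namespace, <language> elements anywhere
    beneath <osisText>. Comparison is case-insensitive.
    """
    root = ET.fromstring(doc)
    osis_text = root if root.tag == "osisText" else root.find(".//osisText")
    if osis_text is None:
        return False
    lang = osis_text.get(XML_LANG)
    if lang is None:
        return False
    ietf = {
        el.text.strip().lower()
        for el in osis_text.iter("language")
        if el.get("type") == "IETF" and el.text
    }
    return lang.lower() in ietf


sample = """<osisText xml:lang="x-SIL-ALN">
  <header>
    <work>
      <language type="IETF">x-SIL-ALN</language>
      <language type="ISO-639-1">sq</language>
    </work>
  </header>
</osisText>"""

print(xml_lang_matches_ietf(sample))  # True
```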
> 
> Going back to Albanian... Ethnologue lists 4 dialects of Albanian, all
> of which would be identified with the ISO 639-1 code 'sq', but with
> different SIL codes.  Dialects of a single language often share a
> common written form.  If that is the case with Albanian and I have a
> Bible in the common written form, I might (if I were SIL and wanted to
> identify SIL codes in my work) encode:
> 
> <osisText xml:lang="sq">
> ...
> <language type="IETF">sq</language>
> <language type="SIL">AAH</language>
> <language type="SIL">AAE</language>
> <language type="SIL">ALS</language>
> <language type="SIL">ALN</language>
> 
> However, if they were not all the same written language and I had a
> Bible written specifically in Tosk Albanian, I would encode:
> 
> <osisText xml:lang="x-SIL-ALN">
> ...
> <language type="IETF">x-SIL-ALN</language>
> <language type="ISO-639-1">sq</language>
> 
> Does that seem sensible?
> 
> --Chris
> 
> 
> > 
> > Todd
> > 
> > > -----Original Message-----
> > > From: osis-core-admin@bibletechnologieswg.org
> > > [mailto:osis-core-admin@bibletechnologieswg.org] On Behalf Of 
> > > Chris Little
> > > Sent: Sunday, October 19, 2003 2:25 AM
> > > To: osis-core@bibletechnologieswg.org
> > > Subject: RE: [osis-core] Schema: type on language
> > > 
> > > 
> > > 
> > > Todd,
> > > 
> > > For one, it's questionable whether we can really say any language
> > > can be unambiguously identified.  But let's suppose we really know
> > > what English is and we really know that 'en' identifies it.  ISO
> > > 639 does a better job of unambiguously identifying some languages
> > > than others.  There are a bunch of codes that describe groups of
> > > languages, such as "North American Indian" and "Austronesian
> > > (Other)".
> > > 
> > > So, it's not quite true that Javanese has no ISO code; it's just a
> > > very, very ambiguous code shared with hundreds of other languages.
> > > (The code would be 'map' -- "Austronesian (Other)".)
> > > 
> > > I think it is valuable to keep type="...", since some organizations
> > > use those codes themselves for various sorting purposes (e.g. the
> > > Library of Congress uses ISO 639-2/B and SIL uses Ethnologue
> > > codes).  If they need to use such data, I think we should provide a
> > > place to hold it.
> > > 
> > > But for interoperability, IETF/xml:lang is probably best.
> > > 
> > > What are your thoughts on also adding "English", "French", &
> > > "native" to the types enumeration?  Is that unnecessary or
> > > inappropriate?
> > > 
> > > 
> > > --Chris
> > > 
> > > 
> > > On Fri, 17 Oct 2003, Todd Tillinghast wrote:
> > > 
> > > > Chris,
> > > > 
> > > > If there is a way to unambiguously express ALL of the various
> > > > language values using xml:lang in an IETF-compliant string, then
> > > > it would seem to make sense to use that same structure for the
> > > > value of <language> and for xml:lang, AND not have a type="..."
> > > > set of enumerated types.
> > > > 
> > > > Ex:
> > > > Javanese, for which there is no ISO code:
> > > > <osisText xml:lang="x-SIL-JVN">
> > > > and
> > > > <work>
> > > >    <language>x-SIL-JVN</language>
> > > > </work>
> > > > 
> > > > Albanian:
> > > > <osisText xml:lang="sq">
> > > > and
> > > > <work>
> > > >    <language>sq</language>
> > > >    <language>x-ISO-639-1-sq</language>
> > > >    <language>x-ISO-639-2-T-sqi</language>
> > > >    <language>x-ISO-639-2-B-alb</language>
> > > >    <language>x-SIL-ALS</language>
> > > > </work>
> > > > 
> > > > This would keep the xml:lang and <language> values consistent.
> > > > It would seem that we will have to enumerate the "x-"
> > > > alternatives for xml:lang in the documentation, so we might as
> > > > well use the same structure in both places.
> > > > 
> > > > I believe that "x-" is allowed in the W3C's xml.xsd schema, so the
> > > > above options should work.  (Naturally, if there is already an
> > > > established syntax for ISO values within xml:lang, we should use
> > > > it rather than my x- values above.)
> > > 
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > osis-core mailing list
> > > osis-core@bibletechnologieswg.org
> > > http://www.bibletechnologieswg.org/mailman/listinfo/osis-core
> > > 