[osis-core] Schema: type on language
Todd Tillinghast
osis-core@bibletechnologieswg.org
Sun, 19 Oct 2003 15:44:50 -0600
Chris,
That all seems sensible. Now we just need to carefully document it.
I would say that if SIL wants to use a more specific language than sq,
they should put what they mean in xml:lang (using the x-SIL- form with
the three-letter SIL code). As you indicated, if it really is
xml:lang="sq" AND it covers several of the SIL codes, then it is a good
idea to put all that apply in the <work> element.
Todd
> -----Original Message-----
> From: osis-core-admin@bibletechnologieswg.org
> [mailto:osis-core-admin@bibletechnologieswg.org] On Behalf Of
> Chris Little
> Sent: Sunday, October 19, 2003 10:59 AM
> To: osis-core@bibletechnologieswg.org
> Subject: RE: [osis-core] Schema: type on language
>
>
> Todd,
>
> On Sun, 19 Oct 2003, Todd Tillinghast wrote:
>
> > Chris,
> >
> > Are you saying that you will not be able to sort out which of the many
> > forms allowed in IETF/xml:lang has been used, and that you would like
> > to use <language type="...">language code</language> to help sort out
> > which case has been encoded, but that the values for <language> and
> > xml:lang would be identical?
> >
> > That seems reasonable.
>
> Almost. I'm saying it would be reasonable for an organization like SIL to
> encode:
> <language use="base" type="IETF">sq</language>
> <language use="base" type="SIL">ALS</language>
>
> That is, they should be able to identify the language according to a
> common form, to be used by all documents & organizations, identical to
> the form used for xml:lang (the IETF form). But they should also be able
> to use a form of their own for in-house categorization.
>
> Using values like "x-ISO-639-1-sq" might be valid, but to be of any use,
> it would have to be parsed as a string and cut into chunks. I say, why
> not just use type and be more explicit?
>
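To illustrate the parsing cost Chris describes, here is a minimal Python sketch of what a consumer would have to do with Todd's proposed "x-<standard>-<code>" values. The list of standard names is a hypothetical assumption taken from the examples in this thread, not part of any specification; because some names contain hyphens themselves, the split cannot simply be done on "-".

```python
# Hypothetical standard names, taken from the x- examples in this thread.
# Names may contain hyphens, so a plain str.split("-") would not work.
KNOWN_STANDARDS = ["ISO-639-2-T", "ISO-639-2-B", "ISO-639-1", "SIL"]


def split_private_tag(value):
    """Split a value like 'x-ISO-639-1-sq' into (standard, code).

    Returns None if the value does not follow the hypothetical
    "x-<standard>-<code>" convention.
    """
    if not value.lower().startswith("x-"):
        return None
    rest = value[2:]
    # Try longer names first so a longer standard name wins over a shorter prefix.
    for std in sorted(KNOWN_STANDARDS, key=len, reverse=True):
        prefix = std + "-"
        if rest.upper().startswith(prefix.upper()) and len(rest) > len(prefix):
            return std, rest[len(prefix):]
    return None


for v in ["x-ISO-639-1-sq", "x-ISO-639-2-T-sqi", "x-SIL-ALS", "sq"]:
    print(v, split_private_tag(v))
```

With a type attribute, as in <language type="ISO-639-1">sq</language>, the same information is available directly with no string surgery and no list of known prefixes to maintain.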
> > It also seems unfortunate that the XML/ISO standards bodies have made
> > it difficult to tell which standard is being used. (I am sure that
> > with an enumeration of all possible values you could derive which
> > standard a value comes from.)
>
> The only real ambiguity comes with discerning between ISO 639-2/T and /B.
> Besides that, 2-letter codes are ISO 639-1, 3-letter codes are from one
> of the -2 standards, those starting with i- are IANA, and everything
> starting with x- is officially unknown (private use).
>
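The shape rules just listed can be written as a small heuristic. This Python sketch is an illustration under those rules only: it inspects the primary subtag's form, does not validate against any registry, and, as noted above, cannot tell ISO 639-2/T from /B (e.g. 'sqi' vs 'alb').

```python
import re


def guess_standard(tag):
    """Guess which standard a language tag's primary subtag comes from.

    Heuristic based only on the form of the primary subtag; it does not
    check that the code is actually registered anywhere.
    """
    primary = tag.split("-", 1)[0].lower()
    if primary == "i":
        return "IANA"
    if primary == "x":
        return "private use"
    if re.fullmatch(r"[a-z]{2}", primary):
        return "ISO 639-1"
    if re.fullmatch(r"[a-z]{3}", primary):
        # Form alone cannot separate the /T and /B variants.
        return "ISO 639-2 (T or B)"
    return "unknown"


for t in ["sq", "sqi", "alb", "i-klingon", "x-SIL-ALN"]:
    print(t, guess_standard(t))
```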
> > I am not sure why you want to add "French", "English", and "native".
> > This would seem to further confuse the situation. Maybe I don't
> > understand how you would use them.
>
> My thought was to add it as a convenience to those who might wish to use
> it. Rather than forcing lookups from a table that maps codes to language
> names, the name would be held in the document. The reason for choosing
> English & French is that they are the international languages used by
> ISO & SIL for their code databases.
>
> If you think it would be better to leave this out, I'm okay with that.
>
> > Relative to people using codes like "Austronesian (Other)", I think
> > the documentation should recommend a "concrete" language for xml:lang,
> > and that a <language> entry for "Austronesian (Other)" would be fine
> > to use within <work> in addition to the "concrete" language code.
>
> I'm in agreement here. I think the value for xml:lang should match that
> chosen for the IETF type, and should identify the most specific language
> code that makes the encoder happy.
>
> Going back to Albanian... Ethnologue lists 4 dialects of
> Albanian, all of which would be identified with ISO 639-1
> code 'sq', but different SIL codes. Dialects of a single
> language can often have a common written form. If that is
> the case with Albanian and I have a Bible in the common
> written form, I might (if I were SIL and wanted to identify SIL
> codes in my work) encode:
>
> <osisText xml:lang="sq">
> ...
> <language type="IETF">sq</language>
> <language type="SIL">AAH</language>
> <language type="SIL">AAE</language>
> <language type="SIL">ALS</language>
> <language type="SIL">ALN</language>
>
> However, if they were not all the same written language and I had a
> Bible written specifically in Tosk Albanian, I would encode:
>
> <osisText xml:lang="x-SIL-ALN">
> ...
> <language type="IETF">x-SIL-ALN</language>
> <language type="ISO-639-1">sq</language>
>
> Does that seem sensible?
>
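For a consumer of the encoding Chris proposes, the typed entries can be read directly with Python's standard xml.etree.ElementTree. The sample below is a simplified, hypothetical fragment built from the Tosk Albanian example above: it places the <language> entries inside <header><work> and omits the OSIS namespace and other required content, so it is a sketch rather than a valid OSIS document. The final check mirrors the rule agreed above, that the IETF-typed entry should match xml:lang.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified, hypothetical fragment based on the Tosk Albanian example;
# the real OSIS schema uses a namespace and more required elements.
SAMPLE = """
<osisText xml:lang="x-SIL-ALN">
  <header>
    <work>
      <language type="IETF">x-SIL-ALN</language>
      <language type="ISO-639-1">sq</language>
    </work>
  </header>
</osisText>
"""

# ElementTree exposes xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def languages_by_type(xml_text):
    """Return (root xml:lang, {type: [codes]}) for every <language> element."""
    root = ET.fromstring(xml_text.strip())
    found = defaultdict(list)
    for lang in root.iter("language"):
        found[lang.get("type", "")].append((lang.text or "").strip())
    return root.get(XML_LANG), dict(found)


doc_lang, by_type = languages_by_type(SAMPLE)
# The IETF-typed entry is expected to match the document's xml:lang.
print(doc_lang, by_type, by_type.get("IETF") == [doc_lang])
```

Because each code carries its standard in the type attribute, grouping by standard is a dictionary lookup rather than a string parse.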
> --Chris
>
>
> >
> > Todd
> >
> > > -----Original Message-----
> > > From: osis-core-admin@bibletechnologieswg.org
> > > [mailto:osis-core-admin@bibletechnologieswg.org] On Behalf Of
> > > Chris Little
> > > Sent: Sunday, October 19, 2003 2:25 AM
> > > To: osis-core@bibletechnologieswg.org
> > > Subject: RE: [osis-core] Schema: type on language
> > >
> > >
> > >
> > > Todd,
> > >
> > > For one, it's questionable whether we can really say any language
> > > can be unambiguously identified. But let's suppose we really know
> > > what English is and we really know that 'en' identifies it. ISO 639
> > > does a better job of unambiguously identifying some languages than
> > > others. There are a bunch of codes that describe groups of
> > > languages, such as "North American Indian" and "Austronesian
> > > (Other)".
> > >
> > > So, it's not quite true that Javanese has no ISO code; it's just a
> > > very, very ambiguous code shared with hundreds of other languages.
> > > (The code would be 'map': "Austronesian (Other)".)
> > >
> > > I think it is valuable to keep type="...", since some organizations
> > > use those codes themselves for various sorting purposes (e.g. the
> > > Library of Congress uses ISO 639-2/B and SIL uses Ethnologue codes).
> > > If they need to use such data, I think we should provide a place to
> > > hold it.
> > >
> > > But for interoperability, IETF/xml:lang is probably best.
> > >
> > > What are your thoughts on also adding "English", "French", & "native"
> > > to the types enumeration? Is that unnecessary/inappropriate?
> > >
> > >
> > > --Chris
> > >
> > >
> > > On Fri, 17 Oct 2003, Todd Tillinghast wrote:
> > >
> > > > Chris,
> > > >
> > > > If there is a way to unambiguously express ALL of the various
> > > > language values using xml:lang in an IETF-compliant string, then it
> > > > would seem to make sense to use that same structure for the value
> > > > of <language> and for xml:lang AND not have a type="..." set of
> > > > enumerated types.
> > > >
> > > > Ex:
> > > > Javanese, for which there is no ISO code:
> > > > <osisText xml:lang="x-SIL-JVN">
> > > > and
> > > > <work>
> > > > <language>x-SIL-JVN</language>
> > > > </work>
> > > >
> > > > Albanian:
> > > > <osisText xml:lang="sq">
> > > > and
> > > > <work>
> > > > <language>sq</language>
> > > > <language>x-ISO-639-1-sq</language>
> > > > <language>x-ISO-639-2-T-sqi</language>
> > > > <language>x-ISO-639-2-B-alb</language>
> > > > <language>x-SIL-ALS</language>
> > > > </work>
> > > >
> > > > This would keep the xml:lang and <language> values consistent. It
> > > > would seem that we will have to enumerate the "x-" alternatives for
> > > > xml:lang in the documentation, so we might as well use the same
> > > > structure in both places.
> > > >
> > > > I believe that "x-" is allowed in the W3C's xml.xsd schema, so the
> > > > above options should work. (Naturally, if there is already an
> > > > established syntax for ISO values within xml:lang, we should use it
> > > > rather than my x- values above.)
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > osis-core mailing list
> > > osis-core@bibletechnologieswg.org
> > > http://www.bibletechnologieswg.org/mailman/listinfo/osis-core
> > >
> >
> >
>
>