[osis-core] Schema: type on language
Chris Little
osis-core@bibletechnologieswg.org
Sat, 11 Oct 2003 21:22:45 -0700
Patrick,
Sorry I couldn't get this out before the new Schema beta. I've been too
busy cursing at my cell phone every time it drops my call as I work on
the reply. Oh for a land line...
Patrick Durusau wrote:
> Chris,
>
> Sanity check:
>
> So the attributes are: ISO-639-1, ISO-639-2, SIL, LINGUIST LIST, but the
> content of the element is your x-SIL-ENG? In other words, no regex to
> validate the content of the <language> element?
My suggestions would be to:
1) Change "LINGUIST List" to just "LINGUIST". "LINGUIST List" usually
refers only to the list itself, whereas "LINGUIST" frequently refers to
things associated with the list, such as its code list. Anyone who is
likely to use a LINGUIST code will recognize and understand the meaning of
"LINGUIST". Sorry about that; my last reply was a bit misleading.
2) Add "other" back to the enumeration. I think this was a good idea.
Or would we prefer people to name their own private schemes for language
codes and use an "x-" type value?
3) Your question about a regex made me think... "x-SIL-ENG" was just an
example of how SIL suggests using its codes when you need an RFC
3066-compliant code. I think people would expect to use just the
Ethnologue code itself, e.g. "ENG", if they set their <language> type to
"SIL". So, yes, I think it should just be an xs:string, not a pattern.
4) However, in thinking about it, it did seem like it would be advantageous
to provide a mechanism for identifying codes that would be identical to
the codes in xml:lang values in the document itself, which are RFC
3066-compliant (in theory). So, I would recommend we also add the
values "IETF" (for RFC 3066, or whatever supersedes it) and "IANA" (for
the IANA-registered values that the IETF RFCs refer to). The contents
of <language type="IETF"> should be constrained to RFC
3066/xml:lang/[A-Za-z]{1,8}(\-[A-Za-z]{1,8})*, but only in prose (since
I assume that's all that's possible if we want all other types to be
unconstrained xs:string).
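Since the constraint on IETF content would live only in prose, here is a rough sketch, in Python, of the check an application might apply on its own. It assumes the enumeration as proposed in this thread ("LINGUIST" in place of "LINGUIST List", plus "other", "IANA" and "IETF"), treats any "x-" type value as a private scheme, and tests only IETF content against the pattern quoted above. The function name check_language is hypothetical. Note that RFC 3066 itself also permits digits in the subtags after the first, which this alphabetic-only pattern does not.

```python
import re

# Hypothetical enumeration for <language type="...">, as proposed in this thread
LANGUAGE_TYPES = {"ISO-639-1", "ISO-639-2", "SIL", "LINGUIST", "other", "IANA", "IETF"}

# The xml:lang-style pattern quoted above (alphabetic subtags only)
IETF_PATTERN = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def check_language(type_value: str, content: str) -> bool:
    """Return True if a <language> element's type and content look acceptable."""
    # Private schemes are allowed via an "x-" prefix on the type value
    if type_value not in LANGUAGE_TYPES and not type_value.startswith("x-"):
        return False
    # Only IETF content is constrained; every other type stays a plain string
    if type_value == "IETF":
        return IETF_PATTERN.fullmatch(content) is not None
    return True


if __name__ == "__main__":
    print(check_language("SIL", "ENG"))          # True
    print(check_language("IETF", "x-SIL-ENG"))   # True
    print(check_language("IETF", "en_US"))       # False
    print(check_language("LINGUIST List", "x"))  # False
```

The point of the sketch is the split described above: the schema can enumerate the type attribute, while the IETF-only content rule has to be enforced outside XML Schema.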
Cliff's notes version:
Change "LINGUIST List" to "LINGUIST".
Add "other", "IANA", "IETF".
Comments/objections welcome, but I think the "IETF" value would be
invaluable down the road.
> Works for me, just wanted to check.
>
> Assume role is just xs:string? We don't try to enumerate?
I say enumerate whenever possible. All the values I could think of
were: original, translation, interlinear, quotation, didactic, source, and
target. I make no claim that those are exhaustive, but there's always x-.
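A minimal sketch of how an enumerated role with an "x-" escape hatch might be checked. The value list is copied from the message above; requiring a non-empty name after "x-" is an assumption, and is_valid_role and the value "x-paraphrase" are hypothetical examples.

```python
# Role values suggested in this thread; not claimed to be exhaustive
ROLES = frozenset(
    {"original", "translation", "interlinear", "quotation", "didactic", "source", "target"}
)


def is_valid_role(role: str) -> bool:
    """Accept an enumerated role, or a private "x-" value with a non-empty name."""
    if role in ROLES:
        return True
    # Assumed convention: anything else must be a private "x-" extension
    return role.startswith("x-") and len(role) > 2


print(is_valid_role("interlinear"))   # True
print(is_valid_role("x-paraphrase"))  # True
print(is_valid_role("commentary"))    # False
```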
--Chris