[osis-core] osisGenRegex: General Statement
Chris Little
osis-core@bibletechnologieswg.org
Mon, 20 Oct 2003 12:57:04 -0700 (MST)
Patrick,
My hope/desire for attributes like lemma would be that they be easy to
parse and easy for humans to understand. Towards that end, it is my hope
that we come up with something that makes "Strong:G1234" the only valid
way to refer to Strong's Greek lemma number 1234.
If we want to say that encoders should have a work named "Strong" listed
in the <works>s, that's fine with me. If we want to add a prefix like
"osisRef" that indicates what follows is a valid osisRef, that works for
me too. (e.g. someone wanted to use Mounce's numbering for some kind of
limited lemmatization, so he might have a work with ID "Mounce" and lemma
attributes like "osisRef:Mounce:123".)
To answer your questions, I would say a prefix should always be required.
It reduces ambiguity, and I'm not concerned with filesize issues.
Regarding whether they should point to a work in the header, I don't think
they should, necessarily. Lemmata are fairly limited in number. Strong
enumerated about 14,000, I think. Morphological tags are frequently just
patterns of slots with variables that can fit into each slot. They're
defined algorithmically rather than by enumeration. For example, a
morphological tag for English pronouns might consist of "NP-" followed by
slots for person, number, gender, and case (72 possible tags, just for
pronouns). I think it is unlikely, in some cases, that a document would
ever be made to hold all the possible values of some tag systems. For the
Sword Project, we have two morph tag indexes, both of which are based on
algorithmic systems. However, the indexes themselves are not exhaustive,
but are based on all of the tags that actually occur in those specific
texts that happen to be coded to them.
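To illustrate what "defined algorithmically rather than by enumeration" means in practice, here is a rough Python sketch. The slot codes are hypothetical and chosen only so that the count comes out to 72 (3 persons x 2 numbers x 3 genders x 4 cases); they are not taken from any real tag system. It shows that a tag can be checked slot by slot without a document listing every value.

```python
from itertools import product

# Hypothetical slot values (assumptions, not from any real tag system).
PERSONS = "123"   # first, second, third
NUMBERS = "SP"    # singular, plural
GENDERS = "MFN"   # masculine, feminine, neuter
CASES = "NAGR"    # four made-up case codes
SLOTS = (PERSONS, NUMBERS, GENDERS, CASES)
PREFIX = "NP-"


def all_pronoun_tags():
    """Enumerate every tag the pattern allows (only feasible for small systems)."""
    return [PREFIX + "".join(combo) for combo in product(*SLOTS)]


def is_valid_pronoun_tag(tag):
    """Check a tag slot by slot, with no enumerated list required."""
    if not tag.startswith(PREFIX):
        return False
    body = tag[len(PREFIX):]
    if len(body) != len(SLOTS):
        return False
    return all(ch in allowed for ch, allowed in zip(body, SLOTS))
```

With these assumed slots, len(all_pronoun_tags()) is 72, and is_valid_pronoun_tag("NP-3SMN") succeeds without consulting any index.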
We could still create a work element in the header that does not refer to
an actual document that will ever exist. But I fail to see the worth in
doing so. Better to just tell people that Strong: and GK: mean one thing,
osisRef: means you have a real document and are referencing osisIDs in it,
and x- (if we keep it) means you don't care about standards.
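Roughly, the rule I have in mind could be sketched like this in Python. The function name, the category labels, and the choice to reject anything unrecognized are my own assumptions for illustration, not anything the schema currently says.

```python
# Prefixes with a fixed, agreed meaning (per the suggestion above).
WELL_KNOWN = {"Strong", "GK"}


def classify_lemma(value):
    """Return (kind, remainder) for a single lemma value.

    kind is "well-known", "osisRef", or "extension" (hypothetical labels).
    Raises ValueError when no recognized prefix is present, since the
    proposal is that a prefix is always required.
    """
    if any(ch.isspace() for ch in value):
        raise ValueError("a single value may not contain whitespace")
    # x- values carry no standard meaning at all.
    if value.startswith("x-") and len(value) > 2:
        return ("extension", value[2:])
    prefix, sep, rest = value.partition(":")
    if not sep or not rest:
        raise ValueError("prefix required: %r" % value)
    if prefix in WELL_KNOWN:
        return ("well-known", rest)
    if prefix == "osisRef":
        # rest should be a reference into a real document, e.g. "Mounce:123"
        return ("osisRef", rest)
    raise ValueError("unknown prefix: %r" % prefix)
```

So classify_lemma("Strong:G1234") gives ("well-known", "G1234"), classify_lemma("osisRef:Mounce:123") gives ("osisRef", "Mounce:123"), and a bare "1234" is rejected as ambiguous.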
But honestly, I won't complain if we just set them all back to x-[^\s] and
deal with the problem in 2.1 or whatever comes next.
--Chris
On Mon, 20 Oct 2003, Patrick Durusau wrote:
> Greetings!
>
> I think we are talking about two separable issues in the osisGenRegex
> thread and I would like to separate the two.
>
> First, there is the question of the prefix, is it required?, must it
> "point" to a work in the header, etc. In my next post: subjectLine:
> Prefix for osisGenRegex.
>
> Second, there is the question of how do we do a list of values in an XML
> attribute? In my next post: subjectLine: Lists in Attribute values
>
> I think the idea of the osisGenRegex is a very good one but suspect it
> is too late to consider all the issues for it to be included in this
> release. If we don't, all the current texts will continue to be valid
> and we can consider the implications of such a change more fully.
>
> More substantive comments in the following posts.
>
> Hope everyone is having a great day!
>
> Patrick
>
>