[osis-core] Milestones Summary
Kirk Lowery
osis-core@bibletechnologieswg.org
Thu, 20 Jun 2002 08:31:46 -0400
Steve DeRose wrote:
> At 10:06 AM -0400 06/06/02, Patrick Durusau wrote:
>
>> Guys,
>>
>> An attempt to summarize the issues and ask some questions that may (or
>> may not!) help frame this for resolution:
>>
>> 1. Prev/Next: TEI allows these attributes both for crossing boundaries
>> as well as segments that are "out of order" (which assumes you have
>> some notion other than the order of appearance in the text as the
>> correct order for the segments).
>
>
> Which could, of course, arise for us in various translation cases where
> the receptor language has different order constraints.
>
>>
>> 2. Assuming segs were in the proper order, interesting idea about
>> using a segID mechanism to find all the particular segs for a verse
>> for example. (Suggested by Troy.)
>>
>> 3. Not quite sure about the assumption that English translations with
>> multiple levels of quotes (the most common case cited so far) will
>> have additional levels of overlapping markup. I do agree in principle
>> that we need mechanisms that would solve that problem. Annotation
>> markup is most generally pointing (attaching) to a segment of text and
>> does not generally exist in the same node as the principal text.
>
>
> Kirk, want to weigh in on this one?
As for point "2", this is one possibility for handling the problems of
discontinuous morphemes, e.g., the Hebrew verbal stems (which are
distinguished by different vowels appearing between the consonants of
the root). Segmenting these is a major problem our working group has to
solve. So the idea of making the necessary pointers using <seg> and then
collecting them all together sounds like it would work. I don't yet know
enough to say whether it is the best solution; I'm still working my
way through TEI P4 to see how it would handle this.
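For illustration only, here is a minimal Python sketch of the "collect the pieces" idea. It assumes each discontinuous piece is a <seg> carrying a TEI-style `next` pointer to the following piece of the same unit. The element layout, the `type` values, and the function name `collect_chains` are hypothetical. The sample word is the qal perfect qatal in simplified transliteration, with the root consonants q-t-l interleaved with the stem vowels.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: root consonants and stem vowels are interleaved in
# the surface word, and each piece points to the next piece of its chain.
SAMPLE = """
<w>
  <seg id="r1" type="root" next="r2">q</seg>
  <seg id="v1" type="stem" next="v2">a</seg>
  <seg id="r2" type="root" next="r3">t</seg>
  <seg id="v2" type="stem">a</seg>
  <seg id="r3" type="root">l</seg>
</w>
"""


def collect_chains(xml_text):
    """Return {head id: joined text} for every chain of linked <seg>s."""
    root = ET.fromstring(xml_text)
    segs = {seg.get("id"): seg for seg in root.iter("seg")}
    # A seg that some other seg points to cannot be the start of a chain.
    targets = {seg.get("next") for seg in segs.values() if seg.get("next")}

    chains = {}
    for seg_id, seg in segs.items():
        if seg_id in targets:
            continue
        parts, seen, current = [], set(), seg
        while current is not None:
            current_id = current.get("id")
            if current_id in seen:
                raise ValueError(f"cycle in next pointers at {current_id}")
            seen.add(current_id)
            parts.append(current.text or "")
            # Follow the pointer; a missing or dangling target ends the chain.
            current = segs.get(current.get("next"))
        chains[seg_id] = "".join(parts)
    return chains


print(collect_chains(SAMPLE))
```

Running it prints {'r1': 'qtl', 'v1': 'aa'}: the root and the vowel pattern come back as separate units even though they interleave in the text. Because the walk follows the pointers rather than document order, the same approach would also cover segments that are "out of order" (point "1").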
As for point "3", linguistic annotation has been vexed by the
"attribute vs. element" question. From the standpoint of theoretical
linguistics, any analysis of text creates an abstract structure of some
kind. That would argue for its own informational hierarchy. Linguists
like to talk about "phonology," "morphology," "syntax," and "discourse"
and one might get the impression that these levels of language are
autonomous and separate. But the reality is that the ambiguity (and
flexibility) of language means that the parsing of some phonemes, for
example, has to be deferred until syntactic analysis is completed in order
to decide what their part of speech is! (An extreme example is the so-called
Hebrew "inseparable prepositions," which are only one syllable long: is
such a syllable (1) a root letter, (2) a marker of the infinitive
construct/participle, (3) a marker of the direct object, (4) a preposition,
(5) an emphatic particle, or (6) an enclitic? Most of the time the decision
can only be made after the syntactic analysis is done. BTW, this is what
most people mean when they say "context" tells you what it means!) Creating
multiple overlapping hierarchies is what linguists do, and that will make
any kind of segmenting and markup very "interesting!"
Our WG is only just now starting its work, but speaking only for myself
and given what I know now, I think we ought first to consider stand-off
markup for linguistic analysis, residing physically in one or more separate
files, each with its own DOM and hierarchy. Trying to merge all of that
into one file along with the Core markup would add more complexity than
is practical to handle, IMO.
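To make the stand-off idea concrete, here is a minimal Python sketch. It assumes the base text gives every word an `id` and that a separate annotation file points at those ids through a hypothetical `target` attribute holding a space-separated id list. The element and attribute names, the `resolve` function, and the Genesis 1:1 sample (simplified transliteration) are illustrative only, not a proposed OSIS schema.

```python
import xml.etree.ElementTree as ET

# Base text: only the words and their ids live here.
BASE = """
<text>
  <w id="w1">bereshit</w>
  <w id="w2">bara</w>
  <w id="w3">elohim</w>
</text>
"""

# Stand-off annotation, kept in its own file and hierarchy.
# The clause annotation spans words that are also annotated individually,
# so the two layers overlap without touching the base text.
ANNOTATIONS = """
<annotations>
  <ann target="w2" label="verb"/>
  <ann target="w3" label="noun"/>
  <ann target="w1 w2 w3" label="clause"/>
</annotations>
"""


def resolve(base_xml, ann_xml):
    """Pair each annotation label with the base-text words it points to."""
    words = {w.get("id"): w.text for w in ET.fromstring(base_xml).iter("w")}
    resolved = []
    for ann in ET.fromstring(ann_xml).iter("ann"):
        ids = ann.get("target").split()
        # A KeyError here means the annotation points at a missing word.
        text = " ".join(words[i] for i in ids)
        resolved.append((text, ann.get("label")))
    return resolved


for text, label in resolve(BASE, ANNOTATIONS):
    print(f"{label}: {text}")
```

The design trade-off: adding another analytic layer never requires touching the base text, but the word ids have to stay stable for as long as any annotation file refers to them.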
But there are a lot of very smart and knowledgeable folk in OSIS, and
I expect we will find the optimal solution.
Blessings,
Kirk
--
Kirk E. Lowery, Ph.D.
Director, Westminster Hebrew Institute
Adjunct Professor of Old Testament
Westminster Theological Seminary, Philadelphia