[osis-core] Milestones Summary

Thu, 20 Jun 2002 08:31:46 -0400

Steve DeRose wrote:
> At 10:06 AM -0400 06/06/02, Patrick Durusau wrote:
> 
>> Guys,
>>
>> An attempt to summarize the issues and ask some questions that may (or 
>> may not!) help frame this for resolution:
>>
>> 1. Prev/Next: TEI allows these attributes both for crossing boundaries 
>> as well as segments that are "out of order" (which assumes you have 
>> some notion other than the order of appearance in the text as the 
>> correct order for the segments).
> 
> 
> Which could, of course, arise for us in various translation cases where 
> the receptor language has different order constraints.
> 
>>
>> 2. Assuming segs were in the proper order, interesting idea about 
>> using a segID mechanism to find all the particular segs for a verse 
>> for example. (Suggested by Troy.)
>>
>> 3. Not quite sure about the assumption that English translations with 
>> multiple levels of quotes (the most common case cited so far) will 
>> have additional levels of overlapping markup. I do agree in principle 
>> that we need mechanisms that would solve that problem. Annotation 
>> markup is most generally pointing (attaching) to a segment of text and 
>> does not generally exist in the same node as the principal text.
> 
> 
> Kirk, want to weigh in on this one?

As for point "2", this is one possibility for handling the problems of 
discontinuous morphemes, e.g., the Hebrew verbal stems (which are 
distinguished by different vowels appearing between the consonants of 
the root). Segmenting those is a major problem our working group has to 
solve. So the idea of making the necessary pointers using <seg> and then 
collecting them all together sounds like it would work. Whether that is 
the best solution or not, I don't know enough yet. I'm still working my 
way through the TEI P4 to see how they would handle this.
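
A minimal sketch of the "collect the segs" idea, not a proposal for the 
actual OSIS schema. It assumes a hypothetical segID attribute on <seg> 
and uses a simplified transliteration (root k-t-b with stem vowels 
between the consonants); both the attribute name and the sample markup 
are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: simplified transliteration of Hebrew "katab",
# with the root consonants k-t-b interleaved with the stem vowels.
# The segID attribute and its values are assumptions for illustration.
SAMPLE = """
<w>
  <seg segID="root">k</seg><seg segID="stem">a</seg>
  <seg segID="root">t</seg><seg segID="stem">a</seg>
  <seg segID="root">b</seg>
</w>
"""


def collect_segments(xml_text):
    """Group <seg> text by segID, preserving document order."""
    root = ET.fromstring(xml_text.strip())
    groups = defaultdict(list)
    for seg in root.iter("seg"):
        groups[seg.get("segID")].append(seg.text or "")
    # Join the pieces of each discontinuous unit back together.
    return {seg_id: "".join(parts) for seg_id, parts in groups.items()}


if __name__ == "__main__":
    print(collect_segments(SAMPLE))  # {'root': 'ktb', 'stem': 'aa'}
```

This relies on the segs appearing in the proper order in the text, which 
is the assumption stated in point "2".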

As for point "3", linguistic annotation has been vexed by the 
"attribute vs. element" question. From the standpoint of theoretical 
linguistics, any analysis of text creates an abstract structure of some 
kind. That would argue for its own informational hierarchy. Linguists 
like to talk about "phonology," "morphology," "syntax," and "discourse" 
and one might get the impression that these levels of language are 
autonomous and separate. But the reality is that the ambiguity (and 
flexibility) of language means that some parsing of phonemes, for 
example, has to be deferred until syntactic analysis is completed to 
decide what the part of speech is! (An extreme example is the so-called 
Hebrew "inseparable prepositions," which are only one syllable long: is 
it (1) a root letter, (2) marker of the infinitive construct/participle, 
(3) marker of the direct object, (4) preposition, (5) an emphatic 
particle, or (6) enclitic? Most of the time the decision can only be made 
after the syntactic analysis is done. BTW, this is what most people mean 
when they say "context" tells you what it means!) Creating multiple 
overlapping hierarchies is what linguists do, and that will make any 
kind of segmenting and markup very "interesting!"

Our WG is only just now starting its work, but speaking only for myself 
and given what I know now, I think we ought to consider first stand-off 
markup for linguistic analysis, physically residing in one or more 
separate files, with its own DOM and hierarchy and all. To try to merge all 
that into one file along with Core markup is going to be more complexity 
than is practical to handle, IMO.
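
A rough sketch of what stand-off annotation could look like, assuming 
character offsets into an unmodified base text. The Annotation class, 
the layer names, and the labels are all hypothetical; in practice the 
annotations would live in their own file(s) and might point at the text 
by some other mechanism than offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    layer: str   # e.g. "syntax", "discourse" (hypothetical layer names)
    start: int   # offset of the first character covered
    end: int     # offset one past the last character covered
    label: str   # placeholder label, not a real analysis


# The base text stays untouched; annotations only point into it.
BASE_TEXT = "In the beginning God created"

# These would normally be loaded from a separate file.
ANNOTATIONS = [
    Annotation("syntax", 0, 16, "phrase-1"),
    Annotation("discourse", 7, 20, "unit-A"),   # overlaps phrase-1 without nesting
    Annotation("morphology", 21, 28, "verb"),
]


def covered_text(text, ann):
    """Return the stretch of base text an annotation points at."""
    return text[ann.start:ann.end]


def overlapping(annotations, start, end):
    """Return annotations from any layer that intersect [start, end)."""
    return [a for a in annotations if a.start < end and start < a.end]


if __name__ == "__main__":
    for ann in ANNOTATIONS:
        print(ann.layer, ann.label, repr(covered_text(BASE_TEXT, ann)))
```

Because each layer only points at the text, the "syntax" and "discourse" 
spans above can overlap without either one having to nest inside the 
other, which is the difficulty with keeping everything in one hierarchy.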

But there are a lot of very smart and knowledgeable folk in OSIS, and 
I expect we will find the optimal solution.

Blessings,

Kirk

-- 
Kirk E. Lowery, Ph.D.
Director, Westminster Hebrew Institute
Adjunct Professor of Old Testament
Westminster Theological Seminary, Philadelphia