[osis-core] Linguistic Annotation Design Document - next iteration
Kirk Lowery
osis-core@bibletechnologieswg.org
Tue, 23 Dec 2003 19:00:54 -0500
Attached is a revised version, in which I've tried to address the
comments I received for the first draft. Feel free to take any of these
issues and make it a new thread.
Changes:
1. This design reflects Todd's recommendation of a deep rather than a
flat model of the annotation. I had originally considered the
hierarchical approach and concede all the advantages Todd put forth.
Steve and I were hoping to keep the markup relatively simple, but the
loss of those aforementioned benefits was too great.
2. Because this is a *design* document that is evolving, I had not paid
too much attention to performance and consistency issues. For this
document (and later documentation) I've chosen not to use abbreviations
or shorter names (as Chris recommended) for the purpose of clarity. Any
actual schema must use shorter names, especially for the most frequent elements. I'm
concerned about namespace clashes, and wonder if we should declare a
namespace for the module?
3. For consistency with the core tag set, Chris recommended the
"CamelCase" naming convention. I agree, and have changed the names in
the document.
4. Scope of this proposal: Chris pointed out that certain analytical
categories are missing (e.g., derivational morphology). The problem gets
worse: missing are transformational labels, and -- my personal favorite
-- the attribute-value pairs of unification-class grammars. And there
are more needed for various camps of linguistic theories. There's no way
that we can anticipate the annotation needs of linguistic annotators.
There *must* be a procedure whereby the user can redefine elements and
add/subtract elements to suit not only their language but conceptual
framework.
5. Then there are the authority lists for linguistic labels. So far as I
know, the EAGLES list is the only one out there. There is ISO/TC 37/SC 4
"Language Resource Management" (http://www.tc37sc4.org/), and they've
just had their first meeting on Linguistic Annotation in November.
They're looking at some sort of TEI feature structure approach (ugh!).
They aren't going to have anything of any kind very soon.
6. Roadmap: some of Chris' comments have to do with when, where and what
we release. Here are the broad strokes: create a module that will allow
us to annotate the original text of the Bible with classic inflectional
morphology. Then, invite clueful individuals (I'm thinking linguists and
translators) to look over the annotated selections and tell us what is
missing, what needs different handling, etc. Then we abstract the whole
procedure into a "language declaration file" which an XSLT or something
can use to generate a language-specific annotation module. As for any
"public" release, that's for the OSIS TC to say. But I don't think that
"1.0" should be released until the system can be applied to
(theoretically) any language.
Chris raised some other issues, but they should probably be dealt with
separately. My short-term goal has been to get *something* concrete,
which can evolve into something more generally useful. That's why I
began with Hebrew, since I have data I need to get into some sort of XML
format. I'm aware that there are many "hebraicisms" that need
generalization...
Comments, please.
Blessings,
Kirk
--
Kirk E. Lowery, Ph.D.
Director, Westminster Hebrew Institute
Adjunct Professor of Old Testament
Westminster Theological Seminary, Philadelphia
Theory is when you know everything and nothing works.
Practice is when everything works and nobody knows why.
Here, theory and practice are united:
nothing works and nobody knows why!
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=windows-1252">
<TITLE>Schema Design for OSIS Linguistic Annotation</TITLE>
<META NAME="GENERATOR" CONTENT="OpenOffice.org 1.1.0 (Win32)">
<META NAME="AUTHOR" CONTENT="Kirk Lowery">
<META NAME="CREATED" CONTENT="20031102;9350813">
<META NAME="CHANGEDBY" CONTENT="Kirk Lowery">
<META NAME="CHANGED" CONTENT="20031223;18580001">
<STYLE>
<!--
@page { size: 8.5in 11in }
TD P.western { font-family: "Verdana", sans-serif; font-size: 10pt }
H1.western { font-family: "Verdana", sans-serif; font-size: 20pt }
P.western { font-family: "Verdana", sans-serif; font-size: 10pt }
H3.western { font-family: "Verdana", sans-serif; font-size: 12pt }
H2.western { font-family: "Verdana", sans-serif; font-size: 16pt }
P.sdfootnote-western { margin-left: 0.2in; text-indent: -0.2in; margin-bottom: 0in; font-family: "Verdana", sans-serif; font-size: 8pt }
P.sdfootnote-cjk { margin-left: 0.2in; text-indent: -0.2in; margin-bottom: 0in; font-size: 10pt }
P.sdfootnote-ctl { margin-left: 0.2in; text-indent: -0.2in; margin-bottom: 0in; font-size: 10pt }
TH P.western { font-family: "Verdana", sans-serif; font-size: 10pt }
TT.western { font-size: 10pt }
CODE.western { font-family: "Courier New", monospace; font-size: 10pt; font-weight: bold }
A.sdfootnoteanc { font-size: 57% }
-->
</STYLE>
</HEAD>
<BODY LANG="en-US" BGCOLOR="#ffffcc" DIR="LTR">
<H1 CLASS="western" ALIGN=CENTER>Schema Design for OSIS Linguistic
Annotation</H1>
<H3 CLASS="western">by Kirk Lowery and Steve DeRose<BR>OSIS Technical
Committee</H3>
<CENTER>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3 STYLE="page-break-inside: avoid">
<COL WIDTH=39*>
<COL WIDTH=46*>
<COL WIDTH=171*>
<THEAD>
<TR VALIGN=TOP>
<TH WIDTH=15%>
<P CLASS="western">Revision</P>
</TH>
<TH WIDTH=18%>
<P CLASS="western">Date</P>
</TH>
<TH WIDTH=67%>
<P CLASS="western">Comments</P>
</TH>
</TR>
</THEAD>
<TBODY>
<TR>
<TD WIDTH=15% BGCOLOR="#ffff99" SDVAL="0.3" SDNUM="1033;">
<P CLASS="western" ALIGN=CENTER><SPAN STYLE="background: transparent">0.3</SPAN></P>
</TD>
<TD WIDTH=18% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><SPAN STYLE="background: transparent">12/17/2003
16:22:58</SPAN></P>
</TD>
<TD WIDTH=67% VALIGN=TOP BGCOLOR="#ffff99">
<P CLASS="western"><SPAN STYLE="background: transparent">Changed
<CODE CLASS="western"><morpheme></CODE> content model from
flat to deep (hierarchical).</SPAN></P>
</TD>
</TR>
<TR>
<TD WIDTH=15% SDVAL="0.2" SDNUM="1033;">
<P CLASS="western" ALIGN=CENTER><SPAN STYLE="background: transparent">0.2</SPAN></P>
</TD>
<TD WIDTH=18%>
<P CLASS="western" ALIGN=CENTER><BR>
</P>
</TD>
<TD WIDTH=67% VALIGN=TOP>
<P CLASS="western"><SPAN STYLE="background: transparent">Corrected
</SPAN><CODE CLASS="western"><SPAN STYLE="background: transparent">lang</SPAN></CODE><SPAN STYLE="background: transparent">
codes to the ISO 639-2 standard.</SPAN></P>
</TD>
</TR>
<TR>
<TD WIDTH=15% BGCOLOR="#ffff99" SDVAL="0.1" SDNUM="1033;">
<P CLASS="western" ALIGN=CENTER><SPAN STYLE="background: transparent">0.1</SPAN></P>
</TD>
<TD WIDTH=18% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><SPAN STYLE="background: transparent">11/02/2003
10:16:23</SPAN></P>
</TD>
<TD WIDTH=67% VALIGN=TOP BGCOLOR="#ffff99">
<P CLASS="western"><SPAN STYLE="background: transparent">Original
draft.</SPAN></P>
</TD>
</TR>
</TBODY>
</TABLE>
</CENTER>
<H2 CLASS="western">Introduction</H2>
<P CLASS="western">The OSIS Linguistic Annotation schema
(<TT CLASS="western"><B>osisLA.x.x.xsd</B></TT>) defines the
elements, attributes and their relationships for linguistic
annotation of an OSIS-compliant document. The schema is an extension
of the OSIS Core schema, not a replacement for it. The
instance document should be a valid OSIS document. The present
proposal assumes inline markup, since we do not expect anyone to be
doing stand-off markup anytime in the near future, given the current
state of software. The goal for version 1.0 will be to have a system
adequate for the markup of the Bible in its original languages at the
morphological level of analysis.</P>
<H2 CLASS="western">Basic Concepts</H2>
<P CLASS="western">Philosophically, we view an arbitrary span or
segment of the text stream (i.e., the biblical text or the text to
be annotated) as the element, and the annotation (including
parsing) as child nodes of that element. The first issue is the
granularity of segmentation of the text: what unit do we wish to
annotate? Since this first phase is focused upon morphology, we
choose the “morpheme” as the unit of text that
we wish to annotate. The <CODE CLASS="western"><B><w></B></CODE>
element is redefined to contain one or more <CODE CLASS="western"><B><morpheme></B></CODE>
elements. <CODE CLASS="western"><B><morpheme></B></CODE> is the
only new element to be added. It will have a very complex content
model: the immediate children are <CODE CLASS="western"><lemma></CODE>
and <CODE CLASS="western"><partOfSpeech></CODE>.<A CLASS="sdfootnoteanc" NAME="sdfootnote1anc" HREF="#sdfootnote1sym"><SUP>1</SUP></A>
Most of the classic parsing will be expressed as child elements of <CODE CLASS="western"><partOfSpeech></CODE>.</P>
<P CLASS="western">The schema will attempt to include everything that
annotation of any language will need. Of course, each individual
language will have its own unique characteristics. These
characteristics will be captured by the language declaration
document. In the beginning, the schema will contain all that is
needed for Hebrew, Aramaic and Greek annotation. From there, later
revisions will begin the process of abstraction for language
universals.</P>
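<P CLASS="western">To make the nesting described above concrete, here is
a minimal sketch of a single annotated word and a few lines of Python that
read it back. The element and attribute names come from this document; the
sample word (a simplified transliteration, “ha” + “melek”,
“the king”), the analysis values, and the helper name
<CODE CLASS="western">summarize_word</CODE> are assumptions made only for
illustration, not part of the proposal.</P>
<PRE>
```python
import xml.etree.ElementTree as ET

# Hypothetical instance fragment. Names follow this design document;
# the word and its analysis values are illustrative only.
SAMPLE_WORD = """
<w>
  <morpheme wordPart="1">ha<lemma>ha</lemma>
    <partOfSpeech type="formal"><particle type="definiteArticle"/></partOfSpeech>
  </morpheme>
  <morpheme wordPart="2">melek<lemma>melek</lemma>
    <partOfSpeech type="formal">
      <noun type="commonNoun">
        <gender type="masculine"/><number type="singular"/><state type="absolute"/>
      </noun>
    </partOfSpeech>
  </morpheme>
</w>
"""


def summarize_word(xml_text):
    """Return one (surface text, lemma, category, category type) tuple per morpheme."""
    word = ET.fromstring(xml_text)
    rows = []
    for morpheme in word.findall("morpheme"):
        # The morpheme's own text (PCDATA) precedes its first child element.
        surface = (morpheme.text or "").strip()
        lemma = morpheme.findtext("lemma")
        # The first child of <partOfSpeech> is <noun>, <verb> or <particle>.
        category = morpheme.find("partOfSpeech")[0]
        rows.append((surface, lemma, category.tag, category.get("type")))
    return rows


ROWS = summarize_word(SAMPLE_WORD)
```
</PRE>
<P CLASS="western">The sketch relies on the morpheme text appearing before
<CODE CLASS="western"><lemma></CODE>; the document does not fix that
order, so a real content model would have to state it.</P>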
<H2 CLASS="western">Global Issues</H2>
<H3 CLASS="western">Namespace</H3>
<P CLASS="western"><FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2><SPAN STYLE="background: transparent">Should
this module have its own namespace: <B>osisLA</B> or perhaps just
<B>ola</B>?</SPAN></FONT></FONT></FONT></P>
<H3 CLASS="western">Constraints</H3>
<P CLASS="western"><FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Is
there a way that some attributes can be made contingent upon others?</FONT></FONT></FONT>
For example, nouns do not have <CODE CLASS="western"><B>person</B></CODE>,
but verbs and pronouns do. Nouns have <CODE CLASS="western"><B>cases</B></CODE>,
but verbs have <CODE CLASS="western"><B>tense</B></CODE>.</P>
<H3 CLASS="western">Inheritance</H3>
<P CLASS="western">It seems reasonable that <CODE CLASS="western"><B><morpheme></B></CODE>
should inherit all of the default attributes of an element from the
<CODE CLASS="western"><B>osis</B></CODE> namespace. <FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Is
there any reason why <B><morpheme></B> should have the <B>osisID</B>
attribute explicitly set?</FONT></FONT></FONT></P>
<H3 CLASS="western">Data Types</H3>
<P CLASS="western">First impressions suggest that no new data types
need to be derived from those already in place. <FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Would
there be a reason to create new derived types just for linguistic
annotation?</FONT></FONT></FONT></P>
<H3 CLASS="western">Discontinuous Morphemes</H3>
<P CLASS="western">Many languages have morphemes which leap across
spans of morphemes. For example, in Hebrew, the verbal stems are sets
of vowels that are inserted in between root consonants. <FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>How
can these be handled?</FONT></FONT></FONT></P>
<H2 CLASS="western">Top-level Element Summary</H2>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3 STYLE="page-break-inside: avoid">
<COL WIDTH=52*>
<COL WIDTH=204*>
<THEAD>
<TR VALIGN=TOP>
<TH WIDTH=20% BGCOLOR="#ffff99">
<P CLASS="western">Element
</P>
</TH>
<TH WIDTH=80% BGCOLOR="#ffff99">
<P CLASS="western">Description</P>
</TH>
</TR>
</THEAD>
<TBODY>
<TR VALIGN=TOP>
<TD WIDTH=20%>
<P CLASS="western"><CODE CLASS="western"><B><w></B></CODE></P>
</TD>
<TD WIDTH=80%>
<P CLASS="western"><CODE CLASS="western"><B><redefine></B></CODE>
the OSIS <I>word</I> element to include <CODE CLASS="western"><B><morpheme></B></CODE></P>
</TD>
</TR>
<TR VALIGN=TOP>
<TD WIDTH=20% BGCOLOR="#ffff99">
<P CLASS="western"><CODE CLASS="western"><B><morpheme></B></CODE></P>
</TD>
<TD WIDTH=80% BGCOLOR="#ffff99">
<P CLASS="western">This is the primary container for
morphological parsing. <CODE CLASS="western"><B><note></B></CODE>
is allowed inside <CODE CLASS="western"><B><morpheme></B></CODE>.
Required are the text of the morpheme itself (PCDATA) and the <CODE CLASS="western"><lemma></CODE>
and <CODE CLASS="western"><partOfSpeech></CODE> elements.
If more than one <CODE CLASS="western"><partOfSpeech></CODE>
is present, each must be of a different <CODE CLASS="western">type</CODE>.</P>
</TD>
</TR>
</TBODY>
</TABLE>
<H2 CLASS="western"><CODE CLASS="western"><FONT SIZE=5><morpheme></FONT></CODE>
Content Model</H2>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3>
<COL WIDTH=52*>
<COL WIDTH=29*>
<COL WIDTH=41*>
<COL WIDTH=134*>
<THEAD>
<TR VALIGN=TOP>
<TH WIDTH=20% BGCOLOR="#ffff99">
<P CLASS="western">Attribute</P>
</TH>
<TH WIDTH=11% BGCOLOR="#ffff99">
<P CLASS="western">Type</P>
</TH>
<TH WIDTH=16% BGCOLOR="#ffff99">
<P CLASS="western">Values</P>
</TH>
<TH WIDTH=52% BGCOLOR="#ffff99">
<P CLASS="western">Description</P>
</TH>
</TR>
</THEAD>
<TBODY>
<TR>
<TD WIDTH=20%>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B><FONT FACE="Times New Roman, serif">lang</FONT></B></CODE></P>
</TD>
<TD WIDTH=11%>
<P CLASS="western" ALIGN=CENTER>language</P>
</TD>
<TD WIDTH=16%>
<P CLASS="western" ALIGN=CENTER><I>he<BR>arc<BR>el</I></P>
</TD>
<TD WIDTH=52% VALIGN=TOP>
<P CLASS="western">Defaults to the <CODE CLASS="western">xml:<B>lang</B></CODE>
of the instance document. Intended for multi-lingual documents,
such as the Hebrew Bible (Hebrew and Aramaic). <FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Is
this a global OSIS element attribute?</FONT></FONT></FONT> <FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Or
from the <B>xml</B> namespace?</FONT></FONT></FONT> Values come from the ISO
639 language codes (two-letter ISO 639-1 codes where they exist, otherwise three-letter ISO 639-2 codes): <U><FONT COLOR="#000080">http://www.w3.org/WAI/ER/IG/ert/iso639.htm</FONT></U></P>
</TD>
</TR>
<TR>
<TD WIDTH=20% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>wordPart</B></CODE></P>
</TD>
<TD WIDTH=11% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER>integer</P>
</TD>
<TD WIDTH=16% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><I><EM>1...∞</EM></I></P>
</TD>
<TD WIDTH=52% VALIGN=TOP BGCOLOR="#ffff99">
<P CLASS="western">The position of the morpheme within the word.
If the morpheme and word are co-extensive, then the value is “1”.
Unbounded.</P>
</TD>
</TR>
<TR>
<TD WIDTH=20%>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>*kqtype</B></CODE></P>
</TD>
<TD WIDTH=11%>
<P CLASS="western" ALIGN=CENTER>enumerated</P>
</TD>
<TD WIDTH=16%>
<P CLASS="western" ALIGN=CENTER STYLE="font-weight: medium"><I>neither
(0)<BR>ketiv (1)<BR>qere (2)</I></P>
</TD>
<TD WIDTH=52% VALIGN=TOP>
<P CLASS="western" STYLE="font-weight: medium">The <I>ketiv-qere</I>
“what is written; what is read” is a scribal
“marginal” note to correct the reading of the text.
As such, it is unique to Hebrew Bible manuscripts. Default is
<I>neither</I><SPAN STYLE="font-style: normal">.</SPAN></P>
<P CLASS="western" STYLE="font-weight: medium">When medieval
Jewish scribes recognized what was to them an obvious “error”
in the main biblical text, they had a problem: the text is sacred
and may not be changed. So they wrote the corrected
consonants in the margin, and the vowels in the main line of the
text are those that match the consonants in the margin. The
consonants in the main column of the text are called the “<I>ketiv</I>”
or “what is written”; the consonants in the margin
combined with the vowels written with the <I>ketiv</I> are called
the <I>qere</I> or “what is read”.</P>
</TD>
</TR>
</TBODY>
</TABLE>
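<P CLASS="western">The attribute table above can be exercised with a small
check, sketched here in Python. It assumes that <CODE CLASS="western">wordPart</CODE>
counts morphemes from 1 in document order and that the Hebrew-specific
attribute is written <CODE CLASS="western">kqtype</CODE> with the three
names listed as its values; both are readings of the table, not settled
decisions, and <CODE CLASS="western">check_morpheme_attributes</CODE> is a
hypothetical helper name.</P>
<PRE>
```python
import xml.etree.ElementTree as ET

# Enumerated values for kqtype as listed in the attribute table.
KQ_TYPES = {"neither", "ketiv", "qere"}


def check_morpheme_attributes(word):
    """Return a list of problems found in the <morpheme> attributes of one <w>."""
    problems = []
    for position, morpheme in enumerate(word.findall("morpheme"), start=1):
        # Assumption: wordPart counts morphemes from 1 in document order.
        part = morpheme.get("wordPart")
        if part != str(position):
            problems.append(f"morpheme {position}: wordPart is {part!r}")
        # kqtype defaults to "neither" when the attribute is absent.
        kq = morpheme.get("kqtype", "neither")
        if kq not in KQ_TYPES:
            problems.append(f"morpheme {position}: unknown kqtype {kq!r}")
    return problems


GOOD = ET.fromstring(
    '<w><morpheme wordPart="1"/><morpheme wordPart="2" kqtype="qere"/></w>'
)
BAD = ET.fromstring(
    '<w><morpheme wordPart="2"/><morpheme wordPart="2" kqtype="margin"/></w>'
)
```
</PRE>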
<P CLASS="western"><BR><BR>
</P>
<TABLE WIDTH=990 BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3>
<COL WIDTH=125>
<COL WIDTH=132>
<COL WIDTH=211>
<COL WIDTH=473>
<THEAD>
<TR>
<TH WIDTH=125 BGCOLOR="#ffff99">
<P CLASS="western">Child Element</P>
</TH>
<TD COLSPAN=3 WIDTH=838 VALIGN=TOP BGCOLOR="#ffff99">
<P CLASS="western">Alternative: each part of speech is its own
element. <FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Should
all the parsings be “containerized?” Or should
<morpheme> have <CODE CLASS="western"><lemma></CODE>,
plus one of <CODE CLASS="western"><noun></CODE>, <CODE CLASS="western"><verb></CODE>,
<CODE CLASS="western"><adjective></CODE>, etc.?</FONT></FONT></FONT></P>
</TD>
</TR>
</THEAD>
<TBODY>
<TR>
<TD ROWSPAN=2 WIDTH=125 BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>partOfSpeech</B></CODE></P>
</TD>
<TH WIDTH=132 VALIGN=TOP>
<P CLASS="western" ALIGN=CENTER>Attributes</P>
</TH>
<TH WIDTH=211 VALIGN=TOP>
<P CLASS="western">Values</P>
</TH>
<TH WIDTH=473 VALIGN=TOP>
<P CLASS="western">Description</P>
</TH>
</TR>
<TR>
<TD WIDTH=132 BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
</TD>
<TD WIDTH=211 BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><EM>formal<BR>base<BR>alternate</EM></P>
<P CLASS="western" ALIGN=CENTER><EM>wordLevel<BR>phraseLevel<BR>clauseLevel</EM></P>
<P CLASS="western" ALIGN=CENTER><EM>contextFree<BR>contextBound</EM></P>
</TD>
<TD WIDTH=473 VALIGN=TOP BGCOLOR="#ffff99">
<P CLASS="western">“Part of speech” is a slippery
concept, apt to change substantially in meaning from language to
language and from one linguistic-theoretical camp to another.
For example, there is no inflectional category for adverbs in
biblical Hebrew, but there are lexical adverbs.</P>
<P CLASS="western">One may take many different perspectives in
analyzing a morpheme. One can take a purely formalist approach;
one can view how the morpheme is used relative to another
morpheme or set of morphemes; how the morpheme relates to the
verb, or across clause boundaries (e. g., pronoun antecedents).
This is not always the choice of the analyst: languages often
require a particular perspective by the very inflectional
category distribution itself. The default value is <I>formal</I>.</P>
<P CLASS="western">The <CODE CLASS="western">type</CODE> attribute is
used when the annotator wishes to indicate the type
of analysis: alternate, context-bound, context-free,
phrase-level, clause-level. It defaults to formal, i.e., the basic,
context-free analysis. <CODE CLASS="western"><morpheme></CODE>
may contain more than one <CODE CLASS="western"><partOfSpeech></CODE>.
In this case, each <CODE CLASS="western">type</CODE>
attribute must be unique. These are alternative parsings. The
user may specify a “base” parsing (based upon the
form of the morpheme) and additional parsings (based upon
contextual usage).</P>
<P CLASS="western">One and only one of <CODE CLASS="western"><noun></CODE>,
<CODE CLASS="western"><verb></CODE>, or <CODE CLASS="western"><particle></CODE>
is <B>required</B><SPAN STYLE="font-weight: medium">. </SPAN>
</P>
</TD>
</TR>
<TR>
<TD WIDTH=125>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>lemma</B></CODE></P>
</TD>
<TD WIDTH=132>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>homographNumber</B></CODE></P>
</TD>
<TD WIDTH=211>
<P CLASS="western" ALIGN=CENTER><I><EM>0...∞</EM></I></P>
</TD>
<TD WIDTH=473 VALIGN=TOP>
<P CLASS="western">The “dictionary” or “base”
form of the morpheme. Older philological terminology: “root”
or “stem”. Homographs are forms which are spelled the
same but have more than one (unrelated) meaning, or have
differing etymology. The default is “0”, i.e., no
homograph: the form is unique. There is no default, and the value
can be <I>empty</I>. More than one <CODE CLASS="western"><lemma></CODE> may be specified
as alternative derivations.</P>
<P CLASS="western">The content of <CODE CLASS="western"><lemma></CODE>
is PCDATA.
</P>
<P CLASS="western"><BR>
</P>
</TD>
</TR>
</TBODY>
</TABLE>
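<P CLASS="western">The rule that every <CODE CLASS="western"><partOfSpeech></CODE>
inside one <CODE CLASS="western"><morpheme></CODE> must carry a
different <CODE CLASS="western">type</CODE> can be checked mechanically.
The following Python sketch treats a missing attribute as the default
<I>formal</I>; the function name <CODE CLASS="western">duplicate_analysis_types</CODE>
and the sample fragments are hypothetical.</P>
<PRE>
```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_analysis_types(morpheme):
    """Return the partOfSpeech type values that occur more than once, sorted."""
    # An absent type attribute counts as the default, "formal".
    types = [pos.get("type", "formal") for pos in morpheme.findall("partOfSpeech")]
    return sorted(t for t, n in Counter(types).items() if n > 1)


# Two alternative parsings with distinct types: acceptable.
DISTINCT = ET.fromstring(
    '<morpheme><partOfSpeech type="base"/><partOfSpeech type="contextBound"/></morpheme>'
)
# An implicit "formal" next to an explicit "formal": a clash.
CLASH = ET.fromstring(
    '<morpheme><partOfSpeech/><partOfSpeech type="formal"/></morpheme>'
)
```
</PRE>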
<P CLASS="western"><BR><BR>
</P>
<H2 CLASS="western"><CODE CLASS="western"><FONT SIZE=5><partOfSpeech></FONT></CODE>
Content Model</H2>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3>
<COL WIDTH=52*>
<COL WIDTH=29*>
<COL WIDTH=33*>
<COL WIDTH=142*>
<THEAD>
<TR VALIGN=TOP>
<TH WIDTH=20% BGCOLOR="#ffff99">
<P CLASS="western">Child Element</P>
</TH>
<TH WIDTH=11% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER>Attributes</P>
</TH>
<TH WIDTH=13% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER>Values</P>
</TH>
<TH WIDTH=56% BGCOLOR="#ffff99">
<P CLASS="western">Description</P>
</TH>
</TR>
</THEAD>
<TBODY>
<TR>
<TD WIDTH=20%>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">noun</CODE></P>
</TD>
<TH WIDTH=11%>
<P CLASS="western" ALIGN=CENTER>type</P>
</TH>
<TD WIDTH=13%>
<P CLASS="western" ALIGN=CENTER><EM>commonNoun<BR>properNoun<BR>adjective<BR>pronoun</EM></P>
</TD>
<TD WIDTH=56% VALIGN=TOP>
<P CLASS="western">If <CODE CLASS="western">type = “commonNoun”</CODE>
or “<CODE CLASS="western">adjective”</CODE>, then
<CODE CLASS="western"><gender></CODE>, <CODE CLASS="western"><number></CODE>
and <CODE CLASS="western"><state></CODE> are <B>required</B>.<BR>If
<CODE CLASS="western">type = “properNoun”</CODE>,
then <CODE CLASS="western"><gender></CODE>, <CODE CLASS="western"><number></CODE>
and <CODE CLASS="western"><state></CODE> are <B>optional</B>.<BR>If
<CODE CLASS="western">type = “pronoun”</CODE>, then
<CODE CLASS="western"><gender></CODE>, <CODE CLASS="western"><number></CODE>
and <CODE CLASS="western"><person></CODE> are <B>required</B>.</P>
</TD>
</TR>
<TR>
<TD WIDTH=20% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">verb</CODE></P>
</TD>
<TH WIDTH=11% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER>type</P>
</TH>
<TD WIDTH=13% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><EM>finiteVerb<BR>participle<BR>infinitive</EM></P>
</TD>
<TD WIDTH=56% VALIGN=TOP BGCOLOR="#ffff99">
<P CLASS="western">If <CODE CLASS="western">type
= “finiteVerb”</CODE>, then <CODE CLASS="western"><stem></CODE>,
<CODE CLASS="western"><conjugation></CODE>, <CODE CLASS="western"><gender></CODE>,
<CODE CLASS="western"><number></CODE> and <CODE CLASS="western"><person></CODE> are
<B>required</B> and <CODE CLASS="western"><suffix type =
“verbal”></CODE> is <B>optional</B>.<BR>If <CODE CLASS="western">type
= “participle”</CODE>, then <CODE CLASS="western"><stem></CODE>,
<CODE CLASS="western"><gender></CODE>, <CODE CLASS="western"><number></CODE> and
<CODE CLASS="western"><state></CODE> are <B>required</B>
and <CODE CLASS="western"><suffix></CODE> is <B>optional</B>.<BR>If
<CODE CLASS="western">type =
“infinitive”</CODE>, then <CODE CLASS="western"><stem></CODE>
and <CODE CLASS="western"><state></CODE> are <B>required</B>,
and <CODE CLASS="western"><suffix></CODE> is <B>optional</B>.</P>
</TD>
</TR>
<TR>
<TD WIDTH=20%>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">particle</CODE></P>
</TD>
<TH WIDTH=11%>
<P CLASS="western" ALIGN=CENTER>type</P>
</TH>
<TD WIDTH=13%>
<P CLASS="western" ALIGN=CENTER><EM>adverb<BR>preposition<BR>definiteArticle<BR>interrogative<BR>negative</EM></P>
</TD>
<TD WIDTH=56% VALIGN=TOP>
<P CLASS="western">If <CODE CLASS="western">type
= “adverb”</CODE>, then no other content is
allowed.<BR>If <CODE CLASS="western">type
= “preposition”</CODE>, then <CODE CLASS="western"><suffix></CODE>
is <B>optional</B>.<BR>If <CODE CLASS="western">type
= “definiteArticle”</CODE>, then no other content is
allowed.<BR>If <CODE CLASS="western">type
= “interrogative”</CODE>, then no other content is
allowed.<BR>If <CODE CLASS="western">type
= “negative”</CODE>, then no other content is
allowed.</P>
</TD>
</TR>
</TBODY>
</TABLE>
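<P CLASS="western">The required and optional children in the table above
depend on the value of <CODE CLASS="western">type</CODE>, which is the kind
of contingent constraint raised under “Constraints”. If the schema
itself turns out not to express such conditions, one option is a check run
after validation. The Python sketch below transcribes the table into a
lookup; the names <CODE CLASS="western">RULES</CODE> and
<CODE CLASS="western">check_category</CODE> are hypothetical, and the
sample fragments are illustrative only.</P>
<PRE>
```python
import xml.etree.ElementTree as ET

# (category element, type) -> (required children, optional children),
# transcribed from the <partOfSpeech> content model table.
RULES = {
    ("noun", "commonNoun"): ({"gender", "number", "state"}, set()),
    ("noun", "adjective"): ({"gender", "number", "state"}, set()),
    ("noun", "properNoun"): (set(), {"gender", "number", "state"}),
    ("noun", "pronoun"): ({"gender", "number", "person"}, set()),
    ("verb", "finiteVerb"): (
        {"stem", "conjugation", "gender", "number", "person"},
        {"suffix"},
    ),
    ("verb", "participle"): ({"stem", "gender", "number", "state"}, {"suffix"}),
    ("verb", "infinitive"): ({"stem", "state"}, {"suffix"}),
    ("particle", "adverb"): (set(), set()),
    ("particle", "preposition"): (set(), {"suffix"}),
    ("particle", "definiteArticle"): (set(), set()),
    ("particle", "interrogative"): (set(), set()),
    ("particle", "negative"): (set(), set()),
}


def check_category(category):
    """Return a list of problems for one <noun>, <verb> or <particle> element."""
    key = (category.tag, category.get("type"))
    if key not in RULES:
        return [f"unknown category {category.tag!r} with type {category.get('type')!r}"]
    required, optional = RULES[key]
    present = {child.tag for child in category}
    problems = [f"missing <{name}>" for name in sorted(required - present)]
    problems += [
        f"unexpected <{name}>" for name in sorted(present - required - optional)
    ]
    return problems


PRONOUN = ET.fromstring(
    '<noun type="pronoun"><gender type="masculine"/><number type="singular"/></noun>'
)
ADVERB = ET.fromstring('<particle type="adverb"><suffix type="pronominal"/></particle>')
INFINITIVE = ET.fromstring(
    '<verb type="infinitive"><stem type="qal"/><state type="construct"/></verb>'
)
```
</PRE>
<P CLASS="western">Keeping the rules in a table rather than in code means a
language declaration could, in principle, supply a different table for each
language.</P>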
<P CLASS="western"><BR><BR>
</P>
<H2 CLASS="western">Elements required or optional in <CODE CLASS="western"><FONT SIZE=5><noun></FONT></CODE>,
<CODE CLASS="western"><FONT SIZE=5><verb></FONT></CODE>, or
<CODE CLASS="western"><FONT SIZE=5><particle></FONT></CODE></H2>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3>
<COL WIDTH=52*>
<COL WIDTH=34*>
<COL WIDTH=39*>
<COL WIDTH=131*>
<THEAD>
<TR VALIGN=TOP>
<TH WIDTH=20% BGCOLOR="#ffff99">
<P CLASS="western">Child Element</P>
</TH>
<TH WIDTH=13% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER>Attributes</P>
</TH>
<TH WIDTH=15% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER>Values</P>
</TH>
<TH WIDTH=51% BGCOLOR="#ffff99">
<P CLASS="western">Description</P>
</TH>
</TR>
</THEAD>
<TBODY>
<TR>
<TD WIDTH=20%>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>person</B></CODE></P>
</TD>
<TD WIDTH=13%>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">ordinal</CODE></P>
</TD>
<TD WIDTH=15%>
<P CLASS="western" ALIGN=CENTER><I>1, 2, 3</I></P>
</TD>
<TD WIDTH=51% VALIGN=TOP>
<P CLASS="western">Found in <CODE CLASS="western"><noun></CODE>,
<CODE CLASS="western"><verb></CODE>, <CODE CLASS="western"><pronoun></CODE>
and <CODE CLASS="western"><suffix></CODE>. Milestone: no
content.</P>
</TD>
</TR>
<TR>
<TD WIDTH=20% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>gender</B></CODE></P>
</TD>
<TD WIDTH=13% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
</TD>
<TD WIDTH=15% VALIGN=TOP BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER STYLE="font-weight: medium"><I>masculine<BR>feminine<BR>neuter<BR>common</I></P>
</TD>
<TD WIDTH=51% VALIGN=TOP BGCOLOR="#ffff99">
<P CLASS="western">Hebrew and Aramaic do not have a <I>neuter</I>,
but Greek does. Gender in Hebrew is an unresolved anomaly. Some
nouns seem to be used both as masculine and feminine;
verb-subject agreement is often violated. Found in <CODE CLASS="western"><noun></CODE>,
<CODE CLASS="western"><verb></CODE>, <CODE CLASS="western"><pronoun></CODE>
and <CODE CLASS="western"><suffix></CODE>. Milestone: no
content.</P>
<P CLASS="western">Gender is very language-specific. In Hebrew,
there is no neuter, and many nouns are treated ambiguously. Some
languages, such as Hungarian, do not inflect for gender at all.
<FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Do
we distinguish between <I>lexical</I> and <I>formal</I>
(inflected) gender?</FONT></FONT></FONT></P>
</TD>
</TR>
<TR>
<TD WIDTH=20%>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>number</B></CODE></P>
</TD>
<TD WIDTH=13%>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
</TD>
<TD WIDTH=15%>
<P CLASS="western" ALIGN=CENTER STYLE="font-weight: medium"><I>singular<BR>dual<BR>plural</I></P>
</TD>
<TD WIDTH=51% VALIGN=TOP>
<P CLASS="western">Of the biblical languages, Greek does not have
a dual. Found in <CODE CLASS="western"><noun></CODE>,
<CODE CLASS="western"><verb></CODE>, <CODE CLASS="western"><pronoun></CODE>
and <CODE CLASS="western"><suffix></CODE>. Milestone: no
content.</P>
<P CLASS="western">This covers most language use. “One”
and “many” seem to be the primary distinction, but
some cultures will have special forms to meet special needs. One
example here: the Semitic languages have a special <I>dual</I>
form for objects which are natural pairs – hands, eyes,
etc.</P>
</TD>
</TR>
<TR>
<TD WIDTH=20% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>*state</B></CODE></P>
</TD>
<TD WIDTH=13% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
</TD>
<TD WIDTH=15% VALIGN=TOP BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER STYLE="font-weight: medium"><I>absolute<BR>construct</I></P>
</TD>
<TD WIDTH=51% VALIGN=TOP BGCOLOR="#ffff99">
<P CLASS="western" STYLE="font-weight: medium">Specific to Hebrew,
Aramaic and other Semitic languages.</P>
<P CLASS="western" STYLE="font-weight: medium">State has to do
with the intonation of the noun. In the <I>absolute</I> state,
the accent usually occurs on the last syllable. In the <I>construct</I>
state, the accent shifts forward, and long vowels usually shorten
as much as possible. Semantically, the <I>construct</I> form
marks the “genitive” or “possessive”, and
can also have an adjectival function, e. g., “king of
righteousness” == “righteous king”.</P>
</TD>
</TR>
<TR>
<TD WIDTH=20%>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">*stem</FONT></B></CODE></P>
</TD>
<TD WIDTH=13%>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
</TD>
<TD WIDTH=15%>
<P CLASS="western" ALIGN=CENTER STYLE="font-weight: medium"><I><EM>qal<BR>qal
passive<BR>piel<BR>pual<BR><SPAN STYLE="font-weight: medium">hiphil<BR>hophal<BR></SPAN>niphal<BR>hitpael<BR>palel<BR>pealal<BR>pilel<BR>pilpel<BR>polel<BR>poel<BR>tiphil<BR>polal<BR>polpal<BR>pulal<BR>poal<BR>hotpaal<BR>hitpolel<BR>pitpalpel<BR>hishtaphel<BR>nitpael</EM></I></P>
</TD>
<TD WIDTH=51% VALIGN=TOP>
<P CLASS="western" STYLE="font-weight: medium">More precisely,
these are verbal patterns: vocalic insertions into the
tri-radical verbal root consonants, modifying the basic lexical
meaning in some consistent way.</P>
<P CLASS="western" STYLE="font-weight: medium">This is an example
of a discontinuous morpheme: the stem is determined by the vowels
that are inserted between the root consonants.<FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>
How should discontinuous morphemes be represented in markup? Is
this an example of overlapping hierarchies?</FONT></FONT></FONT></P>
</TD>
</TR>
<TR>
<TD WIDTH=20% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>conjugation</B></CODE></P>
</TD>
<TD WIDTH=13% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
</TD>
<TD WIDTH=15% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><EM>perfect<BR>imperfect<BR>imperative<BR>jussive<BR>participle<BR>infinitiveAbsolute<BR>infinitiveConstruct</EM></P>
</TD>
<TD WIDTH=51% VALIGN=TOP BGCOLOR="#ffff99">
<P CLASS="western">In Hebrew, these are the <I>inflectional</I>
sets for verbs; each language is going to have its own set of
values. Conjugations sometimes mark verbal aspect, other times
tense or a combination of the two.</P>
<P CLASS="western">For Hebrew and Aramaic, the verbal
inflectional sets mark different verbal aspects. For Greek,
tenses and aspects are combined for the various paradigms; so
this list would not be adequate for Greek NT markup.</P>
</TD>
</TR>
<TR>
<TD WIDTH=20%>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>tense</B></CODE></P>
</TD>
<TD WIDTH=13%>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
</TD>
<TD WIDTH=15%>
<P CLASS="western" ALIGN=CENTER><EM>past<BR>present<BR>future</EM></P>
</TD>
<TD WIDTH=51% VALIGN=TOP>
<P CLASS="western">In some languages this category is marked by
inflection; in other languages by modal or auxiliary verbs or
words; in still others, time is contextually marked, i. e., is a
discourse-level phenomenon. This latter is true for Hebrew.</P>
<P CLASS="western">Time is often combined with kind of action in
verbs. What is listed here is “pure” time, and
nothing else. This simple list is hardly exhaustive: one can
enumerate many different kinds of time, depending upon where one
stands on the timeline.</P>
</TD>
</TR>
<TR>
<TD ROWSPAN=2 WIDTH=20% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">suffix</CODE></P>
</TD>
<TD WIDTH=13% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
</TD>
<TD WIDTH=15% BGCOLOR="#ffff99">
<P CLASS="western" ALIGN=CENTER><I>apocopated<BR>paragogicNun<BR>paragogicHe<BR>directionalHe<BR>pronominal<BR></I><BR>
</P>
</TD>
<TD WIDTH=51% VALIGN=TOP BGCOLOR="#ffff99">
<P CLASS="western"><SPAN STYLE="font-style: normal">The value “apocopated”
probably doesn’t belong here.<BR><BR><BR><BR>Pronominal suffixes are of two types:
those attached to nouns and those attached to verbs.</SPAN></P>
</TD>
</TR>
<TR>
<TD WIDTH=13%>
<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">pronominalType</CODE></P>
</TD>
<TD WIDTH=15%>
<P CLASS="western" ALIGN=CENTER><I>nominal<BR>verbal</I></P>
</TD>
<TD WIDTH=51% VALIGN=TOP>
<P CLASS="western">The <I>nominal</I><SPAN STYLE="font-style: normal">
and </SPAN><I>verbal</I><SPAN STYLE="font-style: normal">
suffixes are separate paradigms in Hebrew, with morphophonemic
changes at the boundary.</SPAN></P>
</TD>
</TR>
</TBODY>
</TABLE>
<H2 CLASS="western">To Do</H2>
<UL>
<LI><P CLASS="western">Add the grammatical categories for Aramaic
and Greek.</P>
<LI><P CLASS="western">Enrich the annotation scheme.</P>
<LI><P CLASS="western">Abstract a “universal” language
declaration: those declarations that all languages will need.</P>
<LI><P CLASS="western">Create language declarations for Hebrew,
Greek, Aramaic, English and the other major European languages.</P>
<LI><P CLASS="western">Resolve issues of how to modularize and
invoke the OSIS Linguistic Annotation module along with the
concomitant language declarations.</P>
<LI><P CLASS="western">Create simple markup examples that use
real-world text.</P>
</UL>
<DIV ID="sdfootnote1">
<P CLASS="sdfootnote-western" STYLE="margin-bottom: 0.2in"><A CLASS="sdfootnotesym" NAME="sdfootnote1sym" HREF="#sdfootnote1anc">1</A> For
this document, I am using full names for elements and attributes.
For the actual implementation, shorter abbreviations ought to be
assigned to the most common element names.</P>
</DIV>
</BODY>
</HTML>