[osis-core] Linguistic Annotation Design Document - next iteration

Tue, 23 Dec 2003 19:00:54 -0500

Attached is a revised version, in which I've tried to address the 
comments I received for the first draft. Feel free to take any of these 
issues and make it a new thread.

Changes:

1. This design reflects Todd's recommendation of a deep rather than a 
flat model of the annotation. I had originally considered the 
hierarchical approach and concede all the advantages Todd put forth. 
Steve and I were hoping to keep the markup relatively simple, but the 
loss of those aforementioned benefits was too great.

2. Because this is an evolving *design* document, I have not paid 
much attention to performance and consistency issues. For this 
document (and later documentation) I've chosen, for the sake of clarity, 
not to use the abbreviations or shorter names that Chris recommended. Any 
actual schema must use shorter names, especially for the most frequent 
elements. I'm also concerned about namespace clashes: should we declare a 
namespace for the module?

3. For consistency with the core tag set, Chris recommended the 
"CamelCase" naming convention. I agree, and have changed the names in 
the document.

4. Scope of this proposal: Chris pointed out that certain analytical 
categories are missing (e.g., derivational morphology). The problem gets 
worse: missing are transformational labels, and -- my personal favorite 
-- the attribute-value pairs of unification-class grammars. And still 
more are needed by the various camps of linguistic theory. There's no way 
that we can anticipate the needs of every linguistic annotator. 
There *must* be a procedure whereby users can redefine elements and 
add or remove elements to suit not only their language but also their 
conceptual framework.

5. Then there are the authority lists for linguistic labels. So far as I 
know, the EAGLES list is the only one out there. There is ISO/TC 37/SC 4 
"Language Resource Management" (http://www.tc37sc4.org/), and they've 
just had their first meeting on Linguistic Annotation in November. 
They're looking at some sort of TEI feature structure approach (ugh!). 
They aren't going to have anything of any kind very soon.

6. Roadmap: some of Chris' comments have to do with when, where and what 
we release. Here are the broad strokes: create a module that will allow 
us to annotate the original text of the Bible with classic inflectional 
morphology. Then, invite clueful individuals (I'm thinking linguists and 
translators) to look over the annotated selections and tell us what is 
missing, what needs different handling, etc. Then we abstract the whole 
procedure into a "language declaration file" which an XSLT or something 
can use to generate a language-specific annotation module. As for any 
"public" release, that's for the OSIS TC to say. But I don't think that 
"1.0" should be released until the system can be applied to 
(theoretically) any language.

Chris raised some other issues, but they should probably be dealt with 
separately. My short-term goal has been to get *something* concrete, 
which can evolve into something more generally useful. That's why I 
began with Hebrew, since I have data I need to get into some sort of XML 
format. I'm aware that there are many "hebraicisms" that need 
generalization...

Comments, please.

Blessings,

Kirk
-- 
Kirk E. Lowery, Ph.D.
Director, Westminster Hebrew Institute
Adjunct Professor of Old Testament
Westminster Theological Seminary, Philadelphia

Theorie ist, wenn man alles weiss und nichts klappt.
Praxis ist, wenn alles klappt und keiner weiss warum.
Bei uns sind Theorie und Praxis vereint:
nichts klappt und keiner weiss warum!

[Attachment: osisLAdesign.html]

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
	<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=windows-1252">
	<TITLE>Schema Design for OSIS Linguistic Annotation</TITLE>
	<META NAME="GENERATOR" CONTENT="OpenOffice.org 1.1.0  (Win32)">
	<META NAME="AUTHOR" CONTENT="Kirk Lowery">
	<META NAME="CREATED" CONTENT="20031102;9350813">
	<META NAME="CHANGEDBY" CONTENT="Kirk Lowery">
	<META NAME="CHANGED" CONTENT="20031223;18580001">
	<STYLE>
	<!--
		@page { size: 8.5in 11in }
		TD P.western { font-family: "Verdana", sans-serif; font-size: 10pt }
		H1.western { font-family: "Verdana", sans-serif; font-size: 20pt }
		P.western { font-family: "Verdana", sans-serif; font-size: 10pt }
		H3.western { font-family: "Verdana", sans-serif; font-size: 12pt }
		H2.western { font-family: "Verdana", sans-serif; font-size: 16pt }
		P.sdfootnote-western { margin-left: 0.2in; text-indent: -0.2in; margin-bottom: 0in; font-family: "Verdana", sans-serif; font-size: 8pt }
		P.sdfootnote-cjk { margin-left: 0.2in; text-indent: -0.2in; margin-bottom: 0in; font-size: 10pt }
		P.sdfootnote-ctl { margin-left: 0.2in; text-indent: -0.2in; margin-bottom: 0in; font-size: 10pt }
		TH P.western { font-family: "Verdana", sans-serif; font-size: 10pt }
		TT.western { font-size: 10pt }
		CODE.western { font-family: "Courier New", monospace; font-size: 10pt; font-weight: bold }
		A.sdfootnoteanc { font-size: 57% }
	-->
	</STYLE>
</HEAD>
<BODY LANG="en-US" BGCOLOR="#ffffcc" DIR="LTR">
<H1 CLASS="western" ALIGN=CENTER>Schema Design for OSIS Linguistic
Annotation</H1>
<H3 CLASS="western">by Kirk Lowery and Steve DeRose<BR>OSIS Technical
Committee</H3>
<CENTER>
	<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3 STYLE="page-break-inside: avoid">
		<COL WIDTH=39*>
		<COL WIDTH=46*>
		<COL WIDTH=171*>
		<THEAD>
			<TR VALIGN=TOP>
				<TH WIDTH=15%>
					<P CLASS="western">Revision</P>
				</TH>
				<TH WIDTH=18%>
					<P CLASS="western">Date</P>
				</TH>
				<TH WIDTH=67%>
					<P CLASS="western">Comments</P>
				</TH>
			</TR>
		</THEAD>
		<TBODY>
			<TR>
				<TD WIDTH=15% BGCOLOR="#ffff99" SDVAL="0.3" SDNUM="1033;">
					<P CLASS="western" ALIGN=CENTER><SPAN STYLE="background: transparent">0.3</SPAN></P>
				</TD>
				<TD WIDTH=18% BGCOLOR="#ffff99">
					<P CLASS="western" ALIGN=CENTER><SPAN STYLE="background: transparent">12/17/2003
					16:22:58</SPAN></P>
				</TD>
				<TD WIDTH=67% VALIGN=TOP BGCOLOR="#ffff99">
					<P CLASS="western"><SPAN STYLE="background: transparent">Changed
					<CODE CLASS="western">&lt;morpheme&gt;</CODE> content model from
					flat to deep (hierarchical).</SPAN></P>
				</TD>
			</TR>
			<TR>
				<TD WIDTH=15% SDVAL="0.2" SDNUM="1033;">
					<P CLASS="western" ALIGN=CENTER><SPAN STYLE="background: transparent">0.2</SPAN></P>
				</TD>
				<TD WIDTH=18%>
					<P CLASS="western" ALIGN=CENTER><BR>
					</P>
				</TD>
				<TD WIDTH=67% VALIGN=TOP>
					<P CLASS="western"><SPAN STYLE="background: transparent">Corrected
					</SPAN><CODE CLASS="western"><SPAN STYLE="background: transparent">lang</SPAN></CODE><SPAN STYLE="background: transparent">
					codes to the ISO 639-2 standard.</SPAN></P>
				</TD>
			</TR>
			<TR>
				<TD WIDTH=15% BGCOLOR="#ffff99" SDVAL="0.1" SDNUM="1033;">
					<P CLASS="western" ALIGN=CENTER><SPAN STYLE="background: transparent">0.1</SPAN></P>
				</TD>
				<TD WIDTH=18% BGCOLOR="#ffff99">
					<P CLASS="western" ALIGN=CENTER><SPAN STYLE="background: transparent">11/02/2003
					10:16:23</SPAN></P>
				</TD>
				<TD WIDTH=67% VALIGN=TOP BGCOLOR="#ffff99">
					<P CLASS="western"><SPAN STYLE="background: transparent">Original
					draft.</SPAN></P>
				</TD>
			</TR>
		</TBODY>
	</TABLE>
</CENTER>
<H2 CLASS="western">Introduction</H2>
<P CLASS="western">The OSIS Linguistic Annotation schema
(<TT CLASS="western"><B>osisLA.x.x.xsd</B></TT>) defines the
elements, attributes and their relationships for linguistic
annotation of an OSIS-compliant document. The schema is an extension
&ndash; not a replacement &ndash; of the OSIS Core schema, so an
instance document should still be a valid OSIS document. The present
proposal assumes inline markup, since we do not expect anyone to be
doing stand-off markup in the near future, given the current
state of software. The goal for version 1.0 is a system
adequate for marking up the Bible in its original languages at the
morphological level of analysis.</P>
<H2 CLASS="western">Basic Concepts</H2>
<P CLASS="western">Philosophically, we view an arbitrary span or
segment of the text stream (i. e., the biblical text or the text to
be annotated) as the element, and the annotation (including
parsing) as child nodes of that element. The first issue is the
granularity of segmentation: what unit do we wish to
annotate? Since this first phase is focused upon morphology, we
choose the &ldquo;morpheme&rdquo; as the unit of text to
annotate. The <CODE CLASS="western"><B>&lt;w&gt;</B></CODE>
element is redefined to contain one or more <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE>
elements. <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE> is the
only new top-level element to be added. It has a complex content
model: its immediate children are <CODE CLASS="western">&lt;lemma&gt;</CODE>
and <CODE CLASS="western">&lt;partOfSpeech&gt;</CODE>.<A CLASS="sdfootnoteanc" NAME="sdfootnote1anc" HREF="#sdfootnote1sym"><SUP>1</SUP></A>
Most of the classic parsing categories are child elements of <CODE CLASS="western">&lt;partOfSpeech&gt;</CODE>.</P>
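<P CLASS="western">The nesting described above can be sketched as
follows. This is only an illustration, not the schema itself: it
uses the element names from this document without any namespace,
assumes that the morpheme text precedes its annotation children, and
uses a made-up ASCII transliteration (&ldquo;ha&rdquo; +
&ldquo;melek&rdquo;) in place of real Hebrew text. The helper name
<CODE CLASS="western">build_word</CODE> is hypothetical.</P>
<PRE>
```python
import xml.etree.ElementTree as ET


def build_word(parts):
    """Build a w element holding one morpheme per (text, lemma, category) tuple."""
    word = ET.Element("w")
    for position, (text, lemma, category) in enumerate(parts, start=1):
        morpheme = ET.SubElement(word, "morpheme", wordPart=str(position))
        # Assumption: the morpheme's own text comes first, then its annotation.
        morpheme.text = text
        ET.SubElement(morpheme, "lemma").text = lemma
        part_of_speech = ET.SubElement(morpheme, "partOfSpeech", type="formal")
        # The classic parsing hangs below partOfSpeech as noun, verb or particle.
        ET.SubElement(part_of_speech, category)
    return word


# Illustrative transliteration only: article "ha" followed by the noun "melek".
word = build_word([("ha", "ha", "particle"), ("melek", "melek", "noun")])
print(ET.tostring(word, encoding="unicode"))
```
</PRE>
<P CLASS="western">Printing the result shows one
<CODE CLASS="western">&lt;w&gt;</CODE> with two
<CODE CLASS="western">&lt;morpheme&gt;</CODE> children, each carrying
its own <CODE CLASS="western">&lt;lemma&gt;</CODE> and
<CODE CLASS="western">&lt;partOfSpeech&gt;</CODE>.</P>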
<P CLASS="western">The schema will attempt to include everything that
annotation of any language will need. Of course, each individual
language will have its own unique characteristics. These
characteristics will be captured by the language declaration
document. In the beginning, the schema will contain all that is
needed for Hebrew, Aramaic and Greek annotation. From there, later
revisions will begin the process of abstraction for language
universals.</P>
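<P CLASS="western">One way to picture a language declaration is as a
table of exclusions applied to the full value lists. The sketch below
is an assumption about what such a declaration might hold, not a
proposed file format; it encodes only two facts stated in this
document (Hebrew and Aramaic have no neuter, Greek has no dual), and
the names <CODE CLASS="western">ALL_VALUES</CODE>,
<CODE CLASS="western">EXCLUDED</CODE> and
<CODE CLASS="western">allowed_values</CODE> are hypothetical.</P>
<PRE>
```python
# Full value lists for two of the categories in this document.
ALL_VALUES = {
    "gender": {"masculine", "feminine", "neuter", "common"},
    "number": {"singular", "dual", "plural"},
}

# Hypothetical per-language declaration: values a language does not use.
EXCLUDED = {
    "he": {"gender": {"neuter"}},
    "arc": {"gender": {"neuter"}},
    "el": {"number": {"dual"}},
}


def allowed_values(lang, category):
    """Return the values of a category that a language may use."""
    excluded = EXCLUDED.get(lang, {}).get(category, set())
    return ALL_VALUES[category] - excluded


def is_allowed(lang, category, value):
    """Check one annotation value against the language declaration."""
    return value in allowed_values(lang, category)


print(sorted(allowed_values("he", "gender")))
print(sorted(allowed_values("el", "number")))
```
</PRE>
<P CLASS="western">A generator (the XSLT step mentioned for later
revisions, or anything equivalent) could read such a table to emit the
enumerations for one language.</P>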
<H2 CLASS="western">Global Issues</H2>
<H3 CLASS="western">Namespace</H3>
<P CLASS="western"><FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2><SPAN STYLE="background: transparent">Should
this module have its own namespace: <B>osisLA</B> or perhaps just
<B>ola</B>?</SPAN></FONT></FONT></FONT></P>
<H3 CLASS="western">Constraints</H3>
<P CLASS="western"><FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Is
there a way that some attributes can be made contingent upon others?</FONT></FONT></FONT>
For example, nouns do not have <CODE CLASS="western"><B>person</B></CODE>,
but verbs and pronouns do. Nouns have <CODE CLASS="western"><B>cases</B></CODE>,
but verbs have <CODE CLASS="western"><B>tense</B></CODE>.</P>
<H3 CLASS="western">Inheritance</H3>
<P CLASS="western">It seems reasonable that <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE>
should inherit all of the default attributes of an element from the
<CODE CLASS="western"><B>osis</B></CODE> namespace. <FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Is
there any reason why <B>&lt;morpheme&gt;</B> should have the <B>osisID</B>
attribute explicitly set?</FONT></FONT></FONT></P>
<H3 CLASS="western">Data Types</H3>
<P CLASS="western">First impressions suggest that no new data types
need to be derived from those already in place. <FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Would
there be a reason to create new derived types just for linguistic
annotation?</FONT></FONT></FONT></P>
<H3 CLASS="western">Discontinuous Morphemes</H3>
<P CLASS="western">Many languages have morphemes which leap across
spans of morphemes. For example, in Hebrew, the verbal stems are sets
of vowels that are inserted in between root consonants. <FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>How
can these be handled?</FONT></FONT></FONT></P>
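<P CLASS="western">To make the problem concrete, the following sketch
weaves a vowel pattern between the consonants of a root. It is a
deliberately simplified model in ASCII transliteration: actual stems
also involve prefixes and consonant doubling, which are ignored here,
and the function name <CODE CLASS="western">interleave</CODE> is
hypothetical. It does not propose a markup solution; it only shows
why neither the root nor the pattern forms a contiguous span that a
single element could wrap.</P>
<PRE>
```python
def interleave(root_consonants, vowel_pattern):
    """Weave a vowel pattern between the consonants of a root."""
    if len(vowel_pattern) != len(root_consonants) - 1:
        raise ValueError("need exactly one vowel slot between each pair of consonants")
    pieces = []
    # Pad the pattern with an empty slot so the last consonant has a partner.
    for consonant, vowel in zip(root_consonants, list(vowel_pattern) + [""]):
        pieces.append(consonant + vowel)
    return "".join(pieces)


# Root q-t-l with the pattern a-a: the two morphemes alternate letter by letter.
surface = interleave(["q", "t", "l"], ["a", "a"])
print(surface)
```
</PRE>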
<H2 CLASS="western">Top-level Element Summary</H2>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3 STYLE="page-break-inside: avoid">
	<COL WIDTH=52*>
	<COL WIDTH=204*>
	<THEAD>
		<TR VALIGN=TOP>
			<TH WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western">Element 
				</P>
			</TH>
			<TH WIDTH=80% BGCOLOR="#ffff99">
				<P CLASS="western">Description</P>
			</TH>
		</TR>
	</THEAD>
	<TBODY>
		<TR VALIGN=TOP>
			<TD WIDTH=20%>
				<P CLASS="western"><CODE CLASS="western"><B>&lt;w&gt;</B></CODE></P>
			</TD>
			<TD WIDTH=80%>
				<P CLASS="western"><CODE CLASS="western"><B>&lt;redefine&gt;</B></CODE>
				the OSIS <I>word</I> element to include <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western"><CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE></P>
			</TD>
			<TD WIDTH=80% BGCOLOR="#ffff99">
				<P CLASS="western">This is the primary container for
				morphological parsing.  Allow <CODE CLASS="western"><B>&lt;note&gt;</B></CODE>
				inside <CODE CLASS="western"><B>&lt;morpheme&gt;</B></CODE>.
				Required is the text of the morpheme itself (PCDATA), the <CODE CLASS="western">&lt;lemma&gt;</CODE>
				and <CODE CLASS="western">&lt;partOfSpeech&gt;</CODE> elements.
				If more than one <CODE CLASS="western">&lt;partOfSpeech&gt;</CODE>
				is present, each must be of a different <CODE CLASS="western">type</CODE>.</P>
			</TD>
		</TR>
	</TBODY>
</TABLE>
<H2 CLASS="western"><CODE CLASS="western"><FONT SIZE=5>&lt;morpheme&gt;</FONT></CODE>
Content Model</H2>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3>
	<COL WIDTH=52*>
	<COL WIDTH=29*>
	<COL WIDTH=41*>
	<COL WIDTH=134*>
	<THEAD>
		<TR VALIGN=TOP>
			<TH WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western">Attribute</P>
			</TH>
			<TH WIDTH=11% BGCOLOR="#ffff99">
				<P CLASS="western">Type</P>
			</TH>
			<TH WIDTH=16% BGCOLOR="#ffff99">
				<P CLASS="western">Values</P>
			</TH>
			<TH WIDTH=52% BGCOLOR="#ffff99">
				<P CLASS="western">Description</P>
			</TH>
		</TR>
	</THEAD>
	<TBODY>
		<TR>
			<TD WIDTH=20%>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B><FONT FACE="Times New Roman, serif">lang</FONT></B></CODE></P>
			</TD>
			<TD WIDTH=11%>
				<P CLASS="western" ALIGN=CENTER>language</P>
			</TD>
			<TD WIDTH=16%>
				<P CLASS="western" ALIGN=CENTER><I>he<BR>arc<BR>el</I></P>
			</TD>
			<TD WIDTH=52% VALIGN=TOP>
				<P CLASS="western">Defaults to the <CODE CLASS="western">xml:<B>lang</B></CODE>
				of the instance document. Intended for multi-lingual documents,
				such as the Hebrew Bible (Hebrew and Aramaic). <FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Is
				this a global OSIS element attribute?</FONT></FONT></FONT> <FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Or
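				from the <B>xml</B> namespace?</FONT></FONT></FONT> Values are taken from the ISO
				639 language codes (<I>he</I> and <I>el</I> are two-letter ISO 639-1 codes; <I>arc</I> is a three-letter ISO 639-2 code): <U><FONT COLOR="#000080">http://www.w3.org/WAI/ER/IG/ert/iso639.htm</FONT></U></P>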
			</TD>
		</TR>
		<TR>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>wordPart</B></CODE></P>
			</TD>
			<TD WIDTH=11% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER>integer</P>
			</TD>
			<TD WIDTH=16% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><I><EM>1...&infin;</EM></I></P>
			</TD>
			<TD WIDTH=52% VALIGN=TOP BGCOLOR="#ffff99">
				<P CLASS="western">The position of the morpheme within the word.
				If the morpheme and word are co-extensive, then the value is &ldquo;1&rdquo;.
				Unbounded.</P>
			</TD>
		</TR>
		<TR>
			<TD WIDTH=20%>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>*kqtype</B></CODE></P>
			</TD>
			<TD WIDTH=11%>
				<P CLASS="western" ALIGN=CENTER>enumerated</P>
			</TD>
			<TD WIDTH=16%>
				<P CLASS="western" ALIGN=CENTER STYLE="font-weight: medium"><I>neither
				(0)<BR>ketiv (1)<BR>qere (2)</I></P>
			</TD>
			<TD WIDTH=52% VALIGN=TOP>
				<P CLASS="western" STYLE="font-weight: medium">The <I>ketiv-qere</I>
				&ldquo;what is written; what is read&rdquo; is a scribal
				&ldquo;marginal&rdquo; note to correct the reading of the text.
				As such, it is unique to Hebrew Bible manuscripts. Default is
				<I>neither</I><SPAN STYLE="font-style: normal">.</SPAN></P>
				<P CLASS="western" STYLE="font-weight: medium">When medieval
				Jewish scribes recognized what was to them an obvious &ldquo;error&rdquo;
				in the main biblical text, they had a problem: the text is sacred
				and may not be changed. So they wrote the corrected
				consonants in the margin, and the vowels in the main line of the
				text are those that match the consonants in the margin. The
				consonants in the main column of the text are called the &ldquo;<I>ketiv</I>&rdquo;
				or &ldquo;what is written&rdquo;; the consonants in the margin
				combined with the vowels written with the <I>ketiv</I> are called
				the <I>qere</I> or &ldquo;what is read&rdquo;.</P>
			</TD>
		</TR>
	</TBODY>
</TABLE>
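<P CLASS="western">The attribute rules in this table could be checked
along the following lines. This is a sketch under assumptions: the
asterisk on <CODE CLASS="western">kqtype</CODE> is taken to be a
marker rather than part of the name, <CODE CLASS="western">wordPart</CODE>
is assumed to count morphemes consecutively from 1 within each word,
and the function name <CODE CLASS="western">check_morpheme_attributes</CODE>
is hypothetical.</P>
<PRE>
```python
import xml.etree.ElementTree as ET

# Enumerated values of kqtype as listed in the table.
KQ_TYPES = {"neither", "ketiv", "qere"}


def check_morpheme_attributes(word):
    """Return a list of problems found in the morpheme attributes of one w element."""
    problems = []
    for expected, morpheme in enumerate(word.findall("morpheme"), start=1):
        # kqtype defaults to "neither" when the attribute is absent.
        kqtype = morpheme.get("kqtype", "neither")
        if kqtype not in KQ_TYPES:
            problems.append(f"morpheme {expected}: unknown kqtype {kqtype!r}")
        # Assumption: wordPart numbers the morphemes of a word as 1, 2, 3 ...
        if morpheme.get("wordPart") != str(expected):
            problems.append(f"morpheme {expected}: wordPart should be {expected}")
    return problems


good = ET.Element("w")
ET.SubElement(good, "morpheme", wordPart="1", kqtype="qere")

bad = ET.Element("w")
ET.SubElement(bad, "morpheme", wordPart="2", kqtype="margin")

print(check_morpheme_attributes(good))
print(check_morpheme_attributes(bad))
```
</PRE>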
<P CLASS="western"><BR><BR>
</P>
<TABLE WIDTH=990 BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3>
	<COL WIDTH=125>
	<COL WIDTH=132>
	<COL WIDTH=211>
	<COL WIDTH=473>
	<THEAD>
		<TR>
			<TH WIDTH=125 BGCOLOR="#ffff99">
				<P CLASS="western">Child Element</P>
			</TH>
			<TD COLSPAN=3 WIDTH=838 VALIGN=TOP BGCOLOR="#ffff99">
				<P CLASS="western">Alternative: each part of speech is its own
				element. <FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Should
				all the parsings be &ldquo;containerized?&rdquo; Or should
				&lt;morpheme&gt; have <CODE CLASS="western">&lt;lemma&gt;</CODE>,
				plus one of <CODE CLASS="western">&lt;noun&gt;</CODE>, <CODE CLASS="western">&lt;verb&gt;</CODE>,
				<CODE CLASS="western">&lt;adjective&gt;</CODE>, etc.?</FONT></FONT></FONT></P>
			</TD>
		</TR>
	</THEAD>
	<TBODY>
		<TR>
			<TD ROWSPAN=2 WIDTH=125 BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>partOfSpeech</B></CODE></P>
			</TD>
			<TH WIDTH=132 VALIGN=TOP>
				<P CLASS="western" ALIGN=CENTER>Attributes</P>
			</TH>
			<TH WIDTH=211 VALIGN=TOP>
				<P CLASS="western">Values</P>
			</TH>
			<TH WIDTH=473 VALIGN=TOP>
				<P CLASS="western">Description</P>
			</TH>
		</TR>
		<TR>
			<TD WIDTH=132 BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
			</TD>
			<TD WIDTH=211 BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><EM>formal<BR>base<BR>alternate</EM></P>
				<P CLASS="western" ALIGN=CENTER><EM>wordLevel<BR>phraseLevel<BR>clauseLevel</EM></P>
				<P CLASS="western" ALIGN=CENTER><EM>contextFree<BR>contextBound</EM></P>
			</TD>
			<TD WIDTH=473 VALIGN=TOP BGCOLOR="#ffff99">
				<P CLASS="western">&ldquo;Part of speech&rdquo; is a slippery
				concept, apt to change substantially in meaning from language to
				language, and from within various linguistic theoretical camps.
				For example, there is no inflectional category for adverbs in
				biblical Hebrew, but there are lexical adverbs.</P>
				<P CLASS="western">One may take many different perspectives in
				analyzing a morpheme. One can take a purely formalist approach;
				one can view how the morpheme is used relative to another
				morpheme or set of morphemes; how the morpheme relates to the
				verb, or across clause boundaries (e. g., pronoun antecedents).
				This is not always the choice of the analyst: languages often
				require a particular perspective by the very inflectional
				category distribution itself. The default value is <I>formal</I>.</P>
				<P CLASS="western">The <CODE CLASS="western">type</CODE> attribute lets the annotator indicate the type
				of analysis: alternate, context-bound, context-free,
				phrase-level or clause-level. It defaults to <I>formal</I>, i. e., the basic,
				context-free analysis. <CODE CLASS="western">&lt;morpheme&gt;</CODE>
				may contain more than one <CODE CLASS="western">&lt;partOfSpeech&gt;</CODE>;
				in that case, each <CODE CLASS="western">type</CODE>
				attribute must be unique. These are alternative parsings: the
				user may specify a &ldquo;base&rdquo; parsing (based upon the
				form of the morpheme) and additional parsings (based upon
				contextual usage).</P>
				<P CLASS="western">One and only one of <CODE CLASS="western">&lt;noun&gt;</CODE>,
				<CODE CLASS="western">&lt;verb&gt;</CODE>, or <CODE CLASS="western">&lt;particle&gt;</CODE>
				is <B>required</B><SPAN STYLE="font-weight: medium">. </SPAN>
				</P>
			</TD>
		</TR>
		<TR>
			<TD WIDTH=125>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>lemma</B></CODE></P>
			</TD>
			<TD WIDTH=132>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>homographNumber</B></CODE></P>
			</TD>
			<TD WIDTH=211>
				<P CLASS="western" ALIGN=CENTER><I><EM>0...&infin;</EM></I></P>
			</TD>
			<TD WIDTH=473 VALIGN=TOP>
				<P CLASS="western">The &ldquo;dictionary&rdquo; or &ldquo;base&rdquo;
				form of the morpheme. Older philological terminology: &ldquo;root&rdquo;
				or &ldquo;stem&rdquo;. Homographs are forms which are spelled the
				same but have more than one (unrelated) meaning, or have
				differing etymology. The default <CODE CLASS="western">homographNumber</CODE> is &ldquo;0&rdquo;, i. e., no
				homograph: the form is unique. The lemma itself has no default, and its value
				can be <I>empty</I>. More than one <CODE CLASS="western">&lt;lemma&gt;</CODE> may be specified
				as alternative derivations.</P>
				<P CLASS="western">The content of <CODE CLASS="western">&lt;lemma&gt;</CODE>
				is PCDATA. 
				</P>
				<P CLASS="western"><BR>
				</P>
			</TD>
		</TR>
	</TBODY>
</TABLE>
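<P CLASS="western">The rule that alternative parsings must differ in
<CODE CLASS="western">type</CODE> can be checked with a few lines of
code. The sketch below assumes that a missing
<CODE CLASS="western">type</CODE> counts as the default
<I>formal</I>; the function name
<CODE CLASS="western">duplicate_analysis_types</CODE> is
hypothetical.</P>
<PRE>
```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_analysis_types(morpheme):
    """Return the partOfSpeech type values used more than once in a morpheme."""
    # A partOfSpeech without an explicit type falls back to the default "formal".
    types = [pos.get("type", "formal") for pos in morpheme.findall("partOfSpeech")]
    return sorted(name for name, count in Counter(types).items() if count > 1)


morpheme = ET.Element("morpheme")
ET.SubElement(morpheme, "partOfSpeech")  # no type given: counts as "formal"
ET.SubElement(morpheme, "partOfSpeech", type="formal")
ET.SubElement(morpheme, "partOfSpeech", type="contextBound")

print(duplicate_analysis_types(morpheme))
```
</PRE>
<P CLASS="western">Here the implicit and the explicit <I>formal</I>
analyses collide, which is the case a validator would need to
catch.</P>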
<P CLASS="western"><BR><BR>
</P>
<H2 CLASS="western"><CODE CLASS="western"><FONT SIZE=5>&lt;partOfSpeech&gt;</FONT></CODE>
Content Model</H2>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3>
	<COL WIDTH=52*>
	<COL WIDTH=29*>
	<COL WIDTH=33*>
	<COL WIDTH=142*>
	<THEAD>
		<TR VALIGN=TOP>
			<TH WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western">Child Element</P>
			</TH>
			<TH WIDTH=11% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER>Attributes</P>
			</TH>
			<TH WIDTH=13% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER>Values</P>
			</TH>
			<TH WIDTH=56% BGCOLOR="#ffff99">
				<P CLASS="western">Description</P>
			</TH>
		</TR>
	</THEAD>
	<TBODY>
		<TR>
			<TD WIDTH=20%>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">noun</CODE></P>
			</TD>
			<TH WIDTH=11%>
				<P CLASS="western" ALIGN=CENTER>type</P>
			</TH>
			<TD WIDTH=13%>
				<P CLASS="western" ALIGN=CENTER><EM>commonNoun<BR>properNoun<BR>adjective<BR>pronoun</EM></P>
			</TD>
			<TD WIDTH=56% VALIGN=TOP>
				<P CLASS="western">If <CODE CLASS="western">type = &ldquo;commonNoun&rdquo;</CODE>
				or &ldquo;<CODE CLASS="western">adjective&rdquo;</CODE>, then
				<CODE CLASS="western">&lt;gender&gt;</CODE>, <CODE CLASS="western">&lt;number&gt;</CODE>
				and <CODE CLASS="western">&lt;state&gt;</CODE> are <B>required</B>.<BR>If
				<CODE CLASS="western">type = &ldquo;properNoun&rdquo;</CODE>,
				then <CODE CLASS="western">&lt;gender&gt;</CODE>, <CODE CLASS="western">&lt;number&gt;</CODE>
				and <CODE CLASS="western">&lt;state&gt;</CODE> are <B>optional</B>.<BR>If
				<CODE CLASS="western">type = &ldquo;pronoun&rdquo;</CODE>, then
				<CODE CLASS="western">&lt;gender&gt;</CODE>, <CODE CLASS="western">&lt;number&gt;</CODE>
				and <CODE CLASS="western">&lt;person&gt;</CODE> are <B>required</B>.</P>
			</TD>
		</TR>
		<TR>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">verb</CODE></P>
			</TD>
			<TH WIDTH=11% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER>type</P>
			</TH>
			<TD WIDTH=13% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><EM>finiteVerb<BR>participle<BR>infinitive</EM></P>
			</TD>
			<TD WIDTH=56% VALIGN=TOP BGCOLOR="#ffff99">
				<P CLASS="western">If <CODE CLASS="western">type
				= &ldquo;finiteVerb&rdquo;</CODE>, then <CODE CLASS="western">&lt;stem&gt;</CODE>,
				<CODE CLASS="western">&lt;conjugation&gt;</CODE>, <CODE CLASS="western">&lt;gender&gt;</CODE>,
				<CODE CLASS="western">&lt;number&gt;</CODE> and <CODE CLASS="western">&lt;person&gt;</CODE> are
				<B>required</B> and <CODE CLASS="western">&lt;suffix type =
				&ldquo;verbal&rdquo;&gt;</CODE> is <B>optional</B>.<BR>If <CODE CLASS="western">type
				= &ldquo;participle&rdquo;</CODE>, then <CODE CLASS="western">&lt;stem&gt;</CODE>,
				<CODE CLASS="western">&lt;gender&gt;</CODE>, <CODE CLASS="western">&lt;number&gt;</CODE> and
				<CODE CLASS="western">&lt;state&gt;</CODE> are <B>required</B>
				and <CODE CLASS="western">&lt;suffix&gt;</CODE> is <B>optional</B>.<BR>If
				<CODE CLASS="western">type =
				&ldquo;infinitive&rdquo;</CODE>, then <CODE CLASS="western">&lt;stem&gt;</CODE>
				and <CODE CLASS="western">&lt;state&gt;</CODE> are <B>required</B>,
				and <CODE CLASS="western">&lt;suffix&gt;</CODE> is <B>optional</B>.</P>
			</TD>
		</TR>
		<TR>
			<TD WIDTH=20%>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">particle</CODE></P>
			</TD>
			<TH WIDTH=11%>
				<P CLASS="western" ALIGN=CENTER>type</P>
			</TH>
			<TD WIDTH=13%>
				<P CLASS="western" ALIGN=CENTER><EM>adverb<BR>preposition<BR>definiteArticle<BR>interrogative<BR>negative</EM></P>
			</TD>
			<TD WIDTH=56% VALIGN=TOP>
				<P CLASS="western">If <CODE CLASS="western">type
				= &ldquo;adverb&rdquo;</CODE>, then no other content is
				allowed.<BR>If <CODE CLASS="western">type
				= &ldquo;preposition&rdquo;</CODE>, then <CODE CLASS="western">&lt;suffix&gt;</CODE>
				is <B>optional</B>.<BR>If <CODE CLASS="western">type
				= &ldquo;definiteArticle&rdquo;</CODE>, then no other content is
				allowed.<BR>If <CODE CLASS="western">type
				= &ldquo;interrogative&rdquo;</CODE>, then no other content is
				allowed.<BR>If <CODE CLASS="western">type
				= &ldquo;negative&rdquo;</CODE>, then no other content is
				allowed.</P>
			</TD>
		</TR>
	</TBODY>
</TABLE>
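<P CLASS="western">If the schema language itself cannot make content
contingent on an attribute value (the open question under
Constraints), one option is a separate validation pass. The sketch
below transcribes the required and optional children from the table
above into a lookup; it is an illustration only, it ignores the
<CODE CLASS="western">type</CODE> of <CODE CLASS="western">&lt;suffix&gt;</CODE>,
and the names <CODE CLASS="western">RULES</CODE> and
<CODE CLASS="western">check_category</CODE> are hypothetical.</P>
<PRE>
```python
import xml.etree.ElementTree as ET

# (element name, type value) -> (required children, optional children),
# transcribed from the partOfSpeech content model table.
RULES = {
    ("noun", "commonNoun"): ({"gender", "number", "state"}, set()),
    ("noun", "adjective"): ({"gender", "number", "state"}, set()),
    ("noun", "properNoun"): (set(), {"gender", "number", "state"}),
    ("noun", "pronoun"): ({"gender", "number", "person"}, set()),
    ("verb", "finiteVerb"): (
        {"stem", "conjugation", "gender", "number", "person"}, {"suffix"}),
    ("verb", "participle"): ({"stem", "gender", "number", "state"}, {"suffix"}),
    ("verb", "infinitive"): ({"stem", "state"}, {"suffix"}),
    ("particle", "adverb"): (set(), set()),
    ("particle", "preposition"): (set(), {"suffix"}),
    ("particle", "definiteArticle"): (set(), set()),
    ("particle", "interrogative"): (set(), set()),
    ("particle", "negative"): (set(), set()),
}


def check_category(element):
    """Return the problems found in one noun, verb or particle element."""
    key = (element.tag, element.get("type"))
    if key not in RULES:
        return [f"unknown combination {key}"]
    required, optional = RULES[key]
    present = {child.tag for child in element}
    problems = [f"missing {name}" for name in sorted(required - present)]
    problems += [f"unexpected {name}" for name in sorted(present - required - optional)]
    return problems


pronoun = ET.Element("noun", type="pronoun")
for name in ("gender", "number", "person"):
    ET.SubElement(pronoun, name)

adverb = ET.Element("particle", type="adverb")
ET.SubElement(adverb, "gender")

print(check_category(pronoun))
print(check_category(adverb))
```
</PRE>
<P CLASS="western">Keeping the rules in a table rather than in code
would also let a language declaration replace them for languages
other than Hebrew.</P>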
<P CLASS="western"><BR><BR>
</P>
<H2 CLASS="western">Elements required or optional in <CODE CLASS="western"><FONT SIZE=5>&lt;noun&gt;</FONT></CODE>,
<CODE CLASS="western"><FONT SIZE=5>&lt;verb&gt;</FONT></CODE>, or
<CODE CLASS="western"><FONT SIZE=5>&lt;particle&gt;</FONT></CODE></H2>
<TABLE WIDTH=100% BORDER=1 BORDERCOLOR="#ff6633" CELLPADDING=4 CELLSPACING=3>
	<COL WIDTH=52*>
	<COL WIDTH=34*>
	<COL WIDTH=39*>
	<COL WIDTH=131*>
	<THEAD>
		<TR VALIGN=TOP>
			<TH WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western">Child Element</P>
			</TH>
			<TH WIDTH=13% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER>Attributes</P>
			</TH>
			<TH WIDTH=15% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER>Values</P>
			</TH>
			<TH WIDTH=51% BGCOLOR="#ffff99">
				<P CLASS="western">Description</P>
			</TH>
		</TR>
	</THEAD>
	<TBODY>
		<TR>
			<TD WIDTH=20%>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>person</B></CODE></P>
			</TD>
			<TD WIDTH=13%>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">ordinal</CODE></P>
			</TD>
			<TD WIDTH=15%>
				<P CLASS="western" ALIGN=CENTER><I>1, 2, 3</I></P>
			</TD>
			<TD WIDTH=51% VALIGN=TOP>
				<P CLASS="western">Found in <CODE CLASS="western">&lt;noun&gt;</CODE>,
				<CODE CLASS="western">&lt;verb&gt;</CODE>, <CODE CLASS="western">&lt;pronoun&gt;</CODE>
				and <CODE CLASS="western">&lt;suffix&gt;</CODE>. Milestone: no
				content.</P>
			</TD>
		</TR>
		<TR>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>gender</B></CODE></P>
			</TD>
			<TD WIDTH=13% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
			</TD>
			<TD WIDTH=15% VALIGN=TOP BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER STYLE="font-weight: medium"><I>masculine<BR>feminine<BR>neuter<BR>common</I></P>
			</TD>
			<TD WIDTH=51% VALIGN=TOP BGCOLOR="#ffff99">
				<P CLASS="western">Hebrew and Aramaic do not have a <I>neuter</I>,
				but Greek does. Gender in Hebrew is an unresolved anomaly. Some
				nouns seem to be used both as masculine and feminine;
				verb-subject agreement is often violated. Found in <CODE CLASS="western">&lt;noun&gt;</CODE>,
				<CODE CLASS="western">&lt;verb&gt;</CODE>, <CODE CLASS="western">&lt;pronoun&gt;</CODE>
				and <CODE CLASS="western">&lt;suffix&gt;</CODE>. Milestone: no
				content.</P>
				<P CLASS="western">Gender is very language-specific. Some
				languages, such as Hungarian, do not inflect for gender at all.
				<FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>Do
				we distinguish between <I>lexical</I> and <I>formal</I>
				(inflected) gender?</FONT></FONT></FONT></P>
			</TD>
		</TR>
		<TR>
			<TD WIDTH=20%>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>number</B></CODE></P>
			</TD>
			<TD WIDTH=13%>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
			</TD>
			<TD WIDTH=15%>
				<P CLASS="western" ALIGN=CENTER STYLE="font-weight: medium"><I>singular<BR>dual<BR>plural</I></P>
			</TD>
			<TD WIDTH=51% VALIGN=TOP>
				<P CLASS="western">Of the biblical languages, Greek does not have
				a dual. Found in <CODE CLASS="western">&lt;noun&gt;</CODE>,
				<CODE CLASS="western">&lt;verb&gt;</CODE>, <CODE CLASS="western">&lt;pronoun&gt;</CODE>
				and <CODE CLASS="western">&lt;suffix&gt;</CODE>. Milestone: no
				content.</P>
				<P CLASS="western">These values cover most language use. &ldquo;One&rdquo;
				and &ldquo;many&rdquo; seem to be the primary distinction, but
				some cultures have special forms to meet special needs. For
				example, the Semitic languages have a special <I>dual</I>
				form for objects which are natural pairs &ndash; hands, eyes,
				etc.</P>
			</TD>
		</TR>
		<TR>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>*state</B></CODE></P>
			</TD>
			<TD WIDTH=13% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
			</TD>
			<TD WIDTH=15% VALIGN=TOP BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER STYLE="font-weight: medium"><I>absolute<BR>construct</I></P>
			</TD>
			<TD WIDTH=51% VALIGN=TOP BGCOLOR="#ffff99">
				<P CLASS="western" STYLE="font-weight: medium">Among the biblical
				languages, found only in Hebrew and Aramaic (it also occurs in other Semitic languages).</P>
				<P CLASS="western" STYLE="font-weight: medium">State has to do
				with the intonation of the noun. In the <I>absolute</I> state,
				the accent usually occurs on the last syllable. In the <I>construct</I>
				state, the accent shifts forward, and long vowels usually shorten
				as much as possible. Semantically, the <I>construct</I> form
				marks the &ldquo;genitive&rdquo; or &ldquo;possessive&rdquo;, and
				can also have an adjectival function, e. g., &ldquo;king of
				righteousness&rdquo; == &ldquo;righteous king&rdquo;.</P>
			</TD>
		</TR>
		<TR>
			<TD WIDTH=20%>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B><FONT FACE="Courier New, monospace">*stem</FONT></B></CODE></P>
			</TD>
			<TD WIDTH=13%>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
			</TD>
			<TD WIDTH=15%>
				<P CLASS="western" ALIGN=CENTER STYLE="font-weight: medium"><I><EM>qal<BR>qal
				passive<BR>piel<BR>pual<BR><SPAN STYLE="font-weight: medium">hiphil<BR>hophal<BR></SPAN>niphal<BR>hitpael<BR>palel<BR>pealal<BR>pilel<BR>pilpel<BR>polel<BR>poel<BR>tiphil<BR>polal<BR>polpal<BR>pulal<BR>poal<BR>hotpaal<BR>hitpolel<BR>pitpalpel<BR>hishtaphel<BR>nitpael</EM></I></P>
			</TD>
			<TD WIDTH=51% VALIGN=TOP>
				<P CLASS="western" STYLE="font-weight: medium">More precisely,
				these are verbal patterns: vocalic insertions into the
				tri-radical verbal root consonants, modifying the basic lexical
				meaning in some consistent way.</P>
				<P CLASS="western" STYLE="font-weight: medium">This is an example
				of a discontinuous morpheme: the stem is determined by the vowels
				that are inserted between the root consonants.<FONT COLOR="#ff0000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=2>
				How should discontinuous morphemes be represented in markup? Is
				this an example of overlapping hierarchies?</FONT></FONT></FONT></P>
			</TD>
		</TR>
		<TR>
			<TD WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>conjugation</B></CODE></P>
			</TD>
			<TD WIDTH=13% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
			</TD>
			<TD WIDTH=15% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><EM>perfect<BR>imperfect<BR><EM>imperative<BR>jussive<BR><EM>participle<BR>infinitiveAbsolute<BR>infinitiveConstruct</EM></EM></EM></P>
			</TD>
			<TD WIDTH=51% VALIGN=TOP BGCOLOR="#ffff99">
				<P CLASS="western">In Hebrew, these are the <I>inflectional</I>
				sets for verbs; each language is going to have its own set of
				values. Conjugations sometimes mark verbal aspect, other times
				tense or a combination of the two.</P>
				<P CLASS="western">For Hebrew and Aramaic, the verbal
				inflectional sets mark different verbal aspects. For Greek,
				tenses and aspects are combined for the various paradigms; so
				this list would not be adequate for Greek NT markup.</P>
			</TD>
		</TR>
		<TR>
			<TD WIDTH=20%>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western"><B>tense</B></CODE></P>
			</TD>
			<TD WIDTH=13%>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
			</TD>
			<TD WIDTH=15%>
				<P CLASS="western" ALIGN=CENTER><EM>past<BR><EM>present<BR><EM>future</EM></EM></EM></P>
			</TD>
			<TD WIDTH=51% VALIGN=TOP>
				<P CLASS="western">In some languages this category is marked by
				inflection; in others, by modal or auxiliary verbs or other
				words; in still others, time is marked contextually, i.e., it is
				a discourse-level phenomenon. The latter is true for Hebrew.</P>
				<P CLASS="western">Time is often combined with kind of action in
				verbs. What is listed here is &ldquo;pure&rdquo; time, and
				nothing else. This simple list is hardly exhaustive: one can
				enumerate many different kinds of time, depending upon where one
				stands on the timeline.</P>
			</TD>
		</TR>
		<TR>
			<TD ROWSPAN=2 WIDTH=20% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">suffix</CODE></P>
			</TD>
			<TD WIDTH=13% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">type</CODE></P>
			</TD>
			<TD WIDTH=15% BGCOLOR="#ffff99">
				<P CLASS="western" ALIGN=CENTER><I>apocopated<BR>paragogicNun<BR>paragogicHe<BR>directionalHe<BR>pronominal<BR></I><BR>
				</P>
			</TD>
			<TD WIDTH=51% VALIGN=TOP BGCOLOR="#ffff99">
				<P CLASS="western"><SPAN STYLE="font-style: normal"><I>apocopated</I>
				probably doesn&rsquo;t belong here.<BR><BR><BR><BR><I>pronominal</I>
				suffixes are of two types: those attached to nouns and those
				attached to verbs.</SPAN></P>
			</TD>
		</TR>
		<TR>
			<TD WIDTH=13%>
				<P CLASS="western" ALIGN=CENTER><CODE CLASS="western">PronominalType</CODE></P>
			</TD>
			<TD WIDTH=15%>
				<P CLASS="western" ALIGN=CENTER><I>nominal<BR>verbal</I></P>
			</TD>
			<TD WIDTH=51% VALIGN=TOP>
				<P CLASS="western">The <I>nominal</I><SPAN STYLE="font-style: normal">
				and </SPAN><I>verbal</I><SPAN STYLE="font-style: normal">
				suffixes are separate paradigms in Hebrew, with morphophonemic
				changes at the boundary.</SPAN></P>
			</TD>
		</TR>
	</TBODY>
</TABLE>
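<P CLASS="western">The value lists in the table above amount to a
per-language declaration of allowed values. The following is a minimal
Python sketch, under stated assumptions, of how a tool might check an
annotation against such a declaration. The dictionary layout and the
names <CODE CLASS="western">HEBREW_DECLARATION</CODE> and
<CODE CLASS="western">invalid_annotations</CODE> are hypothetical and
not part of this proposal; the long <CODE CLASS="western">stem</CODE>
list is omitted for brevity.</P>

```python
# Hypothetical language declaration for Hebrew: category name -> allowed values.
# The values are copied from the table above; the dictionary layout and the
# function name are assumptions made for illustration only.
HEBREW_DECLARATION = {
    "state": {"absolute", "construct"},
    "conjugation": {
        "perfect", "imperfect", "imperative", "jussive",
        "participle", "infinitiveAbsolute", "infinitiveConstruct",
    },
    "tense": {"past", "present", "future"},
    "suffix": {
        "apocopated", "paragogicNun", "paragogicHe",
        "directionalHe", "pronominal",
    },
}


def invalid_annotations(annotation, declaration):
    """Return the (category, value) pairs that the declaration does not allow."""
    problems = []
    for category, value in annotation.items():
        allowed = declaration.get(category)
        # An unknown category and an unlisted value are both reported.
        if allowed is None or value not in allowed:
            problems.append((category, value))
    return problems


# Example: "aorist" is a Greek value, so it is rejected for Hebrew.
word = {"state": "construct", "tense": "aorist"}
print(invalid_annotations(word, HEBREW_DECLARATION))  # [('tense', 'aorist')]
```

<P CLASS="western">Keeping the value lists in data rather than in code
is one way to let each language, or each linguistic framework, supply
its own declaration without changing the checking logic.</P>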
<H2 CLASS="western">To Do</H2>
<UL>
	<LI><P CLASS="western">Add the grammatical categories for Aramaic
	and Greek.</P>
	<LI><P CLASS="western">Enrich the annotation scheme.</P>
	<LI><P CLASS="western">Abstract a &ldquo;universal&rdquo; language
	declaration: those declarations that all languages will need.</P>
	<LI><P CLASS="western">Create language declarations for Hebrew,
	Greek, Aramaic, English and the other major European languages.</P>
	<LI><P CLASS="western">Resolve issues of how to modularize and
	invoke the OSIS Linguistic Annotation module along with the
	concomitant language declarations.</P>
	<LI><P CLASS="western">Create simple markup examples that use
	real-world text.</P>
</UL>
<DIV ID="sdfootnote1">
	<P CLASS="sdfootnote-western" STYLE="margin-bottom: 0.2in"><A CLASS="sdfootnotesym" NAME="sdfootnote1sym" HREF="#sdfootnote1anc">1</A> For
	this document, I am using full names for elements and attributes.
	For the actual implementation, abbreviations ought to be
	assigned to the most common element names.</P>
</DIV>
</BODY>
</HTML>
--------------020509020804030000020902--