<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<div class="moz-text-flowed"

 style="font-family: -moz-fixed; font-size: 13px;" lang="x-western">I

appreciate the problems with trying to use an inherently

tree-structured notation like XML to mark overlapping regions. I'm not

in touch with what the OSIS thinkers are thinking: other than the

milestone approach, has anybody considered combining in-line markup

identifying critical base elements (probably words) with standoff

indexed markup? As a concrete (and over-simplified) example, the text

would be marked in-line like this (Matt 27:11):

<br>

<br>

&lt;t id="1"&gt;Jesus&lt;/t&gt;

<br>

&lt;t id="2"&gt;said&lt;/t&gt;

<br>

&lt;t id="3"&gt;,&lt;/t&gt;

<br>

&lt;t id="4"&gt;"&lt;/t&gt;

<br>

&lt;t id="5"&gt;You&lt;t&gt;

<br>

&lt;t id="6"&gt;have&lt;/t&gt;

<br>

&lt;t id="7"&gt;said&lt;/t&gt;

<br>

&lt;t id="8"&gt;so&lt;/t&gt;

<br>

&lt;t id="9"&gt;.&lt;/t&gt;

<br>

&lt;t id="10"&gt;"&lt;/t&gt;

<br>

<br>

with red-letter spans indexed with start-end indices &lt;woc start="5"

end="9"/&gt;. The standoff index markup doesn't need to nest, it just

marks spans.

If the numbering space for words extends throughout a book, you can

have quotations, sentences, or paragraphs that span verses, etc. And of

course individual tokens can be marked as punctuation, starting vs.

ending quotes, etc. <br>

<br>
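To make the idea concrete, here is a rough Python sketch of how a reader of this markup might resolve the standoff spans against the token ids. It is only an illustration under my own assumptions: the tuple layout, the function names, and the "line" span (a made-up typographic line that partly overlaps the words of Christ) are hypothetical and not part of OSIS or any existing tool. <br>

```python
# Tokens as they might be read from the in-line markup: (id, text) pairs.
TOKENS = [
    (1, "Jesus"), (2, "said"), (3, ","), (4, '"'), (5, "You"),
    (6, "have"), (7, "said"), (8, "so"), (9, "."), (10, '"'),
]

# Standoff spans as (kind, start, end), with both ends inclusive.
# "woc" comes from the example above; "line" is a hypothetical span
# added only to show two spans overlapping without nesting.
SPANS = [
    ("woc", 5, 9),
    ("line", 1, 6),
]


def tokens_in_span(tokens, start, end):
    """Return the text of every token whose id falls in the inclusive range."""
    wanted = range(start, end + 1)
    return [text for tok_id, text in tokens if tok_id in wanted]


def spans_covering(spans, tok_id):
    """Return the kinds of all spans that include the given token id."""
    return [kind for kind, start, end in spans
            if tok_id in range(start, end + 1)]


if __name__ == "__main__":
    print(tokens_in_span(TOKENS, 5, 9))
    print(spans_covering(SPANS, 5))
```

Because each span is just a pair of indices, "woc" (5 to 9) and "line" (1 to 6) can cross each other without either one having to contain the other, which is exactly what the in-line tree cannot express. <br>

<br>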

I assume somebody has already considered this and there's a good reason

why it doesn't solve all the problems (or introduces new ones). <br>

<br>

Sean

<br>

<br>

<br>

</div>

</body>

</html>