<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<div class="moz-text-flowed"
style="font-family: -moz-fixed; font-size: 13px;" lang="x-western">I
appreciate the problems with trying to use an inherently
tree-structured notation like XML to mark overlapping regions. I'm not
in touch with what the OSIS thinkers are thinking: other than the
milestone approach, has anybody considered combining in-line markup
identifying critical base elements (probably words) with standoff
indexed markup? As a concrete (and over-simplified) example, the text
would be marked in-line like this (Matt 27:11):
<br>
<br>
<t id="1">Jesus</t>
<br>
<t id="2">said</t>
<br>
<t id="3">,</t>
<br>
<t id="4">"</t>
<br>
<t id="5">You</t>
<br>
<t id="6">have</t>
<br>
<t id="7">said</t>
<br>
<t id="8">so</t>
<br>
<t id="9">.</t>
<br>
<t id="10">"</t>
<br>
<br>
with red-letter spans indexed by start and end indices: <woc start="5"
end="9"/>. The standoff index markup doesn't need to nest; it just
marks spans.
If the numbering space for words extends throughout a book, you can
have quotations, sentences, or paragraphs that span verses, etc. And of
course individual tokens can be marked as punctuation, starting vs.
ending quotes, etc. <br>
<br>
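A rough sketch of how such an index might be resolved, in Python. The
dictionary and function names are made up for illustration, and the
"phrase" span is a hypothetical addition to show overlap; only the woc
span comes from the example above.
<br>
<pre>
```python
# Inline tokens keyed by id, as in the Matt 27:11 example.
tokens = {
    1: "Jesus", 2: "said", 3: ",", 4: '"', 5: "You",
    6: "have", 7: "said", 8: "so", 9: ".", 10: '"',
}

# Standoff spans as (name, start, end), with inclusive bounds.
# They are a flat list: nothing requires one span to nest inside another.
spans = [
    ("woc", 5, 9),     # red-letter span from the example
    ("phrase", 3, 6),  # hypothetical span that overlaps woc without nesting
]


def tokens_in(span):
    """Return the token strings covered by a (name, start, end) span."""
    _, start, end = span
    return [tokens[i] for i in range(start, end + 1)]


def spans_covering(token_id):
    """Return the names of every span whose range includes token_id."""
    return [name for name, start, end in spans
            if token_id in range(start, end + 1)]
```
</pre>
For instance, spans_covering(5) would report both "woc" and "phrase",
even though neither span contains the other.
<br>
<br>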
I assume somebody has already considered this and there's a good reason
why it doesn't solve all the problems (or introduces new ones). <br>
<br>
Sean
<br>
<br>
<br>
</div>
</body>
</html>