No subject
Sun Jan 3 17:45:15 MST 2010
On Sun, Jan 24, 2010 at 12:37 AM, Weston Ruter <westonruter at gmail.com> wrote:
> Attached is an example of what the ESV could look like as the result of a
> web service API response for 1 John 5:7-8, including virtual elements and
> stand-off markup. The relevant fragment:
>
> <concurrent>
> <!--
> @virtual can be "start", "end", "both", or "none" (default)
> target attribute used by Open Siddur; Efraim Feinstein notes range()
> is a TEI-defined XPointer scheme:
> http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SA.html#SATS
> Alternative would be to use @sID and @eID
> -->
> <p virtual="both" target="#range(w6200500701, w6200500812)"
> /><!--sID="w6200500701" eID="w6200500706b"-->
> <verse osisID="1John.5.7" target="#range(h6200500601, p6200500706)"
> /><!--sID="w6200500701" eID="p6200500706"-->
> <verse osisID="1John.5.8" target="#range(w6200500801, p6200500812)"
> /><!--sID="w6200500801" eID="p6200500812"-->
> </concurrent>
> <content><!-- isn't @scope="1John.5.7-1John.5.8" redundant here? -->
> <title ID="h6200500601" canonical="false" virtual="true">Testimony
> Concerning the Son of God</title>
> <w ID="w6200500701">For</w>
> <w ID="w6200500702">there</w>
> <w ID="w6200500703">are</w>
> <w ID="w6200500704">three</w>
> <w ID="w6200500705">that</w>
> <w ID="w6200500706">testify</w><w ID="p6200500706">:</w>
> <w ID="w6200500801">the</w>
> <w ID="w6200500802">Spirit</w>
> <w ID="w6200500803">and</w>
> <w ID="w6200500804">the</w>
> <w ID="w6200500805">water</w>
> <w ID="w6200500806">and</w>
> <w ID="w6200500807">the</w>
> <w ID="w6200500808">blood</w><w ID="p6200500808">;</w>
> <w ID="w6200500809">and</w>
> <w ID="w6200500810">these</w>
> <w ID="w6200500811">three</w>
> <w ID="w6200500812">agree</w><w ID="p6200500812">.</w>
> </content>
>
>
>
>
> On Thu, Jan 21, 2010 at 9:40 AM, Weston Ruter <westonruter at gmail.com> wrote:
>
>> Troy:
>>
>>> I did say that since OSIS allows different ways to mark the same
>>> structure, we have an importer which attempts to accept any valid OSIS doc
>>> and _normalizes_ that doc into a form of OSIS we find easiest for our engine
>>> to process. It is still OSIS, just a form of OSIS with all structures
>>> represented in a single way.
>>>
>>
>> Thank you for clarifying this, and also for sharing some of this history
>> behind the development of OSIS.
>>
>>> [We chose to] augment the specification with a 'best practices' doc which
>>> recommends a single specific method for encoding OSIS.
>>>
>>
>> I don't think I have seen this best practices doc. Is this something you
>> use internally at CrossWire as part of your importer script? Could you
>> direct me to it? I like the approach you took, allowing varying OSIS
>> encodings but recommending only one of them. This is similar to the
>> development of XHTML 1.0 dialects, where you are allowed to use the
>> Transitional doctype, but the Strict doctype is recommended. Doing this for
>> OSIS could answer the need for an unambiguous single markup language. The
>> best practices document would need to contain the practices that are
>> endorsed by at least the majority of players; the others could abstain and
>> still use their preferred (deprecated) dialect of OSIS. Along with this best
>> practices doc, an official normalizer script that translates OSIS into the
>> recommended encoding would be great.
>>
>> I look forward to your thoughts about stand-off markup encoding of OSIS,
>> especially how well it might serve as the new recommended way to
>> unambiguously encode OSIS.
>>
>> Thanks!
>> Weston
>>
>>
>> 2010/1/19 Troy A. Griffitts <scribe at crosswire.org>
>>
>>> Weston Ruter wrote:
>>>
>>>> ... Troy, as you've said before, you can't actually use OSIS as your raw
>>>> data format at CrossWire because an OSIS document can be authored in many
>>>> different ways and so there is much more programming logic that is needed to
>>>> handle all of the possible OSIS styles.
>>>>
>>>
>>> Hey Weston,
>>>
>>> Hope to have time for a thoughtful response to more of your suggestions,
>>> but just wanted to clear a couple things up first:
>>>
>>> I hope I never implied that we can't/don't use OSIS internally as our
>>> primary markup standard.
>>>
>>> I did say that since OSIS allows different ways to mark the same
>>> structure, we have an importer which attempts to accept any valid OSIS doc
>>> and _normalizes_ that doc into a form of OSIS we find easiest for our engine
>>> to process. It is still OSIS, just a form of OSIS with all structures
>>> represented in a single way.
>>>
>>> Even so, we still don't use any plain text format as our "raw data
>>> format". We typically compress and index documents when they are imported
>>> into our engine. You can ask our engine for OSIS, HTML, RTF, GBF, ThML, or
>>> plaintext and it will do its best to give you the data in the requested
>>> format.
>>>
>>> None of this to argue against your point: OSIS has multiple ways to
>>> encode a single structure in a document.
>>>
>>> The real answer to this is not technical. I too am frustrated with this.
>>> But many people working at many organizations were consulted when
>>> developing the OSIS specification. They gave great insights to how they
>>> work. Sometimes they even made demands with an ultimatum that they would
>>> absolutely not use the specification if a certain feature was not added to
>>> the spec.
>>>
>>> OSIS could have been technically finished in less than a year. It took
>>> us 3 years to get buy-in from all the participating organizations.
>>>
>>> In the end, the purpose of OSIS was to build collaboration between
>>> organizations. We could have developed a much easier to use technical
>>> specification which no one would have used, or conceded to demands to gain
>>> buy-in, and augmented the specification with a 'best practices' doc which
>>> recommends a single specific method for encoding OSIS. We chose the latter.
>>>
>>> Implementing code against the spec now, it makes our importer a pain in
>>> the butt to write, but in the end, we get what we want: a single OSIS style
>>> that our engine knows how to work with, and multiple supporting
>>> organizations producing OSIS documents.
>>>
>>>
>>> Troy.
>>>
>>>
>>>
>>>
>>>> If we could define a single document structure, however, one
>>>> that is a subset of the freedom that OSIS provides (perhaps taking cues
>>>> from OXES), we could then have an XML format for scripture that would be
>>>> suited for efficient interchange and application traversal.
>>>>
>>>> Currently we have the problem of two overlapping hierarchies: BSP and
>>>> BCV. However, there could be potentially multiple versification systems, so
>>>> there could be even more than two overlapping hierarchies, probably why the
>>>> <p> element isn't currently milestonable. To get around the problem of
>>>> overlapping hierarchies, what if we introduced stand-off markup into the
>>>> equation? The words of scripture themselves could all be located in a flat
>>>> structure as siblings; then in the header there could be multiple CONCUR
>>>> sections (views) that list out the elements which belong to the various
>>>> parts of the hierarchies.
>>>>
>>>> For example, the current approach:
>>>>
>>>> <p>
>>>> <verse osisID="Example.1.1" sID="Example.1.1" />
>>>> <w id="w1">Then</w>
>>>> <w id="w2">he</w>
>>>> <w id="w3">said</w><w id="p1">,</w>
>>>> <q marker="“" sID="Example.1.1.q1" />
>>>> <w id="w4">Let</w>
>>>> <w id="w5">us</w>
>>>> <w id="w6">go</w><w id="p2">...</w>
>>>> </p>
>>>> <p>
>>>> <w id="w7">but</w>
>>>> <verse eID="Example.1.1" />
>>>> <verse osisID="Example.1.2" sID="Example.1.2"/>
>>>> <w id="w8">don't</w>
>>>> <w id="w9">forget</w>
>>>> <w id="w10">your</w>
>>>> <w id="w11">backpack</w><w id="p3">.</w>
>>>> <q marker="”" eID="Example.1.1.q1" />
>>>> <verse eID="Example.1.2" />
>>>> </p>
>>>>
>>>>
>>>>
>>>> Could instead appear as (I'm making up these element names):
>>>>
>>>> <concur>
>>>> <view type="verse" osisID="Example.1.1" xpointer="range(#w1, #w7)" />
>>>> <view type="verse" osisID="Example.1.2" xpointer="range(#w8, #q2)" />
>>>> <view type="quote" xpointer="range(#q1, #q2)" />
>>>> <view type="para" xpointer="range(#w1, #p2)" />
>>>> <view type="para" xpointer="range(#w7, #q2)" />
>>>> </concur>
>>>> <content>
>>>> <w id="w1">Then</w>
>>>> <w id="w2">he</w>
>>>> <w id="w3">said</w><w id="p1">,</w>
>>>> <w id="q1">“</w><w id="w4">Let</w>
>>>> <w id="w5">us</w>
>>>> <w id="w6">go</w><w id="p2">...</w>
>>>> <w id="w7">but</w>
>>>> <w id="w8">don't</w>
>>>> <w id="w9">forget</w>
>>>> <w id="w10">your</w>
>>>> <w id="w11">backpack</w><w id="p3">.</w><w id="q2">”</w>
>>>> </content>
>>>>
>>>> By structuring a document like this, multiple overlapping hierarchies
>>>> can be cleanly defined, although they are separated from the underlying
>>>> content. This, however, provides the benefit of clearing up the confusion as
>>>> to where the <verse>, <p>, and <q> elements should be placed: in the concur
>>>> section, they each can share references to the same content elements and so
>>>> their boundaries are specified at the exact same location. This means that
>>>> XML processors would be able to consistently handle each of the hierarchies
>>>> as they interweave throughout the content data.
>>>>
>>>> Efraim Feinstein and James Tauber introduced me to this approach to
>>>> structuring markup. See also:
>>>> http://www.tei-c.org/Guidelines/P4/html/NH.html#NHCO
>>>>
>>>> Weston
>>>>
>>>>
>>>
>>
>
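
As a rough illustration of how a consumer might resolve the stand-off views in the quoted Example.1.1/Example.1.2 fragment, here is a minimal Python sketch. The <concur>, <view> and xpointer names are the made-up ones from the quoted message; the wrapping <doc> root, the regular expression, and the resolve_views() helper are assumptions added only for this illustration, and the range syntax is treated as a plain "first id, last id" pair rather than as a full TEI XPointer implementation.

```python
import re
import xml.etree.ElementTree as ET

# The quoted stand-off example, wrapped in a hypothetical <doc> root so that
# it parses as a single XML document.
DOC = """
<doc>
  <concur>
    <view type="verse" osisID="Example.1.1" xpointer="range(#w1, #w7)" />
    <view type="verse" osisID="Example.1.2" xpointer="range(#w8, #q2)" />
    <view type="quote" xpointer="range(#q1, #q2)" />
    <view type="para" xpointer="range(#w1, #p2)" />
    <view type="para" xpointer="range(#w7, #q2)" />
  </concur>
  <content>
    <w id="w1">Then</w>
    <w id="w2">he</w>
    <w id="w3">said</w><w id="p1">,</w>
    <w id="q1">“</w><w id="w4">Let</w>
    <w id="w5">us</w>
    <w id="w6">go</w><w id="p2">...</w>
    <w id="w7">but</w>
    <w id="w8">don't</w>
    <w id="w9">forget</w>
    <w id="w10">your</w>
    <w id="w11">backpack</w><w id="p3">.</w><w id="q2">”</w>
  </content>
</doc>
"""

# Assumed simplification: a range is exactly "range(#first, #last)".
RANGE_RE = re.compile(r"range\(#(\w+),\s*#(\w+)\)")


def resolve_views(xml_text):
    """Map every <view> to the inclusive run of flat <w> tokens it covers."""
    root = ET.fromstring(xml_text)
    tokens = list(root.find("content"))
    position = {tok.get("id"): i for i, tok in enumerate(tokens)}
    resolved = []
    for view in root.find("concur"):
        match = RANGE_RE.fullmatch(view.get("xpointer"))
        if match is None:
            raise ValueError("unsupported pointer: %r" % view.get("xpointer"))
        start, end = (position[token_id] for token_id in match.groups())
        resolved.append({
            "type": view.get("type"),
            "osisID": view.get("osisID"),
            "start": start,
            "end": end,
            "tokens": [tok.text for tok in tokens[start:end + 1]],
        })
    return resolved


if __name__ == "__main__":
    for view in resolve_views(DOC):
        print(view["type"], view["osisID"], view["tokens"])
```

Because every hierarchy points into the same flat token list, each one can be walked independently without caring how the others are nested.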
To follow up again, here is the Open Siddur project's writeup on the XML
schema they came up with (JLPTEI) and why they didn't go with OSIS. The
problem of concurrent hierarchies was a major concern:

> The primary question then becomes: which structure should be encoded?
> Prose can be divided into paragraphs and sentences, poetic text can be
> divided into line groups and verse lines, lists into items and lists,
> etc. Many parts of the siddur have more than one structure on the
> same text! XML assumes that a document has a pure hierarchical tree
> structure. This suggests that XML is not an appropriate encoding
> technology for the siddur. At the same time, XML encoding is
> nearly universally standard and more software tools support XML-based
> formats than other encoding formats. One of the primary innovations of
> JLPTEI is its particular encoding of concurrent structural hierarchies.
> While the idea is not novel, the implementation is. The potential for
> the existence of concurrent structure is a guiding force in JLPTEI
> design.
>
> The disadvantage of JLPTEI's encoding solutions is that the
> archival form of the text is not immediately consumable by humans. We
> are forced to rely extensively on processing software to make the format
> editable and displayable. The disadvantage, however, is balanced by
> the encoding format's extensibility and conservation of human labor.
>
> The Open Siddur intends to work within open standards whenever
> possible. In choosing a basis for our encoding, we searched for
> available encoding standards that would suit our purposes. We seriously
> considered using Open Scripture Information Standard (OSIS)
> <http://bibletechnologies.net/>, an XML format used for encoding bibles.
> It was quickly discovered that representations of some of the more
> advanced features required to encode the liturgy (such as those
> discussed above) would have to be "hacked" on top of the standard. The
> Text Encoding Initiative (TEI) <http://www.tei-c.org/> XML format is a
> de-facto standard within the digital humanities community. It is also
> specified in well-documented texts, is actively supported by tools, and
> has a large community built around its use and development. Further,
> the standard is deliberately extensible using a relatively simple
> mechanism. The TEI was therefore a natural choice as a basis for our
> encoding.

From <http://wiki.jewishliturgy.org/JLPTEI>
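
To make the "pure hierarchical tree" limitation concrete, the following Python sketch checks which structures from the quoted backpack example cannot be nested inside one another. It assumes the sixteen flat tokens are numbered 0 (w1) through 15 (q2) in document order, so each structure becomes an inclusive (start, end) pair; the SPANS table, its labels, and the crosses() and crossing_pairs() helpers are hypothetical names invented for this illustration.

```python
from itertools import combinations

# Inclusive token positions, assuming the flat <content> order
# w1 w2 w3 p1 q1 w4 w5 w6 p2 w7 w8 w9 w10 w11 p3 q2 is numbered 0..15.
SPANS = {
    "verse Example.1.1": (0, 9),    # range(#w1, #w7)
    "verse Example.1.2": (10, 15),  # range(#w8, #q2)
    "quote": (4, 15),               # range(#q1, #q2)
    "para 1": (0, 8),               # range(#w1, #p2)
    "para 2": (9, 15),              # range(#w7, #q2)
}


def crosses(a, b):
    """True when two spans overlap without one containing the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    disjoint = a_end < b_start or b_end < a_start
    a_in_b = b_start <= a_start and a_end <= b_end
    b_in_a = a_start <= b_start and b_end <= a_end
    return not (disjoint or a_in_b or b_in_a)


def crossing_pairs(spans):
    """List the pairs of structures that a single XML tree cannot nest."""
    return [
        (first, second)
        for first, second in combinations(spans, 2)
        if crosses(spans[first], spans[second])
    ]


if __name__ == "__main__":
    for first, second in crossing_pairs(SPANS):
        print(first, "crosses", second)
```

Any pair reported here would need milestones in a single-tree encoding, whereas the stand-off form simply records both ranges side by side.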
More information about the osis-users
mailing list