[osis-core] Segmentation.
Steve DeRose
osis-core@bibletechnologieswg.org
Wed, 19 Jun 2002 15:26:20 -0400
At 12:36 PM -0500 06/05/02, Todd Tillinghast wrote:
>There are several types/classes of hierarchies that could be segmented
>using our schema. Stepping back and looking at the big picture, it
>seems to me that we need to determine which types/classes of hierarchies
>can be segmented and which elements within each logical hierarchy can be
>segmented.
>
>Assuming that there can be more than one hierarchy segmented
>simultaneously there needs to be clear guidelines that detail which
>elements go together to reconstitute the logical elements that were
>originally segmented. And it would be helpful if there were "best
>practices" regarding the identifiers used for xxxID, next, and previous
>attributes of the segmented elements.
Yup; the rules for reconstituting are actually a really interesting
current research problem -- so I'd like to punt on it if we can for
now.
I like the best-practice idea though -- how about this:
A) All parts of a broken verse get the same verse ID.
B) The next and previous chain will use the verseID, but with "_a"
"_b" and so on tacked on, in document order.
C) Other element types that must be broken will label their next/prev chain by
the nearest verse identifier, "_", the element type, another "_", and a letter.
Just the first thing that came to mind....
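To make A through C concrete, here is a minimal sketch in Python. The
function names and the "Matt.1.1" verse ID are placeholders made up for
illustration, and the cap of 26 segments is an assumption of the sketch,
not part of the proposal:

```python
from string import ascii_lowercase


def segment_labels(verse_id, count, element_type=None):
    """Build next/prev chain labels for `count` segments, in document order.

    Rule B: verse segments get verse_id + "_a", "_b", ...
    Rule C: other elements get the nearest verse_id, "_", the element
            type, another "_", and a letter.
    """
    if not 1 <= count <= len(ascii_lowercase):
        raise ValueError("this sketch handles 1 to 26 segments only")
    prefix = verse_id if element_type is None else f"{verse_id}_{element_type}"
    return [f"{prefix}_{letter}" for letter in ascii_lowercase[:count]]


def link_chain(labels):
    """Pair each label with its prev/next neighbours (None at the ends)."""
    chain = []
    for i, label in enumerate(labels):
        prev_label = labels[i - 1] if i > 0 else None
        next_label = labels[i + 1] if i + 1 < len(labels) else None
        chain.append({"label": label, "prev": prev_label, "next": next_label})
    return chain


# A verse broken into three parts (hypothetical ID), and a <q> broken in two.
verse_chain = link_chain(segment_labels("Matt.1.1", 3))
quote_labels = segment_labels("Matt.1.1", 2, element_type="q")
```

Per rule A, every part of the broken verse would still carry the same
verseID ("Matt.1.1" here); only the chain labels differ.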
>
>The trickiest piece seems to be lowest level container of "actual"
>scripture text. If we say that "actual" scripture text must always be
>directly contained by <verse> (or within <abbr>, <foreign>,
><inscription>, <name>, or a simple <q> contained within <verse>) then
><verse> elements will ALWAYS hold the identifiers that allow us to
>reconstitute "pure" verses that were segmented. However, as it stands
>it is POSSIBLE and even NATURAL to encode "actual" scripture text in
><lineGroup>/<line>, <q>, <list>/<item>, <p>, and <blockQuote> without
>any <verse> elements at all (or with a mixture including some <verse>
>elements).
>
>If we identify a "role" for elements that is "lowest level container of
>'actual' scripture", then when reconstituting the text into logical
>verses, elements acting in this "role" could be identified INDEPENDENT of
>their element name. This would allow any element acting in this
>"role" to act the same as a <verse> element for the purpose of
>identification. In fact that is what we have said we would like to do
>with <p> when it is exactly one verse. This would eliminate the COMMON
>cases where you see.
Interesting.... Are there some elements that would only *sometimes*
be the lowest, though? Hmmm. So if we allowed verseID on a lot of
things, people could save the double markup....
What do people think on this? Patrick also called it interesting; did
I miss any other replies?
This would be a new way of using markup -- in effect, any element
that had a non-empty "verseID" attribute would be held to be a verse.
Kind of like architectural forms, except that it goes by having the
attribute name, rather than by value. I'm slightly inclined not to go
for it, but mainly for non-technical reasons like minimizing
last-minute changes, and the fact that it is a pretty novel
construct. It would certainly save some markup for people doing this
by hand....
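For what it's worth, a processor reading the markup this way would not
need much. A rough sketch, assuming Python's standard
xml.etree.ElementTree; the sample markup, the wrapping <div>, and the
function name are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: three elements act as verses, one has an empty verseID.
SAMPLE = """<div>
  <p verseID="a">...</p>
  <lineGroup><line verseID="b">...</line></lineGroup>
  <p><verse verseID="c">...</verse></p>
  <p verseID="">...</p>
</div>"""


def verse_role_elements(root):
    """Return, in document order, every element with a non-empty verseID.

    The element name is ignored on purpose: having the attribute is what
    puts an element in the verse "role".
    """
    return [el for el in root.iter() if el.get("verseID")]


root = ET.fromstring(SAMPLE)
found = [(el.tag, el.get("verseID")) for el in verse_role_elements(root)]
```

Here `found` holds the <p>, <line>, and <verse> that carry IDs a, b,
and c; the <p> with the empty verseID is skipped.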
>
><line><verse verseID="...">...</verse></line>
>and
><p><verse verseID="...">...</verse></p>
>
>replace them with
><line verseID="...">...</line>
>and
><p verseID="...">...</p>
>
>but does not prevent
><p>
> <verse verseID="a">...</verse>
> <verse verseID="b">...</verse>
> <verse verseID="c">...</verse>
> <verse verseID="d">...</verse>
></p>
>
>This does not PRECLUDE the more complicated cases where there are
>multiple hierarchies segmented simultaneously.
>
><p pID="s" next="t">
> <verse verseID="x">...</verse>
> <verse verseID="a" verseNext="b">...</verse>
></p>
><p pID="t" prev="s" verseID="b" versePrev="a">...</p>
>
>For an element to take on this proposed "role" it would simply assign
>a value to its "verseID" attribute and the appropriate "verseNext" and
>"versePrev" attributes. If the same element were segmented through
>its participation in another logical hierarchy then the element
>specific xID attribute and next/prev attributes would be assigned
>appropriate values.
>
>SUMMARY: There are a lot of elements that naturally take on the "role"
>of "lowest level container of 'actual' scripture". In order to simplify,
>allow a discrete set of elements to all perform the same role as
><verse>. When reconstituting, simply go to the next element with an
>attribute verseID with a value equal to the current node's verseNext and
>a versePrev with a value that matches the current node's verseID. Other
>segmentation would require the element name to be the same as its other
>parts. (This makes a special case out of <verse> which simplifies
>encoding and element construction.)
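The reconstitution step described in the summary can be sketched as
follows. This is only an illustration in Python using the standard
xml.etree.ElementTree: the wrapping <div> and the function name are
assumptions added to make the example parse and run, and it assumes
every verseID in the document is unique (as in the example with pID
"s" and "t" above), which is not the same as rule A earlier in this
message.

```python
import xml.etree.ElementTree as ET

# The two-hierarchy example from above, wrapped in a hypothetical <div>.
SAMPLE = """<div>
<p pID="s" next="t">
 <verse verseID="x">...</verse>
 <verse verseID="a" verseNext="b">...</verse>
</p>
<p pID="t" prev="s" verseID="b" versePrev="a">...</p>
</div>"""


def reconstitute(root):
    """Group verse-role elements into logical verses via verseNext/versePrev."""
    by_id = {el.get("verseID"): el for el in root.iter() if el.get("verseID")}
    verses = []
    for el in by_id.values():
        if el.get("versePrev"):
            continue  # a later segment; it is reached from its chain head
        parts, current = [el], el
        while current.get("verseNext"):
            nxt = by_id.get(current.get("verseNext"))
            # Both directions must agree, regardless of element name.
            if nxt is None or nxt.get("versePrev") != current.get("verseID"):
                raise ValueError("broken chain at " + current.get("verseID"))
            parts.append(nxt)
            current = nxt
        verses.append([(p.tag, p.get("verseID")) for p in parts])
    return verses


logical_verses = reconstitute(ET.fromstring(SAMPLE))
```

With that sample, verse "x" stands alone, and the <verse> with ID "a"
is joined to the <p> with ID "b" even though the element names differ.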
>
>PROPOSAL: Create an abstract type that defines the attributes and
>possible child elements of an element acting in the proposed "role".
>Derive all elements that can act in this role from this element.
>
>Todd
--
Steve DeRose -- http://www.stg.brown.edu/~sjd
Chair, Bible Technologies Group -- http://www.bibletechnologies.net
Email: sderose@speakeasy.net
Backup email: sderose@mac.com, sjd@stg.brown.edu