[sword-devel] XML idea: modular spec

David Burry sword-devel@crosswire.org
Thu, 11 Oct 2001 23:55:03 -0700


I've been thinking for a long time about how to provide a reasonable 
storage/index mechanism and still give the end-user interface designer 
access to the complete Bible in a variety of XML forms, depending on the 
needs of the application.  There has been previous discussion of this on 
this list: I called it looking at the data in different "slices", and 
Patrick Durusau called it "concurrent markup" 
(http://www.sbl-site2.org/Extreme2001/Concur.html).

However!!! <light goes on>  I just thought of a great idea today about this 
(I think, you tell me)....  What if the Bible were stored in compressed 
and/or indexed form on disk, yet "virtually" available/queryable as one 
large, repetitious XML-type object, from which you could extract just the 
portion/format you need with, say, an XPath or XQuery statement?

What I mean is that, suppose the Bible were stored in a binary/text 
compressed and/or indexed format, but available for query _as_if_ it were 
in this kind of format:

<version name="kjv">
   <book name="genesis">
     <chapter>
       <verse><paragraphmarker/>contents of verse 1</verse>
       <verse>contents of verse 2</verse>
       <verse><paragraphmarker/>etc</verse>
        ...
     </chapter>
     ...
     <paragraph><chaptermarker/><versemarker/>contents of verse 1<versemarker/>contents of verse 2</paragraph>
     <paragraph><versemarker/>etc</paragraph>
     ...
   </book>
</version>

(Notice I didn't put paragraphs inside chapters because in fact paragraphs 
can occasionally straddle chapter boundaries.)

You can see I'm proposing that the entire text appear twice in the simple 
example above, but it only has to be "virtually" duplicated, not actually 
recorded twice anywhere on disk or in memory.  It allows you to 
specify an XPath of 
"/version[@name='kjv']/book[@name='genesis']/chapter/verse" to grab the 
contents of all the verses in genesis in a verse-by-verse fashion with 
paragraph markers, but 
"/version[@name='kjv']/book[@name='genesis']/paragraph" to grab the same 
contents in a paragraph-by-paragraph fashion with chapter/verse 
markers.  The appeal is that a scheme like this, properly extended, could 
let you query the Bible and get your results in many different forms: by 
chapter, verse, paragraph, sentence, word, etc.!
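
To make the two queries concrete, here is a minimal sketch in Python.  The 
choice of Python and its standard xml.etree.ElementTree module is my 
assumption for illustration, not part of the proposal.  ElementTree only 
supports a subset of XPath, so the paths below are written relative to the 
<version> root and the version name is checked separately.  Note that this 
parses a fully written-out copy of the small example, which is exactly what 
the real thing would avoid doing.

```python
import xml.etree.ElementTree as ET

# The small example from above, written out in full for demonstration only.
SAMPLE = """\
<version name="kjv">
  <book name="genesis">
    <chapter>
      <verse><paragraphmarker/>contents of verse 1</verse>
      <verse>contents of verse 2</verse>
      <verse><paragraphmarker/>etc</verse>
    </chapter>
    <paragraph><chaptermarker/><versemarker/>contents of verse 1<versemarker/>contents of verse 2</paragraph>
    <paragraph><versemarker/>etc</paragraph>
  </book>
</version>
"""

root = ET.fromstring(SAMPLE)
assert root.tag == "version" and root.get("name") == "kjv"

# Verse-by-verse slice (paragraph markers are child elements of each verse).
verses = root.findall("./book[@name='genesis']/chapter/verse")

# Paragraph-by-paragraph slice (chapter/verse markers are child elements).
paragraphs = root.findall("./book[@name='genesis']/paragraph")

# itertext() collects the text around the empty marker elements.
verse_texts = ["".join(v.itertext()) for v in verses]
paragraph_texts = ["".join(p.itertext()) for p in paragraphs]

print(verse_texts)
print(paragraph_texts)
```

Both slices carry the same underlying text; only the grouping and the 
markers differ.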

This would mean that we'd have to glue an XPath or XQuery parser onto our 
data store in a way it probably wasn't originally designed for, so that we 
can interpret the query first and then reconstitute only the requested XML 
from our data store, without building the entire extended, duplicated XML 
tree.  But it's certainly possible, and this kind of software is becoming 
more and more modular, so it can only get easier over time... perhaps 
someone else has already thought of this and done something like it.  Does 
anyone know of any?
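
Here is a rough sketch of the "interpret the query first, then reconstitute" 
idea, again in Python.  Everything in it is hypothetical: the Verse record, 
the STORE dictionary, and the query() function are names I made up, and a 
plain view name ("chapter" or "paragraph") stands in for a real parsed 
XPath/XQuery expression.  The point is only that the text is held once and 
each XML slice is built on demand.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Verse:
    chapter: int
    text: str
    starts_paragraph: bool


# Hypothetical data store: each verse is recorded exactly once.
STORE = {
    ("kjv", "genesis"): [
        Verse(1, "contents of verse 1", True),
        Verse(1, "contents of verse 2", False),
        Verse(1, "etc", True),
    ]
}


def verse_view(verses):
    """Build <chapter> elements holding <verse> children with paragraph markers."""
    chapters = {}
    for v in verses:
        if v.chapter not in chapters:
            chapters[v.chapter] = ET.Element("chapter")
        el = ET.SubElement(chapters[v.chapter], "verse")
        if v.starts_paragraph:
            # Text after an empty marker element lives in the marker's tail.
            ET.SubElement(el, "paragraphmarker").tail = v.text
        else:
            el.text = v.text
    return list(chapters.values())


def paragraph_view(verses):
    """Build <paragraph> elements with chapter/verse markers inside."""
    paragraphs = []
    prev_chapter = None
    for v in verses:
        if v.starts_paragraph or not paragraphs:
            paragraphs.append(ET.Element("paragraph"))
        para = paragraphs[-1]
        if v.chapter != prev_chapter:
            ET.SubElement(para, "chaptermarker")
            prev_chapter = v.chapter
        ET.SubElement(para, "versemarker").tail = v.text
    return paragraphs


# The view name stands in for the last step of a parsed query path.
VIEWS = {"chapter": verse_view, "paragraph": paragraph_view}


def query(version, book, view):
    """Reconstitute only the requested slice from the single stored copy."""
    return VIEWS[view](STORE[(version, book)])


for element in query("kjv", "genesis", "paragraph"):
    print(ET.tostring(element, encoding="unicode"))
```

Because paragraphs are built independently of chapters, a paragraph that 
straddles a chapter boundary simply gets a chapter marker in the middle.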

thoughts?  comments?

Dave