[sword-devel] XML idea: modular spec
David Burry
sword-devel@crosswire.org
Thu, 11 Oct 2001 23:55:03 -0700
I've been thinking for a long time about how to provide a reasonable
storage/index mechanism, and still give the end user interface designer
access to the complete Bible in a variety of XML ways depending on the
needs of the application. There has been previous discussion on this list
regarding this; I called it looking at the data in different "slices" and
Patrick Durusau called it "concurrent markup"
(http://www.sbl-site2.org/Extreme2001/Concur.html).
However!!! <light goes on> I just thought of a great idea today about this
(I think, you tell me).... What if the Bible were stored in compressed
and/or indexed form on disk, yet "virtually" available/queryable as a
large, repetitious XML-type object, from which you could extract just the
portion/format you need with, say, an XPath or XQuery statement?
What I mean is: suppose the Bible were stored in a binary/text compressed
and/or indexed format, but available for query _as_if_ it were in this
kind of format:
<version name="kjv">
  <book name="genesis">
    <chapter>
      <verse><paragraphmarker/>contents of verse 1</verse>
      <verse>contents of verse 2</verse>
      <verse><paragraphmarker/>etc</verse>
      ...
    </chapter>
    ...
    <paragraph><chaptermarker/><versemarker/>contents of verse 1
      <versemarker/>contents of verse 2</paragraph>
    <paragraph><versemarker/>etc</paragraph>
    ...
  </book>
</version>
(Notice I didn't put paragraphs inside chapters because in fact paragraphs
can occasionally straddle chapter boundaries.)
You can see I'm proposing that the entire thing appear twice in the
simple example above, but it only has to be "virtually" duplicated, not
actually recorded twice anywhere on disk or in memory. It allows you to
specify an XPath of
"/version[@name='kjv']/book[@name='genesis']/chapter/verse" to grab the
contents of all the verses in genesis in a verse-by-verse fashion with
paragraph markers, but
"/version[@name='kjv']/book[@name='genesis']/paragraph" to grab the same
contents in a paragraph-by-paragraph fashion with chapter/verse
markers. It's great because a properly extended scheme like this could
allow you to query the Bible and get your results in many different
forms: by chapter, verse, paragraph, sentence, word, etc.!
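
To make the two queries concrete, here is a minimal sketch in Python that
runs them against the small sample above, written out literally (so this is
the "really duplicated" tree, not the virtual one). It assumes Python's
standard xml.etree.ElementTree, which supports only a subset of XPath, so
the paths are written relative to the <version> root and the version name
is checked separately. The texts() helper is a made-up name for
illustration.

```python
import xml.etree.ElementTree as ET

# The sample from above, with the "..." elisions left out.
SAMPLE = """
<version name="kjv">
  <book name="genesis">
    <chapter>
      <verse><paragraphmarker/>contents of verse 1</verse>
      <verse>contents of verse 2</verse>
      <verse><paragraphmarker/>etc</verse>
    </chapter>
    <paragraph><chaptermarker/><versemarker/>contents of verse 1<versemarker/>contents of verse 2</paragraph>
    <paragraph><versemarker/>etc</paragraph>
  </book>
</version>
"""

root = ET.fromstring(SAMPLE)  # root is the <version> element
assert root.get("name") == "kjv"


def texts(path):
    """Return the text of each element matched by path, markers dropped."""
    return ["".join(el.itertext()) for el in root.findall(path)]


# Same content, two different "slices".
by_verse = texts("./book[@name='genesis']/chapter/verse")
by_paragraph = texts("./book[@name='genesis']/paragraph")

print(by_verse)      # ['contents of verse 1', 'contents of verse 2', 'etc']
print(by_paragraph)  # ['contents of verse 1contents of verse 2', 'etc']
```

A full XPath engine would accept the absolute paths exactly as written
above; the relative form is only a limitation of this particular library.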
This would mean that we'd have to glue an XPath or XQuery parser into our
data store in a way it probably wasn't originally designed for, so that we
can interpret the query first and then reconstitute the requested XML from
our data store without building the entire extended, duplicated XML tree.
But it's certainly possible, and more and more of this kind of tooling is
becoming modular, so it can only get easier to do in time... perhaps
someone else has even already thought of and done something like this.
Anyone know of any?
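
As a rough illustration of "interpret the query first, then reconstitute",
here is a sketch in Python where the text is stored once and each view is
generated on demand. Everything in it is an assumption for illustration:
the STORE layout, the function names, and especially the query() call,
which takes a plain "shape" string where a real XPath/XQuery parser would
go.

```python
import xml.etree.ElementTree as ET

# Hypothetical single stored copy: (chapter, verse, starts_paragraph, text).
STORE = {
    ("kjv", "genesis"): [
        (1, 1, True, "contents of verse 1"),
        (1, 2, False, "contents of verse 2"),
        (1, 3, True, "etc"),
    ],
}


def verse_view(rows):
    """Yield one <verse> per row, with paragraph markers inline."""
    for _chapter, _verse, starts_paragraph, text in rows:
        verse = ET.Element("verse")
        if starts_paragraph:
            marker = ET.SubElement(verse, "paragraphmarker")
            marker.tail = text  # text follows the empty marker element
        else:
            verse.text = text
        yield verse


def paragraph_view(rows):
    """Yield one <paragraph> per paragraph, with chapter/verse markers inline."""
    paragraph = None
    last_chapter = None
    for chapter, _verse, starts_paragraph, text in rows:
        if starts_paragraph or paragraph is None:
            if paragraph is not None:
                yield paragraph
            paragraph = ET.Element("paragraph")
        if chapter != last_chapter:
            ET.SubElement(paragraph, "chaptermarker")
            last_chapter = chapter
        marker = ET.SubElement(paragraph, "versemarker")
        marker.tail = text
    if paragraph is not None:
        yield paragraph


# Stand-in for a parsed query: the trailing path steps pick the view.
VIEWS = {"chapter/verse": verse_view, "paragraph": paragraph_view}


def query(version, book, shape):
    """Build only the requested view; the other one is never materialized."""
    rows = STORE[(version, book)]
    return [ET.tostring(el, encoding="unicode") for el in VIEWS[shape](rows)]


verses = query("kjv", "genesis", "chapter/verse")
paragraphs = query("kjv", "genesis", "paragraph")
```

The point of the design is that paragraphs are built from a flat run of
verses rather than nested inside chapters, so a paragraph that straddles a
chapter boundary just gets a chaptermarker in the middle.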
thoughts? comments?
Dave