[sword-devel] XML idea: modular spec
David Burry
sword-devel@crosswire.org
Fri, 12 Oct 2001 00:11:33 -0700
By the way, it has occurred to me in re-reading Patrick's page that he
probably said the same thing I just did, only he said so much, and in such
a way, that it's hard to absorb.... ;o)
Dave
At 11:55 PM 10/11/2001 -0700, David Burry wrote:
>I've been thinking for a long time about how to provide a reasonable
>storage/index mechanism and still give the end-user interface designer
>access to the complete Bible in a variety of XML forms, depending on the
>needs of the application. This has been discussed on this list before:
>I called it looking at the data in different "slices", and
>Patrick Durusau called it "concurrent markup"
>(http://www.sbl-site2.org/Extreme2001/Concur.html).
>
>However!!! <light goes on> I just thought of a great idea today about
>this (I think, you tell me).... What if the Bible were stored in
>compressed and/or indexed form on disk, yet "virtually"
>available/queryable as a large repetitious XML type object, from which you
>could extract just the portion/format you need, with say, an XPath or
>XQuery statement?
>
>What I mean is: suppose the Bible were stored in a binary/text
>compressed and/or indexed format, but were available for query _as_if_ it
>were in this kind of format:
>
><version name="kjv">
>  <book name="genesis">
>    <chapter>
>      <verse><paragraphmarker/>contents of verse 1</verse>
>      <verse>contents of verse 2</verse>
>      <verse><paragraphmarker/>etc</verse>
>      ...
>    </chapter>
>    ...
>    <paragraph><chaptermarker/><versemarker/>contents of verse 1
>      <versemarker/>contents of verse 2</paragraph>
>    <paragraph><versemarker/>etc</paragraph>
>    ...
>  </book>
></version>
>
>(Notice I didn't put paragraphs inside chapters because in fact paragraphs
>can occasionally straddle chapter boundaries.)
>
>You can see I'm proposing that the entire text appear twice in the
>simple example above, but it only has to be "virtually" duplicated,
>not actually recorded twice anywhere on disk or in memory. It allows you
>to specify an XPath of
>"/version[@name='kjv']/book[@name='genesis']/chapter/verse" to grab the
>contents of all the verses in Genesis in a verse-by-verse fashion with
>paragraph markers, or
>"/version[@name='kjv']/book[@name='genesis']/paragraph" to grab the same
>contents in a paragraph-by-paragraph fashion with chapter/verse
>markers. It's great because a properly extended scheme like this could
>allow you to query the Bible and get your results in many different
>chapter/verse/paragraph/sentence/word/etc. forms!
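
To make the two queries concrete, here is a minimal sketch in Python using
the standard xml.etree.ElementTree module. It is only an illustration, not
anything SWORD does today. Assumptions: the sample text is the toy document
above with "etc" replaced by a made-up third verse, flat_text is a
hypothetical helper, and because ElementTree supports only a subset of
XPath, the paths are written relative to the <version> root element rather
than as absolute paths.

```python
import xml.etree.ElementTree as ET

# Toy document with both "slices" written out in full. In the proposal the
# duplication would only be virtual; here it is literal, for illustration.
SAMPLE = """\
<version name="kjv">
  <book name="genesis">
    <chapter>
      <verse><paragraphmarker/>contents of verse 1</verse>
      <verse>contents of verse 2</verse>
      <verse><paragraphmarker/>contents of verse 3</verse>
    </chapter>
    <paragraph><chaptermarker/><versemarker/>contents of verse 1 <versemarker/>contents of verse 2</paragraph>
    <paragraph><versemarker/>contents of verse 3</paragraph>
  </book>
</version>
"""


def flat_text(elem):
    """Hypothetical helper: all text inside elem, with marker elements dropped."""
    return "".join(elem.itertext()).strip()


root = ET.fromstring(SAMPLE)  # root is the <version> element

# Verse-by-verse slice (paragraph markers remain available as child elements).
verses = [flat_text(v)
          for v in root.findall("book[@name='genesis']/chapter/verse")]

# Paragraph-by-paragraph slice (chapter/verse markers remain as children).
paragraphs = [flat_text(p)
              for p in root.findall("book[@name='genesis']/paragraph")]

print(verses)
print(paragraphs)
```

The same underlying text comes back either as three verses or as two
paragraphs, depending only on which path is asked for.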
>
>This would mean that we'd have to glue an XPath or XQuery parser into our
>data store in a way it probably wasn't originally designed for, so that we
>can interpret the query first and then reconstitute only the requested XML
>from our data store, without building the entire expanded, duplicated XML
>tree. But it's certainly possible, and since this kind of software is
>becoming more and more modular, it can only get easier over
>time... perhaps someone else has even already thought of and done something
>like this. Anyone know of any?
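
A rough sketch of the "interpret the query first, then reconstitute" idea,
again in Python. Everything here is hypothetical: STORE, verse_view,
paragraph_view and query are made-up names, the record layout is an
assumption, and a plain lookup table of two view names stands in for a real
XPath or XQuery parser. The point is only that each verse is stored once
and just the requested slice is built, on demand.

```python
import xml.etree.ElementTree as ET

# Hypothetical compact store: each verse is recorded exactly once as
# (chapter, verse, starts_paragraph, text).
STORE = {
    ("kjv", "genesis"): [
        (1, 1, True, "contents of verse 1"),
        (1, 2, False, "contents of verse 2"),
        (1, 3, True, "contents of verse 3"),
    ],
}


def _marker_then_text(parent, tag, text):
    """Append an empty marker element to parent, followed by text."""
    marker = ET.SubElement(parent, tag)
    marker.tail = text


def verse_view(records):
    """Lazily yield <verse> elements, with <paragraphmarker/> where needed."""
    for _chapter, _verse, starts_par, text in records:
        elem = ET.Element("verse")
        if starts_par:
            _marker_then_text(elem, "paragraphmarker", text)
        else:
            elem.text = text
        yield elem


def paragraph_view(records):
    """Lazily yield <paragraph> elements with chapter/verse markers inside."""
    current = None
    last_chapter = None
    for chapter, _verse, starts_par, text in records:
        if starts_par or current is None:
            if current is not None:
                yield current
            current = ET.Element("paragraph")
        if chapter != last_chapter:
            ET.SubElement(current, "chaptermarker")
            last_chapter = chapter
        _marker_then_text(current, "versemarker", text)
    if current is not None:
        yield current


# Stand-in for query parsing: map the tail of the path to a view builder.
VIEWS = {"chapter/verse": verse_view, "paragraph": paragraph_view}


def query(version, book, view):
    """Build only the requested view; the other view is never constructed."""
    return VIEWS[view](STORE[(version, book)])


for para in query("kjv", "genesis", "paragraph"):
    print(ET.tostring(para, encoding="unicode"))
```

Because both views are generators over the same records, a paragraph that
straddles a chapter boundary would simply pick up a second chapter marker
partway through, with nothing stored twice.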
>
>thoughts? comments?
>
>Dave