[sword-devel] HELP! Need your feedback on XML Markup Language
Troy A. Griffitts
sword-devel@crosswire.org
Fri, 17 Aug 2001 17:32:40 -0700
You're such a smart alec! :)
Liked the paper, though it stretched my knowledge of XPointer and XSLT.
Might I suggest, rather than forcing the granularity down to the smallest
PCDATA in all associative meta hierarchies, that you designate one
document the 'master'; let it raise its hierarchy from a flat 'word' to
something that the other ancillary hierarchies _might_ have in
common (e.g. module/testament/chapter/verse/word, or anything it
wishes). Force key attributes to be unique at all levels (like you have
done for 'word') in the master. This will allow a greatly reduced size
and complexity for the additional auxiliary hierarchies, and remove the
redundant PCDATA from the auxiliary documents.
To use your example:
Dub your mostly unchanged Pages document the 'master' (just for example
purposes; any file could be dubbed the 'master', but it looks like we
get the most benefit from this first choice). I've added unique
attributes, per our 'master' document requirements above, throughout
the document (l1[2]=l3, and l2[2]=l4):
<pages>
  <page id="p1">
    <line id="l1">
      <w id="w1">This</w>
      <w id="w2">is</w>
    </line>
    <line id="l2">
      <w id="w3">text</w>
    </line>
  </page>
  <page id="p2">
    <line id="l3">
      <w id="w4">in</w>
      <w id="w5">a</w>
      <w id="w6">base</w>
    </line>
    <line id="l4">
      <w id="w7">file</w>
    </line>
  </page>
</pages>
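The uniqueness requirement on the master can be checked mechanically. Here is a minimal sketch in Python, assuming the master is the Pages document above held in a string; the names MASTER and duplicate_ids are hypothetical and only for illustration:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: the master Pages document from the example, held as a string.
MASTER = """
<pages>
  <page id="p1">
    <line id="l1"><w id="w1">This</w><w id="w2">is</w></line>
    <line id="l2"><w id="w3">text</w></line>
  </page>
  <page id="p2">
    <line id="l3"><w id="w4">in</w><w id="w5">a</w><w id="w6">base</w></line>
    <line id="l4"><w id="w7">file</w></line>
  </page>
</pages>
"""

def duplicate_ids(xml_text):
    """Return a sorted list of id values used more than once, at any level."""
    root = ET.fromstring(xml_text)
    # Count every id attribute in the document, regardless of element name.
    counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    return sorted(i for i, n in counts.items() if n > 1)
```

With ids unique at every level, an auxiliary document can point at a page, a line, or a word without ambiguity.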
This allows your Text document to be reduced from:
<text>
  <para id="p1">
    <w id="w1">This</w>
    <w id="w2">is</w>
    <w id="w3">text</w>
    <w id="w4">in</w>
    <w id="w5">a</w>
    <w id="w6">base</w>
    <w id="w7">file</w>
  </para>
</text>
to:
<text>
  <para id="p1">
    <page id="p1" />
    <page id="p2" />
  </para>
</text>
Clauses from:
<clauses>
  <clause id="c1">
    <s>
      <w id="w1">This</w>
    </s>
    <p>
      <w id="w2">is</w>
    </p>
    <c>
      <w id="w3">text</w>
    </c>
    <a>
      <w id="w4">in</w>
      <w id="w5">a</w>
      <w id="w6">base</w>
      <w id="w7">file</w>
    </a>
  </clause>
</clauses>
to:
<clauses>
  <clause id="c1">
    <s>
      <w id="w1" />
    </s>
    <p>
      <w id="w2" />
    </p>
    <c>
      <w id="w3" />
    </c>
    <a>
      <page id="p2" />
    </a>
  </clause>
</clauses>
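To show that the reduced Clauses document still carries the same information, here is a rough sketch in Python of how a reader might expand the empty reference elements against the master. It assumes every reference is resolved by id against the master only; the names MASTER, CLAUSES, build_index, words, and resolve are hypothetical:

```python
import xml.etree.ElementTree as ET

# Assumption: the master Pages document and the reduced Clauses document,
# both held as strings.
MASTER = """
<pages>
  <page id="p1">
    <line id="l1"><w id="w1">This</w><w id="w2">is</w></line>
    <line id="l2"><w id="w3">text</w></line>
  </page>
  <page id="p2">
    <line id="l3"><w id="w4">in</w><w id="w5">a</w><w id="w6">base</w></line>
    <line id="l4"><w id="w7">file</w></line>
  </page>
</pages>
"""

CLAUSES = """
<clauses>
  <clause id="c1">
    <s><w id="w1" /></s>
    <p><w id="w2" /></p>
    <c><w id="w3" /></c>
    <a><page id="p2" /></a>
  </clause>
</clauses>
"""

def build_index(master_root):
    """Map every id in the master to its element (ids are unique by requirement)."""
    return {el.get("id"): el for el in master_root.iter() if el.get("id") is not None}

def words(ref, index):
    """Expand one reference element into the word texts it stands for."""
    target = index[ref.get("id")]
    # iter("w") yields the target itself if it is a <w>, otherwise every
    # <w> below it in document order.
    return [w.text for w in target.iter("w")]

def resolve(aux_text, master_text):
    """Return {slot tag: [words]} for each slot of each clause in the auxiliary document."""
    index = build_index(ET.fromstring(master_text))
    out = {}
    for clause in ET.fromstring(aux_text).iter("clause"):
        for slot in clause:
            out[slot.tag] = [w for ref in slot for w in words(ref, index)]
    return out
```

Calling resolve(CLAUSES, MASTER) gives back the four words of the <a> slot from the single <page id="p2" /> reference, so the word list never has to be repeated outside the master.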
The saving in space we see here is minimal, but I believe the approach
reduces error-prone redundancy and provides a mechanism that could save
far more space as the documents grow.
Please ignore me if I'm way off base. Just my 1/2 cent's worth.
-Troy.