[sword-devel] XML DOM

DM Smith dmsmith555 at yahoo.com
Fri Mar 9 12:28:30 MST 2007


DJ Ortley wrote:
> Looking through the source code, it seems to me (which are key words 
> that indicate this is only an opinion, one which may not be worth 
> much) that using a library such as Xerces or some sort of XML DOM like 
> library would be of benefit.
>
> I was wondering if any thought had been given to that previously?

This is the approach that JSword uses. We actually use JAXP, which is an 
interface layer over a pluggable XML implementation. So in some cases 
we use Crimson and in others we use Xerces. It all depends upon what is 
bundled with the user's JDK. SAX is a better model than DOM for most 
processing, as most processing does not need an object representation of 
the document.
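To illustrate the SAX versus DOM point, here is a minimal sketch. It is 
not JSword code and it does not use JAXP; it uses Python's standard 
xml.sax and xml.dom.minidom modules only because they show the same two 
models compactly. The SAMPLE fragment, the TextCollector class and both 
function names are made up for this example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical OSIS-like fragment, invented for this example.
SAMPLE = (
    '<verse osisID="Gen.1.1">In the '
    '<w lemma="H7225">beginning</w> God created</verse>'
)


class TextCollector(xml.sax.ContentHandler):
    """SAX handler that keeps only character data and builds no tree."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # Called by the parser for each run of text between tags.
        self.chunks.append(content)


def text_via_sax(xml_text):
    """Event model: text is collected as the parser streams past it."""
    handler = TextCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.chunks)


def text_via_dom(xml_text):
    """Tree model: the whole document becomes objects, then we walk it."""
    doc = minidom.parseString(xml_text)

    def walk(node):
        if node.nodeType == node.TEXT_NODE:
            return node.data
        return "".join(walk(child) for child in node.childNodes)

    return walk(doc.documentElement)


if __name__ == "__main__":
    print(text_via_sax(SAMPLE))
    print(text_via_dom(SAMPLE))
```

Both functions return the same text. The difference is that the DOM 
version first allocates a node for every element, attribute and text 
run, which is work that plain text extraction never uses.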

That said, I think that there are significant advantages and also 
disadvantages to using a full XML library in Sword.
To me the most significant advantages are that it is a full 
implementation of an XML parser and that we don't need to maintain it.

Disadvantages:
It is a full implementation of the XML parser. Sword doesn't need a full 
implementation of the parser. Our documents have a well defined 
vocabulary (i.e. the DTD specs) and we only need a parser sufficient to 
parse that vocabulary.

Parsing serves two purposes: search/indexing, i.e. stripping out only 
the text of the "verse", and display, i.e. converting the raw module 
source into some kind of presentation source. The former benefits from 
being very fast. Sword's "stripping" routines are built for speed, and it 
would be a huge performance loss to use a true XML parser there. Parsing 
for display, for the most part, can afford to be slower, because even 
then it will likely be fast enough.
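To make the "stripping" idea concrete, here is a rough sketch of a 
single-pass routine that keeps only the text outside of tags. It is not 
Sword's actual implementation (which is C++); the function name 
strip_markup and the sample input are assumptions made for this example, 
and Python is used only for brevity, so it says nothing about real 
timings.

```python
def strip_markup(raw):
    """Copy characters that are outside <...>; drop everything inside.

    Deliberately not a real XML parser: no entity decoding, no comments
    or CDATA handling, and a '>' inside an attribute value would end the
    tag early. It also does not check well-formedness.
    """
    out = []
    in_tag = False
    for ch in raw:
        if ch == "<":
            in_tag = True          # start skipping at the tag opener
        elif ch == ">" and in_tag:
            in_tag = False         # resume copying after the tag closer
        elif not in_tag:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical OSIS-like fragment, invented for this example.
    raw = (
        '<verse osisID="Gen.1.1">In the '
        '<w lemma="H7225">beginning</w> God created</verse>'
    )
    print(strip_markup(raw))
```

The routine does one comparison or two per character and keeps no state 
beyond a flag. That is the kind of shortcut a fixed, well defined 
vocabulary allows and a general XML parser cannot take.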

The other thing is that the Sword library has taken a least common 
denominator approach to its requirements. It is targeted at small 
handhelds (phones, PDAs and the like) and at computers of all ages, 
colors and creeds. A fairly large library would need to be optional 
(like curl, icu4c and lucene), and that would still leave the need for 
the current custom parsing.

Earlier I submitted a patch to make the parser more accurate, and it was 
rejected as a performance hit and as too big and risky a change. These 
were the reasons that I was given.


