[sword-devel] Improvements to osis2mod to handle XML comments and <header> correctly

DM Smith dmsmith at crosswire.org
Mon Apr 5 05:43:34 MST 2010


On 04/05/2010 08:21 AM, DM Smith wrote:
> John,
>
> Sorry for the late reply. This patch looks good and we'll commit it 
> shortly.
>
> Regarding using a "real" parser, it is a good idea. But we don't want 
> SWORD to be dependent on an external parser. The only way I see us 
> doing it is to implement the SAX interface ourselves but allow for an 
> alternative implementation to be used. I don't think that would be too 
> hard or that much of a change.
If not SAX, which is a push parser, then we could use another streaming 
XML parser based on the pull model (StAX, XPP, ...).
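
To make the push/pull distinction concrete, here is a rough sketch. All of
the names (PullTokenizer, Handler, pushParse, EventLog) are made up for
illustration and are not existing SWORD API, and the tokenizer is
deliberately naive. It only shows that a SAX-like push interface can sit on
top of a pull tokenizer, so either side could be swapped for another
implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// All names here are hypothetical; none of this is existing SWORD API.
enum class TokenKind { StartTag, EndTag, Text, Comment, End };

struct Token {
    TokenKind kind;
    std::string data;  // tag name, text, or comment body
};

// Pull model: the caller asks for the next token when it wants one.
// Deliberately naive: attributes are dropped, entities and CDATA are not
// handled, '>' inside an attribute value breaks it, and <x/> is reported
// as a start tag only.
class PullTokenizer {
public:
    explicit PullTokenizer(const std::string &xml) : src(xml), pos(0) {}

    Token next() {
        if (pos >= src.size()) return {TokenKind::End, ""};
        if (src[pos] != '<') {                      // character data
            std::size_t end = src.find('<', pos);
            if (end == std::string::npos) end = src.size();
            Token t{TokenKind::Text, src.substr(pos, end - pos)};
            pos = end;
            return t;
        }
        if (src.compare(pos, 4, "<!--") == 0) {     // comment
            std::size_t end = src.find("-->", pos + 4);
            if (end == std::string::npos) {         // unterminated comment
                Token t{TokenKind::Comment, src.substr(pos + 4)};
                pos = src.size();
                return t;
            }
            Token t{TokenKind::Comment, src.substr(pos + 4, end - (pos + 4))};
            pos = end + 3;
            return t;
        }
        std::size_t close = src.find('>', pos);
        if (close == std::string::npos) {           // unterminated tag
            Token t{TokenKind::Text, src.substr(pos)};
            pos = src.size();
            return t;
        }
        std::string inner = src.substr(pos + 1, close - pos - 1);
        pos = close + 1;
        if (!inner.empty() && inner[0] == '/')
            return {TokenKind::EndTag, inner.substr(1)};
        // Keep only the element name, dropping attributes and a trailing '/'.
        return {TokenKind::StartTag,
                inner.substr(0, inner.find_first_of(" \t\r\n/"))};
    }

private:
    std::string src;
    std::size_t pos;
};

// Push model: the parser drives, and calls back into a handler.
class Handler {
public:
    virtual ~Handler() {}
    virtual void startElement(const std::string &name) = 0;
    virtual void endElement(const std::string &name) = 0;
    virtual void characters(const std::string &text) = 0;
};

// A push front end built on the pull tokenizer. Comments are skipped.
void pushParse(const std::string &xml, Handler &h) {
    PullTokenizer t(xml);
    for (Token tok = t.next(); tok.kind != TokenKind::End; tok = t.next()) {
        switch (tok.kind) {
        case TokenKind::StartTag: h.startElement(tok.data); break;
        case TokenKind::EndTag:   h.endElement(tok.data);   break;
        case TokenKind::Text:     h.characters(tok.data);   break;
        case TokenKind::Comment:  break;
        case TokenKind::End:      break;
        }
    }
}

// Example handler that just records the events it receives.
class EventLog : public Handler {
public:
    std::vector<std::string> events;
    void startElement(const std::string &n) override { events.push_back("start:" + n); }
    void endElement(const std::string &n) override { events.push_back("end:" + n); }
    void characters(const std::string &t) override { events.push_back("text:" + t); }
};
```

With something like this, a tool could be written against Handler, and an
external parser could be plugged in behind the same interface without the
library depending on it.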

>
> In Him,
>     DM
>
> On 02/04/2010 05:31 AM, John Zaitseff wrote:
>> Dear SWORD developers,
>>
>> Firstly, thanks for developing the SWORD library!  I have been using
>> this library, in conjunction with the BibleTime front-end, for many
>> years.
>>
>> I have recently started to develop some OSIS documents of my own.
>> In doing so, I found that the XML parser in osis2mod is somewhat
>> fragile---something that you are, no doubt, aware of.
>>
>> In particular, osis2mod does not handle XML comments at all, nor
>> does it correctly parse the <header> element.  Being able to handle
>> XML comments is, I think, quite important---I like to document the
>> SVN revision ID, for example, in an XML comment.
>>
>> Furthermore, the osis2mod XML parser looks for the first <div> in
>> the document, no matter where that occurs.  In particular, if the
>> OSIS document includes a <revisionDesc> tag in the header, it will
>> have <p> tags as well---which will be translated by transformBSP()
>> into <div> tags---and get used as the starting point for the
>> document!
>>
>> For this reason, I have generated a quick patch that will solve
>> these particular problems.  Could you please apply it to the SVN
>> head for utilities/osis2mod.cpp.  Comments are handled similarly to
>> spaces: they are skipped.  And handleToken() now looks for the first
>> <div> after the </revision> end tag.
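
For anyone following along, the behaviour described here could be sketched
as below. This is not the code from John's patch: firstDivAfter is a
hypothetical helper, and the end tag it waits for is passed in as a
parameter because the patch itself is not reproduced in this message.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not the code from the patch.
// Returns the offset of the first <div ...> start tag that appears after
// endMarker, skipping anything inside <!-- ... --> comments.
// Returns std::string::npos if no such tag exists.
std::size_t firstDivAfter(const std::string &xml, const std::string &endMarker) {
    bool markerSeen = false;
    std::size_t pos = 0;
    while (pos < xml.size()) {
        // Comments are skipped whole, the same way whitespace would be.
        if (xml.compare(pos, 4, "<!--") == 0) {
            std::size_t end = xml.find("-->", pos + 4);
            if (end == std::string::npos) break;   // unterminated comment
            pos = end + 3;
            continue;
        }
        // Ignore every <div> until the marker has gone by.
        if (!markerSeen && xml.compare(pos, endMarker.size(), endMarker) == 0) {
            markerSeen = true;
            pos += endMarker.size();
            continue;
        }
        if (markerSeen && xml.compare(pos, 4, "<div") == 0) {
            // Require a delimiter so that e.g. <divider> does not match.
            char c = (pos + 4 < xml.size()) ? xml[pos + 4] : '\0';
            if (c == '>' || c == '/' || c == ' ' || c == '\t' ||
                c == '\n' || c == '\r')
                return pos;
        }
        ++pos;
    }
    return std::string::npos;
}
```

A <div> produced from a <p> inside the header, or one that only appears
inside a comment, is never returned, because both occur before the marker
or inside a skipped comment.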
>>
>> In general, I think that (perhaps eventually) the proper way to
>> parse XML is to use a library like libxml---which is designed
>> specifically for this purpose.
>>
>> Yours truly,
>>
>> John Zaitseff
>>      
