[sword-devel] Improvements to osis2mod to handle XML comments and <header> correctly

DM Smith dmsmith at crosswire.org
Mon Apr 5 09:05:31 MST 2010


On 04/05/2010 09:03 AM, Dmitrijs Ledkovs wrote:
> On 5 April 2010 13:55, Manfred Bergmann<manfred.bergmann at me.com>  wrote:
>    
>> Hi DM.
>>
>> Am 05.04.2010 um 13:21 schrieb DM Smith:
>>
>>      
>>> Regarding using a "real" parser, it is a good idea. But we don't want SWORD to be dependent on an external parser.
>>>        
>> What's the reason for that?
>> I could understand if it meant the user had to install certain libraries manually, but when the sources can be integrated into the project and have an appropriate licence, then why not?
>>
>>
>> Manfred
>>
>>      
> IMHO there is no harm in bringing in libxml or a much more lightweight
> parser like GMarkup. The build system just needs to be adjusted to
> link e.g. libxml into the osis2mod binary and not the shared sword library.
> It could even be a new tool, called osisxml2mod for example, that is
> built optionally, so that you can still have a full sword dev
> environment without libxml.
>
> Tools for creating modules do not have to be linked with sword or even
> live in the sword tarball / svn, although that does help with consistent
> distribution of the tools.
>    
I don't remember all of Troy's reasoning when I argued for a true parser.

From what I recall:
o To maintain freedom to re-license SWORD (e.g. for some other Bible
society) we need to keep third-party library dependencies well
managed. The license needs to be compatible with the GPL but cannot be GPL.

o The parser that we have is minimal and simple, sacrificing accuracy
and completeness for speed. Regarding accuracy, e.g. the parser allows
for spaces around = in attribute declarations. Regarding completeness,
e.g. it does not handle namespaces, CDATA, DTDs/schemas, ....
Significantly, it does not require a well-formed document, so it can
process fragments. Rather than raising an error, it continues where an
XML parser is required to stop.

o This parser has better error reporting in that it is based upon 
knowledge of the input. E.g. it reports the verse having the problem.

o By SWORD having the parser, we are not dependent on finding an 
implementation for every platform (e.g. Windows).
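To illustrate the well-formedness point in the second item above, here is a small sketch in Python (not SWORD code). It assumes a made-up OSIS-like fragment and uses the standard library's xml.etree.ElementTree as a stand-in for a conforming XML parser; the names FRAGMENT and strict_parse_ok are only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical OSIS-like fragment: two sibling elements with no single root.
FRAGMENT = (
    '<verse osisID="Gen.1.1">In the beginning</verse>'
    '<verse osisID="Gen.1.2">And the earth</verse>'
)


def strict_parse_ok(text):
    """Return True if a conforming XML parser accepts text as a document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # A conforming parser must stop here; it may not guess and continue.
        return False


if __name__ == "__main__":
    print(strict_parse_ok(FRAGMENT))                      # fragment, no root
    print(strict_parse_ok("<div>" + FRAGMENT + "</div>"))  # wrapped in a root
```

The bare fragment is rejected because a document needs exactly one root element; wrapping it in a root makes it acceptable. A lenient parser like ours simply keeps going in the first case.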

There may be other reasons. I'm willing to live with it.

But what we really need is not a parser but a tokenizer. I'm thinking 
about writing one (my degree work was in compiler writing). Basically, 
we repeat the same tokenization code in several places. It should be 
trivial to write a complete, accurate one.
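As a rough idea of what I mean by a tokenizer, here is a minimal sketch in Python. It is not the SWORD code and nowhere near complete: it only splits input into comments, tags and text, does no well-formedness checking, and would mis-split an attribute value containing < or >. The names TOKEN_RE and tokenize are invented for the example.

```python
import re

# Alternatives are tried in order: a comment, a tag, a run of text,
# and finally a stray '<' so that no input character is ever dropped.
TOKEN_RE = re.compile(r'<!--.*?-->|<[^<>]*>|[^<]+|<', re.DOTALL)


def tokenize(text):
    """Yield (kind, value) pairs; kind is 'comment', 'tag' or 'text'."""
    for match in TOKEN_RE.finditer(text):
        value = match.group(0)
        if value.startswith('<!--') and value.endswith('-->'):
            kind = 'comment'
        elif len(value) > 1 and value.startswith('<') and value.endswith('>'):
            kind = 'tag'
        else:
            # Plain text, including a lone '<' that never closed.
            kind = 'text'
        yield kind, value


if __name__ == "__main__":
    sample = '<!-- note --><verse osisID="Gen.1.1">In the beginning</verse>'
    for kind, value in tokenize(sample):
        print(kind, repr(value))
```

Because it never requires a root element or balanced tags, the same loop works on fragments, and the callers (osis2mod and the filters) could share it instead of each repeating their own scanning code.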

In His Service,
     DM


