[sword-devel] Validating ThML and OSIS modules

Chris Little chrislit at crosswire.org
Tue Jan 6 23:09:42 MST 2009



Jonathan Morgan wrote:
> On Tue, Jan 6, 2009 at 3:40 PM, Chris Little <chrislit at crosswire.org> wrote:
>> So write it and submit a patch.
>>
>> [Some basic requirements: Don't add library dependencies to Sword itself,
>> make the validator toggleable at runtime, and ensure that the validation
>> library is in C/C++ and can compile under Win32 and with GCC.]
> 
> It was the expected answer, but my answer is: no.  I do not have time
> to spend on it.  I have not strongly complained about validity of
> modules.  My statement is that if you care about validity, you would
> better spend your time enforcing validity in the importer than arguing
> about it on a mailing list.

I have just a couple of quick comments, from the perspective of a content 
encoder (and ignoring all other roles that I may personally have).

XML validators tend not to be very good for document validation at 
the editing stage. They are fine for confirming whether a document is 
valid, but are often very unhelpful when bringing a document to the 
point of validity. I've tried to use xmllint for that purpose but find 
it a long and frustrating process. Any other validation facility based 
on libxml2 (as xmllint is) would likely be the same. (Xerces might be 
considerably better, and I have a feeling one or two of the editors I 
mention below use it beneath the surface.)

As a content encoder, XML editors give me the best results when I want 
to find encoding errors. They tend to give a better indication of 
patterns of errors, whereas validators might only give the first error 
they identify and then quit. I use Oxygen now, partly because it is a 
Java program so it will run on whatever platform I'm using and they had 
a nice student license, but I have used Topologi & XML Spy in the past 
and they were fine for my needs. I'm sure there are some good OSS XML 
editors out there (though I've seen less encouraging results from jEdit).
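
To illustrate what I mean by patterns of errors, here is a minimal
Python sketch that tallies a validator's error report by element and
message, so that one recurring mistake shows up as a single line with
a count. The message shape ("file:line: element name: message") and
the name summarize_errors are assumptions made for illustration; real
validator output differs, and the pattern would need adjusting for it.

```python
import re
from collections import Counter

# Assumed line shape: "<file>:<line>: element <name>: <message>".
# This is an illustrative format, not any particular validator's output.
ERROR_LINE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+): element (?P<element>[^:]+): (?P<message>.*)$"
)


def summarize_errors(report: str) -> Counter:
    """Count errors by (element, message) so repeated mistakes stand out."""
    counts = Counter()
    for raw in report.splitlines():
        match = ERROR_LINE.match(raw.strip())
        if match:  # lines that do not fit the assumed shape are skipped
            counts[(match.group("element"), match.group("message"))] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample report, for demonstration only.
    sample = "\n".join([
        "sample.xml:10: element verse: validity error : missing attribute",
        "sample.xml:42: element verse: validity error : missing attribute",
        "sample.xml:57: element note: validity error : element not expected",
    ])
    for (element, message), count in summarize_errors(sample).most_common():
        print(f"{count:4d}  <{element}>  {message}")
```

Sorting by count puts the most frequent kind of error first, which is
roughly the overview a good XML editor gives without any extra work.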

Secondly, invalid OSIS according to the schema isn't _always_ 
invalid OSIS according to what we meant the schema to express. That is 
to say, we know there is one outstanding bug in the OSIS schema, and 
there may be others. As for our TEI P5 schema, I maintain it 
myself and it's quite experimental. I've expressed willingness in the 
past to add additional TEI modules to our schema or even to add/adjust 
elements or attributes if we need them. So, in some cases it may be 
important for the encoder to overrule the judgment of the 
validator/schema so that he can encode and import a document he knows to 
be correctly encoded.
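
A rough sketch of what that could look like in an importer follows,
written in Python purely for illustration. Everything in it
(import_document, ImportResult, toy_validator, the enabled and
allow_override switches) is hypothetical and is not part of Sword or
its existing importers. It only shows validation that can be toggled
at runtime and whose verdict the encoder can overrule.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ImportResult:
    imported: bool                                      # did the import go ahead?
    problems: List[str] = field(default_factory=list)   # validator findings


def import_document(
    text: str,
    validate: Callable[[str], List[str]],
    enabled: bool = True,          # runtime toggle for validation
    allow_override: bool = False,  # encoder overrules the validator
) -> ImportResult:
    """Hypothetical importer front end: validate first, then import."""
    problems = validate(text) if enabled else []
    if problems and not allow_override:
        # Refuse the import but report every finding, not just the first.
        return ImportResult(imported=False, problems=problems)
    # A real importer would do its work here; this sketch only records
    # that the import would proceed, keeping any findings as warnings.
    return ImportResult(imported=True, problems=problems)


def toy_validator(text: str) -> List[str]:
    """Stand-in for a real schema validator (assumption, for demo only)."""
    return [] if text.lstrip().startswith("<osis") else ["root element is not <osis>"]


if __name__ == "__main__":
    print(import_document("<div/>", toy_validator))
    print(import_document("<div/>", toy_validator, allow_override=True))
```

With allow_override set, the findings are still returned, so the
encoder sees what the schema objected to even when choosing to import
anyway.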

That is to say, a validator within the importer has some value 
(and I've suggested adding one in the past), but it's not the most 
useful feature for content encoders. A good XML editor is (IMO).

--Chris


