[jsword-devel] [sword-devel] Log error suggest a module issue?
DM Smith
dmsmith at crosswire.org
Tue Apr 16 13:55:12 MST 2013
The modules are well formed as a whole. We also require that chapters be well formed, which is not strictly necessary. If you grab an entire chapter (from verse 0 to the last verse), it will be well formed. If you grab a single verse, which is a fragment of the chapter, it might not be.
The current version of osis2mod ensures that every fragment returned from a module is well formed. So going forward that is true.
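To illustrate the difference, here is a minimal Java sketch using the JDK's strict XML parser. The markup strings are made up for illustration and are not copied from the UKJV module, and wrapping each fragment in a `<div>` root is my assumption for the demo, not necessarily what JSword does. It shows that a "verse 0" fragment holding only the opening tag of a container `<chapter>` is rejected, while the milestoned form and the whole chapter are accepted.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FragmentCheck {

    /**
     * Wraps the fragment in a hypothetical root element and reports
     * whether a strict XML parser accepts the result.
     */
    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(
                new StringReader("<div>" + fragment + "</div>")));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative markup only; not taken from a real module.
        String containerVerse0 =
            "<chapter osisID=\"Exod.21\"><title>Exodus 21</title>";
        String milestonedVerse0 =
            "<chapter sID=\"Exod.21\" osisID=\"Exod.21\"/><title>Exodus 21</title>";
        String wholeChapter =
            "<chapter osisID=\"Exod.21\"><title>Exodus 21</title>text</chapter>";

        System.out.println("container verse 0:  " + isWellFormed(containerVerse0));
        System.out.println("milestoned verse 0: " + isWellFormed(milestonedVerse0));
        System.out.println("whole chapter:      " + isWellFormed(wholeChapter));
    }
}
```

The container form fails because the `</chapter>` end tag lives in a later verse. The milestoned form is an empty element, so every fragment stands on its own.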
Even if we said that we should redo every module, and even if we were to do that, we'd still have to cope with the way it was. Some people have modules that we don't offer any more.
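Coping with older modules means tolerating a parse failure. Below is a rough Java sketch of the fallback described in the quoted message (report the failure, then strip the markup). The class name, the `<div>` wrapper, and the regex-based stripping are all assumptions for illustration; this is not JSword's actual code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FallbackText {

    /**
     * Returns the text of a fragment. Tries a strict parse first; if the
     * fragment is not well formed, reports the problem and falls back to
     * removing anything that looks like a tag.
     */
    static String toPlainText(String fragment) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(
                new StringReader("<div>" + fragment + "</div>")));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Report, then degrade gracefully instead of failing the lookup.
            System.err.println("Parse failed, stripping markup: " + e.getMessage());
            // Crude assumption: everything between '<' and '>' is a tag.
            return fragment.replaceAll("<[^>]*>", "");
        }
    }

    public static void main(String[] args) {
        // Illustrative fragment with an unclosed container <chapter>.
        String broken = "<chapter osisID=\"Exod.21\"><title>Exodus 21</title>";
        System.out.println(toPlainText(broken));
    }
}
```

The cost of this fallback is that all formatting in the affected verse is lost, which is why a repairing parser is the more attractive option.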
In Him,
DM
On Apr 16, 2013, at 4:27 PM, Chris Burrell <chris at burrell.me.uk> wrote:
> Isn't another solution that going forward we change the modules to be using well-formed XML? I guess I'm not understanding...
>
>
> On 16 April 2013 12:55, DM Smith <dmsmith at crosswire.org> wrote:
> Chris,
>
> There is no issue with UKJV, per se.
>
> osis2mod preserves all module markup, perhaps transformed, except the <verse> element. Earlier versions of osis2mod did not transform the <chapter> element to its milestoned version.
>
> This should be considered JSword's problem to deal with, which is what JSword is doing. Whenever JSword encounters a "verse" (in this case verse 0) it uses an xml parser to convert the text into DOM. All xml parsers require well-formed xml and are required to fail otherwise. When that assumption turns out to be wrong, JSword reports the error and then strips the xml from the text.
>
> We have an open issue to do a better job with the handling of broken xml.
>
> There are a couple of improvements:
> Gather the text to display and convert all of it, instead of converting each verse one at a time. This recognizes that a tag opened in one verse may be closed in another. However, it does not work for "verse in isolation" (search results, lookups, parallel viewing, ...)
>
> Use a "lenient" xml parser (by definition there is no such thing) to repair text to be well-formed. I found Flying Saucer and jsoup, which look promising.
>
> The other possibility is to not use an xml parser at all to create the DOM but to do it with our own parsing (like we do for GBF and ThML).
>
> I'll cross-post this to Sword-devel, as that is where this started.
>
> In Him,
> DM
>
> On Apr 15, 2013, at 4:15 PM, Chris Burrell wrote:
>
> > Hi all
> >
> > There is perhaps an issue with the UKJV module. My logs show me:
> >
> > 2013-04-15 20:31:30,236 INFO - UKJV:Exo 21:0: Parse UKJV(Exo 21:0) failed: Error on line 1: The element type "chapter" must be terminated by the matching end-tag "</chapter>".
> >
> > Cheers,
> > Chris
> >
> > _______________________________________________
> > sword-devel mailing list: sword-devel at crosswire.org
> > http://www.crosswire.org/mailman/listinfo/sword-devel
> > Instructions to unsubscribe/change your settings at above page
>
>