<div dir="ltr">Isn't another solution to change the modules going forward so that they use well-formed XML? I guess I'm not understanding...</div><div class="gmail_extra"><br><br><div class="gmail_quote">On 16 April 2013 12:55, DM Smith <span dir="ltr">&lt;<a href="mailto:dmsmith@crosswire.org" target="_blank">dmsmith@crosswire.org</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">Chris,<br>
<br>
There is no issue with UKJV, per se.<br>
<br>
osis2mod preserves all module markup, perhaps transformed, except the &lt;verse&gt; element. Earlier versions of osis2mod did not transform the &lt;chapter&gt; element to its milestoned version.<br>
<br>
This should be considered JSword's problem to deal with, which is what JSword is doing. Whenever JSword encounters a "verse" (in this case verse 0), it uses an XML parser to convert the text into a DOM. All XML parsers require well-formed XML and are required to fail otherwise. When the parse fails, JSword reports the error and then strips the XML markup from the text.<br>
<br>
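A minimal sketch of the parse-then-strip behaviour described above, assuming the JDK's built-in DOM parser. The class name, the method names, the wrapping &lt;div&gt; root and the sample strings are hypothetical illustrations, not JSword's actual code:<br>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not part of JSword.
class VerseParseSketch {

    /** Parses one verse fragment; returns null when it is not well-formed. */
    static Document tryParse(String verseMarkup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            // A fragment may hold several top-level nodes, so wrap it in one root.
            String wrapped = "<div>" + verseMarkup + "</div>";
            return builder.parse(new InputSource(new StringReader(wrapped)));
        } catch (SAXException e) {
            return null; // not well-formed: the parser is required to fail
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Crude fallback: removes anything that looks like a tag. */
    static String stripTags(String markup) {
        return markup.replaceAll("<[^>]*>", "").trim();
    }

    /** Uses the DOM when parsing succeeds, otherwise the stripped text. */
    static String toPlainText(String verseMarkup) {
        Document doc = tryParse(verseMarkup);
        if (doc != null) {
            return doc.getDocumentElement().getTextContent().trim();
        }
        return stripTags(verseMarkup);
    }

    public static void main(String[] args) {
        // Made-up "verse 0" content with an unclosed container element.
        String broken = "<chapter osisID=\"Exod.21\"><title>Exodus 21</title>";
        // The same content with the chapter written as a milestone.
        String milestoned = "<chapter osisID=\"Exod.21\" sID=\"Exod.21\"/>sample text";
        System.out.println(toPlainText(broken));
        System.out.println(toPlainText(milestoned));
    }
}
```

The unclosed &lt;chapter&gt; takes the fallback path and loses all of its markup, while the milestoned form parses normally.<br>
<br>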
We have an open issue to improve the handling of broken XML.<br>
<br>
There are a couple of improvements:<br>
Gather the text to display and convert all of it, instead of converting each verse one at a time. This recognizes that a tag opened in one verse may be closed in another. However, it does not work for "verse in isolation" (search results, lookups, parallel viewing, ...)<br>
<br>
Use a "lenient" XML parser (by definition there is no such thing, since a conforming XML parser must reject malformed input) to repair the text so that it is well-formed. I found Flying Saucer and jsoup, which look promising.<br>
<br>
The other possibility is not to use an XML parser at all to create the DOM, but to build it with our own parsing (as we do for GBF and ThML).<br>
<br>
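To illustrate the first improvement listed above, here is a rough sketch, again assuming the JDK's DOM parser. The class, the methods and the sample verse strings are made up for the example and are not JSword code. Each verse fails on its own because the tag opened in one verse is closed in the next, but the gathered text parses:<br>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not part of JSword.
class PassageParseSketch {

    /** Returns true when the fragment, wrapped in a single root, parses as XML. */
    static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(
                    new StringReader("<div>" + fragment + "</div>")));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Gathers all verses of the passage first, then checks the whole text once. */
    static boolean passageIsWellFormed(List<String> verses) {
        return isWellFormed(String.join("", verses));
    }

    public static void main(String[] args) {
        // Made-up verses: a quotation opened in one verse and closed in the next.
        List<String> verses = Arrays.asList(
                "<q>first half of a quotation, ",
                "second half of the quotation.</q>");
        for (String verse : verses) {
            System.out.println("alone: " + isWellFormed(verse));
        }
        System.out.println("gathered: " + passageIsWellFormed(verses));
    }
}
```

This also shows the limitation noted above: a single verse shown in isolation still cannot be parsed this way.<br>
<br>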
</div>I'll cross-post this to Sword-devel, as that is where this started.<br>
<br>
In Him,<br>
DM<br>
<div class="HOEnZb"><div class="h5"><br>
On Apr 15, 2013, at 4:15 PM, Chris Burrell wrote:<br>
<br>
> Hi all<br>
><br>
> There is perhaps an issue with the UKJV module. My logs show me:<br>
><br>
> 2013-04-15 20:31:30,236 INFO - UKJV:Exo 21:0: Parse UKJV(Exo 21:0) failed: Error on line 1: The element type "chapter" must be terminated by the matching end-tag "&lt;/chapter&gt;".<br>
><br>
> Cheers,<br>
> Chris<br>
><br>
</div></div><div class="HOEnZb"><div class="h5"><br>
<br>
</div></div></blockquote></div><br></div>