[jsword-devel] JSword and Map/Image modules

Brian Fernandes infernalproteus at gmail.com
Sat Jan 31 07:20:08 MST 2009


DM,

I could find no way to attach a patch to the bug report other than pasting 
it directly into a comment, so I am attaching the patches here instead.

BD-135.txt is a fix for the Map TOC display in BD.
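The content of BD-135.txt is not reproduced in this archive. As a rough illustration of option (a) from the quoted discussion (a list/tree decision for maps based on the type of book object), here is a hypothetical Java sketch. It assumes a map module can be backed either by a dictionary-style book with flat keys or by a general book with nested keys. `MapTocViewChooser`, `DictionaryBook`, `GeneralBook` and `TocView` are illustrative stand-ins, not JSword or BibleDesktop classes.

```java
// Hypothetical sketch only: these types are stand-ins, not the JSword API.
final class MapTocViewChooser {
    // Assumed: the two kinds of book object a map module may be backed by.
    interface Book {}
    static final class DictionaryBook implements Book {} // flat list of keys
    static final class GeneralBook implements Book {}    // keys may nest

    enum TocView { LIST, TREE }

    private MapTocViewChooser() {}

    // For maps only: pick the TOC widget from the runtime type of the book.
    static TocView chooseMapView(Book map) {
        // A flat key set is shown as a list; anything else gets a tree,
        // since general books can contain nested entries.
        return (map instanceof DictionaryBook) ? TocView.LIST : TocView.TREE;
    }
}
```

Keying on the object's type rather than on the "Maps" category alone keeps the change local to maps, which matches the scope DM agreed to.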

closehtmltags.txt uses a regular expression to close <br>, <hr> and 
<img> tags that are not already closed. This is a one-shot operation: 
all three tags are fixed together, parsing is attempted once more, and 
the last "remove all tags" failsafe runs only if that parse also fails.
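The actual regular expression is in the scrubbed attachment, so the following is only a hypothetical Java sketch of the kind of pattern described. It closes `<br>`, `<hr>` and `<img ...>` (any case, attributes kept) whose `>` is not already preceded by `/`, and leaves `<br/>` and `<br />` alone. The names `CloseHtmlTags` and `closeEmptyTags` are illustrative, not taken from the patch.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the code from closehtmltags.txt.
final class CloseHtmlTags {
    // <br>, <hr> or <img ...> whose closing '>' is not preceded by '/'.
    // Group 1 is the tag name (case preserved), group 2 any attributes.
    // The \b stops longer names such as <break> from matching.
    private static final Pattern UNCLOSED_EMPTY_TAG = Pattern.compile(
            "<(br|hr|img)\\b([^>]*?)\\s*(?<!/)>", Pattern.CASE_INSENSITIVE);

    private CloseHtmlTags() {}

    // Rewrites every unclosed empty tag in a single pass, e.g. "<br>" -> "<br/>".
    static String closeEmptyTags(String markup) {
        return UNCLOSED_EMPTY_TAG.matcher(markup).replaceAll("<$1$2/>");
    }
}
```

For example, `closeEmptyTags("a<br>b")` yields `a<br/>b`, and `<img src="a.jpg" >` becomes `<img src="a.jpg"/>`.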

I have tested these with about four map modules that fail with the 
current version of BD (the tag fix, though, will be applied to every ThML 
module for which parsing fails). Please review, and feel free to modify 
or correct as you see fit.
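The order of recovery steps can be sketched in Java as follows, assuming character clean-up has already happened and using the JDK's DOM parser as a stand-in for whatever BD actually parses ThML with. Following DM's advice, all three tags are fixed before a single re-parse. `ThmlFallbackParser`, `Stage`, `Result` and the `<div>` wrapper used to give a fragment one root element are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the fallback order; not BD's actual code.
final class ThmlFallbackParser {
    enum Stage { AS_IS, EMPTY_TAGS_CLOSED, TAGS_REMOVED }

    static final class Result {
        final Stage stage;
        final Document document; // null only if even the failsafe text fails
        Result(Stage stage, Document document) {
            this.stage = stage;
            this.document = document;
        }
    }

    private static final Pattern UNCLOSED_EMPTY_TAG = Pattern.compile(
            "<(br|hr|img)\\b([^>]*?)\\s*(?<!/)>", Pattern.CASE_INSENSITIVE);
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    private ThmlFallbackParser() {}

    // Wraps the fragment in an assumed <div> root; returns null if not well-formed.
    static Document tryParseFragment(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // no stderr noise on errors
            return builder.parse(new InputSource(new StringReader("<div>" + fragment + "</div>")));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return null;
        }
    }

    static Result parse(String thml) {
        Document doc = tryParseFragment(thml);
        if (doc != null) {
            return new Result(Stage.AS_IS, doc);
        }
        // One shot: close <br>, <hr> and <img> together, then parse once.
        doc = tryParseFragment(UNCLOSED_EMPTY_TAG.matcher(thml).replaceAll("<$1$2/>"));
        if (doc != null) {
            return new Result(Stage.EMPTY_TAGS_CLOSED, doc);
        }
        // Failsafe: drop all markup and escape whatever is left.
        String text = ANY_TAG.matcher(thml).replaceAll("")
                .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return new Result(Stage.TAGS_REMOVED, tryParseFragment(text));
    }
}
```

With this ordering, `"a<br>b"` is recovered at the `EMPTY_TAGS_CLOSED` stage, while `"<p>unclosed paragraph"` still falls through to the tag-stripping failsafe.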

Thanks,
Brian.


DM Smith wrote:
> 
> On Jan 29, 2009, at 9:54 AM, Brian Fernandes wrote:
> 
>>>>
>>>> The options I see are:
>>>> a) Change logic (just for maps) to make a list/tree decision based 
>>>> on the type of book object.
>>> I think this is the right choice.
>>
>> I have a patch ready, I'll attach it to the bug report shortly.
> 
> Many thanks!
> 
>>
>>
> >>> Is <br> the only element in HTML defined to have no content? If 
> >>> we have a complete list, I'd be happy for your suggested change to be 
> >>> added.
>>
> >> That's a good question. I found a list of empty tags here:
> >> http://everything2.com/title/Empty%2520HTML%2520Tags (most lists are 
> >> very similar). Of these, I think only BR, HR and IMG make sense in 
> >> our context. What do you think?
>>
>> I was testing further this morning and noticed that there are several
>> other Map modules in Karl's repository which failed to show images in
>> BD. So it seems like Karl has several modules with bad ThML. The good 
>> news is that a really simple "<br>" to "<br/>" fix I put in was enough 
>> to get all of these working.
>>
> >> In HTML, it's common to see unclosed <img> tags as well, though none 
> >> of Karl's modules had that problem. What is a good source for modules 
> >> with problems? It would be helpful to see different problems across 
> >> different modules so that a more robust (but simple) solution could be 
> >> developed (definitely not something as advanced as what you suggested 
> >> earlier).
>>
> >> Do you think we should add a single step after the character clean-up 
> >> that fixes <br>, <hr> and <img> in one go? Or does it make more sense 
> >> to fix them one at a time and attempt to parse after each fix?
> 
> I would suggest: Do what is easiest to understand and maintain. But hold 
> parsing until all are done.
> 
> Again thanks,
> 
>     DM
> 
> _______________________________________________
> jsword-devel mailing list
> jsword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/jsword-devel
> 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bd-135.txt
URL: <http://www.crosswire.org/pipermail/jsword-devel/attachments/20090131/09be51ca/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: closehtmltags.txt
URL: <http://www.crosswire.org/pipermail/jsword-devel/attachments/20090131/09be51ca/attachment-0001.txt>

