[jsword-devel] bug report: thml parsing error
NF
lzj369 at gmail.com
Fri Mar 13 12:50:32 MST 2009
Dear DM/other buddies,
While trying to compile a gen book module, I noticed that the current
transformation of ThML modules has a bug. I am not sure whether it has
already been fixed.
Take text like this as input:
<p>仅仅有爱是不够的你也许记得甲壳虫乐队唱的那句歌词:“你需要的只是爱。”我完全不赞同此观点。 </p>
The above sentence is in Chinese, but I guess the issue will be the same
for other languages.
After transforming, it became:
<p仅仅有爱是不够的你也许记得甲壳虫乐队唱的那句歌词:“你需要的只是爱。”我完全不赞同此观点。> </p>
I also debugged other existing ThML modules and found that they are
not transformed correctly either.
The affected class is PrettySerializingContentHandler:
    public void startElement(String uri, String localname, String qname,
            Attributes attrs) {
        if (depth > 0) {
            handlePending();
        }
        write(getTagStart());
        write(decorateTagName(localname));
        for (int i = 0; i < attrs.getLength(); i++) {
            write(' ');
            write(decorateAttributeName(XMLUtil.getAttributeName(attrs, i)));
            write("='"); //$NON-NLS-1$
            write(decorateAttributeValue(XMLUtil.escape(attrs.getValue(i))));
            write('\'');
        }
        pendingEndTag = true;
        lookingForChars = false; // <== this is the line I added
        depth++;
    }
I added the marked line so I could move on, but there is still something
wrong. I will continue to research this issue when I get a chance.
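To make the symptom easier to discuss, here is a minimal sketch of a serializer that defers writing the closing '>' of a start tag. It is not the JSword code: the class DeferredTagSketch, its constructor flags, and the bodies of characters(), endElement() and handlePending() are all assumptions made up for illustration. It also assumes that lookingForChars can still be true when a root element starts, which I have not confirmed.

```java
// Hypothetical stand-in for a serializer with a deferred '>' (NOT the JSword class).
final class DeferredTagSketch {
    private final StringBuilder out = new StringBuilder();
    private final boolean resetInStartElement; // true = apply the one-line workaround
    private boolean pendingEndTag;             // '>' of the last start tag not written yet
    private boolean lookingForChars;           // true while text chunks are being appended
    private int depth;

    DeferredTagSketch(boolean resetInStartElement, boolean staleFlag) {
        this.resetInStartElement = resetInStartElement;
        this.lookingForChars = staleFlag;      // assumption: flag left over from earlier text
    }

    void startElement(String name) {
        if (depth > 0) {
            handlePending();
        }
        out.append('<').append(name);
        pendingEndTag = true;
        if (resetInStartElement) {
            lookingForChars = false;           // the line added in the report
        }
        depth++;
    }

    void characters(String text) {
        if (!lookingForChars) {
            handlePending();                   // closes the start tag before the text
        }
        out.append(text);
        lookingForChars = true;
    }

    void endElement(String name) {
        handlePending();
        out.append("</").append(name).append('>');
        depth--;
    }

    private void handlePending() {
        if (pendingEndTag) {
            pendingEndTag = false;
            out.append('>');
        }
        lookingForChars = false;
    }

    // Serializes <p>text</p> starting from a stale lookingForChars flag.
    static String serialize(boolean resetInStartElement) {
        DeferredTagSketch h = new DeferredTagSketch(resetInStartElement, true);
        h.startElement("p");
        h.characters("text");
        h.endElement("p");
        return h.out.toString();
    }
}
```

In this sketch, serialize(false) puts the text inside the start tag, much like the broken output above, while serialize(true) closes the tag before the text. Whether the real handler reaches the stale state the same way still needs to be checked.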
Take this line, for example:
<p>This text was prepared by Logos Research Systems, Inc. from an
edition marked as follows:</p><p align="center">Auburn:<br>Derby and
Miller.<br>Buffalo:<br>Geo. H. Derby and Co.<br>1853</p>
When the HTML parser parses it, it throws an exception on the <br> tag,
which is good. Then the parser strips out all the tags and returns the
text only:
<root>This text was prepared by Logos Research Systems, Inc. from an
edition marked as follows:Auburn:Derby and Miller.Buffalo:Geo. H.
Derby and Co.1853
</root>
So in the front end it seems to have worked, but in fact it did not.
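The behaviour described above can be reproduced outside JSword with the JDK's own XML parser. The following is only a sketch: StrictParseSketch, isWellFormed and stripTags are hypothetical names, and the regex-based stripTags is my stand-in for whatever fallback the real code uses, not a copy of it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of JSword.
final class StrictParseSketch {

    // Returns true if the fragment, wrapped in <root>, parses as well-formed XML.
    static boolean isWellFormed(String fragment) {
        String xml = "<root>" + fragment + "</root>";
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // An unclosed <br> ends up here: XML requires <br/> or <br></br>.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not what this sketch is about.
            throw new IllegalStateException(e);
        }
    }

    // Assumed fallback: drop everything that looks like a tag and keep the text.
    static String stripTags(String fragment) {
        return "<root>" + fragment.replaceAll("<[^>]*>", "") + "</root>";
    }
}
```

For the fragment `<p>Auburn:<br>Derby</p>`, isWellFormed returns false and stripTags returns `<root>Auburn:Derby</root>`, so the paragraph and line-break structure is silently lost even though text still shows up in the front end.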
Thanks,
ZJ Li