[jsword-devel] bug report: thml parsing error

NF lzj369 at gmail.com
Fri Mar 13 12:50:32 MST 2009


Dear DM/other buddies,

While trying to compile a gen book module, I noticed that the current
transformation of ThML modules has a bug. I am not sure whether it has
already been fixed.

Take text like this as input:

<p>仅仅有爱是不够的你也许记得甲壳虫乐队唱的那句歌词:“你需要的只是爱。”我完全不赞同此观点。 </p>


The above sentence is in Chinese, but I guess the issue will be the same
for other languages.

After the transformation, it became:

<p仅仅有爱是不够的你也许记得甲壳虫乐队唱的那句歌词:“你需要的只是爱。”我完全不赞同此观点。> </p>


I have also debugged other existing ThML modules and found that they are
not transformed either.

The affected class is PrettySerializingContentHandler:

	public void startElement(String uri, String localname, String qname,
			Attributes attrs) {
		if (depth > 0) {
			handlePending();
		}

		write(getTagStart());
		write(decorateTagName(localname));

		for (int i = 0; i < attrs.getLength(); i++) {
			write(' ');
			write(decorateAttributeName(XMLUtil.getAttributeName(attrs, i)));
			write("='"); //$NON-NLS-1$
			write(decorateAttributeValue(XMLUtil.escape(attrs.getValue(i))));
			write('\'');
		}

		pendingEndTag = true;
		lookingForChars = false; // <== this is the line I added
		depth++;
	}

I added the line marked above so that I could move on, but something is
still wrong. I will continue to research this issue when I get a chance.
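
To illustrate why that one line changes the output, here is a minimal
sketch. It is a simplified, hypothetical model (the class
PendingTagSketch, its String-based callbacks and the replay() helper are
made up for this example), not the real PrettySerializingContentHandler.
It assumes that lookingForChars is still true from an earlier
characters() call when startElement() runs at depth 0. That is only one
possible way to reach the bad state.

```java
/** Hypothetical, simplified model of the pending '>' logic; not the JSword class. */
public class PendingTagSketch {
    private final StringBuilder out = new StringBuilder();
    private final boolean resetInStartElement; // true = apply the one-line workaround
    private boolean pendingEndTag;   // the '>' of the current start tag is not written yet
    private boolean lookingForChars; // the previous callback was characters()
    private int depth;

    public PendingTagSketch(boolean resetInStartElement) {
        this.resetInStartElement = resetInStartElement;
    }

    // Writes the delayed '>' if one is pending and clears the text flag.
    private void handlePending() {
        if (pendingEndTag) {
            pendingEndTag = false;
            out.append('>');
        }
        lookingForChars = false;
    }

    public void startElement(String name) {
        if (depth > 0) {
            handlePending();
        }
        out.append('<').append(name);
        pendingEndTag = true;
        if (resetInStartElement) {
            lookingForChars = false; // the added line from the report
        }
        depth++;
    }

    public void characters(String text) {
        if (!lookingForChars) {
            handlePending();
        }
        out.append(text);
        lookingForChars = true;
    }

    public void endElement(String name) {
        handlePending();
        out.append("</").append(name).append('>');
        depth--;
    }

    public String result() {
        return out.toString();
    }

    /** Replays an (assumed) stale characters() call followed by <p>hello</p>. */
    public static String replay(boolean resetInStartElement) {
        PendingTagSketch h = new PendingTagSketch(resetInStartElement);
        h.characters("");      // assumption: leaves lookingForChars == true
        h.startElement("p");
        h.characters("hello");
        h.endElement("p");
        return h.result();
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // <phello></p>
        System.out.println(replay(true));  // <p>hello</p>
    }
}
```

Without the reset, the text is written before the delayed '>', which
has the same shape as the broken output shown earlier. With the reset,
characters() flushes the '>' first.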

A second problem shows up with this input:

<p>This text was prepared by Logos Research Systems, Inc. from an
edition marked as follows:</p><p align="center">Auburn:<br>Derby and
Miller.<br>Buffalo:<br>Geo. H. Derby and Co.<br>1853</p>

When the HTML parser parses it, it throws an exception on the <br> tag,
which is good. Then the parser strips out all the tags and returns only
the text:
<root>This text was prepared by Logos Research Systems, Inc. from an
edition marked as follows:Auburn:Derby and Miller.Buffalo:Geo. H.
Derby and Co.1853
</root>
So in the front end it seems to have worked, but in fact it did not.
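
The behaviour described above can be sketched roughly as follows. This
is only an illustration under assumptions: the class
StripFallbackSketch and its methods are hypothetical, and the regular
expression used to strip tags is my guess at what such a fallback might
do, not the actual JSword code. It uses the standard JAXP SAX parser to
show that an unclosed <br> is rejected as not well-formed, and that a
silent fallback then hides the failure from the caller.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch of a "parse strictly, else strip tags" fallback. */
public class StripFallbackSketch {

    /** Returns true if the fragment, wrapped in <root>, is well-formed XML. */
    public static boolean isWellFormed(String fragment) {
        String doc = "<root>" + fragment + "</root>";
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(doc)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // e.g. an unclosed <br> followed by </p>
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Keeps the markup when it parses; otherwise silently drops every tag. */
    public static String toXml(String fragment) {
        if (isWellFormed(fragment)) {
            return "<root>" + fragment + "</root>";
        }
        // Assumed fallback: remove anything that looks like a tag.
        String textOnly = fragment.replaceAll("<[^>]*>", "");
        return "<root>" + textOnly + "</root>";
    }

    public static void main(String[] args) {
        System.out.println(toXml("<p>Auburn:<br>Derby and Miller.</p>"));
        System.out.println(toXml("<p>Auburn:<br/>Derby and Miller.</p>"));
    }
}
```

The first call loses the paragraph and the line break without any
visible error, which is why the front end appears to work. Reporting
the failure to the caller instead of falling back silently would make
the problem visible.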

Thanks,

ZJ Li


