[sword-devel] Fix for & (Re: Updating Clarke commentary to become readable)
DM Smith
dmsmith555 at yahoo.com
Wed Sep 27 19:19:21 MST 2006
On Sep 27, 2006, at 12:24 PM, Karl Kleinpaste wrote:
>
> That is, the reason & wasn't being properly handled is because [a]
> all those EscapeSequences in thmlhtml.cpp being commented out leads to
> handleEscapeString() returning false -- no substitutions exist -- and
> so [b] because passThruUnknownEsc is false (see ctor), all &symbols;
> are dropped. The code was actually willfully eliminating every
> possible such &symbol;. Turning on passThruUnknownEsc lets them go by
> unmolested.
I think the intention of the code was to let known entities pass
through.
>
> What I don't know is if this should be considered a correct fix,
> rather than just one that makes it work for me, a GS user. That is,
> why would it ever be desirable _not_ to pass a &symbol; just because
> it's not known to the particular substitution set coded? Hence, I
> *think* it's correct just to turn on passThruUnknownEsc globally, but
> I'm not positive.
I think what you found is that the code was designed to strip out
unknown entities. Entities that HTML does not handle should not be
passed through. So if there were an entity &disclaimer;, for example,
it should be stripped. Commenting out the block of
addEscapeStringSubstitute calls changed that behavior: with no
substitutions left, every entity counted as unknown and was stripped.
I think the correct fix is to have something *like* the following:
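bool ThMLHTML::substituteEscapeString(SWBuf &buf, const char *escString) {
    DualStringMap::iterator it;

    // Look the entity name up in the substitution map, upper-casing
    // it first when matching is case-insensitive.
    if (!escStringCaseSensitive) {
        char *tmp = 0;
        stdstr(&tmp, escString);
        toupperstr(tmp);
        it = p->escSubMap.find(tmp);
        delete [] tmp;
    }
    else {
        it = p->escSubMap.find(escString);
    }

    if (it != p->escSubMap.end()) {
        // This is the only change: a known entity is passed through
        // unchanged instead of being replaced by its substitution.
        // It probably should use the declared escapeStart and escapeEnd
        // rather than hard-coding '&' and ';'.
        // (Append piece by piece: '&' + escString would be pointer
        // arithmetic on a const char *, not string concatenation.)
        buf += '&';
        buf += escString;
        buf += ';';
        return true;
    }
    // Unknown entity: report failure so it gets stripped.
    return false;
}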
And then uncomment the section in the ThMLHTML ctor that declares the
entity replacements. Note that the famous four entities (&amp;, &quot;,
&gt; and &lt;) should not be replaced, but passed through. IE does not
handle &apos; in XHTML (I don't know about HTML), so it should be
replaced. I'm not sure that all browsers used by Sword applications
can handle Latin-1 entities; recent ones do. The lowest common
denominator would be to replace them with the corresponding Latin-1 or
UTF-8 characters, depending on the output encoding.