[sword-devel] Fix for &
DM Smith
dmsmith555 at yahoo.com
Thu Sep 28 05:27:40 MST 2006
On Sep 28, 2006, at 1:27 AM, Chris Little wrote:
> What is more, there is absolutely no necessity to use any entity other
> than &amp; and &lt; in Sword. Entities other than the XML set (&amp;,
> &quot;, &apos;, &lt;, &gt;) are not supported at all in Sword and should
> not be used. There is no good reason to do so.
Just a minor quibble:
&quot; is necessary within an attribute value, unless the attribute
is quoted with '. Because SWORD programmatically generates XML and
always uses " to quote attributes, it is necessary.
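To illustrate the point about double-quoted attributes, here is a minimal sketch in Python. It is not SWORD's code; the helper names quote_attr and make_tag are hypothetical and exist only for this example.

```python
def quote_attr(value: str) -> str:
    """Escape a string for use inside a double-quoted XML attribute."""
    # Replace & first so the entities added afterwards are not escaped again.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace('"', "&quot;"))


def make_tag(name: str, attrs: dict) -> str:
    """Build an empty element, always quoting attribute values with "."""
    parts = "".join(f' {key}="{quote_attr(val)}"' for key, val in attrs.items())
    return f"<{name}{parts}/>"


print(make_tag("note", {"n": 'He said "amen"'}))
```

Because the generator always quotes with ", an apostrophe in the value can pass through unescaped, while a literal " has to become &quot;.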
&gt; is necessary in a few instances, e.g. <![CDATA[....]]> and <?....?>.
But I don't think they will occur in a SWORD module.
&apos; is not defined in HTML 4.01 in the Voyager DTD. This implies
that attribute values are quoted with " when containing '.
Also, thmlrtf.cpp, thmlhtml.cpp, thmlplain.cpp and a few others have
explicit support for Latin-1 entities and the 5 predefined ones. So SWORD
supports them in ThML. Given SWORD's history of backward
compatibility, I don't see this going away.
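As a rough picture of what such filter support amounts to, here is a sketch in Python rather than the C++ of the files named above. It does not reproduce their logic. The table is built from Python's standard html.entities module, and to_plain is a hypothetical name.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML.
XML_PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Named entities whose code point lies in the Latin-1 range U+00A0..U+00FF,
# for example &eacute; or &nbsp;.
LATIN1_NAMED = {name: chr(cp) for name, cp in name2codepoint.items()
                if 0xA0 <= cp <= 0xFF}

TABLE = {**LATIN1_NAMED, **XML_PREDEFINED}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_plain(text: str) -> str:
    """Replace known named entities; leave unknown ones untouched."""
    return ENTITY.sub(lambda m: TABLE.get(m.group(1), m.group(0)), text)


print(to_plain("caf&eacute; &amp; th&eacute;"))
```

Anything outside the two tables, such as &alpha;, is left as it was, which mirrors the idea that only Latin-1 names and the five predefined entities are recognized.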
>
> Any other character should be encoded as UTF-8, not named entities.
I think this is best placed in the module creation code, either as
a hard stop or as a conversion. The problem with automatically
converting them to Unicode is that the module might otherwise be
Latin-1, and that would be bad. I also note that SWORD does not
support entities of the form &#xxx; which are allowed in a Latin-1
encoding.
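A sketch of what such a check at module creation time could look like, in Python. Nothing here is an existing SWORD tool: check_entities and its encoding and convert parameters are assumptions made for the example. With convert=False it is a hard stop; with convert=True it converts, but only when the resulting character can be stored in the module's encoding.

```python
import re
from html.entities import name2codepoint

XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
# Matches named (&eacute;), decimal (&#233;) and hex (&#xE9;) references.
REFERENCE = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def check_entities(text: str, encoding: str = "utf-8", convert: bool = False) -> str:
    """Reject or convert every reference outside the five XML entities."""
    def handle(match):
        body = match.group(1)
        if body in XML_PREDEFINED:
            return match.group(0)            # keep &amp; and friends as they are
        if body[:2] in ("#x", "#X"):
            cp = int(body[2:], 16)
        elif body.startswith("#"):
            cp = int(body[1:])
        else:
            cp = name2codepoint.get(body)    # None for unknown names
        if not convert or cp is None:
            raise ValueError(f"unsupported entity {match.group(0)!r}")
        ch = chr(cp)
        if ch in "&<>\"'":
            raise ValueError("use the predefined named entity instead")
        ch.encode(encoding)                  # raises if not representable
        return ch
    return REFERENCE.sub(handle, text)


print(check_entities("caf&#233;", encoding="latin-1", convert=True))
```

Passing encoding="latin-1" makes a reference such as &alpha; fail even in conversion mode, so a Latin-1 module is never silently given a character it cannot hold.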
In His Service,
DM