[sword-devel] Fix for &
DM Smith
dmsmith555 at yahoo.com
Thu Sep 28 05:06:18 MST 2006
On Sep 28, 2006, at 12:32 AM, Karl Kleinpaste wrote:
> DM Smith <dmsmith555 at yahoo.com> writes:
>> Entities that are not handled via html should not be passed
>> through. So, if there were an entity &disclaimer; for example, it
>> should be stripped.
>
> I believe I disagree. If some &symbol; is unknown to Sword (say,
> because some new HTML standard has come along, already implemented in
> GtkHTML [which GS uses], so that a Sword module is produced which
> contains it, yet Sword itself has not been updated to recognize it),
> why shouldn't Sword simply pass it through? The fact that Sword
> doesn't know about &disclaimer; is no guarantee that both the module
> author and the end-line HTML renderer can't be perfectly happy with
> it -- Sword may quite possibly be behind the curve.
The behavior of entities is well defined in XML. If an entity does
not have a definition in the DTD, it is an error. More interestingly
(at least to me), schemas, with which OSIS is defined, do not support
the definition of entities. XML itself predefines the famous 4 (&lt;,
&gt;, &amp;, &quot;) plus &apos;.
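To make the "it is an error" point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree (expat underneath). Python is only a convenient stand-in for whatever XML parser a front end happens to use, and the helper name `parses` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML, False on a parse error."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# The five entities XML 1.0 predefines resolve without any DTD.
ok = ET.fromstring("<p>&lt;&gt;&amp;&quot;&apos;</p>")
print(ok.text)  # <>&"'

# An undeclared entity is a fatal (well-formedness) error, not a warning.
print(parses("<p>&disclaimer;</p>"))  # False
```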
One of the fundamental uses of entities in writing a DTD is as a
non-parameterized, conditional macro. When an entity is expanded, its
replacement text is recursively processed for further entities. There
are two forms of entities: general (&entity;) and parameter (%entity;).
One common scenario is to use parameter entities to modularize a DTD
into separate files that are included. There is also a mechanism that
allows a document to override any entity: declarations in its internal
subset are read before the external DTD, and the first declaration of
an entity is binding.
Given this, without processing a document's DTD for all of its
entities via a robust entity resolver, it is impossible to know what
&disclaimer; resolves to.
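A small sketch of why the DTD has to be processed: the same &disclaimer; reference yields different text depending on which declarations are in force, and a nested reference inside an entity value is expanded recursively. This uses Python's ElementTree with an internal DTD subset as a stand-in for a full entity resolver; the helper `resolve` and the entity values are made up for illustration.

```python
import xml.etree.ElementTree as ET

BODY = "<d>&disclaimer;</d>"


def resolve(internal_subset: str) -> str:
    """Parse BODY under a given internal DTD subset and return its text."""
    doc = f"<!DOCTYPE d [{internal_subset}]>{BODY}"
    return ET.fromstring(doc).text


# Same reference, two DTDs, two meanings; &name; is expanded recursively.
a = resolve('<!ENTITY name "SWORD"><!ENTITY disclaimer "Provided by &name;.">')
b = resolve('<!ENTITY disclaimer "No warranty.">')
print(a)  # Provided by SWORD.
print(b)  # No warranty.

# With no declaration at all, the parser has nothing to expand.
try:
    ET.fromstring(BODY)
except ET.ParseError as err:
    print("undeclared:", err)
```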
ThML, as a Voyager superset, supports 3 sets of entities: Latin-1,
symbol and special. The famous 4 are in special, and &apos; is not
defined anywhere. Given the inertia of Microsoft's Internet Explorer,
I don't expect any changes in this arena.
For details see: http://www.w3.org/TR/1998/WD-html-in-xml-19981205/dtd.html
Of these, Sword's ThML filters handle Latin-1 and the famous 4.
(You did find a bug here.)
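For illustration only, a Python sketch of the entity policy discussed in this thread: keep the famous 4, resolve the Latin-1 set, strip anything else. This is not Sword's ThML filter (which is C++); `filter_entities` is a hypothetical name, Python's html.entities.name2codepoint stands in for the HTML 4 Latin-1 table, and rewriting &apos; as the numeric reference &#39; is my own assumption for renderers that lack it.

```python
import re
from html.entities import name2codepoint

# The "famous 4": kept as references so the output stays escaped.
FAMOUS_FOUR = {"lt", "gt", "amp", "quot"}
# HTML 4's Latin-1 set: the named entities for U+00A0..U+00FF.
LATIN1 = {n: chr(cp) for n, cp in name2codepoint.items() if 0xA0 <= cp <= 0xFF}
# Named references only; numeric ones such as &#39; are left untouched.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def filter_entities(text: str) -> str:
    """Hypothetical ThML-style pass: keep, convert, or strip each entity."""
    def repl(m: re.Match) -> str:
        name = m.group(1)
        if name in FAMOUS_FOUR:
            return m.group(0)      # meaningful to both HTML and XML
        if name in LATIN1:
            return LATIN1[name]    # emit the character itself
        if name == "apos":
            return "&#39;"         # assumption: numeric ref works in HTML 4 and XML
        return ""                  # unknown entity: strip it
    return ENTITY_RE.sub(repl, text)


out = filter_entities("Caf&eacute; &amp; &apos;s &disclaimer;done")
# out == "Café &amp; &#39;s done"
```

Stripping rather than passing through is the choice argued for earlier in the thread; swapping the final `return ""` for `return m.group(0)` would give the pass-through behavior instead.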
> And in fact, it surely is, in a few small areas. For example,
> WinSword/BibleCS doesn't implement <u> or <font color=...>, though it
> implements <b> and <i>. Conversely, GtkHTML implements <u> and <font
> color=...> but does not have support for <sup>. So pass the source
> material and let the renderer take its best shot.
I think it is important that we have some guarantee of well-formed
XML in SWORD. XML states that an undefined entity is an error that
produces a hard stop. A system that uses the SWORD engine together
with an XML parser should have a reasonable expectation that the text
it is given will not cause it to abend.
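A minimal sketch of that guarantee, assuming a front end that wraps each entry in an element and hands it to an XML parser (Python's ElementTree here). The names `strip_undeclared` and `well_formed` are hypothetical, the regex only approximates XML name syntax, and bare ampersands are out of scope.

```python
import re
import xml.etree.ElementTree as ET

# Entities XML 1.0 predefines; everything else needs a DTD declaration.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}
# Rough approximation of a named entity reference (&Name;).
NAMED_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9._-]*);")


def strip_undeclared(text: str) -> str:
    """Drop named references that a DTD-less XML parser would reject."""
    return NAMED_REF.sub(
        lambda m: m.group(0) if m.group(1) in XML_PREDEFINED else "", text
    )


def well_formed(fragment: str) -> bool:
    """Wrap the fragment in an element and report whether it parses."""
    try:
        ET.fromstring(f"<div>{fragment}</div>")
        return True
    except ET.ParseError:
        return False


entry = "In the beginning&disclaimer; God created &amp; more"
print(well_formed(entry))                    # False
print(well_formed(strip_undeclared(entry)))  # True
```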