[sword-devel] Entities in modules

Sebastien Koechlin seb.sword at koocotte.org
Thu Nov 12 10:08:49 MST 2009


On Wed, Nov 11, 2009 at 03:50:12PM -0500, DM Smith wrote:
> We have a few modules that have entities in them. These are of the fashion
> &nbsp; (a character entity), &#85; (a numeric decimal entity) and &#xC5;
> (a numeric hex entity).
> 
> These cause various problems:

This is because osis2mod does not use an XML parser. A character entity is
just a convenient way to write a character that you cannot, or do not want to,
put directly in your XML file. Once parsed and resolved, entities must not be
distinguishable from other characters. The same applies to CDATA sections.
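To illustrate the point, here is a minimal sketch using Python's standard
xml.etree.ElementTree as a stand-in for "an XML parser" (osis2mod itself is
not written this way, and the <w> element is only a placeholder). It parses
the characters U and Å written three ways: as numeric references, literally,
and inside a CDATA section.

```python
import xml.etree.ElementTree as ET

# The same two characters (U and A-with-ring) written three different ways.
samples = [
    "<w>&#85;&#xC5;</w>",          # numeric decimal and hex references
    "<w>U\u00c5</w>",              # the characters written literally
    "<w><![CDATA[U\u00c5]]></w>",  # the characters inside a CDATA section
]

# After parsing, each sample yields the same text content.
texts = [ET.fromstring(s).text for s in samples]

# All three forms are indistinguishable once resolved.
all_equal = len(set(texts)) == 1
```

After parsing, nothing in the resulting text tells you which of the three
forms was used in the source file.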

osis2mod should not keep entities when reading an OSIS file. I think it's a
big mistake, and we should not rely on external programs that many people
will have trouble running.

We also had trouble with non-canonical Unicode sequences, and I think
osis2mod was corrected for that.

Named entities such as nbsp come from HTML and should not be used in OSIS:
they are not declared in osisCore.2.1.1.xsd, so they result in an invalid
document. BUT, as we do not use an XML parser, we can use the HTML DTD[1] to
resolve them and be friendlier to OSIS writers.
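As a rough sketch of what such a resolution step could look like, the Python
below replaces HTML 4 named entities with their characters before any further
processing. It is only an illustration, not osis2mod code: the function name
resolve_html_entities is hypothetical, and the table comes from Python's
html.entities module rather than from the .ent files directly. The five
entities predefined by XML (amp, lt, gt, quot, apos) are deliberately left
alone, since resolving them would change the markup, and unknown names are
left untouched.

```python
import re
from html.entities import name2codepoint  # HTML 4 entity name -> code point

# Entities predefined by XML itself; these must stay escaped in the markup.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as &nbsp; or &eacute;
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_html_entities(text):
    """Replace HTML 4 named entities with the characters they stand for."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML escapes and unknown names as-is
        return chr(name2codepoint[name])
    return ENTITY.sub(replace, text)


resolved = resolve_html_entities("a&nbsp;b &amp; &eacute; &bogus;")
```

With this input, &nbsp; becomes U+00A0 and &eacute; becomes U+00E9, while
&amp; and the unknown &bogus; are passed through unchanged.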


[1] See these URLs; a Perl program could produce a .cc or .h file from them.
	http://www.w3.org/TR/html4/HTMLlat1.ent
	http://www.w3.org/TR/html4/HTMLsymbol.ent
	http://www.w3.org/TR/html4/HTMLspecial.ent
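A generator of that kind might look like the sketch below. It is written in
Python rather than Perl, and it takes the entity table from Python's
html.entities module instead of parsing the .ent files above. The struct
name HtmlEntity, the array name htmlEntities and the function make_header
are all hypothetical, chosen only for illustration.

```python
from html.entities import name2codepoint  # HTML 4 entity name -> code point


def make_header(table=name2codepoint):
    """Return the text of a C++ header holding an entity lookup table."""
    lines = [
        "// Generated file: HTML 4 named entities (hypothetical layout)",
        "struct HtmlEntity { const char *name; unsigned int codepoint; };",
        "static const HtmlEntity htmlEntities[] = {",
    ]
    # Sorted by name so a binary search over the table is possible.
    for name in sorted(table):
        lines.append('    {"%s", 0x%04X},' % (name, table[name]))
    lines.append("};")
    return "\n".join(lines) + "\n"


header_text = make_header()
```

The returned string can be written to a .h file and compiled into the
program, so no external tool is needed at run time.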


(Sorry if my message looks rude, I'm not a native English speaker)

-- 
Sébastien Koechlin


