[sword-devel] XML Numeric character references (entities) in BibleCS
Chris Little
chrislit at crosswire.org
Thu Jan 31 15:16:14 MST 2008
On Jan 31, 2008, at 1:29 PM, Benny Wasty wrote:
> Hello,
>
> I noticed that BibleCS doesn't seem to be able to display unicode
> characters encoded as numeric character references (e.g. &#246; for ö) in an
> OSIS module I am currently working on. The characters are just
> omitted.
> I guess they should be displayed correctly, as this is a "basic" XML
> feature as far as I know.
> BibleDesktop shows them by the way.
Correct, Sword does not handle numbered entities. I don't think we
want to add support for them at runtime either, because doing so would
1) waste processor time in converting to UTF-8 and 2) waste a lot of
storage space compared to UTF-8. I will, however, add a to-do item to the
bug tracker to convert them to UTF-8 during import.
All data in modules is assumed to be NFC-normalized UTF-8.
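For illustration, an import-time conversion of this kind might look roughly
like the following. This is only a minimal sketch in Python, not the code of
any actual Sword import tool; the names ncr_to_nfc_utf8 and _decode are
hypothetical. It assumes the input is already a decoded text string, and it
leaves references to markup-significant characters (&, <, >, quotes) alone so
that the XML stays well-formed.

```python
import re
import unicodedata

# Matches decimal (&#246;) and hexadecimal (&#xF6;) numeric character references.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Characters that must stay escaped to keep the markup well-formed: " & ' < >
_KEEP_ESCAPED = {0x22, 0x26, 0x27, 0x3C, 0x3E}


def _decode(match):
    """Return the character for one reference, or the reference unchanged."""
    hex_digits, dec_digits = match.groups()
    code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
    # Leave markup characters, surrogates, and out-of-range values untouched.
    if (
        code_point in _KEEP_ESCAPED
        or code_point > 0x10FFFF
        or 0xD800 <= code_point <= 0xDFFF
    ):
        return match.group(0)
    return chr(code_point)


def ncr_to_nfc_utf8(text):
    """Expand numeric character references, then emit NFC-normalized UTF-8."""
    decoded = _NCR.sub(_decode, text)
    return unicodedata.normalize("NFC", decoded).encode("utf-8")


if __name__ == "__main__":
    print(ncr_to_nfc_utf8("Sch&#246;n"))  # b'Sch\xc3\xb6n'
```

The example also shows the storage point: the reference &#246; takes six
bytes, while the same character takes two bytes in UTF-8. Normalizing after
expansion means a base letter followed by a combining mark (e.g. o&#x308;)
ends up as the same bytes as the precomposed character.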
I haven't looked at the code or tested this, but I would be willing to
bet BibleDesktop is displaying your characters correctly but wouldn't
match them in a search.
--Chris