[sword-devel] Module .conf files, Unicode Normalization

David Haslam dfhdfh at protonmail.com
Sat Jan 6 09:59:50 MST 2018


Dear all,

We've known for quite a few years that certain aspects of Biblical Hebrew mean we should avoid converting the Unicode source text to NFC when we build a module.
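As an illustration of how NFC can alter pointed Hebrew, here is a minimal Python sketch. It shows only one effect, the canonical reordering of combining marks; the sample string is made up for the demonstration, and this is not necessarily the only aspect of Biblical Hebrew that matters here.

```python
import unicodedata

# Illustrative sample (not taken from any module): BET + DAGESH + PATAH,
# with the dagesh stored before the vowel point.
source = "\u05D1\u05BC\u05B7"

# NFC sorts combining marks by canonical combining class, so the patah
# (class 17) is moved in front of the dagesh (class 21).
nfc = unicodedata.normalize("NFC", source)


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(code_points(source))  # U+05D1 U+05BC U+05B7
print(code_points(nfc))     # U+05D1 U+05B7 U+05BC
```

The two strings normally render the same, but the stored code point order differs, so a source text provider who requires a particular mark order would see it changed by the default build.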

This prompts me to suggest that we ought to define a new key for .conf files.

Normalization=NFC (this would be the default, and may be omitted for the vast majority of modules)
Normalization=Custom (we should include this in certain Biblical Hebrew modules)

This would make it clear to front-end developers and users alike that the source text was not converted to NFC during module build.
That is, osis2mod was intentionally run with the -N switch, in accordance with the requirements of the source text provider.

The Unicode source text may already be encoded in UTF-8; this memo is only about normalization.

In the rare event that a requirement arises for any of the other three normalization forms (NFD, NFKC, NFKD) defined by the Unicode Consortium,
these would also be permitted values for the .conf file key.
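A rough sketch of how a tool might check module text against the declared value, in Python. The function name and the choice to treat Custom as "nothing to verify" are my own assumptions for illustration; nothing like this exists in the SWORD utilities as far as this memo is concerned.

```python
import unicodedata

# The four normalization forms defined by the Unicode Consortium.
UNICODE_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def matches_declared_form(text, declared="NFC"):
    """Return True if text is consistent with the declared Normalization value.

    Hypothetical helper: "Custom" is assumed to mean that no particular form
    is promised, so any text is accepted for it.
    """
    if declared == "Custom":
        return True
    if declared not in UNICODE_FORMS:
        raise ValueError(f"unknown Normalization value: {declared!r}")
    # Text is in a given form exactly when normalizing it changes nothing.
    return unicodedata.normalize(declared, text) == text
```

For example, matches_declared_form("\u05D1\u05BC\u05B7", "NFC") returns False, because NFC would reorder the dagesh and the patah, while the same text passes when declared as Custom.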

A further benefit arises when a module needs to be updated.
If the modules team sees that the .conf file includes the line
Normalization=Custom
they would be forewarned against converting to NFC through inadvertently omitting the -N switch during module build.
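To make the safeguard concrete, here is a small Python sketch of a build helper that reads the proposed key and decides whether to pass -N. The helper names, the module name and the paths are hypothetical, the line-by-line parsing is a simplification of the .conf format, and I am assuming that any value other than NFC means the text should be left untouched by osis2mod.

```python
def read_normalization(conf_text):
    """Return the value of the proposed Normalization key, defaulting to NFC."""
    value = "NFC"
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines, the [ModuleName] header and anything without '='.
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "Normalization":
            value = val.strip()
    return value


def osis2mod_args(conf_text, module_path, osis_file):
    """Build an osis2mod argument list, adding -N unless the module is NFC."""
    args = ["osis2mod", module_path, osis_file]
    if read_normalization(conf_text) != "NFC":
        args.append("-N")  # assumption: every non-NFC value needs -N
    return args


# Hypothetical .conf fragment and paths, for illustration only.
example_conf = "[ExampleHebrew]\nNormalization=Custom\n"
print(osis2mod_args(example_conf, "modules/examplehebrew", "examplehebrew.osis.xml"))
```

With such a check in the update workflow, forgetting the switch would no longer depend on someone remembering the history of the module.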

Aside: Another language with a need for non-standard normalization is Tibetan. We don't yet have a module in that script.

Best regards,

David
