[sword-devel] Module .conf files, Unicode Normalization

David Haslam dfhdfh at protonmail.com
Sat Jan 6 10:32:04 MST 2018


Hi Greg,

One area where it might turn out to be useful is for the search features of front-end apps.
It could be important to know that the underlying module text is not NFC.

That's not to lay down a requirement as to how search features should be designed,
but at least to provide the information in case it does matter for some types of search option.
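
As a rough illustration of why this could matter for search, here is a minimal Python sketch. It assumes the front-end can normalize both the module text and the query before comparing them; the helper name `find_normalized` and the three-character sample are hypothetical and not taken from any real module or front-end.

```python
import unicodedata


def find_normalized(text: str, query: str, form: str = "NFD") -> bool:
    """Return True if query occurs in text once both are in the same form."""
    return unicodedata.normalize(form, query) in unicodedata.normalize(form, text)


# Illustrative sample only: bet + etnahta (accent, combining class 220)
# + qamats (vowel point, combining class 18), stored in that order.
stored = "\u05D1\u0591\u05B8"

# A query that has been through NFC has its marks reordered by combining class.
typed = unicodedata.normalize("NFC", stored)

naive_match = typed in stored                      # plain substring search
normalized_match = find_normalized(stored, typed)  # compare in a common form

print(naive_match, normalized_match)
```

A plain substring search misses the match because the two strings carry the same marks in a different order. Comparing both sides in one normalization form finds it, and it leaves the stored module text untouched.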

Like other things in .conf files, a key can also be educational.
It may prompt developers and users to ask, "Why did they do this?"
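
To make the idea concrete, here is a small Python sketch of how a tool might read the proposed key. Everything in it is an assumption for illustration: the `Normalization` key is only a proposal at this stage, the module name `ExampleHebrew` is made up, and the helper names are hypothetical rather than part of the SWORD API.

```python
def read_normalization(conf_text: str, default: str = "NFC") -> str:
    """Return the value of the proposed Normalization key, or the default."""
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Normalization":
            return value.strip()
    return default  # key omitted: assume the text was converted to NFC


def must_skip_nfc(conf_text: str) -> bool:
    """True when a rebuild should not convert the source text to NFC."""
    return read_normalization(conf_text) != "NFC"


# Hypothetical .conf content; only the Normalization line is the new proposal.
SAMPLE_CONF = """[ExampleHebrew]
DataPath=./modules/texts/ztext/examplehebrew/
Encoding=UTF-8
Normalization=Custom
"""

print(read_normalization(SAMPLE_CONF), must_skip_nfc(SAMPLE_CONF))
```

A check like `must_skip_nfc` is the sort of thing that could remind whoever rebuilds a module to keep the -N switch on osis2mod.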

It was almost by accident that, in 2014, I first came across this aspect of using Unicode for Biblical Hebrew.
It applies only to texts with both vowel points and cantillation marks.

Even though it's mentioned in our developers' wiki, it's all too easily missed by other CrossWire volunteers.

Best regards,

David


> -------- Original Message --------
> Subject: Re: [sword-devel] Module .conf files, Unicode Normalization
> Local Time: 6 January 2018 5:19 PM
> UTC Time: 6 January 2018 17:19
> From: greg.hellings at gmail.com
> To: David Haslam <dfhdfh at protonmail.com>, SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
>
> Why would the front end or engine need to know this information? Would it help the front end developers or users to know it? What do we gain by adding this? (I'm not implying it wouldn't be beneficial. But the only thing I know about Unicode is how the different UTF encodings work, so I have no idea what use this information could be. I also think changes to formats and information standards should be conservative instead of liberal)
>
> --Greg
>
> On Jan 6, 2018 11:01, "David Haslam" <dfhdfh at protonmail.com> wrote:
>
>> Dear all,
>>
>> We've known for quite a few years that there are aspects of Biblical Hebrew that mean we should avoid converting the Unicode source text to NFC when we build a module.
>>
>> This prompts me to suggest that we ought to define a new key for .conf files.
>>
>> Normalization=NFC (this would be the default, and may be omitted for the vast majority of modules)
>> Normalization=Custom (we should include this in certain Biblical Hebrew modules)
>>
>> This would make it clear to front-end developers and users alike that the source text was not converted to NFC during module build.
>> i.e. osis2mod was used intentionally with the -N switch, in accordance with the requirements of the source text provider.
>>
>> The Unicode source text may already be encoded in UTF-8; this memo is only about normalization.
>>
>> In the rare eventuality that there could arise a requirement for any of the other three normalization forms (NFD, NFKC, NFKD) defined by the Unicode Consortium,
>> these would also be permitted values for the conf file key.
>>
>> A further benefit arises when a module needs to be updated.
>> If the modules team sees that the .conf file includes the line
>> Normalization=Custom
>> they would be forewarned against converting to NFC through inadvertently omitting the -N switch during module build.
>>
>> Aside: Another language with a need for non-standard normalization is Tibetan. We don't yet have a module in that script.
>>
>> Best regards,
>>
>> David
>>