[sword-devel] Using the -N option in osis2mod ?

DM Smith dmsmith at crosswire.org
Wed Jun 25 06:26:14 MST 2014


On Jun 25, 2014, at 8:53 AM, David Haslam <dfhmch at googlemail.com> wrote:

> My observations arise in connection with Hebrew Unicode text.

Hebrew probably should be NFD. My experiments with NFC did not look good (in both SWORD and JSword frontends) with various fonts having Hebrew support.

To keep the text in NFD, use the -N flag, which tells osis2mod to skip normalization. Otherwise the text will be normalized to NFC.

> 
> I do know why NFC is default, and why it's recommended.
> 
> The Hebrew MapM module is not NFC normalized, so there must have been a
> genuine reason why the -N option was used during its build. Another Hebrew
> module (from IBT) is also not normalized.
> 
> Likewise, an earlier version of the Hebrew WLC module was rebuilt without
> NFC, albeit the current release is normalized. Refer to the file wlc.conf
> for the history.
> 
> This suggests that the -N option can be made to work, but perhaps it has
> only ever been tested under Linux?  As a Windows user, I am curious as to
> why I could not get it to work at all. 

If SWORD is built with ICU, then the option should work. If it is not, then it is the user's responsibility to ensure that the text is properly encoded in UTF-8.

> 
> Though I can't go into any details, my OSIS XML source text is already
> UTF-8, and is valid to the OSIS schema.

That your text is UTF-8 is good, but it is not necessarily sufficient. I've seen a few texts that are UTF-8 but mix multiple normalization forms (NFD, NFC, ...) in the same file. It is really frustrating to figure out why the same word in two places looks different. Having osis2mod do normalization is marginally helpful.
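
To track down the "same word, two representations" problem before running osis2mod, something like the following Python sketch may help. It groups words by their NFC form and reports any that occur as more than one raw code point sequence. The function name and the sample string are made up for illustration; this is not part of SWORD or osis2mod.

```python
import unicodedata
from collections import defaultdict

def find_inconsistent_words(text):
    """Group whitespace-separated words by their NFC form and return
    only the groups that occur as more than one raw code point sequence."""
    variants = defaultdict(set)
    for word in text.split():
        variants[unicodedata.normalize("NFC", word)].add(word)
    return {key: raw for key, raw in variants.items() if len(raw) > 1}

# Illustrative sample: the same word once precomposed, once decomposed.
sample = "caf\u00e9 menu cafe\u0301"
for key, raw in find_inconsistent_words(sample).items():
    # Show the code points of every variant so the difference is visible.
    print(key, [[f"U+{ord(c):04X}" for c in w] for w in sorted(raw)])
```

An empty result only means that no word appears in two different representations; it does not say which normalization form the text is in.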

It'd be better if all frontends used ICU and did the normalization. To my knowledge, none do. Some can't/won't. So, osis2mod to the rescue.

If you know your text is UTF-8 and uniformly in one normalization form, then use the -N flag. Also use it if you know that your text needs to be in a form other than NF(K)C.
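
One way to check "uniformly in one form" before deciding on -N is to ask which normalization forms the whole text already satisfies. The sketch below uses Python's unicodedata.is_normalized (Python 3.8 or later); the helper name and sample strings are only illustrative.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def forms_satisfied(text):
    """Return the normalization forms that the whole text already satisfies."""
    return [form for form in FORMS if unicodedata.is_normalized(form, text)]

print(forms_satisfied("caf\u00e9"))               # precomposed only
print(forms_satisfied("cafe\u0301"))              # decomposed only
print(forms_satisfied("caf\u00e9 cafe\u0301"))    # mixed representations
```

If the list comes back empty for the full source text, the text mixes representations, and letting osis2mod normalize (that is, not using -N) is probably the safer choice.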

> 
> I am still curious as to why there was a historic reversion of normalization
> for the WLC module.

I think that NFC doesn't work for Hebrew.
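
For anyone who wants to see what normalization actually does to pointed Hebrew, this small Python sketch prints the code points and canonical combining classes of one sequence before and after normalizing. The input order (bet, dagesh, patah) is just an assumed example. Whether the reordering it shows is what causes the display problems is not established here; it is only one effect that can be observed.

```python
import unicodedata

def describe(text):
    """List each code point with its canonical combining class."""
    return [(f"U+{ord(c):04X}", unicodedata.combining(c)) for c in text]

# BET, then DAGESH, then PATAH (an assumed input order for illustration).
typed = "\u05D1\u05BC\u05B7"
nfc = unicodedata.normalize("NFC", typed)
nfd = unicodedata.normalize("NFD", typed)

print(describe(typed))  # marks in the order given above
print(describe(nfc))    # marks sorted by combining class: PATAH before DAGESH
print(nfc == nfd)       # compare the two forms for this sequence
```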

> cf. I asked Chris, but he never responded, though I guess he's too busy this
> year.
> 
> Best regards,
> 
> David
