<div>"Here's one I made earlier."<br></div><div><br></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;">Comment...</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;">| Normalize to NFC excluding any Hebrew text</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;">| </span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;">| NB. Does not expect any alphabetical presentation forms!</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;">|</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;">+--Perl pattern [[\x{0590}-\x{05FF}]+] with []</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"> | [X] Match case</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"> | [ ] Whole words only</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span 
class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"> | [ ] Case sensitive replace</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"> | [ ] Prompt on replace</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"> | [ ] Skip prompt if identical</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"> | [ ] First only</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"> | [ ] Extract matches</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"> | Maximum text buffer size 4096</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"> | [X] Maximum match (greedy)</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"> | [ ] Allow comments</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, 
sans-serif;"> | [ ] '.' matches newline</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"> | [X] UTF-8 Support</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"> |</span><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"><br></span></div><div><span class="font" style="font-family: menlo, consolas, courier new, monospace, sans-serif;"> +--NFC - Canonical Decomposition, followed by Canonical Composition</span><br></div><div> <br></div><div><i>NB. That's merely the clipboard copy of the filter for illustration purposes.</i><br></div><div><br></div><div class="protonmail_signature_block"><div class="protonmail_signature_block-user"><div>Best regards,<br></div><div><br></div><div>David<br></div></div><div><br></div><div class="protonmail_signature_block-proton">Sent with <a href="https://protonmail.com">ProtonMail</a> Secure Email.<br></div></div><div><br></div><blockquote class="protonmail_quote" type="cite"><div>-------- Original Message --------<br></div><div>Subject: Re: [sword-devel] Module .conf files, Unicode Normalization<br></div><div>Local Time: 6 January 2018 7:26 PM<br></div><div>UTC Time: 6 January 2018 19:26<br></div><div>From: dfhdfh@protonmail.com<br></div><div>To: SWORD Developers' Collaboration Forum <sword-devel@crosswire.org><br></div><div><br></div><div>Good question, Tom.<br></div><div><br></div><div>Assuming that the Latin script part of the source text actually required normalization to NFC,<br></div><div>and that at least some of the Biblical Hebrew should not be converted to NFC,<br></div><div>you'd build the module using the -N switch of osis2mod, after first applying a script 
<br></div><div>to the source text to ensure that both the requirements were implemented.<br></div><div><br></div><div>It would be a very simple task for a bespoke TextPipe filter with a restrict filter <br></div><div>designed to limit the Convert to NFC subfilter to the text that was not Hebrew.<br></div><div><br></div><div>Ignoring alphabetical presentation forms, all the Hebrew characters are in one Unicode block.<br></div><div>A PCRE to exclude the Hebrew would be very simple.<br></div><div>I could almost do it in my sleep after 17 years using TextPipe.<br></div><div>No doubt other programmers could do likewise with Perl or Python, etc.<br></div><div><br></div><div>Best regards,<br></div><div><br></div><div>David<br></div><div><br></div><div>Sent from ProtonMail Mobile<br></div><div><div><br></div><div><div><br></div><div>On Sat, Jan 6, 2018 at 19:14, Tom Sullivan <<a class="" href="mailto:info@beforgiven.info">info@beforgiven.info</a>> wrote:<br></div></div><blockquote type="cite" class="protonmail_quote">Y'all: For text, such as in a commentary, which includes both Hebrew and English (or another modern Latin script using language), what do you put for the normalization? Tom Tom Sullivan info@BeForgiven.INFO FAX: 815-301-2835 ---------------------
On 01/06/2018 12:32 PM, David Haslam
wrote: > Hi Greg, > > One area where it might turn out to be useful is for the search features > of front-end apps. > It could be important to know that the underlying module text is _not_ > *NFC*. > > That's not to lay
down a requirement as to how search features should be > designed, > but at least to provide the information in case it does matter for some > types of search option. > > Like other things in .conf files, a key can also be _educational_.
> It may prompt developers and users to ask, /*Why did they do this?*/ > > cf. It was _almost by accident_ that in 2014, I first came across this > aspect of using Unicode for Biblical Hebrew. > /It applies only to texts with _both_
vowel accents and cantillation./ > > Even though it's mentioned in our developers' wiki, it's all too easily > missed by other CrossWire volunteers. > > Best regards, > > David > > Sent with ProtonMail
Secure Email. > >> -------- Original Message -------- >> Subject: Re: [sword-devel] Module .conf files, Unicode Normalization >> Local Time: 6 January 2018 5:19 PM >> UTC Time: 6 January 2018 17:19 >> From: greg.hellings@gmail.com
>> To: David Haslam, SWORD Developers' >> Collaboration Forum
>> >> Why would the front end or engine need to know this information? Would >> it help the front end developers or users to know it? What do we gain >> by adding this? (I'm not implying it wouldn't be beneficial. But the >>
only thing I know about Unicode is how the different UTF encodings >> work, so I have no idea what use this information could be. I also >> think changes to formats and information standards should be >> conservative
instead of liberal) >> >> --Greg >> >> On Jan 6, 2018 11:01, "David Haslam" wrote: >> >> Dear all, >> >> We've known for quite a few years that there are aspects of >> *Biblical Hebrew* that mean we should _avoid_ converting the >> Unicode source text to *NFC* when
we build a module. >> >> This prompts me to suggest that we ought to define a new *key* for >> .conf files. >> >> *Normalization=NFC* (this would be the default, and may be >> _omitted_ for
the vast majority of modules) >> *Normalization=Custom* (we should include this in certain Biblical >> Hebrew modules) >> >> This would make it clear to front-end developers and users alike >>
that the source text was _not_ converted to NFC during module build. >> i.e. *osis2mod* was used intentionally with the *-N* switch, in >> _accordance with the requirements of the source text provider_. >>
>> The Unicode source text may already be encoded in *UTF-8*; this >> memo is /only/ about normalization. >> >> In the rare eventuality that there could arise a requirement for >> any of the other
three normalization forms (*NFD*, *NFKC*, *NFKD*) >> defined by the Unicode Consortium, >> these would also be permitted values for the conf file key. >> >> A further benefit arises when a module needs
to be updated. >> If the modules team sees that the .conf file includes the line >> *Normalization=Custom* >> they would be forewarned against converting to NFC through >> /inadvertently/ omitting the
*-N* switch during module build. >> >> _Aside_: Another language with a need for non-standard >> normalization is *Tibetan*. We don't yet have a module in that script. >> >> Best regards, >>
>> David >> >> Sent with ProtonMail
Secure Email. >> >> >> _______________________________________________ >> sword-devel mailing list: sword-devel@crosswire.org >>
>> http://www.crosswire.org/mailman/listinfo/sword-devel >>
>> Instructions to unsubscribe/change your settings at above page<br></blockquote></div></blockquote><div><br></div>
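<div>For anyone without TextPipe, the idea discussed in this thread (convert to NFC everything except the Hebrew) can be sketched in Python. This is only an illustration under stated assumptions, not the TextPipe filter itself: the function name <i>nfc_except_hebrew</i> is hypothetical, it relies on Python's standard <i>re</i> and <i>unicodedata</i> modules, and it treats only the Hebrew block U+0590 to U+05FF as Hebrew, so alphabetic presentation forms are not handled.<br></div>

```python
import re
import unicodedata

# Runs of characters from the Hebrew block (U+0590..U+05FF), the same range
# as in the Perl pattern quoted in this thread. Assumption: presentation
# forms (U+FB1D..U+FB4F) do not occur in the source text.
HEBREW_RUN = re.compile(r"([\u0590-\u05FF]+)")


def nfc_except_hebrew(text):
    """Return text with NFC applied to every run that is not Hebrew.

    Hypothetical helper for illustration only. Each non-Hebrew run is
    normalized on its own, so a combining mark from another block that
    directly follows a Hebrew letter is not composed with that letter.
    """
    # Splitting on a pattern with one capturing group keeps the matches:
    # even indices hold non-Hebrew text, odd indices hold Hebrew runs.
    parts = HEBREW_RUN.split(text)
    return "".join(
        part if index % 2 else unicodedata.normalize("NFC", part)
        for index, part in enumerate(parts)
    )


if __name__ == "__main__":
    # Latin text with decomposed accents around a pointed Hebrew letter.
    sample = "e\u0301 \u05E9\u05C1\u05B8 a\u030A"
    result = nfc_except_hebrew(sample)
    print([hex(ord(ch)) for ch in result])
```

<div>The output of such a script would then be built with the -N switch of osis2mod, as described earlier in the thread, so that the Hebrew is not normalized again during module build.<br></div>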