[sword-devel] testing for diacritics

Peter von Kaehne refdoc at gmx.net
Tue Sep 1 14:41:46 MST 2015


On Fri, 2015-08-28 at 14:13 -0400, Ryan wrote:
> On Thu, 2015-08-27 at 23:22 +0100, Peter von Kaehne wrote:
> > Is there a clever and reliable way one could test a given OSIS text
> > to see whether it contains diacritically enhanced text or not? Perl,
> > preferably.
> >
> > Specifically Hebrew, Arabic-type alphabets and Greek, for all of which
> > we have a special GlobalOptionFilter.
> 
> Given a variable holding a copy of the text in Unicode NFD
> normalization, I would think that all you would need to do is test for
> the presence of the specific diacritic marks themselves.
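For comparison, the per-mark approach suggested above can be sketched as follows, in C++ rather than Perl. It assumes the text has already been converted to UTF-8 in NFD form (normalization itself is not shown; Perl's Unicode::Normalize or ICU can do it). The function names are hypothetical, and the code-point ranges in isDiacritic are an illustrative, non-exhaustive choice, which is exactly the maintenance problem raised in the reply.

```cpp
#include <cstddef>
#include <string>

// Decode one UTF-8 code point starting at s[i] and advance i past it.
// Error handling is sketch-level: malformed input is not rejected.
static char32_t nextCodePoint(const std::string& s, std::size_t& i) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    int extra = 0;
    char32_t cp = c;
    if (c >= 0xF0)      { cp = c & 0x07; extra = 3; }
    else if (c >= 0xE0) { cp = c & 0x0F; extra = 2; }
    else if (c >= 0xC0) { cp = c & 0x1F; extra = 1; }
    ++i;
    for (int k = 0; k < extra && i < s.size(); ++k, ++i)
        cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
    return cp;
}

// Hypothetical, deliberately incomplete list of combining marks.
static bool isDiacritic(char32_t cp) {
    // Combining Diacritical Marks block (Greek accents and breathings
    // after NFD decomposition; also matches Latin accents).
    if (cp >= 0x0300 && cp <= 0x036F) return true;
    // Hebrew cantillation and vowel points (nonspacing marks only,
    // so maqaf, paseq and sof pasuq are not counted).
    if (cp >= 0x0591 && cp <= 0x05BD) return true;
    if (cp == 0x05BF || cp == 0x05C1 || cp == 0x05C2 ||
        cp == 0x05C4 || cp == 0x05C5 || cp == 0x05C7) return true;
    // Arabic harakat and related marks, plus superscript alef.
    if (cp >= 0x064B && cp <= 0x065F) return true;
    if (cp == 0x0670) return true;
    return false;
}

// True if the NFD-normalized UTF-8 text contains any listed mark.
bool hasDiacritics(const std::string& nfdUtf8) {
    for (std::size_t i = 0; i < nfdUtf8.size();)
        if (isDiacritic(nextCodePoint(nfdUtf8, i))) return true;
    return false;
}
```

Any mark outside these ranges (for example Quranic annotation signs) goes undetected until the list is amended by hand.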

Thanks, but as I said in a previous email, I did not want to test for
individual marks in my proposed utility, as the filters (at least the
Arabic one) will likely grow in future. Testing should be done using
the engine.

The number of available Arabic diacritical marks is enormous and not
even remotely covered by our filter (which handles only standard Arabic
and Persian). So every new mark added to our filters would require
amending the script too.
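The engine-based idea can be sketched like this: run the text through the stripping filter and check whether anything changed. This is a hedged illustration, not the contents of stripaccents.cpp. The filter is modelled as a plain callable (the hypothetical StripFilter type) because the real SWORD filter API is not reproduced here, and toyStripArabicHarakat is a hypothetical demo filter covering only U+064B to U+0652.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Stand-in for an option filter that strips marks from UTF-8 text.
// Hypothetical: the actual SWORD filter interface is not modelled.
using StripFilter = std::function<std::string(const std::string&)>;

// True if the filter would alter the text, i.e. the text contains
// something the filter knows how to strip.
bool filterWouldChange(const std::string& text, const StripFilter& strip) {
    return strip(text) != text;
}

// Toy filter for demonstration only: removes Arabic fathatan..sukun
// (U+064B..U+0652), which UTF-8 encodes as 0xD9 followed by 0x8B..0x92.
std::string toyStripArabicHarakat(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        if (b == 0xD9 && i + 1 < in.size()) {
            unsigned char n = static_cast<unsigned char>(in[i + 1]);
            if (n >= 0x8B && n <= 0x92) { ++i; continue; }  // skip both bytes
        }
        out += in[i];
    }
    return out;
}
```

Because the check is defined by whatever the filter strips, extending the filter automatically extends the check, with no separate list of marks to keep in sync.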

I have now committed to svn a C++ example of the kind of utility I
meant. It works as I wanted, relying on the engine to do the heavy
lifting. I guess this is my own answer to my query.

sword/examples/cmdline/stripaccents.cpp

Peter