[sword-devel] Soft hyphens

Michael H cmahte at gmail.com
Fri Nov 3 08:31:47 MST 2017


Hi Cyrille,

I am preparing to study breakpoints for Cebuano to produce a Hunspell
hyphenation list, but haven't finished implementing the process. I am
working from three paper Cebuano Bibles typeset at different times, and
manually copying the words that are already hyphenated into a list.

Here's my proposed process to produce a preliminary hyphenation dictionary:
1. Study the letters on either side of each hyphen: the entire group of
   vowels together or consonants together before the hyphen, paired with
   the entire group of vowels or consonants after it.
   For English, the match for detecting breaking-letter boundary
   frequency looks (technically) something like:

([aeiouy]*|[bcdfghjklmnpqrstvwxz]*)\x{00AD}([aeiouy]*|[bcdfghjklmnpqrstvwxz]*)

   where \x{00AD} is the soft hyphen (U+00AD). You'd still need to work out
   how to turn the matches from this regex into a list of boundary pairs
   and frequencies.
2. This should yield a list of the most common hyphenation points for the
   language.
3. Auto-insert the Hunspell hyphenation numbering into the dictionary.
   For Cebuano, I am hoping to use just the letter combinations in the
   Hunspell dictionary. I hope that Hunspell can accommodate this, but I
   haven't completed the word list to analyze yet, so my hopes are based
   only on reading the documentation about Hunspell and hyphenation, and
   looking through some of the existing examples.
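
A rough Python sketch of the pair counting in step 1 and the numbering in
step 3 might look like the following. Everything named here
(boundary_pair_counts, to_patterns, min_count) is hypothetical, and the
sample words are made-up placeholders, not verified Cebuano hyphenations.
Two deliberate departures from the regex above: it uses + instead of * so
empty runs are never counted, and it matches the right-hand run with a
lookahead so that a run sitting between two soft hyphens is counted for both
boundaries. The "left1right" output assumes the TeX-style pattern convention
(an odd digit marks an allowed break) that I understand Hunspell hyphenation
dictionaries to use; that should be checked against the documentation.

```python
import re
from collections import Counter

SOFT_HYPHEN = "\u00ad"
VOWELS = "aeiouy"
CONSONANTS = "bcdfghjklmnpqrstvwxz"

# A run of vowels OR a run of consonants on each side of the soft hyphen.
# The right-hand run is matched in a lookahead so it is not consumed and can
# serve as the left-hand run of the next boundary in the same word.
BOUNDARY = re.compile(
    rf"([{VOWELS}]+|[{CONSONANTS}]+){SOFT_HYPHEN}"
    rf"(?=([{VOWELS}]+|[{CONSONANTS}]+))"
)


def boundary_pair_counts(words):
    """Count (letters before break, letters after break) pairs."""
    counts = Counter()
    for word in words:
        for left, right in BOUNDARY.findall(word.lower()):
            counts[(left, right)] += 1
    return counts


def to_patterns(counts, min_count=1):
    """Turn frequent pairs into 'left1right' strings (assumed format)."""
    return sorted(
        f"{left}1{right}"
        for (left, right), n in counts.items()
        if n >= min_count
    )


if __name__ == "__main__":
    # Placeholder input: words with soft hyphens at the typeset break points.
    sample = ["ka\u00adbot", "pag\u00adka\u00adon", "ba\u00adlay"]
    counts = boundary_pair_counts(sample)
    for (left, right), n in counts.most_common():
        print(f"{left}|{right}\t{n}")
    print(to_patterns(counts))
```

Raising min_count is one way to drop rare pairs that may be typesetting
accidents rather than real break points.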


On Fri, Nov 3, 2017 at 9:39 AM, Cyrille <lafricain79 at gmail.com> wrote:

> It becomes a bit difficult for me to follow this post with all these
> technical terms in another language :-) :-( But what I can tell you is
> that I am very interested in a hyphenation dictionary. I have already
> created a spelling dictionary for Kikongo
> <https://gitlab.com/lafricain79/kituba-dic>, and have created an extension
> for LibreOffice
> <https://extensions.libreoffice.org/extensions/kituba-kikongo-ya-leta-dictionary>
> and a Hunspell dictionary for Linux. This now allows us to enable spell
> checking in Kikongo/Kituba. This is all the more interesting as Verbum
> Bible is translating the Old Testament and the Roman Missal. My intention
> is the same for Lingala: once I have finished the module, I will create a
> dictionary. Moreover, in the Lingala case the Hunspell .aff file already
> exists!
> As for the hyphenation dictionary, I read about it and it seemed like a
> tedious operation, so if you can offer me a way to take advantage of the
> words that are already hyphenated, it would be great.
> I opened an issue about this on GitLab
> <https://gitlab.com/lafricain79/LinVB/issues/11>.
>
> Br Cyrille
>
>
> Le 03/11/2017 à 12:37, David Haslam a écrit :
>
> I had similar thoughts as Michael outlined.
>
> This morning, I compiled an Excel workbook tabulating the Lingala words
> found to contain a soft hyphen.
>
> It has been attached to the issue in the GitLab repo.
> https://gitlab.com/lafricain79/LinVB/issues/10
>
> And - yes - it's not only incomplete as a dictionary, but it's also further
> evidence of inconsistency.
> The use of soft hyphens was entirely an ad hoc operation done to address
> contingencies.
>
> As a CrossWire volunteer, I don't consider this sort of activity to be
> outside our purview.
> We're here to assist other Bible agencies too, just as we say on our
> website.
> Or, if you like, we can count it as "going the extra mile".
>
> And I'm sure that Fr Cyrille appreciates the spirit in which this service is
> provided.
>
> Best regards,
>
> David
>
>
>
> --
> Sent from: http://sword-dev.350566.n4.nabble.com/
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page