<div dir="ltr">Hi Cyrille, <br><br>I am preparing to study breakpoints for Cebuano in order to produce a Hunspell hyphenation list, but I haven't finished implementing the process. I am working from three printed Cebuano Bibles typeset at different times, manually copying the hyphenated words they contain into a list. <br><br>Here's my proposed process for producing a preliminary hyphenation dictionary:<br>1. Study the (vowels OR consonants) before each hyphen + the (vowels OR consonants) after it, <br> taking the entire run of adjacent vowels or adjacent consonants as one group. <div> That is, for English, a regex for detecting the letter groups at each break (in order to count boundary frequencies) looks (technically) something like: </div><div> ([aeiouy]+|[bcdfghjklmnpqrstvwxz]+)\u00AD([aeiouy]+|[bcdfghjklmnpqrstvwxz]+)</div><div> where \u00AD is the soft hyphen, but you'd still need to work out how to turn the matches from this regex into a list of boundary pairs and their frequencies. <br><div>2. This should yield a list of the most common hyphenation points for the language.<br>3. Automatically insert the Hunspell hyphenation numbering into the dictionary. <br></div></div><div> For Cebuano, I am hoping to use just the letter combinations in the Hunspell dictionary. I hope Hunspell can accommodate this, but I haven't completed the word list to analyze yet, so my hopes are based only on reading the documentation about Hunspell and hyphenation, and looking through some of the existing examples. <br><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Nov 3, 2017 at 9:39 AM, Cyrille <span dir="ltr">&lt;<a href="mailto:lafricain79@gmail.com" target="_blank">lafricain79@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
It becomes a bit difficult for me to follow this post with all these
technical terms in another language <span class="m_-6050787621622657054moz-smiley-s1"><span>:-)</span></span>
<span class="m_-6050787621622657054moz-smiley-s2"><span>:-(</span></span> But what I can
tell you is that I am very interested in a hyphenation dictionary. I
have already created a spelling <a href="https://gitlab.com/lafricain79/kituba-dic" target="_blank">dictionary for
kikongo</a>, and have created an <a href="https://extensions.libreoffice.org/extensions/kituba-kikongo-ya-leta-dictionary" target="_blank">extension
for LibreOffice</a> and a Hunspell dictionary for Linux. This now
allows us to enable spell checking in Kikongo/Kituba. This is all
the more interesting as Verbum Bible is translating the Old
Testament and the Roman Missal. My intention is the same for
Lingala: once the module is finished, I will create a dictionary.
Moreover, for Lingala the Hunspell .aff file already exists!<br>
As for the hyphenation dictionary, I read about it and it seemed
like a tedious operation, so if you have a solution that takes
advantage of the words that are already hyphenated, that would be
great.<br>
I opened an issue about this on <a href="https://gitlab.com/lafricain79/LinVB/issues/11" target="_blank">GitLab</a>.<br>
<br>
Br Cyrille<div><div class="h5"><br>
<br>
<div class="m_-6050787621622657054moz-cite-prefix">On 03/11/2017 at 12:37, David Haslam
wrote:<br>
</div>
<blockquote type="cite">
<pre>I had similar thoughts as Michael outlined.
This morning, I compiled an Excel workbook tabulating the Lingala words
found to contain a soft hyphen.
It has been attached to the issue in the GitLab repo.
<a class="m_-6050787621622657054moz-txt-link-freetext" href="https://gitlab.com/lafricain79/LinVB/issues/10" target="_blank">https://gitlab.com/<wbr>lafricain79/LinVB/issues/10</a>
And - yes - it's not only incomplete as a dictionary, but it's also further
evidence of inconsistency.
The use of soft hyphens was entirely an ad hoc operation done to address
contingencies.
As a CrossWire volunteer, I don't consider this sort of activity to be
outside our purview.
We're here to assist other Bible Agencies too, just as we say in our
website.
Or, if you like, we can count it as "going the extra mile".
And I'm sure that Fr Cyrille appreciates the spirit in which this service is
provided.
Best regards,
David
--
Sent from: <a class="m_-6050787621622657054moz-txt-link-freetext" href="http://sword-dev.350566.n4.nabble.com/" target="_blank">http://sword-dev.350566.n4.<wbr>nabble.com/</a>
______________________________<wbr>_________________
sword-devel mailing list: <a class="m_-6050787621622657054moz-txt-link-abbreviated" href="mailto:sword-devel@crosswire.org" target="_blank">sword-devel@crosswire.org</a>
<a class="m_-6050787621622657054moz-txt-link-freetext" href="http://www.crosswire.org/mailman/listinfo/sword-devel" target="_blank">http://www.crosswire.org/<wbr>mailman/listinfo/sword-devel</a>
Instructions to unsubscribe/change your settings at above page
</pre>
</blockquote>
<br>
</div></div></div>
</blockquote></div><br></div>
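<div>The step of turning regex matches into boundary pairs and frequencies, left open in the reply at the top of this message, might be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the input words mark break points with the soft hyphen (U+00AD), the vowel and consonant sets are the English ones from the regex above, and the function name and sample words are made up for illustration. The right-hand group sits in a lookahead so that it can also serve as the left-hand group of the next break in the same word.</div>

```python
import re
from collections import Counter

SOFT_HYPHEN = "\u00ad"
# Assumption: English letter classes, as in the regex quoted in the message.
VOWELS = "aeiouy"
CONSONANTS = "bcdfghjklmnpqrstvwxz"
GROUP = f"[{VOWELS}]+|[{CONSONANTS}]+"

# A run of vowels or consonants, a soft hyphen, then (in a lookahead, so it
# is not consumed) the run of vowels or consonants that follows the hyphen.
BOUNDARY = re.compile(f"({GROUP}){SOFT_HYPHEN}(?=({GROUP}))", re.IGNORECASE)


def count_boundary_pairs(words):
    """Hypothetical helper: count (before, after) letter groups at each break."""
    counts = Counter()
    for word in words:
        for left, right in BOUNDARY.findall(word):
            counts[(left.lower(), right.lower())] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample words, not taken from any of the Bibles discussed.
    sample = ["hy\u00adphen", "dic\u00adtion\u00adary", "Hy\u00adphen\u00adation"]
    for (left, right), n in count_boundary_pairs(sample).most_common():
        print(f"{left}-{right}\t{n}")
```

<div>Sorting the counts from most to least frequent gives the kind of list described in step 2. Converting that list into Hunspell hyphenation patterns (step 3) is not covered by this sketch.</div>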