<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Michael,<br>
Thank you for this information; I will have to read it carefully. But
could you give me an example of a .dic file with hyphenation?<br>
<br>
<div class="moz-cite-prefix">On 03/11/2017 at 16:31, Michael H
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAJ9hia8JMU6R6snLtiG5hf44m4ACV1dBNkA9LZx8cr2DbZjxsA@mail.gmail.com">
<div dir="ltr">Hi Cyrille, <br>
<br>
I am preparing to study breakpoints for Cebuano in order to produce a
Hunspell hyphenation list, but I haven't finished implementing
the process. I am working from three printed Cebuano Bibles
typeset at different times, manually copying the hyphenated
words they contain into a list. <br>
<br>
Here's my proposed process for producing a preliminary hyphenation
dictionary:<br>
1. Study the letter group (vowels OR consonants) before each hyphen
and the letter group (vowels OR consonants) after it, <br>
taking the entire run of vowels or the entire run of consonants
on each side.
<div> That is, for English, the pattern for detecting how often
each letter boundary is broken would look (technically)
something like: </div>
<div>
([aeiouy]*|[bcdfghjklmnpqrstvwxz]*)\u00AD([aeiouy]*|[bcdfghjklmnpqrstvwxz]*)</div>
<div> where \u00AD is the soft hyphen (U+00AD). You would still need
to turn the matches from this regex into a list of boundary pairs
and their frequencies. <br>
<div>2. This should yield a list of the most common
hyphenation points for the language.<br>
3. Auto insert the Hunspell hyphenation numbering into the
dictionary. <br>
</div>
</div>
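<div>A minimal sketch of steps 1 and 2 in Python, assuming the
copied word list marks each break with the soft hyphen (U+00AD) and
that the English letter classes from the regex above apply. The names
GROUP, BOUNDARY and boundary_frequencies are hypothetical. The sketch
uses + rather than * so that empty letter groups are not counted, and
a lookahead so that consecutive breaks in one word are all found.</div>
<pre>
```python
import re
from collections import Counter

SOFT_HYPHEN = "\u00ad"  # U+00AD, assumed to mark every break in the word list
VOWELS = "aeiouy"  # English letter classes; another language needs its own
CONSONANTS = "bcdfghjklmnpqrstvwxz"

# One whole run of vowels or one whole run of consonants.
GROUP = f"(?:[{VOWELS}]+|[{CONSONANTS}]+)"

# The group after the hyphen sits in a lookahead so it is not consumed and
# can serve as the "before" group of the next break in the same word.
BOUNDARY = re.compile(f"({GROUP}){SOFT_HYPHEN}(?=({GROUP}))", re.IGNORECASE)


def boundary_frequencies(words):
    """Count (before, after) letter-group pairs around each soft hyphen."""
    counts = Counter()
    for word in words:
        for before, after in BOUNDARY.findall(word):
            counts[(before.lower(), after.lower())] += 1
    return counts
```
</pre>
<div>Calling boundary_frequencies(words).most_common(20) would then
list the twenty most frequent boundary pairs.</div>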
<div> For Cebuano, I am hoping to use just the letter
combinations in the Hunspell dictionary. I hope that
Hunspell can accommodate this, but I haven't completed the word
list to analyze yet, so my hopes rest only on reading the
documentation about Hunspell and hyphenation and on looking
through some of the existing examples. <br>
<br>
</div>
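<div>As a rough illustration of step 3, the most frequent boundary
pairs could be written out as hyphenation patterns. This sketch
assumes the TeX-style format of the hyph_*.dic files read by the
Hyphen library that LibreOffice uses: the first line names the
character encoding, each later line holds one pattern, and an odd
digit between letters marks a permitted break. The function names and
the top_n parameter are hypothetical, and a real dictionary would
most likely also need inhibiting patterns (even digits) and manual
review.</div>
<pre>
```python
from collections import Counter


def to_pattern(before, after):
    """Turn a (before, after) pair into a pattern such as 'a1t'."""
    # Assumption: the digit 1 between letters marks an allowed break.
    return f"{before}1{after}"


def build_hyphenation_lines(counts, top_n=50, encoding="UTF-8"):
    """Return the lines of a draft pattern file, encoding line first."""
    lines = [encoding]
    # most_common() yields pairs from most to least frequent.
    for (before, after), _count in counts.most_common(top_n):
        lines.append(to_pattern(before, after))
    return lines
```
</pre>
<div>Joining the returned lines with newlines and saving them as a
UTF-8 text file would give a first draft to test against known
words.</div>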
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Fri, Nov 3, 2017 at 9:39 AM, Cyrille
<span dir="ltr">&lt;<a href="mailto:lafricain79@gmail.com"
target="_blank" moz-do-not-send="true">lafricain79@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> It is becoming a bit
difficult for me to follow this thread with all these
technical terms in another language <span
class="m_-6050787621622657054moz-smiley-s1"><span>:-)</span></span>
<span class="m_-6050787621622657054moz-smiley-s2"><span>:-(</span></span>
But what I can tell you is that I am very interested in a
hyphenation dictionary. I have already created a spelling
<a href="https://gitlab.com/lafricain79/kituba-dic"
target="_blank" moz-do-not-send="true">dictionary for
Kikongo</a>, and have created an <a
href="https://extensions.libreoffice.org/extensions/kituba-kikongo-ya-leta-dictionary"
target="_blank" moz-do-not-send="true">extension for
LibreOffice</a> and a Hunspell dictionary for Linux.
This now allows us to enable spell checking in
Kikongo/Kituba. This is all the more interesting as Verbum
Bible is translating the Old Testament and the Roman
Missal. I intend to do the same for Lingala: once the
module is finished, I will create a dictionary. Moreover,
in the case of Lingala, the Hunspell .aff file already
exists!<br>
As for the hyphenation dictionary, I read about it and it
seemed like a tedious operation, so if you can offer me a
way to take advantage of the already existing
hyphenated words, that would be great.<br>
I opened an issue about this on <a
href="https://gitlab.com/lafricain79/LinVB/issues/11"
target="_blank" moz-do-not-send="true">gitlab</a>.<br>
<br>
Br Cyrille
<div>
<div class="h5"><br>
<br>
<div class="m_-6050787621622657054moz-cite-prefix">On
03/11/2017 at 12:37, David Haslam wrote:<br>
</div>
<blockquote type="cite">
<pre>I had similar thoughts as Michael outlined.
This morning, I compiled an Excel workbook tabulating the Lingala words
found to contain a soft hyphen.
It has been attached to the issue in the GitLab repo.
<a class="m_-6050787621622657054moz-txt-link-freetext" href="https://gitlab.com/lafricain79/LinVB/issues/10" target="_blank" moz-do-not-send="true">https://gitlab.com/<wbr>lafricain79/LinVB/issues/10</a>
And - yes - it's not only incomplete as a dictionary, but it's also further
evidence of inconsistency.
The use of soft hyphens was entirely an ad hoc operation done to address
contingencies.
As a CrossWire volunteer, I don't consider this sort of activity to be
outside our purview.
We're here to assist other Bible Agencies too, just as we say on our
website.
Or, if you like, we can count it as "going the extra mile".
And I'm sure that Fr Cyrille appreciates the spirit in which this service is
provided.
Best regards,
David
--
Sent from: <a class="m_-6050787621622657054moz-txt-link-freetext" href="http://sword-dev.350566.n4.nabble.com/" target="_blank" moz-do-not-send="true">http://sword-dev.350566.n4.<wbr>nabble.com/</a>
______________________________<wbr>_________________
sword-devel mailing list: <a class="m_-6050787621622657054moz-txt-link-abbreviated" href="mailto:sword-devel@crosswire.org" target="_blank" moz-do-not-send="true">sword-devel@crosswire.org</a>
<a class="m_-6050787621622657054moz-txt-link-freetext" href="http://www.crosswire.org/mailman/listinfo/sword-devel" target="_blank" moz-do-not-send="true">http://www.crosswire.org/<wbr>mailman/listinfo/sword-devel</a>
Instructions to unsubscribe/change your settings at above page
</pre>
</blockquote>
<br>
</div>
</div>
</div>
<br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</body>
</html>