[sword-devel] Soft hyphens?
David Haslam
dfhmch at googlemail.com
Sat Apr 1 09:07:03 MST 2017
Someone once developed an algorithm called *KUCut* to insert zero-width
spaces (ZWSP) into Thai text.
Not sure of the current state of play, but I do know that the text used as
the test bed for machine learning was the *ThaiKJV* of Philip Pope, which
was the source text for our module.
A separate discussion on the same subject gives the flavour.
https://stackoverflow.com/questions/8492763/thai-line-breaking-how-to-break-thai-text-effectively
It was Michael Hart who alerted me to this back in 2012 or earlier.
WBTC (as was) even used *KUCut* to add ZWSP to their *ThaiERV* translation
to improve word-wrapping.
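For anyone unfamiliar with the technique, here is a minimal Python sketch of the general idea. It splits Thai text into words and joins them with U+200B, which lets a renderer wrap at word boundaries. This is *not* KUCut's algorithm, which is machine-learning based. It substitutes plain greedy longest-match dictionary lookup, and the function names and the two-word dictionary are hypothetical.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but marks a permitted line break


def segment(text, dictionary, max_len=20):
    """Greedy longest-match word segmentation (a simple stand-in, not KUCut).

    At each position, take the longest dictionary word that starts there.
    Characters that begin no known word are gathered into one "unknown" run,
    so no ZWSP is ever placed inside an unrecognised word.
    """
    words = []
    unknown = ""
    i = 0
    while i < len(text):
        match = None
        # Try candidate lengths from longest to shortest.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match is None:
            unknown += text[i]
            i += 1
        else:
            if unknown:
                words.append(unknown)
                unknown = ""
            words.append(match)
            i += len(match)
    if unknown:
        words.append(unknown)
    return words


def insert_zwsp(text, dictionary):
    """Return text with a ZWSP between every pair of adjacent segments."""
    return ZWSP.join(segment(text, dictionary))


# Hypothetical toy dictionary: "ภาษา" (language) + "ไทย" (Thai).
print(repr(insert_zwsp("ภาษาไทย", {"ภาษา", "ไทย"})))
```

A real module would need a full Thai word list, or a statistical segmenter like KUCut, because greedy longest-match is known to make mistakes where words overlap ambiguously.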
*KUCut* was described here:
http://veer66.wordpress.com/2009/11/23/kucutwindows/
Back in 2012, the Python source code was maintained here:
https://bitbucket.org/veer66/kucut
And there's an online demo (probably the same source) here:
http://www.thai-language.com/?nav=zwsp
There's now a *KUCut* repository on GitHub.
Thai isn't unique, either. See
https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries
But we won't go there... yet!
Tag this for "something to do on a rainy day".
Blessings,
David
PS. Not checked to see if any of the above links are broken.
--
View this message in context: http://sword-dev.350566.n4.nabble.com/Soft-hyphens-tp4657045p4657050.html
Sent from the SWORD Dev mailing list archive at Nabble.com.