[sword-devel] Soft hyphens?

David Haslam dfhmch at googlemail.com
Sat Apr 1 09:07:03 MST 2017


Someone once developed an algorithm called *KUCut* to insert zero-width
spaces (ZWSP, U+200B) into Thai text.
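
To make the idea concrete, here is a minimal sketch in Python of the final
step only: joining words that have already been segmented with ZWSP. It does
not use or reproduce *KUCut* itself. The function names and the example word
list are assumptions made for illustration; a real segmenter would supply
the word list.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but a permitted line-break point


def join_with_zwsp(words):
    """Join already-segmented words with ZWSP, skipping empty strings."""
    return ZWSP.join(w for w in words if w)


def strip_zwsp(text):
    """Remove every ZWSP, recovering the unmarked text."""
    return text.replace(ZWSP, "")


# Hypothetical segmenter output for the phrase "ภาษาไทย" (illustration only).
words = ["ภาษา", "ไทย"]
marked = join_with_zwsp(words)
```

The marked string displays the same as the original, which is why ZWSP can
be added to a text without changing how it reads.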

Not sure of the current state of play, but I do know that the text used as
the test bed for machine learning was the *ThaiKJV* of Philip Pope, which
was the source text for our module.

An unrelated discussion on the same subject gives the flavour.
https://stackoverflow.com/questions/8492763/thai-line-breaking-how-to-break-thai-text-effectively

It was Michael Hart who alerted me to this back in 2012 or earlier.
WBTC (as was) even used *KUCut* to add ZWSP to their *ThaiERV* translation
to improve word-wrapping.
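
As a rough illustration of why ZWSP helps word-wrapping, the sketch below
does greedy line filling that breaks only at ZWSP positions. It is an
assumption-laden toy, not how any SWORD frontend or rendering engine works:
the function name is hypothetical, and width is counted in code points
rather than display columns, which is only approximate for Thai because
combining vowels and tone marks take no extra column.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def wrap_at_zwsp(text, width):
    """Greedily fill lines, breaking only where the text contains ZWSP.

    Returned lines have the ZWSP characters removed. A single word longer
    than `width` is kept whole on its own line.
    """
    lines, current = [], ""
    for word in text.split(ZWSP):
        if current and len(current) + len(word) > width:
            lines.append(current)
            current = word
        else:
            current += word
    if current:
        lines.append(current)
    return lines


# Latin placeholders stand in for Thai words to keep the example readable.
sample = ZWSP.join(["ab", "cd", "ef"])
wrapped = wrap_at_zwsp(sample, 4)
```

Without the ZWSP markers the same string would offer no break points at
all, so a renderer would have to overflow or break at an arbitrary place.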

*KUCut* was described here:
http://veer66.wordpress.com/2009/11/23/kucutwindows/

Back in 2012, the Python source code was maintained here:
https://bitbucket.org/veer66/kucut

And there's an online demo (probably the same source) here:
http://www.thai-language.com/?nav=zwsp

There's now a *KUCut* repository on GitHub.

Thai isn't unique, either. See
https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries

But we won't go there ... yet!

Tag this for "something to do on a rainy day".

Blessings,

David

PS. I have not checked whether any of the above links are broken.
