[sword-devel] Detecting and correcting poor hyphenation in source texts?
David Haslam
d.haslam at ukonline.co.uk
Fri Jan 2 09:24:38 MST 2009
While reading through Hebreux 6 in the French BBB (Go Bible on my K750i),
today I found some inappropriate spaces occurring immediately after a
hyphen. A search for "- " found 45 matches; three of these were valid, and
the other 42 were bad hyphenations. I have done a manual search and replace
on the source text, and then rebuilt the FrenchBBB Go Bible.
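For anyone wanting to repeat that kind of search on another source text, here is a minimal Python sketch. It only lists candidates with some surrounding context for manual review; it does not change anything. The function name find_candidates and the context width are my own choices for illustration, not part of any CrossWire or Go Bible tool.

```python
import re

# A run of letters, a hyphen, whitespace, then another run of letters,
# e.g. "pe- tit". [^\W\d_] matches any Unicode letter, so accented
# French characters are covered.
CANDIDATE = re.compile(r"([^\W\d_]+)-\s+([^\W\d_]+)")


def find_candidates(text, context=20):
    """Return (offset, snippet) pairs for each "word- word" occurrence.

    The snippet includes up to `context` characters either side of the
    match so a human reviewer can judge whether the hyphen is valid.
    """
    results = []
    for m in CANDIDATE.finditer(text):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        results.append((m.start(), text[start:end]))
    return results
```

Printing each snippet on its own line gives a short list to proof-read by hand, which is roughly what I did with the 45 matches.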
I am reporting this because CrossWire also has a SWORD beta-module for the
FrenchBBB.
Generalising from this, detecting bad hyphenation requires a knowledge of
the language; otherwise, how can one distinguish it from valid hyphenation?
The instance that caught my eye was "pe- tit", which should be "petit".
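One way to supply that knowledge of the language is a word list. The sketch below is only an assumption about how this might be approached, not an existing tool: it joins the two halves around each "- " and checks whether the result is a known word. The names classify and words are hypothetical, and the word list itself would have to come from somewhere suitable for the language in question.

```python
import re

# Letters, hyphen, whitespace, letters (Unicode-aware), e.g. "pe- tit".
CANDIDATE = re.compile(r"([^\W\d_]+)-\s+([^\W\d_]+)")


def classify(text, words):
    """Sort "word- word" occurrences into two lists.

    likely_bad: (original, joined) pairs where the joined form is in
                `words`, so the hyphen and space are probably spurious.
    review:     occurrences whose joined form is unknown; these still
                need a human decision.
    `words` is assumed to be a set of lowercase words for the language.
    """
    likely_bad, review = [], []
    for m in CANDIDATE.finditer(text):
        joined = m.group(1) + m.group(2)
        if joined.lower() in words:
            likely_bad.append((m.group(0), joined))
        else:
            review.append(m.group(0))
    return likely_bad, review
```

With a word list containing "petit", the fragment "pe- tit" lands in likely_bad with the suggested correction "petit". Anything the word list does not recognise is left for review rather than changed automatically, since a small list will miss inflected forms and proper names.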
The usual rejoinder one gets from CrossWire when even minor source text
issues are observed is, "Wait until we get a better source!" From a
practical viewpoint, we should admit that this rarely happens, especially
for such minor blemishes that can easily occur because of word-wrapping or
during OCR.
I don't have a generic solution, but I do wish to start a discussion. Any
ideas? What can we do to help our "suppliers" when such "proof-reading
errors" are found?
-- David Haslam