[sword-devel] NFC and osis2mod
DM Smith
dmsmith555 at yahoo.com
Thu Jan 31 18:19:14 MST 2008
Can someone offer some pointers as to what I am doing wrong?
I am trying to add the ability to osis2mod to optionally ensure that a
UTF-8 document is normalized to NFC.
I added -n as a flag to indicate that normalization should occur and
set a global boolean variable "normalize" to true iff the flag is
present.
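The post does not show osis2mod's actual argument loop, so the following is only a hypothetical sketch of how a "-n" check could set the "normalize" flag. The helper name hasNormalizeFlag is made up for illustration.

```cpp
#include <cstring>

// Hypothetical helper (not osis2mod's real code): returns true iff "-n"
// appears among the command-line arguments.
bool hasNormalizeFlag(int argc, const char *const argv[]) {
    for (int i = 1; i < argc; ++i) {        // skip argv[0], the program name
        if (std::strcmp(argv[i], "-n") == 0) {
            return true;
        }
    }
    return false;
}
```

In main() it would be used as `normalize = hasNormalizeFlag(argc, argv);`. C++ accepts passing main's `char *argv[]` to a `const char *const[]` parameter.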
Rather than reinventing the wheel, I figured Sword's UTF8NFC filter
would be the ticket.
First I added the header with:
#ifdef _ICU_
#include <utf8nfc.h>
#endif
And I created a global variable:
#ifdef _ICU_
UTF8NFC normalizer;
#endif
Then, right before adding the entry, I ran it through the filter:
#ifdef _ICU_
if (normalize) {
    normalizer.processText(activeVerseText, (SWKey *)2); // note the hack of 2 to mimic a real key. TODO: remove all hacks
}
#endif
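Because every use of the normalizer is wrapped in #ifdef _ICU_, a build without ICU would accept -n but silently skip normalization. The following is a hypothetical sketch of one way to make that visible. The helper names normalizationAvailable and effectiveNormalize are illustrative and not part of osis2mod or Sword.

```cpp
#include <cstdio>

// Hypothetical helper: true only when this build defines _ICU_, i.e. when
// the #ifdef _ICU_ blocks in the post are actually compiled in.
bool normalizationAvailable() {
#ifdef _ICU_
    return true;
#else
    return false;
#endif
}

// Returns the value the program should really use for "normalize" and
// prints a warning when -n was requested but cannot be honored.
bool effectiveNormalize(bool requested) {
    if (requested && !normalizationAvailable()) {
        std::fprintf(stderr, "warning: -n ignored; built without ICU\n");
        return false;
    }
    return requested;
}
```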
Now I ran the KJV.xml at www.crosswire.org/~dmsmith/kjv2006 through
osis2mod.
Since I thought I had already normalized the text, I expected a diff
to show nothing.
However, I found corruption at the end of the raw text of Matthew 3:17
in the module (and in many places after that).
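One way to pinpoint where normalization changes an entry is to copy the verse text into a std::string before the processText call and compare it with the result afterwards. This is a minimal sketch under that assumption. The function name firstDifference is hypothetical. If the output is the input plus extra trailing bytes, the returned offset equals the input length, which is the "corruption at the end" pattern.

```cpp
#include <cstddef>
#include <string>

// Returns the byte offset of the first difference between the text before
// and after normalization, or std::string::npos if they are identical.
// If one string is a prefix of the other, the offset is the shorter length.
std::size_t firstDifference(const std::string &before, const std::string &after) {
    const std::size_t n = before.size() < after.size() ? before.size() : after.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (before[i] != after[i]) {
            return i;
        }
    }
    return before.size() == after.size() ? std::string::npos : n;
}
```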
The corruption is always at the end of the line. Here is the raw text
for that verse:
<w lemma="strong:G3588" morph="robinson:T-NSM" src="13"></w><w lemma="strong:G2532" morph="robinson:CONJ" src="1">And</w>
<w lemma="strong:G2400" morph="robinson:V-2AAM-2S" src="2">lo</w>
<w lemma="strong:G5456" morph="robinson:N-NSF" src="3">a voice</w>
<w lemma="strong:G1537" morph="robinson:PREP" src="4">from</w>
<w lemma="strong:G3588 strong:G3772" morph="robinson:T-GPM robinson:N-GPM" src="5 6">heaven</w>,
<w lemma="strong:G3004" morph="robinson:V-PAP-NSF" src="7">saying</w>,
<w lemma="strong:G3778" morph="robinson:D-NSM" src="8">This</w>
<w lemma="strong:G2076" morph="robinson:V-PXI-3S" src="9">is</w>
<w lemma="strong:G3450" morph="robinson:P-1GS" src="12">my</w>
<w lemma="strong:G27" morph="robinson:A-NSM" src="14">beloved</w>
<w lemma="strong:G3588 strong:G5207" morph="robinson:T-NSM robinson:N-NSM" src="10 11">Son</w>,
<w lemma="strong:G1722" morph="robinson:PREP" src="15">in</w>
<w lemma="strong:G3739" morph="robinson:R-DSM" src="16">whom</w>
<w lemma="strong:G2106" morph="robinson:V-AAI-1S" src="17">I am well pleased</w>.<milestone resp="pdy 2003-12-14-08:48" type="x-strongsMarkup"/>="22"꧁
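The trailing `="22"` looks like the tail of a `src="22"` attribute, perhaps from an earlier, longer entry. This is only a guess, not a diagnosis. It would fit a pattern where output is copied into a reused buffer without its end being reset. The sketch below illustrates that pattern with plain char buffers. It does not use Sword's SWBuf or any ICU call, and both function names are hypothetical.

```cpp
#include <cstring>
#include <string>

// Suspect pattern: copy `text` into a reused scratch buffer without writing
// a terminating '\0'. Bytes a longer, earlier entry left beyond text.size()
// remain and are read back as part of the string.
// Caller must ensure buf has room for text.size() + 1 bytes.
void copyUnterminated(char *buf, const std::string &text) {
    std::memcpy(buf, text.data(), text.size());
}

// Same copy, but terminated at the length actually produced.
void copyTerminated(char *buf, const std::string &text) {
    std::memcpy(buf, text.data(), text.size());
    buf[text.size()] = '\0';
}
```

If something like this is the cause, a reasonable place to look would be the point where the normalized result is written back into the verse text and its length is set.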
Any help would be appreciated.
Thanks!
Working together,
DM Smith