[sword-devel] NFC and osis2mod

DM Smith dmsmith555 at yahoo.com
Fri Feb 1 06:43:26 MST 2008


I have put a copy at www.crosswire.org/~dmsmith/osis2mod.cpp in case anyone 
can help.
Also, my earlier email said I used kjv.xml. That should be kjvfull.xml. Here 
is the path to it: www.crosswire.org/~dmsmith/kjv2006/sword/kjvxml.zip

DM Smith wrote:
> Can someone offer some pointers as to what I am doing wrong?
>
> I am trying to add the ability to osis2mod to optionally ensure that a  
> UTF-8 document is normalized to NFC.
>
> I added -n as a flag to indicate that normalization should occur and  
> set a global boolean variable "normalize" to true iff the flag is  
> present.
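A minimal sketch of the -n handling described above. It assumes the optional flags come after osis2mod's two positional arguments (output path and OSIS file); the helper name hasNormalizeFlag and its firstOption parameter are hypothetical. This is not osis2mod's actual argument parsing.

```cpp
#include <cstring>

// Global flag, as described in the post: true iff -n is on the command line.
bool normalize = false;

// Hypothetical helper: returns true iff "-n" appears among the optional
// arguments, which are assumed to start at index firstOption (i.e. after
// the positional arguments).
bool hasNormalizeFlag(int argc, const char *const argv[], int firstOption) {
    for (int i = firstOption; i < argc; ++i) {
        if (std::strcmp(argv[i], "-n") == 0) {
            return true;
        }
    }
    return false;
}

// Typical use near the top of main(), assuming two positional arguments:
//   normalize = hasNormalizeFlag(argc, argv, 3);
```

Passing the start index explicitly keeps a positional argument that happens to be spelled "-n" from being mistaken for the flag.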
>
> Rather than reinventing the wheel, I figured Sword's UTF8NFC filter  
> would be the ticket.
>
> First I added the header with:
>
> #ifdef _ICU_
> #include <utf8nfc.h>
> #endif
>
> And I created a global variable:
>
> #ifdef _ICU_
> UTF8NFC normalizer;
> #endif
>
>
> Then right before adding the entry I ran it through the filter:
>
> #ifdef _ICU_
> 			if (normalize) {
> 				normalizer.processText(activeVerseText, (SWKey *)2);  // note the hack of 2 to mimic a real key. TODO: remove all hacks
> 			}
> #endif
>
> Then I ran the KJV.xml at www.crosswire.org/~dmsmith/kjv2006 through
> osis2mod.
>
> Since I thought I had already normalized the text, I expected a diff  
> to show nothing.
>
> However, I found corruption at the end of the raw text of Matthew 3:17
> in the module (and in many places after it).
>
> The corruption is always at the end of the line. Here is the raw text  
> for that verse:
> <w lemma="strong:G3588" morph="robinson:T-NSM" src="13"></w><w
> lemma="strong:G2532" morph="robinson:CONJ" src="1">And</w> <w
> lemma="strong:G2400" morph="robinson:V-2AAM-2S" src="2">lo</w> <w
> lemma="strong:G5456" morph="robinson:N-NSF" src="3">a voice</w> <w
> lemma="strong:G1537" morph="robinson:PREP" src="4">from</w> <w
> lemma="strong:G3588 strong:G3772" morph="robinson:T-GPM robinson:N-GPM"
> src="5 6">heaven</w>, <w lemma="strong:G3004"
> morph="robinson:V-PAP-NSF" src="7">saying</w>, <w lemma="strong:G3778"
> morph="robinson:D-NSM" src="8">This</w> <w lemma="strong:G2076"
> morph="robinson:V-PXI-3S" src="9">is</w> <w lemma="strong:G3450"
> morph="robinson:P-1GS" src="12">my</w> <w lemma="strong:G27"
> morph="robinson:A-NSM" src="14">beloved</w> <w
> lemma="strong:G3588 strong:G5207" morph="robinson:T-NSM robinson:N-NSM"
> src="10 11">Son</w>, <w lemma="strong:G1722" morph="robinson:PREP"
> src="15">in</w> <w lemma="strong:G3739" morph="robinson:R-DSM"
> src="16">whom</w> <w lemma="strong:G2106" morph="robinson:V-AAI-1S"
> src="17">I am well pleased</w>.<milestone resp="pdy 2003-12-14-08:48"
> type="x-strongsMarkup"/>="22"꧁
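Because the input was expected to be in NFC already, normalizing it should leave every verse unchanged. A small debugging helper can confirm whether the damage is always at the end. It reports where a verse's text first differs before and after processText(). The name firstDifference is hypothetical and not part of SWORD.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical debugging helper: returns the byte offset at which `before`
// and `after` first differ, or std::string::npos if they are identical.
// For input that is already NFC, normalization should be a no-op, so any
// offset returned points at where the output went wrong.
std::size_t firstDifference(const std::string &before, const std::string &after) {
    const std::size_t common = std::min(before.size(), after.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (before[i] != after[i]) {
            return i;
        }
    }
    // No mismatch in the shared prefix: the strings are identical if the
    // lengths match, otherwise they diverge where the shorter one ends.
    return before.size() == after.size() ? std::string::npos : common;
}
```

One way to use it, assuming activeVerseText is an SWBuf (which provides c_str()): copy the text into a std::string just before the processText() call, compare it with activeVerseText.c_str() afterwards, and print the current verse whenever the result is not std::string::npos. If the filter only appended bytes, the reported offset equals the length of the original text.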
>
>
> Any help would be appreciated.
>
> Thanks!
>
> Working together,
> 	DM Smith
>   
