[sword-devel] Re: Accented Greek NT (Westcott-Hort)

Chris Little sword-devel@crosswire.org
Sat, 3 Apr 2004 22:50:02 -0700 (MST)


On Sat, 3 Apr 2004, Costas Stergiou wrote:

> Hi Chris,
> 
> > I suspect that the problem is either that the text is encoded using the
> > Extended Greek codepoints or the RTF control is transcoding them to use
> > these points.  I'm pretty sure the whole Extended Greek section violates
> > the recommendation that data be normalized as NFC.  It also makes
> > searching more problematic if we don't use NFC internally.  So, that said,
> > when we transcode the text to use Greek + combining marks instead of
> > precomposed characters, it might work fine with even Times New Roman.
> >
> 
> Can you explain which is the best way to encode accented Greek texts in
> order to be 'compatible' with what is most commonly accepted?
> I gather from the above that Greek + combining marks is the preferred way
> (instead of using precomposed chars), but I think precomposed chars tend to
> show up better (since drawing diacritics correctly is a bit
> difficult).
> 
> In Christ,
> Costas

Hi Costas,

I don't know whether Greek + combining marks or Extended Greek is the
preferred method, but NFC specifically is named as the preferred
normalization form for encoding.  NFC is just a Unicode normalization form
that gives every codepoint or codepoint sequence a single canonical
representation.  The uconv utility from ICU can convert any Unicode file
into NFC if you use the "Any-NFC" transliterator. (There's a copy of uconv
at http://crosswire.org/ftpmirror/pub/sword/utils/win32/uconv.zip, but it
might be too old to work for this purpose.)
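
For illustration only, here is a minimal sketch of the same normalization
in Python.  It uses the standard library's unicodedata module rather than
uconv, and the helper name to_nfc is made up for this example.  It shows
that alpha + combining acute and the Extended Greek alpha with oxia both
come out as the same codepoint after NFC.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


# Alpha followed by a combining acute accent (two codepoints).
decomposed = "\u03b1\u0301"
# GREEK SMALL LETTER ALPHA WITH OXIA from the Greek Extended block.
extended = "\u1f71"

# Both normalize to the same single codepoint, U+03AC.
print([hex(ord(c)) for c in to_nfc(decomposed)])
print([hex(ord(c)) for c in to_nfc(extended)])
print(to_nfc(decomposed) == to_nfc(extended))
```

Whichever form the source text arrives in, the stored text ends up
identical, which is the point of normalizing before storage.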

Display is, of course, an issue.  One option is to convert the NFC text to
precomposed characters wherever they exist before displaying it.  Another
is to assume a sufficiently intelligent renderer that will place combining
characters in the correct locations anyway.  But as far as storage goes,
NFC is definitely best, and it's also the best format for searching
(precisely because it is normalized, so equivalent strings compare equal).
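
To make the searching point concrete, here is a rough sketch in Python
(again using the standard unicodedata module; the function name
contains_normalized and the sample strings are assumptions for the
example, not anything from the Sword code).  A query typed with a
combining accent fails a raw substring test against precomposed text, but
matches once both sides are normalized to NFC.

```python
import unicodedata


def contains_normalized(haystack: str, needle: str) -> bool:
    """Substring test after normalizing both strings to NFC (hypothetical helper)."""
    nfc_haystack = unicodedata.normalize("NFC", haystack)
    nfc_needle = unicodedata.normalize("NFC", needle)
    return nfc_needle in nfc_haystack


# "logos" stored with a precomposed omicron with tonos (U+03CC).
stored = "\u03bb\u03cc\u03b3\u03bf\u03c2"
# The same word typed as omicron + combining acute (U+03BF U+0301).
query = "\u03bb\u03bf\u0301\u03b3\u03bf\u03c2"

# A raw comparison misses because the codepoint sequences differ.
print(query in stored)
# Normalizing both sides first finds the match.
print(contains_normalized(stored, query))
```

If the stored text is already NFC, only the query needs normalizing at
search time.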

--Chris