[sword-devel] Re: Accented Greek NT (Westcott-Hort)
Chris Little
sword-devel@crosswire.org
Sat, 3 Apr 2004 22:50:02 -0700 (MST)
On Sat, 3 Apr 2004, Costas Stergiou wrote:
> Hi Chris,
>
> > I suspect that the problem is either that the text is encoded using the
> > Extended Greek codepoints or the RTF control is transcoding them to use
> > these points. I'm pretty sure the whole Extended Greek section violates
> > the recommendation that data be normalized as NFC. It also makes
> > searching more problematic if we don't use NFC internally. So, that said,
> > when we transcode the text to use Greek + combining marks instead of
> > precomposed characters, it might work fine with even Times New Roman.
> >
>
> Can you explain which one is the best way to encode greek accented texts in
> order to be 'compatible' with what is most commonly accepted?
> I gather from the above that the greek+combining is the preferred way
> (instead of using precomposed chars), but i think precomposed chars tend to
> show up better (since the correct drawing of diacriticals is a bit
> difficult).
>
> In Christ,
> Costas
Hi Costas,
I don't know whether Greek + combining marks or Extended Greek is the
preferred method, but NFC specifically is named as the preferred form for
encoded text. NFC (Normalization Form C) is just a Unicode normalization
form: it defines a single canonical representation for every
codepoint/codepoint sequence. The uconv utility from ICU can convert any
Unicode file into NFC if you use the "Any-NFC" transliterator (uconv's -x
option). (There's a
copy of uconv at
http://crosswire.org/ftpmirror/pub/sword/utils/win32/uconv.zip, but it
might be too old to work for this purpose.)
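
If that uconv turns out to be unusable, the same normalization can be
sketched in Python with the standard unicodedata module. This is only an
illustration under my own assumptions, not anything from Sword or ICU; the
helper names to_nfc and codepoints are made up for the example.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


def codepoints(text: str) -> str:
    """Show a string as U+XXXX codepoints so the encoding is visible."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# alpha + combining psili + combining acute (Greek + combining marks)
decomposed = "\u03b1\u0313\u0301"
# alpha with oxia, taken from the Greek Extended block
extended = "\u1f71"

print(codepoints(to_nfc(decomposed)))  # U+1F04
print(codepoints(to_nfc(extended)))    # U+03AC
```

In this example NFC composes the combining sequence into a precomposed
character (U+1F04) and maps the oxia form to its tonos equivalent in the
basic Greek block (U+03AC), so NFC output is a mix of the two styles rather
than purely Greek + combining marks.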
Display is, of course, an issue. One option is to convert the stored NFC
text to precomposed characters wherever they exist before displaying it.
Another is to assume sufficiently intelligent rendering that will place
combining characters in the correct locations anyway. But as far as storage
goes, NFC is definitely best, and it's also the best format for searching
(precisely because it is normalized: equivalent sequences compare equal).
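
To make the searching point concrete, here is a rough Python sketch,
assuming both the stored text and the query are normalized to NFC before
comparison. The function name nfc_contains and the sample strings are
hypothetical, chosen only for illustration.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize text to NFC."""
    return unicodedata.normalize("NFC", text)


def nfc_contains(haystack: str, needle: str) -> bool:
    """Substring test after normalizing both strings to NFC."""
    return nfc(needle) in nfc(haystack)


# "ho logos", stored with omicron-with-oxia (U+1F79) from Greek Extended
stored = "\u1f41 \u03bb\u1f79\u03b3\u03bf\u03c2"
# "logos" typed as plain omicron followed by a combining acute (U+0301)
query = "\u03bb\u03bf\u0301\u03b3\u03bf\u03c2"

print(query in stored)              # False: raw codepoints differ
print(nfc_contains(stored, query))  # True: both sides normalize identically
```

Without normalization the two spellings of the same word do not match,
which is the practical reason to normalize once at storage time rather than
on every search.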
--Chris