[sword-devel] Accented Greek Texts

Tue Sep 18 09:25:35 MST 2007

Chris Little wrote:
> MorphGNT and an updated Tisch, both from morphgnt.org are up in the beta 
> area.
>   
Both of these modules use composed UTF-8 characters.

In April 2005 we had a discussion on whether Greek should be composed or 
decomposed. I don't remember coming to a resolution. Are we going with 
composed?

To summarize, some frontends (including different browsers viewing the 
Bible Tool) handled composed better than decomposed. Others did the 
opposite. Font choice had a significant impact on the results.

It was noted that we could have filters for composition or decomposition 
to transform as the frontend needed.
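
A minimal sketch of what such a filter could look like, assuming Python's 
standard unicodedata module rather than SWORD's actual filter API. The 
function name normalize_for_frontend is made up for illustration:

```python
import unicodedata

# Forms that Python's unicodedata module supports (FCD is not among them).
SUPPORTED_FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normalize_for_frontend(text, form="NFC"):
    """Hypothetical filter: return text in the form the frontend prefers."""
    if form not in SUPPORTED_FORMS:
        raise ValueError("unsupported normalization form: " + form)
    return unicodedata.normalize(form, text)

# "logos" with a precomposed omicron-with-tonos (U+03CC): 5 code points.
composed = "\u03bb\u03cc\u03b3\u03bf\u03c2"

# Decomposed: omicron (U+03BF) followed by combining acute (U+0301).
decomposed = normalize_for_frontend(composed, "NFD")

# Round trip back to the composed form.
recomposed = normalize_for_frontend(decomposed, "NFC")
```

The module text stays as it is on disk; only the rendered output changes, 
so each frontend can ask for whichever form its fonts handle best.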

If we allow for modules to vary with regard to this, could/should we 
have an entry in the conf indicating the normalization? Perhaps with the 
values from NFC, NFD, NFKD, NFKC, FCD?
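
As a rough illustration of how a tool might work out which value to record, 
here is a sketch that checks a text against each form. It assumes Python's 
unicodedata module, which has no FCD check, and the function name 
detect_forms is hypothetical:

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def detect_forms(text):
    """Return the normalization forms that text already satisfies."""
    # A text is in a given form if normalizing it changes nothing.
    return [f for f in FORMS if unicodedata.normalize(f, text) == text]

composed = "\u03bb\u03cc\u03b3\u03bf\u03c2"          # precomposed U+03CC
decomposed = "\u03bb\u03bf\u0301\u03b3\u03bf\u03c2"  # U+03BF + U+0301

composed_forms = detect_forms(composed)
decomposed_forms = detect_forms(decomposed)
```

A module whose text satisfies none of the forms would return an empty list, 
which is itself useful to know before importing it.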

Should osis2mod normalize text to an agreed-upon form?

How should a Greek text (or any other accented text) be indexed with 
Lucene? Should we index various representations: fully (de)composed, 
unaccented, transliterated?
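
To make the candidate representations concrete, here is a small sketch that 
derives composed, decomposed, and unaccented forms of one string. It uses 
Python's unicodedata module, not Lucene, and leaves transliteration out 
because that needs a mapping table I am not going to guess at. The function 
names are illustrative only:

```python
import unicodedata

def strip_accents(text):
    """Drop all combining marks (accents, breathings, iota subscript)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have a non-zero canonical combining class.
    base_only = "".join(ch for ch in decomposed
                        if unicodedata.combining(ch) == 0)
    return unicodedata.normalize("NFC", base_only)

def index_representations(text):
    """Return the forms of text that an index could store side by side."""
    return {
        "composed": unicodedata.normalize("NFC", text),
        "decomposed": unicodedata.normalize("NFD", text),
        "unaccented": strip_accents(text),
    }

# "en archei" in polytonic Greek: smooth breathings and iota subscript.
phrase = "\u1f10\u03bd \u1f00\u03c1\u03c7\u1fc7"
reps = index_representations(phrase)
```

Indexing the unaccented form alongside the accented one would let a user 
search without typing any diacritics, at the cost of a larger index.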

It seems that the frontend needs to know how the index is represented so 
that it can appropriately normalize user input.

Right now Lucene indexes what it is handed and the user is responsible 
for matching that.
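
Here is a toy sketch of the alternative, where the search code normalizes 
the query to whatever form the index was built with. A plain dict stands in 
for Lucene, and INDEX_FORM is a hypothetical setting (it could come from the 
conf entry suggested above):

```python
import unicodedata

INDEX_FORM = "NFC"  # hypothetical: the form the index was built with

def build_index(verses):
    """Map each whitespace-separated token to the verses containing it."""
    index = {}
    for ref, text in verses.items():
        for token in unicodedata.normalize(INDEX_FORM, text).split():
            index.setdefault(token, set()).add(ref)
    return index

def search(index, query):
    """Normalize the query to the index's form before looking it up."""
    key = unicodedata.normalize(INDEX_FORM, query.strip())
    return sorted(index.get(key, set()))

verses = {
    "John 1:1": "\u1f10\u03bd \u1f00\u03c1\u03c7\u1fc7 \u1f26\u03bd "
                "\u1f41 \u03bb\u03cc\u03b3\u03bf\u03c2",
}
index = build_index(verses)

# The user types a decomposed "logos": omicron + combining acute.
decomposed_query = "\u03bb\u03bf\u0301\u03b3\u03bf\u03c2"
hits = search(index, decomposed_query)
```

Without the normalize call in search, the decomposed query would miss, since 
the index only holds the composed token.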

In Him,
    DM