[sword-devel] Accented Greek Texts
DM Smith
dmsmith555 at yahoo.com
Tue Sep 18 09:25:35 MST 2007
Chris Little wrote:
> MorphGNT and an updated Tisch, both from morphgnt.org are up in the beta
> area.
>
Both of these modules use composed UTF-8 characters.
In April 2005 we had a discussion on whether Greek should be composed or
decomposed. I don't remember coming to a resolution. Are we going with
composed?
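For anyone who has not looked at the difference recently, here is a minimal sketch of what "composed" versus "decomposed" means at the code point level. It uses Python's standard unicodedata module purely for illustration; the sample word is my own choice and is not taken from either module.

```python
import unicodedata

# Sample word "logos" in Greek, written with a precomposed omicron-tonos (U+03CC).
word = "\u03bb\u03cc\u03b3\u03bf\u03c2"

composed = unicodedata.normalize("NFC", word)    # accents folded into base letters
decomposed = unicodedata.normalize("NFD", word)  # base letters + combining marks


def code_points(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(code_points(composed))
print(code_points(decomposed))
# The two strings render identically but do not compare equal.
print(composed == decomposed)
```

The composed form carries the accented omicron as one code point, while the decomposed form splits it into a plain omicron followed by a combining acute accent, so a byte-wise or code-point-wise comparison of the two fails.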
To summarize, some frontends (including different browsers viewing the
Bible Tool) handled composed better than decomposed. Others did the
opposite. Font choice had a significant impact on the results.
It was noted that we could have filters for composition or decomposition
to transform as the frontend needed.
If we allow for modules to vary with regard to this, could/should we
have an entry in the conf indicating the normalization? Perhaps with the
values from NFC, NFD, NFKD, NFKC, FCD?
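As a rough sketch of how such a conf entry might be consumed: the key name "Normalization" below is hypothetical (no such key exists in the conf format today), and the helper names are made up for illustration. Python's configparser stands in for the real conf reader. Note that FCD is not one of the forms Python's unicodedata can produce, so the sketch rejects it rather than guessing.

```python
import configparser
import unicodedata

# Hypothetical conf fragment. "Normalization" is an assumed key, not an
# existing one; it only illustrates the proposal.
CONF_TEXT = """
[MorphGNT]
Encoding=UTF-8
Normalization=NFC
"""

# Forms that unicodedata.normalize() can produce. FCD is not among them.
SUPPORTED = {"NFC", "NFD", "NFKC", "NFKD"}


def declared_form(conf_text, module, default="NFC"):
    """Return the normalization form a module declares, or a default."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case exactly as written
    parser.read_string(conf_text)
    form = parser.get(module, "Normalization", fallback=default)
    if form not in SUPPORTED:
        raise ValueError(f"unsupported normalization form: {form}")
    return form


def to_frontend_form(text, wanted):
    """Filter step: convert module text to the form a frontend prefers."""
    return unicodedata.normalize(wanted, text)


form = declared_form(CONF_TEXT, "MorphGNT")
print(form)
```

With the declared form known, a frontend that renders decomposed text better could call to_frontend_form(text, "NFD") only when the module says it is not already in that form.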
Should osis2mod do normalization to an agreed upon normalization?
How should Greek (or any other accented text) be indexed with Lucene?
Should we index various representations: fully (de)composed,
unaccented, transliterated?
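To make the "various representations" idea concrete, here is a small sketch that derives composed, decomposed, and unaccented variants of a token. It does not touch Lucene at all; the function name and the field names are assumptions for illustration. Transliteration is left out because it needs a mapping table that this thread has not settled on.

```python
import unicodedata


def index_fields(text):
    """Derive several representations of the same text for indexing."""
    nfc = unicodedata.normalize("NFC", text)
    nfd = unicodedata.normalize("NFD", text)
    # Drop every combining mark (accents, breathings, iota subscript)
    # from the decomposed form, leaving only the base letters.
    stripped = "".join(ch for ch in nfd if not unicodedata.combining(ch))
    return {
        "composed": nfc,
        "decomposed": nfd,
        "unaccented": unicodedata.normalize("NFC", stripped),
    }


fields = index_fields("\u03bb\u03cc\u03b3\u03bf\u03c2")  # "logos" with tonos
for name, value in fields.items():
    print(name, [f"U+{ord(ch):04X}" for ch in value])
```

Each variant could go into its own index field, at the cost of a larger index, so that a search can pick whichever field matches what the user is able to type.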
It seems that the frontend needs to know how the index is represented so
that it can appropriately normalize user input.
Right now Lucene indexes what it is handed and the user is responsible
for matching that.
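The mismatch can be shown with a toy in-memory index standing in for Lucene. Everything here (the dictionary index, INDEX_FORM, the function names) is an assumption made for the sketch; the point is only that the query has to go through the same normalization as the indexed text.

```python
import unicodedata

# Assumed: the index was built from text in this form.
INDEX_FORM = "NFC"


def build_index(verses):
    """Map each whitespace-separated token to the references containing it."""
    index = {}
    for ref, text in verses.items():
        for token in unicodedata.normalize(INDEX_FORM, text).split():
            index.setdefault(token, set()).add(ref)
    return index


def search(index, query):
    """Normalize the query to the index's form before looking it up."""
    return index.get(unicodedata.normalize(INDEX_FORM, query), set())


verses = {"John 1:1": "Ἐν ἀρχῇ ἦν ὁ λόγος"}  # sample text only
index = build_index(verses)

# A query typed with a decomposed accent: omicron + combining acute.
query = "\u03bb\u03bf\u0301\u03b3\u03bf\u03c2"

print(index.get(query))      # raw lookup misses
print(search(index, query))  # normalized lookup finds the verse
```

The raw lookup fails even though the query looks identical on screen, which is the situation the user is left to sort out today.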
In Him,
DM