[sword-devel] Lucene search index and Coptic ?
David Haslam
dfhmch at googlemail.com
Wed Apr 26 12:48:37 MST 2017
If you examine the result preview pane in the Xiphos Advanced Search dialog,
the problem becomes apparent.
Most Coptic Unicode characters are not displayed correctly: they seem to
have been converted to U+FFFD REPLACEMENT CHARACTER.
That is, none of these Coptic letters is handled aright by this part of
the software:
U+2C81 ⲁ COPTIC SMALL LETTER ALFA
U+2C83 ⲃ COPTIC SMALL LETTER VIDA
U+2C85 ⲅ COPTIC SMALL LETTER GAMMA
U+2C87 ⲇ COPTIC SMALL LETTER DALDA
U+2C89 ⲉ COPTIC SMALL LETTER EIE
U+2C8B ⲋ COPTIC SMALL LETTER SOU
U+2C8D ⲍ COPTIC SMALL LETTER ZATA
U+2C8F ⲏ COPTIC SMALL LETTER HATE
U+2C91 ⲑ COPTIC SMALL LETTER THETHE
U+2C93 ⲓ COPTIC SMALL LETTER IAUDA
U+2C95 ⲕ COPTIC SMALL LETTER KAPA
U+2C97 ⲗ COPTIC SMALL LETTER LAULA
U+2C99 ⲙ COPTIC SMALL LETTER MI
U+2C9B ⲛ COPTIC SMALL LETTER NI
U+2C9D ⲝ COPTIC SMALL LETTER KSI
U+2C9F ⲟ COPTIC SMALL LETTER O
U+2CA1 ⲡ COPTIC SMALL LETTER PI
U+2CA3 ⲣ COPTIC SMALL LETTER RO
U+2CA5 ⲥ COPTIC SMALL LETTER SIMA
U+2CA7 ⲧ COPTIC SMALL LETTER TAU
U+2CA9 ⲩ COPTIC SMALL LETTER UA
U+2CAB ⲫ COPTIC SMALL LETTER FI
U+2CAD ⲭ COPTIC SMALL LETTER KHI
U+2CAF ⲯ COPTIC SMALL LETTER PSI
U+2CB1 ⲱ COPTIC SMALL LETTER OOU
U+2CC1 ⳁ COPTIC SMALL LETTER SAMPI
U+2CE8 ⳨ COPTIC SYMBOL TAU RO
Only the few Coptic letters in the range U+03E2 to U+03EF (within the Greek
and Coptic block) are displayed aright.
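One observable difference between the two groups can be checked with a few lines of Python (standard library only). The letters that display aright all encode as two-byte UTF-8 sequences, whereas those in the Coptic block (U+2C80 to U+2CFF) take three bytes. This is a diagnostic sketch, not the Xiphos or SWORD code path, and whether the encoded length has anything to do with the fault is an untested assumption.

```python
import unicodedata

# Reported as displaying aright: U+03E2..U+03EF in the Greek and Coptic block.
WORKING = [chr(cp) for cp in range(0x03E2, 0x03F0)]
# A sample of the Coptic-block characters reported as broken (from the list above).
BROKEN = ["\u2c81", "\u2c83", "\u2cc1", "\u2ce8"]

def describe(ch: str) -> str:
    """Return the code point, UTF-8 byte length and Unicode name of one character."""
    return f"U+{ord(ch):04X} utf8={len(ch.encode('utf-8'))} {unicodedata.name(ch)}"

for ch in WORKING[:2] + BROKEN:
    print(describe(ch))
```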
It's no wonder that a search has so many spurious results if most of the
search space has been squashed into Unicode replacement characters.
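To illustrate why squashing letters into U+FFFD inflates the hit count, here is a minimal sketch built around a hypothetical normalisation step, `squash`, which keeps a character only if it is in an assumed "supported" set (here just U+03E2 to U+03EF, as observed) and replaces everything else with U+FFFD. It is not how SWORD or CLucene actually build their index. It only shows that once distinct words map to the same index term, a query for any one of them matches all of them.

```python
REPLACEMENT = "\ufffd"

# Hypothetical assumption: only U+03E2..U+03EF survive indexing, as observed.
SUPPORTED = {chr(cp) for cp in range(0x03E2, 0x03F0)}

def squash(word: str) -> str:
    """Hypothetical lossy normalisation: unsupported characters become U+FFFD."""
    return "".join(ch if ch in SUPPORTED else REPLACEMENT for ch in word)

def build_index(words):
    """Map each normalised term to the set of original words that produced it."""
    index = {}
    for w in words:
        index.setdefault(squash(w), set()).add(w)
    return index

# Three different three-letter strings made from Coptic-block letters listed above.
words = ["\u2c81\u2c83\u2c85", "\u2c89\u2c8d\u2c8f", "\u2c99\u2c9b\u2c9f"]
index = build_index(words)

query = "\u2c81\u2c83\u2c85"
hits = index.get(squash(query), set())
print(len(hits))  # 3: all three strings collapse to the same index term
```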
I'm a Windows user, as most of you know already.
Does the same thing happen in Xiphos under Linux?
Is this an issue common to all SWORD based front-ends?
The fact that we see similar results in PocketSword strongly suggests it is.
Best regards,
David
--
View this message in context: http://sword-dev.350566.n4.nabble.com/Lucene-search-index-and-Coptic-tp4657103p4657106.html
Sent from the SWORD Dev mailing list archive at Nabble.com.