[sword-devel] Lucene search index and Coptic ?
DM Smith
dmsmith at crosswire.org
Wed Apr 26 15:21:22 MST 2017
Consider using Luke to analyze the constructed Lucene index. See: https://code.google.com/archive/p/luke/
I think you’ll need a Luke release that matches Lucene 1.9.1, or maybe 1.4.x.
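
Separately from inspecting the index itself, a quick way to quantify what the preview pane shows is to count how many code points in a piece of text are U+FFFD versus real Coptic letters. The following is only a rough sketch in Python; the function name `classify` and the sample string are made up for illustration and are not part of SWORD, Xiphos or Lucene.

```python
from collections import Counter

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def classify(text):
    """Count code points by category (hypothetical helper, for illustration)."""
    counts = Counter()
    for ch in text:
        cp = ord(ch)
        if ch == REPLACEMENT:
            counts["replacement"] += 1
        elif 0x2C80 <= cp <= 0x2CFF:
            # Coptic block, e.g. U+2C81 COPTIC SMALL LETTER ALFA
            counts["coptic_block"] += 1
        elif 0x03E2 <= cp <= 0x03EF:
            # Coptic letters located in the Greek and Coptic block
            counts["greek_coptic_block"] += 1
        else:
            counts["other"] += 1
    return counts


# Made-up sample: two Coptic-block letters, one letter from U+03E2..U+03EF,
# two replacement characters and a space.
sample = "\u2c81\u2c83\u03e3\ufffd\ufffd "
result = classify(sample)
print(dict(result))
```

Running this over text copied from the search preview, and over the same verses copied from the module display, would show whether the replacement characters appear only on the search side.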
DM
> On Apr 26, 2017, at 3:48 PM, David Haslam <dfhmch at googlemail.com> wrote:
>
> If you examine the result preview pane in the Xiphos Advanced Search dialog,
> the problem becomes apparent.
>
> Most Coptic Unicode characters are not displayed correctly.
>
> The remainder seem to have been converted to U+FFFD REPLACEMENT CHARACTER.
>
> i.e. none of these Coptic letters are handled correctly by this part
> of the software:
>
> U+2C81 ⲁ COPTIC SMALL LETTER ALFA
> U+2C83 ⲃ COPTIC SMALL LETTER VIDA
> U+2C85 ⲅ COPTIC SMALL LETTER GAMMA
> U+2C87 ⲇ COPTIC SMALL LETTER DALDA
> U+2C89 ⲉ COPTIC SMALL LETTER EIE
> U+2C8B ⲋ COPTIC SMALL LETTER SOU
> U+2C8D ⲍ COPTIC SMALL LETTER ZATA
> U+2C8F ⲏ COPTIC SMALL LETTER HATE
> U+2C91 ⲑ COPTIC SMALL LETTER THETHE
> U+2C93 ⲓ COPTIC SMALL LETTER IAUDA
> U+2C95 ⲕ COPTIC SMALL LETTER KAPA
> U+2C97 ⲗ COPTIC SMALL LETTER LAULA
> U+2C99 ⲙ COPTIC SMALL LETTER MI
> U+2C9B ⲛ COPTIC SMALL LETTER NI
> U+2C9D ⲝ COPTIC SMALL LETTER KSI
> U+2C9F ⲟ COPTIC SMALL LETTER O
> U+2CA1 ⲡ COPTIC SMALL LETTER PI
> U+2CA3 ⲣ COPTIC SMALL LETTER RO
> U+2CA5 ⲥ COPTIC SMALL LETTER SIMA
> U+2CA7 ⲧ COPTIC SMALL LETTER TAU
> U+2CA9 ⲩ COPTIC SMALL LETTER UA
> U+2CAB ⲫ COPTIC SMALL LETTER FI
> U+2CAD ⲭ COPTIC SMALL LETTER KHI
> U+2CAF ⲯ COPTIC SMALL LETTER PSI
> U+2CB1 ⲱ COPTIC SMALL LETTER OOU
> U+2CC1 ⳁ COPTIC SMALL LETTER SAMPI
> U+2CE8 ⳨ COPTIC SYMBOL TAU RO
>
> Only the few Coptic letters in the block U+03E2 to U+03EF are displayed
> correctly.
>
> It's no wonder that a search has so many spurious results if most of the
> search space has been squashed into Unicode replacement characters.
>
> I'm a Windows user, as most of you know already.
> Does the same thing happen in Xiphos under Linux?
>
> Is this an issue common to all SWORD based front-ends?
> The fact that we see similar results in PocketSword strongly suggests it is.
>
> Best regards,
>
> David