[sword-devel] Lucene search index and Coptic ?

DM Smith dmsmith at crosswire.org
Wed Apr 26 15:21:22 MST 2017


Consider using Luke to analyze the constructed Lucene index. See: https://code.google.com/archive/p/luke/
I think you’ll need a Luke release that matches Lucene 1.9.1, or maybe 1.4.x.
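
Separately from Luke, a quick check on any text copied out of the result preview can show how much of it has been replaced. The following is only a minimal Python sketch, not part of SWORD or Xiphos: the function names `classify` and `utf8_len` are made up for illustration, and the sample string is invented. It counts characters in the Coptic block (U+2C80 to U+2CFF), characters in the U+03E2 to U+03EF range, and U+FFFD replacement characters. It also shows that the two ranges differ in UTF-8 length, which may or may not be related to the problem; the cause has not been established here.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def classify(text):
    """Count characters by the ranges mentioned in the report below."""
    counts = {
        "coptic_block": 0,        # U+2C80..U+2CFF (reported as broken)
        "greek_coptic_block": 0,  # U+03E2..U+03EF (reported as displayed fine)
        "replacement": 0,         # U+FFFD
        "other": 0,
    }
    for ch in text:
        cp = ord(ch)
        if ch == REPLACEMENT:
            counts["replacement"] += 1
        elif 0x2C80 <= cp <= 0x2CFF:
            counts["coptic_block"] += 1
        elif 0x03E2 <= cp <= 0x03EF:
            counts["greek_coptic_block"] += 1
        else:
            counts["other"] += 1
    return counts


def utf8_len(ch):
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    # Invented sample: two Coptic-block letters, one U+03E3, a space, one U+FFFD.
    sample = "\u2c81\u2c83\u03e3 \ufffd"
    print(classify(sample))
    print(utf8_len("\u2c81"))  # 3 bytes: Coptic block
    print(utf8_len("\u03e3"))  # 2 bytes: U+03E2..U+03EF range
```

Running `classify` over the same verse text before indexing and over what the search preview returns would show whether the replacement characters are already present in the source or only appear after indexing.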

DM


> On Apr 26, 2017, at 3:48 PM, David Haslam <dfhmch at googlemail.com> wrote:
> 
> If you examine the result preview pane in the Xiphos Advanced Search dialog,
> the problem becomes apparent.
> 
> Most Coptic Unicode characters are not displayed correctly.
> 
> The remainder seem to have been converted to U+FFFD REPLACEMENT CHARACTER.
> 
> i.e. all these Coptic letters are not handled correctly by this part
> of the software:
> 
> U+2C81	ⲁ	COPTIC SMALL LETTER ALFA
> U+2C83	ⲃ	COPTIC SMALL LETTER VIDA
> U+2C85	ⲅ	COPTIC SMALL LETTER GAMMA
> U+2C87	ⲇ	COPTIC SMALL LETTER DALDA
> U+2C89	ⲉ	COPTIC SMALL LETTER EIE
> U+2C8B	ⲋ	COPTIC SMALL LETTER SOU
> U+2C8D	ⲍ	COPTIC SMALL LETTER ZATA
> U+2C8F	ⲏ	COPTIC SMALL LETTER HATE
> U+2C91	ⲑ	COPTIC SMALL LETTER THETHE
> U+2C93	ⲓ	COPTIC SMALL LETTER IAUDA
> U+2C95	ⲕ	COPTIC SMALL LETTER KAPA
> U+2C97	ⲗ	COPTIC SMALL LETTER LAULA
> U+2C99	ⲙ	COPTIC SMALL LETTER MI
> U+2C9B	ⲛ	COPTIC SMALL LETTER NI
> U+2C9D	ⲝ	COPTIC SMALL LETTER KSI
> U+2C9F	ⲟ	COPTIC SMALL LETTER O
> U+2CA1	ⲡ	COPTIC SMALL LETTER PI
> U+2CA3	ⲣ	COPTIC SMALL LETTER RO
> U+2CA5	ⲥ	COPTIC SMALL LETTER SIMA
> U+2CA7	ⲧ	COPTIC SMALL LETTER TAU
> U+2CA9	ⲩ	COPTIC SMALL LETTER UA
> U+2CAB	ⲫ	COPTIC SMALL LETTER FI
> U+2CAD	ⲭ	COPTIC SMALL LETTER KHI
> U+2CAF	ⲯ	COPTIC SMALL LETTER PSI
> U+2CB1	ⲱ	COPTIC SMALL LETTER OOU
> U+2CC1	ⳁ	COPTIC SMALL LETTER SAMPI
> U+2CE8	⳨	COPTIC SYMBOL TAU RO
> 
> Only the few Coptic letters in the block U+03E2 to U+03EF are displayed
> aright.
> 
> It's no wonder that a search has so many spurious results if most of the
> search space has been squashed into Unicode replacement characters.
> 
> I'm a Windows user, as most of you know already.
> Does the same thing happen in Xiphos under Linux?
> 
> Is this an issue common to all SWORD based front-ends?
> The fact that we see similar results in PocketSword strongly suggests it is.
> 
> Best regards,
> 
> David
> 
> 
> 
> --
> View this message in context: http://sword-dev.350566.n4.nabble.com/Lucene-search-index-and-Coptic-tp4657103p4657106.html
> Sent from the SWORD Dev mailing list archive at Nabble.com.
> 
