[sword-devel] Lucene search index and Coptic ?

Greg Hellings greg.hellings at gmail.com
Wed Apr 26 15:17:41 MST 2017


Unicode replacement characters typically indicate a font issue, and would
not normally be represented as such within the internals of a program. Have
you tried using one of the command line utilities or examples directly?
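
One quick way to check whether U+FFFD is really present in the data, rather than being an artifact of display, is to inspect the code points of the returned text outside the GUI. The following is only a minimal Python sketch, not anything taken from SWORD or Xiphos: the helper names `describe` and `count_replacements` and the sample string are hypothetical, and the "damaged" case is simulated by truncating the bytes by hand.

```python
import unicodedata

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def describe(text):
    """Return (code point, Unicode name, UTF-8 byte length) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), len(ch.encode("utf-8")))
        for ch in text
    ]


def count_replacements(text):
    """Count U+FFFD characters that are actually stored in the string."""
    return text.count(REPLACEMENT)


# Hypothetical sample: two letters from the Coptic block (U+2C81, U+2C83)
# and one from the older Greek and Coptic block (U+03E3).
sample = "\u2c81\u2c83\u03e3"
raw = sample.encode("utf-8")

# Decoding the complete byte sequence leaves the text intact.
intact = raw.decode("utf-8")

# Simulated damage (an assumption for illustration): cut a 3-byte sequence
# short, then decode leniently so the decoder substitutes U+FFFD.
damaged = raw[:2].decode("utf-8", errors="replace")

for row in describe(intact):
    print(row)
print("replacements in intact text: ", count_replacements(intact))
print("replacements in damaged text:", count_replacements(damaged))
```

If the count is non-zero for text coming straight from the library or the index, the replacement characters are in the data itself and no font change will help. If it is zero, the display layer is the more likely suspect.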

--Greg

On Wed, Apr 26, 2017 at 2:48 PM, David Haslam <dfhmch at googlemail.com> wrote:

> If you examine the result preview pane in the Xiphos Advanced Search dialog,
> the problem becomes apparent.
>
> Most Coptic Unicode characters are not displayed correctly.
>
> The remainder seem to have been converted to U+FFFD REPLACEMENT CHARACTER.
>
> That is, none of the following Coptic letters is handled correctly by this
> part of the software:
>
> U+2C81  ⲁ       COPTIC SMALL LETTER ALFA
> U+2C83  ⲃ       COPTIC SMALL LETTER VIDA
> U+2C85  ⲅ       COPTIC SMALL LETTER GAMMA
> U+2C87  ⲇ       COPTIC SMALL LETTER DALDA
> U+2C89  ⲉ       COPTIC SMALL LETTER EIE
> U+2C8B  ⲋ       COPTIC SMALL LETTER SOU
> U+2C8D  ⲍ       COPTIC SMALL LETTER ZATA
> U+2C8F  ⲏ       COPTIC SMALL LETTER HATE
> U+2C91  ⲑ       COPTIC SMALL LETTER THETHE
> U+2C93  ⲓ       COPTIC SMALL LETTER IAUDA
> U+2C95  ⲕ       COPTIC SMALL LETTER KAPA
> U+2C97  ⲗ       COPTIC SMALL LETTER LAULA
> U+2C99  ⲙ       COPTIC SMALL LETTER MI
> U+2C9B  ⲛ       COPTIC SMALL LETTER NI
> U+2C9D  ⲝ       COPTIC SMALL LETTER KSI
> U+2C9F  ⲟ       COPTIC SMALL LETTER O
> U+2CA1  ⲡ       COPTIC SMALL LETTER PI
> U+2CA3  ⲣ       COPTIC SMALL LETTER RO
> U+2CA5  ⲥ       COPTIC SMALL LETTER SIMA
> U+2CA7  ⲧ       COPTIC SMALL LETTER TAU
> U+2CA9  ⲩ       COPTIC SMALL LETTER UA
> U+2CAB  ⲫ       COPTIC SMALL LETTER FI
> U+2CAD  ⲭ       COPTIC SMALL LETTER KHI
> U+2CAF  ⲯ       COPTIC SMALL LETTER PSI
> U+2CB1  ⲱ       COPTIC SMALL LETTER OOU
> U+2CC1  ⳁ       COPTIC SMALL LETTER SAMPI
> U+2CE8  ⳨       COPTIC SYMBOL TAU RO
>
> Only the few Coptic letters in the block U+03E2 to U+03EF are displayed
> aright.
>
> It's no wonder that a search has so many spurious results if most of the
> search space has been squashed into Unicode replacement characters.
>
> I'm a Windows user, as most of you know already.
> Does the same thing happen in Xiphos under Linux?
>
> Is this an issue common to all SWORD based front-ends?
> The fact that we see similar results in PocketSword strongly suggests it is.
>
> Best regards,
>
> David
>
>
>
> --
> View this message in context:
> http://sword-dev.350566.n4.nabble.com/Lucene-search-index-and-Coptic-tp4657103p4657106.html
> Sent from the SWORD Dev mailing list archive at Nabble.com.
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
>