<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Consider using Luke to analyze the constructed Lucene index. See: <a href="https://code.google.com/archive/p/luke/" class="">https://code.google.com/archive/p/luke/</a><div class="">I think you’ll need a Luke release that matches Lucene 1.9.1, or maybe 1.4.x.</div><div class=""><br class=""></div><div class="">DM</div><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Apr 26, 2017, at 3:48 PM, David Haslam &lt;<a href="mailto:dfhmch@googlemail.com" class="">dfhmch@googlemail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">If you examine the result preview pane in the Xiphos Advanced Search dialog,<br class="">the problem becomes apparent.<br class=""><br class="">Most Coptic Unicode characters are not displayed correctly.<br class=""><br class=""><br class=""><br class="">The remainder seem to have been converted to U+FFFD REPLACEMENT CHARACTER.<br class=""><br class="">i.e. 
All these Coptic letters are basically not handled aright by this part<br class="">of the software:<br class=""><br class="">U+2C81<span class="Apple-tab-span" style="white-space:pre">        </span>ⲁ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER ALFA<br class="">U+2C83<span class="Apple-tab-span" style="white-space:pre">        </span>ⲃ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER VIDA<br class="">U+2C85<span class="Apple-tab-span" style="white-space:pre">        </span>ⲅ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER GAMMA<br class="">U+2C87<span class="Apple-tab-span" style="white-space:pre">        </span>ⲇ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER DALDA<br class="">U+2C89<span class="Apple-tab-span" style="white-space:pre">        </span>ⲉ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER EIE<br class="">U+2C8B<span class="Apple-tab-span" style="white-space:pre">        </span>ⲋ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER SOU<br class="">U+2C8D<span class="Apple-tab-span" style="white-space:pre">        </span>ⲍ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER ZATA<br class="">U+2C8F<span class="Apple-tab-span" style="white-space:pre">        </span>ⲏ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER HATE<br class="">U+2C91<span class="Apple-tab-span" style="white-space:pre">        </span>ⲑ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER THETHE<br class="">U+2C93<span class="Apple-tab-span" style="white-space:pre">        </span>ⲓ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER IAUDA<br class="">U+2C95<span class="Apple-tab-span" style="white-space:pre">        
</span>ⲕ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER KAPA<br class="">U+2C97<span class="Apple-tab-span" style="white-space:pre">        </span>ⲗ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER LAULA<br class="">U+2C99<span class="Apple-tab-span" style="white-space:pre">        </span>ⲙ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER MI<br class="">U+2C9B<span class="Apple-tab-span" style="white-space:pre">        </span>ⲛ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER NI<br class="">U+2C9D<span class="Apple-tab-span" style="white-space:pre">        </span>ⲝ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER KSI<br class="">U+2C9F<span class="Apple-tab-span" style="white-space:pre">        </span>ⲟ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER O<br class="">U+2CA1<span class="Apple-tab-span" style="white-space:pre">        </span>ⲡ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER PI<br class="">U+2CA3<span class="Apple-tab-span" style="white-space:pre">        </span>ⲣ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER RO<br class="">U+2CA5<span class="Apple-tab-span" style="white-space:pre">        </span>ⲥ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER SIMA<br class="">U+2CA7<span class="Apple-tab-span" style="white-space:pre">        </span>ⲧ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER TAU<br class="">U+2CA9<span class="Apple-tab-span" style="white-space:pre">        </span>ⲩ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER UA<br class="">U+2CAB<span class="Apple-tab-span" style="white-space:pre">        </span>ⲫ<span 
class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER FI<br class="">U+2CAD<span class="Apple-tab-span" style="white-space:pre">        </span>ⲭ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER KHI<br class="">U+2CAF<span class="Apple-tab-span" style="white-space:pre">        </span>ⲯ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER PSI<br class="">U+2CB1<span class="Apple-tab-span" style="white-space:pre">        </span>ⲱ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER OOU<br class="">U+2CC1<span class="Apple-tab-span" style="white-space:pre">        </span>ⳁ<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SMALL LETTER SAMPI<br class="">U+2CE8<span class="Apple-tab-span" style="white-space:pre">        </span>⳨<span class="Apple-tab-span" style="white-space:pre">        </span>COPTIC SYMBOL TAU RO<br class=""><br class="">Only the few Coptic letters in the block U+03E2 to U+03EF are displayed<br class="">aright.<br class=""><br class="">It's no wonder that a search has so many spurious results if most of the<br class="">search space has been squashed into Unicode replacement characters.<br class=""><br class="">I'm a Windows user, as most of you know already.<br class="">Does the same thing happen in Xiphos under Linux?<br class=""><br class="">Is this an issue common to all SWORD based front-ends?<br class="">The fact that we see similar results in PocketSword strongly suggests it is.<br class=""><br class="">Best regards,<br class=""><br class="">David<br class=""><br class=""><br class=""><br class="">--<br class="">View this message in context: <a href="http://sword-dev.350566.n4.nabble.com/Lucene-search-index-and-Coptic-tp4657103p4657106.html" class="">http://sword-dev.350566.n4.nabble.com/Lucene-search-index-and-Coptic-tp4657103p4657106.html</a><br class="">Sent from the SWORD Dev mailing 
list archive at <a href="http://Nabble.com" class="">Nabble.com</a>.<br class=""><br class="">_______________________________________________<br class="">sword-devel mailing list: <a href="mailto:sword-devel@crosswire.org" class="">sword-devel@crosswire.org</a><br class=""><a href="http://www.crosswire.org/mailman/listinfo/sword-devel" class="">http://www.crosswire.org/mailman/listinfo/sword-devel</a><br class="">Instructions to unsubscribe/change your settings at above page</div></div></blockquote></div><br class=""></div></body></html>