[sword-devel] Lucene search index and Coptic ?

David Haslam dfhmch at googlemail.com
Fri Apr 28 07:46:51 MST 2017


Greg wrote, "Have you tried using one of the command line utilities or
examples directly?"

Well, yes, but now I have hit a brick wall.

Assuming that *mkfastmod.exe* exactly mimics Xiphos in how it constructs the
Lucene index, that's not the problem.

The problem is this: in Windows, how do you get a non-ANSI search key into
diatheke?

Well, one might think that this is simply a matter of creating a suitable CMD
file containing the following line:

xiphos\diatheke -b SahidicBible -s lucene -k ⲉⲩϩⲩⲡⲟⲙⲟⲛⲏ >test.log

Here *xiphos* was set up long ago on my PC as a /symbolic link/ to the xiphos
directory, using the *mklink* command.

This is what running such a CMD file gave:

Verses containing "ндвнд«ц«нд«ндьндЃндондЃндЭнд┼"-- none (SahidicBible)

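The quoted search key is 29 characters of unreadable text.
That's 3 characters for each of the nine letters in the higher Coptic block
(U+2C80 to U+2CFF) and 2 characters for the third letter ϩ (U+03E9), which is
in the lower Greek and Coptic block. These counts match the UTF-8 byte length
of each letter.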
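
The arithmetic behind that count can be checked with a short Python sketch.
This is only an illustration; the variable names are mine, and it assumes the
key is exactly the ten letters shown in the command line above.

```python
# Search key copied from the diatheke command line quoted above.
key = "ⲉⲩϩⲩⲡⲟⲙⲟⲛⲏ"

# UTF-8 byte count for each letter, in order of appearance.
per_letter = [len(ch.encode("utf-8")) for ch in key]

# Total number of bytes in the UTF-8 form of the whole key.
total_bytes = sum(per_letter)

# Print code points rather than glyphs so the output is safe on any console.
for ch, n in zip(key, per_letter):
    print(f"U+{ord(ch):04X} -> {n} bytes")
print("letters:", len(key), "bytes:", total_bytes)
```

Nine letters at 3 bytes each plus one letter at 2 bytes gives the 29 seen in
the log.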

It looks as though the fact that (as you know) Windows handles everything as
UTF-16 LE inevitably causes diatheke to convert the search key into something
unrecognisable!
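
One way to probe this, offered as a sketch rather than a diagnosis, is to
compare the byte lengths of the key in UTF-8 and UTF-16 LE, and to see what
happens when the UTF-8 bytes are read back one byte per character. Code page
437 is purely an assumption here, standing in for whatever single-byte code
page was active when the log was written; the actual glyphs will differ.

```python
# Search key copied from the diatheke command line quoted above.
key = "ⲉⲩϩⲩⲡⲟⲙⲟⲛⲏ"

utf8_bytes = key.encode("utf-8")
utf16_bytes = key.encode("utf-16-le")

# Assumption: cp437 is only a stand-in for the real single-byte code page.
# Every byte value maps to one character, so no byte is lost or merged.
misread = utf8_bytes.decode("cp437")

print("UTF-8 bytes:    ", len(utf8_bytes))
print("UTF-16 LE bytes:", len(utf16_bytes))
print("characters after misreading UTF-8 as cp437:", len(misread))
```

A 29-character result is what you would expect if UTF-8 bytes are being shown
one byte per character; the UTF-16 LE form of the same key is shorter.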

The same thing happens without the "-s lucene"!

And it makes no difference whether the CMD file's text is UTF-8 or UTF-16
encoded.

Nice try, Greg. But it's not added much to the identification of the root
cause.

Can something like this be tried on a Linux machine for comparison?
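
For anyone willing to try, here is a minimal Python sketch of such a
comparison. It reuses only the diatheke options from the command line above
(-b, -s lucene, -k). The helper names build_command and run_search are
hypothetical, and it assumes diatheke is on the PATH, the SahidicBible module
is installed with a Lucene index, and the system locale is UTF-8.

```python
import shutil
import subprocess


def build_command(module, key, exe="diatheke"):
    """Return the argument list for the Lucene search quoted above."""
    return [exe, "-b", module, "-s", "lucene", "-k", key]


def run_search(module, key):
    """Run diatheke if it is installed; return its output, or None if absent."""
    exe = shutil.which("diatheke")
    if exe is None:
        return None
    # Passing a list avoids shell quoting; the key goes through as one argument.
    result = subprocess.run(build_command(module, key, exe), capture_output=True)
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    output = run_search("SahidicBible", "ⲉⲩϩⲩⲡⲟⲙⲟⲛⲏ")
    print("diatheke not found on PATH" if output is None else output)
```

If the "Verses containing" line echoes the key intact on Linux, that would
point at the Windows command-line path rather than at the index itself.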

Best regards,

David 






--
View this message in context: http://sword-dev.350566.n4.nabble.com/Lucene-search-index-and-Coptic-tp4657103p4657120.html
Sent from the SWORD Dev mailing list archive at Nabble.com.


