[sword-devel] Lucene search index and Coptic ?
David Haslam
dfhmch at googlemail.com
Fri Apr 28 07:46:51 MST 2017
Greg wrote, "Have you tried using one of the command line utilities or
examples directly?"
Well, yes, but now I have hit a brick wall.
Assuming that *mkfastmod.exe* exactly mimics Xiphos in how it constructs the
Lucene index, that's not the problem.
The problem is this: on Windows, how do you get the non-ANSI search key into
diatheke?
Well, one might think that this is simply a matter of creating a suitable CMD
file containing the following line:
xiphos\diatheke -b SahidicBible -s lucene -k ⲉⲩϩⲩⲡⲟⲙⲟⲛⲏ >test.log
Where *xiphos* was set up long ago on my PC as a /symbolic link/ to the xiphos
directory, using the *mklink* command.
This is what running such a CMD file gave:
Verses containing "ндвнд«ц«нд«ндьндЃндондЃндЭнд┼"-- none (SahidicBible)
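The quoted search key is 29 characters of unreadable text, one character for
each UTF-8 byte of the original 10-letter key.
That's 3 characters for each of the nine letters in the higher Coptic block
(U+2C80 to U+2CFF)
and 2 characters for the third letter ϩ (U+03E9), which is in the lower block.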
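The count above can be checked with a few lines of Python. This is only a sketch: the choice of cp437 is my assumption, standing in for whatever single-byte code page the console actually used, so the garbage it prints will not necessarily match the log. The lengths do not depend on that choice.

```python
# The search key from the CMD file.
key = "ⲉⲩϩⲩⲡⲟⲙⲟⲛⲏ"

# Encode the key as UTF-8 and record how many bytes each letter takes.
utf8 = key.encode("utf-8")
widths = [len(ch.encode("utf-8")) for ch in key]

# ASSUMPTION: cp437 is used purely for illustration. Any single-byte
# code page turns each UTF-8 byte into exactly one displayed character.
mojibake = utf8.decode("cp437")

print("letters:        ", len(key))
print("bytes per letter:", widths)
print("UTF-8 bytes:    ", len(utf8))
print("garbled length: ", len(mojibake))
```

If the lengths agree with the log, that suggests the key reached diatheke's output as raw UTF-8 bytes that were then displayed one byte per character.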
It looks as though the fact that (as you know) Windows handles everything as
UTF-16 LE inevitably causes diatheke to convert the search key into something
unrecognisable!
The same thing happens without the "-s lucene"!
And it makes no difference whether the CMD file's text is UTF-8 or UTF-16
encoded.
Nice try, Greg. But it's not added much to the identification of the root
cause.
Can something like this be tried on a Linux machine for comparison?
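For such a comparison, one option might be to bypass the CMD file and launch diatheke from a small Python script, so that the same script runs on both Windows and Linux. This is a rough sketch, not a known fix: the helper names `diatheke_args` and `run_search` are made up for illustration, only the flags already shown above are used, and I am assuming diatheke is on the PATH and writes UTF-8. Whether diatheke decodes the argument correctly on Windows is exactly what remains to be tested.

```python
import shutil
import subprocess


def diatheke_args(module, key, exe="diatheke"):
    """Build the argument list for the same search as in the CMD file."""
    return [exe, "-b", module, "-s", "lucene", "-k", key]


def run_search(module, key, exe="diatheke"):
    """Run the search and return its output, or None if exe is not found.

    ASSUMPTION: the output is decoded as UTF-8; undecodable bytes are
    replaced rather than raising an error.
    """
    path = shutil.which(exe)
    if path is None:
        return None
    # Passing a list avoids the shell and its code page handling.
    result = subprocess.run(diatheke_args(module, key, path),
                            capture_output=True)
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(run_search("SahidicBible", "ⲉⲩϩⲩⲡⲟⲙⲟⲛⲏ"))
```

Running this on both platforms and comparing the first line of output would show whether the key survives the trip into diatheke.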
Best regards,
David
--
View this message in context: http://sword-dev.350566.n4.nabble.com/Lucene-search-index-and-Coptic-tp4657103p4657120.html
Sent from the SWORD Dev mailing list archive at Nabble.com.