[sword-devel] Does the CLucene indexing work for non-English texts?

Tom Sullivan info at beforgiven.info
Fri Nov 2 01:49:59 MST 2018


In the meantime, I suppose that users ought to be instructed to search 
in lower case only, typing the final sigma (ς) at the end of words that 
end in sigma and the normal sigma (σ) everywhere else. Or one might 
just not use final sigmas at all, but that might broaden search results.
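
As a rough illustration of the two options above, here is a minimal
sketch in Python. It is not part of SWORD or CLucene; the function
names are hypothetical, and it assumes the index keeps the final sigma
exactly as it appears in the text. The first helper drops final sigmas
entirely, the second rewrites a query so that only word-final sigmas
use the final form.

```python
import re

FINAL_SIGMA = "\u03c2"   # ς, used only at the end of a word
MEDIAL_SIGMA = "\u03c3"  # σ, used everywhere else


def fold_sigma(text: str) -> str:
    """Lowercase the text and replace every final sigma with normal sigma."""
    return text.lower().replace(FINAL_SIGMA, MEDIAL_SIGMA)


def restore_final_sigma(text: str) -> str:
    """Lowercase the text and use final sigma only at the end of a word."""
    folded = fold_sigma(text)
    # A sigma that follows a word character and sits at a word boundary
    # is the last letter of a word, so it takes the final form.
    pattern = r"(?<=\w)" + MEDIAL_SIGMA + r"\b"
    return re.sub(pattern, FINAL_SIGMA, folded)


if __name__ == "__main__":
    # A query typed in capitals, with no final sigma available.
    print(restore_final_sigma("ΛΟΓΟΣ"))
    # The same word with final sigma folded away.
    print(fold_sigma("ΛΟΓΟΣ"))
```

Folding only helps if the indexed text is folded the same way; against
an index that keeps final sigmas, a folded query would miss every word
that ends in sigma.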

Tom Sullivan
info at BeForgiven.INFO
FAX: 815-301-2835

On 11/1/18 8:17 PM, Nic Carter wrote:
> PocketSword uses the standard SWORD library search implementation, using CLucene. Last I looked, the C version is a _long_ way behind the Java version (Lucene). The C version seemed to stop being developed after it worked well enough for English text and didn’t seem to get any love for other languages, which is unfortunate for those of us who use the C version of SWORD.
> 
> Nic.
> 
>> On 2 Nov 2018, at 8:42 am, DM Smith <dmsmith at crosswire.org> wrote:
>>
>> From memory, SWORD uses SimpleAnalyzer. This analyzer works well for Western European languages. It won’t work reliably for texts in non-Latin scripts, though it may work in part.
>>
>> The basic rule of thumb is that the index has to be created with an analyzer and the search request has to be analyzed with that same analyzer.
>>
>> PocketSword uses externally created indexes which need to be downloaded to work. It uses the SWORD library for creation and for searching.
>>
>> In Him,
>> 	DM
>>
>>
>>
>>> On Nov 1, 2018, at 4:14 PM, TS <outofthecube at icloud.com> wrote:
>>>
>>> Does the CLucene indexing work for non-English texts?
>>>
>>> David's recent question about languages without spaces made me a bit curious about this matter. Briefly looking at the current Apache Lucene code, there appears to be extra code for non-English text. However, this is in comparison to the CLucene code for PocketSword. And I seem to recall that, in general, the CLucene code in PocketSword may only be for reading indices and not for writing them. Also, to clarify further, is it possible that an index is created, but with errors?
>>>     For example, when I search Koine Greek texts, the search returns erroneous results. I think that I'll start a separate post about Greek search and indexing specifically.
>>>
>>> -TS
>>>
>>> --Sent from phone--
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page


