[sword-devel] SWORD search index creation for Khmer, Thai and other languages with no space between words?

DM Smith dmsmith at crosswire.org
Mon Oct 8 13:35:14 MST 2018


I don’t know what SWORD does.

JSword uses a Thai word-break algorithm that’s part of Lucene. So to do a search, you either have to know Thai well enough to know where the words break, or do an OR search.

I don’t remember whether JSword does Khmer. I don’t think so.

The other technique that is available in Lucene is a windowing technique that breaks the search request into overlapping windows, each a few characters wide (I think it is 4 to 5). I haven’t played with it.
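
To illustrate the overlapping-window idea in general terms, here is a minimal sketch in Python. It is not Lucene's or JSword's code: the function names, the default window size of 4, and the choice to AND the windows together are all assumptions made for the example. It also windows over Unicode code points, so a real implementation for Thai or Khmer would probably need to keep combining marks together with their base characters.

```python
def overlapping_windows(text, size=4):
    """Split text into overlapping character windows (character n-grams).

    Hypothetical helper: `size` is an assumed default, not a Lucene setting.
    """
    chars = "".join(text.split())  # ignore any whitespace that is present
    if not chars:
        return []
    if len(chars) <= size:
        return [chars]
    return [chars[i:i + size] for i in range(len(chars) - size + 1)]


def build_index(verses, size=4):
    """Map each window to the set of verse keys that contain it."""
    index = {}
    for key, text in verses.items():
        for window in overlapping_windows(text, size):
            index.setdefault(window, set()).add(key)
    return index


def search(index, query, size=4):
    """Return the verse keys that contain every window of the query."""
    windows = overlapping_windows(query, size)
    if not windows:
        return set()
    result = None
    for window in windows:
        hits = index.get(window, set())
        result = set(hits) if result is None else result & hits
    return result
```

Because the text and the query are cut the same way, the searcher does not need to know where the words break. The trade-off in this sketch is that requiring every window to match can return a verse where the windows occur in different places, and a query shorter than the window size will usually find nothing.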

DM

> On Oct 8, 2018, at 12:29 PM, David Haslam <dfhdfh at protonmail.com> wrote:
> 
> How does SWORD index a module written in a language whose writing system has no space between words?
> 
> Examples include Khmer and Thai.
> 
> David
> 
> Sent from ProtonMail Mobile
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page



