[sword-devel] Search for word in Sword

Chris Little sword-devel@crosswire.org
Fri, 7 Mar 2003 10:19:41 -0700 (MST)


On Thu, 6 Mar 2003, Adrian Korten wrote:

> We came up against a small problem with our Thai test module. When 
> searching for a word whose characters are part of other words, there is 
> no way to delimit the word. This occurs because Thai has no word breaks. 
> Somehow, the rtf engine seems to break the Thai words reasonably 
> accurately on the display of text. However, that same logic does not 
> seem to be in the search module.

Like Troy mentioned, we can turn on the ICU Thai word-breaking for
searches.  This, the option to display with whitespace word-breaks, and 
transliteration with whitespace word-breaks were actually the reasons why 
I didn't drop the relatively large Thai dictionary from ICU.
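
To make the difference concrete, here is a rough sketch of why a dictionary
helps. It is not ICU's code or the SWORD search code: the three-word
dictionary, the greedy longest-match rule, and the names WORDS, segment and
contains_word are all made up for illustration. The point is only that a plain
substring search finds "มา" inside "มาก", while a search over segmented
words does not.

```python
# Toy dictionary (an assumption for this sketch; a real one is far larger).
WORDS = {"คน", "ดี", "มาก"}


def segment(text, dictionary):
    """Split text into words by greedy longest match against the dictionary.

    Characters that start no dictionary word become single-character tokens.
    """
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary or size == 1:
                tokens.append(candidate)
                i += size
                break
    return tokens


def contains_word(text, word, dictionary):
    """Return True only if `word` appears as a whole segmented word."""
    return word in segment(text, dictionary)


verse = "คนดีมาก"
print(segment(verse, WORDS))
print("มา" in verse)                       # substring search
print(contains_word(verse, "มา", WORDS))   # whole-word search
```

A real word-breaker has to cope with ambiguous splits and words missing from
the dictionary, which this greedy sketch does not attempt.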

> The only alternative that I could come up with is to place Unicode 
> characters in as word breaks. Unicode has various characters to indicate 
> word breaks (non-breaking spaces, hyphenable breaks) invisibly. These 
> would have to be placed in the actual text module as UTF8 characters.

You should encode as Unicode recommends, which I assume means no divisions 
between words at all.  Adding tags like Frank suggested wouldn't help 
anyway because the strip filters will strip them out before searching.

--Chris