[sword-devel] Search for word in Sword
Chris Little
sword-devel@crosswire.org
Fri, 7 Mar 2003 10:19:41 -0700 (MST)
On Thu, 6 Mar 2003, Adrian Korten wrote:
> We came up against a small problem with our Thai test module. When
> searching for a word whose characters are part of other words, there is
> no way to delimit the word. This occurs because Thai has no word breaks.
> Somehow, the RTF engine seems to break the Thai words reasonably
> accurately on the display of text. However, that same logic does not
> seem to be in the search module.
As Troy mentioned, we can turn on the ICU Thai word-breaking for
searches. Searching, the option to display text with whitespace
word-breaks, and transliteration with whitespace word-breaks were actually
the reasons why I didn't drop the relatively large Thai dictionary from ICU.
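
To make the difference concrete, here is a minimal sketch of why a plain
substring search gives false hits and a word-aware search does not. It is
not ICU and not SWORD code: the tiny dictionary, the greedy longest-match
segmenter, and the function names are all assumptions made up for
illustration, standing in for ICU's much larger dictionary-based breaker.

```python
# Toy illustration only: a hand-made dictionary and a greedy longest-match
# segmenter stand in for ICU's dictionary-based Thai word breaking.

MAE = "\u0e41\u0e21\u0e48"   # Thai word for "mother"
NAM = "\u0e19\u0e49\u0e33"   # Thai word for "water"
PAI = "\u0e44\u0e1b"         # Thai word for "go"
MAENAM = MAE + NAM           # Thai word for "river" (contains "water")

# Hypothetical mini-dictionary; a real one has many thousands of entries.
DICTIONARY = {MAE, NAM, PAI, MAENAM}


def segment(text, dictionary):
    """Split text into words by taking the longest dictionary match at each
    position; characters not covered by any entry become one-char tokens."""
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def substring_search(text, word):
    """Search without word breaks: any occurrence of the characters counts."""
    return word in text


def word_search(text, word, dictionary):
    """Search with word breaks: only whole segmented words count."""
    return word in segment(text, dictionary)


if __name__ == "__main__":
    verse = MAE + PAI + MAENAM  # "mother" + "go" + "river", no spaces
    print(segment(verse, DICTIONARY))
    print(substring_search(verse, NAM))         # hit inside "river"
    print(word_search(verse, NAM, DICTIONARY))  # no whole-word hit
```

Greedy longest-match is only the simplest possible choice here; it can
mis-segment ambiguous strings, which is one reason a real breaker needs a
large dictionary and smarter logic.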
> The only alternative that I could come up with is to place Unicode
> characters in as word breaks. Unicode has various characters to indicate
> word breaks (non-breaking spaces, hyphenable breaks) invisibly. These
> would have to be placed in the actual text module as UTF8 characters.
You should encode as Unicode recommends, which I assume means no divisions
between words at all. Adding tags like Frank suggested wouldn't help
anyway because the strip filters will strip them out before searching.
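
A rough sketch of that last point, under loud assumptions: the `<w>` tag
and the regex-based `strip_markup` below are hypothetical and are not the
actual SWORD strip filter code. They only show that once markup is removed
before searching, word boundaries expressed as tags are gone.

```python
import re

MAE = "\u0e41\u0e21\u0e48"   # Thai word for "mother"
NAM = "\u0e19\u0e49\u0e33"   # Thai word for "water"
PAI = "\u0e44\u0e1b"         # Thai word for "go"
MAENAM = MAE + NAM           # Thai word for "river" (contains "water")

# Hypothetical stand-in for a strip filter: drop anything that looks like a tag.
TAG_RE = re.compile(r"<[^>]+>")


def strip_markup(text):
    """Return the text with all tag-like markup removed."""
    return TAG_RE.sub("", text)


def search(module_text, word):
    """Search the way the post describes: strip markup first, then match."""
    return word in strip_markup(module_text)


if __name__ == "__main__":
    # Each word wrapped in a made-up <w> tag to mark word boundaries.
    tagged = f"<w>{MAE}</w><w>{PAI}</w><w>{MAENAM}</w>"
    print(strip_markup(tagged) == MAE + PAI + MAENAM)  # boundaries are lost
    print(search(tagged, NAM))  # still matches inside "river"
```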
--Chris