[sword-devel] Search for word in Sword
Adrian Korten
sword-devel@crosswire.org
Sat, 08 Mar 2003 10:09:14 +0700
Good day,
I'd be willing to give it a try and can find some people to help test
it. Does that mean someone would compile a special test version? This is
a question from TBS, but they are not in a rush for it at the moment.
IOW, do it whenever it is convenient for you; meanwhile, I can tell
them that it is being worked on.
Adrian
p.s. I'm not in a hurry for it either because they have first asked me
to set up a Linux file-server and e-mail server - both new things for me.
Chris Little wrote:
> On Thu, 6 Mar 2003, Adrian Korten wrote:
>
>
>>We came up against a small problem with our Thai test module. When
>>searching for a word whose characters are part of other words, there is
>>no way to delimit the word. This occurs because Thai has no word breaks.
>>Somehow, the RTF engine seems to break the Thai words reasonably
>>accurately on the display of text. However, that same logic does not
>>seem to be in the search module.
>
>
> Like Troy mentioned, we can turn on the ICU Thai word-breaking for
> searches. This, the option to display with whitespace word-breaks, and
> transliteration with whitespace word-breaks were actually the reasons why
> I didn't drop the relatively large Thai dictionary from ICU.
>
>
>>The only alternative that I could come up with is to place Unicode
>>characters in as word breaks. Unicode has various characters to indicate
>>word breaks (non-breaking spaces, hyphenable breaks) invisibly. These
>>would have to be placed in the actual text module as UTF-8 characters.
>
>
> You should encode as Unicode recommends, which I assume means no divisions
> between words at all. Adding tags like Frank suggested wouldn't help
> anyway because the strip filters will strip them out before searching.
>
> --Chris
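
To make the search problem in the quoted thread concrete, here is a minimal sketch in Python. It is not SWORD or ICU code. The tiny word list, the greedy longest-match segmenter, and the function names are all assumptions made up for illustration; they only stand in for a real dictionary-based word breaker such as the ICU one mentioned above. The sketch shows why a plain substring search over unspaced Thai text matches inside longer words, while a search over segmented tokens does not.

```python
# Toy illustration only: a greedy longest-match segmenter standing in for a
# real dictionary-based word breaker. WORDS is a hypothetical five-word
# lexicon, not ICU's Thai dictionary.
WORDS = {"เขา", "มา", "มาก", "กิน", "ข้าว"}
MAX_LEN = max(len(w) for w in WORDS)


def segment(text):
    """Split unspaced text into tokens by greedy longest dictionary match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in WORDS:
                break
        else:
            # No dictionary word starts here: emit the character on its own.
            candidate = text[i]
        tokens.append(candidate)
        i += len(candidate)
    return tokens


def substring_search(text, word):
    """Naive search: true if the characters occur anywhere in the text."""
    return word in text


def whole_word_search(text, word):
    """Word-aware search: true only if the word is a token of its own."""
    return word in segment(text)


# Unspaced sample text built from four dictionary words.
verse = "".join(["เขา", "กิน", "ข้าว", "มาก"])

print(segment(verse))
print(substring_search(verse, "มา"))    # True: matches inside the longer word
print(whole_word_search(verse, "มา"))   # False: not a token of its own
print(whole_word_search(verse, "มาก"))  # True
```

Because the segmentation happens at search time, the module text itself stays encoded without any inserted word dividers, which is consistent with the advice in the quoted reply. A real word breaker has to handle ambiguity and unknown words far better than this greedy sketch does.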