[sword-devel] search failing in Hebrew modules
Matthew Talbert
ransom1982 at gmail.com
Thu Jul 30 19:49:51 MST 2009
> Couple of thoughts
> Assuming the search is a Lucene search.
Results are nearly the same whether it's indexed or not.
> Unicode can have multiple possible representations (byte sequences) for a
> single decorated character. Search will work only if the request and index
> match.
>
> The index has a single representation of the text. The analyzer assumes
> English as input and applies all kinds of transforms that may not be
> appropriate for Hebrew.
>
> When a search is performed the same analyzer is used to transform the search
> request. Generally this is sufficient to ensure that the search will work.
> If the search request is not already in, or is not first transformed into,
> the same Unicode representation, then the search will fail because it will
> not match the stored byte sequence. Typically, text copied from the display
> will work as a search request. Typed input will typically fail; it is just
> too difficult to type exactly the same text as is stored.
>
> IIRC, SWORD will use the current filters (e.g. Remove accents) in building
> the index. Searches that don't apply the same filters to the request as used
> to build the index will fail.
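
To make the point about multiple byte sequences concrete, here is a rough sketch in Python (not SWORD code; the helper name same_text is made up for illustration). It assumes only the standard unicodedata module and shows two Hebrew strings that display the same but differ byte for byte until both are put into one normalization form.

```python
import unicodedata

# Bet with dagesh, stored two ways that render the same:
precomposed = "\uFB31"        # HEBREW LETTER BET WITH DAGESH (presentation form)
decomposed = "\u05D1\u05BC"   # BET followed by the combining DAGESH

# Shin with dagesh and shin dot, with the two marks typed in different orders:
marks_order_a = "\u05E9\u05BC\u05C1"
marks_order_b = "\u05E9\u05C1\u05BC"


def same_text(a, b, form="NFC"):
    """Hypothetical helper: compare two strings after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings (and so their UTF-8 bytes) differ ...
raw_equal = (precomposed == decomposed, marks_order_a == marks_order_b)

# ... but they compare equal once both sides are normalized the same way.
normalized_equal = (
    same_text(precomposed, decomposed),
    same_text(marks_order_a, marks_order_b),
)
```

The same reasoning applies to a search: the request has to go through the same normalization as the indexed text before the two are compared.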
So is there a way to index a module with and without vowels? Or search
(non-indexed) that way? That seems to be a common request for Semitic
languages.
Matthew
btw, diatheke strangely works exactly the opposite. Searches for words
without vowels work fine, whereas searches with vowels don't work at
all.
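
For what it's worth, here is a minimal sketch of what "with and without vowels" could look like, written in Python rather than against the SWORD API. The function names (strip_marks, index_keys, matches) are hypothetical, and it assumes that dropping every nonspacing mark (Unicode category Mn) is acceptable, which removes cantillation marks as well as vowel points.

```python
import unicodedata


def strip_marks(text):
    """Decompose the text, then drop all nonspacing marks (points, accents)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def index_keys(word):
    """Hypothetical: the two forms an index could store for one word."""
    return {
        "pointed": unicodedata.normalize("NFC", word),
        "consonantal": strip_marks(word),
    }


def matches(query, word, ignore_vowels=True):
    """Apply the same transform to the query as was applied to the stored word."""
    field = "consonantal" if ignore_vowels else "pointed"
    return index_keys(query)[field] == index_keys(word)[field]


# "Bereshit" with vowel points, and the same word typed without them.
pointed = "\u05D1\u05BC\u05B0\u05E8\u05B5\u05D0\u05E9\u05C1\u05B4\u05D9\u05EA"
bare = "\u05D1\u05E8\u05D0\u05E9\u05D9\u05EA"

found_ignoring_vowels = matches(bare, pointed, ignore_vowels=True)
found_exact = matches(bare, pointed, ignore_vowels=False)
```

With both forms stored, an unpointed query can match pointed text, while an exact (pointed) search is still possible against the other field.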