[sword-devel] Searching other languages

Dave Washburn sword-devel@crosswire.org
Thu, 29 May 2003 17:21:25 -0600


Could somebody translate this for those of us who are non-specialists and just 
want to search the Bible??? :-)

On Thursday 29 May 2003 16:48, Chris Little wrote:
> On Thu, 29 May 2003, Troy A. Griffitts wrote:
> > 	Currently the engine does not do MUCH logic when comparing strings in
> > the search.  You can operate on the assumption that all modules are UTF8
> > encoded (though I don't know if absolutely every module is), so sending a
> > UTF8 stream to the search method should produce the appropriate results.
>
> Lots of modules are still Codepage 1252.  You can use the Latin1UTF8
> filter (or the logic included in it) to convert CP1252 to UTF-8.
>
> > There will be problems when a character is stored in the module as a
> > single precomposed character but entered in the search box as a base
> > character plus combining characters-- the two will not match.  But
> > basically, the answer is to pass UTF8 text as the search term.
>
> Make sure your search string is normalized according to form NFC.  (You
> can use ICU for this.  See the UTF8NFC filter for an example of how to
> achieve this.)  All modules OUGHT to be NFC already, but I doubt they are.
> So you might also want to use the UTF8NFC filter as one of your
> stripfilters.
>
> --Chris
>
>
> _______________________________________________
> sword-devel mailing list
> sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
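
For the non-specialists, the advice quoted above can be sketched roughly as
follows.  This is only an illustration in Python, not the SWORD API: the
function names are made up for the example, and Python's standard
unicodedata module stands in for ICU and the Latin1UTF8 / UTF8NFC filters.
It shows why "e" plus a combining accent fails to match a precomposed "é"
until both sides are normalized to NFC.

```python
import unicodedata


def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode Codepage 1252 bytes as UTF-8.

    Hypothetical helper standing in for what the Latin1UTF8 filter does.
    """
    return raw.decode("cp1252").encode("utf-8")


def to_nfc(text: str) -> str:
    """Normalize text to Unicode form NFC (precomposed where possible)."""
    return unicodedata.normalize("NFC", text)


def naive_search(term: str, verse_text: str) -> bool:
    """Plain substring search after normalizing BOTH sides to NFC.

    Hypothetical stand-in for a search method; the real engine is not
    assumed to work this way.
    """
    return to_nfc(term) in to_nfc(verse_text)


# A module still stored as CP1252: byte 0xE9 is "é".
module_text = cp1252_to_utf8(b"un caf\xe9 noir").decode("utf-8")

# The user types "e" followed by a combining acute accent (U+0301).
typed_term = "cafe\u0301"

# Without normalization the raw comparison fails...
raw_match = typed_term in module_text

# ...but after NFC normalization the same search succeeds.
nfc_match = naive_search(typed_term, module_text)

print("raw match:", raw_match)
print("NFC match:", nfc_match)
```

In plain terms: convert everything to UTF-8 first, then put both the search
term and the module text into the same normalized form before comparing.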

-- 
Dave Washburn
http://www.nyx.net/~dwashbur