[sword-devel] Searching other languages

Dave Washburn sword-devel@crosswire.org
Thu, 29 May 2003 19:18:06 -0600


I know it's a coder list.  One of those other lists referred me here, so...

On Thursday 29 May 2003 17:39, David Burry wrote:
> Ok, the translation for those of you who "just want to search" is:  it's
> a front end app encoding issue, if it's broken in your favorite app then
> that app needs to be fixed!  (the way to fix it is by using UTF-8 etc,
> but that's more than you need to know if you're not a coder... This is a
> coder list though so....)  ;o)
>
> Dave
>
>
> -----Original Message-----
> From: sword-devel-admin@crosswire.org
> [mailto:sword-devel-admin@crosswire.org] On Behalf Of Dave Washburn
> Sent: Thursday, May 29, 2003 4:21 PM
> To: sword-devel@crosswire.org
> Subject: Re: [sword-devel] Searching other languages
>
>
> Could somebody translate this for those of us who are non-specialists and
> just want to search the Bible??? :-)
>
> On Thursday 29 May 2003 16:48, Chris Little wrote:
> > On Thu, 29 May 2003, Troy A. Griffitts wrote:
> > > > Currently the engine does not do MUCH logic when comparing strings
> > > > in the search.  You can operate on the assumption that all modules
> > > > are UTF-8 encoded (though I don't know if absolutely every module
> > > > is), so sending a UTF-8 stream to the search method should produce
> > > > the appropriate results.
> >
> > Lots of modules are still Codepage 1252.  You can use the Latin1UTF8
> > filter (or the logic included in it) to convert CP1252 to UTF-8.
> >
> > > > There will be problems when a character is stored in a module as a
> > > > single precomposed character but entered in the search box as a base
> > > > character plus combining characters -- this will not match.  But
> > > > basically, the answer is to pass UTF-8 text as the search term.
> >
> > Make sure your search string is normalized according to form NFC.
> > (You can use ICU for this.  See the UTF8NFC filter for an example of
> > how to achieve this.)  All modules OUGHT to be NFC already, but I
> > doubt they are. So you might also want to use the UTF8NFC filter as
> > one of your stripfilters.
> >
> > --Chris
> >
> >
> > _______________________________________________
> > sword-devel mailing list
> > sword-devel@crosswire.org
> > http://www.crosswire.org/mailman/listinfo/sword-devel
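
For the coders on the list, here is a rough sketch of the two steps Chris
describes: convert CP1252 text to UTF-8, then normalize both the module text
and the search term to NFC before comparing. It is written in Python with the
standard library only, standing in for the Latin1UTF8 and UTF8NFC filters; it
is not the SWORD API. The function names and the sample text are made up for
illustration.

```python
import unicodedata


def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode Codepage 1252 bytes as UTF-8.

    Raises UnicodeDecodeError on the few byte values CP1252 leaves undefined.
    """
    return raw.decode("cp1252").encode("utf-8")


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (precomposed where possible)."""
    return unicodedata.normalize("NFC", text)


def nfc_contains(module_text: str, search_term: str) -> bool:
    """Substring search with both sides normalized to NFC first."""
    return to_nfc(search_term) in to_nfc(module_text)


# Hypothetical module text stored as CP1252 bytes: 0xE9 is a precomposed "é".
legacy_bytes = b"un caf\xe9 noir"
module_text = cp1252_to_utf8(legacy_bytes).decode("utf-8")

# The same word typed as "e" followed by U+0301 COMBINING ACUTE ACCENT.
typed_term = "cafe\u0301"

naive_hit = typed_term in module_text                    # raw comparison
normalized_hit = nfc_contains(module_text, typed_term)   # NFC on both sides
```

The raw comparison misses because the module holds one precomposed code point
while the typed term holds two; after NFC normalization both sides use the
precomposed form and the search matches.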

-- 
Dave Washburn
http://www.nyx.net/~dwashbur