[sword-devel] [sword-support] Locales

Peter von Kaehne refdoc at gmx.net
Sat Sep 13 01:16:24 MST 2008


DM and I thought about this a while back with regard to some problems we had with Farsi. Essentially there are three scenarios for each diacritic sign: it is not there, it is integrated (combined with the base letter) or it is extra (a separate sign following the letter). Modules usually contain a mixture of integrated and extra diacritics, although some are more or less purely one or the other.
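
The three cases can be seen with Python's standard `unicodedata` module. This sketch reads "integrated" as a precomposed code point and "extra" as a base letter followed by a combining mark. That reading, and the use of Python, are assumptions made for illustration. A raw comparison treats the two encodings of "svůj" as different words. After NFC normalisation they are equal.

```python
import unicodedata

# Three ways the Czech word "svůj" can appear in a module
# (assumed mapping of the three scenarios onto Unicode):
absent = "svuj"             # diacritic not there at all
integrated = "sv\u016fj"    # precomposed U+016F LATIN SMALL LETTER U WITH RING ABOVE
extra = "svu\u030aj"        # base "u" followed by U+030A COMBINING RING ABOVE

def forms_equal(a: str, b: str) -> bool:
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(integrated == extra)              # False: different code point sequences
print(forms_equal(integrated, extra))   # True: canonically equivalent
print(forms_equal(integrated, absent))  # False: NFC keeps the diacritic
```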

Search entries depend heavily on the keyboard available. A German searching on a German keyboard will use umlauts. A German searching on a British keyboard will use ae, oe or ue. Someone else searching a German text might well search simply for a, o or u.
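
The keyboard fallbacks could be handled by a language-specific folding that is applied to both the text and the query. The sketch below uses a hypothetical German table that maps ä to ae, ö to oe, ü to ue and ß to ss. It is written in Python for illustration and is not an existing SWORD filter.

```python
import unicodedata

# Hypothetical German folding table: each umlaut (and ß) becomes the
# digraph a user without a German keyboard would type.
GERMAN_DIGRAPHS = str.maketrans({
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",   # ä ö ü
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",   # Ä Ö Ü
    "\u00df": "ss",                                   # ß
})

def fold_german(text: str) -> str:
    # Compose first so that a decomposed "u" + U+0308 is also caught by the table.
    return unicodedata.normalize("NFC", text).translate(GERMAN_DIGRAPHS)

def matches(query: str, text: str) -> bool:
    # Fold both sides identically, then do a plain substring test.
    return fold_german(query) in fold_german(text)

print(matches("Mueller", "Herr M\u00fcller"))   # True
print(matches("M\u00fcller", "Herr Mueller"))   # True
```

Folding towards the digraph does not help the third user, who types a plain "u". That case needs the diacritics stripped instead.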

So at the time the best way forward appeared to be to normalise both the text and the search entry, and to accept the possibility of extraneous results, particularly in Latin-based scripts.
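
Here is a minimal sketch of normalising both sides. It assumes "normalise" means Unicode decomposition (NFD) followed by dropping every combining mark. It is written in Python, and the helper names are hypothetical. The second query shows the extraneous results mentioned above: German "schön" (beautiful) and "schon" (already) fold to the same string.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose (NFD) and drop every combining mark, e.g. 'kříž' -> 'kriz'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def search(query: str, verses: list[str]) -> list[str]:
    # The same folding is applied to the search entry and to every verse.
    q = strip_diacritics(query)
    return [v for v in verses if q in strip_diacritics(v)]

verses = ["Nesl svůj kříž", "Das ist schön", "Das ist schon gut"]
print(search("svuj kriz", verses))   # finds only the Czech verse
print(search("schön", verses))       # finds both German verses: one is extraneous
```

Letters with no canonical decomposition, such as Polish "ł", pass through this sketch unchanged.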

Alternatively, and I think there is a lot of mileage in this, we could (or should) demand that modules are designed cleanly in terms of diacritics (i.e. only sequential: base letter followed by separate diacritic signs) and have them rectified wherever there is a problem. Then only the search entries would need to be normalised, or, even better, their normalisation could be subject to user settings.
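
Suppose modules were guaranteed to be "sequential", assumed here to mean Unicode NFD. The text could then stay untouched and only the search entry would be transformed. The sketch below is written in Python with hypothetical function names. It checks a module line with `unicodedata.is_normalized` and turns the query into a regular expression. The `ignore_diacritics` flag stands in for the user setting. It is an assumption and not an existing SWORD option.

```python
import re
import unicodedata

# Assumption: only the basic Combining Diacritical Marks block (U+0300-U+036F)
# is handled. Farsi/Arabic vowel marks live elsewhere and would need adding.
MARKS = r"[\u0300-\u036f]*"

def module_is_clean(text: str) -> bool:
    """True if the module text is fully decomposed ("sequential", i.e. NFD)."""
    return unicodedata.is_normalized("NFD", text)

def query_pattern(query: str, ignore_diacritics: bool) -> re.Pattern:
    """Build a regex from the search entry alone; the module text is untouched."""
    q = unicodedata.normalize("NFD", query)
    if not ignore_diacritics:
        # Exact match: the decomposed query must appear verbatim in the NFD text.
        return re.compile(re.escape(q))
    # Drop any marks the user typed, then allow any marks after each base letter.
    bases = [ch for ch in q if not unicodedata.combining(ch)]
    return re.compile("".join(re.escape(ch) + MARKS for ch in bases))

verse = unicodedata.normalize("NFD", "Nesl svůj kříž")   # a "clean" module line
print(module_is_clean(verse))                                  # True
print(bool(query_pattern("svuj kriz", True).search(verse)))    # True
print(bool(query_pattern("svuj kriz", False).search(verse)))   # False
```

The advantage of this design is that no second, folded copy of the text is needed. The cost is that it only works if every module really is decomposed.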

Peter




-------- Original Message --------
> Date: Sat, 13 Sep 2008 08:43:08 +0100
> From: "Troy A. Griffitts" <scribe at crosswire.org>
> To: SWORD Support Volunteers <sword-support at crosswire.org>, refdoc at gmx.net, SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
> Subject: Re: [sword-support] Locales

> I would guess that if we built Lucene indexes for that Bible, Lucene
> would search ignoring accents?
> 
> Or is that module not UTF-8?
> 
> We have filters that we use on ancient Greek texts that allow searching
> regardless of diacritics. He could add a set for any language, but I'm
> not sure this is the right place to put that responsibility. Maybe
> an ICU filter could work for any language, if it's just a
> normalization problem. Then we could use that one filter for all
> Bibles, as we do the filter for Greek.
> 
> Not sure, just thinking out loud.
> 
> 	-Troy.
> 
> 
> 
> 
> Peter von Kaehne wrote:
> > Thanks. This is a known problem which causes a lot of difficulties in
> > all languages which rely on diacritics.
> > 
> > There is a plan to improve the search facility.
> > 
> > Peter
> > 
> > -------- Original Message --------
> >> Date: Fri, 12 Sep 2008 19:57:58 +0200 (CEST)
> >> To: sword-bugs at crosswire.org
> >> Subject: [sword-support] Locales
> > 
> >> Peace and love to my brothers and sisters in Jesus Christ, our Lord, from
> >> Jan, His weak servant.
> >>
> >> I am sorry to inform you about an error in the search engine of The Bible
> >> Tool. When using Czech, the search does not correctly interpret all the
> >> letters with diacritics, e.g.
> >>
> >> when typing the query:
> >>
> >> Nesl svůj kříž ("He carried his cross")
> >>
> >> http://www.crosswire.org/study/wordsearchresults.jsp?searchTerm=Nesl+sv%C5%AFj+k%C5%99%C3%AD%C5%BE
> >>
> >> the result says that there is
> >>> 0 result in the text of Czech Ekumenicky Cesky preklad<
> >> even though the search text was copied and pasted directly from it.
> >>
> >> I hope it needs only a minor repair, since the search gives good
> >> results when looking for phrases without Czech-specific letters.
> >>
> >> Wish: the search currently defaults to "exact match", hence:
> >>> Co jsem napsal, napsal< ("What I have written, I have written") gives results
> >> but
> >>> co jsem napsal, napsal< gives 0 results
> >> As people use the search to help their poor memory, I would like it to
> >> really help them with less "censorious" matching criteria. The strict
> >> criteria could still be useful in the "Advanced search".
> >>
> >> May God help your "Opus Dei"
> >>
> >>
> > 



