[sword-devel] [sword-support] Locales
Peter von Kaehne
refdoc at gmx.net
Sat Sep 13 01:16:24 MST 2008
DM and I thought about this a while back wrt some problems we had with Farsi. Essentially there are three scenarios for each diacritic sign: not there, integrated into the base letter (precomposed), or extra (a separate combining character following the base letter). Modules are usually a mixture of integrated and extra diacritics, more or less purely one or the other.
Search entries depend heavily on the keyboard available: a German searching on a German keyboard will use umlauts, a German searching on a British keyboard will use ae, ue or oe, and someone else searching a German text might well search simply for a, u or o.
So the best way forward appeared at the time to normalise both text and search entry and accept the possibility of extraneous results - particularly around latinate scripts.
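A minimal sketch of that idea in Python, using only the standard `unicodedata` module. The function names `fold` and `matches` are made up for illustration and are not part of SWORD; the lower-casing step is my addition, prompted by the case-sensitivity wish in the original report. It does not cover the ae/ue/oe keyboard spelling.

```python
import unicodedata


def fold(text: str) -> str:
    """Decompose to NFD, drop combining marks, then case-fold.

    Hypothetical helper: precomposed and sequential diacritics both end up
    as the bare base letter, so the two forms compare equal.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, verse: str) -> bool:
    """Apply the same folding to the search entry and to the module text."""
    return fold(query) in fold(verse)
```

With this folding, a search entry typed without diacritics finds "Nesl svůj kříž". The price is the extraneous results mentioned above: German "schön" and "schon" fold to the same string although they are different words.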
Alternatively - and I think there is a lot of mileage in there - we should/could demand that modules are designed cleanly in terms of diacritics (i.e. only sequential) and rectified wherever there is a problem. Subsequently only the search entries would need to be normalised, or even better, how they are normalised could be subject to user settings.
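A rough sketch of how such a check and rectification might look, assuming "only sequential" corresponds to Unicode NFD (base letter followed by combining marks). The helper names are hypothetical and this is not an existing SWORD tool. Note that NFD also rewrites a few characters that have nothing to do with diacritics (e.g. Hangul syllables), so a real module check would need to be more selective.

```python
import unicodedata
from typing import List, Tuple


def precomposed_chars(text: str) -> List[Tuple[int, str]]:
    """List (offset, character) for every character NFD would split up.

    Hypothetical checker: an empty result means the text is already
    purely sequential under the NFD assumption stated above.
    """
    return [
        (offset, ch)
        for offset, ch in enumerate(text)
        if unicodedata.normalize("NFD", ch) != ch
    ]


def make_sequential(text: str) -> str:
    """Rectify a text by rewriting it in decomposed (NFD) form."""
    return unicodedata.normalize("NFD", text)
```

Once a module is known to be clean in this sense, only the search entry has to go through `make_sequential` (or a user-selected variant of it) before comparison.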
Peter
-------- Original Message --------
> Date: Sat, 13 Sep 2008 08:43:08 +0100
> From: "Troy A. Griffitts" <scribe at crosswire.org>
> To: SWORD Support Volunteers <sword-support at crosswire.org>, refdoc at gmx.net, SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
> Subject: Re: [sword-support] Locales
> I would guess if we build lucene indexes for that Bible, the lucene
> would search ignoring accents?
>
> Or that module is not UTF-8?
>
> We have filters that we use on ancient Greek texts that allow searching
> regardless of diacritics. He could add a set for any language, but I'm
> not sure if this is the right location to place responsibility. Maybe
> if it was an ICU filter that could work for any language-- like if it's
> just a normalization problem. We could use that one filter for all
> Bibles like we do the filter for Greek.
>
> Not sure, just thinking out loud.
>
> -Troy.
>
>
>
>
> Peter von Kaehne wrote:
> > Thanks. This is a known problem which causes a lot of difficulties in
> > all languages which rely on diacritics.
> >
> > There is a plan to improve the search facility.
> >
> > Peter
> >
> > -------- Original Message --------
> >> Date: Fri, 12 Sep 2008 19:57:58 +0200 (CEST)
> >> To: sword-bugs at crosswire.org
> >> Subject: [sword-support] Locales
> >
> >> Peace and love to my brothers and sisters in Jesus Christ, our Lord, from
> >> Jan, His weak servant.
> >>
> >> I am sorry to inform you about an error in the search engine of The Bible
> >> Tool. When using Czech, the search does not correctly interpret all the
> >> letters with diacritics, e.g.
> >>
> >> while typing the request:
> >>
> >> Nesl svůj kříž
> >>
> >>
> http://www.crosswire.org/study/wordsearchresults.jsp?searchTerm=Nesl+sv%C5%AFj+k%C5%99%C3%AD%C5%BE
> >>
> >> the result says that there is
> >>> 0 result in the text of Czech Ekumenicky Cesky preklad<
> >> even though the searched text was copied & pasted directly from it.
> >>
> >> I hope it needs only a minor repair, since the search gives good
> >> results when looking for phrases without Czech-specific letters.
> >>
> >> Wish: the search default is "exact match" hence:
> >>> Co jsem napsal, napsal< gives result
> >> but
> >>> co jsem napsal, napsal< gives 0 result
> >> As people use the search to help their poor memory, I wish to really help
> >> them with less "censorious" matching criteria. The exact-match criteria can
> >> still be useful in the "Advanced search".
> >>
> >> God helps to your "Opus Dei"
> >>
> >>