[sword-devel] Latin diacritics

Peter von Kaehne refdoc at gmx.net
Mon Feb 1 01:06:03 MST 2016


I guess we might require a bunch of filters.
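
A rough sketch of what "a bunch of filters" could mean, in Python rather than as an actual SWORD filter: a generic strip step plus a per-language override table. The names strip_latin, LIGATURES and LANGUAGE_OVERRIDES are made up for illustration and are not part of the SWORD API. The German mapping is only the ae/oe/ue rule from the quoted message below.

```python
import unicodedata

# Hypothetical per-language overrides, applied before the generic stripping.
# Only the German a/o/u umlaut rule from this thread is included.
LANGUAGE_OVERRIDES = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"},
}

# Ligatures that Unicode normalization does not decompose into plain letters.
LIGATURES = {"œ": "oe", "Œ": "Oe", "æ": "ae", "Æ": "Ae"}


def strip_latin(text, lang=None):
    """Return text with ligatures expanded and accents removed.

    lang selects an optional override table (assumed ISO 639-1 code).
    """
    table = dict(LIGATURES)
    table.update(LANGUAGE_OVERRIDES.get(lang, {}))
    # Compose first so that precomposed keys such as "ü" match.
    text = unicodedata.normalize("NFC", text)
    text = "".join(table.get(ch, ch) for ch in text)
    # Decompose, then drop the combining marks (accents, diaeresis, ...).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

With this, strip_latin("Égypte") and strip_latin("Egypte") give the same search key, while strip_latin("Müller", "de") follows the German convention instead of simply dropping the dots. Other languages would need their own tables.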

Sent from my phone. Apologies for brevity and typos.

On 1 Feb 2016 7:30 am, Peter von Kaehne <refdoc at gmx.net> wrote:
>
> What would you decompose/strip umlauts into? For German, a/o/u umlaut should become ae/oe/ue, but for other languages this might not apply.
>
> I guess Turks treat their umlauts much as we Krauts do.
>
> Sent from my phone. Apologies for brevity and typos.
>
> On 31 Jan 2016 20:29, Dominique Corbex <dominique at corbex.org> wrote:
> >
> > There are annoying search problems in French on words including: 
> > - accented letters 
> > - ligatures 
> >
> > Here is a sample: the first query shows the number of results for 
> > 'Égypte' with an acute accent, the second without: 
> >
> > $ diatheke -b FreCrampon -s phrase -k Égypte | tr ';' '\n' | wc -l 
> > 107 
> > $ diatheke -b FreCrampon -s phrase -k Egypte | tr ';' '\n' | wc -l 
> > 498 
> >
> > Not all operating systems let the user easily enter ligatures, so some texts 
> > have ligatures converted to regular letters and others do not. 
> >
> >
> > So, for languages based on Latin script, shouldn't SWORD provide a strip 
> > filter to remove accents and ligatures? What do you think? 
> >
> > -- 
> > domcox <dominique at corbex.org> 
> >
> > _______________________________________________ 
> > sword-devel mailing list: sword-devel at crosswire.org 
> > http://www.crosswire.org/mailman/listinfo/sword-devel 
> > Instructions to unsubscribe/change your settings at above page 

