[sword-devel] Latin diacritics
Peter von Kaehne
refdoc at gmx.net
Mon Feb 1 01:06:03 MST 2016
I guess we might require a bunch of filters.
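
As a rough illustration of what "a bunch of filters" could look like, here is a minimal sketch in Python. It is not SWORD's filter API: the names strip_latin, LIGATURES and LANGUAGE_RULES are made up for this example, and the German table is only an assumption based on the ae/oe/ue convention mentioned below. The generic path decomposes to NFD and drops combining marks; ligatures and language-specific rules are handled by lookup tables first.

```python
import unicodedata

# Ligatures have no canonical decomposition, so NFD alone leaves them intact.
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}

# Hypothetical per-language overrides, applied before the generic strip.
LANGUAGE_RULES = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue",
           "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"},
}


def strip_latin(text, lang=None):
    """Return text with ligatures expanded and diacritics removed."""
    # Compose first so the lookup tables see precomposed characters.
    text = unicodedata.normalize("NFC", text)
    table = dict(LIGATURES)
    table.update(LANGUAGE_RULES.get(lang, {}))
    text = "".join(table.get(ch, ch) for ch in text)
    # Decompose, then drop every combining mark (accents, diaeresis, ...).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_latin("Égypte"))        # generic strip
    print(strip_latin("cœur"))          # ligature expansion
    print(strip_latin("Müller", "de"))  # German-specific rule
```

The same word can come out differently depending on the language passed in, which is why a single strip filter for all Latin-script languages may not be enough.
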
Sent from my phone. Apologies for brevity and typos.

On 1 Feb 2016 7:30 am, Peter von Kaehne <refdoc at gmx.net> wrote:
>
> What would you decompose/strip umlauts into? For a German, the a/o/u umlauts should become ae/oe/ue, but this might not apply to other languages.
>
> I guess Turks treat their umlauts much as we Krauts do.
>
> Sent from my phone. Apologies for brevity and typos.
>
> On 31 Jan 2016 20:29, Dominique Corbex <dominique at corbex.org> wrote:
> >
> > There are annoying search problems in French on words including:
> > - accented letters
> > - ligatures
> >
> > Here is a sample: the first query shows the number of results for
> > 'Égypte' with an acute accent, the second without:
> >
> > $ diatheke -b FreCrampon -s phrase -k Égypte | tr ';' '\n' | wc -l
> > 107
> > $ diatheke -b FreCrampon -s phrase -k Egypte | tr ';' '\n' | wc -l
> > 498
> >
> > Not every OS lets the user enter ligatures easily, so some texts
> > have ligatures converted to regular letters and others do not.
> >
> >
> > So, for languages based on Latin script, shouldn't SWORD provide a strip
> > filter to remove accents and ligatures? What do you think?
> >
> > --
> > domcox <dominique at corbex.org>
> >
> > _______________________________________________
> > sword-devel mailing list: sword-devel at crosswire.org
> > http://www.crosswire.org/mailman/listinfo/sword-devel
> > Instructions to unsubscribe/change your settings at above page