[sword-devel] Latin diacritics
Peter von Kaehne
refdoc at gmx.net
Mon Feb 1 00:30:37 MST 2016
What would you decompose/strip umlauts into? As a German, I would expect a/o/u umlaut to become ae/oe/ue, but this might not apply to other languages.
I guess Turks treat their umlauts much as we Krauts do.
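
To make the question concrete, here is a minimal sketch in Python of what such folding could look like. It is not SWORD code and does not use any SWORD API; the names fold, LIGATURES and LANGUAGE_OVERRIDES are made up for illustration, and the override table only covers the German convention mentioned above. The generic path strips combining marks after Unicode decomposition; ligatures need an explicit table because they have no canonical decomposition.

```python
import unicodedata

# Ligatures have no canonical decomposition, so NFD alone leaves them intact.
# Illustrative table only, not an exhaustive list.
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}

# Hypothetical per-language overrides, applied before generic stripping.
# Only the German ae/oe/ue convention is filled in here.
LANGUAGE_OVERRIDES = {
    "de": {
        "ä": "ae", "ö": "oe", "ü": "ue",
        "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
        "ß": "ss",
    },
}


def fold(text, lang=None):
    """Return text with ligatures expanded and diacritics removed."""
    # Compose first so that 'u' + combining diaeresis matches the 'ü' key.
    text = unicodedata.normalize("NFC", text)

    table = dict(LIGATURES)
    table.update(LANGUAGE_OVERRIDES.get(lang, {}))
    text = "".join(table.get(ch, ch) for ch in text)

    # Decompose what is left and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold("Égypte"))        # Egypte
print(fold("cœur"))          # coeur
print(fold("Müller"))        # Muller
print(fold("Müller", "de"))  # Mueller
```

The last two lines show the problem: the same input folds differently depending on which language you assume, so a single language-blind strip filter cannot be right for everyone.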
Sent from my phone. Apologies for brevity and typos.

On 31 Jan 2016 20:29, Dominique Corbex <dominique at corbex.org> wrote:
>
> There are annoying search problems in French on words including:
> - accented letters
> - ligatures
>
> Here is a sample: the first query shows the number of results for
> 'Égypte' with an acute accent, the second without:
>
> $ diatheke -b FreCrampon -s phrase -k Égypte | tr ';' '\n' | wc -l
> 107
> $ diatheke -b FreCrampon -s phrase -k Egypte | tr ';' '\n' | wc -l
> 498
>
> Not all operating systems let the user enter ligatures easily, so some
> texts have ligatures converted to regular letters and others do not.
>
>
> So, for languages based on Latin script, shouldn't SWORD provide a strip
> filter to remove accents and ligatures? What do you think?
>
> --
> domcox <dominique at corbex.org>
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page