[sword-devel] Latin diacritics

Peter von Kaehne refdoc at gmx.net
Mon Feb 1 00:30:37 MST 2016


What would you decompose/strip umlauts into? For a German, a/o/u umlaut should become ae/oe/ue, but for other languages this might not apply. 

I guess Turks treat their umlauts much as we Krauts do. 
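
To make the question concrete, here is a minimal sketch of language-dependent folding. It is written in Python rather than SWORD's C++, and the function name `fold`, the `lang` parameter and both mapping tables are made up for illustration; none of this is an existing SWORD filter. It assumes German wants ae/oe/ue while other languages simply drop the accent, and that ligatures such as œ need an explicit table because Unicode decomposition leaves them untouched.

```python
import unicodedata

# Hypothetical per-language override: German umlauts expand to two letters.
GERMAN_UMLAUTS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}

# Ligatures have no canonical decomposition, so NFD will not split them.
LIGATURES = {"œ": "oe", "Œ": "Oe", "æ": "ae", "Æ": "Ae"}


def fold(text, lang=None):
    """Return text with accents stripped and ligatures expanded.

    lang is an illustrative language code; only "de" changes behaviour here.
    """
    # Compose first so the lookup tables match precomposed characters.
    text = unicodedata.normalize("NFC", text)
    if lang == "de":
        for src, dst in GERMAN_UMLAUTS.items():
            text = text.replace(src, dst)
    for src, dst in LIGATURES.items():
        text = text.replace(src, dst)
    # Decompose, then drop every combining mark (accents, diaereses, ...).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold("Égypte"))          # accent stripped
print(fold("cœur"))            # ligature expanded
print(fold("Müller", "de"))    # German expansion
print(fold("Müller"))          # generic stripping
```

With a scheme like this the same input folds differently depending on the module's language, which is exactly where a single generic strip filter would fall short.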

Sent from my phone. Apologies for brevity and typos.

On 31 Jan 2016 20:29, Dominique Corbex <dominique at corbex.org> wrote:
>
> There are annoying search problems in French on words including: 
> - accented letters 
> - ligatures 
>
> Here is a sample: the first query shows the number of results for 
> 'Égypte' with an acute accent, the second without: 
>
> $ diatheke -b FreCrampon -s phrase -k Égypte | tr ';' '\n' | wc -l 
> 107 
> $ diatheke -b FreCrampon -s phrase -k Egypte | tr ';' '\n' | wc -l 
> 498 
>
> Not all OSes allow the user to easily enter ligatures, so some texts 
> have ligatures converted to regular letters and others do not. 
>
>
> So, for languages based on Latin script, shouldn't SWORD provide a strip 
> filter to remove accents and ligatures? What do you think? 
>
> -- 
> domcox <dominique at corbex.org> 
>
> _______________________________________________ 
> sword-devel mailing list: sword-devel at crosswire.org 
> http://www.crosswire.org/mailman/listinfo/sword-devel 
> Instructions to unsubscribe/change your settings at above page 

