[sword-devel] Latin diacritics

DM Smith dmsmith at crosswire.org
Mon Feb 1 06:14:11 MST 2016


No need to reinvent the wheel… The folks at Lucene have already done the inventing. I don’t know whether it has made its way into CLucene.

The issue is not what the user sees or enters but what is stored in the index. A typical search mechanism has to normalize the search request the same way as the text that was put into the search index. In Lucene-speak, handling such as ß => ss is called folding. See the comments in the Lucene issue: https://issues.apache.org/jira/browse/LUCENE-1343 Robert Muir is the one who has a strong grasp of it. You’ll also see me in the thread :) .
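
To make the "normalize both sides the same way" point concrete, here is a minimal sketch in Python. It is not Lucene's or CLucene's code: the names fold, build_index and search, and the small SPECIAL_FOLDS table, are made up for illustration. The only point it shows is that one and the same folding function runs at index time and at query time.

```python
import unicodedata

# Hypothetical, hand-picked special cases; NOT Lucene's actual folding table.
SPECIAL_FOLDS = {"\u00df": "ss", "\u00e6": "ae", "\u0153": "oe", "\u00f8": "o"}  # ß æ œ ø


def fold(text):
    """Lowercase, expand the special cases, then drop combining marks."""
    out = []
    for ch in text.lower():
        ch = SPECIAL_FOLDS.get(ch, ch)
        # NFD splits e.g. 'é' into 'e' plus a combining acute accent.
        for part in unicodedata.normalize("NFD", ch):
            if not unicodedata.combining(part):
                out.append(part)
    return "".join(out)


def build_index(docs):
    """Map each folded word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(fold(word), set()).add(doc_id)
    return index


def search(index, query):
    """Fold the query exactly as the indexed text was folded."""
    return index.get(fold(query), set())
```

With this, a document containing "Straße" is found whether the user types "Straße", "strasse" or "STRASSE", because all three fold to the same key that was stored in the index.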

In some languages the “marks” are vowels, and in others they are not, so folding has to be done per language.
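
A rough sketch of what per-language folding could look like, again in Python and again with made-up names (LANGUAGE_RULES, fold_for_language). The rule table is an assumption for illustration only: it expands German umlauts to "ae", "oe", "ue", keeps the Swedish letters å, ä, ö as they are, and strips marks for any language it does not know. A real table would need input from people who know each language.

```python
import unicodedata

# Hypothetical per-language rules, for illustration only.
LANGUAGE_RULES = {
    "de": {
        # ä, ö, ü, ß expanded the way German conventionally transliterates them
        "expand": {"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue", "\u00df": "ss"},
        "keep": set(),
    },
    "sv": {
        "expand": {},
        # å, ä, ö are separate letters in Swedish, so leave them alone
        "keep": {"\u00e5", "\u00e4", "\u00f6"},
    },
}
DEFAULT_RULES = {"expand": {}, "keep": set()}


def fold_for_language(text, lang):
    """Fold text using the rules for lang, falling back to plain mark stripping."""
    rules = LANGUAGE_RULES.get(lang, DEFAULT_RULES)
    out = []
    # NFC first, so that precomposed letters can be looked up in the tables.
    for ch in unicodedata.normalize("NFC", text.lower()):
        if ch in rules["keep"]:
            out.append(ch)
        elif ch in rules["expand"]:
            out.append(rules["expand"][ch])
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.extend(p for p in decomposed if not unicodedata.combining(p))
    return "".join(out)
```

The same input then folds differently depending on the language tag of the module: under these assumed rules "Bär" becomes "baer" for "de", stays "bär" for "sv", and becomes "bar" for an unlisted language.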

The issue I see is that folding might change the semantic meaning of a word: two different words may fold into the same form, which would produce false hits.
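
One way to get a feel for how often that happens is to fold a word list and report the groups that collapse together. The sketch below is Python with hypothetical names (fold, folding_collisions) and a deliberately simple folding rule (lowercase, ß => ss, strip combining marks), not what Lucene does. German "Maße" (measurements) and "Masse" (mass) are the sort of pair I have in mind.

```python
import unicodedata
from collections import defaultdict


def fold(word):
    """Simplified folding: lowercase, ß => ss, strip combining marks."""
    word = word.lower().replace("\u00df", "ss")
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def folding_collisions(vocabulary):
    """Return {folded form: sorted distinct words} for forms shared by 2+ words."""
    groups = defaultdict(set)
    for word in vocabulary:
        # Compare words case-insensitively and in one normalization form.
        groups[fold(word)].add(unicodedata.normalize("NFC", word.lower()))
    return {key: sorted(words) for key, words in groups.items() if len(words) > 1}
```

Run over the vocabulary of a module, every group it returns is a place where a search for one word would also hit the other.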

In Him,
	DM


> On Feb 1, 2016, at 6:24 AM, Peter von Kaehne <refdoc at gmx.net> wrote:
> 
> No, we would call it an "sz", but spell it "ss" as the alternative to the letter ß. Talk about confusing the kids...
> 
> Peter 
> 
>> Sent: Monday, 01 February 2016 at 10:54
>> From: "David Haslam" <dfhmch at googlemail.com>
>> To: sword-devel at crosswire.org
>> Subject: Re: [sword-devel] Latin diacritics
>> 
>> Corrigendum:
>> 
>> ....with an Eszett "ß" by entering "sz" ? 
>> 
>> though of course, some words that used to have "ß" now have "ss", so it gets
>> even harder.
>> 
>> David
>> 
>> 
>> 
>> --
>> View this message in context: http://sword-dev.350566.n4.nabble.com/Latin-diacritics-tp4655927p4655937.html
>> Sent from the SWORD Dev mailing list archive at Nabble.com.
>> 
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
> 
