[sword-devel] Search bug & New Arabic Bible, Not Shaped SVD Version

Peter von Kaehne refdoc at gmx.net
Mon Nov 26 02:55:24 MST 2012


If it is the diacritics, then the solution is a patch which was submitted (but probably never applied) a year or so ago.

Peter
-------- Original Message --------
> Date: Mon, 26 Nov 2012 11:33:06 +0200
> From: pola ashraf <5001 at hotmail.com>
> To: SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>
> Subject: Re: [sword-devel] Search bug & New Arabic Bible, Not  Shaped SVD Version

> Sorry for choosing the wrong word.
> This Wikipedia article covers the topic:
> https://en.wikipedia.org/wiki/Arabic_diacritics
> 
> Thanks, Chris, for your reply about the filter. I don't have any
> contact details for the frontend developers to report this problem to
> them; I hope someone on this list will pass this discussion on to them :)
> 
> So now we know the problem and the solution.
> 
> > Date: Mon, 26 Nov 2012 01:05:16 -0800
> > From: chrislit at crosswire.org
> > To: sword-devel at crosswire.org
> > Subject: Re: [sword-devel] Search bug & New Arabic Bible, Not  Shaped SVD Version
> > 
> > You're talking about vowels, not shaping. Shaping in Arabic changes the 
> > shape of the letter according to its context in the word (initial, 
> > medial, final, or isolated). I imagine unshaped Arabic would be very 
> > difficult to read. Arabic without vowel marks, on the other hand, is 
> > standard.
> > 
> > I would have thought that the indexing would have been done without 
> > vowels or both with and without vowels. It should be easy to recover the
> > vowel-less text for indexing by applying the UTF8ArabicPoints filter.
> > 
> > --Chris
> > 
> > On 11/25/2012 11:45 PM, pola ashraf wrote:
> > > Using a comparison tool from ICU, the two strings turned out to have
> > > different numbers of characters.
> > > Words to compare:
> > > يَسُوعَ
> > > يسوع
> > > Both are the name of Jesus Christ in Arabic, but one is shaped and the
> > > other isn't.
> > >
> > > Words converted to HEX Format
> > > \u064a \u064e \u0633 \u064f \u0648 \u0639 \u064e
> > > \u064a \u0633 \u0648 \u0639
> > >
> > > That's why the search engines of some frontends don't return any
> > > results for unshaped words.
> > >
> > > The suggestion is to make the index contain the shaped words plus the
> > > same words without shaping.
> > >
> > > Comparison Tool link   https://ssl.icu-project.org/icu-bin/scompare
> > >
> > > Note: to clarify the meaning of shaping, shaping is the usage of
> > > Characters like the following ( ٌ    ُ   ٍ   َ    ْ  ً  )
> > > these special characters are shapes, and may change the whole word
> > > meaning and help in correct reading, but as mentioned before, they make
> > > reading harder and cause problems with search functions
> > >
> > > Note: And Bible searches normally, without problems, but desktop
> > > programs like Xiphos and BibleTime have this problem
> > >
> > > Pola
> > >
> > > ------------------------------------------------------------------------
> > >
> > > I think Arabic shapes add extra Unicode characters; that's why the two
> > > words I mentioned before don't give the same results
> > >
> > > ------------------
> > > Any Arabic search problem is unconnected to shaping.
> > >
> > > Modules are routinely created and stored in a normalised format; user
> > > entries, e.g. for search, are equally normalised
> > >
> > >
> > >
> > > _______________________________________________
> > > sword-devel mailing list: sword-devel at crosswire.org
> > > http://www.crosswire.org/mailman/listinfo/sword-devel
> > > Instructions to unsubscribe/change your settings at above page
> > >
> > 
> > 
>  		 	   		  
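
For illustration only, the code-point comparison quoted above can be reproduced with a few lines of Python. This is a minimal sketch and not the patch mentioned at the top, nor SWORD's UTF8ArabicPoints filter. The helper name strip_arabic_points is hypothetical, and the choice of U+064B..U+0652 as "the vowel marks" is an assumption about which characters should be dropped; the real filter may cover a different set.

```python
# Assumption: the marks to drop are the Arabic harakat U+064B..U+0652
# (tanwin, fatha, damma, kasra, shadda, sukun). A real filter may differ.
ARABIC_POINTS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_arabic_points(text):
    """Return text with the assumed set of Arabic vowel marks removed."""
    return "".join(ch for ch in text if ch not in ARABIC_POINTS)


# The two spellings from the thread, written as code points.
pointed = "\u064a\u064e\u0633\u064f\u0648\u0639\u064e"
plain = "\u064a\u0633\u0648\u0639"

print(len(pointed), len(plain))               # 7 4
print(pointed == plain)                       # False
print(strip_arabic_points(pointed) == plain)  # True
```

Applying the same stripping step both when building the index and to the user's search term would make the two spellings match, which is the idea behind indexing the vowel-less text.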


