[sword-devel] Arabic Bible
Kamal Abou Mikhael
kamal.abm at gmail.com
Fri Jul 11 08:17:13 MST 2008
Dear All,
Some notes on the significance of short vowels for Arabic search.
1. When short vowels are not present, the meaning of a word can be
ambiguous. The reader disambiguates it by context, logic, or prior
knowledge of the verse. For example, the difference between the verbs
"to kill" and "to be killed" lies only in the short vowels. Thus, we can
never overestimate their importance.
2. Un-vowelized text is highly valuable for search because it makes
searching much easier and more useful. No Arabic searcher wants to type
short vowels: it's tedious, and it is easy to get them wrong. Not only
that, most queries into the text are meant to be ambiguous. Because "to
kill" and "to be killed" are packed into one un-vowelized word, a
vowel-free search is equivalent to the kind of search that occurs in
English. In addition, words that differ only in their vowelization are
often related in meaning.
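A minimal sketch of vowel-free matching in Java, to illustrate point 2. It
assumes that the marks to ignore are the Unicode Arabic diacritics U+064B
through U+0652 (the three tanwin marks, fatha, damma, kasra, shadda, and
sukun); other marks such as the superscript alef (U+0670) are left alone.
The class and method names are hypothetical. Stripping those marks makes
qatala ("he killed") and qutila ("he was killed") compare equal:

```java
public class StripVowels {
    // Assumption: the marks ignored for search are U+064B (fathatan)
    // through U+0652 (sukun): tanwin, fatha, damma, kasra, shadda, sukun.
    static boolean isTashkil(char c) {
        return c >= '\u064B' && c <= '\u0652';
    }

    // Returns the text with all those marks removed; letters are kept.
    static String strip(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!isTashkil(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String qatala = "\u0642\u064E\u062A\u064E\u0644\u064E"; // qatala, "he killed"
        String qutila = "\u0642\u064F\u062A\u0650\u0644\u064E"; // qutila, "he was killed"
        // Both reduce to the bare consonants q-t-l.
        System.out.println(strip(qatala).equals(strip(qutila))); // prints true
    }
}
```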
3. Arabic has a root/pattern morphology that makes many search options
possible. One can search for words with a similar root or with a similar
pattern. There is even a hybrid approach, which I explored in my
master's thesis, that converts verbal nouns to their related verbs.
This kind of thing largely comes for free in English: "reader",
"reading", "read", and "readable" will all show up in a search because
"er", "ing", and "able" are attached to the end of the word rather than
mixed inside it.
Anyway, I bring all this up to say that it would be valuable to have
non-vowelized search of vowelized text, and to offer several search
modes.
I did some work with Lucene in Java, and I'm aware that it is possible
to implement different kinds of filters and to keep track of each
token's location in the original document.
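A sketch of non-vowelized search over vowelized text that keeps track of
where each hit sits in the original. It deliberately uses only the Java
standard library rather than any Lucene API, assumes the same set of
diacritics (U+064B to U+0652), and does a plain substring search; a real
analyzer would tokenize first and apply the same mapping per token. The
names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VowelInsensitiveSearch {
    // Assumption: ignore U+064B (fathatan) through U+0652 (sukun).
    static boolean isTashkil(char c) {
        return c >= '\u064B' && c <= '\u0652';
    }

    // Text with the marks removed, plus a map back to the original.
    static final class Folded {
        final String text;     // text without the marks
        final int[] origIndex; // origIndex[k] = index of text.charAt(k) in the original
        Folded(String text, int[] origIndex) {
            this.text = text;
            this.origIndex = origIndex;
        }
    }

    static Folded fold(String original) {
        StringBuilder sb = new StringBuilder();
        int[] map = new int[original.length()];
        int n = 0;
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            if (!isTashkil(c)) {
                sb.append(c);
                map[n++] = i;
            }
        }
        return new Folded(sb.toString(), Arrays.copyOf(map, n));
    }

    // Finds every occurrence of the query (vowelized or not), comparing
    // with the marks removed, and returns [start, end) spans in the
    // ORIGINAL vowelized text.
    static List<int[]> search(String original, String query) {
        Folded doc = fold(original);
        String q = fold(query).text;
        List<int[]> hits = new ArrayList<>();
        if (q.isEmpty()) {
            return hits;
        }
        int from = 0;
        int k;
        while ((k = doc.text.indexOf(q, from)) >= 0) {
            int start = doc.origIndex[k];
            int end = doc.origIndex[k + q.length() - 1] + 1;
            // Include the marks that follow the last matched letter.
            while (end < original.length() && isTashkil(original.charAt(end))) {
                end++;
            }
            hits.add(new int[] {start, end});
            from = k + 1;
        }
        return hits;
    }

    public static void main(String[] args) {
        // "qatala qutila", fully vowelized, separated by a space.
        String verse = "\u0642\u064E\u062A\u064E\u0644\u064E \u0642\u064F\u062A\u0650\u0644\u064E";
        // Un-vowelized query q-t-l matches both words.
        for (int[] h : search(verse, "\u0642\u062A\u0644")) {
            System.out.println(h[0] + ".." + h[1]); // prints 0..6 then 7..13
        }
    }
}
```

Returning spans in the original string is what lets a front end
highlight the fully vowelized verse while the matching itself ignores
the vowels.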
The time I can spend on this is limited, almost none. However, if
someone would like to take these insights and use them, it would be both
beneficial and interesting.
If someone is interested, I can also provide my M.S. thesis, which was
about a configurable stemming engine. The implementation was evaluated
within IR (information retrieval); however, the methods used may be of
more value in Bible search.
God bless,
Kamal Abou Mikhael