[sword-devel] R/w CVS
Victor Porton
sword-devel@crosswire.org
Thu, 06 Jun 2002 02:33:06 +0600
> We are happy to assist you in the use of the Sword API, but I think
> giving us some more concrete explanations of what you're working on
> would help us in that role.
First, what I am doing:
I am writing software that will make it convenient to classify every word of
the Hebrew Bible *in the form in which it is encountered*, not in its primary
form (the primary form being singular, masculine, 1st person, etc.). More
exactly, I am writing software for creating and using a dictionary of the
words in the Hebrew Bible in which the keys are the words as they appear in
the Bible, instead of the customary primary forms used in most dictionaries.
Note that I ignore vowels.
I'm going to incorporate it into general-purpose Bible study tools like
BibleTime and GnomeSword, creating special widget sets for this. Among other
things, it will display the tree (as a tree widget) of all grammatical forms
of a given word in the Hebrew Bible, allowing one to see all the possible
meanings of the word.
I have thought about many possible file formats. My latest variant (if there
are no complaints about it, I will most probably stick with it, even though I
have already changed my "final" decision about the file format several
times :-) ) is the following:
I create _one_ LD in which there are keys of _two_ kinds:
1. Just a Hebrew word (note that I will use nonexistent Hebrew words and
nonexistent grammatical forms as examples, so as not to spend time finding
real ones). Example:
LQWE
<form ref="LQWE:pqr@abc"/>
<form ref="LQWE:zzz@abc"/>
After the colon comes a "coded" (computer-readable) description of a Hebrew
syntax form (which may be decoded as, e.g., "noun sing. masc."). So this means
that "LQWE" can be read as a word in two different grammatical forms:
"pqr@abc" and "zzz@abc".
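The key layout described above can be sketched roughly as follows. This is
only an illustration in Python, not Sword API code: the helper names
`split_key` and `form_refs` are hypothetical, and it assumes that an entry
body is a sequence of `<form/>` elements which becomes well-formed XML once
wrapped in a single root element.

```python
import xml.etree.ElementTree as ET


def split_key(key):
    """Split a key into (word, form_code); form_code is None for a bare word."""
    word, sep, form_code = key.partition(":")
    return word, (form_code if sep else None)


def form_refs(entry_body):
    """Return the ref attributes of all <form/> elements in an entry body."""
    # The body has several top-level elements, so wrap it in a dummy root
    # (an assumption of this sketch, not something the format prescribes).
    root = ET.fromstring("<entry>" + entry_body + "</entry>")
    return [form.get("ref") for form in root.findall("form")]


# The made-up example entry for the bare word "LQWE".
body = '<form ref="LQWE:pqr@abc"/>\n<form ref="LQWE:zzz@abc"/>'
refs = form_refs(body)
print(refs)
print([split_key(ref) for ref in refs])
print(split_key("LQWE"))
```

Splitting at the first colon keeps the word and the coded form separate, so a
front end could decode the form part without touching the word itself.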
2. An entry for a Hebrew word in a specified syntax form:
LQWE:zzz@abc
<sense root="word" short="to word">
<!-- a HTML fragment here -->
</sense>
<sense root="sense" short="being senseful">
<!-- a HTML fragment here -->
</sense>
This would mean that "LQWE" can be read as the (say, adverb-like) syntax form
"zzz@abc", and in that form it has two senses: "to word" and "being senseful".
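Reading such an entry could look roughly like the sketch below. It is an
assumption-laden illustration in Python, not Sword code: the function name
`senses` is hypothetical, the body is wrapped in a dummy root element, and the
embedded HTML fragment is assumed to be well-formed XML (otherwise this parser
would reject it).

```python
import xml.etree.ElementTree as ET


def senses(entry_body):
    """Return (root, short) pairs for every <sense> element in an entry body."""
    # Wrap the body so that several top-level <sense> elements parse as one
    # document (an assumption of this sketch).
    wrapped = ET.fromstring("<entry>" + entry_body + "</entry>")
    return [(s.get("root"), s.get("short")) for s in wrapped.findall("sense")]


# The made-up example entry for the key "LQWE:zzz@abc".
body = """
<sense root="word" short="to word">
<!-- a HTML fragment here -->
</sense>
<sense root="sense" short="being senseful">
<!-- a HTML fragment here -->
</sense>
"""
for root_word, short in senses(body):
    print(root_word, "->", short)
```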
My question: do you consider this file format reasonable? There is one problem
with it: to enumerate all the Hebrew words present in the dictionary, one
would need to enumerate all the entries and throw away the ones with colons,
thus spending CPU time etc. on enumerating unneeded entries.
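The enumeration cost mentioned above can be seen in a small sketch. This is
plain Python over a list of keys, not the Sword API; the function name
`bare_words` and the extra word "MNB" are hypothetical, made up for the
illustration.

```python
def bare_words(keys):
    """Yield only the keys that are plain words (no ':' form suffix)."""
    for key in keys:  # every key is visited, including the form entries
        if ":" not in key:
            yield key


# Hypothetical dictionary keys, using made-up words and forms.
keys = ["LQWE", "LQWE:pqr@abc", "LQWE:zzz@abc", "MNB", "MNB:pqr@abc"]
words = list(bare_words(keys))
print(words)
print(len(keys) - len(words), "entries skipped")
```

The filter itself is trivial, but the loop still touches every entry, so the
work grows with the total number of keys rather than with the number of words.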
I could use two separate LDs, one for the bare words and one for the
particular syntax forms of the words, but this would create two modules even
though conceptually they form a whole and should always be used together. Or
maybe I am mistaken: can one module include two LDs?
--
Victor Porton (porton@narod.ru)