[sword-devel] dictionary ordering revisited
Troy A. Griffitts
scribe at crosswire.org
Thu Mar 19 14:30:39 MST 2009
Well, just to give an update. I'm stuck in Rome.
So, I'm sitting around and have time to comment... :)
There are a couple of issues here that most of us know about, but I'll
enumerate a few:
The lexdict driver in the engine is designed as a quick key lookup
datastore. Some processing is done to find the best match for a key.
All numeric keys are zero-padded, so
123 becomes 00123
This helps match Strong's numbers.
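Here is a minimal sketch of the zero-padding described above. It assumes a fixed width of five digits (inferred from the 00123 example) and pads only purely numeric keys. `padNumericKey` is a hypothetical name for illustration, not a function in the SWORD engine.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not the SWORD API): zero-pad a purely numeric key
// to a fixed width, so "123" becomes "00123". The default width of 5 is an
// assumption taken from the 00123 example.
std::string padNumericKey(const std::string &key, std::size_t width = 5) {
    if (key.empty()) return key;
    for (char c : key) {
        // Leave any key containing a non-digit unchanged.
        if (!std::isdigit(static_cast<unsigned char>(c))) return key;
    }
    if (key.size() >= width) return key;
    return std::string(width - key.size(), '0') + key;
}
```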
Other keys are converted to upper case so that matching is case
insensitive. We could also do other things, such as stripping accents and
diacritics, but we should probably add something to the .conf file to
allow different key massaging. Just letting the module creator massage
the keys beforehand isn't a great solution, because the same massaging
needs to happen to the user's input when they look up a key.
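The requirement that user input receive the same massaging as the stored keys can be sketched as follows. `massageKey`, `buildIndex` and `lookup` are hypothetical names, not SWORD engine calls. Only ASCII upper-casing is shown; a real module would also have to handle non-ASCII text such as Greek.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical key massaging (not the SWORD API): upper-case ASCII letters.
// Any further rule (e.g. stripping accents) would go here so that it is
// applied identically at build time and at lookup time.
std::string massageKey(std::string key) {
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return key;
}

// Build time: entries are stored under their massaged keys.
std::map<std::string, std::string> buildIndex(
        const std::map<std::string, std::string> &rawEntries) {
    std::map<std::string, std::string> index;
    for (const auto &entry : rawEntries) index[massageKey(entry.first)] = entry.second;
    return index;
}

// Lookup time: the user's input goes through the same massaging;
// otherwise "grace" would never match the stored key "GRACE".
const std::string *lookup(const std::map<std::string, std::string> &index,
                          const std::string &userInput) {
    auto it = index.find(massageKey(userInput));
    return it == index.end() ? nullptr : &it->second;
}
```

If `lookup` skipped `massageKey`, it would fail in exactly the way that massaging keys only at module-creation time fails.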
In most cases, massaging keys helps find the best match, and presenting
the surrounding entries sometimes helps users choose if they didn't like
the resolution. Let's not think of the surrounding keys as the 'order of
the lexicon'.
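The following sketches best-match lookup with surrounding entries. It assumes the keys are already massaged and stored in a sorted `std::map`. `surroundingKeys` and its `radius` parameter are hypothetical. The neighbours come out in the sorted order of the massaged keys, which is why they should not be read as the order of the lexicon.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: find the best match for an already-massaged key and
// return it with up to `radius` neighbours on each side, in sorted key order.
std::vector<std::string> surroundingKeys(const std::map<std::string, std::string> &index,
                                         const std::string &key, int radius) {
    std::vector<std::string> result;
    if (index.empty()) return result;

    // Best match: first entry not less than the key; if the key sorts
    // after every entry, fall back to the last one.
    auto best = index.lower_bound(key);
    if (best == index.end()) best = std::prev(index.end());

    // Step back and forward up to `radius` entries, stopping at the ends.
    auto first = best;
    for (int i = 0; i < radius && first != index.begin(); ++i) --first;
    auto last = best;
    for (int i = 0; i < radius && std::next(last) != index.end(); ++i) ++last;

    for (auto it = first; ; ++it) {
        result.push_back(it->first);
        if (it == last) break;
    }
    return result;
}
```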
Presenting a lexicon as an ordered book is a different function, and we
should not attempt to use the lexdict driver to support it.
If there is ever a time when we want to present a lexicon to an end user
as an ordered book, we should consider possibly adding a genbook index
to the same module for that purpose.
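Here is one possible way to keep the two functions apart. It assumes a module could carry both a lexdict-style lookup index and a genbook-style ordered index over the same entries. `LexiconModule` and `addEntry` are hypothetical; this is not an existing SWORD module format.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical layout: two indexes over one set of entries.
struct LexiconModule {
    // Lexdict-style index: massaged key -> position in `texts`.
    // (In this sketch, headwords that massage to the same key overwrite each other.)
    std::map<std::string, std::size_t> lookup;
    // Genbook-style index: headwords in the order of the printed work.
    std::vector<std::string> bookOrder;
    std::vector<std::string> texts;

    void addEntry(const std::string &headword, const std::string &text) {
        std::string massaged;
        for (unsigned char c : headword) massaged += static_cast<char>(std::toupper(c));
        lookup[massaged] = texts.size();
        bookOrder.push_back(headword);  // original case and sequence preserved
        texts.push_back(text);
    }
};
```

Because `bookOrder` keeps the original headwords and their sequence, it would also preserve the case and ordering information raised in the quoted message, while `lookup` remains free to massage keys.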
I may be going out on a limb here, but most of the time, I don't think a
user would want to see a dictionary presented as a book. I have come
across one exception, and that is our Hesychius module, but it is really
meant to be an ancient work studied as such. I think we could create a
lexdict module from the Hesychius data and it would be cool to do
lookups and present the data from it, like we do with other SWORD
lexdicts, but the primary purpose of the module is to make the ancient
work available to scholars in its original form; the original ancient
work just happens to be a synonym lexicon.
Searching has a similar issue. StripFilters are used to massage the text
into a searchable form. Our working assumption has been that user input
is sent through the module's same StripFilter set. We don't enforce this
in the SWModule::search method because there may be times when it isn't
desirable, so it has been up to the frontend, if it finds it useful, to
call: module.StripText(searchTerm); before calling
module.search(searchTerm)
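Below is a minimal sketch of this frontend-side convention. A hypothetical `stripText` stands in for a module's StripFilter chain; here it only drops `<...>` markup and lower-cases ASCII. A hypothetical `search` runs over plain strings. Neither is one of the SWORD `SWModule` methods.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a StripFilter chain: remove <...> markup and
// lower-case the remaining characters.
std::string stripText(const std::string &in) {
    std::string out;
    bool inTag = false;
    for (unsigned char c : in) {
        if (c == '<') inTag = true;
        else if (c == '>') inTag = false;
        else if (!inTag) out += static_cast<char>(std::tolower(c));
    }
    return out;
}

// Hypothetical search: stored text is always stripped, but whether the
// search term is stripped is left to the caller (the frontend's choice).
std::vector<std::size_t> search(const std::vector<std::string> &entries,
                                const std::string &term) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < entries.size(); ++i)
        if (stripText(entries[i]).find(term) != std::string::npos) hits.push_back(i);
    return hits;
}
```

Searching for `Word` finds nothing, because the stored text was stripped and lower-cased but the term was not. Passing the term through `stripText` first finds entry 0. This sketch leaves that call to the caller, which mirrors the design choice described here.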
Maybe we should enforce some massaging logic in the engine instead of
leaving the choice to the frontend, but I'd rather leave the consumer of
the API free to make that decision.
Just some comments...
-Troy.
Eeli Kaikkonen wrote:
> DM Smith wrote:
>
>>
>> The problem is a bit deeper than that.
>>
>
> Yes, and there are some other things I want to bring up again lest they
> be forgotten.
>
> 1) The case may convey information, e.g. Liddell & Scott uses capitals
> for root words.
> 2) L&S uses a different ordering for iota subscriptum/accents/spiritus
> than BAGD, at least as far as I can remember.
> 3) Exact ordering may convey information, e.g. L&S adds the word "hence"
> at the end of some entries because the next entry depends on the previous.
>
> The information of 1) and 3) can be represented in some other way, but
> even if it's taken care of, the subjective quality suffers if lexicons
> don't follow the originals.
>
> --Eeli Kaikkonen