[sword-devel] dictionary ordering revisited

Troy A. Griffitts scribe at crosswire.org
Thu Mar 19 14:30:39 MST 2009


Well, just to give an update.  I'm stuck in Rome.

So, I'm sitting around and have time to comment... :)

There are a couple of issues here that most of us know about, but I'll 
enumerate a few:

The lexdict driver in the engine is designed as a quick key lookup 
datastore.  There is some processing done to try to do best matches.

All numeric keys are zero-padded, so

123 gets changed to 00123

This helps match Strong's numbers.

Other keys get uppercased to help with case-insensitive matching.  We 
can maybe do other things like strip accents and diacritics, but we 
should probably add something to the .conf file to allow different key 
massaging.  Just letting the module creator massage the keys beforehand 
isn't a great solution because the same massaging needs to happen on the 
user input when they try to look up a key.
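
To make the "same massaging on both sides" point concrete, here is a 
minimal Python sketch.  It is not the lexdict driver's code: the name 
massage_key, the strip_accents option and the PAD_WIDTH constant are 
all hypothetical, and the width of 5 is only inferred from the 
123 -> 00123 example above.

```python
import unicodedata

PAD_WIDTH = 5  # assumption: taken from the 123 -> 00123 example


def massage_key(key, strip_accents=False):
    """Normalize a lexicon key; meant for stored keys AND user input."""
    key = key.strip()
    if key.isascii() and key.isdigit():
        # Purely numeric keys (e.g. Strong's numbers) are zero-padded.
        return key.zfill(PAD_WIDTH)
    if strip_accents:
        # Hypothetical option: decompose, then drop combining marks.
        decomposed = unicodedata.normalize("NFD", key)
        key = "".join(ch for ch in decomposed
                      if unicodedata.category(ch) != "Mn")
    # Everything else is uppercased for case-insensitive matching.
    return key.upper()
```

The important part is that one function is used at module creation time 
and again at lookup time, so massage_key("123") and a stored key built 
from "123" always end up identical.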

Massaging keys is beneficial, in most cases, for looking up the best 
match, and presenting surrounding entries sometimes helps the user 
choose if they didn't like the resolution.  Let's not think of the 
surrounding keys as the 'order of the lexicon'.
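
As a rough illustration of "best match plus surrounding entries", the 
sketch below does a binary search over an already-massaged, sorted key 
list.  This is an assumed approach for illustration only, not a 
description of how the lexdict driver resolves keys; best_match and its 
context parameter are hypothetical names.

```python
import bisect


def best_match(sorted_keys, wanted, context=1):
    """Return (resolved_key, neighbours) from a sorted list of keys.

    Resolves to the first key >= wanted, or to the last key if wanted
    sorts after everything.  neighbours includes the resolved key.
    """
    if not sorted_keys:
        raise ValueError("empty key index")
    i = bisect.bisect_left(sorted_keys, wanted)
    if i == len(sorted_keys):
        i -= 1  # wanted sorts past the end; fall back to the last key
    lo = max(0, i - context)
    hi = min(len(sorted_keys), i + context + 1)
    return sorted_keys[i], sorted_keys[lo:hi]
```

Note that the neighbours here simply follow the sort order of the 
massaged keys, which is exactly why they should not be read as the 
order of the printed lexicon.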

Presenting a lexicon as an ordered book is a different function.  We 
should not attempt to use the lexdict driver to support this.

If there is ever a time when you would want to present a lexicon as an 
ordered book to an end user, then we should consider possibly having a 
genbook index on the same module for that purpose.

I may be going out on a limb here, but most of the time, I don't think a 
user would want to see a dictionary presented as a book.  I have come 
across one exception, and that is our Hesychius module, but it is really 
meant to be an ancient work studied as such.  I think we could create a 
lexdict module from the Hesychius data and it would be cool to do 
lookups and present the data from it, like we do with other SWORD 
lexdicts, but the primary purpose for the module is to make the ancient 
work available for scholars in its original form-- the original ancient 
work just happens to be a synonym lexicon.


Searching is a similar issue.  StripFilters are used to massage the text 
to put it into searchable form.  The theory has been that user input 
should be sent through the module's same StripFilter set.  We don't 
enforce this in the SWModule::search method because there may be times 
this isn't desirable, so it has been up to the frontend, if they think 
it is useful, to call module.StripText(searchTerm) before calling 
module.search(searchTerm).
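
The idea can be sketched outside the engine in a few lines of Python.  
Everything here is a stand-in: strip_text is a hypothetical substitute 
for a module's StripFilter set (it drops tags, diacritics and case), and 
search is a toy substring search, not SWModule::search.  The strip_term 
flag models the frontend's choice.

```python
import re
import unicodedata


def strip_text(text):
    """Hypothetical stand-in for a module's StripFilter set."""
    no_tags = re.sub(r"<[^>]+>", "", text)
    decomposed = unicodedata.normalize("NFD", no_tags)
    no_marks = "".join(ch for ch in decomposed
                       if unicodedata.category(ch) != "Mn")
    return no_marks.lower()


def search(entries, term, strip_term=True):
    """Return keys whose stripped text contains the search term.

    entries maps key -> raw entry text.  The caller decides whether the
    term goes through the same stripping as the text.
    """
    needle = strip_text(term) if strip_term else term
    return [key for key, text in entries.items()
            if needle in strip_text(text)]
```

With strip_term left on, a query typed with accents or capitals still 
matches the stripped text; with it off, the caller is responsible for 
supplying a term that is already in searchable form.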

Maybe we should enforce some massaging logic in the engine instead of 
leaving it to the frontend to make the choice, but I'd rather leave the 
freedom to the consumer of the API to make the decision.

Just some comments...

	-Troy.




Eeli Kaikkonen wrote:
> DM Smith wrote:
> 
>>
>> The problem is a bit deeper than that.
>>
> 
> Yes, and there are some other things I want to bring up again lest they 
> be forgotten.
> 
> 1) The case may convey information, e.g. Liddell & Scott uses capitals 
> for root words.
> 2) L&S uses different ordering for iota subscriptum/accents/spiritus 
> than BAGD, at least as far as I can remember.
> 3) Exact ordering may convey information, e.g. L&S adds the word 
> "hence" at the end of some entries because the next entry depends on 
> the previous.
> 
> The information of 1) and 3) can be represented in some other way, but 
> even if it's taken care of, the subjective quality suffers if lexicons 
> don't follow the originals.
> 
> --Eeli Kaikkonen
> 



