[sword-devel] Dictionary ordering

Ben Morgan benpmorgan at gmail.com
Wed Sep 17 19:32:32 MST 2008


Hi Daniel,

Code points are not the only way to sort entries.
However, a comparison function does need to be defined, one that compares
two words and reports which sorts later.
It must be used consistently, from module creation through to the frontend.
SWORD could provide a library of predefined comparators - but you would
need one for each sort order you wanted (which approaches one per language).
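
To make that concrete, here is a minimal sketch of what one such comparator
might look like. It is not SWORD code: the names VI_ALPHABET, RANK, sort_key
and compare are all made up for illustration, and the alphabet table only
covers Vietnamese base letters (tone marks are ignored for brevity).

```python
import unicodedata
from functools import cmp_to_key

# Hypothetical alphabet table: Vietnamese base letters in dictionary order.
# Tone marks are not handled, to keep the sketch short.
VI_ALPHABET = unicodedata.normalize("NFC", "aăâbcdđeêghiklmnoôơpqrstuưvxy")
RANK = {ch: i for i, ch in enumerate(VI_ALPHABET)}


def sort_key(word):
    """Map a word to a list of (alphabet rank, character) pairs."""
    normalized = unicodedata.normalize("NFC", word).lower()
    # Characters outside the table sort after all known letters, by code point.
    return [(RANK.get(ch, len(RANK)), ch) for ch in normalized]


def compare(a, b):
    """Return -1, 0 or 1 depending on whether a sorts before, with or after b."""
    ka, kb = sort_key(a), sort_key(b)
    return (ka > kb) - (ka < kb)


words = ["em", "đi", "da", "do"]
by_code_point = sorted(words)                         # đ (U+0111) lands after e
by_alphabet = sorted(words, key=cmp_to_key(compare))  # đ follows d
print(by_code_point)
print(by_alphabet)
```

The same compare function would have to be used both when the module is
built and when a frontend searches it, otherwise a binary search over the
keys would no longer be valid.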

Personally, I don't find that sorted order is particularly important in
dictionaries - I would type in a word, and then hope that if it is a
different form of the word it would be relatively close. Some frontends may
not give the ability to type in words, though.

But I haven't used dictionaries in other languages, so it may be different
for them - especially once diacritics are involved.

The reasons why dictionaries are different from Bibles are:
1) Bibles have a known structure, which is hardcoded in the key type (this
is going to be able to change soon, for alternate versification, though -
probably leading to less efficient modules)
2) Dictionaries can be much, much larger - Webster's is a 14 MB download
compressed, as compared to the WEB's ~1.5 MB

That's not to say the dictionaries can't be done more efficiently than they
are currently. Looking at the code, they could be quicker for the (common?)
case of incrementing to the next entry in a module. Currently they do a
binary search for every increment.
Further, they could probably be optimized for key retrieval - which is the
really important thing here (for example, by storing the keys separately,
uncompressed, one key per line).
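
A rough sketch of that idea, under some assumptions: the KeyIndex class and
its method names are hypothetical, the key file is assumed to be non-empty
and already sorted, and plain code point order is used so that it matches
Python's own string comparison. It is not how the current drivers work.

```python
import bisect
import io


class KeyIndex:
    """Hypothetical key list read from a plain file with one key per line."""

    def __init__(self, key_file):
        # Assumes the file is non-empty and already sorted in the order used
        # for lookups (plain code point order here).
        self.keys = [line.rstrip("\n") for line in key_file]

    def find(self, word):
        """Binary search: index of the first key >= word, clamped to the end."""
        pos = bisect.bisect_left(self.keys, word)
        return min(pos, len(self.keys) - 1)

    def increment(self, pos):
        """Step to the next entry by index, with no search at all."""
        return min(pos + 1, len(self.keys) - 1)


# A StringIO stands in for the separate key file.
index = KeyIndex(io.StringIO("abide\nable\ngrace\nmercy\n"))
pos = index.find("gr")  # nearest entry at or after what was typed
print(index.keys[pos])
print(index.keys[index.increment(pos)])
```

The search is only needed when the user types something; moving to the next
entry is just an index bump.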

God Bless,
Ben
-------------------------------------------------------------------------------------------
The Lord is not slow to fulfill his promise as some count slowness,
but is patient toward you, not wishing that any should perish,
but that all should reach repentance.
2 Peter 3:9 (ESV)


On Thu, Sep 18, 2008 at 11:21 AM, Daniel Owens <dhowens at pmbx.net> wrote:

>  Is code point order the ONLY way to sort dictionary entries? Surely there
> is a solution which would retain the printed or intended order of dictionary
> entries without giving up lots of efficiency. If not, I think users would
> prefer a correctly ordered but slower dictionary to one which is fast but
> jumbled up.
>
> At the very least, even if dictionaries aren't sorted by the printed order,
> they should AT LEAST be in alphabetical order. To me that is a
> non-negotiable for a dictionary--people depend on dictionaries being in the
> right order, and code point order disturbs that for some languages. Here are
> a couple of ideas:
>     - Could a configuration file of some sort be created to define a
> sorted order for a given language that would actually be in alphabetical
> order?
>     - Could a dictionary index be created to handle large dictionaries
> which allows for the retention of the correct order of entries (whether that
> is the printed order or alphabetical order)?
>     - Bibles are not ordered by code point, and we are able to search them
> fairly quickly. Do dictionaries need to be compiled in a fashion similar to
> Bibles?
>
> As it stands, dictionaries are NOT displayed in alphabetical order (at
> least not Vietnamese, and apparently Farsi), which at best looks silly to
> the user and at worst means you have to manually hunt around to find the
> right entry, making a Genbook more efficient for the user in the end. But
> then you lose the dictionary lookup feature.
>
> Daniel
>
> Ben Morgan wrote:
>
> The issue with ordering as I understand it is that if it is in (some form
> of) sorted order, you can use binary search to find entries.
> If you want order retained, it is best to use a genbook - but it won't be as
> efficient, and may not have as good UI support.
> With huge English dictionaries (like Webster's, for instance) this becomes
> very important.
>
> From BPBible's perspective, dictionary handling is done as follows:
> 1. Read the index of the dictionary and divide by 4 or 6 to get the length
> (depending on the driver)
> 2. Set the virtual list length to the dictionary length
> 3. When any item is displayed in the virtual list, it retrieves it from the
> module.
> 4. When the user starts typing in the text box above, it does a binary
> search to find which item to display.
>
> Step 4 is already quite slow on big dictionaries - with unsorted keys it
> would be quite a lot slower, I imagine, because all the keys from the
> module would have to be read in, which takes a while.
>
> God Bless,
> Ben
>
>
> On Thu, Sep 18, 2008 at 12:43 AM, Daniel Owens <dhowens at pmbx.net> wrote:
>
>
>
>  mention that byte ordering does some strange things to Vietnamese
> dictionaries. The Vietnamese script is a Latin script, but because it uses
> some unusual characters, code point ordering results in illogical and
> non-alphabetical entry ordering. For example, the "d" with a line through it
> (đ) gets relegated to near the end of the dictionary instead of following the
> regular "d", and anything with an apostrophe at the beginning of a word or
> phrase (such as 'tis) gets moved to the top of the list regardless of its
> first letter. I support what was, IIRC, the general opinion: let the module
> creator worry about the ordering. Otherwise you get some very strange
> dictionary behavior.
>
>
>
> ------------------------------
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
>
>
> --
> PMBX license 1502