<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=UTF-8" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
<font size="+1"><font face="Arial Unicode MS">Ben, <br>
<br>
Thanks for the explanation. Setting up dictionaries to use key retrieval
from an uncompressed file with one key per line (ordered as the module
creator orders it) makes the most sense to me. If that helps increase
efficiency and preserves the order of dictionary entries, then that is
what we want.<br>
<br>
I agree that sorted order is less important in BPBible because of the
lookup feature, but that breaks down when you need to browse further
within a range of entries. Furthermore, the example of "'tis" suggests
that, even in English, code point ordering disturbs the natural order of
the dictionary, making it harder to browse for the right entry. Unless
you type in the apostrophe, you won't find "'tis", because it will not
be near "t" but at the top of the dictionary, which is very far away. In
BibleDesktop, which doesn't have the lookup feature yet, you have to
browse for any dictionary entry (except Strong's, where the key is a
number and therefore in printed order!), so the ordering really does
matter. Also, frankly, a dictionary out of
alphabetical order just looks silly. </font></font><font size="+1"><font
face="Arial Unicode MS">In Vietnamese it's chaotic when dictionaries
are ordered by code point. </font></font><font size="+1"><font
face="Arial Unicode MS">Who ever heard of a dictionary where "d" comes
after "z"? That's what happens in Vietnamese.<br>
<br>
Daniel<br>
</font></font><br>
Ben Morgan wrote:
<blockquote
cite="mid:bbb201fa0809171932o1a701165vf98f90f9ac82177f@mail.gmail.com"
type="cite">
<pre wrap="">Hi Daniel,
Code points are not the only way to sort it.
However, there does need to be a comparison function defined, which will
compare two words and give which is bigger.
This needs to be used consistently, from module creation to frontend. There
could be a library of defined comparators provided by SWORD - but you would
need one for each sort order you wanted (which approaches one per language).
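
A minimal sketch of one such comparator, written in Python as a sort key.
The name vi_sort_key, the alphabet string, and the way tone marks and
punctuation are ignored are assumptions for illustration only; nothing
here is an existing SWORD API.

```python
import unicodedata

# Vietnamese alphabet order (assumed for this sketch), forced to NFC so
# every letter is a single precomposed character.
VI_ALPHABET = unicodedata.normalize("NFC", "aăâbcdđeêghiklmnoôơpqrstuưvxy")
VI_RANK = {ch: rank for rank, ch in enumerate(VI_ALPHABET)}

# Combining tone marks: grave, acute, tilde, hook above, dot below.
TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}

def vi_sort_key(word):
    """Hypothetical sort key: rank letters by alphabet position."""
    text = unicodedata.normalize("NFD", word.lower())
    # Drop tone marks but keep the marks that form distinct letters.
    text = "".join(ch for ch in text if ch not in TONE_MARKS)
    text = unicodedata.normalize("NFC", text)
    # Ignore punctuation such as a leading apostrophe.
    letters = [ch for ch in text if ch.isalnum()]
    # Letters outside the alphabet (f, j, w, z, digits) sort after it.
    return [VI_RANK.get(ch, len(VI_RANK) + ord(ch)) for ch in letters]

words = ["zebra", "đi", "'tis", "do", "the"]
ordered = sorted(words, key=vi_sort_key)
# ordered is ["do", "đi", "the", "'tis", "zebra"]
```

The same function would have to be applied both when the module is built
and when a frontend searches it, which is the consistency requirement
described above.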
Personally, I don't find that sorted order is particularly important in
dictionaries - I would type in a word, and then hope that if it is a
different form of the word it would be relatively close. Some frontends may
not give the ability to type in words, though.
But I haven't used dictionaries in other languages, so it may be different
for them - especially once diacritics are involved.
The reasons why dictionaries are different from bibles are:
1) Bibles have a known structure, which is hardcoded in the key type (this
is going to be able to change soon, for alternate versification, though -
probably leading to less efficient modules)
2) Dictionaries can be much, much larger - Webster's is a 14Mb download
compressed, as compared to the WEB's ~1.5Mb
That's not to say the dictionaries can't be done more efficiently than they
are currently. Looking at the code, they could be quicker for the (common?)
case of incrementing a module. Currently they do a binary search for every
increment.
Further, they could probably be optimized for key retrieval - which is the
really important thing here. (For example by storing the keys separately,
uncompressed, 1 key per line)
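
A rough sketch of what reading such a key file could look like, in
Python. The function name load_keys, the file name keys.txt and the
sample entries are all made up for illustration; the real module format
is not defined here.

```python
import os
import tempfile

def load_keys(path):
    """Read one key per line, keeping the order the module creator chose."""
    with open(path, encoding="utf-8") as handle:
        keys = [line.rstrip("\n") for line in handle]
    # Map each key to its position so exact lookups need no sorted order.
    positions = {key: index for index, key in enumerate(keys)}
    return keys, positions

# Demo with a throwaway key file (hypothetical name and entries).
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "keys.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("do\nđi\nthe\n'tis\n")
    keys, positions = load_keys(path)
```

The list keeps the creator's order for browsing, while the dictionary
gives direct access to the position of an exact key.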
God Bless,
Ben
-------------------------------------------------------------------------------------------
The Lord is not slow to fulfill his promise as some count slowness,
but is patient toward you, not wishing that any should perish,
but that all should reach repentance.
2 Peter 3:9 (ESV)
On Thu, Sep 18, 2008 at 11:21 AM, Daniel Owens <a class="moz-txt-link-rfc2396E" href="mailto:dhowens@pmbx.net">&lt;dhowens@pmbx.net&gt;</a> wrote:
</pre>
<blockquote type="cite">
<pre wrap=""> Is code point order the ONLY way to sort dictionary entries? Surely there
is a solution which would retain the printed or intended order of dictionary
entries without giving up lots of efficiency. If not, I think users would
prefer a correctly ordered but slower dictionary to one which is fast but
jumbled up.
At the very least, even if dictionaries aren't sorted by the printed order,
they should AT LEAST be in alphabetical order. To me that is a
non-negotiable for a dictionary--people depend on dictionaries being in the
right order, and code point order disturbs that for some languages. Here are
a couple of ideas:
- Could a configuration file of some sort be created to define a
sorted order for a given language that would actually be in alphabetical
order?
- Could a dictionary index be created to handle large dictionaries
which allows for the retention of the correct order of entries (whether that
is the printed order or alphabetical order)?
- Bibles are not ordered by code point, and we are able to search them
fairly quickly. Do dictionaries need to be compiled in a fashion similar to
Bibles?
As it stands, dictionaries are NOT displayed in alphabetical order (at
least not Vietnamese, and apparently Farsi), which at best looks silly to
the user and at worst means you have to manually hunt around to find the
right entry, making a Genbook more efficient for the user in the end. But
then you lose the dictionary lookup feature.
Daniel
Ben Morgan wrote:
The issue with ordering as I understand it is that if it is in (some form
of) sorted order, you can use binary search to find entries.
If you want order retained, it is best to use a genbook - but it won't be as
efficient, and may not have as good UI support.
With huge English dictionaries (like Webster's, for instance) this becomes
very important.
From BPBible's perspective, dictionary handling is done as follows:
1. Read the index of the dictionary and divide by 4 or 6 to get the length
(depending on the driver)
2. Set the virtual list length to the dictionary length
3. When any item is displayed in the virtual list, it retrieves it from the
module.
4. When the user starts typing in the text box above, it does a binary
search to find which item to display.
Step 4 is already quite slow on big dictionaries - with unsorted keys it
would be quite a lot slower, I imagine.
All the keys from the module would have to be read in, which takes a while.
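
The binary search in step 4 can be sketched in Python as follows. This
is an illustration only, not BPBible's actual code: the function names
are hypothetical, and str.casefold stands in for whatever comparator the
module was sorted with.

```python
import bisect

def build_search_index(keys, sort_key=str.casefold):
    """Precompute sort keys; `keys` must already be sorted by `sort_key`."""
    return [sort_key(key) for key in keys]

def first_match(search_index, typed, sort_key=str.casefold):
    """Binary search for the first entry not ordered before `typed`."""
    return bisect.bisect_left(search_index, sort_key(typed))

# Sample entries, already in case-insensitive order.
keys = ["apple", "Banana", "cherry", "date"]
index = build_search_index(keys)
position = first_match(index, "ch")
# position is 2, the index of "cherry"
```

The search only gives meaningful results when the keys are stored in the
order defined by the same sort_key that is used for the lookup.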
God Bless,
Ben
On Thu, Sep 18, 2008 at 12:43 AM, Daniel Owens <a class="moz-txt-link-rfc2396E" href="mailto:dhowens@pmbx.net">&lt;dhowens@pmbx.net&gt;</a> wrote:
mention that byte ordering does some strange things to Vietnamese
dictionaries. The Vietnamese script is a Latin script, but because it uses
some odd characters, code point ordering results in illogical and
non-alphabetical entry ordering. For example, the "d" with a line through it
(đ) gets relegated to near the end of the dictionary instead of after the
regular "d". Likewise, anything with an apostrophe at the beginning of a word
or phrase gets moved to the top of the list regardless of the first letter
(such as 'tis). IIRC, the general opinion was to let the module creator worry
about the ordering, and I support that. Otherwise you get some very strange
dictionary behavior.
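
A small Python illustration of the effect described above; the word list
is made up for the example. Python compares strings by code point, so a
plain sort reproduces the problem:

```python
words = ["do", "the", "zebra", "đi", "'tis"]
by_code_point = sorted(words)
# The apostrophe (U+0027) sorts before every letter, and đ (U+0111)
# sorts after z (U+007A), so the result is:
# ["'tis", "do", "the", "zebra", "đi"]
```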
------------------------------
_______________________________________________
sword-devel mailing list: <a class="moz-txt-link-abbreviated" href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a>
<a class="moz-txt-link-freetext" href="http://www.crosswire.org/mailman/listinfo/sword-devel">http://www.crosswire.org/mailman/listinfo/sword-devel</a>
Instructions to unsubscribe/change your settings at above page
--
PMBX license 1502
</pre>
</blockquote>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
PMBX license 1502
</pre>
</body>
</html>