[sword-devel] TEI formatting, duplicated key (BDB Glosses)

Jonathan Morgan jonmmorgan at gmail.com
Mon Apr 30 07:36:31 MST 2012


Hi DM,

On Tue, May 1, 2012 at 12:00 AM, DM Smith <dmsmith at crosswire.org> wrote:

>
> On 04/30/2012 09:37 AM, Daniel Owens wrote:
>
>>
>>
>> On 04/30/2012 06:54 AM, Chris Little wrote:
>>
>>> On 4/30/2012 4:39 AM, David Troidl wrote:
>>>
>>>> Hi Chris,
>>>>
>>>> I'm certainly no expert on your TEI dictionaries, but wouldn't it make
>>>> sense to have the first key be one that would sort properly, and present
>>>> the dictionary in true alphabetical order? I'm thinking of Middle
>>>> Liddell, as well as the Hebrew. This key wouldn't even necessarily have
>>>> to be shown to the user. The second key, the title, could then maintain
>>>> the proper accents for display, without hindering sorting, searching or
>>>> navigation.
>>>>
>>>
>>> I confess, I don't understand what you're proposing this as an
>>> alternative to.
>>>
>>> In the example Karl cites, there's just one actual key per entry. It is
>>> an uppercased version of the entryFree's n attribute. This is the key that
>>> is sorted.
>>>
>>> The un-uppercased version from the n attribute is being rendered as part
>>> of the entry text via the TEI filters. This is the part I'm proposing we
>>> retain, but render somewhere else, e.g. right-justified at the bottom of
>>> the entry.
>>>
>>> We also render all the text of the entry, which in these cases includes
>>> the text from a title element.
>>>
>>> I don't know what 'true alphabetical order' means, but if you mean
>>> localized sort order, it's not possible with the current implementation of
>>> this module type.
>>>
>>> --Chris
>>>
>>>
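For illustration, a minimal Python sketch of the key handling Chris describes. It assumes a simplified, namespace-free TEI fragment; the sample markup and the `module_entries` helper are hypothetical, not the real BDB or Middle Liddell source or the SWORD importer. Each entryFree's n attribute is upper-cased to form the single sorted key, and the original n value is kept separately so a front end could render it somewhere else, such as right-justified at the foot of the entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI fragment; real modules carry much more markup.
TEI_SAMPLE = """
<body>
  <entryFree n="Aaron's Rod"><title>Aaron's Rod</title> The rod that budded.</entryFree>
  <entryFree n="Abba"><title>Abba</title> Aramaic for father.</entryFree>
</body>
"""


def module_entries(tei_xml: str):
    """Yield (key, label, text) for each entryFree element.

    key   -- upper-cased n attribute: the one actual key, used for sorting
    label -- the original n attribute, kept for display elsewhere
    text  -- all text of the entry, including any <title> text
    """
    root = ET.fromstring(tei_xml)
    for entry in root.iter("entryFree"):
        label = entry.get("n", "")
        # itertext() walks the element and its children, so <title> is included.
        text = " ".join("".join(entry.itertext()).split())
        yield label.upper(), label, text


entries = sorted(module_entries(TEI_SAMPLE))
for key, label, text in entries:
    print(key, "|", label, "|", text)
```

Since the title text already appears in `text`, prepending `label` to the rendered entry as well shows the headword twice.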
>> I think David's concern is something that needs to be dealt with. A
>> number of possibilities could be pursued, some of them together:
>>
>>    1. The current implementation sorts by Unicode code points. This
>> works particularly well with numeric keys. A quick solution for languages
>> for which such sorting is not alphabetical would be to follow David's
>> suggestion of using keys that the user does not even see. This has the
>> advantage of providing a workable solution right away, but there are some
>> problems with this. First, because the current implementation does not
>> actually hide keys, users could start citing them, and we could end up
>> with a new de facto standard like Strong's numbers. That could be solved
>> by making the keys so obscure that no one would remember them.
>> Second, any future, more robust solution would require reworking all
>> modules keyed to it. I have toyed with this solution, and it might be the
>> pragmatic way forward, but it is not ideal.
>>
>>    2. A localized sort order, which I think is what David means by
>> true alphabetical order, would be a better long-term solution.
>>
>>    3. In addition, using genbooks for lexica would work for lexica that
>> are sorted by root, with subentries nested in a hierarchy, just like in the
>> Hesychius module and BDB. I have been working with Troy on this.
>> Unfortunately, front-ends do not recognize the Feature=HebrewDef option in
>> the conf file or allow genbooks to serve as lexica. I can send anyone an example
>> lexicon if you are interested in working on this. In that case, instead of
>> @n as the key, */x-entry/@osisID would be the key.
>>
>> Any thoughts?
>>
>
> I think there is a problem with the sorting of entries in dictionaries
> where the keys are not ascii. I don't remember the details, but I seem to
> remember it having been discussed here.
>
> For JSword, we'll be building a Lucene search index for the key, the term
> and the whole entry. A user lookup will be normalized and the search will
> return the key with which lookup will proceed internally as it does today.
> ICU provides the ability to create a localized sort key (not at all
> suitable for display) that can be used to sort dictionary entries for the
> end user's locale. I'm thinking that for TEI dictionaries the representation
> of the key should not be shown at all.
>
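To make the sorting problem concrete, a minimal Python sketch (standard library only) comparing plain code-point order with the kind of hidden sort key David and Daniel describe. The `sort_key` function is a hypothetical, crude stand-in (decompose with NFD, drop combining marks, case-fold); it is not what SWORD, JSword or ICU compute, and a real localized collation such as an ICU sort key handles far more cases.

```python
import unicodedata


def sort_key(headword: str) -> str:
    """Hypothetical hidden sort key: drop accents/breathings, then case-fold.

    A crude stand-in for a real collation (e.g. an ICU sort key); it is not
    what SWORD, JSword or ICU actually compute.
    """
    decomposed = unicodedata.normalize("NFD", headword)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


# Greek headwords as a lexicon would display them, normalized to NFC.
headwords = [unicodedata.normalize("NFC", w)
             for w in ["ἀγάπη", "βίβλος", "ἄγγελος", "αἷμα"]]

# Today's behaviour: plain Unicode code-point order.
codepoint_order = sorted(headwords)

# Proposed: sort by a hidden key, but keep displaying the accented headword.
hidden_key_order = sorted(headwords, key=sort_key)

for word in hidden_key_order:
    print(sort_key(word), word)
```

In code-point order βίβλος falls between αἷμα and ἀγάπη, because precomposed letters with breathing marks sit in the Greek Extended block (from U+1F00), above the basic Greek letters. Sorting by the hidden key gives ἀγάπη, ἄγγελος, αἷμα, βίβλος while the displayed form keeps its accents.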

BPBible, and I believe some other frontends as well, use binary search on
the original module order to locate a key in a virtual list.  This provides
very noticeable speedups on large dictionaries like ISBE.  I think this
would require the module to be created in localised key order in the first
place if we really wanted to order by that, rather than just having a
lookup, which as I understand it would only be done when actually looking
for a key?  It also really means that a module can be sorted in one and
only one way.

Then again, I'm not even sure we can guarantee any kind of binary search on
localised keys.
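A minimal Python sketch of the constraint on binary search, under the simplifying assumption that the sort key is a plain string compared the same way at build time and at lookup time. `fold` and `find_entry` are hypothetical helpers for illustration, not BPBible or SWORD APIs, and `fold` is only a crude accent-stripping stand-in for a real collation.

```python
import bisect
import unicodedata
from typing import List, Optional


def fold(key: str) -> str:
    """Hypothetical comparison key: strip combining marks, then case-fold."""
    decomposed = unicodedata.normalize("NFD", key)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def find_entry(module_keys: List[str], sort_keys: List[str], query: str) -> Optional[str]:
    """Return the first module key whose sort key is >= fold(query).

    sort_keys[i] must be fold(module_keys[i]) and must already be in
    ascending order; binary search silently misbehaves otherwise.
    """
    i = bisect.bisect_left(sort_keys, fold(query))
    return module_keys[i] if i < len(module_keys) else None


headwords = [unicodedata.normalize("NFC", w)
             for w in ["ἀγάπη", "βίβλος", "ἄγγελος", "αἷμα"]]

# Module written in code-point order, then searched with folded keys.
cp_module = sorted(headwords)
cp_keys = [fold(k) for k in cp_module]

# Module written in folded order from the start.
folded_module = sorted(headwords, key=fold)
folded_keys = [fold(k) for k in folded_module]

print(cp_keys == sorted(cp_keys))                        # False: precondition broken
print(find_entry(cp_module, cp_keys, "αιμα"))            # None, though αἷμα is present
print(find_entry(folded_module, folded_keys, "αιμα"))    # finds αἷμα
```

The first search misses αἷμα even though it is the very first entry of the code-point-ordered module. A binary search is only reliable in the order the module was actually written in, which is why a module can effectively be sorted one way only.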

A related issue for English dictionaries is allowing mixed-case dictionary
keys (and I think I have heard similar comments about Greek and maybe other
languages).  At the moment I think SWORD requires dictionary keys to be
upper-case to ensure that they sort correctly, but really "Aaron's Rod"
looks much better than "AARON'S ROD".  BPBible now attempts to
automatically and heuristically turn keys to mixed case, which I think
looks a lot better, but ideally this would be done in the same way as for
other languages: separating sort order from codepoint order in some way.
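A minimal Python sketch of separating the displayed key from the sort key for English entries: keep the mixed-case form for display and derive the sort key by upper-casing it. `upper_sort_key` is a hypothetical helper, not a SWORD or BPBible function, and upper-casing only addresses the case problem, not accents or other languages' alphabets.

```python
def upper_sort_key(display_key: str) -> str:
    """Hypothetical: derive the stored sort key from the mixed-case display key.

    Upper-casing reproduces the upper-case keys modules use today, so the
    ordering stays the same while users see e.g. "Aaron's Rod".
    """
    return display_key.upper()


display_keys = ["Mount Zion", "Aaron's Rod", "Mount of Olives"]

# Raw code-point order puts "Mount Zion" before "Mount of Olives",
# because 'Z' (U+005A) is lower than 'o' (U+006F).
by_codepoint = sorted(display_keys)

# Sorting on the upper-cased key restores dictionary order.
by_sort_key = sorted(display_keys, key=upper_sort_key)

# Case-insensitive lookup: map each sort key back to its display form.
lookup = {upper_sort_key(k): k for k in display_keys}

print(by_sort_key)
print(lookup[upper_sort_key("mount of olives")])
```

The user never sees "MOUNT OF OLIVES", yet entries are still ordered and looked up by the upper-case form.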

Jon