<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 04/30/2012 10:36 AM, Jonathan Morgan wrote:

    <blockquote

cite="mid:CAOOKcO7bDguz5FcM-si4Ops56WGLB1fYEcLdNVuqnMp4gcO5kw@mail.gmail.com"

      type="cite">Hi DM,<br>

      <br>

      <div class="gmail_quote">On Tue, May 1, 2012 at 12:00 AM, DM Smith

        <span dir="ltr">&lt;<a moz-do-not-send="true"

            href="mailto:dmsmith@crosswire.org" target="_blank">dmsmith@crosswire.org</a>&gt;</span>

        wrote:<br>

        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">

          <div class="HOEnZb">

            <div class="h5"><br>

              On 04/30/2012 09:37 AM, Daniel Owens wrote:<br>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">

                <br>

                <br>

                On 04/30/2012 06:54 AM, Chris Little wrote:<br>

                <blockquote class="gmail_quote" style="margin:0 0 0

                  .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  On 4/30/2012 4:39 AM, David Troidl wrote:<br>

                  <blockquote class="gmail_quote" style="margin:0 0 0

                    .8ex;border-left:1px #ccc solid;padding-left:1ex">

                    Hi Chris,<br>

                    <br>

                    I'm certainly no expert on your TEI dictionaries,

                    but wouldn't it make<br>

                    sense to have the first key be one that would sort

                    properly, and present<br>

                    the dictionary in true alphabetical order? I'm

                    thinking of Middle<br>

                    Liddell, as well as the Hebrew. This key wouldn't

                    even necessarily have<br>

                    to be shown to the user. The second key, the title,

                    could then maintain<br>

                    the proper accents for display, without hindering

                    sorting, searching or<br>

                    navigation.<br>

                  </blockquote>

                  <br>

                  I confess, I don't understand what you're proposing

                  this as an alternative to.<br>

                  <br>

                  In the example Karl cites, there's just one actual key

                  per entry. It is an uppercased version of the

                  entryFree's n attribute. This is the key that is

                  sorted.<br>

                  <br>

                  The un-uppercased version from the n attribute is

                  being rendered as part of the entry text via the TEI

                  filters. This is the part I'm proposing we retain, but

                  render somewhere else, e.g. right-justified at the

                  bottom of the entry.<br>

                  <br>

                  We also render all the text of the entry, which in

                  these cases includes the text from a title element.<br>

                  <br>

                  I don't know what 'true alphabetical order' means, but

                  if you mean localized sort order, it's not possible

                  with the current implementation of this module type.<br>

                  <br>

                  --Chris<br>

                  <br>

                </blockquote>

                <br>

                I think David's concern is something that needs to be

                dealt with. A number of possibilities could be pursued,

                some of them together:<br>

                <br>

   1. The current implementation is to sort by Unicode

                code points. This works particularly well with numeric

                keys. A quick solution for languages for which such

                sorting is not alphabetical would be to follow David's

                suggestion of using keys that the user does not even

                see. This has the advantage of providing a workable

                solution right away, but there are some problems with

                this. First, we could create a new "strongs" standard

                because the current implementation does not actually

                hide keys. That could be solved by making the keys so

                obscure that no one would remember them. Second, any

                future, more robust solution would require reworking all

                modules keyed to it. I have toyed with this solution,

                and it might be the pragmatic way forward, but it is not

                ideal.<br>

                <br>

   2. A localized sort order, which I think is what

                David means by true alphabetical order, would be a

                better long-term solution.<br>

                <br>

                   3. In addition, using genbooks for lexica would work

                for lexica that are sorted by root, with subentries

                nested in a hierarchy, just like in the Hesychius module

                and BDB. I have been working with Troy on this.

                Unfortunately, front-ends do not recognize the

                Feature=HebrewDef option in the conf file and allow

                genbooks as lexica. I can send anyone an example lexicon

                if you are interested in working on this. In that case,

                instead of @n as the key, */x-entry/@osisID would be the

                key.<br>

                <br>

                Any thoughts?<br>

              </blockquote>

              <br>

            </div>

          </div>

          I think there is a problem with the sorting of entries in

          dictionaries where the keys are not ASCII. I don't remember

          the details, but I seem to remember it having been discussed

          here.<br>

          <br>

          For JSword, we'll be building a Lucene search index for the

          key, the term and the whole entry. A user lookup will be

          normalized and the search will return the key with which

          lookup will proceed internally as it does today. ICU provides

          the ability to create a localized sort key (not at all

          suitable for display) that can be used to sort dictionary

          entries for the end user's locale. I'm thinking that for TEI

          dictionaries the representation of the key should not be shown

          at all.<br>

        </blockquote>

        <div><br>

          BPBible, and I believe some other frontends as well, use binary

          search on the original module order to locate a key in a

          virtual list.  This provides very noticeable speedups on large

          dictionaries like ISBE.  I think this would require the

          original module creation to place a module in localised key

          order if we really wanted to order by that, not just have a

          lookup which as I understand it would only be done when

          actually looking for a key?  It also really means that a

          module can be sorted in one and only one way.<br>

          <br>

          Then again, I'm not even sure we can guarantee any kind of

          binary search on localised keys.<br>

          <br>

          A related issue for English dictionaries is allowing

          mixed-case dictionary keys (and I think I have heard similar

          comments about Greek and maybe other languages).  At the

          moment I think SWORD requires dictionary keys to be upper-case

          to ensure that they sort correctly, but really "Aaron's Rod"

          looks much better than "AARON'S ROD".  BPBible now attempts to

          automatically and heuristically turn keys to mixed case, which

          I think looks a lot better, but ideally this would be done in

          the same way as for other languages: separating sort order

          from codepoint order in some way.<br>

        </div>

      </div>

    </blockquote>

    <br>

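    The idea I gave above is to build a secondary index over the SWORD

    index. That secondary index can be ordered and accessed in whatever

    way is needed (for example, by a localized sort key), while the

    lookup itself still resolves to the module's existing key.<br>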

    <br>
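    A minimal Java sketch of such an index over the SWORD index follows.
    It assumes the JDK's java.text.Collator in place of ICU (ICU4J offers
    a similar Collator class). The class name LocalizedKeyIndex, its
    methods, and the sample terms and keys are all hypothetical and are
    not part of SWORD or JSword. Terms are sorted with a locale-aware
    comparator, and because lookup uses that same comparator, a binary
    search over the sorted terms remains valid.<br>
    <pre>
```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical side index: locale-sorted terms that resolve to raw module keys. */
final class LocalizedKeyIndex {
    private final Collator collator;
    private final String[][] rows; // each row is {display term, raw module key}

    LocalizedKeyIndex(Locale locale, String[][] termAndKeyPairs) {
        collator = Collator.getInstance(locale);
        // PRIMARY strength compares base letters only, so case (and, in most
        // locales, accents) does not affect ordering or lookup.
        collator.setStrength(Collator.PRIMARY);
        rows = termAndKeyPairs.clone();
        Arrays.sort(rows, this::byTerm);
    }

    /** Orders rows by display term using the locale's collation rules. */
    private int byTerm(String[] a, String[] b) {
        return collator.compare(a[0], b[0]);
    }

    /** Display terms in the locale's dictionary order, e.g. for a key list. */
    String[] termsInOrder() {
        String[] terms = new String[rows.length];
        for (int i = 0; i < rows.length; i++) {
            terms[i] = rows[i][0];
        }
        return terms;
    }

    /** Returns the raw module key for a user-typed term, or null if absent. */
    String lookup(String userInput) {
        String[] probe = {userInput, null};
        // Binary search is valid because rows are sorted by this same comparator.
        int i = Arrays.binarySearch(rows, probe, this::byTerm);
        return i >= 0 ? rows[i][1] : null;
    }

    public static void main(String[] args) {
        // Sample data only: mixed-case display terms paired with uppercased keys.
        String[][] pairs = {
            {"Zion", "ZION"},
            {"abba", "ABBA"},
            {"Aaron", "AARON"},
        };
        LocalizedKeyIndex index = new LocalizedKeyIndex(Locale.ENGLISH, pairs);
        System.out.println(Arrays.toString(index.termsInOrder()));
        System.out.println(index.lookup("zion"));
    }
}
```
    </pre>
    With this sample data, main prints [Aaron, abba, Zion] and then ZION.
    A plain code point sort of the same terms would put Zion before
    abba. The raw module keys are used only for the final fetch, never
    for ordering or display.<br>
    <br>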

    As you note, the problem is that SWORD makes strong assumptions

    about the order and nature of the keys. Uppercasing is also not

    always appropriate unless care is taken with the locale. For

    example, in Turkish the uppercase of 'i' is not 'I' but the dotted

    capital I (U+0130).<br>

    <br>
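    The Turkish case can be checked with a short Java sketch. The class
    UppercaseKeyDemo and its normalizeKey method are hypothetical and
    only illustrate that the result of uppercasing depends on the locale
    passed in; String.toUpperCase(Locale) is the standard JDK call.<br>
    <pre>
```java
import java.util.Locale;

/** Shows why key normalization should name its locale explicitly. */
final class UppercaseKeyDemo {
    /** Hypothetical normalizer: uppercases a key using the given locale's rules. */
    static String normalizeKey(String key, Locale locale) {
        return key.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Locale-neutral casing: 'i' becomes plain 'I' (U+0049).
        String neutral = normalizeKey("title", Locale.ROOT);

        // Turkish casing: 'i' becomes dotted capital I (U+0130).
        String localized = normalizeKey("title", turkish);

        System.out.println(neutral.equals("TITLE"));        // true
        System.out.println(localized.equals("T\u0130TLE")); // true
        // The two normalized keys differ, so they would sort and match differently.
        System.out.println(neutral.equals(localized));      // false
    }
}
```
    </pre>
    A module built with one casing rule and read with another would
    therefore fail to find such a key, which is one reason to keep the
    stored key separate from the form used for sorting and display.<br>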

    In Him,<br>

        DM<br>

  </body>

</html>