TEI Dictionaries/DictionaryProposal

From CrossWire Bible Society


The current SWORD lexicon model allows only flat dictionaries. This suits Strong's well but not more recent lexica. A new model needs to take into account page numbers, non-alphabetic sorting, and hierarchical entries. Ideally it would also allow for browsing a dictionary more like a book.

Problems with the Current Model

  • Dictionaries are flat, so any hierarchical dictionary must be flattened. BDB (forthcoming) is hierarchical: roots form super-entries, and the lexicon as a whole is not strictly alphabetical. See the example document below, which is abstracted from BDB.
  • Entries are sorted according to Unicode code points. This leads to a number of problems.
    • In many languages and scripts (including some Latin scripts), sorting by Unicode code points does not preserve the proper alphabetic order.
    • If BDB were sorted in this way, the connections between words based on their roots would be lost.
  • There is no way to identify page numbers found in print editions of a given module. This information is particularly important for those doing academic work.
  • In practice, front-ends usually display lexicon entries as isolated containers. This works well for Strong's because Strong's-tagged texts take you directly to the correct entry, but it does not work for all dictionaries. When looking up a word, you might want to scan up and down the "page" to find the entry you need. This is especially important when natural-language keys are used: the text may get you to roughly the right place in the dictionary but not to the exact entry. Dictionaries therefore need to allow for fuzzy lookup.
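
The sorting problem described above can be illustrated with a short Python sketch. It is not SWORD code. The lemmas and Strong's numbers are taken from the example file further down, the variable names are made up for illustration, and Python's built-in string comparison (which compares code points) stands in for the engine's current behaviour.

```python
# Entries in the order they appear in the BDB sample: (lemma, Strong's number).
# The root אבד (H6) and the entry אבד (H8) share the same natural-language key.
document_order = [
    ("אב", "H3"),
    ("אביב", "H24"),
    ("אבד", "H6"),
    ("אבד", "H8"),
    ("אבדה", "H9"),
    ("אבדון", "H10"),
]

# Python compares strings code point by code point, so this mimics a
# code-point sort of the keys. The sort is stable, so H6 stays before H8.
by_code_point = sorted(document_order, key=lambda entry: entry[0])

print([strong for _, strong in document_order])
print([strong for _, strong in by_code_point])

# The same effect in a Latin script: U+00E9 (e with acute) sorts after "z".
latin_sorted = sorted(["zebra", "\u00e9tude", "apple"])
print(latin_sorted)
```

In the code-point ordering, אביב (H24) moves to the end of the list, away from the entries it sits with in BDB, and the Latin example places "étude" after "zebra".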

Proposed Features of a New Model

  • The order of the lexicon should be the same order as the XML file used to compile the module.
  • Perhaps an arbitrary (numeric?) key could be created for each entry and hidden from the user, to make life easier for developers. Topic maps could then connect corresponding entries in numbered dictionaries (Strong's) and dictionaries keyed to natural language (BDB, etc.). This could be extensible over the long term.
  • Continuous scrolling would facilitate displaying page numbers and browsing entries.
  • Markup should look similar to a Genbook, with the option of navigating using a tree structure, as in the Hesychius module. The easiest solution may be to mark up hierarchical dictionaries as Genbooks and add functionality to the engine that allows lemma lookup in Genbooks as well as in "proper" dictionary modules.
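
As a rough sketch of how these features might fit together, the following Python builds an index in document order, uses the list position as the hidden numeric key, records the current page from <pb> elements, and maps Strong's numbers to keys. Everything here is an assumption made for illustration: the `<lexicon>` wrapper, the omission of the TEI namespace, and the names `build_index` and `neighbours` are hypothetical and not part of the SWORD engine.

```python
import xml.etree.ElementTree as ET

# A simplified, well-formed stand-in for the sample file (assumed structure).
SAMPLE = """<lexicon>
  <pb n="1"/>
  <superEntry id="אבב" trans="abb">
    <entry id="אב" trans="ab" strong="H3"/>
    <entry id="אביב" trans="abib" strong="H24"/>
  </superEntry>
  <superEntry id="אבד" trans="abd" strong="H6">
    <pb n="2"/>
    <entry id="אבד" trans="abd" strong="H8"/>
    <entry id="אבדה" trans="abdh" strong="H9"/>
  </superEntry>
</lexicon>"""


def build_index(xml_text):
    """Return (entries, by_strong), with entries kept in document order."""
    entries = []    # the list position doubles as the hidden numeric key
    by_strong = {}  # Strong's number -> hidden key
    page = None     # most recent <pb n="..."/> seen so far
    # Element.iter() walks the tree in document order.
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag == "pb":
            page = elem.get("n")
        elif elem.tag in ("superEntry", "entry"):
            key = len(entries)
            strong = elem.get("strong")
            entries.append(
                {"key": key, "lemma": elem.get("id"), "strong": strong, "page": page}
            )
            if strong is not None:
                by_strong[strong] = key
    return entries, by_strong


def neighbours(entries, key, radius=1):
    """Return the entries around a key, as when scanning up and down a page."""
    start = max(0, key - radius)
    return entries[start:key + radius + 1]


entries, by_strong = build_index(SAMPLE)
for entry in neighbours(entries, by_strong["H24"]):
    print(entry["key"], entry["strong"], "page", entry["page"])
```

Because the key is simply the position in the file, a front-end could jump to an entry by Strong's number and then scroll to its neighbours without relying on any alphabetic ordering of the lemmas.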

Example TEI File

Note the following features in this snippet of the first few entries in BDB (without entry text):

  • Page breaks <pb>
  • Alphabetical Headings <div1>
  • Some entries are super-entries <superEntry> and others are not <entry>. I realize <entry> has to be structured, but for some reason the specification does not allow for using <entryFree> with <superEntry>, so I settled for markup that would illustrate the features of BDB.
  • The first word in Strong's is אב, but here in BDB the entry points forward to the full אב entry under the root אבה.
  • Of those entries that are here, one (אביב) is out of alphabetical sequence. The sequence of Strong's numbers is H3, H24, H6, etc.
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.crosswire.org/2008/TEIOSIS/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.crosswire.org/2008/TEIOSIS/namespace http://www.crosswire.org/OSIS/teiP5osis.1.4.xsd">
  <pb n="1"/>
    <superEntry id="א" trans="a" />
    <entry id="אב" trans="ab">v. <ref target="BDB:II_אבה">II אבה</ref></entry>
    <superEntry id="אבב" trans="abb">
      <entry id="אב" trans="ab" strong="H3"/>
      <entry id="אביב" trans="abib" strong="H24"/>
    <superEntry id="אבד" trans="abd" strong="H6">
  <pb n="2"/>
      <entry id="אבד" trans="abd" strong="H8"/>
      <entry id="אבדה" trans="abdh" strong="H9"/>
      <entry id="אבדון" trans="abdwn" strong="H10"/>