[jsword-devel] BookDataListener: Question 1
Joe Walker
jsword-devel@crosswire.org
Tue, 13 May 2003 09:38:22 +0100
Hi,
Hmmm. Yes, I was worried this would be the response.
Options:
1. Split the class up into 2 or more Listeners, more tightly focused to
lessen the number of methods on the interface. (I think this simply
wallpapers over the cracks - having 2 interfaces doesn't make things
easier to use.)
2. Give up on the idea of using a SAX-like interface and use a
generated data-bound style like JAXB (or Castor, ...). Using JAXB
directly gives us a strong dependency on it, which worries me, and I'm
sure anyone wanting to use JSword on a J2ME device would freak.
3. Give up on the idea of using a SAX-like interface and use our own
custom data-bound style. So we write our own W class with a setMorph()
method etc. This just seems like JAXB with more work, so I'm even less
keen on this.
4. Use JDOM (or DOM, DOM4J, XOM, etc). This again creates a big
dependency on an XML library that might not work on J2ME devices.
Any more?
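A minimal sketch of what option 3 could look like, assuming a hand-written W class for the OSIS w element. Only the W class and setMorph() come from the description above; addLemma(), toOsis() and the escaping helper are hypothetical names added for illustration. It uses modern Java (generics, String.join) for brevity, which a J2ME build could not use as-is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical data-bound class for the OSIS <w> element (option 3).
// A null morph or an empty lemma list means "attribute not present".
class W {
    private final String text;
    private String morph;
    private final List<String> lemmas = new ArrayList<>();

    W(String text) {
        this.text = text;
    }

    void setMorph(String morph) {
        this.morph = morph;
    }

    // Hypothetical: several lemma values, joined with '|' as in Jacky's example.
    void addLemma(String lemma) {
        lemmas.add(lemma);
    }

    // Serialise this word as an OSIS <w> element.
    String toOsis() {
        StringBuilder sb = new StringBuilder("<w");
        if (morph != null) {
            sb.append(" morph=\"").append(escape(morph)).append('"');
        }
        if (!lemmas.isEmpty()) {
            sb.append(" lemma=\"").append(escape(String.join("|", lemmas))).append('"');
        }
        return sb.append('>').append(escape(text)).append("</w>").toString();
    }

    // Escape the characters XML reserves in text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Every new OSIS attribute becomes one more setter on W rather than one more listener method, but each OSIS element needs its own hand-written class, which is the "JAXB with more work" cost.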
If we don't care about J2ME then option feels right to me. If we do care
about J2ME then option 1, with its problems, still feels right.
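A rough sketch of option 1, assuming the interface is split into a structural half and an inline-markup half. StructureListener, MarkupListener and OsisWriter are hypothetical names, and startVerse() takes a plain osisID string instead of JSword's Verse so that the sketch compiles on its own. It also shows the worry raised under option 1: a class that wants the whole document still implements every method, just spread over two interfaces.

```java
// Hypothetical "structure" half of a split BookDataListener.
interface StructureListener {
    void startVerse(String osisID); // the attached interface takes a JSword Verse
    void endVerse();
    void addText(String text);
}

// Hypothetical "inline markup" half of a split BookDataListener.
interface MarkupListener {
    void startWord();
    void endWord();
    void startTransChange(String type);
    void endTransChange();
}

// A consumer that wants the whole document still implements both halves.
class OsisWriter implements StructureListener, MarkupListener {
    private final StringBuilder out = new StringBuilder();

    public void startVerse(String osisID) {
        out.append("<verse osisID=\"").append(escape(osisID)).append("\">");
    }

    public void endVerse() { out.append("</verse>"); }

    public void addText(String text) { out.append(escape(text)); }

    public void startWord() { out.append("<w>"); }

    public void endWord() { out.append("</w>"); }

    public void startTransChange(String type) {
        out.append("<transChange type=\"").append(escape(type)).append("\">");
    }

    public void endTransChange() { out.append("</transChange>"); }

    String result() { return out.toString(); }

    // Escape the characters XML reserves in text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```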
My gut feeling is that while thinking about J2ME we may well be
straining to meet a target that could better be met by using a separate
tree, and in doing so making JSword harder to use.
So there is an important question (1b!): how much do we care about J2ME?
Do we try to keep J2ME compatibility within the main tree, or do we have
a spin-off project - KSword?
I don't know much about J2ME, or any of the profiles so I can't comment
on how different things are there. Are they so different that we need a
separate project?
Joe.
Jacky Cheung wrote:
> Hi,
>
> I am not satisfied with this class. I am working on the GBF <WHxxx>,
> <WGxxx>, etc. tags, and I found that BookDataListener cannot do the job
> required for the <WHxxxx>, etc. tags. For example, in Gen 1:1 the
> word "created" has the following GBF data
> created<WH01254><WTH8804><WH0853>
> and it should be converted to
> <w morph="x-StrongsMorph:TH8804"
> lemma="x-Strongs:H0853|x-Strongs:H01254">created</w>
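A minimal sketch of that conversion for a single word, assuming the only tags present are <WH...>/<WG...> (Strong's numbers, mapped to lemma) and <WT...> (morphology, mapped to morph); any other GBF tag is ignored. GbfWordConverter is a hypothetical name, not existing JSword code. Values are joined with "|" in the order the tags appear, so the lemma comes out as x-Strongs:H01254|x-Strongs:H0853; the example above lists the two numbers the other way round, so reverse the list if that order is wanted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical converter: one GBF word plus its Strong's tags -> an OSIS <w>.
final class GbfWordConverter {
    // <WH01254>/<WG3056> carry a Strong's number (lemma);
    // <WTH8804> carries a Strong's morphology code (morph).
    // Any other GBF tag is simply not matched, i.e. ignored.
    private static final Pattern TAG = Pattern.compile("<W(T?)([HG]\\d+)>");

    private GbfWordConverter() {
    }

    static String toOsis(String gbf) {
        // The word itself is everything before the first tag.
        int firstTag = gbf.indexOf('<');
        String word = firstTag < 0 ? gbf : gbf.substring(0, firstTag);

        List<String> morphs = new ArrayList<>();
        List<String> lemmas = new ArrayList<>();
        Matcher m = TAG.matcher(gbf);
        while (m.find()) {
            if (m.group(1).isEmpty()) {
                lemmas.add("x-Strongs:" + m.group(2));
            } else {
                morphs.add("x-StrongsMorph:T" + m.group(2));
            }
        }

        StringBuilder sb = new StringBuilder("<w");
        if (!morphs.isEmpty()) {
            sb.append(" morph=\"").append(String.join("|", morphs)).append('"');
        }
        if (!lemmas.isEmpty()) {
            sb.append(" lemma=\"").append(String.join("|", lemmas)).append('"');
        }
        // The word cannot contain '<' (we cut at the first one); escape '&' and '>'.
        String text = word.replace("&", "&amp;").replace(">", "&gt;");
        return sb.append('>').append(text).append("</w>").toString();
    }
}
```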
>
> The BookDataListener has startWord(...) and endWord() to generate the OSIS
> w tag. However, it does not provide a proper interface to add attributes
> (like morph and lemma).
>
> One way to fix it is to add 3 new startWord(...) methods:
> - startWord(..., String morphAttrValue);
> - startWord(..., String lemmaAttrValue);
> - startWord(..., String morphAttrValue, String lemmaAttrValue);
>
> We need 3 because some words have only <WTH> (and so need the
> first method), some have only <WH>/<WG> (and so need the second
> one), and words like the one above need the third.
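A sketch of how a GBF filter might pick between the proposed methods. The elided "..." parameters are left out, and the first two are given distinct hypothetical names, startWordWithMorph() and startWordWithLemma(), because two Java methods that both take (..., String) would have the same signature and could not coexist as overloads. WordEvents and WordDispatcher are likewise hypothetical names.

```java
// Hypothetical word callbacks following the proposal above.
interface WordEvents {
    void startWord();                                        // no attributes
    void startWordWithMorph(String morphAttrValue);          // only <WTH...>
    void startWordWithLemma(String lemmaAttrValue);          // only <WH...>/<WG...>
    void startWord(String morphAttrValue, String lemmaAttrValue); // both
    void endWord();
}

final class WordDispatcher {
    private WordDispatcher() {
    }

    // Chooses the callback from which attributes are present (null = absent).
    static void fire(WordEvents events, String morph, String lemma) {
        if (morph != null && lemma != null) {
            events.startWord(morph, lemma);
        } else if (morph != null) {
            events.startWordWithMorph(morph);
        } else if (lemma != null) {
            events.startWordWithLemma(lemma);
        } else {
            events.startWord();
        }
    }
}
```

Each further optional attribute (POS, gloss, src, xlit, ...) would double the number of branches here, which is exactly the growth in methods that makes this design hard to maintain.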
>
> If we keep on doing things this way, there will be too many methods and
> difficult to maintain! => Bad design.
>
> Furthermore, with this design we cannot enjoy the benefits of XML
> binding (although XML binding has a slow startup).
>
> Any ideas? I am thinking about how to generate the OSIS right now, so
> this is the right time to change it.
>
> Jacky
>
>
> Joe Walker wrote:
>
>>
>> Hi,
>>
>> I have 2 questions about the BookDataListener class. It's not really
>> a general API-user interface because I would only expect it to be
>> used by JSword developers, but I have a proposal that would make it
>> more important. But that is question 2.
>>
>> Question 1 is: are we satisfied with BookDataListener right now? I've
>> attached it in case you don't have easy access to it. It is quite
>> SAX-like, if you are familiar with SAX.
>>
>> Good points:
>> - Can be very memory efficient
>> - Not tied to any specific XML implementation
>> - Can be fast
>> - Only 1 class to implement
>>
>> Bad points:
>> - Does not do 100% of OSIS (but it could)
>> - Lots of methods (2x num of elements)
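To illustrate the "memory efficient" point, a sketch of a SAX-style consumer that keeps only two counters while the events stream past, never building a document tree. MiniBookListener is a hypothetical cut-down interface with three of the callbacks, not the attached BookDataListener. The word count assumes each addText() call carries whole words; as with SAX characters(), text could in principle arrive in arbitrary chunks.

```java
import java.util.StringTokenizer;

// Hypothetical cut-down, SAX-like listener (not the real BookDataListener).
interface MiniBookListener {
    void startVerse(String osisID);
    void addText(String text);
    void endVerse();
}

// Counts verses and words as events arrive; no tree is kept in memory.
class WordCounter implements MiniBookListener {
    private int verses;
    private int words;

    public void startVerse(String osisID) {
        verses++;
    }

    // Assumes whole words per call; a word split across two calls would count twice.
    public void addText(String text) {
        words += new StringTokenizer(text).countTokens();
    }

    public void endVerse() {
        // nothing to release: only the counters are kept
    }

    int verses() { return verses; }

    int words() { return words; }
}
```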
>>
>> Joe.
>>
>> ------------------------------------------------------------------------
>>
>>
>> package org.crosswire.jsword.book.data;
>>
>> import org.crosswire.jsword.passage.Passage;
>> import org.crosswire.jsword.passage.Verse;
>>
>> /**
>> * A listener for events that can affect a BookData.
>> *
>> * The OSIS (1.1.1) documentation had these notes:
>> * <p>The <verse> element is used to make the standard verse divisions in
>> * Bibles, although obviously a later imposition on the text. It is a standard
>> * method for referencing biblical materials.</p>
>> *
>> * <p>Note that verses often cross the boundaries of other elements and that
>> * raises the question of how to deal with elements that overlap. Normally the
>> * verse identifier will be its osisID, for example "Matt.1.1" and the like.
>> * When a verse is segmented, that is split into two or more parts to cross a
>> * boundary, like a quotation, the ID should be used to indicate the various
>> * parts.</p>
>> *
>> * <p>The <verse> element in OSIS does not have a counterpart in the TEI
>> * Guidelines.</p>
>> *
>> * <p>NOTE(joe): Outstanding OSIS Questions ...
>> * <ul>
>> * <li>What to do with paragraph boundaries?</li>
>> * <li>What are the following OSIS attributes on the word element for:
>> * POS, morph, lemma, gloss, src, xlit?</li>
>> * </ul>
>> * </p>
>> *
>> * <p><table border='1' cellPadding='3' cellSpacing='0'>
>> * <tr><td bgColor='white' class='TableRowColor'><font size='-7'>
>> *
>> * Distribution Licence:<br />
>> * JSword is free software; you can redistribute it
>> * and/or modify it under the terms of the GNU General Public License,
>> * version 2 as published by the Free Software Foundation.<br />
>> * This program is distributed in the hope that it will be useful,
>> * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> * General Public License for more details.<br />
>> * The License is available on the internet
>> * <a href='http://www.gnu.org/copyleft/gpl.html'>here</a>, or by writing to:
>> * Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
>> * MA 02111-1307, USA<br />
>> * The copyright to this program is held by its authors.
>> * </font></td></tr></table>
>> * @see docs.Licence
>> * @author Joe Walker [joe at eireneh dot com]
>> * @version $Id: BookDataListener.java,v 1.5 2003/05/08 12:55:28 joe Exp $
>> */
>> public interface BookDataListener
>> {
>> /**
>> * Start a Book Document
>> */
>> public void startDocument(String initials);
>> public BookData endDocument();
>>
>> /**
>> * Start a Section
>> */
>> public void startSection(String title);
>> public void endSection();
>>
>> /**
>> * Start a Verse
>> */
>> public void startVerse(Verse verse);
>> public void endVerse();
>>
>> /**
>> * Add a text string to the contents of the Verse
>> */
>> public void addText(String text);
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <note> element abounds in Bible translations, commentaries
>> * and other works about biblical literature. The actual text of the
>> * <note> is recorded inline, where it applies to the text, but that
>> * is not necessarily where it is displayed. If it is more convenient to
>> * record notes at the end of a text, care should be taken to point to the
>> * proper reference for the note using the work and cite attributes.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include a, abbr, catchWord, date, hi, figure, foreign, index,
>> * inscription, lg, list, mentioned, milestone, milestoneEnd,
>> * milestoneStart, name, p, q, rdg, table, verse.</p>
>> * <p>Also ignored after thought: divineName, reference, seg, title, w
>> * and the noteType and osisRef attrs</p>
>> */
>> public void addNote(String marker, String addition);
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <divineName> element is to be used to mark the name of the
>> * deity in biblical and other texts. Traditions vary on the writing of the
>> * divine name and this is our response to satisfy the need for marking the
>> * divine name in the text and yet not make any presumptions about various
>> * traditions concerning the writing of the name. This would be a most
>> * appropriate element to record a typology of the divine name as found or
>> * translated in the text.</p>
>> * <p>There is no equivalent TEI element, other than perhaps <name>,
>> * since the TEI Guidelines did not specifically address issues related to
>> * the encoding of biblical materials.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include a, abbr, foreign, index, w.</p>
>> * <p>Also ignored after thought: note, reference, seg</p>
>> */
>> public void addDivineName(String name);
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <q> element is used for quotes, block quotes, embedded
>> * quotes, and (quotes within quotes). There is no real difference between
>> * a quote and a block quote other than formatting so both of those are
>> * treated with this single element. Note that this element does not
>> * contain the element <verse> such that if a speaker is citing a
>> * verse in a speech, the proper way to encode that would be as a
>> * <reference> within the larger <q> element.</p>
>> * <p>When segmenting quotes, use the same qID, and increment the segID to
>> * allow retrieval of the entire quotation.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include a, abbr, closer, date, figure, foreign, hi, index,
>> * inscription, lg, list, mentioned, milestone, milestoneEnd,
>> * milestoneStart, name, p, q, salute, signed, speech, table, verse.</p>
>> */
>> public void startQuote(String who, String level);
>> public void endQuote();
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <reference> element will be the subject of extension to
>> * include XLink/XPointer syntax in a later OSIS release. At present it
>> * marks the location of a reference in one text to another, whether direct
>> * (like citation/quotation) or indirect (such as an allusion), along with
>> * the reference/pointing mechanism in this release.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include a, abbr, date, divineName, hi, foreign, index, milestone,
>> * milestoneEnd, milestoneStart, name.</p>
>> */
>> public void startReference(Passage ref);
>> public void endReference();
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <seg> element is a generic phrase container element.
>> * Its primary use should be for phrase level markup that was omitted in
>> * this release (please send a note to the project with your requirements),
>> * representation of line breaks in a translation (or original text), or for
>> * some other purpose for which no other element suffices.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include a, abbr, date, divineName, hi, foreign, index, milestone,
>> * milestoneEnd, milestoneStart, name, q, title.</p>
>> */
>> public void startSegment();
>> public void endSegment();
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <speaker> element can be used to enclose the name of a
>> * speaker in the text (when reported) but can also bear speaker information
>> * when not apparent in the text, such as the shifts of speaker in the Song
>> * of Solomon.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include a, abbr, divineName, hi, foreign, index, name.</p>
>> */
>> public void startSpeaker(String who);
>> public void endSpeaker();
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <title> element is used for titles both in the sense of
>> * those of divisions in a work, i.e., chapters, books, but also for titles
>> * of other works that occur in notes or even the text. The <title>
>> * element can occur within itself so users can have multiple sub-titles if
>> * desired.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include a, abbr, date, divineName, hi, figure, foreign, index,
>> * inscription, lg, milestone, name, q, title.</p>
>> */
>> public void startTitle();
>> public void endTitle();
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <transChange> element was formulated to deal with cases
>> * where a literal translation has added words to clarify the translation.
>> * The Amplified Bible is one example of where this element would be useful
>> * but certainly not the only one. Changing the tense of a verb to agree
>> * with modern language usage and yet wanting to preserve some indication
>> * that the original text had been changed is another.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include a, abbr, date, divineName, hi, foreign, index, milestone,
>> * milestoneEnd, milestoneStart, name.</p>
>> */
>> public void startTransChange(String type);
>> public void endTransChange();
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <w> element is used to mark tokens separated by whitespace,
>> * which is probably an inadequate definition of word. It is provided to
>> * allow users to attach a variety of other information to such tokens.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include a, index, seg.</p>
>> */
>> public void startWord();
>> public void endWord();
>>
>> /*
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <a> element is a clone of <reference> and was added to
>> * allow simple links to be built with the current OSIS release.
>> * This will continue in future releases but expect to see full
>> * XLink/XPointer syntax in later releases. Both the standard HTML links and
>> * XLinks will be supported for OSIS texts for the foreseeable future.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the index element.</p>
>> *
>> public void addA(String href);
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <abbr> element contains abbreviations and the expansion of
>> * abbreviations is placed in the expansion attribute. To illustrate:
>> * <abbr expansion="Journal of Biblical Literature">JBL</abbr>
>> * This will be particularly helpful for users who are not professional
>> * biblical scholars or to make sure references to lesser known publications
>> * are easy to find (both by researchers and librarians).</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the short
>> * version can include the following elements a, divineName, foreign, index,
>> * name, note, reference, w.</p>
>> *
>> public void addAbbr(String expansion, String shortened);
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <date> element is used to record the type of date found in a
>> * text. It bears an optional calendarType attribute which will allow the
>> * user to note what sort of date is being recorded. Permissible values for
>> * the calendarType attribute are: Chinese, Gregorian, Islamic, ISO, Jewish,
>> * Julian. As with other attribute values in this schema, the user can
>> * insert their own value for this attribute by appending "x-"
>> * before the value they wish to use.
>> * Thus, calendarType="x-DisneyLand" would be one possible value,
>> * although probably not a useful one.</p>
>> * <p>The <note> element may occur within <date> as commentators
>> * may wish to record additional information about the date.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include a, index, note and w.</p>
>> *
>> public void addDate(String calendarType, String date);
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <foreign> element is used to mark foreign words or phrases
>> * that occur in a text. In some cases that may be for purposes of special
>> * display of such words, such as displaying "foreign" words in an
>> * English text in italics. In others, it may be to allow the use of a
>> * special font to ensure proper rendering of the text. Foreign in this
>> * sense means different from the text being encoded and not foreign from a
>> * particular language. The quotation of Aramaic words in a New Testament
>> * translation (whether the translation is English, Spanish or German) is an
>> * example of foreign words in a text.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include a, abbr, date, divineName, hi, foreign, index, milestone,
>> * name, note, reference, seg, title, w.</p>
>> *
>> public void addForeign(String quoted);
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <hi> element is a generic element that can be used to record
>> * emphasis and the type of emphasis for an authored text. It should not be
>> * used with primary source materials or translations to mark portions of
>> * texts that have been rendered differently from the surrounding text.
>> * The words of Jesus in a red-letter edition for example, should not be
>> * encoded using the <hi> element. To merely record that the words
>> * appear in the color red is to miss the reason why they are marked in red,
>> * a fact that will probably be of interest to others reading the text.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include a, abbr, date, divineName, hi, milestone, milestoneEnd,
>> * milestoneStart, name, w.</p>
>> *
>> public void addHi(String highlighted);
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <index> element is an empty element that is used to mark
>> * index locations in a text. It follows the TEI Guidelines in most
>> * respects, with the addition of a "see" attribute which is
>> * explained below.
>> * Note that the level attributes (level1 - level4) correspond to a main
>> * entry (level1) and nested entries under that entry. Thus, the levels
>> * link the attribute values together in a hierarchy of topics. If you want a
>> * separate index entry for a particular location, use another index
>> * element.</p>
>> * <p>The "see" attribute was added to the TEI syntax to enable
>> * the recording of a see or see also entry in the index. The value of that
>> * attribute should be a level1 entry that occurs in the document instance.
>> * Automatic processing should generate a link to the appropriate location
>> * in the text but use of the string provides a useful fallback to the user
>> * by specifying the index entry that should be a pointer to the appropriate
>> * location.</p>
>> * <p>If it is desired to have see or see also entries at other levels of
>> * the index, separate index tags should be inserted with the appropriate
>> * see attributes.</p>
>> * <p>Examples:
>> * Standard usage of this element:
>> * <index id="1234" index="subject" level1="Job" level2="theodicy"
>> * level3="parallel literature" level4="Babylonian sufferer"
>> * see="Suffering - Theodicy"/>
>> * would result in an entry in the subject index that looks like this:
>> * <pre>
>> * Job
>> * theodicy
>> * parallel literature
>> * Babylonian sufferer (pointer to location in text [id])
>> * see: Suffering - Theodicy</pre>
>> * </p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes.</p>
>> *
>> public void addIndex(String index, String level1, String level2,
>> String level3, String level4, String see);
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <list> element is used for common lists of items, as well as
>> * simple glossaries and definition lists. The <list> element can
>> * contain embedded lists, thereby allowing embedded sublists.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include.</p>
>> *
>> public void addList();
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <milestone> element is a true empty element that is used to
>> * mark locations in a text. It carries no semantics other than a location
>> * in the text stream. It can be used along with its type attribute to
>> * record elements that otherwise would overlap. The milestone_Pt attribute
>> * is used to indicate the type of attribute, such as a screen break - sb
>> * (also known as a shadow milestone), pb (page break) and others.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include.</p>
>> *
>> public void addMilestone();
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <milestoneEnd> element is used with the
>> * <milestoneStart> element to carry a semantic of containership for
>> * an enumerated list of elements. That element type is specified on the
>> * milestone_SE (Start/End) attribute. Milestones of this type are linked by
>> * having identical osisID and splitID values.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include.</p>
>> *
>> public void addMilestoneEnd();
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <milestoneStart> element is used with the
>> * <milestoneEnd> element to carry a semantic of containership.
>> * That element being emulated should be specified with the type attribute.
>> * Milestones of this type are linked by the end attribute of the
>> * milestoneStart and the start attribute of milestoneEnd. In cases where
>> * emulated containers have special attributes, such as who or level for
>> * quote, those attributes should be added to milestoneStart.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include.</p>
>> *
>> public void addMilestoneStart();
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <inscription> element should not be used for quotations but
>> * only in cases where an actual physical inscription is being reported or
>> * recorded.</p>
>> * <p>The TEI has no equivalent of this element.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include.</p>
>> *
>> public void addInscription();
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <mentioned> element is used to mark words (or phrases) that
>> * are mentioned but not used. When illustrating a grammatical point, a
>> * commentary may insert a word as an example of a particular usage. This is
>> * more important for use in notes or commentaries than original texts, but
>> * there are cases where it would be appropriate there as well.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include.</p>
>> *
>> public void addMentioned();
>>
>> /**
>> * Add to a JAXBVerseData.
>> *
>> * <p>OSIS v1.1.1 has this to say:
>> *
>> * <p>The <name> element is useful as it allows the user to declare a
>> * regular form for a name that may be written in different forms in the
>> * text. For example, to recover all the instances of the name Susan, it
>> * might be necessary to recover all instances of Susan, Susie (as a
>> * nickname), Susie-Q (another nickname), as well as Suzanne (which might be
>> * the person's full legal name). The regular attribute on <name>
>> * allows the use of a single form of the name for indexing and searching
>> * purposes.</p>
>> *
>> * <p><b>Implementation details</b></p>
>> * <p>Ignored a host of optional attributes and the fact that the contents
>> * can include.</p>
>> *
>> public void addName();
>>
>> */
>> }
>>
>>
>
>
> _______________________________________________
> jsword-devel mailing list
> jsword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/jsword-devel