[jsword-devel] ProjectB Primer (part3)

Troy A. Griffitts jsword-devel@bibletechnologieswg.org
Tue, 16 Apr 2002 16:01:02 -0700

Joe and all,
	Sorry, I've been away for a while and am only getting a chance to check
my email occasionally.  After reading your primer (I, II, and III) :)  I
must say, it's rather scary how much our projects, in theory, are
alike.  Also, you may find many of the things you are planning have been
implemented similarly in sword.  I think we will have a very easy
mapping between the projects if that is the way we continue to proceed.


Book = SWModule
Books = SWMgr.Modules
Pointers = SWKey (all subclasses are required to convert to/from string)
BookData manipulation (->Display) = SWFilter
MultiSource persistence (BookDrivers) = SWModule subclasses

Joe Walker wrote:
> This is cut from some of the JavaDoc for ProjectB. It describes the most
> fundamental interfaces in the Model layer.
> Project B Primer part 3: Design of "Model" layer
> This is overview documentation that explains how the Book and Bible
> interfaces are arranged and justifies the design decisions. Note that most
> of what is documented here is current for ProjectB - there are a few
> sections that reflect where we will be given a bit more re-factoring.
> The most important thing any Bible program does is to access Bible data, and
> in more general terms, book data. There may be many different sources of
> data - Bibles stored in different formats, or even on remote systems,
> dictionaries, lexicons and so on. It would be good (where appropriate) to be
> able to treat them all similarly without needing to reinvent the wheel too
> often. Clearly we should be able to use inheritance to specialize where
> needed.
> So we start by creating an interface that is common to all 'Books' but that
> allows us to do as much as possible.
> The first goal is to read data and to be able to direct where to read from
> with some sort of pointer system. So we start with an interface that looks
> like this:
> interface Book
> {
>     Data getData(Pointers ptr);
> }
> Before we look at what Data and Pointers look like, a Book will need to be
> able to do 2 other things:
> Firstly tell us about itself: what it is called, where it comes from and so
> on - MetaData about the Book itself.
> Secondly help us to find stuff by searching. Now searching can be complex
> and we should not aim to implement a full search system in every Book,
> however I think we can build a couple of simple methods that will allow us
> to construct a powerful generalized search system separately. All we need is
> to find any given word and to be able to find words that match a given
> specification.
> So we can develop our interface like this:
> interface Book
> {
>     BookMetaData getMetaData();
>     Data getData(Pointers ptr);
>     Pointers search(String word);
>     String[] matchingWords(String spec);
> }
> The definition of the matchingWords spec needs attention - in ProjectB the
> method is called getStartsWith(String base) which allows stemming (the most
> common search word manipulation) but does not allow more complex wildcard
> cases. I want to avoid adding too much complexity that will be hard to
> implement and rarely used. Certainly any solution that involves regular
> expressions is going to be 1. very hard to implement and 2. not actually
> useful for 99% of users. So I propose a simplification:
> interface Book
> {
>     ...
>     String[] startsWith(String spec);
> }
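
A minimal sketch of how the two simple methods above (find a word, find words
with a given prefix) could support a more general search built separately.
WordIndex, add() and searchPrefix() are hypothetical names, references are
plain strings standing in for Pointers, and the generics are modern Java;
none of this is taken from ProjectB or Sword.

```java
import java.util.Locale;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory index: maps each lower-cased word to the
// references (plain strings here, standing in for Pointers) where it occurs.
class WordIndex {
    private final TreeMap<String, TreeSet<String>> refsByWord = new TreeMap<>();

    // Records that the given word occurs at the given reference.
    void add(String word, String ref) {
        refsByWord
            .computeIfAbsent(word.toLowerCase(Locale.ROOT), k -> new TreeSet<String>())
            .add(ref);
    }

    // Counterpart of search(String word): references containing exactly this word.
    Set<String> search(String word) {
        TreeSet<String> refs = refsByWord.get(word.toLowerCase(Locale.ROOT));
        return refs == null ? new TreeSet<String>() : new TreeSet<String>(refs);
    }

    // Counterpart of startsWith(String spec): all indexed words with this prefix.
    String[] startsWith(String spec) {
        String prefix = spec.toLowerCase(Locale.ROOT);
        // In a sorted map, every key with this prefix lies in [prefix, prefix + '\uffff').
        SortedMap<String, TreeSet<String>> range =
            refsByWord.subMap(prefix, prefix + Character.MAX_VALUE);
        return range.keySet().toArray(new String[0]);
    }

    // A "generalized" search layered on the two primitives: the union of
    // the results for every word that matches the prefix.
    Set<String> searchPrefix(String spec) {
        Set<String> all = new TreeSet<String>();
        for (String word : startsWith(spec)) {
            all.addAll(search(word));
        }
        return all;
    }
}
```

The point of the sketch is only that searchPrefix() needs nothing from the
index beyond search() and startsWith(), so it could live outside any Book.
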
> This interface can be implemented several times, once by something that
> reads Sword format data, once by something that reads ProjectB format data,
> and so on.
> The users, and indeed the developers of the front ends do not want to know
> about the various different implementors and what implementations need to be
> looked at. So we can use a couple of classes to fix this. If you like GoF
> patterns, an AbstractFactory:
> interface Books
> {
>     Iterator listBooks();
>     void registerDriver(BookDriver driver);
> }
> interface BookDriver
> {
>     Iterator listBooks();
> }
> The two listBooks() methods let you iterate over the Books known to the
> whole system (Books) or to a particular BookDriver. The question is what
> sort of Object should these iterators iterate over?
> The Books themselves would be a bad idea because creating a Book may be a
> time and memory consuming process (indexes to be loaded etc) so we need some
> sort of a key to refer to Books by.
> A simple string is OK, but better (and more unique) would be the MetaData
> objects previously noted. So these MetaData objects need to be able to give
> access to the Book they represent.
> interface Books
> {
>     Iterator listBooks();
>     void registerDriver(BookDriver driver);
>     Book getBook(BookMetaData id);
> }
> The BookMetaData itself looks something like this:
> interface BookMetaData
> {
>     String getName();
> }
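
A rough sketch of the registry idea above: listing hands out cheap
BookMetaData keys, and a Book is only created when someone asks for it.
SimpleBooks and the openBook() method on BookDriver are assumptions made for
illustration (the primer does not say how a driver hands over a Book), and
the generics are modern Java.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

interface BookMetaData {
    String getName();
}

interface Book {
    BookMetaData getMetaData();
}

interface BookDriver {
    Iterator<BookMetaData> listBooks();

    // Assumption: not in the primer; some way for a driver to create a Book on demand.
    Book openBook(BookMetaData id);
}

// Hypothetical implementation of the Books registry.
class SimpleBooks {
    // Which driver owns which book, in registration order.
    private final Map<BookMetaData, BookDriver> owners = new LinkedHashMap<>();
    // Books that have already been opened, so each is created at most once.
    private final Map<BookMetaData, Book> opened = new HashMap<>();

    // Records the metadata of every Book the driver knows, without opening any.
    void registerDriver(BookDriver driver) {
        for (Iterator<BookMetaData> it = driver.listBooks(); it.hasNext();) {
            owners.put(it.next(), driver);
        }
    }

    // Iterates over metadata only; no Book is created here.
    Iterator<BookMetaData> listBooks() {
        return new ArrayList<>(owners.keySet()).iterator();
    }

    // Opens the Book on first request and reuses it afterwards.
    Book getBook(BookMetaData id) {
        BookDriver driver = owners.get(id);
        if (driver == null) {
            throw new IllegalArgumentException("Unknown book: " + id.getName());
        }
        return opened.computeIfAbsent(id, driver::openBook);
    }
}
```

Unless a BookMetaData implementation overrides equals() and hashCode(), the
maps here compare keys by identity, which is enough when callers pass back
the same objects that listBooks() gave them.
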
> Before we move on to what BookData and Pointers look like, I have
> intentionally ignored 2 issues:
> Encrypted works - some works will need to be encrypted - however the finding
> of keys or deobfuscation will be done within the Driver so we don't need
> to worry about it too much.
> Configuration - some Books will need configuring before they will work,
> maybe with encryption keys, maybe with directories under which to find
> information. Each BookDriver will need to take care of configuring the Books
> that it creates; we don't attempt to do anything fancier, even though there
> are parts of ProjectB that have implemented a generic configuration system.
> BookData and Pointers are related. BookData describes the actual Book text
> (for example "In the beginning God created ...") and Pointers describe where
> that text comes from (for example "Gen 1:1")
> BookData first. We do not want to force users of this code to use it in any
> specific way, so BookData should describe the text in as much detail as
> possible without forcing how that text is used. The final display could be a
> PDA, a web browser, a matching verse list or a full-blown GUI display. This
> to me rules out RTF, HTML and plain text, as they are all either display
> specific (RTF/HTML) or low detail (text), and makes me think that XML along
> with some standard converters to turn XML into RTF/HTML/PDF/text/blah is the
> best choice.
> This has the added benefit that it allows us to specify not just what output
> format is required, but also how that transformation is done, the fonts and
> layout details of the produced RTF/HTML are all very configurable.<br>
> It also turn out to be very easy in Java simply by using the XSL libraries
> produced by Sun and Apache. XSL libraries are freely available in all good
> languages. :-)
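
As an illustration of the XML-plus-converter idea, here is a small sketch
using the standard javax.xml.transform (JAXP) API. The <verse ref="..."> markup,
the stylesheet and the VerseRenderer class are made up for the example; they
are not ProjectB's actual format or code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: turns display-neutral XML into HTML via a stylesheet.
class VerseRenderer {
    // Made-up markup for one verse; the real BookData schema is not shown in the primer.
    static final String VERSE_XML =
        "<verse ref=\"Gen 1:1\">In the beginning God created ...</verse>";

    // Made-up stylesheet: a bold reference followed by the verse text.
    static final String TO_HTML_XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"verse\">"
      + "<p><b><xsl:value-of select=\"@ref\"/></b>"
      + "<xsl:text> </xsl:text>"
      + "<xsl:value-of select=\".\"/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the result as a string.
    static String render(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

Swapping TO_HTML_XSL for a different stylesheet changes the output format or
layout without touching the XML, which is the configurability described
above.
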
> However there is still the question of how to marshal the XML into objects
> for manipulation. In Java there are many options - SAX/DOM/JDOM/JAXB -
> ordered in my opinion from worst for this job (SAX) to best for the job
> (JAXB). However, JAXB is still very alpha, so I am currently using JDOM and
> I've implemented some classes to hide the marshalling method for all but the
> most fancy of Books.
> Pointers are used to request BookData from a Book and are also used as a
> reply from a search - so we have Pointers that tell us where the word
> "aaron" exists in a particular Book. For the case of a Bible a pointer could
> look like this: "Gen 1:1, Isa 45:2, Rev 20:4", for the case of a dictionary
> a pointer would be like this: "aaron", or for a BookDriver that contained
> sermons: "page 153, para 4-5".
> I have not placed a requirement on Pointers for them to apply only to single
> results, or even for the results to be contiguous (Pointers would be of
> little use as search answers if this was to be the case).
> The only common feature of Pointers that I can think of is that they ought
> to be convertible to and from strings.
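
A tiny sketch of the "convertible to and from strings" requirement for the
Bible case. VerseList is a hypothetical stand-in that only splits on commas
and does no validation; ProjectB's Verse/VerseRange/Passage classes and
Sword's SWKey subclasses do far more than this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical Pointers-like type: an ordered, possibly non-contiguous
// list of verse references that round-trips through a string.
final class VerseList {
    private final List<String> refs;

    private VerseList(List<String> refs) {
        this.refs = Collections.unmodifiableList(refs);
    }

    // Parses text such as "Gen 1:1, Isa 45:2, Rev 20:4".
    // Assumption: commas separate references; references are not validated.
    static VerseList parse(String text) {
        List<String> refs = new ArrayList<>();
        for (String part : text.split(",")) {
            String ref = part.trim();
            if (!ref.isEmpty()) {
                refs.add(ref);
            }
        }
        return new VerseList(refs);
    }

    // Number of references held.
    int size() {
        return refs.size();
    }

    // Converts back to the canonical comma-separated form.
    @Override
    public String toString() {
        return String.join(", ", refs);
    }
}
```

Comma splitting only suits the Bible example; a sermon pointer such as
"page 153, para 4-5" would need its own parser, which is one reason the
string conversion belongs to each Pointers subclass.
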
> For the Bible case a Pointers is a collection of verses, and ProjectB has a
> set of classes called Verse, VerseRange and Passage which are a fundamental
> building block. Passage is a specialization of Pointers where the Book in
> question is a Bible. The Passage package has classes to do all sorts of
> useful manipulations to lists of verses