[sword-devel] verse parsing
Chris Little
chrislit at crosswire.org
Wed Mar 29 02:37:11 MST 2006
First, on the topic of OSIS book abbreviations:
Almost everything you should ever need for Bibles is at
http://www.crosswire.org/~chrislit/osis/BibleBookNames.html
There are also the following, less up-to-date XML files, which add more
non-canonical materials. These were the source materials for the
above, but I haven't maintained them since creating the above list of
Bible books.
Bible: http://www.crosswire.org/~chrislit/osis/bible.xml
OT Pseudepigrapha: http://www.crosswire.org/~chrislit/osis/otp.xml
NT Apocrypha: http://www.crosswire.org/~chrislit/osis/nta.xml
Nag Hammadi codices: http://www.crosswire.org/~chrislit/osis/naghammadi.xml
(named) Dead Sea Scrolls: http://www.crosswire.org/~chrislit/osis/qumran.xml
Mormon texts: http://www.crosswire.org/~chrislit/osis/lds.xml
Classical sources (but actually just Josephus, currently):
http://www.crosswire.org/~chrislit/osis/classical.xml
Now, looking at the list of files at the LXXM source site
(http://ccat.sas.upenn.edu/gopher/text/religion/biblical/lxxmorph/),
there are four categories of problems with mapping files onto OSIS IDs:
1) Books with <number>.<abbrev>.<number>.mlxx style filenames, e.g.
01.Gen.1.mlxx & 02.Gen.2.mlxx. These are just single books divided into
two files and should be concatenated.
2) Apocryphal books. These should all be listed in the file listed at
the top. E.g. Judith = Jdt, Tobit = Tob, Odes = Odes, Psalms of Solomon
= PssSol.
3) Ezras. The Ezras are just absurdly icky. For the LXX, I recommend NOT
just mapping 1Esdras to Ezra and 2Esdras to Nehemiah. They don't actually
line up correctly like this. Whole volumes could and probably have been
written about the Ezras, and I would strongly recommend just tagging
them 1Esd and 2Esd, respectively.
[Specifically:
Hebrew Ezra = Vulgate 1Esd = KJV Ezra
Hebrew Neh = Vulgate 2Esd = KJV Neh
LXX 1Esd = Vulgate 3Esd = KJV 1Esd = 2Chr 35-36 paraphrased + Ezra + Neh
7:38-8:12 + other material
LXX 2Esd = Hebrew Ezra+Neh = Vulgate 1Esd+2Esd = KJV Ezra+Neh
And 4Esd(=4Ezra+5Ezra+6Ezra) makes things even more complicated--but
luckily isn't of import since it isn't in the LXX.]
4) Variant books, namely (Josh|Judges)(B|A), Tobit(BA|S),
(Daniel|Bel|Sus)(OG|Th)--6 books with 2 variants each. I would strongly
recommend treating each of these 12 books as individual books. Give them
unique osisIDs, present them to the user as unique books, etc. This is
how Logos does it. This is how BibleWorks does it. And I believe STEP
even incorporated a separate book ID to account for the 6 additional
books in Rahlfs. Rahlfs is a sufficiently important source text that you
really ought to do whatever you need to do to accommodate it in its
native form. You should NOT wedge it into another versification system
(e.g. one with only one book each of Joshua, Judges, Tobit, Daniel, Bel &
the Dragon, and Susanna).
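For the first category above (single books split across two files), a
minimal sketch of finding the parts that belong together before
concatenating them. It assumes only the <number>.<abbrev>.<number>.mlxx
naming pattern described above; the function name is hypothetical and
not part of any existing tool.

```python
import re
from collections import defaultdict

# Matches names like "01.Gen.1.mlxx": sequence number, abbreviation, part number.
SPLIT_NAME = re.compile(r"^(\d+)\.([A-Za-z0-9]+)\.(\d+)\.mlxx$")


def group_split_files(filenames):
    """Map each book abbreviation to its part files, ordered by part number.

    Hypothetical helper: names that do not match the split pattern are ignored.
    """
    parts = defaultdict(list)
    for name in filenames:
        m = SPLIT_NAME.match(name)
        if m:
            parts[m.group(2)].append((int(m.group(3)), name))
    return {abbrev: [n for _, n in sorted(found)] for abbrev, found in parts.items()}
```

Each returned list can then be read in order and written out as one book.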
I don't have my Rahlfs with me, but I really don't think presenting it
in a tabular view with both traditions on a single screen is the right
way to go. If we're working within the KJV versification, that's a
suitable compromise. But if we're permitted to make changes to the
underlying versification system in Sword and present Rahlfs in its OWN
versification system, the books should be separated.
Towards that end, I would recommend adding 6 books to the
BibleBookNames.html file cited at the top, to accommodate the 6 variant
books in Rahlfs: JoshA, JudgA, TobS, DanTh, BelTh, & SusTh. Under this
system, JoshB = osisID Josh, JudgesB = osisID Judg, TobBA = osisID Tob,
and the OG Daniel texts = osisIDs Dan, Bel, and Sus. Does that seem
agreeable?
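As a rough illustration of the proposal, the 12 variant books could be
tabulated as below. The six new IDs are the ones suggested here; the
(book, variant) labels on the left are illustrative spellings rather than
the actual LXXM filenames, and the sketch assumes the B text of Judges
keeps the plain ID, parallel to Joshua.

```python
# Proposed osisIDs for the Rahlfs variant books.
# Keys are illustrative (book, variant) labels, not real filenames.
VARIANT_OSIS_IDS = {
    ("Josh", "B"): "Josh", ("Josh", "A"): "JoshA",
    ("Judg", "B"): "Judg", ("Judg", "A"): "JudgA",
    ("Tob", "BA"): "Tob",  ("Tob", "S"): "TobS",
    ("Dan", "OG"): "Dan",  ("Dan", "Th"): "DanTh",
    ("Bel", "OG"): "Bel",  ("Bel", "Th"): "BelTh",
    ("Sus", "OG"): "Sus",  ("Sus", "Th"): "SusTh",
}


def osis_id(book, variant):
    """Return the proposed osisID for one variant text (raises KeyError if unknown)."""
    return VARIANT_OSIS_IDS[(book, variant)]
```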
The only other way to deal with them is to call them part of a separate
work and use the standard book IDs for both, but put the variants in the
second work. I don't like that idea since they're part of the same print
volume, a volume which is generally considered a single work.
A few more comments below...
Troy A. Griffitts wrote:
> Obviously, my goal was to save everyone as much modification as
> possible, but there just doesn't seem like there is a good fit for
> modules like these.
I think DM, Martin, and I agree on this point: make it work correctly,
regardless of how badly it breaks existing frontends. We can make
modules requiring a new driver invisible to existing frontends and
future frontends can support new features when they are ready to do so.
> The next thing I began to realize is that this module uses a,b,c type
> suffixes on verses (click on the first link in this email again and
> scroll to the bottom of the page). This does not fit nicely into our
> integer concept for verses. I considered adding a 5th level:
> Testament/Book/Chapter/Verse/Sub. But this really breaks the whole
> paradigm anyway, as sub will mostly be blank except when there might be
> a letter tacked to the end. It really doesn't solve any problems, e.g.
> key.Verse(key.Verse()+1) still will break. key++ would work, I guess,
> but you'd have to always check if Sub was set to anything. And who
> knows what Sub really means. Is it a replacement? Is it really a
> subdivision of the verse? It just doesn't seem like it solves any
> problems nicely. It seems like the LXX really is sequentially 31, 31a,
> 32, 33, 33a, 33b. When I know that other Bibles and commentaries mean
> the first part of 33 when they say 33a. So adding Sub doesn't seem like
> it gives us much except keeping Verse an integer.
We need to deal with non-integers for chapters in Greek Esther as in the
NRSV also. In addition, those chapters aren't in sequential numerical or
alpha-numeric order. So we'll have to deal with out-of-order chapters
and, probably, verses. GenBooks handle that fine. Translation to
VerseKeys is going to be a challenge.
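One way to picture the out-of-order case is to let the module supply its
labels in reading order and navigate by position instead of by integer
arithmetic. This is only a sketch under that assumption; the class is
hypothetical and is not how VerseKey or GenBooks are implemented. The
sample labels are the 31, 31a, 32, 33, 33a, 33b sequence quoted above.

```python
class OrderedLabels:
    """Navigate chapter or verse labels in module-supplied order (hypothetical)."""

    def __init__(self, labels):
        self.labels = list(labels)
        # Position of each label, so lookups do not depend on numeric value.
        self.index = {label: i for i, label in enumerate(self.labels)}

    def next(self, label):
        """Return the label that follows `label`, or None at the end."""
        i = self.index[label] + 1
        return self.labels[i] if i < len(self.labels) else None


verses = OrderedLabels(["31", "31a", "32", "33", "33a", "33b"])
```

With this, stepping forward from "31" gives "31a" rather than "32", and
the same structure would hold chapters that are not in numerical order.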
> The 'reference' is display like:
>
> /JoshB/24/1
>
> We could add a flag which says to display using a BK CH:VS format. I
> was thinking about adding a pattern, like letting the modules.conf file
> specify something like:
> KeyDisplay=%1 %2:%3
> but I think this is more work for everyone than it benefits. Besides,
> other languages probably prefer other formats (BK CH.VS). So I think
> we'd like to just say something like KeyFormat=BCV
That looks like a great idea. Other LANGUAGES shouldn't be allowed to
modify the formatting of a text. On the other hand, giving other TEXTS
the ability to have customized presentation would be a great benefit,
and this accommodates that very well. For example, the print NRSV Oxford
Study Bible that I have uses BK CH.VS.
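To make the per-text idea concrete, here is a small sketch that expands
a pattern of the KeyDisplay=%1 %2:%3 kind quoted above. The function name
is made up, and how a real module configuration would carry the pattern
is left open.

```python
import re


def format_key(pattern, book, chapter, verse):
    """Expand %1, %2 and %3 in a KeyDisplay-style pattern (hypothetical helper)."""
    fields = {"1": book, "2": str(chapter), "3": str(verse)}
    # A function replacement avoids any escaping issues in the field values.
    return re.sub(r"%([123])", lambda m: fields[m.group(1)], pattern)
```

A text that prefers BK CH.VS would simply carry the pattern "%1 %2.%3".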
> The other problem is parsing...
> Currently VerseKey provides all the nice parsing functionality that
> figures out:
>
> Ijn2-3:12
>
> It can do this because it has a set of books that it know about, along
> with all kinds of abbreviations and translated into a number of
> languages. Our current parser also drops suffixed letters.
I think part of the solution is to make the parser more generalized and
to force the module to give it some parameters for parsing. Each module
needs to tell the parser something like 1) the format and 2) valid
books. The format might be something like a Perl regular expression:
"($book) ([0-9]+):([0-9]+)([a-c])", where the parser then picks out the
book, chapter, verse, and sub-verse. I have no recommendations for
implementation and don't even know whether it is feasible.
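Purely as an illustration of the idea, and not a claim about how the
Sword parser works or should work, a module-supplied book list could be
dropped into a pattern of that shape. The function names are
hypothetical, and the sketch assumes the sub-verse letter is optional so
that plain references still parse.

```python
import re


def build_reference_parser(books):
    """Compile a pattern in the spirit of "($book) ([0-9]+):([0-9]+)([a-c])"."""
    # Longest names first, so "JoshA" is tried before "Josh".
    alternatives = "|".join(
        re.escape(b) for b in sorted(books, key=len, reverse=True)
    )
    # Assumption: the trailing sub-verse letter may be absent.
    return re.compile(rf"^({alternatives}) ([0-9]+):([0-9]+)([a-c])?$")


def parse_reference(parser, text):
    """Return (book, chapter, verse, sub) or None if the text does not match."""
    m = parser.match(text)
    if m is None:
        return None
    book, chapter, verse, sub = m.groups()
    return book, int(chapter), int(verse), sub or ""
```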
The list of valid books is simpler. Every module should simply provide
an ordered list of its contents (in osisID form, naturally). The parser
then constructs a list of possible book abbreviations to use in parsing,
excluding those books not present. For example, the LXXM is going to
include Judith, but not Jude. So the parser would include all the
abbreviations for Judith, but not those for Jude, and a reference to "Ju
1:1" should parse as Jdt.1.1.
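A minimal sketch of that filtering step follows. The abbreviation table
is invented for illustration (a real one would come from the existing
book-name and locale data), and the function names are hypothetical.

```python
# Illustrative abbreviation table, keyed by osisID. Not real locale data.
ABBREVIATIONS = {
    "Jdt": ["Judith", "Jdt", "Ju"],
    "Jude": ["Jude", "Ju"],
    "Judg": ["Judges", "Judg"],
}


def build_lookup(module_books):
    """Map lower-cased abbreviations to osisIDs, for this module's books only."""
    lookup = {}
    for osis_id in module_books:  # module_books is the module's ordered book list
        for abbrev in ABBREVIATIONS.get(osis_id, []):
            # First book in module order wins if two books share an abbreviation.
            lookup.setdefault(abbrev.lower(), osis_id)
    return lookup


def to_osis_ref(lookup, book, chapter, verse):
    """Build an osisID-style reference such as "Jdt.1.1"."""
    return f"{lookup[book.lower()]}.{chapter}.{verse}"
```

For a module listing Judg and Jdt but not Jude, "Ju" resolves to Jdt, and
"Jude" is not recognized at all.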
> Finally, if we solve these problems, and place an entry in LXXM:
> Category=Biblical Texts, it will probably break most frontends which
> expect all Biblical Texts to use a VerseKey. I don't know how to solve
> this problem.
I would just give it a different Category.
> I also considered a major change to VerseKey which would make all levels
> strings and not integers. I realize many frontends use integer spin
> controls to increase/decrease chapter and verse. There may also be
> linear logic regarding these things.
Unfortunately I can't think of a better solution to handling the array
of versification systems that exist. I think that's why we went with
strings in OSIS.
--Chris