[sword-devel] KJV2006 - 4th Beta
DM Smith
dmsmith555 at yahoo.com
Sat Mar 25 07:22:45 MST 2006
YAB (yet another beta) (downloadable from the links at the bottom of
http://www.crosswire.org/~dmsmith/kjv2006)
Again, I really value your input and the time you take to evaluate these
betas.
The process I am following is that each beta does two things:
fixes problems found in the changes made by the previous betas
fixes a new class of problems
At any point, I think we can "deliver" the module. Also, it is a simple
thing to undo any of the changes. So if we decide to hold off on any of
the changes, that's fine.
I think a goal (well it is at least mine) should be to get this out as
soon as possible. My motivation is that using JSword, the encoding
errors in this module make BibleDesktop look especially bad. Since this
is the most downloaded module, I think this is an important goal. I
don't mind postponing some of the changes to reach this goal.
I have found differences between the module and the printed text, where
the module agrees with the other (two) etexts. So short of proof-reading
against the printed text, there may be unknown differences.
This beta cleans up the problems reported against the last beta (most
related to apostrophes). I still have to apply the fixes to those that
are in OT notes.
I have found and fixed more punctuation problems.
I have compared all the differences between the text of this module and
the printkjv and Tim Lanfear's CCEL work. I checked each of these
against the Old Scofield, using it as the final arbiter.
There are three significant changes for this beta:
This one fixes words that appear in italics.
This one also fixes titles and adds missing book titles. The only titles
that I am aware I have not done are the psalm books I-V.
While I may have made mistakes, I have verified each of these
against the "original".
I have preserved the case and punctuation of the original, but have
not attempted to add line breaks to book titles.
The only encoding of titles that I am not sure about are the Psalm
119 ALEPH., BETH. ... titles.
They should print before the verse.
I have made these titles within the verse, as the nature of an
OSIS title is that it titles the element that "contains" it.
In this fashion, they are all subType="x-preverse", but I have
not marked them as such.
This one also starts the fixing of hyphenated names.
The KJV2003 edition is fairly uniform in not having hyphenated names.
However, every printed copy of the KJV that I have uses them.
My take is that we need to preserve the "jots" and "tittles" so I am
adding these back.
So, I have taken a list of names that I got from Tim Lanfear (who
did the CCEL KJV module) as a start.
I have changed all of them according to his list and am now
validating them, verse by verse. (I am in the B's. That's why I say this
is a start.)
Interestingly, I have found that a given name is not uniformly
hyphenated. (E.g. Abi-ezer is hyphenated about half the time. Beth-lehem
is hyphenated in the OT but not the NT.)
I am using an en-dash (U+2013) to encode the hyphen.
We may want to change the SWORD engine to handle hyphenated words
better (e.g. Lucene indexing and searching).
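To make the indexing concern concrete, here is a rough sketch in Python.
It is not Lucene and says nothing about how Lucene actually tokenizes;
it is a toy tokenizer in which the set of "joiner" characters is
configurable, so one can see how the same name ends up as one token or
two. The names tokenize, search_key and the joiners parameter are made
up for this illustration.

```python
import re

HYPHEN_MINUS = "\u002D"  # the ASCII "minus" on the keyboard
EN_DASH = "\u2013"       # the character used in this beta for hyphenated names


def tokenize(text, joiners=""):
    """Split text into lowercase word tokens.

    Characters listed in `joiners` are allowed inside a word; every other
    non-alphanumeric character acts as a word break. This is an assumed,
    simplified model of an indexer, not Lucene's real behaviour.
    """
    word = r"[^\W_]+"
    if joiners:
        pattern = word + r"(?:[" + re.escape(joiners) + r"]" + word + r")*"
    else:
        pattern = word
    return [token.lower() for token in re.findall(pattern, text)]


def search_key(word):
    """Fold hyphenated and unhyphenated spellings to one lookup key."""
    return word.replace(EN_DASH, "").replace(HYPHEN_MINUS, "").lower()


if __name__ == "__main__":
    name = "Beth" + EN_DASH + "lehem"
    print(tokenize(name))                   # two tokens: beth, lehem
    print(tokenize(name, joiners=EN_DASH))  # one token, en-dash kept
    print(search_key(name) == search_key("Bethlehem"))  # True
```

If the real indexer breaks on one character but not the other, a folding
step like search_key applied at both index and query time would be one
way to let "Bethlehem" find both spellings.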
A couple of interesting things I have just found out:
Tim Lanfear pointed out to me that in the Hebrew the hyphen is a
special character. Some of the English hyphenated names are separate
words in the Hebrew.
Strongs may have more than one number for a hyphenated word with
each part having its own (e.g. Bar-jona and Bar-jesus in the NT)
Next steps:
Finish validating the hyphenated names. (this may span one or more
betas)
See how Lucene handles the indexing of hyphenated words using an
en-dash and a minus, and report the results here.
(I am thinking that a minus is seen as a word break but an
en-dash is not.)
Fix the <divineName> encodings. Sometimes these encompass more than
just the divine name.
Also, the print versions typically use "small-caps" and render
Lord not LORD with it. This appears to have been a tradition since the
1611 printing.
I think it has been the tradition of etexts to use LORD as a way
of rendering small caps.
But with the explicit markup of OSIS this is not necessary.
However it may be necessary for the front-ends to change to
accommodate this.
It might also be nice to change the SWORD engine to mark in the
Lucene index the verses containing the divine name and allow searching
on LORD (i.e. some i18n marker, e.g. HERR auf Deutsch, SEIGNEUR en
français, SEÑOR en español...) to find those verses.
Validate the paragraph marks. (There are more than there should be.)
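As a sketch of how the <divineName> step could be checked mechanically,
the Python below lists the verses that contain a <divineName> element
and flags any whose marked text is longer than one word, for manual
review. It makes several assumptions: the fragment is namespace-free
and uses container-style <verse> elements for brevity (the real module
may use milestones and the OSIS namespace), and the function names
verses_with_divine_name and overlong are invented for this example.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment for illustration only.
SAMPLE = """<osisText>
  <div type="book" osisID="Ps">
    <chapter osisID="Ps.23">
      <verse osisID="Ps.23.1">The <divineName>Lord</divineName> is my shepherd; I shall not want.</verse>
      <verse osisID="Ps.23.2">He maketh me to lie down in green pastures: he leadeth me beside the still waters.</verse>
    </chapter>
  </div>
</osisText>"""


def verses_with_divine_name(osis_xml):
    """Return (osisID, [marked texts]) for each verse holding a divineName."""
    root = ET.fromstring(osis_xml)
    hits = []
    for verse in root.iter("verse"):
        names = ["".join(dn.itertext()) for dn in verse.iter("divineName")]
        if names:
            hits.append((verse.get("osisID"), names))
    return hits


def overlong(hits, max_words=1):
    """Flag marked texts with more words than expected, for manual review."""
    return [(osis_id, name)
            for osis_id, names in hits
            for name in names
            if len(name.split()) > max_words]


if __name__ == "__main__":
    found = verses_with_divine_name(SAMPLE)
    print(found)            # [('Ps.23.1', ['Lord'])]
    print(overlong(found))  # []
```

The list returned by verses_with_divine_name is also the kind of data an
index could store to support the "search on LORD" idea above, though how
the SWORD engine would record it is left open here.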
And many thanks to those that have e-mailed me lists of verses that need
to be fixed.
Special thanks to Tim Lanfear for his detailed feedback and Terry Biggs
for the SWORD engine changes.