[sword-devel] KJV2006 - 4th Beta

DM Smith dmsmith555 at yahoo.com
Sat Mar 25 07:22:45 MST 2006


YAB (yet another beta) (downloadable from the links at the bottom of 
http://www.crosswire.org/~dmsmith/kjv2006)

Again, I really value your input and the time you take to evaluate these 
betas.

The process I am following is that each beta does two things:
fixes problems found in the changes made by the previous betas
fixes a new class of problems

At any point, I think we can "deliver" the module. Also, it is a simple 
thing to undo any of the changes. So if we decide to hold off on any of 
the changes, that's fine.

I think a goal (well it is at least mine) should be to get this out as 
soon as possible. My motivation is that using JSword, the encoding 
errors in this module make BibleDesktop look especially bad. Since this 
is the most downloaded module, I think this is an important goal. I 
don't mind postponing some of the changes to reach this goal.

I have found differences between the module and the printed text, where 
the module agrees with the other (two) etexts. So short of proof-reading 
against the printed text, there may be unknown differences.

This beta cleans up the problems reported against the last beta (most 
related to apostrophes). (I still have to apply the fixes to those that 
are in OT notes.)

I have found and fixed more punctuation problems.

I have compared all the differences between the text of this module and 
the printkjv and Tim Lanfear's CCEL work. I checked each of these 
against the Old Scofield, using it as the final arbiter.

There are three significant changes for this beta:
This one fixes words that appear in italics.

This one also fixes titles and adds missing book titles. The only titles 
that I am aware I have not done are the psalm books I-V.
    While I may have made mistakes, I have verified each of these 
against the "original".
    I have preserved the case and punctuation of the original, but have 
not attempted to add line breaks to book titles.
    The only encoding of titles that I am not sure about are the Psalm 
119 ALEPH., BETH. ... titles.
        They should print before the verse.
        I have made these titles within the verse, as the nature of an 
OSIS title is that it titles the element that "contains" it.
        In this fashion, they are all subType="x-preverse", but I have 
not marked them as such.
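
As a rough illustration of the encoding described above (a sketch, not 
taken from the module source), the following Python builds a verse 
element with the acrostic title inside it. The helper name, the osisID 
value and the truncated verse text are assumptions for the example; the 
subType="x-preverse" attribute is only emitted when asked for, since the 
beta does not mark the titles that way.

```python
import xml.etree.ElementTree as ET


def verse_with_title(osis_id, title_text, verse_text, mark_preverse=False):
    """Build a <verse> whose first child is a <title> (hypothetical helper)."""
    verse = ET.Element("verse", {"osisID": osis_id})
    title = ET.SubElement(verse, "title")
    if mark_preverse:
        # Explicit marking; the beta leaves this attribute off.
        title.set("subType", "x-preverse")
    title.text = title_text
    # The verse text follows the title inside the same <verse> element.
    title.tail = verse_text
    return ET.tostring(verse, encoding="unicode")


unmarked = verse_with_title("Ps.119.1", "ALEPH.",
                            "Blessed are the undefiled in the way,")
marked = verse_with_title("Ps.119.1", "ALEPH.",
                          "Blessed are the undefiled in the way,",
                          mark_preverse=True)
print(unmarked)
print(marked)
```

Either way the title is contained by the verse; a front-end would still 
need to know to print it before the verse number.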

This one also starts the fixing of hyphenated names.
    The KJV2003 edition is fairly uniform in not having hyphenated names.
    However, every printed copy of the KJV that I have uses them.
    My take is that we need to preserve the "jots" and "tittles" so I am 
adding these back.
    So, I have taken a list of names that I got from Tim Lanfear (who 
did the CCEL KJV module) as a start.
    I have changed all of them according to his list and am now 
validating them, verse by verse. (I am in the B's. That's why I say this 
is a start.)
    Interestingly, I have found that a name is not uniformly hyphenated 
(e.g. Abi-ezer is hyphenated about half the time; Beth-lehem is 
hyphenated in the OT but not the NT).
    I am using an en-dash (U+2013) to encode the hyphen.
    We may want to change the SWORD engine to handle hyphenated words 
better. (e.g. lucene indexing and searching)
    A couple of interesting things I have just found out:
       Tim Lanfear pointed out to me that in the Hebrew the hyphen is a 
special character. Some of the English hyphenated names are separate 
words in the Hebrew.
       Strong's may have more than one number for a hyphenated word, 
with each part having its own (e.g. Bar-jona and Bar-jesus in the NT).
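
The first pass over the name list could look something like the Python 
below. This is only a sketch under assumptions: the function names and 
the sample sentence are made up for illustration, and the names are the 
ones mentioned in this message rather than the full list. It replaces 
the run-together form of each name with the en-dash form and counts the 
changes; because names are not hyphenated uniformly, every change would 
still need checking verse by verse.

```python
import re

EN_DASH = "\u2013"  # the character used to encode the hyphen


def build_patterns(hyphenated_names):
    """Map each run-together name (e.g. Abiezer) to its en-dash form."""
    patterns = []
    for name in hyphenated_names:
        parts = name.split("-")
        joined = "".join(parts)
        pattern = re.compile(r"\b" + re.escape(joined) + r"\b")
        patterns.append((pattern, EN_DASH.join(parts)))
    return patterns


def restore_hyphens(text, patterns):
    """Return the rewritten text and the number of replacements made."""
    changes = 0
    for pattern, replacement in patterns:
        text, n = pattern.subn(replacement, text)
        changes += n
    return text, changes


names = ["Abi-ezer", "Beth-lehem", "Bar-jona", "Bar-jesus"]
patterns = build_patterns(names)
sample = "and Abiezer was gathered after him."  # illustrative sample text
fixed, count = restore_hyphens(sample, patterns)
print(fixed, count)
```

Keeping the change count makes it easier to compare against the number 
of verses that then have to be validated by hand.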

Next steps:
    Finish validating the hyphenated names. (this may span one or more 
betas)
    See how lucene handles the indexing of hyphenated words using an 
en-dash and a minus, and report the results here.
        (I am thinking that a minus is seen as a word break but an 
en-dash is not.)
    Fix the <divineName> encodings. Sometimes these encompass more than 
just the divine name.
        Also, the print versions typically use "small-caps" and render 
Lord not LORD with it. This appears to have been a tradition since the 
1611 printing.
        I think it has been the tradition of etexts to use LORD as a way 
of rendering small caps.
        But with the explicit markup of OSIS this is not necessary.
        However it may be necessary for the front-ends to change to 
accommodate this.
        It might also be nice to change the SWORD engine to mark in the 
lucene index the verses containing the divine name and allow searching 
on LORD (i.e. some i18n marker, e.g. HERR auf Deutsch, SEIGNEUR en 
français, SEÑOR en español...) to find those verses.
    Validate the paragraph marks. (There are more than there should be.)
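
To make the en-dash versus minus question concrete, here is a toy 
tokenizer in Python. It is not lucene and says nothing about what lucene 
actually does; it simply hard-codes the guess above (a minus breaks a 
word, an en-dash does not) so the consequences for searching are visible. 
The names tokenize and search_key are invented for this sketch.

```python
import re

EN_DASH = "\u2013"

# Assumed rule for this toy: word characters joined by an en dash stay in
# one token; a hyphen-minus (U+002D) is not matched, so it splits the word.
TOKEN = re.compile(r"\w+(?:" + EN_DASH + r"\w+)*")


def tokenize(text):
    """Split text into lower-cased tokens under the assumed rule."""
    return [t.lower() for t in TOKEN.findall(text)]


def search_key(token):
    """Drop the en dash so hyphenated and plain spellings share a key."""
    return token.replace(EN_DASH, "")


with_minus = tokenize("Beth-lehem")
with_en_dash = tokenize("Beth" + EN_DASH + "lehem")
print(with_minus)
print(with_en_dash)
print([search_key(t) for t in with_en_dash])
```

Under this rule the minus form is indexed as two words and the en-dash 
form as one, so a search for "bethlehem" would miss both unless the index 
also stores a dash-free key like the one search_key produces.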

And many thanks to those who have e-mailed me lists of verses that need 
to be fixed.
Special thanks to Tim Lanfear for his detailed feedback and Terry Biggs 
for the SWORD engine changes.


   


