[sword-devel] KJV2006 progress report

DM Smith dmsmith555 at yahoo.com
Tue Mar 7 11:23:25 MST 2006


I thought I'd let everyone know where the KJV2006 project stands. So far 
it has been a solo effort, not because I don't want help, but because it 
has all been prep work so far.

I have dumped the KJV2003, one file per book, into
    www.crosswire.org/svn/modules/KJV/trunk/text
and have created a tag for it at
    www.crosswire.org/svn/modules/KJV/tags/kjv2003.

Each book is named according to its OSIS book name and suffixed with 
.xml. Each is a complete OSIS document, but many are not well-formed XML, 
and of those that are, none is valid OSIS. (In fact, almost every verse 
was not valid! More on that later.)

I have written a program to make the files well-formed and valid (but 
not necessarily good OSIS). The program also checks that the result 
really is well-formed and valid. You can see the program at: 
http://www.crosswire.org/svn/jsword/trunk/jsword/src/main/java/org/crosswire/jsword/examples/ModToOsis.java
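
To illustrate what the well-formedness half of that check involves, here 
is a minimal Python sketch. It is not ModToOsis (which is Java) and it 
does not validate against the OSIS schema; the function name and the 
sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses as XML, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical samples, not taken from the KJV2003 files.
good = '<osis><verse osisID="Gen.1.1">In the beginning</verse></osis>'
overlapping = '<osis><a><b></a></b></osis>'    # elements overlap
no_value = '<osis><w lemma>God</w></osis>'     # attribute without a value

for name, sample in [("good", good), ("overlapping", overlapping),
                     ("no_value", no_value)]:
    problem = check_well_formed(sample)
    print(name, "OK" if problem is None else "NOT well-formed: " + problem)
```

A file that passes this test can still be invalid OSIS; checking validity 
needs a schema-aware validator on top of the parse.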

I am using this program to make global changes to the files. Troy has 
asked me to make the global changes before checking in the files. So 
when I get a few more questions answered and finish Troy's global change 
requests, I'll check in everything. I'll also see about creating a 
module for the beta area.

Then we can start fixing text problems.

Here is a summary of the changes:
1) fix the <note/>...</note> problem
2) replace <p/> (not allowed under OSIS) with <pb/>
3) On <w> elements, replace x-Strongs: with strong: and x-Robinson: with 
robinson: (OSIS does not like the x- as a prefix to a work id)
4) On <w> elements, changed splitID="n"  to type="x-split" subType="x-n"
5) On <w> elements removed the attributes without any values. (XML 
requires attributes to have values)
6) revert x-preverse to an enclosing
   <div type="section"><title>...</title>.......</div>
7) changed type="transChanged" subType="type:added" to 
type="x-transChanged" subType="x-added"
    This was used in the following construct:
         <w>...<seg type="x-transChange" subType="x-added">...</seg> ... </w>
    OSIS requires x- prefix for both type and subType for <seg> elements.
8) deleted all <resp> elements, as <resp> has never been an element in 
the OSIS standard; resp is a global attribute. I could merge it into the 
preceding <note type="x-strongsMarkup">....</note> as an attribute. 
However, I think these "notes" should be removed as well.
9) Fixed 81 verses that had improperly specified <w> elements, either 
nested or containing <transChange>.
10) Fixed a few locations where xml elements overlapped as in <a><b></a></b>
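
Changes 2, 3 and 4 above are simple textual rewrites. A minimal Python 
sketch of them follows; it is not the code in ModToOsis, the function 
name is hypothetical, and the sample <w> element and its attribute 
values are invented for illustration. It assumes the x-Strongs: and 
x-Robinson: prefixes occur only inside <w> attributes.

```python
import re


def apply_global_changes(text):
    """Apply changes 2, 3 and 4 from the list above to a string of markup."""
    # 3) drop the x- prefix from the work ids
    text = text.replace("x-Strongs:", "strong:")
    text = text.replace("x-Robinson:", "robinson:")
    # 4) splitID="n" becomes type="x-split" subType="x-n"
    text = re.sub(r'splitID="([^"]*)"',
                  r'type="x-split" subType="x-\1"', text)
    # 2) empty <p/> becomes <pb/>
    text = re.sub(r'<p\s*/>', '<pb/>', text)
    return text


# Invented sample, not a line from the KJV2003 files.
before = ('<p/><w lemma="x-Strongs:G746" morph="x-Robinson:N-NSF" '
          'splitID="1">beginning</w>')
after = apply_global_changes(before)
print(after)
```

Doing this on the raw text rather than on a parsed tree matters here, 
because many of the input files are not well-formed and would not parse.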

Next steps:
1) merge empty indefinite articles to their following element as 
requested by Troy.
2) Get rid of <note type="x-strongsMarkup">...</note>. These contain 
notes from the taggers of the KJV2003 project regarding the tagging. 
They also contain URL escape sequences, and they look really bad when 
they show up to an end user.
3) Create a module with these changes for beta testing. It may be that 
we need to change the markup to what the SWORD API expects. If so, I 
recommend using xslt just before making the module.
4) Open up the effort for others to identify and correct problems.
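
Next step 2 could be sketched along these lines in Python. This is only 
an illustration under assumptions: the notes are taken to be unnested, 
the pattern accepts both "x-strongsMarkup" and "x-strongMarkup" as the 
type value, and the function name and sample verse are made up.

```python
import re

# Matches a whole tagger note, including its content. Assumes these notes
# never contain another <note> element.
NOTE_RE = re.compile(
    r'<note\b[^>]*\btype="x-strongs?Markup"[^>]*>.*?</note>',
    re.DOTALL,
)


def strip_markup_notes(text):
    """Remove tagger notes and leave all other markup untouched."""
    return NOTE_RE.sub("", text)


# Invented sample, not a line from the KJV2003 files.
sample = ('<verse osisID="Gen.1.1"><w>In the beginning</w>'
          '<note type="x-strongsMarkup">tagger%20comment</note>'
          ' God</verse>')
cleaned = strip_markup_notes(sample)
print(cleaned)
```

Notes of any other type are left alone, so ordinary study notes survive 
the pass.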

I'm thinking that we might want to have 2 releases: an initial one that 
has the fixes that I have done so far plus #1 and #2 from the next steps, 
then one that contains the fixes for the missing 's in the OT and any 
other problems that are found.

I also want to experiment with using
    <q sID="xxx" who="Jesus"/> ... <q eID="xxx"/>
to see if SWORD for Windows can handle it correctly. If so, I am 
inclined to change all quotes to this form. (feedback desired)
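
For anyone who wants to try the milestone form on a test file, here is a 
minimal Python sketch of the conversion. It is an assumption-laden 
illustration, not part of ModToOsis: the generated ids ("q1", "q2", ...) 
and the function name are invented, the input is assumed to use properly 
paired <q>...</q> containers, and attribute values are assumed not to 
contain "/" or ">".

```python
import re
from itertools import count

# Container-style opening tag (captures its attributes) or a closing tag.
# Existing milestones such as <q sID="..."/> do not match and are kept.
Q_TAG = re.compile(r'<q\b([^>/]*)>|</q>')


def milestone_quotes(text):
    """Rewrite <q ...>...</q> containers as sID/eID milestone pairs."""
    next_id = count(1)
    open_ids = []  # stack, so nested quotes pair up correctly

    def repl(match):
        if match.group(0) == "</q>":
            return '<q eID="%s"/>' % open_ids.pop()
        qid = "q%d" % next(next_id)
        open_ids.append(qid)
        return '<q sID="%s"%s/>' % (qid, match.group(1))

    return Q_TAG.sub(repl, text)


# Invented sample, not a line from the module.
sample = '<verse osisID="John.3.3"><q who="Jesus">Verily, verily</q></verse>'
converted = milestone_quotes(sample)
print(converted)
```

The stack is there only so that a quote inside a quote gets its own 
matching sID/eID pair.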
