[sword-devel] KJV2006 progress report
DM Smith
dmsmith555 at yahoo.com
Tue Mar 7 11:23:25 MST 2006
I thought I'd let everyone know where the KJV2006 project stands. So far
it has been a solo effort, not because I don't want help, but because it
has all been prep work.
I have dumped the KJV2003, book by book, into
www.crosswire.org/svn/modules/KJV/trunk/text
and have created a tag for it at
www.crosswire.org/svn/modules/KJV/tags/kjv2003.
Each book is named according to its OSIS book name and suffixed with
.xml. Each is a complete OSIS document, but many are not well-formed XML,
and of those that are, none are valid OSIS. (In fact, almost every verse
was invalid! More on that later.)
I have written a program to make the files well-formed and valid (but
not necessarily good OSIS). The program also verifies that the resulting
files are in fact well-formed and valid.
You can see the program at:
http://www.crosswire.org/svn/jsword/trunk/jsword/src/main/java/org/crosswire/jsword/examples/ModToOsis.java
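For readers who want to try the well-formedness half of that check without the Java tooling, here is a minimal Python sketch. It is not the ModToOsis code and does not validate against the OSIS schema; the function name is_well_formed is made up for illustration, and it only reports whether a document parses as XML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, message).

    This checks well-formedness only. Validity against the OSIS schema
    would need a schema-aware validator, which is not shown here.
    """
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        # The message includes the line and column of the first problem.
        return False, str(err)
```

Overlapping elements and attributes without values both surface here as parse errors, so a check like this finds the files that need repair before any schema validation is attempted.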
I am using this program to make global changes to the files. Troy has
asked me to make the global changes before checking in the files. So
when I get a few more questions answered and finish Troy's global change
requests, I'll check in everything. I'll also see about creating a
module for the beta area.
Then we can start fixing text problems.
Here is a summary of the changes:
1) fix the <note/>...</note> problem
2) replace <p/> (not allowed under OSIS) with <pb/>
3) On <w> elements, replace x-Strongs: with strong: and x-Robinson: with
robinson: (OSIS does not like the x- as a prefix to a work id)
4) On <w> elements, changed splitID="n" to type="x-split" subType="x-n"
5) On <w> elements removed the attributes without any values. (XML
requires attributes to have values)
6) revert x-preverse to an enclosing
<div type="section"><title>...</title>.......</div>
7) changed type="transChange" subType="type:added" to
type="x-transChange" subType="x-added"
This was used in the following construct:
<w>...<seg type="x-transChange" subType="x-added">...</seg> ...
</w>
OSIS requires the x- prefix on both type and subType for <seg> elements.
8) deleted all <resp> elements, as <resp> has never been an element in the
OSIS standard (resp is a global attribute). I could merge it with the
preceding <note type="x-strongsMarkup">....</note>. However, I think
these "notes" should be removed as well.
9) Fixed 81 verses that had improperly specified <w> elements, either
nested or containing <transChange>.
10) Fixed a few locations where xml elements overlapped as in <a><b></a></b>
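As an illustration of items 3 to 5, here is a rough Python sketch of the <w> attribute clean-up. It is not the ModToOsis code. It works on raw text with regular expressions because the input files are not guaranteed to parse, and it assumes that no attribute value contains a '>' character. The names fix_w_tag and fix_w_elements are made up for this example.

```python
import re

# Start tag of a <w> element. Assumes no '>' inside attribute values.
W_TAG = re.compile(r'<w(?=[\s/>])([^>]*)>')
# One attribute: a name, optionally followed by ="value" or ='value'.
ATTR = re.compile(r'''\s+([\w:.-]+)(?:\s*=\s*("[^"]*"|'[^']*'))?''')


def fix_w_tag(match):
    """Rebuild one <w ...> start tag with items 3, 4 and 5 applied."""
    body = match.group(1)
    end = '/>' if body.rstrip().endswith('/') else '>'
    parts = []
    for name, value in ATTR.findall(body):
        if not value:
            continue  # item 5: drop attributes that have no value
        if name == 'splitID':
            # item 4: splitID="n" becomes type="x-split" subType="x-n".
            # Caveat: a <w> that already has a type attribute would end
            # up with two; that case is not handled in this sketch.
            parts.append(f' type="x-split" subType="x-{value[1:-1]}"')
            continue
        # item 3: drop the x- prefix from the work ids
        value = value.replace('x-Strongs:', 'strong:')
        value = value.replace('x-Robinson:', 'robinson:')
        parts.append(f' {name}={value}')
    return '<w' + ''.join(parts) + end


def fix_w_elements(text):
    """Apply fix_w_tag to every <w> start tag in text."""
    return W_TAG.sub(fix_w_tag, text)
```

Only the start tags of <w> elements are touched; element content and all other elements pass through unchanged.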
Next steps:
1) merge empty indefinite articles to their following element as
requested by Troy.
2) Get rid of <note type="x-strongsMarkup">...</note>. These contain notes
from the taggers of the KJV2003 project regarding the tagging. They
also contain URL escape sequences, and they look really bad when they
show up to an end user.
3) Create a module with these changes for beta testing. It may be that
we need to change the markup to what the SWORD API expects. If so, I
recommend using XSLT just before making the module.
4) Open up the effort for others to identify and correct problems.
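Step 2 could be sketched along these lines in Python. This is only an illustration under a few assumptions: the notes never nest, type is the first attribute on the <note> start tag, and the type value is passed in as a parameter so that the exact spelling can be adjusted. The function name strip_notes is made up for this example.

```python
import re


def strip_notes(text, note_type="x-strongsMarkup"):
    """Remove every <note type="note_type">...</note> from text.

    Assumes the notes do not nest and that type is the first attribute.
    Notes of any other type are left in place.
    """
    pattern = re.compile(
        r'<note\s+type="' + re.escape(note_type) + r'"[^>]*>.*?</note>',
        re.DOTALL,  # a note may span more than one line
    )
    return pattern.sub('', text)
```

Because the match is non-greedy, two such notes in the same verse are removed separately and the text between them is kept.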
I'm thinking that we might want to have 2 releases. An initial one that
has the fixes that I have done so far and #1 & #2 from next steps. Then
one that contains the fixes for the missing 's in the OT and any other
problems that are found.
I also want to experiment with using <q sID="xxx" who="Jesus"/> ...
<q eID="xxx"/> to see if SWORD for Windows can handle it correctly. If so,
I am inclined to change all quotes to this form. (Feedback desired.)
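To make the proposed quote form concrete, here is a small Python sketch that rewrites container-style <q ...>...</q> elements as sID/eID milestone pairs. It assumes the quotes are currently encoded as ordinary container elements, which is a guess about the starting markup. The id scheme (q1, q2, ...) and the function name to_milestones are invented for this example.

```python
import re

# Matches a <q ...> start tag (possibly self-closing) or a </q> end tag.
Q_TAG = re.compile(r'<q(?=[\s/>])([^>]*?)(/?)>|</q>')


def to_milestones(text, prefix="q"):
    """Rewrite <q ...>...</q> containers as sID/eID milestone pairs."""
    stack = []   # ids of the quotes that are currently open
    counter = 0

    def repl(m):
        nonlocal counter
        if m.group(0) == '</q>':
            if not stack:
                return m.group(0)  # unmatched end tag: leave it alone
            return f'<q eID="{stack.pop()}"/>'
        if m.group(2):
            return m.group(0)      # already an empty (milestone) element
        counter += 1
        qid = f'{prefix}{counter}'
        stack.append(qid)
        # Keep the original attributes (e.g. who="Jesus") on the start.
        return f'<q sID="{qid}"{m.group(1)}/>'

    return Q_TAG.sub(repl, text)
```

The stack pairs each end milestone with the most recently opened quote, so nested quotes get matching ids. Since both milestones are empty elements, a quote can start in one verse and end in another without breaking well-formedness.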