[sword-devel] KJV2006 Project
DM Smith
dmsmith555 at yahoo.com
Thu Feb 23 10:50:41 MST 2006
First, check whether the bug has already been entered.
If it has, perhaps add a comment to it with specific examples.
Here is the basic game plan.
I am extracting the module into books, where each book is supposedly a
well-formed OSIS fragment. That is, each has the form:
<div type="book" osisID="...">
book goes here
</div>
I am then going to run this through a SAX parser to identify the books
that are not well-formed.
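
As a rough illustration of that well-formedness pass, here is a minimal sketch in Python using the standard library's SAX interface. It is not the tool I am actually writing; the function names and the idea of holding each book's text in a dict keyed by book name are assumptions made for the example.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def check_well_formed(xml_text):
    """Return None if xml_text parses cleanly, else a short error description."""
    try:
        # A bare ContentHandler ignores every event; only parse errors matter here.
        xml.sax.parseString(xml_text.encode("utf-8"), ContentHandler())
    except xml.sax.SAXParseException as err:
        return "line %s, column %s: %s" % (
            err.getLineNumber(), err.getColumnNumber(), err.getMessage())
    return None


def report(books):
    """Map each book that is not well-formed to its first parse error.

    `books` is a hypothetical dict of book name -> OSIS fragment text.
    """
    problems = {}
    for name, text in books.items():
        problem = check_well_formed(text)
        if problem is not None:
            problems[name] = problem
    return problems
```

A SAX parser stops at the first error in a document, so each book would need to be re-checked after every fix.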
For those that are not well-formed, I am hoping to do basic cleanup via
perl to get them to be well-formed.
This will handle obvious global edits like <note .... />...</note>
becomes <note ....>...</note>.
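
The note edit above could look roughly like the following. I expect to do the real cleanup in perl; this is only a Python sketch of the same substitution, and the pattern and function name are assumptions, not the final script. It rewrites a self-closed note only when a stray </note> follows before any other note start tag, so a legitimately empty note element is left alone.

```python
import re

# A self-closed <note .../> that is nevertheless followed by text and a stray </note>.
# The negative lookahead stops the match from running across another <note start tag.
BROKEN_NOTE = re.compile(
    r"<note\b([^<>]*?)\s*/>((?:(?!<note\b).)*?)</note>",
    re.DOTALL,
)


def fix_self_closed_notes(text):
    """Turn <note attrs/>body</note> into <note attrs>body</note>."""
    return BROKEN_NOTE.sub(r"<note\1>\2</note>", text)
```

The result should still go back through the well-formedness check, since a regex cannot see every way the markup might be broken.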
If any book is too big, I'll subdivide it into chapters and work at the
chapter level.
Once all the books are well-formed, I plan to validate them against the
OSIS 2.1 schema.
Here again, I'll probably make any obvious global edits in perl.
Once this is done, I'll check everything into SVN, book by book, with
long ones perhaps subdivided into chapters, and open it up for help.
Right now, I have it split out by book and have started to write some
basic validation tools. I may be done with that by the start of next week.
Martin Gruner wrote:
> Lynn,
>
> thanks for pointing this out.
> To be sure that it won't be forgotten, please file a bug at crosswire.org/bugs
> for the "modules" project.
>
> mg
>
> Am Donnerstag, 23. Februar 2006 18:10 schrieb L.Allan-pbio:
>
>>>> I know zilch Greek or Hebrew, but could perhaps help with cleaning up
>>>> the redundant/flawed tags in KJV .... there is a verse that is over
>>>> 10,000 chars long, (Mark 1:9?) and several over 4,000 tags long.
>>>>
>>> Stay tuned. With Troy's help, I should have the work area set up before
>>> too long with the KJV by book, perhaps chapter.
>>>
>> I took a look at the KJV rawtext from the compressed module, and found 208
>> verses whose length is over 2500 characters. All of these are in the NT.
>> There are over 900 NT verses that are over 2000 chars long.
>>
>> Not sure if this helps, but here is a link:
>> http://lcdbible.sf.net/misc/VeryLongKjvVerses_2500.zip
>>
>> Mark 1:9 is over 15,000 characters, and something is clearly incorrect. The
>> pattern "w src morph" is repeated about 1000 times within the same verse:
>> # 36: BCV= Mark 1:9 Len:15329
>> <w src="1" lemma="x-Strongs:G2532" morph="x-Robinson:CONJ">And</w>
>> <w src="2" lemma="x-Strongs:G1096" morph="x-Robinson:V-2ADI-3S">it came to
>> pass</w>
>> <w src="3" lemma="x-Strongs:G1722" morph="x-Robinson:PREP">in</w>
>> <w src="4" lemma="x-Strongs:G1565" morph="x-Robinson:D-DPF">those</w>
>> <w src="6" lemma="x-Strongs:G2250" morph="x-Robinson:N-DPF">days</w>, that
>> <w src="8" lemma="x-Strongs:G2424" morph="x-Robinson:N-NSM">Jesus</w>
>> <w src="7" lemma="x-Strongs:G2064" morph="x-Robinson:V-2AAI-3S">came</w>
>> <w src="9" lemma="x-Strongs:G575" morph="x-Robinson:PREP">from</w>
>> <w src="10" lemma="x-Strongs:G3478" morph="x-Robinson:N-PRI">Nazareth</w>
>> <w src="12" lemma="x-Strongs:G1056" morph="x-Robinson:N-GSF">of
>> Galilee</w>, <w src="13" lemma="x-Strongs:G2532"
>> morph="x-Robinson:CONJ">and</w> <w src="14" lemma="x-Strongs:G907"
>> morph="x-Robinson:V-API-3S">was baptized</w>
>> <w src="15" lemma="x-Strongs:G5259" morph="x-Robinson:PREP">of</w>
>> <w src="16" lemma="x-Strongs:G2491" morph="x-Robinson:N-GSM">John</w>
>> <w src="17" lemma="x-Strongs:G1519" morph="x-Robinson:PREP">in</w> <w src
>> morph w src morph w src morph w
>> src morph w src w src morph w src morph w src morph w src morph w
>> w src morph w src morph w src morph w src morph w src w src morph w src
>> morph w src morph w src morph w src
>> w src morph w src morph w src morph w src morph w src w src morph w src
>> morph w src morph w src morph w src
>> *********** repeats ************
>> *********** about ************
>> *********** 300 ************
>> *********** lines ************
>> w src morph w src morph w src morph w src morph w src w src morph w src
>> morph w src morph w src morph w src="20" w src morph w src morph w src
>> morph w src morph="x-Robinson:N-ASM" lemma="x-Strongs:G2446">
>> <w src="19" lemma="x-Strongs:G2446"
>> morph="x-Robinson:N-ASM">Jordan</w></w>. <w src="5" lemma="x-Strongs:G3588"
>> morph="x-Robinson:T-DPF"></w>
>> <w src="11" lemma="x-Strongs:G3588" morph="x-Robinson:T-GSF"></w>
>> <w src="18" lemma="x-Strongs:G3588" morph="x-Robinson:T-ASM"></w><resp
>> type="strongsMarkup" name="rkr" date="2002-11-30-21:45"/>
>>
>> I noticed there was very significant repetition of "x-Strongs:G3588" in a
>> lot of verses, but I don't understand enough about osis markup to know if
>> that is an error. Here is an example:
>> # 2: BCV= Matthew 2:13 Len: 3012
>> <w src="38" lemma="x-Strongs:G3588" morph="x-Robinson:T-GSM"></w>
>> <w src="36" lemma="x-Strongs:G3588" morph="x-Robinson:T-ASN"></w>
>> <w src="18" lemma="x-Strongs:G3588" morph="x-Robinson:T-ASF"></w>
>> <w src="15" lemma="x-Strongs:G3588" morph="x-Robinson:T-ASN"></w>
>> <w src="10" lemma="x-Strongs:G3588" morph="x-Robinson:T-DSM"></w>
>> <w src="2" lemma="x-Strongs:G1161" morph="x-Robinson:CONJ">And when</w>
>> <w src="3" lemma="x-Strongs:G846" morph="x-Robinson:P-GPM">they</w>
>> <w src="1" lemma="x-Strongs:G402" morph="x-Robinson:V-AAP-GPM">were
>> departed,</w>
>> <w src="4" lemma="x-Strongs:G2400" morph="x-Robinson:V-2AAM-2S">behold,</w>
>> <w src="5" lemma="x-Strongs:G32" morph="x-Robinson:N-NSM">the angel</w>
>> <w src="6" lemma="x-Strongs:G2962" morph="x-Robinson:N-GSM">of the Lord</w>
>> <w src="7" lemma="x-Strongs:G5316"
>> morph="x-Robinson:V-PEI-3S">appeareth</w> <w src="11"
>> lemma="x-Strongs:G2501" morph="x-Robinson:N-PRI">to Joseph</w> <w src="8"
>> lemma="x-Strongs:G2596" morph="x-Robinson:PREP">in</w>
>> <w src="9" lemma="x-Strongs:G3677" morph="x-Robinson:N-OI">a dream,</w>
>> <w src="12" lemma="x-Strongs:G3004"
>> morph="x-Robinson:V-PAP-NSM">saying,</w> <w src="13"
>> lemma="x-Strongs:G1453" morph="x-Robinson:V-APP-NSM">Arise,</w> <w src="14"
>> lemma="x-Strongs:G3880" morph="x-Robinson:V-2AAM-2S">and take</w>
>> <w src="16" lemma="x-Strongs:G3813" morph="x-Robinson:N-ASN">the young
>> child</w>
>> <w src="17" lemma="x-Strongs:G2532" morph="x-Robinson:CONJ">and</w>
>> <w src="20" lemma="x-Strongs:G846" morph="x-Robinson:P-GSM">his</w>
>> <w src="19" lemma="x-Strongs:G3384" morph="x-Robinson:N-ASF">mother,</w>
>> <w src="21" lemma="x-Strongs:G2532" morph="x-Robinson:CONJ">and</w>
>> <w src="22" lemma="x-Strongs:G5343" morph="x-Robinson:V-PAM-2S">flee</w>
>> <w src="23" lemma="x-Strongs:G1519" morph="x-Robinson:PREP">into</w>
>> <w src="24" lemma="x-Strongs:G125" morph="x-Robinson:N-ASF">Egypt,</w>
>> <w src="25" lemma="x-Strongs:G2532" morph="x-Robinson:CONJ">and</w>
>> <w src="26" lemma="x-Strongs:G2468" morph="x-Robinson:V-PXM-2S">be thou</w>
>> <w src="27" lemma="x-Strongs:G1563" morph="x-Robinson:ADV">there</w>
>> <w src="28" lemma="x-Strongs:G2193" morph="x-Robinson:CONJ">until</w>
>> <w src="29" lemma="x-Strongs:G302" morph="x-Robinson:PRT"></w>
>> <w src="30" lemma="x-Strongs:G2036" morph="x-Robinson:V-2AAS-1S">I
>> bring</w> <w src="31" lemma="x-Strongs:G4671"
>> morph="x-Robinson:P-2DS">thee</w> <w src="30" lemma="x-Strongs:G2036"
>> morph="x-Robinson:V-2AAS-1S"
>> splitID="41">word:</w>
>> <w src="33" lemma="x-Strongs:G1063" morph="x-Robinson:CONJ">for</w>
>> <w src="34" lemma="x-Strongs:G2264" morph="x-Robinson:N-NSM">Herod</w>
>> <w src="32" lemma="x-Strongs:G3195" morph="x-Robinson:V-PAI-3S">will</w>
>> <w src="35" lemma="x-Strongs:G2212" morph="x-Robinson:V-PAN">seek</w>
>> <w src="37" lemma="x-Strongs:G3813" morph="x-Robinson:N-ASN">the young
>> child</w>
>> <w src="39" lemma="x-Strongs:G622" morph="x-Robinson:V-AAN">to destroy</w>
>> <w src="40" lemma="x-Strongs:G846" morph="x-Robinson:P-ASN">him.</w><resp
>> type="strongsMarkup" name="pdy" date="2003-12-14-08:43"/>
>>
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
>>