[sword-devel] NET markup problems

Ben Morgan benpmorgan at gmail.com
Sun Jul 20 23:54:56 MST 2008


Hi,

I was looking at using the ElementTree parser in Python to quickly pull out
a more or less plain-text version of a module for search indexing.
Incidentally, it is quite a bit faster than calling striptext: on the ESV
and KJV, it took about 80% of the time striptext takes.
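
A rough sketch of the kind of extraction described above, not the actual
script used for the timings. The function name plain_text, the dummy
<verse> wrapper, and the choice to drop the contents of <note> elements are
all assumptions made for illustration:

```python
import xml.etree.ElementTree as ET


def plain_text(raw_entry, skip_tags=("note",)):
    """Return the text of an OSIS fragment with the markup removed."""
    # A raw entry can hold several top-level nodes, so wrap it in a
    # dummy root element before handing it to the parser.
    root = ET.fromstring("<verse>" + raw_entry + "</verse>")
    parts = []

    def walk(elem):
        # Skip the text and children of elements such as <note>.
        if elem.tag not in skip_tags:
            if elem.text:
                parts.append(elem.text)
            for child in elem:
                walk(child)
        # The tail is the text following the element, so it belongs to
        # the parent and is kept even when the element itself is skipped.
        if elem.tail:
            parts.append(elem.tail)

    walk(root)
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join("".join(parts).split())
```

This only works when the entry is well-formed once wrapped, which is
exactly where the NETfree text causes trouble.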

I ran into problems trying it on the NETfree module, however: there seem to
be trailing OSIS tags at the end of books.
For example, from Genesis 50:26:
'So Joseph died at the age of 110.<note osisRef="Gen.50.26" n="33"></note>
After they embalmed him, his body<note osisRef="Gen.50.26" n="34"></note>
was placed in a coffin in Egypt.<milestone type="line" /><milestone
type="line" /> </div> *<chapter eID="Gen.50"/></div>*'

The last two tags (in bold) shouldn't be there: they are not matched by
anything, and removing them allows parsing to work.
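
To illustrate, here is a minimal sketch of that check. The text is
abbreviated, and the leading <div> is only an assumed stand-in for the
opening tag in the chapter heading; the names is_well_formed, BOOK_END and
TRAILING are made up for the example:

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """True if the fragment parses once wrapped in a dummy root element."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False


# Abbreviated end of Genesis. The opening <div> is a placeholder
# (an assumption) for the tag that really lives in the chapter heading.
BOOK_END = (
    '<div>was placed in a coffin in Egypt.<milestone type="line" />'
    '<milestone type="line" /> </div> <chapter eID="Gen.50"/></div>'
)

# The two trailing tags that have nothing to match.
TRAILING = '<chapter eID="Gen.50"/></div>'

# Drop the trailing tags if present, leaving the rest untouched.
trimmed = BOOK_END[:-len(TRAILING)] if BOOK_END.endswith(TRAILING) else BOOK_END

print(is_well_formed(BOOK_END))  # parse fails on the extra </div>
print(is_well_formed(trimmed))   # parses once the trailing tags are gone
```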

The third-to-last tag, a closing div, matches a tag in the chapter heading.
Is the raw entry of a verse meant to be valid XML by itself? If so, this
entry is invalid on that count as well.
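
One way to see which tags in a single raw entry have no partner inside that
entry is a small stack-based scan. This is a sketch under assumptions, not
a real parser: the regex and the name unmatched_tags are made up here, and
it will be confused by a ">" inside an attribute value, by comments, or by
CDATA. It also checks XML nesting only, so a milestone-style tag such as
<chapter eID="..."/> is not flagged, since it is well-formed on its own.

```python
import re

# Matches opening, closing and self-closing tags; captures the leading
# slash, the tag name and the trailing slash.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)[^<>]*?(/?)>")


def unmatched_tags(fragment):
    """Return (stray_closers, unclosed_openers) as lists of tag names."""
    stack = []
    stray_closers = []
    for match in TAG_RE.finditer(fragment):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # <milestone ... /> opens and closes itself
        if closing:
            if stack and stack[-1] == name:
                stack.pop()
            else:
                stray_closers.append(name)  # no opener in this entry
        else:
            stack.append(name)
    return stray_closers, stack


GEN_50_26 = (
    'So Joseph died at the age of 110.<note osisRef="Gen.50.26" n="33"></note> '
    'After they embalmed him, his body<note osisRef="Gen.50.26" n="34"></note> '
    'was placed in a coffin in Egypt.<milestone type="line" /><milestone '
    'type="line" /> </div> <chapter eID="Gen.50"/></div>'
)

print(unmatched_tags(GEN_50_26))
```

On the Genesis 50:26 entry this reports both closing div tags as having no
opener within the entry, and nothing left open.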

God Bless,
Ben
-------------------------------------------------------------------------------------------
The Lord is not slow to fulfill his promise as some count slowness,
but is patient toward you, not wishing that any should perish,
but that all should reach repentance.
2 Peter 3:9 (ESV)