<div dir="ltr">Hi,<br><br>I was looking at using the ElementTree parser in Python to quickly pull out a more or less plain-text version of a module for search indexing.<br>Incidentally, it is quite a bit faster than calling striptext: on the ESV and KJV, it took about 80% of the time striptext takes.<br>
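As a rough illustration of the approach, here is a minimal sketch, assuming each raw entry is wrapped in a dummy root element before parsing. The function name plain_text and the wrapper element are hypothetical, and how the raw entry is read from the module is not shown. Any text inside notes would also be included by this sketch.<br>

```python
import xml.etree.ElementTree as ET


def plain_text(raw_entry):
    """Return the text of one raw OSIS entry with all markup dropped."""
    # Wrap in a dummy root (an assumption of this sketch) so an entry
    # made of text plus several sibling elements still parses.
    root = ET.fromstring("<entry>" + raw_entry + "</entry>")
    # itertext() yields element text and tail strings in document order.
    text = "".join(root.itertext())
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


sample = (
    'So Joseph died at the age of 110.<note osisRef="Gen.50.26" n="33"></note>'
    ' After they embalmed him, his body<note osisRef="Gen.50.26" n="34"></note>'
    ' was placed in a coffin in Egypt.<milestone type="line" />'
)
print(plain_text(sample))
```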
<br>I ran into problems trying it on the NETfree, however: there seem to be trailing OSIS tags at the end of books.<br>For example, from Genesis 50:26:<br>'So Joseph died at the age of 110.<note osisRef="Gen.50.26" n="33"></note> After they embalmed him, his body<note osisRef="Gen.50.26" n="34"></note> was placed in a coffin in Egypt.<milestone type="line" /><milestone type="line" /> </div> <b><chapter eID="Gen.50"/></div></b>'<br>
<br>The last two tags (in bold) shouldn't be there: they are not matched anywhere, and removing them allows parsing to work.<br><br>The third-to-last tag, a closing div, matches a tag in the heading of the chapter. Is the raw entry of a verse meant to be valid XML when taken by itself? If so, this is also invalid.<br>
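To make the question concrete, here is a small sketch that tests whether a single raw entry is well-formed on its own. It assumes the entry is wrapped in a dummy root element; is_well_formed and the variable names are hypothetical. Taken in isolation, the entry only parses once all three trailing tags are gone.<br>

```python
import xml.etree.ElementTree as ET


def is_well_formed(raw_entry):
    """Check whether one raw entry parses as XML when taken by itself."""
    try:
        # Dummy root element: an assumption of this sketch.
        ET.fromstring("<entry>" + raw_entry + "</entry>")
    except ET.ParseError:
        return False
    return True


verse = (
    'So Joseph died at the age of 110.<note osisRef="Gen.50.26" n="33"></note>'
    ' After they embalmed him, his body<note osisRef="Gen.50.26" n="34"></note>'
    ' was placed in a coffin in Egypt.<milestone type="line" /><milestone type="line" />'
)
closing_div = " </div>"                           # matches a div opened in the chapter heading
trailing_tags = ' <chapter eID="Gen.50"/></div>'  # the two tags shown in bold

print(is_well_formed(verse + closing_div + trailing_tags))  # False
print(is_well_formed(verse + closing_div))                  # False
print(is_well_formed(verse))                                # True
```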
<br clear="all">God Bless,<br>Ben<br>-------------------------------------------------------------------------------------------<br>The Lord is not slow to fulfill his promise as some count slowness,<br>but is patient toward you, not wishing that any should perish,<br>
but that all should reach repentance.<br>2 Peter 3:9 (ESV)
</div>