Karl,<br><br>That is an astounding script. Amazingly done! I haven't tested it, as I don't have Clarke's installed, but it seems that if the Sword lib is mishandling the & character and the <br /> tag, then the problem really lies within Sword and should be fixed there, ASAP. Excellent sed-ing, though!
<br><br>--Greg<br><br><div><span class="gmail_quote">On 9/24/06, <b class="gmail_sendername">Karl Kleinpaste</b> <<a href="mailto:karl@charcoal.com">karl@charcoal.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
The nasty little script below takes the current Clarke content and<br>strips the extraneous <br /> elements out in a coherent fashion. This<br>makes the Clarke content actually readable, as opposed to its current<br>
state, which (unless you allow for a very wide commentary subwindow)<br>is thoroughly unreadable.<br><br>Along the way, it also converts his (excessive) use of "&c." into<br>"etc.", which makes some sections work that do not work under the
<br>current Clarke incarnation. Cf. James 5:20, ¾ down, a paragraph<br>beginning, "1. I have already conjectured...", and observe odd<br>paragraph break and grammatical failure -- Sword libs are not<br>preserving `&' properly; proper content is present, but it's simply
<br>not handled properly. See also Gen 1:11, for which Clarke displays<br>nothing at all in WinSword/BibleCS, even though there is content.<br>(GnomeSword displays Clarke's Gen 1:11 content, but incompletely so.)<br><br>
#!/bin/sh -x<br>mod2imp Clarke |<br>sed -e 's|&c\.|etc.|g' \<br> -e 's|\([A-Za-z0-9€-ÿ),.?!:;"]\)<br /> \([A-Za-z0-9€-ÿ(,.?!:;"]\)|\1 \2|g' \<br> -e 's|</i><br /> \+<i>| |g' \<br>
-e 's|\([A-Za-z0-9€-ÿ),.?!:;"]\) \?<br /> <\([is]\)|\1 <\2|g' \<br> -e 's|\([fi]\)><br /> \([A-Za-z0-9€-ÿ(,.?!:;"]\)|\1> \2|g' \<br> -e 's|]<br /> |] |'g \<br> -e 's|<br /> \[| [|'g |
<br>imp2vs /dev/stdin . 2>&1 | egrep -v '^from file: |^adding entry: '<br>chmod go+r nt nt.vss ot ot.vss<br>exit 0<br><br>The modified clarkenobr.conf I'm using:<br><br>[ClarkeNoBr]<br>DataPath=./modules/comments/rawcom/clarke-nobr/
<br>ModDrv=rawCom<br>Lang=en<br>Encoding=UTF-8<br>SourceType=ThML<br>Description=Adam Clarke's Commentary on the Bible (without forced line breaks)<br>About=Adam Clarke's 1810/1825 commentary and critical notes on the Bible, with forced line breaks removed.
<br>LCSH=Bible. Commentaries.<br>DistributionLicense=Public Domain<br><br>--karl<br><br>_______________________________________________<br>sword-devel mailing list: <a href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org
</a><br><a href="http://www.crosswire.org/mailman/listinfo/sword-devel">http://www.crosswire.org/mailman/listinfo/sword-devel</a><br></blockquote></div><br>
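For anyone who wants to see what the quoted sed expressions do without having Clarke installed, here is a minimal sketch that runs three of them over a single made-up line. The sample text and the `strip_br` function name are hypothetical, not taken from the module or from Karl's script, and the high-byte character range from the original bracket expressions is left out so the example behaves the same in any locale.

```shell
#!/bin/sh
# Hypothetical sample line in the style of the Clarke markup (not real module text).
sample='Verse 1. In the beginning<br /> God created, &c.<br /> [See the note]'

# strip_br: a cut-down version of the quoted pipeline (illustrative name).
#   1. rewrite "&c." as "etc."
#   2. drop a forced break that sits between two ordinary text characters
#   3. drop a forced break that precedes an opening square bracket
strip_br() {
  sed -e 's|&c\.|etc.|g' \
      -e 's|\([A-Za-z0-9),.?!:;"]\)<br /> \([A-Za-z0-9(,.?!:;"]\)|\1 \2|g' \
      -e 's|<br /> \[| [|g'
}

printf '%s\n' "$sample" | strip_br
# prints: Verse 1. In the beginning God created, etc. [See the note]
```

The second expression leaves the break before `[See the note]` alone because `[` is not in its right-hand character class; the third expression is what removes that break.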