[sword-devel] Updating Clarke commentary to become readable

Karl Kleinpaste karl at charcoal.com
Mon Sep 25 06:20:20 MST 2006


A life theme of mine: "One catastrophe at a time." :-)

Last evening, I said that updating the Clarke content to replace `&'
with "&amp;" made it display fine in WinSword.  I was only half right.

First, the actual `&' characters in Clarke's "&c." do display fine.

But second, in the problem passage (James 5:20), there is a new,
spurious `&' tacked, seemingly arbitrarily, onto the word "author".

The updated module text in question:

    1. I have already conjectured that this epistle ranks among the
    <i>most ancient</i> of the Christian writings; its total want of
    reference to the great facts which distinguish the early history of
    the Church, viz., the calling of the Gentiles, the disputes between
    them and the Jews, the questions concerning circumcision, and the
    obligation of the law in connection with the Gospel &amp;c., &amp;c.,
    shows that it must have been written before those things took place,
    or that they must have been wholly unknown to the author; which is
    incredible, allowing him to have been a <i>Christian</i> writer.

What WinSword/BibleCS actually displays:

    1. I have already conjectured that this epistle ranks among the most
    ancient of the Christian writings; its total want of reference to the
    great facts which distinguish the early history of the Church, viz.,
    the calling of the Gentiles, the disputes between them and the Jews,
    the questions concerning circumcision, and the obligation of the law
    in connection with the Gospel &c., &c., shows that it must have been
    written before those things took place, or that they must have been
    wholly unknown to the author& which is incredible, allowing him to
    have been a Christian writer.

Notice the arrival of a new, arbitrary `&' after "author", replacing
the `;' that was supposed to be there.
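To look at this entry's stored text directly, with no front end in between, a small filter over a mod2imp dump is enough. This is a sketch that assumes mod2imp starts each entry with a `$$$<key>` line; `imp_entry` is a hypothetical helper, and the exact key spelling (e.g. "James 5:20") should be checked against the dump itself.

```shell
#!/bin/sh
# Print the raw text of one entry from imp-format input on stdin.
# Assumes each entry starts with a line "$$$<key>" and runs until the
# next "$$$" line (the layout assumed for mod2imp output).
imp_entry() {
    awk -v key="$1" '
        /^\$\$\$/ { inside = (substr($0, 4) == key); next }
        inside    { print }
    '
}

# Example (module and key names as used in this thread):
#   mod2imp ClarkeNoBr | imp_entry 'James 5:20' | grep 'author'
```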

Now, there is an added complication: GnomeSword makes this improper
`&' disappear, just as it makes whole "&amp;" sequences disappear.
But why would GS be seeing `;' transformed into `&' in the first
place?

That, I think, is the first underlying problem: there seems to be some
earlier filtering, common to WinSword's RTF output and GS' HTML output,
that makes this go wrong rather early in the pipeline.
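One hypothesis that fits the symptom, though not traced through the actual SWORD filter code: if an earlier stage has already decoded `&amp;` back to a bare `&`, a later stage that reads from `&` to the next `;` as one entity would swallow everything from "&c." through "author;", and that `;` is exactly the one that goes missing. The sketch only shows what such a naive scanner would see; `naive_entities` is a made-up helper, not anything in SWORD.

```shell
#!/bin/sh
# Hypothetical illustration only, not SWORD's filter code: print each span
# a naive entity scanner would take as one escape, from '&' to the next ';'.
naive_entities() {
    grep -o '&[^;]*;'
}

# Stored form: the ampersands are still escaped as '&amp;'.
encoded='the Gospel &amp;c., &amp;c., shows that it must have been written before those things took place, or that they must have been wholly unknown to the author; which is incredible'
# The same text after one earlier pass has turned '&amp;' into '&'.
decoded='the Gospel &c., &c., shows that it must have been written before those things took place, or that they must have been wholly unknown to the author; which is incredible'

echo 'encoded:'; printf '%s\n' "$encoded" | naive_entities
echo 'decoded:'; printf '%s\n' "$decoded" | naive_entities
```

On the stored form the scanner finds two short `&amp;` entities; on the once-decoded form it finds a single span running from "&c." to "author;".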

I'll be hunting down the details from GS' side later today.

--karl

PS- I know ThML is technically on its way out, but since it is the
source type of the vast majority of modules (5x as many as OSIS), I'd
really like to fix this.

PPS- Below is a slightly updated script to generate the br-less Clarke,
followed by its .conf.  Alternatively, I can make a .zip available to
anyone who wants to poke at this problem.

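#!/bin/sh -x
# Dump Clarke, escape bare '&' as '&amp;', join lines that were split only
# by forced '<br /> ' breaks, then rebuild the module with imp2vs.
# sed runs in the C locale so that [\x80-\xff] matches the raw high-bit
# bytes of UTF-8 characters (\+, \? and \xHH are GNU sed extensions).
mod2imp Clarke |
LC_ALL=C sed -e 's|&|\&amp;|g' \
    -e 's|\([A-Za-z0-9\x80-\xff),.?!:;"]\)<br /> \([A-Za-z0-9\x80-\xff(,.?!:;&"]\)|\1 \2|g' \
    -e 's|</i><br /> \+<i>| |g' \
    -e 's|\([A-Za-z0-9\x80-\xff),.?!:;"]\) \?<br /> <\([is]\)|\1 <\2|g' \
    -e 's|\([fi]\)><br /> \([A-Za-z0-9\x80-\xff(,.?!:;&"]\)|\1> \2|g' \
    -e 's|]<br /> |] |g' \
    -e 's|<br /> \[| [|g' |
imp2vs /dev/stdin . 2>&1 | grep -Ev '^from file: |^adding entry: '
chmod go+r nt nt.vss ot ot.vss
exit 0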
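The first sed expression in the script is what produces the "&amp;c." seen in the module text. Isolated as a function (the name `escape_amp` is just for illustration), it looks like this; in a sed replacement `\&` is a literal ampersand, while a bare `&` would insert the matched text (which here happens to be `&` as well).

```shell
#!/bin/sh
# Escape every '&' as '&amp;', as the script's first sed expression does.
escape_amp() {
    sed -e 's|&|\&amp;|g'
}

printf '%s\n' 'the Gospel &c., &c., shows' | escape_amp
# prints: the Gospel &amp;c., &amp;c., shows
```

Note that this escapes every `&` unconditionally, so it is only safe if the exported text contains no entities yet: an existing `&lt;` would come out as `&amp;lt;`.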

[ClarkeNoBr]
DataPath=./modules/comments/rawcom/clarke-nobr/
ModDrv=rawCom
Lang=en
Encoding=UTF-8
SourceType=ThML
Description=Adam Clarke's Commentary on the Bible (without forced line breaks)
About=Adam Clarke's 1810/1825 commentary and critical notes on the Bible, with forced line breaks removed.
LCSH=Bible. Commentaries.
DistributionLicense=Public Domain
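To try the module locally, the stanza has to go in a file under `mods.d/` and the data files (nt, nt.vss, ot, ot.vss written by imp2vs) under the DataPath, relative to the same SWORD root. A sketch of that install step; `install_clarke_nobr` and the file name `clarkenobr.conf` are hypothetical, and which root directory to use depends on your setup.

```shell
#!/bin/sh
# Hypothetical install helper: copy the rawCom data files and the .conf
# stanza into a SWORD tree so DataPath=./modules/comments/rawcom/clarke-nobr/
# resolves relative to that root.
# Usage: install_clarke_nobr <sword-root> <build-dir> <conf-file>
install_clarke_nobr() {
    root=$1; build=$2; conf=$3
    dest="$root/modules/comments/rawcom/clarke-nobr"
    mkdir -p "$dest" "$root/mods.d" || return 1
    cp "$build/nt" "$build/nt.vss" "$build/ot" "$build/ot.vss" "$dest/" || return 1
    cp "$conf" "$root/mods.d/clarkenobr.conf"
}
```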
