[sword-devel] Updating Clarke commentary to become readable
Karl Kleinpaste
karl at charcoal.com
Mon Sep 25 06:20:20 MST 2006
A life theme of mine: "One catastrophe at a time." :-)
Last evening, I said that updating the Clarke content to replace `&'
with "&amp;" made it display fine in WinSword. I was half right when
I said so.
First, the actual `&' in Clarke's use of "&c." displays fine.
But second, in the problem text in question (James 5:20), there is a
new, spurious `&' tacked seemingly arbitrarily onto the word "author".
The updated module text in question:
1. I have already conjectured that this epistle ranks among the
<i>most ancient</i> of the Christian writings; its total want of
reference to the great facts which distinguish the early history of
the Church, viz., the calling of the Gentiles, the disputes between
them and the Jews, the questions concerning circumcision, and the
obligation of the law in connection with the Gospel &amp;c., &amp;c.,
shows that it must have been written before those things took place,
or that they must have been wholly unknown to the author; which is
incredible, allowing him to have been a <i>Christian</i> writer.
What WinSword/BibleCS actually displays:
1. I have already conjectured that this epistle ranks among the most
ancient of the Christian writings; its total want of reference to the
great facts which distinguish the early history of the Church, viz.,
the calling of the Gentiles, the disputes between them and the Jews,
the questions concerning circumcision, and the obligation of the law
in connection with the Gospel &c., &c., shows that it must have been
written before those things took place, or that they must have been
wholly unknown to the author& which is incredible, allowing him to
have been a Christian writer.
Notice the arrival of a new, arbitrary `&' after "author", replacing
the `;' that was supposed to be there.
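A difference like this can be pinpointed mechanically rather than by eye. What follows is only a minimal sketch, not part of the script I actually ran: the helper names `words` and `word_diff` are made up for illustration, and it assumes `mktemp` is available and that the only entity in the text is "&amp;". It strips the ThML tags, undoes the ampersand escaping, and diffs the two texts one word per line.

```shell
#!/bin/sh
# words: hypothetical helper. Strips <...> tags, turns "&amp;" back
# into "&", and emits one word per line so that diff can align the texts.
words() {
    sed -e 's|<[^>]*>||g' -e 's|&amp;|\&|g' | tr -s ' ' '\n'
}

# word_diff: hypothetical helper. $1 is the module text, $2 is the text
# copied out of the frontend's display.
word_diff() {
    a=$(mktemp) b=$(mktemp)
    printf '%s\n' "$1" | words > "$a"
    printf '%s\n' "$2" | words > "$b"
    diff "$a" "$b"
    rm -f "$a" "$b"
}

module='wholly unknown to the author; which is incredible, allowing him to have been a <i>Christian</i> writer.'
shown='wholly unknown to the author& which is incredible, allowing him to have been a Christian writer.'

word_diff "$module" "$shown"
```

For the two sample strings, taken from the James 5:20 text above, the only lines diff should report are the `author;` / `author&` pair.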
Now, there is an added complication to this: GnomeSword makes this
improper `&' disappear, just like it makes whole "&amp;" sequences
disappear. But why would GS be seeing `;' transformed into `&' in the
first place?
That is the first underlying problem, I think: It seems that there is
some prior filtering going on, common to WinSword's RTF and GS' HTML,
which makes this go wrong rather early on.
I'll be hunting down details on GS' perspective in this later today.
--karl
PS- I know ThML is technically on its way out, but given its status as
the module type of the vast majority (5x as many as OSIS), I'd really
like to fix this.
PPS- Slightly updated script to generate br-less Clarke; or I can make
a .zip available to anyone who cares to poke at this problem.
#!/bin/sh -x
mod2imp Clarke |
sed -e 's|&|\&amp;|g' \
-e 's|\([A-Za-z0-9-ÿ),.?!:;"]\)<br /> \([A-Za-z0-9-ÿ(,.?!:;&"]\)|\1 \2|g' \
-e 's|</i><br /> \+<i>| |g' \
-e 's|\([A-Za-z0-9-ÿ),.?!:;"]\) \?<br /> <\([is]\)|\1 <\2|g' \
-e 's|\([fi]\)><br /> \([A-Za-z0-9-ÿ(,.?!:;&"]\)|\1> \2|g' \
-e 's|]<br /> |] |g' \
-e 's|<br /> \[| [|g' |
imp2vs /dev/stdin . 2>&1 | egrep -v '^from file: |^adding entry: '
chmod go+r nt nt.vss ot ot.vss
exit 0
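One caveat about the ampersand step: a plain "replace every `&'" rule escapes again anything that is already escaped, so running it over already-converted text doubles the escaping. The following is a hedged sketch of a variant that can be re-run safely. The function name `escape_amp` is made up for illustration, and it assumes "&amp;" is the only entity present; anything else (say "&lt;") would still be double-escaped.

```shell
#!/bin/sh
# escape_amp: hypothetical filter, stdin to stdout.
# Matches a bare "&" or an existing "&amp;" and writes "&amp;" for
# either, so a second pass over its own output changes nothing.
escape_amp() {
    sed -e 's|&\(amp;\)\{0,1\}|\&amp;|g'
}

# Sample line mixing an unescaped and an already-escaped ampersand.
printf '%s\n' 'the Gospel &c., &amp;c.,' | escape_amp
# prints: the Gospel &amp;c., &amp;c.,
```

The `\&` in the replacement is a literal ampersand; an unescaped `&` there would stand for the matched text instead.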
[ClarkeNoBr]
DataPath=./modules/comments/rawcom/clarke-nobr/
ModDrv=rawCom
Lang=en
Encoding=UTF-8
SourceType=ThML
Description=Adam Clarke's Commentary on the Bible (without forced line breaks)
About=Adam Clarke's 1810/1825 commentary and critical notes on the Bible, with forced line breaks removed.
LCSH=Bible. Commentaries.
DistributionLicense=Public Domain