<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 19/12/23 00:06, Matěj Cepl wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CXRHD9X38KM1.3S20P5K53U0UJ@cepl.eu">
<pre class="moz-quote-pre" wrap="">I have decided not to rely on very kind help by David
with his Windows tools and I have written (hopefully)
completely platform neutral pure Python 3 script for checking
pairwise-characters. So, far it was used only for fixing
<a class="moz-txt-link-freetext" href="https://gitlab.com/crosswire-bible-society/CzeCEP/-/issues/2">https://gitlab.com/crosswire-bible-society/CzeCEP/-/issues/2</a> and
I am quite sure it is pretty buggy, but it could be proven useful
for somebody.
</pre>
</blockquote>
<p>Thank you for doing this work! This seems like it could be a
useful tool for validating texts of all kinds.</p>
<p>I tried running it over my BSB module, and I hit problems fairly
quickly, some of which are more easily solved than others.</p>
<p>1. No support for language “en”</p>
<p>This was easy enough to handle: there's a configuration variable
near the top of the file that sets which quotes are used for each
language.</p>
<p>2. Apostrophes</p>
<p>In English, the apostrophe used for possession (“the boy’s
train”) and omission (“don’t let’s start”) is traditionally set
with the same character as the closing single quote. Any
non-trivial document will therefore almost certainly contain more
“closing single quotes” than opening single quotes, so the
imbalance is not worth reporting.</p>
<p>I got around this by just deleting single quotes from the
configuration.</p>
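<p>A gentler workaround might be to skip any ’ that sits between two
letters before counting. This is only a rough Python sketch of that
idea: <code>count_single_quotes</code> is a name I made up, it is not
part of the script, and the heuristic still miscounts plural
possessives such as “disciples’ feet”.</p>
<pre>
```python
def count_single_quotes(text):
    """Return (opening, closing) counts, ignoring word-internal ’."""
    opening = text.count("‘")
    closing = 0
    for i, ch in enumerate(text):
        if ch != "’":
            continue
        # Neighbouring characters; a space stands in at either end of the text.
        before = text[i - 1] if i else " "
        after = text[i + 1] if i + 1 != len(text) else " "
        if before.isalpha() and after.isalpha():
            continue  # looks like an apostrophe: don’t, boy’s
        closing += 1
    return opening, closing


print(count_single_quotes("‘Don’t touch the boy’s train,’ he said."))  # (1, 1)
```
</pre>
<p>With the apostrophes in “Don’t” and “boy’s” skipped, the one real
pair of single quotes balances.</p>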
<p>3. Nested quotations<br>
</p>
<p>In Genesis 20:11-13, Abraham tells <span id="en-NASB-511"
class="text Gen-20-15">Abimelech that he told Sarah to tell
other people that she was Abraham’s sister. In the BSB (and
NIV, and ESV, and NASB) this results in a triple-nested
quotation. In English typesetting conventions, the outermost
quotation gets double quotes, the second level gets
single quotes, and the third level gets double quotes again.
This causes the script to report an error:</span></p>
<p><span id="en-NASB-511" class="text Gen-20-15">> Balance for
character “ is over one in Gen.20.13</span></p>
<p><span id="en-NASB-511" class="text Gen-20-15">I couldn't
immediately think of a way to get around this.</span></p>
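<p>To show the shape of the problem, here is a small Python sketch.
It assumes the check keeps a separate running balance for each
opening character, which is my guess from the error message and not
something I have confirmed in the script. The sample sentence is an
abbreviated stand-in with the same nesting as the verse, not the BSB
wording, and <code>max_balance</code> and <code>PAIRS</code> are
hypothetical names.</p>
<pre>
```python
# Assumed configuration: opening character mapped to its closing character.
PAIRS = {"“": "”", "‘": "’"}


def max_balance(text, opener):
    """Return the highest running count of unclosed `opener` characters."""
    closer = PAIRS[opener]
    balance = peak = 0
    for ch in text:
        if ch == opener:
            balance += 1
        elif ch == closer:
            balance -= 1
        peak = max(peak, balance)
    return peak


# Double quotes outside, single quotes inside, double quotes again inside those.
sample = "“I said to her, ‘Say of me, “He is my brother.”’”"
print(max_balance(sample, "“"))  # 2
print(max_balance(sample, "‘"))  # 1
```
</pre>
<p>The text is correctly nested, yet the balance for “ reaches two
because the first and third levels share the same character.</p>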
<p><span id="en-NASB-511" class="text Gen-20-15">Another quirk that
occurs to me is that in English typesetting, if one person
speaks for multiple paragraphs (for example, the Sermon on the
Mount), then each paragraph gets an opening double quote, but only
the last one gets a closing double quote. That's going to play
havoc with this kind of quote-checking tool, too.</span></p>
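<p>A toy Python illustration of that multi-paragraph convention, with
made-up paragraphs and a hypothetical helper
<code>paragraph_balances</code> that simply subtracts closing from
opening double quotes in each paragraph:</p>
<pre>
```python
def paragraph_balances(paragraphs, opener="“", closer="”"):
    """Return the opener-minus-closer count for each paragraph."""
    return [p.count(opener) - p.count(closer) for p in paragraphs]


speech = [
    "“First paragraph of a long speech.",
    "“Second paragraph of the same speech.",
    "“Final paragraph, which closes the quotation.”",
]
print(paragraph_balances(speech))  # [1, 1, 0]
```
</pre>
<p>Every paragraph except the last is left one opening quote ahead,
so a strict balance check would flag perfectly conventional text.</p>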
<p><span id="en-NASB-511" class="text Gen-20-15">Perhaps this kind
of tool just isn't suited to checking English text... but I'm
sure there are other languages with more sensible conventions that
it could help with. Good luck with it!</span></p>
<p><span id="en-NASB-511" class="text Gen-20-15"><br>
</span></p>
<p><span id="en-NASB-511" class="text Gen-20-15">Timothy.<br>
</span></p>
</body>
</html>