[sword-devel] Python script for checking pairwise characters (PROOF-OF-CONCEPT)
Timothy Allen
thristian at gmail.com
Mon Dec 18 20:17:23 EST 2023
On 19/12/23 00:06, Matěj Cepl wrote:
> I have decided not to rely on the very kind help offered by David
> with his Windows tools, and I have written a (hopefully)
> completely platform-neutral, pure Python 3 script for checking
> pairwise characters. So far it has been used only for fixing
> https://gitlab.com/crosswire-bible-society/CzeCEP/-/issues/2 and
> I am quite sure it is pretty buggy, but it could prove useful
> to somebody.
Thank you for doing this work! This seems like it could be a useful tool
for validating texts of all kinds.
I tried running it over my BSB module, and I hit problems fairly
quickly, some of which are more easily solved than others.
1. No support for language “en”
This was easy enough to handle: there's a configuration variable near
the top of the file that lets you set which quotes are used for
which languages.
2. Apostrophes
In English, the apostrophe used for possession (“the boy’s train”) and
omission (“don’t let’s start”) is traditionally set with the same
character used as the closing single quote. In any non-trivial
document there will almost certainly be more "closing single quotes"
than opening single quotes, so the imbalance is not worth reporting.
I got around this by just deleting single quotes from the configuration.
3. Nested quotations
In Genesis 20:11-13, Abraham tells Abimelech that he told Sarah to tell
other people that she was Abraham’s sister. In the BSB (and NIV, and
ESV, and NASB) this results in a triple-nested quotation. In English
typesetting conventions the outermost quotation gets double quotes, the
second level gets single quotes, and the third level gets double quotes
again. This causes the script to report an error:
> Balance for character “ is over one in Gen.20.13
I couldn't immediately think of a way to get around this.
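One possible direction, offered only as a rough sketch and not taken from
the script itself: keep a stack of open quotes instead of a per-character
balance, so that a double quote nested inside a single quote nested inside
a double quote is accepted. The names `PAIRS`, `is_apostrophe` and
`check_quotes` below are hypothetical, and the apostrophe rule (a ’ sitting
directly between two letters) is just an assumption. It will still misjudge
trailing possessives such as “the disciples’ feet”.

```python
# Hypothetical sketch: none of these names come from the script under discussion.
PAIRS = {"“": "”", "‘": "’"}
CLOSERS = {close: open_ for open_, close in PAIRS.items()}


def is_apostrophe(text, i):
    """Guess whether the ’ at index i is an apostrophe, not a closing quote.

    Assumed heuristic: a ’ directly between two letters, as in "don’t"
    or "boy’s", is treated as an apostrophe.
    """
    before = text[i - 1] if i > 0 else ""
    after = text[i + 1] if i + 1 < len(text) else ""
    return before.isalpha() and after.isalpha()


def check_quotes(text):
    """Return a list of (index, message) problems found in text."""
    problems = []
    stack = []  # opening quotes still waiting for their closing partner
    for i, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((ch, i))
        elif ch in CLOSERS:
            if ch == "’" and is_apostrophe(text, i):
                continue  # ignore apostrophes entirely
            if stack and stack[-1][0] == CLOSERS[ch]:
                stack.pop()  # closes the innermost open quotation
            else:
                problems.append((i, f"unexpected {ch}"))
    # Anything left on the stack was opened but never closed.
    for ch, i in stack:
        problems.append((i, f"unclosed {ch}"))
    return problems
```

Calling `check_quotes(verse_text)` on a triple-nested passage should then
return an empty list as long as each level closes in the right order.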
Another quirk that occurs to me is that in English typesetting, if one
person speaks multiple paragraphs (for example, the Sermon on the Mount)
then each paragraph gets an opening double-quote, but no closing
double-quote. That's going to play havoc with this kind of
quote-checking tool, too.
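A checker could perhaps cope with that convention by carrying the open
quotation across paragraph breaks. The sketch below is only an illustration
under that assumption: `check_paragraphs` is a hypothetical helper, it looks
at double quotes only, and it treats an opening quote at the very start of a
paragraph as a continuation mark whenever a quotation is already open. That
also means it would miss a genuinely unclosed quotation followed by a new
speaker.

```python
def check_paragraphs(paragraphs, open_q="“", close_q="”"):
    """Check double quotes over a list of paragraphs, allowing continuation marks.

    Returns a list of (paragraph_index, char_index, message) tuples.
    """
    problems = []
    depth = 0  # quotations currently open, carried across paragraphs
    for p, para in enumerate(paragraphs):
        body = para.lstrip()
        offset = len(para) - len(body)  # leading whitespace that was skipped
        start = 0
        # Assumption: a paragraph that opens with a quote mark while a
        # quotation is still open continues that quotation.
        if depth > 0 and body.startswith(open_q):
            start = len(open_q)
        for i in range(start, len(body)):
            ch = body[i]
            if ch == open_q:
                depth += 1
            elif ch == close_q:
                if depth == 0:
                    problems.append((p, offset + i, f"unexpected {close_q}"))
                else:
                    depth -= 1
    if depth:
        problems.append(
            (len(paragraphs) - 1, None, f"{depth} quotation(s) never closed")
        )
    return problems
```

For a multi-paragraph speech, each paragraph would be passed as one string in
the list, in reading order.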
Perhaps this kind of tool just isn't suited to checking English text...
but I'm sure there are other languages with more sensible conventions that
it could help with. Good luck with it!
Timothy.