<html><head></head><body> When an apostrophe is used to make an English noun possessive and the noun already ends with the letter s (as most plurals do), the apostrophe is placed after the s. <div dir="auto"><br></div><div dir="auto">There are even some rare exceptions, such as the singular noun cockatrice, in which the possessive has just an apostrophe and no letter s afterwards. Hint: search the KJV module. </div><div dir="auto"><br></div><div dir="auto">One method I’ve used in the past on some projects is to temporarily replace the apostrophe that marks a possessive with an otherwise unused letter from Latin-1 Supplement, such as U+00FE Latin small letter thorn. </div><div dir="auto"><br></div><div dir="auto">Thorn is used in Old English, Icelandic and phonetics, but not in modern English or Early Modern English. </div><div dir="auto"><br></div><div dir="auto">You have to know the rules for possessives, and you may still have instances where a closing single quotation mark could be mistaken for a possessive apostrophe unless the script takes account of the wider context. </div><div dir="auto"><br></div>Another use for the apostrophe is to mark a missing syllable in a longer word. This is less likely to occur in Bibles but it might occur in some formal texts. <div dir="auto"><br></div><div dir="auto">And don’t get me started on works that were first digitised before the use of Unicode. <br><div dir="auto"><br></div><div dir="auto">Best regards,</div><div dir="auto"><br></div><div dir="auto">David <br><div><br></div> <div><br></div><div><br></div>On Tue, Dec 19, 2023 at 15:20, Nathan Phillip Brink &lt;<a href="mailto:ohnobinki@ohnopublishing.net">ohnobinki@ohnopublishing.net</a>&gt; wrote:<blockquote type="cite" class="protonmail_quote">
On 2023-12-19 04:26, Matěj Cepl wrote:<br>
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">On Tue Dec 19, 2023 at 2:17 AM CET, Timothy Allen wrote:
</pre>
<span style="white-space: pre-wrap">
</span>
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">2. Apostrophes
In English, the apostrophe used for possession (“the boy’s train”) and
omission (“don’t let’s start”) is traditionally set with the same
character used as the closing single quote, so in any non-trivial
document there will almost certainly be more "closing single quotes"
than opening single quotes; it's not worth reporting on.
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">Yes, I am aware of it, and I feel very blessed that I don’t
have this problem in Czech. I have no idea what to do with
this without proper syntactic analysis, which is out of the
question. Perhaps, running `re.sub(r'’s\b', '@#s', whole_text)`
and then back, but it seems like a recipe for disaster.</pre>
</blockquote>
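<p>For reference, here is a minimal sketch of that substitute-and-restore round trip, assuming Python's <code>re</code> module and using U+00FE (thorn) as the placeholder rather than <code>@#</code>. The function names are made up for illustration, and the sketch assumes the placeholder does not already occur in the text (it raises an error if it does).</p>
<pre>
```python
import re

APOSTROPHE = "\u2019"   # right single quotation mark, also used as the apostrophe
PLACEHOLDER = "\u00fe"  # thorn; assumed not to occur in the text being checked

# A word character, then the apostrophe, then an s that ends the word.
POSSESSIVE = re.compile(r"(\w)" + APOSTROPHE + r"s\b")


def protect_possessives(text, placeholder=PLACEHOLDER):
    """Replace the apostrophe in word-final 's with the placeholder."""
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")
    return POSSESSIVE.sub(lambda m: m.group(1) + placeholder + "s", text)


def restore_possessives(text, placeholder=PLACEHOLDER):
    """Undo protect_possessives once the quote check has run."""
    return text.replace(placeholder, APOSTROPHE)


sample = "\u201cThe boy\u2019s train,\u201d he said, \u2018is late.\u2019"
protected = protect_possessives(sample)
print(protected.count(APOSTROPHE))             # only the real closing quote is left
print(restore_possessives(protected) == sample)
```
</pre>
<p>Note that this pattern also catches contractions ending in ’s (let’s, it’s), which is harmless for counting quotes, but it leaves plural possessives (boys’) and other contractions (don’t) untouched.</p>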
<p>I think a better solution would be to make the script itself
aware of when a closing single quote is acting as a closing quote
or not. If the closing single quote is followed by an alphabetic
character (it should be able to test Unicode character classes for
this), then it should be treated as an apostrophe instead. I don’t
know if biblical texts generally use contractions, but your
regular expression only handles ’s, not contractions in general. Also, I
only know English and I am quite possibly missing some edge cases.
Some examples:</p>
<ul>
<li>This isn’t a closing quote. (‘t’ is an alphabetic character)<br>
</li>
<li>“I said, ‘This is a closing quote within a double-quoted
phrase’”. (‘”’ isn’t an alphabetic character)<span style="white-space: pre-wrap">
</span></li>
</ul>
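<p>A rough sketch of that test in Python, assuming the text uses U+2019 for both the closing single quote and the apostrophe. The function name is hypothetical, and <code>str.isalpha()</code> stands in for the Unicode character class check.</p>
<pre>
```python
CLOSE_SINGLE = "\u2019"  # right single quotation mark


def classify_right_single_quotes(text):
    """Return (index, kind) for every U+2019 in text.

    kind is "apostrophe" when the next character is alphabetic,
    otherwise "closing quote".
    """
    results = []
    for i, ch in enumerate(text):
        if ch != CLOSE_SINGLE:
            continue
        next_ch = text[i + 1:i + 2]  # empty string at the end of the text
        kind = "apostrophe" if next_ch.isalpha() else "closing quote"
        results.append((i, kind))
    return results


for line in (
    "This isn\u2019t a closing quote.",
    "\u201cI said, \u2018This is a closing quote\u2019\u201d.",
):
    print(classify_right_single_quotes(line))
```
</pre>
<p>One edge case this misses: a plural possessive such as “the boys’ train” is followed by a space, so it would be classified as a closing quote.</p>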
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">3. Nested quotations
In Genesis 20:11-13, Abraham tells Abimelech that he told Sarah to tell
other people that she was Abraham’s sister. In the BSB (and NIV, and
ESV, and NASB) this results in a triple-nested quotation. In English
typesetting conventions the outermost quotation gets double-quotes, the
second level gets single-quotes, and the third level gets double quotes
again. This causes the script to report an error:
I couldn't immediately think of a way to get around this.
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">Me neither. We should probably make effort for error recovery, so
that the script would continue even after reporting a problem,
but I am not sure how to do that either.</pre>
</blockquote>
The other approach would be to check the counts upon
reaching the end of a section. As mentioned below, in English, all
quotes are implicitly closed by the end of a paragraph, so any
nonzero counts at the end of a paragraph are OK. But when you
encounter a closing quote, you can make sure that the last opening
quote is the same type of quote.<span style="white-space: pre-wrap"> Store each opening quote type on a stack. Whenever you encounter a closing quote, pop the stack and confirm that the types match. Report an error upon trying to pop an empty stack or upon encountering a mismatched quote, and clear the stack upon reaching a paragraph end. That would provide something useful for English.
</span>
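<p>Roughly, the stack idea could look like the sketch below. It assumes English curly quotes, takes the text as a list of paragraph strings, and treats U+2019 followed by a letter as an apostrophe. The function name and message wording are invented for illustration. On a mismatch it reports the problem and leaves the stack alone so that checking can continue, which is one possible recovery choice rather than the only one.</p>
<pre>
```python
# Opening mark mapped to the closing mark it expects (English conventions).
OPENERS = {"\u201c": "\u201d", "\u2018": "\u2019"}
CLOSERS = set(OPENERS.values())


def check_quotes(paragraphs):
    """Return a list of (paragraph_index, char_index, message) problems."""
    problems = []
    for p_idx, para in enumerate(paragraphs):
        stack = []  # a fresh stack: quotes left open are dropped at paragraph end
        for c_idx, ch in enumerate(para):
            if ch in OPENERS:
                stack.append(ch)
            elif ch in CLOSERS:
                # U+2019 followed by a letter is taken to be an apostrophe.
                if ch == "\u2019" and para[c_idx + 1:c_idx + 2].isalpha():
                    continue
                if not stack:
                    problems.append((p_idx, c_idx, "closing quote with no opening quote"))
                elif OPENERS[stack[-1]] != ch:
                    # Report and keep the stack as it is, so checking carries on.
                    problems.append((p_idx, c_idx, "mismatched closing quote"))
                else:
                    stack.pop()
    return problems


nested = "\u201cHe said, \u2018Tell them, \u201cHe is my brother.\u201d\u2019\u201d"
print(check_quotes([nested]))
print(check_quotes(["\u201cFirst paragraph of a speech.", "\u201cSecond paragraph.\u201d"]))
```
</pre>
<p>Because the stack only compares each closing mark with the most recent opening mark, a triple-nested quotation and a multi-paragraph speech both pass without special handling.</p>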
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="" class="moz-quote-pre">Another quirk that occurs to me is that in English typesetting, if one
person speaks multiple paragraphs (for example, the Sermon on the Mount)
then each paragraph gets an opening double-quote, but no closing
double-quote. That's going to play havoc with this kind of
quote-checking tool, too.
</pre>
</blockquote>
<pre wrap="" class="moz-quote-pre">Yes, we don’t do this in Czech, but it is typographically
possible to just use paragraph indentation instead
of quoting and of course we don’t have anything like
indentation in the pure XML. I have just added quotes in
the appropriate places and plan to send the patch to the
Czech Biblical Society (after David reviews my fixes in
<a href="https://gitlab.com/crosswire-bible-society/CzeCEP/-/issues/2" class="moz-txt-link-freetext">https://gitlab.com/crosswire-bible-society/CzeCEP/-/issues/2</a>)
with some other clear bugs I have found.</pre>
</blockquote>
<p>See above.</p>
<p><span style="white-space: pre-wrap">Unfortunately, it sounds like English speakers would want the script to be aware of different rules per language, which definitely complicates things. But that would increase its utility in automatically identifying likely transcription errors.
</span></p>
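<p>One way the per-language rules might be expressed is a small lookup table, sketched below. The <code>QuoteRules</code> class, its field names and the <code>is_opener</code> helper are all hypothetical. The English entry follows the conventions described in this thread. The Czech entry assumes the usual „…“ and ‚…‘ pairs and, going by the comments above, no multi-paragraph convention and no apostrophe clash.</p>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuoteRules:
    pairs: dict                  # opening mark mapped to its closing mark
    paragraph_end_closes: bool   # True if open quotes may run past a paragraph end
    apostrophe: str              # closing mark that doubles as an apostrophe, or ""


RULES = {
    "en": QuoteRules(
        pairs={"\u201c": "\u201d", "\u2018": "\u2019"},
        paragraph_end_closes=True,
        apostrophe="\u2019",
    ),
    "cs": QuoteRules(
        pairs={"\u201e": "\u201c", "\u201a": "\u2018"},
        paragraph_end_closes=False,
        apostrophe="",
    ),
}


def is_opener(lang, ch):
    """True if ch opens a quotation under the rules for lang."""
    return ch in RULES[lang].pairs


# U+201C opens a quotation in English but closes one in Czech.
print(is_opener("en", "\u201c"), is_opener("cs", "\u201c"))
```
</pre>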
</blockquote></div></div></body></html>