<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Thu, Jun 19, 2025 at 9:07 AM DM Smith <<a href="mailto:dmsmith@crosswire.org">dmsmith@crosswire.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Greg,<div>There’s an extraneous %s in the output.</div></div></blockquote><div><br></div><div>Ah, not surprising. That is the old, Python 2 way of formatting variables into a string, similar to C style printf syntax with variable arguments coming in a tuple after an overload of the modulus operator (so it would look like `"this is a string: %s" % (a_string, )` ). The modern preferred way is with an f-string, where you preface a string with the character `f` and then reference variables in the string with {variable_name} syntax (e.g. `f"this is a string: {a_string}"`). That %s can be killed off, or replaced with an f-string equivalent.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><br></div><div>If you put the enumeration after the line "There are 93 OT IDs and 5 NT IDs in v11n which aren’t in your file.” Then you wouldn’t need the heading "The following IDs don’t appear in your file:”</div></div></blockquote><div><br></div><div>Yeah, I had been putting the IDs out to stderr with the logging utility previously. It was only yesterday when I was squashing the remaining Python 3 compat issues that I realized I should just drop them into a print statement. They are, thusly, kinda crazy. 
In fact, I pass them through a `sort` call, so they won't be in either canonical or document order - unless the document has its verses sorted alphabetically by osisID attribute for some inexplicable reason.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div>It’d also be nice to format it a few per line, indented appropriately.</div></div></blockquote><div><br></div><div>Perhaps broken up by book? Or by book/chapter So it's like</div><div>Verses missing from:</div><div>Gen</div><div> 1 - 1, 3, 5, 7</div><div> 2 - 11, 22</div><div>Exo</div><div> 27 - 1</div><div><br></div><div>There is a long way to go to improve the output, especially of this detail portion. It was, after all, only intended as debugging output for me while I was writing it.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><br></div><div>I’d be happy to iterate over any suggestions we agree on.</div></div></blockquote><div><br></div><div>As I am not a user of it, nor an intended consumer of it, feel free to improve it as needed! I quickly hacked it together and tossed it out into the world at someone's off-handed request. I don't create modules, though, so I have no vested interest in preserving its current operation in any particular form. And, if this thread has shown anything, it's that likely Peter has been the only user to date. So I doubt you'll disturb anyone else with it.</div><div><br></div><div>If you need my support for anything, I'm happy to lend a hand.</div><div><br></div><div>Pulling in comments from your other email on this thread:</div><div><br></div><div>> I like that it's very simple to read. Having a summary is good. And the
other email which lists the exact ids extra/missing per testament is
very helpful.<div>> I think that enumerating the names of
the extra/missing books and extra/missing chapters would be good. No
sense in enumerating the ids within these.</div><div><br></div><div>That probably would be good. I didn't include detection for an entire missing chapter or book, but it shouldn't be too terribly difficult to enhance it with that. A simple brute force check of every detected missing book or chapter to see if there are any matched verses can reveal that pretty easily.</div><div><br></div><div>> I
ran mine against an input that was a test case for osis2mod’s infinite
loop and it had 2 extra books and 13 extra chapters. This wouldn’t be
obvious in your results.</div><div><br></div><div>True, mine would just complain about hundreds or even thousands of mismatches and silently swallow the list of what those are. I had a few of those that I omitted from the sample output I captured. For instance, there are large portions of the canon for the Catholic versifications missing from the KJV file. It just reports something absurd like "There are 4,741 missing verses" or whatever it is.</div><div><br></div><div>> Is it an advantage or disadvantage to be compiled against SWORD lib vs slurping header files?</div><div><br></div><div>Like most things, it's a trade-off. Working with the bindings requires that the Sword bindings are installed on the host system. For someone running on Windows, this is particularly non-trivial. For someone running on macOS it's not too difficult to install from source (I don't believe Homebrew builds them). For users of major Linux distributions, it's downright trivial. On Fedora it has been as simple as a single `dnf install python3-sword` command for a long time now, and it looks like the bindings are also available for Ubuntu starting in 25.04 with an `apt install python3-sword` as well.</div><div><br></div><div>Advantages of the binding method are that it doesn't rely on parsing a C header file, nor on the file laying out the values in a certain way. It also can be used offline easily, doesn't require parsing HTML output in order to find all the applicable files, and is likely slightly faster. Not that the speed probably matters for a single run of this, but if you're bulk processing files the speed advantages can add up.</div><div><br></div><div>The disadvantage of the binding method is that it requires you to fall back to a source build if you are using this to test a canon.h file or if you want to use a canon file that isn't available in the package manager of your Linux distribution. 
Building from source isn't terribly onerous for most of us contributors but it might be more of a problem for a module maintainer. Then again, how often do we add a new versification to the code base?</div><div><br></div><div>So there are pros and cons between them. I was freshly off of getting the bindings to compile when I wrote the first draft of av11n.py so I naturally went that direction. I also try to avoid writing parsers when I can leverage existing ones, as grammars can be notoriously complex to get correct. So that dictated my choices as much as did anything else, really!</div><div><br></div><div>Another possible enhancement might be a CLI flag to limit the testing range to a particular book (or testament) at a time. I have heard people talk about having modules split up to one book per file or similar. If they could say, "Only check this file against Joshua" then it could keep down a significant amount of extra output. But again - I'm not really an intended user of it!</div><div><br></div><div>--Greg</div></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><br></div><div>DM<br id="m_274868516123331404lineBreakAtBeginningOfMessage"><div><br><blockquote type="cite"><div>On Jun 19, 2025, at 12:12 AM, Greg Hellings <<a href="mailto:greg.hellings@gmail.com" target="_blank">greg.hellings@gmail.com</a>> wrote:</div><br><div><div dir="ltr"><div>And here's an example now that I've fixed the output of the osisIDs when there are fewer than 100 of them:</div><div><br></div><div>[vagrant@localhost ~]$ ./av11n.py kjv.osis.xml <br> <br>Checking Calvin:<br>---------------- <br> The following IDs don’t appear in your file:<br>%s 1Kgs.22.54, 1Sam.20.43, 1Sam.24.23, 3John.1.15, Acts.24.28, Eccl.12.15, Eccl.12.16, Ezek.21.33, Ezek.21.34, Ezek.21.35, Ezek.21.36, Ezek.21.37, Hos.12.15, Isa.8.23, Job.39.31, Job.39.32, Job.39.33, Job.39.34, Job.39.35, Job.39.36, Job.39.37, 
Job.39.38<br>, Job.40.25, Job.40.26, Job.40.27, Job.40.28, Jonah.2.11, Mark.10.53, Mark.9.51, Num.13.34, Num.30.17, Ps.102.29, Ps.108.14, Ps.12.9, Ps.140.14, Ps.142.8, Ps.18.51, Ps.19.15, Ps.20.10, Ps.21.14, Ps.22.32, Ps.3.9, Ps.30.13, Ps.31.25, Ps.34.23, Ps.36.13, P<br>s.38.23, Ps.39.14, Ps.4.9, Ps.40.18, Ps.41.14, Ps.42.12, Ps.44.27, Ps.45.18, Ps.46.12, Ps.47.10, Ps.48.15, Ps.49.21, Ps.5.13, Ps.51.20, Ps.51.21, Ps.52.10, Ps.52.11, Ps.53.7, Ps.54.8, Ps.54.9, Ps.55.24, Ps.56.14, Ps.57.12, Ps.58.12, Ps.59.18, Ps.6.11, Ps<br>.60.13, Ps.60.14, Ps.61.9, Ps.62.13, Ps.63.12, Ps.64.11, Ps.65.14, Ps.67.8, Ps.68.36, Ps.69.37, Ps.7.18, Ps.70.6, Ps.75.11, Ps.76.13, Ps.77.21, Ps.8.10, Ps.80.20, Ps.81.17, Ps.83.19, Ps.84.13, Ps.85.14, Ps.88.19, Ps.89.53, Ps.9.21, Ps.92.16, Rev.12.18<br> There are 93 OT IDs and 5 NT IDs in v11n which aren’t in your file.<br> The following IDs don’t appear in v11n: <br>%s 1Kgs.22.54, 1Sam.20.43, 1Sam.24.23, 3John.1.15, Acts.24.28, Eccl.12.15, Eccl.12.16, Ezek.21.33, Ezek.21.34, Ezek.21.35, Ezek.21.36, Ezek.21.37, Hos.12.15, Isa.8.23, Job.39.31, Job.39.32, Job.39.33, Job.39.34, Job.39.35, Job.39.36, Job.39.37, Job.39.38<br>, Job.40.25, Job.40.26, Job.40.27, Job.40.28, Jonah.2.11, Mark.10.53, Mark.9.51, Num.13.34, Num.30.17, Ps.102.29, Ps.108.14, Ps.12.9, Ps.140.14, Ps.142.8, Ps.18.51, Ps.19.15, Ps.20.10, Ps.21.14, Ps.22.32, Ps.3.9, Ps.30.13, Ps.31.25, Ps.34.23, Ps.36.13, P<br>s.38.23, Ps.39.14, Ps.4.9, Ps.40.18, Ps.41.14, Ps.42.12, Ps.44.27, Ps.45.18, Ps.46.12, Ps.47.10, Ps.48.15, Ps.49.21, Ps.5.13, Ps.51.20, Ps.51.21, Ps.52.10, Ps.52.11, Ps.53.7, Ps.54.8, Ps.54.9, Ps.55.24, Ps.56.14, Ps.57.12, Ps.58.12, Ps.59.18, Ps.6.11, Ps<br>.60.13, Ps.60.14, Ps.61.9, Ps.62.13, Ps.63.12, Ps.64.11, Ps.65.14, Ps.67.8, Ps.68.36, Ps.69.37, Ps.7.18, Ps.70.6, Ps.75.11, Ps.76.13, Ps.77.21, Ps.8.10, Ps.80.20, Ps.81.17, Ps.83.19, Ps.84.13, Ps.85.14, Ps.88.19, Ps.89.53, Ps.9.21, Ps.92.16, Rev.12.18<br> There are 1 OT IDs and 29 NT IDs in your file which don’t appear in 
v11n.<br><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jun 18, 2025 at 11:00 PM Greg Hellings <<a href="mailto:greg.hellings@gmail.com" target="_blank">greg.hellings@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Here is an example of the first lines of running my script against the kjv.osis.xml file from the git repo:</div><div><br></div><div><br>Checking Calvin:<br>----------------<br> There are 93 OT IDs and 5 NT IDs in v11n which aren’t in your file.<br> There are 0 OT IDs and 30 NT IDs in your file which don’t appear in v11n.<br><br>Checking Catholic:<br>------------------<br> There are 4530 OT IDs and 3 NT IDs in v11n which aren’t in your file.<br> There are 0 OT IDs and 133 NT IDs in your file which don’t appear in v11n.<br><br>Checking Catholic2:<br>-------------------<br> There are 4638 OT IDs and 3 NT IDs in v11n which aren’t in your file.<br> There are 0 OT IDs and 133 NT IDs in your file which don’t appear in v11n.<br><br>Checking DarbyFr:<br>-----------------<br> There are 31 OT IDs and 4 NT IDs in v11n which aren’t in your file.<br> There are 0 OT IDs and 30 NT IDs in your file which don’t appear in v11n.<br></div><div><br></div><div>This continues on to include such output as</div><div><br></div><div> <br>Checking KJV:<br>------------- <br> Your file has all the references in this v11n<br> Your file has no extra references <br> <br>Checking KJVA: <br>--------------<br> There are 5717 OT IDs and 0 NT IDs in v11n which aren’t in your file.<br> Your file has no extra references<br><br></div><div>giving a clear example of a winner for this particular file.</div><div><br></div><div>Meanwhile, running it against the kjva.osis.xml file includes this in the results:</div><div><br></div><div>...</div><div><br>Checking KJV: <br>------------- <br> Your file has all the references in this v11n<br> 
There are 2 OT IDs and 5715 NT IDs in your file which don’t appear in v11n.<br> <br>Checking KJVA: <br>-------------- <br> Your file has all the references in this v11n<br> Your file has no extra references</div><div>...</div><div><br></div><div>Fiddling with the file has showed me there are a couple of places where I need to tweak it for Python 3 compatibility that I missed the last time I updated. But fixing those couple of little syntax issues resulted in it running just fine in a Fedora 41 vm with nothing more to do than invoke `dnf install python3-sword` to setup the system to use it.</div><div><br></div><div>--Greg</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jun 18, 2025 at 10:40 PM Greg Hellings <<a href="mailto:greg.hellings@gmail.com" target="_blank">greg.hellings@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="auto">My script eschews percentages because they seemed relatively pointless to me for measuring a mismatch like this. Instead it gives a count of both Old and New Testament osisIDs that it finds missing and another that it finds unexpectedly for a given versification. If the total of either count is fewer than 100, the IDs for that particular count are printed to the console. 
It will do this for every registered versification in the version of the library it was compiled against, allowing the user to select whichever one seems best to them based on the results.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jun 18, 2025, 10:25 PM David Haslam <<a href="mailto:dfhdfh@protonmail.com" target="_blank">dfhdfh@protonmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div> <div dir="auto">It’s not just the number of “missing” verses that should figure in the percentage score, but also the number of verses that get concatenated to the last one in a chapter.</div><div dir="auto"><br></div><div dir="auto">The differences in v11n for the Psalms will be especially significant for this, in that some v11ns renumber many of them. Likewise for the last few chapters in the book of Job.</div><div dir="auto"><br></div><div dir="auto">Aside: It would be cool to enhance the utility emptyvss by providing a command-line option that would ignore books that are not included in the scope parameter in the conf file.</div><div dir="auto"><br></div><div dir="auto">Regards,</div><div><br></div> <div dir="auto">David</div><div><br></div>On Thu, Jun 19, 2025 at 03:18, DM Smith <<a href="mailto:dmsmith@crosswire.org" rel="noreferrer" target="_blank">dmsmith@crosswire.org</a>> wrote:<blockquote type="cite"> <div>
David,
</div>
<div>
<br>
</div>
<div>
Because it only considers the xml, scope is automatically built into it. It is only comparing what is present in the xml with what is part of the av11ns.
</div>
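<div>
A minimal sketch of that comparison, assuming the osisIDs from the file and from one v11n have already been collected into Python sets. The names `compare_ids`, `split_by_testament` and `NT_BOOKS`, along with the sample data, are hypothetical and not taken from either script; the 100-ID cutoff mirrors the one Greg describes elsewhere in this thread.
</div>
<pre>
```python
# Hypothetical sketch: compare the osisIDs found in a file with those of one v11n.
# A tiny NT book list is assumed here only to split the counts by testament.
NT_BOOKS = {"Matt", "Mark", "Luke", "John", "Acts", "Rev"}  # abbreviated on purpose


def split_by_testament(ids):
    """Return (ot_ids, nt_ids) as sorted lists, keyed on the book part of each ID."""
    ot, nt = [], []
    for osis_id in sorted(ids):
        book = osis_id.split(".")[0]
        (nt if book in NT_BOOKS else ot).append(osis_id)
    return ot, nt


def compare_ids(file_ids, v11n_ids, limit=100):
    """Report IDs missing from the file and IDs the v11n does not know."""
    missing = v11n_ids - file_ids   # in the v11n, not in the file
    extra = file_ids - v11n_ids     # in the file, not in the v11n
    lines = []
    for label, ids in (("missing from your file", missing),
                       ("not in the v11n", extra)):
        ot, nt = split_by_testament(ids)
        lines.append(f"{len(ot)} OT IDs and {len(nt)} NT IDs {label}")
        if ids and limit > len(ids):  # only enumerate short lists
            lines.append("  " + ", ".join(ot + nt))
    return lines


file_ids = {"Gen.1.1", "Gen.1.2", "Rev.12.18"}
v11n_ids = {"Gen.1.1", "Gen.1.2", "Gen.1.3"}
print("\n".join(compare_ids(file_ids, v11n_ids)))
```
</pre>
<div>
Because only the IDs present in the XML enter `file_ids`, a partial Bible simply shows up as a large "missing" count rather than needing a separate scope setting.
</div>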
<div>
<br>
</div>
<div>
It might be good to add the enumeration of missing verses.
</div>
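<div>
One possible shape for such an enumeration is the book/chapter grouping suggested elsewhere in this thread. This is only a sketch, assuming every ID has the form Book.Chapter.Verse with numeric chapter and verse; `group_ids` and `format_grouped` are hypothetical names, not part of either script.
</div>
<pre>
```python
from collections import defaultdict


def group_ids(osis_ids):
    """Group IDs such as 'Gen.1.3' into {book: {chapter: [verses]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for osis_id in osis_ids:
        book, chapter, verse = osis_id.split(".")
        grouped[book][int(chapter)].append(int(verse))
    return grouped


def format_grouped(osis_ids):
    """Render one line per chapter, indented under its book name."""
    lines = []
    grouped = group_ids(osis_ids)
    for book in grouped:  # books keep the order in which they were first seen
        lines.append(book)
        for chapter in sorted(grouped[book]):
            verses = ", ".join(str(v) for v in sorted(grouped[book][chapter]))
            lines.append(f"  {chapter} - {verses}")
    return "\n".join(lines)


missing = ["Gen.1.1", "Gen.1.3", "Gen.2.11", "Gen.1.5", "Exod.27.1", "Gen.2.22"]
print("Verses missing from:")
print(format_grouped(missing))
```
</pre>
<div>
Books come out in first-seen order, so feeding the IDs in canonical or document order (rather than alphabetically sorted) keeps that order in the report.
</div>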
<div>
<br>
</div>
<div>
— DM
</div>
<div>
<br>
<blockquote type="cite">
<div>
On Jun 18, 2025, at 4:02 PM, David Haslam <<a href="mailto:dfhdfh@protonmail.com" rel="noreferrer" target="_blank">dfhdfh@protonmail.com</a>> wrote:
</div>
<br>
<div>
<div>
<div dir="auto">
 Does it take account of the Scope key in the .conf file for a less-than-complete Bible?
</div>
<div dir="auto">
<br>
</div>
<div dir="auto">
David
</div>
<div>
<br>
</div>
<div>
<br>
</div>
<div>
<br>
</div>On Wed, Jun 18, 2025 at 20:51, DM Smith <
 <a href="mailto:dmsmith@crosswire.org" rel="noreferrer" target="_blank">dmsmith@crosswire.org</a>> wrote:
<blockquote type="cite">
Hi,
<div>
<br>
</div>
<div>
 Several have commented on how hard it is to test an OSIS XML file against v11ns, especially since osis2mod goes off into an infinite loop. (I’ve posted a patch that fixes that.) But it is still a process of trial and error to find an appropriate v11n.
</div>
<div>
<br>
</div>
<div>
<div>
 So, I’ve been iterating with ChatGPT to create a Python script to find a best-fit v11n. Since I don’t know Python, I can’t vouch for the script beyond the fact that it worked for a simple test case that had an extra chapter in Genesis and some extra verses at the end of a chapter in that book.
</div>
<div>
<br>
</div>
<div>
I offer it, as a starting place. See the attached file.
</div>
<div>
<br>
</div>
<div>
 It has a --debug flag.
</div>
<div>
The first argument is expected to be the OSIS xml file.
</div>
<div>
 The second argument is optional and gives the location of the include directory (svn/sword/trunk/include) with all the canon*.h files. If you don’t supply the argument, it loads the canon*.h files over the web from
<a href="https://www.crosswire.org/svn/sword/trunk/include" rel="noreferrer" target="_blank">https://www.crosswire.org/svn/sword/trunk/include</a>.
</div>
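<div>
The command line described above could be declared roughly as follows. This is a hypothetical sketch using Python's standard `argparse` module, not the attached script's actual code; the names `build_parser`, `osis_file` and `include_dir` are assumptions made for illustration.
</div>
<pre>
```python
import argparse

# Default location named in the message above; everything else is illustrative.
DEFAULT_INCLUDE = "https://www.crosswire.org/svn/sword/trunk/include"


def build_parser():
    """Hypothetical CLI: OSIS file, optional include location, --debug flag."""
    parser = argparse.ArgumentParser(
        description="Find the best-fit v11n for an OSIS XML file.")
    parser.add_argument("osis_file", help="path to the OSIS XML file")
    parser.add_argument(
        "include_dir", nargs="?", default=DEFAULT_INCLUDE,
        help="directory (or URL) holding the canon*.h files")
    parser.add_argument("--debug", action="store_true",
                        help="print extra diagnostic output")
    return parser


# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["kjv.osis.xml", "--debug"])
print(args.osis_file, args.include_dir, args.debug)
```
</pre>
<div>
With `nargs="?"` the second positional argument may be omitted, in which case the web location is used as the default.
</div>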
<div>
<br>
</div>
<div>
It will score the fitness of each of the v11ns. It gives the score as a %, but I don’t know what that means. I told it that it should prioritize book matches, then chapter matches and finally verse matches. I don’t know how well it did that scoring. I didn’t test for that.
</div>
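<div>
The attached script's scoring is not shown in this thread, so the following is not its method. It is merely one hypothetical way to make book matches outrank chapter matches, and chapter matches outrank verse matches: compute a Jaccard overlap (shared items divided by all items) at each level and compare the results as a tuple. The name `fit_score` and the sample data are assumptions.
</div>
<pre>
```python
def fit_score(file_ids, v11n_ids):
    """Return (book, chapter, verse) overlap fractions for one v11n."""

    def overlap(depth):
        # Truncate each ID to `depth` dot-separated parts:
        # 1 = book, 2 = book.chapter, 3 = book.chapter.verse.
        have = {".".join(i.split(".")[:depth]) for i in file_ids}
        want = {".".join(i.split(".")[:depth]) for i in v11n_ids}
        combined = have.union(want)
        if not combined:
            return 1.0  # two empty sets match trivially
        return len(have.intersection(want)) / len(combined)

    # Tuples compare left to right, so books outrank chapters outrank verses.
    return tuple(overlap(depth) for depth in (1, 2, 3))


file_ids = {"Gen.1.1", "Gen.1.2", "Gen.2.1"}
candidates = {
    "A": {"Gen.1.1", "Gen.1.2", "Gen.2.1"},
    "B": {"Gen.1.1", "Gen.1.2"},
}
best = max(candidates, key=lambda name: fit_score(file_ids, candidates[name]))
print(best, fit_score(file_ids, candidates[best]))
```
</pre>
<div>
Keeping the three fractions separate, instead of folding them into a single percentage, also makes it easier to see why one v11n ranks above another.
</div>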
<div>
<br>
</div>
<div>
 The output is alphabetized. If more than one v11n has the same high score, all of them are listed.
</div>
<div>
<br>
</div>
<div>
In His Service,
</div>
<div>
<span style="white-space:pre-wrap"> </span>DM
</div>
<div>
<br>
</div>
<div></div>
</div>
<div>
<div></div>
</div>
</blockquote>
</div>_______________________________________________
<br>sword-devel mailing list: <a href="mailto:sword-devel@crosswire.org" rel="noreferrer" target="_blank">sword-devel@crosswire.org</a>
<br><a href="http://crosswire.org/mailman/listinfo/sword-devel" rel="noreferrer" target="_blank">http://crosswire.org/mailman/listinfo/sword-devel</a>
<br>Instructions to unsubscribe/change your settings at above page
<br>
</div>
</blockquote>
</div>
<br></blockquote></div>
</blockquote></div>
</blockquote></div>
</blockquote></div>
</div></blockquote></div><br></div></div>
</blockquote></div></div>