[sword-devel] Script to find a best fit v11n
Greg Hellings
greg.hellings at gmail.com
Sun Jun 29 20:04:25 EDT 2025
Someone left me alone for too long and I converted my original Python
script for parsing versifications into a Rust GUI application. You can see
the quick screencast of what I've done here: https://youtu.be/IwTQTF8PRC4
I was about to publish it and produce binaries, but I ran into a problem:
I used Slint to create the UI, and Slint is licensed either commercially
or under GPLv3.
SWORD is licensed under GPLv2.
From my understanding, since v2 and v3 are incompatible, that means I can't
release the two together in a single application. If that's incorrect,
please someone let me know!
--Greg
On Thu, Jun 19, 2025 at 10:20 PM Greg Hellings <greg.hellings at gmail.com>
wrote:
>
>
> On Thu, Jun 19, 2025 at 4:13 PM DM Smith <dmsmith at crosswire.org> wrote:
>
>>
>> On Jun 19, 2025, at 3:24 PM, Greg Hellings <greg.hellings at gmail.com>
>> wrote:
>>
>>
>> Like most things, it's a trade-off. Working with the bindings requires
>> that they are installed on the host system. For someone running on
>> Windows, this is particularly non-trivial. For someone running on macOS
>> it's not too difficult to install from source (I don't believe Homebrew
>> builds them). For users of major Linux distributions, it's downright
>> trivial. On Fedora it has been as simple as a single `dnf install
>> python3-sword` command for a long time now, and it looks like the bindings
>> are also available for Ubuntu starting in 25.04 with an `apt install
>> python3-sword`.
>>
>>
>> Regarding building SWORD on a Mac, I use Homebrew for extra packages. I
>> tried to run ./autogen.sh, but it failed on libtoolize, which Homebrew
>> doesn’t have. Then I ran cmake, which failed because icu4c required C++17
>> or later. After hacking CMakeLists.txt, I got it to work. I’ll see if
>> I can use that to run your script.
>>
>
> For these purposes, neither ICU nor CLucene is needed. The script only
> pulls the versification data, which is part of the library's core built-ins.
>
>
>>
>> Advantages of the binding method are that it doesn't rely on parsing a C
>> header file, nor on the file laying out the values in a certain way. It
>> also can be used offline easily, doesn't require parsing HTML output
>> in order to find all the applicable files, and is likely slightly faster.
>> Not that the speed probably matters for a single run of this, but if you're
>> bulk processing files the speed advantages can add up.
>>
>>
>> The way I wrote mine is that it could use the include/canon*.h files from
>> a prior local SVN clone. This is very fast. I’d be curious to see how it
>> differs in speed from yours. The default is to go against the web, which is
>> painfully slow. (Note, it doesn’t yet do the standard disclaimer for the
>> web.) Not a big deal if it is a single run. Peter mentioned that he does
>> additional analysis of the files in problematic areas that cannot be done
>> by the script.
>>
>> Using the Python bindings does have the advantage of not reinventing
>> the wheel. I was impressed with ChatGPT’s regular expressions to slurp the
>> arrays and with how concisely it read the files. There really wasn’t any
>> difficulty in parsing the files. Since the canon*.h files are very static,
>> their layout is not likely to break the parse, so I don’t think this is
>> that big a deal.
>>
>
> Yeah, the canon header files are pretty well structured, following a
> standard format that makes them easy for humans, and thus regexes, to
> swallow. The thought of doing so had simply never crossed my mind.
>
>
>>
>>
>>
>> Disadvantages of the binding method are that it requires you to fall
>> back to a source build if you are using this to test a canon.h file or if
>> you want to use a canon file that isn't available in the package manager of
>> your Linux distribution. Building from source isn't terribly onerous for
>> most of us contributors, but it might be more of a problem for a module
>> maintainer. Then again, how often do we add a new versification to the code
>> base?
>>
>>
>> So, it’s not something we’d expect a module maker to succeed at if not
>> on Un*x. Maybe someone has a library release for macOS or Windows that
>> could be used?
>>
>
> Because our Python bindings are built as part of the library and generated
> by SWIG, they aren't distributed on PyPI (the Python Package Index),
> which is the standard way of installing third-party Python modules. To
> install from PyPI, one simply uses the "pip" tool or another standard
> Python package installer. Ours, however, are generated by SWIG from the
> library code, and we don't subsequently distribute the module separately.
> Doing so would not be terribly difficult, but it is not a route we have
> taken previously.
>
> Of course, installing Python modules that include C bindings requires
> having the Python.h header available as well as a compatible C compiler.
> For the official Python distributions on Windows, that compiler is MSVC,
> or at least it has been in the past; Python has not historically even
> officially supported building for Windows with gcc. It's enough of a
> bugbear that I've never even bothered with installing such modules with
> Python on Windows.
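Before attempting a build, one quick way to check the first prerequisite is to ask the running interpreter where its headers should live. This minimal sketch uses only the standard library (`sysconfig`); the helper name `python_headers_present` is hypothetical, and it does not verify that a compatible compiler is installed. On Fedora the headers come from the python3-devel package, and on Debian/Ubuntu from python3-dev.

```python
import os
import sysconfig


def python_headers_present():
    """Return True if Python.h exists in this interpreter's include directory."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    if python_headers_present():
        print("Python.h found (a compatible C compiler is still required).")
    else:
        print("Python.h missing; install the Python development headers first.")
```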
>
> Nowadays, though, we don't really need to. Anyone who wants to can install
> Ubuntu under WSL and just take advantage of the existing apt package
> and the Python in there. As for macOS, I don't have a good solution. I only
> use it as demanded for work. It's probably best to just let people who want
> it compile it from source there, and let them know there isn't any need for
> the ICU add-ons.
>
> An alternative is to go beyond a Python script and create a full utility
> in C that does this same work. That would make distribution to all of the
> platforms much easier. The reason I did not initially take that route is
> that Python is so convenient for working with XML, whereas the library
> has no comparable mechanism to readily parse and query it. Obviously it
> can be done, as osis2mod already does that work, and its parsing code
> could be repurposed to this effect.
>
>
>>
>> So there are pros and cons between them. I was freshly off of getting the
>> bindings to compile when I wrote the first draft of av11n.py so I naturally
>> went that direction. I also try to avoid writing parsers when I can
>> leverage existing ones, as grammars can be notoriously complex to get
>> correct. So that dictated my choices as much as did anything else, really!
>>
>>
>> My computer science master’s degree was in compiler writing! It’s
>> definitely not for the faint of heart!
>>
>
> Mine was in AI. Also not for the faint of heart, but much more
> approachable as a consumer product.
>
>
>>
>>
>> Another possible enhancement might be a CLI flag to limit the testing
>> range to a particular book (or testament) at a time. I have heard people
>> talk about having modules split up to one book per file or similar. If they
>> could say, "Only check this file against Joshua" then it could keep down a
>> significant amount of extra output. But again - I'm not really an intended
>> user of it!
>>
>>
>> Great idea. So, essentially David’s suggestion of a scope argument?
>>
>
> Basically, yes.
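A sketch of how such a scope argument might be parsed, assuming a hypothetical `--scope` flag that takes comma-separated OSIS book IDs or the keywords `OT`/`NT`. The book lists below are abbreviated placeholders; a real script would build them from the versification data rather than hard-coding them.

```python
import argparse

# Hypothetical, abbreviated book lists for illustration only.
OT_BOOKS = ["Gen", "Exod", "Josh"]
NT_BOOKS = ["Matt", "Mark", "Rev"]


def resolve_scope(scope):
    """Expand a --scope value into the list of OSIS book IDs to check."""
    if scope is None:
        return OT_BOOKS + NT_BOOKS  # no scope given: check everything
    books = []
    for part in scope.split(","):
        part = part.strip()
        if part == "OT":
            books.extend(OT_BOOKS)
        elif part == "NT":
            books.extend(NT_BOOKS)
        elif part in OT_BOOKS + NT_BOOKS:
            books.append(part)
        else:
            raise ValueError(f"unknown book or testament: {part}")
    return books


def main(argv=None):
    """Parse the command line and return the books that should be checked."""
    parser = argparse.ArgumentParser(description="Find a best-fit v11n")
    parser.add_argument("osis_file")
    parser.add_argument("--scope", help="comma-separated OSIS books, or OT/NT")
    args = parser.parse_args(argv)
    return resolve_scope(args.scope)
```

For example, `main(["module.osis.xml", "--scope", "Josh"])` returns `["Josh"]`, so only Joshua would be compared and the rest of the output is suppressed.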
>
>
>>
>> And I’m not an intended user of it either. I’m just trying to get people
>> to use something other than osis2mod to pick a versification. Looking at
>> the Jira issues on osis2mod, in one issue a person listed their script that
>> looped over the v11ns and called osis2mod with each. Yuck!
>>
>
> Yeah, using osis2mod in that way seems fraught with trouble. But in the
> absence of knowing more about the internals of the library, I can see why
> someone would take that approach.
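By contrast with looping over osis2mod, a best-fit check can load each versification's chapter and verse counts once and score the module's references against each. This sketch assumes each versification is a mapping from OSIS book ID to per-chapter verse counts (obtained from the bindings or from parsing the canon headers) and that the module's references have already been extracted as `(book, chapter, verse)` tuples. Counting misses is one simple heuristic, not necessarily what av11n.py or DM's script actually does; the function names are hypothetical.

```python
def count_misses(refs, v11n):
    """Count references that do not exist in one versification.

    refs -- iterable of (osis_book, chapter, verse) tuples from a module
    v11n -- {osis_book: [verse count for chapter 1, chapter 2, ...]}
    Verse numbers start at 1 here; SWORD's intro entries (verse 0) are
    not modeled and would be counted as misses.
    """
    misses = 0
    for book, chapter, verse in refs:
        chapters = v11n.get(book)
        if (chapters is None or not 1 <= chapter <= len(chapters)
                or not 1 <= verse <= chapters[chapter - 1]):
            misses += 1
    return misses


def rank_v11ns(refs, v11ns):
    """Return (name, misses) pairs, best fit (fewest misses) first."""
    refs = list(refs)  # allow a generator to be scored against every v11n
    scores = [(name, count_misses(refs, v11n)) for name, v11n in v11ns.items()]
    return sorted(scores, key=lambda item: (item[1], item[0]))
```

Because every versification is scored in a single pass over in-memory data, there is no need to run osis2mod once per v11n, and the per-reference misses could also be reported to show exactly where a module diverges.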
>
> --Greg
>