[sword-devel] Python script for checking pairwise characters (PROOF-OF-CONCEPT)

Matěj Cepl mcepl at cepl.eu
Mon Dec 18 09:45:44 EST 2023


On Mon Dec 18, 2023 at 2:38 PM CET, Kristof Szabo wrote:
> I wrote some time back https://github.com/krisek/sword-test, with quite a
> few test cases, which, I think, covers your use case as well.

A couple of differences at first glance:

1. Functionally, I prefer my script, which stops as soon as the
   first unpaired character is found, so the problem can be fixed
   right away.
2. I use the SAX API (xml.sax from the standard library), which
   seems to me better suited to Bible processing than the
   traditional DOM (or lxml) interface. It nicely hides all the
   hard work going on in the background and lets me work only on
   what’s relevant to my task. See
   https://gitlab.com/crosswire-bible-society/CzeCSP/-/blob/master/CEPtoOSIS.py
   for an example of much more complicated processing (it is also
   roughly ten times faster than processing with Java and
   Saxon/XSLT).
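
To illustrate both points, here is a minimal sketch of the idea. It
is not the actual script from the repository. The PAIRS table and the
names PairHandler, Unpaired and find_unpaired are made up for this
example, and it assumes that pairs must nest properly:

```python
import xml.sax

# Assumed pairs, for illustration only; the real script's list may differ.
PAIRS = {"(": ")", "[": "]", "„": "“"}
CLOSERS = set(PAIRS.values())


class Unpaired(Exception):
    """Raised from the handler to abort parsing at the first problem."""


class PairHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.stack = []      # (opening character, line) still waiting for a partner
        self.locator = None

    def setDocumentLocator(self, locator):
        # The parser hands us a locator we can ask for the current line.
        self.locator = locator

    def _line(self):
        return self.locator.getLineNumber() if self.locator else None

    def characters(self, content):
        # SAX may deliver text in several chunks; the stack keeps state across them.
        for ch in content:
            if ch in PAIRS:
                self.stack.append((ch, self._line()))
            elif ch in CLOSERS:
                if not self.stack or PAIRS[self.stack[-1][0]] != ch:
                    raise Unpaired(f"unexpected {ch!r} near line {self._line()}")
                self.stack.pop()


def find_unpaired(xml_bytes):
    """Return a message for the first problem found, or None if all is paired."""
    handler = PairHandler()
    try:
        xml.sax.parseString(xml_bytes, handler)
    except Unpaired as exc:
        return str(exc)
    if handler.stack:
        ch, line = handler.stack[0]
        return f"{ch!r} opened near line {line} is never closed"
    return None


if __name__ == "__main__":
    print(find_unpaired(b"<verse>(In the beginning]</verse>"))
```

Raising an exception from characters() is what makes the parser stop
at the first unpaired character instead of collecting every problem
in the file.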

> > Temporarily the script is in its own repo
> > (https://gitlab.com/mcepl/bible-freq-counter) and attached to
> > this message, but I would like to submit it to sword-utils. How
> > to do it?

Just an update … I have moved the script to
https://git.crosswire.org/mcepl/bible-freq-counter.

Best,

Matěj

-- 
http://matej.ceplovi.cz/blog/, @mcepl at floss.social
GPG Finger: 3C76 A027 CA45 AD70 98B5  BC1D 7920 5802 880B C9D8
 
Nemo plus iuris ad alium transferre potest quam ipse habet.