[sword-devel] Python script for checking pairwise characters (PROOF-OF-CONCEPT)
Matěj Cepl
mcepl at cepl.eu
Mon Dec 18 09:45:44 EST 2023
On Mon Dec 18, 2023 at 2:38 PM CET, Kristof Szabo wrote:
> I wrote some time back https://github.com/krisek/sword-test, with quite a
> few test cases, which, I think, covers your use case as well.
A couple of differences at first glance:
1. Functionally, I prefer my script, which stops as soon as the first
unpaired character is found, so the problem can be fixed right away.
2. I use the SAX API (xml.sax from the standard library), which seems
to me better suited to Bible processing than the
traditional DOM (or LXML) interface. It nicely hides all the
hard work going on in the background and lets me work only on
what’s relevant to my task. See
https://gitlab.com/crosswire-bible-society/CzeCSP/-/blob/master/CEPtoOSIS.py
for an example of much more complicated processing (it is also
roughly ten times faster than processing
with Java and Saxon/XSLT).
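To illustrate both points, here is a minimal sketch of the approach, not
the actual script. It is a SAX handler that tracks paired characters on a
stack and aborts parsing at the first mismatch. The names PAIRS,
UnpairedCharacter, PairChecker and check are made up for this example, and
the set of pairs is only an assumption about what one might want to check.

```python
import xml.sax

# Assumed pairs, for illustration only; a real check may use a different set.
PAIRS = {"(": ")", "[": "]", "„": "“"}
CLOSERS = set(PAIRS.values())


class UnpairedCharacter(Exception):
    """Raised at the first unpaired character to abort parsing."""


class PairChecker(xml.sax.ContentHandler):
    """Hypothetical handler: looks only at text nodes, ignores markup."""

    def __init__(self):
        super().__init__()
        self.stack = []  # openers still waiting for their closer

    def characters(self, content):
        # SAX may deliver one text node in several chunks;
        # the stack persists across calls, so that does not matter.
        for ch in content:
            if ch in PAIRS:
                self.stack.append(ch)
            elif ch in CLOSERS:
                if not self.stack or PAIRS[self.stack[-1]] != ch:
                    raise UnpairedCharacter(f"unexpected closing {ch!r}")
                self.stack.pop()

    def endDocument(self):
        # Anything left on the stack was opened but never closed.
        if self.stack:
            raise UnpairedCharacter(f"unclosed {self.stack[-1]!r}")


def check(xml_bytes):
    """Return None if all pairs match, else the first error message."""
    try:
        xml.sax.parseString(xml_bytes, PairChecker())
    except UnpairedCharacter as exc:
        return str(exc)
    return None
```

Calling check(b"<v>(a ] c)</v>") stops at the stray bracket without reading
the rest of the document, which is the "stop at the first problem"
behaviour from point 1. The handler never builds a tree, so memory use
stays flat regardless of the size of the input.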
> > Temporarily the script is in its own repo
> > (https://gitlab.com/mcepl/bible-freq-counter) and attached to
> > this message, but I would like to submit it to sword-utils. How
> > to do it?
Just an update … I have moved the script to
https://git.crosswire.org/mcepl/bible-freq-counter.
Best,
Matěj
--
http://matej.ceplovi.cz/blog/, @mcepl at floss.social
GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B C9D8
Nemo plus iuris ad alium transferre potest quam ipse habet.