[sword-devel] Python script for checking pairwise characters (PROOF-OF-CONCEPT)

Timothy Allen thristian at gmail.com
Mon Dec 18 19:30:35 EST 2023


On 19/12/23 01:45, Matěj Cepl wrote:
> 2. I use the SAX API (xml.sax from the standard library) and it seems
>     to me better suited for Bible processing than the
>     traditional DOM (or LXML) interface. It nicely hides away all the
>     hard work going on in the background and lets me work only on
>     what’s relevant to my task.

As a data point, when I was writing scripts for manipulating and 
updating the BSB module, I found the `xml.etree.ElementTree` module in 
the Python standard library to be many times faster than the SAX API. 
The SAX API is perhaps a bit more convenient, because you can just 
subscribe to whatever events are meaningful for the processing you 
want to do, but ElementTree is so much faster that I found it worth it.
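As a rough illustration of the difference in style (not taken from either script in this thread), here is the same toy task written against both standard-library APIs. The `PAIRS` table, the `<osis>`/`<verse>` sample, and the function names are made up for this sketch, and it only tallies openers against closers; it does not check nesting order.

```python
import xml.sax
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical pair table for illustration; adjust to the characters you check.
PAIRS = {"(": ")", "[": "]", "\u201c": "\u201d"}
WANTED = set(PAIRS) | set(PAIRS.values())


def report(counts):
    """Return {opener: (opened, closed)} for every pair whose counts differ."""
    return {o: (counts[o], counts[c]) for o, c in PAIRS.items()
            if counts[o] != counts[c]}


class PairCounter(xml.sax.ContentHandler):
    """SAX style: subscribe only to the character-data event."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def characters(self, content):
        # May be called several times per text node; per-character
        # counting is unaffected by how the parser chunks the text.
        self.counts.update(ch for ch in content if ch in WANTED)


def check_sax(xml_bytes):
    handler = PairCounter()
    xml.sax.parseString(xml_bytes, handler)
    return report(handler.counts)


def check_etree(xml_bytes):
    # ElementTree style: build the tree, then walk all text in document order.
    root = ET.fromstring(xml_bytes)
    counts = Counter(ch for text in root.itertext() for ch in text
                     if ch in WANTED)
    return report(counts)


sample = b"<osis><verse>He said, (and [left) .</verse></osis>"
print(check_sax(sample))    # {'[': (1, 0)}
print(check_etree(sample))  # {'[': (1, 0)}
```

The SAX version reacts to parser events as they arrive, while the ElementTree version builds the tree first and then iterates over it with `itertext()`, which yields element text and tail text in document order.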

LXML is probably faster again, but that's a third-party dependency, and 
that adds enough hassle for people who aren't Python developers that I 
drew the line there.

If you've already written things using the SAX API that work well for 
you, there's probably no point rewriting them, but if you're writing 
more tools in the future, you might want to give it a try!
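If you do try ElementTree on a whole Bible, `xml.etree.ElementTree.iterparse` may be worth a look: it reads the file incrementally like SAX but still hands you `Element` objects, so finished parts of the tree can be discarded. Below is a sketch of the same hypothetical pair check in that style (same made-up `PAIRS` table and function naming); it is a sketch under those assumptions, not a measured recommendation.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical pair table for illustration; not from any real script.
PAIRS = {"(": ")", "[": "]", "\u201c": "\u201d"}
WANTED = set(PAIRS) | set(PAIRS.values())


def check_streaming(source):
    """Tally pair characters while discarding finished subtrees.

    `source` is a filename or a binary file object.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        # By the time an element ends, its own .text and every child's
        # .tail have been parsed, so each text node is counted exactly once.
        pieces = [elem.text] + [child.tail for child in elem]
        for text in pieces:
            if text:
                counts.update(ch for ch in text if ch in WANTED)
        # Children are fully counted now; drop them to keep memory use down.
        # (elem itself is kept, because its parent still needs elem.tail.)
        for child in list(elem):
            elem.remove(child)
    return {o: (counts[o], counts[c]) for o, c in PAIRS.items()
            if counts[o] != counts[c]}


sample = io.BytesIO(b"<osis><verse>He said, (and [left) .</verse></osis>")
print(check_streaming(sample))  # {'[': (1, 0)}
```

Counting each element's tail at its parent's end event, rather than at its own, avoids depending on when the parser gets around to reading the text after a closing tag.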


Timothy

