[sword-devel] usfm to osis converter...

Greg Hellings greg.hellings at gmail.com
Thu Jul 30 12:28:39 MST 2015


I can't help but wonder if the speed boost this script boasts comes at the
expense of one of usfm2osis.py's main benefits - its change from raw
text manipulation to proper XML construction. We previously had a very
fast usfm2osis.pl that used Perl's raw regular expressions to brute
force its way to very fast conversions. The downside to this fast
conversion was that the XML was in no way guaranteed to even be
well-formed and certainly not valid.

Among other clean-up, usfm2osis.py avoided this problem by using
Python's XML libraries to ensure that any output was, at the least,
well-formed. It might even include the schema in order to enforce
validity.
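
To illustrate the difference, here is a minimal sketch. It is not taken from
either converter: the input lines, the element names, and the function names
are all hypothetical, and the markup is far simpler than real USFM or OSIS.
It only shows why raw substitution can emit ill-formed XML while building a
tree with Python's xml.etree.ElementTree leaves the escaping to the
serializer.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, highly simplified input: one "\v <number> <text>" per line.
USFM_LINES = [
    r"\v 1 In the beginning God created the heavens & the earth.",
    r"\v 2 The earth was formless <and> empty.",
]

VERSE_RE = re.compile(r"\\v (\d+) (.*)")


def convert_by_text(lines):
    """Raw text substitution: nothing escapes '&' or '<' in the verse text."""
    body = "".join(
        VERSE_RE.sub(r'<verse n="\1">\2</verse>', line) for line in lines
    )
    return "<chapter>" + body + "</chapter>"


def convert_by_tree(lines):
    """Build an element tree and let the serializer handle escaping."""
    chapter = ET.Element("chapter")
    for line in lines:
        match = VERSE_RE.match(line)
        if match is None:
            continue  # this sketch ignores every marker except \v
        verse = ET.SubElement(chapter, "verse", {"n": match.group(1)})
        verse.text = match.group(2)
    return ET.tostring(chapter, encoding="unicode")


def is_well_formed(xml_text):
    """Return True if the text parses as XML (says nothing about validity)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(convert_by_text(USFM_LINES)))
    print(is_well_formed(convert_by_tree(USFM_LINES)))
```

Note that this only checks well-formedness. Validating against the OSIS
schema would need a schema-aware library, which the standard library does
not provide.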

I wonder if this new script suffers from the same problems as usfm2osis.pl.

--Greg

On Thu, Jul 30, 2015 at 2:05 PM, David Haslam <dfhmch at googlemail.com> wrote:
> Thanks, Ryan. This looks very interesting. I expect that John Austin and
> others would also find it useful.
>
> Your description (qv) of the project should grab our attention.
>
> I wrote my own USFM to OSIS converter in python. There are several reasons
> for this:
>
>   * The usfm2osis.py converter mentioned above runs way too slow on my
>     computer. (It takes more than 2 minutes to process the World English
>     Bible). I thought I could make one that ran faster.
>   * The usfm2osis.py converter source is difficult for me to read, so I'm
>     unable to work on improving it. Obviously it would be better to submit
>     improvements to that script, but my limitations prevent that. I think
>     the biggest difficulty I have with reading the code is the huge amount
>     of complicated regular expressions it uses... about 200! Which reminds
>     me of a Jamie Zawinski quote.... “Some people, when confronted with a
>     problem, think ‘I know, I'll use regular expressions.’ Now they have
>     two problems.” (Sometimes they make sense, though. The script I wrote
>     has 9 of them.)
>   * I wanted a converter that targeted python3. (usfm2osis.py targeted only
>     python2 when I began working on my converter.)
>   * I wanted a converter that would be easy to update when changes are made
>     to the USFM standard.
>   * I thought it would be a fun project. (it was!)
>
> I've tested it with CPython 2.7.6 and CPython3 3.4.0 and it works fine in
> both of those versions of python. (This script works with pypy, pypy3, and
> jython 2.7.0 as well, but they are significantly slower at running this
> script than CPython. I haven't tested it with IronPython as I don't have
> that implementation of the python language.) It is public domain. You may do
> whatever you wish with the code.
>
> It's quite fast. For example, it only takes about 10 seconds to process the
> World English Bible on my computer. That's about a 90% reduction in
> processing time compared with usfm2osis.py in my testing. The output
> validates against the OSIS 2.1.1 schema. No markup errors are reported by
> osis2mod when generating modules for any of the bibles that I have access to
> at this time.
>
> ----
>
> Best regards,
>
> David


