[sword-devel] usfm2osis.py

Chris Little chrislit at crosswire.org
Sun Aug 5 20:03:18 MST 2012


On 8/5/2012 7:40 PM, Robert Hunt wrote:
> On 06/08/12 14:20, Chris Little wrote:
>> Linux packagers apparently go the UCS-4 route, so I didn't notice any
>> issue with using the Language Tags. But trying the above on Windows
>> shows that the cygwin build and the builds from python.org (2.7 & 3.2)
>> all use UCS-2. So my script won't work correctly on Windows.
>>
>> Not to worry, though. I'll just replace the Language Tags with
>> Noncharacters in the range u+FDD0-u+FDEF. They're UCS-2-safe since
>> they're BMP codepoints and they're specifically designated as
>> "intended for process-internal uses, but are not permitted for
>> interchange." So in the unlikely event that they appear in input, it's
>> the fault of the USFM-encoder if anything goes awry.
>>
>> We'll have to watch for input outside of the BMP on UCS-2 Python,
>> though, as that could cause problems.
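
To make the two concerns in the quoted text concrete, here is a minimal
sketch, not code taken from usfm2osis.py. It assumes the input has
already been decoded to a unicode string, and the function names
(find_reserved_markers, has_non_bmp) are hypothetical. The first check
looks for the U+FDD0-U+FDEF noncharacters already present in the input,
and the second looks for anything outside the BMP.

```python
# U+FDD0..U+FDEF: the 32 BMP noncharacters set aside for
# process-internal use (assumed here to be the private marker range).
NONCHAR_FIRST = 0xFDD0
NONCHAR_LAST = 0xFDEF


def find_reserved_markers(text):
    """Return the noncharacters from U+FDD0..U+FDEF found in text."""
    return [ch for ch in text
            if NONCHAR_FIRST <= ord(ch) <= NONCHAR_LAST]


def has_non_bmp(text):
    """Return True if text appears to contain a codepoint above U+FFFF.

    On a UCS-4 build such a codepoint is a single item with
    ord() > 0xFFFF. On a UCS-2 build it is stored as a pair of
    surrogate code units (U+D800..U+DFFF), so any surrogate is
    treated as a sign of non-BMP input.
    """
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            return True
    return False
```

A converter could call both on each input file and warn, rather than
fail, when either one reports a hit.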
> I guess I'm quite surprised that you wrote a new Python program using
> Python2 when its development is basically coming to an end (and the next
> Ubuntu will no longer have it installed by default).

Python 2.x is better supported than 3 by libraries, including some I may
elect to use at a later date. I know Python 2.x well and have never seen
a need to learn 3, and as long as Python 2.x suits my needs, there's no
reason to jump to 3. For all I know, 2to3 might work fine on my app.

Python 2.7 not being on the Ubuntu desktop CD doesn't really matter.
Python 2.7 will still be available via apt-get, and 'python' will still 
refer to Python 2.7.

> I also wonder if
> Python3 would handle Unicode better.

Yes and no, but as far as this specific issue goes, no. UCS-2 is still
the default internal representation in Python 3 (as of 3.2), and hence
it is what everyone using the python.org builds on Windows will get (as
I mentioned in the first quoted paragraph above).
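
For anyone who wants to check their own interpreter, a minimal sketch
follows. It relies only on sys.maxunicode from the standard library;
the helper name is_narrow_build is hypothetical and not part of
usfm2osis.py.

```python
import sys


def is_narrow_build():
    """Return True on a UCS-2 (narrow) build of Python.

    sys.maxunicode is 0xFFFF on narrow builds and 0x10FFFF on
    UCS-4 (wide) builds.
    """
    return sys.maxunicode == 0xFFFF


if __name__ == '__main__':
    if is_narrow_build():
        print('UCS-2 build: non-BMP characters are stored as surrogate pairs')
    else:
        print('UCS-4 build: every codepoint is a single item')
```

On a narrow build, len() of a string holding one non-BMP character is
2, which is why markers outside the BMP are unsafe there.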

--Chris


