[jsword-devel] Some characters not returned correctly

Jeremy Brown jsword-devel@crosswire.org
Wed, 22 Oct 2003 17:16:11 -0700


It seems like your recent updates have done a lot of good.  Some versions
like the Aleppo Codex and ChiNCVS/ChiNCVT, which I couldn't get to work
before, are now working for me.  (I'm using your underlying Java code to
extract verses and create files readable on a Palm.)

However, both in the program I've written and in the JSword GUI, the
accented characters in certain European languages are now coming back as
the "unknown character", even though these languages used to work.

When I call BookData.getPlainText() on these verses and print out the
character codes, the accented characters come back as 0xFFFD (the Unicode
replacement character, i.e. "unknown character").
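For anyone wanting to reproduce the check, here is a minimal sketch in plain Java that prints each code point as U+XXXX and reports where U+FFFD occurs. It assumes the verse text has already been obtained as a String; the class and method names are hypothetical (not part of JSword), and the sample verse, with U+FFFD standing in for the lost "é", is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JSword) for inspecting verse text.
public class CodePointDump {

    // Formats every code point as U+XXXX so lost characters are easy to spot.
    static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Returns the code-point offsets at which U+FFFD (the replacement character) occurs.
    static List<Integer> replacementPositions(String text) {
        List<Integer> positions = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        for (int i = 0; i < cps.length; i++) {
            if (cps[i] == 0xFFFD) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical sample: in practice this would be the string from getPlainText().
        String verse = "Dieu cr\uFFFDa les cieux";
        System.out.println(dump(verse));
        System.out.println(replacementPositions(verse));
    }
}
```

Working on code points rather than raw chars keeps the offsets meaningful even for text outside the Basic Multilingual Plane.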

Examples include the French Louis Segond, the Spanish Reina Valera, Norsk,
and Danske.

The accented characters in these languages should all fall within the
first 256 Unicode code points (the normal ISO 8859-1 character set).  All
the characters above this range seem to work fine; for example, Hebrew,
Chinese, and Russian come out correctly (Aleppo Codex, Chinese NCVS,
Russian Makarij).

Some languages that I would expect to have the same problem, but that
don't, are Hungarian and Swedish.

Has the way you determine the character encoding changed?  As I said,
these European languages used to work fine, but now some of their
characters are being converted to 0xFFFD before they are output by
BibleData.getPlainText().
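One possible explanation, offered only as a hypothesis: if the affected modules are stored as ISO 8859-1 bytes but are now being decoded as UTF-8, then accented bytes (0x80 to 0xFF) that do not form a valid UTF-8 sequence typically become 0xFFFD, while plain ASCII and modules that really are UTF-8 (such as the Hebrew, Chinese, and Russian ones) decode cleanly. The sketch below uses only the JDK to show that mechanism; the class and method names are hypothetical and this is not JSword's actual decoding code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration (not JSword code) of a charset mismatch.
public class EncodingMismatch {

    // Encodes text with one charset, then decodes the bytes with another.
    // new String(byte[], Charset) substitutes U+FFFD for malformed input.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String french = "cr\u00E9a";    // "créa": U+00E9 lies inside the Latin-1 range
        String hebrew = "\u05D0\u05D1"; // aleph, bet: both above U+00FF

        // Latin-1 bytes misread as UTF-8: 0xE9 followed by 'a' is malformed, so it becomes U+FFFD.
        String broken = roundTrip(french, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // Latin-1 bytes read back as Latin-1: the accent survives.
        String fixed = roundTrip(french, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        // UTF-8 bytes read back as UTF-8: the Hebrew survives.
        String hebrewOk = roundTrip(hebrew, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println(broken.indexOf('\uFFFD') >= 0);
        System.out.println(fixed.equals(french));
        System.out.println(hebrewOk.equals(hebrew));
    }
}
```

If this hypothesis holds, the fix would be to pick the decoding charset per module rather than assuming UTF-8 for all of them.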

Thanks for all your great work, and for any help you can provide.

Jeremy Brown
Biola University