[sword-devel] StripText() result not converted to UTF-8?
Troy A. Griffitts
scribe at crosswire.org
Sun Feb 18 12:20:37 MST 2007
Joachim,
I believe the filter is wrong. It should return the UTF-8 value. This
is a bug. Anyone want to look through the Unicode code chart and recode
all these values?
http://www.unicode.org/charts/PDF/U0080.pdf
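
To illustrate what "recoding" would mean here, a minimal sketch, not the actual ThMLPlain code: every code point in the Latin-1 Supplement block (U+0080 to U+00FF) takes two bytes in UTF-8. The helper name latin1ToUtf8 is hypothetical and not part of the SWORD API. It assumes the input byte is an ISO-8859-1 code point, which agrees with cp1252 only for 0xA0 to 0xFF.

```cpp
#include <string>

// Hypothetical helper (not part of SWORD): return the UTF-8 encoding of a
// single ISO-8859-1 byte. Assumption: the byte value equals the Unicode
// code point, which holds for Latin-1 but not for cp1252's 0x80..0x9F range.
std::string latin1ToUtf8(unsigned char c) {
    std::string out;
    if (c < 0x80) {
        // ASCII is encoded as itself.
        out += static_cast<char>(c);
    } else {
        // Two-byte form: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    return out;
}
```

For example, Æ is U+00C6, so latin1ToUtf8(0xC6) yields the two bytes 0xC3 0x86. A filter could append that string instead of the single Latin-1 byte.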
Sorry for the bug, Joachim.
-Troy.
Joachim Ansorg wrote:
> Hi,
> replying to myself.
>
> I've been wrong in some of my assumptions.
>
> JFB is ThML. It contains the entity Æ
>
> StripText() calls the filter ThMLPlain which converts the Æ into 0xC6,
> which is the corresponding cp1252 character code.
>
> I thought that StripText() would remove all markup and return text in the
> encoding given to EncodingFilterMgr.
>
> My question:
> Is that right or wrong?
>
> Some help would be wonderful,
> Joachim
>
>> Hi,
>> I'm just debugging a bug in BibleTime.
>>
>> Our SWMgr is created to output utf8.
>> The module JFB contains the entity Æ.
>>
>> When I call StripText() the entity is converted to the corresponding
>> character in the cp1252 charset, i.e. char with the value 0xC6.
>> I thought that the latin2utf8 filter would convert this plain text to utf8
>> because I told SWMgr to do this for me.
>>
>> Is there a way to set the output encoding for StripText() to be different
>> than the module's encoding?
>>
>> Thanks a lot,
>> Joachim
>
>
>