[sword-devel] StripText() result not converted to UTF-8?

Troy A. Griffitts scribe at crosswire.org
Sun Feb 18 12:20:37 MST 2007


Joachim,
	I believe the filter is wrong.  It should return the UTF-8 value.  This 
is a bug.  Anyone want to look through the Unicode code chart and recode 
all these values as UTF-8?

http://www.unicode.org/charts/PDF/U0080.pdf
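
For whoever picks this up, here is a minimal sketch of what the recoding amounts to for the code points on that chart (U+0080 to U+00FF). The function name latin1ToUtf8 is hypothetical and not part of SWORD, and it assumes the filter's current values are plain Latin-1 code points; the cp1252-only characters in 0x80-0x9F would need a lookup table instead. Each value of 0x80 or above becomes a two-byte UTF-8 sequence, so 0xC9 for example becomes 0xC3 0x89.

```cpp
#include <string>

// Hypothetical helper, not part of the SWORD API: encode a single
// Latin-1 code point (U+0000..U+00FF) as UTF-8.
// Assumption: the input is a Latin-1 value, not a cp1252-specific one.
std::string latin1ToUtf8(unsigned char c) {
    std::string out;
    if (c < 0x80) {
        // ASCII range: one byte, unchanged.
        out += static_cast<char>(c);
    } else {
        // U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx.
        out += static_cast<char>(0xC0 | (c >> 6));   // top 2 bits of the code point
        out += static_cast<char>(0x80 | (c & 0x3F)); // low 6 bits of the code point
    }
    return out;
}
```

A filter could then append latin1ToUtf8(value) to its output buffer instead of the raw single byte, or the table could simply hold the precomputed two-byte strings.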

	Sorry for the bug, Joachim.

		-Troy.



Joachim Ansorg wrote:
> Hi,
> replying to myself.
> 
> I've been wrong in some of my assumptions.
> 
> JFB is ThML. It contains the entity Æ
> 
> StripText() calls the filter ThMLPlain which converts the Æ into 0xC9, 
> which is the corresponding cp1252 character code.
> 
> I thought that StripText() would remove all markup and return text in the 
> encoding given to EncodingFilterMgr.
> 
> My question:
> Is that right or wrong?
> 
> Some help would be wonderful,
> Joachim
> 
>> Hi,
>> I'm just debugging a bug in BibleTime.
>>
>> Our SWMgr is created to output utf8.
>> The module JFB contains the entity Æ.
>>
>> When I call StripText() the entity is converted to the corresponding
>> character in the cp1252 charset, i.e. char with the value 0xC9.
>> I thought that the latin2utf8 filter would convert this plain text to utf8
>> because I told SWMgr to do this for me.
>>
>> Is there a way to set the output encoding for StripText() to be different
>> than the module's encoding?
>>
>> Thanks a lot,
>> Joachim
> 
> 
> 



