[sword-devel] language/locale codes

Troy A. Griffitts scribe at crosswire.org
Wed Nov 11 11:55:53 MST 2009


Just as a side note to this discussion, we've recently added in utilstr.h:

SWBuf assureValidUTF8(const char *buf);

It would be interesting to pass "Bokmål" to this method and see whether
it returns the same data.  There is a test program you can try in the
source tree, located at:

sword/tests/utf8norm

http://crosswire.org/svn/sword/trunk/tests/utf8norm.cpp



DM Smith wrote:
> On Nov 11, 2009, at 9:59 AM, Karl Kleinpaste wrote:
> 
>> DM Smith <dmsmith at crosswire.org> writes:
>>> U+00E5 is the unicode code point, not the encoding. In hex the utf-8
>>> encoding would be C3 A5. In ISO-8859-1, it would be E5.
>> XEmacs tells me that the buffer is UTF-8.  Manually re-asserting it...
>>
>> M-x set-buffer-file-coding-system RET utf-8 RET
>>
>> ...and re-saving the file makes no change to the content, yet that's
>> exactly the mechanism I've used in the past to convert ISO-8859 to UTF-8.
>>
>>> So I'd suggest looking at a hex dump to see what the encoding is.
>> BTDT.  "od -c" of this...
>>
>> # correct: Norwegian Bokmål
>> #nb     Norsk Bokmål
>> # a hack while g_utf8_validate() dislikes 'å': Norwegian Bokmaal
>> nb      Norsk Bokmaal
>>
>> ...produces this...
>>
>> 0007300   o   e   r   o  \n   #       c   o   r   r   e   c   t   :    
>> 0007320   N   o   r   w   e   g   i   a   n       B   o   k   m 303 245
>> 0007340   l  \n   #   n   b  \t   N   o   r   s   k       B   o   k   m
>> 0007360 303 245   l  \n   #       a       h   a   c   k       w   h   i
>> 0007400   l   e       g   _   u   t   f   8   _   v   a   l   i   d   a
>> 0007420   t   e   (   )       d   i   s   l   i   k   e   s       ' 303
>> 0007440 245   '   :       N   o   r   w   e   g   i   a   n       B   o
>> 0007460   k   m   a   a   l  \n   n   b  \t   N   o   r   s   k       B
>>
>> For a-ring, the character map application observes...
>>        C octal escaped UTF-8: \303\245
>> ...so I'm pretty well convinced that the content is right.
> 
> You've convinced me. I'm curious as to whether this is a reported GTK bug?
> 
> I'm also curious as to whether it handles the decomposed form. The following is \141\314\212:
> Bokmål
> 
> In Him,
> 	DM
> 
> 



