[sword-devel] language/locale codes
Karl Kleinpaste
karl at kleinpaste.org
Wed Nov 11 07:59:29 MST 2009
DM Smith <dmsmith at crosswire.org> writes:
> U+00E5 is the unicode code point, not the encoding. In hex the utf-8
> encoding would be C3 A5. In ISO-8859-1, it would be E5.
XEmacs tells me that the buffer is UTF-8. Manually re-asserting it...
M-x set-buffer-file-coding-system RET utf-8 RET
...and re-saving the file makes no change to the content, yet that's
exactly the mechanism I've used in the past to convert ISO-8859 to UTF-8.
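As a rough illustration of the distinction being discussed (code point versus encoded bytes), and of why re-saving an already UTF-8 buffer changes nothing, here is a minimal Python sketch. The helper names `latin1_to_utf8` and `is_valid_utf8` are hypothetical and not part of any tool mentioned in this thread; only the byte values U+00E5, C3 A5 and E5 come from the messages above.

```python
# Sketch only: helper names are hypothetical, byte values are from the thread.
A_RING = "\u00e5"  # U+00E5, the code point (not an encoding)

utf8_bytes = A_RING.encode("utf-8")         # two bytes in UTF-8
latin1_bytes = A_RING.encode("iso-8859-1")  # one byte in ISO-8859-1


def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret ISO-8859-1 bytes as text and re-encode them as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_bytes.hex())    # c3a5
print(latin1_bytes.hex())  # e5

# A lone E5 byte (ISO-8859-1 content) is not valid UTF-8 ...
print(is_valid_utf8(b"Bokm\xe5l"))      # False
# ... but the two-byte form is, so such a file needs no conversion.
print(is_valid_utf8(b"Bokm\xc3\xa5l"))  # True
```

If the file really contained a bare E5 byte, a conversion along the lines of `latin1_to_utf8` would change it; content that already holds C3 A5 is left as it is.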
> So I'd suggest looking at a hex dump to see what the encoding is.
BTDT. "od -c" of this...
# correct: Norwegian Bokmål
#nb Norsk Bokmål
# a hack while g_utf8_validate() dislikes 'å': Norwegian Bokmaal
nb Norsk Bokmaal
...produces this...
0007300   o   e   r   o  \n   #       c   o   r   r   e   c   t   :
0007320   N   o   r   w   e   g   i   a   n       B   o   k   m 303 245
0007340   l  \n   #   n   b  \t   N   o   r   s   k       B   o   k   m
0007360 303 245   l  \n   #       a       h   a   c   k       w   h   i
0007400   l   e       g   _   u   t   f   8   _   v   a   l   i   d   a
0007420   t   e   (   )       d   i   s   l   i   k   e   s       ' 303
0007440 245   '   :       N   o   r   w   e   g   i   a   n       B   o
0007460   k   m   a   a   l  \n   n   b  \t   N   o   r   s   k       B
For a-ring, the character map application observes...
C octal escaped UTF-8: \303\245
...so I'm pretty well convinced that the content is right.
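The octal pair in the dump can be cross-checked against the hex values quoted earlier. The following is a small Python sketch of that arithmetic, not something run as part of this thread; the variable names are illustrative.

```python
# Octal byte values as they appear in the "od -c" output above.
od_octal = ["303", "245"]

# Convert each octal token to its byte value.
raw = bytes(int(tok, 8) for tok in od_octal)
print(raw.hex())  # c3a5

# Decode the pair as UTF-8 and report the resulting code point.
ch = raw.decode("utf-8")
print(f"U+{ord(ch):04X}")  # U+00E5
```

Octal 303 245 is hex C3 A5, which is the UTF-8 encoding of U+00E5, so this agrees with the reading that the file content itself is valid UTF-8.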