[sword-devel] unicode / utf-8
Paul Gear
sword-devel@crosswire.org
Fri, 25 May 2001 12:41:55 +1000
> Congrats guys on the UTF-8 / UNICODE support!
>
> A few comments from my experiences over the last week.
>
> UNICODE string on windows is an array of 16-bit characters.
And Java, FWIW.
> UNICODE string on UNIX is an array of 32-bit characters.
>
> UTF-8 IS NOT UNICODE! It supports STORING of unicode.
I thought that Unicode itself just defines code points, which fit in 32 bits
(at least for the latest version), and that UTF-8 and UTF-16 are two of the
encoding forms the standard defines for them. Thus, strictly speaking, only
the code points are Unicode, but UTF-8 and UTF-16 can be called Unicode
because they're defined by the standard.
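
To make the distinction concrete, here is a rough sketch in Python (not SWORD
code; the helper name encoded_sizes is made up for illustration) that takes a
single code point and reports how many bytes it needs in each encoding form.
The code point is the same each time, only the storage differs.

```python
# Illustrative only: encoded_sizes is a hypothetical helper, not part of SWORD.
def encoded_sizes(ch: str) -> dict:
    """Return the code point of ch and its byte length in three encodings."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        # The "-be" codecs are used so that no byte-order mark is counted.
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


# Latin capital A, Hebrew alef, and a Gothic letter outside the 16-bit range.
for ch in ("A", "\u05D0", "\U00010330"):
    print(encoded_sizes(ch))
```

Note that the last character needs four bytes in UTF-16 as well, so a 16-bit
array element is not always one whole character.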
> ...
> The question really comes when we try to decide the internal memory
> storage mechanism of these streams...
> ...
> How does searching now work in this new world?
>
> Lots of things to consider over the next few weeks as we try to hash
> out an initial shot at supporting this new range of modules.
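On searching, one thing that may help: if both the text and the search term
are valid UTF-8, a plain byte-wise substring search cannot match in the
middle of a character, because lead bytes and continuation bytes never look
alike. Below is a rough sketch in Python, assuming exact matching only (no
case folding or normalization); find_utf8 is a made-up name, not a SWORD
function.

```python
# Illustrative only: find_utf8 is a hypothetical helper, not part of SWORD.
def find_utf8(haystack: bytes, needle: bytes) -> int:
    """Return the character offset of needle in haystack, or -1 if absent.

    Both arguments are assumed to be valid UTF-8.
    """
    byte_pos = haystack.find(needle)  # ordinary byte-wise search
    if byte_pos < 0:
        return -1
    # The match starts on a character boundary, so the prefix decodes cleanly;
    # its decoded length is the offset counted in characters.
    return len(haystack[:byte_pos].decode("utf-8"))


text = "abc \u05D1\u05E8".encode("utf-8")
print(find_utf8(text, "\u05E8".encode("utf-8")))
```

Anything smarter than exact matching (ignoring accents, case, and so on) is
where a proper library would earn its keep.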
Looks like we might need to bring in a character manipulation library. Make
sure it's GPL-ed! :-)
PDG