[sword-devel] CLucene 2

Matthew Talbert ransom1982 at gmail.com
Fri Sep 2 12:27:50 MST 2011


> It is actually UTF8 to Unicode from everything I've been able to read.
>  utf8towcs is, from what I have read, supposed to represent every
> Unicode character as a single wchar_t which is supposed to be wide
> enough to hold the entire Unicode point value in a single space.  If
> I'm mistaken and someone knows otherwise, I'd appreciate knowing.  So

You are (a little) mistaken. The width of wchar_t differs between
platforms. On Windows it is only 16 bits, which *isn't* wide enough
to hold every Unicode code point. I believe that on modern *nix
systems it is defined as 32 bits. I do not believe that CLucene
bothers itself with characters that do not fit in 16 bits (those
above U+FFFF), so they simply don't work, at least on Windows.
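
To make the width issue concrete, here is a rough sketch (not CLucene
code; every function name in it is made up for illustration). It
assumes UTF-16 rules when a storage unit is 16 bits wide, so any code
point above U+FFFF needs a surrogate pair, and assumes one unit per
code point when the unit is wider.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Width of wchar_t, in bits, on whatever platform compiles this
// (typically 16 on Windows and 32 on most modern Unix-like systems).
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// True when the code point lies above U+FFFF, i.e. it cannot be
// stored in a single 16-bit unit.
constexpr bool is_outside_bmp(std::uint32_t code_point) {
    return code_point > 0xFFFF;
}

// Units of the given width needed for one code point: two for a
// 16-bit unit and a code point above U+FFFF (a UTF-16 surrogate
// pair), otherwise one.
constexpr int units_needed(std::uint32_t code_point, std::size_t unit_bits) {
    return (unit_bits == 16 && is_outside_bmp(code_point)) ? 2 : 1;
}

// Hypothetical helper: does one wchar_t hold this code point here?
constexpr bool fits_in_one_wchar(std::uint32_t code_point) {
    return units_needed(code_point, wchar_bits()) == 1;
}

// A few checks that hold regardless of the local wchar_t width.
void self_check() {
    assert(units_needed(0x00E9, 16) == 1);   // U+00E9, inside the BMP
    assert(units_needed(0x1D11E, 16) == 2);  // U+1D11E, outside the BMP
    assert(units_needed(0x1D11E, 32) == 1);  // fits when units are 32 bits
    assert(fits_in_one_wchar(0x0041));       // 'A' fits everywhere
}
```

Code that treats each 16-bit wchar_t as a complete character would see
U+1D11E as two separate units, which is the failure described above.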


