[sword-devel] CLucene 2

Greg Hellings greg.hellings at gmail.com
Fri Sep 2 11:56:13 MST 2011


On Fri, Sep 2, 2011 at 12:22 PM, Matthew Talbert <ransom1982 at gmail.com> wrote:
>> So I'm looking at the trouble with building CLucene 2 and the first
>> snag seems to hinge around the helper functions we use -
>> lucene_utf8towcs and lucene_wcstoutf8 and the like. These are still
>> present in CLucene 2, but they are not exposed through a public
>> header.  As I see it, we can either choose to add their signatures to
>> our own headers when CLucene 2 is detected and move on with it, or we
>> can appropriate their entire implementation of the functionality and
>> rename it as sword_utf8towcs and the like provided the licenses allow
>> us to do so.  Anyone have a firm opinion on the better path to take?
>>
>
> Wouldn't this be very close to the UTF8 to UTF16 conversion that I
> posted the other day? If so, we already have a filter that can do
> this.

It is actually UTF-8 to Unicode code points, from everything I've been
able to read. utf8towcs is supposed to represent every Unicode
character as a single wchar_t, which is supposed to be wide enough to
hold the entire code point value in one unit. If I'm mistaken and
someone knows otherwise, I'd appreciate knowing. So yes, the
functionality is akin to UTF8UTF16, but it is not the same. I pursued
the same path myself, only to find that SWORD lacks the internal
ability to convert to/from the proper wchar_t format. The C++ STL does
have the ability, but like so much else in the STL it is pure arcana
to figure out what is going on and what the proper syntax is, plus it
would not be nearly as performant as operating on the C strings
directly (it would require a path like: char * -> string ->
stringstream -> wstring -> wchar_t *).
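
To make the "one wchar_t per code point" idea concrete, here is a rough
sketch of the kind of conversion I mean, working directly on the C
string. The name sketch_utf8towcs and its signature are made up for
illustration; this is not CLucene's code or an existing SWORD function.
It assumes wchar_t is wide enough for any code point (true where
wchar_t is 32 bits, not where it is 16 bits), and it does not reject
overlong encodings or surrogate values.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only (not CLucene or SWORD API).
// Decodes NUL-terminated UTF-8 in `src` into `dst`, one code point per
// wchar_t. Writes at most `maxlen` characters plus a terminating L'\0',
// so `dst` needs room for maxlen + 1 elements. Returns the number of
// characters written, not counting the terminator.
// Assumption: wchar_t can hold any Unicode code point.
std::size_t sketch_utf8towcs(wchar_t *dst, const char *src, std::size_t maxlen) {
    const unsigned char *p = reinterpret_cast<const unsigned char *>(src);
    std::size_t n = 0;
    while (*p && n < maxlen) {
        unsigned long cp;   // code point being assembled
        int extra;          // continuation bytes still expected
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else                          { cp = 0xFFFD;    extra = 0; } // invalid lead byte
        ++p;
        for (; extra > 0; --extra) {
            if ((*p & 0xC0) != 0x80) { cp = 0xFFFD; break; } // truncated sequence
            cp = (cp << 6) | (*p & 0x3F);
            ++p;
        }
        dst[n++] = static_cast<wchar_t>(cp);
    }
    dst[n] = L'\0';
    return n;
}
```

Malformed input comes out as U+FFFD rather than stopping the
conversion; a real implementation might prefer to report an error.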

I will follow DM's suggestion, but I wanted to check and see if anyone
here was already in touch with the CLucene crew and knew their minds
on this one.  I thought someone here had, in the past, been at least
marginally involved in CLucene work.  Another alternative is that ICU
apparently supports this functionality as well.  But then we would
require both CLucene _and_ ICU in the library at the same time.
The CLucene functions appear to be relatively straightforward and
self-contained in their own source file, so copying them might be the
best route for us if CLucene really does intend to hide this
functionality from its headers.

--Greg

>
> Matthew
>


