[sword-devel] UTF8 String Handling [was: Locale and text retrieval code]

Troy A. Griffitts scribe at crosswire.org
Sun Feb 1 15:32:05 MST 2009


Dear Manfred,

Are you overriding the default StringMgr in SWORD with something like:

StringMgr::setSystemStringMgr(new MyStringManager());

I don't really like this mechanism of overriding SWORD's Unicode 
handling, and I'm not sure that the benefit of opting for GTK's or Qt's 
Unicode handling, so as not to require ICU, outweighs the assurance 
that all strings are handled the same way.

The potential for problems is great if a module is created using ICU 
logic and accessed using any variant of that logic.  We do binary search 
optimizations in the engine, and if the dataset is not 'in order', 
infinite loops or other badness could occur.
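
To illustrate the concern, here is a minimal standalone sketch. It is 
not SWORD engine code: buildIndex(), lookup() and checkLookup() are 
hypothetical names, and the index is just a sorted vector. It assumes 
the writer uppercased keys with Unicode-aware logic while the reader 
uppercases only ASCII letters. It shows the milder failure, an entry 
that is present but never found; it does not reproduce an infinite loop.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical index: keys were uppercased by a Unicode-aware writer
// and then sorted in plain byte order.
std::vector<std::string> buildIndex() {
    std::vector<std::string> keys = {
        "HEBR" "\xC3\x84" "ER",   // HEBRÄER
        "MATTH" "\xC3\x84" "US",  // MATTHÄUS
        "R" "\xC3\x96" "MER",     // RÖMER
    };
    std::sort(keys.begin(), keys.end());
    return keys;
}

// Binary search over the sorted index, comparing raw bytes.
bool lookup(const std::vector<std::string> &index, const std::string &key) {
    return std::binary_search(index.begin(), index.end(), key);
}

// The same book name, uppercased two different ways.
void checkLookup() {
    const std::vector<std::string> index = buildIndex();
    // Reader uppercases like the writer did: the key is found.
    assert(lookup(index, "R" "\xC3\x96" "MER"));   // RÖMER
    // Reader uppercases ASCII only: the key differs byte-wise, no match.
    assert(!lookup(index, "R" "\xC3\xB6" "MER"));  // RöMER
}
```

The two keys differ in a single byte (0x96 versus 0xB6), which is enough 
for the search to miss an entry that is actually in the index.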

Anyway, yes, if you are telling SWORD that you have a Unicode capable 
StringMgr (supportsUnicode() == true) then SWORD will include the UTF8 
locales.  These locales require a working upperUTF8() method in your 
StringMgr.
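
As a rough sketch of what a "working" upperUTF8() has to do beyond 
plain toupper(), the standalone code below contrasts the two. The 
function names are hypothetical and this is not the SWORD or ICU 
implementation; it assumes valid UTF-8 input and covers only the 
Latin-1 Supplement letters (U+00E0 to U+00FE), which is enough for 
German umlauts but nowhere near full Unicode case mapping.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Byte-wise uppercasing: only ASCII letters change; the bytes of a
// multi-byte UTF-8 sequence are left untouched.
std::string upperAsciiOnly(const std::string &text) {
    std::string out = text;
    for (char &c : out) {
        unsigned char u = static_cast<unsigned char>(c);
        if (u < 0x80) c = static_cast<char>(std::toupper(u));
    }
    return out;
}

// Hypothetical minimal UTF-8 uppercasing: additionally maps the
// two-byte sequences for U+00E0..U+00FE (except U+00F7, the division
// sign) to their uppercase forms U+00C0..U+00DE.
std::string upperUtf8Latin1(const std::string &text) {
    std::string out = upperAsciiOnly(text);
    for (std::size_t i = 0; i + 1 < out.size(); ++i) {
        unsigned char lead = static_cast<unsigned char>(out[i]);
        unsigned char trail = static_cast<unsigned char>(out[i + 1]);
        // Lead byte 0xC3 covers U+00C0..U+00FF; a trail byte in
        // 0xA0..0xBE corresponds to U+00E0..U+00FE.
        if (lead == 0xC3 && trail >= 0xA0 && trail <= 0xBE && trail != 0xB7) {
            out[i + 1] = static_cast<char>(trail - 0x20);
            ++i;  // skip the trail byte just rewritten
        }
    }
    return out;
}

// "1. Könige" uppercased both ways.
void checkKoenige() {
    const std::string name = "1. K" "\xC3\xB6" "nige";
    assert(upperAsciiOnly(name) == "1. K" "\xC3\xB6" "NIGE");   // 1. KöNIGE
    assert(upperUtf8Latin1(name) == "1. K" "\xC3\x96" "NIGE");  // 1. KÖNIGE
}
```

The byte-wise result leaves the lowercase ö in the middle of an 
otherwise uppercased name, which is the kind of mixed-case key that 
will not match an entry uppercased with full Unicode rules.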

But I would suggest using ICU if you have SWORD compiled with it now.

	-Troy.


Manfred Bergmann wrote:
> 
> Am 30.01.2009 um 19:50 schrieb Manfred Bergmann:
> 
>> Hi.
>>
>> I'm currently having some problems with locales and can't really 
>> figure out what the problem is.
>> First of all in the initialization code after some checking for system 
>> language and such a call to:
>>
>> sword::LocaleMgr *lManager = sword::LocaleMgr::getSystemLocaleMgr();
>> lManager->setDefaultLocaleName("de");
>>
>> is done.
>>
>> On the first use of VerseKey:
>> VerseKey vk;
>>
>> I get this output on the console:
>> ------------
>> VerseKey::Book: 1. Könige does not have a matching toupper abbrevs 
>> entry! book number returned was: -1(10). Required entry should be:
>> 1. KöNIGE=11
>> VerseKey::Book: 2. Könige does not have a matching toupper abbrevs 
>> entry! book number returned was: -1(11). Required entry should be:
>> 2. KöNIGE=12
>> VerseKey::Book: Sprüche does not have a matching toupper abbrevs 
>> entry! book number returned was: -1(19). Required entry should be:
>> SPRüCHE=20
>> VerseKey::Book: Matthäus does not have a matching toupper abbrevs 
>> entry! book number returned was: -1(0). Required entry should be:
>> MATTHäUS=40
>> VerseKey::Book: Römer does not have a matching toupper abbrevs entry! 
>> book number returned was: -1(5). Required entry should be:
>> RöMER=45
>> VerseKey::Book: Hebräer does not have a matching toupper abbrevs 
>> entry! book number returned was: -1(18). Required entry should be:
>> HEBRäER=58
>> ------------
>>
>> As a result, no reference lookups can be done for any book name 
>> with umlauts.
>> We use a subclass of StringMgr which is initialized and the 
>> "supportsUnicode()" method is called (but not the upperUTF8() 
>> interestingly).
>> Any ideas what the problem could be?
> 
> I managed to compile the sword library with ICU and that fixed the issue.
> 
>> The second thing I don't really understand is how to correctly pull 
>> out text for a verse key.
>> Some code is following, sorry for that.
>>
>> In MacSword I have implemented the code like this:
>>
>> -----------------------
>>    // needed to check for UTF8 string
>>    MSStringMgr *strMgr = new MSStringMgr();
>>
>>    // incoming reference
>>    const char *cref = [reference UTF8String];
>>    sword::VerseKey    vk;
>>    sword::ListKey listkey = vk.ParseVerseList(cref, vk, true);
>>    // for the duration of this query we want the key to persist
>>    listkey.Persist(true);
>>    swModule->setKey(listkey);
>>
>>    // iterate through keys
>>    for ((*swModule) = sword::TOP; !swModule->Error(); (*swModule)++) {
>>        const char *keyCStr = swModule->getKeyText();
>>        const char *txtCStr = swModule->RenderText();
>>        NSMutableDictionary *dict = [NSMutableDictionary 
>> dictionaryWithCapacity:2];
>>        NSString *key = @"";
>>        NSString *txt = @"";
>>        if(strMgr->isUtf8(txtCStr)) {
>>            txt = [NSString stringWithUTF8String:txtCStr];
>>        } else {
>>            txt = [NSString stringWithCString:txtCStr 
>> encoding:NSISOLatin1StringEncoding];
>>        }
>>
>>        if([self isUnicode]) {
>>            key = [NSString stringWithUTF8String:keyCStr];
>>        } else {
>>            key = [NSString stringWithCString:keyCStr 
>> encoding:NSISOLatin1StringEncoding];
>>        }
>>
>>        // add to dict
>>        [dict setObject:txt forKey:SW_OUTPUT_TEXT_KEY];
>>        [dict setObject:key forKey:SW_OUTPUT_REF_KEY];
>>        // add to array
>>        [ret addObject:dict];
>>    }
>>    // remove persistent key
>>    swModule->setKey("gen.1.1");
>> ------------------------
> 
> The above code still crashes for references that don't exist, while the 
> code below works everywhere and always.
> 
>> This actually works but due to the problem with the locale and Umlauts 
>> it crashes somewhere in the library in the head of the for loop 
>> ((*swModule) = sword::TOP).
>>
>> I have some older code lying around (which is actually used in 
>> MacSword 1.4.3, done by William) which, to be honest, I don't understand:
>>
>> --------------------------
>>     sword::VerseKey vk;
>>        int lastIndex;
>>     ((sword::VerseKey*)(swModule->getKey()))->Headings(1);   
>>     sword::ListKey listkey = vk.ParseVerseList(toUTF8(reference), 
>> "Gen1", true);   
>>     for (int i = 0; i < listkey.Count(); i++) {
>>         sword::VerseKey *element = My_SWDYNAMIC_CAST(VerseKey, 
>> listkey.GetElement(i));
>>        
>>         // is it a chapter or book - not atomic
>>         if(element) {
>>             swModule->Key(element->LowerBound());
>>             // find the upper bound
>>             vk = element->UpperBound();
>>             vk.Headings(true);
>>         } else {
>>             // set it up
>>             swModule->Key(*listkey.GetElement(i));
>>         }
>>
>>         // while not past the upper bound
>>         do {           
>>             //add verse index to dictionary
>>             char *ctxt = (char *)swModule->RenderText();
>>             int clen = strlen(ctxt);
>>            NSString *text = fromUTF8(ctxt);
>>            NSString *verse = fromUTF8(swModule->Key().getText());
>>
>>            // add to dict
>>            NSMutableDictionary *dict = [NSMutableDictionary 
>> dictionaryWithCapacity:2];
>>            [dict setObject:text forKey:SW_OUTPUT_TEXT_KEY];
>>            [dict setObject:verse forKey:SW_OUTPUT_REF_KEY];
>>            // add to array
>>            [ret addObject:dict];
>>
>>             lastIndex = (swModule->Key()).Index();
>>             (*swModule)++;
>>             if(lastIndex == (swModule->Key()).Index())
>>                 break;
>>         }while (element && swModule->Key() <= vk);
>>     }
>> ---------------------
>> This code also has the locale problem but it doesn't crash.
>> The problem is however that I don't know the right way of pulling out 
>> text and I don't really understand some portions of the above code. 
>> These two loops really confuse me. I would be happy if someone could 
>> tell me line by line what happens here.
>> I had the impression that actually the code above, which crashes, is 
>> correct. But it seems, it is not.
>>
>>
>>
>> Regards,
>> Manfred
>>
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
> 
> 
