[sword-devel] Sword Locales / German Umlaut Issues / AndroidBuild

Tobias Klein contact at tklein.info
Sun Feb 7 08:59:10 EST 2021


Hi Troy,

Thanks once more for all the details! I appreciate it.

I just grepped quickly in the SWORD source code (grep -r "upperUTF8" .  
| grep -v ".svn") and the method upperUTF8 appears to be only used in 
the following places:

./src/keys/versekey.cpp: stringMgr->upperUTF8(abbr, (unsigned int)(strlen(abbr)*2));
./src/keys/versekey.cpp: stringMgr->upperUTF8(abbr, (unsigned int)(strlen(abbr)*2));
./utilities/imp2gbs.cpp: StringMgr::getSystemStringMgr()->upperUTF8(keyBuffer.getRawData(), size-2);

I think neither of those is currently used in Ezra Project, though. At 
the moment I do not have the use case to parse verse keys based on any 
special Unicode inputs. I am only using the standard English 
abbreviations for verse keys and that only happens internally. So, in 
this case I may just process the locales.d files directly in node.js / 
JavaScript.
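
To make the idea concrete, here is a minimal sketch of reading a locale file in JavaScript/TypeScript. It assumes the locales.d files are INI-like (bracketed section headers, key=value lines). The sample content, the section names used in it, and the helper names parseLocaleConf and translateBookName are illustrative assumptions, not taken from a real locales.d file or from node-sword-interface.

```typescript
// Sketch: parse an INI-like locale file into sections.
// The sample data further down is illustrative, not copied from locales.d.

type LocaleSections = Map<string, Map<string, string>>;

function parseLocaleConf(text: string): LocaleSections {
  const sections: LocaleSections = new Map();
  let current: Map<string, string> | undefined;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comment lines.
    if (line === "" || line.charAt(0) === "#") continue;

    // A line like "[Text]" opens a new section.
    const header = /^\[(.+)\]$/.exec(line);
    if (header) {
      current = new Map();
      sections.set(header[1], current);
      continue;
    }

    // Everything else is treated as key=value within the current section.
    const eq = line.indexOf("=");
    if (eq > 0 && current) {
      current.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
    }
  }
  return sections;
}

// Hypothetical helper: translate an English book name, falling back to the input.
function translateBookName(sections: LocaleSections, englishName: string): string {
  return sections.get("Text")?.get(englishName) ?? englishName;
}

const sample = [
  "[Meta]",
  "Name=de",
  "Encoding=UTF-8",
  "",
  "[Text]",
  "Romans=Römer",
  "I Kings=1. Könige",
].join("\n");

const locale = parseLocaleConf(sample);
console.log(translateBookName(locale, "Romans")); // Römer
```

Since JavaScript strings are Unicode already, the umlauts pass through without any help from StringMgr; in a real app the text would come from reading the file as UTF-8 instead of an inline string.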

Regarding node-sword-interface and the build process for mobile 
platforms ... currently I have only tried Android, which works fine. iOS 
should technically work as well, but I have not tried that yet. The 
boilerplate work to make all that happen smoothly is provided by the 
nodejs-mobile <https://code.janeasystems.com/nodejs-mobile> cordova 
plugin. That plugin contains build scripts that seamlessly compile any 
native node.js addons like node-sword-interface or also the sqlite3 
module that I am using.

And since I am now using an API compatible runtime environment both for 
Electron/nodejs and Cordova/nodejs-mobile I did not have to add any 
additional glue code. One risk I see with this approach is that the guys 
who provide nodejs-mobile discontinue their work for some reason. It's 
essentially a completely separately maintained fork of nodejs (it has 
nothing to do with V8 actually). Originally it is based on the 
ChakraCore JavaScript engine of the Microsoft Edge browser. But the 
nodejs-mobile guys ported it to Android and iOS ...

Regarding the StringMgr native callback possibility ... yes technically 
this is possible with a node native addon like node-sword-interface.
I am using such a functionality for the InstallMgr and search progress 
feedbacks already.
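
Just to illustrate the pattern, here is a conceptual model of such a pluggable upper-casing callback. This is not the SWORD or node-sword-interface API: the class StringManagerModel, the type ToUpperFn and the ASCII-only default are assumptions I made up for the sketch. The real binding would have to call from C++ into JavaScript.

```typescript
// Conceptual model only: this is not the SWORD or node-sword-interface API.
type ToUpperFn = (text: string) => string;

// Assumed default: upper-cases ASCII letters only and leaves everything else alone.
const asciiToUpper: ToUpperFn = (text) =>
  text.replace(/[a-z]/g, (ch) => String.fromCharCode(ch.charCodeAt(0) - 32));

class StringManagerModel {
  private toUpper: ToUpperFn = asciiToUpper;

  // The host environment registers its own Unicode-aware implementation here.
  setToUpper(fn: ToUpperFn): void {
    this.toUpper = fn;
  }

  upper(text: string): string {
    return this.toUpper(text);
  }
}

const mgr = new StringManagerModel();
console.log(mgr.upper("Römer")); // RöMER (the umlaut is left in lower case)

// Delegate to the JavaScript runtime, which knows about Unicode case mapping.
mgr.setToUpper((text) => text.toUpperCase());
console.log(mgr.upper("Römer")); // RÖMER
```

The point of the design is that the engine only needs one upper-casing hook, so the heavy Unicode data can stay in whatever runtime is hosting it.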

So, long story short ... if in the future a use case comes up to parse 
Unicode-based VerseKeys, I will implement a special StringMgr binding as 
you suggested. But for now I'll focus on handling the locales.d content 
directly in JavaScript / node.js.
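
For reference, a rough sketch of how case-insensitive matching of localized book abbreviations could look on the JavaScript side, following your note that the abbreviation keys are stored in upper case. The table contents and the function name findBook are hypothetical, and the exact-then-prefix lookup is a simplification, not a port of what versekey.cpp does.

```typescript
// Hypothetical abbreviation table: upper-case key -> book id.
const bookAbbrevs: Array<[string, string]> = [
  ["RÖMER", "Rom"],
  ["1. KÖNIGE", "1Kgs"],
  ["OFFENBARUNG", "Rev"],
];

function findBook(input: string): string | undefined {
  // Upper-case the user input so it is comparable with the stored keys.
  const needle = input.trim().toUpperCase();
  if (needle === "") return undefined;

  // Prefer an exact match.
  for (const [key, id] of bookAbbrevs) {
    if (key === needle) return id;
  }
  // Otherwise accept the first key that starts with the input (partial match).
  for (const [key, id] of bookAbbrevs) {
    if (key.indexOf(needle) === 0) return id;
  }
  return undefined;
}

console.log(findBook("röm")); // Rom
```

This only works because toUpperCase() handles "ö" correctly, which is exactly the part that a non-ICU build of the engine cannot do by itself.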

I will keep you posted.

Best regards,
Tobias

On 2/6/21 11:59 PM, Troy A. Griffitts wrote:
>
> The data is pulled from the locales.d/ files, but the toUpper logic is 
> necessary in a number of places in the engine. Two come to mind 
> immediately:
>
> parsing verse references not sensitive to case
>
> parsing LD module keys not sensitive to case
>
> To be able to get an uppercase representation of any Unicode 
> character, it takes a pretty hefty dataset of all known human 
> languages-- that's why we leave it up to an external library. And 
> yeah, because ICU is so large, that's why I don't compile it into my 
> binaries in Bishop.  Bishop is about 13MB total, which includes ~8MB 
> of default module data (KJV, SME, StrongsGreek, StrongsHebrew).  
> That's about 5MB for the app.  If I included ICU, it would greatly 
> increase the size.  And both iOS and Android (Swift and Java) already 
> have facilities for getting the toUpper of a string.
>
> I hope you can steal the few lines from Bishop's native SWORD code 
> which tell SWORD to call either Java or Swift when upperUTF8 is called.
>
> I am sorry that this might break the nice ability to have exactly the 
> same code on both iOS and Android (I am surprised that absolutely no 
> changes were required for you to interface to a native library on both 
> iOS and Android!  cordova required me to provide: Android: Java-jni 
> layer; iOS: Swift layer.  I am jealous.)
>
> If you can think of an alternative, I am happy to listen.  We could 
> provide a better StringMgr default (I think we simply have a latin-1 
> single-byte transformation table for basically ASCII characters), one that 
> includes an SW_u32 hash covering German characters, but that's 
> going to limit the languages we support to only the ones we add to our 
> toUpper hash, and that's not really a dataset I want to maintain.
>
> Open to suggestions,
>
> Troy
>
>
> On 2/6/21 2:56 PM, Tobias Klein wrote:
>>
>> Dear Troy,
>>
>> Thank you for these explanations! I appreciate it!
>>
>> For Ezra Project on Android, I am at this point simply compiling 
>> node-sword-interface with the Android cross compilers and it works. 
>> However, as I wrote, I have issues for the German Bible book names now.
>>
>> Is the StringMgr functionality only used to handle the locales.d 
>> files? Or also for some content inside any SWORD modules?
>>
>> If it is only used for handling the locales.d files then I would 
>> consider handling the Sword locales.d files directly from JavaScript 
>> / node.js, which already supports Unicode.
>>
>> I also checked whether I can cross-compile the ICU library and that 
>> worked, but this is a huge binary (I think 20-30 MB) and I would 
>> rather keep the APK size as small as possible.
>>
>> Best regards,
>> Tobias
>>
>> From: Troy A. Griffitts <mailto:scribe at crosswire.org>
>> Sent: Sunday, 31 January 2021 18:20
>> To: sword-devel at crosswire.org <mailto:sword-devel at crosswire.org>
>> Subject: Re: [sword-devel] Sword Locales / German Umlaut Issues / AndroidBuild
>>
>> Dear Tobias,
>>
>> My apologies for taking so long to respond to this, but I wanted to 
>> give a thorough answer.  See the summary at the end if you don't care 
>> about the details.
>>
>> So, SWORD has a class StringMgr, which manages strings within SWORD, 
>> and by default SWORD includes a very basic implementation, which 
>> doesn't necessarily know about or support anything beyond what the 
>> basic C string methods support.
>>
>> I am sure this evokes a sense of horror in you at first, so let me 
>> explain a bit how we properly handle character sets.  First, short 
>> background: since we existed well before the Unicode world, we have 
>> multiple locale files for each language, which you will still see in 
>> the locales.d/ folder, each specifying their character encoding, and 
>> most of the time SWORD doesn't need to manipulate characters, so 
>> simply holding data, and passing that data to a display frontend, and 
>> specifying a font which will handle that encoding was enough in the 
>> old world.  IMPORTANT: the one place we do need to manipulate 
>> character data is to perform case-insensitive comparisons.  We did 
>> this in the past by converting a string to uppercase before 
>> comparison.  You'll notice this in the section for Bible book 
>> abbreviation in each locale-- the partial match key must be in a 
>> toupper state.
>>
>> Today, everything in SWORD prefers Unicode and specifically, encoded 
>> as UTF-8.  To support this:
>>
>> First, we have utility functions within SWORD for working with 
>> Unicode encoded strings, see:
>>
>> http://crosswire.org/svn/sword/trunk/include/utilstr.h
>>
>> Specifically:
>>
>> SWBuf assureValidUTF8(const char *buf);
>> SW_u32 getUniCharFromUTF8(const unsigned char **buf, bool skipValidation = false);
>> SWBuf *getUTF8FromUniChar(SW_u32 uchar, SWBuf *appendTo);
>> SWBuf utf8ToWChar(const char *buf);
>> SWBuf wcharToUTF8(const wchar_t *buf);
>>
>> To wrap this up, by subclassing StringMgr, SWORD supports 
>> implementing character encoding by linking to other libraries, e.g., 
>> ICU, Qt, etc. to handle full Unicode support.  And while the 
>> StringMgr interface allows implementation of many string functions, 
>> upperUTF8 is the only real method the SWORD engine needs to work 
>> completely. Some utilities use the other methods in there, but the 
>> engine only needs this method.
>>
>> In summary, on Android, you are likely not linking to ICU when you 
>> build the native SWORD binary-- which I don't do either for Bishop.  
>> The Cordova SWORD plugin uses the SWORD java-jni bindings, which use 
>> the Java VM to implement StringMgr:
>>
>> https://crosswire.org/svn/sword/trunk/bindings/java-jni/jni/swordstub.cpp 
>> Search for: AndroidStringMgr
>>
>> And on iOS the Cordova plugin uses the Swift libraries to do the 
>> same.  This is done by using the SWORD flatapi call to 
>> org_crosswire_sword_StringMgr_setToUpper to provide a Swift 
>> implementation to uppercase a string.
>>
>> http://crosswire.org/svn/sword/trunk/bindings/cordova/cordova-plugin-crosswire-sword/src/ios/SWORD.swift
>>
>> I hope this gives you the information you need to get things working 
>> for you.  Please don't hesitate to ask if you need help,
>>
>> Troy
>>
>> On 1/17/21 11:59 AM, Tobias Klein wrote:
>>
>> Dear Troy,
>>
>> I'm playing with an Android Build of Sword and I get issues with the 
>> German Umlauts.
>>
>> So I have issues with Bible book names like Römer, Könige, etc.
>>
>> The Umlauts are shown as ?.
>>
>> I'm configuring the SWORD build with CMake like below (without ICU!)
>>
>> I remember having similar issues on Linux when building without ICU.
>>
>> How do you build SWORD for Bishop? Any suggestions?
>>
>> Best regards,
>> Tobias
>>
>> -- Check for working CXX compiler: 
>> /opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++
>> -- Check for working CXX compiler: 
>> /opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ 
>> -- works
>> -- Detecting CXX compiler ABI info
>> -- Detecting CXX compiler ABI info - done
>> -- Detecting CXX compile features
>> -- Detecting CXX compile features - done
>> -- Check for working C compiler: 
>> /opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang
>> -- Check for working C compiler: 
>> /opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang 
>> -- works
>> -- Detecting C compiler ABI info
>> -- Detecting C compiler ABI info - done
>> -- Detecting C compile features
>> -- Detecting C compile features - done
>> -- Configuring your system to build libsword.
>> -- SWORD Version 1008900000
>>
>>
>> _______________________________________________
>> sword-devel mailing list:sword-devel at crosswire.org
>> http://crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page