[sword-devel] Sword Locales / German Umlaut Issues / AndroidBuild
Troy A. Griffitts
scribe at crosswire.org
Sat Feb 6 20:42:27 EST 2021
Sorry to spam on this topic, but here is how V8 does it. From your node
module, do you have access to call Intl::ConvertToUpper directly?
RUNTIME_FUNCTION(Runtime_StringToUpperCaseIntl) {
  HandleScope scope(isolate);
  DCHECK_EQ(args.length(), 1);
  CONVERT_ARG_HANDLE_CHECKED(String, s, 0);
  s = String::Flatten(isolate, s);
  RETURN_RESULT_OR_FAILURE(isolate, Intl::ConvertToUpper(isolate, s));
}
On 2/6/21 5:46 PM, Troy A. Griffitts wrote:
>
> So, a quick question, is there a way for your native
> node-sword-interface to call back into your node.js world? e.g., how
> do you handle the progress feedback callbacks for module installs and
> search progress? However you do that, you should be able to do
> exactly the same thing with the toUpperUTF implementation. Just call
> back into your node.js env and pass it the string and return the
> string.toUpperCase().
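
A minimal sketch of the callback idea above, assuming a stand-in base
class rather than SWORD's real StringMgr header. StringMgrSketch,
CallbackStringMgr and ToUpperFn are hypothetical names, the upperUTF8
signature here is only an approximation, and how the callback actually
reaches the node.js side is left out:

```cpp
#include <cstring>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for SWORD's StringMgr; the real class and its
// exact signatures live in SWORD's own headers and may differ.
class StringMgrSketch {
public:
    virtual ~StringMgrSketch() = default;
    // Uppercases `text` in place; `max` is the buffer capacity in bytes,
    // not counting the terminating NUL (an assumption of this sketch).
    virtual char *upperUTF8(char *text, unsigned int max) const = 0;
};

// Delegates uppercasing to a host-supplied callback, e.g. one that ends
// up calling toUpperCase() in the JavaScript environment.
class CallbackStringMgr : public StringMgrSketch {
public:
    using ToUpperFn = std::function<std::string(const std::string &)>;

    explicit CallbackStringMgr(ToUpperFn fn) : toUpper_(std::move(fn)) {}

    char *upperUTF8(char *text, unsigned int max) const override {
        const std::string upper = toUpper_(text);
        // Copy the result back, truncating to the caller's buffer.
        // Note: this naive truncation could cut a multi-byte UTF-8
        // sequence; a real implementation would have to guard against that.
        const std::size_t n = upper.size() < max ? upper.size() : max;
        std::memcpy(text, upper.data(), n);
        text[n] = '\0';
        return text;
    }

private:
    ToUpperFn toUpper_;
};
```

The host would construct a CallbackStringMgr with whatever function
forwards the string to JavaScript and install it wherever the engine
expects its string manager.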
>
> On 2/6/21 3:59 PM, Troy A. Griffitts wrote:
>>
>> The data is pulled from the locales.d/ files, but the toUpper logic
>> is necessary in a number of places in the engine. Two come to mind
>> immediately:
>>
>> case-insensitive parsing of verse references
>>
>> case-insensitive parsing of LD module keys
>>
>> Getting the uppercase form of any Unicode character takes a pretty
>> hefty dataset covering all known human languages; that's why we
>> leave it up to an external library. And yeah, because ICU is so
>> large, I don't compile it into my binaries in Bishop. Bishop is
>> about 13MB total, which includes ~8MB of default module data (KJV,
>> SME, StrongsGreek, StrongsHebrew). That leaves about 5MB for the
>> app. If I included ICU, it would greatly increase the size. And both
>> iOS and Android (Swift and Java) already have facilities for getting
>> the toUpper of a string.
>>
>> I hope you can steal the few lines from Bishop's native SWORD code
>> which tell SWORD to call either Java or Swift when upperUTF8 is called.
>>
>> I am sorry that this might break the nice ability to have exactly the
>> same code on both iOS and Android. (I am surprised that absolutely no
>> changes were required for you to interface to a native library on
>> both iOS and Android! Cordova required me to provide a Java-jni
>> layer on Android and a Swift layer on iOS. I am jealous.)
>>
>> If you can think of an alternative, I am happy to listen. We could
>> provide a better default StringMgr (I think the current one simply
>> has a Latin-1 single-byte transformation table covering basically
>> ASCII characters), one with an SW_u32 hash that includes German
>> characters, but that would limit the languages we support to only
>> the ones we add to our toUpper hash, and that's not really a dataset
>> I want to maintain.
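
To make the trade-off concrete, here is a rough sketch of what such a
hand-maintained table could look like. Nothing here is SWORD code:
kUpperTable, decodeUtf8, appendUtf8 and upperWithTable are hypothetical
names, the table holds only three German umlauts, and the decoder
assumes well-formed UTF-8 input:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hand-maintained lowercase -> uppercase code point pairs
// (hypothetical and deliberately incomplete).
static const std::unordered_map<std::uint32_t, std::uint32_t> kUpperTable = {
    {0x00E4, 0x00C4}, // a-umlaut -> A-umlaut
    {0x00F6, 0x00D6}, // o-umlaut -> O-umlaut
    {0x00FC, 0x00DC}, // u-umlaut -> U-umlaut
};

// Decodes one code point starting at s[i] and advances i.
// Assumes well-formed UTF-8; no validation is done.
static std::uint32_t decodeUtf8(const std::string &s, std::size_t &i) {
    const unsigned char c = static_cast<unsigned char>(s[i++]);
    if (c < 0x80) return c;
    int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;
    std::uint32_t cp = c & (0x3F >> extra);
    while (extra-- > 0 && i < s.size())
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Appends the UTF-8 encoding of cp to out.
static void appendUtf8(std::uint32_t cp, std::string &out) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Uppercases ASCII letters plus whatever the table happens to contain;
// every other code point passes through unchanged.
std::string upperWithTable(const std::string &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::uint32_t cp = decodeUtf8(in, i);
        if (cp >= 0x61 && cp <= 0x7A) {
            cp -= 0x20;
        } else {
            const auto it = kUpperTable.find(cp);
            if (it != kUpperTable.end()) cp = it->second;
        }
        appendUtf8(cp, out);
    }
    return out;
}
```

Any character missing from the table (an accented e, Greek, Cyrillic,
and so on) silently stays lowercase, which is exactly the maintenance
burden described above.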
>>
>> Open to suggestions,
>>
>> Troy
>>
>>
>> On 2/6/21 2:56 PM, Tobias Klein wrote:
>>>
>>> Dear Troy,
>>>
>>>
>>>
>>> Thank you for these explanations! I appreciate it!
>>>
>>>
>>>
>>> For Ezra Project on Android, I am at this point simply compiling
>>> node-sword-interface with the Android cross compilers and it works.
>>> However, as I wrote, I have issues for the German Bible book names now.
>>>
>>>
>>>
>>> Is the StringMgr functionality only used to handle the locales.d
>>> files? Or also for some content inside any SWORD modules?
>>>
>>> If it is only used for handling the locales.d files then I would
>>> consider handling the Sword locales.d files directly from JavaScript
>>> / node.js, which already supports Unicode.
>>>
>>>
>>>
>>> I also checked whether I can cross-compile the ICU library and that
>>> worked, but this is a huge binary (I think 20-30 MB) and I would
>>> rather keep the APK size as small as possible.
>>>
>>> Best regards,
>>> Tobias
>>>
>>>
>>>
>>> From: Troy A. Griffitts <scribe at crosswire.org>
>>> Sent: Sunday, January 31, 2021 18:20
>>> To: sword-devel at crosswire.org
>>> Subject: Re: [sword-devel] Sword Locales / German Umlaut Issues /
>>> AndroidBuild
>>>
>>>
>>>
>>> Dear Tobias,
>>>
>>> My apologies for taking so long to respond to this, but I wanted to
>>> give a thorough answer. See the summary at the end if you don't
>>> care about the details.
>>>
>>> So, SWORD has a class StringMgr, which manages strings within SWORD,
>>> and by default SWORD includes a very basic implementation, which
>>> doesn't necessarily know about or support anything beyond what the
>>> basic C string methods support.
>>>
>>> I am sure this evokes a sense of horror from you at first, so let
>>> me explain a bit how we properly handle character sets. First, some
>>> short background: since SWORD existed well before the Unicode world,
>>> we have multiple locale files for each language, which you will
>>> still see in the locales.d/ folder, each specifying its character
>>> encoding. Most of the time SWORD doesn't need to manipulate
>>> characters, so in the old world it was enough to simply hold the
>>> data, pass it to a display frontend, and specify a font which
>>> handles that encoding. IMPORTANT: the one place we do need to
>>> manipulate character data is to perform case-insensitive
>>> comparisons. We did this in the past by converting a string to
>>> uppercase before comparison. You'll notice this in the section for
>>> Bible book abbreviations in each locale: the partial match key must
>>> be in uppercase.
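
As an illustration of why the uppercase step matters, here is a small
sketch of a prefix lookup against uppercase abbreviation keys. It is
not SWORD's actual parser: upperAscii and lookupBook are hypothetical
names, and upperAscii deliberately handles ASCII only, standing in for
a basic string manager without Unicode support:

```cpp
#include <map>
#include <string>

// ASCII-only uppercase, standing in for a basic string manager that
// knows nothing about multi-byte UTF-8 characters.
std::string upperAscii(std::string s) {
    for (char &c : s)
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
    return s;
}

// Returns the book id of the first key (in sorted order) that starts
// with the uppercased input, or an empty string if no key matches.
// The keys in `abbrevs` are expected to be stored in uppercase.
std::string lookupBook(const std::map<std::string, std::string> &abbrevs,
                       const std::string &input) {
    const std::string key = upperAscii(input);
    if (key.empty()) return "";
    // All keys sharing the prefix `key` sort contiguously, starting at
    // the first key that is not less than `key`.
    const auto it = abbrevs.lower_bound(key);
    if (it != abbrevs.end() && it->first.compare(0, key.size(), key) == 0)
        return it->second;
    return "";
}
```

With keys such as "ROM" or "RUTH", the inputs "rom" and "ru" resolve as
expected. With a UTF-8 key for the uppercase form of "Römer", the input
"röm" finds nothing, because the ASCII-only step leaves the umlaut
bytes in lowercase and the byte-wise comparison then fails.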
>>>
>>> Today, everything in SWORD prefers Unicode and specifically, encoded
>>> as UTF-8. To support this:
>>>
>>> First, we have utility functions within SWORD for working with
>>> Unicode encoded strings, see:
>>>
>>> http://crosswire.org/svn/sword/trunk/include/utilstr.h
>>>
>>> Specifically:
>>>
>>> SWBuf assureValidUTF8(const char *buf);
>>> SW_u32 getUniCharFromUTF8(const unsigned char **buf, bool skipValidation = false);
>>> SWBuf *getUTF8FromUniChar(SW_u32 uchar, SWBuf *appendTo);
>>> SWBuf utf8ToWChar(const char *buf);
>>> SWBuf wcharToUTF8(const wchar_t *buf);
>>>
>>>
>>>
>>> To wrap this up: by subclassing StringMgr, you can have SWORD
>>> delegate character handling to other libraries, e.g., ICU, Qt,
>>> etc., for full Unicode support. And while the StringMgr interface
>>> allows implementation of many string functions, upperUTF8 is the
>>> only method the SWORD engine really needs to work completely. Some
>>> utilities use the other methods in there, but the engine only needs
>>> this one.
>>>
>>>
>>>
>>> In summary, on Android, you are likely not linking to ICU when you
>>> build the native SWORD binary-- which I don't do either for Bishop.
>>> The Cordova SWORD plugin uses the SWORD java-jni bindings, which use
>>> the Java VM to implement StringMgr:
>>>
>>> https://crosswire.org/svn/sword/trunk/bindings/java-jni/jni/swordstub.cpp
>>> Search for: AndroidStringMgr
>>>
>>> And on iOS the Cordova plugin uses the Swift libraries to do the
>>> same. This is done by using the SWORD flatapi call to
>>> org_crosswire_sword_StringMgr_setToUpper to provide a Swift
>>> implementation to uppercase a string.
>>>
>>> http://crosswire.org/svn/sword/trunk/bindings/cordova/cordova-plugin-crosswire-sword/src/ios/SWORD.swift
>>>
>>> I hope this gives you the information you need to get things working
>>> for you. Please don't hesitate to ask if you need help,
>>>
>>> Troy
>>>
>>>
>>>
>>> On 1/17/21 11:59 AM, Tobias Klein wrote:
>>>
>>> Dear Troy,
>>>
>>> I'm playing with an Android Build of Sword and I get issues with the
>>> German Umlauts.
>>>
>>> So I have issues with Bible book names like Römer, Könige, etc.
>>>
>>> The Umlauts are shown as ?.
>>>
>>> I'm configuring the SWORD build with CMake like below (without ICU!)
>>>
>>> I remember having similar issues on Linux when building without ICU.
>>>
>>> How do you build SWORD for Bishop? Any suggestions?
>>>
>>> Best regards,
>>> Tobias
>>>
>>> -- Check for working CXX compiler:
>>> /opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++
>>> -- Check for working CXX compiler:
>>> /opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++
>>> -- works
>>> -- Detecting CXX compiler ABI info
>>> -- Detecting CXX compiler ABI info - done
>>> -- Detecting CXX compile features
>>> -- Detecting CXX compile features - done
>>> -- Check for working C compiler:
>>> /opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang
>>> -- Check for working C compiler:
>>> /opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang
>>> -- works
>>> -- Detecting C compiler ABI info
>>> -- Detecting C compiler ABI info - done
>>> -- Detecting C compile features
>>> -- Detecting C compile features - done
>>> -- Configuring your system to build libsword.
>>> -- SWORD Version 1008900000
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> http://crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page