<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi Tobias,</p>
<p>I don't believe processing the locale files directly will resolve
your issues with German umlauts. It boils down to two
things:</p>
<p>First, there are other uses of upperUTF8 in the engine. You show
the cpp files in your grep, but not the .h files.</p>
<p>Second, VerseKey will not work correctly for any locale without a
proper upperUTF8 implementation which supports that locale.</p>
<p>The issue is that verse references, whether typed freehand from
outside or round-tripped from SWORD itself, still need to map to the
uppercase representation of the book name. If you look at all the locale
files, the verse parsing table uses uppercase for all book
abbreviations, so VerseKey's parser immediately uppercases the
input string before it looks up the book in the table.</p>
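<p>To illustrate the lookup problem, here is a minimal, self-contained sketch. It does not use the SWORD API: asciiUpper, lookupBook, the table contents, and demoUmlautMismatch are all hypothetical stand-ins, assuming a default string manager that only uppercases a-z.</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// ASCII-only uppercasing, standing in for a basic default string manager.
// Bytes outside a-z, including the bytes of multi-byte UTF-8 sequences,
// pass through unchanged.
std::string asciiUpper(std::string text) {
    for (char &c : text) {
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
    }
    return text;
}

// Hypothetical table keyed by uppercase book name, in the spirit of a
// locale's abbreviation section. Returns the book id, or an empty string
// when the uppercased input is not a key in the table.
std::string lookupBook(const std::map<std::string, std::string> &table,
                       const std::string &input) {
    const auto it = table.find(asciiUpper(input));
    return it == table.end() ? std::string() : it->second;
}

// Usage example: call from main() to check the behaviour described above.
void demoUmlautMismatch() {
    const std::map<std::string, std::string> table = {
        {"JOHN", "John"},
        {"R\xC3\x96" "MER", "Rom"},  // "RÖMER" encoded as UTF-8
    };
    // Plain ASCII input is uppercased correctly and found.
    assert(lookupBook(table, "john") == "John");
    // "Römer" becomes "RöMER": the umlaut bytes are untouched, so no match.
    assert(lookupBook(table, "R\xC3\xB6" "mer").empty());
}
```

<p>The table key is correct uppercase, but the ASCII-only routine never produces it, which is why the lookup fails regardless of where the locale data is read from.</p>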
<p>This is also true for LD module key lookups: the keys are stored in
uppercase. Technically it could work if both the module import
tool and the display application used the same StringMgr--
the default StringMgr would uppercase the string incorrectly, but
consistently with the display application. But this is not the
case. Our import tools use a proper StringMgr to create the
modules with proper uppercase keys, and thus the display
application must also use a proper StringMgr or lookups will not
work correctly.</p>
<p>I am afraid getting the data from the locales.d/ folders in
JavaScript will not help fix the problem.</p>
<p>Troy</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 2/7/21 6:59 AM, Tobias Klein wrote:<br>
</div>
<blockquote type="cite"
cite="mid:3997d80b-5dd9-0099-161d-ada9f583cb37@tklein.info">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<p>Hi Troy,</p>
<p>Thanks once more for all the details! I appreciate it.<br>
<br>
I just grepped quickly in the SWORD source code (grep -r
"upperUTF8" . | grep -v ".svn") and the method upperUTF8
appears to be only used in the following places:<br>
<br>
<tt>./src/keys/versekey.cpp:
stringMgr->upperUTF8(abbr, (unsigned int)(strlen(abbr)*2));</tt><tt><br>
</tt><tt>./src/keys/versekey.cpp:
stringMgr->upperUTF8(abbr, (unsigned int)(strlen(abbr)*2));</tt><tt><br>
</tt><tt>./utilities/imp2gbs.cpp:
StringMgr::getSystemStringMgr()->upperUTF8(keyBuffer.getRawData(),
size-2);</tt></p>
<p>I think neither of those is currently used in Ezra Project,
though. At the moment I do not have the use case to parse verse
keys based on any special Unicode inputs. I am only using the
standard English abbreviations for verse keys and that only
happens internally. So, in this case I may just process the
locales.d files directly in node.js / JavaScript.<br>
<br>
Regarding node-sword-interface and the build process for mobile
platforms ... currently I have only tried Android, which works
fine. iOS should technically work as well, but I have not tried
that yet. The boilerplate work to make all that happen smoothly
is provided by the <a moz-do-not-send="true"
href="https://code.janeasystems.com/nodejs-mobile">nodejs-mobile</a>
cordova plugin. That plugin contains build scripts that
seamlessly compile any native node.js addons like
node-sword-interface or also the sqlite3 module that I am using.<br>
<br>
And since I am now using an API compatible runtime environment
both for Electron/nodejs and Cordova/nodejs-mobile I did not
have to add any additional glue code. One risk I see with this
approach is that the guys who provide nodejs-mobile discontinue
their work for some reason. It's essentially a completely
separately maintained fork of nodejs (it has nothing to do with
V8 actually). Originally it is based on the ChakraCore
JavaScript engine of the Microsoft Edge browser. But the
nodejs-mobile guys ported it to Android and iOS ...<br>
<br>
Regarding the StringMgr native callback possibility ... yes
technically this is possible with a node native addon like
node-sword-interface.<br>
I am using such a functionality for the InstallMgr and search
progress feedbacks already.<br>
<br>
So, long story short ... if in the future a use case comes up to
parse Unicode-based VerseKeys, I will implement a special
StringMgr binding as you suggested. But for now I'll focus on
handling the locales.d content directly in JavaScript / node.js.<br>
<br>
I will keep you posted.<br>
<br>
Best regards,<br>
Tobias<br>
<br>
</p>
<div class="moz-cite-prefix">On 2/6/21 11:59 PM, Troy A. Griffitts
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:a580b744-b15b-3bb6-fb90-12cf32169833@crosswire.org">
<meta http-equiv="Content-Type" content="text/html;
charset=UTF-8">
<p>The data is pulled from the locales.d/ files, but the toUpper
logic is necessary in a number of places in the engine. Two
come to mind immediately:<br>
</p>
<p>case-insensitive parsing of verse references</p>
<p>case-insensitive parsing of LD module keys<br>
</p>
<p>To be able to get an uppercase representation of any Unicode
character, it takes a pretty hefty dataset of all known human
languages-- that's why we leave it up to an external library.
And yeah, because ICU is so large, that's why I don't compile
it into my binaries in Bishop. Bishop is about 13MB total,
which includes ~8MB of default module data (KJV, SME,
StrongsGreek, StrongsHebrew). That's about 5MB for the app.
If I included ICU, it would greatly increase the size. And
both iOS and Android (Swift and Java) already have facilities
for getting the toUpper of a string.</p>
<p>I hope you can steal the few lines from Bishop's native SWORD
code which tell SWORD to call either Java or Swift when
upperUTF8 is called.</p>
<p>I am sorry that this might break the nice ability to have
exactly the same code on both iOS and Android (I am surprised
that absolutely no changes were required for you to interface
to a native library on both iOS and Android! cordova required
me to provide: Android: Java-jni layer; iOS: Swift layer. I
am jealous.)</p>
<p>If you can think of an alternative, I am happy to listen. We
could provide a better StringMgr default (I think we simply
have a Latin-1 single-byte transformation table for basically
ASCII characters), one which includes an SW_u32 hash covering
German characters, but that's going to limit the
languages we support to only the ones we add to our toUpper
hash, and that's not really a dataset I want to maintain.</p>
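<p>As a rough sketch of what such a hand-maintained default might look like, and of where it stops working: the function name upperLatin1UTF8 and the demo are hypothetical, and instead of a hash keyed by SW_u32 this uses the fixed 0x20 offset between lowercase and uppercase letters in the Latin-1 Supplement block, which covers the German umlauts.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Uppercases ASCII letters and the Latin-1 Supplement letters U+00E0..U+00FE
// (which include the German umlauts) in a UTF-8 string.
// Every other byte sequence is copied through unchanged.
std::string upperLatin1UTF8(const std::string &text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        const bool hasTrail = i + 1 < text.size() &&
            (static_cast<unsigned char>(text[i + 1]) & 0xC0) == 0x80;
        if (c >= 'a' && c <= 'z') {
            out += static_cast<char>(c - 'a' + 'A');
        } else if (c == 0xC3 && hasTrail) {
            // Lead byte 0xC3 encodes U+00C0..U+00FF; the trail byte selects
            // the character within that range.
            unsigned char t = static_cast<unsigned char>(text[i + 1]);
            // Trail bytes 0xA0..0xBE are U+00E0..U+00FE. Skip 0xB7, which is
            // U+00F7 (division sign) and has no uppercase form.
            if (t >= 0xA0 && t <= 0xBE && t != 0xB7) t -= 0x20;
            out += static_cast<char>(c);
            out += static_cast<char>(t);
            ++i;  // the trail byte has been consumed
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}

// Usage example: call from main() to check the covered and uncovered cases.
void demoLatin1Upper() {
    // "Römer" and "Könige" are handled.
    assert(upperLatin1UTF8("R\xC3\xB6" "mer") == "R\xC3\x96" "MER");
    assert(upperLatin1UTF8("K\xC3\xB6" "nige") == "K\xC3\x96" "NIGE");
    // Cyrillic small letter de (U+0434) is outside the table and stays as is.
    assert(upperLatin1UTF8("\xD0\xB4") == "\xD0\xB4");
}
```

<p>The last assertion is the limitation in question: every script beyond the hard-coded range needs its own entries, which is the dataset nobody wants to maintain.</p>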
<p>Open to suggestions,</p>
<p>Troy<br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 2/6/21 2:56 PM, Tobias Klein
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:E1l8VZ9-0003St-CW@smtprelay03.ispgateway.de">
<meta http-equiv="Content-Type" content="text/html;
charset=UTF-8">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-US">Dear Troy,<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Thank you for these
explanations! I appreciate it!<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">For Ezra Project on
Android, I am at this point simply compiling
node-sword-interface with the Android cross compilers
and it works. However, as I wrote, I have issues for the
German Bible book names now.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Is the StringMgr
functionality only used to handle the locales.d files?
Or also for some content inside any SWORD modules?<br>
<br>
<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">If it is only used
for handling the locales.d files then I would consider
handling the Sword locales.d files directly from
JavaScript / node.js, which already supports Unicode.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">I also checked
whether I can cross-compile the ICU library and that
worked, but this is a huge binary (I think 20-30 MB) and
I would rather keep the APK size as small as possible.<br>
<br>
Best regards,<br>
Tobias</span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div
style="mso-element:para-border-div;border:none;border-top:solid
#E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="border:none;padding:0cm"><b>From:
</b><a href="mailto:scribe@crosswire.org"
moz-do-not-send="true">Troy A. Griffitts</a><br>
<b>Sent: </b>Sonntag, 31. Januar 2021 18:20<br>
<b>To: </b><a href="mailto:sword-devel@crosswire.org"
moz-do-not-send="true">sword-devel@crosswire.org</a><br>
<b>Subject: </b>Re: [sword-devel] Sword Locales /
German Umlaut Issues / AndroidBuild</p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p>Dear Tobias,</p>
<p>My apologies for taking so long to respond to this, but I
wanted to give a thorough answer. See the summary at the
end if you don't care about the details.</p>
<p>So, SWORD has a class StringMgr, which manages strings
within SWORD, and by default SWORD includes a very basic
implementation, which doesn't necessarily know about or
support anything beyond what the basic C string methods
support.</p>
<p>I am sure this evokes a sense of horror in you at
first, so let me explain a bit how we properly handle
character sets. First, a short background: since SWORD existed
well before the Unicode world, we have multiple locale
files for each language, which you will still see in the
locales.d/ folder, each specifying its character
encoding. Most of the time SWORD doesn't need to
manipulate characters, so in the old world it was enough to
hold the data, pass it to a display frontend, and specify a
font which handles that encoding. IMPORTANT: the one place
we do need to manipulate character data is to perform
case-insensitive comparisons. We did this in the past by
converting a string to uppercase before comparison. You'll
notice this in the section for Bible book abbreviations in each
locale-- the partial match key must be in a toupper state.</p>
<p>Today, everything in SWORD prefers Unicode and
specifically, encoded as UTF-8. To support this:</p>
<p>First, we have utility functions within SWORD for working
with Unicode encoded strings, see:</p>
<p><a
href="http://crosswire.org/svn/sword/trunk/include/utilstr.h"
moz-do-not-send="true">http://crosswire.org/svn/sword/trunk/include/utilstr.h</a></p>
<p>Specifically:</p>
<pre>SWBuf assureValidUTF8(const char *buf);</pre>
<pre>SW_u32 getUniCharFromUTF8(const unsigned char **buf, bool skipValidation = false);</pre>
<pre>SWBuf *getUTF8FromUniChar(SW_u32 uchar, SWBuf *appendTo);</pre>
<pre>SWBuf utf8ToWChar(const char *buf);</pre>
<pre>SWBuf wcharToUTF8(const wchar_t *buf);</pre>
<p>To wrap this up, by subclassing StringMgr, SWORD supports
delegating character handling to other
libraries, e.g., ICU, Qt, etc., for full Unicode
support. And while the StringMgr interface allows
implementation of many string functions, upperUTF8 is the
only method the SWORD engine really needs in order to work
completely. Some utilities use the other methods in
there, but the engine only needs this one.</p>
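<p>The subclassing idea can be sketched roughly as follows. This is not the SWORD code: StringMgrLike is a local stand-in for SWORD's StringMgr reduced to one method, CallbackStringMgr is hypothetical, and the buffer convention (max is the number of bytes writable in text, excluding the terminating NUL, with 0 meaning the current string length) is an assumption made for this sketch.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>

// Local stand-in for SWORD's StringMgr, reduced to the method discussed here.
// In a real build the base class would come from the SWORD headers instead.
class StringMgrLike {
public:
    virtual ~StringMgrLike() = default;
    virtual char *upperUTF8(char *text, unsigned int max = 0) const = 0;
};

// Hypothetical subclass that forwards uppercasing to a host-provided
// callback, e.g. one backed by the platform's own Unicode support.
class CallbackStringMgr : public StringMgrLike {
public:
    using ToUpper = std::function<std::string(const std::string &)>;

    explicit CallbackStringMgr(ToUpper cb) : toUpper(std::move(cb)) {
        assert(static_cast<bool>(toUpper));  // a callback must be supplied
    }

    // Uppercases text in place. Assumption for this sketch: the buffer holds
    // at least max bytes plus a terminating NUL; max == 0 means strlen(text).
    // If the uppercased form would not fit, the buffer is left unchanged.
    char *upperUTF8(char *text, unsigned int max = 0) const override {
        const std::size_t capacity = max ? max : std::strlen(text);
        const std::string upper = toUpper(text);
        if (upper.size() <= capacity) {
            std::memcpy(text, upper.c_str(), upper.size() + 1);
        }
        return text;
    }

private:
    ToUpper toUpper;
};
```

<p>The capacity check matters because an uppercased UTF-8 string can be longer in bytes than its input, which is presumably why callers pass a buffer larger than the current string.</p>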
<p><o:p> </o:p></p>
<p>In summary, on Android, you are likely not linking to ICU
when you build the native SWORD binary-- which I don't do
either for Bishop. The Cordova SWORD plugin uses the
SWORD java-jni bindings, which use the Java VM to
implement StringMgr:</p>
<p><a
href="https://crosswire.org/svn/sword/trunk/bindings/java-jni/jni/swordstub.cpp"
moz-do-not-send="true">https://crosswire.org/svn/sword/trunk/bindings/java-jni/jni/swordstub.cpp</a>
Search for: AndroidStringMgr</p>
<p>And on iOS the Cordova plugin uses the Swift libraries to
do the same. This is done by using the SWORD flatapi call
to org_crosswire_sword_StringMgr_setToUpper to provide a
Swift implementation to uppercase a string. </p>
<p><a
href="http://crosswire.org/svn/sword/trunk/bindings/cordova/cordova-plugin-crosswire-sword/src/ios/SWORD.swift"
moz-do-not-send="true">http://crosswire.org/svn/sword/trunk/bindings/cordova/cordova-plugin-crosswire-sword/src/ios/SWORD.swift</a></p>
<p>I hope this gives you the information you need to get
things working. Please don't hesitate to ask if
you need help,</p>
<p>Troy</p>
<p><o:p> </o:p></p>
<div>
<p class="MsoNormal">On 1/17/21 11:59 AM, Tobias Klein
wrote:<o:p></o:p></p>
</div>
<p class="MsoNormal"
style="mso-margin-top-alt:5.0pt;margin-right:36.0pt;margin-bottom:5.0pt;margin-left:36.0pt">Dear
Troy, <br>
<br>
I'm playing with an Android Build of Sword and I get
issues with the German Umlauts. <br>
<br>
So I have issues with Bible book names like Römer, Könige,
etc. <br>
<br>
The Umlauts are shown as ?. <br>
<br>
I'm configuring the SWORD build with CMake like below
(without ICU!) <br>
<br>
I remember having similar issues on Linux when building
without ICU. <br>
<br>
How do you build SWORD for Bishop? Any suggestions? <br>
<br>
Best regards, <br>
Tobias <br>
<br>
-- Check for working CXX compiler:
/opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++<br>
-- Check for working CXX compiler:
/opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++
-- works <br>
-- Detecting CXX compiler ABI info <br>
-- Detecting CXX compiler ABI info - done <br>
-- Detecting CXX compile features <br>
-- Detecting CXX compile features - done <br>
-- Check for working C compiler:
/opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang<br>
-- Check for working C compiler:
/opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang
-- works <br>
-- Detecting C compiler ABI info <br>
-- Detecting C compiler ABI info - done <br>
-- Detecting C compile features <br>
-- Detecting C compile features - done <br>
-- Configuring your system to build libsword. <br>
-- SWORD Version 1008900000 <o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
sword-devel mailing list: <a class="moz-txt-link-abbreviated" href="mailto:sword-devel@crosswire.org" moz-do-not-send="true">sword-devel@crosswire.org</a>
<a class="moz-txt-link-freetext" href="http://crosswire.org/mailman/listinfo/sword-devel" moz-do-not-send="true">http://crosswire.org/mailman/listinfo/sword-devel</a>
Instructions to unsubscribe/change your settings at above page</pre>
</blockquote>
</blockquote>
</blockquote>
</body>
</html>