<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Hi Tobias,</p>

    <p>I don't believe processing the locale files directly will change

      your issues with German umlauts.  The issue boils down to a

      few things:</p>

    <p>First, there are other uses of upperUTF8 in the engine.  You show

      the .cpp files in your grep, but not the .h files.</p>

    <p>Second, VerseKey will not work correctly for any locale without a

      proper upperUTF8 implementation which supports that locale.</p>

    <p>The issue is that verse references, freehand from outside or

      roundtripped from SWORD itself, still need to map to the uppercase

      representation of the book name.  If you look at all the locale

      files, the verse parsing table uses uppercase for all book

      abbreviations, so VerseKey's parser immediately uppercases the

      input string before it looks up the book in the table.</p>
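    <p>To make that concrete, here is a minimal, self-contained sketch of
      the uppercase-then-lookup step.  It is not the actual VerseKey
      code: asciiUpper, umlautUpper, sampleTable, and lookupBook are
      names made up for illustration, and the table holds a single
      hypothetical entry with an arbitrary value.  It only shows why an
      ASCII-only uppercase can never match an uppercase table key that
      contains an umlaut.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// ASCII-only uppercasing: bytes outside a-z pass through untouched,
// so the UTF-8 bytes of "ö" (C3 B6) are never changed to "Ö" (C3 96).
std::string asciiUpper(const std::string &in) {
    std::string out = in;
    for (char &c : out) {
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
    }
    return out;
}

// Illustration only: also uppercases the UTF-8 sequences for ä, ö, ü.
// A real implementation would delegate to ICU, Java, Swift, etc.
std::string umlautUpper(const std::string &in) {
    std::string out = asciiUpper(in);
    for (std::size_t i = 0; i + 1 < out.size(); ++i) {
        unsigned char lead = static_cast<unsigned char>(out[i]);
        unsigned char next = static_cast<unsigned char>(out[i + 1]);
        if (lead == 0xC3 && (next == 0xA4 || next == 0xB6 || next == 0xBC)) {
            out[i + 1] = static_cast<char>(next - 0x20);  // ä->Ä, ö->Ö, ü->Ü
        }
    }
    return out;
}

// Hypothetical one-entry abbreviation table. Keys are stored in uppercase,
// as in the locale files; "RÖMER" is spelled with explicit UTF-8 bytes.
std::map<std::string, int> sampleTable() {
    return {{"R\xC3\x96" "MER", 45}};
}

using UpperFn = std::string (*)(const std::string &);

// Uppercase the input first, then look it up; returns -1 when not found.
int lookupBook(const std::map<std::string, int> &table,
               const std::string &input, UpperFn upper) {
    assert(upper != nullptr);
    auto it = table.find(upper(input));
    return it == table.end() ? -1 : it->second;
}
```

    <p>With this sketch, looking up "Römer" through asciiUpper misses the
      table entry, while the same lookup through umlautUpper finds it.</p>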

    <p>This is also true for LD module key lookups.  They are stored in

      uppercase.  Technically it could work if both the module import

      tool and the display application were using the same StringMgr--

      the default StringMgr would uppercase the string incorrectly, but

      consistently with the display application.  But this is not the

      case.  Our import tools use a proper StringMgr to create the

      modules with a proper uppercase key and thus the display

      application must also use a proper StringMgr or things will not

      work correctly.</p>

    <p>I am afraid getting the data from the locales.d/ folders in

      JavaScript will not help fix the problem.</p>

    <p>Troy</p>

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 2/7/21 6:59 AM, Tobias Klein wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:3997d80b-5dd9-0099-161d-ada9f583cb37@tklein.info">


      <p>Hi Troy,</p>

      <p>Thanks once more for all the details! I appreciate it.<br>

        <br>

        I just grepped quickly in the SWORD source code (grep -r

        "upperUTF8" .  | grep -v ".svn") and the method upperUTF8

        appears to be only used in the following places:<br>

        <br>

        <tt>./src/keys/versekey.cpp:                               

          stringMgr->upperUTF8(abbr, (unsigned int)(strlen(abbr)*2));</tt><tt><br>

        </tt><tt>./src/keys/versekey.cpp:                                       

          stringMgr->upperUTF8(abbr, (unsigned int)(strlen(abbr)*2));</tt><tt><br>

        </tt><tt>./utilities/imp2gbs.cpp:               

          StringMgr::getSystemStringMgr()->upperUTF8(keyBuffer.getRawData(),

          size-2);</tt></p>

      <p>I think neither of those is currently used in Ezra Project,

        though. At the moment I do not have the use case to parse verse

        keys based on any special Unicode inputs. I am only using the

        standard English abbreviations for verse keys and that only

        happens internally. So, in this case I may just process the

        locales.d files directly in node.js / JavaScript.<br>

        <br>

        Regarding node-sword-interface and the build process for mobile

        platforms ... currently I have only tried Android, which works

        fine. iOS should technically work as well, but I have not tried

        that yet. The boilerplate work to make all that happen smoothly

        is provided by the <a moz-do-not-send="true"

          href="https://code.janeasystems.com/nodejs-mobile">nodejs-mobile</a>

        cordova plugin. That plugin contains build scripts that

        seamlessly compile any native node.js addons like

        node-sword-interface or also the sqlite3 module that I am using.<br>

        <br>

        And since I am now using an API compatible runtime environment

        both for Electron/nodejs and Cordova/nodejs-mobile I did not

        have to add any additional glue code. One risk I see with this

        approach is that the guys who provide nodejs-mobile discontinue

        their work for some reason. It's essentially a completely

        separately maintained fork of nodejs (it has nothing to do with

        V8 actually). Originally it is based on the ChakraCore

        JavaScript engine of the Microsoft Edge browser. But the

        nodejs-mobile guys ported it to Android and iOS ...<br>

        <br>

        Regarding the StringMgr native callback possibility ... yes

        technically this is possible with a node native addon like

        node-sword-interface.<br>

        I am using such a functionality for the InstallMgr and search

        progress feedback already.<br>

        <br>

        So, long story short ... if in the future a use case comes up to

        parse Unicode-based VerseKeys, I will implement a special

        StringMgr binding as you suggested. But for now I'll focus on

        handling the locales.d content directly in JavaScript / node.js.<br>

        <br>

        I will keep you posted.<br>

        <br>

        Best regards,<br>

        Tobias<br>

        <br>

      </p>

      <div class="moz-cite-prefix">On 2/6/21 11:59 PM, Troy A. Griffitts

        wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:a580b744-b15b-3bb6-fb90-12cf32169833@crosswire.org">


        <p>The data is pulled from the locales.d/ files, but the toUpper

          logic is necessary in a number of places in the engine. Two

          come to mind immediately:<br>

        </p>

        <p>case-insensitive parsing of verse references</p>

        <p>case-insensitive parsing of LD module keys<br>

        </p>

        <p>To be able to get an uppercase representation of any Unicode

          character, it takes a pretty hefty dataset of all known human

          languages-- that's why we leave it up to an external library. 

          And yeah, because ICU is so large, that's why I don't compile

          it into my binaries in Bishop.  Bishop is about 13MB total,

          which includes ~8MB of default module data (KJV, SME,

          StrongsGreek, StrongsHebrew).  That's about 5MB for the app. 

          If I included ICU, it would greatly increase the size.  And

          both iOS and Android (Swift and Java) already have facilities

          for getting the toUpper of a string.</p>

        <p>I hope you can steal the few lines from Bishop's native SWORD

          code which tells SWORD to call either Java or Swift when

          upperUTF8 is called.</p>

        <p>I am sorry that this might break the nice ability to have

          exactly the same code on both iOS and Android (I am surprised

          that absolutely no changes were required for you to interface

          to a native library on both iOS and Android!  cordova required

          me to provide: Android: Java-jni layer; iOS: Swift layer.  I

          am jealous.)</p>

        <p>If you can think of an alternative, I am happy to listen.  We

          could provide a better StringMgr default (I think we simply

          have a latin-1 single byte transformation table for basically

          ASCII characters), which includes an SW_u32 hash which

          includes German characters, but that's going to limit the

          languages we support to only the ones we add to our toUpper

          hash, and that's not really a dataset I want to maintain.</p>
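        <p>For illustration, a hand-maintained uppercase hash of that kind
          could look roughly like the sketch below.  This is an
          assumption-laden example, not SWORD code: char32_t stands in
          for SW_u32, and upperTable, upperCodePoint, and
          upperCodePoints are hypothetical names.  It also makes the
          maintenance problem visible, since any code point missing
          from the table is silently left in lowercase.</p>

```cpp
#include <string>
#include <unordered_map>

// Hypothetical hand-maintained table: lowercase -> uppercase code point.
// Only the three German umlauts are listed; every further language
// would need its own entries added by hand.
const std::unordered_map<char32_t, char32_t> &upperTable() {
    static const std::unordered_map<char32_t, char32_t> table = {
        {U'\u00E4', U'\u00C4'},  // ä -> Ä
        {U'\u00F6', U'\u00D6'},  // ö -> Ö
        {U'\u00FC', U'\u00DC'},  // ü -> Ü
    };
    return table;
}

// Uppercase one code point: ASCII by arithmetic, everything else by table.
// Code points that are not in the table are returned unchanged.
char32_t upperCodePoint(char32_t cp) {
    if (cp >= U'a' && cp <= U'z') {
        return static_cast<char32_t>(cp - U'a' + U'A');
    }
    const auto &table = upperTable();
    auto it = table.find(cp);
    return it == table.end() ? cp : it->second;
}

// Uppercase a whole string of already-decoded code points.
std::u32string upperCodePoints(const std::u32string &in) {
    std::u32string out;
    out.reserve(in.size());
    for (char32_t cp : in) out.push_back(upperCodePoint(cp));
    return out;
}
```

        <p>Here "römer" comes back as "RÖMER", but a French "é" stays
          lowercase until someone adds it to the table.</p>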

        <p>Open to suggestions,</p>

        <p>Troy<br>

        </p>

        <p><br>

        </p>

        <div class="moz-cite-prefix">On 2/6/21 2:56 PM, Tobias Klein

          wrote:<br>

        </div>

        <blockquote type="cite"

          cite="mid:E1l8VZ9-0003St-CW@smtprelay03.ispgateway.de">


          <style>@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0cm;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif;}a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}pre

        {mso-style-priority:99;

        mso-style-link:"HTML Preformatted Char";

        margin:0cm;

        font-size:10.0pt;

        font-family:"Courier New";}span.HTMLPreformattedChar

        {mso-style-name:"HTML Preformatted Char";

        mso-style-priority:99;

        mso-style-link:"HTML Preformatted";

        font-family:"Courier New";}.MsoChpDefault

        {mso-style-type:export-only;}div.WordSection1

        {page:WordSection1;}</style>

          <div class="WordSection1">

            <p class="MsoNormal"><span lang="EN-US">Dear Troy,<o:p></o:p></span></p>

            <p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>

            <p class="MsoNormal"><span lang="EN-US">Thank you for these

                explanations! I appreciate it!<o:p></o:p></span></p>

            <p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>

            <p class="MsoNormal"><span lang="EN-US">For Ezra Project on

                Android, I am at this point simply compiling

                node-sword-interface with the Android cross compilers

                and it works. However, as I wrote, I have issues for the

                German Bible book names now.<o:p></o:p></span></p>

            <p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>

            <p class="MsoNormal"><span lang="EN-US">Is the StringMgr

                functionality only used to handle the locales.d files?

                Or also for some content inside any SWORD modules?<br>

                <br>

                <o:p></o:p></span></p>

            <p class="MsoNormal"><span lang="EN-US">If it is only used

                for handling the locales.d files then I would consider

                handling the Sword locales.d files directly from

                JavaScript / node.js, which already supports Unicode.<o:p></o:p></span></p>

            <p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>

            <p class="MsoNormal"><span lang="EN-US">I also checked

                whether I can cross-compile the ICU library and that

                worked, but this is a huge binary (I think 20-30 MB) and

                I would rather keep the APK size as small as possible.<br>

                <br>

                Best regards,<br>

                Tobias</span></p>

            <p class="MsoNormal"><o:p> </o:p></p>

            <div

              style="mso-element:para-border-div;border:none;border-top:solid

              #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">

              <p class="MsoNormal" style="border:none;padding:0cm"><b>From:

                </b><a href="mailto:scribe@crosswire.org"

                  moz-do-not-send="true">Troy A. Griffitts</a><br>

                <b>Sent: </b>Sunday, 31 January 2021 18:20<br>

                <b>To: </b><a href="mailto:sword-devel@crosswire.org"

                  moz-do-not-send="true">sword-devel@crosswire.org</a><br>

                <b>Subject: </b>Re: [sword-devel] Sword Locales /

                German Umlaut Issues / AndroidBuild</p>

            </div>

            <p class="MsoNormal"><o:p> </o:p></p>

            <p>Dear Tobias,</p>

            <p>My apologies for taking so long to respond to this, but I

              wanted to give a thorough answer.  See the summary at the

              end if you don't care about the details.</p>

            <p>So, SWORD has a class StringMgr, which manages strings

              within SWORD, and by default SWORD includes a very basic

              implementation, which doesn't necessarily know about or

              support anything beyond what the basic C string methods

              support.</p>

            <p>I am sure this evokes a sense of horror in you at

              first, so let me explain a bit how we properly handle

              character sets.  First, short background: since we existed

              well before the Unicode world, we have multiple locale

              files for each language, which you will still see in the

              locales.d/ folder, each specifying their character

              encoding, and most of the time SWORD doesn't need to

              manipulate characters, so simply holding data, and passing

              that data to a display frontend, and specifying a font

              which will handle that encoding was enough in the old

              world.  IMPORTANT: the one place we do need to manipulate

              character data is to perform case-insensitive

              comparisons.  We did this in the past by converting a

              string to uppercase before comparison.  You'll notice this

              in the section for Bible book abbreviation in each

              locale-- the partial match key must be in a toupper state.</p>

            <p>Today, everything in SWORD prefers Unicode and

              specifically, encoded as UTF-8.  To support this:</p>

            <p>First, we have utility functions within SWORD for working

              with Unicode encoded strings, see:</p>

            <p><a

                href="http://crosswire.org/svn/sword/trunk/include/utilstr.h"

                moz-do-not-send="true">http://crosswire.org/svn/sword/trunk/include/utilstr.h</a></p>

            <p>Specifically:</p>

            <pre>SWBuf assureValidUTF8(const char *buf);</pre>

            <pre>SW_u32 getUniCharFromUTF8(const unsigned char **buf, bool skipValidation = false);</pre>

            <pre>SWBuf *getUTF8FromUniChar(SW_u32 uchar, SWBuf *appendTo);</pre>

            <pre>SWBuf utf8ToWChar(const char *buf);</pre>

            <pre>SWBuf wcharToUTF8(const wchar_t *buf);</pre>

            <pre><o:p> </o:p></pre>

            <pre><o:p> </o:p></pre>
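            <p>As a rough idea of what a helper in the spirit of
              getUniCharFromUTF8 has to do, the sketch below decodes a
              UTF-8 string into code points.  It is a simplified
              stand-in, not the utilstr.h implementation: decodeUtf8 is
              a hypothetical name, char32_t is used in place of SW_u32,
              invalid or truncated input is replaced with U+FFFD, and
              overlong encodings are not rejected.</p>

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into Unicode code points (simplified sketch).
std::u32string decodeUtf8(const std::string &in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        char32_t cp = 0;
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1; cp = static_cast<char32_t>(lead & 0x1F);
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; cp = static_cast<char32_t>(lead & 0x0F);
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3; cp = static_cast<char32_t>(lead & 0x07);
        } else {
            out.push_back(U'\uFFFD');  // stray continuation or invalid lead
            ++i;
            continue;
        }
        if (i + extra >= in.size()) {
            out.push_back(U'\uFFFD');  // sequence cut off at end of input
            break;
        }
        bool valid = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char next = static_cast<unsigned char>(in[i + k]);
            if ((next & 0xC0) != 0x80) { valid = false; break; }
            cp = static_cast<char32_t>((cp << 6) | (next & 0x3F));
        }
        if (!valid) {
            out.push_back(U'\uFFFD');  // skip the bad lead byte and resync
            ++i;
            continue;
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

            <p>Decoding first is what lets an uppercase routine work on
              whole characters such as "ö" (U+00F6) instead of on the
              two separate bytes C3 B6.</p>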

            <p>To wrap this up, by subclassing StringMgr, SWORD supports

              implementing character encoding by linking to other

              libraries, e.g., ICU, Qt, etc. to handle full Unicode

              support.  And while the StringMgr interface allows

              implementation of many string functions, upperUTF8 is the

              only real method the SWORD engine needs to work

              completely.  Some utilities use the other methods in

              there, but the engine only needs this method.</p>
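            <p>The subclassing idea can be sketched without linking
              against SWORD at all.  In the example below,
              MiniStringMgr is a made-up stand-in for StringMgr,
              UmlautStringMgr is a hypothetical subclass, and
              engineUpper plays the role of engine code that only ever
              calls upperUTF8.  The in-place signature with a byte
              limit is modeled on the calls quoted in this thread; the
              exact SWORD declarations should be checked in the
              headers.</p>

```cpp
#include <string>

// Stand-in for the engine's string manager; NOT the real SWORD class.
class MiniStringMgr {
public:
    virtual ~MiniStringMgr() = default;

    // Uppercase `text` in place, touching at most `max` bytes.
    // Basic default: ASCII letters only.
    virtual char *upperUTF8(char *text, unsigned int max) const {
        for (unsigned int i = 0; i < max && text[i] != '\0'; ++i) {
            if (text[i] >= 'a' && text[i] <= 'z') {
                text[i] = static_cast<char>(text[i] - 'a' + 'A');
            }
        }
        return text;
    }
};

// Hypothetical subclass that also handles ä, ö, ü. A real subclass
// would hand the string to ICU, Qt, Java, Swift, and so on.
class UmlautStringMgr : public MiniStringMgr {
public:
    char *upperUTF8(char *text, unsigned int max) const override {
        MiniStringMgr::upperUTF8(text, max);
        for (unsigned int i = 0;
             i + 1 < max && text[i] != '\0' && text[i + 1] != '\0'; ++i) {
            unsigned char lead = static_cast<unsigned char>(text[i]);
            unsigned char next = static_cast<unsigned char>(text[i + 1]);
            if (lead == 0xC3 &&
                (next == 0xA4 || next == 0xB6 || next == 0xBC)) {
                text[i + 1] = static_cast<char>(next - 0x20);
            }
        }
        return text;
    }
};

// "Engine" code: depends only on the base-class interface.
std::string engineUpper(const MiniStringMgr &mgr, std::string text) {
    if (text.empty()) return text;
    mgr.upperUTF8(&text[0], static_cast<unsigned int>(text.size()));
    return text;
}
```

            <p>Passing a MiniStringMgr leaves the "ö" in "könige"
              lowercase; passing an UmlautStringMgr yields "KÖNIGE"
              without any change to engineUpper.</p>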

            <p><o:p> </o:p></p>

            <p>In summary, on Android, you are likely not linking to ICU

              when you build the native SWORD binary-- which I don't do

              either for Bishop.  The Cordova SWORD plugin uses the

              SWORD java-jni bindings, which use the Java VM to

              implement StringMgr:</p>

            <p><a

href="https://crosswire.org/svn/sword/trunk/bindings/java-jni/jni/swordstub.cpp"

                moz-do-not-send="true">https://crosswire.org/svn/sword/trunk/bindings/java-jni/jni/swordstub.cpp</a>

              Search for: AndroidStringMgr</p>

            <p>And on iOS the Cordova plugin uses the Swift libraries to

              do the same.  This is done by using the SWORD flatapi call

              to org_crosswire_sword_StringMgr_setToUpper to provide a

              Swift implementation to uppercase a string. </p>

            <p><a

href="http://crosswire.org/svn/sword/trunk/bindings/cordova/cordova-plugin-crosswire-sword/src/ios/SWORD.swift"

                moz-do-not-send="true">http://crosswire.org/svn/sword/trunk/bindings/cordova/cordova-plugin-crosswire-sword/src/ios/SWORD.swift</a></p>
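            <p>The callback approach can be pictured with the small
              sketch below.  It does not reproduce the flatapi
              declarations: ToUpperCallback, setToUpperCallback,
              engineToUpper, and hostToUpper are hypothetical names,
              and hostToUpper merely stands in for what the Swift or
              Java side would provide.  The point is only that the
              native side keeps a function pointer and falls back to
              ASCII uppercasing when the host has not registered one.</p>

```cpp
#include <cstddef>
#include <string>

// Hypothetical callback type: takes UTF-8 text, returns it uppercased.
using ToUpperCallback = std::string (*)(const std::string &utf8);

namespace {
ToUpperCallback registeredToUpper = nullptr;  // set by the host application
}

// Called once by the host (Swift, Java, JavaScript binding, ...).
void setToUpperCallback(ToUpperCallback callback) {
    registeredToUpper = callback;
}

// Fallback used when no callback is registered: ASCII letters only.
std::string asciiUpper(const std::string &in) {
    std::string out = in;
    for (char &c : out) {
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
    }
    return out;
}

// What the native side calls whenever it needs an uppercase string.
std::string engineToUpper(const std::string &utf8) {
    return registeredToUpper ? registeredToUpper(utf8) : asciiUpper(utf8);
}

// Stand-in for the host implementation; handles ä, ö, ü for the demo.
std::string hostToUpper(const std::string &utf8) {
    std::string out = asciiUpper(utf8);
    for (std::size_t i = 0; i + 1 < out.size(); ++i) {
        unsigned char lead = static_cast<unsigned char>(out[i]);
        unsigned char next = static_cast<unsigned char>(out[i + 1]);
        if (lead == 0xC3 && (next == 0xA4 || next == 0xB6 || next == 0xBC)) {
            out[i + 1] = static_cast<char>(next - 0x20);
        }
    }
    return out;
}
```

            <p>Before setToUpperCallback(hostToUpper) is called,
              engineToUpper leaves umlauts untouched; afterwards the
              host's result is used for every uppercase request.</p>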

            <p>I hope this gives you the information you need to get

              things working for you.  Please don't hesitate to ask if

              you need help,</p>

            <p>Troy</p>

            <p><o:p> </o:p></p>

            <div>

              <p class="MsoNormal">On 1/17/21 11:59 AM, Tobias Klein

                wrote:<o:p></o:p></p>

            </div>

            <p class="MsoNormal"

style="mso-margin-top-alt:5.0pt;margin-right:36.0pt;margin-bottom:5.0pt;margin-left:36.0pt">Dear

              Troy, <br>

              <br>

              I'm playing with an Android Build of Sword and I get

              issues with the German Umlauts. <br>

              <br>

              So I have issues with Bible book names like Römer, Könige,

              etc. <br>

              <br>

              The Umlauts are shown as ?. <br>

              <br>

              I'm configuring the SWORD build with CMake like below

              (without ICU!) <br>

              <br>

              I remember having similar issues on Linux when building

              without ICU. <br>

              <br>

              How do you build SWORD for Bishop? Any suggestions? <br>

              <br>

              Best regards, <br>

              Tobias <br>

              <br>

              -- Check for working CXX compiler:

/opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++<br>

              -- Check for working CXX compiler:

/opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++

              -- works <br>

              -- Detecting CXX compiler ABI info <br>

              -- Detecting CXX compiler ABI info - done <br>

              -- Detecting CXX compile features <br>

              -- Detecting CXX compile features - done <br>

              -- Check for working C compiler:

/opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang<br>

              -- Check for working C compiler:

/opt/Android/SDK/ndk/r21b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang

              -- works <br>

              -- Detecting C compiler ABI info <br>

              -- Detecting C compiler ABI info - done <br>

              -- Detecting C compile features <br>

              -- Detecting C compile features - done <br>

              -- Configuring your system to build libsword. <br>

              -- SWORD Version 1008900000 <o:p></o:p></p>

            <p class="MsoNormal"><o:p> </o:p></p>

          </div>

          <br>

          <fieldset class="mimeAttachmentHeader"></fieldset>

          <pre class="moz-quote-pre" wrap="">_______________________________________________

sword-devel mailing list: <a class="moz-txt-link-abbreviated" href="mailto:sword-devel@crosswire.org" moz-do-not-send="true">sword-devel@crosswire.org</a>

<a class="moz-txt-link-freetext" href="http://crosswire.org/mailman/listinfo/sword-devel" moz-do-not-send="true">http://crosswire.org/mailman/listinfo/sword-devel</a>

Instructions to unsubscribe/change your settings at above page</pre>

        </blockquote>

        <br>


      </blockquote>

      <br>


    </blockquote>

  </body>

</html>