[sword-devel] SWORD trunk / Regression when updating remote sources

Tobias Klein contact at tklein.info
Sat Oct 17 16:41:04 EDT 2020


That was it! Issue resolved :). I ran my test twice - 40 out of 40 
successful. I also checked it once more manually in the node console and 
in Ezra Project and it works without issues now!
Thank you so much, Troy.

Now I can switch to the latest SWORD trunk with Ezra Project!

Best regards,
Tobias

On 10/17/20 8:13 PM, Troy A. Griffitts wrote:
>
> untar now updated. Thanks for your time with this.
>
>
> On 10/17/20 4:36 PM, Tobias Klein wrote:
>>
>> Updated to SVN Rev. 3813.
>>
>> Still hanging here:
>>
>> #0  0x000056143eb2f28c in sword::FileMgr::sysOpen(sword::FileDesc*) ()
>> #1  0x000056143eb18b0b in sword::FileDesc::getFd() ()
>> #2  0x000056143eb9323b in (anonymous namespace)::untar(void*, char 
>> const*) ()
>> #3  0x000056143eb93b60 in sword::ZipCompress::unTarGZ(int, char 
>> const*) ()
>> #4  0x000056143eb483f6 in 
>> sword::InstallMgr::refreshRemoteSource(sword::InstallSource*) ()
>>
>> Best regards,
>> Tobias
>>
>> On 10/17/20 1:52 PM, Troy A. Griffitts wrote:
>>>
>>> OK Tobias,
>>>
>>> Give it a go when you have a chance and let me know.
>>>
>>> Troy
>>>
>>>
>>> On 10/17/20 12:16 PM, Troy A. Griffitts wrote:
>>>> The unTarGZ is also a new method and it looks like it is using the 
>>>> default file handle pool functionality of FileMgr, from looking at 
>>>> your stack trace. Give me about an hour and I'll have a chance to 
>>>> take a look at it. Good news is that it's not having trouble in the 
>>>> CURLFTPTransport. I have the same change queued up for commit for 
>>>> the other 3 transport impls so I will go ahead and commit those, as 
>>>> well. Thank you for working through this with me.
>>>>
>>>> Troy
>>>>
>>>> On October 17, 2020 10:44:18 AM GMT+02:00, Tobias Klein 
>>>> <contact at tklein.info> wrote:
>>>>
>>>>     Dear Troy,
>>>>
>>>>     Thank you so much for the help and all your work on this.
>>>>     Unfortunately the issue is still not resolved for me based on
>>>>     your latest commits.
>>>>
>>>>     I have n threads that all run InstallMgr::refreshRemoteSource.
>>>>     n corresponds to the number of repositories available, so it's
>>>>     currently 10.
>>>>     The operation works until at some point things start hanging
>>>>     again. Sometimes it happens after ten consecutive calls of the
>>>>     update function, sometimes already after three times and I also
>>>>     had it hanging after only one attempt.
>>>>     In the calling function the hanging occurs when I join the
>>>>     threads (waiting for them to complete).
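
A hang at join can be turned into a visible test failure by waiting with a timeout instead. The following is a minimal C++ sketch under stated assumptions, not code from Ezra Project or SWORD: `finishesWithin` is a hypothetical helper, and the task passed in stands for one refresh attempt.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper: runs `task` on a detached thread and reports whether
// it completed within `timeout`. A hang shows up as `false` instead of
// blocking the caller forever in join().
bool finishesWithin(std::function<void()> task,
                    std::chrono::milliseconds timeout) {
    assert(task);  // an empty std::function cannot be run
    std::packaged_task<void()> job(std::move(task));
    std::future<void> done = job.get_future();
    // Detached, so a stuck task cannot block this function on return.
    std::thread(std::move(job)).detach();
    return done.wait_for(timeout) == std::future_status::ready;
}
```

A task that never returns leaves its detached thread running, so this approach only suits a test harness whose process exits afterwards.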
>>>>
>>>>     Looking at details in gdb I find this at the point of hanging:
>>>>
>>>>     (gdb) info threads
>>>>       Id   Target Id         Frame
>>>>       102  Thread 0x7f8e3cdd1700 (LWP 46520) "node_sword_cli"
>>>>     0x000056411eacb46c in sword::FileMgr::sysOpen(sword::FileDesc*) ()
>>>>       95   Thread 0x7f8e37fff700 (LWP 46514) "node_sword_cli"
>>>>     0x000056411eacb283 in sword::FileMgr::sysOpen(sword::FileDesc*) ()
>>>>       93   Thread 0x7f8e3ddf6700 (LWP 46511) "node_sword_cli"
>>>>     0x000056411eacb283 in sword::FileMgr::sysOpen(sword::FileDesc*) ()
>>>>       91   Thread 0x7f8e2bfff700 (LWP 46510) "node_sword_cli"
>>>>     0x000056411eacb46c in sword::FileMgr::sysOpen(sword::FileDesc*) ()
>>>>     * 90   Thread 0x7f8e2b7fe700 (LWP 46509) "node_sword_cli"
>>>>     0x000056411eacb296 in sword::FileMgr::sysOpen(sword::FileDesc*) ()
>>>>       1    Thread 0x7f8e3ddf9e00 (LWP 46380) "node_sword_cli"
>>>>     0x00007f8e40d98cd7 in __pthread_clockjoin_ex () from
>>>>     /lib/x86_64-linux-gnu/libpthread.so.0
>>>>
>>>>     And this stacktrace for each individual thread (relevant portion):
>>>>     #0  0x000056411eacb46c in
>>>>     sword::FileMgr::sysOpen(sword::FileDesc*) ()
>>>>     #1  0x000056411eab4b0b in sword::FileDesc::getFd() ()
>>>>     #2  0x000056411eb2fb70 in
>>>>     sword::ZipCompress::unTarGZ(sword::FileDesc*, char const*) ()
>>>>     #3  0x000056411eae4437 in
>>>>     sword::InstallMgr::refreshRemoteSource(sword::InstallSource*) ()
>>>>
>>>>     I'm not sure whether all of these threads here are now hanging
>>>>     or only one of them. It could be just the one that the main
>>>>     function tries to join right now.
>>>>
>>>>     Another observation is that I am getting random output like
>>>>     this during the process (it happens with different conf files,
>>>>     not this one all the time):
>>>>
>>>>     error writing
>>>>     /home/tobi/.sword/installMgr/20120711005000/mods.d/ngu_BL_1987.conf
>>>>     skipping...
>>>>
>>>>     I didn't get these error messages with earlier SVN revisions.
>>>>
>>>>     To be sure I just once more tested with SVN Rev. 3759 and there
>>>>     I consistently get 20 out of 20 attempts successful.
>>>>
>>>>     Best regards,
>>>>     Tobias
>>>>
>>>>     PS: I'm sending this e-mail the second time, didn't seem to
>>>>     come through via mailman the first time (at 9:11 CEST).
>>>>
>>>>     On 10/15/20 8:09 PM, Troy A. Griffitts wrote:
>>>>>
>>>>>     Dear Tobias,
>>>>>
>>>>>     Thank you for all the great information.  This enabled me to
>>>>>     isolate the change which caused the issue.
>>>>>
>>>>>     So, for a bit of background, SWORD has no calls to mark
>>>>>     critical sections which might be problematic for re-entrant
>>>>>     usage.  This has been due to the many implementations of
>>>>>     threading across many different platforms over the years,
>>>>>     before C++11.  But, as a policy to support clients which
>>>>>     desire to use SWORD in a multithreaded manner, we do our best
>>>>>     to make this safe by advising clients to use separate SWMgr
>>>>>     instances per thread.  There are still some shared objects in
>>>>>     this scenario, but we do our best to do all the writing to
>>>>>     these shared objects upon initialization.  We broke this rule
>>>>>     in commit 3760, which is what caused your problem.  SWORD has
>>>>>     a facility to pool open file handles, to help OSs which have
>>>>>     small open file handle limits.  This work is done in FileMgr. 
>>>>>     Recently, to support Windows Unicode path names (the commit
>>>>>     you found which breaks your multithreaded use), we rounded up
>>>>>     all remaining native file IO calls and replaced them with
>>>>>     FileMgr calls for the IO and then extended FileMgr to handle
>>>>>     Windows Unicode paths in a Windows-specific manner.  One of these
>>>>>     changes was in CURLFTPTransport, which is where you are having
>>>>>     the issue.  The problem is that, where previously this class
>>>>>     was directly opening a FILE to do its writing, commit 3760
>>>>>     changed this to use FileMgr to open the file, which involved
>>>>>     the SWORD-wide file handle pool, and since we are creating a
>>>>>     new file, we are always writing to this shared pool container,
>>>>>     which is not threadsafe.  My guess is that you have two
>>>>>     threads trying to update the pool container at exactly the
>>>>>     same time.  Using the file handle pool is usually safe,
>>>>>     because SWMgr "opens" all of its file handles on
>>>>>     initialization (these are not actually opening OS file
>>>>>     handles, but instead updating the file handle pool container
>>>>>     with proxy objects which delay the actual OS open until it is
>>>>>     needed).  The point is that this instance of shared file handle
>>>>>     pool container writing is done on creation of the SWMgr;
>>>>>     afterward, the shared file handle pool is only read, and each
>>>>>     object in the pool is owned by only 1 thread if the "each
>>>>>     thread must have its own SWMgr" rule is followed.
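
To illustrate the kind of race described here, the following is a minimal C++ sketch, assuming a process-wide container that several threads write to. `HandlePool` and `fillPoolConcurrently` are hypothetical names and are not SWORD's FileMgr API. The sketch serializes writers with a mutex; without that lock, concurrent `push_back` calls on one `std::vector` are a data race with undefined behavior, which can appear as a crash or a hang. The fix described in this thread took a different route and avoids writing to the shared pool at all.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a process-wide handle pool; NOT SWORD's FileMgr.
class HandlePool {
public:
    // Registers a path in the shared container. The lock serializes writers
    // so that concurrent callers cannot corrupt the vector.
    void add(const std::string &path) {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.push_back(path);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::string> entries_;
};

// Starts `threadCount` threads that each register `perThread` entries in one
// shared pool, waits for all of them, and returns the final pool size.
std::size_t fillPoolConcurrently(unsigned threadCount, unsigned perThread) {
    assert(threadCount > 0);
    HandlePool pool;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&pool, t, perThread] {
            for (unsigned i = 0; i < perThread; ++i) {
                pool.add("source" + std::to_string(t) + "/file" +
                         std::to_string(i));
            }
        });
    }
    for (std::thread &w : workers) {
        w.join();
    }
    return pool.size();
}
```

With the lock in place, ten threads adding 1000 entries each always leave 10000 entries in the pool.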
>>>>>
>>>>>     Regardless of the details, I believe I have committed a fix
>>>>>     for you.  In short, I have changed CURLFTPTransport to follow
>>>>>     our rule to avoid writing to shared objects when we might be
>>>>>     re-entrant.  Here we now use FileMgr's methods which isolate
>>>>>     OS implementation, but not FileMgr's file handle pool (as it
>>>>>     did not previously use the pool before this commit).  This
>>>>>     should allow this to still take advantage of the Windows
>>>>>     OS-specific implementation, and also avoid the critical
>>>>>     section.  Can you please try SVN head and let me know if we
>>>>>     are back to 20 out of 20 successes?
>>>>>
>>>>>     Thanks again for the very helpful debug log and exact revision
>>>>>     where failure began.
>>>>>
>>>>>     Troy
>>>>>
>>>>>
>>>>>     On 10/13/20 10:08 PM, Tobias Klein wrote:
>>>>>>
>>>>>>     I managed to get a backtrace to a segmentation fault using GDB.
>>>>>>
>>>>>>     It seems like the crash is happening in sword::FileMgr::open( ...
>>>>>>
>>>>>>     The starting point is sword::InstallMgr::refreshRemoteSource
>>>>>>     as I was writing before.
>>>>>>
>>>>>>     Best regards,
>>>>>>     Tobias
>>>>>>
>>>>>>     Program received signal SIGSEGV, Segmentation fault.
>>>>>>     [Switching to Thread 0x7f1af3fff700 (LWP 220833)]
>>>>>>     0x00007f1b027045a4 in sword::FileMgr::open(char const*, int,
>>>>>>     int, bool) () from
>>>>>>     /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>>>     (gdb) backtrace
>>>>>>     #0  0x00007f1b027045a4 in sword::FileMgr::open(char const*,
>>>>>>     int, int, bool) () from
>>>>>>     /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>>>     #1  0x00007f1b0276ad7b in sword::(anonymous
>>>>>>     namespace)::my_fwrite(void*, unsigned long, unsigned long,
>>>>>>     void*) ()
>>>>>>        from
>>>>>>     /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>>>     #2  0x00007f1b180626bf in ?? () from
>>>>>>     /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
>>>>>>     #3  0x00007f1b18074a2b in ?? () from
>>>>>>     /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
>>>>>>     #4  0x00007f1b1807e2e4 in ?? () from
>>>>>>     /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
>>>>>>     #5  0x00007f1b1807f6f9 in curl_multi_perform () from
>>>>>>     /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
>>>>>>     #6  0x00007f1b18075d13 in curl_easy_perform () from
>>>>>>     /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
>>>>>>     #7  0x00007f1b0276b683 in
>>>>>>     sword::CURLFTPTransport::getURL(char const*, char const*,
>>>>>>     sword::SWBuf*) () from
>>>>>>     /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>>>     #8  0x00007f1b0271d5d2 in
>>>>>>     sword::InstallMgr::remoteCopy(sword::InstallSource*, char
>>>>>>     const*, char const*, bool, char const*) ()
>>>>>>        from
>>>>>>     /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>>>     #9  0x00007f1b0271edc7 in
>>>>>>     sword::InstallMgr::refreshRemoteSource(sword::InstallSource*)
>>>>>>     () from
>>>>>>     /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>>>     #10 0x00007f1b026ad734 in
>>>>>>     RepositoryInterface::refreshIndividualRemoteSource(std::__cxx11::basic_string<char,
>>>>>>     std::char_traits<char>, std::allocator<char> >,
>>>>>>     std::function<void (unsigned int)>*) ()
>>>>>>        from
>>>>>>     /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>>>     #11 0x00007f1b026b17dd in
>>>>>>     std::thread::_State_impl<std::thread::_Invoker<std::tuple<int
>>>>>>     (RepositoryInterface::*)(std::__cxx11::basic_string<char,
>>>>>>     std::char_traits<char>, std::allocator<char> >,
>>>>>>     std::function<void (unsigned int)>*), RepositoryInterface*,
>>>>>>     std::__cxx11::basic_string<char, std::char_traits<char>,
>>>>>>     std::allocator<char> >, std::function<void (unsigned int)>*>
>>>>>>     > >::_M_run() ()
>>>>>>        from
>>>>>>     /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>>>     #12 0x00007f1b1d622cb4 in ?? () from
>>>>>>     /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>>>     #13 0x00007f1b1e20a609 in start_thread () from
>>>>>>     /lib/x86_64-linux-gnu/libpthread.so.0
>>>>>>     #14 0x00007f1b1e131103 in clone () from
>>>>>>     /lib/x86_64-linux-gnu/libc.so.6
>>>>>>
>>>>>>     On 10/13/20 1:07 PM, Tobias Klein wrote:
>>>>>>>
>>>>>>>     Hi Troy,
>>>>>>>
>>>>>>>     I tested more SVN revisions of SWORD trunk (starting from my
>>>>>>>     stable version until I hit the bug) and I can now say that
>>>>>>>
>>>>>>>     SVN Rev. 3759 is the last SVN revision that works without
>>>>>>>     hanging for the below mentioned scenario. (20 out of 20
>>>>>>>     tests successful)
>>>>>>>
>>>>>>>     SVN Rev. 3760 is the first SVN revision where the hanging
>>>>>>>     occurs. The commit message is "First cut at better isolation
>>>>>>>     of FileIO to FileMgr and providing a WIN32 impl with works
>>>>>>>     with wchar_t".
>>>>>>>
>>>>>>>     Modified files:
>>>>>>>     include/filemgr.h
>>>>>>>     include/swbuf.h
>>>>>>>     lib/bcppmake/libsword.bpr
>>>>>>>     src/mgr/curlftpt.cpp
>>>>>>>     src/mgr/curlhttpt.cpp
>>>>>>>     src/mgr/filemgr.cpp
>>>>>>>     src/mgr/installmgr.cpp
>>>>>>>     src/mgr/swmgr.cpp
>>>>>>>     src/utilfuns/utilstr.cpp
>>>>>>>
>>>>>>>     Maybe this helps to find the root-cause.
>>>>>>>
>>>>>>>     Best regards,
>>>>>>>     Tobias
>>>>>>>
>>>>>>>     On 10/12/20 9:20 PM, Tobias Klein wrote:
>>>>>>>>
>>>>>>>>     I'll see whether I can collect a stack trace. It may take
>>>>>>>>     some time until I have it.
>>>>>>>>
>>>>>>>>     The multi-threaded "remote source refreshing" worked
>>>>>>>>     without issues until recently.
>>>>>>>>
>>>>>>>>     Here is the code of the function that does the actual work
>>>>>>>>     in a thread.
>>>>>>>>     See
>>>>>>>>     https://github.com/tobias-klein/node-sword-interface/blob/787160ccb4b3bab2a762d22f74031c7237edc803/src/sword_backend/repository_interface.cpp#L105.
>>>>>>>>
>>>>>>>>     int RepositoryInterface::refreshIndividualRemoteSource(string remoteSourceName,
>>>>>>>>         std::function<void(unsigned int progress)>* progressCallback)
>>>>>>>>     {
>>>>>>>>         //cout << "Refreshing source " << remoteSourceName << endl << flush;
>>>>>>>>         InstallSource* source = this->getRemoteSource(remoteSourceName);
>>>>>>>>         int result = this->_installMgr->refreshRemoteSource(source);
>>>>>>>>         if (result != 0) {
>>>>>>>>             cerr << "Failed to refresh source " << remoteSourceName << endl << flush;
>>>>>>>>         }
>>>>>>>>         remoteSourceUpdateMutex.lock();
>>>>>>>>         this->_remoteSourceUpdateCount++;
>>>>>>>>         unsigned int totalPercent =
>>>>>>>>             (unsigned int)calculateIntPercentage<double>(this->_remoteSourceUpdateCount,
>>>>>>>>                                                          this->_remoteSourceCount);
>>>>>>>>         if (progressCallback != 0) {
>>>>>>>>             (*progressCallback)(totalPercent);
>>>>>>>>         }
>>>>>>>>         remoteSourceUpdateMutex.unlock();
>>>>>>>>         return result;
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     Best regards,
>>>>>>>>     Tobias
>>>>>>>>
>>>>>>>>     On 10/12/20 9:01 PM, Troy A. Griffitts wrote:
>>>>>>>>>     Any luck getting a stack trace on crash?
>>>>>>>>>
>>>>>>>>>     Regarding the "multithreaded mode", I'd have to get a bit
>>>>>>>>>     more information as to exactly how you are sharing SWORD
>>>>>>>>>     objects across your threads. Generally, as a rule, you
>>>>>>>>>     shouldn't. We recommend a separate instance of SWMgr per
>>>>>>>>>     thread and that probably goes for InstallMgr, as well.
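
The "one instance per thread" rule can be sketched as follows. This is a minimal C++ illustration under stated assumptions: `Manager` is a hypothetical stand-in for an object such as SWMgr or InstallMgr, and `refreshAll` is not part of any real API. Each worker constructs its own `Manager`, so the unsynchronized state inside it is never touched by two threads.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a manager object with mutable internal state.
struct Manager {
    int refreshCount = 0;

    // Returns 0 on success and -1 for an empty source name.
    int refresh(const std::string &sourceName) {
        ++refreshCount;  // unsynchronized write: safe only while this
                         // object is used by a single thread
        return sourceName.empty() ? -1 : 0;
    }
};

// Refreshes every source on its own thread, each with its own Manager.
std::vector<int> refreshAll(const std::vector<std::string> &sources) {
    std::vector<int> results(sources.size(), -1);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < sources.size(); ++i) {
        workers.emplace_back([&sources, &results, i] {
            Manager mgr;                           // one instance per thread
            results[i] = mgr.refresh(sources[i]);  // distinct slot per thread
        });
    }
    for (std::thread &w : workers) {
        w.join();
    }
    return results;
}
```

Per-thread instances only remove sharing of the manager objects themselves; any state that the library keeps process-wide still needs separate care.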
>>>>>>>>>
>>>>>>>>>     Troy
>>>>>>>>>
>>>>>>>>>     On October 12, 2020 8:29:31 PM GMT+02:00, Tobias Klein
>>>>>>>>>     <contact at tklein.info> wrote:
>>>>>>>>>
>>>>>>>>>         Hi Troy,
>>>>>>>>>
>>>>>>>>>         I'm using curl on all three platforms.
>>>>>>>>>
>>>>>>>>>         Regarding the timeout configuration, I have not changed
>>>>>>>>>         anything yet; making this configurable in Ezra
>>>>>>>>>         Project is still on my todo list.
>>>>>>>>>
>>>>>>>>>         I just checked on Linux.
>>>>>>>>>         With the old version (May 18th 2020) no hanging or
>>>>>>>>>         crash in 10 out of 10 times.
>>>>>>>>>         With the new version (latest trunk / SWORD 1.9 RC3) I
>>>>>>>>>         get 1 x crash, 2 x hanging, 7 x working.
>>>>>>>>>
>>>>>>>>>         I'm running the InstallMgr::refreshRemoteSource "in a
>>>>>>>>>         multi-threaded mode".
>>>>>>>>>
>>>>>>>>>         Best regards,
>>>>>>>>>         Tobias
>>>>>>>>>
>>>>>>>>>         On 10/12/20 6:59 PM, Troy A. Griffitts wrote:
>>>>>>>>>>         Hi Tobias,
>>>>>>>>>>
>>>>>>>>>>         What transport library are you building with? ftplib
>>>>>>>>>>         or curl?
>>>>>>>>>>
>>>>>>>>>>         Have you changed the value of our new timeout from
>>>>>>>>>>         the default (I believe we decided on 10 seconds)?
>>>>>>>>>>
>>>>>>>>>>         Troy
>>>>>>>>>>
>>>>>>>>>>         On October 12, 2020 6:46:54 PM GMT+02:00, Tobias
>>>>>>>>>>         Klein <contact at tklein.info> wrote:
>>>>>>>>>>
>>>>>>>>>>             Hi Troy,
>>>>>>>>>>
>>>>>>>>>>             In my latest Ezra Project builds using SWORD trunk I’ve been noticing random "hangs" and crashes related to "updating remote sources". I suppose it must be around InstallMgr::refreshRemoteSource.
>>>>>>>>>>
>>>>>>>>>>             This was still rock solid when using SWORD trunk from May 18th 2020, but not so any more with the recent SWORD trunk.
>>>>>>>>>>
>>>>>>>>>>             Unfortunately I cannot pinpoint this more specifically. I just wanted to first share this observation, because it’s worrying me.
>>>>>>>>>>
>>>>>>>>>>             I’ve been noticing this regression both on Windows and macOS. Need to check later whether this also happens on Linux, cannot recall it right now.
>>>>>>>>>>
>>>>>>>>>>             Best regards,
>>>>>>>>>>             Tobias
>>>>>>>>>>             ------------------------------------------------------------------------
>>>>>>>>>>             sword-devel mailing list:sword-devel at crosswire.org
>>>>>>>>>>             http://crosswire.org/mailman/listinfo/sword-devel
>>>>>>>>>>             Instructions to unsubscribe/change your settings at above page
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>         -- 
>>>>>>>>>>         Sent from my Android device with K-9 Mail. Please
>>>>>>>>>>         excuse my brevity. 
>>>>>>>>>
>>>>>>>>>