[sword-devel] SWORD trunk / Regression when updating remote sources
Troy A. Griffitts
scribe at crosswire.org
Sat Oct 17 14:13:46 EDT 2020
untar now updated. Thanks for your time with this.
On 10/17/20 4:36 PM, Tobias Klein wrote:
>
> Updated to SVN Rev. 3813.
>
> Still hanging here:
>
> #0 0x000056143eb2f28c in sword::FileMgr::sysOpen(sword::FileDesc*) ()
> #1 0x000056143eb18b0b in sword::FileDesc::getFd() ()
> #2 0x000056143eb9323b in (anonymous namespace)::untar(void*, char
> const*) ()
> #3 0x000056143eb93b60 in sword::ZipCompress::unTarGZ(int, char const*) ()
> #4 0x000056143eb483f6 in
> sword::InstallMgr::refreshRemoteSource(sword::InstallSource*) ()
>
> Best regards,
> Tobias
>
> On 10/17/20 1:52 PM, Troy A. Griffitts wrote:
>>
>> OK Tobias,
>>
>> Give it a go when you have a chance and let me know.
>>
>> Troy
>>
>>
>> On 10/17/20 12:16 PM, Troy A. Griffitts wrote:
>>> The unTarGZ is also a new method and it looks like it is using the
>>> default file handle pool functionality of FileMgr, from looking at
>>> your stack trace. Give me about an hour and I'll have a chance to
>>> take a look at it. Good news is that it's not having trouble in the
>>> CURLFTPTransport. I have the same change queued up for commit for
>>> the other 3 transport impls so I will go ahead and commit those, as
>>> well. Thank you for working through this with me.
>>>
>>> Troy
>>>
>>> On October 17, 2020 10:44:18 AM GMT+02:00, Tobias Klein
>>> <contact at tklein.info> wrote:
>>>
>>> Dear Troy,
>>>
>>> Thank you so much for the help and all your work on this.
>>> Unfortunately the issue is still not resolved for me based on
>>> your latest commits.
>>>
>>> I have n threads that all run InstallMgr::refreshRemoteSource. n
>>> corresponds to the number of repositories available, so it's
>>> currently 10.
>>> The operation works until at some point things start hanging
>>> again. Sometime it happens after ten consecutive calls of the
>>> update function, sometimes already after three times and I also
>>> had it hanging after only one attempt.
>>> In the calling function the hanging occurs when I join the
>>> threads (waiting for them to complete).
>>>
>>> Looking at details in gdb I find this at the point of hanging:
>>>
>>> (gdb) info threads
>>> Id Target Id Frame
>>> 102 Thread 0x7f8e3cdd1700 (LWP 46520) "node_sword_cli"
>>> 0x000056411eacb46c in sword::FileMgr::sysOpen(sword::FileDesc*) ()
>>> 95 Thread 0x7f8e37fff700 (LWP 46514) "node_sword_cli"
>>> 0x000056411eacb283 in sword::FileMgr::sysOpen(sword::FileDesc*) ()
>>> 93 Thread 0x7f8e3ddf6700 (LWP 46511) "node_sword_cli"
>>> 0x000056411eacb283 in sword::FileMgr::sysOpen(sword::FileDesc*) ()
>>> 91 Thread 0x7f8e2bfff700 (LWP 46510) "node_sword_cli"
>>> 0x000056411eacb46c in sword::FileMgr::sysOpen(sword::FileDesc*) ()
>>> * 90 Thread 0x7f8e2b7fe700 (LWP 46509) "node_sword_cli"
>>> 0x000056411eacb296 in sword::FileMgr::sysOpen(sword::FileDesc*) ()
>>> 1 Thread 0x7f8e3ddf9e00 (LWP 46380) "node_sword_cli"
>>> 0x00007f8e40d98cd7 in __pthread_clockjoin_ex () from
>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>
>>> And this stacktrace for each individual thread (relevant portion):
>>> #0 0x000056411eacb46c in
>>> sword::FileMgr::sysOpen(sword::FileDesc*) ()
>>> #1 0x000056411eab4b0b in sword::FileDesc::getFd() ()
>>> #2 0x000056411eb2fb70 in
>>> sword::ZipCompress::unTarGZ(sword::FileDesc*, char const*) ()
>>> #3 0x000056411eae4437 in
>>> sword::InstallMgr::refreshRemoteSource(sword::InstallSource*) ()
>>>
>>> I'm not sure whether all of these threads here are now hanging
>>> or only one of them. It could be just the one that the main
>>> function tries to join right now.
>>>
>>> Another observation is that I am getting random output like this
>>> during the process (it happens with different conf files, not
>>> this one all the time):
>>>
>>> error writing
>>> /home/tobi/.sword/installMgr/20120711005000/mods.d/ngu_BL_1987.conf
>>> skipping...
>>>
>>> I didn't get these error messages with earlier SVN revisions.
>>>
>>> To be sure I just once more tested with SVN Rev. 3759 and there
>>> I consistently get 20 out of 20 attempts successful.
>>>
>>> Best regards,
>>> Tobias
>>>
>>> PS: I'm sending this e-mail the second time, didn't seem to come
>>> through via mailman the first time (at 9:11 CEST).
>>>
>>> On 10/15/20 8:09 PM, Troy A. Griffitts wrote:
>>>>
>>>> Dear Tobias,
>>>>
>>>> Thank you for all the great information. This enabled me to
>>>> isolate the change which caused the issue.
>>>>
>>>> So, for a bit of background, SWORD has no calls to mark
>>>> critical sections which might be problematic for re-entrant
>>>> usage. This has been due to the many implementations of
>>>> threading across many different platforms over the years,
>>>> before C++11. But, as a policy to support clients which desire
>>>> to use SWORD in a multithreaded manner, we do our best to make
>>>> this safe by advising clients to use separate SWMgr instances
>>>> per thread. There are still some shared objects in this
>>>> scenario, but we do our best to do all the writing to these
>>>> shared objects upon initialization. We broke this rule in
>>>> commit 2760, which is what caused your problem. SWORD have a
>>>> facility to pool open file handles, to help OSs which have
>>>> small open file handle limits. This work is done in FileMgr.
>>>> Recently, to support Windows Unicode path names (the commit you
>>>> found which breaks your multithreaded use), we rounded up all
>>>> remaining native file IO calls and replaced them to used
>>>> FileMgr for the IO and then extended FileMgr to handle Windows
>>>> Unicode paths in a Windows-specific manner. One of these
>>>> changes was in CURLFTPTransport, which is where you are having
>>>> the issue. The problem is that, where previously this class
>>>> was directly opening a FILE to do its writing, commit 2760
>>>> changed this to use FileMgr to open the file, which involved
>>>> the SWORD-wide file handle pool, and since we are create a new
>>>> file, we are always writing to this shared pool container,
>>>> which is not threadsafe. My guess is that you have two threads
>>>> trying to update the pool container at exactly the same time.
>>>> Using the file handle pool is usually safe, because SWMgr
>>>> "opens" all of its file handles on initialization (these are
>>>> not actually opening OS file handles, but instead updating the
>>>> file handle pool container with proxy objects which delay
>>>> actual OS open to on-demand, but the point is this instance of
>>>> shared file handle pool container writing is done on creation
>>>> of the SWMgr, afterward, the shared resource file handle pool
>>>> is only read and each object in the pool is owned by only 1
>>>> thread if the "each thread must have its own SWMgr" rule is
>>>> followed.
>>>>
>>>> Regardless of the details. I believe I have committed a fix
>>>> for you. In short, I have changed CURLFTPTransport to follow
>>>> our rule to avoid writing to shared objects when we might be
>>>> re-entrant. Here we now use FileMgr's methods which isolate OS
>>>> implementation, but not FileMgr's file handle pool (as it did
>>>> not previously use the pool before this commit). This should
>>>> allow this to still take advantage of the Windows OS-specific
>>>> implementation, and also avoid the critical section. Can you
>>>> please try SVN head and let me know if we are back to 20 out of
>>>> 20 successes?
>>>>
>>>> Thanks again for the very helpful debug log and exact revision
>>>> where failure began.
>>>>
>>>> Troy
>>>>
>>>>
>>>> On 10/13/20 10:08 PM, Tobias Klein wrote:
>>>>>
>>>>> I managed to get a backtrace to a segmentation fault using GDB.
>>>>>
>>>>> It seems like the crash is happening in sword::FileMgr::open( ...
>>>>>
>>>>> The starting point is sword::InstallMgr::refreshRemoteSource
>>>>> as I was writing before.
>>>>>
>>>>> Best regards,
>>>>> Tobias
>>>>>
>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>> [Switching to Thread 0x7f1af3fff700 (LWP 220833)]
>>>>> 0x00007f1b027045a4 in sword::FileMgr::open(char const*, int,
>>>>> int, bool) () from
>>>>> /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>> (gdb) backtrace
>>>>> #0 0x00007f1b027045a4 in sword::FileMgr::open(char const*,
>>>>> int, int, bool) () from
>>>>> /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>> #1 0x00007f1b0276ad7b in sword::(anonymous
>>>>> namespace)::my_fwrite(void*, unsigned long, unsigned long,
>>>>> void*) ()
>>>>> from
>>>>> /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>> #2 0x00007f1b180626bf in ?? () from
>>>>> /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
>>>>> #3 0x00007f1b18074a2b in ?? () from
>>>>> /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
>>>>> #4 0x00007f1b1807e2e4 in ?? () from
>>>>> /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
>>>>> #5 0x00007f1b1807f6f9 in curl_multi_perform () from
>>>>> /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
>>>>> #6 0x00007f1b18075d13 in curl_easy_perform () from
>>>>> /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
>>>>> #7 0x00007f1b0276b683 in sword::CURLFTPTransport::getURL(char
>>>>> const*, char const*, sword::SWBuf*) () from
>>>>> /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>> #8 0x00007f1b0271d5d2 in
>>>>> sword::InstallMgr::remoteCopy(sword::InstallSource*, char
>>>>> const*, char const*, bool, char const*) ()
>>>>> from
>>>>> /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>> #9 0x00007f1b0271edc7 in
>>>>> sword::InstallMgr::refreshRemoteSource(sword::InstallSource*)
>>>>> () from
>>>>> /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>> #10 0x00007f1b026ad734 in
>>>>> RepositoryInterface::refreshIndividualRemoteSource(std::__cxx11::basic_string<char,
>>>>> std::char_traits<char>, std::allocator<char> >,
>>>>> std::function<void (unsigned int)>*) ()
>>>>> from
>>>>> /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>> #11 0x00007f1b026b17dd in
>>>>> std::thread::_State_impl<std::thread::_Invoker<std::tuple<int
>>>>> (RepositoryInterface::*)(std::__cxx11::basic_string<char,
>>>>> std::char_traits<char>, std::allocator<char> >,
>>>>> std::function<void (unsigned int)>*), RepositoryInterface*,
>>>>> std::__cxx11::basic_string<char, std::char_traits<char>,
>>>>> std::allocator<char> >, std::function<void (unsigned int)>*> >
>>>>> >::_M_run() ()
>>>>> from
>>>>> /home/tobi/dev/ezra_project/node-sword-interface-git/build/Release/node_sword_interface.node
>>>>> #12 0x00007f1b1d622cb4 in ?? () from
>>>>> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
>>>>> #13 0x00007f1b1e20a609 in start_thread () from
>>>>> /lib/x86_64-linux-gnu/libpthread.so.0
>>>>> #14 0x00007f1b1e131103 in clone () from
>>>>> /lib/x86_64-linux-gnu/libc.so.6
>>>>>
>>>>> On 10/13/20 1:07 PM, Tobias Klein wrote:
>>>>>>
>>>>>> Hi Troy,
>>>>>>
>>>>>> I tested more SVN revisions of SWORD trunk (starting from my
>>>>>> stable version until I hit the bug) and I can now say that
>>>>>>
>>>>>> SVN Rev. 3759 is the last SVN revision that works without
>>>>>> hanging for the below mentioned scenario. (20 out of 20 tests
>>>>>> successful)
>>>>>>
>>>>>> SVN Rev. 3760 is the first SVN revision where the hanging
>>>>>> occurs. The commit message is "First cut at better isolation
>>>>>> of FileIO to FileMgr and providing a WIN32 impl with works
>>>>>> with wchar_t".
>>>>>>
>>>>>> Modified files:
>>>>>> include/filemgr.h
>>>>>> include/swbuf.h
>>>>>> lib/bcppmake/libsword.bpr
>>>>>> src/mgr/curlftpt.cpp
>>>>>> src/mgr/curlhttpt.cpp
>>>>>> src/mgr/filemgr.cpp
>>>>>> src/mgr/installmgr.cpp
>>>>>> src/mgr/swmgr.cpp
>>>>>> src/utilfuns/utilstr.cpp
>>>>>>
>>>>>> Maybe this helps to find the root-cause.
>>>>>>
>>>>>> Best regards,
>>>>>> Tobias
>>>>>>
>>>>>> On 10/12/20 9:20 PM, Tobias Klein wrote:
>>>>>>>
>>>>>>> I'll see whether I can collect a stack trace. It may take
>>>>>>> some time until I have it.
>>>>>>>
>>>>>>> The multi-threaded "remote source refreshing" worked without
>>>>>>> issues until recently.
>>>>>>>
>>>>>>> Here is the code of the function that does the actual work
>>>>>>> in a thread.
>>>>>>> See
>>>>>>> https://github.com/tobias-klein/node-sword-interface/blob/787160ccb4b3bab2a762d22f74031c7237edc803/src/sword_backend/repository_interface.cpp#L105.
>>>>>>>
>>>>>>> intRepositoryInterface::refreshIndividualRemoteSource(stringremoteSourceName,
>>>>>>> std::function<void(unsignedintprogress)>*progressCallback)
>>>>>>> {
>>>>>>> //cout << "Refreshing source " << remoteSourceName << endl
>>>>>>> << flush;
>>>>>>> InstallSource* source= this->getRemoteSource(remoteSourceName);
>>>>>>> intresult= this->_installMgr->refreshRemoteSource(source);
>>>>>>> if(result!= 0) {
>>>>>>> cerr<<"Failed to refresh source
>>>>>>> "<<remoteSourceName<<endl<<flush;
>>>>>>> }
>>>>>>> remoteSourceUpdateMutex.lock();
>>>>>>> this->_remoteSourceUpdateCount++;
>>>>>>> unsignedinttotalPercent=
>>>>>>> (unsignedint)calculateIntPercentage<double>(this->_remoteSourceUpdateCount,
>>>>>>> this->_remoteSourceCount);
>>>>>>> if(progressCallback!= 0) {
>>>>>>> (*progressCallback)(totalPercent);
>>>>>>> }
>>>>>>> remoteSourceUpdateMutex.unlock();
>>>>>>> returnresult;
>>>>>>> }
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Tobias
>>>>>>>
>>>>>>> On 10/12/20 9:01 PM, Troy A. Griffitts wrote:
>>>>>>>> Any luck getting a stack trace on crash?
>>>>>>>>
>>>>>>>> Regarding the "multitheaded mode", I'd have to get a bit
>>>>>>>> more information as to exactly how you are sharing SWORD
>>>>>>>> objects across your threads. Generally, as a rule, you
>>>>>>>> shouldn't. We recommend a separate instance of SWMgr per
>>>>>>>> thread and that probably goes for InstallMgr, as well.
>>>>>>>>
>>>>>>>> Troy
>>>>>>>>
>>>>>>>> On October 12, 2020 8:29:31 PM GMT+02:00, Tobias Klein
>>>>>>>> <contact at tklein.info> wrote:
>>>>>>>>
>>>>>>>> Hi Troy,
>>>>>>>>
>>>>>>>> I'm using curl on all three platforms.
>>>>>>>>
>>>>>>>> Regarding the timeout configuration I have not changed
>>>>>>>> anything yet, to make this configurable in Ezra Project
>>>>>>>> is still on my todo list.
>>>>>>>>
>>>>>>>> I just checked on Linux.
>>>>>>>> With the old version (May 18th 2020) no hanging or
>>>>>>>> crash in 10 out of 10 times.
>>>>>>>> WIth the new version (latest trunk / SWORD 1.9 RC3) I
>>>>>>>> get 1 x crash, 2 x hanging, 7 x working.
>>>>>>>>
>>>>>>>> I'm running the InstallMgr::refreshRemoteSource "in a
>>>>>>>> multi-threaded mode".
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Tobias
>>>>>>>>
>>>>>>>> On 10/12/20 6:59 PM, Troy A. Griffitts wrote:
>>>>>>>>> Hi Tobias,
>>>>>>>>>
>>>>>>>>> What transport library are you building with? ftplib
>>>>>>>>> or curl?
>>>>>>>>>
>>>>>>>>> Have you changed the value of our new timeout from the
>>>>>>>>> default, I believe we decided on, 10 seconds?
>>>>>>>>>
>>>>>>>>> Troy
>>>>>>>>>
>>>>>>>>> On October 12, 2020 6:46:54 PM GMT+02:00, Tobias Klein
>>>>>>>>> <contact at tklein.info> wrote:
>>>>>>>>>
>>>>>>>>> Hi Troy,
>>>>>>>>>
>>>>>>>>> In my latest Ezra Project builds using SWORD trunk I’ve been noticing random „hangs“ and crashes related to "updating remote sources“. I suppose it must be around InstallMgr::refreshRemoteSource.
>>>>>>>>>
>>>>>>>>> This was still rock solid when using SWORD trunk from May 18th 2020, but not so any more with the recent SWORD trunk.
>>>>>>>>>
>>>>>>>>> Unfortunately I cannot pinpoint this more specifically. I just wanted to first share this observation, because it’s worrying me.
>>>>>>>>>
>>>>>>>>> I’ve been noticing this regression both on Windows and macOS. Need to check later whether this also happens on Linux, cannot recall it right now.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Tobias
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>> sword-devel mailing list:sword-devel at crosswire.org
>>>>>>>>> http://crosswire.org/mailman/listinfo/sword-devel <http://crosswire.org/mailman/listinfo/sword-devel>
>>>>>>>>> Instructions to unsubscribe/change your settings at above page
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Sent from my Android device with K-9 Mail. Please
>>>>>>>>> excuse my brevity.
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sent from my Android device with K-9 Mail. Please excuse my
>>>>>>>> brevity.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> sword-devel mailing list:sword-devel at crosswire.org
>>>>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>>>>> Instructions to unsubscribe/change your settings at above page
>>>>>>
>>>>>> _______________________________________________
>>>>>> sword-devel mailing list:sword-devel at crosswire.org
>>>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>>>> Instructions to unsubscribe/change your settings at above page
>>>>>
>>>>> _______________________________________________
>>>>> sword-devel mailing list:sword-devel at crosswire.org
>>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>>> Instructions to unsubscribe/change your settings at above page
>>>>
>>>> _______________________________________________
>>>> sword-devel mailing list:sword-devel at crosswire.org
>>>> http://crosswire.org/mailman/listinfo/sword-devel
>>>> Instructions to unsubscribe/change your settings at above page
>>>
>>>
>>> --
>>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>>
>>> _______________________________________________
>>> sword-devel mailing list:sword-devel at crosswire.org
>>> http://crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
>>
>> _______________________________________________
>> sword-devel mailing list:sword-devel at crosswire.org
>> http://crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://crosswire.org/pipermail/sword-devel/attachments/20201017/21e53603/attachment-0001.html>
More information about the sword-devel
mailing list