[server-admins] MonIt

Fri Sep 12 12:22:56 EDT 2025

Thanks for investigating this, DM.  I had a look today and observed the 
same behavior.  Watching `tail -f crosswire-access_log|grep study`, I 
see almost exclusively bots, a couple of requests each second.  We have a 
robots.txt file which should prevent them from crawling swordweb 
(sadly... it would be nice if this were searchable in whatever database 
these bots are populating) but they don't seem to be obeying.
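
To see which agents are responsible, here is a rough sketch.  It assumes 
the access log uses Apache's "combined" format, where the user agent is 
the sixth double-quote-delimited field; `top_study_agents` is just a name 
made up for illustration.

```shell
# Count requests per user agent for URLs matching "study".
# ASSUMPTION: Apache "combined" log format; adjust the awk field if the
# server's LogFormat differs.
top_study_agents() {
    grep 'study' \
        | awk -F'"' '{ print $6 }' \
        | sort \
        | uniq -c \
        | sort -rn \
        | head -n "${1:-20}"
}

# Usage (log file name as in the tail command above):
#   top_study_agents 10 < crosswire-access_log
```

The busiest agents come out first, which would give us candidates if we 
end up blocking by agent.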

My attempt to remediate:

I've added a robots.txt entry for the entire top level study/ folder and 
also a `Crawl-delay: 30`.  We'll see if they honor any of that.

https://crosswire.org/robots.txt

I've changed the tomcat session timeout to 7 minutes (from 30 minutes), 
which should expire the swordorbserver instances spawned for bots much 
sooner.

/home/swordweb/servers/main/conf/web.xml#L582

We may need to start blacklisting agents in fail2ban or spend a bit of 
time rethinking the TTL for swordorbserver.  It doesn't hurt to nicely 
`killall swordorbserver`.  If a session can't connect to its orb, it 
will spawn a new one; it just takes the extra startup time for sword to 
read its library, which is pretty quick on our server anyway.  The issue 
with bots is that swordorbserver spawns per session, and bots never 
persist a session, so every request opens a new session and with it a 
new swordorbserver.
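
To watch whether the changes help, something like the following could 
report how many swordorbserver processes exist and how much resident 
memory they hold.  This is a sketch, not something installed on the 
server: `orb_usage` is a made-up helper name, and it assumes a procps 
`ps` that accepts `-eo rss=,comm=`.  RSS counts shared pages once per 
process, so the total overstates real usage somewhat.

```shell
# Print the number of matching processes and their combined RSS in MiB.
# Reads "RSS_KB COMMAND" lines on stdin, e.g. from: ps -eo rss=,comm=
# The process name defaults to swordorbserver.
orb_usage() {
    awk -v name="${1:-swordorbserver}" '
        $2 == name { n++; kb += $1 }
        END { printf "%d procs, %.0f MiB\n", n, kb / 1024 }
    '
}

# Sample every 10 seconds (Ctrl-C to stop):
#   while true; do ps -eo rss=,comm= | orb_usage; sleep 10; done
```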

Let's see how this does,

Troy

On 9/11/25 2:01 AM, DM Smith wrote:
> This time I didn’t fall asleep. The culprit is the swordorbserver 
> processes. There were 8-10 created every 10 seconds. I created a loop 
> to spit out the count every 10 seconds. The last count before jira was 
> killed was 1040.
>
>
> top - 19:56:20 up 4 days,  4:46,  2 users,  load average: 1.07, 1.06, 1.41
> Tasks: 1504 total,   2 running, 1502 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  3.8 us,  3.1 sy,  0.0 ni, 92.5 id,  0.4 wa,  0.0 hi,  0.1 si,  0.0 st
> MiB Mem :  78860.2 total,    505.7 free,  78156.8 used,    197.8 buff/cache
> MiB Swap:   4883.0 total,      0.0 free,   4883.0 used.     56.6 avail Mem
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> 2939079 jira      20   0   13.1g   1.3g   5804 S   0.0   1.7   3:42.35 java
> 2941639 swordweb  20   0   22.0g 578728   5796 S  18.5   0.7   1:35.40 java
> 2765973 tomcat    20   0   11.0g 229580   3744 S   0.3   0.3   1:07.09 java
>    1886 mysql     20   0 8814624 204424   1972 S   6.6   0.3 255:40.49 mariadbd
> 2930053 crosswi+  20   0   25.5g 107332   6924 S   0.3   0.1   0:28.18 java
>    1530 vmrcre    20   0   18.9g  97908      0 S   0.3   0.1  70:37.28 java
> 2949492 swordweb  20   0  369652  92396  13660 S   0.0   0.1   0:00.43 swordorbserver
> 2949512 swordweb  20   0  370200  91440  13508 S   0.0   0.1   0:00.45 swordorbserver
> 2949528 swordweb  20   0  370200  91348  13420 S   0.0   0.1   0:00.44 swordorbserver
> 2949409 swordweb  20   0  370200  91336  13628 S   0.0   0.1   0:00.46 swordorbserver
>
>
>
>> On Sep 10, 2025, at 7:34 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>
>> I watched “top” sorted by RSS and Jira was at the top. The RSS slowly 
>> went up to 1.5G and then 1.6 and finally to 1.7. But processes went 
>> from 500 or so to over 1200, when I fell asleep watching it die a 
>> third time. Lisa alerted me that I had fallen asleep!
>>
>> I noticed that 3 of the top processes, all java, were killed. 2 
>> restarted (monit?). The number of processes dropped to around 500 and 
>> have been creeping upward and it’s nearly 900 now.
>>
>> I continued to watch and swordweb died and restarted again.
>>
>> Hope this helps.
>>
>> DM
>>
>>> On Sep 10, 2025, at 4:47 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>> And it died again after a few minutes. I restarted it. Not hopeful.
>>>
>>> — DM
>>>
>>>> On Sep 10, 2025, at 4:17 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>>>
>>>> I don’t know how to triage or fix the underlying problem. I’ve 
>>>> restarted it.
>>>>
>>>> Looking at /var/log/messages, it is a machine OOM. Same as what 
>>>> Troy saw before. I’m guessing that the “OOM Reaper” is picking the 
>>>> biggest memory hogs and killing them. Those are all java processes.
>>>>
>>>> There are many, many (~200) sword observer processes each taking a 
>>>> paltry 90,000 KB. Perhaps these are the culprits? Is there a bound 
>>>> on the pool of sword observers?
>>>>
>>>> I also noted that:
>>>> When the large java process is killed (which belongs to Jira) that 
>>>> many mariadb connections by jira are terminated.
>>>> Jira needs to be updated from its current outdated version. Perhaps 
>>>> the newer version has a better memory footprint?
>>>>
>>>> — DM
>>>>
>>>>> On Sep 10, 2025, at 8:11 AM, Karl Kleinpaste <karl at kleinpaste.org> 
>>>>> wrote:
>>>>>
>>>>> On 9/9/25 3:57 PM, DM Smith wrote:
>>>>>> I restarted it.
>>>>>
>>>>> And it's dead again.
>>>>>
>>>>> Something is evidently more seriously wrong than a mere need to 
>>>>> restart.
>>>>> _______________________________________________
>>>>> server-admins mailing list
>>>>> server-admins at crosswire.org
>>>>> https://crosswire.org/mailman/listinfo/server-admins