[server-admins] MonIt
Troy A. Griffitts
scribe at crosswire.org
Fri Sep 12 12:22:56 EDT 2025
Thanks for investigating this, DM. I had a look today and observed the
same behavior. Watching `tail -f crosswire-access_log | grep study`, I
see almost exclusively bots, a couple of requests each second. We have a
robots.txt file which should prevent them from crawling swordweb
(sadly, since it would be nice if this were searchable in whatever database
these bots are populating), but they don't seem to be obeying it.
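A rough tally of which agents dominate the study/ traffic can help decide what to block. The sketch below assumes crosswire-access_log uses Apache's "combined" LogFormat (an assumption, since the actual LogFormat isn't shown here), and `count_agents` is a hypothetical helper name. Like `grep study`, it does a plain substring match on the request line.

```python
import re
from collections import Counter
from collections.abc import Iterable

# ASSUMPTION: Apache "combined" LogFormat:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_agents(lines: Iterable[str], path_fragment: str = "study") -> Counter:
    """Tally User-Agent strings for requests whose request line contains path_fragment.

    Hypothetical helper; lines that don't match the assumed format are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and path_fragment in m.group("request"):
            counts[m.group("agent")] += 1
    return counts
```

For example, `count_agents(open("crosswire-access_log")).most_common(20)` would list the twenty busiest agents requesting study/ paths, a starting point for deciding what a fail2ban filter should match.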
My attempt to remediate:
I've added a robots.txt entry for the entire top-level study/ folder and
also a `Crawl-delay: 30`. We'll see if they honor any of that.
https://crosswire.org/robots.txt
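Python's standard `urllib.robotparser` can be used to sanity-check how a well-behaved crawler would read rules like these. The rules string below is a hypothetical approximation of the change, not a copy of the live file, and "ExampleBot" is an arbitrary agent name. Note that `Crawl-delay` is not part of RFC 9309 and some major crawlers, Googlebot among them, ignore it; robots.txt is advisory in any case, so bots that don't obey it are unaffected.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL rules approximating the change described above; the live
# https://crosswire.org/robots.txt may contain more (or different) entries.
RULES = """\
User-agent: *
Crawl-delay: 30
Disallow: /study/
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())  # parse() also marks the rules as freshly read

# How a compliant crawler would interpret the rules for an arbitrary agent name.
print(rp.can_fetch("ExampleBot", "https://crosswire.org/study/"))  # False
print(rp.can_fetch("ExampleBot", "https://crosswire.org/"))        # True
print(rp.crawl_delay("ExampleBot"))                                # 30
```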
I've changed the tomcat session timeout to 7 minutes (from 30 minutes),
which will expire the swordorbserver instances from bots much faster.
/home/swordweb/servers/main/conf/web.xml#L582
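A back-of-the-envelope way to see why the shorter timeout should help: if every bot request opens a fresh session that is never reused, each orb server lives roughly one session timeout, so by Little's law the steady-state count is spawn rate times timeout. The inputs come from DM's measurements further down this thread (about 8-10 new swordorbserver processes per 10 seconds, roughly 90,000 KB RSS each); treating them as constant is an assumption, and summing RSS overstates real memory use because shared pages are counted once per process.

```python
# Little's law: L = lambda * W  (items in system = arrival rate * time in system).
# ASSUMPTION: rough figures from this thread, treated as constant.
SPAWN_RATE_PER_S = 9 / 10              # ~8-10 new swordorbserver processes per 10 s
RSS_MIB_PER_PROCESS = 90_000 / 1024    # ~90,000 KiB RSS each, as reported by top


def steady_state(timeout_min: float, rate_per_s: float = SPAWN_RATE_PER_S) -> float:
    """Expected live orb servers if each one lives exactly one session timeout."""
    return rate_per_s * timeout_min * 60


for minutes in (30, 7):
    n = steady_state(minutes)
    # Summing RSS over-counts shared pages, so treat this as an upper estimate.
    gib = n * RSS_MIB_PER_PROCESS / 1024
    print(f"{minutes:>2} min timeout: ~{n:.0f} processes, ~{gib:.0f} GiB summed RSS")
```

With these inputs, a 30-minute timeout works out to roughly 1,620 processes (about 139 GiB of summed RSS, well beyond the ~77 GiB of RAM top reports), versus roughly 378 processes (about 32 GiB) at 7 minutes.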
We may need to start blacklisting agents in fail2ban or spend a bit of
time rethinking the TTL for swordorbserver. It doesn't hurt to nicely
`killall swordorbserver`: if a session can't connect to its orb, it
will spawn a new one. It just takes the extra startup time for sword to
read its library, which is pretty quick on our server anyway. The issue
with bots is that one swordorbserver is spawned per session, and bots never
persist a session, so a new session is opened per request.
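If rethinking the TTL ends up meaning a periodic cleanup rather than a change inside swordorbserver itself, a cron-style reaper is one option. This sketch is hypothetical throughout: the 7-minute TTL simply mirrors the new session timeout, it relies on procps `ps -eo pid=,etimes=,comm=` for each process's elapsed seconds, and it sends SIGTERM, which is what `killall` sends by default. It assumes it runs as the swordweb user and that, as described above, an orb killed under a still-active session is respawned on the next request. Process age is not session idleness, so a long-lived legitimate session would also see its orb restarted.

```python
import os
import signal
import subprocess

TARGET = "swordorbserver"
TTL_SECONDS = 7 * 60  # HYPOTHETICAL TTL, mirroring the 7-minute session timeout


def stale_pids(ps_output: str, name: str = TARGET, ttl: int = TTL_SECONDS) -> list[int]:
    """Return PIDs of `name` processes whose elapsed time exceeds `ttl` seconds.

    Expects "PID ETIMES COMM" lines as printed by `ps -eo pid=,etimes=,comm=`.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        pid, etimes, comm = parts
        if comm.strip() == name and int(etimes) > ttl:
            pids.append(int(pid))
    return pids


def reap(dry_run: bool = True) -> list[int]:
    """SIGTERM stale orb servers (the 'nice' killall); returns the PIDs selected."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,etimes=,comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    victims = stale_pids(out)
    if not dry_run:
        for pid in victims:
            try:
                os.kill(pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # exited between the ps snapshot and the kill
    return victims
```

Calling `reap()` with the default `dry_run=True` only reports what would be signalled, which makes it easy to check the selection before scheduling `reap(dry_run=False)`.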
Let's see how this does,
Troy
On 9/11/25 2:01 AM, DM Smith wrote:
> This time I didn’t fall asleep. The culprit is the swordorbserver
> processes. There were 8-10 created every 10 seconds. I created a loop
> to spit out the count every 10 seconds. The last count before jira was
> killed was 1040.
>
> top - 19:56:20 up 4 days,  4:46,  2 users,  load average: 1.07, 1.06, 1.41
> Tasks: 1504 total,   2 running, 1502 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  3.8 us,  3.1 sy,  0.0 ni, 92.5 id,  0.4 wa,  0.0 hi,  0.1 si,  0.0 st
> MiB Mem :  78860.2 total,    505.7 free,  78156.8 used,    197.8 buff/cache
> MiB Swap:   4883.0 total,      0.0 free,   4883.0 used.     56.6 avail Mem
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> 2939079 jira      20   0   13.1g   1.3g   5804 S   0.0   1.7   3:42.35 java
> 2941639 swordweb  20   0   22.0g 578728   5796 S  18.5   0.7   1:35.40 java
> 2765973 tomcat    20   0   11.0g 229580   3744 S   0.3   0.3   1:07.09 java
>    1886 mysql     20   0 8814624 204424   1972 S   6.6   0.3 255:40.49 mariadbd
> 2930053 crosswi+  20   0   25.5g 107332   6924 S   0.3   0.1   0:28.18 java
>    1530 vmrcre    20   0   18.9g  97908      0 S   0.3   0.1  70:37.28 java
> 2949492 swordweb  20   0  369652  92396  13660 S   0.0   0.1   0:00.43 swordorbserver
> 2949512 swordweb  20   0  370200  91440  13508 S   0.0   0.1   0:00.45 swordorbserver
> 2949528 swordweb  20   0  370200  91348  13420 S   0.0   0.1   0:00.44 swordorbserver
> 2949409 swordweb  20   0  370200  91336  13628 S   0.0   0.1   0:00.46 swordorbserver
>
>> On Sep 10, 2025, at 7:34 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>
>> I watched “top” sorted by RSS and Jira was at the top. The RSS slowly
>> went up to 1.5G and then 1.6 and finally to 1.7. But processes went
>> from 500 or so to over 1200, when I fell asleep watching it die a
>> third time. Lisa alerted me that I had fallen asleep!
>>
>> I noticed that 3 of the top processes, all java, were killed. 2
>> restarted (monit?). The number of processes dropped to around 500 and
>> have been creeping upward and it’s nearly 900 now.
>>
>> I continued to watch and swordweb died and restarted again.
>>
>> Hope this helps.
>>
>> DM
>>
>>> On Sep 10, 2025, at 4:47 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>>
>>> And it died again after a few minutes. I restarted it. Not hopeful.
>>>
>>> — DM
>>>
>>>> On Sep 10, 2025, at 4:17 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>>>
>>>> I don’t know how to triage or fix the underlying problem. I’ve
>>>> restarted it.
>>>>
>>>> Looking at /var/log/messages, it is a machine OOM. Same as what
>>>> Troy saw before. I’m guessing that the “OOM Reaper” is picking the
>>>> biggest memory hogs and killing them. Those are all java processes.
>>>>
>>>> There are many, many (~200) swordorbserver processes, each taking a
>>>> paltry 90,000 KB. Perhaps these are the culprits? Is there a bound
>>>> on the pool of swordorbservers?
>>>>
>>>> I also noted that:
>>>> When the large java process (which belongs to Jira) is killed,
>>>> many MariaDB connections held by Jira are terminated.
>>>> Jira needs to be updated from its current outdated version. Perhaps
>>>> the newer version has a better memory footprint?
>>>>
>>>> — DM
>>>>
>>>>> On Sep 10, 2025, at 8:11 AM, Karl Kleinpaste <karl at kleinpaste.org>
>>>>> wrote:
>>>>>
>>>>> On 9/9/25 3:57 PM, DM Smith wrote:
>>>>>> I restarted it.
>>>>>
>>>>> And it's dead again.
>>>>>
>>>>> Something is evidently more seriously wrong than a mere need to
>>>>> restart.
>>>>> _______________________________________________
>>>>> server-admins mailing list
>>>>> server-admins at crosswire.org
>>>>> https://crosswire.org/mailman/listinfo/server-admins