Monitoring Engine Not Working on DR Server (ramdisk errors)

Posted: Wed Oct 16, 2024 11:50 am
by vornado
Hello all,

On our Nagios disaster recovery (DR) server, we normally keep the monitoring engine turned off (Admin | System Status, click icon to start or stop). This morning I updated the server to Nagios 2024R1.3 and thought it would be a good idea to check that Nagios was working properly. I checked the host and service status pages and found that the 'Last Checked' date was not updating.

Checking status with systemctl gives this:

[root@C155MNAG02 ~]# systemctl status nagios
● nagios.service - Nagios Core 4.5.3
     Loaded: loaded (/usr/lib/systemd/system/nagios.service; disabled; preset: disabled)
     Active: active (running) since Wed 2024-10-16 12:26:48 EDT; 10min ago
       Docs: https://www.nagios.org/documentation
    Process: 6820 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
    Process: 6821 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
   Main PID: 6822 (nagios)
      Tasks: 6 (limit: 48799)
     Memory: 24.4M
        CPU: 1.609s
     CGroup: /system.slice/nagios.service
             ├─6822 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
             ├─6823 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
             ├─6824 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
             ├─6825 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
             ├─6826 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
             └─6842 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Oct 16 12:36:08 C155MNAG02.vornadort.com nagios[6822]: Error: Unable to rename file '/usr/local/nagios/var/nagios.tmpOEn8Gf' to '/ramdisk/status.dat': Permission denied
Oct 16 12:36:08 C155MNAG02.vornadort.com nagios[6822]: Error: Unable to update status data file '/ramdisk/status.dat': Permission denied
Oct 16 12:36:18 C155MNAG02.vornadort.com nagios[6822]: Error: Unable to rename file '/usr/local/nagios/var/nagios.tmpsXf9v9' to '/ramdisk/status.dat': Permission denied
Oct 16 12:36:18 C155MNAG02.vornadort.com nagios[6822]: Error: Unable to update status data file '/ramdisk/status.dat': Permission denied
Oct 16 12:36:28 C155MNAG02.vornadort.com nagios[6822]: Error: Unable to rename file '/usr/local/nagios/var/nagios.tmpA4ar8A' to '/ramdisk/status.dat': Permission denied
Oct 16 12:36:28 C155MNAG02.vornadort.com nagios[6822]: Error: Unable to update status data file '/ramdisk/status.dat': Permission denied
Oct 16 12:36:38 C155MNAG02.vornadort.com nagios[6822]: Error: Unable to rename file '/usr/local/nagios/var/nagios.tmphYGgpE' to '/ramdisk/status.dat': Permission denied
Oct 16 12:36:38 C155MNAG02.vornadort.com nagios[6822]: Error: Unable to update status data file '/ramdisk/status.dat': Permission denied
Oct 16 12:36:48 C155MNAG02.vornadort.com nagios[6822]: Error: Unable to rename file '/usr/local/nagios/var/nagios.tmpqFHViX' to '/ramdisk/status.dat': No such file or directory
Oct 16 12:36:48 C155MNAG02.vornadort.com nagios[6822]: Error: Unable to update status data file '/ramdisk/status.dat': No such file or directory

Checking ramdisk.service status, I get this:

[root@C155MNAG02 ~]# systemctl status ramdisk
● ramdisk.service - Ramdisk
     Loaded: loaded (/usr/lib/systemd/system/ramdisk.service; enabled; preset: disabled)
     Active: active (exited) since Wed 2024-10-16 12:06:15 EDT; 43min ago
    Process: 870 ExecStartPre=/usr/bin/mkdir -p -m 775 /var/nagiosramdisk /var/nagiosramdisk/tmp /var/nagiosramdisk/spool /var/nagiosramdisk/spool/checkresults /var/nagiosramdisk/spool/xidp>
    Process: 875 ExecStartPre=/usr/bin/mount -t tmpfs -o size=500m tmpfs /var/nagiosramdisk (code=exited, status=0/SUCCESS)
    Process: 876 ExecStartPre=/usr/bin/mkdir -p -m 775 /var/nagiosramdisk /var/nagiosramdisk/tmp /var/nagiosramdisk/spool /var/nagiosramdisk/spool/checkresults /var/nagiosramdisk/spool/xidp>
    Process: 879 ExecStart=/usr/bin/chown -R nagios:nagios /var/nagiosramdisk (code=exited, status=0/SUCCESS)
   Main PID: 879 (code=exited, status=0/SUCCESS)
        CPU: 8ms

I can run immediate checks, and 'Last Check' updates accordingly. I also updated our Nagios development server; it is working fine and does not show these errors. The servers were recently migrated from CentOS to RHEL 9 (finished last July).
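
One detail in the output above: the errors refer to '/ramdisk/status.dat', while ramdisk.service mounts its tmpfs at /var/nagiosramdisk. Whether that mismatch is the actual cause is not confirmed here. The following is a minimal sketch for checking where the config points status_file and whether that directory exists. The function name status_dir_report is hypothetical, and the default config path is simply the one shown in the systemctl output.

```shell
# Hypothetical helper (not part of Nagios): report the status_file path
# configured in a nagios.cfg and whether its parent directory exists.
# The default path is an assumption taken from the systemctl output above.
status_dir_report() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg"
        return 2
    fi
    # Use the last uncommented status_file= line in the config
    status_file=$(sed -n 's/^status_file=//p' "$cfg" | tail -n 1)
    if [ -z "$status_file" ]; then
        echo "no status_file directive in $cfg"
        return 3
    fi
    dir=$(dirname "$status_file")
    if [ -d "$dir" ]; then
        echo "status_file=$status_file dir=$dir exists"
        # Show owner, group and mode so they can be compared with the nagios user
        ls -ld "$dir"
    else
        echo "status_file=$status_file dir=$dir missing"
        return 1
    fi
}
```

Running `status_dir_report` with no argument checks the default config; passing another path, such as a copy of the config from the working development server, allows a side-by-side comparison.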

Any assistance resolving this issue would truly be appreciated. Please let me know if you need additional information.

Thank you and best regards,

Steve

Re: Monitoring Engine Not Working on DR Server (ramdisk errors)

Posted: Wed Oct 16, 2024 2:51 pm
by gwesterman
Hi @vornado,

Based on this thread, here are some troubleshooting ideas:
1. Do you have a /var/run/nagios folder?
2. If not, try restarting nagios and see if it shows up. If it still doesn't show up, try just creating it.
3. If yes, what are the permissions?
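
The checks in the list above could be run along these lines. This is only a sketch: the helper name show_dir_perms is made up for illustration, and /var/run/nagios is the only path taken from the steps themselves.

```shell
# Hypothetical helper: print owner, group and mode of a directory via ls -ld,
# or report that it is missing. Defaults to the folder named in the steps above.
show_dir_perms() {
    dir="${1:-/var/run/nagios}"
    if [ -d "$dir" ]; then
        ls -ld "$dir"
    else
        echo "$dir: missing"
        return 1
    fi
}
```

Calling `show_dir_perms /var/run/nagios` before and after a restart of the nagios service shows whether the folder appears on its own.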

Let us know what you find.

Thank you!

Re: Monitoring Engine Not Working on DR Server (ramdisk errors)

Posted: Thu Oct 17, 2024 11:33 am
by vornado
Thank you for your reply, @gwesterman.

None of our servers have the /var/run/nagios folder - not even the ones that are working. But I went ahead and created it on our DR server and restarted Nagios. Still not working.

/var/run/nagios folder that I created:

drwxr-xr-x 2 nagios nagios 40 Oct 17 09:22 nagios


I also created a /ramdisk folder which got rid of a different error message that I was getting from systemctl status.

I made a few changes to get rid of other systemctl status warnings and errors, so I'm now down to this:

[root@C155MNAG02 nagios]# systemctl status nagios
● nagios.service - Nagios Core 4.5.3
     Loaded: loaded (/usr/lib/systemd/system/nagios.service; disabled; preset: disabled)
     Active: active (running) since Thu 2024-10-17 11:47:17 EDT; 44min ago
       Docs: https://www.nagios.org/documentation
    Process: 337636 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
    Process: 337637 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
   Main PID: 337638 (nagios)
      Tasks: 6 (limit: 48799)
     Memory: 41.7M
        CPU: 5.779s
     CGroup: /system.slice/nagios.service
             ├─337638 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
             ├─337639 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
             ├─337640 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
             ├─337641 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
             ├─337642 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
             └─337657 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Oct 17 11:47:17 C155MNAG02.vornadort.com nagios[337638]: qh: core query handler registered
Oct 17 11:47:17 C155MNAG02.vornadort.com nagios[337638]: qh: echo service query handler registered
Oct 17 11:47:17 C155MNAG02.vornadort.com nagios[337638]: qh: help for the query handler registered
Oct 17 11:47:17 C155MNAG02.vornadort.com nagios[337638]: wproc: Successfully registered manager as @wproc with query handler
Oct 17 11:47:17 C155MNAG02.vornadort.com nagios[337638]: wproc: Registry request: name=Core Worker 337639;pid=337639
Oct 17 11:47:17 C155MNAG02.vornadort.com nagios[337638]: wproc: Registry request: name=Core Worker 337641;pid=337641
Oct 17 11:47:17 C155MNAG02.vornadort.com nagios[337638]: wproc: Registry request: name=Core Worker 337642;pid=337642
Oct 17 11:47:17 C155MNAG02.vornadort.com nagios[337638]: wproc: Registry request: name=Core Worker 337640;pid=337640
Oct 17 11:47:17 C155MNAG02.vornadort.com nagios[337638]: Event broker module '/usr/local/nagios/bin/ndo.so' initialized successfully.
Oct 17 11:47:17 C155MNAG02.vornadort.com nagios[337638]: Successfully launched command file worker with pid 337657

Re: Monitoring Engine Not Working on DR Server (ramdisk errors)

Posted: Fri Oct 18, 2024 12:02 pm
by gwesterman
Hi @vornado,

As far as I can tell, there are no errors in the code block you shared. Are you still having issues? I am not sure what still needs resolving.

Let me know if you are still looking for assistance or if your issue has been properly addressed.

Thank you!

Re: Monitoring Engine Not Working on DR Server (ramdisk errors)

Posted: Wed Oct 23, 2024 10:14 am
by vornado
Thanks for your help. I checked the server this morning and resolved a minor issue with a command file. Everything seems to be OK now. This topic can be closed, but I'm about to open another one :(