consistent start-stop script to nagiosxi after upgrade 5.5.1
Posted: Fri Aug 17, 2018 7:05 am
by junkertf
Hello,
I have run into an interesting situation after a successful upgrade from 5.4.12 to 5.5.1 (actually 5.5.2 now).
On 5.4 I used the following sequence to stop the Nagios XI processes on a RHEL 7 platform (and the reverse, from bottom to top, to start them):
service nagiosxi stop
sleep 3
service npcd stop
sleep 3
service ndo2db stop
sleep 3
service nagios stop
sleep 3
#service postgresql stop
service mariadb stop
sleep 3
service httpd stop
On 5.4 this worked well. A few days ago I restarted our Nagios XI instance with the same script and found that some of the processes are not running properly, which is irritating (picture attached).
Starting them from the GUI succeeded, so at the moment I do not understand why the script fails.
Can you help me find what I did wrong, or where I can look for the failure?
Thank you, best regards,
Ferenc
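The "same list, reversed for start" idea from the post above can be kept in one place so the two orders cannot drift apart. This is only a sketch: the service names are taken from the stop sequence in the post, while control_all, reverse_words and the RUN dry-run switch are hypothetical names introduced here. The pauses between steps are left out.

```shell
# Stop order as listed in the post; the start order is derived by reversing it.
STOP_ORDER="nagiosxi npcd ndo2db nagios mariadb httpd"

# Print the given words in reverse order on one line.
reverse_words() {
    reversed=""
    for word in "$@"; do
        reversed="$word${reversed:+ $reversed}"
    done
    printf '%s\n' "$reversed"
}

# control_all start|stop
# Hypothetical switch: if RUN is unset, commands are only printed (dry run).
# Set RUN to an empty string to really call "service".
control_all() {
    action=$1
    case "$action" in
        stop)  order=$STOP_ORDER ;;
        start) order=$(reverse_words $STOP_ORDER) ;;
        *) echo "usage: control_all start|stop" >&2; return 2 ;;
    esac
    for svc in $order; do
        ${RUN-echo} service "$svc" "$action"
    done
}
```

Calling control_all start with RUN unset prints the commands in the order httpd, mariadb, nagios, ndo2db, npcd, nagiosxi, so the order can be checked before anything is actually restarted.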
Re: consistent start-stop script to nagiosxi after upgrade 5
Posted: Fri Aug 17, 2018 12:40 pm
by ssax
Please attach the output of these commands:
I believe the init script (/etc/init.d/nagios) waits up to 90 seconds for the nagios process to stop, as the excerpt below shows. If you don't wait until it has stopped, you can end up with duplicate processes:
Code:
# now we have to wait for nagios to exit and remove its
# own NagiosRunFile, otherwise a following "start" could
# happen, and then the exiting nagios will remove the
# new NagiosRunFile, allowing multiple nagios daemons
# to (sooner or later) run - John Sellens
#echo -n 'Waiting for nagios to exit .'
for i in {1..90}; do
    if status_nagios > /dev/null; then
        echo -n '.'
        sleep 1
    else
        break
    fi
done
if status_nagios > /dev/null; then
    echo ""
    echo "Warning - nagios did not exit in a timely manner - Killing it!"
    killproc_nagios KILL
else
    echo "done."
fi
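The same wait-until-gone idea can be used in a custom stop script instead of a fixed sleep 3. A minimal sketch, assuming the PID of the daemon is already known (for example, read from its lock file before stopping it); wait_for_pid_exit is a hypothetical helper name and not part of Nagios XI:

```shell
# wait_for_pid_exit PID [TIMEOUT_SECONDS]
# Polls once per second until the process is gone.
# Returns 0 if the process has exited, 1 if it is still alive after the timeout.
wait_for_pid_exit() {
    pid=$1
    timeout=${2:-90}   # default mirrors the 90 second loop in the init script
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # kill -0 sends no signal; it only tests whether the PID exists
        if ! kill -0 "$pid" 2>/dev/null; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Final check after the timeout has elapsed
    ! kill -0 "$pid" 2>/dev/null
}
```

In a stop script this could follow service nagios stop, for example wait_for_pid_exit "$nagios_pid" 90 || echo "nagios is still running", where nagios_pid is whatever PID was recorded beforehand.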
Re: consistent start-stop script to nagiosxi after upgrade 5
Posted: Fri Aug 17, 2018 12:41 pm
by ssax
In addition to my previous post, is NPCD running?
If it's not running:
Re: consistent start-stop script to nagiosxi after upgrade 5
Posted: Mon Aug 20, 2018 11:40 pm
by junkertf
Hello,
On a running instance it looks like this:
[root@naigos ~]# service npcd status
NPCD running (pid 8423).
[root@nagios ~]# ps aux | grep nagios.cfg
nagios 8372 0.0 0.0 54044 5056 ? Ss Aug17 0:48 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 8380 0.0 0.0 53528 1376 ? S Aug17 0:10 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 25657 0.0 0.0 112708 996 pts/1 S+ 06:24 0:00 grep --color=auto nagios.cfg
[root@nagios ~]# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0xee000002 0 nagios 600 0 0
0x36000002 32769 nagios 600 0 0
After the stop:
[root@nagios ~]# service npcd status
NPCD not running.
[root@nagios ~]# ps aux | grep nagios.cfg
root 26253 0.0 0.0 112704 992 pts/1 S+ 06:26 0:00 grep --color=auto nagios.cfg
[root@nagios ~]# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0xee000002 0 nagios 600 0 0
0x36000002 32769 nagios 600 0 0
Then, starting the instance:
[root@nagios ~]# /root/nagiosxi_full_start.sh
Redirecting to /bin/systemctl start httpd.service
Redirecting to /bin/systemctl start mariadb.service
Starting nagios: done.
Starting ndo2db (via systemctl): [ OK ]
NPCD started.
[root@nagios ~]# service npcd status
NPCD running (pid 27054).
[root@nagios ~]# ps aux | grep nagios.cfg
nagios 26917 0.0 0.0 54044 2952 ? Ss 06:28 0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 26931 0.0 0.0 53528 1376 ? S 06:28 0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 27197 0.0 0.0 112704 996 pts/1 S+ 06:28 0:00 grep --color=auto nagios.cfg
[root@nagios ~]# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0xee000002 0 nagios 600 0 0
0x36000002 32769 nagios 600 0 0
0xe3000002 65538 nagios 600 0 0
The GUI still shows that the monitoring engine is not started, and the same for the performance grapher...
I am also experiencing the problem shown in the attached picture...
Best regards,
Ferenc
Re: consistent start-stop script to nagiosxi after upgrade 5
Posted: Tue Aug 21, 2018 1:46 am
by junkertf
Hello again,
The last error (check_icmp) is solved (changed the ownership and set the setuid bit).
The current situation raises another question: where can the automatic updates of Nagios XI be disabled? (Not the update-available check!)
Best regards,
Ferenc
Re: consistent start-stop script to nagiosxi after upgrade 5
Posted: Tue Aug 21, 2018 7:31 am
by junkertf
And hello again
I also found the reason behind the newest-version question!
Currently only my question about the Performance Grapher and Monitoring Engine is unanswered.
Best regards,
Ferenc
Re: consistent start-stop script to nagiosxi after upgrade 5
Posted: Tue Aug 21, 2018 9:01 am
by ssax
It looks like you have too many kernel message queues (check with ipcs -q; you should only have one). Please run these commands to fix it:
Code:
service nagios stop
service ndo2db stop
for i in $(ipcs -q | grep nagios | awk '{print $2}'); do ipcrm -q "$i"; done
service ndo2db start
service nagios start
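To confirm the cleanup worked, the number of queues owned by nagios can be counted before and after. A small sketch, assuming the owner is the third column as in the ipcs -q output quoted earlier in the thread; count_nagios_queues is a hypothetical helper name:

```shell
# Reads "ipcs -q" output on stdin and prints how many queues are owned by nagios.
# Matching on the owner column avoids counting unrelated lines that merely
# contain the string "nagios".
count_nagios_queues() {
    awk '$3 == "nagios" { n++ } END { print n + 0 }'
}

# Typical use on the server:
#   ipcs -q | count_nagios_queues
```

With the last ipcs -q listing from the thread as input this prints 3, where a single queue is expected.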
Please PM me a copy of your profile so that we can review your settings, you can download it from Admin > System Profile > Download Profile.
Re: consistent start-stop script to nagiosxi after upgrade 5
Posted: Wed Aug 22, 2018 1:57 am
by junkertf
Hello,
I have sent you the system profiles; the issue occurs on both our UAT (5.5.1) and PROD (5.5.2) environments.
I tried stopping all the services and cleaning out the ipcs message queues, but the result is the same: the Performance Grapher and Monitoring Engine still do not start correctly...
Thanks, best regards,
Ferenc
Re: consistent start-stop script to nagiosxi after upgrade 5
Posted: Wed Aug 22, 2018 3:10 pm
by ssax
Received, looking at them now.
Re: consistent start-stop script to nagiosxi after upgrade 5
Posted: Wed Aug 22, 2018 3:22 pm
by ssax
Please send a copy of your /etc/sudoers and the output of these commands:
Code:
grep nag /etc/group
chage -l nagios
grep "User\|Group" /etc/httpd/conf/httpd.conf
Additionally, are your servers AD/LDAP integrated?
And just to clarify, the only issue is that it's showing as red in the interface for the component status, correct?