consistent start-stop script to nagiosxi after upgrade 5.5.1

junkertf · Post by **junkertf** » Fri Aug 17, 2018 7:05 am

Hello,

I ran into an interesting situation after a successful upgrade from 5.4.12 to 5.5.1 (actually 5.5.2 now).

On 5.4 I used the following sequence to stop the nagiosxi processes (and the reverse, from bottom to top, to start them) on a RHEL 7 platform:

service nagiosxi stop
sleep 3
service npcd stop
sleep 3
service ndo2db stop
sleep 3
service nagios stop
sleep 3
#service postgresql stop
service mariadb stop
sleep 3
service httpd stop
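
The stop order above and its reverse for starting can be driven from one list. This is only a minimal sketch: `reverse_words`, `run_all` and the `DRY_RUN` switch are hypothetical helpers made up for illustration, not part of Nagios XI, and the `sleep 3` pauses are left out.

```shell
# Stop order as listed in this post; the start order is its reverse.
SERVICES="nagiosxi npcd ndo2db nagios mariadb httpd"

# reverse_words (hypothetical helper): print the arguments in reverse order.
reverse_words() {
    local out=""
    local w
    for w in "$@"; do
        out="$w${out:+ $out}"
    done
    printf '%s\n' "$out"
}

# run_all <action> <service...> (hypothetical helper):
# run `service <name> <action>` for each service, in the order given.
# With DRY_RUN=1 the commands are only printed, not executed.
run_all() {
    local action=$1
    shift
    local s
    for s in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "service $s $action"
        else
            service "$s" "$action"
        fi
    done
}

# Example (commented out so that sourcing this file changes nothing):
# run_all stop $SERVICES
# run_all start $(reverse_words $SERVICES)
```

Running it once with `DRY_RUN=1` shows the exact commands before anything is stopped.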

In 5.4 it worked well. A few days ago I restarted our nagiosxi instance with that same script and found that some of the processes are not running properly, which is irritating... (picture attached)

Starting them from the GUI succeeded, so I currently don't understand why the script fails.
Can you help me find what I did wrong, or where I can look for the failure?

Thank you, best regards,

Ferenc

ssax · Post by **ssax** » Fri Aug 17, 2018 12:40 pm

Please attach the output of these commands:

Code: Select all

ps aux | grep nagios.cfg
ipcs -q

I believe the init script (/etc/init.d/nagios) waits up to 90 seconds for the nagios process to stop; please see the excerpt below. You can get duplicate processes if you don't wait until it has stopped:

Code: Select all

		# now we have to wait for nagios to exit and remove its
		# own NagiosRunFile, otherwise a following "start" could
		# happen, and then the exiting nagios will remove the
		# new NagiosRunFile, allowing multiple nagios daemons
		# to (sooner or later) run - John Sellens
		#echo -n 'Waiting for nagios to exit .'
		for i in {1..90}; do
			if status_nagios > /dev/null; then
				echo -n '.'
				sleep 1
			else
				break
			fi
		done
		if status_nagios > /dev/null; then
			echo ""
			echo "Warning - nagios did not exit in a timely manner - Killing it!"
			killproc_nagios KILL
		else
			echo "done."
		fi
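
A custom stop script can apply the same idea in place of a fixed `sleep 3`: poll until the process is really gone, up to a timeout. The following is a rough sketch under assumptions. `wait_while` is a hypothetical helper, not something shipped with Nagios XI, and the `pgrep` pattern in the commented example is taken from the usual `/usr/local/nagios/bin/nagios -d` command line.

```shell
# wait_while <timeout-seconds> <command...>   (hypothetical helper)
# Polls once per second for as long as <command> succeeds ("still running").
# Returns 0 as soon as <command> fails, 1 if it still succeeds after the timeout.
wait_while() {
    local timeout=$1
    shift
    local elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        "$@" > /dev/null 2>&1 || return 0
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # One last check after the final sleep; invert so "gone" means success.
    ! "$@" > /dev/null 2>&1
}

# Example (commented out so that sourcing this file changes nothing):
# service nagios stop
# wait_while 90 pgrep -f '/usr/local/nagios/bin/nagios -d' \
#     || echo "nagios is still running after 90 seconds" >&2
```

Only continue with the next service once `wait_while` returns 0.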

ssax · Post by **ssax** » Fri Aug 17, 2018 12:41 pm

In addition to my previous post, is NPCD running?

Code: Select all

service npcd status

If it's not running:

Code: Select all

service npcd start

junkertf · Post by **junkertf** » Mon Aug 20, 2018 11:40 pm

Hello,

On a running instance it looks like this:

[root@naigos ~]# service npcd status
NPCD running (pid 8423).
[root@nagios ~]# ps aux | grep nagios.cfg
nagios 8372 0.0 0.0 54044 5056 ? Ss Aug17 0:48 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 8380 0.0 0.0 53528 1376 ? S Aug17 0:10 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 25657 0.0 0.0 112708 996 pts/1 S+ 06:24 0:00 grep --color=auto nagios.cfg
[root@nagios ~]# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0xee000002 0 nagios 600 0 0
0x36000002 32769 nagios 600 0 0

after stop

[root@nagios ~]# service npcd status
NPCD not running.
[root@nagios ~]# ps aux | grep nagios.cfg
root 26253 0.0 0.0 112704 992 pts/1 S+ 06:26 0:00 grep --color=auto nagios.cfg
[root@nagios ~]# ipcs -q

------ Message Queues --------
key msqid owner perms used-bytes messages
0xee000002 0 nagios 600 0 0
0x36000002 32769 nagios 600 0 0

then starting the instance

[root@nagios ~]# /root/nagiosxi_full_start.sh
Redirecting to /bin/systemctl start httpd.service
Redirecting to /bin/systemctl start mariadb.service
Starting nagios: done.
Starting ndo2db (via systemctl): [ OK ]
NPCD started.
[root@nagios ~]# service npcd status
NPCD running (pid 27054).
[root@nagios ~]# ps aux | grep nagios.cfg
nagios 26917 0.0 0.0 54044 2952 ? Ss 06:28 0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 26931 0.0 0.0 53528 1376 ? S 06:28 0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 27197 0.0 0.0 112704 996 pts/1 S+ 06:28 0:00 grep --color=auto nagios.cfg
[root@nagios ~]# ipcs -q

------ Message Queues --------
key msqid owner perms used-bytes messages
0xee000002 0 nagios 600 0 0
0x36000002 32769 nagios 600 0 0
0xe3000002 65538 nagios 600 0 0

The GUI still shows that the monitoring engine is not started, and the same for the performance grapher...
I also see the problem shown in the attached picture...

Best regards,

Ferenc

junkertf · Post by **junkertf** » Tue Aug 21, 2018 1:46 am

Hello again,

That last error (check_icmp) is solved (ownership and setuid bit changed).

The current situation raises another question: where can the automatic updates of nagiosxi be disabled? (Not the "update available" check!)

Best regards,

Ferenc

junkertf · Post by **junkertf** » Tue Aug 21, 2018 7:31 am

And hello again

I also found the answer to the newest-version question!
Currently only my questions about the Performance Grapher and Monitoring Engine are unanswered.

Best regards,

Ferenc

ssax · Post by **ssax** » Tue Aug 21, 2018 9:01 am

Looks like you have too many kernel message queues (per ipcs -q; you should only have one). Please run these commands to fix it:

Code: Select all

service nagios stop
service ndo2db stop
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
service ndo2db start
service nagios start

Please PM me a copy of your profile so that we can review your settings, you can download it from Admin > System Profile > Download Profile.
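
To check which queues the cleanup loop above would remove, the `ipcs -q` output can be filtered on the owner column only, so that a stray "nagios" elsewhere in a line does not match. This is just a sketch: `queue_ids_for` is a hypothetical helper name, and it assumes the column layout shown earlier in this thread (key, msqid, owner, ...).

```shell
# queue_ids_for [owner]   (hypothetical helper)
# Reads `ipcs -q` output on stdin and prints the msqid of every queue
# whose owner column equals the given user (default: nagios).
queue_ids_for() {
    local owner=${1:-nagios}
    awk -v owner="$owner" '$3 == owner && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Example (commented out so that sourcing this file changes nothing):
# ipcs -q | queue_ids_for nagios                  # list only
# ipcs -q | queue_ids_for nagios | while read -r id; do ipcrm -q "$id"; done
```

As in the commands above, nagios and ndo2db should be stopped before any queue is removed.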

junkertf · Post by **junkertf** » Wed Aug 22, 2018 1:57 am

Hello,

I have sent you the system profiles; the issue occurs in both our UAT (5.5.1) and PROD (5.5.2) environments.

I tried stopping all the services and cleaning out the ipcs queues, but the result is the same: Performance Grapher and Monitoring Engine still do not start correctly...

Thanks, best regards,

Ferenc

ssax · Post by **ssax** » Wed Aug 22, 2018 3:10 pm

Received, looking at them now.

ssax · Post by **ssax** » Wed Aug 22, 2018 3:22 pm

Please send a copy of your /etc/sudoers and the output of these commands:

Code: Select all

grep nag /etc/group
chage -l nagios
grep "User\|Group" /etc/httpd/conf/httpd.conf

Additionally, are your servers AD/LDAP integrated?

And just to clarify, the only issue is that it's showing as red in the interface for the component status, correct?
