Cut over to CentOS 7 this morning, Nagios specific errors
Posted: Thu Oct 17, 2019 11:38 am
by rferebee
Good morning, I cut over my Nagios XI servers from CentOS 6 to CentOS 7 this morning.
Everything went really smoothly, but I'm seeing one issue I can't figure out. Every check-in cycle, the services in the picture below go critical, but as soon as I force them to check in again they report as OK.
They keep flipping back and forth between critical and OK, and I'm not sure why.
Any ideas I can try?
Re: Cut over to CentOS 7 this morning, Nagios specific error
Posted: Thu Oct 17, 2019 11:49 am
by rferebee
Also, my server just started sending out hundreds of "Flapping Stopped" notifications. Is there any way to clear out whatever queue those are in, so they don't get sent out?
Re: Cut over to CentOS 7 this morning, Nagios specific error
Posted: Thu Oct 17, 2019 11:55 am
by benjaminsmith
Hello,
You can run the following command to clear the events queue.
Code:
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -uroot -pnagiosxi nagiosxi
Regarding the other issue, I believe the script is having trouble parsing the output from systemctl. Can you send me the profile, so I can try to verify this in the logs? Thanks.
Re: Cut over to CentOS 7 this morning, Nagios specific error
Posted: Thu Oct 17, 2019 11:59 am
by rferebee
PM sent with profile, thank you.
I truncated the DBs, but it's still sending out notifications. Could the ones I'm getting be delayed from earlier in the morning? The ones I'm getting right now are shown to be from a little over a minute ago in the console.
Re: Cut over to CentOS 7 this morning, Nagios specific error
Posted: Thu Oct 17, 2019 12:58 pm
by benjaminsmith
Hello,
Have you turned off notifications yet? You'll want to turn off notifications and then clear the event queue.
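One way to do the first step from the shell is to write Nagios Core's DISABLE_NOTIFICATIONS external command to the command file before truncating the tables. This is only a sketch under assumptions: the `disable_notifications` function name is made up for illustration, and /usr/local/nagios/var/rw/nagios.cmd is assumed to be the command file location, so check `command_file` in nagios.cfg on your server first.

```shell
#!/bin/sh
# Sketch: disable notifications program-wide via a Nagios external command.
# ASSUMPTION: default command file path; confirm command_file in nagios.cfg.
CMD_FILE="${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# disable_notifications is a hypothetical helper name, not part of Nagios.
disable_notifications() {
    # External command format: [unix_timestamp] COMMAND
    printf '[%s] DISABLE_NOTIFICATIONS\n' "$(date +%s)" >> "$1"
}

# Usage (as root or the nagios user):
#   disable_notifications "$CMD_FILE"
# then run the truncate command from earlier in the thread.
```

ENABLE_NOTIFICATIONS is the matching external command to turn notifications back on once the queue is clear.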
Regarding the 'could not parse XML' error, this will be patched in the next release. To correct it, replace the manage_services.sh script with the one attached (make a backup of your existing file first). It's in the /usr/local/nagiosxi/scripts directory.
Once uploaded, make it executable with chmod +x and set the owner and group with chown root:nagios
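The backup, replace, and permission steps could be scripted roughly as follows. This is a sketch, not an official procedure: `install_script` is a hypothetical helper name, and /tmp/manage_services.sh is an assumed location for the downloaded attachment. Only the target directory and the root:nagios ownership come from the post above.

```shell
#!/bin/sh
# Sketch: back up the existing script, install the replacement, fix mode and ownership.
# ASSUMPTION: the attachment was saved as /tmp/manage_services.sh.

# install_script is a hypothetical helper name used for illustration.
install_script() {
    new_file=$1                 # downloaded replacement
    target=$2                   # script being replaced
    owner=${3:-root:nagios}     # owner:group from the post above

    cp -p "$target" "$target.bak" || return 1   # keep a backup of the existing file
    cp "$new_file" "$target" || return 1        # overwrite with the replacement
    chmod +x "$target" || return 1              # make it executable
    chown "$owner" "$target"                    # set owner and group
}

# Usage (as root):
#   install_script /tmp/manage_services.sh /usr/local/nagiosxi/scripts/manage_services.sh
```

The original is kept alongside as manage_services.sh.bak, so it can be copied back if the replacement misbehaves.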
Re: Cut over to CentOS 7 this morning, Nagios specific error
Posted: Thu Oct 17, 2019 1:05 pm
by rferebee
Do I need to restart any services after I make these changes?
I found this solution in a previous support thread:
Code:
service nagios stop
service ndo2db stop
service crond stop
service postgresql restart
pkill -9 -u nagios
echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | psql nagiosxi nagiosxi
service crond start
service ndo2db start
service nagios start
service npcd restart
Re: Cut over to CentOS 7 this morning, Nagios specific error
Posted: Thu Oct 17, 2019 1:40 pm
by benjaminsmith
Hello @rferebee,
In this case it shouldn't be necessary, but if you're trying to clear any alerts, it doesn't hurt to kill off all the processes and restart the services.
Re: Cut over to CentOS 7 this morning, Nagios specific error
Posted: Thu Oct 17, 2019 1:42 pm
by rferebee
Ok, I'll keep that in mind.
Everything appears to be stable at the moment, but if we could please keep this open for a day or so, I'd appreciate it.
Re: Cut over to CentOS 7 this morning, Nagios specific error
Posted: Thu Oct 17, 2019 1:57 pm
by benjaminsmith
Hello @rferebee,
rferebee wrote: Everything appears to be stable at the moment, but if we could please keep this open for a day or so, I'd appreciate it.
No problem. We'll keep this open.
Re: Cut over to CentOS 7 this morning, Nagios specific error
Posted: Fri Oct 18, 2019 11:44 am
by rferebee
Everything appears to be running smoothly this morning. No issues last night.
I do have one question though; hopefully you can assist. Prior to the cutover we would receive email notifications whenever our backup XI server performed a failover restore. This occurs daily for us at 9 AM. The server is still doing the failover restore, but it's not sending out the notifications telling us that it's doing it. I must have missed a configuration file somewhere, but I'm not sure where to look.
If I PM'd you an example of the notifications we used to get, do you think you'd be able to identify what mechanism would trigger it?