
Sometimes Force immediate check fails

Posted: Tue Sep 01, 2020 11:47 am
by spcmidrange
Good Day

This one is strange. Sometimes the "Force immediate check" works and sometimes it doesn't.

Here is a snip from the nagios.log

[1598978422] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;IDM Service - WebServer;1598978419

[1598978436] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978434
[1598978436] Error: External command failed -> SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978434
[1598978436] External command [1598978436] SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978434 returned error Command failed

[1598978453] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978444
[1598978453] Error: External command failed -> SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978444
[1598978453] External command [1598978453] SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978444 returned error Command failed

[1598978459] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/boot Directory;1598978457
[1598978459] Error: External command failed -> SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/boot Directory;1598978457
[1598978459] External command [1598978459] SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/boot Directory;1598978457 returned error Command failed


[1598978466] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;IDM Service - WebServer;1598978464

The disk space checks return "Command failed" when forced, but they work just fine on their regular scheduled 5-minute intervals. The "IDM Service" one is just a check_procs, and forcing it works just fine.

Nagios XI 5.6.10 on RedHat 7.8
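For anyone triaging the same symptom, a quick way to see which services are affected is to scan nagios.log for the "External command failed" lines and note whether the arguments contain a backslash. This is only a minimal sketch: the regular expression is written against the lines quoted in this post, and `failed_commands` is a hypothetical helper name, not part of Nagios.

```python
import re

# Pattern for the "Error: External command failed" lines quoted in this post.
FAILED = re.compile(
    r"^\[(?P<logged>\d+)\] Error: External command failed -> "
    r"(?P<command>[A-Z_]+);(?P<args>.*)$"
)


def failed_commands(log_lines):
    """Return one dict per failed external command found in log_lines."""
    failures = []
    for line in log_lines:
        match = FAILED.match(line.strip())
        if match is None:
            continue
        raw_args = match.group("args")
        failures.append({
            "logged": int(match.group("logged")),
            "command": match.group("command"),
            "args": raw_args.split(";"),
            # A backslash here suggests the name was escaped before submission.
            "has_backslash": "\\" in raw_args,
        })
    return failures


sample = [
    "[1598978422] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;IDM Service - WebServer;1598978419",
    r"[1598978459] Error: External command failed -> SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/boot Directory;1598978457",
]

for failure in failed_commands(sample):
    print(failure["command"], failure["args"][1], failure["has_backslash"])
```

To run it against a real log, pass an open file object instead of `sample`, since iterating a file yields its lines.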

Re: Sometimes Force immediate check fails

Posted: Tue Sep 01, 2020 5:03 pm
by ssax
The \ in the commands should not be there unless your services have those \ in the name. Does your service have \ in the name?

You should see this instead:

Code: Select all

EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;/var/lib/ipa/backup Directory;1598978434
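To make the expected form concrete, here is a rough sketch of how the command line is laid out: a bracketed submission time, the command name, then the host name, service description, and check time separated by semicolons. The function names are hypothetical and the sketch only builds the string; it does not write to the Nagios command file.

```python
import time


def forced_svc_check(host_name, service_description, check_time, now=None):
    """Build a SCHEDULE_FORCED_SVC_CHECK external command line.

    The service description is passed through unchanged; it has to match
    the service_description in the config exactly.
    """
    if now is None:
        now = int(time.time())
    return "[{}] SCHEDULE_FORCED_SVC_CHECK;{};{};{}".format(
        now, host_name, service_description, check_time
    )


def looks_escaped(service_description):
    """Heuristic: True if the description contains an escaped slash (\\/)."""
    return "\\/" in service_description


good = "/var/lib/ipa/backup Directory"
bad = r"\/var\/lib\/ipa\/backup Directory"

print(forced_svc_check("lp-dc1-auth1", good, 1598978434, now=1598978436))
print(looks_escaped(good), looks_escaped(bad))
```

If `looks_escaped` is true for a name that has no backslashes in its config, the escaping was added somewhere between the web interface and the command file.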

Re: Sometimes Force immediate check fails

Posted: Fri Sep 11, 2020 4:30 pm
by spcmidrange
Apologies, I forgot about this post, haha.

No, the '\' characters are not part of the service_description.

[root@lp-dc1-nagp1 ~]# cat /usr/local/nagios/etc/services/lp-dc1-auth1.cfg | grep backup
</snip>

Code: Select all

define service {
    host_name                lp-dc1-auth1
    service_description      /var/lib/ipa/backup Directory
    use                      aixadmin-service
    check_command            check_remote_ssl!check_backup!!!!!!!
    max_check_attempts       3
    check_interval           5
    retry_interval           1
    check_period             xi_timeperiod_24x7
    notification_interval    120
    notification_period      xi_timeperiod_24x7
    _xiwizard                linux-server
    register                 1
}

Re: Sometimes Force immediate check fails

Posted: Mon Sep 14, 2020 4:33 pm
by ssax
I really think this is likely because of the \ character not being stripped on submission.

The ones that work probably don't have a / character in the name. In the failing ones the / is being escaped, and that escaping is what produces the \.

What exact version of XI are you using?

Please PM a copy of your profile, you can download it from Admin > System Profile > Download Profile.

I labbed this up with the same service name and it's working properly on XI 5.7.2 so I'll need to know which XI version you're on.

Are you submitting them through the API or web interface?

Re: Sometimes Force immediate check fails

Posted: Tue Sep 15, 2020 4:52 pm
by spcmidrange
Thank you for the reply

We have two Nagios servers in our environment, both at 5.6.10. This server specifically was updated from 5.2.5 to 5.6.10, and we have seen this issue ever since. The other server wasn't that old; it went from 5.5.2 to 5.6.10, and I do not see the same issue on it. I don't know if that matters, but I thought I would mention it.

This is all done via the web interface.

I have PM'd you the profile

Thank you

Re: Sometimes Force immediate check fails

Posted: Thu Sep 17, 2020 2:19 pm
by ssax
Please send me this file from the non-working system:

Code: Select all

/usr/local/nagiosxi/html/includes/components/xicore/status-utils.inc.php
Open up Firefox and pull up the /var/lib/ipa/backup Directory service.

Hit the F12 button, click the Network Tab.

Now click the Force immediate check button and you should see a bunch of URLs pop up. Click on the one that shows:

Code: Select all

ajaxhelper.php?cmd=submitcommands
Then right click on it, choose Copy > Copy URL and PM that info to me as well.

Re: Sometimes Force immediate check fails

Posted: Fri Sep 18, 2020 1:52 pm
by spcmidrange
Info Sent

Thank you for your help

Re: Sometimes Force immediate check fails

Posted: Mon Sep 21, 2020 2:03 pm
by ssax
All that looks proper.

I think you need to do this:

https://support.nagios.com/kb/article/n ... r-754.html

I'm labbing it up with postgresql to test but I'm pretty sure that's the resolution for this.

Re: Sometimes Force immediate check fails

Posted: Tue Sep 22, 2020 11:14 am
by spcmidrange
Well, I don't see the same error when I force a check on this specific service, but I do see an error similar to the one the KBA talks about:

Code: Select all

ERROR:  syntax error at or near "stopped" at character 282
STATEMENT:  UPDATE xi_sysstat SET value='a:3:{s:10:"nagioscore";a:4:{s:6:"daemon";s:6:"nagios";s:6:"output";s:195:"           └─30957 /usr/local/nagios/libexec/check_nrpe -n -2 -H msg-dc2-exch-1.saskpower.sk.ca -p 5666 -t 30 -c check_service -a service=MSExchangeSubmission crit=state = \'stopped\' warn=none";s:11:"return_code";i:0;s:6:"status";i:0;}s:3:"pnp";a:4:{s:6:"daemon";s:4:"npcd";s:6:"output";s:89:"           └─4303 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg";s:11:"return_code";i:0;s:6:"status";i:0;}s:8:"ndoutils";a:4:{s:6:"daemon";s:6:"ndo2db";s:6:"output";s:90:"           └─21383 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg -f";s:11:"return_code";i:0;s:6:"status";i:0;}}', update_time=NOW() WHERE metric='daemons'
I will follow the KBA suggestion and plan a time to implement it. Am I going to have to do a full restart of all things Nagios, or just bounce pgsql?

Thank you!

Re: Sometimes Force immediate check fails

Posted: Wed Sep 23, 2020 4:27 pm
by ssax
I labbed it up and was able to replicate the issue with it putting the \ in there. I fixed it by doing what the guide says and editing this file:

Code: Select all

/var/lib/pgsql/data/postgresql.conf
Changing this:

Code: Select all

#standard_conforming_strings = on
To this:

Code: Select all

standard_conforming_strings = off
Then restart postgresql:

Code: Select all

service postgresql restart
If you see errors in the web interface (you likely will until postgresql is restarted and the app is able to connect to the DB), try logging out and back in again, or closing the browser and going back into it.
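For background on why this setting matters: with standard_conforming_strings = on (the PostgreSQL default since 9.1), a backslash inside an ordinary '...' literal is stored as a plain character, while with it off the backslash acts as an escape and is dropped. The sketch below is a deliberately simplified model of that difference, not PostgreSQL's actual parser, and `interpret_literal` is a hypothetical name. The same rule would also explain the \'stopped\' syntax error quoted earlier in the thread, since with the setting on a \' no longer escapes the quote.

```python
def interpret_literal(body, standard_conforming_strings):
    """Return the value stored for the text between the quotes of '...'.

    Simplified model: a doubled single quote always means one quote.
    With the setting off, a backslash escapes the next character
    (only n and t are treated specially here; anything else is kept as-is).
    """
    escapes = {"n": "\n", "t": "\t"}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else None
        if ch == "'" and nxt == "'":
            out.append("'")
            i += 2
        elif ch == "\\" and nxt is not None and not standard_conforming_strings:
            out.append(escapes.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


escaped = r"\/var\/lib\/ipa\/backup Directory"

# Setting on: the backslashes survive, so the name no longer matches the service.
print(interpret_literal(escaped, standard_conforming_strings=True))
# Setting off: the backslashes are consumed and the original name comes back.
print(interpret_literal(escaped, standard_conforming_strings=False))
```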