Sometimes Force immediate check fails

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
spcmidrange
Posts: 47
Joined: Fri Jun 15, 2012 12:54 pm

Post by spcmidrange »

Good Day

This one is strange: sometimes the "Force immediate check" works and sometimes it doesn't.

Here is a snippet from nagios.log:

[1598978422] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;IDM Service - WebServer;1598978419

[1598978436] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978434
[1598978436] Error: External command failed -> SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978434
[1598978436] External command [1598978436] SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978434 returned error Command failed

[1598978453] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978444
[1598978453] Error: External command failed -> SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978444
[1598978453] External command [1598978453] SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978444 returned error Command failed

[1598978459] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/boot Directory;1598978457
[1598978459] Error: External command failed -> SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/boot Directory;1598978457
[1598978459] External command [1598978459] SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/boot Directory;1598978457 returned error Command failed


[1598978466] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;IDM Service - WebServer;1598978464

The disk space checks return "Command failed" when forced, but they run fine on their regularly scheduled 5-minute interval. The "IDM Service" check is just a check_procs, and forcing it works fine.
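
For anyone who wants to tally the failures rather than eyeball them, here is a minimal Python sketch that counts the "External command failed" lines per service. It assumes the log lines look exactly like the snippet above; the names `failed_services` and `sample` are only illustrative and are not part of Nagios.

```python
import re

# Matches the "Error: External command failed" lines shown in the snippet above.
FAILED = re.compile(
    r"^\[(?P<ts>\d+)\] Error: External command failed -> "
    r"(?P<cmd>[A-Z_]+);(?P<host>[^;]*);(?P<service>[^;]*);(?P<when>\d+)$"
)


def failed_services(log_lines):
    """Count failed external commands per (host, service description)."""
    counts = {}
    for line in log_lines:
        match = FAILED.match(line.strip())
        if match:
            key = (match["host"], match["service"])
            counts[key] = counts.get(key, 0) + 1
    return counts


# Lines copied from the log snippet in this post.
sample = [
    "[1598978422] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;IDM Service - WebServer;1598978419",
    r"[1598978436] Error: External command failed -> SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978434",
    r"[1598978453] Error: External command failed -> SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978444",
    r"[1598978459] Error: External command failed -> SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/boot Directory;1598978457",
]

for (host, service), count in failed_services(sample).items():
    print(f"{host}: {service!r} failed {count} time(s)")
```

Run against the full nagios.log, this makes it easy to see whether the failures cluster on particular service names.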

Nagios XI 5.6.10 on RedHat 7.8
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Post by ssax »

The \ in the commands should not be there unless your services have those \ in the name. Does your service have \ in the name?

You should see this instead:

EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;/var/lib/ipa/backup Directory;1598978434
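
As a rough illustration of that comparison, the Python sketch below checks whether a logged command differs from the expected service name only by stray backslashes. The helper names are hypothetical, and it assumes the four semicolon-separated fields seen in the log (command, host, service description, timestamp).

```python
def logged_service(command):
    """Return the service description field of a SCHEDULE_FORCED_SVC_CHECK command."""
    # Fields are separated by ';': command, host, service description, timestamp.
    _name, _host, service, _timestamp = command.split(";", 3)
    return service


def has_stray_backslashes(command, configured_description):
    """True when the logged name differs from the configured one only by backslashes."""
    logged = logged_service(command)
    if logged == configured_description:
        return False
    return logged.replace("\\", "") == configured_description


bad = r"SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978434"
good = "SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;/var/lib/ipa/backup Directory;1598978434"

print(has_stray_backslashes(bad, "/var/lib/ipa/backup Directory"))
print(has_stray_backslashes(good, "/var/lib/ipa/backup Directory"))
```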

Post by spcmidrange »

Apologies, I forgot about this post, haha.

No, the '\' characters are not part of the service_description:

[root@lp-dc1-nagp1 ~]# cat /usr/local/nagios/etc/services/lp-dc1-auth1.cfg | grep backup

define service {
    host_name                lp-dc1-auth1
    service_description      /var/lib/ipa/backup Directory
    use                      aixadmin-service
    check_command            check_remote_ssl!check_backup!!!!!!!
    max_check_attempts       3
    check_interval           5
    retry_interval           1
    check_period             xi_timeperiod_24x7
    notification_interval    120
    notification_period      xi_timeperiod_24x7
    _xiwizard                linux-server
    register                 1
}
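
The same check can be scripted. This is only a sketch: `parse_define_block` is a made-up helper that handles a single simple `define ... { }` block like the one above, not a full Nagios config parser.

```python
def parse_define_block(text):
    """Parse one simple 'define <type> { ... }' block into a dict of directives."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and the opening/closing lines of the block.
        if not line or line.startswith(("#", ";", "define")) or line == "}":
            continue
        # The directive name ends at the first run of whitespace; the rest is the value.
        parts = line.split(None, 1)
        directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return directives


# Shortened copy of the service definition shown above.
block = """
define service {
    host_name                lp-dc1-auth1
    service_description      /var/lib/ipa/backup Directory
    check_interval           5
}
"""

service = parse_define_block(block)
print(service["service_description"])
print("backslash in name:", "\\" in service["service_description"])
```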

Post by ssax »

I really think this is likely because of the \ character not being stripped on submission.

The checks that work probably don't have a / character in their names. In the failing ones the / is being escaped, and that escaping is what introduces the \.
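
To illustrate why only names containing a / would be affected, here is a small Python sketch. It assumes the escaping is JSON-style (PHP's json_encode() writes / as \/ by default); nothing in this thread confirms that this is what XI does internally, and `php_style_json` is a hypothetical stand-in.

```python
import json


def php_style_json(value):
    """Encode like PHP's json_encode() default, which writes '/' as a backslash plus '/'."""
    return json.dumps(value).replace("/", "\\/")


for name in ("IDM Service - WebServer", "/boot Directory"):
    encoded = php_style_json(name)
    decoded = json.loads(encoded)   # proper decoding removes the escapes
    raw = encoded.strip('"')        # using the text without decoding keeps them
    print(f"{name!r}: decoded={decoded!r} raw={raw!r}")
```

A name without a / comes out the same either way, which would match the "IDM Service" check working while the directory checks fail.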

What exact version of XI are you using?

Please PM a copy of your profile, you can download it from Admin > System Profile > Download Profile.

I labbed this up with the same service name and it's working properly on XI 5.7.2 so I'll need to know which XI version you're on.

Are you submitting them through the API or web interface?

Post by spcmidrange »

Thank you for the reply

We have 2 Nagios servers in our environment, both at 5.6.10. This server specifically was updated from 5.2.5 to 5.6.10, and we have seen this issue ever since. The other server wasn't that old: it went from 5.5.2 to 5.6.10, and I do not see the same issue on it. I don't know if that matters, but I thought I would mention it.

This is all done via the web interface.

I have PM'd you the profile

Thank you

Post by ssax »

Please send me this file from the non-working system:

/usr/local/nagiosxi/html/includes/components/xicore/status-utils.inc.php

Open up Firefox and pull up the /var/lib/ipa/backup Directory service.

Hit the F12 button and click the Network tab.

Now click the Force immediate check button and you should see a bunch of URLs pop up. Click on the one that shows:

ajaxhelper.php?cmd=submitcommands

Then right-click on it, choose Copy > Copy URL, and PM that info to me as well.
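
If you want to inspect the copied URL yourself first, a short Python sketch like this decodes the query string and flags any parameter that contains a backslash. The example URL is hypothetical: only `ajaxhelper.php?cmd=submitcommands` comes from this thread, while the host, path prefix, and the `opts` parameter are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def describe_url(url):
    """Split a URL into its path and its percent-decoded query parameters."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    return parts.path, params


# Hypothetical copied URL; %5C is a backslash and %2F is a forward slash.
copied = (
    "http://nagios.example.com/nagiosxi/ajaxhelper.php"
    "?cmd=submitcommands&opts=%5C%2Fboot%20Directory"
)

path, params = describe_url(copied)
print(path)
for name, values in params.items():
    for value in values:
        flag = "  <-- contains a backslash" if "\\" in value else ""
        print(f"{name} = {value}{flag}")
```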

Post by spcmidrange »

Info Sent

Thank you for your help

Post by ssax »

All that looks proper.

I think you need to do this:

https://support.nagios.com/kb/article/n ... r-754.html

I'm labbing it up with postgresql to test but I'm pretty sure that's the resolution for this.

Post by spcmidrange »

Well, I don't see the same error when I force a check on this specific service, but I do see an error similar to the one the KBA talks about:

ERROR:  syntax error at or near "stopped" at character 282
STATEMENT:  UPDATE xi_sysstat SET value='a:3:{s:10:"nagioscore";a:4:{s:6:"daemon";s:6:"nagios";s:6:"output";s:195:"           └─30957 /usr/local/nagios/libexec/check_nrpe -n -2 -H msg-dc2-exch-1.saskpower.sk.ca -p 5666 -t 30 -c check_service -a service=MSExchangeSubmission crit=state = \'stopped\' warn=none";s:11:"return_code";i:0;s:6:"status";i:0;}s:3:"pnp";a:4:{s:6:"daemon";s:4:"npcd";s:6:"output";s:89:"           └─4303 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg";s:11:"return_code";i:0;s:6:"status";i:0;}s:8:"ndoutils";a:4:{s:6:"daemon";s:6:"ndo2db";s:6:"output";s:90:"           └─21383 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg -f";s:11:"return_code";i:0;s:6:"status";i:0;}}', update_time=NOW() WHERE metric='daemons'

I will follow the KBA suggestion and plan a time to implement it. Am I going to have to do a full restart of all things Nagios, or just bounce pgsql?
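
For reference, the error above is consistent with how PostgreSQL reads \' in an ordinary string literal: with standard_conforming_strings on, the backslash is a plain character, so the quote after it ends the string and "stopped" is left dangling. The Python sketch below is a simplified model of that rule, not PostgreSQL's actual lexer; in particular it treats a backslash in "off" mode as making the next character literal and ignores special escapes such as \n.

```python
def read_sql_literal(sql, standard_conforming_strings):
    """Return (value, rest) for the single-quoted literal at the start of sql."""
    if not sql.startswith("'"):
        raise ValueError("expected a single-quoted literal")
    out = []
    i = 1
    while i < len(sql):
        ch = sql[i]
        if ch == "'":
            if sql[i + 1:i + 2] == "'":
                # A doubled quote is an escaped quote in both modes.
                out.append("'")
                i += 2
                continue
            return "".join(out), sql[i + 1:]
        if ch == "\\" and not standard_conforming_strings and i + 1 < len(sql):
            # Simplified: with the setting off, a backslash escapes the next character.
            out.append(sql[i + 1])
            i += 2
            continue
        out.append(ch)
        i += 1
    raise ValueError("unterminated string literal")


# Fragment modelled on the failing statement above.
fragment = r"'state = \'stopped\' warn=none'"

print(read_sql_literal(fragment, standard_conforming_strings=True))
print(read_sql_literal(fragment, standard_conforming_strings=False))
```

With the setting on, the literal ends right before `stopped`, which is where the syntax error is reported. With it off, the whole fragment is read as one string.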

Thank you!

Post by ssax »

I labbed it up and was able to replicate the issue, with it putting the \ in there. I fixed it by doing what the guide says and editing this file:

/var/lib/pgsql/data/postgresql.conf

Change this:

#standard_conforming_strings = on

To this:

standard_conforming_strings = off

Then restart postgresql:

service postgresql restart

If you see errors in the web interface (you likely will until postgresql has restarted and the app is able to connect to the DB again), try logging out and back in, or close the browser and open it again.
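
If you would rather script the edit than do it by hand, a minimal Python sketch of the text change could look like this. `set_conforming_strings_off` is a hypothetical helper that works on the file contents as a string; reading and writing /var/lib/pgsql/data/postgresql.conf (and taking a backup first) is left to you.

```python
import re

# Matches the setting whether or not it is commented out.
SETTING = re.compile(r"^[ \t]*#?[ \t]*standard_conforming_strings[ \t]*=")


def set_conforming_strings_off(conf_text):
    """Return conf_text with standard_conforming_strings set to off."""
    new_line = "standard_conforming_strings = off"
    lines = conf_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        if SETTING.match(line):
            lines[index] = new_line
            replaced = True
    if not replaced:
        # The setting was not present at all, so append it.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


example = "max_connections = 100\n#standard_conforming_strings = on\n"
print(set_conforming_strings_off(example), end="")
```

After the restart, running `SHOW standard_conforming_strings;` in psql should report `off`.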