Sometimes Force immediate check fails
-
spcmidrange
- Posts: 47
- Joined: Fri Jun 15, 2012 12:54 pm
Sometimes Force immediate check fails
Good Day
This one is strange. Sometimes the "force immediate check" works and sometimes it doesn't.
Here is a snip from the nagios.log:
[1598978422] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;IDM Service - WebServer;1598978419
[1598978436] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978434
[1598978436] Error: External command failed -> SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978434
[1598978436] External command [1598978436] SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978434 returned error Command failed
[1598978453] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978444
[1598978453] Error: External command failed -> SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978444
[1598978453] External command [1598978453] SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978444 returned error Command failed
[1598978459] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/boot Directory;1598978457
[1598978459] Error: External command failed -> SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/boot Directory;1598978457
[1598978459] External command [1598978459] SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/boot Directory;1598978457 returned error Command failed
[1598978466] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;IDM Service - WebServer;1598978464
The disk space checks return "Command failed" when forced, but they work just fine on their regular scheduled 5-minute intervals. The "IDM Service" one is just a check_procs check, and forcing it works just fine.
Nagios XI 5.6.10 on RedHat 7.8
Re: Sometimes Force immediate check fails
The \ in the commands should not be there unless your services have those \ in the name. Does your service have \ in the name?
You should see this instead:
Code: Select all
EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;/var/lib/ipa/backup Directory;1598978434
-
spcmidrange
Re: Sometimes Force immediate check fails
Apologies, I forgot about this post, haha.
No, the '\' characters are not part of the service_description.
[root@lp-dc1-nagp1 ~]# cat /usr/local/nagios/etc/services/lp-dc1-auth1.cfg | grep backup
</snip>
Code: Select all
define service {
host_name lp-dc1-auth1
service_description /var/lib/ipa/backup Directory
use aixadmin-service
check_command check_remote_ssl!check_backup!!!!!!!
max_check_attempts 3
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 120
notification_period xi_timeperiod_24x7
_xiwizard linux-server
register 1
}
Re: Sometimes Force immediate check fails
I really think this is likely because of the \ character not being stripped on submission.
The ones that work probably don't have a / character in the name. Escaping the / is what produces the \.
What exact version of XI are you using?
Please PM a copy of your profile, you can download it from Admin > System Profile > Download Profile.
I labbed this up with the same service name and it's working properly on XI 5.7.2 so I'll need to know which XI version you're on.
Are you submitting them through the API or web interface?
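To check how widespread the problem is, something like the following could flag forced service checks whose service description was logged with escaped slashes. This is only a rough sketch: the sample lines are copied from the nagios.log excerpt in this thread, and the function name `find_escaped_services` is made up for illustration, not part of Nagios XI.

```python
import re

# Sample lines copied from the nagios.log excerpt earlier in this thread.
LOG_LINES = [
    r"[1598978422] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;IDM Service - WebServer;1598978419",
    r"[1598978436] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/var\/lib\/ipa\/backup Directory;1598978434",
    r"[1598978459] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;lp-dc1-auth1;\/boot Directory;1598978457",
]

# Matches "[ts] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;host;service;check_time".
PATTERN = re.compile(
    r"^\[\d+\] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;([^;]+);(.+);(\d+)$"
)


def find_escaped_services(lines):
    """Return (host, logged_name, unescaped_name) for each forced service
    check whose service description contains an escaped slash (hypothetical helper)."""
    hits = []
    for line in lines:
        match = PATTERN.match(line.strip())
        if not match:
            continue
        host, service, _check_time = match.groups()
        if "\\/" in service:
            # Show what the description would look like without the escaping.
            hits.append((host, service, service.replace("\\/", "/")))
    return hits


for host, logged, expected in find_escaped_services(LOG_LINES):
    print(f"{host}: logged {logged!r}, expected {expected!r}")
```

To run it against a real log, read the lines from the nagios.log file instead of `LOG_LINES`. Any service listed here has a logged name that will not match the service_description in the config.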
-
spcmidrange
Re: Sometimes Force immediate check fails
Thank you for the reply
So we have two Nagios servers in our environment, and both are at 5.6.10. This server specifically was updated from 5.2.5 to 5.6.10, and we have seen this issue ever since. The other server wasn't that old; it went from 5.5.2 to 5.6.10, and I do not see the same issue with it. I don't know if that matters, but I thought I would mention it.
This is all done via the web interface.
I have PM'd you the profile
Thank you
Re: Sometimes Force immediate check fails
Please send me this file from the non-working system:
Code: Select all
/usr/local/nagiosxi/html/includes/components/xicore/status-utils.inc.php
Open up Firefox and pull up the /var/lib/ipa/backup Directory service.
Hit the F12 button, click the Network tab.
Now click the Force immediate check button and you should see a bunch of URLs pop up. Click on the one that shows:
Code: Select all
ajaxhelper.php?cmd=submitcommands
Then right-click on it, choose Copy > Copy URL, and PM that info to me as well.
-
spcmidrange
Re: Sometimes Force immediate check fails
Info Sent
Thank you for your help
Re: Sometimes Force immediate check fails
All that looks proper.
I think you need to do this:
https://support.nagios.com/kb/article/n ... r-754.html
I'm labbing it up with postgresql to test but I'm pretty sure that's the resolution for this.
-
spcmidrange
Re: Sometimes Force immediate check fails
Well, I don't see the same error when I force check this specific service, but I do see an error similar to the one the KBA talks about:
Code: Select all
ERROR: syntax error at or near "stopped" at character 282
STATEMENT: UPDATE xi_sysstat SET value='a:3:{s:10:"nagioscore";a:4:{s:6:"daemon";s:6:"nagios";s:6:"output";s:195:" └─30957 /usr/local/nagios/libexec/check_nrpe -n -2 -H msg-dc2-exch-1.saskpower.sk.ca -p 5666 -t 30 -c check_service -a service=MSExchangeSubmission crit=state = \'stopped\' warn=none";s:11:"return_code";i:0;s:6:"status";i:0;}s:3:"pnp";a:4:{s:6:"daemon";s:4:"npcd";s:6:"output";s:89:" └─4303 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg";s:11:"return_code";i:0;s:6:"status";i:0;}s:8:"ndoutils";a:4:{s:6:"daemon";s:6:"ndo2db";s:6:"output";s:90:" └─21383 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg -f";s:11:"return_code";i:0;s:6:"status";i:0;}}', update_time=NOW() WHERE metric='daemons'
I will follow the KBA suggestion and plan a time to implement it. Am I going to have to do a full restart of all things Nagios, or just bounce pgsql?
Thank you!
Re: Sometimes Force immediate check fails
I labbed it up and was able to replicate the issue with it putting the \ in there. I fixed it by doing what the guide says and editing this file:
Code: Select all
/var/lib/pgsql/data/postgresql.conf
Changing this:
Code: Select all
#standard_conforming_strings = on
To this:
Code: Select all
standard_conforming_strings = off
Then restart postgresql:
Code: Select all
service postgresql restart
If you see errors in the web interface (you likely will while postgresql is restarting and until the app is able to connect to the DB again), try logging out and back in again, or closing the browser and going back into it.
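If you would rather script the config change than edit it by hand, a minimal sketch along these lines could do the rewrite on the file's text. The helper name `set_standard_conforming_strings` is hypothetical, and it only transforms a string. Reading and writing /var/lib/pgsql/data/postgresql.conf (after taking a backup) and restarting postgresql are still up to you.

```python
import re


def set_standard_conforming_strings(conf_text, value="off"):
    """Return conf_text with standard_conforming_strings set to `value`.

    An existing line (commented out or not) is replaced in place; if no such
    line exists, the setting is appended at the end. Hypothetical helper.
    """
    pattern = re.compile(
        r"^[ \t]*#?[ \t]*standard_conforming_strings[ \t]*=.*$", re.MULTILINE
    )
    new_line = f"standard_conforming_strings = {value}"
    new_text, count = pattern.subn(new_line, conf_text)
    if count == 0:
        # No existing line: append one, adding a newline first if needed.
        sep = "" if conf_text.endswith("\n") or not conf_text else "\n"
        new_text = f"{conf_text}{sep}{new_line}\n"
    return new_text


# Stand-in for the contents of postgresql.conf (assumed sample, not a real file).
sample = "max_connections = 100\n#standard_conforming_strings = on\n"
print(set_standard_conforming_strings(sample))
```

Working on a string rather than the live file makes it easy to check the result before anything on disk changes.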