Very weird check result issue after server migration

Posted: Wed Feb 27, 2019 9:02 am
by WillemDH
Hello,

We more or less successfully migrated our Nagios server today from CentOS 6 to CentOS 7, but we are experiencing a very weird issue with some checks that use a custom plugin of ours. When we edit this plugin and add a string such as "TEST" to the output, the change is not reflected in the output in Nagios XI.

TESTED FROM XI UI:

Code:

2019-02-27 14:55:43,584: check_curl_result.sh: Info: Debug mode
2019-02-27 14:55:43,591: check_curl_result.sh: Info: Curl action sbcAlarms started 
2019-02-27 14:55:43,597: check_curl_result.sh: Info: Executing CheckSBCAlarms with url https://hostname.domain/api/v1/alarms/active 
2019-02-27 14:55:43,921: check_curl_result.sh: Info: Result: curl: (18) transfer closed with outstanding read data remaining

HTTP Status: 204, Url: https://srvtelsbcqa01.gentgrp.gent.be/api/v1/alarms/active, Stats: 0 bytes in 0.307 second response time
TESTED FROM CLI:

Code:

/usr/local/nagios/libexec/check_curl_result.sh --url https://hostname.domain/api/v1/alarms/active --type sbcAlarms --user Admin --password password
2019-02-27 14:59:29,921: check_curl_result.sh: Info: TEST Executing TEST CheckSBCAlarms with url https://hostname.domain/api/v1/alarms/active
2019-02-27 14:59:30,192: check_curl_result.sh: Info: Result:
HTTP Status: 204, Url: https://hostname.domain/api/v1/alarms/active, Stats: 0 bytes in 0.259 second response time,  | time=0.259s time_first_byte=0.259s time_tcp_connect=0.006s size=0B,
2019-02-27 14:59:30,197: check_curl_result.sh: Info: Verifying Curl exitcode 0
2019-02-27 14:59:30,203: check_curl_result.sh: Info: Verifying HTTP Status Code
2019-02-27 14:59:30,210: check_curl_result.sh: Info: Output: TEST
2019-02-27 14:59:30,215: check_curl_result.sh: Info: Curl action sbcAlarms finished. Exitcode 0
OK: TEST
What could cause this behaviour? It's as if the service has cached an old version of the plugin. Is this even possible? We already did a delete, write, verify, restart.

Any advice is welcome

Re: Very weird check result issue after server migration

Posted: Wed Feb 27, 2019 11:16 am
by SteveBeauchemin
Multi-line output is expected, right?

I had a similar issue when I upgraded 5.x. Not OS related. The output of my scripts which I would normally see in the Nagios GUI Status Information all became a single line. Anything after the CR/LF was gone. It turns out that I needed to use <BR> in my script to see multiple lines in the GUI. I had been using \n before that. The data is not actually gone, and can be seen in the Macro $LONGSERVICEOUTPUT$. At least my data was there.

I do not know if that is what you are experiencing, but that is what I changed to see extra data in my setup.
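
For illustration, a minimal sketch of the `<BR>` approach described above. The function name `join_with_br` and the sample text are hypothetical and are not taken from check_curl_result.sh. Whether `<BR>` is actually rendered as a line break may depend on the Nagios XI version and its HTML escaping settings.

```shell
#!/bin/bash
# Sketch only: join_with_br is a hypothetical helper, not part of any Nagios tool.

# Read lines from stdin and join them with <BR>, so multi-line plugin
# output ends up as a single line of status text.
join_with_br() {
    local out="" line first=1
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$first" -eq 1 ]; then
            out="$line"
            first=0
        else
            out="${out}<BR>${line}"
        fi
    done
    printf '%s\n' "$out"
}

# Example: three detail lines (made-up sample text) become one line.
printf 'OK: no active alarms\nHTTP Status: 204\nResponse time: 0.259s\n' | join_with_br
```

Without a conversion like this, everything after the first line of plugin output goes to $LONGSERVICEOUTPUT$ rather than to the first-line status text.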

Steve B

Re: Very weird check result issue after server migration

Posted: Wed Feb 27, 2019 3:50 pm
by cdienger
Are you referring to not seeing the word "TEST" in the output? For example, you should see "TEST" in the line "Curl action sbcAlarms started". I've tried various things to reproduce this, but without luck so far. What happens if you move the script out of the directory or make it non-executable? Is the command under Configure > Core config manager > Commands > _Commands pointed to the right script? Try creating a new temporary command, point it at the script, and see if you get the same behavior with the new command.

Re: Very weird check result issue after server migration

Posted: Thu Feb 28, 2019 6:50 am
by WillemDH
Found the issue. After changing the command, it became apparent that these services had a hostgroup configured through their template, and that hostgroup is used to assign the service to a gearman worker node. After cloning the template in use and removing the gearman hostgroup from it, the issue is resolved. This ticket can be closed.
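
A symptom like this is consistent with the check being executed on the worker node from a separate copy of the plugin, although that is an assumption and not something verified here. A rough sketch for comparing two copies of a plugin by checksum follows; `plugin_digest` and `same_plugin` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch only: plugin_digest and same_plugin are hypothetical helpers.

# Print the SHA-256 digest of a file (digest only, without the file name).
plugin_digest() {
    sha256sum "$1" | awk '{print $1}'
}

# Return 0 when both files are readable and have identical contents,
# 1 when they differ, and 2 when either file cannot be read.
same_plugin() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    [ "$(plugin_digest "$1")" = "$(plugin_digest "$2")" ]
}
```

Running `plugin_digest /usr/local/nagios/libexec/check_curl_result.sh` on both the XI server and the worker node and comparing the two digests would show whether the worker still has an older copy.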

Thanks all for thinking with me!

Re: Very weird check result issue after server migration

Posted: Thu Feb 28, 2019 11:26 am
by scottwilkerson
WillemDH wrote: Found the issue. After changing the command, it became apparent that these services had a hostgroup configured through their template, and that hostgroup is used to assign the service to a gearman worker node. After cloning the template in use and removing the gearman hostgroup from it, the issue is resolved. This ticket can be closed.

Thanks all for thinking with me!
Great!

Locking thread