Very weird check result issue after server migration

WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Post by WillemDH »

Hello,

We more or less successfully migrated our Nagios server today from CentOS 6 to CentOS 7, but we are experiencing a very weird issue with some checks that use a custom plugin of ours. When we edit this plugin and add some string, like "TEST", to the output, this is not reflected in the output in Nagios XI.

TESTED FROM XI UI:

2019-02-27 14:55:43,584: check_curl_result.sh: Info: Debug mode
2019-02-27 14:55:43,591: check_curl_result.sh: Info: Curl action sbcAlarms started 
2019-02-27 14:55:43,597: check_curl_result.sh: Info: Executing CheckSBCAlarms with url https://hostname.domain/api/v1/alarms/active 
2019-02-27 14:55:43,921: check_curl_result.sh: Info: Result: curl: (18) transfer closed with outstanding read data remaining

HTTP Status: 204, Url: https://srvtelsbcqa01.gentgrp.gent.be/api/v1/alarms/active, Stats: 0 bytes in 0.307 second response time

TESTED FROM CLI:

/usr/local/nagios/libexec/check_curl_result.sh --url https://hostname.domain/api/v1/alarms/active --type sbcAlarms --user Admin --password password
2019-02-27 14:59:29,921: check_curl_result.sh: Info: TEST Executing TEST CheckSBCAlarms with url https://hostname.domain/api/v1/alarms/active
2019-02-27 14:59:30,192: check_curl_result.sh: Info: Result:
HTTP Status: 204, Url: https://hostname.domain/api/v1/alarms/active, Stats: 0 bytes in 0.259 second response time,  | time=0.259s time_first_byte=0.259s time_tcp_connect=0.006s size=0B,
2019-02-27 14:59:30,197: check_curl_result.sh: Info: Verifying Curl exitcode 0
2019-02-27 14:59:30,203: check_curl_result.sh: Info: Verifying HTTP Status Code
2019-02-27 14:59:30,210: check_curl_result.sh: Info: Output: TEST
2019-02-27 14:59:30,215: check_curl_result.sh: Info: Curl action sbcAlarms finished. Exitcode 0
OK: TEST

What could cause this behaviour? It's as if the service has cached an old version of the plugin. Is this even possible? I already did a delete, write, verify, and restart.

Any advice is welcome
Nagios XI 5.8.1
https://outsideit.net
SteveBeauchemin
Posts: 524
Joined: Mon Oct 14, 2013 7:19 pm

Re: Very weird check result issue after server migration

Post by SteveBeauchemin »

Multi-line output is expected, right?

I had a similar issue when I upgraded to 5.x. It was not OS related. The output of my scripts, which I would normally see under Status Information in the Nagios GUI, was reduced to a single line. Anything after the CR/LF was gone. It turns out that I needed to use <BR> in my script to see multiple lines in the GUI. I had been using \n before that. The data is not actually gone, and can be seen in the macro $LONGSERVICEOUTPUT$. At least my data was there.

I do not know if that is what you are experiencing, but that is what I changed to see extra data in my setup.
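As a rough illustration of the <BR> approach described above, here is a minimal sketch. The helper name join_with_br and the sample text are made up for this example and are not taken from anyone's actual plugin. It relies on the standard plugin behaviour where the first line of output becomes the status text and later lines go to $LONGSERVICEOUTPUT$, so joining the lines with <BR> keeps everything on the first line.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): read lines on stdin and
# join them with <BR> so the whole text stays on the first output line.
join_with_br() {
    awk 'NR > 1 { printf "<BR>" } { printf "%s", $0 } END { printf "\n" }'
}

# Sample multi-line detail text, invented for the example.
details=$(printf 'line one\nline two\nline three\n' | join_with_br)

# Single status line; the GUI would render each <BR> as a line break.
echo "OK: $details"
```

Whether the GUI renders <BR> may depend on the XI version and its HTML escaping settings, so treat this as something to try rather than a guaranteed fix.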

Steve B
XI 5.7.3 / Core 4.4.6 / NagVis 1.9.8 / LiveStatus 1.5.0p11 / RRDCached 1.7.0 / Redis 3.2.8 /
SNMPTT / Gearman 0.33-7 / Mod_Gearman 3.0.7 / NLS 2.0.8 / NNA 2.3.1 /
NSClient 0.5.0 / NRPE Solaris 3.2.1 Linux 3.2.1 HPUX 3.2.1
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Very weird check result issue after server migration

Post by cdienger »

Are you referring to not seeing the word "TEST" in the output? For example, you would expect to see "TEST" in the line "Curl action sbcAlarms started". I've tried various things to reproduce this, but without luck so far. What happens if you move the script out of the directory or make it non-executable? Is the command under Configure > Core config manager > Commands > _Commands pointed at the right script? Try creating a new temporary command that points at the script and see if you get the same behavior with the new command.
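One way to sanity-check which copy of the script is on disk is sketched below. The function name plugin_fingerprint is hypothetical and this is only an illustration, not an official Nagios tool. It prints a checksum, whether the file is executable, and whether a marker string such as "TEST" is present in it.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): summarize a plugin file.
# $1 = path to the plugin, $2 = marker string expected inside it.
plugin_fingerprint() {
    file=$1
    marker=$2
    if [ ! -f "$file" ]; then
        echo "missing $file"
        return 1
    fi
    if [ -x "$file" ]; then mode=executable; else mode=not-executable; fi
    # -F treats the marker as a fixed string rather than a regex.
    if grep -qF -- "$marker" "$file"; then
        found=marker-present
    else
        found=marker-absent
    fi
    # cksum on stdin prints "<crc> <bytes>"; keep only the CRC.
    sum=$(cksum < "$file" | awk '{ print $1 }')
    echo "$sum $mode $found $file"
}
```

For the script in this thread, the call would be plugin_fingerprint /usr/local/nagios/libexec/check_curl_result.sh TEST. Running it on every machine that might execute the check and comparing the lines shows quickly whether the copies differ.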
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: Very weird check result issue after server migration

Post by WillemDH »

Found the issue. After changing the command, it became apparent that these services had a hostgroup configured through their template, and that this hostgroup is used to assign the service to a Gearman worker node. After cloning the template in use and removing the Gearman hostgroup from the clone, the issue is resolved. This ticket can be closed.
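For anyone who runs into something similar: when checks can be routed to worker nodes, it can help to have the plugin report where it ran. The sketch below is an assumption about how one might do that, not part of the actual check_curl_result.sh. The function name status_line is made up for the example.

```shell
#!/bin/sh
# Name of the machine executing the plugin (uname -n is POSIX).
exec_host=$(uname -n)

# Hypothetical helper (name is an assumption): print a status line
# tagged with the executing host.
# $1 = state label (OK, WARNING, ...), $2 = message text.
status_line() {
    printf '%s: %s (ran on %s)\n' "$1" "$2" "$exec_host"
}

status_line OK TEST
```

If the host name shown in the XI UI differs from the server where the script was edited, the check is being executed elsewhere, for example on a worker node.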

Thanks all for thinking with me!
Nagios XI 5.8.1
https://outsideit.net
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Very weird check result issue after server migration

Post by scottwilkerson »

WillemDH wrote:Found the issue. After changing the command, it became apparent that these services had a hostgroup configured, which set a hostgroup which is used to assign the service to a gearman worker node. After cloning the used template and removing the gearman hostgroup from it, the issue is resolved. This ticket can be closed.

Thanks all for thinking with me!

Great!

Locking thread
Former Nagios employee