
Global event handler appears not to work

Posted: Tue May 01, 2012 10:16 am
by andy_kann
Hi all,

I'm running into this really annoying problem.
I've inherited a Nagios setup that monitors a system with limited or no access to the outside world. The main way to connect to it is FTP. Nagios has been set up to write all host and service events as XML files into a directory on the Nagios server. These files are downloaded via FTP to see what has happened on the system, and for trending and the like. Not an ideal situation, but the best we can do for now.
On the previous version of the system, writing the events to XML files worked. We are currently working on an upgrade of the complete platform, and to learn the ropes of Nagios, I decided to figure out how the Nagios configuration was set up. I've got everything running now (and documented!), except writing the XML files. For some reason, I can't get it to work, so I turn to this forum.

OK, some details. Nagios is version 3.2.3, running on RHEL 5u7. In /etc/nagios/nagios.cfg I've enabled 'global_host_event_handler', which points to 'notify-host-by-file', and 'global_service_event_handler', which points to 'notify-service-by-file'. Both commands are defined in /etc/nagios/objects/commands.cfg and look like this:

# 'notify-host-by-file' command definition
define command{
command_name notify-host-by-file
command_line /usr/bin/printf "%b" "<?xml version=\"1.0\"?>\n<\0041-- Nagios notification -->\n<notification>\n<epsname>$EPSNAME</epsname>\n<type>$NOTIFICATIONTYPE$</type>\n<host>$HOSTNAME$</host>\n<address>$HOSTADDRESS$</address>\n<state>$HOSTSTATE$</state>\n<date>$DATE$</date>\n<time>$TIME$</time>\n<output><\0041[CDATA[$HOSTOUTPUT$]]></output>\n<info><\0041[CDATA[$LONGHOSTOUTPUT$]]></info>\n</notification>\n" > "/opt/nagios/host_notification_$HOSTNOTIFICATIONID$_$HOSTSTATE$.xml"
}

# 'notify-service-by-file' command definition
define command{
command_name notify-service-by-file
command_line /usr/bin/printf "%b" "<?xml version=\"1.0\"?>\n<\0041-- Nagios notification -->\n<notification>\n<epsname>$EPSNAME</epsname>\n<type>$NOTIFICATIONTYPE$</type>\n<service>$SERVICEDESC$</service>\n<host>$HOSTALIAS$</host>\n<address>$HOSTADDRESS$</address>\n<state>$SERVICESTATE$</state>\n<date>$DATE$</date>\n<time>$TIME$</time>\n<output><\0041[CDATA[$SERVICEOUTPUT$]]></output>\n<info><\0041[CDATA[$LONGSERVICEOUTPUT$]]></info>\n</notification>\n" > "/opt/nagios/service_notification_$SERVICENOTIFICATIONID$_$SERVICESTATE$.xml"
}
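For reference, the nagios.cfg directives described above would look roughly like the excerpt below. This is a sketch, not a copy of the actual file: it assumes enable_event_handlers=1 (with it set to 0, no event handlers run at all, global or per-object) and adds log_event_handlers=1 so that each handler run is logged.

```
# /etc/nagios/nagios.cfg (excerpt, illustrative)
# Master switch: no event handlers run unless this is 1
enable_event_handlers=1
# Commands run for every host / service state change
global_host_event_handler=notify-host-by-file
global_service_event_handler=notify-service-by-file
# Write a log entry each time an event handler is run
log_event_handlers=1
```

With event handler logging enabled, handler runs appear in nagios.log as their own entries, separate from the SERVICE NOTIFICATION lines quoted in this post.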

I've been toying with this for days now, stopping and restarting services that Nagios monitors, but no file is written to disk. However, the Nagios log file shows that 'notify-service-by-file' is being called (at least, that is my interpretation):
[1335884520] SERVICE NOTIFICATION: nagiosadmin;esx03;HW check;CRITICAL;notify-service-by-file;CRITICAL - could not contact snmp agent, wrong device
[1335884560] SERVICE NOTIFICATION: nagiosadmin;vcenter;Antivirus Actualization;CRITICAL;notify-service-by-file;CRITICAL 329 days ago without updating.
[1335884910] SERVICE NOTIFICATION: nagiosadmin;glassfish-server;AppCenter-REST;CRITICAL;notify-service-by-file;Connection refused
[1335885030] SERVICE NOTIFICATION: nagiosadmin;glassfish-server;ACMA;CRITICAL;notify-service-by-file;Connection refused
[1335885030] SERVICE NOTIFICATION: nagiosadmin;glassfish-server;Certificate;CRITICAL;notify-service-by-file;Connection refused
[1335885030] SERVICE NOTIFICATION: nagiosadmin;glassfish-server;Coach;CRITICAL;notify-service-by-file;HTTPAUTH CRITICAL: authentication failed
[1335885050] SERVICE NOTIFICATION: nagiosadmin;glassfish-server;AppCenter;CRITICAL;notify-service-by-file;Connection refused
[1335885050] SERVICE NOTIFICATION: nagiosadmin;glassfish-server-asml-eps;Navigator;CRITICAL;notify-service-by-file;HTTPAUTH CRITICAL: authentication failed

And like I said, as far as I can see the configuration on the previous version of this platform is the same, and there it works. I am probably overlooking something, but I don't know what.
Hope you folks can help me along!

cheers,
Andy

Re: Global event handler appears not to work

Posted: Tue May 01, 2012 7:37 pm
by jsmurphy
In my experience, issues like this, where Nagios appears to be processing everything correctly but nothing comes out of the notification, are nine times out of ten caused by a permissions problem or a misconfiguration in the output utility.

I would su to the Nagios user and make sure that you have permission to write to the directory, and that nothing funky has happened to the permissions that prevents you from executing printf (unlikely). Also, if you have SELinux enabled, disable it temporarily and test again, because SELinux is the silent but deadly breaker of all things.

Re: Global event handler appears not to work

Posted: Wed May 02, 2012 1:03 am
by andy_kann
I assumed that as well. I ran the printf command as the nagios user, and it produced an XML file in the specified directory, so I conclude that permissions are OK.

I don't have SELinux running, so I can rule that out too. My gut feeling is that I am overlooking a small setting related to the global event handler. I have checked every config file almost a dozen times and compared it to the Nagios config on the older version (which DOES work!), but to no avail yet.

Any hints or tips are more than welcome!

cheers,
Andy

Re: Global event handler appears not to work

Posted: Wed May 02, 2012 4:35 am
by andy_kann
OK, I've fixed the problem. It turned out I had forgotten the second $ needed to expand a variable I defined in resource.cfg. For some reason, Nagios processes the command, blurts out a bunch of macro errors and then, silence: no error saying that the printf command did not succeed. Anyway, adding the second $ fixed that.
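To illustrate the mistake (only the relevant part of the command_line is shown; everything else stays as originally posted): Nagios recognizes a macro only when its name is enclosed in a pair of $ signs. With the closing $ missing, the $ signs that follow are presumably paired up with the wrong partners, so each later macro name is misread, which would fit the burst of macro errors.

```
# Broken: $EPSNAME has no closing $
<epsname>$EPSNAME</epsname>

# Fixed: macro name enclosed in $...$
<epsname>$EPSNAME$</epsname>
```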

However, now the variable expands to $ instead of the value I gave it in resource.cfg. Any hints on that?

cheers,
Andy

Re: Global event handler appears not to work

Posted: Wed May 02, 2012 10:11 am
by agriffin
Are you sure the macro is defined and nagios is reading its definition? I've only run into that problem before when I tried using an undefined macro.

Re: Global event handler appears not to work

Posted: Wed May 02, 2012 6:10 pm
by jsmurphy
agriffin wrote: "Are you sure the macro is defined and nagios is reading its definition? I've only run into that problem before when I tried using an undefined macro."
Exactly this. I had the same thing happen when I tried to use a custom macro I had defined on a service but forgot to add the service prefix.
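A sketch of the situation described above, using Nagios custom object variables: a variable defined on a service as _EPSNAME is referenced in a command as $_SERVICEEPSNAME$, not as $EPSNAME$. The service definition is hypothetical: the value EPS01, the generic-service template from the sample configuration and the check_command are assumptions; only the _EPSNAME line and the macro reference matter.

```
# Hypothetical service with a custom variable named _EPSNAME
define service{
    use                   generic-service   ; assumed template
    host_name             glassfish-server
    service_description   ACMA
    check_command         check_http        ; assumed; keep the real check
    _EPSNAME              EPS01             ; hypothetical value
    }

# Referenced in a command_line with the _SERVICE prefix:
#   <epsname>$_SERVICEEPSNAME$</epsname>
```

For host notifications, the same variable defined on a host would be referenced as $_HOSTEPSNAME$ instead.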

Re: Global event handler appears not to work

Posted: Thu May 03, 2012 4:04 am
by andy_kann
It was an undefined macro: I had not defined it as a $USERx$ macro. When I changed that, it expanded correctly.
Thanks for the help!
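The fix described above could look roughly like this: define the value as a $USERn$ macro in the resource file, then reference that macro in both command definitions. The macro number ($USER2$) and the value EPS01 are hypothetical; use a free $USERn$ slot and the real value.

```
# In the resource file named by resource_file= in nagios.cfg
# ($USER2$ and EPS01 are hypothetical)
$USER2$=EPS01

# In notify-host-by-file and notify-service-by-file, replace $EPSNAME with:
#   <epsname>$USER2$</epsname>
```

As this thread found, a plain name such as $EPSNAME$ placed in resource.cfg is not picked up as a macro; defining it as a $USERn$ macro is what made it expand.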

cheers,
Andy

Re: Global event handler appears not to work

Posted: Fri May 04, 2012 9:43 am
by agriffin
Glad I could help!