Global event handler appears not to work

andy_kann
Posts: 4
Joined: Tue May 01, 2012 10:00 am

Global event handler appears not to work

Post by andy_kann »

Hi all,

I'm running into a really annoying problem.
I've inherited a Nagios setup that monitors a system with limited or no access to the outside world. The main way to connect is FTP. Nagios has been set up to write all host and service events to XML files in a directory on the Nagios server; these are downloaded via FTP to see what has happened on the system and for trending and the like. Not an ideal situation, but the best we can do for now.
On the previous version of the system, writing the events to XML files worked. We are currently working on an upgrade of the complete platform, and to learn the ropes of Nagios I decided to figure out how the Nagios configuration was set up. I've got everything running now (and documented!), except writing the XML files. For some reason, I can't get it to work. So, I turn to this forum...

OK, some details... Nagios is version 3.2.3, running on RHEL 5u7. In /etc/nagios/nagios.cfg I've enabled 'global_host_event_handler', which points to 'notify-host-by-file', and 'global_service_event_handler', which points to 'notify-service-by-file'. The 'notify-host-by-file' and 'notify-service-by-file' commands are defined in /etc/nagios/objects/commands.cfg and look like this:

# 'notify-host-by-file' command definition
define command{
command_name notify-host-by-file
command_line /usr/bin/printf "%b" "<?xml version=\"1.0\"?>\n<\0041-- Nagios notification -->\n<notification>\n<epsname>$EPSNAME</epsname>\n<type>$NOTIFICATIONTYPE$</type>\n<host>$HOSTNAME$</host>\n<address>$HOSTADDRESS$</address>\n<state>$HOSTSTATE$</state>\n<date>$DATE$</date>\n<time>$TIME$</time>\n<output><\0041[CDATA[$HOSTOUTPUT$]]></output>\n<info><\0041[CDATA[$LONGHOSTOUTPUT$]]></info>\n</notification>\n" > "/opt/nagios/host_notification_$HOSTNOTIFICATIONID$_$HOSTSTATE$.xml"
}

# 'notify-service-by-file' command definition
define command{
command_name notify-service-by-file
command_line /usr/bin/printf "%b" "<?xml version=\"1.0\"?>\n<\0041-- Nagios notification -->\n<notification>\n<epsname>$EPSNAME</epsname>\n<type>$NOTIFICATIONTYPE$</type>\n<service>$SERVICEDESC$</service>\n<host>$HOSTALIAS$</host>\n<address>$HOSTADDRESS$</address>\n<state>$SERVICESTATE$</state>\n<date>$DATE$</date>\n<time>$TIME$</time>\n<output><\0041[CDATA[$SERVICEOUTPUT$]]></output>\n<info><\0041[CDATA[$LONGSERVICEOUTPUT$]]></info>\n</notification>\n" > "/opt/nagios/service_notification_$SERVICENOTIFICATIONID$_$SERVICESTATE$.xml"
}

I've been toying with this for days now, stopping and restarting services that Nagios monitors, but the file is not written to disk. However, the Nagios log file shows that 'notify-service-by-file' is called (at least, that is my interpretation):
[1335884520] SERVICE NOTIFICATION: nagiosadmin;esx03;HW check;CRITICAL;notify-service-by-file;CRITICAL - could not contact snmp agent, wrong device
[1335884560] SERVICE NOTIFICATION: nagiosadmin;vcenter;Antivirus Actualization;CRITICAL;notify-service-by-file;CRITICAL 329 days ago without updating.
[1335884910] SERVICE NOTIFICATION: nagiosadmin;glassfish-server;AppCenter-REST;CRITICAL;notify-service-by-file;Connection refused
[1335885030] SERVICE NOTIFICATION: nagiosadmin;glassfish-server;ACMA;CRITICAL;notify-service-by-file;Connection refused
[1335885030] SERVICE NOTIFICATION: nagiosadmin;glassfish-server;Certificate;CRITICAL;notify-service-by-file;Connection refused
[1335885030] SERVICE NOTIFICATION: nagiosadmin;glassfish-server;Coach;CRITICAL;notify-service-by-file;HTTPAUTH CRITICAL: authentication failed
[1335885050] SERVICE NOTIFICATION: nagiosadmin;glassfish-server;AppCenter;CRITICAL;notify-service-by-file;Connection refused
[1335885050] SERVICE NOTIFICATION: nagiosadmin;glassfish-server-asml-eps;Navigator;CRITICAL;notify-service-by-file;HTTPAUTH CRITICAL: authentication failed

And like I said, as far as I can see, the configuration on the previous version of this platform is the same, and there it works. I am probably overlooking something, but I don't know what.
Hope you folks can help me along!

cheers,
Andy
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Global event handler appears not to work

Post by jsmurphy »

In my experience, issues like this, where Nagios appears to be processing everything correctly but nothing comes out of the notification, are nine times out of ten caused by a permissions issue or a misconfiguration in the output utility.

I would su to the Nagios user and make sure that you have permission to write to the directory, and that nothing funky has happened to the permissions that prevents you from executing printf (unlikely). Also, if you have SELinux, disable it temporarily and run another test, because SELinux is the silent but deadly breaker of all things.
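
As a rough sketch of that permissions check: the function name `check_dir_writable` and the probe file name below are made up for illustration, and `/opt/nagios` is simply the output directory from the command definitions earlier in the thread. It is meant to be run from a shell started as the nagios user.

```shell
#!/bin/sh
# Sketch only: check whether the current user can create a file in a directory.
# check_dir_writable and the probe file name are hypothetical helpers.
check_dir_writable() {
    dir=$1
    probe="$dir/.nagios_write_probe.$$"
    # Try to create an empty probe file; hide the shell's error message on failure.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
    else
        echo "NOT writable: $dir"
        return 1
    fi
}

# Example call (directory taken from the command definitions in this thread):
# check_dir_writable /opt/nagios
```

Creating a real file, rather than only testing the permission bits, should also catch cases such as a full or read-only filesystem.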
andy_kann
Posts: 4
Joined: Tue May 01, 2012 10:00 am

Re: Global event handler appears not to work

Post by andy_kann »

I assumed that too. I tried running the printf command as the nagios user, which results in an XML file in the specified directory. So I can conclude that the permissions are OK.

I don't have SELinux running, so I can rule that out as well. My gut feeling is that I am overlooking a small setting related to the global event handler. I have checked every config file almost a dozen times and compared it to the Nagios config on the older version (which DOES work!), but to no avail so far.

Any hints or tips are more than welcome!

cheers,
Andy
andy_kann
Posts: 4
Joined: Tue May 01, 2012 10:00 am

Re: Global event handler appears not to work

Post by andy_kann »

OK, I've fixed the problem... It turned out that I forgot the second $ needed to expand a variable I defined in resource.cfg. For some reason, Nagios processes the command, blurts out a bunch of macro errors and then, silence. No error saying that the printf command did not succeed. Anyway, adding the second $ fixed that.
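
A rough way to spot this kind of typo is sketched below. It is a heuristic, not something Nagios provides: it strips every well-formed `$NAME$` macro from the file and prints whatever `$NAME` tokens are left over. The function name `find_unclosed_macros` is made up for illustration, and the sketch assumes macro names consist only of uppercase letters, digits and underscores.

```shell
#!/bin/sh
# Heuristic sketch: list macro-like tokens that lack a closing '$'.
# find_unclosed_macros is a hypothetical helper, not a Nagios tool.
find_unclosed_macros() {
    # $1: path to a Nagios object config file, e.g. commands.cfg
    # Step 1: delete every well-formed $NAME$ macro.
    # Step 2: print any remaining $NAME tokens, de-duplicated.
    LC_ALL=C sed 's/\$[A-Z0-9_][A-Z0-9_]*\$//g' "$1" |
        LC_ALL=C grep -o '\$[A-Z0-9_][A-Z0-9_]*' |
        sort -u
}

# Example call (path taken from this thread):
# find_unclosed_macros /etc/nagios/objects/commands.cfg
```

Run against the command definitions in the first post, this should flag `$EPSNAME`. Other uses of `$` in a command line can produce false positives, so treat the output as a list of places to look rather than a verdict.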

However, now the variable expands as $ instead of the value I gave it in resource.cfg... any hints on that?

cheers,
Andy
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: Global event handler appears not to work

Post by agriffin »

Are you sure the macro is defined and nagios is reading its definition? I've only run into that problem before when I tried using an undefined macro.
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Global event handler appears not to work

Post by jsmurphy »

agriffin wrote: Are you sure the macro is defined and nagios is reading its definition? I've only run into that problem before when I tried using an undefined macro.
Exactly this. I had the same thing happen when I tried to use a custom macro I had defined on a service but forgot to add the service prefix.
andy_kann
Posts: 4
Joined: Tue May 01, 2012 10:00 am

Re: Global event handler appears not to work

Post by andy_kann »

It was an undefined macro. I had not defined it as $USERx$. When I changed that, it expanded correctly.
Thanks for the help!
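
For anyone hitting the same thing, a quick sketch for listing which `$USERn$` macros a resource file actually defines. The function name `list_user_macros` is made up for illustration, and the location of resource.cfg is an assumption that depends on the installation.

```shell
#!/bin/sh
# Sketch only: show the $USERn$ definitions in a Nagios resource file.
# list_user_macros is a hypothetical helper, not a Nagios tool.
list_user_macros() {
    # $1: path to resource.cfg
    # Matches lines such as: $USER1$=/usr/lib/nagios/plugins
    LC_ALL=C grep -n '^[[:space:]]*\$USER[0-9][0-9]*\$=' "$1"
}

# Example call (path is an assumption; adjust to your installation):
# list_user_macros /etc/nagios/resource.cfg
```

A name that does not appear in this output in the `$USERn$=value` form, such as a plain `EPSNAME=...` line, is not a defined user macro and will not expand.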

cheers,
Andy
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: Global event handler appears not to work

Post by agriffin »

Glad I could help!