SOLVED - Nagios 3.5. auto acknowledge almost works... almost


Post by JoeGeorge »

I am a novice at Perl programming, so thank you in advance for reading and trying to solve my issue.

Problem:
The final step never seems to happen.
I replied to a CRITICAL alert by email (from my smartphone), and Nagios seems to receive and process the reply.
But the acknowledgement icon never appears on the problem service or host.

Maybe one of you can help identify anything that's missing in the configuration or scripts.


I have done my homework and been successful in implementing Nagios 3.5.

The last piece of the puzzle is the "auto acknowledge via email" feature.

I've followed several tips:
http://www.techopsguys.com/2010/01/05/a ... l-replies/

http://www.mail-archive.com/nagios-user ... 21705.html

http://www.tek-tips.com/viewthread.cfm?qid=1641878

=========



Here is the output of the Perl script (I enabled stdout logging and diagnostics):

Subject: Re: ** PROBLEM Service Alert: prod ETS nagios/fs root is CRITICAL **
Servicedescription = fs root
Hostname = prod ETS nagios
Ack: service
Joe [1388671334] ACKNOWLEDGE_SVC_PROBLEM;prod ETS nagios;fs root;1;1;0;emailuser;acknowledged through email
SeenSubject: 1
SeenFrom: 1
Skipped: 0
AckUser: emailuser
Subject: Re: ** PROBLEM Service Alert: prod ETS nagios/fs root is CRITICAL **
Servicedescription = fs root
Hostname = prod ETS nagios
Ack: service
Joe [1388671746] ACKNOWLEDGE_SVC_PROBLEM;prod ETS nagios;fs root;1;1;0;emailuser;acknowledged through email
SeenSubject: 1
SeenFrom: 1
Skipped: 0
AckUser: emailuser
Subject: Re: ** PROBLEM Service Alert: prod ETS nagios/fs root is CRITICAL **
Servicedescription = fs root
Hostname = prod ETS nagios
Ack: service
SUBMITTING ACK by Joe [1388672724] ACKNOWLEDGE_SVC_PROBLEM;prod ETS nagios;fs root;1;1;0;emailuser;acknowledged through email
SeenSubject: 1
SeenFrom: 1
Skipped: 0
AckUser: emailuser
**************

And the actual script:

#!/usr/bin/perl -w

use diagnostics -verbose;
open(STDOUT, ">>/home/nagios/script.log") or die $!;

# Nagios email ack script
#
# mdkeller - 2009/05/11
#
# Script that runs out of .procmailrc and parses email
# If it determines that it is a reply to an alert, then we
# submit an acknowledgement to Nagios. If not we then
# forward the email to the nagios admin.

use strict;
use Getopt::Std;

my $msg = "";
my $acked = 0;
my $debug = 1;
my $commandfile = "/home/nagios/nagios.cmd.test";
my $nagiosadmin = "my_name_here\@mycompany_here.com";
my $ackusername = "emailuser";
my $acksubject = "acknowledged through email";

sub usage {
    print "Usage: $0 [-h] [-d]\n";
    print "\t-h\t\thelp\n";
    print "\t-d\t\tRun in debug mode and don't ack or email\n";
    exit 1;
}

# Get the options from the command line.
my %options = ();
getopts("hd", \%options);

if($options{h}) {
    usage();
}

if($options{d}) {
   $debug = 1;
}

my $problem;
my $service = 0;
my $host = 0;
my $servicedescription = "";
my $hostname = "";
my $ackmesg = "";
my $skip = 0;


# Need to make sure we only check the first subject and from
my $seensubject = 0;
my $seenfrom = 0;
# To tell us if the subject matched so we can send an ACK
my $subjectmatch = 0;

# Loop through the email message
while(<STDIN>) {
    # Save message for possible later use if this isn't an acknowledgement
    $msg .= $_;

    # Basically look for certain lines in the message. If they match we set a rule for
    # if it is a host or service acknowledgement.
    # For a service ack
    #        - We need the servicedescription and hostname
    #        - We also check to make sure it is WARNING or CRITICAL this means it is not a recovery
    # For a host ack
    #        -  We just need the hostname
    #        -  We also check to make sure the State is DOWN. This means it is not a recovery
    #
    # Service subjects look like:
    #   Pager: Subject: Re: PROBLEM: idcxpr0193/NRPE CRITICAL
    #
    #   Email: Subject: Re: ** PROBLEM Service Alert: idcxpr0193.con-way.com/IPMI is CRITICAL **
    #
    #if(/^Subject:\s*[Rr][Ee]:.*PROBLEM.*(CRITICAL|WARNING).*$/ && ! $seensubject ) {
    if(/^Subject:\s*[Rr][Ee]:.*PROBLEM.*\/.*$/ && ! $seensubject ) {
        # The string matches a rule for a service acknowledgement
        $service = 1;
        $subjectmatch = 1;
        # Check subject for Problem status
        if($debug) {
            print "$_";
        }
    }
    if(/^Subject:\s*[Rr][Ee]:.*PROBLEM.*DOWN.*$/ && ! $seensubject ) {
        # The string matches a rule for a host acknowledgement
        $host = 1;
        $subjectmatch = 1;
        # Check subject for Problem status
        if($debug) {
            print "$_";
        }
    }
    elsif(/^.*Service:/) {
        # The string matches a rule for a service acknowledgement
        $service = 1;
        $servicedescription = $_;
        $servicedescription =~ s/^.*Service:\s+(.*)\s*$/$1/;
        if($debug) {
            #print "$_";
            print "Servicedescription = $servicedescription\n";
        }
    }
    elsif(/^.*Host:/) {
        $hostname = $_;
        $hostname =~ s/^.*Host:\s+(.*)\s*$/$1/;
        $hostname =~ s/^(\w*)?\..*$/$1/;
        if($debug) {
            #print "$_";
            print "Hostname = $hostname\n";
        }
    }
    elsif(/^.*State:.*DOWN.*/) {
        # This is a host down problem
        #$host = 1;
    }
    elsif(/^.*State:.*(CRITICAL|WARNING).*/) {
        # This is a service critical or warning problem
        #$service = 1;
    }
    elsif(/^From:.*MAILER-DAEMON.*$/i && ! $seenfrom ) {
        $skip = 1;
    }
    elsif(/^From:/ && ! $seenfrom ) {
        # See if we can determine the from address so we can set it as the ackuser
        # We want to shorten it so it doesn't have a domain or extra crud
        #
        my $tmpemail = $_;
        chomp($tmpemail);
        if(/</) {
            $tmpemail =~ s/^From:.*<(.*)\@.*>.*$/$1/;
        }
        elsif(/\(/) {
            $tmpemail =~ s/^From:.*\((.*)\@.*\).*$/$1/;
        }
        else {
            $tmpemail =~ s/^From:\s*(.*)\@.*$/$1/;
        }

        # Sadly we can't use the email right now, because Nagios needs to have
        # the username configured as a contact for it to work.
        #
        #if( $tmpemail ne "" && length($tmpemail) < 13 ) {
        #    $ackusername = $tmpemail;
        #}
    }
    elsif(/^comment:/i ) {
        my $tmpcomment = $_;
        chomp($tmpcomment);
        $tmpcomment =~ s/^comment:\s+(.*)$/$1/i;
        print $tmpcomment . "\n";
        if ( $tmpcomment ne "" ) {
            $acksubject = $tmpcomment;
        }
    }

    if (/^Subject:/) {
        # Make sure we only look at the first subject.
        # In case this has an attachment or such.
        $seensubject = 1;
    }

    if (/^From:/) {
        # Make sure we only look at the first from.
        # In case this has an attachment or such.
        $seenfrom = 1;
    }
}

# Determine if it is a host or service acknowledgement and build the submission
if ($service && $subjectmatch) {
    # This is a service acknowledgement so build message
    if ( $servicedescription eq "" ) {
        # Something went wrong and no service description got through
        $service = 0;
    }

    print "Ack: service\n";
    $ackmesg = "ACKNOWLEDGE_SVC_PROBLEM;$hostname;$servicedescription;1;1;0;$ackusername;$acksubject";
}
elsif ($host && $subjectmatch) {
    # This is a host acknowledgement so build message
    print "Ack: host\n";
    $ackmesg = "ACKNOWLEDGE_HOST_PROBLEM;$hostname;1;1;0;$ackusername;$acksubject";

}

# If we aren't supposed to skip an ack
# AND it is a service or host ack
# AND the ack message is set
# THEN submit the ACK to nagios
if ( ! $skip && ( $service || $host ) && $ackmesg ne "" && $hostname ne "" ) {
    my $now = time();
    if ( $debug ) {
        print "SUBMITTING ACK by Joe [$now] $ackmesg\n";
    }
    else {
        open(CMDFILE, ">> $commandfile") || die "Can't open $commandfile!\n";
        print CMDFILE "[$now] $ackmesg\n";
        close(CMDFILE);
    }
    $acked = 1;
}

# If the email was not acked then forward it to the Nagios admins
if ( ! $acked && ! $debug ) {
    open(MAIL, "|/usr/sbin/sendmail -t");

    print MAIL "From: $nagiosadmin\n";
    print MAIL "To: $nagiosadmin\n";
    print MAIL "Subject: Nagios email\n\n";
    print MAIL $msg;
}

if ( $debug ) {
    print "SeenSubject: $seensubject\n";
    print "SeenFrom: $seenfrom\n";
    print "Skipped: $skip\n";
    print "AckUser: $ackusername\n";
}
============
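A side note on the script as posted: `$debug` is initialized to 1 near the top, and the final block only opens the command file in the `else` branch, so with that setting the ack line is printed to the log and nothing is written to `$commandfile`. The shell sketch below mirrors that branch to make the behavior easier to see. It is only an illustration; the function name and its arguments are made up and are not part of the Perl script or of Nagios.

```shell
#!/bin/sh
# Sketch of the submit step in the Perl script; names are hypothetical.
submit_ack() {
    # $1 = ack command without timestamp
    # $2 = path to the command file
    # $3 = debug flag (0 or 1)
    now=$(date +%s)
    if [ "$3" -eq 1 ]; then
        # Debug mode: only report what would be sent, write nothing.
        echo "SUBMITTING ACK [$now] $1"
    else
        # Normal mode: append the timestamped command to the command file.
        printf '[%s] %s\n' "$now" "$1" >> "$2"
    fi
}
```

Called with a debug flag of 1, the target file stays untouched; called with 0, one `[timestamp] COMMAND` line is appended.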


And the content of the external command file:
[1388177940] ACKNOWLEDGE_SVC_PROBLEM;prod ETS nagios;fs root;1;1;0;emailuser;acknowledged through email
[1388178286] ACKNOWLEDGE_SVC_PROBLEM;prod ETS nagios;fs root;1;1;0;emailuser;acknowledged through email
[1388639210] ACKNOWLEDGE_SVC_PROBLEM;prod ETS nagios;Current Load;1;1;0;emailuser;acknowledged through email
[1388641901] ACKNOWLEDGE_SVC_PROBLEM;prod ETS nagios;Current Load;1;1;0;emailuser;acknowledged through email
[1388663619] ACKNOWLEDGE_SVC_PROBLEM;prod ETS nagios;Current Load;1;1;0;emailuser;acknowledged through email

===========================
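For anyone reading along, each line above is a bracketed timestamp followed by semicolon-separated fields: command name, host name, service description, sticky, notify, persistent, author, comment. As far as I know, the host name and service description have to match the names Nagios itself uses for the objects (the host_name, not an alias), so it can be worth comparing them against nagios.log. A small shell sketch that labels the fields, with a made-up function name:

```shell
#!/bin/sh
# Hypothetical helper: label the fields of one ACKNOWLEDGE_SVC_PROBLEM line.
describe_svc_ack() {
    # $1 = one line, e.g. "[1388177940] ACKNOWLEDGE_SVC_PROBLEM;host;svc;1;1;0;user;text"
    body=${1#*"] "}    # drop the leading "[timestamp] "
    printf '%s\n' "$body" | {
        # Split on semicolons only, so spaces inside names are preserved.
        IFS=';' read -r cmd host svc sticky notify persistent author comment
        echo "command=$cmd host=$host service=$svc sticky=$sticky notify=$notify persistent=$persistent author=$author comment=$comment"
    }
}
```

Feeding it one of the lines above makes it easy to see which value ends up in the host name slot.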
Last edited by JoeGeorge on Mon Jan 06, 2014 7:46 pm, edited 1 time in total.

Post by abrist »

Well, from everything we can see here, it looks as though the acknowledgement worked as expected. When you try to view the acknowledgements for an object, do you get any HTTP errors?

tail -25 /var/log/httpd/error_log

Post by JoeGeorge »

For other service items that I acknowledge manually, I can view the acknowledgement without any problem.

My problem is very strange.
My gut feeling is that there is some tiny thing I have missed.

Maybe you guys can point me in the right direction.

What has been confirmed:
- procmail - OK
- Perl script to auto-ack - OK
- Nagios core - OK (in general)
- What did I miss?

Thanks so much....

Post by JoeGeorge »

Here are the error_log files. No errors have been reported for January 2014 yet.
I last tested auto ack via email yesterday, 2 Jan 2014.

# ls -al error*
-rw-r--r-- 1 root root 255 Dec 29 03:13 error_log
-rw-r--r-- 1 root root 331 Dec 8 03:48 error_log-20131208
-rw-r--r-- 1 root root 331 Dec 15 03:37 error_log-20131215
-rw-r--r-- 1 root root 331 Dec 22 03:40 error_log-20131222
-rw-r--r-- 1 root root 741 Dec 29 03:13 error_log-20131229


And the content of the latest error log:
[Sun Dec 29 03:13:01 2013] [notice] Digest: generating secret for digest authentication ...
[Sun Dec 29 03:13:01 2013] [notice] Digest: done
[Sun Dec 29 03:13:01 2013] [notice] Apache/2.2.15 (Unix) DAV/2 PHP/5.3.3 configured -- resuming normal operations



thank you

Post by JoeGeorge »

By the way...

How is this "ACKNOWLEDGE_SVC_PROBLEM" invoked?
where is this built-in function located?
Is there a way to execute it manually - at will?

What is the actual nagios script/built-in task that makes the change to an alert?

thank you

Post by scottwilkerson »

JoeGeorge wrote: By the way...

How is this "ACKNOWLEDGE_SVC_PROBLEM" invoked?
where is this built-in function located?
Is there a way to execute it manually - at will?

What is the actual nagios script/built-in task that makes the change to an alert?

thank you
It is read in from the command pipe, and is an internal command.

In your script you likely need to change the line

my $commandfile = "/home/nagios/nagios.cmd.test";

to point to the actual location of nagios.cmd.

You can find this location in nagios.cfg; the name of the directive is command_file.
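As a rough sketch of the above, the command_file value can be read out of nagios.cfg and an acknowledgement line built by hand in the same format the script in this thread prints. The function names are hypothetical, and the nagios.cfg path used in the usage note is only an assumption based on the /usr/local/nagios paths posted in this thread; adjust it for your install.

```shell
#!/bin/sh
# Sketch only: these function names are made up, not part of Nagios.

# Print the value of the command_file directive from a nagios.cfg-style file.
find_command_file() {
    # $1 = path to nagios.cfg
    sed -n 's/^[[:space:]]*command_file[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Build one external command line:
# [epoch] ACKNOWLEDGE_SVC_PROBLEM;host;service;sticky;notify;persistent;author;comment
build_svc_ack() {
    # $1 = host name, $2 = service description, $3 = author, $4 = comment
    printf '[%s] ACKNOWLEDGE_SVC_PROBLEM;%s;%s;1;1;0;%s;%s\n' "$(date +%s)" "$1" "$2" "$3" "$4"
}
```

To submit by hand, something like `build_svc_ack "server1269" "fs root" emailuser "acknowledged through email" > "$(find_command_file /usr/local/nagios/etc/nagios.cfg)"` should do it, run as a user that is allowed to write to the pipe.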

Post by JoeGeorge »

I will try to change it back to nagios.cmd

It was that way before. It didn't work.
So I needed to change it to see what was written to it.

I'll update again later after I am back in the office and try.

thanks

Post by sreinhardt »

Sounds great, let us know how it goes. I would say that inputting to the command pipe is probably the easiest route if you are local to the nagios installation.

Post by JoeGeorge »

OK..

I changed the script so that these lines now read:
my $msg = "";
my $acked = 0;
my $debug = 1;
##my $commandfile = "/home/nagios/nagios.cmd.test";
my $commandfile = "/usr/local/nagios/var/rw/nagios.cmd";



Portion of nagios.log - look at "server1269": there is NO ACKNOWLEDGEMENT, even though I sent one =========
[1389026889] SERVICE ALERT: server1188;file count Outbound;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1389026939] SERVICE ALERT: server1269;fs root;CRITICAL;HARD;4;DISK CRITICAL - free space: / 7512 MB (78% inode=87%):
[1389026939] SERVICE NOTIFICATION: user1;server1269;fs root;CRITICAL;notify-service-by-email;DISK CRITICAL - free space: / 7512 MB (78% inode=87%):
[1389026939] SERVICE NOTIFICATION: user2;server1269;fs root;CRITICAL;notify-service-by-email;DISK CRITICAL - free space: / 7512 MB (78% inode=87%):
[1389026939] SERVICE NOTIFICATION: nagiosadmin;server1269;fs root;CRITICAL;notify-service-by-email;DISK CRITICAL - free space: / 7512 MB (78% inode=87%):
[1389026949] SERVICE ALERT: server1188;file count Outbound;CRITICAL;SOFT;2;CHECK_NRPE: Socket timeout after 10 seconds.
[1389026979] SERVICE ALERT: server1188;file count Outbound;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 10 seconds.
[1389027009] SERVICE ALERT: server1188;file count Outbound;OK;SOFT;3;0 Files - ok on cdcxpr1188
[1389027039] SERVICE ALERT: server1188;file count Outbound;OK;SOFT;2;0 Files - ok on cdcxpr1188
[1389027159] SERVICE ALERT: etsio-prod;Check AS2 HTTPS-4443;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[1389027209] SERVICE ALERT: etsio-prod;Check AS2 HTTPS-4443;OK;SOFT;2;HTTP OK: HTTP/1.1 200 OK - 180 bytes in 0.267 second response time
=================================


The procmail log file has this (I masked my real email address) ==========
procmail: [837] Mon Jan 6 08:51:11 2014
procmail: Assigning "PATH=/usr/bin"
procmail: Assigning "MATCH="
procmail: Matched "Re: ** PROBLEM Service Alert: prod ETS nagios/fs root is CRITICAL **"
procmail: Match on "^Subject: [ ]*\/[^ ].*"
procmail: Assigning "LASTFOLDER=/usr/local/nagios/etc/scripts/nagios-ack-by-email-dev.pl Re: ** PROBLEM Service Alert: prod ETS nagios/fs root is CRITICAL **"
procmail: Notified comsat: "nagios@:/usr/local/nagios/etc/scripts/nagios-ack-by-email-dev.pl Re: ** PROBLEM Service Alert: prod ETS nagios/fs root is CRITICAL **"
From [email protected] Mon Jan 6 08:51:11 2014
Subject: Re: ** PROBLEM Service Alert: prod ETS nagios/fs root is CRITICAL **
Folder: /usr/local/nagios/etc/scripts/nagios-ack-by-email-dev.pl Re: 3713
procmail: Executing "/usr/local/nagios/etc/scripts/nagios-ack-by-email-dev.pl,Re: ** PROBLEM Service Alert: prod ETS nagios/fs root is CRITICAL **"
Last edited by JoeGeorge on Mon Jan 06, 2014 3:48 pm, edited 1 time in total.

Post by sreinhardt »

What user is your script being executed as? I just want to be sure it is in a group that has write permission to nagios.cmd.
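One way to check this, sketched under the assumption that the command file is the /usr/local/nagios/var/rw/nagios.cmd path posted earlier in the thread: confirm that the path is a named pipe and that the current user can write to it. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reports whether the current user can submit
# external commands through the given command file.
check_command_file() {
    # $1 = path to the command file (command_file in nagios.cfg)
    if [ ! -p "$1" ]; then
        echo "not a named pipe: $1"
        return 1
    fi
    if [ ! -w "$1" ]; then
        echo "not writable by $(id -un): $1"
        return 1
    fi
    echo "ok: $(id -un) can write to $1"
}
```

Run `check_command_file /usr/local/nagios/var/rw/nagios.cmd` as the same user that procmail executes the script as, since a check run as root will report the pipe as writable regardless of its group.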