Unix Log File Monitoring using Nagios XI

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Srinija544
Posts: 58
Joined: Mon Oct 15, 2018 9:30 pm

Unix Log File Monitoring using Nagios XI

Post by Srinija544 »

Moderator Note: Content removed per request
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Unix Log File Monitoring using Nagios XI

Post by scottwilkerson »

We do have a whole program dedicated to this functionality, Nagios Log Server:
https://www.nagios.com/products/nagios-log-server/

However, if you are having a problem using the plugin you gave, can you give an example of the command you are using and the problem you are seeing?
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart

Re: Unix Log File Monitoring using Nagios XI

Post by Srinija544 »

Hi scottwilkerson,

Thank you for your response.

Please find the command which we are using to get this done:

command[check_file_content_dc1uat163]=/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "ERROR|WARN|FATAL"

Output:
OK for /apps/wildfly-11.0.0/standalone/log/server.log.2019-11-04 (3581 found)

This is not appropriate for our requirement.

We need to get a critical alert when we see "ERROR|WARN|FATAL" in our Log File.

Please let me know if any further details are required in solving this issue.

Regards,
Srinija.
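
For reference, the behaviour requested here (CRITICAL as soon as any line matches, OK otherwise) can be sketched in plain shell. This is only an illustration under assumptions, not the check_file_content.pl plugin: check_log_patterns is a hypothetical helper name, the optional threshold argument is an assumption, and it relies on grep -E being available on the monitored host.

```shell
#!/bin/sh
# Sketch only: check_log_patterns is a hypothetical helper, not part of
# check_file_content.pl.
# Usage: check_log_patterns FILE PATTERN [THRESHOLD]
check_log_patterns() {
    file=$1
    pattern=$2
    threshold=${3:-1}   # assumed default: one matching line is enough

    if [ ! -r "$file" ]; then
        echo "UNKNOWN: cannot read $file"
        return 3
    fi

    # grep -c prints the number of matching lines. It exits 1 when there
    # are none, so the printed count is used instead of the exit status.
    count=$(grep -cE -- "$pattern" "$file" || true)

    if [ "$count" -ge "$threshold" ]; then
        echo "CRITICAL: $count matching lines in $file"
        return 2
    fi
    echo "OK: $count matching lines in $file"
    return 0
}

# Example call against yesterday's log (GNU date syntax, as in the command above):
# check_log_patterns "/apps/wildfly-11.0.0/standalone/log/server.log.$(date --date='yesterday' +%Y-%m-%d)" 'ERROR|WARN|FATAL'
```

The function returns the status instead of calling exit so it can be sourced and tried interactively; a real plugin script would exit with that value.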

Re: Unix Log File Monitoring using Nagios XI

Post by scottwilkerson »

Srinija544 wrote: We need to get a critical alert when we see "ERROR|WARN|FATAL" in our Log File.

Then I believe you want to change the -i to -e:

command[check_file_content_dc1uat163]=/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -e "ERROR|WARN|FATAL"
Then restart NRPE.

Re: Unix Log File Monitoring using Nagios XI

Post by Srinija544 »

Hi scottwilkerson,

I have replaced the command as mentioned.

Command:
command[check_file_content_dc1uat163]=/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -e "ERROR|WARN|FATAL"

We are getting this alert:

UNKNOWN:

Usage : check_file_content.pl -f file -i include -e exclude -n lines_number [-h]

Options :
-f
Full path to file to analyze
-n
Number of lines to find (default is 1)
-i
Include pattern (can add multiple include)
-e
Exclude pattern (can add multiple include)
-h, --help
Print this help screen

Example : check_file_content.pl -f /etc/passwd -i 0 -e root -n 5

Please check and let me know for any further information.

Regards,
Srinija.

Re: Unix Log File Monitoring using Nagios XI

Post by scottwilkerson »

Can you run this from the CLI on the remote machine?

/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -e "ERROR|WARN|FATAL"
If that does not work, let's try:

/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -e "ERROR" -e "WARN" -e "FATAL"

Re: Unix Log File Monitoring using Nagios XI

Post by Srinija544 »

Please see the output after running two commands:
=====================================================================================================================
[root@dc1uat163 ~]# /usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -e "ERROR|WARN|FATAL"
Usage : check_file_content.pl -f file -i include -e exclude -n lines_number [-h]

Options :
-f
Full path to file to analyze
-n
Number of lines to find (default is 1)
-i
Include pattern (can add multiple include)
-e
Exclude pattern (can add multiple include)
-h, --help
Print this help screen

Example : check_file_content.pl -f /etc/passwd -i 0 -e root -n 5
[root@dc1uat163 ~]# /usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -e "ERROR" -e "WARN" -e "FATAL"
Usage : check_file_content.pl -f file -i include -e exclude -n lines_number [-h]

Options :
-f
Full path to file to analyze
-n
Number of lines to find (default is 1)
-i
Include pattern (can add multiple include)
-e
Exclude pattern (can add multiple include)
-h, --help
Print this help screen

Example : check_file_content.pl -f /etc/passwd -i 0 -e root -n 5
[root@dc1uat163 ~]#
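
A note for anyone reproducing this from the CLI: the help screen above is what the plugin prints before exiting with the UNKNOWN code, and the version of the script posted later in this thread calls its help routine unless both -f and at least one -i are supplied, which would explain why a command with only -e lands here. The sketch below translates a plugin's exit status into the standard Nagios state names (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); nagios_status_name is a hypothetical helper, not part of any Nagios package.

```shell
#!/bin/sh
# Sketch only: nagios_status_name is a hypothetical helper for illustration.
# It maps a plugin exit status to the state name Nagios would report.
nagios_status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "UNKNOWN (unexpected exit code $1)" ;;
    esac
}

# Usage: run the plugin, then translate its exit status, for example:
# /usr/local/nagios/libexec/check_file_content.pl -f /path/to/log -i "ERROR"
# nagios_status_name $?
```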

Re: Unix Log File Monitoring using Nagios XI

Post by scottwilkerson »

I just read the code, and it doesn't look like it does what you want. However, if you make the following change to the plugin, it should give you the desired results.

Change this line:

if ($i > $num)
to this:

if ($i < $num)
Then run it as you had it originally:

/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "ERROR|WARN|FATAL"

Re: Unix Log File Monitoring using Nagios XI

Post by Srinija544 »

I have changed the script as mentioned, but we still have a problem when I run this command:

Command:
===============================
/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "ERROR|WARN|FATAL"
======================================
We are getting this output:
====================================
[root@dc1uat163 log]# /usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "ERROR|WARN|FATAL"
FAILED on /apps/wildfly-11.0.0/standalone/log/server.log.2019-11-10. Found only 270 on 1
====================================================
This means the count is 270 regardless of whether the match is ERROR, FATAL, or WARN. However, when I check each pattern separately, the alert should be OK if the pattern is not found in the file, and that is not happening. Please find the outputs below:
===================================================
If I use only WARN as the pattern:
output:

[root@dc1uat163 log]# /usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "WARN"
FAILED on /apps/wildfly-11.0.0/standalone/log/server.log.2019-11-10. Found only 270 on 1

but for the other two patterns:
[root@dc1uat163 log]# /usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "FATAL"
FAILED on /apps/wildfly-11.0.0/standalone/log/server.log.2019-11-10
==============================================================
[root@dc1uat163 log]# /usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "ERROR"
FAILED on /apps/wildfly-11.0.0/standalone/log/server.log.2019-11-10
======================================================================================
If the mentioned patterns are not found in the file, then the alert should be in the OK state, but we are getting CRITICAL.

Please help me in correcting this issue.

Regards,
Srinija.

Re: Unix Log File Monitoring using Nagios XI

Post by scottwilkerson »

This is not our plugin, so help from the author may be better. To try to assist, I rewrote a portion of the file; you can try this:

#!/usr/bin/perl
#===============================================================================
#
# FILE: check_file_content.pl
#
# USAGE: ./check_file_content.pl
#
# DESCRIPTION: Nagios plugin to check file content
#
# OPTIONS: ---
# REQUIREMENTS: ---
# BUGS: ---
# NOTES: ---
# AUTHOR: Pierre Mavro (), [email protected]
# COMPANY:
# VERSION: 0.1
# CREATED: 10/05/2010 09:25:56
# REVISION: ---
#===============================================================================

use warnings;
use strict;
use Getopt::Long;

my %RETCODES = ('OK' => 0, 'WARNING' => 1, 'CRITICAL' => 2, 'UNKNOWN' => 3);

# Help
sub help
{
    print "Usage : check_file_content.pl -f file -i include -e exclude -n lines_number [-h]\n\n";
    print "Options :\n";
    print " -f\n\tFull path to file to analyze\n";
    print " -n\n\tNumber of lines to find (default is 1)\n";
    print " -i\n\tInclude pattern (can add multiple include)\n";
    print " -e\n\tExclude pattern (can add multiple include)\n";
    print " -h, --help\n\tPrint this help screen\n";
    print "\nExample : check_file_content.pl -f /etc/passwd -i 0 -e root -n 5\n";
    exit $RETCODES{"UNKNOWN"};
}

sub check_args
{
    # defined(@array) is a fatal error on Perl 5.22+, so test the array directly
    help unless @ARGV;

    my ($file,@include,@exclude);
    my $num=1;

    # Set options
    GetOptions( "help|h" => \&help,
                "f=s"    => \$file,
                "i=s"    => \@include,
                "e=s"    => \@exclude,
                "n=i"    => \$num);

    # A file and at least one include pattern are mandatory
    unless (($file) and (@include))
    {
        &help;
    }
    else
    {
        check_soft($file,$num,\@include,\@exclude);
    }
}

sub check_soft
{
    my $file=shift;
    my $num=shift;
    my $ref_include=shift;
    my $ref_exclude=shift;
    my @include = @$ref_include;
    my @exclude = @$ref_exclude;
    my $i=0;

    if (!open(FILER, "<$file"))
    {
        print "Can't open $file: $!\n";
        exit $RETCODES{"CRITICAL"};
    }

    while(<FILER>)
    {
        chomp($_);
        my $line=$_;
        my $found=0;

        # Should match
        foreach (@include)
        {
            if ($line =~ /$_/)
            {
                $found=1;
                last;
            }
        }

        # Shouldn't match
        if (@exclude)
        {
            foreach (@exclude)
            {
                if ($line =~ /$_/)
                {
                    $found=0;
                    last;
                }
            }
        }

        $i++ if ($found == 1);
    }
    close(FILER);

    # CRITICAL once the number of matching lines reaches $num (default 1);
    # OK when nothing matched or the count is still below $num
    if ($i > 0)
    {
        if ($i < $num)
        {
            print "OK for $file ($i found)\n";
            exit $RETCODES{"OK"};
        }
        else
        {
            print "FAILED on $file. Found only $i on $num\n";
            exit $RETCODES{"CRITICAL"};
        }
    }
    else
    {
        print "OK on $file\n";
        exit $RETCODES{"OK"};
    }
}

check_args;
And I believe with multiple args you need to run it like this:

/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "ERROR" -i "WARN" -i "FATAL"
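
As a rough cross-check, the same count can be reproduced with grep. In the script above each -i value is matched as a regular expression and a line is counted once if any of them matches, so a single alternation and three separate patterns should select the same lines. The helper names below are hypothetical, and this is only a sketch, not a replacement for the plugin.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical names used for illustration.

# Count lines matching one extended regex that uses alternation.
count_alternation() {
    grep -cE -- 'ERROR|WARN|FATAL' "$1" || true
}

# Count lines matching any of three separate patterns.
count_repeated() {
    grep -c -e 'ERROR' -e 'WARN' -e 'FATAL' -- "$1" || true
}

# Usage: compare both counts for the same log file, for example:
# count_alternation /path/to/server.log
# count_repeated /path/to/server.log
```

A line containing more than one keyword is still counted once in both forms, which mirrors how the script's include loop stops at the first match.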