Unix Log File Monitoring using Nagios XI
-
Srinija544
- Posts: 58
- Joined: Mon Oct 15, 2018 9:30 pm
Unix Log File Monitoring using Nagios XI
Moderator Note: Content removed per request
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Unix Log File Monitoring using Nagios XI
We do have a whole program dedicated to this functionality, Nagios Log Server:
https://www.nagios.com/products/nagios-log-server/
However, if you are having a problem using the plugin you gave, can you give an example of the command you are using and the problem you are seeing?
-
Srinija544
- Posts: 58
- Joined: Mon Oct 15, 2018 9:30 pm
Re: Unix Log File Monitoring using Nagios XI
Hi scottwilkerson,
Thank you for your response.
Please find the command which we are using to get this done:
command[check_file_content_dc1uat163]=/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "ERROR|WARN|FATAL"
Output:
OK for /apps/wildfly-11.0.0/standalone/log/server.log.2019-11-04 (3581 found)
This is not appropriate for our requirement.
We need to get a critical alert when we see "ERROR|WARN|FATAL" in our Log File.
Please let me know if any further details are required in solving this issue.
Regards,
Srinija.
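For readers following along: the backtick expression in the command above is expanded by the shell to yesterday's date, which is why the plugin reports a file name such as server.log.2019-11-04. A minimal sketch of that expansion, assuming GNU date (the --date option is GNU-specific); the helper name yesterday_log is hypothetical and used only for illustration:

```shell
# Hypothetical helper (not part of the plugin): build the path of
# yesterday's rotated log from a base path.
yesterday_log() {
    base=$1
    # $(...) is the modern equivalent of the backticks used in the NRPE command.
    printf '%s.%s\n' "$base" "$(date --date='yesterday' +%Y-%m-%d)"
}

# Prints the base path followed by a dot and yesterday's date (YYYY-MM-DD).
yesterday_log /apps/wildfly-11.0.0/standalone/log/server.log
```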
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Unix Log File Monitoring using Nagios XI
Srinija544 wrote:We need to get a critical alert when we see "ERROR|WARN|FATAL" in our Log File.
Then I believe you want to change the -i to -e
Code: Select all
command[check_file_content_dc1uat163]=/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -e "ERROR|WARN|FATAL"
-
Srinija544
- Posts: 58
- Joined: Mon Oct 15, 2018 9:30 pm
Re: Unix Log File Monitoring using Nagios XI
Hi scottwilkerson,
I have replaced the command as mentioned.
Command:
command[check_file_content_dc1uat163]=/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -e "ERROR|WARN|FATAL"
We are getting this alert:
UNKNOWN:
Usage : check_file_content.pl -f file -i include -e exclude -n lines_number [-h]
Options :
-f
Full path to file to analyze
-n
Number of lines to find (default is 1)
-i
Include pattern (can add multiple include)
-e
Exclude pattern (can add multiple include)
-h, --help
Print this help screen
Example : check_file_content.pl -f /etc/passwd -i 0 -e root -n 5
Please check and let me know for any further information.
Regards,
Srinija.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Unix Log File Monitoring using Nagios XI
Can you run this from the CLI on the remote machine?
Code: Select all
/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -e "ERROR|WARN|FATAL"
If not, let's try
Code: Select all
/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -e "ERROR" -e "WARN" -e "FATAL"
-
Srinija544
- Posts: 58
- Joined: Mon Oct 15, 2018 9:30 pm
Re: Unix Log File Monitoring using Nagios XI
Please see the output after running two commands:
=====================================================================================================================
[root@dc1uat163 ~]# /usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -e "ERROR|WARN|FATAL"
Usage : check_file_content.pl -f file -i include -e exclude -n lines_number [-h]
Options :
-f
Full path to file to analyze
-n
Number of lines to find (default is 1)
-i
Include pattern (can add multiple include)
-e
Exclude pattern (can add multiple include)
-h, --help
Print this help screen
Example : check_file_content.pl -f /etc/passwd -i 0 -e root -n 5
[root@dc1uat163 ~]# /usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -e "ERROR" -e "WARN" -e "FATAL"
Usage : check_file_content.pl -f file -i include -e exclude -n lines_number [-h]
Options :
-f
Full path to file to analyze
-n
Number of lines to find (default is 1)
-i
Include pattern (can add multiple include)
-e
Exclude pattern (can add multiple include)
-h, --help
Print this help screen
Example : check_file_content.pl -f /etc/passwd -i 0 -e root -n 5
[root@dc1uat163 ~]#
=====================================================================================================================
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Unix Log File Monitoring using Nagios XI
I just read the code and it doesn't look like it does what you want. However, if you make the following change to the plugin, it should give you the desired results.
Change this line
Code: Select all
if ($i > $num)
to this
Code: Select all
if ($i < $num)
then run it like you had it originally
Code: Select all
/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "ERROR|WARN|FATAL"
-
Srinija544
- Posts: 58
- Joined: Mon Oct 15, 2018 9:30 pm
Re: Unix Log File Monitoring using Nagios XI
I have changed the script as mentioned, but we have a problem when I run this command:
Command:
===============================
/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "ERROR|WARN|FATAL"
======================================
We are getting this output:
====================================
[root@dc1uat163 log]# /usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "ERROR|WARN|FATAL"
FAILED on /apps/wildfly-11.0.0/standalone/log/server.log.2019-11-10. Found only 270 on 1
====================================================
This means 270 lines matched ERROR, FATAL, or WARN combined. However, when I check each pattern separately, the alert should be OK if that pattern is not found in the file, and this is not happening. Please find the outputs below:
===================================================
If I use only WARN as the pattern:
output:
[root@dc1uat163 log]# /usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "WARN"
FAILED on /apps/wildfly-11.0.0/standalone/log/server.log.2019-11-10. Found only 270 on 1
but for the other two patterns:
[root@dc1uat163 log]# /usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "FATAL"
FAILED on /apps/wildfly-11.0.0/standalone/log/server.log.2019-11-10
==============================================================
[root@dc1uat163 log]# /usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "ERROR"
FAILED on /apps/wildfly-11.0.0/standalone/log/server.log.2019-11-10
======================================================================================
If the mentioned patterns are not found in the file, then the alert has to be in the OK state, but we are getting CRITICAL.
Please help me in correcting this issue.
Regards,
Srinija.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Unix Log File Monitoring using Nagios XI
This is not our plugin, so help from the author may be better, but to try to assist I rewrote a portion of the file. You can try this:
Code: Select all
#!/usr/bin/perl
#===============================================================================
#
#         FILE:  check_file_content.pl
#
#        USAGE:  ./check_file_content.pl
#
#  DESCRIPTION:  Nagios plugin to check file content
#
#      OPTIONS:  ---
# REQUIREMENTS:  ---
#         BUGS:  ---
#        NOTES:  ---
#       AUTHOR:  Pierre Mavro (), [email protected]
#      COMPANY:
#      VERSION:  0.1
#      CREATED:  10/05/2010 09:25:56
#     REVISION:  ---
#===============================================================================
use warnings;
use strict;
use Getopt::Long;

# Standard Nagios plugin exit codes
my %RETCODES = ('OK' => 0, 'WARNING' => 1, 'CRITICAL' => 2, 'UNKNOWN' => 3);

# Help
sub help
{
    print "Usage : check_file_content.pl -f file -i include -e exclude -n lines_number [-h]\n\n";
    print "Options :\n";
    print " -f\n\tFull path to file to analyze\n";
    print " -n\n\tNumber of lines to find (default is 1)\n";
    print " -i\n\tInclude pattern (can add multiple include)\n";
    print " -e\n\tExclude pattern (can add multiple include)\n";
    print " -h, --help\n\tPrint this help screen\n";
    print "\nExample : check_file_content.pl -f /etc/passwd -i 0 -e root -n 5\n";
    exit $RETCODES{"UNKNOWN"};
}

sub check_args
{
    # Show help when called without arguments.
    # (defined(@ARGV) is a fatal error on Perl 5.22 and later, so test the array directly.)
    help if !@ARGV;
    my ($file,@include,@exclude);
    my $num=1;

    # Set options
    GetOptions( "help|h" => \&help,
                "f=s"    => \$file,
                "i=s"    => \@include,
                "e=s"    => \@exclude,
                "n=i"    => \$num);

    # Both a file (-f) and at least one include pattern (-i) are required;
    # passing only -e therefore prints the usage text.
    unless (($file) and (@include))
    {
        &help;
    }
    else
    {
        check_soft($file,$num,\@include,\@exclude);
    }
}

sub check_soft
{
    my $file=shift;
    my $num=shift;
    my $ref_include=shift;
    my $ref_exclude=shift;
    my @include = @$ref_include;
    my @exclude = @$ref_exclude;
    my $i=0;    # Number of matching lines

    if (!open(FILER, "<$file"))
    {
        print "Can't open $file: $!\n";
        exit $RETCODES{"CRITICAL"};
    }

    while(<FILER>)
    {
        chomp($_);
        my $line=$_;
        my $found=0;

        # Should match
        foreach (@include)
        {
            if ($line =~ /$_/)
            {
                $found=1;
                last;
            }
        }

        # Shouldn't match
        if (@exclude)
        {
            foreach (@exclude)
            {
                if ($line =~ /$_/)
                {
                    $found=0;
                    last;
                }
            }
        }
        $i++ if ($found == 1);
    }
    close(FILER);

    if ($i > 0)
    {
        # Matches found but fewer than the -n threshold: still OK
        if ($i < $num)
        {
            print "OK for $file ($i found)\n";
            exit $RETCODES{"OK"};
        }
        # Threshold reached (with the default -n 1, any match): CRITICAL
        else
        {
            print "FAILED on $file. Found only $i on $num\n";
            exit $RETCODES{"CRITICAL"};
        }
    }
    # No matching lines at all: OK
    else
    {
        print "OK on $file\n";
        exit $RETCODES{"OK"};
    }
}

check_args;
And I believe with multiple args you need to run it like this
Code: Select all
/usr/local/nagios/libexec/check_file_content.pl -f /apps/wildfly-11.0.0/standalone/log/server.log.`date --date='yesterday' +"%Y-%m-%d"` -i "ERROR" -i "WARN" -i "FATAL"
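If modifying the third-party plugin is not an option, the same requirement (CRITICAL when any line matches, OK otherwise) can be sketched with grep alone. This is a minimal sketch and not part of check_file_content.pl: the function name check_log_patterns and the wording of the status lines are assumptions made for illustration. The return values follow the usual Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```shell
# Hypothetical helper, not part of check_file_content.pl.
# Usage: check_log_patterns FILE PATTERN
# PATTERN is an extended regular expression, e.g. 'ERROR|WARN|FATAL'.
check_log_patterns() {
    file=$1
    pattern=$2

    if [ ! -r "$file" ]; then
        echo "UNKNOWN: cannot read $file"
        return 3
    fi

    # grep -c prints the number of matching lines. It exits 1 when that
    # number is 0, so "|| true" keeps the zero-match case from aborting
    # a script that runs under "set -e".
    count=$(grep -c -E -- "$pattern" "$file" || true)
    count=${count:-0}

    if [ "$count" -gt 0 ]; then
        echo "CRITICAL: $count matching lines in $file"
        return 2
    fi

    echo "OK: no matching lines in $file"
    return 0
}
```

In a wrapper script the function would be called with the log path and the pattern, followed by `exit $?` so that the return value becomes the plugin's exit status.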