Some checks show it failed but in real they are running

Posted: Wed Apr 08, 2015 4:00 am
by anji009
Hi Nagios Experts,
I have set up Nagios on my Linux machine and added my own plugin to check the services running on a remote host.
I installed NRPE and the setup is fine.

The remote host runs some application services that I want to monitor. I wrote a plugin to monitor the services and send the output to Nagios.

The problem is that all services are running fine on the remote host, but the Nagios web page shows SUCCESS for some services and FAILURE for others (see attached PNG file).

I also checked from the command line on the Nagios server, and it reports the same. Please advise.

Nagios Server:
#################################################################
root@xxxxx:libexec> ./check_nrpe -H 192.0.145.143 -c check_services3
MarketsStaticData is running fine.
PID for MarketsStaticData = 2232

root@xxxxx:libexec> ./check_nrpe -H 192.0.145.143 -c check_services2
ToolboxLogstashCollector is OFFLINE.
CRITICAL-
################################################################

Remote server (NRPE)
###############################################################
root@AAAA:ToolboxNagiosRemote> /usr/local/nagios/libexe :> ./check_services.sh ToolboxLogstashCollector
ToolboxLogstashCollector is running fine.
PID for ToolboxLogstashCollector = 2008
OK-
###############################################################

Why do some services show the correct status while others do not? I checked all the config files, but none of them seems incorrect :(

See below for the config files.

#### Nagios Server #####

vi /usr/local/nagios/etc/nagios.cfg

/usr/local/nagios/etc/objects/ipokit-dev_sg.cfg
/usr/local/nagios/etc/objects/commands.cfg

vi /usr/local/nagios/etc/objects/ipokit-dev_sg.cfg
check_command check_nrpe!memory_usage
check_command check_nrpe!check_services1
check_command check_nrpe!check_services2
check_command check_nrpe!check_services3
check_command check_nrpe!check_services4
check_command check_nrpe!check_services5
check_command check_nrpe!check_services6

vi /usr/local/nagios/etc/objects/commands.cfg
# memory usage command definition
define command{
command_name memory_usage
command_line $USER1$/memory_usage.sh
}
define command{
command_name check_services1
command_line $USER1$/check_services.sh $ARG1$
}
define command{
command_name check_services2
command_line $USER1$/check_services.sh $ARG1$
}
define command{
command_name check_services3
command_line $USER1$/check_services.sh $ARG1$
}
define command{
command_name check_services4
command_line $USER1$/check_services.sh $ARG1$
}
define command{
command_name check_services5
command_line $USER1$/check_services.sh $ARG1$
}
define command{
command_name check_services6
command_line $USER1$/check_services.sh $ARG1$
}
define command{
command_name check_services7
command_line $USER1$/check_services.sh $ARG1$
}
define command{
command_name check_services8
command_line $USER1$/check_services.sh $ARG1$
}


##### NRPE Server ######
vi /etc/xinetd.d/nrpe

server = /usr/local/nagios/bin/nrpe
server_args = -c /usr/local/nagios/etc/nrpe_ipokit-dev_sg.cfg --inetd
only_from = 127.0.0.1 192.168.143.243 # Nagios Master server - maap server

vi /usr/local/nagios/etc/nrpe_ipokit-dev_sg.cfg

command[memory_usage]=/usr/local/nagios/libexec/memory_usage.sh
command[check_services1]=/usr/local/nagios/libexec/check_services.sh ToolboxLogstashCollector
command[check_services2]=/usr/local/nagios/libexec/check_services.sh ToolboxLogService
command[check_services3]=/usr/local/nagios/libexec/check_services.sh ToolboxProxy
command[check_services4]=/usr/local/nagios/libexec/check_services.sh ToolboxPolicyFileServer
command[check_services5]=/usr/local/nagios/libexec/check_services.sh ToolboxUserAdminFrontend
command[check_services6]=/usr/local/nagios/libexec/check_services.sh IpoToolFrontend

Re: Some checks show it failed but in real they are running

Posted: Wed Apr 08, 2015 10:34 am
by jdalrymple
root@xxxxx:libexec> ./check_nrpe -H 192.0.145.143 -c check_services2
ToolboxLogstashCollector is OFFLINE.
CRITICAL-
Can the nagios user run all the necessary parts of the script you created?
Is selinux enabled?

Also - just FYI, you have a lot of duplicated work here. You really only need 1 check_command:

define command{
command_name check_services1
command_line $USER1$/check_services.sh $ARG1$
}
Just pass in the proper arg in each individual service, as you are.
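
On the first question above (whether the nagios user can run every part of the script), one quick check is to look for plugin files that lack the execute bit. This is a minimal sketch, not part of Nagios: `list_non_executable_plugins` is a hypothetical helper, and the default directory is simply the libexec path quoted in this thread.

```shell
# List regular files directly inside a plugin directory that "other" users
# (such as the nagios user, when it is neither owner nor in the group)
# cannot execute, e.g. files with mode 644.
# The default directory is an assumption taken from the paths in this thread.
list_non_executable_plugins() {
    plugin_dir="${1:-/usr/local/nagios/libexec}"
    find "$plugin_dir" -maxdepth 1 -type f ! -perm -001 -print
}
```

Calling `list_non_executable_plugins` on the remote host prints one path per file that would need something like `chmod 755`. A more direct test, assuming sudo is available there and NRPE runs as the nagios user, is `sudo -u nagios /usr/local/nagios/libexec/check_services.sh ToolboxLogstashCollector; echo $?`.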

Re: Some checks show it failed but in real they are running

Posted: Wed Apr 08, 2015 10:09 pm
by anji009
Hi jdalrymple
Thanks a lot for the reply. You spotted it correctly: the permission on the failed services was 644. I changed it to 755 and everything is fine now.

But I am still struggling with another problem.

When I open my Nagios web interface, it says: "You don't have permission to access /nagios/ on this server."

The Nagios config file in /etc/httpd/conf.d:

#############################
ScriptAlias /nagios/cgi-bin "/opt/ToolboxNagiosMaster/current/dist/nagios-4.0.8/sbin"

<Directory "/opt/ToolboxNagiosMaster/current/dist/nagios-4.0.8/sbin">
# SSLRequireSSL
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
# Order deny,allow
# Deny from all
# Allow from 127.0.0.1
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /opt/ToolboxNagiosMaster/current/dist/nagios-4.0.8/etc/htpasswd.users
Require valid-user
</Directory>

Alias /nagios "/opt/ToolboxNagiosMaster/current/dist/nagios-4.0.8/share"

<Directory "/opt/ToolboxNagiosMaster/current/dist/nagios-4.0.8/share">
# SSLRequireSSL
Options Indexes
AllowOverride None
#DirectoryIndex index.php
Order allow,deny
Allow from all
# Order deny,allow
# Deny from all
# Allow from 127.0.0.1
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /opt/ToolboxNagiosMaster/current/dist/nagios-4.0.8/etc/htpasswd.users
Require valid-user
</Directory>
#############################

We run the Nagios service as our own user, not the nagios user. When we install the package, the folders and files are copied with that same user as owner.

I ran a couple of tests on my side, but neither worked:
1. I changed the user and group of the Nagios package (ToolboxNagiosMaster) to apache, since httpd runs as apache on this server, and adjusted the folder permissions so that httpd can access the files under the sbin and share folders. It is not working.

root@XXX:opt> ls
drwxrwxrwx. 3 apache apache 4096 Apr 8 16:23 ToolboxNagiosMaster

2. I changed the permissions to 777 on the /opt/ToolboxNagiosMaster folder so that httpd can access the Nagios files. It still didn't work.

As a workaround, I have to disable SELinux enforcement every time I want to open the Nagios page, by running the following:
echo 0 > /selinux/enforce

I know this is not good practice, but I have not found any other way to use the Nagios web interface without disabling SELinux enforcement.

Can you please advise?

Thanks
Anji
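
Since the page loads only when SELinux enforcement is off, the ordinary permission bits are unlikely to be the whole story, and the SELinux type on the files is worth inspecting. A minimal sketch, assuming a context string in the usual user:role:type:level form; `selinux_type` is a hypothetical helper name, not part of any SELinux tool.

```shell
# Hypothetical helper: print the type field (third colon-separated field)
# of an SELinux context string of the form user:role:type:level.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}
```

On a host with SELinux and GNU coreutils, `selinux_type "$(stat -c %C /opt/ToolboxNagiosMaster/current/dist/nagios-4.0.8/sbin)"` would print the type currently on the CGI directory (`ls -Zd` on the same path shows the full context). That type can then be compared with the ones Apache is normally allowed to use, such as httpd_sys_content_t or httpd_sys_script_exec_t.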

Re: Some checks show it failed but in real they are running

Posted: Thu Apr 09, 2015 1:06 pm
by lmiltchev
You can check out this tutorial:

http://exchange.nagios.org/directory/Tu ... os/details

Hope this helps.

Re: Some checks show it failed but in real they are running

Posted: Mon Apr 27, 2015 4:38 am
by anji009
Hi lmiltchev
Apologies for the late reply, as I was not around.

I tried creating a custom plugin as suggested in the link you provided. It's still not working for me :(

If I set SELinux to "Permissive", Nagios works fine and I can see all my services on the web page. But if I change it back to "Enforcing" mode, I get the error below:

Error: Could not open CGI config file '/opt/ToolboxNagiosMaster/current/dist/nagios-4.0.8/etc/cgi.cfg' for reading!


As per my requirements, I must not change the SELinux settings; it has to stay in Enforcing mode.

How can I make it work in Enforcing mode? Can you please advise?

Thanks
Anji
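
An error about cgi.cfg that appears only in Enforcing mode suggests the files under this non-standard install path do not carry a label Apache may read. One common approach is to add file-context rules with `semanage fcontext` and apply them with `restorecon`. The sketch below only prints the commands so they can be reviewed before being run as root. It is not a confirmed fix for this setup: `print_selinux_label_commands` is a hypothetical helper, and the choice of httpd_sys_script_exec_t for sbin and httpd_sys_content_t for share and etc is an assumption based on the stock targeted policy.

```shell
# Dry run: print (do not execute) commands that would label a Nagios tree
# for Apache under SELinux. Review the output, then run it as root.
# Assumptions: the sbin/, share/ and etc/ layout quoted in this thread, and
# the stock httpd_sys_script_exec_t / httpd_sys_content_t types.
print_selinux_label_commands() {
    base="$1"   # Nagios install directory, e.g. the .../nagios-4.0.8 path
    printf "semanage fcontext -a -t httpd_sys_script_exec_t '%s/sbin(/.*)?'\n" "$base"
    printf "semanage fcontext -a -t httpd_sys_content_t '%s/share(/.*)?'\n" "$base"
    printf "semanage fcontext -a -t httpd_sys_content_t '%s/etc(/.*)?'\n" "$base"
    printf "restorecon -Rv '%s'\n" "$base"
}
```

For example, `print_selinux_label_commands /opt/ToolboxNagiosMaster/current/dist/nagios-4.0.8` prints four commands. If `current` in that path is a symlink, the patterns probably need the resolved path instead. Other directories the CGIs read or write may need labels of their own; `ausearch -m avc -ts recent` shows any denials that remain.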

Re: Some checks show it failed but in real they are running

Posted: Mon Apr 27, 2015 1:57 pm
by tgriep
Are the permissions for that folder and file set correctly?
Please run the following and post back the results.

ll /opt/ToolboxNagiosMaster/current/dist/nagios-4.0.8/etc/

Re: Some checks show it failed but in real they are running

Posted: Mon Apr 27, 2015 8:14 pm
by anji009
Hi tgriep
Yes, it is 755.
Please see below.

root@xxx:etc> ll /opt/ToolboxNagiosMaster/current/dist/nagios-4.0.8/etc/
total 128K
drwxr-xr-x. 3 u25298 cusers 4.0K Apr 27 17:24 ./
drwxr-xr-x. 9 u25298 cusers 4.0K Apr 13 12:07 ../
-rwxr-xr-x. 1 u25298 cusers 12K Apr 13 12:07 cgi.cfg*
-rwxr-xr-x. 1 root root 26 Apr 28 09:10 htpasswd.users*
-rwxr-xr-x. 1 u25298 cusers 45K Apr 13 12:07 nagios.cfg*
-rwxr-xr-x. 1 u25298 cusers 45K Apr 13 12:07 nagios.cfg.bak*
drwxr-xr-x. 2 u25298 cusers 4.0K Apr 27 15:39 objects/
-rwxr-xr-x. 1 u25298 cusers 1.4K Apr 13 12:07 resource.cfg*

Re: Some checks show it failed but in real they are running

Posted: Tue Apr 28, 2015 10:09 am
by tgriep
The Apache web server needs to be able to read those files. Can you add the apache user to the group called cusers and test it again?
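
Whether the apache user is already in that group can be checked before and after the change. A minimal sketch; `user_in_group` is a hypothetical helper built on the standard `id` command.

```shell
# Succeeds (exit status 0) if the given user belongs to the given group.
# Fails if the user does not exist or is not a member.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}
```

Usage would look like `user_in_group apache cusers && echo "apache is in cusers"`. httpd has to be restarted before a new group membership takes effect. Note also that the listing above already grants read access to "other" users, and the trailing `.` after each mode string is how GNU ls marks a file that carries an SELinux context, so group membership alone may not change the result.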

Re: Some checks show it failed but in real they are running

Posted: Tue Apr 28, 2015 9:44 pm
by anji009
Hi tgriep
I gave the apache user access to all the files as you advised, but still no luck :(
I even tried changing the group of the etc folder to apache:

root@srp05927lx:etc> ls
total 120
-rwxr-xr-x. 1 u25298 apache 45709 Apr 29 09:40 nagios.cfg.bak
-rwxr-xr-x. 1 u25298 apache 1378 Apr 29 09:40 resource.cfg
-rwxr-xr-x. 1 u25298 apache 45644 Apr 29 09:40 nagios.cfg
drwxr-xr-x. 2 u25298 apache 4096 Apr 29 09:40 objects
-rwxr-xr-x. 1 u25298 apache 12081 Apr 29 09:40 cgi.cfg
-rwxr-xr-x. 1 root apache 26 Apr 29 10:22 htpasswd.users

I also added the apache user to the cusers group in case that would work, but that is not helping either.

root@srp05927lx:nagios-4.0.8> /usr/sbin/usermod -a -G cusers apache
root@srp05927lx:nagios-4.0.8> service httpd restart


Currently, the Nagios web interface is stuck at the authentication prompt. Even when I enter the nagiosadmin credentials, it does not let me in; it keeps asking for the password.

But when I set SELinux enforcement to 0 (setenforce 0), I can get into the web page and see my services.

I would really appreciate your help, as I have been stuck on this for a long time and have tried almost everything except the one thing that works :( :(


Thanks
Anji

Re: Some checks show it failed but in real they are running

Posted: Wed Apr 29, 2015 1:06 pm
by tgriep
Run the following while you try to login to the Nagios web interface and post the output back here.

tail -f /var/log/httpd/error_log
tail -f /var/log/httpd/access_log
Could you also run the following and post the output?

ps aux |grep nagios
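
Because the failures appear only in Enforcing mode, the audit log may be as informative as the httpd logs above. A minimal sketch that filters it for denials coming from Apache or its CGI scripts; `httpd_avc_denials` is a hypothetical helper, and the default log path assumes auditd is running with its stock location.

```shell
# Print SELinux AVC denial records whose source context mentions httpd
# (for example httpd_t or httpd_sys_script_t).
# The default log path is an assumption; pass another file as $1 if needed.
httpd_avc_denials() {
    audit_log="${1:-/var/log/audit/audit.log}"
    grep 'avc:' "$audit_log" | grep 'denied' | grep 'scontext=[^ ]*httpd'
}
```

Running `httpd_avc_denials` as root right after a failed login attempt should show which file and which permission were refused, if SELinux is in fact what blocks the request.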