Linux Ubuntu install - troubleshoot services
Posted: Tue Sep 27, 2016 6:13 pm
by gbnag
Hello,
We installed the Linux agent for Ubuntu and are finding that a number of services are either unknown or in a warning or critical state (see attached).
How can we fix them so that they show as normal? The target system does run the cron and ssh daemons.
We could not locate the command 'check_total_procs' on the Nagios XI server under /usr/local/nagios/libexec/, but the service shows as critical.
ty
Re: Linux Ubuntu install - troubleshoot services
Posted: Tue Sep 27, 2016 7:44 pm
by Box293
How did you install the Linux agent for Ubuntu? Did you follow a specific guide?
Re: Linux Ubuntu install - troubleshoot services
Posted: Wed Sep 28, 2016 1:23 pm
by gbnag
We did get an installation error:
----------------------------------------------
Processing triggers for libc-bin (2.19-0ubuntu6.9) ...
Prerequisites installed OK
RESULT=0
Running './2-usersgroups'...
Adding users and groups...
useradd: user 'nagios' already exists
groupadd: group 'nagios' already exists
useradd: user 'nagios' already exists
ERROR: User 'nagios' was not created - exiting.
RESULT=1
===================
INSTALLATION ERROR!
===================
Installation step failed - exiting.
-----------------------------------------------------------
When we raised the installation error during a quick start session, it was suggested that we bypass it by running 'touch installed.usersgroups' in the linux-nrpe-agent directory and then rerunning the full install script, which we did.
------------------------------------------------------------------
Installation finished without errors:
.........
......................
xinetd stop/waiting
xinetd start/running, process 26802
Subcomponents installed OK
RESULT=0
##########################################################
### ###
### Nagios XI Linux Agent Installation Complete! ###
### ###
##########################################################
If you experience any problems, please attach the file install.log that was just created to any support requests.
root@dmg-dev:/tmp/linux-nrpe-agent#
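The workaround described in this post can be sketched as a small shell helper. This is only an illustration under stated assumptions: the marker file name `installed.usersgroups` and the `/tmp/linux-nrpe-agent` directory come from the post, the function name `mark_step_done` is hypothetical, and the install script name `./fullinstall` is an assumption because the thread only calls it "the full install script".

```shell
#!/bin/sh
# Hypothetical helper: mark an installer step as already completed by
# creating the marker file mentioned in the post (installed.<step>).
mark_step_done() {
    agent_dir=$1   # e.g. /tmp/linux-nrpe-agent
    step=$2        # e.g. usersgroups
    touch "$agent_dir/installed.$step"
}

# Usage (as root on the Ubuntu host):
#   id nagios        # confirm the 'nagios' user really exists before skipping the step
#   mark_step_done /tmp/linux-nrpe-agent usersgroups
#   cd /tmp/linux-nrpe-agent && ./fullinstall   # script name is an assumption
```

Skipping the step is only reasonable here because the log shows the 'nagios' user and group already exist on the host.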
Re: Linux Ubuntu install - troubleshoot services
Posted: Wed Sep 28, 2016 2:02 pm
by lmiltchev
Not all of these services are added via the "Linux Server" wizard... Did you add "Current Load", and "Current Users" manually?
Show us the actual commands for the failing services run from the command line (on the Nagios XI server) along with the output.
Also, run the following command on the remote (Ubuntu) box:
cat /usr/local/nagios/etc/nrpe/common.cfg
Re: Linux Ubuntu install - troubleshoot services
Posted: Wed Sep 28, 2016 6:50 pm
by gbnag
We performed a default install.
From the XI server, we can run the check for load (shows green):
[root@localhost-010049098179 libexec]# /usr/local/nagios/libexec/check_nrpe -H dmg-dev -t 30 -c check_load -a '-w 5 -c 10'
OK - load average: 2.22, 2.22, 2.29|load1=2.220;5.000;10.000;0; load5=2.220;5.000;10.000;0; load15=2.290;5.000;10.000;0;
[root@localhost-010049098179 libexec]#
However, we are unable to find a 'Current Load' or 'Current Users' check (both services showing orange).
We are unable to find the check for Total Processes (service showing red) either (see attached).
For the ssh server (service showing yellow) we are getting:
[root@localhost-010049098179 libexec]# /usr/local/nagios/libexec/check_nrpe -H dmg-dev -t 30 -c check_ssh
NRPE: Command 'check_ssh' not defined
For the cron scheduling daemon (service showing yellow), we are unable to locate a check script.
Below is the common.cfg output from the target ubuntu host:
----------------------------------------------------------------------------
root@dmg-dev:~# cat /usr/local/nagios/etc/nrpe/common.cfg
### GENERIC SERVICES ###
command[check_init_service]=sudo /usr/local/nagios/libexec/check_init_service $ARG1$
command[check_services]=/usr/local/nagios/libexec/check_services -p $ARG1$
### MISC SYSTEM METRICS ###
#command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_users]=/usr/local/nagios/libexec/check_users $ARG1$
command[check_load]=/usr/local/nagios/libexec/check_load $ARG1$
command[check_swap]=/usr/local/nagios/libexec/check_swap $ARG1$
command[check_cpu_stats]=/usr/local/nagios/libexec/check_cpu_stats.sh $ARG1$
command[check_mem]=/usr/local/nagios/libexec/custom_check_mem -n $ARG1$
### SYSTEM UPDATES ###
command[check_yum]=/usr/local/nagios/libexec/check_yum
command[check_apt]=/usr/local/nagios/libexec/check_apt
### DISK ###
command[check_disk]=/usr/local/nagios/libexec/check_disk $ARG1$
command[check_ide_smart]=/usr/local/nagios/libexec/check_ide_smart $ARG1$
### PROCESSES ###
command[check_all_procs]=/usr/local/nagios/libexec/custom_check_procs
command[check_procs]=/usr/local/nagios/libexec/check_procs $ARG1$
### OPEN FILES ###
command[check_open_files]=/usr/local/nagios/libexec/check_open_files.pl $ARG1$
### NETWORK CONNECTIONS ###
command[check_netstat]=/usr/local/nagios/libexec/check_netstat.pl -p $ARG1$ $ARG2$
root@dmg-dev:~#
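A message such as "NRPE: Command 'check_ssh' not defined" can be cross-checked against a listing like the one above. The following is a minimal sketch, assuming the config path shown in this post; the function name `nrpe_command_defined` is hypothetical. In the listing above, neither check_ssh nor check_total_procs appears as a `command[...]` entry.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an NRPE config file contains an active
# (not commented-out) "command[<name>]=" line.
nrpe_command_defined() {
    name=$1   # NRPE command name, e.g. check_load
    cfg=$2    # config file, e.g. /usr/local/nagios/etc/nrpe/common.cfg
    grep -q "^command\[$name]=" "$cfg"
}

# Usage on the monitored host:
#   nrpe_command_defined check_ssh /usr/local/nagios/etc/nrpe/common.cfg \
#       && echo "defined" || echo "not defined"
```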
Re: Linux Ubuntu install - troubleshoot services
Posted: Thu Sep 29, 2016 12:18 am
by Box293
gbnag wrote:For ssh server (service showing yellow) getting:
[root@localhost-010049098179 libexec]# /usr/local/nagios/libexec/check_nrpe -H dmg-dev -t 30 -c check_ssh
NRPE: Command 'check_ssh' not defined
There is no check_ssh command defined in the NRPE client as part of the install. You would need to define it in common.cfg on the Ubuntu server if you wanted to execute it via NRPE.
However, check_ssh is generally not a check you would execute via NRPE; you would execute it from the XI server. In this case you would edit your service in CCM and select the command check_xi_service_ssh.
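The two options in this reply can be sketched as follows. This is an illustration, not the exact XI command definition: it assumes the standard check_ssh plugin is present under /usr/local/nagios/libexec/ (on the XI server for the first option, on the Ubuntu host for the second), and both function names are hypothetical. The definition line simply follows the pattern of the existing common.cfg entries.

```shell
#!/bin/sh
# Option 1 (the recommended route): run the SSH check from the XI server
# itself, pointing the check_ssh plugin at the remote host.
ssh_check_from_xi() {
    host=$1
    /usr/local/nagios/libexec/check_ssh -H "$host"
}

# Option 2: print a definition line that could be added to common.cfg on
# the Ubuntu host if the check really must go through NRPE.
nrpe_definition_line() {
    plugin=$1
    printf 'command[%s]=/usr/local/nagios/libexec/%s $ARG1$\n' "$plugin" "$plugin"
}

# Usage:
#   ssh_check_from_xi dmg-dev
#   nrpe_definition_line check_ssh
```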
gbnag wrote:However unable to find a 'Current Load' or 'Current Users' check (both services showing orange).
In CCM, for these services please click the floppy disk icon.
This will bring up the text definition of the services.
Please paste the text definitions here in a code block.
Re: Linux Ubuntu install - troubleshoot services
Posted: Thu Sep 29, 2016 6:10 pm
by gbnag
###############################################################################
#
# Service configuration file
#
# Created by: Nagios Core Config Manager 2.5.2
# Date: 2016-09-29 16:08:45
# Version: Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios CCM will overwrite all manual settings during the next update if you
# would like to edit files manually, place them in the 'static' directory or
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################
define service {
host_name dmg-dev.qualcomm.com
service_description / Disk Usage
use xiwizard_nrpe_service
check_command check_nrpe!check_disk!-a '-w 20% -c 10% -p /'
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard linux-server
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description APT Updates
use xiwizard_nrpe_service
check_command check_nrpe!check_apt!-a '-U'
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard linux-server
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description CPU Stats
use xiwizard_nrpe_service
check_command check_nrpe!check_cpu_stats!-a '-w 85 -c 95'
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard linux-server
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description Cron Scheduling Daemon
use xiwizard_nrpe_service
check_command check_nrpe!check_init_service!-a 'cron'
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard linux-server
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description Current Load
use generic-service
check_command check_nrpe!check_load!
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard nrpe
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description Current Users
use generic-service
check_command check_nrpe!check_users!
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard nrpe
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description Load
use xiwizard_nrpe_service
check_command check_nrpe!check_load!-a '-w 15,10,5 -c 30,20,10'
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard linux-server
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description Memory Usage
use xiwizard_nrpe_service
check_command check_nrpe!check_mem!-a '-w 20 -c 10'
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard linux-server
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description Node Javascript process
use xiwizard_nrpe_service
check_command check_nrpe!check_services!-a 'node'
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard linux-server
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description Open Files
use xiwizard_nrpe_service
check_command check_nrpe!check_open_files!-a '-w 30 -c 50'
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard linux-server
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description Ping
use xiwizard_linuxserver_ping_service
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard nrpe
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description Swap Usage
use xiwizard_nrpe_service
check_command check_nrpe!check_swap!-a '-w 50 -c 20'
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard linux-server
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description Total Processes
use generic-service
check_command check_nrpe!check_total_procs!
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard nrpe
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description Users
use xiwizard_nrpe_service
check_command check_nrpe!check_users!-a '-w 5 -c 10'
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard linux-server
register 1
}
###############################################################################
#
# Service configuration file
#
# END OF FILE
#
###############################################################################
Re: Linux Ubuntu install - troubleshoot services
Posted: Thu Sep 29, 2016 7:26 pm
by Box293
gbnag wrote:
define service {
host_name dmg-dev.qualcomm.com
service_description Current Load
use generic-service
check_command check_nrpe!check_load!
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard nrpe
register 1
}
define service {
host_name dmg-dev.qualcomm.com
service_description Current Users
use generic-service
check_command check_nrpe!check_users!
max_check_attempts 5
check_interval 5
retry_interval 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period xi_timeperiod_24x7
contacts nagiosadmin
_xiwizard nrpe
register 1
}
Neither of these checks has anything defined in $ARG2$, which is why they are failing.
For the "Current Load" check, in CCM edit the service and add the following to the $ARG2$ field:
-a '-w 15,10,5 -c 30,20,10'
For the "Current Users" check, in CCM edit the service and add the following to the $ARG2$ field:
-a '-w 5 -c 10'
Apply Config when done.
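To see why an empty $ARG2$ matters, it can help to print the command line that results from each service definition. The sketch below assumes the check_nrpe command expands in the same shape as the manual test earlier in the thread (`-H <host> -t 30 -c <ARG1> <ARG2>`); the function name `nrpe_cmdline` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the check_nrpe command line produced by
# substituting $ARG1$ and $ARG2$ from a service definition.
nrpe_cmdline() {
    host=$1
    arg1=$2        # NRPE command name, e.g. check_load
    arg2=${3:-}    # extra arguments, e.g. -a '-w 15,10,5 -c 30,20,10'
    printf '%s\n' "/usr/local/nagios/libexec/check_nrpe -H $host -t 30 -c $arg1${arg2:+ $arg2}"
}

# Empty $ARG2$: nothing is passed to the agent, so check_load runs without thresholds.
nrpe_cmdline dmg-dev check_load
# With the suggested $ARG2$ value, the thresholds reach the plugin.
nrpe_cmdline dmg-dev check_load "-a '-w 15,10,5 -c 30,20,10'"
```

Running the printed command by hand on the XI server is a quick way to confirm a fix before applying the configuration.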
Re: Linux Ubuntu install - troubleshoot services
Posted: Fri Sep 30, 2016 6:51 pm
by gbnag
That worked, thanks.
Now the values of 'Current Users' and 'Users' are identical, as are the values of 'Current Load' and 'Load'.
We are just trying to understand why, by default, there are two services reporting the same values, except that 'Current Users' and 'Current Load' had issues reporting before we made the suggested adjustments.
Also, could you guide us in getting Total Processes to show green?
Re: Linux Ubuntu install - troubleshoot services
Posted: Mon Oct 03, 2016 1:29 pm
by lmiltchev
gbnag wrote:Now the values of 'Current Users' and 'Users' are identical, as are the values of 'Current Load' and 'Load'.
We are just trying to understand why, by default, there are two services reporting the same values, except that 'Current Users' and 'Current Load' had issues reporting before we made the suggested adjustments.
The reason I asked you this:
Did you add "Current Load", and "Current Users" manually?
is because these services are NOT included by default in the "Linux Server" wizard (not for Ubuntu anyway). See an example of "default" services added by the wizard below:
example01.PNG
Did you select "Ubuntu" from the "Linux Distribution" drop-down menu in Step 1 of the "Linux Server" wizard?
gbnag wrote:Also, could you guide us in getting Total Processes to show green?
You will need to modify the "Total Processes" service under the CCM - add:
-a '-w 150 -c 250'
to the $ARG2$ field. Save, and apply configuration.
Note: 150 & 250 are the "default" WARNING & CRITICAL threshold values. Feel free to use whatever values are relevant in your environment.
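As a rough illustration of what the two threshold values mean, the sketch below models plain integer thresholds: a count above the critical value is CRITICAL, a count above the warning value is WARNING, and anything else is OK. It is a simplified model, not the plugin's own code, and the function name `classify_count` is hypothetical.

```shell
#!/bin/sh
# Simplified model of plain integer thresholds such as -w 150 -c 250.
classify_count() {
    count=$1
    warn=$2
    crit=$3
    if [ "$count" -gt "$crit" ]; then
        echo CRITICAL
    elif [ "$count" -gt "$warn" ]; then
        echo WARNING
    else
        echo OK
    fi
}

# Usage:
#   classify_count 200 150 250
```

Picking thresholds a little above the host's normal process count keeps the service green during ordinary operation.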