Linux Ubuntu install - troubleshoot services

Posted: Tue Sep 27, 2016 6:13 pm
by gbnag
Hello,

We installed the Linux agent for Ubuntu and are finding that a number of services are in an unknown, warning, or critical state (see attached).
How can we fix them so that they show as normal? The target system does run the cron and ssh daemons.
We could not locate the command 'check_total_procs' on the Nagios XI server under /usr/local/nagios/libexec/, yet the service shows as critical.

ty

Re: Linux Ubuntu install - troubleshoot services

Posted: Tue Sep 27, 2016 7:44 pm
by Box293
How did you install the Linux agent for Ubuntu? Did you follow a specific guide?

Re: Linux Ubuntu install - troubleshoot services

Posted: Wed Sep 28, 2016 1:23 pm
by gbnag
We did get an installation error:

----------------------------------------------
Processing triggers for libc-bin (2.19-0ubuntu6.9) ...
Prerequisites installed OK
RESULT=0
Running './2-usersgroups'...
Adding users and groups...
useradd: user 'nagios' already exists
groupadd: group 'nagios' already exists
useradd: user 'nagios' already exists
ERROR: User 'nagios' was not created - exiting.
RESULT=1

===================
INSTALLATION ERROR!
===================
Installation step failed - exiting.
-----------------------------------------------------------


When we raised the installation error during a quick start session, it was suggested that we bypass it by running 'touch installed.usersgroups' in the linux-nrpe-agent directory and then rerunning the full install script, which we did.
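
A minimal sketch of that workaround, for readers who hit the same error. The helper name mark_usersgroups_done is hypothetical, and ./fullinstall is an assumed name for the "full install script" mentioned above. The sketch only creates the marker file when the user already exists, since that is the situation the useradd messages in the log describe.

```shell
# Sketch only. The helper name and the ./fullinstall script name are assumptions.

# Create the marker file that skips the users/groups step, but only when
# the account that step would have created is already present.
mark_usersgroups_done() {
    agent_dir="$1"
    user="${2:-nagios}"
    if id "$user" >/dev/null 2>&1; then
        touch "$agent_dir/installed.usersgroups"
    else
        echo "user '$user' not found; not skipping the users/groups step" >&2
        return 1
    fi
}

# Usage on the Ubuntu host, as root (directory taken from the shell prompt in this post):
#   mark_usersgroups_done /tmp/linux-nrpe-agent && (cd /tmp/linux-nrpe-agent && ./fullinstall)
```

If the helper returns non-zero, the user really is missing and skipping the step would likely leave the agent without the account it expects.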



------------------------------------------------------------------

Installation finished without errors:

.........
......................
xinetd stop/waiting
xinetd start/running, process 26802
Subcomponents installed OK
RESULT=0

##########################################################
###                                                    ###
###    Nagios XI Linux Agent Installation Complete!    ###
###                                                    ###
##########################################################

If you experience any problems, please attach the file install.log that was just created to any support requests.

root@dmg-dev:/tmp/linux-nrpe-agent#

Re: Linux Ubuntu install - troubleshoot services

Posted: Wed Sep 28, 2016 2:02 pm
by lmiltchev
Not all of these services are added via the "Linux Server" wizard. Did you add "Current Load" and "Current Users" manually?

Please run the actual commands for the failing services from the command line (on the Nagios XI server) and show us the commands along with their output.

Also, run the following command on the remote (Ubuntu) box:

cat /usr/local/nagios/etc/nrpe/common.cfg

Re: Linux Ubuntu install - troubleshoot services

Posted: Wed Sep 28, 2016 6:50 pm
by gbnag
We performed a default install.

From the XI server, we can run the check for load (shows green):
[root@localhost-010049098179 libexec]# /usr/local/nagios/libexec/check_nrpe -H dmg-dev -t 30 -c check_load -a '-w 5 -c 10'
OK - load average: 2.22, 2.22, 2.29|load1=2.220;5.000;10.000;0; load5=2.220;5.000;10.000;0; load15=2.290;5.000;10.000;0;
[root@localhost-010049098179 libexec]#

However, we are unable to find a 'Current Load' or 'Current Users' check (both services are showing orange).

For Total Processes (service showing red), we are unable to find the check either (see attached).


For the SSH server (service showing yellow), we get:
[root@localhost-010049098179 libexec]# /usr/local/nagios/libexec/check_nrpe -H dmg-dev -t 30 -c check_ssh
NRPE: Command 'check_ssh' not defined

For the cron scheduling daemon (service showing yellow), we are unable to locate a check script.




Below is the common.cfg output from the target Ubuntu host:
----------------------------------------------------------------------------

root@dmg-dev:~# cat /usr/local/nagios/etc/nrpe/common.cfg

### GENERIC SERVICES ###
command[check_init_service]=sudo /usr/local/nagios/libexec/check_init_service $ARG1$
command[check_services]=/usr/local/nagios/libexec/check_services -p $ARG1$

### MISC SYSTEM METRICS ###
#command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_users]=/usr/local/nagios/libexec/check_users $ARG1$
command[check_load]=/usr/local/nagios/libexec/check_load $ARG1$
command[check_swap]=/usr/local/nagios/libexec/check_swap $ARG1$
command[check_cpu_stats]=/usr/local/nagios/libexec/check_cpu_stats.sh $ARG1$
command[check_mem]=/usr/local/nagios/libexec/custom_check_mem -n $ARG1$

### SYSTEM UPDATES ###
command[check_yum]=/usr/local/nagios/libexec/check_yum
command[check_apt]=/usr/local/nagios/libexec/check_apt

### DISK ###
command[check_disk]=/usr/local/nagios/libexec/check_disk $ARG1$
command[check_ide_smart]=/usr/local/nagios/libexec/check_ide_smart $ARG1$

### PROCESSES ###
command[check_all_procs]=/usr/local/nagios/libexec/custom_check_procs
command[check_procs]=/usr/local/nagios/libexec/check_procs $ARG1$

### OPEN FILES ###
command[check_open_files]=/usr/local/nagios/libexec/check_open_files.pl $ARG1$

### NETWORK CONNECTIONS ###
command[check_netstat]=/usr/local/nagios/libexec/check_netstat.pl -p $ARG1$ $ARG2$
root@dmg-dev:~#

Re: Linux Ubuntu install - troubleshoot services

Posted: Thu Sep 29, 2016 12:18 am
by Box293
gbnag wrote: For the SSH server (service showing yellow), we get:
[root@localhost-010049098179 libexec]# /usr/local/nagios/libexec/check_nrpe -H dmg-dev -t 30 -c check_ssh
NRPE: Command 'check_ssh' not defined
There is no check_ssh command defined in the NRPE client as part of the install. You would need to define it in common.cfg on the Ubuntu server if you wanted to execute it via NRPE.

However, check_ssh is generally not a check you would execute via NRPE; you would execute it from the XI server. In this case, edit your service in CCM and select the command check_xi_service_ssh.
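
To see what a direct SSH check from the XI server looks like before changing the service, something like the following sketch could be run on the XI server. The function name check_ssh_direct and the PLUGIN_DIR variable are hypothetical; the plugin path is the libexec directory used throughout this thread, and -H and -p are the plugin's host and port options.

```shell
# Sketch only: check_ssh_direct and PLUGIN_DIR are illustrative names.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# Run check_ssh against a host straight from the XI server (no NRPE involved).
check_ssh_direct() {
    host="$1"
    port="${2:-22}"
    if [ -x "$PLUGIN_DIR/check_ssh" ]; then
        "$PLUGIN_DIR/check_ssh" -H "$host" -p "$port"
    else
        echo "UNKNOWN - $PLUGIN_DIR/check_ssh not found"
        return 3    # 3 is the conventional UNKNOWN exit code for Nagios plugins
    fi
}

# Usage on the XI server:
#   check_ssh_direct dmg-dev
```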
gbnag wrote: However, we are unable to find a 'Current Load' or 'Current Users' check (both services are showing orange).
In CCM, for these services please click the floppy disk icon.
This will bring up the text definition of the services.
Please paste the text definitions here in a code block.

Re: Linux Ubuntu install - troubleshoot services

Posted: Thu Sep 29, 2016 6:10 pm
by gbnag

###############################################################################
#
# Service configuration file
#
# Created by: Nagios Core Config Manager 2.5.2
# Date:	      2016-09-29 16:08:45
# Version:    Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND --- 
# Nagios CCM will overwrite all manual settings during the next update if you 
# would like to edit files manually, place them in the 'static' directory or 
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		/ Disk Usage
	use				xiwizard_nrpe_service
	check_command			check_nrpe!check_disk!-a '-w 20% -c 10% -p /'
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			linux-server
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		APT Updates
	use				xiwizard_nrpe_service
	check_command			check_nrpe!check_apt!-a '-U'
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			linux-server
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		CPU Stats
	use				xiwizard_nrpe_service
	check_command			check_nrpe!check_cpu_stats!-a '-w 85 -c 95'
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			linux-server
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		Cron Scheduling Daemon
	use				xiwizard_nrpe_service
	check_command			check_nrpe!check_init_service!-a 'cron'
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			linux-server
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		Current Load
	use				generic-service
	check_command			check_nrpe!check_load!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			nrpe
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		Current Users
	use				generic-service
	check_command			check_nrpe!check_users!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			nrpe
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		Load
	use				xiwizard_nrpe_service
	check_command			check_nrpe!check_load!-a '-w 15,10,5 -c 30,20,10'
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			linux-server
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		Memory Usage
	use				xiwizard_nrpe_service
	check_command			check_nrpe!check_mem!-a '-w 20 -c 10'
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			linux-server
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		Node Javascript process
	use				xiwizard_nrpe_service
	check_command			check_nrpe!check_services!-a 'node'
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			linux-server
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		Open Files
	use				xiwizard_nrpe_service
	check_command			check_nrpe!check_open_files!-a '-w 30 -c 50'
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			linux-server
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		Ping
	use				xiwizard_linuxserver_ping_service
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			nrpe
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		Swap Usage
	use				xiwizard_nrpe_service
	check_command			check_nrpe!check_swap!-a '-w 50 -c 20'
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			linux-server
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		Total Processes
	use				generic-service
	check_command			check_nrpe!check_total_procs!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			nrpe
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		Users
	use				xiwizard_nrpe_service
	check_command			check_nrpe!check_users!-a '-w 5 -c 10'
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			linux-server
	register			1
	}	

###############################################################################
#
# Service configuration file
#
# END OF FILE
#
###############################################################################

Re: Linux Ubuntu install - troubleshoot services

Posted: Thu Sep 29, 2016 7:26 pm
by Box293
gbnag wrote:

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		Current Load
	use				generic-service
	check_command			check_nrpe!check_load!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			nrpe
	register			1
	}	

define service {
	host_name			dmg-dev.qualcomm.com
	service_description		Current Users
	use				generic-service
	check_command			check_nrpe!check_users!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		xi_timeperiod_24x7
	contacts			nagiosadmin
	_xiwizard			nrpe
	register			1
	}
Neither of these checks has anything defined in $ARG2$, which is why they are failing.

For the "Current Load" check, in CCM edit the service and add the following to the $ARG2$ field:
-a '-w 15,10,5 -c 30,20,10'

For the "Current Users" check, in CCM edit the service and add the following to the $ARG2$ field:
-a '-w 5 -c 10'

Apply Config when done.
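
For anyone wondering why the empty field matters: the common.cfg posted earlier defines check_users and check_load with a trailing $ARG1$, and whatever is sent after -a fills that placeholder on the agent. The sketch below only imitates that substitution to make the effect visible. It is not NRPE's own code, and the function name expand_nrpe_command is hypothetical.

```shell
# Illustration only: this mimics the $ARG1$ substitution; it is not NRPE's own code.
expand_nrpe_command() {
    template="$1"
    arg1="$2"
    case "$template" in
        *\$ARG1\$*)
            prefix=${template%%\$ARG1\$*}   # text before the placeholder
            suffix=${template#*\$ARG1\$}    # text after the placeholder
            printf '%s\n' "$prefix$arg1$suffix"
            ;;
        *)
            # No placeholder in the definition: the argument is ignored.
            printf '%s\n' "$template"
            ;;
    esac
}

# Definition copied from the common.cfg posted earlier in this thread.
users_cmd='/usr/local/nagios/libexec/check_users $ARG1$'

expand_nrpe_command "$users_cmd" ''             # nothing after -a: no thresholds
expand_nrpe_command "$users_cmd" '-w 5 -c 10'   # thresholds supplied via -a
```

With an empty argument the plugin is started without any thresholds, which is consistent with the services not returning a normal state until the -a '...' string was added.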

Re: Linux Ubuntu install - troubleshoot services

Posted: Fri Sep 30, 2016 6:51 pm
by gbnag
That worked, thanks.

Now the values of 'Current Users' and 'Users' are identical, as are the values of 'Current Load' and 'Load'.
We are just trying to understand why, by default, there are two services reporting the same values, except that 'Current Users' and 'Current Load' had issues reporting before we made the suggested adjustments.

Also, could you guide us in getting Total Processes to show green?

Re: Linux Ubuntu install - troubleshoot services

Posted: Mon Oct 03, 2016 1:29 pm
by lmiltchev
gbnag wrote: Now the values of 'Current Users' and 'Users' are identical, as are the values of 'Current Load' and 'Load'.
We are just trying to understand why, by default, there are two services reporting the same values, except that 'Current Users' and 'Current Load' had issues reporting before we made the suggested adjustments.
The reason I asked whether you added "Current Load" and "Current Users" manually is that these services are NOT included by default in the "Linux Server" wizard (not for Ubuntu, anyway). See an example of the "default" services added by the wizard below:
example01.PNG
Did you select "Ubuntu" from the "Linux Distribution" drop-down menu in Step 1 of the "Linux Server" wizard?
gbnag wrote: Also, could you guide us in getting Total Processes to show green?
You will need to modify the "Total Processes" service in CCM. Add:
-a '-w 150 -c 250'
to the $ARG2$ field, then save and apply the configuration.

Note: 150 & 250 are the "default" WARNING & CRITICAL threshold values. Feel free to use whatever values are relevant in your environment.
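
Before changing the service in CCM, the thresholds can be tried from the command line on the XI server, in the same form gbnag used for check_load earlier in the thread. The helper below is a rough sketch: nrpe_try and PLUGIN_DIR are hypothetical names, and it assumes check_total_procs is defined somewhere in the agent's NRPE configuration, since the common.cfg posted above does not list it.

```shell
# Sketch only: nrpe_try and PLUGIN_DIR are illustrative names.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# Print the check_nrpe command line for a check, then run it if the plugin exists.
nrpe_try() {
    host="$1"
    cmd="$2"
    args="$3"
    printf "%s/check_nrpe -H %s -t 30 -c %s -a '%s'\n" "$PLUGIN_DIR" "$host" "$cmd" "$args"
    if [ -x "$PLUGIN_DIR/check_nrpe" ]; then
        # The whole threshold string is passed as one argument after -a.
        "$PLUGIN_DIR/check_nrpe" -H "$host" -t 30 -c "$cmd" -a "$args"
    fi
}

# Usage on the XI server:
#   nrpe_try dmg-dev check_total_procs '-w 150 -c 250'
```

If the agent answers that the command is not defined, as it did for check_ssh, the definition would need to be added on the Ubuntu host before the thresholds have any effect.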