Nagios Core - Passive checks no communication.

eadmanday
Posts: 11
Joined: Tue May 15, 2018 11:01 am

Post by eadmanday »

Hey folks, I am new to Nagios. I am setting up a Nagios Core server on Ubuntu 16.04 Server and an NCPA agent on a local Windows 10 client, and I am trying to set up passive checks between the two systems. I feel that I am missing one thing that would let the client communicate with my Ubuntu server. I can see a grayed-out placeholder entry for the Windows host on the Nagios server, but it never receives any information from the client.

If anyone has some ideas or can help me it would be greatly appreciated.

Here are my configs.

Server, in the main config file:

Code: Select all

/usr/local/nagios/etc/nagios.cfg 
cfg_file=/usr/local/nagios/etc/objects/new_host.cfg
I created the new_host.cfg file at /usr/local/nagios/etc/objects/new_host.cfg with the following definitions:

Code: Select all

define host {
    use            passive_host
    host_name        Webprint
}
define service {
    use                    passive_service
    service_description    CPU Usage
    host_name                Webprint
}
define service {
    use                    passive_service
    service_description    Disk Usage
    host_name                Webprint
}
define service {
    use                    passive_service
    service_description    Swap Usage
    host_name                Webprint
}
define service {
    use                    passive_service
    service_description    Memory Usage
    host_name                Webprint
}
define service {
    use                    passive_service
    service_description    Process Count
    host_name                Webprint
}
In the commands.cfg file I added:

Code: Select all

define command {
        command_name    check_dummy
        command_line    $USER1$/check_dummy $ARG1$
}
In the templates.cfg file I added the following host template:

Code: Select all

define host {
    use                        generic-host
    name                     passive_host
    active_checks_enabled        0
    passive_checks_enabled        1
    flap_detection_enabled        0
    register                    0
    check_period                24x7
    max_check_attempts        1
    check_interval            5
    retry_interval            1
    check_freshness            0
    contact_groups            admins
    check_command            check_dummy!0
    notification_interval    60
    notification_period        24x7
    notification_options        d,u,r
}
And I added the following to the service templates section of templates.cfg:

Code: Select all

define service {
    use                        generic-service
    name                        passive_service
    active_checks_enabled        0
    passive_checks_enabled        1
    flap_detection_enabled        0
    register                    0
    check_period                24x7
    max_check_attempts        1
    check_interval            5
    retry_interval            1
    check_freshness            0
    contact_groups            admins
    check_command            check_dummy!0
    notification_interval    60
    notification_period        24x7
    notification_options        w,u,c,r
}
The Windows 10 client has the NCPA.cfg file in the default location used by the agent installer. The following is from that file:

Code: Select all

#
# AUTO GENERATED NRDP CONFIG FROM WINDOWS INSTALLER
#

[passive checks]

# Host check  - This is to stop "pending check" status in Nagios
%HOSTNAME%|__HOST__ = system/agent_version

# Service checks
%HOSTNAME%|CPU Usage = cpu/percent --warning 80 --critical 90 --aggregate avg
%HOSTNAME%|Disk Usage = disk/logical/C:|/used_percent --warning 80 --critical 90 --units Gi
%HOSTNAME%|Swap Usage = memory/swap --warning 60 --critical 80 --units Gi
%HOSTNAME%|Memory Usage = memory/virtual --warning 80 --critical 90 --units Gi
%HOSTNAME%|Process Count = processes --warning 300 --critical 400
I have also added firewall rules for port 5693, and I can access both the NRDP web interface on the server and the NCPA agent web interface on the client.

Does anyone have any ideas what might be causing the lack of communication?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

Did you set up NRDP on your Nagios Core server and configure the passive host and token when installing the NCPA agent on the Windows machine?
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com

Post by eadmanday »

As far as I know, I did. I can go to the NRDP web page on the server at http://1.2.3.4/nrdp/, test the token, and receive an OK confirmation. If I try a bad token, I receive a bad token error.
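
For anyone following along, a token test on the web page can be taken one step further by submitting an actual passive result by hand, which takes NCPA out of the picture. The following is a minimal Python sketch, not an official client. It assumes the NRDP endpoint accepts a form-encoded POST with `token`, `cmd=submitcheck` and an `XMLDATA` field holding a `<checkresults>` document, which is the format the classic NRDP send scripts use; other NRDP releases may expect different parameter names. The function names `build_checkresults_xml`, `build_form_body` and `send_to_nrdp` are hypothetical.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape


def build_checkresults_xml(host, state, output, service=None):
    """Build an NRDP-style <checkresults> document holding one passive result."""
    kind = "service" if service is not None else "host"
    parts = [
        "<?xml version='1.0'?>",
        "<checkresults>",
        # checktype 1 marks the result as passive (assumed NRDP convention)
        f"<checkresult type='{kind}' checktype='1'>",
        f"<hostname>{escape(host)}</hostname>",
    ]
    if service is not None:
        parts.append(f"<servicename>{escape(service)}</servicename>")
    parts += [
        f"<state>{int(state)}</state>",
        f"<output>{escape(output)}</output>",
        "</checkresult>",
        "</checkresults>",
    ]
    return "".join(parts)


def build_form_body(token, xml):
    """URL-encode the fields that the NRDP submitcheck command is assumed to read."""
    fields = {"token": token, "cmd": "submitcheck", "XMLDATA": xml}
    return urllib.parse.urlencode(fields).encode("utf-8")


def send_to_nrdp(url, token, xml, timeout=10):
    """POST the result to the NRDP URL and return the raw response text."""
    request = urllib.request.Request(url, data=build_form_body(token, xml))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the values from this thread, a call would look like `send_to_nrdp("http://1.2.3.4/nrdp/", "test", build_checkresults_xml("Webprint", 0, "manual test", service="CPU Usage"))`. If the server accepts the result but the service stays pending in Nagios, the host or service name in the payload probably does not match an object Nagios knows about.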

Post by scottwilkerson »

eadmanday wrote: As far as I know, I did. I can go to the NRDP web page on the server at http://1.2.3.4/nrdp/, test the token, and receive an OK confirmation. If I try a bad token, I receive a bad token error.
But did you add that URL and token to the passive section when installing the NCPA agent on the Windows machine, and check the enable passive checkbox?

Post by eadmanday »

That is correct, I did that under the passive configuration for NRDP.

Here is the NCPA.cfg file:

Code: Select all

#
#   NCPA Main Config File
#   ---------------------
#

#
# -------------------------------
# General Configuration
# -------------------------------
#

[general]

#
# Check logging is on by default, you can disable it if you do not want to record
# the check requests that are coming in or checks being sent over NRDP.
# Default: check_logging = 1
#
check_logging = 1

#
# Check logging time - how long in DAYS you'd like to keep checks in the database.
# Default: 30
#
check_logging_time = 30

#
# Excluded file system types removes these fs types from the disk metrics
# (This is mostly only noteable on UNIX systems but also works on Windows if you need it)
# Default: aufs,autofs,binfmt_misc,cifs,cgroup,configfs,debugfs,devpts,devtmpfs,
#          encryptfs,efivarfs,fuse,hugetlbfs,mqueue,nfs,overlayfs,proc,pstore,
#          rpc_pipefs,securityfs,selinuxfs,smb,sysfs,tmpfs,tracefs
#
exclude_fs_types = aufs,autofs,binfmt_misc,cifs,cgroup,configfs,debugfs,devpts,devtmpfs,encryptfs,efivarfs,fuse,hugetlbfs,mqueue,nfs,overlayfs,proc,pstore,rpc_pipefs,securityfs,selinuxfs,smb,sysfs,tmpfs,tracefs

#
# The default unit to convert bytes (B) into if no unit is specified
# (Gi = 1024 MiB, G = 1000 MB)
#
default_units = Gi

#
# -------------------------------
# Listener Configuration (daemon)
# -------------------------------
#

[listener]

#
# User and group to run plugins as (recommended to use nagios:nagios)
# Default: uid = nagios
# Default: gid = nagios
#
# ** Note - The daemon runs as root, but forks a child process when running a plugin
#    that is defined by the user, for security reasons. However, without the main daemon
#    running as root, much of the system information would be missing. This is typical behavior. **
#
# This is for Unix only (Linux, Mac OS X, etc)
#
uid = nagios
gid = nagios

#
# IP address and port number for the Listener to use for the web GUI and API
# Default: ip = 0.0.0.0
# Default: port = 5693
#
ip =0.0.0.0
port =5693

#
# SSL connection and certificate config (if an SSL option is not available on some older
# operating systems it will default back to TLSv1)
# ssl_version options: TLSv1, TLSv1_1, TLSv1_2
#
ssl_version =TLSv1_2
certificate = adhoc

#
# Listener logging file level, location, and the PID location
# Default: loglevel = info (debug, info, warning, error)
# Default: logfile = var/log/ncpa_listener.log
# Default: pidfile = var/run/ncpa_listener.pid (leave listener in pid file name)
#
loglevel =warning
logfile = var/log/ncpa_listener.log
pidfile = var/run/ncpa_listener.pid

#
# Delay the listener (API & web GUI) from starting in seconds
# Default: 0
#
# delay_start = 30

#
# Allow admin functionality in the web GUI. When this is set to 0, the admin section will not
# be displayed in the header and will not be available to be accessed.
# Default: 1
#
admin_gui_access = 1

#
# Admin password for the admin section in the web GUI, by default there is no admin
# password and the admin section of the GUI can be accessed by anyone if admin_gui_access is set to 1.
# Default: None
#
# Note: Setting this value to 'None' will automatically log you in, setting it empty will allow you to
# log in using a blank password.
#
admin_password = None

#
# Require admin password to access ALL of the web GUI.
# This does not affect API access via token (community_string).
# Default: 0
#
admin_auth_only = 0

#
# -------------------------------
# Listener Configuration (API)
# -------------------------------
#

[api]

#
# The token that will be used to log into the basic web GUI (API browser, graphs, top charts, etc)
# and to authenticate requests to the API and requests through check_ncpa.py
#
community_string =test

#
# -------------------------------
# Passive Configuration (daemon)
# -------------------------------
#

[passive]

#
# Handlers are a comma separated list of what you would like the passive agent to run
# Default: None
# Options:
#   nrds, nrdp, kafkaproducer
#
# Example:
# handlers = nrds,nrdp,kafkaproducer
#
handlers =nrdp

#
# User and group to run passive checks as (Recommended to use nagios:nagios)
# Default: uid = nagios
# Default: gid = nagios
#
uid = nagios
gid = nagios

#
# Passive check interval - the amount in seconds to wait between each passive check by default,
# this can be overwritten by adding on a "|<duration>" in seconds to the passive check config
# Default: 300 (5 minutes)
#
sleep =50

#
# Passive logging file level, location, and the PID location
# Default: loglevel = info (debug, info, warning, error)
# Default: logfile = var/log/ncpa_passive.log
# Default: pidfile = var/run/ncpa_passive.pid (leave passive in pid file name)
#
loglevel =warning
logfile = var/log/ncpa_passive.log
pidfile = var/run/ncpa_passive.pid

#
# Delay passive checks from starting in seconds
# Default: 0
#
# delay_start = 30

#
# -------------------------------
# Passive Configuration (NRDP)
# -------------------------------
#

[nrdp]

#
# Connection settings to the NRDP server
# parent = NRDP server location (ex: http://<address>/nrdp)
# token = NRDP server token used to send NRDP results
#
parent =http://1.2.3.4/nrdp
token =test

#
# The hostname that will replace %HOSTNAME% in the check definitions and will be
# sent to NRDP with the check name as the service description (service name)
#
hostname =NCPA 2

#
# -------------------------------
# Passive Configuration (NRDS)
# -------------------------------
#

[nrds]

#
# NRDS CONFIGURATION DOES NOT WORK YET. MORE TO COME IN VERSION 2.1.0.
#

#
# NRDS connection information
#
url = 
token = 
config_name = 
config_version = 
update_config = 1
update_plugins = 1

[kafkaproducer]

#
# -------------------------------
# Passive Configuration (Kafka)
# -------------------------------
#

hostname = None
servers = localhost:9092
clientname = NCPA-Kafka
topic = ncpa

#
# -------------------------------
# Plugin Configuration
# -------------------------------
#

[plugin directives]

#
# Plugin path where all plugins will be ran from.
#
plugin_path = plugins/

#
# Plugin execution timeout in seconds. Different than the check_ncpa.py timeout, which is
# normally for network connection issues. Will return a CRITICAL value and error when the plugin
# reaches the defined max execution timeout and kills the process.
# Default: 60
#
# plugin_timeout = 60

#
# Extensions for plugins
# ----------------------
# The extension for the plugin denotes how NCPA will try to run the plugin. Use this
# for setting how you want to run the plugin in the command line.
#
# NOTE: Plugins without an extension will be ran in the cmdline as follows:
#       $plugin_name $plugin_args
#
# Defaults:
# .sh = /bin/sh $plugin_name $plugin_args
# .py = python $plugin_name $plugin_args
# .ps1 = powershell -ExecutionPolicy Bypass -File $plugin_name $plugin_args
# .vbs = cscript $plugin_name $plugin_args //NoLogo
# .bat = cmd /c $plugin_name $plugin_args
#
# Since windows NCPA is 32-bit, if you need to use 64-bit powershell, try the following for
# the powershell plugin definition:
# .ps1 = c:\windows\sysnative\windowspowershell\v1.0\powershell.exe -ExecutionPolicy Unrestricted -File $plugin_name $plugin_args
#

# Linux / Mac OS X
.sh = /bin/sh $plugin_name $plugin_args
.py = python $plugin_name $plugin_args

# Windows
.ps1 = powershell -ExecutionPolicy Bypass -File $plugin_name $plugin_args
.vbs = cscript $plugin_name $plugin_args //NoLogo
.wsf = cscript $plugin_name $plugin_args //NoLogo
.bat = cmd /c $plugin_name $plugin_args

Post by scottwilkerson »

Great. Finally, did you start the NCPA passive service?

Post by eadmanday »

The NCPAlistener and NCPApassive services are both running and set to delayed start on the Windows 10 system.

Post by scottwilkerson »

I just looked again: your Nagios configs have a host named "Webprint", but your NCPA config has this:

Code: Select all

hostname =NCPA 2
You would need to change this to

Code: Select all

hostname =Webprint
and then restart both services
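
This kind of mismatch is easy to miss by eye, because NCPA substitutes the `hostname` value from its `[nrdp]` section for `%HOSTNAME%` and Nagios only matches a passive result to a host with exactly that name. The following is a minimal Python sketch of a cross-check, assuming both files are available as text. The helper names `ncpa_nrdp_hostname`, `nagios_host_names` and `hostname_matches` are hypothetical, and the parsing is deliberately simplified (it does not follow `cfg_dir` includes or template inheritance).

```python
import configparser
import re


def ncpa_nrdp_hostname(ncpa_cfg_text):
    """Return the 'hostname' value from the [nrdp] section of ncpa.cfg text."""
    # Interpolation is disabled because NCPA keys contain literal '%' signs.
    parser = configparser.ConfigParser(
        interpolation=None, delimiters=("=",), strict=False
    )
    parser.read_string(ncpa_cfg_text)
    return parser.get("nrdp", "hostname", fallback=None)


def nagios_host_names(object_cfg_text):
    """Collect host_name values that appear inside 'define host' blocks."""
    names = set()
    for block in re.findall(r"define\s+host\s*\{(.*?)\}", object_cfg_text, re.S):
        for line in block.splitlines():
            # Drop trailing ';' comments, then split directive from value.
            fields = line.split(";", 1)[0].split(None, 1)
            if len(fields) == 2 and fields[0] == "host_name":
                names.add(fields[1].strip())
    return names


def hostname_matches(ncpa_cfg_text, object_cfg_text):
    """True when the NCPA passive hostname is a host defined in Nagios."""
    hostname = ncpa_nrdp_hostname(ncpa_cfg_text)
    return hostname is not None and hostname in nagios_host_names(object_cfg_text)
```

Run against the configs posted above, `ncpa_nrdp_hostname` would return `NCPA 2` while `nagios_host_names` would return only `Webprint`, so `hostname_matches` would be false until the NCPA value is changed. The comparison is case-sensitive on purpose.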

Post by eadmanday »

Thank you for pointing that out. I changed the name and restarted the services. I am still not getting anything to show on the Nagios > Current Status > Hosts page.

Looking into the log, though, I did see a few lines for Webprint and nothing else:

Code: Select all

tail -10000 /usr/local/nagios/var/nagios.log |head -1000|grep -A 2 -i 'Webprint' |more
[1526886000] CURRENT HOST STATE: Webprint;UP;HARD;1;
[1526886000] CURRENT HOST STATE: localhost;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.06 ms
[1526886000] CURRENT SERVICE STATE: Webprint;CPU Usage;OK;HARD;1;
[1526886000] CURRENT SERVICE STATE: Webprint;Disk Usage;OK;HARD;1;
[1526886000] CURRENT SERVICE STATE: Webprint;Memory Usage;OK;HARD;1;
[1526886000] CURRENT SERVICE STATE: Webprint;Process Count;OK;HARD;1;
[1526886000] CURRENT SERVICE STATE: Webprint;Swap Usage;OK;HARD;1;
[1526886000] CURRENT SERVICE STATE: localhost;Current Load;OK;HARD;1;OK - load average: 0.00, 0.00, 0.00
[1526886000] CURRENT SERVICE STATE: localhost;Current Users;OK;HARD;1;USERS OK - 0 users currently logged in
I also see a few errors from the templates.cfg file:

Code: Select all

[1526921951] Error: Template 'generic-service' specified in service definition could not be not found (config file '/usr/local/nagios/etc/objects/templates.cfg', starting on line 180)
[1526921951] Error: Template 'generic-service' specified in service definition could not be not found (config file '/usr/local/nagios/etc/objects/templates.cfg', starting on line 156)
Those two errors point to the following two service definitions:

Code: Select all

# Generic service definition template - This is NOT a real service, just a template!


define service {
    use                        generic-service
    name                        passive_service
    active_checks_enabled        0
    passive_checks_enabled        1
    flap_detection_enabled        0
    register                    0
    check_period                24x7
    max_check_attempts        1
    check_interval            5
    retry_interval            1
    check_freshness            0
    contact_groups            admins
    check_command            check_dummy!0
    notification_interval    60
    notification_period        24x7
    notification_options        w,u,c,r
}




# Local service definition template - This is NOT a real service, just a template!

define service{
        name                            local-service;            The name of this service template
        use                             generic-service;          Inherit default values from the generic-service definition
        max_check_attempts              4;                        Re-check the service up to 4 times in order to determine its final (hard) state
        check_interval                  5;                        Check the service every 5 minutes under normal conditions
        retry_interval                  1;                        Re-check the service every minute until a hard state can be determined
        register                        0;                        DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
I am not sure whether that has anything to do with it, or whether there is another place I should be looking for communication information.

Thank you for your help as well.
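
A note on reading the log excerpt above: in the `CURRENT HOST STATE` and `CURRENT SERVICE STATE` lines, the last semicolon-separated field is the stored plugin output. The Webprint entries have that field empty while the localhost entries do not, which suggests (though it does not prove) that no result had been stored for Webprint at that point. The following is a minimal Python sketch that picks such entries out of a log. The function names `parse_state_line` and `entries_without_output` are hypothetical, and the field layout is inferred only from the lines shown in this thread.

```python
from datetime import datetime, timezone
import re

LOG_LINE = re.compile(r"^\[(\d+)\] CURRENT (HOST|SERVICE) STATE: (.*)$")


def parse_state_line(line):
    """Parse one CURRENT HOST/SERVICE STATE line, or return None if it is not one."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    epoch, kind, rest = match.groups()
    # Host lines have 5 fields, service lines 6; plugin output is always last
    # and may itself contain ';', so the number of splits is capped.
    splits = 4 if kind == "HOST" else 5
    fields = rest.split(";", splits)
    if len(fields) != splits + 1:
        return None
    return {
        "time": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        "host": fields[0],
        "service": fields[1] if kind == "SERVICE" else None,
        "state": fields[-4],
        "output": fields[-1],
    }


def entries_without_output(log_text, host):
    """Return parsed entries for the given host whose plugin output is empty."""
    found = []
    for line in log_text.splitlines():
        entry = parse_state_line(line)
        if entry and entry["host"] == host and entry["output"] == "":
            found.append(entry)
    return found
```

Calling `entries_without_output(open("/usr/local/nagios/var/nagios.log").read(), "Webprint")` on the server would list the host and services that have never stored any output. The epoch timestamps are converted to UTC so they can be compared with the time the NCPA services were restarted.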

Post by scottwilkerson »

This error suggests that you don't have a generic-service template defined, but you are referencing it with the use directive.

Something like this needs to be added to the config:

Code: Select all

define service {
       name                          		generic-service
       is_volatile                   		0
       max_check_attempts            		3
       check_interval                		10
       retry_interval                		2
       active_checks_enabled         		1
       passive_checks_enabled        		1
       check_period                  		24x7
       parallelize_check             		1
       obsess_over_service           		1
       check_freshness               		0
       event_handler_enabled         		1
       flap_detection_enabled        		1
       process_perf_data             		1
       retain_status_information     		1
       retain_nonstatus_information  		1
       notification_interval         		60
       notification_period           		24x7
       notification_options          		w,c,u,r
       notifications_enabled         		1
       contact_groups                		admins
       register                    		0

}	
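
A dangling `use` reference like this one can also be spotted before Nagios is restarted. The following is a minimal Python sketch, assuming the object definitions from all loaded config files have been concatenated into one string. The names `parse_objects` and `missing_templates` are hypothetical, and the parser is a simplification that only understands plain `define <type> { ... }` blocks with `;` comments.

```python
import re

DEFINE_BLOCK = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.S)


def parse_objects(cfg_text):
    """Yield (object_type, {directive: value}) for each define block."""
    for obj_type, body in DEFINE_BLOCK.findall(cfg_text):
        directives = {}
        for line in body.splitlines():
            # Drop trailing ';' comments, then split directive from value.
            fields = line.split(";", 1)[0].split(None, 1)
            if len(fields) == 2:
                directives[fields[0]] = fields[1].strip()
        yield obj_type, directives


def missing_templates(cfg_text):
    """Return (object_type, template_name) pairs that are used but never defined."""
    objects = list(parse_objects(cfg_text))
    # Templates are identified by their 'name' directive within an object type.
    defined = {(t, d["name"]) for t, d in objects if "name" in d}
    missing = set()
    for obj_type, directives in objects:
        for template in directives.get("use", "").split(","):
            template = template.strip()
            if template and (obj_type, template) not in defined:
                missing.add((obj_type, template))
    return missing
```

For the templates.cfg shown earlier in the thread, this would be expected to report `("service", "generic-service")` until a definition like the one above is added. Nagios' own pre-flight check, `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg`, remains the authoritative test; the sketch only illustrates what that error means.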