NSCA with NSClient++ results not holding

Posted: Sat Jul 18, 2015 3:43 pm
by filemakers
OK, I've spent 3 weeks trying to get this working and I am at the end of my understanding.
I am a 4-week Nagios/Linux noob.
Setup: 2 Windows web servers on the internet. I can't open inbound ports because they sit behind a NATed network, and doing so would be impractical going forward, as I want to apply this to other internet-facing servers.
I have installed NSClient++ and am using NSCA to send results to Nagios every 4 minutes.
I believe the messages are arriving OK, as all alerts appear with the correct info at one stage or another on the Nagios web portal.

The problem is that, whilst I can see the results coming in OK in the messages log, they don't stay green. They flick to critical with the error "CHECK_NRPE: Socket timeout after 10 seconds.", even though there is nothing wrong with the monitored machine.

I am sure my configuration files currently read far from perfect, as I have tweaked a few of the services on the Nagios server to see if it makes any difference.

This is a home project, so I only get on in the evenings, and I am in the UK, so please don't think I am being rude if my replies are slow. I am learning Nagios and Linux for this project, so I apologise in advance if I take my time providing the info required to help me.

You will find some alerts duplicated; again, this was done to try different configs to see if it helps.

Here is my NSClient++ ini file:

[/modules]
CheckSystem=enabled
CheckDisk=enabled
CheckExternalScripts=enabled
CheckHelpers=enabled
Scheduler=enabled
NSCAClient=enabled
CheckWMI=enabled
CheckSystem=1
CheckDisk=1
FileLogger.dll
NSClientListener.dll

;[/settings/scheduler/schedules/foo]
;command=bar
;[/settings/scheduler/schedules/alias]
;command=command

[/settings/scheduler/schedules/default]
interval=4m

[/settings/scheduler/schedules]
cpu=alias_cpu
mem=alias_mem
disk=alias_disk
service=alias_service
uptime=check_uptime
Check Up Time=CheckUpTime


CPU Usage=checkCPU warn=80 crit=90 time=30m time=20s time=10s

Memory Usage=checkMem MaxWarn=90% MaxCrit=98% ShowAll

Pagefile Usage=checkMem MaxWarn=90% MaxCrit=98% ShowAll type=page

All Drive Space Usage=CheckDriveSize -a FilterType=FIXED matching=.*[CD].* ShowAll=long MinWarn=10%

MinCrit=5% CheckAll

FilterType=FIXED

C:\ Drive Space Usage=CheckDriveSize MinWarn=10% MinCrit=5% Drive=c:\
FilterType=FIXED

Apache Service Status=checkServiceState Apache2.2

NSClient Service Status=checkServiceState nscp

Operating System Version = CheckWMI "Query=Select Version,Caption from win32_OperatingSystem"


;# The following is the host check (always ok/up)

host_check=CheckOK Machine is okay


[/settings/NSCA/client]
hostname=Houseswiftweb1

[/settings/NSCA/client/targets/default]
address=**.**.**.** Obscured
encryption=xor
password=********** Obscured
allow arguments=1


**********************************************************************************************************************
Here is a sample passive template and host:

define host{
name windows-server-passive ; The name of this host template
use generic-host ; Inherit default values from the generic-host template
check_period 24x7 ; By default, Windows servers are monitored round the clock
check_interval 5 ; Actively check the server every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each server 10 times (max)
check_command check-host-alive ; Default command to check if servers are "alive"
notification_period 24x7 ; Send notification out at any time - day or night
notification_interval 30 ; Resend notifications every 30 minutes
notification_options d,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
hostgroups windows-servers-passive ; Host groups that Windows servers should be a member of
register 0 ; DONT REGISTER THIS - ITS JUST A TEMPLATE
}

define host{
use windows-server-passive ; Inherit default values from a template
host_name **********web1 ; The name we're giving to this host
alias ******* Web Server ; A longer name associat$
passive_checks_enabled 1
active_checks_enabled 0
address 77.*.*.* ; IP address of the host
}

*******************************************************************************************

Here are a couple of the services:

define service{
use generic-service-passive
hostgroup_name windows-servers-passive
passive_checks_enabled 1
active_checks_enabled 0
service_description CPU Usage
check_command check_nrpe!checkCPU 80 90 30 20 10
}


define service{
use generic-service-passive
hostgroup_name windows-servers-passive
passive_checks_enabled 1
active_checks_enabled 0
service_description Memory Usage
check_command check_nrpe!checkMem 90 98
}



********************************************************************************************************************

Here is an extract of my log:

Jul 17 02:18:43 localhost xinetd[10564]: EXIT: nsca status=0 pid=7908 duration=5(sec)
Jul 17 02:18:47 localhost nsca[7909]: Handling the connection...
Jul 17 02:18:47 localhost nsca[7909]: SERVICE CHECK -> Host Name: '******* web1', Service Description: 'Pagefile Usage', Return Code: '0', Output: 'OK: committed: Total: 47.988GB - Used: 18.767GB (39%) - Free: 29.221GB (60%)|'committed'=18.76693GB;43.1895;47.02856;0;47.98833 'committed %'=39%;89;97;0;100'
Jul 17 02:18:47 localhost nsca[7909]: End of connection...
Jul 17 02:18:48 localhost nsca[7910]: Handling the connection...
Jul 17 02:18:48 localhost nsca[7910]: SERVICE CHECK -> Host Name: '********** - Uranus', Service Description: 'CPU Usage', Return Code: '0', Output: 'OK: CPU load is ok.|'total 20m'=0%;80;90 'total 10s'=0%;80;90 'total 4'=0%;80;90'
Jul 17 02:18:48 localhost nsca[7910]: End of connection...
Jul 17 02:18:48 localhost xinetd[10564]: EXIT: nsca status=0 pid=7910 duration=5(sec)
Jul 17 02:18:55 localhost xinetd[10564]: START: nsca pid=7921 from=::ffff:*.*.*.*
Jul 17 02:19:00 localhost nsca[7921]: Handling the connection...
Jul 17 02:19:00 localhost nsca[7921]: SERVICE CHECK -> Host Name: '******* web1', Service Description: 'CPU Usage', Return Code: '0', Output: 'OK: CPU load is ok.|'total 30m'=12%;80;90 'total 20s'=11%;80;90 'total 10s'=12%;80;90'
Jul 17 02:19:00 localhost nsca[7921]: End of connection...
Jul 17 02:19:00 localhost xinetd[10564]: EXIT: nsca status=0 pid=7921 duration=5(sec)
Jul 17 02:19:02 localhost xinetd[10564]: START: nsca pid=7927 from=::ffff:*.*.*.*
Jul 17 02:19:07 localhost nsca[7927]: Handling the connection...
Jul 17 02:19:07 localhost nsca[7927]: SERVICE CHECK -> Host Name: '******* web1', Service Description: 'Check Up Time', Return Code: '0', Output: 'OK: uptime: 49w 299d 7176:429503h, boot: 2014-Aug-08 09:05:14 (UTC)|'uptime'=29763186s;172800;86400'
Jul 17 02:19:07 localhost nsca[7927]: End of connection...
Jul 17 02:19:07 localhost xinetd[10564]: EXIT: nsca status=0 pid=7927 duration=5(sec)
Jul 17 02:19:25 localhost xinetd[10564]: START: nsca pid=7939 from=::ffff:*.*.*.*
Jul 17 02:19:28 localhost xinetd[10564]: START: nsca pid=7940 from=::ffff:*.*.*.*
Jul 17 02:19:30 localhost nsca[7939]: Handling the connection...
Jul 17 02:19:30 localhost nsca[7939]: SERVICE CHECK -> Host Name: '******* web1', Service Description: 'uptime', Return Code: '0', Output: 'OK: uptime: 49w 299d 7176:429503h, boot: 2014-Aug-08 09:05:14 (UTC)|'uptime'=29763207s;172800;86400'
Jul 17 02:19:30 localhost nsca[7939]: End of connection...
Jul 17 02:19:30 localhost xinetd[10564]: EXIT: nsca status=0 pid=7939 duration=5(sec)


Despite these arriving, my alerts are still showing the NRPE error.



Is anyone able to help me please?

Re: NSCA with NSClient++ results not holding

Posted: Mon Jul 20, 2015 9:47 am
by jdalrymple
I would change the check_command to check_dummy for the passive services. That still doesn't explain why the NRPE checks are running. After setting your config (namely active_checks_enabled 0), did you restart Nagios?
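
As a rough sketch of the check_dummy idea, a passive service could look something like the following. The command name check_dummy and its two-argument layout are assumptions for illustration (a stock install may not define this command at all), and the path assumes the standard check_dummy plugin is installed in the $USER1$ plugin directory. The template and hostgroup names are the ones from the post above.

```
# Assumed command definition: wraps the stock check_dummy plugin.
# $ARG1$ is the state to return (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN),
# $ARG2$ is the text to display.
define command{
command_name check_dummy
command_line $USER1$/check_dummy $ARG1$ "$ARG2$"
}

# Passive service: results are expected from NSCA only.
define service{
use generic-service-passive
hostgroup_name windows-servers-passive
service_description CPU Usage
passive_checks_enabled 1
active_checks_enabled 0
check_command check_dummy!3!No passive result received
}
```

With this in place, if Nagios ever does execute the check_command for the service, it returns UNKNOWN with a clear message instead of attempting an NRPE connection that can never succeed through the NAT. One possible reason for that to happen even with active checks disabled is freshness checking: if check_freshness is enabled on the service or its template and no passive result arrives within the freshness threshold, Nagios runs the check_command. Whether that applies here depends on what generic-service-passive contains, which isn't shown in the post.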

Maybe take a look at /usr/local/nagios/var/objects.cache (path may need modification) and share with us the section relevant to one of your broken hosts or services?