wrong notification or alarms

Posted: Thu Nov 09, 2017 7:15 am
by DOkuwa
Hello,
I do have some alarms or notifications which are false.
For example, this one was raised although the device is not down:

** Notification **

Notification Type: PROBLEM

Service: Network-Ports
Host: uschi-eqn1-cs02
Address: 10.255.43.1
State: WARNING

Date/Time: 2017-11-09

Additional Info: No response from remote host 10.255.43.1

Re: wrong notification or alarms

Posted: Mon Nov 13, 2017 12:58 pm
by acefreakz
Can you show us the service definition for Network-Ports? For now I am guessing that the port is closed on the remote host.
DOkuwa wrote: "I do have some alarms or notifications which are false"
Are you getting this consistently, or only sometimes?

Re: wrong notification or alarms

Posted: Mon Nov 13, 2017 1:51 pm
by kyang
Thanks @acefreakz,

@DOkuwa, a service definition for Network-Ports would help us. Is it only this service that is experiencing this problem?

Also, can you explain how it is false? Is the service actually UP?

Re: wrong notification or alarms

Posted: Thu Nov 16, 2017 5:34 am
by DOkuwa
An example is this:

Code: Select all

[1510823136] SERVICE ALERT: ukldn-thca-cs02;Network-Ports;UNKNOWN;HARD;3;ERROR: Unable to get status from 10.245.65.1 (alarm timeout)
[1510823318] SERVICE ALERT: ukldn-thca-cs02;Network-Ports;WARNING;HARD;3;No response from remote host '10.245.65.1'
[1510823500] SERVICE ALERT: ukldn-thca-cs02;Network-Ports;UNKNOWN;HARD;3;ERROR: Unable to get status from 10.245.65.1 (alarm timeout)
[1510823858] SERVICE ALERT: ukldn-thca-cs02;Network-Ports;OK;HARD;3;All ports OK.
[1510824157] SERVICE ALERT: ukldn-thca-cs02;Network-Ports;WARNING;HARD;3;No response from remote host '10.245.65.1'
[1510824351] SERVICE ALERT: ukldn-thca-cs02;Network-Ports;UNKNOWN;HARD;3;ERROR: Unable to get status from 10.245.65.1 (alarm timeout)
[1510824703] SERVICE ALERT: ukldn-thca-cs02;Network-Ports;OK;HARD;3;All ports OK

But we can ping it, and there are no issues:

 ping 10.245.65.1
PING 10.245.65.1 (10.245.65.1) 56(84) bytes of data.
64 bytes from 10.245.65.1: icmp_seq=1 ttl=62 time=1.41 ms
64 bytes from 10.245.65.1: icmp_seq=2 ttl=62 time=3.88 ms
64 bytes from 10.245.65.1: icmp_seq=3 ttl=62 time=2.49 ms
64 bytes from 10.245.65.1: icmp_seq=4 ttl=62 time=3.69 ms
^C
--- 10.245.65.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3002
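
To make the pattern in the SERVICE ALERT lines of the log excerpt above easier to read, the epoch timestamps can be converted to UTC and the gap between state changes printed. This is only a minimal sketch, not part of Nagios: `parse_alerts` and `ALERT_RE` are illustrative names, and it assumes the line layout shown in the excerpt (host;service;state;state type;attempt;output). The sample lines are copied from that excerpt.

```python
import re
from datetime import datetime, timezone

# Assumed layout of a Nagios alert line, as seen in the excerpt:
# [epoch] SERVICE ALERT: host;service;state;state_type;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);([^;]+);([^;]+);(\d+);(.*)$"
)


def parse_alerts(lines):
    """Return (utc_time, host, service, state, output) tuples for alert lines."""
    alerts = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a SERVICE ALERT line
        epoch, host, service, state, _state_type, _attempt, output = match.groups()
        when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
        alerts.append((when, host, service, state, output))
    return alerts


# Sample lines copied from the log excerpt in this thread.
sample = [
    "[1510823136] SERVICE ALERT: ukldn-thca-cs02;Network-Ports;UNKNOWN;HARD;3;"
    "ERROR: Unable to get status from 10.245.65.1 (alarm timeout)",
    "[1510823318] SERVICE ALERT: ukldn-thca-cs02;Network-Ports;WARNING;HARD;3;"
    "No response from remote host '10.245.65.1'",
    "[1510823858] SERVICE ALERT: ukldn-thca-cs02;Network-Ports;OK;HARD;3;All ports OK.",
]

alerts = parse_alerts(sample)
previous = None
for when, host, service, state, output in alerts:
    # Show how many seconds passed since the previous state change.
    gap = "" if previous is None else f" (+{int((when - previous).total_seconds())}s)"
    print(f"{when:%Y-%m-%d %H:%M:%S} UTC{gap} {host}/{service} {state}: {output}")
    previous = when
```

Run against the full nagios.log, something like this could show whether the timeouts cluster at particular times, which a one-off ping cannot reveal.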




Re: wrong notification or alarms

Posted: Thu Nov 16, 2017 9:47 am
by DOkuwa
I also did an snmpwalk, and it worked:

Code: Select all

 /usr/bin/snmpwalk  -v 2c -c xxxxx ukldn-thca-cs02 1.3.6.1.4.1.6527.3.1.2.2.4.2.1.39
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.39.1.35684352 = INTEGER: 5
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.39.1.35717120 = INTEGER: 5
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.39.1.35749888 = INTEGER: 5
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.39.1.35782656 = INTEGER: 5
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.39.1.35815424 = INTEGER: 5
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.39.1.35848192 = INTEGER

Re: wrong notification or alarms

Posted: Thu Nov 16, 2017 10:47 am
by DOkuwa
Another host that has the same issue:

Code: Select all

 more nagios.log | grep uschi-eqn1-cs02
[1510790400] CURRENT HOST STATE: uschi-eqn1-cs02;UP;HARD;1;OK - 10.255.43.1: rta 8.845ms, lost 0%
[1510790400] CURRENT SERVICE STATE: uschi-eqn1-cs02;Cpm;WARNING;HARD;3;No response from remote host '10.255.43.1'
[1510790400] CURRENT SERVICE STATE: uschi-eqn1-cs02;Fans;WARNING;HARD;3;No response from remote host '10.255.43.1'
[1510790400] CURRENT SERVICE STATE: uschi-eqn1-cs02;Lsp;UNKNOWN;HARD;3;ERROR: Unable to get status from 10.255.43.1 (alarm timeout)
[1510790400] CURRENT SERVICE STATE: uschi-eqn1-cs02;Network-Ports;WARNING;HARD;3;No response from remote host '10.255.43.1'
[1510790400] CURRENT SERVICE STATE: uschi-eqn1-cs02;Psu;WARNING;HARD;3;No response from remote host '10.255.43.1'
[1510790400] CURRENT SERVICE STATE: uschi-eqn1-cs02;Temperature;WARNING;HARD;3;No response from remote host '10.255.43.1'
[1510790400] CURRENT SERVICE STATE: uschi-eqn1-cs02;ping;OK;HARD;1;OK - 10.255.43.1: rta 16.862ms, lost 0%
usr/local/nagios/var# /usr/bin/snmpwalk -v 2c -c xxxxx uschi-eqn1-cs02 1.3.6.1.4.1.6527.3.1.2.2.4.2.1.38
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.38.1.37781504 = INTEGER: 3
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.38.1.37814272 = INTEGER: 3
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.38.1.37847040 = INTEGER: 3
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.38.1.37879808 = INTEGER: 3
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.38.1.37912576 = INTEGER: 3
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.38.1.37945344 = INTEGER: 3
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.38.1.37978112 = INTEGER: 3
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.38.1.38010880 = INTEGER: 3
SNMPv2-SMI::enterprises.6527.3.1.2.2.4.2.1.38.1.38043648 = INTEGER: 3


Re: wrong notification or alarms

Posted: Thu Nov 16, 2017 4:59 pm
by npolovenko
@DOkuwa, was this check working OK in the past, or did you just configure it? What manual did you use to set it up? We also need to see the command definition and the service definition. Usually you can find those in this folder: /usr/local/nagios/etc/objects
If you're having trouble finding them, you can download the whole etc folder (/usr/local/nagios/etc/) using FileZilla or WinSCP and upload it here.
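
If the definitions are spread over several files, a quick search of the config tree can help locate them. Below is a minimal sketch, assuming the configuration lives under /usr/local/nagios/etc and the files end in .cfg; `find_in_configs` is a hypothetical helper name, not a Nagios tool.

```python
import os


def find_in_configs(root, needle):
    """Return (path, line_number, line) for every .cfg line containing needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".cfg"):
                continue  # only look at Nagios object/config files
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((path, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumed install prefix; adjust if Nagios lives elsewhere.
    for path, number, line in find_in_configs("/usr/local/nagios/etc", "Network-Ports"):
        print(f"{path}:{number}: {line.strip()}")
```

Searching for the service name first and then for the check_command it references should lead to both the service definition and the command definition.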

Re: wrong notification or alarms

Posted: Fri Nov 17, 2017 10:05 am
by DOkuwa
I could not find anything wrong.
Only the IP address was wrong. I have added the new IP address and still have the same issue. I have checked hosts.cfg and services.cfg and found no issues.

Code: Select all

define host{
        host_name                               uschi-eqn1-cs02
        use                             core-host
        alias                           uschi-eqn1-cs02
        address                         10.255.50.1
        _SNMPCOMMUNITY                  10.255.50.1
        hostgroups                      Alcatel_Equipment, Core_Equipmen

Code: Select all

 define service{
st_name                               uschi-eqn1-cs02
        use                             core-host
        alias                           uschi-eqn1-cs02
        address                         10.255.50.1
        _SNMPCOMMUNITY                  10.255.50.1
        hostgroups                      Alcatel_Equipment, Core_Equipmen

Re: wrong notification or alarms

Posted: Fri Nov 17, 2017 2:57 pm
by kyang
@DOkuwa,

The value you are using for "_SNMPCOMMUNITY" doesn't look right: it is set to the host's IP address (10.255.50.1) rather than a community string. Does that actually work?
Could you run this command and show us the output?

Code: Select all

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Along with that, could you show us the objects.cache file?

It is usually located at the path below. If it is not there, you can search for it with a find command: find / -name objects.cache

Code: Select all

/usr/local/nagios/var/objects.cache
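
Once objects.cache is at hand, it can be scanned for hosts that look like the host definition posted earlier, where _SNMPCOMMUNITY and address are both 10.255.50.1. The following is a rough sketch only: it assumes objects.cache uses "define <type> {" blocks with one "name value" directive per line, and `parse_objects` and `suspicious_communities` are illustrative names. The second sample host and its masked community string are made up for the demonstration.

```python
import re

# Assumed block opener in objects.cache, e.g. "define host {"
DEFINE_RE = re.compile(r"^define\s+(\w+)\s*\{")


def parse_objects(text):
    """Parse 'define <type> { ... }' blocks into (type, directives) tuples."""
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        start = DEFINE_RE.match(line)
        if start:
            current = (start.group(1), {})
        elif line == "}" and current is not None:
            objects.append(current)
            current = None
        elif current is not None:
            # Directive name, then whitespace, then the value (may be empty).
            parts = line.split(None, 1)
            current[1][parts[0]] = parts[1] if len(parts) > 1 else ""
    return objects


def suspicious_communities(objects):
    """Return names of hosts whose _SNMPCOMMUNITY equals their address."""
    names = []
    for kind, directives in objects:
        community = directives.get("_SNMPCOMMUNITY")
        if kind == "host" and community and community == directives.get("address"):
            names.append(directives.get("host_name"))
    return names


# Illustrative input: the first host mirrors the definition posted in this
# thread; the second host's community value is a hypothetical placeholder.
SAMPLE = """
define host {
    host_name       uschi-eqn1-cs02
    address         10.255.50.1
    _SNMPCOMMUNITY  10.255.50.1
    }
define host {
    host_name       ukldn-thca-cs02
    address         10.245.65.1
    _SNMPCOMMUNITY  xxxxx
    }
"""

print(suspicious_communities(parse_objects(SAMPLE)))
```

To run it on the real file, read /usr/local/nagios/var/objects.cache into a string and pass it to `parse_objects` in place of `SAMPLE`.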

Re: wrong notification or alarms

Posted: Tue Nov 21, 2017 4:58 am
by DOkuwa

Code: Select all


/usr/local/nagios/libexec# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 3.2.3
Copyright (c) 2009-2010 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 10-03-2010
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
Processing object config file '/usr/local/nagios/etc/hosts.cfg'...
Processing object config file '/usr/local/nagios/etc/services.cfg'...
Processing object config file '/usr/local/nagios/etc/misccommands.cfg'...
Processing object config file '/usr/local/nagios/etc/checkcommands.cfg'...
Processing object config file '/usr/local/nagios/etc/contactgroups.cfg'...
Processing object config file '/usr/local/nagios/etc/contacts.cfg'...
Processing object config file '/usr/local/nagios/etc/hostgroups.cfg'...
Processing object config file '/usr/local/nagios/etc/servicegroups.cfg'...
Processing object config file '/usr/local/nagios/etc/timeperiods.cfg'...
Processing object config file '/usr/local/nagios/etc/escalations.cfg'...
Processing object config file '/usr/local/nagios/etc/dependencies.cfg'...
Processing object config file '/usr/local/nagios/etc/hostextinfo.cfg'...
Processing object config file '/usr/local/nagios/etc/serviceextinfo.cfg'...
Processing object config file '/usr/local/nagios/etc/meta_commands.cfg'...
Processing object config file '/usr/local/nagios/etc/meta_contact.cfg'...
Processing object config file '/usr/local/nagios/etc/meta_contactgroup.cfg'...
Processing object config file '/usr/local/nagios/etc/meta_dependencies.cfg'...
Processing object config file '/usr/local/nagios/etc/meta_escalations.cfg'...
Processing object config file '/usr/local/nagios/etc/meta_host.cfg'...
Processing object config file '/usr/local/nagios/etc/meta_hostgroup.cfg'...
Processing object config file '/usr/local/nagios/etc/meta_services.cfg'...
Processing object config file '/usr/local/nagios/etc/meta_timeperiod.cfg'...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking services...
Warning: Service 'HTTP-Tim' on host 'doctool.exponential-e.net' has no default contacts or contactgroups defined!
Warning: Service 'HTTP-Tim' on host 'tm01-dev.exponential-e.net' has no default contacts or contactgroups defined!
        Checked 2358 services.
Checking hosts...
Warning: Host 'doctool.exponential-e.net' has no default contacts or contactgroups defined!
Warning: Host 'tm01-dev.exponential-e.net' has no default contacts or contactgroups defined!
        Checked 434 hosts.
Checking host groups...
        Checked 11 host groups.
Checking service groups...
        Checked 4 service groups.
Checking contacts...
        Checked 36 contacts.
Checking contact groups...
        Checked 7 contact groups.
Checking service escalations...
        Checked 0 service escalations.
Checking service dependencies...
        Checked 0 service dependencies.
Checking host escalations...
        Checked 0 host escalations.
Checking host dependencies...
        Checked 0 host dependencies.
Checking commands...
        Checked 71 commands.
Checking time periods...
        Checked 6 time periods.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 4
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

SNMP is running on the device:
root@ukldn-ixhs-nms01:/usr/local/nagios/libexec# ./check_alcatel_psu.pl -C xxxx -H 10.255.50.1
[PSU 1 deviceStateOk] [PSU 2 deviceNotEquipped] [PSU 3 deviceNotEquipped] [PSU 4




The objects.cache file is in an attachment.