Acknowledgements suddenly stopped working....

sandsdenver
Posts: 6
Joined: Fri Apr 25, 2014 1:46 pm

Acknowledgements suddenly stopped working....

Post by sandsdenver »

I inherited this monster to support: it's a Nagios system with 38,000+ devices and over 1 million service checks every day. My background is as a CCNA network tech.

So first when I ack something, all I would get back was this (starting yesterday):

nagios_ack add '0554-RTR-01' 'CA:997325'

...One moment please, analyzing the STATUSFILE and your selections...

And that's all I would get back on the screen.
I restarted the Nagios service, and now we get back the proper lines when doing acknowledgements, example:

nagios_ack add '0324-' 'CA1002000'

...One moment please, analyzing the STATUSFILE and your selections...
...Comparison results follow...
Acknowledge HOST Alarm: 0324-RTR-01 CA1002000
Acknowledge HOST Alarm: 0324-SW-ER-01 CA1002000
Acknowledge HOST Alarm: 0324-SW-ER-03 CA1002000
...Comparison complete. Nagios now processing any commands issued...

All was fine and dandy, but then I looked at the host and services page, and we never see the check mark (see picture). What could I look into further? Screenshots of the check marks we are supposed to see are attached.

If it matters:

Version: Nagios® Core™ 4.0.8
CentOS Linux release 7.0.1406 (Core)


Thank you!!!
Attachments
Full view.jpg
No acks.jpg
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Acknowledgements suddenly stopped working....

Post by Box293 »

Yes that is a lot of devices being monitored.
sandsdenver wrote:So first when I ack something, all I would get back was this (starting yesterday):

nagios_ack add '0554-RTR-01' 'CA:997325'

...One moment please, analyzing the STATUSFILE and your selections...

And that's all I would get back on the screen.
I restarted the Nagios service, and now we get back the proper lines when doing acknowledgements, example:

nagios_ack add '0324-' 'CA1002000'

...One moment please, analyzing the STATUSFILE and your selections...
...Comparison results follow...
Acknowledge HOST Alarm: 0324-RTR-01 CA1002000
Acknowledge HOST Alarm: 0324-SW-ER-01 CA1002000
Acknowledge HOST Alarm: 0324-SW-ER-03 CA1002000
...Comparison complete. Nagios now processing any commands issued...
I'm not familiar with this. Can you post some screenshots of these steps so we can get a better idea?

Let's check some basics first. Please run the following and post back the output:

I'll get you to check the amount of free disk space on your nagios server. Type the following at the command prompt:

Code:

df -h
df -i
Also please run this one:

Code:

top -n 1
Then run these commands:

Code:

tail -n 100 /var/log/messages > /tmp/messages_log.txt
tail -n 100 /usr/local/nagios/var/nagios.log > /tmp/nagios_log.txt
Send us these files:
/tmp/messages_log.txt
/tmp/nagios_log.txt


Please post the file /usr/local/nagios/etc/nagios.cfg
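
It may also help to see whether the acknowledgements are reaching Nagios at all. If log_external_commands is enabled in nagios.cfg, Nagios records each external command it receives in nagios.log as an EXTERNAL COMMAND entry. The sketch below filters those entries; the function name show_ack_commands is made up for illustration, and the log path in the example is the default source-install location used above, which may differ on your system.

```shell
# Print the acknowledgement commands that Nagios logged as received.
# Assumes log_external_commands=1 in nagios.cfg. The function name is
# hypothetical; the log file path is supplied by the caller.
show_ack_commands() {
    # Matches both ACKNOWLEDGE_HOST_PROBLEM and ACKNOWLEDGE_SVC_PROBLEM.
    grep 'EXTERNAL COMMAND: ACKNOWLEDGE_' "$1"
}

# Example invocation (path as used earlier in this post):
#   show_ack_commands /usr/local/nagios/var/nagios.log | tail -n 20
```

If nothing shows up right after you acknowledge a host, the command probably never made it into the Nagios command file, which would point at the front end rather than at Nagios itself.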
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
sandsdenver
Posts: 6
Joined: Fri Apr 25, 2014 1:46 pm

Re: Acknowledgements suddenly stopped working....

Post by sandsdenver »

Thank you for your time, here is the information requested.

Code:

root@ccsd-lx-noc03 ~> df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   45G   24G   21G  54% /
devtmpfs                 5.8G     0  5.8G   0% /dev
tmpfs                    5.8G     0  5.8G   0% /dev/shm
tmpfs                    5.8G  576K  5.8G   1% /run
tmpfs                    5.8G     0  5.8G   0% /sys/fs/cgroup
tmpfs                    500M  203M  298M  41% /var/nagiosramdisk
/dev/sdb1                 55G   27G   28G  50% /data
/dev/sda1                497M  214M  284M  43% /boot
10.50.5.2:/scripts       1.8T  835G  907G  48% /scripts

root@ccsd-lx-noc03 ~> df -i
Filesystem                 Inodes  IUsed     IFree IUse% Mounted on
/dev/mapper/centos-root  46669824 109818  46560006    1% /
devtmpfs                  1515443    357   1515086    1% /dev
tmpfs                     1517624      1   1517623    1% /dev/shm
tmpfs                     1517624    461   1517163    1% /run
tmpfs                     1517624     13   1517611    1% /sys/fs/cgroup
tmpfs                     1517624      6   1517618    1% /var/nagiosramdisk
/dev/sdb1                57670656  11338  57659318    1% /data
/dev/sda1                  512000    351    511649    1% /boot
10.50.5.2:/scripts      122101760 112788 121988972    1% /scripts

root@ccsd-lx-noc03 ~> top -n 1
top - 12:40:07 up 247 days,  2:37,  1 user,  load average: 3.31, 5.74, 5.74
Tasks: 200 total,   8 running, 191 sleeping,   0 stopped,   1 zombie
%Cpu(s): 24.3 us, 13.3 sy,  0.0 ni, 60.2 id,  0.9 wa,  0.0 hi,  1.3 si,  0.0 st
KiB Mem:  12140992 total,  7997432 used,  4143560 free,        0 buffers
KiB Swap:  5242876 total,    79584 used,  5163292 free.  5803376 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5253 apache    20   0  296724 189268    816 R  26.5  1.6   0:21.84 status.cgi
 7772 root      20   0    7408    648    552 R  13.3  0.0   0:00.08 nagiostats
 7768 nagios    20   0       0      0      0 R  11.6  0.0   0:00.07 php
31415 nagios    20   0 1415396 599336   2408 R  11.6  4.9 286:04.42 nagios
   21 root      20   0       0      0      0 S   5.0  0.0   1344:01 ksoftirqd/1
 3895 nagios    20   0       0      0      0 Z   5.0  0.0   0:00.08 mod_gearma+
 7737 root      20   0  123656   1520   1092 R   5.0  0.0   0:00.48 top
 7766 nagios    20   0  143892   3252   2048 R   5.0  0.0   0:00.03 mod_gearma+
28500 apache    20   0  327208   8592   1948 S   5.0  0.1   0:00.50 httpd
  322 root      20   0   21260   1656   1088 S   1.7  0.0  24:03.74 cvfwd
 5285 nagios    20   0  143604   3348   2128 S   1.7  0.0   0:00.14 mod_gearma+
 5580 nagios    20   0  143604   3348   2128 S   1.7  0.0   0:00.04 mod_gearma+
 5835 nagios    20   0  143604   3348   2128 S   1.7  0.0   0:00.05 mod_gearma+
 6474 nagios    20   0  143084   2740   1904 S   1.7  0.0   0:00.02 mod_gearma+
 7621 root      20   0   51596  17436   2296 S   1.7  0.1   0:00.23 mrtg
 7764 nagios    20   0  116452    716    588 S   1.7  0.0   0:00.03 check_icmp
    1 root      20   0  197616   4576   2412 S   0.0  0.0 225:09.96 systemd
root@ccsd-lx-noc03 ~> tail /var/log/messages -n 100 > /tmp/messages_log.txt

Attached.
root@ccsd-lx-noc03 ~> tail /usr/local/nagios/var/nagios.log -n 100 > /tmp/nagios_log.txt

Attached.

root@ccsd-lx-noc03 ~> more /usr/local/nagios/etc/nagios.cfg

Attached.
Attachments
nagios.cfg
(46.75 KiB) Downloaded 270 times
messages_log.txt
(17.13 KiB) Downloaded 255 times
nagios_log.txt
(13.32 KiB) Downloaded 269 times
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Acknowledgements suddenly stopped working....

Post by tmcdonald »

Box293 wrote:I'm not familiar with this. Can you post some screenshots of these steps so we can get a better idea?
I am also a bit unsure of what you were referencing in your original post. Could you please post screenshots showing where you see output like the following?

Code:

...One moment please, analyzing the STATUSFILE and your selections...
...Comparison results follow...
Acknowledge HOST Alarm: 0324-RTR-01 CA1002000
Acknowledge HOST Alarm: 0324-SW-ER-01 CA1002000
Acknowledge HOST Alarm: 0324-SW-ER-03 CA1002000
...Comparison complete. Nagios now processing any commands issued...
Former Nagios employee
sandsdenver
Posts: 6
Joined: Fri Apr 25, 2014 1:46 pm

Re: Acknowledgements suddenly stopped working....

Post by sandsdenver »

It's probably just a front-end GUI we have used for the past few years; here is a screenshot. The issue went away for about a week and came back today.
Attachments
AckScreenShot.jpg
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Acknowledgements suddenly stopped working....

Post by Box293 »

Is this a physical server or a VM?

If it's a VM, can you look at the VM's performance stats through the Hypervisor, I'm particularly interested to see if the VM's memory is being exhausted.
sandsdenver wrote:All was fine and dandy, but then I looked at the host and services page, and we never see the check mark (see picture). What could I look into further? Screenshots of the check marks we are supposed to see are attached.
Do these check marks eventually appear?
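
One way to tell whether an acknowledgement was recorded, independent of the web page, is to look in the Nagios status file, where each hoststatus block carries a problem_has_been_acknowledged flag. The following is only a sketch: the function name list_acked_hosts is made up, and the status file location is whatever status_file is set to in your nagios.cfg (the default for a source install is /usr/local/nagios/var/status.dat, but yours may differ).

```shell
# List hosts whose hoststatus block has problem_has_been_acknowledged=1.
# The function name is hypothetical; pass the path that status_file in
# nagios.cfg points to.
list_acked_hosts() {
    awk '
        # Start of a host block: reset the remembered host name.
        /^hoststatus [{]/ { in_host = 1; name = ""; next }
        # Remember the host name for the block we are in.
        in_host && /host_name=/ { sub(/^[ \t]*host_name=/, ""); name = $0 }
        # Print the host if its problem has been acknowledged.
        in_host && /problem_has_been_acknowledged=1/ { print name }
        # End of the block.
        in_host && /^[ \t]*[}]/ { in_host = 0 }
    ' "$1"
}

# Example invocation (default path, an assumption for this system):
#   list_acked_hosts /usr/local/nagios/var/status.dat
```

If a host you just acknowledged is listed here but the check mark is missing in the web interface, the problem is more likely on the display side than in the acknowledgement itself.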
sandsdenver
Posts: 6
Joined: Fri Apr 25, 2014 1:46 pm

Re: Acknowledgements suddenly stopped working....

Post by sandsdenver »

Nagios is a beast, and it's hard to untangle the spider web of what goes where. This one was built over 8 years by one guy who is no longer here. Let the fun begin! lol

Yes, they did start working again when we did a build. Yes, these are VMs.

Just for info, here are the latest numbers:

Checking objects...
Checked 79160 services.
Checked 8048 hosts.
Checked 2133 host groups.
Checked 1002 service groups.
Checked 91 contacts.
Checked 59 contact groups.
Checked 564 commands.
Checked 16 time periods.
Checked 9155 host escalations.
Checked 13192 service escalations.
Checking for circular paths...
Checked 8048 hosts
Checked 3552 service dependencies
Checked 0 host dependencies
Checked 16 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

So for now this thread can be clos.....wait!, one more unrelated question......

What does something like this do?

define hostgroup {
# 36 Hosts in this group
hostgroup_name DataCenter.CIS
alias DataCenter.CIS
}

Does that make a hostgroup and name it, so that you put in the members later?

Oh, and when you define a host, does it automatically get checked via a ping command, or do you need a service check to do this? Can you turn that off (checking to see if it is up)?
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Acknowledgements suddenly stopped working....

Post by Box293 »

Creating a hostgroup allows you to do things like assign one service to the group, so that all hosts in that group get the service. It's a configuration technique. This might explain it better: http://sites.box293.com/nagios/guides/c ... n-services

If you define a host, you would either define a check_command for it, like check-host-alive, or use a template that has one defined. A host doesn't need a check command; however, without one, if the host goes down then its services won't have their notifications suppressed. A host check command is different from a service: a host can only have one check command but can have many services.
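
As a rough illustration of the above, here is a config sketch. The host name example-rtr-01 and the address are made up, and the generic-host and generic-service templates, check-host-alive, and the check_ping arguments are assumed to exist as they do in the stock sample configuration; your templates and commands will likely be named differently.

```
# Empty hostgroup: members are attached from the host side below.
define hostgroup {
    hostgroup_name  DataCenter.CIS
    alias           DataCenter.CIS
}

# Hypothetical host that joins the group and has its own host check.
define host {
    use             generic-host        ; assumed template
    host_name       example-rtr-01      ; made-up name
    address         192.0.2.10          ; documentation-range address
    hostgroups      DataCenter.CIS
    check_command   check-host-alive    ; the single host check
    ; active_checks_enabled 0           ; uncomment to stop actively checking this host
}

# One service definition applied to every host in the group.
define service {
    use                  generic-service     ; assumed template
    hostgroup_name       DataCenter.CIS
    service_description  PING
    check_command        check_ping!100.0,20%!500.0,60%
}
```

With this layout the hostgroup definition itself stays empty, and each host opts in through its hostgroups directive; listing hosts in a members directive on the hostgroup is the other common way to do it.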