I inherited this monster to support. It's a Nagios system with 38,000+ devices and over 1 million service checks every day. For background, I am normally a CCNA network tech.
So first when I ack something, all I would get back was this (starting yesterday):
nagios_ack add '0554-RTR-01' 'CA:997325'
...One moment please, analyzing the STATUSFILE and your selections...
And that's all I would get back on the screen.
I restarted the Nagios service, and now we get back the proper lines when doing acknowledgements, example:
nagios_ack add '0324-' 'CA1002000'
...One moment please, analyzing the STATUSFILE and your selections...
...Comparison results follow...
Acknowledge HOST Alarm: 0324-RTR-01 CA1002000
Acknowledge HOST Alarm: 0324-SW-ER-01 CA1002000
Acknowledge HOST Alarm: 0324-SW-ER-03 CA1002000
...Comparison complete. Nagios now processing any commands issued...
All was fine and dandy, but then I looked at the host and services page, and we never see the check mark (see picture). What could I look into further? Screenshots of the check marks we are supposed to see are attached.
If it matters:
Version: Nagios® Core™ 4.0.8
CentOS Linux release 7.0.1406 (Core)
Thank you!!!
Acknowledgements suddenly stopped working....
-
sandsdenver
- Posts: 6
- Joined: Fri Apr 25, 2014 1:46 pm
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
Re: Acknowledgements suddenly stopped working....
Yes that is a lot of devices being monitored.
sandsdenver wrote:So first when I ack something, all I would get back was this (starting yesterday):
nagios_ack add '0554-RTR-01' 'CA:997325'
...One moment please, analyzing the STATUSFILE and your selections...
And that's all I would get back on the screen.
I restarted the Nagios service, and now we get back the proper lines when doing acknowledgements, example:
nagios_ack add '0324-' 'CA1002000'
...One moment please, analyzing the STATUSFILE and your selections...
...Comparison results follow...
Acknowledge HOST Alarm: 0324-RTR-01 CA1002000
Acknowledge HOST Alarm: 0324-SW-ER-01 CA1002000
Acknowledge HOST Alarm: 0324-SW-ER-03 CA1002000
...Comparison complete. Nagios now processing any commands issued...
I'm not familiar with this, can you post some screenshots of these steps so we can get a better idea?
Just to check some basics and post back the output:
I'll get you to check the amount of free disk space on your nagios server. Type the following at the command prompt:
Code:
df -h
df -i
Also please run this one:
Code:
top -n 1
Run these commands:
Code:
tail /var/log/messages -n 100 > /tmp/messages_log.txt
tail /usr/local/nagios/var/nagios.log -n 100 > /tmp/nagios_log.txt
Send us these files:
/tmp/messages_log.txt
/tmp/nagios_log.txt
Please post the file /usr/local/nagios/etc/nagios.cfg
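One thing worth looking for in nagios.log is whether the acknowledgements ever reach Nagios as external commands. Below is a minimal Python sketch, not something taken from your install. It assumes log_external_commands=1 is set in nagios.cfg, so that each command is logged as an EXTERNAL COMMAND line, and the name find_acknowledgements is just one I made up for illustration.

```python
import re

# Shape of an acknowledgement in nagios.log when log_external_commands=1:
#   [epoch] EXTERNAL COMMAND: ACKNOWLEDGE_HOST_PROBLEM;host;sticky;notify;persistent;author;comment
ACK_LINE = re.compile(
    r"^\[(?P<epoch>\d+)\] EXTERNAL COMMAND: "
    r"(?P<command>ACKNOWLEDGE_(?:HOST|SVC)_PROBLEM);(?P<args>.*)$"
)


def find_acknowledgements(log_lines):
    """Return (epoch, command, host_name) for every acknowledgement logged."""
    found = []
    for line in log_lines:
        match = ACK_LINE.match(line.strip())
        if match:
            # The host name is always the first argument of both commands.
            host_name = match.group("args").split(";")[0]
            found.append((int(match.group("epoch")), match.group("command"), host_name))
    return found
```

You could feed it the lines of the /tmp/nagios_log.txt file produced by the tail command. If the list comes back empty right after an acknowledgement was submitted, then either the command never made it into Nagios or external command logging is switched off.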
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
-
sandsdenver
- Posts: 6
- Joined: Fri Apr 25, 2014 1:46 pm
Re: Acknowledgements suddenly stopped working....
Thank you for your time, here is the information requested.
root@ccsd-lx-noc03 ~> tail /var/log/messages -n 100 > /tmp/messages_log.txt
Attached.
root@ccsd-lx-noc03 ~> tail /usr/local/nagios/var/nagios.log -n 100 > /tmp/nagios_log.txt
Attached.
root@ccsd-lx-noc03 ~> more /usr/local/nagios/etc/nagios.cfg
Attached.
Code:
root@ccsd-lx-noc03 ~> df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   45G   24G   21G  54% /
devtmpfs                 5.8G     0  5.8G   0% /dev
tmpfs                    5.8G     0  5.8G   0% /dev/shm
tmpfs                    5.8G  576K  5.8G   1% /run
tmpfs                    5.8G     0  5.8G   0% /sys/fs/cgroup
tmpfs                    500M  203M  298M  41% /var/nagiosramdisk
/dev/sdb1                 55G   27G   28G  50% /data
/dev/sda1                497M  214M  284M  43% /boot
10.50.5.2:/scripts       1.8T  835G  907G  48% /scripts
root@ccsd-lx-noc03 ~> df -i
Filesystem                 Inodes  IUsed     IFree IUse% Mounted on
/dev/mapper/centos-root  46669824 109818  46560006    1% /
devtmpfs                  1515443    357   1515086    1% /dev
tmpfs                     1517624      1   1517623    1% /dev/shm
tmpfs                     1517624    461   1517163    1% /run
tmpfs                     1517624     13   1517611    1% /sys/fs/cgroup
tmpfs                     1517624      6   1517618    1% /var/nagiosramdisk
/dev/sdb1                57670656  11338  57659318    1% /data
/dev/sda1                  512000    351    511649    1% /boot
10.50.5.2:/scripts      122101760 112788 121988972    1% /scripts
root@ccsd-lx-noc03 ~> top -n 1
top - 12:40:07 up 247 days, 2:37, 1 user, load average: 3.31, 5.74, 5.74
Tasks: 200 total, 8 running, 191 sleeping, 0 stopped, 1 zombie
%Cpu(s): 24.3 us, 13.3 sy, 0.0 ni, 60.2 id, 0.9 wa, 0.0 hi, 1.3 si, 0.0 st
KiB Mem: 12140992 total, 7997432 used, 4143560 free, 0 buffers
KiB Swap: 5242876 total, 79584 used, 5163292 free. 5803376 cached Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 5253 apache    20   0  296724 189268    816 R  26.5   1.6   0:21.84 status.cgi
 7772 root      20   0    7408    648    552 R  13.3   0.0   0:00.08 nagiostats
 7768 nagios    20   0       0      0      0 R  11.6   0.0   0:00.07 php
31415 nagios    20   0 1415396 599336   2408 R  11.6   4.9 286:04.42 nagios
   21 root      20   0       0      0      0 S   5.0   0.0   1344:01 ksoftirqd/1
 3895 nagios    20   0       0      0      0 Z   5.0   0.0   0:00.08 mod_gearma+
 7737 root      20   0  123656   1520   1092 R   5.0   0.0   0:00.48 top
 7766 nagios    20   0  143892   3252   2048 R   5.0   0.0   0:00.03 mod_gearma+
28500 apache    20   0  327208   8592   1948 S   5.0   0.1   0:00.50 httpd
  322 root      20   0   21260   1656   1088 S   1.7   0.0  24:03.74 cvfwd
 5285 nagios    20   0  143604   3348   2128 S   1.7   0.0   0:00.14 mod_gearma+
 5580 nagios    20   0  143604   3348   2128 S   1.7   0.0   0:00.04 mod_gearma+
 5835 nagios    20   0  143604   3348   2128 S   1.7   0.0   0:00.05 mod_gearma+
 6474 nagios    20   0  143084   2740   1904 S   1.7   0.0   0:00.02 mod_gearma+
 7621 root      20   0   51596  17436   2296 S   1.7   0.1   0:00.23 mrtg
 7764 nagios    20   0  116452    716    588 S   1.7   0.0   0:00.03 check_icmp
    1 root      20   0  197616   4576   2412 S   0.0   0.0 225:09.96 systemd
- Attachments
- nagios.cfg (46.75 KiB)
- messages_log.txt (17.13 KiB)
- nagios_log.txt (13.32 KiB)
Re: Acknowledgements suddenly stopped working....
Box293 wrote:I'm not familiar with this, can you post some screenshots of these steps so we can get a better idea.
I am also a bit unsure of what you were referencing in your original post. Could you please post the screenshots of the following:
Code:
...One moment please, analyzing the STATUSFILE and your selections...
...Comparison results follow...
Acknowledge HOST Alarm: 0324-RTR-01 CA1002000
Acknowledge HOST Alarm: 0324-SW-ER-01 CA1002000
Acknowledge HOST Alarm: 0324-SW-ER-03 CA1002000
...Comparison complete. Nagios now processing any commands issued...
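For context on why we are asking: I don't know how nagios_ack works internally, but my guess is that it ends up writing an ACKNOWLEDGE_HOST_PROBLEM line to the Nagios external command file. Here is a rough Python sketch of what such a line looks like. The helper name build_host_ack and the author value are hypothetical; the field order follows the Nagios external command documentation.

```python
import time


def build_host_ack(host_name, author, comment, sticky=2, notify=1,
                   persistent=1, now=None):
    """Format one ACKNOWLEDGE_HOST_PROBLEM external command line.

    sticky=2 keeps the acknowledgement until the host recovers;
    notify=1 sends a notification to contacts;
    persistent=1 keeps the comment across Nagios restarts.
    """
    timestamp = int(time.time()) if now is None else int(now)
    return "[{}] ACKNOWLEDGE_HOST_PROBLEM;{};{};{};{};{};{}\n".format(
        timestamp, host_name, sticky, notify, persistent, author, comment
    )
```

On a default source install the command file is /usr/local/nagios/var/rw/nagios.cmd (the command_file setting in nagios.cfg). Nagios reading a line like this from that file is what should make the acknowledgement icon appear, so if the front end reports success but nothing changes, that file and its permissions are a reasonable place to look.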
Former Nagios employee
-
sandsdenver
- Posts: 6
- Joined: Fri Apr 25, 2014 1:46 pm
Re: Acknowledgements suddenly stopped working....
It's probably just a front-end GUI we have used for the past few years; here is a screenshot. The issue went away for about a week and has come back today.
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
Re: Acknowledgements suddenly stopped working....
Is this a physical server or a VM?
If it's a VM, can you look at the VM's performance stats through the hypervisor? I'm particularly interested to see if the VM's memory is being exhausted.
sandsdenver wrote:All was fine and dandy, but then I looked at the host and services page, and we never see the check mark (see picture).....what could I look into further? Screenshots attached of check marks we are supposed to see.....
Do these check marks eventually appear?
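If you want to check this without the web interface, the status file (the status_file setting in nagios.cfg, presumably what your nagios_ack output calls the STATUSFILE) records the acknowledgement per host in the problem_has_been_acknowledged field. Here is a minimal Python sketch, assuming the usual hoststatus { ... } block layout of status.dat; the function name is my own.

```python
def acknowledged_hosts(status_text):
    """Map host_name -> True/False using the hoststatus blocks of a status file."""
    result = {}
    in_host_block = False
    fields = {}
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("hoststatus {"):
            # Start collecting key=value pairs for one host.
            in_host_block, fields = True, {}
        elif line == "}" and in_host_block:
            name = fields.get("host_name")
            if name is not None:
                result[name] = fields.get("problem_has_been_acknowledged") == "1"
            in_host_block = False
        elif in_host_block and "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value
    return result
```

Run it against the contents of the status file a minute or so after acknowledging. If the host shows False there, the acknowledgement never reached Nagios; if it shows True but the page has no check mark, the problem is more likely on the CGI side.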
-
sandsdenver
- Posts: 6
- Joined: Fri Apr 25, 2014 1:46 pm
Re: Acknowledgements suddenly stopped working....
Nagios is a beast. It's hard to untangle the spider web of what goes where; it was built by one guy over 8 years, and he is no longer here. Let the fun begin! lol
Yes, they did start working again when we did a build. Yes these are VMs.
Just for info, here are the latest numbers:
Checking objects...
Checked 79160 services.
Checked 8048 hosts.
Checked 2133 host groups.
Checked 1002 service groups.
Checked 91 contacts.
Checked 59 contact groups.
Checked 564 commands.
Checked 16 time periods.
Checked 9155 host escalations.
Checked 13192 service escalations.
Checking for circular paths...
Checked 8048 hosts
Checked 3552 service dependencies
Checked 0 host dependencies
Checked 16 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
So for now this thread can be clos.....wait!, one more unrelated question......
What does something like this do?
define hostgroup {
    # 36 Hosts in this group
    hostgroup_name    DataCenter.CIS
    alias             DataCenter.CIS
}
Does that make a hostgroup and name it, and then later you put in members?
Oh, and when you define a host, does it automatically get checked via a ping command, or do you need a service check to do this? Can you turn that off (checking to see if it is up)?
- Box293
- Too Basu
- Posts: 5126
- Joined: Sun Feb 07, 2010 10:55 pm
- Location: Deniliquin, Australia
Re: Acknowledgements suddenly stopped working....
Creating a hostgroup allows you to do things like assign one service to the group, so that all hosts in that group get the service. It's a configuration technique. This might explain it better: http://sites.box293.com/nagios/guides/c ... n-services
If you define a host, you would define a check_command to use, like check-host-alive, OR use a template that has it defined. A host doesn't need a check command; however, without one, if the host goes down then its services won't have their notifications suppressed. A host check command is different to a service. A host can only have one check command but can have many services.
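To make the hostgroup idea concrete, here is a toy Python model of the expansion. It is not Nagios code, just a sketch of the effect: a service that names a hostgroup ends up as one service per member host. The membership list and the PING service below are made up for illustration. In a real config, a hostgroup like yours with no members line would normally get its members from the hostgroups directive on each host definition.

```python
def expand_services(hostgroup_members, services):
    """Return sorted (host_name, service_description) pairs after expansion.

    hostgroup_members: dict of hostgroup name -> list of host names.
    services: list of dicts with 'service_description' plus optional
              'host_name' and/or 'hostgroup_name' lists.
    """
    pairs = set()
    for service in services:
        description = service["service_description"]
        # Hosts named directly on the service.
        for host in service.get("host_name", []):
            pairs.add((host, description))
        # Every member of each hostgroup named on the service.
        for group in service.get("hostgroup_name", []):
            for host in hostgroup_members.get(group, []):
                pairs.add((host, description))
    return sorted(pairs)


# Hypothetical data: two of the hosts from this thread placed in the group.
groups = {"DataCenter.CIS": ["0324-RTR-01", "0324-SW-ER-01"]}
ping = {"service_description": "PING", "hostgroup_name": ["DataCenter.CIS"]}
print(expand_services(groups, [ping]))
```

One service definition covers every host in the group, and adding a host to the group later gives it the service without touching the service definition.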