Nagios Freezes and Host Status not getting Updated

Teja
Posts: 53
Joined: Tue Jun 13, 2017 8:13 am

Re: Nagios Freezes and Host Status not getting Updated

Post by Teja »

Oh, got it! Thank you, friend.
Teja
Posts: 53
Joined: Tue Jun 13, 2017 8:13 am

Re: Nagios Freezes and Host Status not getting Updated

Post by Teja »

Hi scottwilkerson,
It just froze once again, and there are no errors in the configuration files. I restarted it and it is working normally now.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios Freezes and Host Status not getting Updated

Post by scottwilkerson »

I am starting to wonder if you may have missed some steps, which could cause the message queue to fill.

Did you happen to follow the guide here?
https://support.nagios.com/kb/article/n ... tml#Ubuntu

Specifically the Linux Kernel Settings portion
NDOUtils uses the kernel message queue for transferring the data from Nagios to NDOUtils. We are going to increase the default values the Kernel boots with to ensure it operates optimally.

First create a backup copy of the /etc/sysctl.conf file:

Code:

sudo cp /etc/sysctl.conf /etc/sysctl.conf_backup

Now make the required changes:

Code:

sudo sed -i '/msgmnb/d' /etc/sysctl.conf
sudo sed -i '/msgmax/d' /etc/sysctl.conf
sudo sed -i '/shmmax/d' /etc/sysctl.conf
sudo sed -i '/shmall/d' /etc/sysctl.conf
sudo sh -c 'printf "\n\nkernel.msgmnb = 131072000\n" >> /etc/sysctl.conf'
sudo sh -c 'printf "kernel.msgmax = 131072000\n" >> /etc/sysctl.conf'
sudo sh -c 'printf "kernel.shmmax = 4294967295\n" >> /etc/sysctl.conf'
sudo sh -c 'printf "kernel.shmall = 268435456\n" >> /etc/sysctl.conf'
sudo sysctl -e -p /etc/sysctl.conf

This could be causing the problem. If not, can you describe what happens when it "freezes"?
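
As a rough way to confirm whether those settings are actually in effect, the sketch below compares the running kernel's values with the numbers from the guide. check_limit is a hypothetical helper written for this illustration; it is not part of Nagios or NDOUtils, and the threshold 131072000 is simply the value quoted above.

```shell
#!/bin/sh
# Sketch: compare the running kernel's message queue limits with the
# values the guide writes to /etc/sysctl.conf.
# check_limit is a hypothetical helper, not a Nagios or NDOUtils tool.

check_limit() {
    # $1 = parameter name, $2 = current value, $3 = recommended minimum
    if [ "$2" -ge "$3" ]; then
        echo "$1 ok ($2)"
    else
        echo "$1 LOW ($2 < $3)"
    fi
}

# The live values are exposed under /proc/sys/kernel on Linux.
for name in msgmnb msgmax; do
    file="/proc/sys/kernel/$name"
    if [ -r "$file" ]; then
        check_limit "kernel.$name" "$(cat "$file")" 131072000
    fi
done
```

If the values are already at or above the guide's numbers, `ipcs -q` can show whether a message queue is actually filling up (see its used-bytes and messages columns).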
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com
Teja
Posts: 53
Joined: Tue Jun 13, 2017 8:13 am

Re: Nagios Freezes and Host Status not getting Updated

Post by Teja »

scottwilkerson, ours is a very busy system (hosted on AWS), so many hosts go "Down" and "Up" frequently. Recently a large number of hosts went down at once, and the issue started at that time. When it freezes, the host status stops updating: the "Last check" time doesn't change and the "Duration" keeps increasing. FYI, below I am sharing the values from my default "/etc/sysctl.conf" file and a screenshot of the host status.

Thank You!

Code:

kernel.msgmax = 131072000
kernel.msgmnb = 131072000
kernel.msgmni = 65536000
Attachments
Example Host Status.PNG
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios Freezes and Host Status not getting Updated

Post by scottwilkerson »

What you are viewing on the screen NEVER comes from the DB.

If the times aren't changing, I wonder if you might have multiple Nagios parent processes running.

Please run the following and post the output:

Code:

ps -ef|grep bin/nagios
Teja
Posts: 53
Joined: Tue Jun 13, 2017 8:13 am

Re: Nagios Freezes and Host Status not getting Updated

Post by Teja »

You were right. Here is the output; I think there are a lot of processes running, friend. Could you please confirm which processes to kill?

Code:

nagios   28690     1  7 06:57 ?        00:11:06 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   28692 28690 11 06:57 ?        00:17:39 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28693 28690 11 06:57 ?        00:17:36 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28694 28690 12 06:57 ?        00:19:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28695 28690 12 06:57 ?        00:19:28 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28696 28690 11 06:57 ?        00:18:46 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28697 28690 12 06:57 ?        00:18:52 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28698 28690 11 06:57 ?        00:18:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28699 28690 11 06:57 ?        00:17:47 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28701 28690 12 06:57 ?        00:20:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28702 28690 12 06:57 ?        00:18:53 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28703 28690 12 06:57 ?        00:19:05 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28704 28690 13 06:57 ?        00:21:27 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28705 28690 12 06:57 ?        00:19:54 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28706 28690 11 06:57 ?        00:17:50 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28707 28690 12 06:57 ?        00:18:53 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28708 28690 11 06:57 ?        00:18:40 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28709 28690 12 06:57 ?        00:19:26 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28710 28690 12 06:57 ?        00:19:39 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28711 28690 12 06:57 ?        00:19:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28712 28690 12 06:57 ?        00:19:47 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28713 28690 12 06:57 ?        00:18:53 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28714 28690 12 06:57 ?        00:19:22 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28715 28690 12 06:57 ?        00:19:26 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28716 28690 11 06:57 ?        00:18:33 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28717 28690 11 06:57 ?        00:17:59 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28718 28690 13 06:57 ?        00:20:28 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28719 28690 11 06:57 ?        00:17:42 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28720 28690 12 06:57 ?        00:20:22 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28721 28690 11 06:57 ?        00:17:48 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28722 28690 13 06:57 ?        00:20:26 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28723 28690 11 06:57 ?        00:18:48 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28724 28690 11 06:57 ?        00:18:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28725 28690 12 06:57 ?        00:20:06 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28726 28690 11 06:57 ?        00:18:37 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28727 28690 12 06:57 ?        00:20:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28728 28690 12 06:57 ?        00:19:42 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28729 28690 13 06:57 ?        00:20:51 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28730 28690 13 06:57 ?        00:21:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28731 28690 12 06:57 ?        00:20:15 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28732 28690 12 06:57 ?        00:18:59 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28733 28690 13 06:57 ?        00:20:45 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28734 28690 12 06:57 ?        00:19:44 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28735 28690 12 06:57 ?        00:20:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28736 28690 12 06:57 ?        00:20:05 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28737 28690 13 06:57 ?        00:21:11 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28738 28690 13 06:57 ?        00:21:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28739 28690 12 06:57 ?        00:20:15 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28740 28690 12 06:57 ?        00:19:40 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28745 28690  0 06:57 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     29241 29217  0 09:34 pts/0    00:00:00 grep --color=auto bin/nagios
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios Freezes and Host Status not getting Updated

Post by tgriep »

Actually, that output looks good. One parent process, one child process, and a lot of workers is normal.
Can you post your status.dat file so we can check the server's settings?
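
For a long listing like the one above, a small filter can do the counting. This is only a sketch: summarize_nagios_ps is a hypothetical helper, and it assumes the `ps -ef` column layout shown in this thread (PPID in the third column) and that the real daemon's parent is PID 1, which may not hold inside containers.

```shell
#!/bin/sh
# Sketch: summarize `ps -ef` output for Nagios Core processes.
# summarize_nagios_ps is a hypothetical helper name.
# Assumption: a "parent" is a "nagios -d" process whose PPID is 1.

summarize_nagios_ps() {
    awk '
        /bin\/nagios --worker/       { workers++; next }
        /bin\/nagios -d/ && $3 == 1  { parents++; next }
        /bin\/nagios -d/             { children++ }
        END {
            printf "parents=%d children=%d workers=%d\n",
                   parents, children, workers
        }
    '
}

# Usage:
#   ps -ef | summarize_nagios_ps
```

A result with parents=1 matches the healthy case described above; more than one parent would point to the duplicate-daemon problem suggested earlier.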
Be sure to check out our Knowledgebase for helpful articles and solutions!
Teja
Posts: 53
Joined: Tue Jun 13, 2017 8:13 am

Re: Nagios Freezes and Host Status not getting Updated

Post by Teja »

Hi tgriep,
Sorry, I don't have permission to share the file here. Could you please guide me through the steps so that I can check it myself? Also, ever since I set "debug_level=0", the Nagios log file has started to grow. Usually the Nagios log file would be between 300 and 400 MB, but the last few days' log files are 5.5 GB, 7.2 GB, and 9.3 GB. It's growing day by day; below I am attaching a screenshot of the log sizes. What could be the cause? Should I change debug_level back to 2?

-> Some sites say to change "auto_rescheduling_window=180" to "auto_rescheduling_window=45" in the nagios.cfg file. Can I try this?


-> tgriep, we have just observed that whenever we configure a new site in the .cfg files and run "service nagios reload" (and sometimes "restart"), it freezes. Then we have to kill the process and start it again to make it work normally.
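
Since the freeze shows up around reloads, one low-risk habit is to run the Nagios pre-flight check (`nagios -v`) before every reload, so a broken .cfg edit never reaches the running daemon. The sketch below assumes the install paths visible in the ps output earlier in this thread; safe_reload is a hypothetical wrapper. It would not by itself explain a freeze when the configuration is valid.

```shell
#!/bin/sh
# Sketch: verify the configuration before reloading Nagios Core.
# safe_reload is a hypothetical wrapper; the default paths are taken
# from the ps output in this thread and may differ on other installs.

NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

safe_reload() {
    # "nagios -v <cfg>" runs the pre-flight check and exits non-zero
    # when the configuration contains errors.
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        echo "config ok, reloading"
        service nagios reload
    else
        echo "config check failed, not reloading"
        return 1
    fi
}

# Usage (as root):
#   safe_reload
```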

Thank You!!!
Attachments
last 3 day logs
Nagios log sizes.PNG
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios Freezes and Host Status not getting Updated

Post by tgriep »

I wanted to see the status.dat file so I can check what the server thinks the current status is, and the status of the services that are not running correctly.
I think the freezing is caused by Nagios having to process the very large nagios.log file, and we would have to look in that file to see what has caused it to increase in size.
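
One way to see what is filling the log without sharing it is to count entries by type. The sketch below assumes the usual nagios.log line shape, "[timestamp] TYPE: details"; top_log_entry_types is a hypothetical helper, and the log path in the usage comment is a guess based on the install prefix seen earlier in this thread.

```shell
#!/bin/sh
# Sketch: count nagios.log entries by type to see what dominates the file.
# top_log_entry_types is a hypothetical helper name.
# Assumption: lines look like "[1497340000] HOST ALERT: web1;DOWN;...".
# Lines without a "TYPE:" prefix are ignored.

top_log_entry_types() {
    sed -n 's/^\[[0-9]*] \([^:]*\):.*/\1/p' |
        sort | uniq -c | sort -rn | sed 's/^ *//'
}

# Usage (sampling the tail keeps it fast on a multi-gigabyte file):
#   tail -n 100000 /usr/local/nagios/var/nagios.log | top_log_entry_types | head
```

The output is one "count TYPE" line per entry type, largest first, which is usually safe to post even when the raw log is not.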
Teja
Posts: 53
Joined: Tue Jun 13, 2017 8:13 am

Re: Nagios Freezes and Host Status not getting Updated

Post by Teja »

Hi tgriep,
I guess the log rotation is not working, because I see the Nagios logs keep growing every day. Could you please help me find the issue? Thank you!
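
A first check for rotation is what nagios.cfg actually asks for. The log_rotation_method directive takes n (none), h (hourly), d (daily), w (weekly), or m (monthly). The sketch below reads it, together with log_file, from the config; get_cfg_value is a hypothetical helper and the config path is assumed from the ps output earlier in this thread.

```shell
#!/bin/sh
# Sketch: read rotation-related directives from nagios.cfg.
# get_cfg_value is a hypothetical helper; the config path is an assumption.

get_cfg_value() {
    # $1 = directive name, $2 = path to nagios.cfg
    # Prints the value of the last uncommented "name=value" line.
    sed -n "s/^[[:space:]]*${1}[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}

CFG="/usr/local/nagios/etc/nagios.cfg"
if [ -r "$CFG" ]; then
    echo "log_rotation_method: $(get_cfg_value log_rotation_method "$CFG")"
    log=$(get_cfg_value log_file "$CFG")
    if [ -f "$log" ]; then
        # Show the current size of the active log file.
        ls -lh "$log"
    fi
fi
```

If the method is n, Nagios will never rotate the file on its own. If it is d and the file still spans several days, the directory named by log_archive_path is worth checking for existence and write permission.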
Attachments
Nagios logs.PNG