Nagios Freezes and Host Status not getting Updated

This forum is intended for the discussion of Nagios Core development. Feature requests, patches, bug fixes, and all types of development-related discussions are welcome!

NOTE: The SourceForge.net nagios-devel mailing list has been deprecated in favor of this forum in order to expedite support and provide additional features not available on the old mailing list.

Re: Nagios Freezes and Host Status not getting Updated

Postby Teja » Wed Aug 30, 2017 11:12 am

Oh, got it! Thank you, friend.
Teja
 
Posts: 47
Joined: Tue Jun 13, 2017 8:13 am

Re: Nagios Freezes and Host Status not getting Updated

Postby Teja » Wed Aug 30, 2017 12:03 pm

Hi scottwilkerson,
It just froze once again. There are no errors in the configuration files; I restarted it and it's working normally.

Re: Nagios Freezes and Host Status not getting Updated

Postby scottwilkerson » Wed Aug 30, 2017 12:09 pm

I am starting to wonder whether you may have missed some steps, which could cause the message queue to fill.

Did you happen to follow the guide here?
https://support.nagios.com/kb/article/ndoutils-installing-ndoutils.html#Ubuntu

Specifically, the Linux Kernel Settings portion:

NDOUtils uses the kernel message queue to transfer data from Nagios to NDOUtils. We are going to increase the default values the kernel boots with to ensure it operates optimally.

First create a backup copy of the /etc/sysctl.conf file:

Code:
sudo cp /etc/sysctl.conf /etc/sysctl.conf_backup



Now make the required changes:

Code:
sudo sed -i '/msgmnb/d' /etc/sysctl.conf
sudo sed -i '/msgmax/d' /etc/sysctl.conf
sudo sed -i '/shmmax/d' /etc/sysctl.conf
sudo sed -i '/shmall/d' /etc/sysctl.conf
sudo sh -c 'printf "\n\nkernel.msgmnb = 131072000\n" >> /etc/sysctl.conf'
sudo sh -c 'printf "kernel.msgmax = 131072000\n" >> /etc/sysctl.conf'
sudo sh -c 'printf "kernel.shmmax = 4294967295\n" >> /etc/sysctl.conf'
sudo sh -c 'printf "kernel.shmall = 268435456\n" >> /etc/sysctl.conf'
sudo sysctl -e -p /etc/sysctl.conf
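To confirm the edits above actually landed in the file, a small check like the following can help. It is a hypothetical helper (`check_ipc_settings` is not part of Nagios or NDOUtils) that assumes the file uses the `key = value` layout written by the commands above and compares each key with the guide's values. `sysctl -p` is still what loads them into the running kernel.

```shell
# Hypothetical helper (not part of Nagios/NDOUtils): compare the IPC values in a
# sysctl.conf-style file with the ones the NDOUtils guide above recommends.
# Usage: check_ipc_settings [FILE]   (FILE defaults to /etc/sysctl.conf)
# Prints one line per key; returns 1 if any key is missing or different.
check_ipc_settings() {
    local file="${1:-/etc/sysctl.conf}"
    local rc=0 pair key want have
    for pair in kernel.msgmnb=131072000 kernel.msgmax=131072000 \
                kernel.shmmax=4294967295 kernel.shmall=268435456; do
        key="${pair%%=*}"
        want="${pair#*=}"
        # Use the last non-comment assignment of the key, ignoring blanks around '='.
        have=$(awk -F'=' -v k="$key" '
            /^[ \t]*#/ { next }
            { gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2) }
            $1 == k { v = $2 }
            END { print v }' "$file")
        if [ "$have" = "$want" ]; then
            echo "OK       $key = $have"
        else
            echo "MISMATCH $key = ${have:-<unset>} (expected $want)"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `check_ipc_settings /etc/sysctl.conf` prints one OK or MISMATCH line per key, and `sysctl kernel.msgmnb kernel.msgmax` shows what the running kernel is actually using.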


This could be causing the problem. If not, can you describe what happens when it "freezes"?
scottwilkerson
CTO
 
Posts: 7911
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios Freezes and Host Status not getting Updated

Postby Teja » Thu Aug 31, 2017 5:43 am

Scottwilkerson: ours is a very busy system (hosted on AWS), so many hosts go "DOWN" and "UP" frequently. Recently a large batch of hosts went down, and this issue has been happening since then. When it freezes, the host status doesn't update: the "Last Check" time doesn't change and the "Duration" keeps increasing. FYI, below I am sharing the values from my default /etc/sysctl.conf file and a host status screenshot.

Thank you!

Code:
kernel.msgmax = 131072000
kernel.msgmnb = 131072000
kernel.msgmni = 65536000
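With kernel.msgmnb at 131072000 as posted, one way to see whether the message queue is actually filling up during a freeze is to compare each queue's `used-bytes` in `ipcs -q` output with that limit. The sketch below assumes the Linux util-linux `ipcs -q` column layout (key, msqid, owner, perms, used-bytes, messages); `queue_fill_report` and its 80% default threshold are hypothetical choices, not NDOUtils behaviour.

```shell
# Hypothetical helper: read `ipcs -q` output on stdin and report how full each
# message queue is relative to LIMIT bytes (e.g. the kernel.msgmnb value).
# Usage: ipcs -q | queue_fill_report LIMIT [PCT]   (PCT defaults to 80)
queue_fill_report() {
    local limit="$1" pct="${2:-80}"
    if [ -z "$limit" ] || [ "$limit" -le 0 ]; then
        echo "usage: queue_fill_report LIMIT_BYTES [PCT]" >&2
        return 2
    fi
    # Data rows start with a hex key (0x...); banner and header lines are skipped.
    # Assumed columns: key msqid owner perms used-bytes messages
    awk -v limit="$limit" -v pct="$pct" '
        $1 ~ /^0x/ {
            used = $5 + 0
            fill = 100 * used / limit
            flag = (fill >= pct) ? "NEAR-FULL" : "ok"
            printf "%-9s msqid=%s owner=%s used-bytes=%d messages=%s fill=%.1f%%\n",
                   flag, $2, $3, used, $6, fill
        }'
}
```

For example: `ipcs -q | queue_fill_report 131072000 80`. A queue that stays NEAR-FULL while hosts stop updating would support the message-queue theory; an empty one points elsewhere.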
Attachments
Example Host Status.PNG

Re: Nagios Freezes and Host Status not getting Updated

Postby scottwilkerson » Thu Aug 31, 2017 8:54 am

What you are viewing on the screen NEVER comes from the DB.

If the times aren't changing, I wonder whether you might have multiple Nagios parent processes running.

Please run the following and post the output:
Code:
ps -ef|grep bin/nagios
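Rather than reading each line, the `ps -ef` output can be summarised by role. This is a hypothetical helper (`nagios_proc_summary`) that assumes the standard `ps -ef` columns (UID PID PPID ...): it counts `--worker` processes and splits `-d` daemon processes into parents (whose PPID is not another Nagios daemon) and children (forked by one).

```shell
# Hypothetical helper: summarise `ps -ef` output for Nagios Core by role.
# Assumes the standard ps -ef columns: UID PID PPID C STIME TTY TIME CMD.
#   workers  = "nagios --worker" processes
#   parents  = "nagios -d" processes whose PPID is not another "nagios -d"
#   children = "nagios -d" processes forked by another "nagios -d"
nagios_proc_summary() {
    awk '
        /bin\/nagios/ && !/grep/ {
            if (/--worker/) { workers++; next }
            if (/ -d /) ppid[$2] = $3
        }
        END {
            for (p in ppid) {
                pp = ppid[p]
                if (pp in ppid) children++; else parents++
            }
            printf "parents=%d children=%d workers=%d\n", parents, children, workers
        }'
}
```

Running `ps -ef | nagios_proc_summary` prints a single line; for the output posted in the next reply it would give `parents=1 children=1 workers=48`. More than one parent would suggest duplicate Nagios instances.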

Re: Nagios Freezes and Host Status not getting Updated

Postby Teja » Thu Aug 31, 2017 9:35 am

You were right, here it is. I think there are a lot of processes running, friend. Could you please confirm which processes to kill?

Code:
nagios   28690     1  7 06:57 ?        00:11:06 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   28692 28690 11 06:57 ?        00:17:39 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28693 28690 11 06:57 ?        00:17:36 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28694 28690 12 06:57 ?        00:19:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28695 28690 12 06:57 ?        00:19:28 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28696 28690 11 06:57 ?        00:18:46 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28697 28690 12 06:57 ?        00:18:52 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28698 28690 11 06:57 ?        00:18:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28699 28690 11 06:57 ?        00:17:47 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28701 28690 12 06:57 ?        00:20:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28702 28690 12 06:57 ?        00:18:53 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28703 28690 12 06:57 ?        00:19:05 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28704 28690 13 06:57 ?        00:21:27 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28705 28690 12 06:57 ?        00:19:54 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28706 28690 11 06:57 ?        00:17:50 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28707 28690 12 06:57 ?        00:18:53 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28708 28690 11 06:57 ?        00:18:40 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28709 28690 12 06:57 ?        00:19:26 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28710 28690 12 06:57 ?        00:19:39 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28711 28690 12 06:57 ?        00:19:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28712 28690 12 06:57 ?        00:19:47 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28713 28690 12 06:57 ?        00:18:53 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28714 28690 12 06:57 ?        00:19:22 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28715 28690 12 06:57 ?        00:19:26 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28716 28690 11 06:57 ?        00:18:33 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28717 28690 11 06:57 ?        00:17:59 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28718 28690 13 06:57 ?        00:20:28 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28719 28690 11 06:57 ?        00:17:42 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28720 28690 12 06:57 ?        00:20:22 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28721 28690 11 06:57 ?        00:17:48 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28722 28690 13 06:57 ?        00:20:26 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28723 28690 11 06:57 ?        00:18:48 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28724 28690 11 06:57 ?        00:18:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28725 28690 12 06:57 ?        00:20:06 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28726 28690 11 06:57 ?        00:18:37 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28727 28690 12 06:57 ?        00:20:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28728 28690 12 06:57 ?        00:19:42 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28729 28690 13 06:57 ?        00:20:51 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28730 28690 13 06:57 ?        00:21:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28731 28690 12 06:57 ?        00:20:15 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28732 28690 12 06:57 ?        00:18:59 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28733 28690 13 06:57 ?        00:20:45 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28734 28690 12 06:57 ?        00:19:44 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28735 28690 12 06:57 ?        00:20:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28736 28690 12 06:57 ?        00:20:05 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28737 28690 13 06:57 ?        00:21:11 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28738 28690 13 06:57 ?        00:21:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28739 28690 12 06:57 ?        00:20:15 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28740 28690 12 06:57 ?        00:19:40 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios   28745 28690  0 06:57 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     29241 29217  0 09:34 pts/0    00:00:00 grep --color=auto bin/nagios

Re: Nagios Freezes and Host Status not getting Updated

Postby tgriep » Thu Aug 31, 2017 5:02 pm

Actually, that output looks good. One parent process, one child process and a lot of workers is normal.
Can you post your status.dat file so we can check the server's settings?
Be sure to check out our Knowledgebase for helpful articles and solutions!
tgriep
Madmin
 
Posts: 6198
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios Freezes and Host Status not getting Updated

Postby Teja » Fri Sep 01, 2017 7:20 am

Hi tgriep,
Sorry, I don't have permission to share the file here. Could you please guide me through the steps so that I can check it myself? Also, ever since I changed "debug_level=0", the nagios log file has been growing. Usually the nagios log file is between 300 and 400 MB, but the last few days' log files are 5.5 GB, 7.2 GB and 9.3 GB, and they are growing day by day. Below I am attaching a screenshot of the log sizes. What could be the cause? Should I change debug_level back to 2?
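For checking status.dat locally, one approach is to list each host's `last_check` age, which should make a stalled scheduler visible without sharing the file. The sketch assumes the default source-install path `/usr/local/nagios/var/status.dat` and the usual `hoststatus { ... }` block layout with tab-indented `key=value` lines; `host_check_age` and its optional second argument (a fixed "now" timestamp) are hypothetical.

```shell
# Hypothetical helper: for every hoststatus block in a Nagios Core status.dat,
# print "AGE_SECONDS host_name state=N", where AGE is NOW minus last_check.
# NOW defaults to the current time; hosts are sorted stalest first.
# Usage: host_check_age /usr/local/nagios/var/status.dat [NOW_EPOCH]
host_check_age() {
    local file="$1" now="${2:-$(date +%s)}"
    awk -v now="$now" '
        $1 == "hoststatus" && $2 == "{" { inhost = 1; name = ""; last = 0; state = ""; next }
        inhost && $1 == "}" {
            printf "%d %s state=%s\n", now - last, name, state
            inhost = 0
            next
        }
        inhost {
            line = $0
            sub(/^[ \t]+/, "", line)
            eq = index(line, "=")          # split on the first "=" only
            if (eq == 0) next
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            if (key == "host_name") name = val
            else if (key == "last_check") last = val + 0
            else if (key == "current_state") state = val
        }' "$file" | sort -rn
}
```

`host_check_age /usr/local/nagios/var/status.dat | head` lists the stalest hosts first. If every host's age keeps climbing together, that would be consistent with the scheduler itself having stopped, as in the "Last Check" symptom described earlier.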

-> Some sites say to change "auto_rescheduling_window=180" to "auto_rescheduling_window=45" in the nagios.cfg file. Can I try that?


-> tgriep, we just observed that whenever we configure a new site in the .cfg files and run "service nagios reload" (or sometimes "restart"), it freezes. Then we have to kill the process and start it again to get it working normally.
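Since the freezes seem to follow reloads, it may be worth making sure each reload only happens after Nagios Core's own pre-flight check (`nagios -v`) passes. This is a hypothetical wrapper, not an official script: the binary and config paths are taken from the `ps` output earlier in the thread and `service nagios reload` from the post; override them with the made-up variables `NAGIOS_BIN`, `NAGIOS_CFG` and `RELOAD_CMD` if yours differ.

```shell
# Hypothetical wrapper: reload Nagios only if "nagios -v" (Nagios Core's
# configuration pre-flight check) exits successfully.
# NAGIOS_BIN, NAGIOS_CFG and RELOAD_CMD are made-up override variables; the
# defaults are the paths from the ps output and the reload command from the post.
safe_reload() {
    local bin="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
    local cfg="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
    local reload="${RELOAD_CMD:-service nagios reload}"
    local log
    log=$(mktemp) || return 1
    if ! "$bin" -v "$cfg" >"$log" 2>&1; then
        echo "nagios -v reported errors; not reloading. Details: $log" >&2
        return 1
    fi
    rm -f "$log"
    # $reload is left unquoted on purpose so "service nagios reload" splits into words.
    $reload
}
```

Run `safe_reload` instead of `service nagios reload`; if the check fails, the `nagios -v` output is kept in the temporary file it names.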

Thank you!!!
Attachments
Nagios log sizes.PNG
last 3 day logs

Re: Nagios Freezes and Host Status not getting Updated

Postby tgriep » Fri Sep 01, 2017 9:19 am

I wanted to see the status.dat file so I can check what the server thinks the current status is, and the status of the services that are not running correctly.
I think the freezing is caused by Nagios having to process the very large nagios.log file, and we would have to look in that file to see what has caused it to grow.
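To see what is filling nagios.log, a rough breakdown by message type can help. The sketch assumes the usual Nagios Core line format `[epoch] TYPE: details` (for example `SERVICE ALERT:` or `HOST ALERT:`) and, in the usage example, the default source-install log path; lines without a `TYPE:` prefix are grouped as `(no type)`, and `log_type_counts` is a hypothetical name.

```shell
# Hypothetical helper: count nagios.log lines by message type, most frequent first.
# Assumes the usual "[epoch] TYPE: details" line format; lines with no ":" after
# the timestamp are counted as "(no type)".
# Usage: log_type_counts /usr/local/nagios/var/nagios.log | head -n 20
log_type_counts() {
    awk '
        {
            line = $0
            sub(/^\[[0-9]+\][ \t]*/, "", line)     # drop the [timestamp] prefix
            c = index(line, ":")
            type = (c > 0) ? substr(line, 1, c - 1) : "(no type)"
            count[type]++
        }
        END { for (t in count) printf "%d\t%s\n", count[t], t }' "$@" | sort -rn
}
```

The top few lines of the output usually show which kind of entry dominates the growth.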

Re: Nagios Freezes and Host Status not getting Updated

Postby Teja » Thu Sep 07, 2017 7:08 am

Hi tgriep,
I guess the log rotation is not working, because I see the nagios logs keep growing every day. Could you please help me find the issue? Thank you!
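Nagios Core rotates nagios.log itself according to `log_rotation_method` in nagios.cfg (`n` none, `h` hourly, `d` daily, `w` weekly, `m` monthly) and moves the old file into `log_archive_path`. A quick way to see what is configured is sketched below; `log_rotation_info` is a hypothetical helper, and its default config path is the one from the `ps` output earlier in the thread.

```shell
# Hypothetical helper: show how nagios.cfg tells Nagios Core to rotate nagios.log.
# log_rotation_method: n = none, h = hourly, d = daily, w = weekly, m = monthly.
# Usage: log_rotation_info [/usr/local/nagios/etc/nagios.cfg]
log_rotation_info() {
    local cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    local method archive desc
    # Take the last uncommented setting of each key.
    method=$(sed -n 's/^log_rotation_method=//p' "$cfg" | tail -n 1)
    archive=$(sed -n 's/^log_archive_path=//p' "$cfg" | tail -n 1)
    case "$method" in
        n) desc="none (nagios.log is never rotated by Nagios)" ;;
        h) desc="hourly" ;;
        d) desc="daily" ;;
        w) desc="weekly" ;;
        m) desc="monthly" ;;
        "") desc="not set" ;;
        *) desc="unrecognised value '$method'" ;;
    esac
    echo "log_rotation_method: $desc"
    echo "log_archive_path: ${archive:-not set}"
}
```

If it reports `none`, nagios.log will keep growing until something outside Nagios rotates it.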
Attachments
Nagios logs.PNG
