Nagios Freezes and Host Status not getting Updated
Re: Nagios Freezes and Host Status not getting Updated
Oh, got it! Thank you, friend.
Re: Nagios Freezes and Host Status not getting Updated
Hi scottwilkerson,
It just froze again. There are no errors in the configuration files; I restarted it and it's working normally.
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Nagios Freezes and Host Status not getting Updated
I am starting to wonder if you may have missed some steps, which could cause the message queue to fill.
Did you happen to follow the guide here?
https://support.nagios.com/kb/article/n ... tml#Ubuntu
Specifically the Linux Kernel Settings portion
This could be causing the problem. If not, can you describe what happens when it "freezes"?
NDOUtils uses the kernel message queue for transferring the data from Nagios to NDOUtils. We are going to increase the default values the Kernel boots with to ensure it operates optimally.
First create a backup copy of the /etc/sysctl.conf file:
Code: Select all
sudo cp /etc/sysctl.conf /etc/sysctl.conf_backup
Now make the required changes:
Code: Select all
sudo sed -i '/msgmnb/d' /etc/sysctl.conf
sudo sed -i '/msgmax/d' /etc/sysctl.conf
sudo sed -i '/shmmax/d' /etc/sysctl.conf
sudo sed -i '/shmall/d' /etc/sysctl.conf
sudo sh -c 'printf "\n\nkernel.msgmnb = 131072000\n" >> /etc/sysctl.conf'
sudo sh -c 'printf "kernel.msgmax = 131072000\n" >> /etc/sysctl.conf'
sudo sh -c 'printf "kernel.shmmax = 4294967295\n" >> /etc/sysctl.conf'
sudo sh -c 'printf "kernel.shmall = 268435456\n" >> /etc/sysctl.conf'
sudo sysctl -e -p /etc/sysctl.conf
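To confirm that the four keys from the guide actually ended up in the file, something like the following could be used. This is only a rough sketch: the function name check_sysctl_conf is made up for illustration, and it inspects the file only, not the values the running kernel is using.

```shell
# check_sysctl_conf FILE   (hypothetical helper, not part of Nagios or NDOUtils)
# Prints "ok: KEY" or "missing: KEY" for each of the four keys the guide sets.
# Returns non-zero if any key is missing. Only the file is inspected.
check_sysctl_conf() {
    file=$1
    rc=0
    for key in kernel.msgmnb kernel.msgmax kernel.shmmax kernel.shmall; do
        # Split each line on "=", strip blanks from the key side, compare literally.
        if awk -F= -v k="$key" '
                { gsub(/[ \t]/, "", $1) }
                $1 == k { found = 1 }
                END { exit !found }
            ' "$file"; then
            echo "ok: $key"
        else
            echo "missing: $key"
            rc=1
        fi
    done
    return "$rc"
}

# Example: check_sysctl_conf /etc/sysctl.conf
```

The live values can be read separately with "sysctl kernel.msgmnb kernel.msgmax", and "ipcs -q" lists the message queues currently in use.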
Re: Nagios Freezes and Host Status not getting Updated
scottwilkerson, ours is a very busy system (hosted on AWS), so many hosts go "Down" and "Up" frequently. Recently a large number of hosts went down at once, and the issue started happening from that time. When it freezes, the host status doesn't update: the "Last check" time doesn't change and the "Duration" time keeps increasing. FYI, below I am sharing the values from my default "/etc/sysctl.conf" file and a host status screenshot.
Thank You!
Code: Select all
kernel.msgmax = 131072000
kernel.msgmnb = 131072000
kernel.msgmni = 65536000
- Attachments
- Example Host Status.PNG (6.44 KiB)
Re: Nagios Freezes and Host Status not getting Updated
What you are viewing on the screen NEVER comes from the DB.
If the times aren't changing I wonder if you might not have multiple nagios parent processes running.
Please run the following and post the output:
Code: Select all
ps -ef|grep bin/nagios
Re: Nagios Freezes and Host Status not getting Updated
You were right. Here is the output; I think there are a lot of processes running, friend. Could you please confirm which processes to kill?
Code: Select all
nagios 28690 1 7 06:57 ? 00:11:06 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 28692 28690 11 06:57 ? 00:17:39 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28693 28690 11 06:57 ? 00:17:36 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28694 28690 12 06:57 ? 00:19:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28695 28690 12 06:57 ? 00:19:28 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28696 28690 11 06:57 ? 00:18:46 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28697 28690 12 06:57 ? 00:18:52 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28698 28690 11 06:57 ? 00:18:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28699 28690 11 06:57 ? 00:17:47 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28701 28690 12 06:57 ? 00:20:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28702 28690 12 06:57 ? 00:18:53 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28703 28690 12 06:57 ? 00:19:05 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28704 28690 13 06:57 ? 00:21:27 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28705 28690 12 06:57 ? 00:19:54 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28706 28690 11 06:57 ? 00:17:50 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28707 28690 12 06:57 ? 00:18:53 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28708 28690 11 06:57 ? 00:18:40 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28709 28690 12 06:57 ? 00:19:26 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28710 28690 12 06:57 ? 00:19:39 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28711 28690 12 06:57 ? 00:19:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28712 28690 12 06:57 ? 00:19:47 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28713 28690 12 06:57 ? 00:18:53 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28714 28690 12 06:57 ? 00:19:22 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28715 28690 12 06:57 ? 00:19:26 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28716 28690 11 06:57 ? 00:18:33 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28717 28690 11 06:57 ? 00:17:59 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28718 28690 13 06:57 ? 00:20:28 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28719 28690 11 06:57 ? 00:17:42 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28720 28690 12 06:57 ? 00:20:22 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28721 28690 11 06:57 ? 00:17:48 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28722 28690 13 06:57 ? 00:20:26 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28723 28690 11 06:57 ? 00:18:48 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28724 28690 11 06:57 ? 00:18:01 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28725 28690 12 06:57 ? 00:20:06 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28726 28690 11 06:57 ? 00:18:37 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28727 28690 12 06:57 ? 00:20:12 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28728 28690 12 06:57 ? 00:19:42 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28729 28690 13 06:57 ? 00:20:51 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28730 28690 13 06:57 ? 00:21:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28731 28690 12 06:57 ? 00:20:15 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28732 28690 12 06:57 ? 00:18:59 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28733 28690 13 06:57 ? 00:20:45 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28734 28690 12 06:57 ? 00:19:44 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28735 28690 12 06:57 ? 00:20:09 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28736 28690 12 06:57 ? 00:20:05 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28737 28690 13 06:57 ? 00:21:11 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28738 28690 13 06:57 ? 00:21:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28739 28690 12 06:57 ? 00:20:15 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28740 28690 12 06:57 ? 00:19:40 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 28745 28690 0 06:57 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 29241 29217 0 09:34 pts/0 00:00:00 grep --color=auto bin/nagios
Re: Nagios Freezes and Host Status not getting Updated
Actually, that output looks good. One parent process, one child process, and a lot of workers is normal.
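For anyone who wants to check this without reading the whole listing, here is a minimal sketch that counts the top-level daemons in "ps -ef" output. The function name count_nagios_parents is hypothetical, and it assumes the parent daemon has been reparented to PID 1 (as PID 28690 is in the output above), which may not hold on every system.

```shell
# count_nagios_parents   (hypothetical helper)
# Reads `ps -ef` output on stdin and prints how many "nagios -d" daemon
# processes have PPID 1. Column 3 of `ps -ef` is the PPID.
# The forked "-d" child (whose PPID is the parent daemon), the --worker
# processes, and the grep line itself are not counted.
count_nagios_parents() {
    awk '$3 == 1 && /bin\/nagios -d / { n++ } END { print n + 0 }'
}

# Example: ps -ef | count_nagios_parents
```

A result of 1 matches the normal case described above; a higher number would be worth a closer look before killing anything.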
Can you post your status.dat file so we can check the server's settings?
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Nagios Freezes and Host Status not getting Updated
Hi tgriep,
Sorry, I don't have permission to share the file here. Could you please guide me through the steps so that I can check it myself? Also, ever since I changed to "debug_level=0", the nagios log file has started to grow. It would usually be between 300 and 400 MB, but the log files for the last few days are 5.5 GB, 7.2 GB, and 9.3 GB. It is growing day by day; below I am attaching a screenshot of the log sizes. What could be the cause? Should I change debug_level back to 2?
-> Some sites say to change "auto_rescheduling_window=180" to "auto_rescheduling_window=45" in the nagios.cfg file. Can I try this?
-> tgriep, we just observed that whenever we configure a new site in the .cfg files and run "service nagios reload" (and sometimes "restart"), it freezes; we then have to kill the process and start it again to make it work normally.
Thank You!!!
- Attachments
- Nagios log sizes.PNG (6.58 KiB): last 3 days' logs
Re: Nagios Freezes and Host Status not getting Updated
I wanted to see the status.dat file so I can check what the server thinks the current status is, and the status of the services that are not running correctly.
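If the file cannot be shared, a check along these lines could be run locally instead. This is a sketch under assumptions: the helper name stale_hosts is invented here, and it relies on status.dat containing "hoststatus {" blocks with host_name= and last_check= (epoch seconds) lines. The file's location is whatever status_file points to in nagios.cfg.

```shell
# stale_hosts STATUS_FILE NOW MAX_AGE   (hypothetical helper)
# Prints "host_name age_in_seconds" for every hoststatus block whose
# last_check is more than MAX_AGE seconds older than NOW (epoch seconds).
# servicestatus and other block types are skipped.
stale_hosts() {
    awk -v now="$2" -v max="$3" '
        index($0, "hoststatus {") == 1 { in_host = 1; name = ""; last = 0; next }
        in_host && $1 ~ /^host_name=/  { name = $0; sub(/^[ \t]*host_name=/, "", name) }
        in_host && $1 ~ /^last_check=/ { last = $0; sub(/^[ \t]*last_check=/, "", last); last += 0 }
        in_host && $1 == "}" {
            if (now - last > max) print name, now - last
            in_host = 0
        }
    ' "$1"
}

# Example (path is an assumption for a source install):
# stale_hosts /usr/local/nagios/var/status.dat "$(date +%s)" 900
```

During a "freeze", a long list here with steadily growing ages would match the symptom of "Last check" not changing.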
I think the freezing is caused by Nagios having to process the very large nagios.log file, and we would have to look in that file to see what has caused it to increase in size.
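One way to look in the file without reading gigabytes by hand is to count which kinds of entries dominate it. A minimal sketch, assuming the usual "[epoch] TYPE: details" line layout of nagios.log; the function name top_log_types is made up for this example.

```shell
# top_log_types LOG_FILE [N]   (hypothetical helper)
# Strips the leading "[epoch] " stamp, keeps the text before the first
# colon (the entry type, e.g. "SERVICE ALERT"), and prints the N most
# frequent types with their counts (default N = 10).
top_log_types() {
    sed -e 's/^\[[0-9]*\] //' -e 's/:.*$//' "$1" |
        sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Example (path is an assumption for a source install):
# top_log_types /usr/local/nagios/var/nagios.log 5
```

Whatever type sits at the top of that list is the first place to look for the cause of the growth.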
Re: Nagios Freezes and Host Status not getting Updated
Hi tgriep,
I guess the log rotation is not working, because I see the nagios logs keep growing every day. Could you please help me find the issue? Thank you!
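A starting point for checking rotation is to see what nagios.cfg currently says about logging. The sketch below just prints the relevant lines; show_log_settings is a hypothetical name, and the directive names are the ones from the Nagios Core main configuration file as far as I know, so please verify them against your version. Note that, to my understanding, debug_level governs the separate debug_file rather than nagios.log itself.

```shell
# show_log_settings CFG_FILE   (hypothetical helper)
# Prints the active (uncommented) logging and debug directives from
# the given nagios.cfg. Commented-out lines are ignored.
show_log_settings() {
    grep -E '^[[:space:]]*(log_file|log_rotation_method|log_archive_path|debug_level|debug_file|max_debug_file_size)=' "$1"
}

# Example (path taken from the ps output earlier in the thread):
# show_log_settings /usr/local/nagios/etc/nagios.cfg
```

If log_rotation_method is missing or set to n, that alone could explain a log that is never rotated.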