Hi,
I have been using Nagios® Core™ Version 3.5.0 to monitor about 10 physical nodes and 20 VMs for almost a year without problems. In the last month, I have been receiving too many alerts from Current Load and Total Processes at the same time, as if all the nodes and VMs had the same problem. I've checked the load error messages, for instance:
***** Nagios *****
Notification Type: PROBLEM
Service: Current Load
Host: ***
Address: *.*.*.*
State: WARNING
Additional Info:
WARNING - load average: 0.00, 1.23, 3.54
Also, I have changed the Total Processes thresholds to 400/500 in order to avoid these alerts. Does anybody know why I am receiving these two types of alerts at the same time?
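For reference, here is a minimal Python sketch of how a load check of this kind can arrive at WARNING for the output above. It assumes per-interval thresholds of 5.0,4.0,3.0 (warning) and 10.0,6.0,4.0 (critical) for the 1, 5, and 15 minute averages; those numbers and the strict greater-than comparison are assumptions for illustration and may not match the thresholds configured here. The function names are hypothetical.

```python
import re

def parse_load(plugin_output):
    """Extract the 1, 5 and 15 minute load averages from a plugin output line."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", plugin_output
    )
    if match is None:
        raise ValueError("no load averages found in: %r" % plugin_output)
    return tuple(float(value) for value in match.groups())

def classify_load(loads, warn, crit):
    """Compare each average with the threshold for its own interval.

    Assumption: any value above its critical threshold gives CRITICAL,
    otherwise any value above its warning threshold gives WARNING.
    """
    state = "OK"
    for load, warn_limit, crit_limit in zip(loads, warn, crit):
        if load > crit_limit:
            return "CRITICAL"
        if load > warn_limit:
            state = "WARNING"
    return state

# Assumed thresholds, not taken from this installation.
WARN = (5.0, 4.0, 3.0)
CRIT = (10.0, 6.0, 4.0)

loads = parse_load("WARNING - load average: 0.00, 1.23, 3.54")
print(loads, classify_load(loads, WARN, CRIT))
```

Under these assumed thresholds only the 15 minute average (3.54) is above its warning limit (3.0), which would be enough to raise WARNING even though the 1 minute average is 0.00.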
Thanks in advance
Current load and total processes alerts
slansing
Re: Current load and total processes alerts
Are you using a shared template across those servers? If so, though unlikely, you may be seeing this happen on multiple hosts. How many hosts is this happening on? All of them, some of them, etc? Any correlation between them? Have you personally checked the systems yourself?
Re: Current load and total processes alerts
All of the hosts are defined as "linux-server" and the services as "local-service". Apart from that, when alerts start being sent simultaneously, most of the critical values detected are the same across hosts (number of total processes, CPU load, etc.). I have checked some of the servers by running the command "top", and the number of processes it shows usually doesn't match the number reported by the Nagios service.
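To get a process count on each server that does not depend on Nagios or on top, something like the following Python sketch could be run directly on the host in question. It assumes a Linux system with /proc mounted, and the function name is hypothetical. The number may still differ slightly from what top or the Nagios plugin reports, since each tool may count threads or short-lived processes differently.

```python
import os

def count_processes(proc_root="/proc"):
    """Count running processes by counting the numeric directories in /proc.

    Assumption: Linux, where every process has a directory named after its PID.
    """
    return sum(
        1
        for name in os.listdir(proc_root)
        if name.isdigit() and os.path.isdir(os.path.join(proc_root, name))
    )

# Example usage on the monitored host itself:
#   print(count_processes())
```

If the value printed on a given server is far from the one in the Nagios alert for that server, it suggests the check is not measuring the host it is attached to.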
Re: Current load and total processes alerts
It almost seems like each host has the same IP but a different name...
What changes were made prior to this issue? Any edits made to templates on the Nagios end? Any load balancing or failover implemented in the VMs? Anything that might do some weird things with the assigned IP addresses?
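One way to rule out duplicated addresses is to scan the host definitions for an address that appears under more than one host_name. Here is a rough Python sketch, assuming the hosts are declared in standard "define host{ ... }" blocks; the function names, sample host names, and sample addresses are made up for illustration.

```python
import re
from collections import defaultdict

# Matches "define host{ ... }" blocks but not "define hostgroup{ ... }".
HOST_BLOCK = re.compile(r"define\s+host\s*\{(.*?)\}", re.DOTALL)

def hosts_by_address(config_text):
    """Group host_name values by their address directive."""
    groups = defaultdict(list)
    for block in HOST_BLOCK.findall(config_text):
        fields = {}
        for line in block.splitlines():
            line = line.split(";", 1)[0].strip()  # drop inline ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
        if "host_name" in fields and "address" in fields:
            groups[fields["address"]].append(fields["host_name"])
    return dict(groups)

def shared_addresses(config_text):
    """Return only the addresses used by more than one host."""
    return {
        address: names
        for address, names in hosts_by_address(config_text).items()
        if len(names) > 1
    }

# Hypothetical sample configuration.
SAMPLE = """
define host{
    use        linux-server
    host_name  node-a
    address    192.0.2.10
}
define host {
    use        linux-server
    host_name  vm-b
    address    192.0.2.10   ; same address as node-a
}
define host{
    use        linux-server
    host_name  vm-c
    address    192.0.2.11
}
"""

print(shared_addresses(SAMPLE))
```

An empty result would mean every host definition has its own address, at least as far as the configuration text is concerned.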
Former Nagios employee
Re: Current load and total processes alerts
Well, I have checked that all VMs and nodes have different IPs and aliases. Besides this, I had to increase the Total Processes thresholds, and now the system is stable; no alerts are currently being sent. Anyway, I guess that it could be related to some critical process that is not running at the moment. I'll come back as soon as the alerts start being sent again (if they do).
Thanks
Re: Current load and total processes alerts
Very odd. I'm not certain that a critical process would show the exact same load for all servers, but stranger things have happened, I suppose. You might try forcing some CPU-intensive processes on one of the servers as a test and see what gets returned. We'll keep this thread open for you. Looking forward to the results.
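If no suitable CPU-intensive job is at hand, a small busy loop can serve for the test. This is only a sketch with a hypothetical function name: it keeps a single core busy for a fixed time, so on an otherwise idle server the 1 minute load average should drift toward roughly 1 rather than spike.

```python
import time

def burn_cpu(seconds):
    """Spin in a tight loop for the given number of seconds.

    Returns the number of loop iterations, only to show that work was done.
    """
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations

# Example usage on ONE server only, e.g. five minutes of load:
#   burn_cpu(300)
```

If the Current Load value then rises for that one host only, the checks are reading the right machine. If it rises for every host, or for none, the checks are probably reading the load somewhere else.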