Current load and total processes alerts

Miguel · Post by **Miguel** » Tue May 13, 2014 6:04 am

Hi,
I have been using Nagios® Core™ version 3.5.0 to monitor about 10 physical nodes and 20 VMs for almost a year without problems. In the last month, I have been receiving too many alerts from Current Load and Total Processes at the same time, as if all the nodes and VMs had the same problem. I've checked the Current Load alert messages, for instance:
***** Nagios *****
Notification Type: PROBLEM
Service: Current Load
Host: ***
Address: *.*.*.*
State: WARNING
Additional Info:
WARNING - load average: 0.00, 1.23, 3.54

I have also raised the Total Processes thresholds to 400/500 in order to avoid these alerts. Does anybody know why I am receiving these two types of alerts at the same time?

Thanks in advance
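
For readers following along, here is a minimal Python sketch of how threshold checks like these are usually evaluated. It is not the plugin source. The load thresholds used in the note below (warning 5.0,4.0,3.0 and critical 10.0,6.0,4.0) are an assumption taken from the sample configuration that ships with Nagios Core; the thread does not show what is actually configured. The function names are made up for illustration.

```python
OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


def load_state(loads, warn, crit):
    """Classify (1, 5, 15 minute) load averages against per-interval thresholds.

    Any value above its critical threshold gives CRITICAL; otherwise any
    value above its warning threshold gives WARNING.
    """
    state = OK
    for value, w, c in zip(loads, warn, crit):
        if value > c:
            return CRITICAL
        if value > w:
            state = WARNING
    return state


def procs_state(count, warn, crit):
    """Classify a process count against simple upper-bound thresholds."""
    if count > crit:
        return CRITICAL
    if count > warn:
        return WARNING
    return OK


# Values from the notification above, with the ASSUMED sample thresholds.
sample = load_state((0.00, 1.23, 3.54), (5.0, 4.0, 3.0), (10.0, 6.0, 4.0))
```

Under those assumed thresholds, the sample values classify as WARNING only because the 15-minute figure (3.54) is above 3.0, even though the 1-minute figure is 0.00. That would point to a load spike that had already ended when the check ran.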

slansing · Post by **slansing** » Tue May 13, 2014 10:56 am

Are you using a shared template across those servers? If so, though unlikely, you may be seeing this happen on multiple hosts. How many hosts is this happening on? All of them, some of them, etc? Any correlation between them? Have you personally checked the systems yourself?

Miguel · Post by **Miguel** » Wed May 14, 2014 10:45 am

All of the hosts are defined as "linux-server" and the services as "local-service". Apart from that, when the alerts start being sent simultaneously, most of the critical values reported are the same across hosts (total number of processes, CPU load, etc.). I have checked some of the servers by running "top", and the number of processes it shows usually doesn't match the value reported by the Nagios service.
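
One way to cross-check this kind of mismatch is to take a snapshot directly on the machine in question and compare it with what Nagios reports. The sketch below assumes a Linux host with /proc mounted; the helper names are hypothetical. It is also worth ruling out that the checks are plain local commands. That is only a guess, since the thread does not show the command definitions, but a plugin run locally measures the machine it runs on. In that case every host would report the Nagios server's own numbers.

```python
import os


def count_pids(entries):
    """Count entries that look like PIDs (all-digit names), as found in /proc."""
    return sum(1 for name in entries if name.isdigit())


def local_snapshot(proc_root="/proc"):
    """Return (process_count, (load1, load5, load15)) for the machine this runs on."""
    return count_pids(os.listdir(proc_root)), os.getloadavg()
```

Running `local_snapshot()` on a monitored host and on the Nagios server at about the same time shows which of the two the alert values actually match.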

tmcdonald · Post by **tmcdonald** » Thu May 15, 2014 9:52 am

It almost seems like each host has the same IP but a different name...

What changes were made prior to this issue? Any edits made to templates on the Nagios end? Any load balancing or failover implemented in the VMs? Anything that might do some weird things with the assigned IP addresses?

Miguel · Post by **Miguel** » Mon May 19, 2014 9:12 am

Well, I have checked that all VMs and nodes have different IPs and aliases. Besides this, I had to increase the Total Processes range, and now the system is stable; no alerts are currently being sent. Anyway, I guess it could be related to some critical process that is not running at the moment. I'll come back as soon as the alerts start being sent again (if they do).

Thanks

tmcdonald · Post by **tmcdonald** » Mon May 19, 2014 10:16 am

Very odd. Not certain that a critical process would show the exact same load for all servers, but stranger things have happened I suppose. You might try forcing some cpu-intensive processes on one of the servers as a test and see what gets returned. We'll keep this thread open for you. Looking forward to the results.
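
A minimal sketch of such a test in Python, assuming a Unix-like host where `os.getloadavg()` is available. The function names and parameters are illustrative only and nothing here comes from Nagios itself.

```python
import multiprocessing
import os
import time


def burn(seconds):
    """Busy-loop for roughly `seconds` and return the number of iterations done."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def run_load_test(workers, seconds):
    """Run `workers` busy processes in parallel.

    Returns the (1, 5, 15 minute) load averages sampled before and after.
    """
    before = os.getloadavg()
    procs = [
        multiprocessing.Process(target=burn, args=(seconds,))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return before, os.getloadavg()
```

Calling something like `run_load_test(workers=4, seconds=120)` from a script guarded by `if __name__ == "__main__":` on one server should raise only that server's 1-minute load. The test needs to run for a minute or more, because the load average reacts slowly. If the Current Load value then changes for every host in Nagios, or for none, the checks are probably not measuring the hosts they are attached to.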

Nagios Support Forum
