Nagios server load rising

elinagios
Posts: 146
Joined: Thu Feb 16, 2017 3:45 am

Post by elinagios »

Hello

server:
centos 7.9
nagiosxi 5.7.5

For some reason our Nagios server load has gone up significantly in the last year, but the host and service count has stayed almost the same. I have a feeling that someone made a configuration error somewhere, but I can't pinpoint what it may be.
One year ago the average load was a bit under 3, six months ago it was around 4, and it has kept rising since then. Last month the average load was around 8, with peaks often reaching 15. Like I said, the host and service count doesn't seem to have grown (5100 hosts, 4606 services). I even changed many 1 minute checks to 2-5 minutes, but that didn't have any impact on the load.

top -b -n1 | head -n 27
This command shows me the top load generators. Different checks show up, like check_snmp_stor, check_snmp_win and so on, but the most frequent entry seems to be snmpget, and it accounts for a lot of the load.
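
A single top snapshot is noisy, so it can help to add up CPU per command name. The following is only a rough sketch, not a Nagios tool: it assumes the default procps top batch layout with a header row that starts with PID and contains %CPU and COMMAND columns, and the function name cpu_by_command is made up for illustration.

```python
from collections import defaultdict


def cpu_by_command(top_output):
    """Sum the %CPU column per COMMAND from `top -b -n1` text.

    Assumes the default procps layout: a header row starting with PID
    that contains %CPU and COMMAND columns, with COMMAND last.
    """
    totals = defaultdict(float)
    cpu_col = cmd_col = None
    for line in top_output.splitlines():
        fields = line.split()
        if cpu_col is None:
            # Skip the summary area until the process table header appears.
            if fields[:1] == ["PID"] and "%CPU" in fields and "COMMAND" in fields:
                cpu_col = fields.index("%CPU")
                cmd_col = fields.index("COMMAND")
            continue
        if len(fields) <= cmd_col:
            continue  # blank or truncated row
        try:
            # Some locales print a decimal comma instead of a point.
            cpu = float(fields[cpu_col].replace(",", "."))
        except ValueError:
            continue
        command = " ".join(fields[cmd_col:])
        totals[command] += cpu
    # Highest total CPU first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Feeding it the text of several snapshots taken a few seconds apart (for example captured with subprocess.run(["top", "-b", "-n1"], capture_output=True, text=True).stdout) should show whether snmpget really dominates or just happens to be on top at that moment.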

1) Are the snmpget queries logged somewhere? I looked through the log files and didn't see anything interesting.
2) The check_snmp_stor and check_snmp_win checks use SNMP, so why does snmpget show up separately in top?

When I look at my commands in Nagios XI and filter for snmpget, there are 0 results.

So what else should I check?
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Nagios server load rising

Post by benjaminsmith »

Hi elinagios,

The /usr/local/nagios/var/nagios.log file is probably the best place to check. That said, SNMP checks do usually take more CPU than other types of checks.

Can you send us the system profile and we'll take a closer look at the logs for you?

To send us your system profile:
1) Log in to the Nagios XI GUI using a web browser.
2) Click the "Admin" > "System Profile" menu.
3) Click the "Download Profile" button.

Best Regards,
Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
elinagios

Re: Nagios server load rising

Post by elinagios »

So I looked closely at the /usr/local/nagios/var/nagios.log log file. What I noticed is that some services are producing:
Warning: The results of service 'Slot 5 TS Memory Usage' on host 'hostname1' are stale by 0d 0h 0m 34s (threshold=0d 0h 5m 18s). I'm forcing an immediate check of the service.
If I look at the service and host definitions, check freshness is enabled. Both use active checks: check_ping for the host and check_snmp for the service. As I understand it, check freshness is not needed for active checks and can be disabled?
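
To see which services trigger these forced checks most often, the stale warnings can be counted per host and service. This is a minimal sketch that assumes every such warning follows the wording of the line quoted above. The names STALE_RE, count_stale and top_stale are made up for illustration.

```python
import re
from collections import Counter

# Matches the warning text quoted above, with or without a leading timestamp.
STALE_RE = re.compile(
    r"The results of service '(?P<service>.+?)' on host '(?P<host>.+?)' are stale"
)


def count_stale(lines):
    """Count stale-result warnings per (host, service) pair."""
    counts = Counter()
    for line in lines:
        match = STALE_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("service"))] += 1
    return counts


def top_stale(path, limit=20):
    """Return the most frequently stale (host, service) pairs in a log file."""
    with open(path, errors="replace") as handle:
        return count_stale(handle).most_common(limit)
```

Calling top_stale("/usr/local/nagios/var/nagios.log") would list the worst offenders first, which gives a short list of definitions to review.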
benjaminsmith

Re: Nagios server load rising

Post by benjaminsmith »

Hi,
elinagios wrote: As I understand it, check freshness is not needed for active checks and can be disabled?
That's correct, freshness checking is not required for active checks.

See: Host & Service Freshness Checks

Please PM the system profile and I'll review the logs and check for errors.

--Benjamin
elinagios

Re: Nagios server load rising

Post by elinagios »

I think you can close this thread. I saw a huge improvement on the CPU load side. Many of those stale services were check_snmp checks running at a 1 minute interval. Removing check_freshness from them, and from some other active checks too, has brought the average load back down to 3-4.
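
For anyone with the same problem, here is a rough sketch of how to list host and service definitions that have both active checks and check_freshness turned on. It only reads directives written directly inside each define block and does not resolve template inheritance through use, so treat the result as a starting point. The function names are hypothetical, and it assumes the closing brace of each definition sits on its own line.

```python
import re

# One "define <type> { ... }" block, with the closing brace on its own line.
DEFINE_RE = re.compile(r"^\s*define\s+(\w+)\s*\{(.*?)^\s*\}", re.M | re.S)


def parse_objects(text):
    """Return (object_type, directives) pairs from Nagios object config text."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        directives = {}
        for raw in body.splitlines():
            # Drop inline ';' comments (escaped semicolons are not handled)
            # and skip blank lines and '#' comments.
            line = raw.split(";", 1)[0].strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, directives))
    return objects


def active_with_freshness(text):
    """List hosts/services with check_freshness 1 and active checks enabled.

    active_checks_enabled is treated as 1 when the directive is absent,
    which ignores anything inherited from a template.
    """
    hits = []
    for obj_type, directives in parse_objects(text):
        if obj_type not in ("host", "service"):
            continue
        if (directives.get("check_freshness") == "1"
                and directives.get("active_checks_enabled", "1") == "1"):
            hits.append((obj_type,
                         directives.get("host_name", ""),
                         directives.get("service_description", "")))
    return hits
```

Running active_with_freshness over the contents of each object config file gives a list of definitions where check_freshness can probably be set to 0.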
Thank you!
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios server load rising

Post by scottwilkerson »

elinagios wrote: I think you can close this thread. I saw a huge improvement on the CPU load side. Many of those stale services were check_snmp checks running at a 1 minute interval. Removing check_freshness from them, and from some other active checks too, has brought the average load back down to 3-4.
Thank you!
Locking thread