Current Load on Localhost
Posted: Fri Sep 24, 2021 3:19 am
by lanxessinfy
Hi Team,
Currently we are monitoring around 300 servers and 3500 services using Nagios XI. The Nagios XI server has 4 vCPU and 32 GB RAM.
We are getting critical alerts on the Current Load of localhost every Monday, and it goes back to OK on Tuesday.
Below is the check command.
define service {
host_name localhost
service_description Current Load
use local-service
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
contacts SN-SCRITICAL-LOCALHOST-Service
register 1
}
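For reference, the two argument groups in the check command are the warning and critical thresholds for the 1, 5 and 15 minute load averages. The sketch below mimics that comparison so a given set of load values can be tried against the thresholds. It assumes the stock sample mapping of check_local_load to check_load -w $ARG1$ -c $ARG2$ and a strictly-greater-than comparison; the function name load_state is hypothetical.

```shell
#!/bin/sh
# Sketch only: approximates how check_load is assumed to evaluate thresholds.
# load_state "<load1> <load5> <load15>" "<w1,w5,w15>" "<c1,c5,c15>"
load_state() (
    loads=$1
    warn=$2
    crit=$3
    printf '%s\n' "$loads" | awk -v w="$warn" -v c="$crit" '
    {
        split(w, wt, ",")
        split(c, ct, ",")
        state = "OK"
        for (i = 1; i <= 3; i++) {
            # Any single average above its critical threshold is enough.
            if (($i + 0) > (ct[i] + 0)) { state = "CRITICAL"; break }
            if (($i + 0) > (wt[i] + 0)) state = "WARNING"
        }
        print state
    }'
)
```

On a Linux host the live values can be passed in with load_state "$(cut -d' ' -f1-3 /proc/loadavg)" "5.0,4.0,3.0" "10.0,6.0,4.0". Note that these thresholds are absolute, so on a 4 vCPU server a 15 minute average just above 4.0 already counts as critical.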
Attaching a screenshot of the performance graph.
Current_Load.PNG
Please suggest a suitable solution.
Thanks
Re: Current Load on Localhost
Posted: Fri Sep 24, 2021 2:40 pm
by pbroste
Hello @lanxessinfy
Thanks for reaching out.
This issue appears to be intermittent, so the best approach would be to set up a triggered event handler to catch the offending process(es).
Here are two support articles that explain the setup:
https://assets.nagios.com/downloads/nagiosxi/docs/Introduction-To-Event-Handlers-in-Nagios-XI.pdf
https://assets.nagios.com/downloads/nagiosxi/docs/Configuring-Global-Event-Handlers-In-Nagios-XI.pdf
I would suggest that the event handler script, when triggered by the alert, write the following to a log for review. This will provide an overview of what is happening during the extra load.
- top -b -n 1 > top.txt
- tail -n 10 /usr/local/nagiosxi/var/eventman.log
- tail -n 10 /usr/local/nagios/var/nagios.log
- tail -n 10 /var/log/syslog or /var/log/messages
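The list above could be collected into one timestamped report, roughly as in this sketch. The function name capture_diagnostics, the report file name, and the output directory are assumptions for illustration; the log paths are passed in as arguments so that whichever of /var/log/syslog or /var/log/messages exists can be used.

```shell
#!/bin/sh
# Sketch only: gather a top snapshot plus the last 10 lines of each log.
# capture_diagnostics <output-dir> <log-file>...
capture_diagnostics() (
    out_dir=$1
    shift
    report="$out_dir/load-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "== top =="
        if command -v top >/dev/null 2>&1; then
            top -b -n 1 2>/dev/null | head -n 20
        else
            echo "(top not available)"
        fi
        for log in "$@"; do
            echo "== $log =="
            if [ -r "$log" ]; then
                tail -n 10 "$log"
            else
                echo "(not readable)"
            fi
        done
    } > "$report"
    # Print the report path so the caller can log or mail it.
    echo "$report"
)
```

A call from an event handler might look like capture_diagnostics /tmp /usr/local/nagiosxi/var/eventman.log /usr/local/nagios/var/nagios.log /var/log/messages. The user running the handler needs read access to each log, otherwise the report only records "(not readable)" for it.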
Thanks,
Perry
Re: Current Load on Localhost
Posted: Mon Sep 27, 2021 8:58 am
by lanxessinfy
Hi @perry,
Thanks for the response.
We have set up an event handler for the Current Load service of the Nagios server.
As per the event handler script you provided, it should restart the httpd service when the service state is critical, but in our case it did not restart.
When the CPU load is critical, this is the output:
top -b -n 1 > top.txt
[**************** ~]$ top -b -n 1 > top.txt
[*****************~]$ cat top.txt
top - 14:15:30 up 10 days, 8:05, 2 users, load average: 8.62, 8.63, 7.02
Tasks: 349 total, 11 running, 338 sleeping, 0 stopped, 0 zombie
%Cpu(s): 83.8 us, 5.9 sy, 0.0 ni, 8.8 id, 0.0 wa, 0.0 hi, 1.5 si, 0.0 st
KiB Mem : 32947892 total, 1267736 free, 11574456 used, 20105700 buff/cache
KiB Swap: 2097148 total, 1717324 free, 379824 used. 19939124 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12155 apache 20 0 4275160 3.000g 15832 R 93.8 9.5 22:35.79 wkhtmltopdf
30319 apache 20 0 3988992 2.851g 15784 R 93.8 9.1 32:03.40 wkhtmltopdf
30207 apache 20 0 3835948 2.623g 16284 R 87.5 8.3 16:25.83 wkhtmltopdf
6880 mysql 20 0 2652616 491336 10252 S 31.2 1.5 1:56.99 mysqld
1793 root 20 0 546936 32916 2996 S 12.5 0.1 35:28.19 python
732 root 20 0 0 0 0 S 6.2 0.0 52:01.41 kcs-evbsync/3
8331 apache 20 0 761776 42024 14424 R 6.2 0.1 0:05.68 httpd
9991 apache 20 0 756360 35996 13852 R 6.2 0.1 0:04.82 httpd
10813 apache 20 0 758060 38000 14168 R 6.2 0.1 0:03.16 httpd
13509 apache 20 0 752576 32376 14048 S 6.2 0.1 0:03.66 httpd
16037 apache 20 0 751292 30380 13808 S 6.2 0.1 0:00.18 httpd
Can you please suggest a way to check whether the httpd service was restarted? Also, please give a brief overview of the logs such as eventman.log and /var/log/syslog or /var/log/messages.
Thanks!
Re: Current Load on Localhost
Posted: Mon Sep 27, 2021 5:03 pm
by pbroste
Hello @lanxessinfy
It appears that the top three processes are busy converting to PDF. Question: how many reports are you converting to PDF? Depending on what is being converted, this can be process-intensive.
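To pull the heavy processes out of a saved snapshot such as top.txt, a small filter along these lines may help. It is a sketch that assumes the default column layout shown in the output above (%CPU in column 9, COMMAND in column 12); the function name top_cpu_hogs is hypothetical.

```shell
#!/bin/sh
# Sketch only: list processes at or above a %CPU floor from `top -b -n 1` output.
# top_cpu_hogs <top-output-file> <min-cpu-percent>
top_cpu_hogs() (
    awk -v min="$2" '
        # Process rows start after the "PID USER ..." header line.
        seen && ($9 + 0) > (min + 0) { print $1, $2, $9, $12 }
        $1 == "PID" { seen = 1 }
    ' "$1"
)
```

For example, top_cpu_hogs top.txt 50 run against the snapshot above would list only the three wkhtmltopdf processes. If top is configured with a different set of columns, the field numbers need adjusting.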
You can check for an httpd service restart by viewing the system logs. The system messages include those logged during system startup, as well as mail, cron, service, kernel, and auth messages:
grep -Ei 'Starting The Apache HTTP Server' /var/log/messages
or depending on distro:
grep -Ei 'Starting The Apache HTTP Server' /var/log/syslog
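Since the log location depends on the distro, the two grep commands can be wrapped so that whichever file exists is searched and only the most recent match is shown. This is a sketch; the function name last_httpd_start is hypothetical, and the search string is the one used above.

```shell
#!/bin/sh
# Sketch only: print the most recent Apache start message found in the given logs.
# last_httpd_start <log-file>...
last_httpd_start() (
    for f in "$@"; do
        # Skip files that do not exist or cannot be read on this distro.
        if [ -r "$f" ]; then
            grep -Ei 'Starting The Apache HTTP Server' "$f"
        fi
    done | tail -n 1
)
```

Running last_httpd_start /var/log/messages /var/log/syslog before and after a test alert shows whether a new start entry appeared. On systemd hosts, systemctl show -p ActiveEnterTimestamp httpd.service reports when the unit last became active, which gives the same answer without relying on the log text.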
The eventman.log shows real-time handler events; you can follow it with:
tail -F /usr/local/nagiosxi/var/eventman.log
Thanks,
Perry
Re: Current Load on Localhost
Posted: Tue Sep 28, 2021 12:53 am
by lanxessinfy
Hi,
We are downloading/converting only 1 report.
Whenever we try to download a report, the current load increases.
Early this morning the current load went critical, but we had not generated any reports.
Currently we are using the script below to log process info into the hostinfo.txt file and restart the httpd service. The script runs fine, but we want it to run only when the Current Load service state is critical.
script:
#!/bin/bash
SERVICESTATE=$1
case "$SERVICESTATE" in
CRITICAL)
top -b -n 1 | head -20 > /tmp/hostinfo.txt
systemctl restart httpd.service
;;
esac
In the above script we wrote the service state as "CRITICAL", but it is not working as per the logic.
Can you please modify the existing script or provide us with a script that meets our requirement?
Thanks.
Re: Current Load on Localhost
Posted: Tue Sep 28, 2021 1:07 pm
by pbroste
Hello @lanxessinfy
Thanks for following up. The option to restart the service would be to create an event handler that restarts the 'httpd' service on a critical alert from (for example) the check_load plugin.
Here is an example service restart script:
/usr/local/nagios/libexec/service_restart.sh
Paste the following into the terminal session:
#!/bin/sh
case "$1" in
OK)
;;
WARNING)
;;
UNKNOWN)
;;
#Restarting Linux Services With NCPA
CRITICAL)
/usr/local/nagios/libexec/check_ncpa.py -H "$2" -P 5693 -t "$3" -M 'plugins/service_restart.sh' -a "$4"
;;
esac
exit 0
Please review and let us know if you have further questions,
Perry
Re: Current Load on Localhost
Posted: Tue Oct 12, 2021 6:50 am
by lanxessinfy
Hi,
I did exactly what's in the document and ran a passive check, but could not restart the httpd service.
I ran this command: grep -Ei 'Starting The Apache HTTP Server' /var/log/messages, but there are no new records.
Please find the screenshot.
Thanks!
Re: Current Load on Localhost
Posted: Wed Oct 13, 2021 9:04 am
by pbroste
Hello @lanxessinfy
Thanks for following up. We would like to take a look at the System Profile from your environment.
To send us your system profile:
- Log in to the Nagios XI GUI using a web browser.
- Click the "Admin" > "System Profile" menu
- Click the "Download Profile" button
- Save the profile.zip file and send via Private Message
Thanks,
Perry
Re: Current Load on Localhost
Posted: Wed Oct 13, 2021 10:59 am
by lanxessinfy
Hi,
I have sent the profile to you.
Thanks!
Re: Current Load on Localhost
Posted: Thu Oct 14, 2021 11:34 am
by pbroste
Hello @lanxessinfy
Following up with my test results: I went ahead and imported your System Profile on my test VM.
The 'Service Restart - Linux' command is set up with:
Command Name: Service Restart - Linux
Command Line: /home/nagios/service_restart.sh $SERVICESTATE$
Set permissions on the script to look like this:
-rwxr-xr-x 1 nagios nagios 334 Oct 14 11:02 service_restart.sh
The script that I used to test:
#!/bin/bash
SERVICESTATE=$1
case "$SERVICESTATE" in
CRITICAL)
top -b -n 1 | head -20 > /tmp/hostinfo.txt
date >> /tmp/hostinfo.txt
systemctl restart httpd.service
;;
esac
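A slightly more defensive variant of this handler is sketched below. It assumes the Command Line is extended to pass $SERVICESTATETYPE$ as a second argument, so that the restart only happens on a HARD CRITICAL state, and it records the exit status of the restart so a permissions failure is visible. Running the restart through sudo is an assumption about the environment (it would need a matching sudoers rule for the nagios user); LOG_FILE, RESTART_CMD and handle_state are hypothetical names.

```shell
#!/bin/sh
# Sketch only. Assumed command line:
#   /home/nagios/service_restart.sh $SERVICESTATE$ $SERVICESTATETYPE$
LOG_FILE=${LOG_FILE:-/tmp/hostinfo.txt}
# Assumption: the nagios user may run this via sudo without a password.
RESTART_CMD=${RESTART_CMD:-sudo systemctl restart httpd.service}

handle_state() {
    # $1 = service state, $2 = state type (defaults to HARD if not passed)
    case "$1:${2:-HARD}" in
        CRITICAL:HARD)
            {
                date
                top -b -n 1 2>/dev/null | head -n 20
            } >> "$LOG_FILE"
            # Word splitting of RESTART_CMD is intentional here.
            $RESTART_CMD >> "$LOG_FILE" 2>&1
            echo "restart exit status: $?" >> "$LOG_FILE"
            ;;
        *)
            # OK, WARNING, UNKNOWN and SOFT states: nothing to do.
            ;;
    esac
    return 0
}

# Only act when invoked with arguments, as Nagios would do.
if [ "$#" -gt 0 ]; then
    handle_state "$@"
fi
```

If the log shows a non-zero restart exit status, the handler ran but the restart itself was refused, which points at permissions rather than at the state logic.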
Logs to verify 'Event Handler' executed on alert:
cat /usr/local/nagios/var/nagios.log | grep -Ei 'current load' -A 5 -B 1
Note: enable 'log_event_handlers=1' in /usr/local/nagios/etc/nagios.cfg to log Event Handler results.
My results:
[1634229490] SERVICE EVENT HANDLER: localhost;Current Load;CRITICAL;HARD;1;Service Restart - Linux
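To list only the handler runs for one service in a compact form, the log line format shown above can be split on its semicolons. This is a sketch based on that single sample line; the function name event_handler_runs and the output layout are assumptions.

```shell
#!/bin/sh
# Sketch only: summarise SERVICE EVENT HANDLER entries for one service.
# event_handler_runs <nagios.log> <service description>
event_handler_runs() (
    grep -F 'SERVICE EVENT HANDLER:' "$1" |
    awk -F';' -v svc="$2" '
        $2 == svc {
            # $1 looks like: [1634229490] SERVICE EVENT HANDLER: localhost
            n = split($1, head, " ")
            epoch = substr(head[1], 2, length(head[1]) - 2)
            printf "%s host=%s state=%s type=%s attempt=%s\n", epoch, head[n], $3, $4, $5
        }'
)
```

For example, event_handler_runs /usr/local/nagios/var/nagios.log 'Current Load' prints one line per handler execution. The first field is the epoch timestamp from the log; with GNU date it can be converted using date -d @1634229490.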
You may need to fix the ownership of '/tmp/hostinfo.txt' so the nagios user can write to it (chown nagios:nagios /tmp/hostinfo.txt).
Thanks,
Perry