Reboot monitoring for the Linux servers

Posted: Wed Mar 03, 2021 2:28 am
by informatica
Hi Team,

We would like to monitor reboots of our Linux servers.
For example: if a server reboots, we only find out after about 5 minutes, because of the standard Nagios settings (check every 5 minutes, retry interval 1, max attempts 5).

But we need an alert whenever a server is rebooted or restarted.
Is there a service configuration that alerts us when a server has been rebooted, without changing the host check interval?
Please let us know if there is such a plugin for Linux servers.

We are using NRPE as the agent on the Linux servers.

Re: Reboot monitoring for the Linux servers

Posted: Wed Mar 03, 2021 6:35 pm
by jdunitz
You could do something like this as a plugin script:

Code:

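#!/bin/bash
# Reboot alert: CRITICAL while the host has been up for less than 60 seconds.

# /proc/uptime looks like "822.31 1590.12"; keep the whole seconds.
# (Avoid the name SECONDS: bash reserves it for its own timer.)
uptime_seconds=$(awk -F. '{print $1}' /proc/uptime)

if [ "$uptime_seconds" -lt 60 ]; then
    result="CRITICAL"
    exitstatus=2
else
    result="OK"
    exitstatus=0
fi

echo "$result - uptime is $uptime_seconds"
exit $exitstatus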

You could modify or add on to this to make it behave how you want, if you needed it to be fancier.
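
One thing to watch: if the check only runs every 5 minutes, a 60-second window can open and close between two checks without ever being seen. Here is a minimal sketch of a variant that takes the window as an argument. The function name check_reboot and the 600-second default are my own choices for illustration, not anything Nagios or NRPE requires:

```shell
#!/bin/bash
# Reboot alert with a configurable window (sketch; names are illustrative).

# check_reboot UPTIME_SECONDS WINDOW_SECONDS
# Prints a Nagios-style status line and returns the matching exit code:
# 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
check_reboot() {
    local uptime_seconds="$1"
    local window="$2"

    # Refuse anything that is not a plain non-negative integer.
    case "$uptime_seconds" in
        ''|*[!0-9]*) echo "UNKNOWN - could not read uptime"; return 3 ;;
    esac
    case "$window" in
        ''|*[!0-9]*) echo "UNKNOWN - invalid window: $window"; return 3 ;;
    esac

    if [ "$uptime_seconds" -lt "$window" ]; then
        echo "CRITICAL - rebooted ${uptime_seconds}s ago (window ${window}s)"
        return 2
    fi
    echo "OK - uptime is ${uptime_seconds}s"
    return 0
}

# /proc/uptime looks like "822.31 1590.12"; keep the whole seconds.
# The window defaults to 600 seconds (an assumed value) and can be
# overridden by the first argument. As the last command in the script,
# the function's return code becomes the plugin's exit status.
check_reboot "$(cut -d. -f1 /proc/uptime 2>/dev/null)" "${1:-600}"
```

Called with no argument it uses a 600-second window; pass a different number of seconds as the first argument to change it. The window should be longer than the service check interval so that at least one check falls inside it.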

--Jeffrey

Re: Reboot monitoring for the Linux servers

Posted: Thu Mar 04, 2021 3:53 am
by informatica
Hi Team,

I don't know how this is supposed to work. I ran the script and tested it by restarting the server, but no luck: it is not working as expected.

[nagios@in-root]$ /usr/local/nagios/libexec/check_nrpe -H inl77 -c check_uptime_minute
OK - uptime is 822
You have new mail in /var/spool/mail/root

The server was restarted and its uptime is 6 min. The script checks for -lt 60 seconds, yet no alert is generated.
[root@test libexec]# uptime
04:40:28 up 6 min, 1 user, load average: 0.02, 0.10, 0.05

Re: Reboot monitoring for the Linux servers

Posted: Fri Mar 05, 2021 10:29 am
by jdunitz
Sorry, a couple of lines got cut off when I pasted the script. Here's the full one:

Code:

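#!/bin/bash
# Reboot alert: CRITICAL while the host has been up for less than 60 seconds.

# /proc/uptime looks like "822.31 1590.12"; keep the whole seconds.
# (Avoid the name SECONDS: bash reserves it for its own timer.)
uptime_seconds=$(awk -F. '{print $1}' /proc/uptime)

if [ "$uptime_seconds" -lt 60 ]; then
    result="CRITICAL"
    exitstatus=2
else
    result="OK"
    exitstatus=0
fi

echo "$result"
exit $exitstatus

And you can see how it works: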
After rebooting the other machine, as soon as it came back up and became reachable, I ran the check and got critical:

Code:

[root@jpd-nagiosxi-one libexec]#  ./check_nrpe -H 192.168.1.10  -2  -c check_uptime_minute
CRITICAL

Then I waited another minute or so, and reran it, and:

Code:

[root@jpd-nagiosxi-one libexec]#  ./check_nrpe -H 192.168.1.10  -2  -c check_uptime_minute
OK
[root@jpd-nagiosxi-one libexec]#
There you go!
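
On the earlier "up 6 min" result: the script compares seconds, while uptime prints minutes, so the two are easy to mix up. A small sketch for lining them up (describe_uptime is just a hypothetical helper name, not part of the plugin):

```shell
#!/bin/bash
# describe_uptime SECONDS: print the value in seconds and in whole minutes,
# to compare the plugin's number with what `uptime` shows.
describe_uptime() {
    local seconds="$1"
    echo "${seconds} seconds = $(( seconds / 60 )) min"
}

describe_uptime 360   # the "up 6 min" case from the earlier post
describe_uptime 59    # still inside the 60-second window
```

So a host that has been up 6 minutes is at roughly 360 seconds, which is well past the 60-second threshold, and the check returns OK.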

--Jeffrey