Reboot monitoring for the Linux servers

Post by **informatica** » Wed Mar 03, 2021 2:28 am

Hi Team,

We would like to monitor reboots of our Linux servers.
For example, if a server reboots, we only find out about 5 minutes later, because of the standard Nagios settings: a check every 5 minutes, a retry interval of 1 minute, and 5 max attempts.

But we need an alert whenever a server is rebooted or restarted.
Is there a service configuration that alerts us when a server has been rebooted, without changing the host check interval?
Please let us know if such a plugin exists for Linux servers.

We are using NRPE as the agent on the Linux servers.

Post by **jdunitz** » Wed Mar 03, 2021 6:35 pm

You could do something like this as a plugin script:

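Code:

#!/bin/bash
# Reboot alert: CRITICAL while the host has been up for less than 60 seconds.

# The first field of /proc/uptime is seconds since boot; drop the fractional part.
# (SECONDS is a special bash variable, so a different name is used here.)
uptime_seconds=$(awk -F. '{print $1}' /proc/uptime)

if [ "$uptime_seconds" -lt 60 ]; then
    result="CRITICAL"
    exitstatus=2
else
    result="OK"
    exitstatus=0
fi

echo "$result - uptime is $uptime_seconds"
exit $exitstatus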

You could modify or add on to this to make it behave how you want, if you needed it to be fancier.
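
One way to make the check above fancier is to take the reboot window as an argument rather than hard-coding 60 seconds. With a 5-minute check interval, a 60-second window can easily fall between two checks, so a window at least as long as the interval (600 seconds below) seems a safer default. The following is only a sketch: the function name `check_reboot`, the default of 600, and the optional second argument for overriding `/proc/uptime` are assumptions for illustration, not part of the script above.

```bash
#!/bin/bash
# Reboot alert with a configurable window (a sketch, not the original plugin).

# check_reboot [WINDOW_SECONDS] [UPTIME_FILE]
# Prints a Nagios-style status line and returns 0 (OK), 2 (CRITICAL) or 3 (UNKNOWN).
# UPTIME_FILE defaults to /proc/uptime; the override is a hypothetical hook for testing.
check_reboot() {
    local window="${1:-600}"
    local uptime_file="${2:-/proc/uptime}"
    local uptime_seconds rest

    if [ ! -r "$uptime_file" ]; then
        echo "UNKNOWN - cannot read $uptime_file"
        return 3
    fi

    # /proc/uptime looks like "822.41 1603.77"; keep the whole seconds of the first field.
    read -r uptime_seconds rest < "$uptime_file"
    uptime_seconds="${uptime_seconds%%.*}"

    # Reject anything that is not a plain non-negative integer.
    case "$uptime_seconds" in
        ''|*[!0-9]*)
            echo "UNKNOWN - unexpected content in $uptime_file"
            return 3
            ;;
    esac

    if [ "$uptime_seconds" -lt "$window" ]; then
        echo "CRITICAL - rebooted ${uptime_seconds}s ago (window ${window}s)"
        return 2
    fi

    echo "OK - uptime is ${uptime_seconds}s"
    return 0
}

# When saved as a plugin script, finish with:
#   check_reboot "${1:-600}"; exit $?
```

Called with a window of 600, this would report CRITICAL on any check that runs within 10 minutes of boot and OK afterwards.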

--Jeffrey

Post by **informatica** » Thu Mar 04, 2021 3:53 am

Hi Team,

I don't know how this is supposed to work. I ran the script and tested it by restarting the server, but no luck: it is not working as expected.

[nagios@in-root]$ /usr/local/nagios/libexec/check_nrpe -H inl77 -c check_uptime_minute
OK - uptime is 822
You have new mail in /var/spool/mail/root

The server was restarted and its uptime is 6 minutes. The script compares the uptime against 60 seconds (-lt 60), yet no alert was generated.
[root@test libexec]# uptime
04:40:28 up 6 min, 1 user, load average: 0.02, 0.10, 0.05

Post by **jdunitz** » Fri Mar 05, 2021 10:29 am

Sorry, a couple lines got cut off when I pasted the script. Here's the full one:

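Code:

#!/bin/bash
# Reboot alert: CRITICAL while the host has been up for less than 60 seconds.

# The first field of /proc/uptime is seconds since boot; drop the fractional part.
# (SECONDS is a special bash variable, so a different name is used here.)
uptime_seconds=$(awk -F. '{print $1}' /proc/uptime)

if [ "$uptime_seconds" -lt 60 ]; then
    result="CRITICAL"
    exitstatus=2
else
    result="OK"
    exitstatus=0
fi

echo "$result"
exit $exitstatus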

And you can see how it works:
After rebooting the other machine, as soon as it came back up and became reachable, I ran the check and got critical:

Code:

[root@jpd-nagiosxi-one libexec]#  ./check_nrpe -H 192.168.1.10  -2  -c check_uptime_minute
CRITICAL

Then I waited another minute or so, and reran it, and:

Code:

[root@jpd-nagiosxi-one libexec]#  ./check_nrpe -H 192.168.1.10  -2  -c check_uptime_minute
OK
[root@jpd-nagiosxi-one libexec]#

There you go!
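
For `check_nrpe -c check_uptime_minute` to work as in the runs above, the NRPE agent on the monitored host needs a matching command definition, and a dedicated service on the Nagios server can then poll it more often than the host check, leaving the host check interval unchanged. The fragment below is a sketch under assumptions: the script path, the `generic-service` template, the service description, and a `check_nrpe` command that takes the NRPE command name as its first argument are not given in this thread and would need to match your setup.

```cfg
# On the monitored host, in nrpe.cfg (the script path is an assumption):
command[check_uptime_minute]=/usr/local/nagios/libexec/check_uptime_minute.sh

# On the Nagios server (template and check_nrpe command definition are assumptions):
define service {
    use                     generic-service
    host_name               inl77
    service_description     Reboot
    check_command           check_nrpe!check_uptime_minute
    check_interval          1
    max_check_attempts      1
}
```

Assuming the default `interval_length` of 60 seconds, this checks once a minute. A check that runs late could still miss a 60-second reboot window, so a somewhat longer window in the script may be worth considering.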

--Jeffrey

Nagios Support Forum
