cloned Linux VM, checks stopped working
Posted: Tue Nov 12, 2019 9:50 am
by benningtonr
We had a Linux VM with several checks working fine. We cloned it, and now the checks on the cloned VM do not work.
Any ideas on where to start looking? I removed the host and all of its services from Nagios and re-added them, and the checks still fail. I am attaching a snippet of the checks. Ping is failing because the webmaster has not allowed that feature yet.
Re: cloned Linux VM, checks stopped working
Posted: Tue Nov 12, 2019 11:50 am
by mbellerue
That is strange. Can you give us your nrpe config file from the remote machine? According to the error messages, the checks just aren't defined.
Also, is the IP address of the cloned Nagios server the same as the IP address of the old Nagios server? Or has the IP address of the cloned Nagios server been added to the allowed_hosts variable in the remote server's nrpe config file?
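As a rough illustration of the allowed_hosts question, here is a minimal Python sketch that reads the text of an nrpe.cfg and checks whether a given Nagios server IP matches an allowed_hosts entry. The function name, the sample config, and the 192.0.2.x addresses are made up for the example. It only handles IP and CIDR entries and does not resolve hostnames.

```python
import ipaddress

def allowed_hosts_permits(cfg_text, client_ip):
    """Return True if client_ip matches an entry on an active allowed_hosts line."""
    client = ipaddress.ip_address(client_ip)
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip comments and unrelated settings
        if line.startswith("#") or not line.startswith("allowed_hosts"):
            continue
        _, _, value = line.partition("=")
        for entry in value.split(","):
            entry = entry.strip()
            if not entry:
                continue
            try:
                # strict=False lets entries with host bits set parse as a network
                if client in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                # Hostname entries are not resolved in this sketch
                continue
    return False

# Hypothetical nrpe.cfg fragment, not taken from the poster's machine
sample_cfg = """
# comma-separated list of hosts allowed to talk to the NRPE daemon
allowed_hosts=127.0.0.1,192.0.2.10
"""

print(allowed_hosts_permits(sample_cfg, "192.0.2.10"))
print(allowed_hosts_permits(sample_cfg, "192.0.2.99"))
```

If the second call reflects your situation (the Nagios server's address is not in the list), that would be the first thing to fix on the remote machine.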
Re: cloned Linux VM, checks stopped working
Posted: Wed Nov 13, 2019 8:19 am
by benningtonr
The IP address has been changed, and I have changed it in the settings on the Nagios server. I compared the nrpe.cfg with the one from another Linux server that is reporting normally, and they are almost identical, with the exception of the "don't blame Nagios" setting. The checks on the Nagios server have the same arguments, though. Both the working and the non-working nrpe.cfg have 127.0.0.1 as the allowed host. I also deleted all settings for this server on the Nagios server and ran the Linux wizard again, with the same results.
Re: cloned Linux VM, checks stopped working
Posted: Wed Nov 13, 2019 8:34 am
by benningtonr
Another difference between the working and the non-working host:
Not working:
[bob@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H x.x.130.28
NRPE v3.2.1
Working:
[bob@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H x.x.130.8
NRPE v2.15
Re: cloned Linux VM, checks stopped working
Posted: Wed Nov 13, 2019 3:16 pm
by mbellerue
So it looks like these are the only commands that are specified in your nrpe.cfg file.
Code: Select all
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -r -w .15,.10,.05 -c .30,.25,.20
command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
Below that a little ways are some of the default commands that you can use as an example. Some people just uncomment these commands and use them as-is.
Code: Select all
### MISC SYSTEM METRICS ###
#command[check_users]=/usr/lib/nagios/plugins/check_users $ARG1$
#command[check_load]=/usr/lib/nagios/plugins/check_load $ARG1$
#command[check_disk]=/usr/lib/nagios/plugins/check_disk $ARG1$
#command[check_swap]=/usr/lib/nagios/plugins/check_swap $ARG1$
#command[check_cpu_stats]=/usr/lib/nagios/plugins/check_cpu_stats.sh $ARG1$
#command[check_mem]=/usr/lib/nagios/plugins/custom_check_mem -n $ARG1$
But the commands are failing with an error similar to:
NRPE: Command 'check_disk' not defined
This is what the error is talking about: the commands need to be defined in the nrpe.cfg file on the remote machine.
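To make the point above concrete, here is a small Python sketch that lists the active (uncommented) command definitions in an nrpe.cfg and reports which of the requested check names are missing. The helper name, the `wanted` list, and the shortened sample config are assumptions for illustration. On a real system you would read the file contents from the remote machine's nrpe.cfg instead.

```python
import re

# Matches active (uncommented) lines of the form: command[name]=command line
COMMAND_RE = re.compile(r"^\s*command\[([^\]]+)\]\s*=\s*(.+)$")

def defined_commands(cfg_text):
    """Return a dict mapping each active command name to its command line."""
    commands = {}
    for line in cfg_text.splitlines():
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group(1)] = match.group(2).strip()
    return commands

# Shortened sample based on the definitions quoted in this thread
sample_cfg = """
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
#command[check_disk]=/usr/lib/nagios/plugins/check_disk $ARG1$
"""

# Hypothetical list of check names the Nagios server asks for
wanted = ["check_users", "check_disk"]

active = defined_commands(sample_cfg)
missing = [name for name in wanted if name not in active]
print("defined:", sorted(active))
print("missing:", missing)
```

In this sample, check_disk shows up as missing because its only definition is commented out, while check_hda1 is defined but is a different command name.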
Re: cloned Linux VM, checks stopped working
Posted: Mon Nov 18, 2019 4:42 pm
by benningtonr
Check disk is defined; it is the third one from the top. I have two Linux boxes configured the same way, and one works while the other does not. I will uncomment the sample commands to see if that helps, but why would one work and one not?
Thank you
Re: cloned Linux VM, checks stopped working
Posted: Mon Nov 18, 2019 5:23 pm
by benjaminsmith
Hello @benningtonr,
Looking over the configuration that Michael posted, you'll notice the command is defined as check_hda1, but the log shown in the initial post is trying to call check_disk.
Code: Select all
command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
When you use check_nrpe to call a check on the remote host, the command name must be defined in the nrpe.cfg file. The main difference between the sample command and the one above is that the sample command is set to take arguments for the thresholds and path.
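The difference between a fixed command line and one that takes arguments can be sketched in Python as follows. This only imitates the $ARGn$ substitution for illustration and is not NRPE's actual code. The function name is made up, and replacing a missing argument with an empty string is an assumption of this sketch. Also keep in mind that NRPE only accepts arguments from check_nrpe when dont_blame_nrpe=1 is set in nrpe.cfg and the daemon was built with command-argument support.

```python
import re

def expand_args(command_line, args):
    """Replace $ARG1$, $ARG2$, ... with the matching entries of args."""
    def substitute(match):
        index = int(match.group(1)) - 1
        # Assumption of this sketch: a missing argument becomes an empty string
        return args[index] if 0 <= index < len(args) else ""
    return re.sub(r"\$ARG(\d+)\$", substitute, command_line)

# Fixed definition: thresholds and path are hard-coded in nrpe.cfg
fixed = "/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1"
# Sample definition: thresholds and path come from the Nagios server
templated = "/usr/lib/nagios/plugins/check_disk $ARG1$"

print(expand_args(templated, ["-w 20% -c 10% -p /"]))
print(expand_args(fixed, ["-w 5% -c 1% -p /"]))
```

The templated line picks up whatever the Nagios server sends, while the fixed line ignores the arguments and always runs with the values written in nrpe.cfg.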
Have you compared the nrpe.cfg files from both systems? Other commands in this configuration file, such as check_cpu_stats or check_apt, are commented out. Try uncommenting those commands.
Code: Select all
#command[check_cpu_stats]=/usr/lib/nagios/plugins/check_cpu_stats.sh $ARG1$
#command[check_init_service]=sudo /usr/lib/nagios/plugins/check_init_service $ARG1$
Let us know if making those changes fixes the issue.