Nagios perform command on server down
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Nagios perform command on server down
Can you post your nsc.ini from server B (blocking out any sensitive data)?
Re: Nagios perform command on server down
Hi Scott,
I already got it to work. I had put my command under an [External Script] section instead of [External Scripts]. Now it works when I turn off server A. However, server A is an MSSQL server, and I actually want Nagios to perform the action as soon as the MSSQL Server Agent stops working on server A. I have already added the MSSQL server check for server A in Nagios; it connects to the MSSQL server using an MSSQL username and password. As soon as I take the MSSQL Server Agent down, Nagios reports this service as CRITICAL (down). That is when I want Nagios to run the event handler, instead of waiting until the whole server is down.
Could you please describe how to configure the event handler to run when this specific service check fails?
Thank you for all your kind help so far.
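For readers hitting the same problem: in the NSClient++ 0.3.x nsc.ini format, scripts are registered as `alias=command` lines under a section named exactly [External Scripts], with the CheckExternalScripts module loaded alongside the NRPE listener. A minimal sketch of the relevant parts, where the alias `rename_hosts` and the script path are hypothetical:

```ini
[modules]
; NRPE listener so Nagios can call the agent, plus the external-script module
NRPEListener.dll
CheckExternalScripts.dll

[External Scripts]
; alias=command line; the alias is the name check_nrpe -c refers to
rename_hosts=scripts\rename_hosts.bat
```

On the Nagios side, that alias would then be invoked with something like `check_nrpe -H <server B> -c rename_hosts`.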
scottwilkerson
Re: Nagios perform command on server down
You can add the event handler to a service the same way you added it to the host, like so:
Go to Configure -> Core Config Manager -> Services, click Modify on the service, open the Check Settings tab and set:
Event handler = run_script_on_b_when_a_down
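If you maintain plain Nagios Core object files rather than using CCM, the setting above corresponds to the `event_handler` directive in the service definition. One detail worth knowing: Nagios runs an event handler while a service is in a SOFT problem state, when it first enters a HARD problem state, and when it recovers, not only when it goes CRITICAL. Below is a minimal Python sketch of a wrapper that only calls server B on a HARD CRITICAL state. It assumes the event handler command passes `$SERVICESTATE$ $SERVICESTATETYPE$` as arguments; the script name, the check_nrpe path, server B's address and the `rename_hosts` alias are all hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical Nagios event handler: call server B only on a HARD CRITICAL state.

Assumed command definition (names and paths are illustrative only):
  command_line /usr/local/nagios/libexec/eventhandlers/run_on_b.py $SERVICESTATE$ $SERVICESTATETYPE$
"""
import subprocess
import sys
from typing import List

# Assumed locations and names; adjust to your installation.
CHECK_NRPE = "/usr/local/nagios/libexec/check_nrpe"
SERVER_B = "192.0.2.20"        # documentation address standing in for server B
REMOTE_ALIAS = "rename_hosts"  # alias registered under [External Scripts] on server B


def should_run(state: str, state_type: str) -> bool:
    """Act only once the service has reached a confirmed (HARD) CRITICAL state."""
    return state == "CRITICAL" and state_type == "HARD"


def main(argv: List[str]) -> int:
    if len(argv) < 3:
        print("usage: run_on_b.py STATE STATETYPE", file=sys.stderr)
        return 3
    state, state_type = argv[1], argv[2]
    if not should_run(state, state_type):
        return 0  # SOFT retries, WARNING and recoveries are deliberately ignored
    # Ask NSClient++ on server B to run the registered script
    result = subprocess.run([CHECK_NRPE, "-H", SERVER_B, "-c", REMOTE_ALIAS], check=False)
    return result.returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Acting only on HARD CRITICAL avoids firing on a single failed check that recovers on retry. If you want the handler to act on the very first failure, one option is to set `max_check_attempts` to 1 on that service, so that the first CRITICAL result is already a HARD state.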
Re: Nagios perform command on server down
Thank you Scott.
I am facing another problem now. The .bat file that NSClient++ executes only takes effect on the machine where the .bat file is located. For example, the .bat file contains commands that rename the hosts file in the system32 folder on localhost and also on several other hosts in the LAN. When I run the script by hand, the hosts files on all the hosts in the LAN are renamed. But when Nagios calls the script, only the hosts file on the host where the script is located gets renamed.
I see the following in nsclient.log at the time the script runs:
error:modules\NRPEListener\NRPEListener.cpp:465: Truncating return data as it is bigger then NRPE allows
Can anyone tell me what this error means or how to fix it? When Nagios calls the script, I need it to perform all the commands in it, not only the first one on localhost.
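The log line quoted above concerns output size: the classic NRPE v2 protocol limits the text a command can return to roughly 1 KB, and NSClient++ truncates anything longer. That should only affect what Nagios receives, not which commands the script runs, but a script that prints one short summary line avoids the truncation and also reports per-host failures instead of hiding them. A minimal Python sketch of that idea (a swapped-in alternative to the original .bat, not the poster's script), assuming the remote hosts files are reachable through administrative UNC shares; the host name `hostB` and the 1000-character cap are hypothetical:

```python
import os
import sys
from typing import Iterable, List, Tuple

# Assumed safety margin below the ~1 KB NRPE v2 output limit. It counts
# characters, so it assumes the message is mostly ASCII.
MAX_OUTPUT = 1000

# Hypothetical targets: the local hosts file plus one remote host via its admin share.
TARGETS = [
    r"C:\Windows\System32\drivers\etc\hosts",
    r"\\hostB\C$\Windows\System32\drivers\etc\hosts",
]


def rename_all(paths: Iterable[str], suffix: str = ".bak") -> Tuple[int, str]:
    """Rename each file to <path><suffix> and summarise the result in one line.

    Returns (exit_code, message) using Nagios conventions: 0 = OK, 2 = CRITICAL.
    """
    renamed = 0
    failures: List[str] = []
    for path in paths:
        try:
            os.rename(path, path + suffix)
            renamed += 1
        except OSError as exc:
            # A permissions problem on a remote share surfaces here instead of silently
            failures.append(f"{path}: {exc.strerror}")
    if failures:
        message = f"CRITICAL: {renamed} renamed, {len(failures)} failed ({'; '.join(failures)})"
        return 2, message[:MAX_OUTPUT]  # keep the output under the assumed cap
    return 0, f"OK: {renamed} hosts files renamed"


if __name__ == "__main__":
    code, message = rename_all(TARGETS)
    print(message)
    sys.exit(code)
```

Because every failure is listed in the single status line, an access-denied error on a remote share would show up in Nagios rather than disappearing.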
scottwilkerson
Re: Nagios perform command on server down
I'm not positive, but my guess is that the account NSClient++ runs under doesn't have permission to change files on the other machines. You may need to run NSClient++ as a different user.
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Nagios perform command on server down
error:modules\NRPEListener\NRPEListener.cpp:465: Truncating return data as it is bigger then NRPE allows
See http://nsclient.org/nscp/ticket/133
Re: Nagios perform command on server down
Hi Slansing,
I found that even when I place the commands that rename the hosts file on localhost at the bottom of my script, they are still the only ones that get executed. The earlier commands, which rename the hosts file on remote hosts in the LAN, do not run. So I assume this has nothing to do with the maximum output length; otherwise, how could the whole script be skipped while only the last two lines, which rename the file on localhost, are executed?
Before I forget: it doesn't matter where I place the commands for localhost. Even if I place them in the middle of my script, they are still the only ones that get executed. None of the commands for remote hosts run.
I cannot understand why. NSClient++ doesn't have to do anything except call the .bat file; the real work is done by the batch file.
scottwilkerson
Re: Nagios perform command on server down
Did you check this?
scottwilkerson wrote: "I'm not positive, but I am guessing that the user that NSClient++ is running under doesn't have permissions to change the files on other machines. You may need to have NSClient++ run as a different user..."
By default, NSClient++ runs under the Local System account, which likely doesn't have access to your remote machines. To change this (or at least test it), you can:
1. Open Services.
2. Right-click NSClient++.
3. Choose Properties.
4. Open the Log On tab.
5. Change it to the account that you log on as, the one that CAN run the commands on the other server.
6. Click OK.
7. Restart NSClient++.
Then try again.
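The same Log On change can also be scripted, which helps when several servers need it. A minimal Python sketch that builds the equivalent Windows commands (`sc config` to set the logon account, then `net stop`/`net start` to restart the service). The service name `NSClientpp` is an assumption (check the "Service name" field in the service's Properties dialog, since other releases may use a different name such as `nscp`), and the `.\nagiosuser` account is hypothetical:

```python
import getpass
import subprocess
import sys
from typing import List

# Assumed service name; confirm it in the service's Properties dialog.
SERVICE_NAME = "NSClientpp"


def logon_commands(service: str, account: str, password: str) -> List[List[str]]:
    """Return the commands that mirror the Services-console steps above."""
    return [
        # sc.exe expects "obj=" and "password=" as separate tokens before each value
        ["sc", "config", service, "obj=", account, "password=", password],
        # net stop/start wait for the service to change state, unlike sc stop/start
        ["net", "stop", service],
        ["net", "start", service],
    ]


if __name__ == "__main__":
    account = sys.argv[1] if len(sys.argv) > 1 else r".\nagiosuser"  # hypothetical account
    secret = getpass.getpass(f"Password for {account}: ")
    for cmd in logon_commands(SERVICE_NAME, account, secret):
        subprocess.run(cmd, check=True)  # run from an elevated (administrator) prompt
```

Two caveats: the password is briefly visible in the process list while `sc` runs, and, unlike the Services console, `sc config` may not grant the account the "Log on as a service" right automatically, so check that right if the service fails to start.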
Re: Nagios perform command on server down
Thank you Scott! You saved my day.
Works like a charm.
I wasn't sure this would work because the servers are not in a domain, only on the same LAN. But creating the same account on every server and setting the NSClient++ service to log on as .\<username> worked.
scottwilkerson
Re: Nagios perform command on server down
No problem. Glad it worked for you.