All hosts are down - icmp check failing

Shwele · Post by **Shwele** » Thu Oct 26, 2017 9:38 am

The host itself shows as down, with this output:

Warning: This plugin must be either run as root or setuid root.

All of its services are up and report OK.

Permissions for that check are:

Code: Select all

-rwxrwxr-x 1 nagios nagios 213856 Oct 24 16:02 check_icmp*

Is there anything I'm missing here?
I'm trying to check an Azure server and also tried another server that we have; same error.

Thanks

npolovenko · Post by **npolovenko** » Thu Oct 26, 2017 12:25 pm

Hello again, @Shwele.
Please go to the plugin directory and set the setuid bit on check_icmp:

Code: Select all

cd /usr/local/nagios/libexec/
chmod u+s check_icmp

After that, restart Nagios:

Code: Select all

service nagios restart

The plugin should be working now.

Shwele · Post by **Shwele** » Fri Oct 27, 2017 2:41 am

Why hello hello, my troubleshooting buddy @npolovenko

It's not, that is the problem.

I forgot to mention that when I view the hosts, Nagios even shows the following advice, which I followed:

Code: Select all

To run as root, you can use a tool like sudo.
To set the setuid permissions, use the command:
chmod u+s yourpluginfile
check_icmp: Failed to obtain ICMP socket: Operation not permitted

As you can see the permissions are set:

[Attachment: nagios.png]

After forcing an immediate check, it's still down.
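
One possible explanation for the error persisting: the setuid bit makes a program run as the file's owner, so it only grants root privileges when the file is owned by root. The listing in the first post shows check_icmp owned by nagios. A minimal sketch for checking both conditions, where is_setuid_root is a hypothetical helper that takes the mode string and owner as printed by ls -l:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the symbolic mode has the owner
# setuid bit set AND the file owner is root.
# Usage: is_setuid_root MODE OWNER   (both as printed by `ls -l`)
is_setuid_root() {
    case $1 in
        ???s*) [ "$2" = "root" ] ;;   # 4th character is the owner-execute slot
        *)     return 1 ;;
    esac
}

# Example with the values from the first post (owner nagios, no setuid bit).
if is_setuid_root "-rwxrwxr-x" "nagios"; then
    echo "check_icmp is setuid root"
else
    echo "check_icmp is not setuid root"
fi
```

If the owner turns out not to be root, running chown root:nagios check_icmp followed by chmod u+s check_icmp as root is the usual remedy (chown clears the setuid bit, so the order matters). Treat the group name as an assumption to verify on your system.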

Is there any public server I could add as a test target? Maybe it's my provider's fault for not allowing ICMP through their routers.

npolovenko · Post by **npolovenko** » Fri Oct 27, 2017 12:43 pm

@Shwele,

Shwele wrote: "Why hello hello, my troubleshooting buddy @npolovenko"

That's right, next time I should probably say Chao, prijatelju!

Are you able to just run a ping command from your nagios server?

Code: Select all

ping server_IP

It could be that the servers you're trying to check have some firewall restrictions.

Also, if you go to /usr/local/nagios/libexec/ there should be another plugin called check_ping.
Let's see if it works:

Code: Select all

cd /usr/local/nagios/libexec/
./check_ping -H Target_Server_IP -w 1,10% -c 2,20%

If it works, we could modify the template in XI to swap check_icmp for check_ping. Both plugins do essentially the same job.
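
For readers unfamiliar with the threshold syntax: each of -w and -c takes a pair of the form round-trip-time-in-ms,packet-loss%, so -w 1,10% means "warn at 1 ms average RTT or 10% loss". The sketch below mimics that evaluation. ping_state is a hypothetical helper, and it assumes the thresholds are inclusive; it is not the plugin's actual code.

```shell
#!/bin/sh
# Hypothetical helper: classify a measurement against check_ping-style
# thresholds. Assumes inclusive comparisons (value >= threshold).
# Usage: ping_state RTA_MS LOSS_PCT WARN CRIT   e.g. WARN="1,10%" CRIT="2,20%"
ping_state() {
    awk -v rta="$1" -v pl="$2" -v w="$3" -v c="$4" 'BEGIN {
        split(w, W, ","); split(c, C, ",")      # [1] = RTA in ms, [2] = loss
        sub(/%/, "", W[2]); sub(/%/, "", C[2])  # drop the percent sign
        if (rta + 0 >= C[1] + 0 || pl + 0 >= C[2] + 0)      print "CRITICAL"
        else if (rta + 0 >= W[1] + 0 || pl + 0 >= W[2] + 0) print "WARNING"
        else                                                print "OK"
    }'
}

# A fast, loss-free host against the thresholds used in the command above.
ping_state 0.44 0 "1,10%" "2,20%"
```

With thresholds this tight, any host averaging more than 2 ms would go critical, so production templates normally use much larger values.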

Shwele · Post by **Shwele** » Mon Oct 30, 2017 4:17 am

Ćao prijatelju! @npolovenko

Global admins sure have it easy, learning where I'm from to get extra friendly.

I am able to ping the server on local hosting, but I am unable to ping the Azure server. From what I've read, they have ICMP (IP protocol 1) disabled, making it non-pingable.

As for swapping check_icmp for check_ping, I'll check whether that fixes the issue for that server at least.

Azure server output from that ping command you've proposed:

Code: Select all

PING CRITICAL - Packet loss = 100%|rta=2.000000ms;1.000000;2.000000;0.000000 pl=100%;10;20;0

Server hosted in my country:

Code: Select all

PING OK - Packet loss = 0%, RTA = 0.44 ms|rta=0.444000ms;1.000000;2.000000;0.000000 pl=0%;10;20;0
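
The part after the | in both outputs is performance data in the form label=value[unit];warn;crit;min. A minimal sketch for pulling one value out of such a line, where perf_value is a hypothetical helper and not part of Nagios:

```shell
#!/bin/sh
# Hypothetical helper: read one plugin output line on stdin and print the
# numeric value of the performance-data label given as $1.
perf_value() {
    sed -n 's/^[^|]*|//p' |                  # keep only the text after the first "|"
        tr ' ' '\n' |                        # one label=value;... item per line
        sed -n "s/^$1=\([0-9.]*\).*/\1/p"    # print the number that follows label=
}

line='PING OK - Packet loss = 0%, RTA = 0.44 ms|rta=0.444000ms;1.000000;2.000000;0.000000 pl=0%;10;20;0'
printf '%s\n' "$line" | perf_value rta
printf '%s\n' "$line" | perf_value pl
```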

Update:
After moving check_icmp aside and copying check_ping under its name, the host still appears down in the Nagios interface, even though the check works from the command line.

Ping command in Nagios XI:

Code: Select all

$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
ARG1: 100.0,20%
ARG2: 500.0,60%
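
To see what Nagios actually executes, it can help to expand the macros by hand. The sketch below does that with sed. expand_macros is a hypothetical helper, and the host address is an assumed placeholder; in a real setup $USER1$ comes from resource.cfg and $HOSTADDRESS$ from the host definition.

```shell
#!/bin/sh
# Assumed values for illustration only.
user1="/usr/local/nagios/libexec"   # plugin directory used elsewhere in this thread
hostaddress="192.0.2.10"            # documentation-range address, not a real host
arg1="100.0,20%"
arg2="500.0,60%"

# Hypothetical helper: substitute the four macros used in the command above.
expand_macros() {
    printf '%s\n' "$1" | sed \
        -e 's|\$USER1\$|'"$user1"'|g' \
        -e 's|\$HOSTADDRESS\$|'"$hostaddress"'|g' \
        -e 's|\$ARG1\$|'"$arg1"'|g' \
        -e 's|\$ARG2\$|'"$arg2"'|g'
}

# Single quotes keep the shell from touching the $...$ macros.
expand_macros '$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5'
```

Running the expanded line as the nagios user is a quick way to check whether a failure comes from the plugin itself or from the way the command is defined.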

npolovenko · Post by **npolovenko** » Mon Oct 30, 2017 11:11 am

Hello, @Shwele.
I definitely did not try to learn your location on purpose. I came across the website you posted when I was working on another thread, and because I'm fluent in your language I decided to say hello.

However, if you have a privacy concern, let me know and I'll clean up my previous posts. My apologies.

Shwele wrote: "Update: After moving check_icmp aside and copying check_ping under its name, the host still appears down in the Nagios interface, even though the check works from the command line."

Did you modify the template? Go to Configure/Core Configuration Manager, click on the host, and check which host template it uses.

[Attachment: template.png]

Usually it's linux_server.
Then navigate to Templates/Host Templates in the left column, click on the template, and change the check command. When that's done, click Apply Configuration.
You don't need to move, rename, or modify the plugin files themselves.

Now, if the Azure servers don't respond to ping, you could use yet another template with a different command of your choice for the initial host-health check. You could use check_tcp, or check_http, which requests the server's web page over HTTP; if the page responds, the host status will be UP.
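
Any plugin can serve as a host check because Nagios only looks at the exit code. The sketch below shows the mapping as documented for Nagios Core, assuming use_aggressive_host_checking is disabled and ignoring the UNREACHABLE state, which depends on parent hosts. host_state is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: map a plugin exit code to the resulting host state.
# Assumes use_aggressive_host_checking is off, so WARNING still counts as UP.
host_state() {
    case $1 in
        0|1) echo "UP" ;;     # OK or WARNING
        *)   echo "DOWN" ;;   # CRITICAL (2), UNKNOWN (3), anything else
    esac
}

host_state 0   # plugin exited OK
host_state 2   # plugin exited CRITICAL
```

To try a candidate by hand, something like ./check_tcp -H Target_Server_IP -p 22 or ./check_http -H Target_Server_IP followed by echo $? shows the exit code; the port number here is only an example.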

Shwele · Post by **Shwele** » Tue Oct 31, 2017 6:22 am

Hellows @npolovenko

Oh, so it's like that. Awesome, what website btw? You can PM too.

I've got no issues with that; it was just a joke on my part. Don't worry about it, it doesn't bother me. It's even nice.

PS: privacy is always a concern

PPS: it's OK to leave it, I don't mind it, really

OK, that did the trick! I modified the template to use an adequate check, changing it from check-host-alive to check-host-alive-http. Hosts are now alive and well.

Though now ping is causing issues with the local server, which could be due to our meddling before.

In the Nagios XI web interface the service shows as unknown with this:

Code: Select all

CRITICAL - Could not interpret output from ping command

When I run the ping command exactly as Nagios shows it in Status Information, here is the output:

Code: Select all

/usr/bin/ping -n -U -w 10 -c 5 12.34.56.789
PING 12.34.56.789 (12.34.56.789) 56(84) bytes of data.
64 bytes from 12.34.56.789: icmp_seq=1 ttl=63 time=10.4 ms
64 bytes from 12.34.56.789: icmp_seq=2 ttl=63 time=0.454 ms
64 bytes from 12.34.56.789: icmp_seq=3 ttl=63 time=5.82 ms
64 bytes from 12.34.56.789: icmp_seq=4 ttl=63 time=0.551 ms

And here is /usr/local/nagios/libexec/check_ping -H 12.34.56.789 -w 3000.0,80% -c 5000.0,100% -p 5

Code: Select all

PING OK - Packet loss = 0%, RTA = 0.75 ms|rta=0.747000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0

Also, I'm seeing a bunch of checks that have -local in their names, for example disk. Should I find some other Nagios service online for checking disks, since this one only checks locally, without getting the hostname from the host itself?

npolovenko · Post by **npolovenko** » Tue Oct 31, 2017 1:20 pm

@Shwele

Shwele wrote: "In the Nagios XI web interface the service shows as unknown with this: CRITICAL - Could not interpret output from ping command"

That's a weird issue. Can you switch to the nagios user on the command line:
su - nagios
And run the same commands:

Code: Select all

/usr/local/nagios/libexec/check_ping -H 12.34.56.789 -w 3000.0,80% -c 5000.0,100% -p 5

and this:

Code: Select all

/usr/bin/ping -n -U -w 10 -c 5 12.34.56.789

Do you get any errors/permission issues like that? If yes, you might need to run chmod u+s /bin/ping.
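
Some background on the message itself: check_ping runs the system ping binary and reads its summary lines, so "Could not interpret output" generally means it did not find a summary it recognizes, for example because ping printed a permission error instead. The sketch below extracts the loss figure the way a wrapper script might. ping_loss is a hypothetical helper, and the sample line assumes the usual Linux iputils summary format.

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print the packet-loss
# percentage from the summary line. Prints nothing if no summary is present.
ping_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Assumed sample of an iputils-style summary line.
printf '%s\n' '5 packets transmitted, 5 received, 0% packet loss, time 4005ms' | ping_loss

# An error message instead of a summary yields no output at all.
printf '%s\n' 'ping: socket: Operation not permitted' | ping_loss
```

An empty result when the command is run as the nagios user would point at permissions on the ping binary rather than at the network.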

Also, in XI go to Core Configuration Manager and click on the host that has ping errors. There you can click on "Run Check Command".

[Attachment: screenshot-192.168.4.172-2017-10-31-12-58-43-575.png]

Does that work OK? Make sure your command is defined as in my screenshot.

Shwele wrote: "Also, I'm seeing a bunch of checks that have -local in their names, for example disk. Should I find some other Nagios service online for checking disks, since this one only checks locally, without getting the hostname from the host itself?"

Can you clarify? Do you mean that all services have a config name "localhost" in Core Configuration Manager? Services will have the same name as a host. Are you asking how to monitor disk, etc. on another server? Are your servers running Windows or Linux?

Shwele · Post by **Shwele** » Wed Nov 01, 2017 3:58 am

Yeah, it looks like ping's permissions hadn't been changed. I ran chmod u+s /usr/bin/ping and now it went smoothly.

Now it's all green and well; it looks like the default permissions weren't enough. Btw, why don't you include that chmod in the Nagios XI installation, since it would save you the troubleshooting? The same goes for the ICMP issue: since the installation runs as root, shouldn't it have permission to do that?

npolovenko wrote: "Can you clarify? Do you mean that all services have a config name "localhost" in Core Configuration Manager? Services will have the same name as a host. Are you asking how to monitor disk, etc. on another server? Are your servers running Windows or Linux?"

No, I mean I copied the services I wanted from localhost to reuse them and changed which host they check. But I'm getting the same output for disk usage, which is from localhost, so I'm guessing the checks were for the server that Nagios is on.

That is, I used copy, renamed the service check to hostname_checkname, and added the previously created host. But it seems these services (some of them?) are local only and have to be done over NRPE?
Exactly that: disk, MySQL checks, Apache, etc. I think all of them so far are checking localhost, not the host I want.
All our instances are Linux servers, already up and running.

I tried going through the wizard for another server to see how that goes, and it seems the only option is to install NRPE on that server if I want it to be stalked by Nagios?

And NRPE needs it clean, so you see my troubles... Though SNMP seems like a valid option; I'll write a ticket about the issues with it.

I'll try to elaborate on a few things and make a new topic, since this one is getting roundabout.
I'll PM you the current state of the hosts, since they are no longer dummy hosts.

npolovenko · Post by **npolovenko** » Wed Nov 01, 2017 12:15 pm

Hi @Shwele,

Shwele wrote: "Btw, why don't you include that chmod in the Nagios XI installation, since it would save you the troubleshooting? The same goes for the ICMP issue: since the installation runs as root, shouldn't it have permission to do that?"

Usually, all standard plugins in XI work right out of the box without the need to change permissions. Not sure what happened, but we'll do some more QA tests to identify the issue and fix it in a future release.

Shwele wrote: "I used copy, renamed the service check to hostname_checkname, and added the previously created host. But it seems these services (some of them?) are local only and have to be done over NRPE?"

Yeah, exactly. Many checks, such as check_disk, check_processor, check_apache, cannot be run against a remote server over a standard protocol such as HTTP without installing an additional agent on it. That's where agents like NCPA or NRPE come into play. They run those plugins locally on the remote server and send the results back to Nagios using various protocols.
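
As a rough illustration of the difference, a copied local check such as check_disk inspects the machine it runs on, while an NRPE-based check asks the agent on the remote host to run a command defined there. The sketch below only builds the command line and does not execute anything. nrpe_command is a hypothetical helper, the address is a placeholder, and check_disk as an NRPE command name is an assumption: it has to be defined in the remote host's nrpe.cfg.

```shell
#!/bin/sh
# Hypothetical helper: print (but do not run) a check_nrpe invocation.
# Usage: nrpe_command REMOTE_HOST REMOTE_COMMAND_NAME
nrpe_command() {
    # Plugin directory as used elsewhere in this thread.
    printf '%s/check_nrpe -H %s -c %s\n' "/usr/local/nagios/libexec" "$1" "$2"
}

# Placeholder address from the documentation range; the command name is assumed.
nrpe_command 192.0.2.10 check_disk
```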

I'm going to close this thread as resolved but feel free to open another one if you encounter other problems.

Nagios Support Forum
