check_icmp flawed
Posted: Fri Aug 10, 2018 3:31 am
We have a rather large installation with hunderds of hosts and thousands of checks. Sometimes check_icmp is reporting OK checking hosts actually down. After a lot of investigation we downloaded the source code of check_icmp. It has been designed to accomplish multiple tasks and it is not stricly checking icmp echo replies. It uses raw sockets and captures all icmp incoming packets. It uses the process PID (which is a field of the icmp packet) to identify its own echo replies. Unfortunately the pid_t structures it uses its a signed short int. So, with hundreds of ongoing ping checks, it seems that sometimes there can be 2 check_icmp processes with two PIDs that converted to pid_t are the same signed int. Unfortunately check_icmp is NOT even checking the sequence field in the icmp packet (which could reduce the possibility to match wrongly an echo reply).
We changed the source code a lot for strict checking synchronous icmp echo request and replies.
It seems (as reported by other users) that check_ping is not showing the flaws of check_icmp. We would suggest not to use check_icmp in production critical deployments.
We changed the source code a lot for strict checking synchronous icmp echo request and replies.
It seems (as reported by other users) that check_ping is not showing the flaws of check_icmp. We would suggest not to use check_icmp in production critical deployments.