Nagios check_tcp not showing correct status on UI
Posted: Sun Jun 15, 2025 9:14 am
by pnikhade
Hi All,
I am setting up Nagios monitoring of TCP ports 5044, 9200 and 5601 on a remote host.
These ports are served by containers running Logstash, Elasticsearch and Kibana respectively.
The problem is that whenever I set up monitoring in the host .cfg file under /usr/local/nagios/etc/servers, the Nagios UI does not show "OK".
Instead it shows the message below. Why does it show this message? Please point out my mistake or suggest possible steps.
(No output on stdout) stderr: Could not resolve hostname 13.233.122.181 -p 9200: Name or service not known
Whereas if I run the command manually, I get the response below, which looks correct since all the containers are running on their respective ports.
[root@nagios-core libexec]# pwd
/usr/local/nagios/libexec
[root@nagios-core libexec]# ./check_tcp -H 13.233.122.181 -p 9200 && ./check_tcp -H 13.233.122.181 -p 5601
TCP OK - 0.001 second response time on 13.233.122.181 port 9200|time=0.001045s;;;0.000000;10.000000
TCP OK - 0.001 second response time on 13.233.122.181 port 5601|time=0.000808s;;;0.000000;10.000000
Re: Nagios check_tcp not showing correct status on UI
Posted: Sun Jun 15, 2025 9:32 pm
by kg2857
That seems odd.
Can you post the service definition as well as the command definition?
You may also want to run exactly what the command definition runs and post the output.
BTW, Nagios usually runs commands as the nagios user, not root. This is unlikely to be the issue here, but it could affect your testing in the future.
Re: Nagios check_tcp not showing correct status on UI
Posted: Sun Jun 15, 2025 10:11 pm
by pnikhade
Please see below.
Service definitions on the Nagios server:
Path - /usr/local/nagios/etc/servers/ELK-stack.cfg
define host {
use linux-server
host_name ELK-Stack
alias My client server
address 15.207.248.93
max_check_attempts 20
check_period 24x7
notification_interval 1
notification_period 24x7
}
define service {
use generic-service
host_name ELK-Stack
service_description CPU Load
check_interval 2
retry_interval 1
check_command check_nrpe!check_load -a '-w .15,.10,.05 -c .30,.25,.20'
}
define service {
use generic-service
host_name ELK-Stack
service_description NPRE service
check_interval 1
retry_interval 1
check_command check_nrpe!check_kibana '-H 15.207.248.93 -p 5666'
}
define service {
use generic-service
host_name ELK-Stack
service_description Elasticsearch service
check_interval 1
retry_interval 1
check_command check_nrpe!check_elasticsearch '-H 15.207.248.93 -p 9200'
}
define service {
use generic-service
host_name ELK-Stack
service_description Kibana service
check_interval 1
retry_interval 1
check_command check_nrpe!check_kibana '-H 15.207.248.93 -p 5061'
}
define service {
use generic-service
host_name ELK-Stack
service_description Logstash service
check_interval 1
retry_interval 1
check_command check_nrpe!check_logstash '-H 15.207.248.93 -p 5044'
}
Commands defined in the client's nrpe.cfg file:
Path - /usr/local/nagios/etc/nrpe.cfg
command[check_logstash]=/usr/local/nagios/libexec/check_tcp $ARG1$
command[check_kibana]=/usr/local/nagios/libexec/check_tcp $ARG1$
command[check_disk]=/usr/local/nagios/libexec/check_disk $ARG1$
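Nagios splits a check_command value on "!": the first field is the command name and the following fields become $ARG1$, $ARG2$, and so on. In the service definitions above there is no "!" between the NRPE command name and the quoted options, so everything after "check_nrpe!" ends up in $ARG1$ as one string (the nrpe.cfg excerpt above also shows no check_elasticsearch entry). The function below is a simplified, hypothetical sketch of that splitting; split_check_command is a name made up for illustration, and it ignores escaping of "!".

```shell
#!/bin/sh
# Hypothetical helper: split a check_command value the way Nagios assigns macros.
# First "!"-separated field = command name, the rest = $ARG1$, $ARG2$, ...
# Simplified sketch: escaped "!" characters are NOT handled.
split_check_command() {
    rest="$1"
    name="${rest%%!*}"                      # text before the first "!"
    printf 'command=%s\n' "$name"
    [ "$name" = "$rest" ] && return 0       # no "!" at all: no arguments
    rest="${rest#*!}"                       # drop "name!"
    n=1
    while :; do
        field="${rest%%!*}"                 # next field up to the following "!"
        printf 'ARG%d=%s\n' "$n" "$field"
        [ "$field" = "$rest" ] && break     # that was the last field
        rest="${rest#*!}"
        n=$((n + 1))
    done
}

# Example, using a check_command from the service definitions above:
# split_check_command "check_nrpe!check_elasticsearch '-H 15.207.248.93 -p 9200'"
```

Run against the Elasticsearch service, it prints a single ARG1 line holding both the NRPE command name and the quoted options.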
Re: Nagios check_tcp not showing correct status on UI
Posted: Sun Jun 15, 2025 10:30 pm
by kg2857
And what is the output of the following?
/usr/local/nagios/libexec/check_tcp -H 15.207.248.93 -p 5044
Your test is pointing to 13.233.122.181, but the check runs against the address above, so the test is meaningless.
Re: Nagios check_tcp not showing correct status on UI
Posted: Sun Jun 15, 2025 10:56 pm
by pnikhade
Please understand that the EC2 instance was turned off, and when it was started again it was assigned a new IP, so 15.207.248.93 is the new address. I have replaced the IP everywhere, including nrpe.cfg and the .cfg file under the servers directory.
[root@nagios-core ~]# /usr/local/nagios/libexec/check_tcp -H 15.207.248.93 -p 9200 && /usr/local/nagios/libexec/check_tcp -H 15.207.248.93 -p 5601 && /usr/local/nagios/libexec/check_tcp -H 15.207.248.93 -p 5666
TCP OK - 0.001 second response time on 15.207.248.93 port 9200|time=0.001498s;;;0.000000;10.000000
TCP OK - 0.001 second response time on 15.207.248.93 port 5601|time=0.001028s;;;0.000000;10.000000
TCP OK - 0.001 second response time on 15.207.248.93 port 5666|time=0.000913s;;;0.000000;10.000000
[root@nagios-core ~]#
Re: Nagios check_tcp not showing correct status on UI
Posted: Sun Jun 15, 2025 11:09 pm
by kg2857
You aren't testing as the service is defined.
Glad the issue is resolved.
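Testing "as the service is defined" means running the NRPE command line that Nagios builds, as the nagios user, rather than calling check_tcp directly as root. The sketch below assumes the server-side check_nrpe command is the common "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$" with $USER1$ set to /usr/local/nagios/libexec; that definition was not posted in this thread, so adjust it to match commands.cfg. reproduce_nrpe_check, LIBEXEC and RUNNER are hypothetical names. Command lines containing quotes are handed to /bin/sh, roughly as Nagios Core does for commands with shell metacharacters.

```shell
#!/bin/sh
# Hypothetical helper: re-run an NRPE-based service check the way Nagios would.
# ASSUMPTIONS (not confirmed in this thread):
#   - commands.cfg: command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
#   - $USER1$ is /usr/local/nagios/libexec (override with LIBEXEC)
reproduce_nrpe_check() {
    host_address="$1"   # value Nagios substitutes for $HOSTADDRESS$
    arg1="$2"           # everything after the first "!" in check_command
    libexec="${LIBEXEC:-/usr/local/nagios/libexec}"

    # Checks run as the nagios user, so RUNNER defaults to "sudo -u nagios".
    # Set RUNNER to an empty string to run as the current user instead.
    # The command string goes through /bin/sh, so the quotes are interpreted
    # exactly once, as they would be for the real check.
    ${RUNNER-sudo -u nagios} /bin/sh -c "$libexec/check_nrpe -H $host_address -c $arg1"
}

# Example (as root on the Nagios server):
# reproduce_nrpe_check 15.207.248.93 "check_elasticsearch '-H 15.207.248.93 -p 9200'"
```

If this reproduces the "Could not resolve hostname" error from the UI, the problem lies in how the service passes its arguments rather than in the network or the containers.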
Re: Nagios check_tcp not showing correct status on UI
Posted: Sun Jun 15, 2025 11:22 pm
by pnikhade
It is not resolved. I am still getting the message below on the UI. Please help. As I showed in my earlier reply, this works correctly from the command line, but the UI shows an error.
(No output on stdout) stderr: Could not resolve hostname 15.207.248.93 -p 9200: Name or service not known
Re: Nagios check_tcp not showing correct status on UI
Posted: Sun Jun 15, 2025 11:40 pm
by kg2857
All I can think of is to remove the single quotes in the service definition, since the system is trying to resolve '15.207.248.93 -p 9200' as a hostname.
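A getopt-style parser shows why the quotes matter. Assuming the server-side check_nrpe command is "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$" (never shown in the thread), the quoted string reaches the plugin as one argument that starts with -H. A parser then reads it as a second -H option whose value is the rest of that argument, " 15.207.248.93 -p 9200", and the later -H overrides the real address. That matches the hostname in the error message. The shell's getopts builtin stands in for the plugin's own option parsing here, which is an assumption; last_host_option is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: report which value a getopt-style parser keeps for -H.
# getopts imitates the plugin's option parsing; it is not check_nrpe itself.
last_host_option() {
    OPTIND=1            # reset so the function can be called repeatedly
    host=
    while getopts 'H:c:p:' opt; do
        case "$opt" in
            H) host="$OPTARG" ;;   # a later -H silently replaces an earlier one
            *) ;;                  # -c and -p values are ignored in this sketch
        esac
    done
    printf '[%s]\n' "$host"
}

# Quoted: the last argument is ONE word, parsed as "-H" + " 15.207.248.93 -p 9200"
# last_host_option -H 15.207.248.93 -c check_elasticsearch '-H 15.207.248.93 -p 9200'
```

The brackets in the output make the leading space visible: the resolver is asked to look up the whole string, not just the IP address.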
Re: Nagios check_tcp not showing correct status on UI
Posted: Sun Jun 15, 2025 11:56 pm
by pnikhade
The single quotes are basically there to pass the arguments, as with the "-a" flag, so I'm not sure the syntax is incorrect.
Anyway, I tried that as well, and the UI shows the following:
Usage:
Re: Nagios check_tcp not showing correct status on UI
Posted: Mon Jun 16, 2025 12:11 am
by kg2857
Try removing check_nrpe from the service definition.
command[check_logstash]=/usr/local/nagios/libexec/check_tcp $ARG1$
define service {
use generic-service
host_name ELK-Stack
service_description Logstash service
check_interval 1
retry_interval 1
check_command check_logstash '-H 15.207.248.93 -p 5044'
}