Problems with UNKNOWN messages.
-
nagiosEngie
- Posts: 104
- Joined: Thu May 03, 2018 7:57 am
Problems with UNKNOWN messages.
Hello Nagios Crew,
I'm getting a huge number of UNKNOWN messages due to:
UNKNOWN: Execution exceeded timeout threshold of 60s
UNKNOWN: Error occurred while running the plugin. Use the verbose flag for more details.
Most of these are alarms from "SWAP" and "Uptime" checks done on NCPA agents.
Some stats are in the attached file: "unkonwn messages stats.docx"
The stats were generated from this month's (October) alerts.
Some servers generate a huge amount of unknown messages.
Do you have any suggestions to limit this problem?
Thanks
Sandro
-
benjaminsmith
- Posts: 5324
- Joined: Wed Aug 22, 2018 4:39 pm
- Location: saint paul
Re: Problems with UNKNOWN messages.
Hi @nagiosEngie,
If this just started happening in October and it's intermittent, you might be experiencing some type of network connectivity or quality of service issues.
You could try raising the timeout beyond 60 seconds on the NCPA check command. Go to CCM > Commands and edit the command (for example: $USER1$/check_ncpa.py -H $HOSTADDRESS$ -T 120 $ARG1$). If that reduces the number of UNKNOWN messages, it's most likely a network issue.
Let me know what you find out.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
-
nagiosEngie
- Posts: 104
- Joined: Thu May 03, 2018 7:57 am
Re: Problems with UNKNOWN messages.
Hello,
I kept an eye on the messages related to UPTIME and SWAP. They still all time out:
From event log:
2018-11-06 15:01:10 Warning: Check of service 'Uptime' on host 'EILIBWEBITMI01' timed out after 120.007s!
Runtime Error 2018-11-06 15:01:10 wproc: host=EILIBWEBITMI01; service=Uptime;
2018-11-06 15:00:29 Warning: Check of service 'Swap Usage' on host 'EILIBWEBITMI01' timed out after 120.006s!
Runtime Error 2018-11-06 15:00:29 wproc: host=EILIBWEBITMI01; service=Swap Usage;
Sandro
Re: Problems with UNKNOWN messages.
Can you show us a few examples of "failing" checks, run from the command line, along with their output? Please use the verbose flag (-v).
Example:
Code:
/usr/local/nagios/libexec/check_ncpa.py -H <ip address> -t '<token>' -P 5693 -M memory/swap/percent -w 50 -c 80 -v
Also, run the following command and show the output:
Code:
nmap <server ip> -p 5693
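As a rough complement to the nmap check, the following Python sketch measures how long it takes to open a TCP connection to the NCPA port, which can be repeated over time to spot intermittent connectivity problems. The helper name `tcp_connect_time` and its defaults are illustrative assumptions, not part of NCPA or Nagios. It only tests that the port accepts connections and says nothing about how quickly the agent answers API requests.

```python
import socket
import time


def tcp_connect_time(host, port=5693, timeout=5.0):
    """Return seconds taken to open a TCP connection, or None on failure.

    Hypothetical helper: it only checks that the port accepts connections.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; substitute the monitored server's address.
    elapsed = tcp_connect_time("127.0.0.1")
    print("unreachable" if elapsed is None else f"connected in {elapsed:.4f}s")
```

Running this in a loop from the Nagios server around the times the checks fail would show whether the connection itself is slow or refused.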
-
nagiosEngie
- Posts: 104
- Joined: Thu May 03, 2018 7:57 am
Re: Problems with UNKNOWN messages.
Hi,
That is the problem: when I run it from the command line, I am unable to reproduce the timeout, even if I repeat the command one after the other.
I launched this 10 times in a row and the command gave the correct output with no problem.
/usr/local/nagios/libexec/check_ncpa.py -v -H <IP> -T 120 -t 'xxxxxx' -P 5693 -M memory/swap -u Gi -w 95 -c 98
Connecting to: https://<IP>:5693/api/memory/swap/?token=xxxxx&warning=95&critical=98&units=Gi&check=1
File returned contained:
{
"returncode": 0,
"stdout": "OK: Used swap was 56.90 % (Total: 9.25 GiB, Used: 5.26 GiB, Free: 3.99 GiB) | 'total'=9.25GiB;9;9; 'used'=5.26GiB;9;9; 'free'=3.99GiB;9;9;"
}
OK: Used swap was 56.90 % (Total: 9.25 GiB, Used: 5.26 GiB, Free: 3.99 GiB) | 'total'=9.25GiB;9;9; 'used'=5.26GiB;9;9; 'free'=3.99GiB;9;9;
NMAP command output:
nmap <IP> -p 5693
Starting Nmap 6.47 ( http://nmap.org ) at 2018-11-07 09:56 CET
Nmap scan report for <HOSTNAME FQDN> (<IP>)
Host is up (0.00078s latency).
PORT STATE SERVICE
5693/tcp open unknown
Nmap done: 1 IP address (1 host up) scanned in 0.06 seconds
I checked CPU usage and SWAP on the monitored server and it is OK.
Thanks
Sandro
Re: Problems with UNKNOWN messages.
This is strange. Do the timeouts happen at about the same time? Perhaps the server that you are monitoring is very busy at that time performing updates, backups, etc.
What is the version of the NCPA agent and check_ncpa.py plugin that you are currently using?
Code:
/usr/local/nagios/libexec/check_ncpa.py -H <ip address> -t '<token>' -P 5693 -M system/agent_version
/usr/local/nagios/libexec/check_ncpa.py -V
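One way to see whether the timeouts cluster at a particular time is to count the timeout warnings per hour and per check. The Python sketch below does that for lines in the format of the event log entries quoted earlier in this thread. The regular expression is an assumption derived from those two sample lines only, and `summarize_timeouts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern based on the event log lines quoted earlier in this thread.
TIMEOUT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2} "
    r"Warning: Check of service '(?P<service>[^']+)' "
    r"on host '(?P<host>[^']+)' timed out after (?P<secs>[\d.]+)s"
)


def summarize_timeouts(lines):
    """Count timeout warnings per hour of day and per (host, service)."""
    by_hour = Counter()
    by_check = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            by_hour[match["hour"]] += 1
            by_check[(match["host"], match["service"])] += 1
    return by_hour, by_check


if __name__ == "__main__":
    sample = [
        "2018-11-06 15:01:10 Warning: Check of service 'Uptime' on host 'EILIBWEBITMI01' timed out after 120.007s!",
        "2018-11-06 15:00:29 Warning: Check of service 'Swap Usage' on host 'EILIBWEBITMI01' timed out after 120.006s!",
    ]
    hours, checks = summarize_timeouts(sample)
    print(hours.most_common())
    print(checks.most_common())
```

If most of the counts fall in the same hour or two, that points toward a scheduled job such as a backup rather than a general network problem.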
-
nagiosEngie
- Posts: 104
- Joined: Thu May 03, 2018 7:57 am
Re: Problems with UNKNOWN messages.
Hello lmiltchev,
The situation is getting worse. I am getting timeouts on more and more servers.
I had a look at the event log, and in the last week alone there were 3300 timeout messages.
This is now happening on 5 different servers.
The NCPA agent I am using is 2.1.3.
Nagios has been upgraded to the latest update, 5.5.6.
check_ncpa.py, Version 1.1.3
Do you think this can be related to high load on the Nagios server? Stats are in the attached image.
Thanks
Sandro
Re: Problems with UNKNOWN messages.
Can you PM me your latest profile (Admin > System Config > System Profile > Download Profile)? We will need to review your configs and various logs.
Re: Problems with UNKNOWN messages.
For some reason nagios.log is missing from the profile. Can you PM me the log?
Also, send the ncpa_listener.log and win32service_ncpalistener.log from the Windows machine.
How long does it usually take to run these NCPA commands from the command line? Try running them several times, and time the check.
Example:
Code:
time /usr/local/nagios/libexec/check_ncpa.py -v -H <IP> -T 120 -t 'xxxxxx' -P 5693 -M memory/swap -u Gi -w 95 -c 98
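To time the check over several runs without repeating the command by hand, a small wrapper along these lines could be used. This is a minimal sketch: `time_command` is a hypothetical helper, and the 130 second limit is an arbitrary assumption chosen to sit just above the plugin's -T 120 setting.

```python
import subprocess
import sys
import time


def time_command(argv, runs=5, timeout=130):
    """Run argv `runs` times; return a list of (seconds, returncode) tuples.

    A returncode of None marks a run that exceeded `timeout` seconds.
    """
    results = []
    for _ in range(runs):
        start = time.monotonic()
        try:
            proc = subprocess.run(argv, capture_output=True, timeout=timeout)
            code = proc.returncode
        except subprocess.TimeoutExpired:
            code = None
        results.append((time.monotonic() - start, code))
    return results


if __name__ == "__main__":
    # Placeholder command; substitute the full check_ncpa.py command line.
    for seconds, code in time_command([sys.executable, "-c", "pass"], runs=3):
        print(f"{seconds:.3f}s returncode={code}")
```

Passing the check_ncpa.py command as a list of arguments and running it at the times the scheduled checks fail would show whether the manual runs are consistently fast.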
Re: Problems with UNKNOWN messages.
There are errors in the ncpa_listener.log such as this one:
2018-10-22 04:15:33,447:ERROR:database:database is locked
Traceback (most recent call last):
File "C:\ncpa\agent\listener\database.py", line 67, in add_check
OperationalError: database is locked
which means that the db is most probably corrupt.
Do the following:
1. Stop both the NCPA Listener and NCPA Passive services on the Windows machine.
2. Delete the db file - C:\Program Files (x86)\Nagios\NCPA\var\ncpa.db. It will be recreated when the services start.
3. Disable check logging in C:\Program Files (x86)\Nagios\NCPA\etc\ncpa.cfg by changing the check_logging value to zero:
Code:
check_logging = 0
Save, and exit.
4. Start the NCPA Listener and NCPA Passive services.
Let us know if this helped.
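If the same change has to be made on several machines, the config edit in step 3 could be scripted. The sketch below only rewrites the text of a config file and leaves reading and writing the file to the caller. `disable_check_logging` is a hypothetical helper, and the sample config content is made up for illustration rather than copied from a real ncpa.cfg.

```python
import re


def disable_check_logging(cfg_text):
    """Return cfg_text with any `check_logging = ...` line set to 0.

    Commented-out lines (starting with # or ;) are left untouched.
    """
    pattern = re.compile(
        r"^([ \t]*check_logging[ \t]*=[ \t]*)[^\r\n]*", re.MULTILINE
    )
    # \g<1> keeps the original key, spacing, and '=' and replaces the value.
    return pattern.sub(r"\g<1>0", cfg_text)


if __name__ == "__main__":
    # Illustrative sample only; section and option layout are assumptions.
    sample = "[general]\ncheck_logging = 1\ncheck_logging_time = 30\n"
    print(disable_check_logging(sample))
```

As in the manual steps, the services should be stopped before the file is changed and started again afterwards.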