Problems with UNKNOWN messages.

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
nagiosEngie
Posts: 104
Joined: Thu May 03, 2018 7:57 am

Post by nagiosEngie »

Hello Nagios Crew,
I'm getting a huge number of UNKNOWN messages due to:

UNKNOWN: Execution exceeded timeout threshold of 60s
UNKNOWN: Error occurred while running the plugin. Use the verbose flag for more details.

Most of these alerts come from the "SWAP" and "Uptime" checks run against NCPA agents.
Some stats are in the attached file: "unkonwn messages stats.docx"

The stats were generated from this month's (October) alerts.
Some servers generate a huge number of UNKNOWN messages.
Do you have any suggestions to limit this problem?

Thanks
Sandro
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Problems with UNKNOWN messages.

Post by benjaminsmith »

Hi @nagiosEngie

If this just started happening in October and it's intermittent, you might be experiencing some type of network connectivity or quality of service issues.

You could try raising the timeout beyond 60 seconds on the NCPA check command. Go to CCM > _Commands and edit the command line ($USER1$/check_ncpa.py -H $HOSTADDRESS$ -T 120 $ARG1$). If that reduces the number of UNKNOWN messages, it's most likely a network issue.

Let me know what you find out.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
nagiosEngie
Posts: 104
Joined: Thu May 03, 2018 7:57 am

Re: Problems with UNKNOWN messages.

Post by nagiosEngie »

Hello,
I kept an eye on the messages related to UPTIME and SWAP. They still all time out:

From event log:

2018-11-06 15:01:10 Warning: Check of service 'Uptime' on host 'EILIBWEBITMI01' timed out after 120.007s!
Runtime Error 2018-11-06 15:01:10 wproc: host=EILIBWEBITMI01; service=Uptime;

2018-11-06 15:00:29 Warning: Check of service 'Swap Usage' on host 'EILIBWEBITMI01' timed out after 120.006s!
Runtime Error 2018-11-06 15:00:29 wproc: host=EILIBWEBITMI01; service=Swap Usage;

Sandro
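
To see which hosts and services account for most of the timeouts, the event log lines quoted above can be tallied with a short script. This is only a sketch: the regular expression is written against the two "timed out after" lines shown in this post, and the function and variable names are made up for the illustration.

```python
import re
from collections import Counter

# Pattern written against the event log lines quoted in this thread
# (assumption: other Nagios versions may word the message differently).
TIMEOUT_RE = re.compile(
    r"Check of service '(?P<service>[^']+)' on host '(?P<host>[^']+)' "
    r"timed out after (?P<seconds>[\d.]+)s"
)


def count_timeouts(lines):
    """Return a Counter keyed by (host, service) for every timeout line."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("service"))] += 1
    return counts


# The sample lines are copied from the post above.
SAMPLE = [
    "2018-11-06 15:01:10 Warning: Check of service 'Uptime' on host "
    "'EILIBWEBITMI01' timed out after 120.007s!",
    "Runtime Error 2018-11-06 15:01:10 wproc: host=EILIBWEBITMI01; service=Uptime;",
    "2018-11-06 15:00:29 Warning: Check of service 'Swap Usage' on host "
    "'EILIBWEBITMI01' timed out after 120.006s!",
]

if __name__ == "__main__":
    for (host, service), n in count_timeouts(SAMPLE).most_common():
        print(f"{host}\t{service}\t{n}")
```

Feeding it the lines of a full log file instead of SAMPLE would show whether the timeouts cluster on a few hosts or are spread evenly.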
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Problems with UNKNOWN messages.

Post by lmiltchev »

Can you show us a few examples of "failing" checks, run from the command line, along with their output? Please use the verbose flag (-v).

Example:

Code:

/usr/local/nagios/libexec/check_ncpa.py -H <ip address> -t '<token>' -P 5693 -M memory/swap/percent -w 50 -c 80 -v

Also, run the following command and show the output:

Code:

nmap <server ip> -p 5693
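
If nmap is not installed on the Nagios server, roughly the same reachability test can be done with a plain TCP connect. The sketch below is only an approximation of what the nmap scan reports: it tells you whether something accepts connections on the port, nothing more. The function name is made up, and 5693 is the port used in the commands in this thread.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the IP of the monitored server.
    print(is_port_open("127.0.0.1", 5693))
```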
nagiosEngie
Posts: 104
Joined: Thu May 03, 2018 7:57 am

Re: Problems with UNKNOWN messages.

Post by nagiosEngie »

Hi,
That is the problem: if I run it from the command line, I am unable to reproduce the timeout, even if I repeat the command one run after another.
I launched this 10 times in a row and the command gave the correct output with no problem.

/usr/local/nagios/libexec/check_ncpa.py -v -H <IP> -T 120 -t 'xxxxxx' -P 5693 -M memory/swap -u Gi -w 95 -c 98
Connecting to: https://<IP>:5693/api/memory/swap/?token=xxxxx&warning=95&critical=98&units=Gi&check=1
File returned contained:
{
"returncode": 0,
"stdout": "OK: Used swap was 56.90 % (Total: 9.25 GiB, Used: 5.26 GiB, Free: 3.99 GiB) | 'total'=9.25GiB;9;9; 'used'=5.26GiB;9;9; 'free'=3.99GiB;9;9;"
}
OK: Used swap was 56.90 % (Total: 9.25 GiB, Used: 5.26 GiB, Free: 3.99 GiB) | 'total'=9.25GiB;9;9; 'used'=5.26GiB;9;9; 'free'=3.99GiB;9;9;

NMAP command output:

nmap <IP> -p 5693
Starting Nmap 6.47 ( http://nmap.org ) at 2018-11-07 09:56 CET
Nmap scan report for <HOSTNAME FQDN> (<IP>)
Host is up (0.00078s latency).
PORT STATE SERVICE
5693/tcp open unknown

Nmap done: 1 IP address (1 host up) scanned in 0.06 seconds

I checked CPU usage and SWAP on the monitored server and it is OK.
Thanks
Sandro
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Problems with UNKNOWN messages.

Post by lmiltchev »

"That is the problem: if I run it from the command line, I am unable to reproduce the timeout, even if I repeat the command one run after another."

This is strange. Do the timeouts happen at about the same time? Perhaps the server that you are monitoring is very busy at that time performing updates, backups, etc.

What is the version of the NCPA agent and check_ncpa.py plugin that you are currently using?

Code:

/usr/local/nagios/libexec/check_ncpa.py -H <ip address> -t '<token>' -P 5693 -M system/agent_version
/usr/local/nagios/libexec/check_ncpa.py -V

Can you show us the 'Uptime' and 'Swap Usage' service configs on host 'EILIBWEBITMI01'? Please obfuscate sensitive data.
nagiosEngie
Posts: 104
Joined: Thu May 03, 2018 7:57 am

Re: Problems with UNKNOWN messages.

Post by nagiosEngie »

Hello lmiltchev,
The situation is getting worse. I am getting timeouts on more and more servers.
I had a look at the event log, and in the last week alone I have 3300 timeout messages.
This is now happening on 5 different servers.

The NCPA agent I am using is 2.1.3.
Nagios has been upgraded to the latest update, 5.5.6.
check_ncpa.py, Version 1.1.3
Do you think this could be related to high load on the Nagios server? Stats are in the attached image.

Thanks

Sandro
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Problems with UNKNOWN messages.

Post by lmiltchev »

Can you PM me your latest profile (Admin > System Config > System Profile > Download Profile)? We will need to review your configs and various logs.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Problems with UNKNOWN messages.

Post by lmiltchev »

For some reason nagios.log is missing from the profile. Can you PM me the log?

Also, send the ncpa_listener.log and win32service_ncpalistener.log from the Windows machine.

How long does it usually take to run these NCPA commands from the command line? Try running them several times, and time the check.

Example:

Code:

time /usr/local/nagios/libexec/check_ncpa.py -v -H <IP> -T 120 -t 'xxxxxx' -P 5693 -M memory/swap -u Gi -w 95 -c 98
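
To repeat the timing several times without retyping the command, a small wrapper like the one below could be used. It is only a sketch: the function name, the run count, and the default timeout are arbitrary choices, and the command in the example is a harmless placeholder that should be swapped for the full check_ncpa.py command line above.

```python
import subprocess
import sys
import time


def time_command(argv, runs=5, timeout=120):
    """Run argv several times.

    Returns a list of (elapsed_seconds, returncode) tuples; returncode is
    None when the run was killed for exceeding `timeout` seconds.
    """
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        try:
            proc = subprocess.run(
                argv,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            code = proc.returncode
        except subprocess.TimeoutExpired:
            code = None
        results.append((time.perf_counter() - start, code))
    return results


if __name__ == "__main__":
    # Placeholder command: replace with the check_ncpa.py arguments as a list.
    command = [sys.executable, "-c", "pass"]
    for seconds, code in time_command(command, runs=3):
        print(f"{seconds:.3f}s returncode={code}")
```

A run that comes back with returncode None hit the timeout, which is the case worth comparing against the times logged by Nagios.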
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Problems with UNKNOWN messages.

Post by lmiltchev »

There are errors in the ncpa_listener.log such as this one:
2018-10-22 04:15:33,447:ERROR:database:database is locked
Traceback (most recent call last):
File "C:\ncpa\agent\listener\database.py", line 67, in add_check
OperationalError: database is locked
which means that most probably the db is corrupt.
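
For context on the error text itself: "database is locked" is the message Python's sqlite3 module raises when a write cannot obtain the lock because another connection is holding it. The sketch below reproduces that message with a throwaway database. It assumes nothing about how NCPA uses its db file; the table name, file name, and function name are made up for the demonstration.

```python
import os
import sqlite3
import tempfile


def provoke_locked_error():
    """Return the error text raised when a second writer hits a held lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")  # throwaway file, not ncpa.db
        holder = sqlite3.connect(path, isolation_level=None)
        holder.execute("CREATE TABLE checks (name TEXT)")
        # timeout=0 makes the second connection fail at once instead of waiting.
        writer = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            # The first connection takes the write lock and keeps it.
            holder.execute("BEGIN IMMEDIATE")
            holder.execute("INSERT INTO checks VALUES ('swap')")
            try:
                writer.execute("INSERT INTO checks VALUES ('uptime')")
            except sqlite3.OperationalError as exc:
                return str(exc)
            return None
        finally:
            writer.close()
            holder.close()


if __name__ == "__main__":
    print(provoke_locked_error())
```

The same message can therefore show up when two writers compete for the file, so it is worth checking whether the errors line up with the busiest check periods.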

Do the following:

1. Stop both the NCPA Listener and NCPA Passive services on the Windows machine.

2. Delete the db file - C:\Program Files (x86)\Nagios\NCPA\var\ncpa.db. It will be recreated when the services start.

3. Disable check logging in C:\Program Files (x86)\Nagios\NCPA\etc\ncpa.cfg by changing the check_logging value to zero:

Code:

check_logging = 0

Save and exit.

4. Start the NCPA Listener and NCPA Passive services.

Let us know if this helped.
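
If the same change has to be made on several machines, step 3 can be scripted. The following is a minimal sketch that rewrites only the check_logging line and leaves every other line alone. The function name and the sample config text (including the [general] section header and the check_logging_time line) are assumptions made for the illustration, so compare them against your own ncpa.cfg before relying on it.

```python
import re

# Matches a "check_logging = <value>" line; it does not match other
# settings whose names merely start with "check_logging".
_CHECK_LOGGING_RE = re.compile(
    r"^([ \t]*check_logging[ \t]*=[ \t]*)[^\r\n]*", re.MULTILINE
)


def disable_check_logging(cfg_text):
    """Return cfg_text with check_logging set to 0; raise if it is absent."""
    new_text, replaced = _CHECK_LOGGING_RE.subn(r"\g<1>0", cfg_text)
    if replaced == 0:
        raise ValueError("no check_logging setting found")
    return new_text


if __name__ == "__main__":
    # Hypothetical config fragment used only to show the effect.
    sample = "[general]\ncheck_logging = 1\ncheck_logging_time = 30\n"
    print(disable_check_logging(sample))
```

The services still need to be stopped before the edit and started afterwards, as in steps 1 and 4.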
Locked