Hi Nagios,
On a few servers, we recently upgraded the kernel & all packages to the latest versions. Shortly thereafter, we lost the ability to gather data on these hosts via NCPA. Checks against NCPA return "error": "Referencing node that does not exist: ".
I restarted ncpa_listener & ncpa_passive & checked that the fw rule for 5693 is still in place. What do you suggest as the next step in troubleshooting?
Thanks,
Maxwell Ramirez
NCPA checks after upgrade
-
Maxwellb99
- Posts: 97
- Joined: Tue Jan 26, 2016 5:29 pm
Re: NCPA checks after upgrade
Can you navigate to https://gcxsp0013:5693/api/cpu/percent from your desktop?
If so, what happens if you run curl -k "https://gcxsp0013:5693/api/cpu/percent/ ... <YourToken> from the Nagios server's command line?
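For anyone scripting this test, here is a minimal Python sketch that separates "the agent answered with an error" from "the agent answered with data". The function name and the sample success body are hypothetical; only the error body is taken from this thread.

```python
import json


def describe_ncpa_response(body):
    """Summarize a JSON body returned by the agent's API."""
    data = json.loads(body)
    # An "error" key means the request reached the agent, which rules out
    # the network and firewall, but the agent could not serve the node.
    if isinstance(data, dict) and "error" in data:
        return "agent reachable, but it reported: " + str(data["error"])
    return "agent returned data: " + json.dumps(data, sort_keys=True)


# Error body as quoted in the first post of this thread.
error_body = '{"error": "Referencing node that does not exist: "}'
# Hypothetical success body, for illustration only.
data_body = '{"percent": [[1.5], "%"]}'

print(describe_ncpa_response(error_body))
print(describe_ncpa_response(data_body))
```

Paste in whatever body curl prints. An error body still proves that the connection and port are fine.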
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
-
Maxwellb99
- Posts: 97
- Joined: Tue Jan 26, 2016 5:29 pm
Re: NCPA checks after upgrade
Hi, yeah, same result; that's kind of how we ruled out a layer 4 issue.
The request gets there and talks to the agent, which then tells me "referencing Node that doesn't exist" for any check. Just to be sure, I changed the hostname in the request & it returns results. For whatever reason, after the upgrade the agent no longer recognizes the node endpoints.
Re: NCPA checks after upgrade
Alright, let's crank up the logging on NCPA, make a request, and see what we get. In /usr/local/ncpa/etc/ncpa.cfg, find the loglevel = entry for the ncpa_listener, and set that to debug. Restart the ncpa_listener service, tail -f /usr/local/ncpa/var/log/ncpa_listener.log and make a call to the API. Let's see what it gives us.
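If you would rather script the config change than edit it by hand, here is a rough Python sketch. It assumes the listener's loglevel entry lives in a section named [listener]. That section name, the function name, and the sample config text are assumptions, so check them against your own ncpa.cfg.

```python
import re


def set_loglevel(cfg_text, section="listener", level="debug"):
    """Return cfg_text with loglevel set to `level` inside [section].

    All other lines, including comments, are passed through untouched.
    """
    out = []
    current = None
    for line in cfg_text.splitlines():
        header = re.match(r"\s*\[(.+?)\]\s*$", line)
        if header:
            current = header.group(1)
        elif current == section and re.match(r"\s*loglevel\s*=", line):
            line = "loglevel = " + level
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical config fragment, for illustration only.
sample = "[listener]\nport = 5693\nloglevel = warning\n[passive]\nloglevel = warning\n"
print(set_loglevel(sample))
```

Only the loglevel line in the named section changes. You still need to write the result back to the file and restart the ncpa_listener service.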
-
Maxwellb99
- Posts: 97
- Joined: Tue Jan 26, 2016 5:29 pm
Re: NCPA checks after upgrade
Hi,
I attached the log (with our token redacted). It shows a ValueError raised in the _pslinux.py file.
2020-02-10 18:09:13,733 1059 ERROR not sure how to interpret line ' 8 0 sda 418304 109 4737620 2771986 71413189 4660773 2260171374 2611701609 0 60283969 2578429714 0 0 0 0 0 0\n'
Traceback (most recent call last):
File "/root/ncpa/agent/listener/server.py", line 931, in api
File "/root/ncpa/agent/listener/psapi.py", line 279, in getter
File "/root/ncpa/agent/listener/psapi.py", line 260, in refresh
File "/root/ncpa/agent/listener/psapi.py", line 230, in get_root_node
File "/root/ncpa/agent/listener/psapi.py", line 162, in get_disk_node
File "/usr/local/lib/python2.7/site-packages/psutil/__init__.py", line 2169, in disk_io_counters
File "/usr/local/lib/python2.7/site-packages/psutil/_pslinux.py", line 1120, in disk_io_counters
File "/usr/local/lib/python2.7/site-packages/psutil/_pslinux.py", line 1093, in read_procfs
ValueError: not sure how to interpret line ' 8 0 sda 418304 109 4737620 2771986 71413189 4660773 2260171374 2611701609 0 60283969 2578429714 0 0 0 0 0 0\n'
Re: NCPA checks after upgrade
Excellent, thank you for that. Can you give me the output of cat /proc/diskstats and uname -a? That line for sda looks unusually long: it contains 20 fields, and _pslinux.py looks like it only knows how to deal with up to 18. It's starting to look like the version of psutil's _pslinux.py that we bundle with NCPA doesn't handle kernel 5.5+.
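The field count is easy to verify. This small Python sketch splits the line from the log above on whitespace and counts the pieces; the function name is just for illustration.

```python
def count_diskstats_fields(line):
    """Return the number of whitespace-separated fields in a diskstats line."""
    return len(line.split())


# The sda line exactly as it appears in the ValueError above.
sda_line = (" 8 0 sda 418304 109 4737620 2771986 71413189 4660773 "
            "2260171374 2611701609 0 60283969 2578429714 0 0 0 0 0 0\n")

print(count_diskstats_fields(sda_line))
```

The first three fields are the major number, minor number, and device name; the rest are counters.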
-
Maxwellb99
- Posts: 97
- Joined: Tue Jan 26, 2016 5:29 pm
Re: NCPA checks after upgrade
[<>@gcxsp0013 ~]$ uname -a
Linux gcxsp0013 5.5.0-1.el7.elrepo.x86_64 #1 SMP Sun Jan 26 20:12:30 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
[<>@gcxsp0013 ~]$ cat /proc/diskstats
8 0 sda 435740 109 4884052 2933629 82640015 5374521 2616141120 2097266571 0 69618407 2058504029 0 0 0 0 0 0
8 1 sda1 600 0 13792 1066 11 0 4236 81 0 147 1066 0 0 0 0 0 0
8 2 sda2 435098 109 4867108 2932415 82640004 5374521 2616136884 2097266490 0 69618200 2058502828 0 0 0 0 0 0
253 0 dm-0 434468 0 4853614 2933677 88015038 0 2616132256 3950228056 0 70461827 3953161733 0 0 0 0 0 0
253 1 dm-1 82 0 4408 725 0 0 0 0 0 56 725 0 0 0 0 0 0
253 2 dm-2 706 0 4846 695 60 0 4628 38971 0 208 39666 0 0 0 0 0 0
Re: NCPA checks after upgrade
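Okay, yes: kernel 5.5 added 2 columns to /proc/diskstats. _pslinux.py decides how to parse each diskstats line by counting its columns, and the new count of 20 falls outside the cases it knows about, so it raises the ValueError you're seeing instead of returning disk data. Judging by your traceback, the agent builds the disk node as part of its root node (get_root_node calls get_disk_node), which would explain why every check fails and not just the disk ones.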
I'm opening a bug report for this.
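To illustrate the failure mode, here is a minimal Python sketch of a parser that reads the leading counters by position and ignores any columns appended after them, so that new trailing columns do not break it. This is not psutil's actual code. The names DiskIO and parse_diskstats_line are hypothetical, and the sketch deliberately ignores the older, shorter diskstats layouts.

```python
from collections import namedtuple

DiskIO = namedtuple("DiskIO", "name reads writes sectors_read sectors_written")


def parse_diskstats_line(line):
    """Parse one /proc/diskstats line, tolerating extra trailing columns."""
    fields = line.split()
    # Require the 3 identifying fields plus the 11 classic counters.
    if len(fields) < 14:
        raise ValueError("not sure how to interpret line %r" % line)
    return DiskIO(
        name=fields[2],
        reads=int(fields[3]),            # reads completed
        writes=int(fields[7]),           # writes completed
        sectors_read=int(fields[5]),
        sectors_written=int(fields[9]),
    )


# The sda line from the /proc/diskstats output posted above (20 fields).
sda_line = ("8 0 sda 435740 109 4884052 2933629 82640015 5374521 2616141120 "
            "2097266571 0 69618407 2058504029 0 0 0 0 0 0")
print(parse_diskstats_line(sda_line))
```

Checking for a minimum column count rather than an exact one is the design choice here: it keeps working when a newer kernel appends columns, at the cost of silently ignoring them.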
-
Maxwellb99
- Posts: 97
- Joined: Tue Jan 26, 2016 5:29 pm
Re: NCPA checks after upgrade
Cool. Thanks.
So what's the latest officially supported kernel for RHEL & CentOS?
Re: NCPA checks after upgrade
The new columns were added in kernel 5.5, so anything prior to 5.5 should be good to go. Here's a list of the fields:
https://www.kernel.org/doc/Documentatio ... -diskstats
Going by that link and the logic in _pslinux.py, the logic will handle anything up to 18 fields. That is the layout used by kernels 4.18 through 5.4; earlier kernels report fewer fields.
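If you need to flag affected hosts before upgrading them, a quick Python sketch like this compares the kernel release string against 5.5. The function name is hypothetical, and the sample strings other than the one from uname -a above are made up for illustration.

```python
import re


def kernel_has_extra_diskstats_columns(release):
    """Return True if the kernel release string is 5.5 or newer."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles both 5.x >= 5.5 and any later major version.
    return (major, minor) >= (5, 5)


# Release string taken from the uname -a output posted in this thread.
print(kernel_has_extra_diskstats_columns("5.5.0-1.el7.elrepo.x86_64"))
```

On a live host you could pass in the value of platform.release() from the standard library instead of a hard-coded string.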