Re: Elasticsearch-health-monitoring plugin

Posted: Tue Nov 17, 2015 7:09 pm
by jolson
Box293 wrote:
jolson wrote: My perfdata actually has five semicolons as well - I don't think that is the problem.
I believe newer versions of rrdtool are particular about this. I think it starts when you implement rrdcached, as that installs a newer version of rrdtool than the one shipped with XI.

I'm pretty sure if @willemdh increases the logging verbosity then he would see the data being discarded in the logs:

http://support.nagios.com/wiki/index.ph ... leshooting

Code: Select all

tail -f /usr/local/nagios/var/perfdata.log > /tmp/perfdata.txt
tail -f /usr/local/nagios/var/npcd.log > /tmp/npcd.txt
Interesting! Thanks for the knowledge as always.

Re: Elasticsearch-health-monitoring plugin

Posted: Wed Nov 18, 2015 2:45 am
by WillemDH
Hmm, I'm not sure I ever installed rrdcached.

Found this procedure: https://www.google.be/url?sa=t&rct=j&q= ... 3241,d.d2s

Code: Select all

ls /etc/sysconfig/rrdcached
ls: cannot access /etc/sysconfig/rrdcached: No such file or directory
Also RRD_DAEMON_OPTS is commented in /usr/local/nagios/etc/pnp/process_perfdata.cfg

Code: Select all

 # RRD_DAEMON_OPTS = unix:/tmp/rrdcached.sock
Please also note that I'm using a ramdisk. Tailed the perfdata and npcd logfile after setting debug level, but nothing recently appeared in it, even after a few checks. Gonna try recreating the service.

EDIT: Recreated the service, again, copy of a service where perfdata is working fine. But same issue, no graphs.

EDIT 2: Ok, I'll post the interesting stuff from the npcd and perfdata logs. Maybe you guys know better what it means.

Code: Select all

 cat npcd.txt
[11-18-2015 09:07:31] NPCD: A thread was started on thread_counter = 0
[11-18-2015 09:07:31] NPCD: Processing file 1447834033.perfdata.host with ID 139688456226560 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1447834033.perfdata.host
[11-18-2015 09:07:31] NPCD: DEBUG: load 1.340000/20.000000
[11-18-2015 09:07:31] NPCD: Processing file '1447834033.perfdata.host'
[11-18-2015 09:07:31] NPCD: ThreadCounter 1/5 File is 1447834033.perfdata.service
[11-18-2015 09:07:31] NPCD: Regular File: 1447834033.perfdata.service
[11-18-2015 09:07:31] NPCD: A thread was started on thread_counter = 1
[11-18-2015 09:07:31] NPCD: Processing file 1447834033.perfdata.service with ID 139688375482112 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1447834033.perfdata.service
[11-18-2015 09:07:31] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[11-18-2015 09:07:31] NPCD: Processing file '1447834033.perfdata.service'
[11-18-2015 09:07:34] NPCD: No more files to process... waiting for 15 seconds
[11-18-2015 09:07:49] NPCD: Found 4 files in /var/nagiosramdisk/spool/perfdata/
[11-18-2015 09:07:49] NPCD: DEBUG: load 1.430000/20.000000
[11-18-2015 09:07:49] NPCD: ThreadCounter 0/5 File is .
[11-18-2015 09:07:49] NPCD: DEBUG: load 1.430000/20.000000
[11-18-2015 09:07:49] NPCD: ThreadCounter 0/5 File is ..
[11-18-2015 09:07:49] NPCD: DEBUG: load 1.430000/20.000000
[11-18-2015 09:07:49] NPCD: ThreadCounter 0/5 File is 1447834048.perfdata.host
[11-18-2015 09:07:49] NPCD: Regular File: 1447834048.perfdata.host
[11-18-2015 09:07:49] NPCD: A thread was started on thread_counter = 0
[11-18-2015 09:07:49] NPCD: Processing file 1447834048.perfdata.host with ID 139688456226560 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1447834048.perfdata.host
[11-18-2015 09:07:49] NPCD: DEBUG: load 1.430000/20.000000
[11-18-2015 09:07:49] NPCD: ThreadCounter 1/5 File is 1447834049.perfdata.service
[11-18-2015 09:07:49] NPCD: Processing file '1447834048.perfdata.host'
[11-18-2015 09:07:49] NPCD: Regular File: 1447834049.perfdata.service
[11-18-2015 09:07:49] NPCD: A thread was started on thread_counter = 1
[11-18-2015 09:07:49] NPCD: Processing file 1447834049.perfdata.service with ID 139688375482112 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1447834049.perfdata.service
[11-18-2015 09:07:49] NPCD: Processing file '1447834049.perfdata.service'
[11-18-2015 09:07:49] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[11-18-2015 09:07:53] NPCD: No more files to process... waiting for 15 seconds
[11-18-2015 09:08:08] NPCD: Found 6 files in /var/nagiosramdisk/spool/perfdata/
[11-18-2015 09:08:08] NPCD: DEBUG: load 1.450000/20.000000
[11-18-2015 09:08:08] NPCD: ThreadCounter 0/5 File is .
[11-18-2015 09:08:08] NPCD: DEBUG: load 1.450000/20.000000
[11-18-2015 09:08:08] NPCD: ThreadCounter 0/5 File is ..
[11-18-2015 09:08:08] NPCD: DEBUG: load 1.450000/20.000000
[11-18-2015 09:08:08] NPCD: ThreadCounter 0/5 File is 1447834063.perfdata.host
[11-18-2015 09:08:08] NPCD: Regular File: 1447834063.perfdata.host
[11-18-2015 09:08:08] NPCD: A thread was started on thread_counter = 0
[11-18-2015 09:08:08] NPCD: Processing file 1447834063.perfdata.host with ID 139688456226560 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1447834063.perfdata.host
[11-18-2015 09:08:08] NPCD: Processing file '1447834063.perfdata.host'
[11-18-2015 09:08:08] NPCD: DEBUG: load 1.450000/20.000000
[11-18-2015 09:08:08] NPCD: ThreadCounter 1/5 File is 1447834063.perfdata.service
[11-18-2015 09:08:08] NPCD: Regular File: 1447834063.perfdata.service
[11-18-2015 09:08:08] NPCD: A thread was started on thread_counter = 1
[11-18-2015 09:08:08] NPCD: Processing file 1447834063.perfdata.service with ID 139688375482112 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1447834063.perfdata.service
[11-18-2015 09:08:08] NPCD: Processing file '1447834063.perfdata.service'
[11-18-2015 09:08:08] NPCD: DEBUG: load 1.450000/20.000000
[11-18-2015 09:08:08] NPCD: ThreadCounter 2/5 File is 1447834079.perfdata.host
[11-18-2015 09:08:08] NPCD: Regular File: 1447834079.perfdata.host
[11-18-2015 09:08:08] NPCD: A thread was started on thread_counter = 2
[11-18-2015 09:08:08] NPCD: Processing file 1447834079.perfdata.host with ID 139688364992256 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1447834079.perfdata.host
[11-18-2015 09:08:08] NPCD: DEBUG: load 1.450000/20.000000
[11-18-2015 09:08:08] NPCD: Processing file '1447834079.perfdata.host'
[11-18-2015 09:08:08] NPCD: ThreadCounter 3/5 File is 1447834079.perfdata.service
[11-18-2015 09:08:08] NPCD: Regular File: 1447834079.perfdata.service
[11-18-2015 09:08:08] NPCD: A thread was started on thread_counter = 3
[11-18-2015 09:08:08] NPCD: Processing file 1447834079.perfdata.service with ID 139688354502400 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1447834079.perfdata.service
[11-18-2015 09:08:08] NPCD: Processing file '1447834079.perfdata.service'
[11-18-2015 09:08:08] NPCD: Have to wait: Filecounter = 4 - thread_counter = 4
[11-18-2015 09:08:11] NPCD: No more files to process... waiting for 15 seconds
[11-18-2015 09:08:26] NPCD: Found 4 files in /var/nagiosramdisk/spool/perfdata/
[11-18-2015 09:08:26] NPCD: DEBUG: load 1.310000/20.000000
[11-18-2015 09:08:26] NPCD: ThreadCounter 0/5 File is .
[11-18-2015 09:08:26] NPCD: DEBUG: load 1.310000/20.000000
[11-18-2015 09:08:26] NPCD: ThreadCounter 0/5 File is ..
[11-18-2015 09:08:26] NPCD: DEBUG: load 1.310000/20.000000
[11-18-2015 09:08:26] NPCD: ThreadCounter 0/5 File is 1447834093.perfdata.host
[11-18-2015 09:08:26] NPCD: Regular File: 1447834093.perfdata.host
[11-18-2015 09:08:26] NPCD: A thread was started on thread_counter = 0
[11-18-2015 09:08:26] NPCD: Processing file 1447834093.perfdata.host with ID 139688456226560 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1447834093.perfdata.host
[11-18-2015 09:08:26] NPCD: DEBUG: load 1.310000/20.000000
[11-18-2015 09:08:26] NPCD: Processing file '1447834093.perfdata.host'
[11-18-2015 09:08:26] NPCD: ThreadCounter 1/5 File is 1447834093.perfdata.service
[11-18-2015 09:08:26] NPCD: Regular File: 1447834093.perfdata.service
[11-18-2015 09:08:26] NPCD: A thread was started on thread_counter = 1
[11-18-2015 09:08:26] NPCD: Processing file 1447834093.perfdata.service with ID 139688375482112 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1447834093.perfdata.service
[11-18-2015 09:08:26] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2
[11-18-2015 09:08:26] NPCD: Processing file '1447834093.perfdata.service'
[11-18-2015 09:08:30] NPCD: No more files to process... waiting for 15 seconds
[11-18-2015 09:08:45] NPCD: Found 4 files in /var/nagiosramdisk/spool/perfdata/
[11-18-2015 09:08:45] NPCD: DEBUG: load 1.020000/20.000000
[11-18-2015 09:08:45] NPCD: ThreadCounter 0/5 File is .
[11-18-2015 09:08:45] NPCD: DEBUG: load 1.020000/20.000000
[11-18-2015 09:08:45] NPCD: ThreadCounter 0/5 File is ..
[11-18-2015 09:08:45] NPCD: DEBUG: load 1.020000/20.000000
[11-18-2015 09:08:45] NPCD: ThreadCounter 0/5 File is 1447834108.perfdata.host
[11-18-2015 09:08:45] NPCD: Regular File: 1447834108.perfdata.host
[11-18-2015 09:08:45] NPCD: A thread was started on thread_counter = 0
[11-18-2015 09:08:45] NPCD: Processing file 1447834108.perfdata.host with ID 139688456226560 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1447834108.perfdata.host
[11-18-2015 09:08:45] NPCD: DEBUG: load 1.020000/20.000000
[11-18-2015 09:08:45] NPCD: Processing file '1447834108.perfdata.host'
[11-18-2015 09:08:45] NPCD: ThreadCounter 1/5 File is 1447834108.perfdata.service
[11-18-2015 09:08:45] NPCD: Regular File: 1447834108.perfdata.service
[11-18-2015 09:08:45] NPCD: A thread was started on thread_counter = 1
[11-18-2015 09:08:45] NPCD: Processing file 1447834108.perfdata.service with ID 139688375482112 - going to exec /usr/local/nagios/libexec/process_perfdata.pl -n -b /var/nagiosramdisk/spool/perfdata//1447834108.perfdata.service
[11-18-2015 09:08:45] NPCD: Processing file '1447834108.perfdata.service'
[11-18-2015 09:08:45] NPCD: Have to wait: Filecounter = 2 - thread_counter = 2

Code: Select all

cat perfdata.txt | grep "nagioslogserver01"
2015-11-18 09:08:08 [64602] [1] Found Performance Data for nagioslogserver01.gentgrp.gent.be / _HOST_ (rta=0.385ms;3000.000;5000.000;0; pl=0%;80;100;; rtmax=0.982ms;;;; rtmin=0.214ms;;;;)
2015-11-18 09:08:08 [64602] [2] RRDs::update /usr/local/nagios/share/perfdata/nagioslogserver01.gentgrp.gent.be/_HOST_.rrd 1447834058:0.385:0:0.982:0.214
2015-11-18 09:08:08 [64602] [2] /usr/local/nagios/share/perfdata/nagioslogserver01.gentgrp.gent.be/_HOST_.rrd updated
2015-11-18 09:08:08 [64608] [1] Found Performance Data for nagioslogserver01.gentgrp.gent.be / INF_Elasticsearch (cluster_nodes=2;;;;; cluster_master_eligible_nodes=2;;;;; cluster_data_nodes=2;;;;; cluster_active_shards=342;;;;; cluster_relocating_shards=0;;;;; cluster_initialising_shards=0;;;;; cluster_unassigned_shards=0;;;;; cluster_total_shards=342;;;;; cluster_total_indices=35;;;;; cluster_closed_indices=0;;;;; storesize=454154063698B;;;;; documents=489811417c;;;;; index_ops=37211961c;;;;; index_time=21123931ms;;;;; flush_ops=669c;;;;; flush_time=80921ms;;;;; throttle_time=8421121ms;;;;; index_ops=37211961c;;;;; index_time=21123931ms;;;;; delete_ops=0c;;;;; delete_time=0ms;;;;; get_ops=26644c;;;;; get_time=6913ms;;;;; exists_ops=26591c;;;;; exists_time=6909ms;;;;; missing_ops=53c;;;;; missing_time=4ms;;;;; query_ops=64166c;;;;; query_time=269146ms;;;;; fetch_ops=63899c;;;;; fetch_time=10188ms;;;;; merge_ops=33592c;;;;; refresh_ops=171103c;;;;; refresh_time=6128534ms;;;;; gc_old_count=5c;;;;; gc_young_count=23285c;;;;; heap_used=42%;;;;;)
2015-11-18 09:08:09 [64608] [1] Found Performance Data for nagioslogserver01.gentgrp.gent.be / HEA_Elasticsearch (cluster_nodes=2;;;;; cluster_master_eligible_nodes=2;;;;; cluster_data_nodes=2;;;;; cluster_active_shards=342;;;;; cluster_relocating_shards=0;;;;; cluster_initialising_shards=0;;;;; cluster_unassigned_shards=0;;;;; cluster_total_shards=342;;;;; cluster_total_indices=35;;;;; cluster_closed_indices=0;;;;; storesize=454154063698B;;;;; documents=489813216c;;;;; index_ops=37215700c;;;;; index_time=21125926ms;;;;; flush_ops=669c;;;;; flush_time=80921ms;;;;; throttle_time=8421121ms;;;;; index_ops=37215700c;;;;; index_time=21125926ms;;;;; delete_ops=0c;;;;; delete_time=0ms;;;;; get_ops=26648c;;;;; get_time=6914ms;;;;; exists_ops=26595c;;;;; exists_time=6910ms;;;;; missing_ops=53c;;;;; missing_time=4ms;;;;; query_ops=64170c;;;;; query_time=269149ms;;;;; fetch_ops=63903c;;;;; fetch_time=10189ms;;;;; merge_ops=33593c;;;;; refresh_ops=171109c;;;;; refresh_time=6128781ms;;;;; gc_old_count=5c;;;;; gc_young_count=23286c;;;;; heap_used=43%;;;;;)
2015-11-18 09:08:09 [64604] [1] Found Performance Data for nagioslogserver01.gentgrp.gent.be / HEA_Elasticsearch (cluster_nodes=2;;;;; cluster_master_eligible_nodes=2;;;;; cluster_data_nodes=2;;;;; cluster_active_shards=342;;;;; cluster_relocating_shards=0;;;;; cluster_initialising_shards=0;;;;; cluster_unassigned_shards=0;;;;; cluster_total_shards=342;;;;; cluster_total_indices=35;;;;; cluster_closed_indices=0;;;;; storesize=454151730978B;;;;; documents=489809464c;;;;; index_ops=37209965c;;;;; index_time=21122944ms;;;;; flush_ops=669c;;;;; flush_time=80921ms;;;;; throttle_time=8420594ms;;;;; index_ops=37209965c;;;;; index_time=21122944ms;;;;; delete_ops=0c;;;;; delete_time=0ms;;;;; get_ops=26644c;;;;; get_time=6913ms;;;;; exists_ops=26591c;;;;; exists_time=6909ms;;;;; missing_ops=53c;;;;; missing_time=4ms;;;;; query_ops=64164c;;;;; query_time=269145ms;;;;; fetch_ops=63897c;;;;; fetch_time=10188ms;;;;; merge_ops=33590c;;;;; refresh_ops=171098c;;;;; refresh_time=6128294ms;;;;; gc_old_count=5c;;;;; gc_young_count=23284c;;;;; heap_used=42%;;;;;)
2015-11-18 09:08:27 [65263] [1] Found Performance Data for nagioslogserver01.gentgrp.gent.be / DRV_Root (/=69317MB;154232;162800;0;171369)
2015-11-18 09:08:27 [65263] [2] RRDs::update /usr/local/nagios/share/perfdata/nagioslogserver01.gentgrp.gent.be/DRV_Root.rrd 1447834081:69317
2015-11-18 09:08:27 [65263] [2] /usr/local/nagios/share/perfdata/nagioslogserver01.gentgrp.gent.be/DRV_Root.rrd updated
2015-11-18 09:08:27 [65263] [1] Found Performance Data for nagioslogserver01.gentgrp.gent.be / HEA_Elasticsearch (cluster_nodes=2;;;;; cluster_master_eligible_nodes=2;;;;; cluster_data_nodes=2;;;;; cluster_active_shards=342;;;;; cluster_relocating_shards=0;;;;; cluster_initialising_shards=0;;;;; cluster_unassigned_shards=0;;;;; cluster_total_shards=342;;;;; cluster_total_indices=35;;;;; cluster_closed_indices=0;;;;; storesize=454162234602B;;;;; documents=489819233c;;;;; index_ops=37220120c;;;;; index_time=21128434ms;;;;; flush_ops=669c;;;;; flush_time=80921ms;;;;; throttle_time=8421392ms;;;;; index_ops=37220120c;;;;; index_time=21128434ms;;;;; delete_ops=0c;;;;; delete_time=0ms;;;;; get_ops=26653c;;;;; get_time=6915ms;;;;; exists_ops=26600c;;;;; exists_time=6911ms;;;;; missing_ops=53c;;;;; missing_time=4ms;;;;; query_ops=64178c;;;;; query_time=269152ms;;;;; fetch_ops=63911c;;;;; fetch_time=10193ms;;;;; merge_ops=33597c;;;;; refresh_ops=171125c;;;;; refresh_time=6129498ms;;;;; gc_old_count=5c;;;;; gc_young_count=23289c;;;;; heap_used=41%;;;;;)
HEA_Elasticsearch is the first service and INF_Elasticsearch is the second service I created for testing. So the perfdata is effectively found (Found Performance Data for nagioslogserver01.gentgrp.gent.be / HEA_Elasticsearch) but how can I see if it is discarded?
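
One way to approach that question from the log itself: a service whose perfdata is found but which never gets an RRDs::update line is a likely candidate for having been dropped. Below is a minimal Python sketch, assuming the two line shapes visible in the perfdata.log excerpt above and assuming the RRD file name equals the service description (PNP may rewrite special characters, so treat mismatches with care). The function name never_updated is hypothetical.

```python
import re

# Line shapes taken from the perfdata.log excerpt in this thread.
FOUND = re.compile(r"Found Performance Data for (\S+) / (.+?) \(")
UPDATE = re.compile(r"RRDs::update \S*/([^/\s]+)/([^/\s]+)\.rrd ")


def never_updated(lines):
    """Return sorted (host, service) pairs that were found but never updated."""
    found, updated = set(), set()
    for line in lines:
        match = FOUND.search(line)
        if match:
            found.add((match.group(1), match.group(2)))
            continue
        match = UPDATE.search(line)
        if match:
            # Host and service are the last two components of the RRD path.
            updated.add((match.group(1), match.group(2)))
    return sorted(found - updated)


if __name__ == "__main__":
    # Assumed location: the copy of the log captured earlier in the thread.
    with open("/tmp/perfdata.txt") as handle:
        for host, service in never_updated(handle):
            print("no RRD update logged for %s / %s" % (host, service))
```

Run against the excerpt above, this would be expected to list the two Elasticsearch services and skip _HOST_ and DRV_Root, since only the latter two have RRDs::update lines.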

Re: Elasticsearch-health-monitoring plugin

Posted: Wed Nov 18, 2015 5:50 pm
by Box293
Willem,

Can you use my Performance Data Tool to view the RRD file and see whether the data exists in it?

Re: Elasticsearch-health-monitoring plugin

Posted: Thu Nov 19, 2015 10:16 am
by WillemDH
I should have looked there in the first place. The RRD files don't exist. I thought that if I saw the perfdata in the advanced tab, it was already in the RRD. Apparently not.
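
For anyone wanting to script that check, here is a rough Python sketch. It assumes the layout seen in the perfdata.log lines earlier in the thread (/usr/local/nagios/share/perfdata/<host>/<service>.rrd); the helper names are hypothetical, and PNP may alter special characters in service names, which this does not handle.

```python
from pathlib import Path

# Base directory as it appears in the RRDs::update log lines in this thread.
PERFDATA_BASE = "/usr/local/nagios/share/perfdata"


def rrd_path(host, service, base=PERFDATA_BASE):
    """Build the expected RRD path for a host/service pair."""
    return Path(base) / host / ("%s.rrd" % service)


def rrd_exists(host, service, base=PERFDATA_BASE):
    """Return True if the expected RRD file is present on disk."""
    return rrd_path(host, service, base).is_file()


if __name__ == "__main__":
    host = "nagioslogserver01.gentgrp.gent.be"
    for service in ("_HOST_", "DRV_Root", "HEA_Elasticsearch", "INF_Elasticsearch"):
        state = "present" if rrd_exists(host, service) else "MISSING"
        print("%s: %s" % (rrd_path(host, service), state))
```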

Re: Elasticsearch-health-monitoring plugin

Posted: Thu Nov 19, 2015 5:37 pm
by Box293
So I suggest editing the plugin and removing the extra ;

I see two different plugins were referenced here, so I don't know which one has the code that needs to be changed.
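
To spot which metrics carry the extra ; a quick Python sketch like the following may help. It assumes the usual perfdata layout of label=value[UOM];warn;crit;min;max, so anything with more than four semicolons after the = gets flagged. The function name is hypothetical, and labels containing spaces (quoted labels) are not handled by this simple split.

```python
def extra_semicolon_metrics(perfdata):
    """Return labels of metrics that have more than four semicolons."""
    flagged = []
    for token in perfdata.split():
        label, sep, rest = token.partition("=")
        # value[UOM];warn;crit;min;max has at most four semicolons.
        if sep and rest.count(";") > 4:
            flagged.append(label)
    return flagged


if __name__ == "__main__":
    sample = "cluster_nodes=2;;;;; heap_used=42%;;;;; pl=0%;80;100;;"
    print(extra_semicolon_metrics(sample))
```

With the sample string, only the two Elasticsearch metrics are flagged; pl=0%;80;100;; has exactly four semicolons and passes.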

Re: Elasticsearch-health-monitoring plugin

Posted: Thu Nov 19, 2015 5:57 pm
by WillemDH
The one from Jesse.

Re: Elasticsearch-health-monitoring plugin

Posted: Thu Nov 19, 2015 6:06 pm
by Box293
So looking at the plugin:

Code: Select all

from nagioscheck import PerformanceMetric, Status
It uses a library to do the performance data stuff.

https://github.com/saj/pynagioscheck

Specifically the nagioscheck.py file:
https://github.com/saj/pynagioscheck/bl ... oscheck.py

Line 295 needs the last ; removed:
Current:

Code: Select all

return ("%s=%s%s;%s;%s;%s;%s;" %
Should be:

Code: Select all

return ("%s=%s%s;%s;%s;%s;%s" %
If this fixes the problem we'll report this as an issue on the pynagioscheck GitHub project.
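
To illustrate the effect of the change, here is a small standalone Python sketch. It is not the actual pynagioscheck code: the function and argument names are made up for illustration, and only the format string is taken from the line quoted above.

```python
def format_perfdata(label, value, unit="", warn="", crit="", minimum="", maximum=""):
    """Render one perfdata metric without a trailing semicolon."""
    return "%s=%s%s;%s;%s;%s;%s" % (label, value, unit, warn, crit, minimum, maximum)


if __name__ == "__main__":
    # Four semicolons instead of the five seen in the perfdata.log output.
    print(format_perfdata("heap_used", 42, "%"))
    print(format_perfdata("/", 69317, "MB", 154232, 162800, 0, 171369))
```

The first call produces heap_used=42%;;;; (four semicolons), compared with heap_used=42%;;;;; from the unpatched format string.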

Re: Elasticsearch-health-monitoring plugin

Posted: Mon Nov 30, 2015 5:25 am
by WillemDH
Nice one Troy. Updating nagioscheck.py

Code: Select all

return ("%s=%s%s;%s;%s;%s;%s;" %
with

Code: Select all

return ("%s=%s%s;%s;%s;%s;%s" %
did the trick. Perfdata is working as expected now. Weird that this did work for Jesse though. Will you make the GitHub issue on https://github.com/saj/pynagioscheck?

Thanks all!

Re: Elasticsearch-health-monitoring plugin

Posted: Mon Nov 30, 2015 2:28 pm
by bwallace
Box293, thanks for the suggestions. Willem, would you say this thread is ready to be locked?

Re: Elasticsearch-health-monitoring plugin

Posted: Mon Nov 30, 2015 6:51 pm
by Box293
WillemDH wrote: Nice one Troy. Updating nagioscheck.py

Code: Select all

return ("%s=%s%s;%s;%s;%s;%s;" %
with

Code: Select all

return ("%s=%s%s;%s;%s;%s;%s" %
did the trick. Perfdata is working as expected now. Weird that this did work for Jesse though. Will you make the GitHub issue on https://github.com/saj/pynagioscheck?

Thanks all!
Sweet, glad my instincts were spot on with this :ugeek:

Issue created on GitHub:
https://github.com/saj/pynagioscheck/issues/2