Graphs Stopped after Password Change

Posted: Thu Jan 24, 2013 7:43 am
by Gavin
Hi,

We're running 2012R1.4

Someone inadvertently reset the 'nagiosadmin' password via the normal user management interface. At the same time, we deleted one of our admin users. I have a feeling that this user was created by cloning the 'nagiosadmin' user (this won't happen again), and I remember seeing a bug where cloning items actually moves certain parameters rather than copying them. Either way, graphing has not worked since.

I've since reset the security tokens via the GUI, and the nagiosadmin user now has an alphanumeric password.

I've restarted every Nagios service, and there are still no graphs. I've included an excerpt of some logs at the bottom of this ticket. My digging also turned up that the 'backend_ticket' for the nagiosadmin user (in the PostgreSQL db) is only 8 characters long, while those of all other users are 64. I was also surprised to see that the nagiosadmin user has a user_id of 18. Is that normal?
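
For anyone wanting to repeat the ticket-length check, here is a minimal sketch. The helper name is made up for illustration, and the table and database names in the usage comment (xi_users, nagiosxi) are assumptions that are not confirmed anywhere in this thread. The expected length of 64 comes from the observation above. The helper only reads `username|ticket` lines on stdin, so it can be tried without touching a database.

```shell
#!/bin/sh
# Hypothetical helper: print users whose backend ticket is not the expected length.
# Input: lines of "username|ticket" on stdin.
# The expected length defaults to 64, the length reported for the other users.
flag_odd_tickets() {
    expected="${1:-64}"
    awk -F'|' -v want="$expected" '
        NF >= 2 && length($2) != want + 0 { printf "%s %d\n", $1, length($2) }
    '
}

# Assumed usage (table and database names are assumptions):
#   psql -At -F'|' -c "SELECT username, backend_ticket FROM xi_users" nagiosxi | flag_odd_tickets
```

Each output line is a username followed by the length of its ticket, so an empty result means every ticket has the expected length.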

Any help would be appreciated...

Thanks,

Gavin

--------------------

Log sample taken at 12:41

/usr/local/nagios/var/perfdata.log

Code: Select all

2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_runtime.rrd 1359022731:1.772237
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_rows.rrd 1359022731:497
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_errors.rrd 1359022731:1
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_invalid.rrd 1359022731:0
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_skipped.rrd 1359022731:5
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_update.rrd 1359022731:491
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:06 [24778] [2] RRDs Perl Modules are not installed. Falling back to rrdtool system call.
2013-01-24 10:19:06 [24778] [2] /usr/bin/rrdtool update --daemon=unix:/var/rrdtool/rrdcached/rrdcached.sock /usr/local/nagios/share/perfdata/.pnp-internal/runtime_create.rrd 1359022731:0
2013-01-24 10:19:06 [24778] [1] rrdtool update returns 0
2013-01-24 10:19:07 [24778] [1] PNP exiting (runtime 0.00019s) ...

/usr/local/nagios/var/npcd.log

Code: Select all

[01-24-2013 12:40:36] NPCD: No more files to process... waiting for 15 seconds
[01-24-2013 12:40:51] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[01-24-2013 12:40:51] NPCD: DEBUG: load 2.790000/40.000000
[01-24-2013 12:40:51] NPCD: ThreadCounter 0/5 File is .
[01-24-2013 12:40:51] NPCD: DEBUG: load 2.790000/40.000000
[01-24-2013 12:40:51] NPCD: ThreadCounter 0/5 File is ..
[01-24-2013 12:40:51] NPCD: No more files to process... waiting for 15 seconds
[01-24-2013 12:41:06] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[01-24-2013 12:41:06] NPCD: DEBUG: load 2.530000/40.000000
[01-24-2013 12:41:06] NPCD: ThreadCounter 0/5 File is .
[01-24-2013 12:41:06] NPCD: DEBUG: load 2.530000/40.000000
[01-24-2013 12:41:06] NPCD: ThreadCounter 0/5 File is ..
[01-24-2013 12:41:06] NPCD: No more files to process... waiting for 15 seconds
[01-24-2013 12:41:21] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[01-24-2013 12:41:21] NPCD: DEBUG: load 20.070000/40.000000
[01-24-2013 12:41:21] NPCD: ThreadCounter 0/5 File is .
[01-24-2013 12:41:21] NPCD: DEBUG: load 20.070000/40.000000
[01-24-2013 12:41:21] NPCD: ThreadCounter 0/5 File is ..
[01-24-2013 12:41:21] NPCD: No more files to process... waiting for 15 seconds
[01-24-2013 12:41:36] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[01-24-2013 12:41:36] NPCD: DEBUG: load 15.830000/40.000000
[01-24-2013 12:41:36] NPCD: ThreadCounter 0/5 File is .
[01-24-2013 12:41:36] NPCD: DEBUG: load 15.830000/40.000000
[01-24-2013 12:41:36] NPCD: ThreadCounter 0/5 File is ..
[01-24-2013 12:41:36] NPCD: No more files to process... waiting for 15 seconds
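
One reading of the npcd.log excerpt above: the "2 files" NPCD reports on each cycle are just the `.` and `..` directory entries, so the ramdisk spool is effectively empty. A minimal sketch for confirming that from a shell, assuming the spool path shown in the log; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count real entries in a spool directory,
# ignoring the "." and ".." entries that the NPCD log lists as files.
count_spool_files() {
    dir="$1"
    # -mindepth 1 skips the directory itself; -maxdepth 1 stays at the top level.
    find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Assumed usage, with the spool path taken from the log above:
#   count_spool_files /var/nagiosramdisk/spool/perfdata/
```

A count that stays at 0 while checks are running would suggest perfdata files are being moved somewhere other than the directory NPCD is watching.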

Re: Graphs Stopped after Password Change

Posted: Thu Jan 24, 2013 11:51 am
by yancy
Gavin,

Can you double-check the file permissions on process_perfdata.pl?

Code: Select all

  ll /usr/local/nagios/libexec/process_perfdata.pl 

I'll have to do some digging to answer your other questions, or someone more knowledgeable can chime in.

Regards,

-Yancy

Re: Graphs Stopped after Password Change

Posted: Thu Jan 24, 2013 11:57 am
by Gavin
Hi Yancy,

We ran the permissions reset. Permissions on that file are as follows:

Code: Select all

-rwxr-xr-x 1 nagios nagios 42K Dec 17 11:17 /usr/local/nagios/libexec/process_perfdata.pl*

Thanks,

Gavin

Re: Graphs Stopped after Password Change

Posted: Thu Jan 24, 2013 12:18 pm
by yancy
Gavin,

How about your perfdata directory?

Code: Select all

 ll /usr/local/nagios/share/perfdata 

If that checks out, try going into that directory and running check_rrdtraf against an RRD file.

For example:

Code: Select all

  /usr/local/nagios/libexec/check_rrdtraf -f nrpe_diskspace.rrd -w 1 -c 2 

Regards,

-Yancy

Re: Graphs Stopped after Password Change

Posted: Thu Jan 24, 2013 12:37 pm
by chrisp
Hi Yancy,

I sit next to Gavin in our office. He's gone home but I am still here, trying to get this fixed ASAP.

There are plenty of files in there:

Code: Select all

% ll /usr/local/nagios/share/perfdata | wc -l
136

And here's the check_rrdtraf test on my test server:

Code: Select all

% /usr/local/nagios/libexec/check_rrdtraf -f rainbow-it.net/Check_HTTP_-_Port_80.rrd -w 1 -c 2 
OK - Current BW in: 16.00bps Out: 0bps|in=16.000000b/s;1;2 out=0b/s;1;2

Definitely no data showing on that graph:
[Attachment: rainbow-it.net_HTTP_80.png]

Re: Graphs Stopped after Password Change

Posted: Thu Jan 24, 2013 12:59 pm
by yancy
chrisp,

Can you check what the file permissions are on that directory?

Code: Select all

 ll /usr/local/nagios/share/perfdata 

The permissions should be as follows:
drwxrwxrwx 2 nagios nagios

Re: Graphs Stopped after Password Change

Posted: Thu Jan 24, 2013 1:03 pm
by chrisp
Current permissions are 775 nagios:nagios, not 777!
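
A quick way to compare the directory mode against the 777 suggested above is `stat`. This is a minimal sketch, assuming GNU coreutils `stat` (the `-c` format option); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a directory has the expected octal mode.
# Returns 0 on a match, 1 on a mismatch, 2 if stat itself fails.
check_dir_mode() {
    dir="$1"
    want="$2"
    have=$(stat -c '%a' "$dir") || return 2
    if [ "$have" = "$want" ]; then
        echo "OK: $dir is $have"
    else
        echo "MISMATCH: $dir is $have, expected $want"
        return 1
    fi
}

# Assumed usage with the path and mode discussed above:
#   check_dir_mode /usr/local/nagios/share/perfdata 777
```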

Re: Graphs Stopped after Password Change

Posted: Thu Jan 24, 2013 5:42 pm
by scottwilkerson
Can you post the settings for the following commands?

Code: Select all

process-service-perfdata-file-bulk
process-host-perfdata-file-bulk

Re: Graphs Stopped after Password Change

Posted: Thu Jan 24, 2013 7:04 pm
by chrisp
Right, this is extremely helpful, but also utterly confounding and frustrating. This is how it looks right now:

Code: Select all

% grep -A1 bulk /usr/local/nagios/etc/commands.cfg 
       command_name                  		process-host-perfdata-file-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
--
       command_name                  		process-host-perfdata-file-pnp-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/perfdata/host-perfdata.$TIMET$
--
       command_name                  		process-service-perfdata-file-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/service-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.service
--
       command_name                  		process-service-perfdata-file-pnp-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/perfdata/service-perfdata.$TIMET$

But this is what it looked like last night:

Code: Select all

% grep -A1 bulk /usr/local/nagios/etc/commands.cfg
       command_name                         process-host-perfdata-file-bulk
       command_line                         /bin/mv /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
--
       command_name                         process-host-perfdata-file-pnp-bulk
       command_line                         /bin/mv /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/perfdata/host-perfdata.$TIMET$
--
       command_name                         process-service-perfdata-file-bulk
       command_line                         /bin/mv /var/nagiosramdisk/service-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.service
--
       command_name                         process-service-perfdata-file-pnp-bulk
       command_line                         /bin/mv /var/nagiosramdisk/service-perfdata /var/nagiosramdisk/spool/perfdata/service-perfdata.$TIMET$

Can you shed any light on how these values might have reverted to a broken state? This concerns me greatly, as I can't think of any action we deliberately took that would have altered the file like this.

After putting the file back to the working state and restarting everything, we have graph data again:
[Attachment: whyarethecommandschanging.png]

It's probably worth noting that rrdcached gets very upset and "service rrdcached restart" fails to properly kill the old processes (there are 2 when it's in the upset state), so I had to do "killall rrdcached" before I could get it back on track.
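
Comparing the two listings above, the working state has every bulk command moving files out of /var/nagiosramdisk, while the broken state uses /usr/local/nagios/var as the source. A minimal sketch for spotting the reverted state, assuming the `command_name` followed by `command_line` layout shown in the grep output; the function name is illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: list perfdata bulk commands whose /bin/mv source
# is not on the ramdisk, i.e. the "reverted" state seen in this thread.
find_reverted_bulk_commands() {
    cfg="$1"
    awk '
        # Remember the command name only if it is one of the perfdata bulk commands.
        $1 == "command_name" { name = ($2 ~ /perfdata-file.*bulk$/) ? $2 : ""; next }
        # On the command_line that follows: $2 is /bin/mv, $3 is the source path.
        $1 == "command_line" && name != "" {
            if ($3 !~ /^\/var\/nagiosramdisk\//) print name
            name = ""
        }
    ' "$cfg"
}

# Assumed usage against the file discussed above:
#   find_reverted_bulk_commands /usr/local/nagios/etc/commands.cfg
```

No output means all four bulk commands still read from the ramdisk.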

Re: Graphs Stopped after Password Change

Posted: Thu Jan 24, 2013 9:18 pm
by chrisp
I just rebooted and commands.cfg has changed again:

Code: Select all

# grep -A1 bulk /usr/local/nagios/etc/commands.cfg            
       command_name                  		process-host-perfdata-file-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
--
       command_name                  		process-host-perfdata-file-pnp-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/perfdata/host-perfdata.$TIMET$
--
       command_name                  		process-service-perfdata-file-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/service-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.service
--
       command_name                  		process-service-perfdata-file-pnp-bulk
       command_line                  		/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/perfdata/service-perfdata.$TIMET$
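
Since the file keeps changing without an obvious cause, one way to catch the moment it happens is to keep a known-good copy and compare against it, for example from cron. A minimal sketch follows; the function name and the snapshot path in the usage comment are assumptions. It only detects the change and does not explain what rewrites the file.

```shell
#!/bin/sh
# Hypothetical helper: compare a live config file against a saved known-good copy.
# Returns 0 if they are identical; prints a unified diff and returns 1 if they differ.
check_against_snapshot() {
    live="$1"
    snapshot="$2"
    if cmp -s "$live" "$snapshot"; then
        return 0
    fi
    # Lines prefixed with "-" come from the snapshot, "+" from the live file.
    diff -u "$snapshot" "$live"
    return 1
}

# Assumed usage (the snapshot path is made up for illustration):
#   cp /usr/local/nagios/etc/commands.cfg /root/commands.cfg.good
#   check_against_snapshot /usr/local/nagios/etc/commands.cfg /root/commands.cfg.good
```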