Should apply config reset all service checks?

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
WillH
Posts: 54
Joined: Mon Aug 03, 2020 10:37 am

Should apply config reset all service checks?

Post by WillH »

We've noticed on many of our servers that when hitting Apply Config in the GUI, or via the API, all of the service checks flip to pending. This generates many events in our integrations with ServiceNow.

Looking in /usr/local/nagios/var/nagios.log we're seeing "Caught SIGTERM, shutting down..."
Is this normal for Apply Config?
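
A quick way to see how often this happens is to count the shutdown messages in the log. The sketch below assumes the log lives at /usr/local/nagios/var/nagios.log; count_sigterm is a hypothetical helper name, not part of Nagios.

```shell
# Hypothetical helper: count "Caught SIGTERM" entries in a Nagios log file.
# The default path is an assumption; pass another path as the first argument.
count_sigterm() {
    log="${1:-/usr/local/nagios/var/nagios.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function's status at 0.
    grep -c 'Caught SIGTERM' "$log" || true
}

# Example: count_sigterm /usr/local/nagios/var/nagios.log
```

Comparing the count with the number of Apply Config runs in the same period would show whether each apply lines up with one shutdown message.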
WillH
Posts: 54
Joined: Mon Aug 03, 2020 10:37 am

Re: Should apply config reset all service checks?

Post by WillH »

Also,
Happy Labor day for those in the US!
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Should apply config reset all service checks?

Post by pbroste »

Hello @WillH,

Thanks for reaching out. We would like to take a look at your Nagios XI System Profile so we can see what is going on.

To send us your system profile:
  • Login to the Nagios XI GUI using a web browser.
  • Click the "Admin" > "System Profile" Menu
  • Click the "Download Profile" button
  • Save the profile.zip file and send via private message
Or by command line:

Code: Select all

rm -rf /usr/local/nagiosxi/var/components/profile.zip
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT

Then send the resulting /usr/local/nagiosxi/var/components/profile.zip file via Private Message.

Thanks,
Perry
WillH
Posts: 54
Joined: Mon Aug 03, 2020 10:37 am

Re: Should apply config reset all service checks?

Post by WillH »

rm -rf?
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Should apply config reset all service checks?

Post by pbroste »

Hello WillH

Thanks for sending over the System Profile. After reviewing it, I did not find any applyconfig requests logged.

Let's walk through a database repair and then a reindex, and then trigger an 'applyconfig' so we can catch it in the logs and see what is going on.

We would expect to see something like this:
- - [07/Sep/2021:17:13:46 -0500] "GET /nagiosxi/includes/components/nagioscorecfg/applyconfig.php?cmd=confirm HTTP/1.1" 200 4382 "http://xxx.xxx.xxx.xxx/nagiosxi/include ... -index.php"
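
To check whether such a request was recorded, the web server access log can be searched for the applyconfig.php request. This is a minimal sketch: the log path is an assumption (a common Apache location on RHEL/CentOS systems) and find_applyconfig is a hypothetical helper name.

```shell
# Hypothetical helper: print access-log lines that record an Apply Configuration request.
# The default log path is an assumption; pass the real path as the first argument.
find_applyconfig() {
    log="${1:-/var/log/httpd/access_log}"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    # -F treats the pattern as a fixed string rather than a regular expression.
    grep -F 'applyconfig.php?cmd=confirm' "$log" || true
}

# Example: find_applyconfig /var/log/httpd/access_log
```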
  • Run through the database repair:

Code: Select all

/usr/local/nagiosxi/scripts/repair_databases.sh
  • Reindex the Core Configuration Manager (CCM) configs:
  • rm -rf /usr/local/nagios/etc/import/*
  • 1: List all running /bin/nagios processes: ps -aux | grep -E '/bin/nagios'
  • 2: Stop them: killall -9 nagios (or pkill nagios)
  • 3: Check that the /bin/nagios processes are stopped
  • 4: Restart nagios.service: systemctl restart nagios
  • 5: Head over to the Nagios XI web console ==> Core Configuration Manager (CCM) ==> Config File Management ==> [Delete Files] ==> [Write Files] ==> [Verify Files]
  • 6: Core Configuration Manager (CCM) ==> Under Quick Tools ==> "Apply Configuration"
  • 7: Restart nagios.service:

Code: Select all

systemctl restart nagios
  • Verify that the hosts and services look good and that there are no errors in Core:

Code: Select all

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Please PM your updated system profile for us to review.

To send us your system profile:
  • Login to the Nagios XI GUI using a web browser.
  • Click the "Admin" > "System Profile" Menu
  • Click the "Download Profile" button
  • Save the profile.zip file and send via Private Message
Thanks,
Perry
WillH
Posts: 54
Joined: Mon Aug 03, 2020 10:37 am

Re: Should apply config reset all service checks?

Post by WillH »

Same results. :(
Sending over the profile, with the full logfile from today rather than the truncated one in system profile's data dump
Also added is compare.txt, output of the /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg command
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Should apply config reset all service checks?

Post by pbroste »

Hello @WillH,

Thanks for sending the updated System Profile. The "Caught SIGTERM, shutting down..." instances typically happen when the system is running out of resources. We see that the environment is using a RAM disk, and that it has run out of space:
tmpfs 2.0G 2.0G 0 100% /var/nagiosramdisk
We would like to see what that mount point looks like:

Code: Select all

ls -la -R /var/nagiosramdisk
And

Code: Select all

du -h /var/nagiosramdisk
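
When the mount is full, it can also help to see which individual files take the space. A minimal sketch, assuming only POSIX find, wc, sort and head are available; largest_files is a hypothetical helper name.

```shell
# Hypothetical helper: list the N largest regular files under a directory,
# one per line as "<size in bytes> <path>", largest first.
largest_files() {
    dir="${1:-/var/nagiosramdisk}"
    n="${2:-5}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    find "$dir" -type f -exec sh -c '
        for f in "$@"; do
            # wc -c reports the file size in bytes
            printf "%s %s\n" "$(wc -c < "$f" | tr -d " ")" "$f"
        done' sh {} + | sort -rn | head -n "$n"
}

# Example: largest_files /var/nagiosramdisk 5
```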
How many npcd's are running:

Code: Select all

ps -aux | grep -Ei 'npcd'
If you find more than one, please kill the extra processes, then restart npcd and verify its status.

Code: Select all

service npcd restart
service npcd status
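
Counting by eye works, but the grep process itself shows up in the ps output above. A small sketch that counts exact process names instead; count_named is a hypothetical helper name, and the example assumes a ps that supports -e and -o (as procps on Linux does).

```shell
# Hypothetical helper: count lines on stdin that exactly equal a process name.
# Feed it a list of command names, one per line.
count_named() {
    # -x matches whole lines only and -F disables regex handling, so "npcd"
    # does not also count lines such as "grep npcd".
    grep -cxF -- "$1" || true
}

# Example: ps -eo comm= | count_named npcd
```

A result greater than 1 would indicate duplicate npcd daemons.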
Review some log data:

Code: Select all

tail -50 /usr/local/nagios/var/perfdata.log
tail -50 /usr/local/nagios/var/npcd.log
Total hosts and services:

Code: Select all

/usr/local/nagios/bin/nagiostats|grep "Total Hosts"|awk '{ print $3 }';/usr/local/nagios/bin/nagiostats|grep "Total Services"|awk '{ print $3 }'
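
The same two numbers can be pulled from a single nagiostats run. This is a sketch that assumes the output contains lines of the form "Total Hosts: N" and "Total Services: N", which is what the grep patterns above rely on; parse_totals is a hypothetical helper name.

```shell
# Hypothetical helper: read nagiostats output on stdin and print "<hosts> <services>".
parse_totals() {
    awk -F': *' '
        $1 == "Total Hosts"    { hosts = $2 }
        $1 == "Total Services" { services = $2 }
        # Adding 0 forces a numeric result, so missing lines print as 0.
        END { print hosts + 0, services + 0 }
    '
}

# Example: /usr/local/nagios/bin/nagiostats | parse_totals
```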
Make sure that the built-in nagios user has an active account:

Code: Select all

chage -I -1 -m 0 -M 99999 -E -1 nagios
You can append >> /tmp/somefile.txt to any of these commands to save the results to a file that you can send to us.

Please let us know the results,
Perry
WillH
Posts: 54
Joined: Mon Aug 03, 2020 10:37 am

Re: Should apply config reset all service checks?

Post by WillH »

npcd:
nagios 3962 0.0 0.0 8648 720 ? S Sep01 0:07 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
root 117332 0.0 0.0 112816 992 pts/0 S+ 14:08 0:00 grep --color=auto -Ei npcd

Ramdisk answers:
/var/nagiosramdisk:
total 2097156
drwxrwxrwt 4 nagios nagios 240 Sep 8 14:07 .
drwxr-xr-x. 26 root root 4096 Nov 17 2020 ..
-rw-r--r-- 1 nagios nagios 46575616 Sep 8 14:07 host-perfdata
-rw------- 1 nagios nagios 53248 Sep 1 15:16 nagios.tmp40vcXN
-rw------- 1 nagios nagios 0 Sep 4 23:02 nagios.tmp9Mjjaa
-rw------- 1 nagios nagios 0 Sep 4 23:00 nagios.tmpo9AI3P
-rw------- 1 nagios nagios 73728 Sep 2 06:10 nagios.tmpPhpAMU
-rw-r--r-- 1 nagios nagios 13462438 Sep 8 08:08 objects.cache
-rw-r--r-- 1 nagios nagios 2087317504 Sep 8 14:07 service-perfdata
drwxrwxr-x 5 nagios nagios 100 Sep 1 10:14 spool
-rw-rw-r-- 1 nagios nagios 0 Sep 8 14:07 status.dat
drwxrwxr-x 2 nagios nagios 40 Sep 8 08:22 tmp

/var/nagiosramdisk/spool:
total 0
drwxrwxr-x 5 nagios nagios 100 Sep 1 10:14 .
drwxrwxrwt 4 nagios nagios 240 Sep 8 14:07 ..
drwxrwxr-x 2 nagios nagios 40 Sep 8 08:22 checkresults
drwxrwxr-x 2 nagios nagios 40 Sep 1 10:14 perfdata
drwxrwxr-x 2 nagios nagios 40 Sep 1 10:14 xidpe

/var/nagiosramdisk/spool/checkresults:
total 0
drwxrwxr-x 2 nagios nagios 40 Sep 8 08:22 .
drwxrwxr-x 5 nagios nagios 100 Sep 1 10:14 ..

/var/nagiosramdisk/spool/perfdata:
total 0
drwxrwxr-x 2 nagios nagios 40 Sep 1 10:14 .
drwxrwxr-x 5 nagios nagios 100 Sep 1 10:14 ..

/var/nagiosramdisk/spool/xidpe:
total 0
drwxrwxr-x 2 nagios nagios 40 Sep 1 10:14 .
drwxrwxr-x 5 nagios nagios 100 Sep 1 10:14 ..

/var/nagiosramdisk/tmp:
total 0
drwxrwxr-x 2 nagios nagios 40 Sep 8 08:22 .
drwxrwxrwt 4 nagios nagios 240 Sep 8 14:07 ..
0 /var/nagiosramdisk/spool/perfdata
0 /var/nagiosramdisk/spool/xidpe
0 /var/nagiosramdisk/spool/checkresults
0 /var/nagiosramdisk/spool
0 /var/nagiosramdisk/tmp
2.0G /var/nagiosramdisk

Perfdata tail
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME / Solaris_-_Memory_Paging__Read_-_Write ('MemoryPagingRead'=0;300;; 'MemoryPagingWrite'=1;300;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME- / Linux_-_Physical_Memory__Memory_Utilization_and_PageScan_and_SwapOut ('MemUtil'=3.4;0;95;; 'MemPageOutRate'=374.05;0;500;; 'MemPageInRate'=0.00;100;20000;; 'MemSwapoutByteRate'=0;0;20;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME--02 / Linux_-_Interface_Errors_eth0 ('ErrorIn'=0.0;1;; 'ErrorOut'=0.0;1;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME-004 / Windows_-_C___Pct_Used ('used'=29.00GiB;85;90; 'free'=70.48GiB;85;90; 'total'=99.49GiB;85;90;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME--01g / Solaris_-_Filesystem_Utilization__var_share_kvol ('DiskSpaceUtil'=0.0%;80;95;; 'DiskFreeSpace'=26745.82MB;200;64;; 'DiskInodeUtil'=10%;80;95;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME-003 / Windows_-_CPU_Total_Avg_Utilization ('percent'=0.00%;;98;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME / Linux_-_Interface_Errors_eth10 ('ErrorIn'=0.0;1;; 'ErrorOut'=0.0;1;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME-013 / Windows_-_CPU_Diagnostics___CPU_Utilization_and_ProcQ-CPU ('Processor Queue Length'=0.0;; 'CPU Count'=12.0;; 'CPU Avg Pct Used'=8.58;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME--04 / Solaris_-_Interface_Errors_usbecm0 ('ErrorIn'=0.0;1;; 'ErrorOut'=0.0;1;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME--02 / Solaris_-_Interface_Errors_ipmp0 ('ErrorIn'=0.0;1;; 'ErrorOut'=0.0;1;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME--63 / Solaris_-_Total_CPU_Util ('percent'=0.38%;;95;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME--22 / Solaris_-_Physical_Memory__Errors_exceeded_Levels_Log_Message (lines=0)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME-001 / Windows_-_Memory_PageWrites_or_PageReads ('MemoryPagingRead'=0.0;;10000; 'MemoryPagingWrite'=0.0;;400;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME--01 / Solaris_-_Interface_Errors_net2 ('ErrorIn'=0.0;1;; 'ErrorOut'=0.0;1;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME--01g / Solaris_-_Physical_Memory__Memory_Utilization_and_PageScan_and_SwapOut ('MemUtil'=49.4%;0;95;; 'MemPageOutRate'=0pgpgout/s;0;500;; 'MemPageInRate'=0pgpgin/s;100;20000;; 'MemSwapoutByteRate'=0disk/s;0;20;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME-017 / Windows_-_Memory_PageWrites_or_PageReads ('MemoryPagingRead'=0.0;;10000; 'MemoryPagingWrite'=0.0;;400;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME--01 / Windows_-_C___Pct_Used ('used'=29.02GiB;68;72; 'free'=50.61GiB;68;72; 'total'=79.64GiB;68;72;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME-015 / Windows_-_CPU_Total_Avg_Utilization ('percent'=13.28%;;98;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME-007 / Windows_-_Unexpected_Restart__Uptime ('uptime'=225278.44s;600:;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME-043 / Windows_-_Unexpected_Restart__Uptime ('uptime'=449065.14s;600:;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME--01v / Linux_-_Filesystem_Utilization__usr ('DiskSpaceUtil'=33.2%;80;95;; 'DiskFreeSpace'=2590.68MB;200;64;; 'DiskInodeUtil'=10%;80;95;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME--07 / Linux_-_Interface_Collisions_eth5 ('nic_collision's=0;;0;;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] Found Performance Data for HOSTNAME--01 / Drive_Windows_-_E___Pct_Used ('used'=255.00GiB;255;270; 'free'=44.92GiB;255;270; 'total'=299.92GiB;255;270;)
2021-08-24 14:45:41 [88017] [1] rrdtool update returns 0
2021-08-24 14:45:41 [88017] [1] 571 lines processed
2021-08-24 14:45:41 [88017] [1] /var/nagiosramdisk/spool/perfdata//1629834327.perfdata.service-PID-88017 deleted
2021-08-24 14:45:41 [88017] [1] PNP exiting (runtime 1.801666s) ...

npcd log tail
[09-08-2021 14:13:48] NPCD: ThreadCounter 0/5 File is ..
[09-08-2021 14:13:48] NPCD: No more files to process... waiting for 15 seconds
[09-08-2021 14:14:03] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[09-08-2021 14:14:03] NPCD: DEBUG: load 3.150000/999.000000
[09-08-2021 14:14:03] NPCD: ThreadCounter 0/5 File is .
[09-08-2021 14:14:03] NPCD: DEBUG: load 3.150000/999.000000
[09-08-2021 14:14:03] NPCD: ThreadCounter 0/5 File is ..
[09-08-2021 14:14:03] NPCD: No more files to process... waiting for 15 seconds
[09-08-2021 14:14:18] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[09-08-2021 14:14:18] NPCD: DEBUG: load 2.690000/999.000000
[09-08-2021 14:14:18] NPCD: ThreadCounter 0/5 File is .
[09-08-2021 14:14:18] NPCD: DEBUG: load 2.690000/999.000000
[09-08-2021 14:14:18] NPCD: ThreadCounter 0/5 File is ..
[09-08-2021 14:14:18] NPCD: No more files to process... waiting for 15 seconds
[09-08-2021 14:14:33] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[09-08-2021 14:14:33] NPCD: DEBUG: load 2.700000/999.000000
[09-08-2021 14:14:33] NPCD: ThreadCounter 0/5 File is .
[09-08-2021 14:14:33] NPCD: DEBUG: load 2.700000/999.000000
[09-08-2021 14:14:33] NPCD: ThreadCounter 0/5 File is ..
[09-08-2021 14:14:33] NPCD: No more files to process... waiting for 15 seconds
[09-08-2021 14:14:48] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[09-08-2021 14:14:48] NPCD: DEBUG: load 2.910000/999.000000
[09-08-2021 14:14:48] NPCD: ThreadCounter 0/5 File is .
[09-08-2021 14:14:48] NPCD: DEBUG: load 2.910000/999.000000
[09-08-2021 14:14:48] NPCD: ThreadCounter 0/5 File is ..
[09-08-2021 14:14:48] NPCD: No more files to process... waiting for 15 seconds
[09-08-2021 14:15:03] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[09-08-2021 14:15:03] NPCD: DEBUG: load 4.090000/999.000000
[09-08-2021 14:15:03] NPCD: ThreadCounter 0/5 File is .
[09-08-2021 14:15:03] NPCD: DEBUG: load 4.090000/999.000000
[09-08-2021 14:15:03] NPCD: ThreadCounter 0/5 File is ..
[09-08-2021 14:15:03] NPCD: No more files to process... waiting for 15 seconds
[09-08-2021 14:15:18] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[09-08-2021 14:15:18] NPCD: DEBUG: load 3.690000/999.000000
[09-08-2021 14:15:18] NPCD: ThreadCounter 0/5 File is .
[09-08-2021 14:15:18] NPCD: DEBUG: load 3.690000/999.000000
[09-08-2021 14:15:18] NPCD: ThreadCounter 0/5 File is ..
[09-08-2021 14:15:18] NPCD: No more files to process... waiting for 15 seconds
[09-08-2021 14:15:33] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[09-08-2021 14:15:33] NPCD: DEBUG: load 3.560000/999.000000
[09-08-2021 14:15:33] NPCD: ThreadCounter 0/5 File is .
[09-08-2021 14:15:33] NPCD: DEBUG: load 3.560000/999.000000
[09-08-2021 14:15:33] NPCD: ThreadCounter 0/5 File is ..
[09-08-2021 14:15:33] NPCD: No more files to process... waiting for 15 seconds
[09-08-2021 14:15:48] NPCD: Found 2 files in /var/nagiosramdisk/spool/perfdata/
[09-08-2021 14:15:48] NPCD: DEBUG: load 3.380000/999.000000
[09-08-2021 14:15:48] NPCD: ThreadCounter 0/5 File is .
[09-08-2021 14:15:48] NPCD: DEBUG: load 3.380000/999.000000
[09-08-2021 14:15:48] NPCD: ThreadCounter 0/5 File is ..
[09-08-2021 14:15:48] NPCD: No more files to process... waiting for 15 seconds

total # of hosts and services, grep command comes back 0/0
GUI tells me 388 hosts and 13528 services
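
One possible explanation for the 0/0 result, offered as an assumption rather than a diagnosis: nagiostats takes its numbers from the status file, and the listing above shows status.dat at 0 bytes. A minimal check, with status_file_ok as a hypothetical helper name:

```shell
# Hypothetical helper: report whether a status file exists and is non-empty.
# The default path is taken from the directory listing in this thread.
status_file_ok() {
    f="${1:-/var/nagiosramdisk/status.dat}"
    if [ -s "$f" ]; then
        echo "ok: $f has data"
    else
        echo "empty or missing: $f"
        return 1
    fi
}

# Example: status_file_ok /var/nagiosramdisk/status.dat
```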

no results from that chage line given, but chage -l returns

Last password change : Jul 08, 2021
Password expires : never
Password inactive : never
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : 99999
Number of days of warning before password expires : 21
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Should apply config reset all service checks?

Post by pbroste »

Hello @WillH

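The listing reports 'total 2097156' for '/var/nagiosramdisk'. That figure is the space used in 1K blocks (about 2.0G) rather than a file count, and nearly all of it belongs to the service-perfdata file. Let's go ahead and remove the contents and start up the ncpd service.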
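
As a sanity check on the 2097156 figure, the arithmetic below converts it to MiB. It assumes the value is a count of 1024-byte blocks, which is how GNU ls reports the "total" line by default; blocks_to_mib is a hypothetical helper name.

```shell
# Hypothetical helper: convert a count of 1024-byte blocks to whole MiB.
blocks_to_mib() {
    echo $(( $1 / 1024 ))
}

blocks_to_mib 2097156   # prints 2048, i.e. about 2.0G
```

That matches the 2.0G reported for the tmpfs mount, and the 2087317504-byte service-perfdata file alone accounts for roughly 1990 MiB of it.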

By removing the contents of '/var/nagiosramdisk' we will be purging the performance data used for graphs:

First; we stop the ncpd.service:

Code: Select all

systemctl stop npcd
Secondly; remove perfdata:

Code: Select all

find /var/nagiosramdisk/ -type f -name '*' -exec rm -r {} \;
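
Before deleting, it may be worth previewing what the find command would match. A minimal sketch that lists files by default and only deletes when asked; purge_files and its --apply argument are hypothetical names for illustration.

```shell
# Hypothetical helper: list regular files under a directory, deleting them
# only when the second argument is "--apply". Directories are left in place.
purge_files() {
    dir="$1"
    mode="${2:-}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    if [ "$mode" = "--apply" ]; then
        find "$dir" -type f -exec rm -f {} +
    else
        # Dry run: only print what would be removed.
        find "$dir" -type f -print
    fi
}

# Example: purge_files /var/nagiosramdisk           (preview)
#          purge_files /var/nagiosramdisk --apply   (delete)
```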
Then start the ncpd.service:

Code: Select all

systemctl start npcd
Please let it run for a while and watch the file counts:

Code: Select all

watch -n 1 'ls /var/nagiosramdisk/spool/xidpe | wc -l'
watch -n 1 'ls /var/nagiosramdisk/spool/perfdata | wc -l'
Follow up with the updated System Profile.
  • Login to the Nagios XI GUI using a web browser.
  • Click the "Admin" > "System Profile" Menu
  • Click the "Download Profile" button
  • Save the profile.zip file and share in a private message or upload it to the post/ticket
Thanks,
Perry


WillH
Posts: 54
Joined: Mon Aug 03, 2020 10:37 am

Re: Should apply config reset all service checks?

Post by WillH »

Perry,
do you mean the npcd service?