XI System Component Status & Nagiosxi issues

anil406
Posts: 43
Joined: Tue Apr 01, 2014 3:53 pm

XI System Component Status & Nagiosxi issues

Post by anil406 »

Hello Support,
After upgrading Nagios XI to the latest version, I am seeing the System Component Status issues below. The nagios service status also reports that it is not running, although it actually is running. In the web interface, some hosts open the Host Status Detail page when clicked from the dashboard, while others take me to a blank page.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: XI System Component Status & Nagiosxi issues

Post by tmcdonald »

Let's make sure all of your critical services are running:

Code:

service crond status
service nagios status
service mysqld status
service ndo2db status
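The same four checks can be run as one loop. This is a minimal sketch, assuming the SysV-style `service` wrapper used on this RHEL/CentOS host; `report` and `check_xi_services` are hypothetical helper names. It prints the exit code of each status command, because the init scripts in this thread signal "not running" through a nonzero exit.

```shell
#!/bin/bash
# Hypothetical helper: run a status command quietly and report the result.
# Usage: report NAME CMD [ARGS...]
report() {
    local name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "$name: running"
    else
        local rc=$?   # exit status of the status command that just failed
        echo "$name: NOT running (exit $rc)"
    fi
}

# Check the daemons Nagios XI depends on (assumes the SysV `service` wrapper).
check_xi_services() {
    local svc
    for svc in crond nagios mysqld ndo2db; do
        report "$svc" service "$svc" status
    done
}

check_xi_services
```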
Former Nagios employee
anil406
Posts: 43
Joined: Tue Apr 01, 2014 3:53 pm

Re: XI System Component Status & Nagiosxi issues

Post by anil406 »

Thanks. crond and mysqld are running, but nagios and ndo2db are reported as not running:
[root@nagios01 var]# service nagios status
nagios is not running
---------------------------
The status check says Nagios is not running, yet the process has started:
nagios 4666 1 3 10:41 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

[root@nagios01 var]# service ndo2db status
ndo2db is not running but subsystem locked
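The mismatch above suggests the status check is not looking at the process table directly. As a cross-check that ignores lock and PID files entirely, here is a minimal sketch that asks the kernel whether a process with a given command name exists, by reading `/proc/<pid>/comm`. It is Linux-only, and `proc_running` is a hypothetical helper, not part of Nagios.

```shell
#!/bin/bash
# Hypothetical helper: succeed if any running process has the command name $1.
# It reads /proc/<pid>/comm directly, so it ignores lock/PID files entirely.
# Linux-only; the kernel truncates comm to 15 characters.
proc_running() {
    local want=$1 comm f
    for f in /proc/[0-9]*/comm; do
        # A process can exit between the glob and the read, so skip read errors.
        { read -r comm < "$f"; } 2>/dev/null || continue
        if [ "$comm" = "$want" ]; then
            return 0
        fi
    done
    return 1
}

# Compare with what the init scripts report for the two daemons in question.
for name in nagios ndo2db; do
    if proc_running "$name"; then
        echo "$name: process found"
    else
        echo "$name: no process"
    fi
done
```

If this reports a process while `service ... status` says "not running", the problem is in how the status check finds the PID, not in the daemon itself.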
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: XI System Component Status & Nagiosxi issues

Post by lmiltchev »

Run the following commands and show the output:

Code:

tail -20 /var/log/messages
/usr/local/nagios/bin/nagios | head -2
/usr/local/nagios/bin/ndo2db | head -2
Be sure to check out our Knowledgebase for helpful articles and solutions!
anil406
Posts: 43
Joined: Tue Apr 01, 2014 3:53 pm

Re: XI System Component Status & Nagiosxi issues

Post by anil406 »

Maybe I should have posted this here, as this is the right thread for the Nagios XI issue. Anyway, here it is. The log shows ndomod is unable to connect to the data sink because ndo2db is not running. I tried to start ndo2db, but it would not start because the subsystem was locked, so I removed the lock file and tried again; it still won't start.

[root@nagios01 init.d]# tail -20 /var/log/messages
Jul 31 15:15:23 nagios01 nagios: ndomod: Still unable to connect to data sink. 490948 items lost, 5000 queued items to flush.
Jul 31 15:21:12 nagios01 nagios: SERVICE ALERT: sprucegen016.load.ssn-silver.cb;TestTalk;CRITICAL;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
Jul 31 15:22:06 nagios01 nagios: SERVICE ALERT: sprucegen016.load.ssn-silver.cb;TestTalk;OK;SOFT;2;OK
Jul 31 15:30:35 nagios01 nagios: ndomod: Still unable to connect to data sink. 518028 items lost, 5000 queued items to flush.
Jul 31 15:41:42 nagios01 nagios: Auto-save of retention data completed successfully.
Jul 31 15:45:47 nagios01 nagios: ndomod: Still unable to connect to data sink. 545139 items lost, 5000 queued items to flush.
Jul 31 15:52:12 nagios01 nagios: SERVICE ALERT: sprucegen016.load.ssn-silver.cb;TestTalk;CRITICAL;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
Jul 31 15:53:08 nagios01 nagios: SERVICE ALERT: sprucegen016.load.ssn-silver.cb;TestTalk;OK;SOFT;2;OK
Jul 31 15:53:54 nagios01 nagios: HOST NOTIFICATION: nagiosadmin;container74-web-02.spruce.cb;DOWN;xi_host_notification_handler;check_icmp: Failed to resolve address
Jul 31 15:56:10 nagios01 nagios: SERVICE ALERT: soa-02.spruce.cb;/var usage;CRITICAL;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
Jul 31 15:57:00 nagios01 nagios: SERVICE ALERT: soa-02.spruce.cb;/var usage;OK;SOFT;2;DISK OK - free space: /var 2629 MB (69% inode=99%):
Jul 31 15:58:24 nagios01 nagios: HOST NOTIFICATION: nagiosadmin;container74-web-01.spruce.cb;DOWN;xi_host_notification_handler;check_icmp: Failed to resolve address
Jul 31 16:00:36 nagios01 nagios: SERVICE NOTIFICATION: nagiosadmin;cid1-instance.spruce.cb;Total Processes;WARNING;xi_service_notification_handler;PROCS WARNING: 162 processes
Jul 31 16:00:59 nagios01 nagios: ndomod: Still unable to connect to data sink. 572248 items lost, 5000 queued items to flush.
Jul 31 16:13:31 nagios01 nagios: ndomod: Successfully connected to data sink. 594224 items lost, 5000 queued items to flush.
Jul 31 16:13:31 nagios01 nagios: ndomod: Successfully flushed 5000 queued items to data sink.
Jul 31 16:20:12 nagios01 nagios: SERVICE ALERT: sprucegen036.load.ssn-silver.cb;TestTalk;CRITICAL;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
Jul 31 16:21:07 nagios01 nagios: SERVICE ALERT: sprucegen036.load.ssn-silver.cb;TestTalk;OK;SOFT;2;OK
Jul 31 16:27:11 nagios01 nagios: SERVICE ALERT: sprucegen001.load.ssn-silver.cb;Current Load;CRITICAL;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
Jul 31 16:28:07 nagios01 nagios: SERVICE ALERT: sprucegen001.load.ssn-silver.cb;Current Load;OK;SOFT;2;OK - load average: 0.00, 0.00, 0.00

[root@nagios01 init.d]#/usr/local/nagios/bin/nagios | head -2;/usr/local/nagios/bin/ndo2db |head -2

Nagios Core 4.0.7

NDO2DB 2.0.0
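The ndomod lines in the log above carry a cumulative "items lost" counter, which shows how long the broker module was cut off from ndo2db and how much data was dropped. A minimal sketch that pulls just those lines out of a syslog file, assuming the `Mon DD HH:MM:SS host` prefix shown above; `ndomod_summary` is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: summarise ndomod "items lost" messages from a syslog file.
# Output columns: month, day, time, connection state, cumulative items lost.
ndomod_summary() {
    local log=${1:-/var/log/messages}
    awk '/ndomod:.*items lost/ {
        state = "OTHER"
        if ($0 ~ /unable to connect/) state = "DISCONNECTED"
        else if ($0 ~ /Successfully connected/) state = "CONNECTED"
        lost = "?"
        # The counter is the field immediately before "items lost,".
        for (i = 1; i + 2 <= NF; i++)
            if ($(i + 1) == "items" && $(i + 2) == "lost,") lost = $i
        print $1, $2, $3, state, lost
    }' "$log"
}
```

On the XI server this would be run as `ndomod_summary /var/log/messages`; a steadily growing counter with no CONNECTED line means ndo2db is still unreachable.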
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: XI System Component Status & Nagiosxi issues

Post by scottwilkerson »

Can you post the output of the following:

Code:

chage -l nagios
tail /var/log/cron
Thanks
Former Nagios employee
anil406
Posts: 43
Joined: Tue Apr 01, 2014 3:53 pm

Re: XI System Component Status & Nagiosxi issues

Post by anil406 »

Scott, we are using an LDAP ID (nagios). Here is the tail of the cron log:
Aug 1 12:36:01 nagios01 CROND[29101]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)
Aug 1 12:36:01 nagios01 CROND[29104]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)
Aug 1 12:37:01 nagios01 CROND[29235]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)
Aug 1 12:37:01 nagios01 CROND[29236]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)
Aug 1 12:37:01 nagios01 CROND[29237]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)
Aug 1 12:37:01 nagios01 CROND[29242]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1)
Aug 1 12:37:01 nagios01 CROND[29244]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Aug 1 12:37:01 nagios01 CROND[29240]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)
Aug 1 12:37:01 nagios01 CROND[29243]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)
Aug 1 12:37:01 nagios01 CROND[29245]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
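The cron log above shows the XI jobs firing every minute as the nagios user. A minimal sketch for tallying which scripts ran in a given log excerpt, so a job that stopped running stands out. It assumes the `CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/NAME.php ...)` format shown above; `xi_cron_runs` is a hypothetical helper. Treat the following as an assumption rather than confirmed behavior: `sysstat.php` appears to be the job that gathers the component status shown on the System Status page, so it is the one to watch here.

```shell
#!/bin/bash
# Hypothetical helper: count how many times each Nagios XI cron script started.
# Assumes lines like: ... CROND[pid]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/NAME.php > ...)
# Prints "NAME.php COUNT", sorted by script name.
xi_cron_runs() {
    local log=${1:-/var/log/cron}
    grep -o '/usr/local/nagiosxi/cron/[A-Za-z0-9_]*\.php' "$log" |
        sed 's#.*/##' |
        sort |
        uniq -c |
        awk '{print $2, $1}'
}
```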
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: XI System Component Status & Nagiosxi issues

Post by lmiltchev »

Can you post the nagios.cfg file?
anil406
Posts: 43
Joined: Tue Apr 01, 2014 3:53 pm

Re: XI System Component Status & Nagiosxi issues

Post by anil406 »

[root@nagios01 etc]# ls /usr/local/nagios/bin/
file2sock   log2ndo     nagios      nagiostats  ndo2db      ndomod.o    npcd        npcdmod.o   nrpe        nsca        sockdebug
[root@nagios01 etc]# more nagios.cfg
# MODIFIED
admin_email=root@localhost
admin_pager=root@localhost
translate_passive_host_checks=1
log_event_handlers=0
use_large_installation_tweaks=1
enable_environment_macros=0


# NDOUtils module
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg


# PNP settings - bulk mode with NCPD
process_performance_data=1
# service performance data
service_perfdata_file=/usr/local/nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
# host performance data
host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk


# OBJECTS - UNMODIFIED
#cfg_file=/usr/local/nagios/etc/objects/commands.cfg
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg


# STATIC OBJECT DEFINITIONS (THESE DON'T GET EXPORTED/IMPORTED BY NAGIOSQL)
cfg_dir=/usr/local/nagios/etc/static

# OBJECTS EXPORTED FROM NAGIOSQL
cfg_file=/usr/local/nagios/etc/contacttemplates.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/hosttemplates.cfg
cfg_file=/usr/local/nagios/etc/servicetemplates.cfg
cfg_file=/usr/local/nagios/etc/servicedependencies.cfg
cfg_file=/usr/local/nagios/etc/serviceescalations.cfg
cfg_file=/usr/local/nagios/etc/hostdependencies.cfg
cfg_file=/usr/local/nagios/etc/hostescalations.cfg
cfg_file=/usr/local/nagios/etc/hostextinfo.cfg
cfg_file=/usr/local/nagios/etc/serviceextinfo.cfg
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services

# GLOBAL EVENT HANDLERS
global_host_event_handler=xi_host_event_handler
global_service_event_handler=xi_service_event_handler



# UNMODIFIED
accept_passive_host_checks=1
accept_passive_service_checks=1
additional_freshness_latency=15
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
bare_update_check=0
cached_host_check_horizon=15
cached_service_check_horizon=15
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_for_updates=1
check_host_freshness=0
check_result_path=/usr/local/nagios/var/spool/checkresults
check_result_reaper_frequency=10
check_service_freshness=1
command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/usr/local/nagios/var/nagios.debug
debug_level=0
debug_verbosity=1
enable_embedded_perl=1
enable_event_handlers=1
enable_flap_detection=1
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=-1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
external_command_buffer_slots=4096
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_freshness_check_interval=60
host_inter_check_delay_method=s
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_external_commands=0
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=0
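Since this nagios.cfg is the file the daemon loads, one quick sanity check after an upgrade is that every object file and directory it references still exists. A minimal sketch, assuming the plain `key=value` layout shown above; `missing_cfg_paths` is a hypothetical helper, and it only checks paths, not object syntax.

```shell
#!/bin/bash
# Hypothetical helper: report cfg_file/cfg_dir entries that do not exist on disk.
# Commented-out lines (e.g. "#cfg_file=...") do not match and are skipped.
missing_cfg_paths() {
    local cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    local key path
    # Split each line at the first "="; also handle a last line without newline.
    while IFS='=' read -r key path || [ -n "$key" ]; do
        case $key in
            cfg_file) [ -f "$path" ] || echo "missing file: $path" ;;
            cfg_dir)  [ -d "$path" ] || echo "missing dir: $path" ;;
        esac
    done < "$cfg"
}
```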
anil406
Posts: 43
Joined: Tue Apr 01, 2014 3:53 pm

Re: XI System Component Status & Nagiosxi issues

Post by anil406 »

Hello lmiltchev,
I think the ndo2db binary isn't writing its PID to the ndo2db lock file: as you can see, the lock files are empty even though the service is up. The same happens with nagios. Here is what I found:

[root@nagios01 bin]# bash -x /etc/init.d/ndo2db status
+ '[' -f /etc/rc.d/init.d/functions ']'
+ . /etc/rc.d/init.d/functions
++ TEXTDOMAIN=initscripts
++ umask 022
++ PATH=/sbin:/usr/sbin:/bin:/usr/bin
++ export PATH
++ '[' -z '' ']'
++ COLUMNS=80
++ '[' -z '' ']'
+++ /sbin/consoletype
++ CONSOLETYPE=pty
++ '[' -f /etc/sysconfig/i18n -a -z '' -a -z '' ']'
++ . /etc/profile.d/lang.sh
++ unset LANGSH_SOURCED
++ '[' -z '' ']'
++ '[' -f /etc/sysconfig/init ']'
++ . /etc/sysconfig/init
+++ BOOTUP=color
+++ RES_COL=60
+++ MOVE_TO_COL='echo -en \033[60G'
+++ SETCOLOR_SUCCESS='echo -en \033[0;32m'
+++ SETCOLOR_FAILURE='echo -en \033[0;31m'
+++ SETCOLOR_WARNING='echo -en \033[0;33m'
+++ SETCOLOR_NORMAL='echo -en \033[0;39m'
+++ PROMPT=yes
+++ AUTOSWAP=no
+++ ACTIVE_CONSOLES='/dev/tty[1-6]'
+++ SINGLE=/sbin/sushell
++ '[' pty = serial ']'
++ __sed_discard_ignored_files='/\(~\|\.bak\|\.orig\|\.rpmnew\|\.rpmorig\|\.rpmsave\)$/d'
+ servicename=ndo2db
+ prefix=/usr/local/nagios
+ exec_prefix=/usr/local/nagios
+ Ndo2dbBin=/usr/local/nagios/bin/ndo2db
+ Ndo2dbCfgFile=/usr/local/nagios/etc/ndo2db.cfg
+ Ndo2dbVarDir=/usr/local/nagios/var
+ Ndo2dbRunFile=/usr/local/nagios/var/ndo2db.lock
+ Ndo2dbLockDir=/usr/local/nagiosxi/var/subsys
+ Ndo2dbLockFile=ndo2db
+ Ndo2dbUser=nagios
+ Ndo2dbGroup=nagios
+ '[' '!' -f /usr/local/nagios/bin/ndo2db ']'
+ '[' '!' -f /usr/local/nagios/etc/ndo2db.cfg ']'
+ case "$1" in
+ printstatus_ndo2db
+ status_ndo2db
+ pid_ndo2db
+ test '!' -f /usr/local/nagios/var/ndo2db.lock
++ head -n 1 /usr/local/nagios/var/ndo2db.lock
+ Ndo2dbPID=
+ return 0
+ ps -p
+ test -f /usr/local/nagiosxi/var/subsys/ndo2db
+ return 2
+ test 2 == 2
+ echo 'ndo2db is not running but subsystem locked'
ndo2db is not running but subsystem locked
+ exit 1
[root@nagios01 bin]# ps -ef | grep ndo2db
nagios 19643 1 0 14:39 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 19763 19643 0 14:39 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 19764 19763 1 14:39 ? 00:00:04 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
root 31771 635 0 14:45 pts/0 00:00:00 grep ndo2db
[root@nagios01 bin]# ls -l /usr/local/nagios/var/ndo2db.lock
-rw-r--r-- 1 nagios nagios 0 Aug 1 14:39 /usr/local/nagios/var/ndo2db.lock
[root@nagios01 bin]#
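The trace shows why the status check fails. `pid_ndo2db` reads the first line of `/usr/local/nagios/var/ndo2db.lock`, which is empty (0 bytes), so `ps -p` runs with no PID and fails. The script then finds `/usr/local/nagiosxi/var/subsys/ndo2db`, returns 2, and prints "not running but subsystem locked", even though three ndo2db processes are up. Below is a minimal sketch of that decision as a standalone function, reconstructed from the trace rather than copied from the real init script. `daemon_status` is a hypothetical name, and the explicit empty-PID guard is an addition.

```shell
#!/bin/bash
# Sketch of the status decision visible in the `bash -x` trace, with a guard
# for an empty or non-numeric PID. Paths are parameters so it can be tried safely.
#   return 0: running (PID from the run file is alive)
#   return 1: not running
#   return 2: not running but subsystem lock present
daemon_status() {
    local runfile=$1 subsys_lock=$2 pid=""
    if [ -f "$runfile" ]; then
        pid=$(head -n 1 "$runfile")
    fi
    # The traced script ran `ps -p` even when the PID was empty; treat an
    # empty or non-numeric value as "no PID" explicitly instead.
    case $pid in
        ''|*[!0-9]*) pid="" ;;
    esac
    # kill -0 only tests for existence; run as root or the daemon's user,
    # otherwise a permission error is indistinguishable from "not running".
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    if [ -f "$subsys_lock" ]; then
        return 2
    fi
    return 1
}
```

With the paths from the trace, `daemon_status /usr/local/nagios/var/ndo2db.lock /usr/local/nagiosxi/var/subsys/ndo2db; echo $?` would print 2 while the lock file stays empty, matching the message above. Removing the subsys lock only turns that into 1; the status stays wrong until the daemon's PID ends up in the run file.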