Issues after Nagios XI 5.5 upgrade
Posted: Fri Jul 13, 2018 12:55 am
by sitaonair
Hi,
After a successful upgrade to Nagios XI 5.5, we noticed that the host server was running out of disk space. On checking, we found several very large files in /usr/local/nagios/var: a couple of nagios.tmp files, retention.dat, and status.dat.
We also observed that the ndo2db process was consuming a lot of CPU resources, and in the Nagios XI GUI all checks were in a pending state.
As we were not able to resolve the issue, we had to roll back to the previous version (5.4.11), and all is working fine now. Was anything missed in the upgrade that could have caused these issues?
Thanks
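For anyone hitting the same symptom, here is a minimal sketch for confirming which files are filling a directory such as /usr/local/nagios/var. The function name largest_files is hypothetical and not part of Nagios XI. It assumes a POSIX shell with find, du, sort and head available.

```shell
# List the largest regular files under a directory, biggest first.
# Usage: largest_files DIR [COUNT]   (COUNT defaults to 10)
largest_files() {
    dir="$1"
    count="${2:-10}"
    # du -k reports sizes in 1 KiB units so the first column sorts numerically
    find "$dir" -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, `largest_files /usr/local/nagios/var 5` would print the five largest files there, one per line, as size in KiB followed by the path.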
Re: Issues after Nagios XI 5.5 upgrade
Posted: Fri Jul 13, 2018 9:57 am
by tmcdonald
That all sounds like there were multiple nagios processes running which were competing for resources. The steps in this guide should fix the issue if that is what it is:
https://support.nagios.com/kb/article/n ... ts-27.html
The article was written for a similar problem but the same steps would be used to make sure only one process group is running.
If that turns out not to be the issue, would you be able to run the upgrade again and send me a system profile via PM?
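As a rough way to check for more than one process group, the sketch below counts top-level Nagios Core daemons in `ps` output. This is not taken from the linked article. It assumes each independent daemon is started with `-d` and has been re-parented to PID 1, while its helper and worker children have the daemon as their parent. The function name count_nagios_daemons is hypothetical.

```shell
# Count top-level Nagios Core daemons from `ps -eo pid,ppid,args` style input
# read on stdin. Assumption: an independent daemon runs ".../nagios -d ..."
# with parent PID 1; children of a daemon have a different parent PID.
count_nagios_daemons() {
    awk '$2 == 1 && $3 ~ /(^|\/)nagios$/ && $4 == "-d" { n++ } END { print n + 0 }'
}
```

It could be run as `ps -eo pid,ppid,args | count_nagios_daemons`. A result greater than 1 would suggest that more than one instance is running.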
Re: Issues after Nagios XI 5.5 upgrade
Posted: Mon Jul 16, 2018 3:53 am
by sitaonair
Hi, at the point of the issue, we did check and there was only 1 nagios process running. We also did a restart of the nagios process and a restart of the server itself when that did not work.
Re: Issues after Nagios XI 5.5 upgrade
Posted: Mon Jul 16, 2018 12:43 pm
by jomann
There could have been issues with the Nagios Core process and the upgrade itself. Core is updated in XI 5.5 and that can cause some weird issues. Do you have any 3rd party extensions for Core, or anything else you have set up on this box that interacts with Core (mod_gearman, ramdisk, etc.)?
Re: Issues after Nagios XI 5.5 upgrade
Posted: Wed Jul 18, 2018 6:47 am
by sitaonair
We do have a long list of plugins/scripts, but I do not see anything related to gearman/ramdisk. Could you kindly advise how I can check whether a script interacts with Core?
check_apt
check_asa_l2lvpn.pl
check_asterisk.pl
check_asterisk_sip_peers.sh
check_bgp.0.4.pl
check_bgp.sh
check_bgp_counters
check_bgp_neighbors.sh
check_bgpstate_custom
check_bl
check_bpi.php
check_breeze
check_by_ssh
check_cisco.pl
check_cisco_bgp.pl
check_cisco_firewall.sh
check_cisco_fru_module.pl
check_cisco_ip_sla.py
check_cisco_ipsla.pl
check_cisco_snmp.pl
check_clamd
check_cluster
check_cpu_stats.sh
check_dhcp
check_dig
check_dir
check_disk
check_disk_smb
check_dns
check_docker.sh
check_domain.php
check_dummy
check_em01.pl
check_email_delivery
check_email_delivery_epn
check_email_loop.pl
check_esx3.pl
check_file_age
check_flexlm
check_fortigate.pl
check_fortigate_vpn.pl
check_fping
check_ftp
check_ftp_fully
check_game
check_hpjd
check_http
check_http_proxy
check_icmp
check_ide_smart
check_ifoperstatnag
check_ifoperstatus
check_ifstatus
check_imap
check_imap_receive
check_imap_receive_epn
check_init_service
check_ircd
check_jabber
check_jbossAS.py
check_ldap
check_ldaps
check_load
check_loadmaster.pl
check_log
check_mailq
check_mongodb.py
check_mountpoints.sh
check_mrtg
check_mrtgtraf
check_mssql
check_mssql_database.py
check_mssql_server.py
check_multi
check_mysql
check_mysql_health
check_mysql_query
check_nagios
check_nagios_performance.php
check_nagioslogserver.php
check_nagiosxiserver.php
check_ncpa.py
check_netstat.pl
check_newrelic.pl
check_newrelic.pl.orig
check_nna.py
check_nntp
check_nntps
check_nrpe
check_nt
check_ntp
check_ntp_peer
check_ntp_time
check_nwstat
check_open_files.pl
check_oracle
check_ospf.0.1.pl
check_ospf_counters
check_ospf_counters_custom
check_overcr
check_pgsql
check_ping
check_pnp_rrds.pl
check_polling
check_pop
check_postgres.pl
check_postgres.pl.BakupFixingDbReplication
check_postgres_archive_ready
check_postgres_autovac_freeze
check_postgres_backends
check_postgres_bloat
check_postgres_checkpoint
check_postgres_cluster_id
check_postgres_commitratio
check_postgres_connection
check_postgres_custom_query
check_postgres_database_size
check_postgres_dbstats
check_postgres_disabled_triggers
check_postgres_disk_space
check_postgres_fsm_pages
check_postgres_fsm_relations
check_postgres_hitratio
check_postgres_hot_standby_delay
check_postgres_index_size
check_postgres_last_analyze
check_postgres_last_autoanalyze
check_postgres_last_autovacuum
check_postgres_last_vacuum
check_postgres_listener
check_postgres_locks
check_postgres_logfile
check_postgres_new_version_bc
check_postgres_new_version_box
check_postgres_new_version_cp
check_postgres_new_version_pg
check_postgres_new_version_tnm
check_postgres_pgagent_jobs
check_postgres_pgb_pool_cl_active
check_postgres_pgb_pool_cl_waiting
check_postgres_pgb_pool_maxwait
check_postgres_pgb_pool_sv_active
check_postgres_pgb_pool_sv_idle
check_postgres_pgb_pool_sv_login
check_postgres_pgb_pool_sv_tested
check_postgres_pgb_pool_sv_used
check_postgres_pgbouncer_backends
check_postgres_pgbouncer_checksum
check_postgres_prepared_txns
check_postgres_query_runtime
check_postgres_query_time
check_postgres_relation_size
check_postgres_replicate_row
check_postgres_same_schema
check_postgres_sequence
check_postgres_settings_checksum
check_postgres_slony_status
check_postgres_table_size
check_postgres_timesync
check_postgres_txn_idle
check_postgres_txn_time
check_postgres_txn_wraparound
check_postgres_version
check_postgres_wal_files
check_procs
check_puppet_agent
check_puppet_agent.sh
check_radius.py
check_radius_adv
check_real
check_rpc
check_rrdtraf
check_rrdtraf.php
check_sensors
check_services
check_simap
check_sip
check_smtp
check_smtp_send
check_smtp_send_epn
check_snmp
check_snmp_IBM_Bladecenter.pl
check_snmp_boostedge.pl
check_snmp_cisco_bgp.pl
check_snmp_cisco_ospf-neighbor.pl
check_snmp_cpfw.pl
check_snmp_css.pl
check_snmp_css_main.pl
check_snmp_env.pl
check_snmp_generic.pl
check_snmp_int.pl
check_snmp_linkproof_nhr.pl
check_snmp_load.pl
check_snmp_load_wizard.pl
check_snmp_mem.pl
check_snmp_nsbox.pl
check_snmp_process.pl
check_snmp_process_wizard.pl
check_snmp_storage.pl
check_snmp_storage_wizard.pl
check_snmp_vrrp.pl
check_snmp_win.pl
check_spop
check_ssh
check_ssh_expect.pl
check_ssh_expect.pl.org
check_ssmtp
check_supervisor
check_swap
check_tcp
check_tftp.sh
check_time
check_udp
check_ups
check_uptime
check_users
check_wave
check_webinject.sh
check_win_snmp_disk.pl
check_wmi_plus.conf
check_wmi_plus.ini
check_wmi_plus.pl
check_xisla.php
check_yum
cookies.txt
custom_check_mem
custom_check_procs
folder_watch.pl
html
nagisk.pl
negate
new_relic.pl
newrelic.py
process_perfdata.pl
send_nsca
test.py
urlize
utils.pm
utils.sh
Re: Issues after Nagios XI 5.5 upgrade
Posted: Wed Jul 18, 2018 11:45 am
by swolf
mod_gearman and the ramdisk tweak aren't plugins or utility scripts; rather, they're modifications you would make to the server if you needed greater performance in your environment. We usually advise customers to look into these when their environment has around 10k checks. I've linked documentation in case you're curious about them, but I'm guessing we don't need to worry about these.
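If you want to confirm this yourself, one rough check is to look at the main Nagios Core configuration file. Event broker modules such as mod_gearman are loaded through broker_module lines, and a ramdisk setup would normally show up as runtime paths pointing somewhere other than the usual var directory. The function name show_core_integrations is hypothetical, and which paths indicate a ramdisk depends on how it was set up.

```shell
# Print the nagios.cfg directives that load event broker modules or set
# the location of runtime files. Commented-out lines are skipped.
# Usage: show_core_integrations /path/to/nagios.cfg
show_core_integrations() {
    grep -E '^[[:space:]]*(broker_module|status_file|check_result_path|temp_path)=' "$1"
}
```

On a default install this could be run as `show_core_integrations /usr/local/nagios/etc/nagios.cfg`.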
The plugins themselves should update automatically to nagios-plugins 2.2.1 when you upgrade to XI 5.5, so I wouldn't expect these to be the issue.
When you upgraded, did you go to version 5.5.0 or to 5.5.1? Your first post came pretty close to the release of 5.5.1, and there was an issue with NDO in 5.5.0, so there's a chance your issue will be fixed simply by trying to upgrade again.
If you do have the same issues after upgrading again, try to retrieve the upgrade.log file and upload it here (or PM it to jomann or myself) before rolling back. Hopefully this will help us figure out what's going on. It might also be helpful if you could paste the terminal output of the following command:
/usr/local/nagios/bin/nagios --version
This will tell us whether Nagios Core updated successfully.
Re: Issues after Nagios XI 5.5 upgrade
Posted: Thu Jul 19, 2018 5:16 am
by sitaonair
Just did a check, we do not have gearman or ramdisk installed on the Nagios server.
My upgrade was to 5.5.0, and yes, I noticed 5.5.1 was released shortly after I had the failed upgrade and rolled back. Is there any documentation on the NDO issue? I could not locate it in the changelog.
I would need some justification for retrying a failed upgrade within such a short period of time. Thanks
Re: Issues after Nagios XI 5.5 upgrade
Posted: Thu Jul 19, 2018 8:08 am
by scottwilkerson
We released 5.5.1 so quickly specifically to resolve a bug in Core that caused down hosts and services to continue using their retry_interval, which is typically lower than the normal check_interval, so more checks ran than usual.
While we did not have any other reports of NDO having higher than normal usage, its usage would be slightly higher, and this may be more noticeable in a large environment.
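To illustrate the effect, here is a small worked example. The figures are hypothetical (100 down services, check_interval=5, retry_interval=1) and it assumes the default interval_length of 60 seconds, so intervals are in minutes.

```shell
# Checks executed per hour for COUNT objects checked every INTERVAL minutes
# (assumes the default interval_length of 60 seconds).
checks_per_hour() {
    count="$1"
    interval="$2"
    echo $(( count * 60 / interval ))
}

# Hypothetical figures: 100 down services, check_interval=5, retry_interval=1
expected=$(checks_per_hour 100 5)   # down objects rechecked at check_interval
buggy=$(checks_per_hour 100 1)      # down objects stuck on retry_interval
echo "expected: $expected checks/hour, with the bug: $buggy checks/hour"
```

With these numbers the same 100 down services go from 1200 to 6000 checks per hour, which is the kind of extra load that would also push more data through NDO.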