Nagios VM Crashes

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
ahoward12
Posts: 137
Joined: Thu Jan 05, 2017 10:24 am

Nagios VM Crashes

Post by ahoward12 »

Hey Gents, I have a question for you. My Nagios XI VM is randomly crashing and I can't find any correlation as to why. When I say crashing, it is completely unresponsive, not even pingable.

Here is the output from /var/log/secure

[root@NAGIOS log]# tail -1000 secure
May 27 10:53:03 NAGIOS sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
May 27 10:53:03 NAGIOS sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
May 27 10:53:03 NAGIOS sudo:   nagios : TTY=unknown ; PWD=/home/nagios ; USER=root ; COMMAND=/usr/local/nagiosxi/scripts/reset_config_perms.sh
May 29 10:06:40 NAGIOS sshd[1404]: Server listening on 0.0.0.0 port 22.
May 29 10:06:40 NAGIOS sshd[1404]: Server listening on :: port 22.
May 29 10:07:05 NAGIOS sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
May 29 10:07:05 NAGIOS sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
May 29 10:07:05 NAGIOS sudo:   nagios : TTY=unknown ; PWD=/home/nagios ; USER=root ; COMMAND=/usr/local/nagiosxi/scripts/reset_config_perms.sh
May 29 10:07:19 NAGIOS login: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
May 29 10:07:19 NAGIOS login: PAM adding faulty module: /lib64/security/pam_fprintd.so

As seen above, it looks like it crashed on the 27th. When I force a shutdown and power it back on, my database is corrupt and I have to run "repair_databases.sh" (the output is attached). There is nothing relevant in /var/log/messages, though I can provide the output if requested. This also happened about 4 days ago; I followed the same steps and it seemed fine afterwards, so I assumed it was just a fluke. However, it has happened again. I have also attached a system profile.

Nagios XI 5.4.11
CentOS 6.8
VMware
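
For anyone trying to narrow down when a freeze like this started, one rough approach is to look for long silences between consecutive log entries. The snippet below is only a sketch: `find_log_gaps` is a hypothetical helper name, it assumes GNU `date` and the traditional "Mon DD HH:MM:SS" syslog timestamp (no year), and the one-hour default threshold is arbitrary.

```shell
#!/bin/bash
# Report gaps between consecutive syslog entries that exceed a threshold.
# Assumptions: GNU date is available and timestamps look like "May 27 10:53:03".
find_log_gaps() {
    local threshold="${1:-3600}"   # seconds; hypothetical default of one hour
    local prev_epoch="" prev_stamp=""
    local mon day time rest stamp epoch
    while read -r mon day time rest; do
        # Skip blank or short lines that cannot carry a timestamp.
        [ -n "$time" ] || continue
        stamp="$mon $day $time"
        # Lines that do not start with a parseable timestamp are ignored.
        epoch=$(date -d "$stamp" +%s 2>/dev/null) || continue
        if [ -n "$prev_epoch" ] && [ $((epoch - prev_epoch)) -gt "$threshold" ]; then
            echo "gap: $prev_stamp -> $stamp"
        fi
        prev_epoch=$epoch
        prev_stamp=$stamp
    done
}

# Usage (reads the log on stdin):
#   find_log_gaps 3600 < /var/log/secure
```

A gap only shows that nothing was written to that file during the interval, so the freeze could have started any time after the last entry before the gap. Running the same check against /var/log/messages, which is usually busier, may narrow the window further.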
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios VM Crashes

Post by scottwilkerson »

What kind of VMware infrastructure is this running on? Could another VM possibly be using up all the disk? Is the datastore network-attached, which could cause the drive to disconnect?

Those are the things I can think of. If the VM loses connectivity to the disk mid-write or is ever forced off, it is typical to have to repair the DB because it is constantly writing data.

If it happens again it may be useful to get a full copy of /var/log/messages to see what happened just before it froze.
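
As a rough sketch of how to pull out just the part of the log that precedes the most recent boot, the helper below prints the N lines before the last line matching a marker pattern. `lines_before_last_match` is a hypothetical name, and the marker is an assumption: on many rsyslog systems a line containing `rsyslogd` and `start` is written when the daemon starts, but it also appears if rsyslog is merely restarted, so check what your own log shows at boot.

```shell
#!/bin/bash
# Print up to COUNT lines that come directly before the last line
# matching PATTERN in FILE. Returns 1 if the pattern never matches.
lines_before_last_match() {
    local pattern="$1" count="${2:-200}" file="$3"
    local last start
    # Line number of the final match (empty if there is none).
    last=$(grep -n -- "$pattern" "$file" | tail -n 1 | cut -d: -f1)
    [ -n "$last" ] || return 1
    # Nothing precedes a match on the first line.
    [ "$last" -gt 1 ] || return 0
    start=$(( last - count ))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    sed -n "${start},$((last - 1))p" "$file"
}

# Usage (the pattern is an assumed boot marker, adjust as needed):
#   lines_before_last_match 'rsyslogd.*start' 200 /var/log/messages > before_freeze.txt
```

Running it right after the forced power-on, before logs rotate, gives a file that covers the period leading up to the freeze and can be attached to the thread.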
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
ahoward12

Re: Nagios VM Crashes

Post by ahoward12 »

It's running on 6.5. The datastore it is sitting on has terabytes free, so that is not the issue. The Nagios VM itself has about 10 GB free, give or take. The datastore is network-attached, but there are no errors on other machines, the SAN, or the host. I did have a copy of /var/log/messages, but there was nothing relevant in it at all.

It's kind of an annoying situation. This scenario has happened before, maybe a year ago. I went to look up the post I am referring to, but since I let my support lapse I no longer have access to the Customer Forums. I find it a bit annoying that I cannot even look at my past threads. If I remember correctly, it had something to do with increasing some table size in the database, if that rings a bell?

Thanks
scottwilkerson

Re: Nagios VM Crashes

Post by scottwilkerson »

I wasn't talking about the size of the datastore but the possibility of the connection being interrupted.

I moved one of your previous threads to the general forum so you can see it.
https://support.nagios.com/forum/viewto ... =6&t=46377

In the only thread where I see you made a DB change, this was the change:

echo "use nagios;alter table nagios_logentries modify logentry_data varchar(4096) not null;" | mysql -pnagiosxi

Although this would never cause an XI server to completely lock up.