nagios tcp connection issue

benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

nagios tcp connection issue

Post by benhank »

My Nagios server suddenly will not accept HTTP connections =(
I can ping it, and my other Nagios server shows that it is up, but it is refusing connections.
I can even log on to it with PuTTY. It also stopped sending notifications when it went down.

So I did the following, and Nagios started flooding my users with alerts.

Code:


[root@lkennagiosp02 ~]# service nagios restart
Running configuration check...done.
Stopping nagios: No lock file found in /usr/local/nagios/var/nagios.lock
Starting nagios: done.
[root@lkennagiosp02 ~]# No lock file found in /usr/local/nagios/var/nagios.lock
-bash: No: command not found
[root@lkennagiosp02 ~]# service http restart
http: unrecognized service
[root@lkennagiosp02 ~]# service mysqld start
Starting mysqld:                                           [  OK  ]
[root@lkennagiosp02 ~]# service nagios stop
Stopping nagios: .done.

Then I did the following:

Code:

[root@lkennagiosp02 ~]# service nagios restart
Running configuration check...done.
Stopping nagios: No lock file found in /usr/local/nagios/var/nagios.lock
Starting nagios: done
[root@lkennagiosp02 ~]# nagios -v /etc/nagios/nagios.cfg

Nagios Core 3.4.1
Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 05-11-2012
License: GPL

Website: http://www.nagios.org
Reading configuration data...
Error: Cannot open main configuration file '/etc/nagios/nagios.cfg' for reading!
   Error processing main config file!



***> One or more problems was encountered while processing the config files...

     Check your configuration file(s) to ensure that they contain valid
     directives and data defintions.  If you are upgrading from a previous
     version of Nagios, you should be aware that some variables/definitions
     may have been removed or modified in this version.  Make sure to read
     the HTML documentation regarding the config files, as well as the
     'Whats New' section to find out what has changed.

[root@lkennagiosp02 ~]#  No lock file found in /usr/local/nagios/var/nagios.lock
-bash: No: command not found
[root@lkennagiosp02 ~]# tup
-bash: tup: command not found
[root@lkennagiosp02 ~]# top
top - 14:19:44 up 9 days, 21:09,  1 user,  load average: 6.83, 5.45, 3.83
Tasks: 758 total,   1 running, 757 sleeping,   0 stopped,   0 zombie
Cpu(s): 14.8%us,  2.4%sy,  0.0%ni, 82.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16296880k total, 13361724k used,  2935156k free,   317120k buffers
Swap: 18530296k total,        0k used, 18530296k free, 11772916k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1223 nagios    20   0  148m  13m 2260 S 11.8  0.1   0:00.36 check_wmi_plus.
 1225 nagios    20   0  148m  13m 2260 S 11.8  0.1   0:00.36 check_wmi_plus.
 1231 nagios    20   0  148m  13m 2260 S 11.8  0.1   0:00.36 check_wmi_plus.
 1227 nagios    20   0  148m  13m 2260 S 11.5  0.1   0:00.35 check_wmi_plus.
 1233 nagios    20   0  148m  13m 2260 S 11.5  0.1   0:00.35 check_wmi_plus.
 1209 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1211 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1213 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1215 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1217 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1219 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1221 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1229 nagios    20   0  148m  13m 2260 S  9.5  0.1   0:00.29 check_wmi_plus.
30110 nagios    20   0 50912  21m 1032 S  8.9  0.1   2:11.85 nagios
 7468 root      20   0 15656 1988  988 R  1.3  0.0   0:03.30 top
   47 root      RT   0     0    0    0 S  0.3  0.0   0:00.93 migration/11
  136 root      20   0     0    0    0 S  0.3  0.0   0:28.79 events/5
 2074 postgres  20   0  208m  12m  11m S  0.3  0.1   0:54.94 postmaster
29066 nagios    20   0  206m  20m 7400 S  0.3  0.1   0:00.26 php
29068 nagios    20   0  207m  20m 7368 S  0.3  0.1   0:00.27 php
29099 postgres  20   0  209m 5972 3736 S  0.3  0.0   0:00.07 postmaster
29156 postgres  20   0  209m 6040 3812 S  0.3  0.0   0:00.03 postmaster
    1 root      20   0 19204 1504 1220 S  0.0  0.0   0:03.22 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.02 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.08 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    6 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    7 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
    9 root      20   0     0    0    0 S  0.0  0.0   0:00.08 ksoftirqd/1
   10 root      RT   0     0    0    0 S  0.0  0.0   0:00.05 watchdog/1
   11 root      RT   0     0    0    0 S  0.0  0.0   0:00.12 migration/2
   12 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/2
   13 root      20   0     0    0    0 S  0.0  0.0   0:00.03 ksoftirqd/2
   14 root      RT   0     0    0    0 S  0.0  0.0   0:00.12 watchdog/2
   15 root      RT   0     0    0    0 S  0.0  0.0   0:00.21 migration/3
   16 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/3
   17 root      20   0     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/3
   18 root      RT   0     0    0    0 S  0.0  0.0   0:03.21 watchdog/3
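
The `nagios -v` run above fails because /etc/nagios/nagios.cfg cannot be opened on this box. The init script reports its lock file under /usr/local/nagios/var, so the main config may live under the same prefix, but that is an assumption and not something the output confirms. A minimal sketch that picks the first readable candidate (the function name `find_nagios_cfg` and the candidate paths are illustrative):

```shell
# Print the first readable file among the given candidate paths.
# Returns non-zero if none of them can be read.
find_nagios_cfg() {
    for cfg in "$@"; do
        if [ -r "$cfg" ]; then
            printf '%s\n' "$cfg"
            return 0
        fi
    done
    return 1
}

# Usage (both paths and the binary location are guesses, adjust for your install):
# cfg=$(find_nagios_cfg /etc/nagios/nagios.cfg /usr/local/nagios/etc/nagios.cfg) &&
#     /usr/local/nagios/bin/nagios -v "$cfg"
```

Passing the candidates as arguments keeps the guesswork visible instead of hard-coding one path.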
Proudly running:
NagiosXI 5.4.12 2 node Prod Env 2500 hosts, 13,000 services
Nagiosxi 5.5.7(test env) 2500 hosts, 13,000 services
Nagios Logserver 2 node Prod Env 500 objects sending
Nagios Network Analyser
Nagios Fusion
CGraham
Posts: 115
Joined: Tue Aug 16, 2011 2:43 pm

Re: nagios tcp connection issue

Post by CGraham »

Depending on your distro, the Apache service is named "httpd" or "apache2".

If you're using the Nagios VM you will use:

service httpd status

You can also check whether anything is listening for HTTP connections by running the following and looking for a listener on port 80 or 443:

netstat -an | grep "LISTEN"

Here's what mine looks like:

[root@hostname ~]# service httpd status
httpd (pid 1985) is running...

[root@hostname ~]# netstat -an | grep LISTEN
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 0 0 :::80 :::* LISTEN
tcp 0 0 :::22 :::* LISTEN
tcp 0 0 ::1:631 :::* LISTEN
tcp 0 0 ::1:5432 :::* LISTEN
tcp 0 0 ::1:25 :::* LISTEN
tcp 0 0 :::5666 :::* LISTEN
tcp 0 0 :::5667 :::* LISTEN
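
To avoid scanning the whole list by eye, the port check described above can be narrowed to just the web ports. This is a rough sketch; the function name `http_listeners` is made up for illustration, and it assumes the classic `netstat -an` column layout shown in this post (local address in the fourth column, state in the last):

```shell
# Read `netstat -an` style output on stdin and print only the LISTEN
# sockets whose local address ends in :80 or :443.
http_listeners() {
    awk '$NF == "LISTEN" && $4 ~ /:(80|443)$/'
}

# Usage:
# netstat -an | http_listeners
```

If this prints nothing, no process is accepting connections on either port, which would be consistent with the web server not running.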
benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Re: nagios tcp connection issue

Post by benhank »

Well, I rebooted the system (Windows guys, will we ever learn not to reboot Linux servers...).
Now I can get to my Nagios site, but...

Code:

Available Updates

A new Nagios XI update is available.

2011R3.3 was released on August 20th, 2012.

Visit www.nagios.com to obtain the latest update.
Latest Available Version:	2011R3.3
Installed Version:	2011R3.2
Last Update Check:	09/17/2012 17:18:21
Last Updated: 09/17/2012 17:18:23
SQL: SQL Error [ndoutils] : Table './nagios/nagios_programstatus' is marked as crashed and last (automatic?) repair failed
(the same error is repeated several more times)
Service Status Summary
Should I run the Nagios database repair?
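
Before repairing anything it helps to know exactly which tables are flagged. The sketch below pulls the distinct table paths out of error text like the message above; `crashed_tables` is a hypothetical helper name, and it assumes a `grep` that supports `-o` (GNU and BSD grep both do):

```shell
# Read error text on stdin and print each table path that is
# "marked as crashed", one per line, without duplicates.
crashed_tables() {
    grep -o "Table '[^']*' is marked as crashed" | cut -d"'" -f2 | sort -u
}

# Usage (saved_errors.txt is a placeholder for wherever you saved the error text):
# crashed_tables < saved_errors.txt
#
# One possible follow-up, assuming the tables are MyISAM and that you have
# MySQL root credentials (both are assumptions, not confirmed in this thread):
# mysqlcheck --repair -u root -p nagios nagios_programstatus
```

For the error shown here this prints a single path, ./nagios/nagios_programstatus, no matter how many times the message repeats.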
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: nagios tcp connection issue

Post by mguthrie »

benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Re: nagios tcp connection issue

Post by benhank »

I think I see my problem: for some reason my 1.5 TB server has no more disk space. I have no clue how that happened; the system has only been up for 3 months.

Edit:

OK, the script ran and fixed a ton of tables, but now the mysqld service fails to start.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: nagios tcp connection issue

Post by mguthrie »

What kind of error message are you getting when you try to restart?

Is there anything useful in /var/log/mysqld.log?
benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Re: nagios tcp connection issue

Post by benhank »

Let me see. Sheesh.

Nagios will either drive you to drink or drive you to stop drinking... lol
Well, not Nagios itself, but not knowing how to manage it will.

Edit:
Well, I've got a small problem... Notepad can't open the file because it is too large.
Word can't open it because it is over 512 MB.
Yes, my MySQL log is 1.5 GB...
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: nagios tcp connection issue

Post by mguthrie »

We only need the last few lines. Try to start the mysqld service, and then let's grab a small chunk of the log:

Code:

tail -100 /var/log/mysqld.log > sql.txt
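
With a log this large, it can also help to filter the tail down to error lines on the server itself, so nothing has to be opened in a desktop editor. A small sketch, assuming error entries contain the string "ERROR" (MySQL error logs normally tag them as [ERROR]); `recent_errors` is an illustrative name:

```shell
# Print the lines containing "ERROR" among the last N lines of a log file.
# $1 = log file, $2 = number of trailing lines to inspect (default 100).
recent_errors() {
    tail -n "${2:-100}" "$1" | grep 'ERROR'
}

# Usage:
# recent_errors /var/log/mysqld.log 200
```

Because only the end of the file is read, this stays fast even when the log is over a gigabyte.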
benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Re: nagios tcp connection issue

Post by benhank »

I think I see my problem: my messages log file is 9 GB! And I have another one that is over 1 GB.
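
Hunting for oversized files like these can be scripted. This is a sketch only, assuming a `find` that supports `-xdev` and `-exec ... +` (GNU and BSD find both do); the name `largest_files` is made up here:

```shell
# List the N largest regular files under a directory, biggest first.
# Output is `du -k` format: size in KiB, a tab, then the path.
# $1 = directory, $2 = how many files to show (default 10).
largest_files() {
    find "$1" -xdev -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Usage:
# largest_files /var/log 5
```

The `-xdev` option keeps the search on one filesystem, which matters when only a single mount is full.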
benhank
Posts: 1264
Joined: Tue Apr 12, 2011 12:29 pm

Re: nagios tcp connection issue

Post by benhank »

Also, I tried to start the service: no go. I am just going to delete those huge log files, then try again.

Edit: deleted the files, but the service won't start.
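
One thing worth checking after deleting large logs: on Linux, a file that a running daemon still has open keeps its disk space until that process closes it, so removing it may not free anything right away. Emptying the file in place avoids that. A minimal sketch (the name `truncate_log` is illustrative, and whether this is what is keeping mysqld from starting here is not established):

```shell
# Empty a log file in place instead of deleting it. The file keeps its
# name, owner, and permissions, and its disk space is released immediately.
truncate_log() {
    : > "$1"
}

# Usage:
# truncate_log /var/log/messages && df -h /var/log
#
# To look for deleted files that are still held open (requires lsof):
# lsof +L1
```

Comparing `df -h` output before and after shows whether the space actually came back.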