
nagios tcp connection issue

Posted: Mon Sep 17, 2012 1:31 pm
by benhank
My Nagios server suddenly will not allow HTTP connections =(
I can ping it, and my other Nagios server shows it as up, but it is refusing the connections.
I can even log on to it with PuTTY. It also stopped sending notifications when it went down.

So I did the following, and Nagios started flooding my users with alerts.

Code:


[root@lkennagiosp02 ~]# service nagios restart
Running configuration check...done.
Stopping nagios: No lock file found in /usr/local/nagios/var/nagios.lock
Starting nagios: done.
[root@lkennagiosp02 ~]# No lock file found in /usr/local/nagios/var/nagios.lock
-bash: No: command not found
[root@lkennagiosp02 ~]# service http restart
http: unrecognized service
[root@lkennagiosp02 ~]# service mysqld start
Starting mysqld:                                           [  OK  ]
[root@lkennagiosp02 ~]# service nagios stop
Stopping nagios: .done.
Then I did the following:

Code:

[root@lkennagiosp02 ~]# service nagios restart
Running configuration check...done.
Stopping nagios: No lock file found in /usr/local/nagios/var/nagios.lock
Starting nagios: done
[root@lkennagiosp02 ~]# nagios -v /etc/nagios/nagios.cfg

Nagios Core 3.4.1
Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 05-11-2012
License: GPL

Website: http://www.nagios.org
Reading configuration data...
Error: Cannot open main configuration file '/etc/nagios/nagios.cfg' for reading!
   Error processing main config file!



***> One or more problems was encountered while processing the config files...

     Check your configuration file(s) to ensure that they contain valid
     directives and data defintions.  If you are upgrading from a previous
     version of Nagios, you should be aware that some variables/definitions
     may have been removed or modified in this version.  Make sure to read
     the HTML documentation regarding the config files, as well as the
     'Whats New' section to find out what has changed.

[root@lkennagiosp02 ~]#  No lock file found in /usr/local/nagios/var/nagios.lock
-bash: No: command not found
[root@lkennagiosp02 ~]# tup
-bash: tup: command not found
[root@lkennagiosp02 ~]# top
top - 14:19:44 up 9 days, 21:09,  1 user,  load average: 6.83, 5.45, 3.83
Tasks: 758 total,   1 running, 757 sleeping,   0 stopped,   0 zombie
Cpu(s): 14.8%us,  2.4%sy,  0.0%ni, 82.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16296880k total, 13361724k used,  2935156k free,   317120k buffers
Swap: 18530296k total,        0k used, 18530296k free, 11772916k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1223 nagios    20   0  148m  13m 2260 S 11.8  0.1   0:00.36 check_wmi_plus.
 1225 nagios    20   0  148m  13m 2260 S 11.8  0.1   0:00.36 check_wmi_plus.
 1231 nagios    20   0  148m  13m 2260 S 11.8  0.1   0:00.36 check_wmi_plus.
 1227 nagios    20   0  148m  13m 2260 S 11.5  0.1   0:00.35 check_wmi_plus.
 1233 nagios    20   0  148m  13m 2260 S 11.5  0.1   0:00.35 check_wmi_plus.
 1209 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1211 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1213 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1215 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1217 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1219 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1221 nagios    20   0  148m  13m 2260 S  9.9  0.1   0:00.30 check_wmi_plus.
 1229 nagios    20   0  148m  13m 2260 S  9.5  0.1   0:00.29 check_wmi_plus.
30110 nagios    20   0 50912  21m 1032 S  8.9  0.1   2:11.85 nagios
 7468 root      20   0 15656 1988  988 R  1.3  0.0   0:03.30 top
   47 root      RT   0     0    0    0 S  0.3  0.0   0:00.93 migration/11
  136 root      20   0     0    0    0 S  0.3  0.0   0:28.79 events/5
 2074 postgres  20   0  208m  12m  11m S  0.3  0.1   0:54.94 postmaster
29066 nagios    20   0  206m  20m 7400 S  0.3  0.1   0:00.26 php
29068 nagios    20   0  207m  20m 7368 S  0.3  0.1   0:00.27 php
29099 postgres  20   0  209m 5972 3736 S  0.3  0.0   0:00.07 postmaster
29156 postgres  20   0  209m 6040 3812 S  0.3  0.0   0:00.03 postmaster
    1 root      20   0 19204 1504 1220 S  0.0  0.0   0:03.22 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.02 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.08 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    6 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    7 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
    9 root      20   0     0    0    0 S  0.0  0.0   0:00.08 ksoftirqd/1
   10 root      RT   0     0    0    0 S  0.0  0.0   0:00.05 watchdog/1
   11 root      RT   0     0    0    0 S  0.0  0.0   0:00.12 migration/2
   12 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/2
   13 root      20   0     0    0    0 S  0.0  0.0   0:00.03 ksoftirqd/2
   14 root      RT   0     0    0    0 S  0.0  0.0   0:00.12 watchdog/2
   15 root      RT   0     0    0    0 S  0.0  0.0   0:00.21 migration/3
   16 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/3
   17 root      20   0     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/3
   18 root      RT   0     0    0    0 S  0.0  0.0   0:03.21 watchdog/3

Re: nagios tcp connection issue

Posted: Mon Sep 17, 2012 3:17 pm
by CGraham
Depending on your distro, the Apache service is "httpd" or "apache2".

If you're using the Nagios VM you will use:

service httpd status

You can also check whether anything is listening for HTTP connections by running the following and looking for something listening on port 80 or 443.

netstat -an | grep "LISTEN"

Here's what mine looks like:

[root@hostname ~]# service httpd status
httpd (pid 1985) is running...

[root@hostname ~]# netstat -an | grep LISTEN
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 0 0 :::80 :::* LISTEN
tcp 0 0 :::22 :::* LISTEN
tcp 0 0 ::1:631 :::* LISTEN
tcp 0 0 ::1:5432 :::* LISTEN
tcp 0 0 ::1:25 :::* LISTEN
tcp 0 0 :::5666 :::* LISTEN
tcp 0 0 :::5667 :::* LISTEN
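
If the full list is hard to scan, the check above can be narrowed to just the web ports. This is only a rough sketch: `http_listeners` is a made-up helper name, and it assumes the usual `netstat -an` column layout (local address in the fourth column, state in the last).

```shell
# Hypothetical helper: print only the LISTEN lines for port 80 or 443.
# Reads `netstat -an` style output on stdin.
http_listeners() {
    awk '$NF == "LISTEN" && $4 ~ /:(80|443)$/ { print }'
}

# Usage: netstat -an | http_listeners
```

If that prints nothing, nothing is listening on 80 or 443, which usually means the web server is not running.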

Re: nagios tcp connection issue

Posted: Mon Sep 17, 2012 4:36 pm
by benhank
Well, I rebooted the system (Windows guys, will we ever learn not to reboot Linux servers...)
Now I can get to my Nagios site, buuut...

Code:

Available Updates

A new Nagios XI update is available.

2011R3.3 was released on August 20th, 2012.

Visit www.nagios.com to obtain the latest update.
Latest Available Version:	2011R3.3
Installed Version:	2011R3.2
Last Update Check:	09/17/2012 17:18:21
Last Updated: 09/17/2012 17:18:23
SQL: SQL Error [ndoutils] : Table './nagios/nagios_programstatus' is marked as crashed and last (automatic?) repair failed (message repeated 7 times)
Service Status Summary
Do I run the Nagios DB repair?

Re: nagios tcp connection issue

Posted: Mon Sep 17, 2012 4:56 pm
by mguthrie

Re: nagios tcp connection issue

Posted: Tue Sep 18, 2012 9:50 am
by benhank
I think I see my problem: for some reason my 1.5 TB server has no more disk space. I have no clue how that happened. The system has only been up for 3 months.


Edit:

OK, the script ran and fixed a ton of tables, but now the mysqld service fails to start.
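
For anyone chasing a full disk like the one described above, here is a minimal sketch of finding what is eating the space. `biggest_files` is a made-up helper name, and `/var/log` in the usage line is only a guess at where to start looking.

```shell
# Hypothetical helper: list the N largest files under a directory.
# $1 = directory to scan, $2 = how many files to show (default 10).
# Sizes are in kilobytes as reported by du -k.
biggest_files() {
    find "$1" -type f -exec du -k {} + | sort -rn | head -n "${2:-10}"
}

# Usage: df -h   (to see which filesystem is full), then
#        biggest_files /var/log 10
```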

Re: nagios tcp connection issue

Posted: Tue Sep 18, 2012 10:02 am
by mguthrie
What kind of error message are you getting when you try to restart?

Do you see anything useful in /var/log/mysqld.log?

Re: nagios tcp connection issue

Posted: Tue Sep 18, 2012 10:10 am
by benhank
Let me see. Sheesh.

Nagios will either drive you to drink or drive you to stop drinking.... lol
Well, not Nagios itself, but not knowing how to manage it will.

Edit:
Well, I've got a small problem... Notepad can't open the file because it is too large.
Word can't open it because it is over 512 MB.
Yes, my MySQL log is 1.5 gigs.....

Re: nagios tcp connection issue

Posted: Tue Sep 18, 2012 10:25 am
by mguthrie
We only need the last few lines. Try to start the mysqld service, and then let's grab a small chunk of the log:

Code:

tail -n 100 /var/log/mysqld.log > sql.txt
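
If even that chunk is noisy, the same idea can be narrowed to lines that mention an error. This is just a sketch: `recent_errors` is a made-up name, and matching on the word "error" is an assumption about how the interesting lines are tagged in the log.

```shell
# Hypothetical helper: show error lines among the last N lines of a log.
# $1 = log file, $2 = number of trailing lines to scan (default 100).
recent_errors() {
    tail -n "${2:-100}" "$1" | grep -i 'error'
}

# Usage: recent_errors /var/log/mysqld.log 100
```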

Re: nagios tcp connection issue

Posted: Tue Sep 18, 2012 10:29 am
by benhank
I think I see my problem... my messages log file is 9 gigs! And I have another one that is 1+ gig.

Re: nagios tcp connection issue

Posted: Tue Sep 18, 2012 10:36 am
by benhank
Also, I tried to start the service: no go. I am just gonna delete those huge log files, then try again.

Edit: deleted the files, but the service won't start.
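
One thing that may be going on here (an assumption, nothing in the posts confirms it): when a log file is deleted while a daemon such as syslog still has it open, the disk space is not released until that process closes the file or is restarted. A minimal sketch of emptying the file in place instead; `truncate_log` is a made-up name.

```shell
# Hypothetical helper: empty a log file in place instead of deleting it,
# so a daemon that still has it open keeps writing to the same file.
# $1 = path to the log file.
truncate_log() {
    : > "$1"
}

# Usage: truncate_log /var/log/messages
```

Running `df -h` afterwards should show whether the space actually came back.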