Nagios had socket timeouts for all machines this weekend.

DaveHill
Posts: 1
Joined: Mon Jun 23, 2014 12:12 pm

Post by DaveHill »

Hey there. This weekend we noticed Nagios reporting 26 of our servers as down. Based on what I am seeing in our Nagios logs, almost every machine we monitor reported socket timeout errors. However, the 26 that remained were configured not to be checked over the weekend, so they stayed in a "down" state.

I'm not sure where to start to figure out what caused this issue or how to prevent it in the future. I've attached the Nagios logs below, with some edits for security reasons.

[1403499600] CURRENT SERVICE STATE:  Wanrouter Pri;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Pri;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Pri;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Pri;Mistriss.exe;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Pri;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Pri;NetOp Host;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Pri;Remote Admin Server;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Pri;TT Guardian;OK;HARD;1;guardian: Started
[1403499600] CURRENT SERVICE STATE:  Wanrouter Pri;TT Guardian Control;OK;HARD;1;guardianctrl: Started
[1403499600] CURRENT SERVICE STATE:  Wanrouter Pri;TT Messaging Router;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Pri;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Sec;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Sec;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Sec;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Sec;Mistriss.exe;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Sec;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Sec;NetOp Host;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Sec;Remote Admin Server;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Sec;TT Guardian;OK;HARD;1;guardian: Started
[1403499600] CURRENT SERVICE STATE:  Wanrouter Sec;TT Guardian Control;OK;HARD;1;guardianctrl: Started
[1403499600] CURRENT SERVICE STATE:  Wanrouter Sec;TT Messaging Router;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE:  Wanrouter Sec;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: GT-COLLECT;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: GT-COLLECT;E:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-DB;CorvilCatcher.exe;OK;HARD;1;CorvilCatcher.exe: Running
[1403499600] CURRENT SERVICE STATE: SERVERNAME100;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME100;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME100;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME100;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME100;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME101;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME101;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME101;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME101;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME101;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME102;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME102;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME102;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME102;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME102;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME103;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME103;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME103;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME103;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME103;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME104;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME104;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME104;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME104;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME104;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME105;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME105;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME105;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME105;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME105;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME106;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME106;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME106;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME106;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME106;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME107;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME107;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME107;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME107;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME107;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME108;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME108;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME108;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME108;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME108;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129-12;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129-12;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129-12;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129-12;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129-12;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.10.DB;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.10.DB;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.10.DB;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.10.DB;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.10.DB;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.11;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.11;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.11;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.11;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.11;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.40.DB;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.40.DB;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.40.DB;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.40.DB;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME129.40.DB;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME80;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME80;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME80;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME80;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME80;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME81;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME81;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME81;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME81;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME81;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME82;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME82;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME82;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME82;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME82;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME85;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME85;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME85;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME85;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME85;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME87.DB;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME87.DB;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME87.DB;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME87.DB;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME87.DB;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME88;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME88;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME88;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME88;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME88;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME89;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME89;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME89;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME89;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME89;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME90;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME90;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME90;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME90;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME90;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME93;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME93;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME93;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME93;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME93;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME94;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME94;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME94;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME94;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME94;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME95;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME95;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME95;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME95;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME95;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME97.DB;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME97.DB;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME97.DB;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME97.DB;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME97.DB;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME98;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME98;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME98;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME98;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME98;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME99;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME99;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME99;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME99;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME99;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: GT-Deltix;CPU Load;OK;HARD;1;CPU Load 0% (5 min average)
[1403499600] CURRENT SERVICE STATE: GT-Deltix;Memory Usage;OK;HARD;1;Memory usage: total:32760.11 Mb - used: 9042.49 Mb (28%) - free: 23717.61 Mb (72%)
[1403499600] CURRENT SERVICE STATE: GT-Deltix;NSClient++ Version;OK;HARD;1;NSClient++ 0.3.8.76 2010-05-27
[1403499600] CURRENT SERVICE STATE: GT-Deltix;Uptime;OK;HARD;1;System Uptime - 155 day(s) 8 hour(s) 25 minute(s)
[1403499600] CURRENT SERVICE STATE: GT-HOUSE;HP Raid Array;OK;HARD;1;OK - Smart Array P410i in Slot 0 (Embedded) -/OK/OK (LD 1: OK [(3C:1:1 OK) (3C:1:2 OK)], LD 2: OK [(3C:1:3 OK)], LD 3: OK [(2C:1:5 OK) (2C:1:6 OK) (2C:1:7 OK) (2C:1:8 OK) (3C:1:4 OK) (4C:2:1 OK) (4C:2:2 OK) (4C:2:3 OK) (4C:2:4 OK) (5C:2:5 OK) (5C:2:6 OK)])
[1403499600] CURRENT SERVICE STATE: GT-INTRANET.OURDOMAIN;Adenin Scheduler Service;OK;HARD;1;SchedulerService: Started
[1403499600] CURRENT SERVICE STATE: GT-INTRANET.OURDOMAIN;C:\ Drive Space;OK;HARD;1;c:\ - total: 410.09 Gb - used: 279.71 Gb (68%) - free 130.39 Gb (32%)
[1403499600] CURRENT SERVICE STATE: GT-INTRANET.OURDOMAIN;CPU Load;OK;HARD;1;CPU Load 0% (5 min average)
[1403499600] CURRENT SERVICE STATE: GT-INTRANET.OURDOMAIN;Memory Usage;OK;HARD;1;Memory usage: total:5222.01 Mb - used: 1760.00 Mb (34%) - free: 3462.01 Mb (66%)
[1403499600] CURRENT SERVICE STATE: GT-INTRANET.OURDOMAIN;NSClient++ Version;OK;HARD;1;NSClient++ 0.3.7.493 2009-10-12
[1403499600] CURRENT SERVICE STATE: GT-INTRANET.OURDOMAIN;ReportingServicesService.exe;OK;HARD;1;ReportingServicesService.exe: Running
[1403499600] CURRENT SERVICE STATE: GT-INTRANET.OURDOMAIN;SQL Server Reporting Services;OK;HARD;1;ReportServer: Started
[1403499600] CURRENT SERVICE STATE: GT-INTRANET.OURDOMAIN;SchedulerService.exe;OK;HARD;1;SchedulerService.exe: Running
[1403499600] CURRENT SERVICE STATE: GT-INTRANET.OURDOMAIN;Uptime;OK;HARD;1;System Uptime - 0 day(s) 0 hour(s) 55 minute(s)
[1403499600] CURRENT SERVICE STATE: GT-TERMINAL1;C:\ Drive Space;OK;HARD;1;c:\ - total: 126.65 Gb - used: 28.43 Gb (22%) - free 98.23 Gb (78%)
[1403499600] CURRENT SERVICE STATE: GT-TERMINAL1;CPU Load;OK;HARD;1;CPU Load 0% (5 min average)
[1403499600] CURRENT SERVICE STATE: GT-TERMINAL1;Memory Usage;OK;HARD;1;Memory usage: total:9407.63 Mb - used: 6025.41 Mb (64%) - free: 3382.22 Mb (36%)
[1403499600] CURRENT SERVICE STATE: GT-TERMINAL1;NSClient++ Version;OK;HARD;1;NSClient++ 0.3.8.76 2010-05-27
[1403499600] CURRENT SERVICE STATE: GT-TERMINAL1;Uptime;OK;HARD;1;System Uptime - 216 day(s) 19 hour(s) 5 minute(s)
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;C:\ Drive Space;OK;HARD;1;c:\ - total: 136.69 Gb - used: 39.21 Gb (29%) - free 97.48 Gb (71%)
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;CPU Load;OK;HARD;1;CPU Load 0% (5 min average)
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;Memory Usage;OK;HARD;1;Memory usage: total:5976.45 Mb - used: 866.79 Mb (15%) - free: 5109.66 Mb (85%)
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;Mistriss.exe;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;NetOp Host;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;Remote Admin Server;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;TT Chron;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;TT Fill Server;OK;HARD;1;SERVERNAME-AFillServer: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;TT Guardian;OK;HARD;1;guardian: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;TT Guardian Control;OK;HARD;1;guardianctrl: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;TT Guardserver;OK;HARD;1;guardserv: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;TT Messaging;OK;HARD;1;ttmd: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;TT Order Server;OK;HARD;1;SERVERNAME-AOrderServer: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;TT Price Server;OK;HARD;1;SERVERNAME-APriceServer: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-A;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;C:\ Drive Space;OK;HARD;1;c:\ - total: 136.70 Gb - used: 107.83 Gb (79%) - free 28.87 Gb (21%)
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;CPU Load;OK;HARD;1;CPU Load 0% (5 min average)
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;Memory Usage;OK;HARD;1;Memory usage: total:5976.45 Mb - used: 934.39 Mb (16%) - free: 5042.06 Mb (84%)
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;Mistriss.exe;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;NetOp Host;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;Remote Admin Server;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;TT Chron;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;TT Fill Server;OK;HARD;1;SERVERNAME-CFillServer: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;TT Guardian;OK;HARD;1;guardian: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;TT Guardian Control;OK;HARD;1;guardianctrl: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;TT Guardserver;OK;HARD;1;guardserv: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;TT Messaging;OK;HARD;1;ttmd: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;TT Order Server;OK;HARD;1;SERVERNAME-COrderServer: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;TT Price Server;OK;HARD;1;SERVERNAME-CPriceServer: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-C;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;C:\ Drive Space;OK;HARD;1;c:\ - total: 136.69 Gb - used: 29.79 Gb (22%) - free 106.90 Gb (78%)
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;CPU Load;OK;HARD;1;CPU Load 0% (5 min average)
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;Memory Usage;OK;HARD;1;Memory usage: total:5976.46 Mb - used: 857.73 Mb (14%) - free: 5118.73 Mb (86%)
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;Mistriss.exe;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;NetOp Host;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;Remote Admin Server;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;TT Chron;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;TT Fill Server;OK;HARD;1;SERVERNAME-HFillServer: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;TT Guardian;OK;HARD;1;guardian: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;TT Guardian Control;OK;HARD;1;guardianctrl: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;TT Guardserver;OK;HARD;1;guardserv: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;TT Messaging;OK;HARD;1;ttmd: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;TT Order Server;OK;HARD;1;SERVERNAME-HOrderServer: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;TT Price Server;OK;HARD;1;SERVERNAME-HPriceServer: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-H;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;C:\ Drive Space;OK;HARD;1;c:\ - total: 136.69 Gb - used: 10.01 Gb (7%) - free 126.68 Gb (93%)
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;CPU Load;OK;HARD;1;CPU Load 0% (5 min average)
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;Memory Usage;OK;HARD;1;Memory usage: total:5976.45 Mb - used: 641.48 Mb (11%) - free: 5334.97 Mb (89%)
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;Mistriss.exe;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;NetOp Host;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;Remote Admin Server;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;TT Chron;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;TT Guardian;OK;HARD;1;guardian: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;TT Guardian Control;OK;HARD;1;guardianctrl: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;TT Guardserver;OK;HARD;1;guardserv: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;TT Messaging;OK;HARD;1;ttmd: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;TT Price Server;OK;HARD;1;SERVERNAME-NPriceServer: Started
[1403499600] CURRENT SERVICE STATE: SERVERNAME-N;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Mercury Wanrouter;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Mercury Wanrouter;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Mercury Wanrouter;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Mercury Wanrouter;Mistriss.exe;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Mercury Wanrouter;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Mercury Wanrouter;NetOp Host;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Mercury Wanrouter;Remote Admin Server;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Mercury Wanrouter;TT Guardian;OK;HARD;1;guardian: Started
[1403499600] CURRENT SERVICE STATE: Mercury Wanrouter;TT Guardian Control;OK;HARD;1;guardianctrl: Started
[1403499600] CURRENT SERVICE STATE: Mercury Wanrouter;TT Messaging Router;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Mercury Wanrouter;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: NY Wanrouter;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: NY Wanrouter;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: NY Wanrouter;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: NY Wanrouter;Mistriss.exe;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: NY Wanrouter;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: NY Wanrouter;NetOp Host;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: NY Wanrouter;Remote Admin Server;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: NY Wanrouter;TT Guardian;OK;HARD;1;guardian: Started
[1403499600] CURRENT SERVICE STATE: NY Wanrouter;TT Guardian Control;OK;HARD;1;guardianctrl: Started
[1403499600] CURRENT SERVICE STATE: NY Wanrouter;TT Messaging Router;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: NY Wanrouter;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Object+ server;C:\ Drive Space;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Object+ server;CME DropCopy Feed;OK;HARD;1;CMEDROPCOPY: Started
[1403499600] CURRENT SERVICE STATE: Object+ server;CME/CBOT FastFix Feed;OK;HARD;1;FastFixFeed: Started
[1403499600] CURRENT SERVICE STATE: Object+ server;CPU Load;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Object+ server;ICE2RM;OK;HARD;1;ICE2RM: Started
[1403499600] CURRENT SERVICE STATE: Object+ server;ICE2RMPrice;OK;HARD;1;ICE2RMPrice: Started
[1403499600] CURRENT SERVICE STATE: Object+ server;Memory Usage;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Object+ server;NSClient++ Version;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
[1403499600] CURRENT SERVICE STATE: Object+ server;NYSELiffeUS2RMPrice;OK;HARD;1;NYSELiffeUS2RMPrice: Started
[1403499600] CURRENT SERVICE STATE: Object+ server;RiskServer;OK;HARD;1;RiskServer: Started
[1403499600] CURRENT SERVICE STATE: Object+ server;Uptime;CRITICAL;HARD;4;CRITICAL - Socket timeout after 10 seconds
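
A dump this long is easier to reason about once it is grouped by host. The following is a minimal Python sketch, not part of the original post, that parses lines in the format shown above and counts how many services per host are in the "Socket timeout" state. The function names `summarize_timeouts` and `snapshot_time` are hypothetical. The bracketed number is a Unix epoch timestamp; every line here carries the same one, which suggests these entries are a state snapshot (Nagios typically writes CURRENT SERVICE STATE lines at log rotation or startup) rather than the moment each check failed.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches lines such as:
# [1403499600] CURRENT SERVICE STATE: HOST;SERVICE;STATE;STATE_TYPE;ATTEMPT;OUTPUT
LINE_RE = re.compile(
    r"^\[(?P<ts>\d+)\] CURRENT SERVICE STATE:\s*(?P<host>[^;]+);(?P<service>[^;]+);"
    r"(?P<state>[^;]+);(?P<state_type>[^;]+);(?P<attempt>\d+);(?P<output>.*)$"
)


def snapshot_time(epoch_seconds):
    """Convert the bracketed epoch timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def summarize_timeouts(lines):
    """Return {host: (timed_out_services, total_services)} for matching lines."""
    timeouts = Counter()
    totals = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a service-state line
        host = match.group("host").strip()
        totals[host] += 1
        if match.group("state") == "CRITICAL" and "Socket timeout" in match.group("output"):
            timeouts[host] += 1
    return {host: (timeouts[host], totals[host]) for host in totals}
```

Feeding the log file's lines to `summarize_timeouts` separates hosts where every agent check timed out from hosts where only some did, which may help narrow down whether the problem is on the network path or in the agent on individual machines.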
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Nagios had socket timeouts for all machines this weekend

Post by slansing »

Are these servers back up now? It sounds like something may have happened to your network routes between Nagios and the servers, or to the ports the checks run across. Have you checked whether anything happened at the network layer between Nagios and those servers? That looks like a whole lot of Windows checks, so possibly something with the firewalls globally? Or an issue with another application taking over port 12489/5666 at that time?
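
One way to test the port question from the Nagios server itself is a plain TCP connect with the same 10-second limit that appears in the log. This is a rough Python sketch under assumptions: `port_reachable` is a hypothetical helper, and the host name in the usage note is a placeholder. Ports 12489 and 5666 are the usual defaults for NSClient++ (check_nt) and NRPE, but your agents may be configured differently.

```python
import socket


def port_reachable(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and applies the timeout to the connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name resolution failures
        return False
```

For example, `port_reachable("SERVERNAME100", 12489)` run from the Nagios host would show whether the agent port accepts connections at all. A successful connect only proves the port is open; it does not prove the agent answers the check in time, so a host can pass this test and still time out under load.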
cs_nagcc
Posts: 17
Joined: Fri Dec 28, 2012 7:35 am

Re: Nagios had socket timeouts for all machines this weekend

Post by cs_nagcc »

I'm running into the same issue, but only for a few of the servers running similar software. I will receive a Socket Timeout error followed by a recovery notification 1-3 minutes later, then repeat. My Nagios installation hasn't changed in over a year and those specific servers haven't changed in longer than that. I'm curious as to why this is just starting to happen now. I haven't heard any complaints from employees using the applications on those servers, so I don't think it's anything related to dropping from the network.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios had socket timeouts for all machines this weekend

Post by abrist »

If nothing has changed, my bet would be a network-related problem.
Were other machines on the network experiencing the same issues?