NRPE agent failed on multiple Solaris servers
Posted: Mon Jun 17, 2019 6:55 am
We are running NRPE version 3.2.1 on hundreds of SPARC Solaris LDOMs (5.11 11.4.7.5.0 sun4v sparc sun4v). These are grouped into different LDOMs on the hardware. On one set of hardware (7 agents on different LDOMs), every single NRPE agent failed at 5:52:40 AM on Sunday with the following errors.
While researching what "fork() failed with error 12, bailing out" means, I am finding multiple possible meanings. Can anyone tell me what may have caused this error, or point me to where I could find this information?
Code:
Jun 16 05:52:40 SERVERNAME nrpe[17986]: [ID 702911 daemon.error] fork() failed with error 12, bailing out...
Jun 16 05:52:54 SERVERNAME nrpe[19611]: [ID 702911 daemon.notice] Starting up daemon
Jun 16 05:52:54 SERVERNAME nrpe[19611]: [ID 702911 daemon.notice] Warning: Daemon is configured to accept command arguments from clients!
Jun 16 05:53:54 SERVERNAME nrpe[19611]: [ID 702911 daemon.error] fork() failed with error 12, bailing out...
Jun 16 05:53:54 SERVERNAME nrpe[19672]: [ID 702911 daemon.notice] Starting up daemon
Jun 16 05:53:54 SERVERNAME nrpe[19672]: [ID 702911 daemon.notice] Warning: Daemon is configured to accept command arguments from clients!
Jun 16 05:54:40 SERVERNAME nrpe[19672]: [ID 702911 daemon.error] fork() failed with error 12, bailing out...
Jun 16 05:54:40 SERVERNAME nrpe[21421]: [ID 702911 daemon.notice] Starting up daemon
Jun 16 05:54:40 SERVERNAME nrpe[21421]: [ID 702911 daemon.notice] Warning: Daemon is configured to accept command arguments from clients!
Jun 16 05:56:53 SERVERNAME nrpe[21421]: [ID 702911 daemon.error] fork() failed with error 12, bailing out...
Jun 16 05:56:53 SERVERNAME nrpe[24766]: [ID 702911 daemon.error] Error: (!log_opts) Could not complete SSL handshake with 10.201.252.16: 1
Jun 16 05:56:53 SERVERNAME nrpe[24902]: [ID 702911 daemon.notice] Starting up daemon
Jun 16 05:56:53 SERVERNAME nrpe[24902]: [ID 702911 daemon.notice] Warning: Daemon is configured to accept command arguments from clients!
Jun 16 05:58:44 SERVERNAME nrpe[24902]: [ID 702911 daemon.error] fork() failed with error 12, bailing out...
Jun 16 05:58:45 SERVERNAME svc.startd[19762]: [ID 652011 daemon.warning] svc:/system/sstore:default: Method "/lib/svc/method/svc-sstore start" failed with exit status 1.
Jun 16 05:58:46 SERVERNAME svc.startd[19762]: [ID 748625 daemon.error] network/nagios/nrpe:default failed repeatedly: transitioned to maintenance (see 'svcs -xv' for details)
Jun 16 05:58:47 SERVERNAME fmd: [ID 377184 daemon.error] SUNW-MSG-ID: SMF-8000-YX, TYPE: Defect, VER: 1, SEVERITY: Major
Jun 16 05:58:47 SERVERNAME EVENT-TIME: Sun Jun 16 05:58:47 EDT 2019
Jun 16 05:58:47 SERVERNAME PLATFORM: unknown, CSN: unknown, HOSTNAME: SERVERNAME
Jun 16 05:58:47 SERVERNAME SOURCE: software-diagnosis, REV: 0.2
Jun 16 05:58:47 SERVERNAME EVENT-ID: 27556f62-faf6-4dfa-b825-c6ae8b97b76f
Jun 16 05:58:47 SERVERNAME DESC: Service svc:/network/nagios/nrpe:default failed - the instance is restarting too quickly.
Jun 16 05:58:47 SERVERNAME AUTO-RESPONSE: The service has been placed into the maintenance state.
Jun 16 05:58:47 SERVERNAME IMPACT: svc:/network/nagios/nrpe:default is unavailable.
Jun 16 05:58:47 SERVERNAME REC-ACTION: Run 'svcs -xv svc:/network/nagios/nrpe:default' to determine the generic reason why the service failed, the location of any logfiles, and a list of other services impacted. Please refer to the associated reference document at http://support.oracle.com/msg/SMF-8000-YX for the latest service procedures and policies regarding this diagnosis.
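For reference, the numeric code in the "fork() failed with error 12" lines can be mapped to its symbolic errno name on the affected host. The following is only a minimal sketch, assuming Python 3 is available there; `describe_errno` is a hypothetical helper name, not part of NRPE. On Solaris (and Linux) errno 12 is ENOMEM, which for fork() generally points at the system being unable to reserve memory or swap for the new process, though the exact message text varies by platform.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an errno value on the current platform.

    describe_errno is a hypothetical helper for illustration only.
    """
    # errno.errorcode maps numeric values to symbolic names (e.g. 12 -> ENOMEM).
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's own message text for that code.
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 12 is the value reported in the NRPE log lines above.
    print(describe_errno(12))
```

If the name comes back as ENOMEM, checking memory and swap usage on the LDOMs around 05:52 would be a reasonable next step.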