Nagios 4.3.4 from EPEL

jlom
Posts: 17
Joined: Mon Dec 04, 2017 4:49 pm

Post by jlom »

Hi, I hope I can get some help here.

I have installed Nagios 4.3.4 from EPEL and I get the errors "Unable to get process status" and "Whoops! Error: Could not read host and service status information!"

There are no errors when I check the configuration file, Nagios loads cleanly, and there are no errors in the logs.

I have 4.3.2 running fine, which I built from source, but I wanted to try the RPM out.

So far I've done the following:
I've checked the logs and config file for errors.
I've reloaded the service and ensured it was running.

Which file(s) are responsible for this, and how can I ensure they are being accessed properly?

Thanks.
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Post by dwhitfield »

jlom wrote: I have 4.3.2 running fine, which I built from source, but I wanted to try the RPM out.
Are these both on the same machine? If so, that could be the source of your issues.

How much are you running on the 4.3.4 box? If not very much, in your nagios.cfg, you can set debug_level=-1, restart nagios, and then take a look at /usr/local/nagios/var/nagios.debug (or wherever your debug file is). If you have a lot going on, you'll want to pick from the debug options at https://assets.nagios.com/downloads/nag ... gmain.html

If you don't get anything useful with debug_verbosity set to 0 or 1, go ahead and change it to 2. Again, I wouldn't start with 2 if you have a lot going on on your server, as the files are likely to get large. Of course, you can always compress the files before sharing them with us.
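
To confirm which debug settings are currently in the config before restarting, something like the following may help. It is only a sketch: the function name is made up for illustration, and the config path in the example comment (/etc/nagios/nagios.cfg) is an assumption about where the EPEL package puts nagios.cfg.

```shell
# Print the debug-related directives from a Nagios main config file.
# $1 = path to nagios.cfg
# Commented-out lines are skipped because the directive must start the line.
show_debug_settings() {
    grep -E '^[[:space:]]*(debug_level|debug_verbosity|debug_file|max_debug_file_size)[[:space:]]*=' "$1"
}

# Example (path is an assumption, adjust to your install):
#   show_debug_settings /etc/nagios/nagios.cfg
```

The debug_file line in the output shows where the debug output will be written.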

As this is your first post, you will not be able to PM me the debug output yet. However, feel free to attach it in the thread if there's no sensitive info in it.

Regardless of the debugging mentioned above, I'd like to see the nagios.log, or at least a tail of it. Again, if you need to wait to PM it (or scrub it), that's fine.

Post by jlom »

Hey!

I didn't expect such a prompt reply, given that the post was in moderation.


So I've attached the debug file, but it seems to lead nowhere. I've also checked the HTTP access and error logs, and there's nothing there either.

I did completely remove the source-built Nagios from earlier.
Attachment: nagios-debug.log (910.67 KiB)

Post by jlom »

Here is a tail of my log file for good measure.

Code:

 tail -20 /var/log/nagios/nagios.log 
[1512483503] Event broker module 'NERD' deinitialized successfully.
[1512483503] Nagios 4.3.4 starting... (PID=26757)
[1512483503] Local time is Tue Dec 05 09:18:23 EST 2017
[1512483503] LOG VERSION: 2.0
[1512483503] qh: Socket '/var/spool/nagios/cmd/nagios.qh' successfully initialized
[1512483503] qh: core query handler registered
[1512483503] nerd: Channel hostchecks registered successfully
[1512483503] nerd: Channel servicechecks registered successfully
[1512483503] nerd: Channel opathchecks registered successfully
[1512483503] nerd: Fully initialized and ready to rock!
[1512483503] wproc: Successfully registered manager as @wproc with query handler
[1512483503] wproc: Registry request: name=Core Worker 26762;pid=26762
[1512483503] wproc: Registry request: name=Core Worker 26761;pid=26761
[1512483503] wproc: Registry request: name=Core Worker 26760;pid=26760
[1512483503] wproc: Registry request: name=Core Worker 26759;pid=26759
[1512483503] Successfully launched command file worker with pid 26763
[1512487103] Auto-save of retention data completed successfully.
[1512490703] Auto-save of retention data completed successfully.
[1512494303] Auto-save of retention data completed successfully.
[1512497903] Auto-save of retention data completed successfully.

Post by jlom »

This is snipped from an strace I ran. Not sure why, but the version and status info seem to be working fine here.

Code:

[pid 31167] mmap(0x7f736cfd1000, 6792, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f736cfd1000
[pid 31167] close(16)                   = 0
[pid 31167] mprotect(0x7f736cfcf000, 4096, PROT_READ) = 0
[pid 31167] mprotect(0x7f736d1d7000, 4096, PROT_READ) = 0
[pid 31167] munmap(0x7f736fc25000, 35494) = 0
[pid 31167] socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 16
[pid 31167] connect(16, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("SCRUBBED")}, 16) = 0
[pid 31167] poll([{fd=16, events=POLLOUT}], 1, 0) = 1 ([{fd=16, revents=POLLOUT}])
[pid 31167] sendto(16, "\334\241\1\0\0\1\0\0\0\0\0\0\3api\6nagios\3org\0\0\1\0\1", 32, MSG_NOSIGNAL, NULL, 0) = 32
[pid 31167] poll([{fd=16, events=POLLIN}], 1, 5000) = 1 ([{fd=16, revents=POLLIN}])
[pid 31167] ioctl(16, FIONREAD, [66])   = 0
[pid 31167] recvfrom(16, "\334\241\201\200\0\1\0\2\0\0\0\0\3api\6nagios\3org\0\0\1\0\1\300\f\0\5\0\1\0\0\343\340\0\6\3vs1\300\20\300,\0\1\0\1\0\0[l\0\4`~~\237", 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("SCRUBBED")}, [16]) = 66
[pid 31167] close(16)                   = 0
[pid 31167] socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 16
[pid 31167] fcntl(16, F_GETFL)          = 0x2 (flags O_RDWR)
[pid 31167] fcntl(16, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 31167] connect(16, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("96.126.126.159")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 31167] select(17, [16], [16], NULL, {2, 0}) = 1 (out [16], left {1, 961502})
[pid 31167] getsockopt(16, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 31167] select(17, NULL, [16], NULL, {2, 0}) = 1 (out [16], left {1, 999998})
[pid 31167] sendto(16, "POST /versioncheck/ HTTP/1.0\r\nUser-Agent: Nagios/4.3.4\r\nConnection: close\r\nHost: api.nagios.org\r\nContent-Type: application/x-www-form-urlencoded\r\nContent-Length: 72\r\n\r\nv=1&product=nagios&tinycheck=1&stableonly=1&uid=1512420542&version=4.3.4", 240, 0, NULL, 0) = 240
[pid 31167] select(17, [16], NULL, NULL, {2, 0}) = 1 (in [16], left {1, 890241})
[pid 31167] recvfrom(16, "HTTP/1.1 200 OK\r\nDate: Tue, 05 Dec 2017 18:58:21 GMT\r\nServer: Apache\r\nVary: Accept-Encoding\r\nContent-Length: 95\r\nConnection: close\r\nContent-Type: text/plain; charset=UTF-8\r\n\r\nUPDATE_AVAILABLE=0\nUPDATE_VERSION=4.3.4\nUPDATE_RELEASEDATE=2017-08-24\nPRODUCT_NAM"..., 1024, 0, NULL, NULL) = 270
[pid 31167] select(17, [16], NULL, NULL, {2, 0}) = 1 (in [16], left {1, 999960})
[pid 31167] recvfrom(16, "", 754, 0, NULL, NULL) = 0
[pid 31167] close(16)                   = 0
[pid 31167] write(10, "[1512500301.765769] [001.0] [pid=31167] schedule_new_event()\n", 61) = 61
[pid 31167] lseek(10, 0, SEEK_CUR)      = 575099
[pid 31167] write(10, "[1512500301.765891] [008.0] [pid=31167] New Event Details:\n", 59) = 59
[pid 31167] lseek(10, 0, SEEK_CUR)      = 575158
[pid 31167] write(10, "[1512500301.765974] [008.0] [pid=31167]  Type:                       EVENT_CHECK_PROGRAM_UPDATE\n", 96) = 96
[pid 31167] lseek(10, 0, SEEK_CUR)      = 575254
[pid 31167] write(10, "[1512500301.766056] [008.0] [pid=31167]  High Priority:              Yes\n", 73) = 73
[pid 31167] lseek(10, 0, SEEK_CUR)      = 575327
[pid 31167] write(10, "[1512500301.766138] [008.0] [pid=31167]  Run Time:                   12-06-2017 12:48:51\n", 89) = 89
[pid 31167] lseek(10, 0, SEEK_CUR)      = 575416
[pid 31167] write(10, "[1512500301.766220] [008.0] [pid=31167]  Recurring:                  No\n", 72) = 72
[pid 31167] lseek(10, 0, SEEK_CUR)      = 575488
[pid 31167] write(10, "[1512500301.766302] [008.0] [pid=31167]  Event Interval:             79200\n", 75) = 75
[pid 31167] lseek(10, 0, SEEK_CUR)      = 575563
[pid 31167] write(10, "[1512500301.766383] [008.0] [pid=31167]  Compensate for Time Change: Yes\n", 73) = 73
[pid 31167] lseek(10, 0, SEEK_CUR)      = 575636
[pid 31167] write(10, "[1512500301.766463] [008.0] [pid=31167]  Event Options:              0\n", 71) = 71
[pid 31167] lseek(10, 0, SEEK_CUR)      = 575707
[pid 31167] write(10, "[1512500301.766660] [008.0] [pid=31167]  Event ID:                   0x8c1690\n", 78) = 78
[pid 31167] lseek(10, 0, SEEK_CUR)      = 575785
[pid 31167] write(10, "[1512500301.766744] [001.0] [pid=31167] add_event()\n", 52) = 52
[pid 31167] lseek(10, 0, SEEK_CUR)      = 575837
[pid 31167] write(10, "[1512500301.766829] [064.1] [pid=31167] Making callbacks (type 18)...\n", 70) = 70
[pid 31167] lseek(10, 0, SEEK_CUR)      = 575907
[pid 31167] write(10, "[1512500301.766911] [001.0] [pid=31167] save_status_data()\n", 59) = 59
[pid 31167] lseek(10, 0, SEEK_CUR)      = 575966
[pid 31167] open("/var/spool/nagios/nagios.tmpvzMcoE", O_RDWR|O_CREAT|O_EXCL, 0600) = 16
[pid 31167] fcntl(16, F_GETFL)          = 0x8002 (flags O_RDWR|O_LARGEFILE)
[pid 31167] fstat(16, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
[pid 31167] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f736fd5b000
[pid 31167] lseek(16, 0, SEEK_CUR)      = 0
[pid 31167] write(16, "########################################\n#          NAGIOS STATUS FILE\n#\n# THIS FILE IS AUTOMATICALLY GENERATED\n# BY NAGIOS.  DO NOT MODIFY THIS FILE!\n########################################\n\ninfo {\n\tcreated=1512500301\n\tversion=4.3.4\n\tlast_update_check=15"..., 4096) = 4096
[pid 31167] write(16, "ent Users\n\tmodified_attributes=0\n\tcheck_command=check_local_users!20!50\n\tcheck_period=24x7\n\tnotification_period=24x7\n\timportance=0\n\tcheck_interval=5.000000\n\tretry_interval=1.000000\n\tevent_handler=\n\thas_been_checked=1\n\tshould_be_scheduled=1\n\tcheck_execution"..., 4096) = 4096
[pid 31167] write(16, "\timportance=0\n\tcheck_interval=5.000000\n\tretry_interval=1.000000\n\tevent_handler=\n\thas_been_checked=1\n\tshould_be_scheduled=1\n\tcheck_execution_time=0.004\n\tcheck_latency=0.000\n\tcheck_type=0\n\tcurrent_state=0\n\tlast_hard_state=0\n\tlast_event_id=0\n\tcurrent_event_id"..., 4096) = 4096
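
A trace this long is easier to scan when it is filtered down to failed calls. A minimal sketch, assuming the strace output is saved to a file or piped in. The function name is made up for illustration, and the traced process has to be the one that is actually failing. For an error shown in the web UI, that is probably the CGI run by the web server rather than the daemon, but that is an assumption.

```shell
# Keep only syscalls that failed with a permission or missing-file error.
# Reads strace output on stdin and prints the matching lines.
failed_file_calls() {
    grep -E ' = -1 (EACCES|ENOENT|EPERM) '
}

# Example (PID is hypothetical):
#   strace -f -p "$PID" 2>&1 | failed_file_calls
```

Lines such as the `EINPROGRESS` result on the non-blocking connect above are not matched, since they are not file access failures.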

Post by jlom »

Any ideas?

In the meantime I'm spinning up a fresh box to see if that helps.

Post by jlom »

Fresh install, still no luck with the RPM. Same errors.

This is pretty frustrating.

Is there any way I can expose the file(s) that can't be accessed? I'm seeing nothing in the logs or the debug output.

Post by dwhitfield »

Is the error happening in the web UI? Can you provide a screenshot? We had a similar issue, but it sounds a bit different. Before I start comparing the two issues, I want to make sure I understand what is going on.

If this isn't in the UI, where is it? You say it's not in a log, so again, I just want to be sure I'm understanding the issue.

Post by jlom »

Hey,

I actually fixed this last night. With the RPM, status.dat goes to /var/log/ and it's not reachable there. I moved it to /var/nagios/, made it readable by the nagios user, edited nagios.cfg and the init script to reflect that change, and restarted the process.
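
For anyone hitting the same thing, a check along these lines may shorten the process of elimination. It is a rough sketch, not anything shipped with Nagios: both function names are made up, and it assumes the status file location comes from the status_file line in nagios.cfg.

```shell
# Print the value(s) of one directive (e.g. status_file) from a Nagios config file.
# $1 = config file path, $2 = directive name
nagios_directive() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1"
}

# Report whether the current user can read the status file named in the config.
# $1 = config file path
check_status_file() {
    status_file=$(nagios_directive "$1" status_file | head -n 1)
    if [ -z "$status_file" ]; then
        echo "no status_file directive in $1"
        return 2
    fi
    if [ -r "$status_file" ]; then
        echo "readable: $status_file"
    else
        echo "NOT readable: $status_file"
        return 1
    fi
}

# Example (path is an assumption, adjust to your install):
#   check_status_file /etc/nagios/nagios.cfg
```

The result only reflects the account that runs it, so it is most useful when run as the user the web server runs as (often apache on RHEL-family systems, which is an assumption here), since that is the account the CGIs read the file with.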

To clarify, the UI was not able to detect the process or show host/check status.

I fixed it via process of elimination. Where could I look in the future to find this error? Is it logged at all?

Post by dwhitfield »

I think this might have shown up in the nagios.log if you had sent more of it. Can you send the complete nagios.log archive from one of the days where this happened?

Feel free to request some additional logging or to make the error message more useful at https://github.com/NagiosEnterprises/na ... issues/new

The first message only appears in a CGI that doesn't appear to exist in 4.2.4, which is what we use in XI, so it's an unfamiliar error message. The second one shows up in more places, but without the "Whoops!", which is why I didn't find it initially. There may be some clues at https://github.com/NagiosEnterprises/na ... 1%22&type= but I'm not seeing anything at the moment.