Nagios Support Forum

Posted: **Thu Jan 23, 2020 12:57 pm**

Hi Support Team,

I am using Nagios Core 4.4.5 on CentOS 7.7 and am currently monitoring 25 servers. Is there a way to find out the current resource utilization of all 25 servers? For example, the CPU, memory and disk space utilization of Server 1, and likewise for each of the remaining servers. This is a capacity-planning exercise in our organisation to determine whether the current infrastructure (25 servers) will be sufficient for the next 6 months to a year.
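Once per-server metrics are being collected, the planning question largely reduces to projecting each usage trend forward. A minimal sketch, assuming you can export (day, used) samples for a metric such as disk usage: an ordinary least-squares line gives a rough estimate of how many days remain before a capacity limit is reached. The function names are hypothetical, and a straight-line trend is itself an assumption that will not hold for seasonal or bursty workloads.

```python
from typing import Optional, Sequence, Tuple


def fit_line(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("samples must be taken at different times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(points: Sequence[Tuple[float, float]], capacity: float) -> Optional[float]:
    """Days after the last sample until the fitted trend reaches capacity.

    Returns None when usage is flat or shrinking (no projected exhaustion).
    """
    slope, intercept = fit_line(points)
    if slope <= 0:
        return None
    last_day = max(x for x, _ in points)
    # Day on which the fitted line crosses capacity, relative to the last sample.
    return max(0.0, (capacity - intercept) / slope - last_day)
```

With hypothetical samples of 100, 130 and 160 GB taken on days 0, 30 and 60 on a 250 GB volume, `days_until_full([(0, 100), (30, 130), (60, 160)], 250)` returns `90.0`.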

Thanks in advance. I look forward to hearing from you.

Best Regards,

Kaushal

Posted: **Thu Jan 23, 2020 7:48 pm**

Nagios XI has a lot of this built in, as it combines many different technologies.

Most likely you'll need to derive that data from the performance data your checks already collect. I would suggest you look at implementing InfluxDB/Grafana. Here's some documentation on how to do that:

https://support.nagios.com/kb/article/n ... u-802.html
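Performance data is the text a plugin prints after the `|` in its output, in the form `'label'=value[UOM];[warn];[crit];[min];[max]`. A minimal sketch of parsing that format and deriving a utilization percentage follows; `PerfValue`, `parse_perfdata` and `percent_used` are hypothetical names, and in the InfluxDB/Grafana setup this parsing is done by Nagflux, so the sketch only illustrates what the stored values mean.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One item: 'label'=value[UOM];[warn];[crit];[min];[max]; quoted labels may contain spaces.
_ITEM_RE = re.compile(r"('[^']+'|[^\s=]+)=(\S+)")
_VALUE_RE = re.compile(r"^(-?[0-9]*\.?[0-9]+)([^;]*)$")


@dataclass
class PerfValue:
    label: str
    value: float
    uom: str                   # unit of measure, e.g. "%", "MB", "s"; may be empty
    warn: Optional[str]        # kept as text: thresholds may be ranges like "10:20"
    crit: Optional[str]
    min_value: Optional[float]
    max_value: Optional[float]


def _opt_float(text: str) -> Optional[float]:
    return float(text) if text else None


def parse_perfdata(perfdata: str) -> List[PerfValue]:
    """Parse a plugin's performance data string into PerfValue records."""
    items = []
    for label, rest in _ITEM_RE.findall(perfdata):
        fields = rest.split(";")
        fields += [""] * (5 - len(fields))  # trailing fields are optional
        match = _VALUE_RE.match(fields[0])
        if match is None:
            continue  # e.g. "U" (value unknown): skip rather than guess
        items.append(PerfValue(
            label=label.strip("'"),
            value=float(match.group(1)),
            uom=match.group(2),
            warn=fields[1] or None,
            crit=fields[2] or None,
            min_value=_opt_float(fields[3]),
            max_value=_opt_float(fields[4]),
        ))
    return items


def percent_used(item: PerfValue) -> Optional[float]:
    """Utilization in percent, if the unit is % or the plugin reports a max."""
    if item.uom == "%":
        return item.value
    if item.max_value:
        return 100.0 * item.value / item.max_value
    return None
```

For example, `parse_perfdata("mem=42%;80;90")` yields one record whose `percent_used` is `42.0`.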

Posted: **Fri Jan 24, 2020 1:01 pm**

Thanks for the reply, Troy Lea. I will keep you posted on my progress. Much appreciated!

Posted: **Fri Jan 24, 2020 4:35 pm**

Sounds good!

Posted: **Wed Jan 29, 2020 10:13 pm**

Hi

Functionally it is working (screenshot attached). Here is the output of the pre-flight configuration check:


/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.4.5
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2019-08-20
License: GPL

Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
Checked 311 services.
Checked 34 hosts.
Checked 1 host groups.
Checked 0 service groups.
Checked 28 contacts.
Checked 9 contact groups.
Checked 39 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 34 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check

Versions: influxdb-1.7.9-1.x86_64, grafana-6.5.3-1.x86_64, histou v0.4.3, Nagflux v0.4.1, CentOS Linux release 7.7.1908 (Core), Nagios Core 4.4.5

Please let me know if you need any additional information.

Best Regards,

Kaushal

Posted: **Thu Jan 30, 2020 1:04 am**

Hi Troy,

I see breaks in the graph, as shown in the attached screenshot.

Best Regards,

Kaushal

Posted: **Thu Jan 30, 2020 1:15 am**

Hi Troy,

I am attaching the screenshot again for your reference.

Best Regards,

Kaushal

Posted: **Thu Jan 30, 2020 4:39 pm**

The breaks in your performance data generally mean that the plugin did not return valid performance data during those intervals, or that for some reason the performance data was deleted before it could be processed.

You may want to check the InfluxDB logs for any reported errors.
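One way to confirm that the breaks correspond to missing data points rather than a graphing problem is to export the timestamps of an affected series and look for spacing longer than the service's check interval. A minimal sketch, assuming Unix timestamps in seconds and a fixed interval; `find_gaps` and its 1.5x tolerance are hypothetical choices:

```python
from typing import List, Sequence, Tuple


def find_gaps(timestamps: Sequence[int], interval: int,
              tolerance: float = 1.5) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of consecutive samples that are more than
    tolerance * interval seconds apart, i.e. where at least one check is missing."""
    ordered = sorted(timestamps)
    limit = interval * tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]
```

`find_gaps([0, 300, 600, 1500, 1800], 300)` returns `[(600, 1500)]`, i.e. roughly two missed five-minute checks.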

Posted: **Fri Jan 31, 2020 12:23 pm**

Hi Troy,

After investigating further, please find the details below.

Code:

# systemctl status nagflux.service
● nagflux.service - A connector which transforms performancedata from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/nagflux.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-01-31 04:10:26 UTC; 12h ago
     Docs: https://github.com/Griesbacher/nagflux
 Main PID: 28895 (nagflux)
   CGroup: /system.slice/nagflux.service
           └─28895 /opt/nagflux/nagflux -configPath /opt/nagflux/config.gcfg

Jan 31 17:05:59 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:05:59 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:28 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:28 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:28 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:28 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:28 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:28 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:29 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:29 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:29 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:29 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:29 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:29 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:59 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:59 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:59 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:59 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:59 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:59 Critical: Connection type is unknown, options are: tcp, file. Input :

cat /opt/nagflux/config.gcfg

Code:

[main]
	NagiosSpoolfileFolder = "/usr/local/nagios/var/spool/nagfluxperfdata"
	NagiosSpoolfileWorker = 1
	InfluxWorker = 2
	MaxInfluxWorker = 5
	DumpFile = "nagflux.dump"
	NagfluxSpoolfileFolder = "/usr/local/nagios/var/nagflux"
	FieldSeparator = "&"
	BufferSize = 10000
	FileBufferSize = 65536
	DefaultTarget = "all"

[Log]
	LogFile = ""
	MinSeverity = "INFO"

[Livestatus]
#        # tcp or file
         Type = "file"
#        # tcp: 127.0.0.1:6557 or file /var/run/live
        file /usr/local/nagios/var/live.sock
#        #Address = "127.0.0.1:6557"
#        # The amount to minutes to wait for livestatus to come up, if set to 0 the detection is disabled
        MinutesToWait = 2
#        # Set the Version of Livestatus. Allowed are Nagios, Icinga2, Naemon.
#        # If left empty Nagflux will try to detect it on it's own, which will not always work.
       Version = ""

[InfluxDBGlobal]
	CreateDatabaseIfNotExists = true
	NastyString = ""
	NastyStringToReplace = ""
	HostcheckAlias = "hostcheck"

[InfluxDB "nagflux"]
	Enabled = true
	Version = 1.0
	Address = "http://127.0.0.1:8086"
	Arguments = "precision=ms&u=root&p=root&db=nagflux"
	StopPullingDataIfDown = true

[InfluxDB "fast"]
	Enabled = false
	Version = 1.0
	Address = "http://127.0.0.1:8086"
	Arguments = "precision=ms&u=root&p=root&db=fast"
	StopPullingDataIfDown = false

The Livestatus socket file is /usr/local/nagios/var/live.sock:
srw-rw----. 1 nagios nagios 0 Jan 29 07:20 /usr/local/nagios/var/live.sock
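Before looking at Nagflux itself, it may help to confirm that the socket answers queries at all. The sketch below sends one Livestatus query over the UNIX socket and reads the reply until the server closes the connection, following the usual Livestatus pattern of half-closing the write side after the query. The function name is hypothetical, and the `GET status` / `Columns: program_version` query in the usage note is an assumption about which columns your Livestatus build exposes.

```python
import socket


def livestatus_query(socket_path: str, query: str, timeout: float = 5.0) -> str:
    """Send one Livestatus query over a UNIX socket and return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(query.encode())
        # Half-close the connection so Livestatus knows the query is complete.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode()
```

Something like `print(livestatus_query("/usr/local/nagios/var/live.sock", "GET status\nColumns: program_version\n\n"))` would do. Given the `srw-rw----` nagios:nagios permissions shown above, it needs to run as the nagios user or a member of the nagios group; a permission or connection error here would point at the socket rather than at Nagflux.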

I then enabled the settings below in /opt/nagflux/config.gcfg, and now the nagflux service does not start at all.

Code:

[Livestatus]
#        # tcp or file
        Type = "file"
#        # tcp: 127.0.0.1:6557 or file /var/run/live
        file /usr/local/nagios/var/live.sock
#        #Address = "127.0.0.1:6557"
#        # The amount to minutes to wait for livestatus to come up, if set to 0 the detection is disabled
        MinutesToWait = 2
#        # Set the Version of Livestatus. Allowed are Nagios, Icinga2, Naemon.
#        # If left empty Nagflux will try to detect it on it's own, which will not always work.
        Version = ""
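In this section the line `file /usr/local/nagios/var/live.sock` is not a `Key = value` assignment, which a parser for this INI-like format would likely reject; that would be consistent with Nagflux exiting during start-up, although the truncated trace below does not show the actual error. Judging by the commented `#Address = "127.0.0.1:6557"` line in the full config, the socket path probably belongs in `Address = "/usr/local/nagios/var/live.sock"`. As a minimal, hypothetical pre-flight check (not part of Nagflux), this flags such lines as well as a missing or invalid `Type`/`Address`:

```python
import re
from typing import Dict, List

# Key = "value" or Key = value (quotes optional); anything else is flagged.
_KV = re.compile(r'^(\w+)\s*=\s*"?([^"]*)"?\s*$')


def check_livestatus_section(config_text: str) -> List[str]:
    """Return human-readable problems found in the [Livestatus] section."""
    problems: List[str] = []
    values: Dict[str, str] = {}
    in_section = False
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank lines and comments
        if line.startswith("["):
            in_section = line == "[Livestatus]"
            continue
        if not in_section:
            continue
        match = _KV.match(line)
        if match is None:
            problems.append(f'line {lineno}: not a Key = "value" assignment: {line}')
            continue
        values[match.group(1)] = match.group(2)
    conn_type = values.get("Type", "")
    if conn_type not in ("tcp", "file"):
        problems.append(f"Type must be tcp or file, got {conn_type!r}")
    if not values.get("Address"):
        problems.append("Address is missing or empty")
    return problems
```

Run against the section above, it reports the `file ...` line and the missing `Address`; with that line changed to `Address = "/usr/local/nagios/var/live.sock"` it returns an empty list.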

Relevant settings from nagios.cfg:

Code:

process_performance_data=1
host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-nagflux
#
service_perfdata_file=/usr/local/nagios/var/host-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-nagflux
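Nagios writes one line per check result using the templates above, expanding each `\t` to a tab, and the processing commands presumably move those files into Nagflux's spool folder (the command definitions are not shown, so this assumes they follow the KB article). In the posted settings, `service_perfdata_file` points at the same path as `host_perfdata_file` (`/usr/local/nagios/var/host-perfdata`); if that is unintentional, both processing commands act on one shared file, which may be worth ruling out when chasing the gaps. A minimal sketch with hypothetical helper names that parses one such line and detects that clash:

```python
from typing import Dict


def parse_spool_line(line: str) -> Dict[str, str]:
    """Split one line written by a *_perfdata_file_template into KEY -> value."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        key, sep, value = field.partition("::")
        if sep:  # ignore fields without a KEY:: prefix
            record[key] = value
    return record


def perfdata_paths_clash(cfg_text: str) -> bool:
    """True if host_perfdata_file and service_perfdata_file name the same file."""
    settings = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    host = settings.get("host_perfdata_file")
    return host is not None and host == settings.get("service_perfdata_file")
```

Fed the nagios.cfg excerpt above, `perfdata_paths_clash` returns `True`.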

#systemctl status nagflux.service

● nagflux.service - A connector which transforms performancedata from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/nagflux.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Fri 2020-01-31 17:10:48 UTC; 3s ago
Docs: https://github.com/Griesbacher/nagflux
Process: 10845 ExecStart=/opt/nagflux/nagflux -configPath /opt/nagflux/config.gcfg (code=exited, status=2)
Main PID: 10845 (code=exited, status=2)

Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal nagflux[10845]: main.main()
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal nagflux[10845]: /root/gorepo/src/github.com/griesbacher/nagflux/main.go:68 +0x22e
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: Unit nagflux.service entered failed state.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: nagflux.service failed.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: nagflux.service holdoff time over, scheduling restart.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: Stopped A connector which transforms performancedata from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: start request repeated too quickly for nagflux.service
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: Failed to start A connector which transforms performancedata from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: Unit nagflux.service entered failed state.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: nagflux.service failed.

Please advise, and correct me if I am missing anything. I look forward to hearing from you. Thanks in advance.

Best Regards,

Kaushal

Posted: **Fri Jan 31, 2020 8:51 pm**

Hi Troy,

Just checking in: have you had a chance to look at my previous post?

Best Regards,

Kaushal

CPU, Memory and Disk Space resource utilization
