CPU, Memory and Disk Space resource utilization

Posted: Thu Jan 23, 2020 12:57 pm
by kaushalshriyan
Hi Support Team,

I am using Nagios Core 4.4.5 on CentOS 7.7 and currently monitor 25 servers. Is there a way to find out the current resource utilization of all 25 servers? For example: CPU, memory and disk space utilization for Server 1, and likewise for the remaining servers. This is a capacity-planning exercise for our organisation, to determine whether the current infrastructure (25 servers) is sufficient for the next 6 months to a year.

Thanks in advance. I look forward to hearing from you.

Best Regards,

Kaushal

Re: CPU, Memory and Disk Space resource utilization

Posted: Thu Jan 23, 2020 7:48 pm
by Box293
Nagios XI comes with a lot of this built in, as it comprises a lot of different technologies.

Most likely you'll extrapolate that data from performance data. I would suggest you look at implementing InfluxDB/Grafana. Here's some documentation on how to do that:

https://support.nagios.com/kb/article/n ... u-802.html
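
As a rough illustration of what "performance data" contains, here is a minimal Python sketch that splits a plugin perfdata string into its fields. It assumes the usual plugin format `'label'=value[UOM];warn;crit;min;max`. The function name `parse_perfdata` and the sample string are made up for illustration and are not taken from this thread.

```python
import re

# Assumed perfdata layout: 'label'=value[UOM];[warn];[crit];[min];[max]
PERF_RE = re.compile(
    r"(?:'(?P<qlabel>[^']+)'|(?P<label>[^\s=']+))="
    r"(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[^;\s]*)"
    r"(?:;(?P<warn>[^;\s]*))?(?:;(?P<crit>[^;\s]*))?"
    r"(?:;(?P<min>[^;\s]*))?(?:;(?P<max>[^;\s]*))?"
)


def parse_perfdata(text):
    """Return one dict per metric found in a perfdata string."""
    metrics = []
    for m in PERF_RE.finditer(text):
        metrics.append({
            "label": m.group("qlabel") or m.group("label"),
            "value": float(m.group("value")),
            "uom": m.group("uom"),
            # Empty or missing thresholds are reported as None.
            "warn": m.group("warn") or None,
            "crit": m.group("crit") or None,
            "min": m.group("min") or None,
            "max": m.group("max") or None,
        })
    return metrics


# Hypothetical sample, not real data from the servers discussed here.
sample = "load1=0.42;5;10;0; '/ used'=4096MB;8000;9000;0;10240"
parsed = parse_perfdata(sample)
for metric in parsed:
    print(metric["label"], metric["value"], metric["uom"])
```

Values like these, stored over time, are what a capacity-planning trend would be built from.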

Re: CPU, Memory and Disk Space resource utilization

Posted: Fri Jan 24, 2020 1:01 pm
by kaushalshriyan
Thanks, Troy Lea, for the reply. I will keep you posted as it progresses. Your help is much appreciated.

Re: CPU, Memory and Disk Space resource utilization

Posted: Fri Jan 24, 2020 4:35 pm
by cdienger
Sounds good!

Re: CPU, Memory and Disk Space resource utilization

Posted: Wed Jan 29, 2020 10:13 pm
by kaushalshriyan
Hi

Functionally it is working. I am attaching the screenshot. Below is the output of the configuration check I am running:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Nagios Core 4.4.5
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2019-08-20
License: GPL

Website: https://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
Checked 311 services.
Checked 34 hosts.
Checked 1 host groups.
Checked 0 service groups.
Checked 28 contacts.
Checked 9 contact groups.
Checked 39 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 34 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check

Versions: influxdb-1.7.9-1.x86_64, grafana-6.5.3-1.x86_64, histou v0.4.3, Nagflux v0.4.1, CentOS Linux release 7.7.1908 (Core), Nagios Core 4.4.5

Please let me know if you need any additional information.

Best Regards,

Kaushal

Re: CPU, Memory and Disk Space resource utilization

Posted: Thu Jan 30, 2020 1:04 am
by kaushalshriyan
Hi Troy,

I see breaks in the graph, as shown in the attached screenshot.

Best Regards,

Kaushal

Re: CPU, Memory and Disk Space resource utilization

Posted: Thu Jan 30, 2020 1:15 am
by kaushalshriyan
Hi Troy,

I am attaching the screenshot again for your reference:

Screenshot 2020-01-30 at 11.43.47 AM.png

Best Regards,

Kaushal

Re: CPU, Memory and Disk Space resource utilization

Posted: Thu Jan 30, 2020 4:39 pm
by Box293
The breaks in your performance data generally mean that you are not receiving valid data back from the plugin during these intervals, or that for some reason the performance data is being deleted before it is processed.

You may want to look at the InfluxDB logs to see if it is reporting any errors.
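
To pin down where the breaks are, one option is to compare the timestamps of the stored samples against the expected check interval. The following is a minimal Python sketch under that assumption. The function name `find_gaps`, the `tolerance` factor and the sample timestamps are hypothetical, and how the timestamps are exported from the database is left open.

```python
def find_gaps(timestamps, interval, tolerance=1.5):
    """Return (start, end) pairs of consecutive samples that are
    further apart than interval * tolerance (same unit as timestamps)."""
    ordered = sorted(timestamps)
    limit = interval * tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


# Hypothetical epoch-second samples for a check that runs every 60 seconds.
samples = [0, 60, 120, 420, 480]
gaps = find_gaps(samples, interval=60)
print(gaps)
```

Each reported pair marks a window in which no sample arrived, which can then be matched against the Nagflux and InfluxDB logs for the same period.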

Re: CPU, Memory and Disk Space resource utilization

Posted: Fri Jan 31, 2020 12:23 pm
by kaushalshriyan
Hi Troy,

Please find the details below, gathered after investigating further.

systemctl status nagflux.service
● nagflux.service - A connector which transforms performancedata from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/nagflux.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-01-31 04:10:26 UTC; 12h ago
     Docs: https://github.com/Griesbacher/nagflux
 Main PID: 28895 (nagflux)
   CGroup: /system.slice/nagflux.service
           └─28895 /opt/nagflux/nagflux -configPath /opt/nagflux/config.gcfg
Jan 31 17:05:59 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:05:59 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:28 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:28 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:28 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:28 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:28 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:28 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:29 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:29 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:29 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:29 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:29 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:29 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:59 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:59 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:59 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:59 Critical: Connection type is unknown, options are: tcp, file. Input:
Jan 31 17:06:59 ip-172-31-0-145.ap-south-1.compute.internal nagflux[28895]: 2020-01-31 17:06:59 Critical: Connection type is unknown, options are: tcp, file. Input :

cat /opt/nagflux/config.gcfg

[main]
	NagiosSpoolfileFolder = "/usr/local/nagios/var/spool/nagfluxperfdata"
	NagiosSpoolfileWorker = 1
	InfluxWorker = 2
	MaxInfluxWorker = 5
	DumpFile = "nagflux.dump"
	NagfluxSpoolfileFolder = "/usr/local/nagios/var/nagflux"
	FieldSeparator = "&"
	BufferSize = 10000
	FileBufferSize = 65536
	DefaultTarget = "all"

[Log]
	LogFile = ""
	MinSeverity = "INFO"

[Livestatus]
#        # tcp or file
         Type = "file"
#        # tcp: 127.0.0.1:6557 or file /var/run/live
        file /usr/local/nagios/var/live.sock
#        #Address = "127.0.0.1:6557"
#        # The amount to minutes to wait for livestatus to come up, if set to 0 the detection is disabled
        MinutesToWait = 2
#        # Set the Version of Livestatus. Allowed are Nagios, Icinga2, Naemon.
#        # If left empty Nagflux will try to detect it on it's own, which will not always work.
       Version = ""

[InfluxDBGlobal]
	CreateDatabaseIfNotExists = true
	NastyString = ""
	NastyStringToReplace = ""
	HostcheckAlias = "hostcheck"

[InfluxDB "nagflux"]
	Enabled = true
	Version = 1.0
	Address = "http://127.0.0.1:8086"
	Arguments = "precision=ms&u=root&p=root&db=nagflux"
	StopPullingDataIfDown = true

[InfluxDB "fast"]
	Enabled = false
	Version = 1.0
	Address = "http://127.0.0.1:8086"
	Arguments = "precision=ms&u=root&p=root&db=fast"
	StopPullingDataIfDown = false

The Livestatus socket file is /usr/local/nagios/var/live.sock:

srw-rw----. 1 nagios nagios 0 Jan 29 07:20 /usr/local/nagios/var/live.sock

I have enabled the section below in /opt/nagflux/config.gcfg. The nagflux service does not start at all.

[Livestatus]
#        # tcp or file
        Type = "file"
#        # tcp: 127.0.0.1:6557 or file /var/run/live
        file /usr/local/nagios/var/live.sock
#        #Address = "127.0.0.1:6557"
#        # The amount to minutes to wait for livestatus to come up, if set to 0 the detection is disabled
        MinutesToWait = 2
#        # Set the Version of Livestatus. Allowed are Nagios, Icinga2, Naemon.
#        # If left empty Nagflux will try to detect it on it's own, which will not always work.
        Version = ""
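
One thing worth checking in the section above: the line `file /usr/local/nagios/var/live.sock` is not in `key = value` form, whereas the commented-out sample line uses `Address = ...`. Whether this is what makes Nagflux exit is an assumption and is not confirmed in this thread. The Python sketch below flags such lines. The function name `suspicious_lines` is made up, and the check is deliberately simple: it would also flag value-less keys, which git-config style formats may accept.

```python
import re

# Matches section headers such as [Livestatus] or [InfluxDB "nagflux"].
SECTION_RE = re.compile(r"^\[[^\]]+\]$")


def suspicious_lines(config_text):
    """Return (line_number, text) for lines that are not blank,
    not comments, not section headers, and contain no '='."""
    flagged = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if SECTION_RE.match(line):
            continue
        if "=" not in line:
            flagged.append((number, line))
    return flagged


# Excerpt of the section posted above, with indentation removed.
livestatus = '''[Livestatus]
# tcp or file
Type = "file"
file /usr/local/nagios/var/live.sock
#Address = "127.0.0.1:6557"
MinutesToWait = 2
Version = ""
'''
flagged = suspicious_lines(livestatus)
print(flagged)
```

If that line does turn out to be the problem, the socket path would presumably belong in the `Address` setting, as the comment "tcp: 127.0.0.1:6557 or file /var/run/live" suggests.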

Nagios cfg file:

process_performance_data=1
host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-nagflux
#
service_perfdata_file=/usr/local/nagios/var/host-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-nagflux
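
In the settings above, `host_perfdata_file` and `service_perfdata_file` both point to /usr/local/nagios/var/host-perfdata. Whether that is intended here is not stated. With two different templates and two processing commands working on one file, it may be worth ruling out as a source of mixed or lost records. A minimal Python sketch that detects such shared paths follows. The function name `shared_perfdata_files` is hypothetical.

```python
from collections import defaultdict


def shared_perfdata_files(cfg_text):
    """Return {path: [directives]} for every path that is used by
    more than one *_perfdata_file directive."""
    users = defaultdict(list)
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        # Only the file directives themselves, not *_file_mode and so on.
        if key.endswith("_perfdata_file"):
            users[value.strip()].append(key)
    return {path: keys for path, keys in users.items() if len(keys) > 1}


# Excerpt of the nagios.cfg settings posted above.
cfg = """host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file_mode=a
service_perfdata_file=/usr/local/nagios/var/host-perfdata
service_perfdata_file_mode=a
"""
shared = shared_perfdata_files(cfg)
print(shared)
```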


#systemctl status nagflux.service
● nagflux.service - A connector which transforms performancedata from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/nagflux.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Fri 2020-01-31 17:10:48 UTC; 3s ago
Docs: https://github.com/Griesbacher/nagflux
Process: 10845 ExecStart=/opt/nagflux/nagflux -configPath /opt/nagflux/config.gcfg (code=exited, status=2)
Main PID: 10845 (code=exited, status=2)

Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal nagflux[10845]: main.main()
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal nagflux[10845]: /root/gorepo/src/github.com/griesbacher/nagflux/main.go:68 +0x22e
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: Unit nagflux.service entered failed state.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: nagflux.service failed.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: nagflux.service holdoff time over, scheduling restart.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: Stopped A connector which transforms performancedata from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: start request repeated too quickly for nagflux.service
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: Failed to start A connector which transforms performancedata from Nagios/Icinga(2)/Naemon to InfluxDB/Elasticsearch.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: Unit nagflux.service entered failed state.
Jan 31 17:10:48 ip-172-31-0-145.ap-south-1.compute.internal systemd[1]: nagflux.service failed.

Please advise on next steps and correct me if I am missing anything. I look forward to hearing from you. Thanks in advance.

Best Regards,

Kaushal

Re: CPU, Memory and Disk Space resource utilization

Posted: Fri Jan 31, 2020 8:51 pm
by kaushalshriyan
Hi Troy,

Checking in again: have you had a chance to look at my post in this forum?

Best Regards,

Kaushal