# Fast up/down checking
Normally, LibreNMS sends an ICMP ping to the device before polling to
check if it is up or down. This check is tied to the poller frequency,
which is normally 5 minutes. This means it may take up to 5 minutes
to find out if a device is down.

Some users may want to know sooner when a device stops responding to
ping. LibreNMS offers a `ping.php` script that runs ping checks more
frequently, without the extra SNMP load on your devices that switching
to 1 minute polling would cause.

**WARNING**: If you do not have an alert rule that alerts on device
status, enabling this will be a waste of resources. You can find one
in the [Alert Rules
Collection](../Alerting/Rules.md#alert-rules-collection).

## Setting the ping check to 1 minute

1: If you are using [RRDCached](../Extensions/RRDCached.md), stop the service.

- This flushes all pending writes so that the `rrdstep.php` script can change the steps.

2: Change the `ping_rrd_step` setting to 60 (seconds).

!!! setting "poller/rrdtool"
    ```bash
    lnms config:set ping_rrd_step 60
    ```

3: Update the RRD files to change the step (the step is fixed when an RRD file is created).

```
./scripts/rrdstep.php -h all
```

4: Add the following line to `/etc/cron.d/librenms` to enable 1 minute
ping checks.

```
* * * * * librenms /opt/librenms/ping.php >> /dev/null 2>&1
```

5: If applicable, start the [RRDCached](../Extensions/RRDCached.md) service.
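
Once the steps above are done, it can be useful to confirm that an RRD file really carries the new step. The sketch below is one way to do that, not part of LibreNMS: `rrd_step` is a hypothetical helper that picks the `step` line out of `rrdtool info` output. The RRD path and the `ping-perf.rrd` file name in the trailing comment are assumptions about a default install, so adjust them to your setup.

```shell
#!/bin/sh
# Hypothetical helper: read `rrdtool info` output on stdin and print
# the value of its "step" line (the step is given in seconds).
rrd_step() {
    awk -F' = ' '$1 == "step" { print $2 }'
}

# Demonstration with canned input in the format `rrdtool info` prints.
# This prints: 60
printf 'filename = "ping-perf.rrd"\nrrd_version = "0003"\nstep = 60\n' | rrd_step

# On a real install (path and file name are assumptions):
#   rrdtool info /opt/librenms/rrd/HOSTNAME/ping-perf.rrd | rrd_step
```

If the printed value still shows the old step, the file was probably not converted, for example because RRDCached was still holding pending writes.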

**NOTE**: If you are using distributed pollers, you can restrict a
poller to a group by appending `-g` to the cron entry. Otherwise,
run `ping.php` on a single node only.
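
As an illustration of the note above, a cron entry restricted to one poller group might look like the following. This assumes that `-g` takes the poller group ID as its argument; the value `1` is a placeholder and not taken from this document, so substitute the ID of your own poller group.

```
* * * * * librenms /opt/librenms/ping.php -g 1 >> /dev/null 2>&1
```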
## Sub minute ping check

Cron only has a resolution of one minute, so sub-minute ping checks require changes to both the `ping.php`
and `alerts.php` cron entries. Each script gets two entries, and a `sleep` delay staggers them within the minute.

Remember to remove the original `ping.php` and `alerts.php` entries from crontab before
proceeding!

1: Set `ping_rrd_step` to 30 (seconds).

!!! setting "poller/rrdtool"
    ```bash
    lnms config:set ping_rrd_step 30
    ```

2: Update the RRD files.

```
./scripts/rrdstep.php -h all
```

3: Update cron (removing any other `ping.php` or `alerts.php` entries).
```
* * * * * librenms /opt/librenms/ping.php >> /dev/null 2>&1
* * * * * librenms sleep 30 && /opt/librenms/ping.php >> /dev/null 2>&1
* * * * * librenms sleep 15 && /opt/librenms/alerts.php >> /dev/null 2>&1
* * * * * librenms sleep 45 && /opt/librenms/alerts.php >> /dev/null 2>&1
```
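
The pattern behind these four entries is one cron line per run within the minute, each prefixed with a `sleep` equal to its offset in seconds. As a rough sketch of that pattern, the hypothetical helper `staggered_cron` below (not part of LibreNMS) prints the entries for a given script, interval, and starting offset. It assumes the interval divides 60 evenly and the `librenms` user shown above.

```shell
#!/bin/sh
# Hypothetical helper: print cron entries that run CMD every INTERVAL
# seconds, with the first run delayed by OFFSET seconds (default 0).
# INTERVAL must divide 60 evenly, since cron restarts the cycle each minute.
staggered_cron() {
    cmd=$1
    interval=$2
    offset=${3:-0}
    if [ "$interval" -le 0 ] || [ $((60 % interval)) -ne 0 ]; then
        echo "interval must be a positive divisor of 60" >&2
        return 1
    fi
    delay=$offset
    while [ "$delay" -lt 60 ]; do
        if [ "$delay" -eq 0 ]; then
            # No delay needed for the run at the top of the minute.
            echo "* * * * * librenms $cmd >> /dev/null 2>&1"
        else
            echo "* * * * * librenms sleep $delay && $cmd >> /dev/null 2>&1"
        fi
        delay=$((delay + interval))
    done
}

# Reproduces the four entries above: pings at 0s and 30s, alerts at 15s and 45s.
staggered_cron /opt/librenms/ping.php 30
staggered_cron /opt/librenms/alerts.php 30 15
```

Offsetting `alerts.php` by 15 seconds means each alert run starts 15 seconds after the preceding `ping.php` run, rather than at the same moment.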
## Device dependencies

The `ping.php` script respects device dependencies, but the main poller
does not (for technical reasons). Using this script does not disable
the ICMP check in the poller, so a child may still be reported as
down before its parent.

## Settings

`ping.php` uses much the same fping settings as the poller, with one
exception: `retries` is used instead of `count`.

`ping.php` does not measure packet loss or average response time, only
up/down status, so it stops pinging a device as soon as the device responds.

!!! setting "poller/ping"
    ```bash
    lnms config:set fping_options.retries 2
    lnms config:set fping_options.timeout 500
    lnms config:set fping_options.interval 500
    ```