mirror of https://github.com/dennypage/dpinger.git synced 2024-05-19 06:50:01 +00:00

14 Commits
v3.0 ... master

Author SHA1 Message Date
dennypage
664f5c7aa6 Use dpinger defaults for send_interval and time_period 2023-11-08 10:40:52 -08:00
Denny Page
0e963753e1 Use gcc by default 2023-11-08 10:22:58 -08:00
Denny Page
e3cb41889e Add explicit cast for assignment of alarm_hold_periods 2023-11-08 10:08:07 -08:00
dennypage
9c31ea4380 Update default parameters and correct loss calculation
Update parameters to reflect existing defaults and correct the loss resolution calculation when using subsecond send intervals.
2023-11-08 08:29:46 -08:00
Denny Page
fff9b65eb5 Correct usage note regarding loss resolution with subsecond send intervals 2023-11-08 08:07:39 -08:00
Denny Page
47f1a778b9 Correct usage note regarding parameters to the alert_cmd 2023-11-06 15:40:46 -08:00
Denny Page
ce7d88bddf Update version to 3.3 2023-01-18 19:17:16 -08:00
Denny Page
67b8ba1f6d Add option to explicitly control the hold time for alarms. 2023-01-18 18:24:20 -08:00
Denny Page
c845c582b4 Add examples for dpinger logging/monitoring with InfluxDB and Grafana 2022-05-14 14:50:40 -07:00
dennypage
fbc7e8f87f Update copyright year 2022-03-01 08:21:24 -08:00
Denny Page
efc17c7204 Log signal number on exit 2022-02-28 10:07:30 -08:00
Denny Page
bc00923f62 Update text formatting to match current GitHub format. No change to actual license. 2020-06-07 13:29:36 -07:00
Denny Page
bf18a6e2a8 Update copyright 2020-06-07 13:23:33 -07:00
Denny Page
cee7ac9da0 Add a version number to usage output 2017-12-08 21:37:23 -08:00
8 changed files with 620 additions and 44 deletions

13
LICENSE

@@ -1,15 +1,15 @@
Copyright (c) 2015-2017, Denny Page
Copyright (c) 2015-2022, Denny Page
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
@@ -21,4 +21,3 @@ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

@@ -1,8 +1,8 @@
#CC=gcc
#WARNINGS=-Wall -Wextra -Wformat=2 -Wno-unused-result
CC=gcc
WARNINGS=-Wall -Wextra -Wformat=2 -Wno-unused-result
CC=clang
WARNINGS=-Weverything -Wno-padded -Wno-disabled-macro-expansion -Wno-reserved-id-macro
#CC=clang
#WARNINGS=-Weverything -Wno-unsafe-buffer-usage -Wno-cast-function-type-strict -Wno-padded -Wno-disabled-macro-expansion -Wno-reserved-id-macro
CFLAGS=${WARNINGS} -pthread -g -O2

@@ -4,10 +4,10 @@ In general, dpinger works a bit differently than other latency monitors. Rather
When the alert check is made, or a report is generated, dpinger goes through the array and examines each echo request. If a reply has been received, it is used as part of the overall latency calculation. If a reply has not yet been received, the amount of time since the request is compared against the loss interval. If it is greater than the loss interval, the request/reply is counted as lost in the current report. However, the concept of the request/reply being lost is not a permanent decision. In subsequent reports, if the missing reply has been received, its latency will be used instead of being counted as lost.
It's important to keep in mind that latency and loss are reported as averages across the entire request set. The default time period for dpinger is 30 seconds, with an echo request being sent every 250 milliseconds. This means that the latency and loss will be reported as averages across 115-120 samples. The alert check runs every second by default. So each time, the 4 oldest entries in the set have been replaced by the 4 newest ones.
It's important to keep in mind that latency and loss are reported as averages across the entire request set. The default time period for dpinger is 60 seconds, with an echo request being sent every 500 milliseconds. This means that the latency and loss will be reported as averages across 116-120 samples. The alert check runs every second by default. So each time, the 2 oldest entries in the set have been replaced by the 2 newest ones.
Note that if you want accurate loss reporting, it is important that the number of samples be sufficient. In order to achieve 1% loss resolution, you need more than 100 samples in the set. The calculation for loss resolution is:
100 * send_interval / (time_period - loss_interval)
100 / ((time_period - loss_interval) / send_interval)
The default settings for dpinger report loss with an accuracy of 0.87%.
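
To make the loss resolution formula in this hunk concrete, here is a minimal Python sketch. It is not part of the commit: the function name and the sample settings are assumptions, chosen so that the set holds exactly 100 usable samples, and they are not dpinger's defaults.

```python
def loss_resolution_percent(send_interval_ms, time_period_ms, loss_interval_ms):
    """Smallest loss step, in percent, that the sample set can express."""
    # Requests younger than the loss interval cannot be counted as lost yet,
    # so only the remainder of the time period contributes usable samples.
    samples = (time_period_ms - loss_interval_ms) / send_interval_ms
    return 100 / samples


# Hypothetical settings: 250 ms send interval, 30 s period, 5 s loss interval.
# (30000 - 5000) / 250 = 100 samples, so one lost reply moves loss by 1%.
print(loss_resolution_percent(250, 30000, 5000))
```

The two forms of the formula shown in the hunk are algebraically equivalent; the newer one simply makes the sample count explicit.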

@@ -1,6 +1,6 @@
//
// Copyright (c) 2015-2017, Denny Page
// Copyright (c) 2015-2023, Denny Page
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
@@ -104,8 +104,9 @@ static unsigned long loss_alarm_threshold_percent = 0;
static char * alert_cmd = NULL;
static size_t alert_cmd_offset;
// Number of periods to wait to declare an alarm as cleared
#define ALARM_DECAY_PERIODS 10
// Interval before an alarm is cleared (hold time)
static unsigned long alarm_hold_msec = 0;
#define DEFAULT_HOLD_PERIODS 10
// Report file
static const char * report_name = NULL;
@@ -190,25 +191,6 @@ static uint16_t sequence_limit;
// Receive thread ready
static unsigned int recv_ready = 0;
//
// Termination handler
//
__attribute__ ((noreturn))
static void
term_handler(void)
{
// NB: This function may be simultaneously invoked by multiple threads
if (usocket_name)
{
(void) unlink(usocket_name);
}
if (pidfile_name)
{
(void) unlink(pidfile_name);
}
exit(0);
}
//
// Log for abnormal events
@@ -234,6 +216,28 @@ logger(
}
//
// Termination handler
//
__attribute__ ((noreturn))
static void
term_handler(
int signum)
{
// NB: This function may be simultaneously invoked by multiple threads
if (usocket_name)
{
(void) unlink(usocket_name);
}
if (pidfile_name)
{
(void) unlink(pidfile_name);
}
logger("exiting on signal %d\n", signum);
exit(0);
}
//
// Compute checksum for ICMP
//
@@ -590,6 +594,7 @@ alert_thread(
unsigned long average_latency_usec;
unsigned long latency_deviation;
unsigned long average_loss_percent;
unsigned int alarm_hold_periods;
unsigned int latency_alarm_decay = 0;
unsigned int loss_alarm_decay = 0;
unsigned int alert = 0;
@@ -600,6 +605,9 @@ alert_thread(
sleeptime.tv_sec = alert_interval_msec / 1000;
sleeptime.tv_nsec = (alert_interval_msec % 1000) * 1000000;
// Set number of alarm hold periods
alarm_hold_periods = (unsigned int) ((alarm_hold_msec + alert_interval_msec - 1) / alert_interval_msec);
while (1)
{
r = nanosleep(&sleeptime, NULL);
@@ -619,7 +627,7 @@ alert_thread(
alert = 1;
}
latency_alarm_decay = ALARM_DECAY_PERIODS;
latency_alarm_decay = alarm_hold_periods;
}
else if (latency_alarm_decay)
{
@@ -640,7 +648,7 @@ alert_thread(
alert = 1;
}
loss_alarm_decay = ALARM_DECAY_PERIODS;
loss_alarm_decay = alarm_hold_periods;
}
else if (loss_alarm_decay)
{
@@ -840,8 +848,9 @@ get_length_arg(
static void
usage(void)
{
fprintf(stderr, "Dpinger version 3.3\n\n");
fprintf(stderr, "Usage:\n");
fprintf(stderr, " %s [-f] [-R] [-S] [-P] [-B bind_addr] [-s send_interval] [-l loss_interval] [-t time_period] [-r report_interval] [-d data_length] [-o output_file] [-A alert_interval] [-D latency_alarm] [-L loss_alarm] [-C alert_cmd] [-i identifier] [-u usocket] [-p pidfile] dest_addr\n\n", progname);
fprintf(stderr, " %s [-f] [-R] [-S] [-P] [-B bind_addr] [-s send_interval] [-l loss_interval] [-t time_period] [-r report_interval] [-d data_length] [-o output_file] [-A alert_interval] [-D latency_alarm] [-L loss_alarm] [-H hold_interval] [-C alert_cmd] [-i identifier] [-u usocket] [-p pidfile] dest_addr\n\n", progname);
fprintf(stderr, " options:\n");
fprintf(stderr, " -f run in foreground\n");
fprintf(stderr, " -R rewind output file between reports\n");
@@ -857,6 +866,7 @@ usage(void)
fprintf(stderr, " -A time interval between alerts (default 1s)\n");
fprintf(stderr, " -D time threshold for latency alarm (default none)\n");
fprintf(stderr, " -L percent threshold for loss alarm (default none)\n");
fprintf(stderr, " -H time interval to hold an alarm before clearing it (default 10x alert interval)\n");
fprintf(stderr, " -C optional command to be invoked via system() for alerts\n");
fprintf(stderr, " -i identifier text to include in output\n");
fprintf(stderr, " -u unix socket name for polling\n");
@@ -868,10 +878,11 @@ usage(void)
fprintf(stderr, " the output format is \"latency_avg latency_stddev loss_pct\"\n");
fprintf(stderr, " latency values are output in microseconds\n");
fprintf(stderr, " loss percentage is reported in whole numbers of 0-100\n");
fprintf(stderr, " resolution of loss calculation is: 100 * send_interval / (time_period - loss_interval)\n\n");
fprintf(stderr, " the alert_cmd is invoked as \"alert_cmd dest_addr alarm_flag latency_avg loss_avg\"\n");
fprintf(stderr, " resolution of loss calculation is: 100 / ((time_period - loss_interval) / send_interval)\n\n");
fprintf(stderr, " the alert_cmd is invoked as \"alert_cmd dest_addr alarm_flag latency_avg latency_stddev loss_pct\"\n");
fprintf(stderr, " alarm_flag is set to 1 if either latency or loss is in alarm state\n");
fprintf(stderr, " alarm_flag will return to 0 when both have have cleared alarm state\n\n");
fprintf(stderr, " alarm_flag will return to 0 when both have cleared alarm state\n");
fprintf(stderr, " alarm hold time begins when the source of the alarm returns to normal\n\n");
}
@@ -912,7 +923,7 @@ parse_args(
progname = argv[0];
while((opt = getopt(argc, argv, "fRSPB:s:l:t:r:d:o:A:D:L:C:i:u:p:")) != -1)
while((opt = getopt(argc, argv, "fRSPB:s:l:t:r:d:o:A:D:L:H:C:i:u:p:")) != -1)
{
switch (opt)
{
@@ -1005,6 +1016,14 @@ parse_args(
}
break;
case 'H':
r = get_time_arg_msec(optarg, &alarm_hold_msec);
if (r)
{
fatal("invalid alarm hold interval %s\n", optarg);
}
break;
case 'C':
alert_cmd_offset = strlen(optarg);
alert_cmd = malloc(alert_cmd_offset + OUTPUT_MAX);
@@ -1399,6 +1418,12 @@ main(
fatal("getnameinfo of destination address failed\n");
}
// Default alarm hold if not explicitly set
if (alarm_hold_msec == 0)
{
alarm_hold_msec = alert_interval_msec * DEFAULT_HOLD_PERIODS;
}
if (bind_addr_len)
{
r = getnameinfo((struct sockaddr *) &bind_addr, bind_addr_len, bind_str, sizeof(bind_str), NULL, 0, NI_NUMERICHOST);
@@ -1408,9 +1433,9 @@ main(
}
}
logger("send_interval %lums loss_interval %lums time_period %lums report_interval %lums data_len %lu alert_interval %lums latency_alarm %lums loss_alarm %lu%% dest_addr %s bind_addr %s identifier \"%s\"\n",
logger("send_interval %lums loss_interval %lums time_period %lums report_interval %lums data_len %lu alert_interval %lums latency_alarm %lums loss_alarm %lu%% alarm_hold %lums dest_addr %s bind_addr %s identifier \"%s\"\n",
send_interval_msec, loss_interval_msec, time_period_msec, report_interval_msec, echo_data_len,
alert_interval_msec, latency_alarm_threshold_msec, loss_alarm_threshold_percent,
alert_interval_msec, latency_alarm_threshold_msec, loss_alarm_threshold_percent, alarm_hold_msec,
dest_str, bind_str, identifier);
// Set my echo id
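
The alarm hold handling added in the C hunks above can be hard to follow across three separate hunks, so the following Python sketch restates it in one place. It is an illustration under assumptions, not the author's code: the function name is hypothetical, and only the constant and the arithmetic are taken from the diff.

```python
DEFAULT_HOLD_PERIODS = 10  # same value as the #define in the diff


def alarm_hold_periods(alarm_hold_msec, alert_interval_msec):
    """Number of alert checks an alarm is held for after its source clears."""
    # As in main(): a hold time of 0 means "not set", so the default applies.
    if alarm_hold_msec == 0:
        alarm_hold_msec = alert_interval_msec * DEFAULT_HOLD_PERIODS
    # As in alert_thread(): integer ceiling division, so a hold time that is
    # not a multiple of the alert interval rounds up to the next full period.
    return (alarm_hold_msec + alert_interval_msec - 1) // alert_interval_msec


print(alarm_hold_periods(0, 1000))     # default: 10 periods
print(alarm_hold_periods(2500, 1000))  # 2.5 s rounds up to 3 periods
```

One consequence visible in the diff is that a hold time of zero cannot be requested explicitly, because 0 is the value used to detect that the option was not given.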

19
influx/README.md Normal file

@@ -0,0 +1,19 @@
Examples for dpinger logging/monitoring with InfluxDB and Grafana
<br>
Files:
dpinger_influx_logger
Python script for logging dpinger data in InfluxDB
dpinger_start.sh
Sample start script for dpinger influx logging
dpinger_grafana_dashboard.json
Example Grafana dashboard for monitoring dpinger data

@@ -0,0 +1,456 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "datasource",
"uid": "grafana"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"target": {
"limit": 100,
"matchAny": false,
"tags": [],
"type": "dashboard"
},
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": 3,
"iteration": 1652309379625,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {
"uid": "$source"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "ms"
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "loss"
},
"properties": [
{
"id": "unit",
"value": "percent"
}
]
},
{
"matcher": {
"id": "byName",
"options": "loss"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#e00000",
"mode": "fixed"
}
},
{
"id": "custom.fillOpacity",
"value": 100
},
{
"id": "custom.lineWidth",
"value": 0
},
{
"id": "unit",
"value": "percent"
},
{
"id": "max",
"value": 100
}
]
}
]
},
"gridPos": {
"h": 19,
"w": 24,
"x": 0,
"y": 0
},
"id": 2,
"options": {
"legend": {
"calcs": [
"mean",
"lastNotNull",
"max",
"min"
],
"displayMode": "table",
"placement": "bottom"
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"pluginVersion": "8.3.5",
"targets": [
{
"alias": "latency",
"groupBy": [
{
"params": [
"$intervals"
],
"type": "time"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "dpinger",
"orderByTime": "ASC",
"policy": "default",
"query": "SELECT mean(\"latency\") FROM \"wan\" WHERE $timeFilter GROUP BY time($__interval) fill(null)",
"queryType": "randomWalk",
"rawQuery": false,
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"latency"
],
"type": "field"
},
{
"params": [],
"type": "mean"
}
]
],
"tags": [
{
"key": "name",
"operator": "=~",
"value": "/^$name$/"
}
]
},
{
"alias": "stddev",
"groupBy": [
{
"params": [
"$intervals"
],
"type": "time"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "dpinger",
"orderByTime": "ASC",
"policy": "default",
"queryType": "randomWalk",
"refId": "B",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"stddev"
],
"type": "field"
},
{
"params": [],
"type": "mean"
}
]
],
"tags": [
{
"key": "name",
"operator": "=~",
"value": "/^$name$/"
}
]
},
{
"alias": "loss",
"groupBy": [
{
"params": [
"$intervals"
],
"type": "time"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "dpinger",
"orderByTime": "ASC",
"policy": "default",
"queryType": "randomWalk",
"refId": "C",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"loss"
],
"type": "field"
},
{
"params": [],
"type": "mean"
}
]
],
"tags": [
{
"key": "name",
"operator": "=~",
"value": "/^$name$/"
}
]
}
],
"title": "$name - ${intervals} intervals",
"transformations": [],
"type": "timeseries"
}
],
"refresh": "1m",
"schemaVersion": 36,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "dpinger",
"value": "dpinger"
},
"hide": 0,
"includeAll": false,
"label": "Source",
"multi": false,
"name": "source",
"options": [],
"query": "influxdb",
"queryValue": "",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"type": "datasource"
},
{
"current": {
"selected": false,
"text": "wan",
"value": "wan"
},
"datasource": {
"type": "influxdb",
"uid": "$source"
},
"definition": "SHOW TAG VALUES WITH KEY = \"name\"",
"hide": 0,
"includeAll": false,
"label": "Name",
"multi": false,
"name": "name",
"options": [],
"query": "SHOW TAG VALUES WITH KEY = \"name\"",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"auto": true,
"auto_count": 500,
"auto_min": "10s",
"current": {
"selected": false,
"text": "auto",
"value": "$__auto_interval_intervals"
},
"hide": 0,
"label": "Intervals",
"name": "intervals",
"options": [
{
"selected": true,
"text": "auto",
"value": "$__auto_interval_intervals"
},
{
"selected": false,
"text": "10s",
"value": "10s"
},
{
"selected": false,
"text": "30s",
"value": "30s"
},
{
"selected": false,
"text": "1m",
"value": "1m"
},
{
"selected": false,
"text": "2m",
"value": "2m"
},
{
"selected": false,
"text": "5m",
"value": "5m"
},
{
"selected": false,
"text": "10m",
"value": "10m"
},
{
"selected": false,
"text": "15m",
"value": "15m"
},
{
"selected": false,
"text": "30m",
"value": "30m"
},
{
"selected": false,
"text": "1h",
"value": "1h"
},
{
"selected": false,
"text": "6h",
"value": "6h"
},
{
"selected": false,
"text": "12h",
"value": "12h"
},
{
"selected": false,
"text": "1d",
"value": "1d"
},
{
"selected": false,
"text": "7d",
"value": "7d"
}
],
"query": "10s,30s,1m,2m,5m,10m,15m,30m,1h,6h,12h,1d,7d",
"queryValue": "",
"refresh": 2,
"skipUrlSync": false,
"type": "interval"
}
]
},
"time": {
"from": "now-24h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"1m",
"5m"
]
},
"timezone": "",
"title": "WAN Latency",
"uid": "ThwrgHYMk",
"version": 46,
"weekStart": ""
}

70
influx/dpinger_influx_logger Executable file

@@ -0,0 +1,70 @@
#!/usr/bin/python

dpinger_path = "/usr/local/bin/dpinger"

import os
import sys
import signal
from subprocess import Popen, PIPE
from requests import post

# Handle SIGINT
def signal_handler(signum, frame):
    try:
        dpinger.kill()
    except Exception:
        # dpinger may not have been started yet
        pass
    sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)

# Handle command line args (five positional arguments are required)
progname = sys.argv.pop(0)
if (len(sys.argv) < 5):
    print('Usage: {0} influx_url influx_db host name target [additional dpinger options]'.format(progname))
    print(' influx_url URL of the Influx server')
    print(' influx_db name of the Influx database')
    print(' host value of "host" tag (example: output of hostname command)')
    print(' name value of "name" tag (example: a circuit name such as "wan")')
    print(' target IP address to monitor (also the value of the "target" tag)')
    sys.exit(1)

influx_url = sys.argv.pop(0)
influx_db = sys.argv.pop(0)
host = sys.argv.pop(0)
name = sys.argv.pop(0)
target = sys.argv.pop(0)
influx_user = os.getenv('INFLUX_USER')
influx_pass = os.getenv('INFLUX_PASS')

# Set up dpinger command
cmd = [dpinger_path, "-f"]
cmd.extend(sys.argv)
cmd.extend(["-r", "10s", target])

# Set up formats
url = '{0}/write?db={1}'.format(influx_url, influx_db)
datafmt = "dpinger,host={0},name={1},target={2} latency={{0:.3f}},stddev={{1:.3f}},loss={{2}}i".format(host, name, target)

# Start up dpinger
try:
    dpinger = Popen(cmd, stdout=PIPE, text=True, bufsize=0)
except Exception:
    print("failed to start dpinger")
    sys.exit(1)

# Start the show
while True:
    line = dpinger.stdout.readline()
    if (len(line) == 0):
        print("dpinger exited")
        sys.exit(1)

    # dpinger reports latency values in microseconds; convert to milliseconds
    [latency, stddev, loss] = line.split()
    data = datafmt.format(float(latency) / 1000, float(stddev) / 1000, loss)
    #print(data)

    try:
        post(url = url, auth = (influx_user, influx_pass), data = data)
    except Exception:
        # Catching Exception (not a bare except) lets sys.exit() from the SIGINT handler through
        print("post failed")
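
The two-stage formatting used for `datafmt` in the script above is easy to misread, so here is a small standalone sketch of it that runs without dpinger or an Influx server. The helper name `make_formatter` and the sample tag values and report line are hypothetical; the template string is the one from the script.

```python
def make_formatter(host, name, target):
    """Return a function that turns one dpinger report line into line protocol."""
    # Stage 1: fill in the tags. The doubled braces survive this pass as
    # single-brace placeholders for the per-report values.
    template = (
        "dpinger,host={0},name={1},target={2} "
        "latency={{0:.3f}},stddev={{1:.3f}},loss={{2}}i"
    ).format(host, name, target)

    def format_report(line):
        # dpinger prints "latency_avg latency_stddev loss_pct", latency in microseconds
        latency, stddev, loss = line.split()
        # Stage 2: fill in the measurements, converted to milliseconds.
        return template.format(float(latency) / 1000, float(stddev) / 1000, loss)

    return format_report


fmt = make_formatter("gw1", "wan", "8.8.8.8")
print(fmt("12345 678 0"))
# prints: dpinger,host=gw1,name=wan,target=8.8.8.8 latency=12.345,stddev=0.678,loss=0i
```

The trailing `i` marks loss as an integer field in the Influx line protocol, which matches dpinger reporting loss in whole percentages.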

7
influx/dpinger_start.sh Executable file

@@ -0,0 +1,7 @@
#!/bin/sh
INFLUX_URL="http://myinfluxhost:8086"
export INFLUX_USER="dpinger"
export INFLUX_PASS="myinfluxpass"
exec /usr/local/dpinger_influx_logger $INFLUX_URL dpinger `hostname` wan 8.8.8.8