mirror of
https://github.com/dennypage/dpinger.git
synced 2024-05-19 06:50:01 +00:00
Compare commits
32 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 664f5c7aa6 | | |
| | 0e963753e1 | | |
| | e3cb41889e | | |
| | 9c31ea4380 | | |
| | fff9b65eb5 | | |
| | 47f1a778b9 | | |
| | ce7d88bddf | | |
| | 67b8ba1f6d | | |
| | c845c582b4 | | |
| | fbc7e8f87f | | |
| | efc17c7204 | | |
| | bc00923f62 | | |
| | bf18a6e2a8 | | |
| | cee7ac9da0 | | |
| | 2b032751e5 | | |
| | 84ee15b155 | | |
| | e10c51ad95 | | |
| | 579ae3d66b | | |
| | 64e644e7be | | |
| | a18d82ab6e | | |
| | 34b0bb924e | | |
| | 4173834bbe | | |
| | 2a8eaa0c8f | | |
| | edb883498d | | |
| | c276feb339 | | |
| | ef21655e77 | | |
| | 2d2d21892a | | |
| | a24c0cd0d0 | | |
| | fdbd4a1d96 | | |
| | 6796fa0752 | | |
| | 24022ac098 | | |
| | 42c84e965e | | |
13
LICENSE
@@ -1,15 +1,15 @@
|
||||
Copyright (c) 2015-2016, Denny Page
|
||||
Copyright (c) 2015-2022, Denny Page
|
||||
All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are met:
|
||||
|
||||
* Redistributions of source code must retain the above copyright notice, this
|
||||
list of conditions and the following disclaimer.
|
||||
1. Redistributions of source code must retain the above copyright notice, this
|
||||
list of conditions and the following disclaimer.
|
||||
|
||||
* Redistributions in binary form must reproduce the above copyright notice,
|
||||
this list of conditions and the following disclaimer in the documentation
|
||||
and/or other materials provided with the distribution.
|
||||
2. Redistributions in binary form must reproduce the above copyright notice,
|
||||
this list of conditions and the following disclaimer in the documentation
|
||||
and/or other materials provided with the distribution.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
|
||||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
@@ -21,4 +21,3 @@ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
|
||||
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
||||
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
|
||||
8
Makefile
@@ -1,8 +1,8 @@
|
||||
#CC=gcc
|
||||
#WARNINGS=-Wall -Wextra -Wformat=2 -Wno-unused-result
|
||||
CC=gcc
|
||||
WARNINGS=-Wall -Wextra -Wformat=2 -Wno-unused-result
|
||||
|
||||
CC=clang
|
||||
WARNINGS=-Weverything -Wno-padded -Wno-disabled-macro-expansion
|
||||
#CC=clang
|
||||
#WARNINGS=-Weverything -Wno-unsafe-buffer-usage -Wno-cast-function-type-strict -Wno-padded -Wno-disabled-macro-expansion -Wno-reserved-id-macro
|
||||
|
||||
CFLAGS=${WARNINGS} -pthread -g -O2
|
||||
|
||||
|
||||
13
NOTES.md
Normal file
@@ -0,0 +1,13 @@
<b>Loss accuracy</b>

In general, dpinger works a bit differently than other latency monitors. Rather than a "probe" that fires off and processes a handful of echo request/replies all at once, dpinger maintains a rolling array of echo requests spaced on the send interval. In other words, instead of waking up every second and sending 4 echo requests at once, dpinger sends an echo request every 250 milliseconds. When dpinger receives an echo reply, the time difference between the request packet and reply packet (latency) is recorded. There is nothing that times out an echo request/reply and records it as permanently lost.

When the alert check is made, or a report is generated, dpinger goes through the array and examines each echo request. If a reply has been received, it is used as part of the overall latency calculation. If a reply has not yet been received, the amount of time since the request is compared against the loss interval. If it is greater than the loss interval, the request/reply is counted as lost in the current report. However, the concept of the request/reply being lost is not a permanent decision. In subsequent reports, if the missing reply has been received, its latency will be used instead of being counted as lost.

It's important to keep in mind that latency and loss are reported as averages across the entire request set. The default time period for dpinger is 60 seconds, with an echo request being sent every 500 milliseconds. This means that the latency and loss will be reported as averages across 116-120 samples. The alert check runs every second by default. So each time, the 2 oldest entries in the set have been replaced by the 2 newest ones.

Note that if you want accurate loss reporting, it is important that the number of samples be sufficient. In order to achieve 1% loss resolution, you need more than 100 samples in the set. The calculation for loss resolution is:

100 / ((time_period - loss_interval) / send_interval)

With the default settings (60 second time period, 500 millisecond send interval, 2000 millisecond loss interval), this gives 100 / 116, so dpinger reports loss with an accuracy of about 0.86%.
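The loss resolution formula from the notes can be tried out with a small helper. This is only a sketch: the function name `loss_resolution_percent` and its parameters are made up for illustration and are not part of dpinger.

```c
// Hypothetical helper, not part of dpinger: evaluates
//   100 / ((time_period - loss_interval) / send_interval)
// with all three intervals given in milliseconds.
// Returns 0.0 when the inputs leave no resolved samples to average over.
static double
loss_resolution_percent(
    unsigned long time_period_msec,
    unsigned long loss_interval_msec,
    unsigned long send_interval_msec)
{
    double samples;

    if (send_interval_msec == 0 || time_period_msec <= loss_interval_msec)
    {
        return 0.0;
    }

    // Number of requests old enough to have been resolved one way or the other
    samples = (double) (time_period_msec - loss_interval_msec) / (double) send_interval_msec;

    return 100.0 / samples;
}
```

For example, `loss_resolution_percent(60000, 2000, 500)` evaluates 100 / 116, and a 52 second period with the same intervals gives exactly 100 resolved samples, i.e. 1% resolution.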
334
dpinger.c
@@ -1,6 +1,6 @@
|
||||
|
||||
//
|
||||
// Copyright (c) 2015-2016, Denny Page
|
||||
// Copyright (c) 2015-2023, Denny Page
|
||||
// All rights reserved.
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
@@ -27,6 +27,11 @@
|
||||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
|
||||
|
||||
// Silly that this is required for accept4 on Linux
|
||||
#define _GNU_SOURCE
|
||||
|
||||
|
||||
#include <stdio.h>
|
||||
#include <errno.h>
|
||||
#include <string.h>
|
||||
@@ -39,10 +44,11 @@
|
||||
#include <signal.h>
|
||||
|
||||
#include <netdb.h>
|
||||
#include <net/if.h>
|
||||
#include <sys/socket.h>
|
||||
#include <net/if.h>
|
||||
#include <sys/un.h>
|
||||
#include <sys/stat.h>
|
||||
#include <sys/file.h>
|
||||
#include <netinet/in.h>
|
||||
#include <netinet/ip.h>
|
||||
#include <netinet/ip_icmp.h>
|
||||
@@ -52,17 +58,6 @@
|
||||
#include <pthread.h>
|
||||
#include <syslog.h>
|
||||
|
||||
// TODO:
|
||||
//
|
||||
// After December 31st, 2016, review use of fcntl() for setting non blocking
|
||||
// and close on exec. It would be preferable to use accept4(), SOCK_CLOEXEC
|
||||
// and SOCK_NONBLOCK. These are currently avoided to allow use on older
|
||||
// systems such as FreeBSD 9.3, Linux 2.6.26.
|
||||
// For Linux accept4() currently requires defining _GNU_SOURCE which we would
|
||||
// like to avoid.
|
||||
// For FreeBSD, these definitions were introduced with FreeBSD 10.0 and are
|
||||
// not present in 9.3 which is supported through 2016.
|
||||
|
||||
|
||||
// Who we are
|
||||
static const char * progname;
|
||||
@@ -74,16 +69,17 @@ static const char * pidfile_name = NULL;
|
||||
// Flags
|
||||
static unsigned int flag_rewind = 0;
|
||||
static unsigned int flag_syslog = 0;
|
||||
static unsigned int flag_priority = 0;
|
||||
|
||||
// String representation of target
|
||||
#define ADDR_STR_MAX (INET6_ADDRSTRLEN + IF_NAMESIZE + 1)
|
||||
static char dest_str[ADDR_STR_MAX];
|
||||
|
||||
// Time period over which we are averaging results in ms
|
||||
static unsigned long time_period_msec = 30000;
|
||||
static unsigned long time_period_msec = 60000;
|
||||
|
||||
// Interval between sends in ms
|
||||
static unsigned long send_interval_msec = 250;
|
||||
static unsigned long send_interval_msec = 500;
|
||||
|
||||
// Interval before a sequence is initially treated as lost
|
||||
// Input from command line in ms and used in us
|
||||
@@ -108,8 +104,9 @@ static unsigned long loss_alarm_threshold_percent = 0;
|
||||
static char * alert_cmd = NULL;
|
||||
static size_t alert_cmd_offset;
|
||||
|
||||
// Number of periods to wait to declare an alarm as cleared
|
||||
#define ALARM_DECAY_PERIODS 10
|
||||
// Interval before an alarm is cleared (hold time)
|
||||
static unsigned long alarm_hold_msec = 0;
|
||||
#define DEFAULT_HOLD_PERIODS 10
|
||||
|
||||
// Report file
|
||||
static const char * report_name = NULL;
|
||||
@@ -191,25 +188,8 @@ static uint16_t echo_id;
|
||||
static uint16_t next_sequence = 0;
|
||||
static uint16_t sequence_limit;
|
||||
|
||||
|
||||
//
|
||||
// Termination handler
|
||||
//
|
||||
__attribute__ ((noreturn))
|
||||
static void
|
||||
term_handler(void)
|
||||
{
|
||||
// NB: This function may be simultaneously invoked by multiple threads
|
||||
if (usocket_name)
|
||||
{
|
||||
(void) unlink(usocket_name);
|
||||
}
|
||||
if (pidfile_name)
|
||||
{
|
||||
(void) unlink(pidfile_name);
|
||||
}
|
||||
exit(0);
|
||||
}
|
||||
// Receive thread ready
|
||||
static unsigned int recv_ready = 0;
|
||||
|
||||
|
||||
//
|
||||
@@ -236,6 +216,28 @@ logger(
|
||||
}
|
||||
|
||||
|
||||
//
|
||||
// Termination handler
|
||||
//
|
||||
__attribute__ ((noreturn))
|
||||
static void
|
||||
term_handler(
|
||||
int signum)
|
||||
{
|
||||
// NB: This function may be simultaneously invoked by multiple threads
|
||||
if (usocket_name)
|
||||
{
|
||||
(void) unlink(usocket_name);
|
||||
}
|
||||
if (pidfile_name)
|
||||
{
|
||||
(void) unlink(pidfile_name);
|
||||
}
|
||||
logger("exiting on signal %d\n", signum);
|
||||
exit(0);
|
||||
}
|
||||
|
||||
|
||||
//
|
||||
// Compute checksum for ICMP
|
||||
//
|
||||
@@ -286,7 +288,7 @@ llsqrt(
|
||||
}
|
||||
}
|
||||
|
||||
return s;
|
||||
return (unsigned long) s;
|
||||
}
|
||||
|
||||
|
||||
@@ -333,18 +335,23 @@ send_thread(
|
||||
echo_request->code = 0;
|
||||
echo_request->id = echo_id;
|
||||
|
||||
// Give the recv thread a moment to initialize
|
||||
sleeptime.tv_sec = 0;
|
||||
sleeptime.tv_nsec = 10000; // 10us
|
||||
do {
|
||||
r = nanosleep(&sleeptime, NULL);
|
||||
if (r == -1)
|
||||
{
|
||||
logger("nanosleep error in send thread waiting for recv thread: %d\n", errno);
|
||||
}
|
||||
} while (recv_ready == 0);
|
||||
|
||||
// Set up the timespec for nanosleep
|
||||
sleeptime.tv_sec = send_interval_msec / 1000;
|
||||
sleeptime.tv_nsec = (send_interval_msec % 1000) * 1000000;
|
||||
|
||||
while (1)
|
||||
{
|
||||
r = nanosleep(&sleeptime, NULL);
|
||||
if (r == -1)
|
||||
{
|
||||
logger("nanosleep error in send thread: %d\n", errno);
|
||||
}
|
||||
|
||||
// Set sequence number and checksum
|
||||
echo_request->sequence = htons(next_sequence);
|
||||
echo_request->cksum = 0;
|
||||
@@ -352,17 +359,23 @@ send_thread(
|
||||
|
||||
array[next_slot].status = PACKET_STATUS_EMPTY;
|
||||
sched_yield();
|
||||
clock_gettime(CLOCK_MONOTONIC, &array[next_slot].time_sent);
|
||||
|
||||
clock_gettime(CLOCK_MONOTONIC, &array[next_slot].time_sent);
|
||||
array[next_slot].status = PACKET_STATUS_SENT;
|
||||
len = sendto(send_sock, echo_request, echo_request_len, 0, (struct sockaddr *) &dest_addr, dest_addr_len);
|
||||
if (len == -1)
|
||||
{
|
||||
logger("%s%s: sendto error: %d\n", identifier, dest_str, errno);
|
||||
}
|
||||
array[next_slot].status = PACKET_STATUS_SENT;
|
||||
|
||||
next_slot = (next_slot + 1) % array_size;
|
||||
next_sequence = (next_sequence + 1) % sequence_limit;
|
||||
|
||||
r = nanosleep(&sleeptime, NULL);
|
||||
if (r == -1)
|
||||
{
|
||||
logger("nanosleep error in send thread: %d\n", errno);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -383,6 +396,9 @@ recv_thread(
|
||||
struct timespec now;
|
||||
unsigned int array_slot;
|
||||
|
||||
// Thread startup complete
|
||||
recv_ready = 1;
|
||||
|
||||
while (1)
|
||||
{
|
||||
src_addr_len = sizeof(src_addr);
|
||||
@@ -471,7 +487,7 @@ report(
|
||||
packets_received++;
|
||||
latency_usec = array[slot].latency_usec;
|
||||
total_latency_usec += latency_usec;
|
||||
total_latency_usec2 += latency_usec * latency_usec;
|
||||
total_latency_usec2 += (unsigned long long) latency_usec * latency_usec;
|
||||
}
|
||||
else if (array[slot].status == PACKET_STATUS_SENT &&
|
||||
ts_elapsed_usec(&array[slot].time_sent, &now) > loss_interval_usec)
|
||||
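The way the report() hunk above classifies each slot can be illustrated in isolation. The sketch below is an assumption-laden simplification: the `slot`, `summary` and `scan_slots` names, the status enum, and the use of plain microsecond counters instead of `struct timespec` are all hypothetical and chosen only to keep the example short.

```c
#include <stddef.h>

// Hypothetical slot states and layout for this sketch only
enum slot_status { SLOT_EMPTY, SLOT_SENT, SLOT_RECEIVED };

struct slot
{
    enum slot_status status;
    unsigned long sent_usec;      // time the request was sent
    unsigned long latency_usec;   // valid only when status == SLOT_RECEIVED
};

struct summary
{
    unsigned long received;
    unsigned long lost;
    unsigned long long total_latency_usec;
};

// Scan the whole array as of now_usec. A request counts as lost only for
// this scan: if its reply arrives later, the next scan counts it as received.
static struct summary
scan_slots(
    const struct slot * slots,
    size_t count,
    unsigned long now_usec,
    unsigned long loss_interval_usec)
{
    struct summary s = { 0, 0, 0 };
    size_t i;

    for (i = 0; i < count; i++)
    {
        if (slots[i].status == SLOT_RECEIVED)
        {
            s.received++;
            s.total_latency_usec += slots[i].latency_usec;
        }
        else if (slots[i].status == SLOT_SENT &&
                 now_usec - slots[i].sent_usec > loss_interval_usec)
        {
            s.lost++;
        }
        // Requests still inside the loss interval are neither received nor lost
    }

    return s;
}
```

Because nothing is written back to the array during the scan, a late reply simply flips the slot to received and the following scan stops counting it as lost.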
@@ -489,7 +505,7 @@ report(
|
||||
|
||||
// stddev = sqrt((sum(rtt^2) / packets) - (sum(rtt) / packets)^2)
|
||||
*average_latency_usec = avg;
|
||||
*latency_deviation = llsqrt(avg2 - (avg * avg));
|
||||
*latency_deviation = llsqrt(avg2 - ((unsigned long long) avg * avg));
|
||||
}
|
||||
else
|
||||
{
|
||||
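The two casts to `unsigned long long` in the report() hunks matter on platforms where `unsigned long` is 32 bits: squaring a latency above 65535 microseconds would overflow before the result is widened. A minimal sketch of the same sum-of-squares calculation follows. The names `isqrt_ull` and `latency_stats` are hypothetical, and `isqrt_ull` is only a stand-in for the `llsqrt()` helper, whose body is not shown in this diff.

```c
#include <stddef.h>

// Integer square root (floor) by Newton iteration; stand-in for llsqrt()
static unsigned long long
isqrt_ull(
    unsigned long long n)
{
    unsigned long long x = n;
    unsigned long long y;

    if (n < 2)
    {
        return n;
    }

    // Start from ceil(n / 2) and iterate until the estimate stops decreasing
    y = x / 2 + x % 2;
    while (y < x)
    {
        x = y;
        y = (x + n / x) / 2;
    }

    return x;
}

// stddev = sqrt((sum(rtt^2) / packets) - (sum(rtt) / packets)^2)
static void
latency_stats(
    const unsigned long * latency_usec,
    size_t count,
    unsigned long * average_usec,
    unsigned long * stddev_usec)
{
    unsigned long long total = 0;
    unsigned long long total2 = 0;
    unsigned long long avg;
    unsigned long long avg2;
    size_t i;

    *average_usec = 0;
    *stddev_usec = 0;
    if (count == 0)
    {
        return;
    }

    for (i = 0; i < count; i++)
    {
        total += latency_usec[i];
        // Widen before multiplying so the square cannot wrap at 32 bits
        total2 += (unsigned long long) latency_usec[i] * latency_usec[i];
    }

    avg = total / count;
    avg2 = total2 / count;

    *average_usec = (unsigned long) avg;
    *stddev_usec = (unsigned long) isqrt_ull(avg2 - avg * avg);
}
```

All arithmetic is integer, so both results are truncated toward zero, as in the hunk above.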
@@ -578,6 +594,7 @@ alert_thread(
|
||||
unsigned long average_latency_usec;
|
||||
unsigned long latency_deviation;
|
||||
unsigned long average_loss_percent;
|
||||
unsigned int alarm_hold_periods;
|
||||
unsigned int latency_alarm_decay = 0;
|
||||
unsigned int loss_alarm_decay = 0;
|
||||
unsigned int alert = 0;
|
||||
@@ -588,6 +605,9 @@ alert_thread(
|
||||
sleeptime.tv_sec = alert_interval_msec / 1000;
|
||||
sleeptime.tv_nsec = (alert_interval_msec % 1000) * 1000000;
|
||||
|
||||
// Set number of alarm hold periods
|
||||
alarm_hold_periods = (unsigned int) ((alarm_hold_msec + alert_interval_msec - 1) / alert_interval_msec);
|
||||
|
||||
while (1)
|
||||
{
|
||||
r = nanosleep(&sleeptime, NULL);
|
||||
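The `alarm_hold_periods` expression in the hunk above is a ceiling division: a hold time that is not an exact multiple of the alert interval is rounded up to the next whole period. A tiny sketch with a hypothetical function name shows the rounding behaviour; it assumes a non-zero interval.

```c
// Hypothetical helper: number of alert periods needed to cover hold_msec,
// rounded up. interval_msec must be non-zero.
static unsigned int
hold_periods(
    unsigned long hold_msec,
    unsigned long interval_msec)
{
    return (unsigned int) ((hold_msec + interval_msec - 1) / interval_msec);
}
```

With a 1000 ms alert interval, a 2500 ms hold time therefore holds the alarm for 3 periods, not 2.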
@@ -607,7 +627,7 @@ alert_thread(
|
||||
alert = 1;
|
||||
}
|
||||
|
||||
latency_alarm_decay = ALARM_DECAY_PERIODS;
|
||||
latency_alarm_decay = alarm_hold_periods;
|
||||
}
|
||||
else if (latency_alarm_decay)
|
||||
{
|
||||
@@ -628,7 +648,7 @@ alert_thread(
|
||||
alert = 1;
|
||||
}
|
||||
|
||||
loss_alarm_decay = ALARM_DECAY_PERIODS;
|
||||
loss_alarm_decay = alarm_hold_periods;
|
||||
}
|
||||
else if (loss_alarm_decay)
|
||||
{
|
||||
@@ -687,9 +707,14 @@ usocket_thread(
|
||||
|
||||
while (1)
|
||||
{
|
||||
#if defined(DISABLE_ACCEPT4)
|
||||
// Legacy
|
||||
sock_fd = accept(usocket_fd, NULL, NULL);
|
||||
(void) fcntl(sock_fd, F_SETFD, FD_CLOEXEC);
|
||||
(void) fcntl(sock_fd, F_SETFL, fcntl(sock_fd, F_GETFL, 0) | O_NONBLOCK);
|
||||
#else
|
||||
sock_fd = accept4(usocket_fd, NULL, NULL, SOCK_NONBLOCK | SOCK_CLOEXEC);
|
||||
#endif
|
||||
|
||||
report(&average_latency_usec, &latency_deviation, &average_loss_percent);
|
||||
|
||||
@@ -727,10 +752,10 @@ get_time_arg_msec(
|
||||
const char * arg,
|
||||
unsigned long * value)
|
||||
{
|
||||
unsigned long t;
|
||||
long t;
|
||||
char * suffix;
|
||||
|
||||
t = strtoul(arg, &suffix, 10);
|
||||
t = strtol(arg, &suffix, 10);
|
||||
if (*suffix == 'm')
|
||||
{
|
||||
// Milliseconds
|
||||
@@ -743,13 +768,13 @@ get_time_arg_msec(
|
||||
suffix++;
|
||||
}
|
||||
|
||||
// Garbage in the number?
|
||||
if (*suffix != 0)
|
||||
// Invalid specification?
|
||||
if (t < 0 || *suffix != 0)
|
||||
{
|
||||
return 1;
|
||||
}
|
||||
|
||||
*value = t;
|
||||
*value = (unsigned long) t;
|
||||
return 0;
|
||||
}
|
||||
|
||||
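The switch from `strtoul` to `strtol` in this hunk makes negative input detectable: `strtoul` accepts a leading minus sign and returns the negated value converted to `unsigned long`, so "-5" would silently become a very large number. The sketch below shows the check in a reduced form. The name `parse_nonnegative` is hypothetical, unit suffixes are not handled, and the empty-input test (`suffix == arg`) is an addition of this sketch that is not in the diff.

```c
#include <stdlib.h>

// Simplified, hypothetical parser: decimal digits only, no unit suffix.
// Returns 0 on success and 1 on an invalid specification.
static int
parse_nonnegative(
    const char * arg,
    unsigned long * value)
{
    long t;
    char * suffix;

    t = strtol(arg, &suffix, 10);

    // Reject empty input, negative numbers, and trailing garbage
    if (suffix == arg || t < 0 || *suffix != 0)
    {
        return 1;
    }

    *value = (unsigned long) t;
    return 0;
}
```

The cast back to `unsigned long` is safe at that point because the value has already been checked to be non-negative.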
@@ -762,22 +787,22 @@ get_percent_arg(
|
||||
const char * arg,
|
||||
unsigned long * value)
|
||||
{
|
||||
unsigned long t;
|
||||
long t;
|
||||
char * suffix;
|
||||
|
||||
t = strtoul(arg, &suffix, 10);
|
||||
t = strtol(arg, &suffix, 10);
|
||||
if (*suffix == '%')
|
||||
{
|
||||
suffix++;
|
||||
}
|
||||
|
||||
// Garbage in the number?
|
||||
if (*suffix != 0 || t > 100)
|
||||
// Invalid specification?
|
||||
if (t < 0 || t > 100 || *suffix != 0)
|
||||
{
|
||||
return 1;
|
||||
}
|
||||
|
||||
*value = t;
|
||||
*value = (unsigned long) t;
|
||||
return 0;
|
||||
}
|
||||
|
||||
@@ -790,10 +815,10 @@ get_length_arg(
|
||||
const char * arg,
|
||||
unsigned long * value)
|
||||
{
|
||||
unsigned long t;
|
||||
long t;
|
||||
char * suffix;
|
||||
|
||||
t = strtoul(arg, &suffix, 10);
|
||||
t = strtol(arg, &suffix, 10);
|
||||
if (*suffix == 'b')
|
||||
{
|
||||
// Bytes
|
||||
@@ -806,13 +831,13 @@ get_length_arg(
|
||||
suffix++;
|
||||
}
|
||||
|
||||
// Garbage in the number?
|
||||
if (*suffix != 0)
|
||||
// Invalid specification?
|
||||
if (t < 0 || *suffix != 0)
|
||||
{
|
||||
return 1;
|
||||
}
|
||||
|
||||
*value = t;
|
||||
*value = (unsigned long) t;
|
||||
return 0;
|
||||
}
|
||||
|
||||
@@ -823,22 +848,25 @@ get_length_arg(
|
||||
static void
|
||||
usage(void)
|
||||
{
|
||||
fprintf(stderr, "Dpinger version 3.3\n\n");
|
||||
fprintf(stderr, "Usage:\n");
|
||||
fprintf(stderr, " %s [-f] [-R] [-S] [-B bind_addr] [-s send_interval] [-l loss_interval] [-t time_period] [-r report_interval] [-d data_length] [-o output_file] [-A alert_interval] [-D latency_alarm] [-L loss_alarm] [-C alert_cmd] [-i identifier] [-u usocket] [-p pidfile] dest_addr\n\n", progname);
|
||||
fprintf(stderr, " %s [-f] [-R] [-S] [-P] [-B bind_addr] [-s send_interval] [-l loss_interval] [-t time_period] [-r report_interval] [-d data_length] [-o output_file] [-A alert_interval] [-D latency_alarm] [-L loss_alarm] [-H hold_interval] [-C alert_cmd] [-i identifier] [-u usocket] [-p pidfile] dest_addr\n\n", progname);
|
||||
fprintf(stderr, " options:\n");
|
||||
fprintf(stderr, " -f run in foreground\n");
|
||||
fprintf(stderr, " -R rewind output file between reports\n");
|
||||
fprintf(stderr, " -S log warnings via syslog\n");
|
||||
fprintf(stderr, " -P priority scheduling for receive thread (requires root)\n");
|
||||
fprintf(stderr, " -B bind (source) address\n");
|
||||
fprintf(stderr, " -s time interval between echo requests (default 250ms)\n");
|
||||
fprintf(stderr, " -l time interval before packets are treated as lost (default 5x send interval)\n");
|
||||
fprintf(stderr, " -t time period over which results are averaged (default 30s)\n");
|
||||
fprintf(stderr, " -s time interval between echo requests (default 500ms)\n");
|
||||
fprintf(stderr, " -l time interval before packets are treated as lost (default 4x send interval)\n");
|
||||
fprintf(stderr, " -t time period over which results are averaged (default 60s)\n");
|
||||
fprintf(stderr, " -r time interval between reports (default 1s)\n");
|
||||
fprintf(stderr, " -d data length (default 0)\n");
|
||||
fprintf(stderr, " -o output file for reports (default stdout)\n");
|
||||
fprintf(stderr, " -A time interval between alerts (default 1s)\n");
|
||||
fprintf(stderr, " -D time threshold for latency alarm (default none)\n");
|
||||
fprintf(stderr, " -L percent threshold for loss alarm (default none)\n");
|
||||
fprintf(stderr, " -H time interval to hold an alarm before clearing it (default 10x alert interval)\n");
|
||||
fprintf(stderr, " -C optional command to be invoked via system() for alerts\n");
|
||||
fprintf(stderr, " -i identifier text to include in output\n");
|
||||
fprintf(stderr, " -u unix socket name for polling\n");
|
||||
@@ -850,10 +878,11 @@ usage(void)
|
||||
fprintf(stderr, " the output format is \"latency_avg latency_stddev loss_pct\"\n");
|
||||
fprintf(stderr, " latency values are output in microseconds\n");
|
||||
fprintf(stderr, " loss percentage is reported in whole numbers of 0-100\n");
|
||||
fprintf(stderr, " resolution of loss calculation is: 100 * send_interval / (time_period - loss_interval)\n\n");
|
||||
fprintf(stderr, " the alert_cmd is invoked as \"alert_cmd dest_addr alarm_flag latency_avg loss_avg\"\n");
|
||||
fprintf(stderr, " resolution of loss calculation is: 100 / ((time_period - loss_interval) / send_interval)\n\n");
|
||||
fprintf(stderr, " the alert_cmd is invoked as \"alert_cmd dest_addr alarm_flag latency_avg latency_stddev loss_pct\"\n");
|
||||
fprintf(stderr, " alarm_flag is set to 1 if either latency or loss is in alarm state\n");
|
||||
fprintf(stderr, " alarm_flag will return to 0 when both have cleared alarm state\n\n");
|
||||
fprintf(stderr, " alarm_flag will return to 0 when both have cleared alarm state\n");
|
||||
fprintf(stderr, " alarm hold time begins when the source of the alarm returns to normal\n\n");
|
||||
}
|
||||
|
||||
|
||||
@@ -866,14 +895,11 @@ fatal(
|
||||
const char * format,
|
||||
...)
|
||||
{
|
||||
if (format)
|
||||
{
|
||||
va_list args;
|
||||
va_list args;
|
||||
|
||||
va_start(args, format);
|
||||
vfprintf(stderr, format, args);
|
||||
va_end(args);
|
||||
}
|
||||
va_start(args, format);
|
||||
vfprintf(stderr, format, args);
|
||||
va_end(args);
|
||||
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
@@ -897,7 +923,7 @@ parse_args(
|
||||
|
||||
progname = argv[0];
|
||||
|
||||
while((opt = getopt(argc, argv, "fRSB:s:l:t:r:d:o:A:D:L:C:i:u:p:")) != -1)
|
||||
while((opt = getopt(argc, argv, "fRSPB:s:l:t:r:d:o:A:D:L:H:C:i:u:p:")) != -1)
|
||||
{
|
||||
switch (opt)
|
||||
{
|
||||
@@ -913,6 +939,10 @@ parse_args(
|
||||
flag_syslog = 1;
|
||||
break;
|
||||
|
||||
case 'P':
|
||||
flag_priority = 1;
|
||||
break;
|
||||
|
||||
case 'B':
|
||||
bind_arg = optarg;
|
||||
break;
|
||||
@@ -971,7 +1001,7 @@ parse_args(
|
||||
|
||||
case 'D':
|
||||
r = get_time_arg_msec(optarg, &latency_alarm_threshold_msec);
|
||||
if (r || latency_alarm_threshold_msec == 0)
|
||||
if (r)
|
||||
{
|
||||
fatal("invalid latency alarm threshold %s\n", optarg);
|
||||
}
|
||||
@@ -980,12 +1010,20 @@ parse_args(
|
||||
|
||||
case 'L':
|
||||
r = get_percent_arg(optarg, &loss_alarm_threshold_percent);
|
||||
if (r || loss_alarm_threshold_percent == 0)
|
||||
if (r)
|
||||
{
|
||||
fatal("invalid loss alarm threshold %s\n", optarg);
|
||||
}
|
||||
break;
|
||||
|
||||
case 'H':
|
||||
r = get_time_arg_msec(optarg, &alarm_hold_msec);
|
||||
if (r)
|
||||
{
|
||||
fatal("invalid alarm hold interval %s\n", optarg);
|
||||
}
|
||||
break;
|
||||
|
||||
case 'C':
|
||||
alert_cmd_offset = strlen(optarg);
|
||||
alert_cmd = malloc(alert_cmd_offset + OUTPUT_MAX);
|
||||
@@ -1018,7 +1056,7 @@ parse_args(
|
||||
|
||||
default:
|
||||
usage();
|
||||
fatal(NULL);
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1026,7 +1064,7 @@ parse_args(
|
||||
if (argc != optind + 1)
|
||||
{
|
||||
usage();
|
||||
fatal(NULL);
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
dest_arg = argv[optind];
|
||||
|
||||
@@ -1036,17 +1074,17 @@ parse_args(
|
||||
fatal("no activity enabled\n");
|
||||
}
|
||||
|
||||
// Ensure we have something to average over
|
||||
if (time_period_msec < send_interval_msec)
|
||||
// Ensure there is a minimum of one resolved slot at all times
|
||||
if (time_period_msec <= send_interval_msec * 2 + loss_interval_msec)
|
||||
{
|
||||
fatal("time period cannot be less than send interval\n");
|
||||
fatal("the time period must be greater than twice the send interval plus the loss interval\n");
|
||||
}
|
||||
|
||||
// Ensure we don't have sequence space issues. This really should only be hit by
|
||||
// complete accident. Even a ratio of 16384:1 would be excessive.
|
||||
if (time_period_msec / send_interval_msec > 65536)
|
||||
{
|
||||
fatal("ratio of time period to send interval cannot exceed 65536:1\n");
|
||||
fatal("the ratio of time period to send interval cannot exceed 65536:1\n");
|
||||
}
|
||||
|
||||
// Check destination address
|
||||
@@ -1127,10 +1165,13 @@ main(
|
||||
char *argv[])
|
||||
{
|
||||
char bind_str[ADDR_STR_MAX] = "(none)";
|
||||
char pidbuf[64];
|
||||
int pidfile_fd = -1;
|
||||
pid_t pid;
|
||||
pthread_t thread;
|
||||
struct sigaction act;
|
||||
int buflen = PACKET_BUFLEN;
|
||||
ssize_t len;
|
||||
ssize_t rs;
|
||||
int r;
|
||||
|
||||
@@ -1177,6 +1218,66 @@ main(
|
||||
(void) setgid(getgid());
|
||||
(void) setuid(getuid());
|
||||
|
||||
// Create pid file
|
||||
if (pidfile_name)
|
||||
{
|
||||
pidfile_fd = open(pidfile_name, O_WRONLY | O_CREAT | O_EXCL | O_CLOEXEC, 0644);
|
||||
if (pidfile_fd != -1)
|
||||
{
|
||||
// Lock the pid file
|
||||
r = flock(pidfile_fd, LOCK_EX | LOCK_NB);
|
||||
if (r == -1)
|
||||
{
|
||||
perror("flock");
|
||||
fatal("error locking pid file\n");
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
// Pid file already exists?
|
||||
pidfile_fd = open(pidfile_name, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
|
||||
if (pidfile_fd == -1)
|
||||
{
|
||||
perror("open");
|
||||
fatal("cannot create/open pid file %s\n", pidfile_name);
|
||||
}
|
||||
|
||||
// Lock the pid file
|
||||
r = flock(pidfile_fd, LOCK_EX | LOCK_NB);
|
||||
if (r == -1)
|
||||
{
|
||||
fatal("pid file %s is in use by another process\n", pidfile_name);
|
||||
}
|
||||
|
||||
// Check for existing pid
|
||||
rs = read(pidfile_fd, pidbuf, sizeof(pidbuf) - 1);
|
||||
if (rs > 0)
|
||||
{
|
||||
pidbuf[rs] = 0;
|
||||
|
||||
pid = (pid_t) strtol(pidbuf, NULL, 10);
|
||||
if (pid > 0)
|
||||
{
|
||||
// Is the pid still alive?
|
||||
r = kill(pid, 0);
|
||||
if (r == 0)
|
||||
{
|
||||
fatal("pid file %s is in use by process %u\n", pidfile_name, (unsigned int) pid);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Reset the pid file
|
||||
(void) lseek(pidfile_fd, 0, 0);
|
||||
r = ftruncate(pidfile_fd, 0);
|
||||
if (r == -1)
|
||||
{
|
||||
perror("ftruncate");
|
||||
fatal("cannot write pid file %s\n", pidfile_name);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Create report file
|
||||
if (report_name)
|
||||
{
|
||||
@@ -1236,31 +1337,20 @@ main(
|
||||
}
|
||||
}
|
||||
|
||||
// Create pid file
|
||||
if (pidfile_name)
|
||||
{
|
||||
pidfile_fd = open(pidfile_name, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);
|
||||
if (pidfile_fd == -1)
|
||||
{
|
||||
perror("open");
|
||||
fatal("cannot open/create pid file %s\n", pidfile_name);
|
||||
}
|
||||
}
|
||||
|
||||
// End of general errors from command line options
|
||||
|
||||
// Self background
|
||||
if (foreground == 0)
|
||||
{
|
||||
r = fork();
|
||||
pid = fork();
|
||||
|
||||
if (r == -1)
|
||||
if (pid == -1)
|
||||
{
|
||||
perror("fork");
|
||||
fatal("cannot background\n");
|
||||
}
|
||||
|
||||
if (r)
|
||||
if (pid)
|
||||
{
|
||||
_exit(EXIT_SUCCESS);
|
||||
}
|
||||
@@ -1277,16 +1367,13 @@ main(
|
||||
// Write pid file
|
||||
if (pidfile_fd != -1)
|
||||
{
|
||||
char buf[64];
|
||||
ssize_t len;
|
||||
|
||||
len = snprintf(buf, sizeof(buf), "%u\n", (unsigned) getpid());
|
||||
if (len < 0 || (size_t) len > sizeof(buf))
|
||||
len = snprintf(pidbuf, sizeof(pidbuf), "%u\n", (unsigned) getpid());
|
||||
if (len < 0 || (size_t) len > sizeof(pidbuf))
|
||||
{
|
||||
fatal("error formatting pidfile\n");
|
||||
}
|
||||
|
||||
rs = write(pidfile_fd, buf, (size_t) len);
|
||||
rs = write(pidfile_fd, pidbuf, (size_t) len);
|
||||
if (rs == -1)
|
||||
{
|
||||
perror("write");
|
||||
@@ -1320,7 +1407,7 @@ main(
|
||||
// Set the default loss interval
|
||||
if (loss_interval_msec == 0)
|
||||
{
|
||||
loss_interval_msec = send_interval_msec * 5;
|
||||
loss_interval_msec = send_interval_msec * 4;
|
||||
}
|
||||
loss_interval_usec = loss_interval_msec * 1000;
|
||||
|
||||
@@ -1331,6 +1418,12 @@ main(
|
||||
fatal("getnameinfo of destination address failed\n");
|
||||
}
|
||||
|
||||
// Default alarm hold if not explicitly set
|
||||
if (alarm_hold_msec == 0)
|
||||
{
|
||||
alarm_hold_msec = alert_interval_msec * DEFAULT_HOLD_PERIODS;
|
||||
}
|
||||
|
||||
if (bind_addr_len)
|
||||
{
|
||||
r = getnameinfo((struct sockaddr *) &bind_addr, bind_addr_len, bind_str, sizeof(bind_str), NULL, 0, NI_NUMERICHOST);
|
||||
@@ -1340,9 +1433,9 @@ main(
|
||||
}
|
||||
}
|
||||
|
||||
logger("send_interval %lums loss_interval %lums time_period %lums report_interval %lums data_len %lu alert_interval %lums latency_alarm %lums loss_alarm %lu%% dest_addr %s bind_addr %s identifier \"%s\"\n",
|
||||
logger("send_interval %lums loss_interval %lums time_period %lums report_interval %lums data_len %lu alert_interval %lums latency_alarm %lums loss_alarm %lu%% alarm_hold %lums dest_addr %s bind_addr %s identifier \"%s\"\n",
|
||||
send_interval_msec, loss_interval_msec, time_period_msec, report_interval_msec, echo_data_len,
|
||||
alert_interval_msec, latency_alarm_threshold_msec, loss_alarm_threshold_percent,
|
||||
alert_interval_msec, latency_alarm_threshold_msec, loss_alarm_threshold_percent, alarm_hold_msec,
|
||||
dest_str, bind_str, identifier);
|
||||
|
||||
// Set my echo id
|
||||
@@ -1363,6 +1456,27 @@ main(
|
||||
fatal("cannot create recv thread\n");
|
||||
}
|
||||
|
||||
// Set priority on recv thread if requested
|
||||
if (flag_priority)
|
||||
{
|
||||
struct sched_param thread_sched_param;
|
||||
|
||||
r = sched_get_priority_min(SCHED_RR);
|
||||
if (r == -1)
|
||||
{
|
||||
perror("sched_get_priority_min");
|
||||
fatal("cannot determine minimum scheduling priority for SCHED_RR\n");
|
||||
}
|
||||
thread_sched_param.sched_priority = r;
|
||||
|
||||
r = pthread_setschedparam(thread, SCHED_RR, &thread_sched_param);
|
||||
if (r != 0)
|
||||
{
|
||||
perror("pthread_setschedparam");
|
||||
fatal("cannot set receive thread priority\n");
|
||||
}
|
||||
}
|
||||
|
||||
// Create send thread
|
||||
r = pthread_create(&thread, NULL, &send_thread, NULL);
|
||||
if (r != 0)
|
||||
|
||||
19
influx/README.md
Normal file
@@ -0,0 +1,19 @@
Examples for dpinger logging/monitoring with InfluxDB and Grafana

<br>

Files:

dpinger_influx_logger

Python script for logging dpinger data in InfluxDB


dpinger_start.sh

Sample start script for dpinger influx logging


dpinger_grafana_dashboard.json

Example Grafana dashboard for monitoring dpinger data
456
influx/dpinger_grafana_dashboard.json
Normal file
@@ -0,0 +1,456 @@
|
||||
{
|
||||
"annotations": {
|
||||
"list": [
|
||||
{
|
||||
"builtIn": 1,
|
||||
"datasource": {
|
||||
"type": "datasource",
|
||||
"uid": "grafana"
|
||||
},
|
||||
"enable": true,
|
||||
"hide": true,
|
||||
"iconColor": "rgba(0, 211, 255, 1)",
|
||||
"name": "Annotations & Alerts",
|
||||
"target": {
|
||||
"limit": 100,
|
||||
"matchAny": false,
|
||||
"tags": [],
|
||||
"type": "dashboard"
|
||||
},
|
||||
"type": "dashboard"
|
||||
}
|
||||
]
|
||||
},
|
||||
"editable": true,
|
||||
"fiscalYearStartMonth": 0,
|
||||
"graphTooltip": 0,
|
||||
"id": 3,
|
||||
"iteration": 1652309379625,
|
||||
"links": [],
|
||||
"liveNow": false,
|
||||
"panels": [
|
||||
{
|
||||
"datasource": {
|
||||
"uid": "$source"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "palette-classic"
|
||||
},
|
||||
"custom": {
|
||||
"axisLabel": "",
|
||||
"axisPlacement": "auto",
|
||||
"barAlignment": 0,
|
||||
"drawStyle": "line",
|
||||
"fillOpacity": 10,
|
||||
"gradientMode": "none",
|
||||
"hideFrom": {
|
||||
"legend": false,
|
||||
"tooltip": false,
|
||||
"viz": false
|
||||
},
|
||||
"lineInterpolation": "linear",
|
||||
"lineWidth": 1,
|
||||
"pointSize": 5,
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
},
|
||||
"showPoints": "never",
|
||||
"spanNulls": false,
|
||||
"stacking": {
|
||||
"group": "A",
|
||||
"mode": "none"
|
||||
},
|
||||
"thresholdsStyle": {
|
||||
"mode": "off"
|
||||
}
|
||||
},
|
||||
"mappings": [],
|
||||
"min": 0,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 80
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "ms"
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byName",
|
||||
"options": "loss"
|
||||
},
|
||||
"properties": [
|
||||
{
|
||||
"id": "unit",
|
||||
"value": "percent"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byName",
|
||||
"options": "loss"
|
||||
},
|
||||
"properties": [
|
||||
{
|
||||
"id": "color",
|
||||
"value": {
|
||||
"fixedColor": "#e00000",
|
||||
"mode": "fixed"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "custom.fillOpacity",
|
||||
"value": 100
|
||||
},
|
||||
{
|
||||
"id": "custom.lineWidth",
|
||||
"value": 0
|
||||
},
|
||||
{
|
||||
"id": "unit",
|
||||
"value": "percent"
|
||||
},
|
||||
{
|
||||
"id": "max",
|
||||
"value": 100
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
},
      "gridPos": {
        "h": 19,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": [
            "mean",
            "lastNotNull",
            "max",
            "min"
          ],
          "displayMode": "table",
          "placement": "bottom"
        },
        "tooltip": {
          "mode": "multi",
          "sort": "none"
        }
      },
      "pluginVersion": "8.3.5",
      "targets": [
        {
          "alias": "latency",
          "groupBy": [
            {
              "params": [
                "$intervals"
              ],
              "type": "time"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "dpinger",
          "orderByTime": "ASC",
          "policy": "default",
          "query": "SELECT mean(\"latency\") FROM \"wan\" WHERE $timeFilter GROUP BY time($__interval) fill(null)",
          "queryType": "randomWalk",
          "rawQuery": false,
          "refId": "A",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "latency"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "mean"
              }
            ]
          ],
          "tags": [
            {
              "key": "name",
              "operator": "=~",
              "value": "/^$name$/"
            }
          ]
        },
        {
          "alias": "stddev",
          "groupBy": [
            {
              "params": [
                "$intervals"
              ],
              "type": "time"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "dpinger",
          "orderByTime": "ASC",
          "policy": "default",
          "queryType": "randomWalk",
          "refId": "B",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "stddev"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "mean"
              }
            ]
          ],
          "tags": [
            {
              "key": "name",
              "operator": "=~",
              "value": "/^$name$/"
            }
          ]
        },
        {
          "alias": "loss",
          "groupBy": [
            {
              "params": [
                "$intervals"
              ],
              "type": "time"
            },
            {
              "params": [
                "null"
              ],
              "type": "fill"
            }
          ],
          "measurement": "dpinger",
          "orderByTime": "ASC",
          "policy": "default",
          "queryType": "randomWalk",
          "refId": "C",
          "resultFormat": "time_series",
          "select": [
            [
              {
                "params": [
                  "loss"
                ],
                "type": "field"
              },
              {
                "params": [],
                "type": "mean"
              }
            ]
          ],
          "tags": [
            {
              "key": "name",
              "operator": "=~",
              "value": "/^$name$/"
            }
          ]
        }
      ],
      "title": "$name - ${intervals} intervals",
      "transformations": [],
      "type": "timeseries"
    }
  ],
  "refresh": "1m",
  "schemaVersion": 36,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "dpinger",
          "value": "dpinger"
        },
        "hide": 0,
        "includeAll": false,
        "label": "Source",
        "multi": false,
        "name": "source",
        "options": [],
        "query": "influxdb",
        "queryValue": "",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      },
      {
        "current": {
          "selected": false,
          "text": "wan",
          "value": "wan"
        },
        "datasource": {
          "type": "influxdb",
          "uid": "$source"
        },
        "definition": "SHOW TAG VALUES WITH KEY = \"name\"",
        "hide": 0,
        "includeAll": false,
        "label": "Name",
        "multi": false,
        "name": "name",
        "options": [],
        "query": "SHOW TAG VALUES WITH KEY = \"name\"",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "tagValuesQuery": "",
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
        "auto": true,
        "auto_count": 500,
        "auto_min": "10s",
        "current": {
          "selected": false,
          "text": "auto",
          "value": "$__auto_interval_intervals"
        },
        "hide": 0,
        "label": "Intervals",
        "name": "intervals",
        "options": [
          {
            "selected": true,
            "text": "auto",
            "value": "$__auto_interval_intervals"
          },
          {
            "selected": false,
            "text": "10s",
            "value": "10s"
          },
          {
            "selected": false,
            "text": "30s",
            "value": "30s"
          },
          {
            "selected": false,
            "text": "1m",
            "value": "1m"
          },
          {
            "selected": false,
            "text": "2m",
            "value": "2m"
          },
          {
            "selected": false,
            "text": "5m",
            "value": "5m"
          },
          {
            "selected": false,
            "text": "10m",
            "value": "10m"
          },
          {
            "selected": false,
            "text": "15m",
            "value": "15m"
          },
          {
            "selected": false,
            "text": "30m",
            "value": "30m"
          },
          {
            "selected": false,
            "text": "1h",
            "value": "1h"
          },
          {
            "selected": false,
            "text": "6h",
            "value": "6h"
          },
          {
            "selected": false,
            "text": "12h",
            "value": "12h"
          },
          {
            "selected": false,
            "text": "1d",
            "value": "1d"
          },
          {
            "selected": false,
            "text": "7d",
            "value": "7d"
          }
        ],
        "query": "10s,30s,1m,2m,5m,10m,15m,30m,1h,6h,12h,1d,7d",
        "queryValue": "",
        "refresh": 2,
        "skipUrlSync": false,
        "type": "interval"
      }
    ]
  },
  "time": {
    "from": "now-24h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "1m",
      "5m"
    ]
  },
  "timezone": "",
  "title": "WAN Latency",
  "uid": "ThwrgHYMk",
  "version": 46,
  "weekStart": ""
}
70
influx/dpinger_influx_logger
Executable file
@@ -0,0 +1,70 @@
#!/usr/bin/python

dpinger_path = "/usr/local/bin/dpinger"

import os
import sys
import signal
import requests
from subprocess import Popen, PIPE
from requests import post

# Handle SIGINT
def signal_handler(signum, frame):
    # dpinger may not have been started yet, so ignore a failed kill
    try:
        dpinger.kill()
    except Exception:
        pass
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)

# Handle command line args (five positional arguments are required)
progname = sys.argv.pop(0)
if (len(sys.argv) < 5):
    print('Usage: {0} influx_url influx_db host name target [additional dpinger options]'.format(progname))
    print('  influx_url  URL of the Influx server')
    print('  influx_db   name of the Influx database')
    print('  host        value of "host" tag (example: output of hostname command)')
    print('  name        value of "name" tag (example: a circuit name such as "wan")')
    print('  target      IP address to monitor (also the value of the "target" tag)')
    sys.exit(1)
influx_url = sys.argv.pop(0)
influx_db = sys.argv.pop(0)
host = sys.argv.pop(0)
name = sys.argv.pop(0)
target = sys.argv.pop(0)

influx_user = os.getenv('INFLUX_USER')
influx_pass = os.getenv('INFLUX_PASS')

# Set up dpinger command
cmd = [dpinger_path, "-f"]
cmd.extend(sys.argv)
cmd.extend(["-r", "10s", target])

# Set up formats
url = '{0}/write?db={1}'.format(influx_url, influx_db)
datafmt = "dpinger,host={0},name={1},target={2} latency={{0:.3f}},stddev={{1:.3f}},loss={{2}}i".format(host, name, target)

# Start up dpinger
try:
    dpinger = Popen(cmd, stdout=PIPE, text=True, bufsize=0)
except Exception:
    print("failed to start dpinger")
    sys.exit(1)

# Start the show
while True:
    line = dpinger.stdout.readline()
    if (len(line) == 0):
        print("dpinger exited")
        sys.exit(1)

    # dpinger reports latency and stddev in microseconds; store milliseconds
    [latency, stddev, loss] = line.split()
    data = datafmt.format(float(latency) / 1000, float(stddev) / 1000, loss)
    #print(data)
    # Catch Exception rather than everything, so that the SystemExit raised
    # by the SIGINT handler is not swallowed here
    try:
        post(url = url, auth = (influx_user, influx_pass), data = data)
    except Exception:
        print("post failed")
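The formatting step in the logger script above can be sketched in isolation. This is only an illustration: the function name `to_line_protocol` and the sample values for host, name, target and the dpinger report line are hypothetical, and it assumes dpinger prints latency and standard deviation in microseconds followed by the loss percentage.

```python
# Hypothetical helper mirroring the logger's formatting step.
# The default tag values and the sample report line are made up.
def to_line_protocol(line, host="gw1", name="wan", target="8.8.8.8"):
    # One dpinger report line: "<latency_us> <stddev_us> <loss_percent>"
    latency_us, stddev_us, loss = line.split()
    return (
        "dpinger,host={0},name={1},target={2} "
        "latency={3:.3f},stddev={4:.3f},loss={5}i"
    ).format(host, name, target,
             float(latency_us) / 1000,   # microseconds -> milliseconds
             float(stddev_us) / 1000,
             loss)                       # trailing "i" marks an integer field

print(to_line_protocol("12345 678 0"))
# dpinger,host=gw1,name=wan,target=8.8.8.8 latency=12.345,stddev=0.678,loss=0i
```

The resulting string is what the script sends as the body of the POST to the `/write?db=...` endpoint.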
7
influx/dpinger_start.sh
Executable file
@@ -0,0 +1,7 @@
#!/bin/sh

INFLUX_URL="http://myinfluxhost:8086"
export INFLUX_USER="dpinger"
export INFLUX_PASS="myinfluxpass"

exec /usr/local/dpinger_influx_logger "$INFLUX_URL" dpinger "$(hostname)" wan 8.8.8.8
@@ -10,10 +10,21 @@ name="$1"
 rrdfile="${name}.rrd"
 echo "Creating rrd file ${rrdfile}"

-rrdtool create "${rrdfile}" --step 1m \
-    DS:latency:GAUGE:5m:0:U \
-    DS:stddev:GAUGE:5m:0:U \
-    DS:loss:GAUGE:5m:0:100 \
-    RRA:AVERAGE:0.5:1m:15d \
-    RRA:AVERAGE:0.5:5m:90d \
-    RRA:AVERAGE:0.5:1h:3y
+# Time duration method doesn't work in all versions of rrdtool
+#rrdtool create "${rrdfile}" --step 1m \
+#    DS:latency:GAUGE:5m:0:U \
+#    DS:stddev:GAUGE:5m:0:U \
+#    DS:loss:GAUGE:5m:0:100 \
+#    RRA:AVERAGE:0.5:1m:15d \
+#    RRA:AVERAGE:0.5:5m:90d \
+#    RRA:AVERAGE:0.5:1h:3y
+
+# This method works in all versions
+rrdtool create "${rrdfile}" --step 60 \
+    DS:latency:GAUGE:300:0:U \
+    DS:stddev:GAUGE:300:0:U \
+    DS:loss:GAUGE:300:0:100 \
+    RRA:AVERAGE:0.5:1:21600 \
+    RRA:AVERAGE:0.5:5:25920 \
+    RRA:AVERAGE:0.5:60:26352

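The row counts in the version-independent `rrdtool create` command above appear to follow directly from the durations in the commented-out form. A minimal sketch of that arithmetic, assuming a 60-second step and that the `3y` retention is counted as three 366-day years (which is what the 26352 figure implies); the helper name `rra_rows` is hypothetical:

```python
STEP_SECONDS = 60  # matches --step 60 in the command above

def rra_rows(steps_per_row, retention_days):
    """Rows needed to keep retention_days of data at the given consolidation."""
    seconds_per_row = steps_per_row * STEP_SECONDS
    return retention_days * 86400 // seconds_per_row

print(rra_rows(1, 15))        # 1-minute rows kept for 15 days
print(rra_rows(5, 90))        # 5-minute rows kept for 90 days
print(rra_rows(60, 3 * 366))  # 1-hour rows kept for three 366-day years
```

The three results correspond to the 21600, 25920 and 26352 row counts used in the RRA definitions.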
@@ -15,6 +15,11 @@ dpinger=/usr/local/bin/dpinger


 rrdfile="${name}.rrd"
+if [ \! -w ${rrdfile} ]
+then
+    echo "$0: file \"${rrdfile}\" does not exist or is not writable"
+    exit 1
+fi

 ${dpinger} -f ${options} -s 500m -t 60s -r 60s ${targetip} |
 while read -r latency stddev loss; do
@@ -1,5 +1,5 @@
 <html>
-<head><title>WAN Statistics</title></head>
+<head><title>Latency Statistics for WAN</title></head>
 <body>
 <img src="/tmp/wan-1.png" alt="wan-1">
 <p>