15 Commits

Author SHA1 Message Date
Jellyfrog
9946fe8b15 Format python code with Black (#12663) 2021-03-28 11:02:33 -05:00
Anthony F. McInerney
a625faaa1b service watchdog - add systemd watchdog for resiliency (#12188)
* Add systemd watchdog service

* Add systemd watchdog service

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - update docs for python3-systemd

* systemd-watchdog - move to 10 second alert frequency

* systemd-watchdog - move to 10 second alert frequency

* systemd-watchdog - move to 30 second restart, 10 second delay between restarts

* systemd-watchdog - safely integrate changes

* systemd-watchdog - safely integrate changes

* systemd-watchdog - revert old doc changes

* systemd-watchdog - doc typo fix
2021-03-22 10:34:45 -05:00
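For context on a625faaa1b: a systemd watchdog expects the service to send periodic keep-alive messages, and systemd restarts the unit if they stop. The commit mentions python3-systemd; the sketch below instead uses only the standard library to speak the same notification protocol (a `WATCHDOG=1` datagram sent to the socket named in `NOTIFY_SOCKET`). The function name `notify_watchdog` and the `env` parameter are hypothetical and are not taken from the LibreNMS code.

```python
import os
import socket


def notify_watchdog(env=None):
    """Send one watchdog keep-alive to systemd.

    Returns False when NOTIFY_SOCKET is not set (not running under systemd),
    True after the datagram has been sent.
    """
    env = os.environ if env is None else env  # hypothetical override for testing
    address = env.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"WATCHDOG=1", address)
    return True
```

A service would call this at an interval shorter than the unit's `WatchdogSec` value; the intervals named in the commit bullets above are the author's choices and are not reproduced here.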
Richard Kojedzinszky
e3a1a239f9 Simplify process reaping (#12593)
Fixes #12427
2021-03-18 16:39:53 -05:00
Oahz Egroeg
50b99a4f1a Fixes #12480 (#12482)
Casting REDIS_TIMEOUT to integer

Co-authored-by: Oahz Egroeg <8146946+EgroegOahz@users.noreply.github.com>
2021-02-01 20:43:32 +01:00
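The fix in 50b99a4f1a casts `REDIS_TIMEOUT` to an integer. A minimal sketch of why such a cast matters, assuming the value arrives as a string (for example from the environment); the helper `read_redis_timeout` and the default of 60 are hypothetical and are not taken from the LibreNMS code.

```python
import os


def read_redis_timeout(env=None, default=60):
    """Return REDIS_TIMEOUT as an int.

    Values read from the environment are always strings, so they must be
    cast before being used in arithmetic or passed as a numeric timeout.
    """
    env = os.environ if env is None else env  # hypothetical override for testing
    raw = env.get("REDIS_TIMEOUT")
    if raw is None or raw == "":
        return default  # assumed default, for illustration only
    return int(raw)
```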
Tony Murray
d314f6429a Attempt to fix dispatcher crash on restart (#12257) 2020-10-24 21:13:59 -05:00
Adam Bishop
41ed0537b4 Fix midnight poller data loss (#11582)
* Handle more signals

* Flush buffers before exiting process
This ensures log messages aren't lost

* Restart process before jobs have finished
If there is a very long-running job, it can cause a service restart to
take over 5 minutes.

We tweak the order of things to make sure that running processes
continue, but nothing more is scheduled.

The worst-case impact is that a polling/discovery job gets
scheduled twice, but this should not be a big issue, as it should
occur at most once per day.

* Remove python 3.8 feature

* Ensure that processes from the previous invocation are reaped

* Correct typos

* Attach subprocess descriptors to /dev/null

Occasionally, PHP would throw a fit and crash when its stdout went
away. To avoid this, we attach stdout to devnull.

This means we lose the output of daily.sh, but this is already recorded
in $LOGDIR/daily.log

* Don't immediately schedule long running jobs

To avoid the situation where the maintenance reload happens or a sighup,
then a second long running job is immediately started, we wait
(`last_[poll/discovery]_timetaken` * 1.25) seconds before scheduling
any jobs.

* Add `psutil` to requirements

* Add support for "systemctl reload" to the unit files

* Add a fallback for systems that don't have psutil

* Reduce CPU load when psutil is not installed

* Don't avoid double polling by extending the timeout

This shouldn't happen due to locks

* Remove fallback option

* Remove extra variable

* Fix issue introduced during rebase

* Fix issue introduced when fixing issue introduced during rebase

* Make psutil optional
2020-09-29 23:50:40 -05:00
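One bullet in 41ed0537b4 describes attaching subprocess descriptors to /dev/null so that a child does not crash when its stdout goes away. A minimal sketch of that technique using the standard `subprocess` module; the helper name `run_detached_from_stdio` is hypothetical and this is not the dispatcher's actual code.

```python
import subprocess
import sys


def run_detached_from_stdio(argv):
    """Run a child with stdin, stdout and stderr attached to /dev/null.

    The child's output is discarded, so it cannot fail on a closed pipe
    when the parent restarts. Returns the child's exit code.
    """
    completed = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # report the exit code instead of raising
    )
    return completed.returncode


if __name__ == "__main__":
    # The printed text from the child is discarded; only the exit code is shown.
    print(run_detached_from_stdio([sys.executable, "-c", "print('discarded')"]))
```

The trade-off is the one the commit message notes: anything the child writes to stdout is lost unless it is logged elsewhere.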
Tony Murray
38cfab612b Dispatch Service Fix maintenance issues (#11973)
If daily.sh exited with a non-zero status, it would kill the maintenance thread, stopping daily.sh.
The maintenance lock was never released; this wouldn't cause an issue in normal operation, as it should expire.
2020-07-29 23:12:13 -05:00
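The two problems described in 38cfab612b (a non-zero exit killing the maintenance thread, and a lock that is never released) can be illustrated with a general pattern: treat the exit code as data rather than an exception, and release the lock in a `finally` block. This is a sketch under assumptions, not the dispatcher's code; `run_maintenance` is a hypothetical name, and `threading.Lock` stands in for whatever lock the service actually uses.

```python
import subprocess
import sys
import threading


def run_maintenance(lock, argv):
    """Run a maintenance command while holding `lock`.

    Returns the command's exit code, or None if the lock was already held.
    A non-zero exit code is returned rather than raised, so the calling
    thread keeps running, and the lock is always released.
    """
    if not lock.acquire(blocking=False):
        return None
    try:
        result = subprocess.run(argv, check=False)  # no exception on failure
        return result.returncode
    finally:
        lock.release()


if __name__ == "__main__":
    lock = threading.Lock()
    code = run_maintenance(lock, [sys.executable, "-c", "import sys; sys.exit(2)"])
    print(code, lock.locked())
```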
Tony Murray
300645388f Dispatcher Service settings (#11760)
* Poller settings WIP

* Poller settings WIP2

* working on SettingMultiple

* setting multiple working

* settings sent with all required info

* fix translation

* Fix keys

* fix groups setting

* Apply settings to service
fixes and validations for setting

* don't error when no poller_cluster entry exists

* hide tab when no poller cluster entries

* Authorization

* make prod

* daily maintenance toggle should be advanced

* Update schema def
2020-06-08 08:27:03 -05:00
SourceDoctor
b89eb22cd5 Enumerate AlertState (#11665)
* Enumerate AlertState

* fix typo

* add missing use's

* .

* .
2020-05-23 21:14:36 -05:00
Hayden
cdb6a74dc8 implement watchdog to librenms-service (#11353)
* add watchdog to librenms-service to check log file
add Redis timeout to librenms-service

* updated docs

* fixed logfile_watchdog() indentation in service.py

* indentation fix

* code climate patch

* updated default redis timeout if alerting frequency is 0
2020-03-31 23:10:45 -05:00
bewing
74724a4618 Add redis sentinel support to dispatcher service (#10598)
* Add redis sentinel support to dispatcher service

* Update docs for redis sentinel support

* Don't re-raise python exception in service
2019-10-01 06:51:07 +00:00
Tony Murray
cf35d99319 Warn maintenance tasks are disabled (#10273) 2019-06-06 23:41:00 -05:00
Tony Murray
ecc05b07fb Fix couldn't disable alerting (#10258)
service_alerting_enable will now properly disable alerting (may be set globally or per node)
service_alerting_frequency will now properly control frequency (set globally the same, via db is best)
2019-05-23 16:07:45 -05:00
Tony Murray
604a200891 Python dispatcher service v2 (#10050)
* Refactor LibreNMS service
add ping

* services ported
remove legacy stats collection

* alerting

* implement unique queues

* update discovery queue manager

* remove message

* more cleanup

* Don't shuffle queue

* clean up imports

* don't try to discover ping only devices

* Fix for discovery not running timer

* Update docs a bit and add some additional config options.
Intentionally undocumented.

* Wait until the device is marked up by the poller before discovering

* Handle losing connection to db gracefully

* Attempt to release master after 5 db failures

* Sleep to give other nodes a chance to acquire

* Update docs and rename the doc to Dispatcher Service to more accurately reflect its function.

* add local notification
2019-05-20 11:35:47 -05:00
Tony Murray
0ba76e6d62 New python service for poller, discovery + more (#8455)
Currently has a file handle leak (and will eventually run out of handles) related to the self update process.

Either need to fix that or rip out self-update and leave that up to cron or something.
2018-06-30 12:19:49 +01:00