Commit Graph

23 Commits

Author SHA1 Message Date
Tony Murray
0f8862a747 Attempt to fix dispatcher stats thread exception (#13478)
* Attempt to fix dispatcher stats thread exception

* catch both exceptions

* Make it work when redis module does not exist

* fix style
2021-11-11 22:20:36 -06:00
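A minimal sketch of the "make it work when the redis module does not exist" idea from the commit above. The helper name `safe_stats` and the choice of `OSError` as the second caught exception are assumptions for illustration; the commit message does not say which two exceptions are caught.

```python
import logging

# Treat redis as an optional dependency: if it is missing, there are
# simply no redis-specific exceptions to catch.
try:
    import redis
    REDIS_ERRORS = (redis.RedisError,)
except ImportError:
    redis = None
    REDIS_ERRORS = ()


def safe_stats(collect):
    """Run a stats callable (hypothetical) and return None on known errors."""
    try:
        return collect()
    except REDIS_ERRORS + (OSError,) as exc:
        # Log and carry on so the stats thread does not die.
        logging.getLogger(__name__).warning("stats collection failed: %s", exc)
        return None
```

Building the exception tuple at import time keeps the `except` clause valid whether or not the module is installed.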
Tony Murray
eb653619a3 Dispatcher bugfix queues not being disabled properly (#13364)
* Dispatcher bugfix queues not being disabled properly
Introduced in #13355
Affected queues: Alerting, Discovery, Services, Ping

Adjust log level of several messages

* better formatting? looks like garbage python black
2021-10-14 23:39:08 -05:00
Tony Murray
da57ea65f6 Dispatcher: Don't update poller groups when updating stats (#13363)
* Dispatcher: Don't update workers/frequency when updating stats

* Fix the right thing

* Don't set poller groups on the cluster entry, this way config.php can override if it hasn't been set by the webui
2021-10-14 19:41:35 -05:00
Tony Murray
436487f5f2 Dispatch Service: always start queue managers (#13355)
* dispatch service always start queue managers
Only start workers if they are enabled for this node

* style

* please stop mr ide, sir
2021-10-13 21:49:43 -05:00
Tony Murray
681508f45b Fix device query when last_polled_timetaken is null (#13331)
Caused by recent bug
2021-10-04 16:04:44 -05:00
Orsiris de Jong
bfa200f3f7 Full Python code fusion / refactor and hardening 2nd edition (#13188)
* New service/discovery/poller wrapper

* Convert old wrapper scripts to bootstrap loaders for wrapper.py

* Move wrapper.py to LibreNMS module directory

* Reformat files

* File reformatting

* bootstrap files reformatting

* Fusion service and wrapper database connections and get_config_data functions

* Moved subprocess calls to command_runner

* LibreNMS library and __init__ fusion

* Reformat files

* Normalize logging use

* Reformatting code

* Fix missing argument for error log

* Fix refactor typo in DBConfig class

* Add default timeout for config.php data fetching

* distributed discovery should finish with a timestamp instead of an epoch

* Fix docstring inside dict that prevented the service key from working

* Fix poller insert statement

* Fix service wrapper typo

* Update docstring since we changed function behavior

* Normalize SQL statements

* Convert optparse to argparse

* Revert discovery thread number

* Handle debug logging

* Fix file option typo

* Reformat code

* Add credits to source package

* Rename logs depending on the wrapper type

* Cap max logfile size to 10MB

* Reformat code

* Add exception for Redis < 5.0

* Make sure we always log something from service

* Fix bogus description

* Add an error message on missing config file

* Improve error message when .env file cannot be loaded

* Improve wrapper logging

* Fix cron run that may fail when environment path is not set

* Add missing -wrapper suffix for logs

* Conform to prior naming scheme

* Linter fix

* Add inline copy of command_runner

* Another linter fix

* Raise exception after logging

* Updated inline command_runner

* Add command_runner to requirements

* I guess I love linter fixes ;)

* Don't spawn more threads than devices

* Fix typo in log call

* Add exit codes to log on error, add command line to debug log

* Add thread name to error message

* Log errors in end message for easier debugging

* Typo fix

* In love of linting
2021-09-27 14:24:25 -05:00
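The "Don't spawn more threads than devices" item above amounts to clamping the worker count. A small sketch, assuming a hypothetical helper `worker_count` that is not taken from the wrapper itself:

```python
def worker_count(requested, devices):
    """Never start more worker threads than there are devices to process."""
    return min(requested, len(devices))
```

For example, `worker_count(16, ["a", "b", "c"])` yields 3, so thirteen idle threads are never created.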
Tony Murray
31246c6ba6 Revert "Full Python code fusion / refactor and hardening (#13094)" (#13123)
This reverts commit 9c534a1a90.
2021-08-10 15:13:05 -05:00
Orsiris de Jong
9c534a1a90 Full Python code fusion / refactor and hardening (#13094)
* Add inline command_runner library

* New service/discovery/poller wrapper

* Convert old wrapper scripts to bootstrap loaders for wrapper.py

* Add command_runner to current requirements

* Move wrapper.py to LibreNMS module directory

* Reformat files

* File reformatting

* bootstrap files reformatting

* Fusion service and wrapper database connections and get_config_data functions

* Moved subprocess calls to command_runner

* LibreNMS library and __init__ fusion

* Reformat files

* Normalize logging use

* Reformatting code

* Fix missing argument for error log

* Fix refactor typo in DBConfig class

* Add default timeout for config.php data fetching

* distributed discovery should finish with a timestamp instead of an epoch

* Fix docstring inside dict that prevented the service key from working

* Fix poller insert statement

* Fix service wrapper typo

* Update docstring since we changed function behavior

* Normalize SQL statements

* Convert optparse to argparse

* Revert discovery thread number

* Handle debug logging

* Fix file option typo

* Reformat code

* Add credits to source package

* Rename logs depending on the wrapper type

* Cap max logfile size to 10MB

* Reformat code

* Add exception for Redis < 5.0

* Make sure we always log something from service

* Fix bogus description
2021-08-09 18:49:29 -05:00
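One common way to get the "Cap max logfile size to 10MB" behaviour listed above is the standard library's `RotatingFileHandler`. This is only a sketch: the logger name, the backup count, and reading "10MB" as 10 * 1024 * 1024 bytes are assumptions, and the commit may implement the cap differently.

```python
import logging
from logging.handlers import RotatingFileHandler

# Assumption: "10MB" is read here as 10 MiB.
MAX_LOG_BYTES = 10 * 1024 * 1024


def build_logger(path, name="poller-wrapper"):
    """Return a logger whose file rotates once it reaches MAX_LOG_BYTES."""
    logger = logging.getLogger(name)
    handler = RotatingFileHandler(path, maxBytes=MAX_LOG_BYTES, backupCount=1)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```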
Jellyfrog
9946fe8b15 Format python code with Black (#12663) 2021-03-28 11:02:33 -05:00
Anthony F. McInerney
a625faaa1b service watchdog - add systemd watchdog for resiliency (#12188)
* Add systemd watchdog service

* Add systemd watchdog service

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - update docs for python3-systemd

* systemd-watchdog - move to 10 second alert frequency

* systemd-watchdog - move to 10 second alert frequency

* systemd-watchdog - move to 30 second restart, 10 second delay between restarts

* systemd-watchdog - safely integrate changes

* systemd-watchdog - safely integrate changes

* systemd-watchdog - revert old doc changes

* systemd-watchdog -  doc typo fix
2021-03-22 10:34:45 -05:00
Richard Kojedzinszky
e3a1a239f9 Simplify process reaping (#12593)
Fixes #12427
2021-03-18 16:39:53 -05:00
Oahz Egroeg
50b99a4f1a Fixes #12480 (#12482)
Casting REDIS_TIMEOUT to integer

Co-authored-by: Oahz Egroeg <8146946+EgroegOahz@users.noreply.github.com>
2021-02-01 20:43:32 +01:00
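Environment variables always arrive as strings, which is why the fix above casts `REDIS_TIMEOUT` to an integer. A sketch of the idea, where the function name and the default of 60 are assumptions rather than values from the commit:

```python
import os


def redis_timeout(env=os.environ, default=60):
    """Read REDIS_TIMEOUT and return it as an int, not the raw string."""
    return int(env.get("REDIS_TIMEOUT", default))
```

Passing a plain dict such as `{"REDIS_TIMEOUT": "30"}` as `env` returns the integer 30.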
Tony Murray
d314f6429a Attempt to fix dispatcher crash on restart (#12257) 2020-10-24 21:13:59 -05:00
Adam Bishop
41ed0537b4 Fix midnight poller data loss (#11582)
* Handle more signals

* Flush buffers before exiting process
This ensures log messages aren't lost

* Restart process before jobs have finished
If there is a very long-running job it can cause service restart to
take over 5 minutes.

We tweak the order of things to make sure that running processes
continue, but nothing more is scheduled.

The worst case impact is that a polling/discovery job gets
scheduled twice, but this should not be a big issue - this should
only occur at most once per day.

* Remove python 3.8 feature

* Ensure that processes from the previous invocation are reaped

* Correct typos

* Attach subprocess descriptors to /dev/null

Occasionally, PHP would throw a fit and crash when its stdout went
away. To avoid this, we attach stdout to devnull.

This means we lose the output of daily.sh - but this is already recorded
in $LOGDIR/daily.log

* Don't immediately schedule long running jobs

To avoid the situation where the maintenance reload happens or a sighup,
then a second long running job is immediately started, we wait
(`last_[poll/discovery]_timetaken` * 1.25) seconds before scheduling
any jobs.

* Add `psutil` to requirements

* Add support for "systemctl reload" to the unit files

* Add a fallback for systems that don't have psutil

* Reduce CPU load when psutil is not installed

* Don't avoid double polling by extending the timeout

This shouldn't happen due to locks

* Remove fallback option

* Remove extra variable

* Fix issue introduced during rebase

* Fix issue introduced when fixing issue introduced during rebase

* Make psutil optional
2020-09-29 23:50:40 -05:00
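The "Attach subprocess descriptors to /dev/null" item above can be illustrated with the standard `subprocess` module. This is a sketch of the general technique, not the service's actual code; `run_detached` is a hypothetical name.

```python
import subprocess


def run_detached(cmd):
    """Start cmd with stdin, stdout and stderr all attached to /dev/null."""
    return subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

Because the child never holds a pipe to the parent, it cannot fail on a write when the parent's stdout goes away during a restart.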
Tony Murray
38cfab612b Dispatch Service Fix maintenance issues (#11973)
If daily.sh exited with non-zero it would kill the maintenance thread, stopping daily.sh
The maintenance lock was never released, this wouldn't cause an issue in normal operation as it should expire.
2020-07-29 23:12:13 -05:00
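Both problems described above (a non-zero exit killing the thread, and the lock never being released) are typically handled with a non-raising call inside `try`/`finally`. In this sketch a `threading.Lock` stands in for the service's real maintenance lock, and `run_maintenance` is a hypothetical name.

```python
import subprocess
import threading


def run_maintenance(cmd, lock):
    """Run the maintenance command; always release the lock afterwards."""
    if not lock.acquire(blocking=False):
        return None  # another runner already holds the lock
    try:
        # check=False: a non-zero exit is reported, not raised, so the
        # calling thread survives a failing script.
        return subprocess.run(cmd, check=False).returncode
    finally:
        lock.release()
```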
Tony Murray
300645388f Dispatcher Service settings (#11760)
* Poller settings WIP

* Poller settings WIP2

* working on SettingMultiple

* setting multiple working

* settings sent with all required info

* fix translation

* Fix keys

* fix groups setting

* Apply settings to service
fixes and validations for setting

* don't error when no poller_cluster entry exists

* hide tab when no poller cluster entries

* Authorization

* make prod

* daily maintenance toggle should be advanced

* Update schema def
2020-06-08 08:27:03 -05:00
SourceDoctor
b89eb22cd5 Enumerate AlertState (#11665)
* Enumerate AlertState

* fix typo

* add missing use's

* .

* .
2020-05-23 21:14:36 -05:00
Hayden
cdb6a74dc8 implement watchdog to librenms-service (#11353)
* add watchdog to librenms-service to check log file
add Redis timeout to librenms-service

* updated docs

* fixed logfile_watchdog() indentation in service.py

* indentation fix

* code climate patch

* updated default redis timeout if alerting frequency is 0
2020-03-31 23:10:45 -05:00
bewing
74724a4618 Add redis sentinel support to dispatcher service (#10598)
* Add redis sentinel support to dispatcher service

* Update docs for redis sentinel support

* Don't re-raise python exception in service
2019-10-01 06:51:07 +00:00
Tony Murray
cf35d99319 Warn maintenance tasks are disabled (#10273) 2019-06-06 23:41:00 -05:00
Tony Murray
ecc05b07fb Fix couldn't disable alerting (#10258)
service_alerting_enable will now properly disable alerting (may be set globally or per node)
service_alerting_frequency will now properly control frequency (set globally the same, via db is best)
2019-05-23 16:07:45 -05:00
Tony Murray
604a200891 Python dispatcher service v2 (#10050)
* Refactor LibreNMS service
add ping

* services ported
remote legacy stats collection

* alerting

* implement unique queues

* update discovery queue manager

* remove message

* more cleanup

* Don't shuffle queue

* clean up imports

* don't try to discover ping only devices

* Fix for discovery not running timer

* Update docs a bit and add some additional config options.
Intentionally undocumented.

* Wait until the device is marked up by the poller before discovering

* Handle losing connection to db gracefully

* Attempt to release master after 5 db failures

* Sleep to give other nodes a chance to acquire

* Update docs and rename the doc to Dispatcher Service to more accurately reflect its function.

* add local notification
2019-05-20 11:35:47 -05:00
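The "Attempt to release master after 5 db failures" item above suggests a simple failure counter. The class below is a hypothetical sketch; counting consecutive failures and resetting on success are assumptions, since the commit message only gives the threshold.

```python
class MasterGuard:
    """Track database failures and signal when to give up the master role."""

    def __init__(self, limit=5):
        self.limit = limit
        self.failures = 0

    def record(self, success):
        """Return True once the failure count reaches the limit."""
        if success:
            self.failures = 0  # assumption: a success resets the count
            return False
        self.failures += 1
        return self.failures >= self.limit
```

A caller would release the master role and sleep briefly when `record(False)` returns True, giving other nodes a chance to acquire it.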
Tony Murray
0ba76e6d62 New python service for poller, discovery + more (#8455)
Currently has a file handle leak (and will eventually run out of handles) related to the self update process.

Either need to fix that or rip out self-update and leave that up to cron or something.


2018-06-30 12:19:49 +01:00