30 Commits

Author SHA1 Message Date
Tony Murray
723600751c Dispatcher option to log output (#15230)
* Dispatcher option to log output
-o --log-output Log output into various files in the log directory
wire up -d option to be passed into scheduled commands
Caution, can fill your disk.

* style fixes

* more silly style fixes (and a typo accidentally added)

* final lint maybe?

* more lint...

* believe it or not, more lint
2023-08-21 18:28:07 -05:00
Tony Murray
d865e3b372 Always mark device last_discovered (#15218)
Previously, if the device was ping only, it wasn't marked as discovered.
Now we always run discovery, but basically all it does is update last_discovered.
2023-08-13 16:56:54 +02:00
Tony Murray
670904bd30 Fix service_master_timeout setting (#15217) 2023-08-11 14:20:39 -05:00
Tony Murray
a2f906c3f4 Allow dispatcher service master timeout to be set (#15161)
and increase default to 20s from 10s
20s should still be fast enough to prevent gaps, but larger installs can take longer than 10s (or even 20s) to do dispatch work.
2023-07-25 11:27:34 -05:00
AdamB
55b167562e Implement authentication for Redis/Sentinel (#14805)
* Implement ACL support for redis (and sentinel)

Currently, sentinel only works with anonymous connections.
Some parameters are passed when using sentinel, however these are
dropped on the floor.
This encapsulates them as py-redis expects, and passes them correctly.

* Pass username

* Differentiate duplicate error messages

* Actually pass var

* Docs and requirement bump

* Lint

* Consistency

* More lint

* Lint harder

* Doc Updates
2023-04-14 07:11:44 -05:00
Peter Childs
d35679d991 add poller_groups (served) to the poller_cluster table (#14886) 2023-03-09 17:20:49 +01:00
Nash Kaminski
9bb6b19832 Support for SSL/TLS protected connections to MySQL databases (#14142)
* Allow configuration of the SSL/TLS operating mode when connecting to a mysql database

* Support SSL/TLS DB connections in the dispatcher service as well

* Apply black formatting standards to Python files

* Suppress pylint errors as redis module is not installed when linting

* More pylint fixes

* Correct typo in logging output

* Refactor SSL/TLS changes into DBConfig class instead of ServiceConfig

* Define DB config variables as class vars instead of instance vars

* Break circular import
2022-08-07 14:53:29 -05:00
Tony Murray
0f8862a747 Attempt to fix dispatcher stats thread exception (#13478)
* Attempt to fix dispatcher stats thread exception

* catch both exceptions

* Make it work when redis module does not exist

* fix style
2021-11-11 22:20:36 -06:00
Tony Murray
eb653619a3 Dispatcher bugfix queues not being disabled properly (#13364)
* Dispatcher bugfix queues not being disabled properly
Introduced in #13355
Affected queues: Alerting, Discovery, Services, Ping

Adjust log level of several messages

* better formatting? looks like garbage python black
2021-10-14 23:39:08 -05:00
Tony Murray
da57ea65f6 Dispatcher: Don't update poller groups when updating stats (#13363)
* Dispatcher: Don't update workers/frequency when updating stats

* Fix the right thing

* Don't set poller groups on the cluster entry, this way config.php can override if it hasn't been set by the webui
2021-10-14 19:41:35 -05:00
Tony Murray
436487f5f2 Dispatch Service: always start queue managers (#13355)
* dispatch service always start queue managers
Only start workers if they are enabled for this node

* style

* please stop mr ide, sir
2021-10-13 21:49:43 -05:00
Tony Murray
681508f45b Fix device query when last_polled_timetaken is null (#13331)
Caused by recent bug
2021-10-04 16:04:44 -05:00
Orsiris de Jong
bfa200f3f7 Full Python code fusion / refactor and hardening 2nd edition (#13188)
* New service/discovery/poller wrapper

* Convert old wrapper scripts to bootstrap loaders for wrapper.py

* Move wrapper.py to LibreNMS module directory

* Reformat files

* File reformatting

* bootstrap files reformatting

* Fusion service and wrapper database connections and get_config_data functions

* Moved subprocess calls to command_runner

* LibreNMS library and __init__ fusion

* Reformat files

* Normalize logging use

* Reformatting code

* Fix missing argument for error log

* Fix refactor typo in DBConfig class

* Add default timeout for config.php data fetching

* distributed discovery should finish with a timestamp instead of an epoch

* Fix docstring inside dict preventing service key from working

* Fix poller insert statement

* Fix service wrapper typo

* Update docstring since we changed function behavior

* Normalize SQL statements

* Convert optparse to argparse

* Revert discovery thread number

* Handle debug logging

* Fix file option typo

* Reformat code

* Add credits to source package

* Rename logs depending on the wrapper type

* Cap max logfile size to 10MB

* Reformat code

* Add exception for Redis < 5.0

* Make sure we always log something from service

* Fix bogus description

* Add an error message on missing config file

* Improve error message when .env file cannot be loaded

* Improve wrapper logging

* Fix cron run may fail when environment path is not set

* Add missing -wrapper suffix for logs

* Conform to prior naming scheme

* Linter fix

* Add inline copy of command_runner

* Another linter fix

* Raise exception after logging

* Updated inline command_runner

* Add command_runner to requirements

* I guess I love linter fixes ;)

* Don't spawn more threads than devices

* Fix typo in log call

* Add exit codes to log on error, add command line to debug log

* Add thread name to error message

* Log errors in end message for easier debugging

* Typo fix

* In love of linting
2021-09-27 14:24:25 -05:00
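
The "Cap max logfile size to 10MB" item above can be illustrated with Python's standard `logging.handlers.RotatingFileHandler`. This is only a minimal sketch, not the code from the commit: the function name `build_capped_logger`, the `backup_count` default, and the log format are assumptions made for illustration.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

# 10 MB cap, matching the size named in the commit message.
MAX_LOG_BYTES = 10 * 1024 * 1024


def build_capped_logger(name, log_path, backup_count=1):
    """Return a logger whose file rolls over once it reaches MAX_LOG_BYTES.

    build_capped_logger and backup_count are hypothetical names for this sketch.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(
        log_path, maxBytes=MAX_LOG_BYTES, backupCount=backup_count
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

A caller might use `build_capped_logger("poller-wrapper", os.path.join(tempfile.gettempdir(), "poller-wrapper.log"))`; the logger name and path here are placeholders, not the paths the wrapper actually writes to.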
Tony Murray
31246c6ba6 Revert "Full Python code fusion / refactor and hardening (#13094)" (#13123)
This reverts commit 9c534a1a90.
2021-08-10 15:13:05 -05:00
Orsiris de Jong
9c534a1a90 Full Python code fusion / refactor and hardening (#13094)
* Add inline command_runner library

* New service/discovery/poller wrapper

* Convert old wrapper scripts to bootstrap loaders for wrapper.py

* Add command_runner to current requirements

* Move wrapper.py to LibreNMS module directory

* Reformat files

* File reformatting

* bootstrap files reformatting

* Fusion service and wrapper database connections and get_config_data functions

* Moved subprocess calls to command_runner

* LibreNMS library and __init__ fusion

* Reformat files

* Normalize logging use

* Reformatting code

* Fix missing argument for error log

* Fix refactor typo in DBConfig class

* Add default timeout for config.php data fetching

* distributed discovery should finish with a timestamp instead of an epoch

* Fix docstring inside dict preventing service key from working

* Fix poller insert statement

* Fix service wrapper typo

* Update docstring since we changed function behavior

* Normalize SQL statements

* Convert optparse to argparse

* Revert discovery thread number

* Handle debug logging

* Fix file option typo

* Reformat code

* Add credits to source package

* Rename logs depending on the wrapper type

* Cap max logfile size to 10MB

* Reformat code

* Add exception for Redis < 5.0

* Make sure we always log something from service

* Fix bogus description
2021-08-09 18:49:29 -05:00
Jellyfrog
9946fe8b15 Format python code with Black (#12663) 2021-03-28 11:02:33 -05:00
Anthony F. McInerney
a625faaa1b service watchdog - add systemd watchdog for resiliency (#12188)
* Add systemd watchdog service

* Add systemd watchdog service

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - add try

* Add systemd watchdog service - update docs for python3-systemd

* systemd-watchdog - move to 10 second alert frequency

* systemd-watchdog - move to 10 second alert frequency

* systemd-watchdog - move to 30 second restart, 10 second delay between restarts

* systemd-watchdog - safely integrate changes

* systemd-watchdog - safely integrate changes

* systemd-watchdog - revert old doc changes

* systemd-watchdog - doc typo fix
2021-03-22 10:34:45 -05:00
Richard Kojedzinszky
e3a1a239f9 Simplify process reaping (#12593)
Fixes #12427
2021-03-18 16:39:53 -05:00
Oahz Egroeg
50b99a4f1a Fixes #12480 (#12482)
Casting REDIS_TIMEOUT to integer

Co-authored-by: Oahz Egroeg <8146946+EgroegOahz@users.noreply.github.com>
2021-02-01 20:43:32 +01:00
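
The fix above casts `REDIS_TIMEOUT` to an integer. Values read from the environment arrive as strings, so a numeric setting has to be converted before it is used as a timeout. A minimal sketch of that pattern follows; the helper name `get_int_env` and the fallback value in the usage note are assumptions, not taken from the commit.

```python
import os


def get_int_env(name, default):
    """Return the environment variable `name` as an int.

    Falls back to `default` when the variable is unset or blank.
    get_int_env is a hypothetical helper for this sketch.
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    # int() raises ValueError if the value is not numeric.
    return int(raw)
```

For example, `get_int_env("REDIS_TIMEOUT", 60)` yields an `int` whether or not the variable is set; the `60` is a placeholder default.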
Tony Murray
d314f6429a Attempt to fix dispatcher crash on restart (#12257) 2020-10-24 21:13:59 -05:00
Adam Bishop
41ed0537b4 Fix midnight poller data loss (#11582)
* Handle more signals

* Flush buffers before exiting process
This ensures log messages aren't lost

* Restart process before jobs have finished
If there is a very long-running job, it can cause the service restart to
take over 5 minutes.

We tweak the order of things to make sure that running processes
continue, but nothing more is scheduled.

The worst-case impact is that a polling/discovery job gets
scheduled twice, but this should not be a big issue, as it should
occur at most once per day.

* Remove python 3.8 feature

* Ensure that processes from the previous invocation are reaped

* Correct typos

* Attach subprocess descriptors to /dev/null

Occasionally, PHP would throw a fit and crash when its stdout went
away. To avoid this, we attach stdout to devnull.

This means we lose the output of daily.sh, but this is already recorded
in $LOGDIR/daily.log

* Don't immediately schedule long running jobs

To avoid the situation where the maintenance reload happens or a sighup,
then a second long running job is immediately started, we wait
(`last_[poll/discovery]_timetaken` * 1.25) seconds before scheduling
any jobs.

* Add `psutil` to requirements

* Add support for "systemctl reload" to the unit files

* Add a fallback for systems that don't have psutil

* Reduce CPU load when psutil is not installed

* Don't avoid double polling by extending the timeout

This shouldn't happen due to locks

* Remove fallback option

* Remove extra variable

* Fix issue introduced during rebase

* Fix issue introduced when fixing issue introduced during rebase

* Make psutil optional
2020-09-29 23:50:40 -05:00
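
The "Attach subprocess descriptors to /dev/null" item above describes detaching a child process from the parent's standard streams so the child does not fail when the parent's stdout goes away. A minimal sketch using the standard `subprocess.DEVNULL` constant follows; the function name `run_detached_io` is an assumption for illustration and is not the service's actual code.

```python
import subprocess
import sys


def run_detached_io(command):
    """Run `command` with stdin, stdout and stderr attached to /dev/null.

    Returns the child's exit code. run_detached_io is a hypothetical
    helper name for this sketch.
    """
    completed = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # report the exit code instead of raising
    )
    return completed.returncode
```

The trade-off is the one the commit message names: any output the child prints is discarded, so it needs to be logged elsewhere.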
Tony Murray
38cfab612b Dispatch Service Fix maintenance issues (#11973)
If daily.sh exited with a non-zero status, it would kill the maintenance thread, so daily.sh would stop being run.
The maintenance lock was never released; this wouldn't cause an issue in normal operation, as the lock should expire.
2020-07-29 23:12:13 -05:00
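
The two problems described above (a failing command killing the maintenance thread, and a lock that is never released) can be sketched as follows. This is an illustration under assumptions, not the dispatcher's implementation: `maintenance_lock` is a local `threading.Lock` standing in for whatever lock the service really uses, and `run_maintenance` is a hypothetical function name.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for the service's maintenance lock.
maintenance_lock = threading.Lock()


def run_maintenance(command):
    """Run a maintenance command without letting a failure escape.

    Returns the exit code, or None if the lock is already held.
    The lock is released in `finally`, even when the command fails.
    """
    if not maintenance_lock.acquire(blocking=False):
        return None
    try:
        # check=False: a non-zero exit is reported, not raised.
        result = subprocess.run(command, check=False)
        return result.returncode
    finally:
        maintenance_lock.release()
```

With this shape, a non-zero exit is returned to the caller as a value to log, and the next maintenance run can still acquire the lock.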
Tony Murray
300645388f Dispatcher Service settings (#11760)
* Poller settings WIP

* Poller settings WIP2

* working on SettingMultiple

* setting multiple working

* settings sent with all required info

* fix translation

* Fix keys

* fix groups setting

* Apply settings to service
fixes and validations for setting

* don't error when no poller_cluster entry exists

* hide tab when no poller cluster entries

* Authorization

* make prod

* daily maintenance toggle should be advanced

* Update schema def
2020-06-08 08:27:03 -05:00
SourceDoctor
b89eb22cd5 Enumerate AlertState (#11665)
* Enumerate AlertState

* fix typo

* add missing use's

* .

* .
2020-05-23 21:14:36 -05:00
Hayden
cdb6a74dc8 implement watchdog to librenms-service (#11353)
* add watchdog to librenms-service to check log file
add Redis timeout to librenms-service

* updated docs

* fixed logfile_watchdog() indentation in service.py

* indentation fix

* code climate patch

* updated default redis timeout if alerting frequency is 0
2020-03-31 23:10:45 -05:00
bewing
74724a4618 Add redis sentinel support to dispatcher service (#10598)
* Add redis sentinel support to dispatcher service

* Update docs for redis sentinel support

* Don't re-raise python exception in service
2019-10-01 06:51:07 +00:00
Tony Murray
cf35d99319 Warn maintenance tasks are disabled (#10273) 2019-06-06 23:41:00 -05:00
Tony Murray
ecc05b07fb Fix couldn't disable alerting (#10258)
service_alerting_enable will now properly disable alerting (may be set globally or per node)
service_alerting_frequency will now properly control frequency (set globally the same, via db is best)
2019-05-23 16:07:45 -05:00
Tony Murray
604a200891 Python dispatcher service v2 (#10050)
* Refactor LibreNMS service
add ping

* services ported
remote legacy stats collection

* alerting

* implement unique queues

* update discovery queue manager

* remove message

* more cleanup

* Don't shuffle queue

* clean up imports

* don't try to discover ping only devices

* Fix for discovery not running timer

* Update docs a bit and add some additional config options.
Intentionally undocumented.

* Wait until the device is marked up by the poller before discovering

* Handle losing connection to db gracefully

* Attempt to release master after 5 db failures

* Sleep to give other nodes a chance to acquire

* Update docs and rename the doc to Dispatcher Service to more accurately reflect its function.

* add local notification
2019-05-20 11:35:47 -05:00
Tony Murray
0ba76e6d62 New python service for poller, discovery + more (#8455)
Currently has a file handle leak (and will eventually run out of handles) related to the self update process.

Either need to fix that or rip out self-update and leave that up to cron or something.


2018-06-30 12:19:49 +01:00