test rack/lab power management software with usage/reference counting
In our test/lab setups, we have a number of systems that run 24/7 but are actually used for only a few hours per day. This became very visible after we started (a few months ago) to deploy Tasmota/InfluxDB/Grafana for plotting many different power rails.
Direct on/off switching from within a given test job only works if that test job is the only user of the given resource (such as a BTS in osmo-gsm-tester).
For Jenkins builders, OBS workers and similar machines, there could be any number of concurrent users. So there's no single job that can power on the resource before using it and power it off after it terminates.
What we need is a system that maintains a usage count, similar to how we do usage/reference counting in data structures in software development.
I've started to prototype a modular python daemon which offers a REST API over which users can obtain usage tokens for named resources. The daemon then keeps track of the current use count and switches resources on/off as needed.
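The usage-counting idea can be sketched in a few lines of Python. This is only an in-memory illustration, not the code of the actual daemon: the class name `RefCountedResource` and the `power_on`/`power_off` callbacks are hypothetical, and token expiry is left out.

```python
import uuid


class RefCountedResource:
    """Hypothetical model: power follows the number of usage tokens held."""

    def __init__(self, name, power_on, power_off):
        self.name = name
        self._power_on = power_on    # called on the 0 -> 1 transition
        self._power_off = power_off  # called on the 1 -> 0 transition
        self._tokens = set()

    @property
    def use_count(self):
        return len(self._tokens)

    def obtain(self):
        """Hand out a new token; power up if this is the first user."""
        if not self._tokens:
            self._power_on()
        token = uuid.uuid4().hex
        self._tokens.add(token)
        return token

    def release(self, token):
        """Give a token back; power down once the last user is gone."""
        self._tokens.remove(token)  # raises KeyError for an unknown token
        if not self._tokens:
            self._power_off()
```

Only the first `obtain()` and the last `release()` touch the power switch; every call in between merely changes the count.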
Jenkins jobs would then (e.g. in a pipeline) first obtain a usage token (which would implicitly power up the resource if it is not already powered), and release the usage token after they're done. This way we can power up build machines only when needed, saving significant electrical power, reducing noise and minimizing heat dissipation.
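From the point of view of a job, the obtain/release pair could be wrapped in a context manager so that the token is released even if the job fails. The sketch below is purely illustrative: the URL layout (`/resources/<name>/tokens`), the `duration` field and the `id` field in the response are assumptions and are not taken from the daemon's actual REST API.

```python
import json
import urllib.request
from contextlib import contextmanager


def http_json(method, url, payload=None):
    """Send a JSON request and return the decoded JSON response (or None)."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


@contextmanager
def usage_token(base_url, resource, duration_s, request=http_json):
    """Hold a usage token for `resource` for the duration of the with-block."""
    # Hypothetical endpoint: obtaining a token implicitly powers up the resource.
    token = request("POST", f"{base_url}/resources/{resource}/tokens",
                    {"duration": duration_s})
    try:
        yield token
    finally:
        # Hypothetical endpoint: explicit release once the job is done.
        request("DELETE", f"{base_url}/resources/{resource}/tokens/{token['id']}")
```

A job would then run its build steps inside `with usage_token(base_url, "build1", 3600):`, where `base_url` and `"build1"` are again placeholders.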
Architecture / Class model
A Resource is typically some kind of physical equipment (server, build host, network equipment, ...) which one or multiple users may be using concurrently.
The Resource has a state, such as
|powered|Powered up, but not reachable yet|
|available|Powered up and reachable (typically via network)|
A Resource refers to a Switcher and an AvailabilityChecker.
A Resource keeps a list of UsageTokens, one for each concurrent user.
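How the state of a Resource could be derived from its Switcher and AvailabilityChecker is sketched below. The two callables are simplified stand-ins for those objects, and the `OFF` state is an assumption, since only the powered and available states are listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class ResourceState(Enum):
    OFF = "off"              # assumption: not part of the state list above
    POWERED = "powered"      # powered up, but not reachable yet
    AVAILABLE = "available"  # powered up and reachable


@dataclass
class Resource:
    name: str
    is_powered: Callable[[], bool]    # stand-in for the Switcher
    is_reachable: Callable[[], bool]  # stand-in for the AvailabilityChecker

    @property
    def state(self) -> ResourceState:
        if not self.is_powered():
            return ResourceState.OFF
        if self.is_reachable():
            return ResourceState.AVAILABLE
        return ResourceState.POWERED
```

Reachability is only consulted once the resource is powered, which matches the idea that the AvailabilityChecker checks a powered-up resource.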
Switcher
A Switcher is something that can switch power. Possible implementations include
- sispm compatible USB-switchable power sockets
- Intellinet rackmount PDUs with IP/HTTP interface
- Tasmota switchable power sockets
- ethernet wake-on-lan (on) + ssh-based shutdown (off)
A Switcher supports the following operations:
- changing the power state (on/off)
- determining the current actual power state (if supported by hardware). This is important to get the state right at start-up time.
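The two operations could map to an abstract base class along the following lines. All names here (`set_power`, `get_power`, `sync_at_startup`) are hypothetical; returning `None` from `get_power` is one possible way to express "not supported by hardware".

```python
from abc import ABC, abstractmethod
from typing import Optional


class Switcher(ABC):
    @abstractmethod
    def set_power(self, on: bool) -> None:
        """Change the power state."""

    def get_power(self) -> Optional[bool]:
        """Return the actual power state, or None if it cannot be read back."""
        return None


class InMemorySwitcher(Switcher):
    """Test double with state read-back."""

    def __init__(self, initially_on: bool = False):
        self._on = initially_on

    def set_power(self, on: bool) -> None:
        self._on = on

    def get_power(self) -> Optional[bool]:
        return self._on


class WriteOnlySwitcher(Switcher):
    """Test double without read-back; it only records the commands it gets."""

    def __init__(self):
        self.commands = []

    def set_power(self, on: bool) -> None:
        self.commands.append(on)


def sync_at_startup(switcher: Switcher, assumed_on: bool = False) -> bool:
    """Learn the real state at start-up, or force a known one if unreadable."""
    actual = switcher.get_power()
    if actual is None:
        switcher.set_power(assumed_on)
        return assumed_on
    return actual
```

Forcing a known state when read-back is unavailable is a design choice of this sketch, not something the description above prescribes.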
A SwitcherGroup is a logical group of multiple Switchers, for example the set of four switchable sispm sockets in one power strip, or the set of 8 switchable power ports in an Intellinet PDU.
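One way to model such a group is a single object per device that hands out one per-port switcher object for each socket. The sketch below keeps the port states in memory instead of talking to real hardware; the class and method names are hypothetical.

```python
class SwitcherGroup:
    """Hypothetical model of one device with several switchable ports."""

    def __init__(self, name, num_ports):
        self.name = name
        self._states = [False] * num_ports  # stands in for the real hardware

    def set_port(self, index, on):
        self._states[index] = on

    def get_port(self, index):
        return self._states[index]

    def switcher(self, index):
        """Return a per-port object that switches just this one socket."""
        if not 0 <= index < len(self._states):
            raise IndexError(f"{self.name} has no port {index}")
        return PortSwitcher(self, index)


class PortSwitcher:
    """One port of a SwitcherGroup; delegates all access to the group."""

    def __init__(self, group, index):
        self.group = group
        self.index = index

    def set_power(self, on):
        self.group.set_port(self.index, on)

    def get_power(self):
        return self.group.get_port(self.index)
```

Keeping the device access in the group means that connection details (USB handle, IP address) exist once per device rather than once per socket.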
An AvailabilityChecker is something that can check the logical availability of a (powered-up) resource. One common example for an IP-attached resource is an ICMP echo request/response based check.
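A minimal sketch of an ICMP-based check, assuming the Linux iputils `ping` binary is installed (`-c` sets the packet count, `-W` the reply timeout in seconds). The class name `PingChecker` and the injectable `run` parameter are hypothetical; the latter only exists so the check can be exercised without a network.

```python
import subprocess


class PingChecker:
    """Hypothetical checker: available if one ICMP echo request is answered."""

    def __init__(self, host, timeout_s=2):
        self.host = host
        self.timeout_s = timeout_s

    def command(self):
        # Assumes Linux iputils ping: one packet, wait timeout_s for the reply.
        return ["ping", "-c", "1", "-W", str(self.timeout_s), self.host]

    def is_available(self, run=subprocess.run):
        try:
            result = run(
                self.command(),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=self.timeout_s + 1,  # safety net if ping itself hangs
            )
        except (subprocess.TimeoutExpired, OSError):
            return False
        return result.returncode == 0
```

Calling an external `ping` avoids the raw-socket privileges that sending ICMP directly from Python would typically need.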
A UsageToken for a Resource must be obtained by any user intending to use the Resource. The UsageToken has a validity time (in seconds), after which it automatically expires.
Release of the UsageToken can hence be either explicit (via REST API after the user is done) or implicit (timeout of the validity period).
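The implicit release could be implemented by stamping each token with its creation time and periodically dropping expired ones. The sketch below is an assumption about how this might look; field names such as `duration_s` and the helper `drop_expired` are hypothetical.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class UsageToken:
    resource: str
    duration_s: float  # validity time in seconds
    created: float = field(default_factory=time.monotonic)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def expired(self, now=None):
        """True once the validity period has elapsed."""
        now = time.monotonic() if now is None else now
        return now >= self.created + self.duration_s


def drop_expired(tokens, now=None):
    """Implicit release: keep only the tokens that are still valid."""
    return [t for t in tokens if not t.expired(now)]
```

A monotonic clock is used so that wall-clock adjustments (e.g. by NTP) cannot shorten or extend a token's validity. Once the remaining list is empty, the resource can be switched off exactly as after an explicit release.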
Updated by laforge 26 days ago
- Status changed from New to In Progress
- % Done changed from 0 to 10
initial skeleton code in https://gitea.osmocom.org/laforge/osmo-lpmgd