Feature #6023

open

test rack/lab power management software with usage/reference counting

Added by laforge 12 months ago. Updated 6 months ago.

Status: Stalled
Priority: Normal
Assignee:
Target version: -
Start date: 05/04/2023
Due date:
% Done: 40%
Spec Reference:

Description

Problem Statement

In our test/lab setups, we have a number of systems that run 24/7 but are actually used only a few hours per day. This became very visible after we started (a few months ago) to deploy tasmota/influxdb/grafana for plotting many different power rails.

Direct on/off switching from within a given test job only works if that test job is the only user of the given resource (such as a BTS in osmo-gsm-tester).

For jenkins builders, OBS workers and similar machines, there could be any number of concurrent users. So there's no single job that can power on the resource before using it, and power it off after it terminates.

What we need is a system that maintains a usage count, similar to how we do usage/reference counting in data structures in software development.

I've started to prototype a modular Python daemon that offers a REST API through which users can obtain usage tokens for named resources. The daemon then keeps track of the current use count and switches resources on/off as needed.
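The counting logic itself can be sketched in a few lines. This is not the prototype's code, just a minimal illustration under assumptions: the class name `UsageCounter` and the `power_on`/`power_off` callbacks are hypothetical, and a real daemon would also need locking and token expiry.

```python
from collections import defaultdict
from typing import Callable, Dict


class UsageCounter:
    """Hypothetical in-memory use counter (not the actual daemon code)."""

    def __init__(self, power_on: Callable[[str], None],
                 power_off: Callable[[str], None]) -> None:
        self._counts: Dict[str, int] = defaultdict(int)
        self._power_on = power_on
        self._power_off = power_off

    def acquire(self, resource: str) -> int:
        if self._counts[resource] == 0:
            self._power_on(resource)    # first user: power the resource up
        self._counts[resource] += 1
        return self._counts[resource]

    def release(self, resource: str) -> int:
        if self._counts[resource] == 0:
            raise ValueError(f"{resource} has no active users")
        self._counts[resource] -= 1
        if self._counts[resource] == 0:
            self._power_off(resource)   # last user gone: power it down
        return self._counts[resource]


# Two concurrent users of the same (made-up) resource name.
events = []
counter = UsageCounter(lambda r: events.append(("on", r)),
                       lambda r: events.append(("off", r)))
counter.acquire("build1")
counter.acquire("build1")
counter.release("build1")
counter.release("build1")
```

With two overlapping users, the power callbacks fire only once each: on the first acquire and on the last release.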

Jenkins jobs would then (e.g. in a pipeline) first obtain a usage token (which would implicitly power up the resource if it is not already powered), and release the usage token once they are done. This way we can power up build machines only when needed, saving significant electrical power, reducing noise and minimizing heat dissipation.
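From the job's point of view, the obtain/release pair could be wrapped so that the token is released even when the job fails. The following is only a sketch: the REST endpoints are not specified in this ticket, so `FakeTokenClient` is a hypothetical stand-in for whatever client talks to the daemon, and `using` is an invented helper name.

```python
from contextlib import contextmanager


class FakeTokenClient:
    """Hypothetical stand-in for a client of the daemon's REST API."""

    def __init__(self) -> None:
        self.calls = []
        self._next_id = 0

    def obtain(self, resource: str) -> int:
        self._next_id += 1
        self.calls.append(("obtain", resource))
        return self._next_id

    def release(self, token_id: int) -> None:
        self.calls.append(("release", token_id))


@contextmanager
def using(client, resource: str):
    """Hold a usage token for the duration of the with-block."""
    token = client.obtain(resource)
    try:
        yield token
    finally:
        client.release(token)   # runs even if the job body raises


client = FakeTokenClient()
try:
    with using(client, "build1"):
        raise RuntimeError("build failed")
except RuntimeError:
    pass
```

If the job crashes hard enough that the release never happens, the token's validity timeout would still return the resource eventually.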

Architecture / Class model

Resource

A Resource is typically some kind of physical equipment (server, build host, network equipment, ...) which one or multiple users may be using concurrently.

The Resource has a state, such as

State      Description
off        Powered down
powered    Powered up, but not reachable yet
available  Powered up and reachable (typically via network)

A Resource refers to a Switcher and an AvailabilityChecker.

A Resource keeps a list of UsageTokens, one for each concurrent user.
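As a rough illustration of the three states, the sketch below derives the state from two observations: whether the power is on and whether the availability check passes. The names `ResourceState`, `derive_state` and the fields of `Resource` are assumptions for illustration, not the prototype's actual class layout.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ResourceState(Enum):
    OFF = "off"
    POWERED = "powered"
    AVAILABLE = "available"


def derive_state(powered: bool, reachable: bool) -> ResourceState:
    """Map (power state, availability check result) to a Resource state."""
    if not powered:
        return ResourceState.OFF
    return ResourceState.AVAILABLE if reachable else ResourceState.POWERED


@dataclass
class Resource:
    name: str
    state: ResourceState = ResourceState.OFF
    tokens: List[str] = field(default_factory=list)  # one entry per concurrent user

    @property
    def in_use(self) -> bool:
        return bool(self.tokens)


host = Resource("build1")            # made-up resource name
host.tokens.append("token-a")
host.state = derive_state(powered=True, reachable=False)
```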

Switcher

A Switcher is something that can switch power. Possible implementations include
  • sispm compatible USB-switchable power sockets
  • Intellinet rackmount PDUs with IP/HTTP interface
  • Tasmota switchable power sockets
  • ethernet wake-on-lan (on) + ssh-based shutdown (off)
A Switcher implementation (inheriting from the abstract Switcher base class) provides methods for
  • changing the power state (on/off)
  • determining the current actual power state (if supported by hardware). This is important to get the state right at start-up time.
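The two methods listed above could look roughly like this in an abstract base class. The method names `set_power` and `get_power` are assumptions, and `DummySwitcher` is a purely in-memory stand-in rather than a driver for any of the hardware mentioned.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Switcher(ABC):
    """Sketch of an abstract base class; method names are hypothetical."""

    @abstractmethod
    def set_power(self, on: bool) -> None:
        """Switch the power on (True) or off (False)."""

    def get_power(self) -> Optional[bool]:
        """Return the actual power state, or None if the hardware cannot report it."""
        return None


class DummySwitcher(Switcher):
    """In-memory implementation, useful for testing the daemon logic."""

    def __init__(self, initially_on: bool = False) -> None:
        self._on = initially_on

    def set_power(self, on: bool) -> None:
        self._on = on

    def get_power(self) -> Optional[bool]:
        return self._on


sw = DummySwitcher(initially_on=True)
state_at_startup = sw.get_power()   # read back before assuming anything
sw.set_power(False)
```

Returning `None` from `get_power` is one way to represent hardware without read-back, so the daemon can tell "off" apart from "unknown" at start-up.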

SwitcherGroup

A SwitcherGroup is a logical group of multiple Switchers, for example the set of four switchable sispm sockets in one power strip, or the set of eight switchable power ports in an Intellinet PDU.
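One way to model this is a group object that owns the shared device and hands out one per-port object. This is a sketch under assumptions: `PortSwitcher`, `switcher()` and the in-memory `port_state` dictionary are invented for illustration; a real group would talk to the power strip or PDU instead.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PortSwitcher:
    """One switchable port that belongs to a group (hypothetical)."""
    group: "SwitcherGroup"
    port: int

    def set_power(self, on: bool) -> None:
        self.group.port_state[self.port] = on


@dataclass
class SwitcherGroup:
    """A device with several switchable ports, numbered from 1."""
    name: str
    num_ports: int
    port_state: Dict[int, bool] = field(init=False)

    def __post_init__(self) -> None:
        self.port_state = {p: False for p in range(1, self.num_ports + 1)}

    def switcher(self, port: int) -> PortSwitcher:
        if port not in self.port_state:
            raise ValueError(f"{self.name} has no port {port}")
        return PortSwitcher(self, port)


pdu = SwitcherGroup("pdu1", num_ports=8)   # made-up name
pdu.switcher(3).set_power(True)
```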

AvailabilityChecker

An AvailabilityChecker is something that can check the logical availability of a (powered-up) resource. One common example for an IP-attached resource is an ICMP echo request/response based check.
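A ping-based checker might be sketched as follows. Assumptions: the class and method names are hypothetical, the `ping -c 1 -W <seconds>` options follow Linux iputils, and the `run` parameter exists only so the check can be exercised without sending real packets.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import Callable, List


class AvailabilityChecker(ABC):
    @abstractmethod
    def is_available(self) -> bool:
        """Return True if the resource is logically reachable."""


class PingChecker(AvailabilityChecker):
    """ICMP echo check via the system 'ping' binary (Linux iputils options assumed)."""

    def __init__(self, host: str, timeout_s: int = 1,
                 run: Callable = subprocess.run) -> None:
        self.host = host
        self.timeout_s = timeout_s
        self._run = run   # injectable for testing

    def command(self) -> List[str]:
        # -c 1: send a single echo request; -W: seconds to wait for the reply
        return ["ping", "-c", "1", "-W", str(self.timeout_s), self.host]

    def is_available(self) -> bool:
        result = self._run(self.command(),
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
        return result.returncode == 0


checker = PingChecker("192.0.2.10")   # documentation address, not a real host
cmd = checker.command()
```

Other checks (for example a TCP connect to the ssh port) could implement the same `is_available` method.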

UsageToken

A UsageToken for a Resource must be obtained by any user intending to use the Resource. The UsageToken has a validity time (in seconds), after which it automatically expires.

Release of the UsageToken can hence be either explicit (via REST API after the user is done) or implicit (timeout of the validity period).
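The explicit and implicit release paths can be illustrated with a small sketch. The field names and the `prune` helper are assumptions; the daemon would presumably run something like `prune` periodically and power a resource off once its token list becomes empty.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UsageToken:
    resource: str
    validity_s: float                                  # validity time in seconds
    issued_at: float = field(default_factory=time.monotonic)
    token_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def expired(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now - self.issued_at >= self.validity_s


def prune(tokens: List[UsageToken], now: Optional[float] = None) -> List[UsageToken]:
    """Implicit release: drop every token whose validity period has elapsed."""
    return [t for t in tokens if not t.expired(now)]


# Timestamps are passed explicitly here so the example does not depend on the clock.
short = UsageToken("build1", validity_s=60, issued_at=100.0)
long_ = UsageToken("build1", validity_s=3600, issued_at=100.0)
remaining = prune([short, long_], now=200.0)

# Explicit release: the user hands the token back before it expires.
remaining = [t for t in remaining if t.token_id != long_.token_id]
```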

Actions #1

Updated by laforge 12 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 10
Actions #2

Updated by laforge 11 months ago

  • % Done changed from 10 to 40
Actions #3

Updated by laforge 6 months ago

  • Status changed from In Progress to Stalled