Scenario 1¶
Context¶
In this use case, the application is distributed over 3 nodes. The process distribution is fixed. The application logs and other data are written to a disk that is made available through an NFS mount point.
Requirements¶
Here are the use case requirements:
- Requirement 1
Due to the inter-process communication scheme, the process distribution shall be fixed.
- Requirement 2
The application shall wait for the NFS mount point to be available before it starts.
- Requirement 3
An operational status of the application shall be provided.
- Requirement 4
The user shall not be able to start an unexpected application process on any other node.
- Requirement 5
The application shall be restarted on the 3 nodes upon user request.
- Requirement 6
There shall be a non-distributed configuration for developers’ use, assuming a different inter-process communication scheme.
- Requirement 7
The non-distributed configuration shall not wait for the NFS mount point.
Supervisor configuration¶
There are undoubtedly many ways to skin a cat. Here follows one solution.
As an answer to Requirement 1 and Requirement 4, let’s split the Supervisor configuration file into 4 parts:
- the supervisord.conf configuration file;
- the program definitions and the group definition (.ini files) for the first node;
- the program definitions and the group definition (.ini files) for the second node;
- the program definitions and the group definition (.ini files) for the third node.
All programs are configured using autostart=true.
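For illustration, here is a minimal sketch of what the .ini files of the first node might contain. The program names are those used later in this document; the command paths are assumptions, and the group is assumed to be named scenario_1, consistently with the restart command shown further down in this section.
# programs_rocky51.ini - sketch of the program definitions for the first node
# (the command paths are assumptions)
[program:scen1_hci]
command = /opt/scen1/bin/hci
autostart = true

[program:scen1_config_manager]
command = /opt/scen1/bin/config_manager
autostart = true

# scen1_data_processing, scen1_external_interface and scen1_data_recorder
# are defined the same way

# group_rocky51.ini - sketch of the group definition for the first node
[group:scenario_1]
programs = scen1_hci,scen1_config_manager,scen1_data_processing,scen1_external_interface,scen1_data_recorder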
To ease packaging, the full configuration is delivered to all nodes, but the include section of the supervisord.conf configuration file uses host_node_name so that the running configuration actually differs on each node.
[include]
files = %(host_node_name)s/*.ini
The resulting file tree would be as follows.
[bash] > tree
.
├── etc
│ ├── rocky51
│ │ ├── group_rocky51.ini
│ │ └── programs_rocky51.ini
│ ├── rocky52
│ │ ├── group_rocky52.ini
│ │ └── programs_rocky52.ini
│ ├── rocky53
│ │ ├── group_rocky53.ini
│ │ └── programs_rocky53.ini
│ └── supervisord.conf
For Requirement 6, let’s just define a group where all programs are declared. The proposal is to have 2 Supervisor configuration files, one for the distributed application and the other for the non-distributed application, the variation being just in the include section.
[bash] > tree
.
├── etc
│ ├── rocky51
│ │ ├── group_rocky51.ini
│ │ └── programs_rocky51.ini
│ ├── rocky52
│ │ ├── group_rocky52.ini
│ │ └── programs_rocky52.ini
│ ├── rocky53
│ │ ├── group_rocky53.ini
│ │ └── programs_rocky53.ini
│ ├── localhost
│ │ ├── group_localhost.ini
│ │ └── programs_localhost.ini
│ ├── supervisord.conf -> supervisord_distributed.conf
│ ├── supervisord_distributed.conf
│ ├── supervisord_localhost.conf
│   └── supvisors_rules.xml
Here are the resulting include sections:
# include section for distributed application in supervisord_distributed.conf
[include]
files = %(host_node_name)s/*.ini
# include section for non-distributed application in supervisord_localhost.conf
[include]
files = localhost/*.ini
About Requirement 2, Supervisor does not provide any facility to stage the starting sequence (refer to Issue #122 - supervisord Starts All Processes at the Same Time). A workaround here would be to insert a wait loop into all the application programs (in the program command line or in the program source code). The idea of pushing this wait loop outside the Supervisor scope - just before starting supervisord - is excluded, as it would impose this dependency on other applications possibly managed by Supervisor.
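As an illustration of this workaround, here is a minimal sketch of a hypothetical wrapper script that could be set in the program command lines. The script name, the mount point path and the program paths are assumptions.
#!/bin/bash
# scen1_wrapper.sh - hypothetical wrapper inserted in the Supervisor program command line
# wait until the NFS mount point is available, then hand over to the real program
NFS_MOUNT=/mnt/scen1_data
until mountpoint -q "$NFS_MOUNT"
do
    sleep 1
done
exec "$@"
The program definition would then use something like command = /opt/scen1/bin/scen1_wrapper.sh /opt/scen1/bin/hci.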
With regard to Requirement 7, this workaround would require different program commands or parameters, and thus different program definitions from a Supervisor configuration perspective.
Supervisor provides nothing for Requirement 3. The user has to evaluate the operational status based on the process status provided by the Supervisor instances on the 3 nodes, either using multiple supervisorctl shell commands, XML-RPCs or event listeners.
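For example, a rough operational status could be built from the process information returned by the 3 Supervisor instances over XML-RPC, as in the following sketch. The XML-RPC port (61000) and the group name (scenario_1, as used in the restart command below) are assumptions.
# rough operational status built at application level (sketch)
from supervisor.childutils import getRPCInterface

NODES = ['rocky51', 'rocky52', 'rocky53']

def scenario_1_operational() -> bool:
    """Return True if all scenario_1 processes are RUNNING on the 3 nodes."""
    for node in NODES:
        # assumed XML-RPC port: 61000
        proxy = getRPCInterface({'SUPERVISOR_SERVER_URL': f'http://{node}:61000'})
        for info in proxy.supervisor.getAllProcessInfo():
            if info['group'] == 'scenario_1' and info['statename'] != 'RUNNING':
                return False
    return True

if __name__ == '__main__':
    print('scenario_1 operational:', scenario_1_operational())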
To restart the whole application (Requirement 5), the user can perform supervisorctl shell commands or XML-RPCs on each Supervisor instance.
[bash] > for i in rocky51 rocky52 rocky53
... do
... supervisorctl -s http://$i:<port> restart scenario_1:*
... done
In the end, all the requirements could be met using Supervisor alone, but it would require additional software development at the application level to build an operational status based on the process information provided by Supervisor.
It would also require some additional complexity in the configuration files and in the program command lines to manage a staged starting sequence of the programs in the group and to manage the distribution of the application over different platforms.
Involving Supvisors¶
A solution based on Supvisors could use the following Supervisor configuration (same principles as in the previous section):
- the supervisord_distributed.conf configuration file for the distributed application;
- the supervisord_localhost.conf configuration file for the non-distributed application;
- the program definitions and the group definition (.ini files) for the first node;
- the program definitions and the group definition (.ini files) for the second node;
- the program definitions and the group definition (.ini files) for the third node;
- the group definition including all application programs for a local node.
All programs are now configured using autostart=false.
Introducing the staged start sequence¶
About Requirement 2, Supvisors manages staged starting sequences and offers the possibility to wait for a planned exit of a process in the sequence. So let’s define one scen1_wait_nfs_mount[_X] program per node, whose role is to exit (using an expected exit code, as defined in the Supervisor program configuration) as soon as the NFS mount is available.
Satisfying Requirement 7 is then just a matter of not including the scen1_wait_nfs_mount[_X] programs in the Supervisor configuration file of the non-distributed application. That’s why the Supervisor configuration of these programs is isolated from the configuration of the other programs. That way, Supvisors makes it possible to deal with such a requirement without any impact on the program definitions, scripts and source code.
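As an illustration, the isolated configuration of these programs could look like the following sketch for the first node. The command and the mount point path are assumptions; the only firm points are that the program exits with an expected exit code once the NFS mount is available, and that it belongs to the same group as the other programs of its node.
# rocky51/wait_nfs_mount.ini - sketch of the isolated program definition
# (the command and the mount point path are assumptions)
[program:scen1_wait_nfs_mount_1]
command = bash -c 'until mountpoint -q /mnt/scen1_data ; do sleep 1 ; done'
autostart = false
autorestart = false
startsecs = 0
exitcodes = 0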
Here follows what the include section may look like in both Supervisor configuration files.
# include section for distributed application in supervisord_distributed.conf (unchanged)
[include]
files = %(host_node_name)s/*.ini
# include section for non-distributed application in supervisord_localhost.conf
# the same program definitions as the distributed application are used
[include]
files = */programs_*.ini localhost/group_localhost.ini
Rules file¶
Now that programs are not started automatically by Supervisor, a Supvisors rules file is needed to define the staged starting sequence. A first naive - yet functional - approach would be to use a model for all programs to be started on the same node.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<!-- models -->
<model name="model_rocky51">
<identifiers>rocky51</identifiers>
<start_sequence>2</start_sequence>
<required>true</required>
</model>
<model name="model_rocky52">
<reference>model_rocky51</reference>
<identifiers>rocky52</identifiers>
</model>
<model name="model_rocky53">
<reference>model_rocky51</reference>
<identifiers>rocky53</identifiers>
</model>
<!-- Scenario 1 Application -->
<application name="scen1">
<start_sequence>1</start_sequence>
<starting_failure_strategy>CONTINUE</starting_failure_strategy>
<programs>
<!-- Programs on rocky51 -->
<program name="scen1_hci">
<reference>model_rocky51</reference>
</program>
<program name="scen1_config_manager">
<reference>model_rocky51</reference>
</program>
<program name="scen1_data_processing">
<reference>model_rocky51</reference>
</program>
<program name="scen1_external_interface">
<reference>model_rocky51</reference>
</program>
<program name="scen1_data_recorder">
<reference>model_rocky51</reference>
</program>
<program name="scen1_wait_nfs_mount_1">
<reference>model_rocky51</reference>
<start_sequence>1</start_sequence>
<wait_exit>true</wait_exit>
</program>
<!-- Programs on rocky52 -->
<program name="scen1_sensor_acquisition_1">
<reference>model_rocky52</reference>
</program>
<program name="scen1_sensor_processing_1">
<reference>model_rocky52</reference>
</program>
<program name="scen1_wait_nfs_mount_2">
<reference>model_rocky52</reference>
<start_sequence>1</start_sequence>
<wait_exit>true</wait_exit>
</program>
<!-- Programs on rocky53 -->
<program name="scen1_sensor_acquisition_2">
<reference>model_rocky53</reference>
</program>
<program name="scen1_sensor_processing_2">
<reference>model_rocky53</reference>
</program>
<program name="scen1_wait_nfs_mount_3">
<reference>model_rocky53</reference>
<start_sequence>1</start_sequence>
<wait_exit>true</wait_exit>
</program>
</programs>
</application>
</root>
Note
About the choice to prefix all program names with ‘scen1_’
These programs are all included in a Supervisor group named scen1, so it may seem redundant to repeat that information in the program names. Actually, the program names are quite generic and, at some point, the intention is to group all the applications of the different use cases into a single Supvisors configuration. Adding scen1 at this point simply avoids overwriting program definitions between use cases.
Note
A few words about how the scen1_wait_nfs_mount[_X] programs have been introduced here. It has to be noted that:
- the start_sequence of these programs is lower than the start_sequence of the other application programs;
- their wait_exit attribute is set to true.
The consequence is that the 3 programs scen1_wait_nfs_mount[_X] are started first on their respective node when starting the scen1 application. Then Supvisors waits for all of them to exit before it triggers the starting of the other programs.
Now, assuming that the node name can be included as a prefix in the program names, the rules file can be simplified a bit.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<!-- models -->
<model name="model_rocky51">
<identifiers>rocky51</identifiers>
<start_sequence>2</start_sequence>
<required>true</required>
</model>
<model name="model_rocky52">
<reference>model_rocky51</reference>
<identifiers>rocky52</identifiers>
</model>
<model name="model_rocky53">
<reference>model_rocky51</reference>
<identifiers>rocky53</identifiers>
</model>
<!-- Scenario 1 Application -->
<application name="scen1">
<start_sequence>1</start_sequence>
<starting_failure_strategy>CONTINUE</starting_failure_strategy>
<programs>
<!-- Programs on rocky51 -->
<program pattern="rocky51_">
<reference>model_rocky51</reference>
</program>
<program name="scen1_wait_nfs_mount_1">
<reference>model_rocky51</reference>
<start_sequence>1</start_sequence>
<wait_exit>true</wait_exit>
</program>
<!-- Programs on rocky52 -->
<program pattern="rocky52_">
<reference>model_rocky52</reference>
</program>
<program name="scen1_wait_nfs_mount_2">
<reference>model_rocky52</reference>
<start_sequence>1</start_sequence>
<wait_exit>true</wait_exit>
</program>
<!-- Programs on rocky53 -->
<program pattern="rocky53_">
<reference>model_rocky53</reference>
</program>
<program name="scen1_wait_nfs_mount_3">
<reference>model_rocky53</reference>
<start_sequence>1</start_sequence>
<wait_exit>true</wait_exit>
</program>
</programs>
</application>
</root>
A bit shorter and still functional, but the program names are now quite ugly. And the non-distributed version has not been considered yet. With this approach, a different rules file is required to replace the node names with the developer’s host name - assumed to be rocky51 here for the example.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<!-- Scenario 1 Application -->
<application name="scen1">
<start_sequence>1</start_sequence>
<starting_failure_strategy>CONTINUE</starting_failure_strategy>
<programs>
<!-- Programs on localhost -->
<program pattern="">
<identifiers>rocky51</identifiers>
<start_sequence>1</start_sequence>
<required>true</required>
</program>
</programs>
</application>
</root>
This rules file is fairly simple here, as all programs have exactly the same rules.
Hint
When the same rules apply to all programs in an application, an empty pattern can be used as it will match all program names of the application.
But there is actually a much simpler solution in the present case. Let’s consider this instead:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root>
<!-- models -->
<model name="model_scenario_1">
<start_sequence>2</start_sequence>
<required>true</required>
</model>
<!-- Scenario 1 Application -->
<application name="scen1">
<start_sequence>1</start_sequence>
<starting_failure_strategy>CONTINUE</starting_failure_strategy>
<programs>
<program pattern="">
<reference>model_scenario_1</reference>
</program>
<program pattern="wait_nfs_mount">
<reference>model_scenario_1</reference>
<start_sequence>1</start_sequence>
<wait_exit>true</wait_exit>
</program>
</programs>
</application>
</root>
Much shorter, yet it does the same job, for both the distributed application and the non-distributed application!
The main point is that the identifiers attribute is not used at all. Clearly, this gives Supvisors the authorization to start all programs on every node. However, Supvisors knows about the Supervisor configuration of the 3 nodes. When choosing a node to start a program, Supvisors considers the intersection between the authorized nodes - all of them here - and the possible nodes, i.e. the active nodes where the program is defined in Supervisor. One of the first decisions in this use case is that every program is known to only one Supervisor instance, which leaves Supvisors with only one possibility.
For Requirement 3, Supvisors provides the operational status of the application based on the status of its processes, in accordance with their importance. In the present example, all programs are defined with the same importance (required set to true).
The key point here is that Supvisors is able to build a single application from the processes configured on the 3 nodes because the same group name (scen1) is used in all Supervisor configuration files. This also explains why scen1_wait_nfs_mount[_X] has been suffixed with a number. Otherwise, Supvisors would have detected 3 running instances of the same program in a Managed application, which is considered as a conflict and leads to a Conciliation phase. Please refer to Conciliation for more details.
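To illustrate this, the group definitions of the 3 nodes could all use the same scen1 group name, each listing only the programs of its own node. A sketch, with the program names taken from the rules file above:
# group_rocky51.ini - sketch
[group:scen1]
programs = scen1_hci,scen1_config_manager,scen1_data_processing,scen1_external_interface,scen1_data_recorder,scen1_wait_nfs_mount_1

# group_rocky52.ini - sketch
[group:scen1]
programs = scen1_sensor_acquisition_1,scen1_sensor_processing_1,scen1_wait_nfs_mount_2

# group_rocky53.ini - sketch
[group:scen1]
programs = scen1_sensor_acquisition_2,scen1_sensor_processing_2,scen1_wait_nfs_mount_3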
Here follow the relevant sections of the supervisord_distributed.conf configuration file, including the declaration of the Supvisors plugin.
[include]
files = %(host_node_name)s/*.ini
[rpcinterface:supvisors]
supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface
supvisors_list = rocky51,rocky52,rocky53
rules_files = etc/supvisors_rules.xml
[ctlplugin:supvisors]
supervisor.ctl_factory = supvisors.supvisorsctl:make_supvisors_controller_plugin
And the equivalent in the supervisord_localhost.conf configuration file. No supvisors_list is provided here, as the default value is the local host name, which is perfectly suitable in this case.
[include]
files = */programs_*.ini localhost/group_localhost.ini
[rpcinterface:supvisors]
supervisor.rpcinterface_factory = supvisors.plugin:make_supvisors_rpcinterface
rules_files = etc/supvisors_rules.xml
[ctlplugin:supvisors]
supervisor.ctl_factory = supvisors.supvisorsctl:make_supvisors_controller_plugin
The final file tree is as follows.
[bash] > tree
.
├── etc
│ ├── rocky51
│ │ ├── group_rocky51.ini
│ │ ├── programs_rocky51.ini
│ │ └── wait_nfs_mount.ini
│ ├── rocky52
│ │ ├── group_rocky52.ini
│ │ ├── programs_rocky52.ini
│ │ └── wait_nfs_mount.ini
│ ├── rocky53
│ │ ├── group_rocky53.ini
│ │ ├── programs_rocky53.ini
│ │ └── wait_nfs_mount.ini
│ ├── localhost
│ │ └── group_localhost.ini
│ ├── supervisord.conf -> supervisord_distributed.conf
│ ├── supervisord_distributed.conf
│ ├── supervisord_localhost.conf
│ └── supvisors_rules.xml
Control & Status¶
The operational status of Scenario 1 required by Requirement 3 is made available through:
- the Application Page of the Supvisors Web UI, as a LED near the application state,
- the XML-RPC API (example below),
- the REST API (if supvisorsflask is started),
- the status commands of the extended supervisorctl or supvisorsctl (example below),
- the Event interface.
>>> from supervisor.childutils import getRPCInterface
>>> proxy = getRPCInterface({'SUPERVISOR_SERVER_URL': 'http://localhost:61000'})
>>> proxy.supvisors.get_application_info('scen1')
{'application_name': 'scen1', 'statecode': 2, 'statename': 'RUNNING', 'major_failure': False, 'minor_failure': False}
[bash] > supervisorctl -c etc/supervisord_localhost.conf application_info scen1
Node State Major Minor
scen1 RUNNING True False
[bash] > supvisorsctl -s http://localhost:61000 application_info scen1
Node State Major Minor
scen1 RUNNING True False
To restart the whole application (Requirement 5), the following means are available:
- the XML-RPC API (example below),
- the REST API (if supvisorsflask is started),
- the application control commands of the extended supervisorctl or supvisorsctl (example below),
- the restart button at the top right of the Application Page of the Supvisors Web UI.
>>> from supervisor.childutils import getRPCInterface
>>> proxy = getRPCInterface({'SUPERVISOR_SERVER_URL': 'http://localhost:61000'})
>>> proxy.supvisors.restart_application('CONFIG', 'scen1')
True
[bash] > supervisorctl -c etc/supervisord_localhost.conf restart_application CONFIG scen1
scen1 restarted
[bash] > supvisorsctl -s http://localhost:61000 restart_application CONFIG scen1
scen1 restarted
Here is a snapshot of the Application page of the Supvisors Web UI for the Scenario 1 application.
As a conclusion, all the requirements are met using Supvisors and without any impact on the application to be supervised. Supvisors improves application control and status.
Example¶
The full example is available in Supvisors Use Cases - Scenario 1.