Conductor Documentation

WRCP Plugin

Introduction

This plugin has two main capabilities: discovering and managing existing Wind River Cloud Platform (WRCP) installations, and creating new WRCP installations. For management, the plugin can discover system controllers and subclouds, upgrade Kubernetes, upgrade Trident, audit certificates, “upload and apply” patches, and manage WRA. For installation, the plugin can provision a fully working WRCP installation in AIO-SX and AIO-DX configurations.

NOTE:

WRCP Management

Each item on the following list is materialized as a workflow in the plugin.

Prerequisites

All stories require that you provide secrets for the following values:

You are free to change the names of the secrets. However, you must provide the secret names in the deployment inputs when enrolling a new system.

Node Types

cloudify.nodes.starlingx.WRCP This node represents a WRCP System. A system can be a System Controller, Standalone system, or a Subcloud.

Properties

Runtime Properties:

Labels

The following are possible for a WRCP system deployment:

Sites

During installation, the plugin requests the system’s location (latitude and longitude) from the StarlingX API and creates a Site in Conductor. You can see each site’s location on the map on the Dashboard screen.

discover_and_deploy

This workflow discovers and deploys subclouds. The steps followed are listed next.

  1. Checks if there are discovered subclouds using the “Discover Subclouds” workflow, then gets the controller node instance, and uses the parent deployment’s capabilities to get auth data.

  2. Checks if the deployment_id passed from the workflow is valid and if there are 2 subclouds or more in the controller node instance runtime properties. If not, raises an Exception.

  3. For each subcloud, checks whether it has already been discovered and deployed. If not, checks whether the subcloud’s address is IPv4 or IPv6, builds a dict of “inputs”, adds the labels, and appends the deployment ID, the inputs, and the labels to their respective lists.

  4. To complete, calls deploy_subclouds with the data gathered in the previous step.
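
The following is a minimal sketch of steps 2 and 3, not the plugin's actual code. The function name `build_subcloud_deployments`, the shape of the subcloud records (`address`, `deployed`), the deployment ID format, and the input keys are assumptions made for illustration. Only the `csys-obj-parent` label and the IPv4/IPv6 distinction come from this document; the address check relies on Python's standard `ipaddress` module.

```python
import ipaddress


def build_subcloud_deployments(parent_id, subclouds):
    """Return (deployment_ids, inputs, labels) for subclouds not yet deployed.

    `subclouds` is a hypothetical mapping of subcloud name to a record with
    an "address" string and an optional "deployed" flag.
    """
    if not parent_id:
        raise ValueError("a valid deployment_id is required")
    if len(subclouds) < 2:
        raise ValueError("expected at least 2 discovered subclouds")

    deployment_ids, inputs, labels = [], [], []
    for name, record in sorted(subclouds.items()):
        if record.get("deployed"):
            continue  # already discovered and deployed
        # ip_address() raises ValueError if the string is neither IPv4 nor IPv6.
        address = ipaddress.ip_address(record["address"])
        # Bracket IPv6 literals so they can later be embedded in a URL.
        host = str(address) if address.version == 4 else "[{}]".format(address)
        deployment_ids.append("{}-{}".format(parent_id, name))
        inputs.append({"address": host, "ip_version": address.version})
        labels.append({"csys-obj-parent": parent_id})
    return deployment_ids, inputs, labels
```

The three returned lists line up by index, which mirrors the idea of handing one batch of IDs, inputs, and labels to deploy_subclouds in step 4.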

upload_and_apply_patch

Workflow to install patches. It includes the following steps:

Parameters:

      type_names:
        default: []
        description: |
          Type names for which the workflow will execute the update operation.
          By default the operation will execute for nodes of type
          cloudify.nodes.starlingx.WRCP
      node_ids:
        default: []
        description: |
          Node template IDs for which the workflow will execute the update
          operation
      node_instance_ids:
        default: []
        description: |
          Node instance IDs for which the workflow will execute the update
          operation
      patch_dir:
        default: ''
        description: |
          The path to a directory on the manager where the patches are located.
          The patches will be uploaded and applied.
      max_parallel_subclouds:
        default: 1
        description: |
          The maximum number of workers within a subcloud to update in
          parallel
      stop_on_failure:
        default: True
        description: |
          Flag to indicate if the update should stop
          updating additional subclouds if a failure
          is encountered
      subcloud_apply_type:
        default: 'serial'
        description: |
          The apply type for the update. serial or parallel
      strategy_action:
        default: 'apply'
        description: |
          Perform an action on the update strategy.
          Valid values are: apply, or abort
      apply_patches:
        type: boolean
        default: True
        description: |
          Flag to indicate if the patches should be applied.
          When the flag is False, the update is also skipped.
      install_patches:
        type: boolean
        default: True
        description: |
          Flag to indicate if the patches should be installed.
      check_status:
        type: boolean
        default: True
        description: |
          Flag to indicate if status of installation should be checked.
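
As an illustration of the value constraints listed above, here is a small, hypothetical pre-flight check that could be run before starting the workflow. The function name `check_patch_parameters` is an assumption, and so is the rule that max_parallel_subclouds must be a positive integer. The accepted values for subcloud_apply_type and strategy_action, and the defaults, are taken from the parameter descriptions.

```python
VALID_APPLY_TYPES = {"serial", "parallel"}
VALID_ACTIONS = {"apply", "abort"}


def check_patch_parameters(params):
    """Return a list of problems found in upload_and_apply_patch parameters."""
    problems = []
    # Missing keys fall back to the defaults documented for the workflow.
    if params.get("subcloud_apply_type", "serial") not in VALID_APPLY_TYPES:
        problems.append("subcloud_apply_type must be serial or parallel")
    if params.get("strategy_action", "apply") not in VALID_ACTIONS:
        problems.append("strategy_action must be apply or abort")
    max_parallel = params.get("max_parallel_subclouds", 1)
    # Assumption: only positive integers make sense here (bool is excluded
    # explicitly because it is a subclass of int in Python).
    if (not isinstance(max_parallel, int) or isinstance(max_parallel, bool)
            or max_parallel < 1):
        problems.append("max_parallel_subclouds must be a positive integer")
    return problems
```

An empty result means the parameters passed these basic checks; it does not guarantee that the platform will accept them.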

check_update_status

Checks the status of the strategy steps and updates the labels when the status is failed or complete.

Parameters:

      type_names:
        default: [ ]
        description: |
          Type names for which the workflow will execute the update
          check_update_status operation.
          By default the operation will execute for nodes of type
          cloudify.nodes.starlingx.WRCP
      node_ids:
        default: [ ]
        description: |
          Node template IDs for which the workflow will execute the
          check_update_status operation
      node_instance_ids:
        default: [ ]
        description: |
          Node instance IDs for which the workflow will execute the
          check_update_status operation

refresh_status

This workflow starts check_update_status on each subcloud.

Parameters:

      type_names:
        default: []
        description: |
          Type names for which the workflow will execute the refresh_status operation.
          By default the operation will execute for nodes of type
          cloudify.nodes.starlingx.WRCP
      node_ids:
        default: []
        description: |
          Node template IDs for which the workflow will execute the refresh_status
          operation
      node_instance_ids:
        default: []
        description: |
          Node instance IDs for which the workflow will execute the refresh_status
          operation

Upgrade

ATTENTION: Before starting the workflow, make sure that the files below do not exist on controller-0. They are copied automatically from the load installed during the upgrade.

This workflow upgrades the software version on nodes, deployment managers, and subclouds. It contains the following steps:

Parameters:

      type_names:
        default: []
        description: |
          Type names for which the workflow will execute the upgrade operation.
          By default the operation will execute for nodes of type
          cloudify.nodes.starlingx.WRCP
      node_ids:
        default: []
        description: |
          Node template IDs for which the workflow will execute the upgrade
          operation
      node_instance_ids:
        default: []
        description: |
          Node instance IDs for which the workflow will execute the upgrade
          operation
      sw_version:
        default: ''
        description: |
          SW version to upgrade to. Example: 21.12
      license_file_path:
        default: ''
        description: |
          File path on the manager where the license file is located.
          This license file will be applied as a part of upgrade process.
      iso_path:
        default: ''
        description: |
          File path on the manager where the iso with new SW version is
          located. Example: /home/user/21.12/bootimage.iso
          cfy_user needs to be able to access this file.
      sig_path:
        default: ''
        description: |
          File path on the manager where the iso signature is
          located. Example: /home/user/21.12/bootimage.sig
          cfy_user needs to be able to access this file.
      type_of_strategy:
        default: 'upgrade'
        description: |
          The Subcloud update strategy to be used for subclouds.
          One of: firmware, kube-rootca-update, kubernetes, patch, prestage,
          or upgrade
      subcloud_apply_type:
        default: 'parallel'
        description: |
          The apply type for the update. serial or parallel
      strategy_action:
        default: 'apply'
        description: |
           Perform an action on the update strategy.
           Valid values are: apply, or abort
      max_parallel_subclouds:
        default: 1
        description: |
           The maximum number of workers within a subcloud to update in
           parallel
      stop_on_failure:
        default: True
        description: |
          Flag to indicate if the update should stop
          updating additional subclouds if a failure
          is encountered
      force_flag:
        default: True
        description: |
          Force upgrade to run. Required if the workflow needs to run while
          there are active alarms.
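
The iso_path and sig_path descriptions require that the files be accessible on the manager. The sketch below shows one way to check this, together with the documented type_of_strategy values, before starting the workflow. The function name `check_upgrade_inputs` is hypothetical. Note that `os.access` tests readability for the user running the script, so the result is only meaningful when the script runs as the same user as the manager service (cfy_user).

```python
import os

VALID_STRATEGIES = {
    "firmware", "kube-rootca-update", "kubernetes",
    "patch", "prestage", "upgrade",
}


def check_upgrade_inputs(iso_path, sig_path, type_of_strategy="upgrade"):
    """Return a list of problems found in the upgrade workflow inputs."""
    problems = []
    if type_of_strategy not in VALID_STRATEGIES:
        problems.append("unknown type_of_strategy: {}".format(type_of_strategy))
    for label, path in (("iso_path", iso_path), ("sig_path", sig_path)):
        # The path must be an existing regular file readable by this user.
        if not os.path.isfile(path) or not os.access(path, os.R_OK):
            problems.append("{} is not a readable file: {}".format(label, path))
    return problems
```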

run_on_subclouds

The run_on_subclouds workflow creates a temporary execution group based on the provided parameters, starts a batch execution on that group (running the workflow named workflow_id on all matched deployments), and deletes the group once all executions have completed.

Currently, the workflow looks for matches in all deployments (including environments and unrelated deployments), so use labels and filters carefully.

Parameters:

      workflow_id:
        description: |
          The name of the workflow to execute on all matched deployments (deployments group)
        default: ''
      workflow_inputs:
        description: The workflow parameters required during workflow execution
        default: {}
      labels:
        description: The labels on the basis of which deployments are selected to create
          a deployment group and start batch execution of workflow on each matched subcloud.
          The labels are optional and can be provided together with filter_ids
        default: {}
      filter_ids:
        description: |
          List of filter ids on the basis of which deployments are selected to create
          a deployment group and start batch execution of workflow on each matched subcloud.
          The filter_ids can be provided together with labels.
          Each provided ID must be the ID of an existing filter.
        default: []
      run_on_all_subclouds:
        type: boolean
        description: |
          You can run a workflow on all sub-environments.
          When this parameter is True, the deployments are selected based on the rule:
          {"csys-obj-parent": "<parent_deployment_id>"}
          In this case, labels rules and filter_ids will be skipped
        default: False
      type_names:
        default: [ ]
        description: |
          Type names for which the workflow will execute the workflow,
          especially start_subclouds_executions and
          wait_for_execution_end_and_delete_deployment_group operation.
          By default the operation will execute for nodes of type
          cloudify.nodes.starlingx.WRCP
      node_ids:
        default: [ ]
        description: |
          Node template IDs for which the workflow will execute the workflow,
          especially start_subclouds_executions and
          wait_for_execution_end_and_delete_deployment_group operation.
      node_instance_ids:
        default: [ ]
        description: |
          Node instance IDs for which the workflow will execute the workflow,
          especially start_subclouds_executions and
          wait_for_execution_end_and_delete_deployment_group operation.
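
The precedence between run_on_all_subclouds, labels, and filter_ids can be summarized with a short sketch. This is an illustration under assumptions, not the plugin's code: the function name `build_group_selection` and the returned dict shape are hypothetical, and rejecting an empty selection is a defensive choice made here because the workflow otherwise matches against all deployments.

```python
def build_group_selection(parent_deployment_id, labels=None, filter_ids=None,
                          run_on_all_subclouds=False):
    """Work out which rules select the deployments for the temporary group."""
    if run_on_all_subclouds:
        # Documented behavior: labels and filter_ids are skipped in this case.
        return {
            "labels": {"csys-obj-parent": parent_deployment_id},
            "filter_ids": [],
        }
    selection = {
        "labels": dict(labels or {}),
        "filter_ids": list(filter_ids or []),
    }
    if not selection["labels"] and not selection["filter_ids"]:
        # Assumption: refuse to build a group with no selection rule at all.
        raise ValueError("provide labels, filter_ids, or run_on_all_subclouds")
    return selection
```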

Kubernetes Cluster Upgrade Automation (upgrade_kubernetes)

Upgrade the Kubernetes version in a standalone or a DC system.

Parameters:

Note: most of these parameters are directly inserted into WRCP commands, so additional details about them can be found in the platform documentation.

Summary:

The workflow will invoke the sw-manager and dcmanager (if in a DC system) WRCP APIs to run a Kubernetes upgrade following the chosen parameters. The target version must be available in the platform and higher than the currently installed version by one step, i.e. it’s not possible to skip a Kubernetes version when upgrading.
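
The one-step rule can be illustrated with a small check. This sketch assumes that a "step" means one Kubernetes minor version and that versions are written like `v1.22.5`; both the interpretation and the function names are assumptions for illustration, and the platform performs its own validation.

```python
def parse_k8s_version(version):
    """Turn a string such as "v1.22.5" into the tuple (1, 22, 5)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def is_valid_upgrade_target(current, target, available):
    """True if `target` is available and exactly one minor version ahead."""
    if target not in available:
        return False
    current_major, current_minor = parse_k8s_version(current)[:2]
    target_major, target_minor = parse_k8s_version(target)[:2]
    return target_major == current_major and target_minor == current_minor + 1
```

With this rule, going from v1.21.8 to v1.23.1 requires two runs of the workflow, one per intermediate version.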

For DC systems, this workflow should only be executed from the System Controller environment in Conductor, as it leverages the DC APIs to apply the upgrade on its subclouds.

Because the Kubernetes upgrade is a time-consuming process, expect the workflow to retry some operations during its execution, such as polling the upgrade progress in the platform.

Analytics Deployment Automation (install_wra, upgrade_wra, uninstall_wra)

Install WRA:

The install_wra workflow performs the following steps:

Parameters:

      wra_tgz_url:
        description: URL containing the path to a WRA .tar.gz file.
        type: string
        default: ''
      oidc:
        description: Whether the OIDC DEX login setup is updated.
        type: boolean
        default: False
      storage_resource_allocations:
        description: >
          The list of helm overrides for storage resources. For example:
          [
            {
                "file_url": "elasticsearch-data.yaml",
                "helm_chart_name": "elasticsearch-data"
            },
            {
                "file_url": "arbitrary-overrides.yaml",
                "helm_chart_name": "placeholder"
            }
          ]
        type: list
        default: []
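
The example value for storage_resource_allocations suggests that each entry carries a file_url and a helm_chart_name. The following sketch checks a list against that shape before it is passed to the workflow. Treating both keys as required is an assumption drawn from the example, and the function name `check_storage_overrides` is hypothetical.

```python
REQUIRED_KEYS = ("file_url", "helm_chart_name")


def check_storage_overrides(overrides):
    """Return a list of problems found in storage_resource_allocations."""
    problems = []
    for index, entry in enumerate(overrides):
        if not isinstance(entry, dict):
            problems.append("entry {}: expected a mapping".format(index))
            continue
        for key in REQUIRED_KEYS:
            # Missing and empty values are reported the same way.
            if not entry.get(key):
                problems.append("entry {}: missing {}".format(index, key))
    return problems
```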

Upgrade WRA:

The upgrade_wra workflow supports both the “update” and “upgrade” capabilities. An “update” is defined by an increase of the minor version number, for example from 21.12-0 to 21.12-1, while an “upgrade” is defined by an increase of the major version number, for example from 21.12 to 22.06.

The operation is chosen based on the provided WRA tarball, relative to the currently installed WRA version.
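
The choice between the two operations can be sketched as a version comparison. This is an illustration only: it assumes version strings of the form `YY.MM` or `YY.MM-N`, as in the examples above, and the function names are hypothetical. How the plugin reads the version from the tarball is not covered here.

```python
def parse_wra_version(version):
    """Split "21.12-1" into the release (21, 12) and the build number 1."""
    release, _, build = version.partition("-")
    year, month = (int(part) for part in release.split("."))
    # A version without a "-N" suffix is treated as build 0.
    return (year, month), int(build or 0)


def choose_wra_operation(installed, candidate):
    """Return "upgrade", "update", or None if the candidate is not newer."""
    installed_release, installed_build = parse_wra_version(installed)
    candidate_release, candidate_build = parse_wra_version(candidate)
    if candidate_release > installed_release:
        return "upgrade"
    if candidate_release == installed_release and candidate_build > installed_build:
        return "update"
    return None
```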

These are the steps to update WRA:

These are the steps to upgrade WRA:

WRA upgrades require the same inputs as a clean install (install_wra), while updates only require the URL of the WRA tarball to be installed.

Parameters:

      wra_tgz_url:
        description: URL containing the path to a WRA .tar.gz file.
        type: string
        default: ''
      oidc:
        description: Whether the OIDC DEX login setup is updated.
        type: boolean
        default: False
      storage_resource_allocations:
        description: >
          The list of helm overrides for storage resources. For example:
          [
            {
                "file_url": "elasticsearch-data.yaml",
                "helm_chart_name": "elasticsearch-data"
            },
            {
                "file_url": "arbitrary-overrides.yaml",
                "helm_chart_name": "placeholder"
            }
          ]
        type: list
        default: []

Uninstall WRA:

The uninstall_wra workflow performs the following steps, which apply to system controllers and subclouds:

The workflow does not require any parameters.

NetApp Trident Storage Upgrade Automation (upgrade_trident)

Overview

Astra Trident deploys in Kubernetes clusters as pods and provides dynamic storage orchestration services for your Kubernetes workloads. It enables your containerized applications to quickly and easily consume persistent storage from NetApp’s broad portfolio that includes ONTAP (AFF/FAS/Select/Cloud/Amazon FSx for NetApp ONTAP), Element software (NetApp HCI/SolidFire), as well as the Azure NetApp Files service, and Cloud Volumes Service on Google Cloud.

The NetApp Trident upgrade process is integrated as a new workflow. This workflow can be broken in a series of relatively independent tasks that are called in order:

  1. Trident Check: Checks whether Trident is being used and whether it is up to date. This step decides whether the workflow should continue.
  2. Trident Health Check: Checks whether Trident is in a working state, to make sure it can be safely upgraded.
  3. Trident Upgrade: Reinstalls Trident to perform an upgrade. When you uninstall Trident, the Persistent Volume Claim (PVC) and Persistent Volume (PV) used by the Astra Trident deployment are not deleted. PVs that have already been provisioned will remain available while Astra Trident is offline, and Astra Trident will provision volumes for any PVCs that are created in the interim once it is back online.
  4. Trident Health Check: Checks whether Trident is in a working state just to make sure the upgrade worked as intended.

These steps are shown in the diagram “NetApp Trident Update”.
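
The order of the four tasks can also be expressed as a short sketch. The four callables passed in are hypothetical hooks standing in for the real checks and the reinstall step. This is not the plugin's implementation, only an illustration of the control flow described in the list.

```python
def run_trident_upgrade(in_use, up_to_date, healthy, reinstall):
    """Run the four tasks in order; every argument is a zero-argument callable."""
    # 1. Trident Check: decide whether there is anything to do.
    if not in_use() or up_to_date():
        return "skipped"
    # 2. Health check before touching the installation.
    if not healthy():
        raise RuntimeError("Trident is not healthy, refusing to upgrade")
    # 3. Upgrade by reinstalling Trident.
    reinstall()
    # 4. Health check after the upgrade.
    if not healthy():
        raise RuntimeError("Trident is unhealthy after the upgrade")
    return "upgraded"
```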

How to run the upgrade workflow

Upgrades can be performed via the “upgrade_trident” workflow. The user only needs to provide the target’s deployment ID:

cfy executions start upgrade_trident --deployment-id <DEPLOYMENT_ID> 
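
When the command has to be issued from a script, for example once per environment, it can be assembled programmatically. The sketch below only builds the argument list for the command shown above. The helper name `upgrade_trident_command` is hypothetical, and the example deployment ID is a placeholder.

```python
import shlex


def upgrade_trident_command(deployment_id):
    """Build the argv list for starting upgrade_trident on one deployment."""
    if not deployment_id:
        raise ValueError("deployment_id must not be empty")
    return [
        "cfy", "executions", "start", "upgrade_trident",
        "--deployment-id", deployment_id,
    ]


def format_command(argv):
    """Render an argv list as a shell-safe string, e.g. for logging."""
    return shlex.join(argv)
```

The list form can be passed directly to `subprocess.run(argv, check=True)` on a host where the cfy CLI is installed and configured.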

When a platform upgrade is performed, the newer WRCP version already comes with the newer version of tridentctl which is then used to upgrade Trident. The workflow does not come with any rollback functionality as that would require downgrading tridentctl, which might make it incompatible with the Kubernetes version being used in the system.

Certificate Audit (audit_certificates)

Summary

This workflow will scan for all certificates it can find in a WRCP system, including those stored in the Kubernetes cluster’s secrets. These certificates can include the local registry, SSL, DC Admin, etcd, cert-manager, kube-system certificates and more, depending on which certificates are stored in the platform.

After the workflow has finished executing, the certificate data can be verified in each node instance’s runtime properties, or on the Certificates Audit page in the web GUI. For each certificate, the following fields are visible: Name, Expiry date, CA, Region, Deployment, Site, Type, and Auto-renew.
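
Once the audit data is available, a common follow-up is to list the certificates that expire soon and are not renewed automatically. The sketch below assumes that each certificate is a dict with `name`, `expiry_date` (an ISO 8601 string), and `auto_renew` keys. These key names and the date format are assumptions for illustration; check the actual runtime properties before relying on them.

```python
from datetime import datetime, timedelta


def expiring_certificates(certificates, now, within_days=30):
    """Return the names of certificates that need manual attention soon."""
    deadline = now + timedelta(days=within_days)
    names = []
    for certificate in certificates:
        if certificate.get("auto_renew"):
            continue  # renewed automatically, no manual action expected
        expiry = datetime.fromisoformat(certificate["expiry_date"])
        if expiry <= deadline:
            names.append(certificate["name"])
    return sorted(names)
```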

Usage

To run this workflow, simply choose the desired environment(s) and run the audit_certificates workflow from either the environment’s page or the environments list page (bulk action).

The workflow does not require any parameters.

WRCP Provisioning (install_wrcp)

This workflow provisions new WRCP installations on servers with Hewlett Packard Enterprise (HPE) hardware only. The supported WRCP configurations are AIO-SX and AIO-DX. The requirements are:

Note: The WRCP plugin is supported only on IPv4 installations.

This workflow includes the following 4 steps:

  1. Custom ISO: Uses the input files to create a new custom WRCP ISO for the installation.
  2. Redfish: Inserts the custom ISO in the virtual DVD drive and reboots the server.
  3. Bootstrap and DM: After the initial boot, runs the WRCP bootstrap and then starts the Deployment Manager.
  4. Custom ISO cleanup: After all other steps have finished, deletes the ISO created in the first step.

Node Types

windriver.nodes.CustomISO This node represents the step that builds a custom ISO from a base ISO and the information entered in the CIQ files. Because this step builds the ISO, it is supposed to be the first one in the System Install blueprint.

windriver.nodes.RedfishBMC This node is responsible for communicating with the BMC. Using the information present in the CIQ files, this step inserts the ISO generated before by CustomISO step in the Virtual Drive and then reboots the server.

windriver.nodes.controller.DM This node defines the operations that will run the initial bootstrap in WRCP, and then run the Deployment Manager. This step still uses the information contained in the CIQ files.

windriver.nodes.controller.SystemController This node will define the operations necessary for subcloud installation in a future release.

windriver.nodes.CustomISOCleaner This node deletes the ISO created by the CustomISO node. It is supposed to be the last node in the blueprint.

Runtime Properties:

How to use

  1. Upload the blueprint to Conductor
  2. Fill in the CIQ files
  3. Provide valid localhost.yaml and deployment-config.yaml files and rename them, respectively, to bootstrap_values.yaml.v1.jinja2 and deployment_config.yaml.v2.jinja2. Place them in the golden_configs directory.
  4. Upload the CIQ files and golden configs to CFS (FileServer and CustomISO API)
  5. Upload Credentials

    • Create a secret named wrcp_license
      • It must contain a full copy of a compatible WRCP license
      • It must be named wrcp_license
    • Create a secret named ‘global_secrets’ and fill in the following fields:

      # This CIQ secrets Template applies to a single National Datacenter (NDC)
      site_name: "global"               # (String) required, Name of the NDC site for this secrets file
      
      registry_username: username       # (String) required, username for access to docker images registry
      registry_password: password       # (String) required, password for access to docker images registry
      
      # This section lists BMC secrets per server
      server_list:
      - service_tag: "ndc_server_1"     # (String) required with quotes, value should be lower case only, e.g. '1a2b3c4'
        bmc_user: user1                 # (String) required, username of the BMC account to use
        bmc_password: password1         # (String) required, password of the BMC account to use

      - service_tag: "ndc_server_2"
        bmc_user: user1
        bmc_password: password1

      - service_tag: "ndc_server_3"
        bmc_user: user1
        bmc_password: password1
    • Create a secret named ‘CUSTOM_NAME_secrets’ and fill in the following fields:

    • CUSTOM_NAME is the name of the YAML file in the ciq_file_list blueprint input, without the extension. Using the example ciq_file_list below, the secret name should be wr-duplex-1_2_secrets.

      # This CIQ secrets Template applies to a single Regional Datacenter (RDC)
      # RDC CIQ schema: schemas/regional_datacenter_ciq_secrets.spec.v1.json
      
      site_name: wr-duplex-1      # (String) required, Name of the RDC site for this secrets file
      
      server_list:
      - service_tag: ''                   # (String) required with quotes, value should be lower case only, e.g. '1a2b3c4'
        bmc_user: user                    # (String) required, username of the BMC account to use
        bmc_password: password            # (String) required, password of the BMC account to use

      - service_tag: ''
        bmc_user: user
        bmc_password: password
      
      initial_sysadmin_password: syspwd*  # (String) required, the initial password of controller-0 of this cell site
  6. Create the deployment. While filling in the inputs, pay special attention to the following two fields. Here is an example:

    ciq_file_list:
    [
     {
         "file_name": "wr-duplex-1_2_ndc.yaml",    
         "file_type": "NDC"
     },
     {
         "file_name": "wr-duplex-1_2.yaml",
         "file_type": "RDC"
     }
    ]
    
    golden_config_file_list:
    [
     {
         "file_name":"bootstrap_values.yaml.v1.jinja2",
         "file_type":"RDCboostrapValues"
     },
     {
         "file_name":"deployment_config.yaml.v2.jinja2",
         "file_type":"RDCdeploymentConfig"
     }
    ]
  7. Shut down the controller-1 target server.

  8. Then, run the “install_wrcp” workflow.
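
Because the name of the per-site secret must match the CIQ file name, it can help to derive it rather than type it by hand. The sketch below assumes that the secret name is built from the single entry of type RDC in ciq_file_list, which matches the example in step 5. The function name `secret_name_for_rdc` is hypothetical.

```python
import os


def secret_name_for_rdc(ciq_file_list):
    """Derive the CUSTOM_NAME_secrets name from the RDC entry's file name."""
    rdc_files = [
        entry["file_name"]
        for entry in ciq_file_list
        if entry["file_type"] == "RDC"
    ]
    if len(rdc_files) != 1:
        # Assumption: exactly one RDC file per deployment.
        raise ValueError("expected exactly one RDC entry in ciq_file_list")
    custom_name, _extension = os.path.splitext(rdc_files[0])
    return custom_name + "_secrets"
```

For the ciq_file_list shown in step 6, this returns wr-duplex-1_2_secrets.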