NIC Bonding at Cloud Platform HLD
Introduction
The NIC Bonding feature supports aggregating network interfaces on a host within a WRCP installation, specifically in All-in-One Duplex configurations.
The network interface aggregation can operate in modes such as “Active-backup,” “Balanced XOR,” or “802.3ad.”
Workflows
This feature is enabled at provisioning time, through the deployment config. The relevant workflow is provision_sysctrl.
- Deploy WRCP through Conductor, targeting the installation of a Dell lab with an All-in-One Duplex configuration. This configuration involves bonding two network interfaces using the ports specified in the CIQ.
- The two interfaces will be connected with the following settings: PXE (untagged), OAM (tagged), mgmt (tagged), and cluster (tagged), plus one interface for data. Link aggregation will be configured as LACP/802.3ad/L2/Always (see the CIQ sketch after this list).
- The Deployment Config will be used to configure the feature on the site.
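The sketch below shows the per-server CIQ fragment this corresponds to. The bonded_interface_0/bonded_interface_1 keys appear in the sample CIQ later in this document; bond_mode and transmitHashPolicy are assumed key names, inferred from the server.bond_mode and server.transmitHashPolicy variables used in the deployment-config template, and the mode and hash-policy strings shown are illustrative rather than mandated by this design.

# Hedged sketch: per-server CIQ keys for an 802.3ad (LACP) bond.
# bond_mode and transmitHashPolicy are assumed key names; see note above.
bonded_interface_0: eno12399      # first bond member (from the sample CIQ below)
bonded_interface_1: eno12409      # second bond member
bond_mode: 802.3ad                # LACP / 802.3ad, per the link-aggregation setting above
transmitHashPolicy: layer2        # L2 transmit hash policy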
An indicative high-level sequence of events is outlined in the flow below:
Flow
- The Add Worker node blueprint is run through WRC.
- CIQ information is sent to BMOM.
- BMOM parses and validates CIQ, and sends a request to BMO.
- BMO performs HW discovery and audits, configures the Golden Config, and adds the server to the inventory list.
- BMO returns MAC address(es) and the final config to Conductor.
- BMOM updates deployment-config.yaml and uploads it to the WRCP controller nodes (see the apply sketch after this list).
- Hardware is powered on (this step can be executed either by Conductor via Redfish or by BMO).
- WRC drives NIC bonding through WRC Plugin on compute clusters in RDC/LDC.
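The net effect of this flow is that the updated deployment-config.yaml lands on the active controller and is applied as Deployment Manager custom resources (the HostProfile shown later in this document is one of them). A minimal sketch of that apply step is shown below, assuming it is done with plain kubectl against the rendered file; in the flow above this step is driven by BMOM/Conductor rather than run by hand.

# Illustrative only: apply the rendered deployment config as
# Deployment Manager custom resources on the active controller.
kubectl apply -f deployment-config.yaml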
Input
Updated CIQ with NIC Bonding Spec / Server Golden Config.
Output
Successful NIC bonding deployment on all 15G and 16G compute nodes in the compute/workload cluster, or on the management/WR Systems controller, on 15G or 16G hardware.
Sample Logical Network Address Configuration (IPv4)
floating IP: 128.224.54.34
controller-0 IP: 128.224.54.35
controller-1 IP: 128.224.54.36
oam_gateway IP: 128.224.54.1
dns_servers: 128.224.144.130
management_subnet: 10.9.32.0/24
management_start_address: 10.9.32.2
management_end_address: 10.9.32.254
management_multicast_subnet:
cluster_host_subnet: 192.168.206.0/24
cluster_pod_subnet:
external_oam_subnet: 128.224.54.0/24
external_oam_gateway_address: 128.224.54.1
external_oam_floating_address: 128.224.54.34
external_oam_node_0_address: 128.224.54.35
external_oam_node_1_address: 128.224.54.36
pxeboot_subnet: 192.168.202.0/24
This configuration also includes the definition of the network interface bond name; on a deployed node the bond state can be inspected with, for example, cat /proc/net/bonding/pxeboot0.
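For illustration, a healthy 802.3ad bond reports state along the following lines (abridged; the driver version and member interface names will differ per system, and the member names shown here are taken from the port mapping example later in this document):

$ cat /proc/net/bonding/pxeboot0
Ethernet Channel Bonding Driver: v...
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
...
Slave Interface: enp23s0f0
MII Status: up
...
Slave Interface: enp202s0f0
MII Status: up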
CIQ Structure for NIC Bonding
In the installation CIQs, the structure for each service_tag will be as follows:
server_list:
  - service_tag: "14hxys3"
    hostname: "controller-1"
    role: controller-std
    id: 1
    bmc_endpoint: "https://100.76.27.140"
    bootstrap_interface: eno8303
    bootstrap_mac: a1:23:45:67:bb:89
    bonded_interface_: NIC.Bond
    bonded_interface_0: eno12399
    bonded_interface_1: eno12409
    # In case the plugin fails to map main_interface to an interface name and MAC,
    # fall back to the legacy way of provisioning bootstrap_interface or bootstrap_mac.
    boot_device: /dev/disk/by-path/pci-0000:67:00.0-scsi-0:3:111:0
    osds:
      - /dev/disk/by-path/pci-0000:67:00.0-scsi-0:2:3:0
    # 0 - Standard Controller, Serial Console
    # 1 - Standard Controller, Graphical Console
    # 2 - AIO, Serial Console
    # 3 - AIO, Graphical Console
    # 4 - AIO Low-latency, Serial Console
    # 5 - AIO Low-latency, Graphical Console
    install_type: 3
    bootstrap_dns_1: 123.4.5.6
    bootstrap_dns_2: 1.1.1.1
site_name: acre_nic_test
location: dallas
latitude: -1.2345
longitude: 1.2345
contact: someone@company.com
timezone: UTC
external_oam_subnet: 123.4.5.0/24
external_oam_gateway_address: 123.4.5.1
external_oam_floating_address: 123.4.5.34
NIC.Bond.1-1-1 and NIC.Bond.1-1-2 should be present in the FQDD mapping CIQ, as in the example below:
device_mappers:
  - filter:
      models:
        - 'PowerEdge R750'
    disk_mappers:
      - fqdd: HBA355i
        prefix: pci-0000:67:00.0-sas-0x
        suffix: -lun-0
    port_mappers:
      - fqdd: 'NIC.Bond.1-1-1'
        ethifname: 'enp23s0f0'
      - fqdd: 'NIC.Bond.1-1-2'
        ethifname: 'enp202s0f0'
      - fqdd: 'NIC.Embedded.1-1-1'
        ethifname: 'eno8303'
      - fqdd: 'NIC.Embedded.2-1-1'
        ethifname: 'eno8403'
      # R750, Broadcom Adv Quad 25Gb Ethernet, Broadcom Corp
      - fqdd: 'NIC.Integrated.1-1-1'
        ethifname: 'eno12399'
      - fqdd: 'NIC.Integrated.1-2-1'
        ethifname: 'eno12409'
      - fqdd: 'NIC.Integrated.1-3-1'
        ethifname: 'eno12419'
      - fqdd: 'NIC.Integrated.1-4-1'
        ethifname: 'eno12429'
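As a quick sanity check on the target host, the mapped interface names can be confirmed to exist before deployment (illustrative command; the names come from the port_mappers entries above):

ip -br link | grep -E 'enp23s0f0|enp202s0f0'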
The deployment-config Jinja template changes as follows:
---
apiVersion: starlingx.windriver.com/v1
kind: HostProfile
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: controller-{{server.id}}-profile
  namespace: deployment
spec:
  {% if server.tag_subfunction_lowlatency_ %}
  base: controller-aio-ll-profile
  {% elif server.tag_subfunction_worker_ %}
  base: controller-aio-profile
  {% else %}
  base: controller-profile
  {% endif %}
  bootDevice: {{server.boot_device}}
  rootDevice: {{server.boot_device}}
  boardManagement:
    credentials:
      password:
        secret: bmc-secret-{{server.service_tag | lower}}
    type: {{server.bmc_type or 'dynamic'}}
    address: {{server.bmc_ip_}}
  interfaces:
    {% if server.bond_mode %}
    bond:
      - class: platform
        dataNetworks: []
        members:
          {% for bond_if in server.bonded_interfaces_ %}
          - {{bond_if}}
          {% endfor %}
        mode: {{server.bond_mode}}
        {% if server.transmitHashPolicy %}
        transmitHashPolicy: {{server.transmitHashPolicy}}
        {% endif %}
        {% if server.primaryReselect %}
        primaryReselect: {{server.primaryReselect}}
        {% endif %}
        name: apxeboot0
        platformNetworks:
          - pxeboot
    {% endif %}
    ethernet:
      - class: none
        dataNetworks: []
        name: enp23s0f0
        platformNetworks: []
        port:
          name: enp23s0f0
      - class: none
        dataNetworks: []
        name: enp202s0f0
        platformNetworks: []
        port:
          name: enp202s0f0
      {% if server.id == 0 %}
      - class: none
        dataNetworks: []
        mtu: 1500
        name: lo
        platformNetworks: []
        port:
          name: lo
        ptpRole: none
      {% endif %}
    vlan:
      - class: platform
        dataNetworks: []
        lower: apxeboot0
        name: oam0
        platformNetworks:
          - oam
        ptpRole: none
        vid: {{ oam_vlan }}
      - class: platform
        dataNetworks: []
        lower: apxeboot0
        name: mgmt0
        platformNetworks:
          - mgmt
        ptpRole: none
        vid: {{ management_vlan }}
      - class: platform
        dataNetworks: []
        lower: apxeboot0
        name: cluster0
        platformNetworks:
          - cluster-host
        ptpRole: none
        vid: {{ cluster_host_vlan }}
  {% if server.osds %}
  storage:
    osds:
      {% for osd_device in server.osds %}
      - cluster: ceph_cluster
        function: osd
        path: {{osd_device}}
      {% endfor %}
  {% endif %}
---
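For reference, the following is a sketch of what the bond section of the template above renders to for the sample server, assuming bond_mode is 802.3ad, transmitHashPolicy is layer2, and server.bonded_interfaces_ resolves to the NIC.Bond-mapped names from the FQDD mapping (the member names and mode string are illustrative, not mandated by this design):

  interfaces:
    bond:
      - class: platform
        dataNetworks: []
        members:
          - enp23s0f0   # NIC.Bond.1-1-1 (illustrative)
          - enp202s0f0  # NIC.Bond.1-1-2 (illustrative)
        mode: 802.3ad
        transmitHashPolicy: layer2
        name: apxeboot0
        platformNetworks:
          - pxeboot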