Enhance the Performance of a Containerized Application Using SR-IOV

When you take a traditional application that used to run on a bare-metal platform and move it to the cloud in the form of a container, one drawback is the network performance degradation in the virtualized environment, no matter which hypervisor you use.

One way to keep comparable network performance is the SR-IOV feature of modern network interface cards (NICs) such as the Intel X710 or the Mellanox MT27700/MT27710 family. SR-IOV exposes the physical NIC as one or more physical functions (PFs), and each PF can be further divided into a number of virtual functions (VFs). You can pass one or more VFs directly into a container to achieve near-native network performance for the application running inside it, without the penalty of the hypervisor layer overhead. You can also combine this with other network performance mechanisms, such as the DPDK poll mode driver, to further enhance your application's network performance.
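You can see the same mechanism at work on a plain Linux host, outside of any Kubernetes tooling. Assuming the PF driver supports SR-IOV and the interface is named ens0 (a placeholder), the kernel exposes the VF controls through sysfs, and each VF you create shows up as its own PCI function:

$ cat /sys/class/net/ens0/device/sriov_totalvfs
$ echo 4 | sudo tee /sys/class/net/ens0/device/sriov_numvfs
$ lspci | grep -i "Virtual Function"

The first command reports how many VFs the PF supports, the second creates four of them, and the last lists the new VF PCI devices.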

How does it work?

For a Kubernetes cluster with the Calico and Multus plugins, SR-IOV resources are configured and set up during the Container Network Interface (CNI) initialization phase, using CNI configuration in the Network Attachment Definition (NAD) format.

Those steps are complicated and error-prone. Most popular, publicly available Kubernetes platforms, such as OpenShift and StarlingX, provide users with high-level utilities to make those tasks easier. OpenShift ships a utility called the "SR-IOV Network Operator" with the Custom Resource Definitions (CRDs) SriovNetwork, SriovNetworkNodeState and SriovNetworkNodePolicy, which handle node NIC hardware feature auto-discovery, initialization, provisioning, upgrade, and initialization of the CNI plugin by auto-generating the NAD Custom Resource (CR) from a declarative high-level manifest. In the StarlingX open source solution, users can achieve the same goal with system commands from a high-level command line interface.
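As a rough illustration of the OpenShift approach, a SriovNetworkNodePolicy manifest might look something like the following sketch; the policy name, resource name and PF name below are placeholders, and field details vary between operator versions:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-sriov-example
  namespace: openshift-sriov-network-operator
spec:
  resourceName: sriovnic1
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 16
  nicSelector:
    pfNames: ["ens0"]
  deviceType: vfio-pci

The operator reconciles this declarative manifest into the per-node VF configuration and the generated NAD CRs. The rest of this article uses the StarlingX command line instead.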

A Complete Workflow Example

Now, let's use an SR-IOV device in a containerized application, with a working example on the StarlingX platform.

First, list the available nodes (especially worker nodes, which can host pod-based applications) with the system node management command. Find and lock the worker node with the SR-IOV capable NIC, assign the sriov-related label to the worker node, and provision the SR-IOV PF/VF related parameters with the following steps:

~(keystone_admin)$ system host-list 
~(keystone_admin)$ system host-lock worker-0
~(keystone_admin)$ system host-label-assign worker-0 sriovdp=enabled
~(keystone_admin)$ system host-if-list -a worker-0
~(keystone_admin)$ system host-if-modify -m 1500 -n sriov1 -c pci-sriov -N 16 --vf-driver=vfio worker-0 ens0
~(keystone_admin)$ system datanetwork-add datanet-0 vlan

~(keystone_admin)$ system interface-datanetwork-assign worker-0 sriov1 datanet-0
~(keystone_admin)$ system host-unlock worker-0

The third command above assigns the sriovdp=enabled label to the worker-0 node so that its SR-IOV devices are managed under the Multus and Calico plugin mechanism. After finding the corresponding NIC with the fourth command (ens0 here, for example), the fifth command assigns 16 VFs to it with an MTU of 1500, renames it sriov1, and binds the vfio driver to it, which allows a DPDK poll mode driver to be used in the pod; the netdevice driver can be used instead if a normal Linux network device driver is desired.

The next command provisions a VLAN-type data network named datanet-0. A data network is an abstraction that glues physical network card entities to logical data network concepts. You can think of it as similar to the Linux volume group concept: you add logical volumes residing on a physical disk partition such as /dev/sda1 into the volume group. Here, the data network is the abstract concept, like the volume group, and you associate a specific NIC function (such as a VF of an SR-IOV interface) with it, which is exactly what the last command does. After all the configuration, unlock the worker node and you are ready to use your new SR-IOV interface in your pod application.
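Once worker-0 is unlocked and back online, a quick way to verify the provisioning is to check that the SR-IOV device plugin advertises the VFs as an allocatable extended resource on the node; the resource name intel.com/pci_sriov_net_datanet_0 is derived from the data network name and is the same name referenced by the NAD and pod manifests below:

$kubectl describe node worker-0 | grep -i pci_sriov
$kubectl get node worker-0 -o jsonpath='{.status.allocatable}'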

After the SR-IOV interface is successfully provisioned with the above procedure, the next step is to create a NAD for it, as in the following NAD.yaml file:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition

metadata:
   name: net1
   annotations:
      k8s.v1.cni.cncf.io/resourceName:
      intel.com/pci_sriov_net_datanet_0
spec:
   config: '{
     "cniVersion": "0.3.0",
     "type": "sriov"
   }'

Apply the NAD with:

$kubectl apply -f NAD.yaml
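If Multus is running and the NAD CRD is installed, you can confirm the definition was created (net-attach-def is the short name registered for the NetworkAttachmentDefinition resource):

$kubectl get net-attach-def
$kubectl describe net-attach-def net1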

Then, create your pod with the SR-IOV interface, as in the following pod.yaml file:

apiVersion: v1
kind: Pod
metadata:
  name: pod-with-sriov
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      { "name": "net1" }
    ]'
spec:
  containers:
  - name: pod1-with-sriov
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        intel.com/pci_sriov_net_datanet_0: '1'
      limits:
        intel.com/pci_sriov_net_datanet_0: '1'

Apply the pod configuration YAML file with:

$kubectl apply -f pod.yaml
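Once the pod is running, Multus records the interfaces it attached in the pod's k8s.v1.cni.cncf.io/network-status annotation, so a quick sanity check is to look for it in the pod description:

$kubectl describe pod pod-with-sriov | grep -A 10 network-status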

After the pod is successfully launched, you can also examine its network resources from inside the pod as follows:

$kubectl get pods -A
$kubectl exec -it <Your pod name from above cmd > -- /bin/sh
#ip addr
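Besides the default interface, you should see the additional net1 interface in the ip addr output. The SR-IOV device plugin also typically injects the PCI address of the allocated VF into the container environment, which is handy for DPDK applications that need to bind to the device directly; inside the pod:

#env | grep -i pcidevice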

This complete working example illustrates the whole process of provisioning and using an SR-IOV interface inside a Kubernetes pod. The StarlingX open source project makes this complex task much easier.

Thanks for reading.

Keyuan Zhang
