How to configure SR-IOV for the guest VM

  • Intel 82599ES 10G NIC

SR-IOV support: yes
https://www.intel.com/content/www/us/en/support/articles/000005722/network-and-i-o/ethernet-products.html
https://www.intel.com.au/content/dam/doc/design-guide/82599-sr-iov-driver-companion-guide.pdf

You can also verify whether the NIC supports SR-IOV with the lspci command:

lspci -s 0000:04:00.0 -vvv |grep -i "Single Root"
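
The lspci check above can be wrapped in a small helper for use in scripts. This is only a sketch: the function name has_sriov_cap is made up for illustration, and it assumes the capability line contains the text "Single Root I/O Virtualization", as typical lspci -vvv output does.

```shell
#!/bin/bash
# has_sriov_cap (hypothetical helper): reads `lspci -vvv` output on stdin
# and exits 0 if an SR-IOV capability line is present, non-zero otherwise.
has_sriov_cap() {
    grep -qi "Single Root I/O Virtualization"
}

# Example (run as root; without it lspci cannot read the extended
# capabilities and reports "access denied" instead):
#   lspci -s 0000:04:00.0 -vvv | has_sriov_cap && echo "SR-IOV capable"
```

The helper reads stdin rather than calling lspci itself, so it can be tried against saved lspci output from another host.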

  • download the latest ixgbe and ixgbevf drivers from the Intel site

ixgbe: https://downloadcenter.intel.com/download/14687
ixgbevf: https://downloadcenter.intel.com/download/18700

Which file to choose?
The ixgbe driver supports all 82599-, 82598EB-, X540-, and X552-based Intel 10 Gigabit NICs.
The ixgbevf driver supports 82599-, X540-, and X552-based virtual function devices, which can only be activated on kernels that support SR-IOV.


  • build and install the most recent ixgbe driver (5.3.4 at the time of writing), then reload it with max_vfs set
rmmod ixgbe
modprobe ixgbe max_vfs=2,2,2,2
    
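# you can also create or delete VFs at runtime through sysfs
cat /sys/class/net/eth5/device/sriov_totalvfs     # maximum number of VFs the device supports
cat /sys/class/net/eth5/device/sriov_numvfs       # number of VFs currently enabled
echo 2 > /sys/class/net/eth4/device/sriov_numvfs  # create 2 VFs
echo 0 > /sys/class/net/eth4/device/sriov_numvfs  # delete all VFs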
    
# if the PF is bound to igb_uio instead of ixgbe, write max_vfs on the PCI device instead
echo 2 > /sys/bus/pci/devices/0000\:04\:00.0/max_vfs
echo 2 > /sys/bus/pci/devices/0000\:04\:00.1/max_vfs

modinfo ixgbe describes how max_vfs assigns VFs to each port. For example, "max_vfs=2,2,2,2" assigns 2 VFs to each port across two dual-port NICs.
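
The runtime sysfs steps above can be collected into one function. A minimal sketch: set_numvfs is a hypothetical name, and the sysfs root is passed as a parameter so the function can be dry-run against a scratch directory. It resets the count to 0 before writing a new non-zero value, because the kernel refuses to change sriov_numvfs directly from one non-zero value to another.

```shell
#!/bin/bash
# set_numvfs NET_DIR IFACE COUNT   (hypothetical helper)
# NET_DIR is normally /sys/class/net. Returns non-zero if COUNT exceeds
# what the device reports in sriov_totalvfs or if a write fails.
set_numvfs() {
    local dev="$1/$2/device" want="$3" total current
    total=$(cat "$dev/sriov_totalvfs") || return 1
    current=$(cat "$dev/sriov_numvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, but $2 supports at most $total" >&2
        return 1
    fi
    # Nothing to do if the requested count is already active.
    [ "$current" -eq "$want" ] && return 0
    # Changing between two non-zero values is rejected, so go through 0.
    if [ "$current" -ne 0 ]; then
        echo 0 > "$dev/sriov_numvfs" || return 1
    fi
    echo "$want" > "$dev/sriov_numvfs"
}
```

On a real host this would be called as root, for example: set_numvfs /sys/class/net eth4 2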


  • you can set the max_vfs parameter in a config file so that it is applied whenever the module is loaded with modprobe
vi /etc/modprobe.d/ixgbe.conf
options ixgbe max_vfs=2,2,2,2

This config file takes effect when you run modprobe, but not for modules loaded at boot, which are specified in /etc/modules.
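
The options line can also be generated rather than typed by hand, which helps when the number of ports differs between hosts. A sketch under stated assumptions: build_max_vfs and write_ixgbe_conf are hypothetical names, and the same VF count is used for every port.

```shell
#!/bin/bash
# build_max_vfs VFS_PER_PORT NUM_PORTS   (hypothetical helper)
# Prints the comma-separated max_vfs value, one entry per port.
build_max_vfs() {
    local vfs="$1" ports="$2" out="" i=0
    while [ "$i" -lt "$ports" ]; do
        # Prefix a comma only when the list already has an entry.
        out="${out:+$out,}$vfs"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# write_ixgbe_conf CONF_FILE VFS_PER_PORT NUM_PORTS   (hypothetical helper)
# Overwrites CONF_FILE (normally /etc/modprobe.d/ixgbe.conf) with the
# options line for the ixgbe module.
write_ixgbe_conf() {
    printf 'options ixgbe max_vfs=%s\n' "$(build_max_vfs "$2" "$3")" > "$1"
}
```

For the two dual-port NICs used in this post, the call would be: write_ixgbe_conf /etc/modprobe.d/ixgbe.conf 2 4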


  • create an SR-IOV network and configure the VM
cat sr-iov-network-1.xml
<network>
  <name>sr-iov-net1</name>
  <uuid>a25332ee-6f65-420c-a6b9-bb0d6083802a</uuid>
  <forward mode='hostdev' managed='yes'>
    <pf dev='eth4'/>
  </forward>
</network>

virsh net-define sr-iov-network-1.xml
virsh net-list --all
virsh net-start sr-iov-net1

virsh attach-interface --domain domID --type network --source sr-iov-net1 --model virtio --config --live

Alternatively, you can edit the VM configuration and reboot the VM:

virsh edit vmName
    <interface type='network'>
      <source network='sr-iov-net1'/>
    </interface>
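
When several PFs need their own libvirt network, the network XML can be generated from a template. A minimal sketch: gen_sriov_net_xml is a hypothetical name, and the uuid element is left out on purpose because libvirt generates one when a network is defined without it.

```shell
#!/bin/bash
# gen_sriov_net_xml NET_NAME PF_IFACE   (hypothetical helper)
# Prints a hostdev-mode network definition that hands out VFs of PF_IFACE.
gen_sriov_net_xml() {
    local name="$1" pf="$2"
    cat <<EOF
<network>
  <name>${name}</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='${pf}'/>
  </forward>
</network>
EOF
}

# Example (the PF name eth5 is an assumption, not taken from this post):
#   gen_sriov_net_xml sr-iov-net2 eth5 > sr-iov-network-2.xml
#   virsh net-define sr-iov-network-2.xml && virsh net-start sr-iov-net2
```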

  • verify which driver the VM has loaded into the kernel for the SR-IOV NIC
# log in to the VM to execute this command
lspci -vmmks 00:04:00.0
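
To check the result in a script, the Driver field can be picked out of the machine-readable output. A sketch, assuming the tab-separated "Tag:<TAB>value" layout that lspci -vmm prints (the same layout the Intel scripts further down rely on with cut -f2); lspci_driver is a hypothetical name.

```shell
#!/bin/bash
# lspci_driver (hypothetical helper): reads `lspci -vmmk` output on stdin
# and prints the value of the Driver field, if the device has one bound.
lspci_driver() {
    awk -F'\t' '$1 == "Driver:" { print $2 }'
}

# Example, inside the VM:
#   lspci -vmmks 00:04:00.0 | lspci_driver
```

For a VF handled by the guest kernel driver the expected value is ixgbevf; an empty result means no driver is bound to the device.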

  • verify how many RX queues are assigned to the PF and the VF
dmesg |grep 'Rx Queue' |grep 04:00.0
dmesg |grep 'Rx Queue' |grep 04:10.0
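
The queue count itself can be extracted from those log lines. This is a sketch that assumes the driver logs a line of the form "... 0000:04:00.0: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16"; the exact wording may differ between driver versions, and rx_queue_count is a hypothetical name.

```shell
#!/bin/bash
# rx_queue_count BDF   (hypothetical helper)
# Reads dmesg output on stdin and prints the most recent Rx queue count
# logged for the given PCI address. The log format is an assumption.
rx_queue_count() {
    local bdf="$1"
    grep -F "$bdf" \
        | sed -n 's/.*Rx Queue count = \([0-9][0-9]*\).*/\1/p' \
        | tail -n 1
}

# Example:
#   dmesg | rx_queue_count 04:00.0   # PF
#   dmesg | rx_queue_count 04:10.0   # VF
```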

  • if you configure the VM with the guest kernel driver

Configure SR-IOV Network Virtual Functions in Linux KVM
https://software.intel.com/en-us/articles/configure-sr-iov-network-virtual-functions-in-linux-kvm


  • if you configure the VM with the guest DPDK PMD

http://dpdk.org/doc/guides/nics/intel_vf.html
https://software.intel.com/en-us/articles/sr-iov-and-dpdk-hands-on-labs
https://github.com/intel/SDN-NFV-Hands-on-Samples/blob/master/SR-IOV_DPDK_Hands-on_Lab/docs/SR-IOV-HandsOn-IEEE.pdf


  • use Intel’s VF listing script, listvfs-by-pf.sh
cat listvfs-by-pf.sh
# Copyright (c) 2017 Intel Corporation

# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:

# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
# IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
# CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
# TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

# Author: Clayne B. Robison <clayne dot b dot robison at intel dot com>

#!/bin/bash

NIC_DIR="/sys/class/net"
for i in $( ls $NIC_DIR) ;
do
    if [ -d "${NIC_DIR}/$i/device" -a ! -L "${NIC_DIR}/$i/device/physfn" ]; then
        declare -a VF_PCI_BDF
        declare -a VF_INTERFACE
        k=0
        for j in $( ls "${NIC_DIR}/$i/device" ) ;
        do
            if [[ "$j" == "virtfn"* ]]; then
                VF_PCI=$( readlink "${NIC_DIR}/$i/device/$j" | cut -d '/' -f2 )
                VF_PCI_BDF[$k]=$VF_PCI
                #get the interface name for the VF at this PCI Address
                for iface in $( ls $NIC_DIR );
                do
                    link_dir=$( readlink ${NIC_DIR}/$iface )
                    if [[ "$link_dir" == *"$VF_PCI"* ]]; then
                        VF_INTERFACE[$k]=$iface
                    fi
                done
                ((k++))
            fi
        done
        NUM_VFs=${#VF_PCI_BDF[@]}
        if [[ $NUM_VFs -gt 0 ]]; then
            echo "Physical Function $i has the following virtual functions:"
            echo -e "PCI BDF\t\tInterface"
            echo -e "=======\t\t========="
            for (( l = 0; l < $NUM_VFs; l++ )) ;
            do
                echo -e "${VF_PCI_BDF[$l]}\t${VF_INTERFACE[$l]}"
            done
            unset VF_PCI_BDF
            unset VF_INTERFACE
            echo " "
        fi
    fi
done
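
For scripting, the same PF-to-VF mapping can be printed in a compact tab-separated form. The following is a sketch, not Intel's script: list_vfs is a hypothetical name, it uses shell globbing over the virtfn* links instead of parsing ls output, and it prints only the PF name, the virtfn link name, and the VF's PCI address, without the VF interface name.

```shell
#!/bin/bash
# list_vfs [NET_DIR]   (hypothetical helper)
# Prints one line per VF: PF interface, virtfn link name, VF PCI address.
# NET_DIR defaults to /sys/class/net.
list_vfs() {
    local net_dir="${1:-/sys/class/net}" pf vf
    for pf in "$net_dir"/*; do
        # Skip interfaces without a PCI device and interfaces that are VFs.
        [ -d "$pf/device" ] || continue
        [ -L "$pf/device/physfn" ] && continue
        for vf in "$pf"/device/virtfn*; do
            # If the glob matched nothing, $vf is the literal pattern.
            [ -L "$vf" ] || continue
            printf '%s\t%s\t%s\n' "${pf##*/}" "${vf##*/}" \
                "$(basename "$(readlink "$vf")")"
        done
    done
}
```

Calling list_vfs with no argument scans the live system; passing a directory makes it easy to try against a copy of the sysfs layout.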

  • use Intel’s VF listing script, list-all-vfs.sh
cat list-all-vfs.sh
# Copyright (c) 2017 Intel Corporation

# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:

# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
# IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
# CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
# TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

# Author: Clayne B. Robison <clayne dot b dot robison at intel dot com>

#!/bin/bash

NIC_DIR="/sys/class/net"
declare -a PARENT_PF_PCI_INFO
declare -a VF_INTERFACE_INFO
declare -a PARENT_PF_DESCRIPTION_INFO
num_vfs=0
for i in $( ls $NIC_DIR) ;
do
#   echo "Determining Physfn info for $i"
    if [ -L "${NIC_DIR}/$i/device/physfn" ]; then
        PARENT_PF_PCI=$( readlink "${NIC_DIR}/$i/device/physfn" | cut -d '/' -f2 )
        PARENT_PF_VENDOR=$( lspci -vmmks $PARENT_PF_PCI | grep ^Vendor | cut -f2)
        PARENT_PF_NAME=$( lspci -vmmks $PARENT_PF_PCI | grep ^Device | cut -f2)
        VF_INTERFACE_INFO[$num_vfs]="$i"
        PARENT_PF_PCI_INFO[$num_vfs]="$PARENT_PF_PCI"
        PARENT_PF_DESCRIPTION_INFO[$num_vfs]="$PARENT_PF_VENDOR $PARENT_PF_NAME"
        ((num_vfs++))
    fi
done

if [[ $num_vfs -gt 0 ]]; then
    echo -e "VF Device\tParent PF PCI BDF\tParent PF Description"
    echo -e "=========\t=================\t====================="
    for (( i=0; i < $num_vfs; i++ )) ;
    do
        echo -e "${VF_INTERFACE_INFO[$i]}\t\t${PARENT_PF_PCI_INFO[$i]}\t\t${PARENT_PF_DESCRIPTION_INFO[$i]}"
    done
    echo " "
fi

  • sample VM
<domain type='kvm'>
  <name>xenial-sriov</name>
  <uuid>4c6042ca-dcce-487a-9493-959764a8e2c9</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <shares>2048</shares>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-zesty'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='4' threads='1'/>
    <numa>
      <cell id='0' cpus='0-3' memory='4194304' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/home/mslee/work/CloudRouter/test/xenial-server-cloudimg-amd64-disk1-kairoson-certificate.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:6f:38:be'/>
      <source bridge='virbr0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:27:df:51'/>
      <source network='sr-iov-net1'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:be:d1:d7'/>
      <source network='sr-iov-net2'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes'>
      <listen type='address'/>
      <image compression='off'/>
    </graphics>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='1'/>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='2'/>
    </redirdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </memballoon>
  </devices>
</domain>
