linux/Documentation
David S. Miller 26abf15c49 mlx5-updates-2022-01-06

Merge tag 'mlx5-updates-2022-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2022-01-06

1) Expose FEC per lane block counters via ethtool

2) Trivial fixes/updates/cleanup to mlx5e netdev driver

3) Fix htmldoc build warning

4) Spread mlx5 SFs (sub-functions) to all available CPU cores: Commits 1..5

Shay Drory says:
================
Before this patchset, mlx5 subfunctions shared the same IRQs (MSI-X)
with their peer subfunctions, causing them to use the same CPU cores.

At large scale this is very undesirable: SFs use only a small number of
CPU cores and are all packed onto those same cores, leaving the rest of
the CPU cores in the system unused.

This patchset aims to achieve two things:
 a) Spread the IRQs used by SFs across all CPU cores.
 b) Pack fewer SFs into the same IRQ, which results in multiple IRQs per
    core.

In this patchset, we spread SFs over all online CPUs available to mlx5
IRQs in a round-robin manner. That is, whenever an SF is created, we pick
the next CPU core with the least number of SF IRQs bound to it. SFs share
IRQs on the same core up to a certain limit. When that limit is reached,
we request a new IRQ and add it to that CPU core's IRQ pool. When we run
out of IRQs, we pick any IRQ with the least number of SF users.
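
The selection policy described above can be sketched roughly as follows.
This is only an illustrative model in Python, not the driver code: the
names SfIrqSpreader, Irq, max_irqs and max_sfs_per_irq are hypothetical,
and breaking ties between equally loaded CPUs by rotating a cursor is an
assumption made here to model the "round-robin" wording.

```python
from dataclasses import dataclass


@dataclass
class Irq:
    """Hypothetical stand-in for one MSI-X vector bound to a CPU."""
    cpu: int
    users: int = 0  # number of SFs currently sharing this IRQ


class SfIrqSpreader:
    """Toy model of the SF IRQ spreading policy (not the mlx5 code)."""

    def __init__(self, cpus, max_irqs, max_sfs_per_irq):
        if max_irqs < 1 or max_sfs_per_irq < 1:
            raise ValueError("limits must be at least 1")
        self.cpus = list(cpus)
        self.max_irqs = max_irqs                # assumed global IRQ budget
        self.max_sfs_per_irq = max_sfs_per_irq  # assumed sharing limit
        self.pools = {cpu: [] for cpu in self.cpus}  # per-CPU IRQ pool
        self.cursor = 0                         # round-robin position

    def total_irqs(self):
        return sum(len(pool) for pool in self.pools.values())

    def _next_cpu(self):
        # Among CPUs with the fewest SF IRQs, take the next one after the
        # previously chosen CPU (assumed tie-break).
        fewest = min(len(pool) for pool in self.pools.values())
        n = len(self.cpus)
        for step in range(n):
            idx = (self.cursor + step) % n
            if len(self.pools[self.cpus[idx]]) == fewest:
                self.cursor = (idx + 1) % n
                return self.cpus[idx]

    def add_sf(self):
        cpu = self._next_cpu()
        pool = self.pools[cpu]
        shareable = [irq for irq in pool if irq.users < self.max_sfs_per_irq]
        if shareable:
            # Share an existing IRQ on this core while under the limit.
            irq = min(shareable, key=lambda i: i.users)
        elif self.total_irqs() < self.max_irqs:
            # Limit reached: request a new IRQ for this core's pool.
            irq = Irq(cpu)
            pool.append(irq)
        else:
            # Out of IRQs: fall back to any IRQ with the fewest SF users.
            all_irqs = [i for p in self.pools.values() for i in p]
            irq = min(all_irqs, key=lambda i: i.users)
        irq.users += 1
        return irq


spreader = SfIrqSpreader(cpus=[0, 2, 4], max_irqs=4, max_sfs_per_irq=2)
print([spreader.add_sf().cpu for _ in range(6)])  # [0, 2, 4, 0, 2, 4]
```

With three CPUs and a sharing limit of two, the first six SFs in this
model land on CPUs 0, 2 and 4 in turn, two per IRQ. The seventh SF
triggers a new IRQ on CPU 0, and once the IRQ budget is exhausted later
SFs fall back to the least-used IRQ regardless of CPU.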

This enhancement achieves a better distribution of the SFs over all the
available CPUs, which reduces application latency, as shown below.

Machine details:
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz with 56 cores.
PCI Express 3 with BW of 126 Gb/s.
ConnectX-5 Ex; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0
x16.

Baseline test description:
Single SF on the system. One instance of netperf is running on top of
the SF.
Numbers: latency = 15.136 usec, CPU Util = 35%

Test description:
There are 250 SFs on the system. Three instances of netperf run in
parallel, each on top of a different SF.

Perf numbers:
 # netperf     SFs         latency(usec)     latency    CPU utilization
   affinity    affinity    (lower is better) increase %
 1 cpu=0       cpu={0}     ~23 (app 1-3)     35%        75%
 2 cpu=0,2,4   cpu={0}     app 1: 21.625     30%        68% (CPU 0)
                           app 2-3: 16.5     9%         15% (CPU 2,4)
 3 cpu=0       cpu={0,2,4} app 1: ~16        7%         84% (CPU 0)
                           app 2-3: ~17.9    14%        22% (CPU 2,4)
 4 cpu=0,2,4   cpu={0,2,4} 15.2 (app 1-3)    0%         33% (CPU 0,2,4)

 - The first two entries (#1 and #2) show the current state, i.e. SFs
   are using the same CPU. The last two entries (#3 and #4) show the
   latency reduction achieved by this patchset, i.e. SFs are on
   different CPUs.
 - Whenever several CPUs are used and their utilization differs, the
   utilization of each CPU is listed separately.
 - Whenever the latency results of the netperf instances differ, the
   latency of each netperf instance is listed separately.

Commands:
 - for netperf CPU=0:
$ for i in {1..3}; do taskset -c 0 netperf -H 1${i}.1.1.1 -t TCP_RR  -- \
  -o RT_LATENCY -r8 & done

 - for netperf CPU=0,2,4
$ for i in {1..3}; do taskset -c $(( ($i - 1) * 2  )) netperf -H \
  1${i}.1.1.1 -t TCP_RR  -- -o RT_LATENCY -r8 & done

================

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07 11:10:57 +00:00
ABI f2fs-for-5.16-rc1 2021-11-13 11:20:22 -08:00
accounting
admin-guide Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2021-12-25 13:00:14 -08:00
arm Documentation: arm: marvell: Fix link to armada_1000_pb.pdf document 2021-11-15 02:49:56 -07:00
arm64 arm64: update PAC description for kernel 2021-12-02 10:13:35 +00:00
block This is a relatively unexciting cycle for documentation. 2021-11-02 22:11:39 -07:00
bpf bpf, docs: Fully document the JMP mode modifiers 2022-01-05 13:11:26 -08:00
cdrom
core-api Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
cpu-freq cpufreq: docs: Update core.rst 2021-12-01 20:02:11 +01:00
crypto crypto: engine - Add KPP Support to Crypto Engine 2021-10-29 21:04:03 +08:00
dev-tools Merge branch 'akpm' (patches from Andrew) 2021-11-09 10:11:53 -08:00
devicetree Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-12-30 12:12:12 -08:00
doc-guide docs: Update Sphinx requirements 2021-11-15 02:47:22 -07:00
driver-api cxl for v5.16 2021-11-08 11:49:48 -08:00
fault-injection
fb
features parisc: Move thread_info into task struct 2021-11-01 07:35:59 +01:00
filesystems netfs: Adjust docs after foliation 2021-11-29 10:10:26 -08:00
firmware_class
firmware-guide Documentation: ACPI: Fix non-D0 probe _DSC object example 2021-11-10 13:59:12 +01:00
fpga
gpu drm-misc-next for 5.16: 2021-11-05 13:50:15 +10:00
hid
hwmon Driver core changes for 5.16-rc1 2021-11-04 08:32:38 -07:00
i2c Docs: Fixes link to I2C specification 2021-12-31 14:39:28 +01:00
ia64
ide
iio
infiniband
input
isdn
kbuild Kbuild updates for v5.16 2021-11-08 09:15:45 -08:00
kernel-hacking docs: futex: Fix kernel-doc references 2021-10-19 17:27:05 +02:00
leds leds: add new LED_FUNCTION_PLAYER for player LEDs for game controllers. 2021-10-27 09:49:29 +02:00
litmus-tests
livepatch
locking Documentation/locking/locktypes: Update migrate_disable() bits. 2021-11-30 15:40:31 +01:00
m68k
maintainer docs: use the lore redirector everywhere 2021-10-12 13:58:19 -06:00
mhi
mips
misc-devices
netlabel
networking Documentation: devlink: mlx5.rst: Fix htmldoc build warning 2022-01-06 16:22:55 -08:00
nios2
nvdimm
openrisc
parisc
PCI
pcmcia
power Documentation: power: Describe 'advanced' and 'simple' EM models 2021-11-10 21:26:34 +01:00
powerpc
process Documentation: Add minimum pahole version 2021-11-29 14:48:00 -07:00
RCU
riscv
s390
scheduler
scsi
security net,lsm,selinux: revert the security_sctp_assoc_established() hook 2021-11-14 12:21:53 +00:00
sh
sound ALSA: hda/realtek: Add new alc285-hp-amp-init model 2021-12-14 10:44:26 +01:00
sparc
sphinx
sphinx-static
spi
staging
target
timers
trace docs: ftrace: fix the wrong path of tracefs 2021-11-15 02:50:39 -07:00
translations doc/zh_CN: fix a translation error in management-style 2021-11-15 02:53:30 -07:00
usb
userspace-api Char/Misc driver update for 5.16-rc1 2021-11-04 08:21:47 -07:00
virt Merge branch 'kvm-sev-move-context' into kvm-master 2021-11-11 11:02:58 -05:00
vm mm/migrate.c: remove MIGRATE_PFN_LOCKED 2021-11-11 09:34:35 -08:00
w1
watchdog
x86 - Add the model number of a new, Raptor Lake CPU, to intel-family.h 2021-11-14 09:29:03 -08:00
xtensa
.gitignore
arch.rst
asm-annotations.rst docs: use the lore redirector everywhere 2021-10-12 13:58:19 -06:00
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py docs: conf.py: fix support for Readthedocs v 1.0.0 2021-11-29 14:27:52 -07:00
COPYING-logo
docutils.conf
dontdiff
index.rst
Kconfig
logo.gif
Makefile
memory-barriers.txt
SubmittingPatches
watch_queue.rst