linux

David S. Miller 26abf15c49 Merge tag 'mlx5-updates-2022-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux		2022-01-07 11:10:57 +00:00

Saeed Mahameed says:
====================
mlx5-updates-2022-01-06

1) Expose FEC per lane block counters via ethtool
2) Trivial fixes/updates/cleanup to mlx5e netdev driver
3) Fix htmldoc build warning
4) Spread mlx5 SFs (sub-functions) to all available CPU cores: Commits 1..5

Shay Drory says:
================
Before this patchset, mlx5 subfunctions shared the same IRQs (MSI-X) with their peer subfunctions, causing them to use the same CPU cores. At large scale this is very undesirable: SFs use a small number of CPU cores, and all of them are packed onto the same cores instead of utilizing all CPU cores in the system.

In this patchset we want to achieve two things:
a) Spread the IRQs used by SFs across all CPU cores.
b) Pack fewer SFs onto the same IRQ, which results in multiple IRQs per core.

In this patchset, we spread SFs over all online CPUs available to mlx5 IRQs in a round-robin manner, i.e.: whenever an SF is created, pick the next CPU core with the least number of SF IRQs bound to it. SFs share IRQs on the same core until a certain limit is reached; once it is, we request a new IRQ and add it to that CPU core's IRQ pool. When out of IRQs, pick any IRQ with the least number of SF users.

This enhancement achieves a better distribution of the SFs over all available CPUs, which reduces application latency, as shown below.

Machine details:
Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz with 56 cores.
PCI Express 3 with BW of 126 Gb/s.
ConnectX-5 Ex; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe4.0 x16.

Baseline test description:
A single SF on the system. One instance of netperf is running on top of the SF.
Numbers: latency = 15.136 usec, CPU utilization = 35%

Test description:
There are 250 SFs on the system. There are 3 instances of netperf running in parallel, on top of three different SFs.

Perf numbers:
#  netperf    SFs          latency (usec)     latency     CPU utilization
   affinity   affinity     (lower is better)  increase %
1  cpu=0      cpu={0}      ~23 (app 1-3)      35%         75%
2  cpu=0,2,4  cpu={0}      app 1: 21.625      30%         68% (CPU 0)
                           app 2-3: 16.5      9%          15% (CPU 2,4)
3  cpu=0      cpu={0,2,4}  app 1: ~16         7%          84% (CPU 0)
                           app 2-3: ~17.9     14%         22% (CPU 2,4)
4  cpu=0,2,4  cpu={0,2,4}  15.2 (app 1-3)     0%          33% (CPU 0,2,4)

- The first two entries (#1 and #2) show the current state, i.e., SFs are using the same CPU.
  The last two entries (#3 and #4) show the latency reduction achieved by this patchset, i.e., SFs are on different CPUs.
- Where several CPUs are used and their utilization differs, the utilization of each CPU is listed separately.
- Where the latency results of the netperf instances differ, the latency of each instance is listed separately.

Commands:
- for netperf CPU=0:
$ for i in {1..3}; do taskset -c 0 netperf -H 1${i}.1.1.1 -t TCP_RR -- \
  -o RT_LATENCY -r8 & done

- for netperf CPU=0,2,4:
$ for i in {1..3}; do taskset -c $(( ($i - 1) * 2 )) netperf -H \
  1${i}.1.1.1 -t TCP_RR -- -o RT_LATENCY -r8 & done
================
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
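The SF placement policy described in the commit message can be modelled in a few lines. The following is a minimal Python sketch under stated assumptions, not the mlx5_core implementation: `Irq`, `SfIrqPool` and `bind_sf` are hypothetical names; the CPU count, IRQ budget and per-IRQ sharing limit are made-up parameters; and a core's load is assumed to be the number of SF bindings on its IRQs (the message says "SF IRQs bound to it", which the driver may count differently). Picking the least-loaded core with lowest-index tie-breaking gives the round-robin order the message describes while load is uniform.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Irq:
    cpu: int        # core this IRQ is affined to
    users: int = 0  # number of SFs currently sharing this IRQ


@dataclass
class SfIrqPool:
    ncpus: int       # hypothetical: online CPUs available to the pool
    max_irqs: int    # hypothetical: total IRQ (MSI-X vector) budget for SFs
    sf_per_irq: int  # hypothetical: SFs allowed on one IRQ before a new one is requested
    irqs: List[Irq] = field(default_factory=list)

    def __post_init__(self) -> None:
        if min(self.ncpus, self.max_irqs, self.sf_per_irq) < 1:
            raise ValueError("ncpus, max_irqs and sf_per_irq must all be >= 1")

    def cpu_load(self, cpu: int) -> int:
        # Assumed load metric: SF bindings on the IRQs affined to this core.
        return sum(irq.users for irq in self.irqs if irq.cpu == cpu)

    def least_used_irq(self, cpu: Optional[int] = None) -> Optional[Irq]:
        # Least-used IRQ on `cpu`, or on any core when `cpu` is None.
        candidates = [irq for irq in self.irqs if cpu is None or irq.cpu == cpu]
        return min(candidates, key=lambda irq: irq.users, default=None)

    def bind_sf(self) -> Irq:
        # 1) Pick the least-loaded core; min() keeps the lowest index on ties.
        cpu = min(range(self.ncpus), key=self.cpu_load)
        # 2) Share that core's least-used IRQ while it is under the limit.
        irq = self.least_used_irq(cpu)
        if irq is None or irq.users >= self.sf_per_irq:
            if len(self.irqs) < self.max_irqs:
                # 3) Limit reached (or no IRQ yet): request a new IRQ on this core.
                irq = Irq(cpu)
                self.irqs.append(irq)
            else:
                # 4) Out of IRQs: share the least-used IRQ on any core.
                irq = self.least_used_irq()
        irq.users += 1
        return irq


pool = SfIrqPool(ncpus=4, max_irqs=6, sf_per_irq=2)
print([pool.bind_sf().cpu for _ in range(10)])  # prints [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```

With four cores, six IRQs and a limit of two SFs per IRQ, the first four SFs each get a fresh IRQ, the next four share those IRQs, and the ninth and tenth SFs use up the budget with new IRQs on cores 0 and 1. Any further SF shares whichever IRQ has the fewest users, on any core, which is the out-of-IRQs rule from the message.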
ABI	f2fs-for-5.16-rc1	2021-11-13 11:20:22 -08:00
accounting
admin-guide	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input	2021-12-25 13:00:14 -08:00
arm	Documentation: arm: marvell: Fix link to armada_1000_pb.pdf document	2021-11-15 02:49:56 -07:00
arm64	arm64: update PAC description for kernel	2021-12-02 10:13:35 +00:00
block	This is a relatively unexciting cycle for documentation.	2021-11-02 22:11:39 -07:00
bpf	bpf, docs: Fully document the JMP mode modifiers	2022-01-05 13:11:26 -08:00
cdrom
core-api	Merge branch 'akpm' (patches from Andrew)	2021-11-06 14:08:17 -07:00
cpu-freq	cpufreq: docs: Update core.rst	2021-12-01 20:02:11 +01:00
crypto	crypto: engine - Add KPP Support to Crypto Engine	2021-10-29 21:04:03 +08:00
dev-tools	Merge branch 'akpm' (patches from Andrew)	2021-11-09 10:11:53 -08:00
devicetree	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-12-30 12:12:12 -08:00
doc-guide	docs: Update Sphinx requirements	2021-11-15 02:47:22 -07:00
driver-api	cxl for v5.16	2021-11-08 11:49:48 -08:00
fault-injection
fb
features	parisc: Move thread_info into task struct	2021-11-01 07:35:59 +01:00
filesystems	netfs: Adjust docs after foliation	2021-11-29 10:10:26 -08:00
firmware_class
firmware-guide	Documentation: ACPI: Fix non-D0 probe _DSC object example	2021-11-10 13:59:12 +01:00
fpga
gpu	drm-misc-next for 5.16:	2021-11-05 13:50:15 +10:00
hid
hwmon	Driver core changes for 5.16-rc1	2021-11-04 08:32:38 -07:00
i2c	Docs: Fixes link to I2C specification	2021-12-31 14:39:28 +01:00
ia64
ide
iio
infiniband
input
isdn
kbuild	Kbuild updates for v5.16	2021-11-08 09:15:45 -08:00
kernel-hacking	docs: futex: Fix kernel-doc references	2021-10-19 17:27:05 +02:00
leds	leds: add new LED_FUNCTION_PLAYER for player LEDs for game controllers.	2021-10-27 09:49:29 +02:00
litmus-tests
livepatch
locking	Documentation/locking/locktypes: Update migrate_disable() bits.	2021-11-30 15:40:31 +01:00
m68k
maintainer	docs: use the lore redirector everywhere	2021-10-12 13:58:19 -06:00
mhi
mips
misc-devices
netlabel
networking	Documentation: devlink: mlx5.rst: Fix htmldoc build warning	2022-01-06 16:22:55 -08:00
nios2
nvdimm
openrisc
parisc
PCI
pcmcia
power	Documentation: power: Describe 'advanced' and 'simple' EM models	2021-11-10 21:26:34 +01:00
powerpc
process	Documentation: Add minimum pahole version	2021-11-29 14:48:00 -07:00
RCU
riscv
s390
scheduler
scsi
security	net,lsm,selinux: revert the security_sctp_assoc_established() hook	2021-11-14 12:21:53 +00:00
sh
sound	ALSA: hda/realtek: Add new alc285-hp-amp-init model	2021-12-14 10:44:26 +01:00
sparc
sphinx
sphinx-static
spi
staging
target
timers
trace	docs: ftrace: fix the wrong path of tracefs	2021-11-15 02:50:39 -07:00
translations	doc/zh_CN: fix a translation error in management-style	2021-11-15 02:53:30 -07:00
usb
userspace-api	Char/Misc driver update for 5.16-rc1	2021-11-04 08:21:47 -07:00
virt	Merge branch 'kvm-sev-move-context' into kvm-master	2021-11-11 11:02:58 -05:00
vm	mm/migrate.c: remove MIGRATE_PFN_LOCKED	2021-11-11 09:34:35 -08:00
w1
watchdog
x86	- Add the model number of a new, Raptor Lake CPU, to intel-family.h	2021-11-14 09:29:03 -08:00
xtensa
.gitignore
arch.rst
asm-annotations.rst	docs: use the lore redirector everywhere	2021-10-12 13:58:19 -06:00
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py	docs: conf.py: fix support for Readthedocs v 1.0.0	2021-11-29 14:27:52 -07:00
COPYING-logo
docutils.conf
dontdiff
index.rst
Kconfig
logo.gif
Makefile
memory-barriers.txt
SubmittingPatches
watch_queue.rst