Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:
@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)
@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@
- RAND = get_random_u32();
... when != RAND
- RAND %= (E);
+ RAND = prandom_u32_max(E);
// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@
((T)get_random_u32()@p & (LITERAL))
// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@
value = None
if literal.startswith('0x'):
value = int(literal, 16)
elif literal[0] in '123456789':
value = int(literal, 10)
if value is None:
print("I don't know how to handle %s" % (literal))
cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
print("Skipping 0x%x for cleanup elsewhere" % (value))
cocci.include_match(False)
elif value & (value + 1) != 0:
print("Skipping 0x%x because it's not a power of two minus one" % (value))
cocci.include_match(False)
elif literal.startswith('0x'):
coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@
- (FUNC()@p & (LITERAL))
+ prandom_u32_max(RESULT)
@collapse_ret@
type T;
identifier VAR;
expression E;
@@
{
- T VAR;
- VAR = (E);
- return VAR;
+ return E;
}
@drop_var@
type T;
identifier VAR;
@@
{
- T VAR;
... when != VAR
}
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
ib_dma_map_sg() augments the SGL into a 'dma mapped SGL'. This process
may change the number of entries and the lengths of each entry.
Code that touches dma_address is iterating over the 'dma mapped SGL'
and must use dma_nents which returned from ib_dma_map_sg().
We should use the return count from ib_dma_map_sg for futher usage.
Fixes: 9cb8374804 ("RDMA/rtrs: server: main functionality")
Link: https://lore.kernel.org/r/20220818105355.110344-4-haris.iqbal@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
When iommu is enabled, we hit warnings like this:
WARNING: at rtrs/rtrs.c:178 rtrs_iu_post_rdma_write_imm+0x9b/0x110
rtrs warn on one sge entry length is 0, which is unexpected.
The problem is ib_dma_map_sg augments the SGL into a 'dma mapped SGL'.
This process may change the number of entries and the lengths of each
entry.
Code that touches dma_address is iterating over the 'dma mapped SGL'
and must use dma_nents which returned from ib_dma_map_sg().
So pass the count return from ib_dma_map_sg.
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20220818105355.110344-3-haris.iqbal@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Add event tracing mechanism for following routines:
- send_io_resp_imm()
How to use:
1. Load the rtrs_server module
2. cd /sys/kernel/debug/tracing
3. If all the events need to be enabled:
echo 1 > events/rtrs_srv/enable
4. OR only speific routine/event needs to be enabled e.g.
echo 1 > events/rtrs_srv/send_io_resp_imm/enable
5. cat trace
6. Run some I/O workload which can trigger send_io_resp_imm()
Link: https://lore.kernel.org/r/20220818105240.110234-3-haris.iqbal@ionos.com
Signed-off-by: Santosh Pradhan <santosh.pradhan@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Add event tracing mechanism for following routines:
- rtrs_clt_reconnect_work()
- rtrs_clt_close_conns()
- rtrs_rdma_error_recovery()
How to use:
1. Load the rtrs_client module
2. cd /sys/kernel/debug/tracing
3. If all the events need to be enabled:
echo 1 > events/rtrs_clt/enable
4. OR only speific routine/event needs to be enabled e.g.
echo 1 > events/rtrs_clt/rtrs_clt_close_conns/enable
5. cat trace
6. Run some workload which can trigger rtrs_clt_close_conns()
Link: https://lore.kernel.org/r/20220818105240.110234-2-haris.iqbal@ionos.com
Signed-off-by: Santosh Pradhan <santosh.pradhan@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
removes list_next_or_null_rr_rcu macro to fix below warnings.
That macro is used only twice.
CHECK:MACRO_ARG_REUSE: Macro argument reuse 'head' - possible side-effects?
CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ptr' - possible side-effects?
CHECK:MACRO_ARG_REUSE: Macro argument reuse 'memb' - possible side-effects?
Replaces that macro with an inline function.
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Cc: jinpu.wang@ionos.com
Link: https://lore.kernel.org/r/20220712103113.617754-5-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Add the description of @pathname and remove @sessname in rtrs_clt_open()
kernel-doc comment to remove warnings found by running scripts/kernel-doc,
which is caused by using 'make W=1'.
drivers/infiniband/ulp/rtrs/rtrs-clt.c:2809: warning: Function parameter or member 'pathname' not described in 'rtrs_clt_open'
drivers/infiniband/ulp/rtrs/rtrs-clt.c:2809: warning: Excess function parameter 'sessname' description in 'rtrs_clt_open'
Link: https://lore.kernel.org/r/20220526130945.98601-1-yang.lee@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull rdma updates from Jason Gunthorpe:
- Minor bug fixes in mlx5, mthca, pvrdma, rtrs, mlx4, hfi1, hns
- Minor cleanups: coding style, useless includes and documentation
- Reorganize how multicast processing works in rxe
- Replace a red/black tree with xarray in rxe which improves performance
- DSCP support and HW address handle re-use in irdma
- Simplify the mailbox command handling in hns
- Simplify iser now that FMR is eliminated
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (93 commits)
RDMA/nldev: Prevent underflow in nldev_stat_set_counter_dynamic_doit()
IB/iser: Fix error flow in case of registration failure
IB/iser: Generalize map/unmap dma tasks
IB/iser: Use iser_fr_desc as registration context
IB/iser: Remove iser_reg_data_sg helper function
RDMA/rxe: Use standard names for ref counting
RDMA/rxe: Replace red-black trees by xarrays
RDMA/rxe: Shorten pool names in rxe_pool.c
RDMA/rxe: Move max_elem into rxe_type_info
RDMA/rxe: Replace obj by elem in declaration
RDMA/rxe: Delete _locked() APIs for pool objects
RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
RDMA/rxe: Replace mr by rkey in responder resources
RDMA/rxe: Fix ref error in rxe_av.c
RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT
RDMA/irdma: Add support for address handle re-use
RDMA/qib: Fix typos in comments
RDMA/mlx5: Fix memory leak in error flow for subscribe event routine
Revert "RDMA/core: Fix ib_qp_usecnt_dec() called when error"
RDMA/rxe: Remove useless argument for update_state()
...
Error path of rtrs_clt_open() calls free_clt(), where free_permit is
called. This is wrong since error path of rtrs_clt_open() does not need
to call free_permit().
Also, moving free_permits() call to rtrs_clt_close(), makes it more
aligned with the call to alloc_permit() in rtrs_clt_open().
Fixes: 6a98d71dae ("RDMA/rtrs: client: main functionality")
Link: https://lore.kernel.org/r/20220217030929.323849-2-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Callback function rtrs_clt_dev_release() for put_device() calls kfree(clt)
to free memory. We shouldn't call kfree(clt) again, and we can't use the
clt after kfree too.
Replace device_register() with device_initialize() and device_add() so that
dev_set_name can() be used appropriately.
Move mutex_destroy() to the release function so it can be called in
the alloc_clt err path.
Fixes: eab0982466 ("RDMA/rtrs-clt: Refactor the failure cases in alloc_clt")
Link: https://lore.kernel.org/r/20220217030929.323849-1-haris.iqbal@ionos.com
Reported-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
rtrs_clt_sess is used for paths and not sessions on the client side. This
creates confusion so let's rename it to rtrs_clt_path. Also, rename
related variables and functions.
Coccinelle is used to do the transformations for most of the occurrences
and remaining ones were handled manually.
Link: https://lore.kernel.org/r/20220105180708.7774-4-jinpu.wang@ionos.com
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
rtrs_srv_sess is used for paths and not sessions on the server side. This
creates confusion so let's rename it to rtrs_srv_path. Also, rename
related variables and functions.
Coccinelle is used to do the transformations for most of the occurrences
and remaining ones were handled manually.
Link: https://lore.kernel.org/r/20220105180708.7774-3-jinpu.wang@ionos.com
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
With preemption enabled (CONFIG_DEBUG_PREEMPT=y), the following appeared
when rnbd client tries to map remote block device.
BUG: using smp_processor_id() in preemptible [00000000] code: bash/1733
caller is debug_smp_processor_id+0x17/0x20
CPU: 0 PID: 1733 Comm: bash Not tainted 5.16.0-rc1 #5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x78
dump_stack+0x10/0x12
check_preemption_disabled+0xe4/0xf0
debug_smp_processor_id+0x17/0x20
rtrs_clt_update_all_stats+0x3b/0x70 [rtrs_client]
rtrs_clt_read_req+0xc3/0x380 [rtrs_client]
? rtrs_clt_init_req+0xe3/0x120 [rtrs_client]
rtrs_clt_request+0x1a7/0x320 [rtrs_client]
? 0xffffffffc0ab1000
send_usr_msg+0xbf/0x160 [rnbd_client]
? rnbd_clt_put_sess+0x60/0x60 [rnbd_client]
? send_usr_msg+0x160/0x160 [rnbd_client]
? sg_alloc_table+0x27/0xb0
? sg_zero_buffer+0xd0/0xd0
send_msg_sess_info+0xe9/0x180 [rnbd_client]
? rnbd_clt_put_sess+0x60/0x60 [rnbd_client]
? blk_mq_alloc_tag_set+0x2ef/0x370
rnbd_clt_map_device+0xba8/0xcd0 [rnbd_client]
? send_msg_open+0x200/0x200 [rnbd_client]
rnbd_clt_map_device_store+0x3e5/0x620 [rnbd_client
To supress the calltrace, let's call get_cpu_ptr/put_cpu_ptr pair in
rtrs_clt_update_rdma_stats to disable preemption when accessing per-cpu
variable.
While at it, let's make the similar change in rtrs_clt_update_wc_stats.
And for rtrs_clt_inc_failover_cnt, though it was only called inside rcu
section, but it still can be preempted in case CONFIG_PREEMPT_RCU is
enabled, so change it to {get,put}_cpu_ptr pair either.
Link: https://lore.kernel.org/r/20211128133501.38710-1-guoqing.jiang@linux.dev
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When testing with poll mode, it will fail and lead to warning below on
client side:
$ echo "sessname=bla path=gid:fe80::2:c903:4e:d0b3@gid:fe80::2:c903:8:ca17 device_path=/dev/nullb2 nr_poll_queues=-1" | \
sudo tee /sys/devices/virtual/rnbd-client/ctl/map_device
rnbd_client L597: Mapping device /dev/nullb2 on session bla, (access_mode: rw, nr_poll_queues: 8)
WARNING: CPU: 3 PID: 9886 at drivers/infiniband/core/cq.c:447 ib_cq_pool_get+0x26f/0x2a0 [ib_core]
The problem is in case of poll queue, we need to still call
ib_alloc_cq/ib_free_cq, we can't use cq_poll api for poll queue.
As both client and server use shared function from rtrs, set irq_con_num
to con_num on server side, which is number of total connection of the
session, this way we can differ if the rtrs_con requires pollqueue.
Following up patches will replace the duplicate code with helpers.
Link: https://lore.kernel.org/r/20210922125333.351454-4-haris.iqbal@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
There are mis-match at counting inflight IO after changing the multipath
policy.
For example, we started fio test with round-robin policy and then we
changed the policy to min-inflight. IOs created under the RR policy is
finished under the min-inflight policy and inflight counter only
decreased. So the counter would be negative value. And also we started
fio test with min-inflight policy and changed the policy to the
round-robin. IOs created under the min-inflight policy increased the
inflight IO counter but the inflight IO counter was not decreased because
the policy was the round-robin when IO was finished.
So it should count IOs only if the IO is created under the min-inflight
policy. It should not care the policy when the IO is finished.
This patch adds a field mp_policy in struct rtrs_clt_io_req and stores the
multipath policy when an object of rtrs_clt_io_req is created. Then
rtrs-clt checks the mp_policy of only struct rtrs_clt_io_req instead of
the struct rtrs_clt.
Link: https://lore.kernel.org/r/20210806112112.124313-6-haris.iqbal@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The IO performance test with fio after swapping the likely and unlikely
macros in all if-statement shows no difference. They do not help for the
performance of rtrs.
Thanks to Haakon Bugge for the test scenario.
The fio test did random read on 32 rnbd devices and 64 processes.
Test environment:
- Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
- 376G memory
- kernel version: 5.4.86
- gcc version: gcc (Debian 8.3.0-6) 8.3.0
- Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
Test result:
- before swapping: IOPS=829k, BW=3239MiB/s
- after swapping: IOPS=829k, BW=3238MiB/s
- remove all (un)likely: IOPS=829k, BW=3238MiB/s
Link: https://lore.kernel.org/r/20210806112112.124313-5-haris.iqbal@ionos.com
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>