linux/fs/proc
Eric W. Biederman 63f818f46a proc: Use a dedicated lock in struct pid
syzbot wrote:
> ========================================================
> WARNING: possible irq lock inversion dependency detected
> 5.6.0-syzkaller #0 Not tainted
> --------------------------------------------------------
> swapper/1/0 just changed the state of lock:
> ffffffff898090d8 (tasklist_lock){.+.?}-{2:2}, at: send_sigurg+0x9f/0x320 fs/fcntl.c:840
> but this lock took another, SOFTIRQ-unsafe lock in the past:
>  (&pid->wait_pidfd){+.+.}-{2:2}
>
>
> and interrupts could create inverse lock ordering between them.
>
>
> other info that might help us debug this:
>  Possible interrupt unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&pid->wait_pidfd);
>                                local_irq_disable();
>                                lock(tasklist_lock);
>                                lock(&pid->wait_pidfd);
>   <Interrupt>
>     lock(tasklist_lock);
>
>  *** DEADLOCK ***
>
> 4 locks held by swapper/1/0:

The problem is that because wait_pidfd.lock is taken under the tasklist
lock.  It must always be taken with irqs disabled as tasklist_lock can be
taken from interrupt context and if wait_pidfd.lock was already taken this
would create a lock order inversion.

Oleg suggested just disabling irqs where I have added extra calls to
wait_pidfd.lock.  That should be safe and I think the code will eventually
do that.  It was rightly pointed out by Christian that sharing the
wait_pidfd.lock was a premature optimization.

It is also true that my pre-merge window testing was insufficient.  So
remove the premature optimization and give struct pid a dedicated lock of
it's own for struct pid things.  I have verified that lockdep sees all 3
paths where we take the new pid->lock and lockdep does not complain.

It is my current day dream that one day pid->lock can be used to guard the
task lists as well and then the tasklist_lock won't need to be held to
deliver signals.  That will require taking pid->lock with irqs disabled.

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/lkml/00000000000011d66805a25cd73f@google.com/
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Reported-by: syzbot+343f75cdeea091340956@syzkaller.appspotmail.com
Reported-by: syzbot+832aabf700bc3ec920b9@syzkaller.appspotmail.com
Reported-by: syzbot+f675f964019f884dbd0f@syzkaller.appspotmail.com
Reported-by: syzbot+a9fb1457d720a55d6dc5@syzkaller.appspotmail.com
Fixes: 7bc3e6e55a ("proc: Use a list of inodes to flush from proc")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-04-09 12:15:35 -05:00
..
array.c time: Rename tsk->real_start_time to ->start_boottime 2019-11-13 11:09:49 +01:00
base.c proc: Use a dedicated lock in struct pid 2020-04-09 12:15:35 -05:00
bootconfig.c proc: bootconfig: Add /proc/bootconfig to show boot config list 2020-01-13 13:19:39 -05:00
cmdline.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
consoles.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 191 2019-05-30 11:29:21 -07:00
cpuinfo.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
devices.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
fd.c proc: use "unsigned int" in proc_fill_cache() 2018-06-07 17:34:38 -07:00
fd.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
generic.c proc: decouple proc from VFS with "struct proc_ops" 2020-02-04 03:05:26 +00:00
inode.c proc: Use a list of inodes to flush from proc 2020-02-24 10:14:44 -06:00
internal.h proc: Use a list of inodes to flush from proc 2020-02-24 10:14:44 -06:00
interrupts.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
Kconfig x86/resctrl: Add task resctrl information display 2020-01-20 16:19:10 +01:00
kcore.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
kmsg.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
loadavg.c sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD 2018-10-26 16:26:32 -07:00
Makefile proc: bootconfig: Add /proc/bootconfig to show boot config list 2020-01-13 13:19:39 -05:00
meminfo.c proc/meminfo: fix output alignment 2019-10-19 06:32:32 -04:00
namespaces.c Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-01-29 11:20:24 -08:00
nommu.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
page.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
proc_net.c proc: decouple proc from VFS with "struct proc_ops" 2020-02-04 03:05:26 +00:00
proc_sysctl.c proc: Use d_invalidate in proc_prune_siblings_dcache 2020-02-24 09:50:04 -06:00
proc_tty.c tty: replace ->proc_fops with ->proc_show 2018-05-16 07:24:30 +02:00
root.c proc: Remove the now unnecessary internal mount of proc 2020-02-28 12:06:14 -06:00
self.c fs/proc/self.c: code cleanup for proc_setup_self() 2019-03-05 21:07:21 -08:00
softirqs.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
stat.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
task_mmu.c mm: pagewalk: add 'depth' parameter to pte_hole 2020-02-04 03:05:25 +00:00
task_nommu.c proc: use down_read_killable mmap_sem for /proc/pid/maps 2019-07-12 11:05:46 -07:00
thread_self.c fs/proc/thread_self.c: code cleanup for proc_setup_thread_self() 2019-03-05 21:07:21 -08:00
uptime.c fs/proc: Respect boottime inside time namespace for /proc/uptime 2020-01-14 12:20:56 +01:00
util.c fs/proc/util.c: include fs/proc/internal.h for name_to_int() 2019-01-04 13:13:45 -08:00
version.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
vmcore.c proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00