linux/net/core
Willy Tarreau 6922110d15 net: linkwatch: fix failure to restore device state across suspend/resume
After migrating my laptop from 4.19-LTS to 5.4-LTS a while ago I noticed
that my Ethernet port to which a bond and a VLAN interface are attached
appeared to remain up after resuming from suspend with the cable unplugged
(and that problem still persists with 5.10-LTS).

The sequence of events is as follows:

  - the network driver (e1000e here) prepares to suspend, calls e1000e_down()
    which calls netif_carrier_off() to signal that the link is going down.
  - netif_carrier_off() adds a link_watch event to the list of events for
    this device
  - the device is completely stopped.
  - the machine suspends
  - the cable is unplugged and the machine brought to another location
  - the machine is resumed
  - the queued linkwatch events are processed for the device
  - the device doesn't yet have the __LINK_STATE_PRESENT bit and its events
    are silently dropped
  - the device is resumed with its link down
  - the upper VLAN and bond interfaces are never notified that the link had
    been turned down and remain up
  - the only way to provoke a change is to physically connect the machine
    to a port and possibly unplug it.

The state after resume looks like this:
  $ ip -br li | egrep 'bond|eth'
  bond0            UP             e8:6a:64:64:64:64 <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP>
  eth0             DOWN           e8:6a:64:64:64:64 <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP>
  eth0.2@eth0      UP             e8:6a:64:64:64:64 <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP>

Placing an explicit call to netdev_state_change() either in the suspend
or the resume code in the NIC driver worked around this but the solution
is not satisfying.

The real issue is in link_watch, which loses events that it ought not
to lose. The test for the device being present was added by commit
124eee3f69 ("net: linkwatch: add check for netdevice being present to
linkwatch_do_dev") in 4.20 to avoid accessing devices that are not
present.

Instead of dropping events, this patch proceeds slightly differently by
postponing their handling so that they happen after the device is fully
resumed.
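
The difference between dropping and postponing can be sketched with a
small user-space model. This is only an illustration, not the kernel
code: struct sim_dev and all sim_* functions are hypothetical names, and
a single "event_pending" flag stands in for the linkwatch event list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a network device. */
struct sim_dev {
	bool present;       /* models the __LINK_STATE_PRESENT bit */
	bool carrier_ok;    /* carrier state as set by the driver */
	bool upper_carrier; /* state last propagated to upper devices */
	bool event_pending; /* models a queued linkwatch event */
};

/* Driver side: record carrier loss and queue an event for the device. */
static void sim_carrier_off(struct sim_dev *dev)
{
	dev->carrier_ok = false;
	dev->event_pending = true;
}

/* Dropping variant: the event is consumed even if the device is absent. */
static void sim_run_queue_dropping(struct sim_dev *dev)
{
	if (!dev->event_pending)
		return;
	dev->event_pending = false;
	if (!dev->present)
		return; /* event silently lost */
	dev->upper_carrier = dev->carrier_ok;
}

/* Postponing variant: the event stays queued until the device is present. */
static void sim_run_queue_postponing(struct sim_dev *dev)
{
	if (!dev->event_pending)
		return;
	if (!dev->present)
		return; /* keep the event for a later run */
	dev->event_pending = false;
	dev->upper_carrier = dev->carrier_ok;
}

/*
 * Replay the suspend/resume sequence described above and return the
 * carrier state that the upper devices (bond, VLAN) end up seeing.
 */
static bool sim_suspend_resume(void (*run_queue)(struct sim_dev *))
{
	struct sim_dev dev = {
		.present = true,
		.carrier_ok = true,
		.upper_carrier = true,
		.event_pending = false,
	};

	sim_carrier_off(&dev); /* driver prepares to suspend */
	dev.present = false;   /* device is completely stopped */
	run_queue(&dev);       /* queue runs before the device is back */
	dev.present = true;    /* device is resumed with its link down */
	run_queue(&dev);       /* a later queue run */

	assert(!dev.carrier_ok); /* the lower device itself is down */
	return dev.upper_carrier;
}
```

In this model, sim_suspend_resume(sim_run_queue_dropping) returns true
(the upper devices still believe the link is up), while
sim_suspend_resume(sim_run_queue_postponing) returns false.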

Fixes: 124eee3f69 ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev")
Link: https://lists.openwall.net/netdev/2018/03/15/62
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20210809160628.22623-1-w@1wt.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-11 14:43:16 -07:00
bpf_sk_storage.c bpf: Use struct_size() in kzalloc() 2021-05-13 15:58:00 -07:00
datagram.c
datagram.h
dev_addr_lists.c net: core: Correct function name dev_uc_flush() in the kerneldoc 2021-03-28 17:56:56 -07:00
dev_ioctl.c
dev.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 2021-07-15 14:39:45 -07:00
devlink.c devlink: Fix phys_port_name of virtual port and merge error 2021-07-25 10:44:54 +01:00
drop_monitor.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-03-25 15:31:22 -07:00
dst_cache.c
dst.c net, bpf: Fix ip6ip6 crash with collect_md populated skbs 2021-03-10 12:24:18 -08:00
failover.c
fib_notifier.c
fib_rules.c fib: Return the correct errno code 2021-06-03 15:13:56 -07:00
filter.c Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
flow_dissector.c net: let flow have same hash in two directions 2021-07-28 12:54:06 +01:00
flow_offload.c
gen_estimator.c
gen_stats.c
gro_cells.c
hwbm.c
link_watch.c net: linkwatch: fix failure to restore device state across suspend/resume 2021-08-11 14:43:16 -07:00
lwt_bpf.c
lwtunnel.c
Makefile net: selftest: fix build issue if INET is disabled 2021-04-28 14:06:45 -07:00
neighbour.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-06-18 19:47:02 -07:00
net_namespace.c net: inline function get_net_ns_by_fd if NET_NS is disabled 2021-06-15 11:00:45 -07:00
net-procfs.c net: move the ptype_all and ptype_base declarations to include/linux/netdevice.h 2021-03-22 13:14:45 -07:00
net-sysfs.c net-sysfs: remove possible sleep from an RCU read-side critical section 2021-03-22 13:28:13 -07:00
net-sysfs.h
net-traces.c tcp: add tracepoint for checksum errors 2021-05-14 15:26:03 -07:00
netclassid_cgroup.c
netevent.c net: core: Correct function name netevent_unregister_notifier() in the kerneldoc 2021-03-28 17:56:56 -07:00
netpoll.c asm-generic/unaligned: Unify asm/unaligned.h around struct helper 2021-07-02 12:43:40 -07:00
netprio_cgroup.c
page_pool.c page_pool: mask the page->signature before the checking 2021-08-09 10:03:02 +01:00
pktgen.c pktgen: add pktgen_handle_all_threads() for the same code 2021-06-07 13:15:31 -07:00
ptp_classifier.c
request_sock.c
rtnetlink.c net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del} 2021-06-29 11:31:57 -07:00
scm.c scm: fix a typo in put_cmsg() 2021-04-16 11:41:07 -07:00
secure_seq.c
selftests.c net: add generic selftest support 2021-04-20 16:08:02 -07:00
skbuff.c net: Fix zero-copy head len calculation. 2021-07-18 09:42:17 -07:00
skmsg.c bpf, sockmap: Fix memleak on ingress msg enqueue 2021-07-27 14:55:30 -07:00
sock_diag.c
sock_map.c bpf: Fix integer overflow in argument calculation for bpf_map_area_alloc 2021-06-22 10:14:29 -07:00
sock_reuseport.c tcp: Add stats for socket migration. 2021-06-23 12:56:08 -07:00
sock.c sock: unlock on error in sock_setsockopt() 2021-07-07 20:49:12 -07:00
stream.c
sysctl_net_core.c net: change netdev_unregister_timeout_secs min value to 1 2021-03-25 17:24:06 -07:00
timestamping.c
tso.c
utils.c
xdp.c xdp: Move the rxq_info.mem clearing to unreg_mem_model() 2021-06-28 23:07:59 +02:00