linux/drivers/base
Luis R. Rodriguez e44565f62a firmware: fix batched requests - wake all waiters
The firmware cache mechanism serves two purposes, the secondary purpose is
not well documented nor understood. This fixes a regression with the
secondary purpose of the firmware cache mechanism: batched requests on
successful lookups. Without this fix *any* time a batched request is
triggered, secondary requests for which the batched request mechanism
was designed for will seem to last forver and seem to never return.
This issue is present for all kernel builds possible, and a hard reset
is required.

The firmware cache is used for:

1) Addressing races with file lookups during the suspend/resume cycle
   by keeping firmware in memory during the suspend/resume cycle

2) Batched requests for the same file rely only on work from the first file
   lookup, which keeps the firmware in memory until the last
   release_firmware() is called

Batched requests *only* take effect if secondary requests come in prior to
the first user calling release_firmware(). The devres name used for the
internal firmware cache is used as a hint other pending requests are
ongoing, the firmware buffer data is kept in memory until the last user of
the buffer calls release_firmware(), therefore serializing requests and
delaying the release until all requests are done.

Batched requests wait for a wakup or signal so we can rely on the first file
fetch to write to the pending secondary requests. Commit 5b02962494
("firmware: do not use fw_lock for fw_state protection") ported the firmware
API to use swait, and in doing so failed to convert complete_all() to
swake_up_all() -- it used swake_up(), loosing the ability for *some* batched
requests to take effect.

We *could* fix this by just using swake_up_all() *but* swait is now known
to be very special use case, so its best to just move away from it. So we
just go back to using completions as before commit 5b02962494 ("firmware:
do not use fw_lock for fw_state protection") given this was using
complete_all().

Without this fix it has been reported plugging in two Intel 6260 Wifi cards
on a system will end up enumerating the two devices only 50% of the time
[0]. The ported swake_up() should have actually handled the case with two
devices, however, *if more than two cards are used* the swake_up() would
not have sufficed. This change is only part of the required fixes for
batched requests. Another fix is provided in the next patch.

This particular change should fix the cases where more than three requests
with the same firmware name is used, otherwise batched requests will wait
for MAX_SCHEDULE_TIMEOUT and just timeout eventually.

Below is a summary of tests triggering batched requests on different
kernel builds.

Before this patch:
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=y

Most common Linux distribution setup.

API-type                               no-firmware-found   firmware-found
----------------------------------------------------------------------
request_firmware()                     FAIL                FAIL
request_firmware_direct()              FAIL                FAIL
request_firmware_nowait(uevent=true)   FAIL                FAIL
request_firmware_nowait(uevent=false)  FAIL                FAIL
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=n

Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.

API-type                               no-firmware-found   firmware-found
----------------------------------------------------------------------
request_firmware()                     FAIL                FAIL
request_firmware_direct()              FAIL                FAIL
request_firmware_nowait(uevent=true)   FAIL                FAIL
request_firmware_nowait(uevent=false)  FAIL                FAIL
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_FW_LOADER_USER_HELPER=y

Google Android setup.

API-type                               no-firmware-found   firmware-found
----------------------------------------------------------------------
request_firmware()                     FAIL                FAIL
request_firmware_direct()              FAIL                FAIL
request_firmware_nowait(uevent=true)   FAIL                FAIL
request_firmware_nowait(uevent=false)  FAIL                FAIL
============================================================================

After this patch:
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=y

Most common Linux distribution setup.

API-type                               no-firmware-found   firmware-found
----------------------------------------------------------------------
request_firmware()                     FAIL                OK
request_firmware_direct()              FAIL                OK
request_firmware_nowait(uevent=true)   FAIL                OK
request_firmware_nowait(uevent=false)  FAIL                OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
CONFIG_FW_LOADER_USER_HELPER=n

Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.

API-type                               no-firmware-found   firmware-found
----------------------------------------------------------------------
request_firmware()                     FAIL                OK
request_firmware_direct()              FAIL                OK
request_firmware_nowait(uevent=true)   FAIL                OK
request_firmware_nowait(uevent=false)  FAIL                OK
============================================================================
CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
CONFIG_FW_LOADER_USER_HELPER=y

Google Android setup.

API-type                               no-firmware-found   firmware-found
----------------------------------------------------------------------
request_firmware()                     OK                  OK
request_firmware_direct()              FAIL                OK
request_firmware_nowait(uevent=true)   OK                  OK
request_firmware_nowait(uevent=false)  OK                  OK
============================================================================

[0] https://bugzilla.kernel.org/show_bug.cgi?id=195477

CC: <stable@vger.kernel.org>    [4.10+]
Cc: Ming Lei <ming.lei@redhat.com>
Fixes: 5b02962494 ("firmware: do not use fw_lock for fw_state protection")
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-10 13:54:16 -07:00
..
power Merge branches 'pm-qos' and 'pm-devfreq' 2017-07-14 13:16:31 +02:00
regmap Merge remote-tracking branches 'regmap/topic/1wire', 'regmap/topic/irq' and 'regmap/topic/lzo' into regmap-next 2017-07-03 16:20:28 +01:00
test driver core: test_async: fix up typo found by 0-day 2016-11-29 22:06:42 +01:00
arch_topology.c arm,arm64,drivers: add a prefix to drivers arch_topology interfaces 2017-06-03 19:10:09 +09:00
attribute_container.c attribute_container: Fix typo 2016-08-31 15:13:56 +02:00
base.h Revert "driver core: Add deferred_probe attribute to devices in sysfs" 2017-01-14 14:09:03 +01:00
bus.c driver-core: remove struct bus_type.dev_attrs 2017-06-12 16:18:37 +02:00
cacheinfo.c Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-22 09:25:45 -08:00
class.c driver core: remove class_attrs from struct class 2017-06-09 10:41:00 +02:00
component.c Merge 4.5-rc4 into driver-core-next 2016-02-14 14:29:55 -08:00
container.c
core.c Add "shutdown" to "struct class". 2017-07-07 09:49:24 +10:00
cpu.c CPU / PM: expose pm_qos_resume_latency for CPUs 2017-01-30 11:05:29 +01:00
dd.c of/acpi: Configure dma operations at probe time for platform/amba/pci bus devices 2017-04-20 16:31:06 +02:00
devcoredump.c driver core: devcoredump: convert to use class_groups 2016-11-29 21:12:12 +01:00
devres.c devres: add devm_alloc_percpu() 2016-11-15 22:34:25 -05:00
devtmpfs.c statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
dma-coherent.c drivers: dma-coherent: Introduce default DMA pool 2017-06-28 06:55:03 -07:00
dma-contiguous.c cma: Store a name in the cma structure 2017-04-18 20:41:12 +02:00
dma-mapping.c This is the first pull request for the new dma-mapping subsystem 2017-07-06 19:20:54 -07:00
driver.c driver core: add missing blank line after declaration 2015-03-25 14:36:30 +01:00
firmware_class.c firmware: fix batched requests - wake all waiters 2017-08-10 13:54:16 -07:00
firmware.c
hypervisor.c
init.c drivers: of/base: move of_init to driver_init 2015-05-26 19:55:56 -07:00
isa.c isa: Call isa_bus_init before dependent ISA bus drivers register 2016-06-17 20:47:11 -07:00
Kconfig arm, arm64: factorize common cpu capacity default code 2017-06-03 19:10:09 +09:00
Makefile arm, arm64: factorize common cpu capacity default code 2017-06-03 19:10:09 +09:00
map.c drivers: base: map: Use kmalloc_array instead of kmalloc 2015-03-25 14:35:08 +01:00
memory.c mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone 2017-07-06 16:24:32 -07:00
module.c base: make module_create_drivers_dir race-free 2016-06-15 19:21:31 -07:00
node.c mm: drop useless local parameters of __register_one_node() 2017-07-10 16:32:32 -07:00
pinctrl.c driver core: fix automatic pinctrl management 2017-06-13 11:07:32 +02:00
platform-msi.c irqchip/MSI: Use irq_domain_update_bus_token instead of an open coded access 2017-06-22 18:29:17 +02:00
platform.c This is the bulk of GPIO changes for the v4.13 series: 2017-07-07 12:40:27 -07:00
property.c device property: Introduce fwnode_call_bool_op() for ops that return bool 2017-07-12 13:32:46 +02:00
soc.c base: soc: Allow early registration of a single SoC device 2017-03-29 21:43:26 +02:00
syscore.c
topology.c drivers base/topology: Convert to hotplug state machine 2016-11-09 23:45:29 +01:00
transport_class.c