free_msi_irqs() frees the MSI entries before destroying the sysfs entries
which are exposing them. Nothing prevents a concurrent free while a sysfs
file is read and accesses the possibly freed entry.
Move the sysfs release ahead of freeing the entries.
Fixes: 1c51b50c29 ("PCI/MSI: Export MSI mode using attributes, not kobjects")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx
It appears that some devices are lying about their mask capability,
pretending that they don't have it, while they actually do.
The net result is that now that we don't enable MSIs on such
endpoint.
Add a new per-device flag to deal with this. Further patches will
make use of it, sadly.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20211104180130.3825416-2-maz@kernel.org
Cc: Bjorn Helgaas <helgaas@kernel.org>
The recent rework of PCI/MSI[X] masking moved the non-mask checks from the
low level accessors into the higher level mask/unmask functions.
This missed the fact that these accessors can be invoked from other places
as well. The missing checks break XEN-PV which sets pci_msi_ignore_mask and
also violates the virtual MSIX and the msi_attrib.maskbit protections.
Instead of sprinkling checks all over the place, lift them back into the
low level accessor functions. To avoid checking three different conditions
combine them into one property of msi_desc::msi_attrib.
[ josef: Fixed the missed conversion in the core code ]
Fixes: fcacdfbef5 ("PCI/MSI: Provide a new set of mask and unmask functions")
Reported-by: Josef Johansson <josef@oderland.se>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Josef Johansson <josef@oderland.se>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: stable@vger.kernel.org
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Conserve IRQs by setting up portdrv IRQs only when there are users
(Jan Kiszka)
- Rework and simplify _OSC negotiation for control of PCIe features
(Joerg Roedel)
- Remove struct pci_dev.driver pointer since it's redundant with the
struct device.driver pointer (Uwe Kleine-König)
Resource management:
- Coalesce contiguous host bridge apertures from _CRS to accommodate
BARs that cover more than one aperture (Kai-Heng Feng)
Sysfs:
- Check CAP_SYS_ADMIN before parsing user input (Krzysztof
Wilczyński)
- Return -EINVAL consistently from "store" functions (Krzysztof
Wilczyński)
- Use sysfs_emit() in endpoint "show" functions to avoid buffer
overruns (Kunihiko Hayashi)
PCIe native device hotplug:
- Ignore Link Down/Up caused by resets during error recovery so
endpoint drivers can remain bound to the device (Lukas Wunner)
Virtualization:
- Avoid bus resets on Atheros QCA6174, where they hang the device
(Ingmar Klein)
- Work around Pericom PI7C9X2G switch packet drop erratum by using
store and forward mode instead of cut-through (Nathan Rossi)
- Avoid trying to enable AtomicOps on VFs; the PF setting applies to
all VFs (Selvin Xavier)
MSI:
- Document that /sys/bus/pci/devices/.../irq contains the legacy INTx
interrupt or the IRQ of the first MSI (not MSI-X) vector (Barry
Song)
VPD:
- Add pci_read_vpd_any() and pci_write_vpd_any() to access anywhere
in the possible VPD space; use these to simplify the cxgb3 driver
(Heiner Kallweit)
Peer-to-peer DMA:
- Add (not subtract) the bus offset when calculating DMA address
(Wang Lu)
ASPM:
- Re-enable LTR at Downstream Ports so they don't report Unsupported
Requests when reset or hot-added devices send LTR messages
(Mingchuang Qiao)
Apple PCIe controller driver:
- Add driver for Apple M1 PCIe controller (Alyssa Rosenzweig, Marc
Zyngier)
Cadence PCIe controller driver:
- Return success when probe succeeds instead of falling into error
path (Li Chen)
HiSilicon Kirin PCIe controller driver:
- Reorganize PHY logic and add support for external PHY drivers
(Mauro Carvalho Chehab)
- Support PERST# GPIOs for HiKey970 external PEX 8606 bridge (Mauro
Carvalho Chehab)
- Add Kirin 970 support (Mauro Carvalho Chehab)
- Make driver removable (Mauro Carvalho Chehab)
Intel VMD host bridge driver:
- If IOMMU supports interrupt remapping, leave VMD MSI-X remapping
enabled (Adrian Huang)
- Number each controller so we can tell them apart in
/proc/interrupts (Chunguang Xu)
- Avoid building on UML because VMD depends on x86 bare metal APIs
(Johannes Berg)
Marvell Aardvark PCIe controller driver:
- Define macros for PCI_EXP_DEVCTL_PAYLOAD_* (Pali Rohár)
- Set Max Payload Size to 512 bytes per Marvell spec (Pali Rohár)
- Downgrade PIO Response Status messages to debug level (Marek Behún)
- Preserve CRS SV (Config Request Retry Software Visibility) bit in
emulated Root Control register (Pali Rohár)
- Fix issue in configuring reference clock (Pali Rohár)
- Don't clear status bits for masked interrupts (Pali Rohár)
- Don't mask unused interrupts (Pali Rohár)
- Avoid code repetition in advk_pcie_rd_conf() (Marek Behún)
- Retry config accesses on CRS response (Pali Rohár)
- Simplify emulated Root Capabilities initialization (Pali Rohár)
- Fix several link training issues (Pali Rohár)
- Fix link-up checking via LTSSM (Pali Rohár)
- Fix reporting of Data Link Layer Link Active (Pali Rohár)
- Fix emulation of W1C bits (Marek Behún)
- Fix MSI domain .alloc() method to return zero on success (Marek
Behún)
- Read entire 16-bit MSI vector in MSI handler, not just low 8 bits
(Marek Behún)
- Clear Root Port I/O Space, Memory Space, and Bus Master Enable bits
at startup; PCI core will set those as necessary (Pali Rohár)
- When operating as a Root Port, set class code to "PCI Bridge"
instead of the default "Mass Storage Controller" (Pali Rohár)
- Add emulation for PCI_BRIDGE_CTL_BUS_RESET since aardvark doesn't
implement this per spec (Pali Rohár)
- Add emulation of option ROM BAR since aardvark doesn't implement
this per spec (Pali Rohár)
MediaTek MT7621 PCIe controller driver:
- Add MediaTek MT7621 PCIe host controller driver and DT binding
(Sergio Paracuellos)
Qualcomm PCIe controller driver:
- Add SC8180x compatible string (Bjorn Andersson)
- Add endpoint controller driver and DT binding (Manivannan
Sadhasivam)
- Restructure to use of_device_get_match_data() (Prasad Malisetty)
- Add SC7280-specific pcie_1_pipe_clk_src handling (Prasad Malisetty)
Renesas R-Car PCIe controller driver:
- Remove unnecessary includes (Geert Uytterhoeven)
Rockchip DesignWare PCIe controller driver:
- Add DT binding (Simon Xue)
Socionext UniPhier Pro5 controller driver:
- Serialize INTx masking/unmasking (Kunihiko Hayashi)
Synopsys DesignWare PCIe controller driver:
- Run dwc .host_init() method before registering MSI interrupt
handler so we can deal with pending interrupts left by bootloader
(Bjorn Andersson)
- Clean up Kconfig dependencies (Andy Shevchenko)
- Export symbols to allow more modular drivers (Luca Ceresoli)
TI DRA7xx PCIe controller driver:
- Allow host and endpoint drivers to be modules (Luca Ceresoli)
- Enable external clock if present (Luca Ceresoli)
TI J721E PCIe driver:
- Disable PHY when probe fails after initializing it (Christophe
JAILLET)
MicroSemi Switchtec management driver:
- Return error to application when command execution fails because an
out-of-band reset has cleared the device BARs, Memory Space Enable,
etc (Kelvin Cao)
- Fix MRPC error status handling issue (Kelvin Cao)
- Mask out other bits when reading of management VEP instance ID
(Kelvin Cao)
- Return EOPNOTSUPP instead of ENOTSUPP from sysfs show functions
(Kelvin Cao)
- Add check of event support (Logan Gunthorpe)
Miscellaneous:
- Remove unused pci_pool wrappers, which have been replaced by
dma_pool (Cai Huoqing)
- Use 'unsigned int' instead of bare 'unsigned' (Krzysztof
Wilczyński)
- Use kstrtobool() directly, sans strtobool() wrapper (Krzysztof
Wilczyński)
- Fix some sscanf(), sprintf() format mismatches (Krzysztof
Wilczyński)
- Update PCI subsystem information in MAINTAINERS (Krzysztof
Wilczyński)
- Correct some misspellings (Krzysztof Wilczyński)"
* tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (137 commits)
PCI: Add ACS quirk for Pericom PI7C9X2G switches
PCI: apple: Configure RID to SID mapper on device addition
iommu/dart: Exclude MSI doorbell from PCIe device IOVA range
PCI: apple: Implement MSI support
PCI: apple: Add INTx and per-port interrupt support
PCI: kirin: Allow removing the driver
PCI: kirin: De-init the dwc driver
PCI: kirin: Disable clkreq during poweroff sequence
PCI: kirin: Move the power-off code to a common routine
PCI: kirin: Add power_off support for Kirin 960 PHY
PCI: kirin: Allow building it as a module
PCI: kirin: Add MODULE_* macros
PCI: kirin: Add Kirin 970 compatible
PCI: kirin: Support PERST# GPIOs for HiKey970 external PEX 8606 bridge
PCI: apple: Set up reference clocks when probing
PCI: apple: Add initial hardware bring-up
PCI: of: Allow matching of an interrupt-map local to a PCI device
of/irq: Allow matching of an interrupt-map local to an interrupt controller
irqdomain: Make of_phandle_args_to_fwspec() generally available
PCI: Do not enable AtomicOps on VFs
...
The bare "unsigned" type implicitly means "unsigned int", but the preferred
coding style is to use the complete type name.
Update the bare use of "unsigned" to the preferred "unsigned int".
No change to functionality intended.
See a1ce18e4f9 ("checkpatch: warn on bare unsigned or signed declarations
without int").
Link: https://lore.kernel.org/r/20211013014136.1117543-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Previously, when msi_populate_sysfs() failed, we saved the error return
value as dev->msi_irq_groups, which leads to a page fault when
free_msi_irqs() calls msi_destroy_sysfs().
To prevent this, leave dev->msi_irq_groups alone when msi_populate_sysfs()
fails.
Found by the Hulk Robot when injecting a memory allocation fault in
msi_populate_sysfs():
BUG: unable to handle page fault for address: fffffffffffffff4
...
Call Trace:
msi_destroy_sysfs+0x30/0xa0
free_msi_irqs+0x11d/0x1b0
Fixes: 2f170814bd ("genirq/msi: Move MSI sysfs handling from PCI to MSI core")
Link: https://lore.kernel.org/r/20211012071556.939137-1-wanghai38@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Barry Song <song.bao.hua@hisilicon.com>
Pull irq updates from Thomas Gleixner:
"Updates to the interrupt core and driver subsystems:
Core changes:
- The usual set of small fixes and improvements all over the place,
but nothing stands out
MSI changes:
- Further consolidation of the PCI/MSI interrupt chip code
- Make MSI sysfs code independent of PCI/MSI and expose the MSI
interrupts of platform devices in the same way as PCI exposes them.
Driver changes:
- Support for ARM GICv3 EPPI partitions
- Treewide conversion to generic_handle_domain_irq() for all chained
interrupt controllers
- Conversion to bitmap_zalloc() throughout the irq chip drivers
- The usual set of small fixes and improvements"
* tag 'irq-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
platform-msi: Add ABI to show msi_irqs of platform devices
genirq/msi: Move MSI sysfs handling from PCI to MSI core
genirq/cpuhotplug: Demote debug printk to KERN_DEBUG
irqchip/qcom-pdc: Trim unused levels of the interrupt hierarchy
irqdomain: Export irq_domain_disconnect_hierarchy()
irqchip/gic-v3: Fix priority comparison when non-secure priorities are used
irqchip/apple-aic: Fix irq_disable from within irq handlers
pinctrl/rockchip: drop the gpio related codes
gpio/rockchip: drop irq_gc_lock/irq_gc_unlock for irq set type
gpio/rockchip: support next version gpio controller
gpio/rockchip: use struct rockchip_gpio_regs for gpio controller
gpio/rockchip: add driver for rockchip gpio
dt-bindings: gpio: change items restriction of clock for rockchip,gpio-bank
pinctrl/rockchip: add pinctrl device to gpio bank struct
pinctrl/rockchip: separate struct rockchip_pin_bank to a head file
pinctrl/rockchip: always enable clock for gpio controller
genirq: Fix kernel doc indentation
EDAC/altera: Convert to generic_handle_domain_irq()
powerpc: Bulk conversion to generic_handle_domain_irq()
nios2: Bulk conversion to generic_handle_domain_irq()
...
When running as Xen PV guest, masking MSI-X is a responsibility of the
hypervisor. The guest has no write access to the relevant BAR at all - when
it tries to, it results in a crash like this:
BUG: unable to handle page fault for address: ffffc9004069100c
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
e1000_probe+0x41f/0xdb0 [e1000e]
local_pci_probe+0x42/0x80
(...)
The recently introduced function msix_mask_all() does not check the global
variable pci_msi_ignore_mask which is set by XEN PV to bypass the masking
of MSI[-X] interrupts.
Add the check to make this function XEN PV compatible.
Fixes: 7d5ec3d361 ("PCI/MSI: Mask all unused MSI-X entries")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210826170342.135172-1-marmarek@invisiblethingslab.com
The existing mask/unmask functions are convoluted and generate suboptimal
assembly code.
Provide a new set of functions which will be used in later patches to
replace the exisiting ones.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/875ywetozb.ffs@tglx
msi_desc::masked is a misnomer. For MSI it's used to cache the MSI mask
bits when the device supports per vector masking. For MSI-X it's used to
cache the content of the vector control word which contains the mask bit
for the vector.
Replace it with a union of msi_mask and msix_ctrl to make the purpose clear
and fix up the usage sites.
No functional change
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210729222543.045993608@linutronix.de
Multi-MSI uses a single MSI descriptor and there is a single mask register
when the device supports per vector masking. To avoid reading back the mask
register the value is cached in the MSI descriptor and updates are done by
clearing and setting bits in the cache and writing it to the device.
But nothing protects msi_desc::masked and the mask register from being
modified concurrently on two different CPUs for two different Linux
interrupts which belong to the same multi-MSI descriptor.
Add a lock to struct device and protect any operation on the mask and the
mask register with it.
This makes the update of msi_desc::masked unconditional, but there is no
place which requires a modification of the hardware register without
updating the masked cache.
msi_mask_irq() is now an empty wrapper which will be cleaned up in follow
up changes.
The problem goes way back to the initial support of multi-MSI, but picking
the commit which introduced the mask cache is a valid cut off point
(2.6.30).
Fixes: f2440d9acb ("PCI MSI: Refactor interrupt masking code")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.726833414@linutronix.de
msi_mask_irq() takes a mask and a flags argument. The mask argument is used
to mask out bits from the cached mask and the flags argument to set bits.
Some places invoke it with a flags argument which sets bits which are not
used by the device, i.e. when the device supports up to 8 vectors a full
unmask in some places sets the mask to 0xFFFFFF00. While devices probably
do not care, it's still bad practice.
Fixes: 7ba1930db0 ("PCI MSI: Unmask MSI if setup failed")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.568173099@linutronix.de
Nothing enforces the posted writes to be visible when the function
returns. Flush them even if the flush might be redundant when the entry is
masked already as the unmask will flush as well. This is either setup or a
rare affinity change event so the extra flush is not the end of the world.
While this is more a theoretical issue especially the logic in the X86
specific msi_set_affinity() function relies on the assumption that the
update has reached the hardware when the function returns.
Again, as this never has been enforced the Fixes tag refers to a commit in:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.515188147@linutronix.de
The specification (PCIe r5.0, sec 6.1.4.5) states:
For MSI-X, a function is permitted to cache Address and Data values
from unmasked MSI-X Table entries. However, anytime software unmasks a
currently masked MSI-X Table entry either by clearing its Mask bit or
by clearing the Function Mask bit, the function must update any Address
or Data values that it cached from that entry. If software changes the
Address or Data value of an entry while the entry is unmasked, the
result is undefined.
The Linux kernel's MSI-X support never enforced that the entry is masked
before the entry is modified hence the Fixes tag refers to a commit in:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Enforce the entry to be masked across the update.
There is no point in enforcing this to be handled at all possible call
sites as this is just pointless code duplication and the common update
function is the obvious place to enforce this.
Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support")
Reported-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.462096385@linutronix.de
When MSI-X is enabled the ordering of calls is:
msix_map_region();
msix_setup_entries();
pci_msi_setup_msi_irqs();
msix_program_entries();
This has a few interesting issues:
1) msix_setup_entries() allocates the MSI descriptors and initializes them
except for the msi_desc:masked member which is left zero initialized.
2) pci_msi_setup_msi_irqs() allocates the interrupt descriptors and sets
up the MSI interrupts which ends up in pci_write_msi_msg() unless the
interrupt chip provides its own irq_write_msi_msg() function.
3) msix_program_entries() does not do what the name suggests. It solely
updates the entries array (if not NULL) and initializes the masked
member for each MSI descriptor by reading the hardware state and then
masks the entry.
Obviously this has some issues:
1) The uninitialized masked member of msi_desc prevents the enforcement
of masking the entry in pci_write_msi_msg() depending on the cached
masked bit. Aside of that half initialized data is a NONO in general
2) msix_program_entries() only ensures that the actually allocated entries
are masked. This is wrong as experimentation with crash testing and
crash kernel kexec has shown.
This limited testing unearthed that when the production kernel had more
entries in use and unmasked when it crashed and the crash kernel
allocated a smaller amount of entries, then a full scan of all entries
found unmasked entries which were in use in the production kernel.
This is obviously a device or emulation issue as the device reset
should mask all MSI-X table entries, but obviously that's just part
of the paper specification.
Cure this by:
1) Masking all table entries in hardware
2) Initializing msi_desc::masked in msix_setup_entries()
3) Removing the mask dance in msix_program_entries()
4) Renaming msix_program_entries() to msix_update_entries() to
reflect the purpose of that function.
As the masking of unused entries has never been done the Fixes tag refers
to a commit in:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: f036d4ea5fa7 ("[PATCH] ia32 Message Signalled Interrupt support")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.403833459@linutronix.de
The ordering of MSI-X enable in hardware is dysfunctional:
1) MSI-X is disabled in the control register
2) Various setup functions
3) pci_msi_setup_msi_irqs() is invoked which ends up accessing
the MSI-X table entries
4) MSI-X is enabled and masked in the control register with the
comment that enabling is required for some hardware to access
the MSI-X table
Step #4 obviously contradicts #3. The history of this is an issue with the
NIU hardware. When #4 was introduced the table access actually happened in
msix_program_entries() which was invoked after enabling and masking MSI-X.
This was changed in commit d71d6432e1 ("PCI/MSI: Kill redundant call of
irq_set_msi_desc() for MSI-X interrupts") which removed the table write
from msix_program_entries().
Interestingly enough nobody noticed and either NIU still works or it did
not get any testing with a kernel 3.19 or later.
Nevertheless this is inconsistent and there is no reason why MSI-X can't be
enabled and masked in the control register early on, i.e. move step #4
above to step #1. This preserves the NIU workaround and has no side effects
on other hardware.
Fixes: d71d6432e1 ("PCI/MSI: Kill redundant call of irq_set_msi_desc() for MSI-X interrupts")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210729222542.344136412@linutronix.de
The sysfs_emit() and sysfs_emit_at() functions were introduced to make
it less ambiguous which function is preferred when writing to the output
buffer in a device attribute's "show" callback [1].
Convert the PCI sysfs object "show" functions from sprintf(), snprintf()
and scnprintf() to sysfs_emit() and sysfs_emit_at() accordingly, as the
latter is aware of the PAGE_SIZE buffer and correctly returns the number
of bytes written into the buffer.
No functional change intended.
[1] Documentation/filesystems/sysfs.rst
Related commit: ad025f8e46 ("PCI/sysfs: Use sysfs_emit() and
sysfs_emit_at() in "show" functions").
Link: https://lore.kernel.org/r/20210603000112.703037-2-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Move pci_msi_setup_pci_dev(), which disables MSI and MSI-X interrupts, from
probe.c to msi.c so it's with all the other MSI code and more consistent
with other capability initialization. This means we must compile msi.c
always, even without CONFIG_PCI_MSI, so wrap the rest of msi.c in an #ifdef
and adjust the Makefile accordingly. No functional change intended.
Link: https://lore.kernel.org/r/20201203185110.1583077-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
requires them or not. Architectures which are fully utilizing hierarchical
irq domains should never call into that code.
It's not only architectures which depend on that by implementing one or
more of the weak functions, there is also a bunch of drivers which relies
on the weak functions which invoke msi_controller::setup_irq[s] and
msi_controller::teardown_irq.
Make the architectures and drivers which rely on them select them in Kconfig
and if not selected replace them by stub functions which emit a warning and
fail the PCI/MSI interrupt allocation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200826112333.992429909@linutronix.de
Provide a helper function to check whether a PCI device is handled by a
non-standard PCI/MSI domain. This will be used to exclude such devices
which hang of a special bus, e.g. VMD, to be excluded from the irq domain
override in irq remapping.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20200826112333.139387358@linutronix.de
pci_msi_get_hwirq() and pci_msi_set_desc are not longer special. Enable the
generic MSI domain ops in the core and PCI MSI code unconditionally and get
rid of the x86 specific implementations in the X86 MSI code and in the
hyperv PCI driver.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200826112332.564274859@linutronix.de
When debugging an issue where I was asking the PCI machinery to enable a
set of MSI-X vectors, without falling back on MSI, I ran across a behaviour
which seems odd. The pci_alloc_irq_vectors_affinity() will always return
-ENOSPC on failure, when allocating MSI-X vectors only, whereas with MSI
fallback it will forward any error returned by __pci_enable_msi_range().
This is a confusing behaviour, so have the pci_alloc_irq_vectors_affinity()
forward the error code from __pci_enable_msix_range() when appropriate.
Link: https://lore.kernel.org/r/20200616073318.20229-1-piotr.stankiewicz@intel.com
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
ioremap has provided non-cached semantics by default since the Linux 2.6
days, so remove the additional ioremap_nocache interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
When a driver enables MSI-X, msix_program_entries() reads the MSI-X Vector
Control register for each vector and saves it in desc->masked. Each
register is 32 bits and bit 0 is the actual Mask bit.
When we restored these registers during resume, we previously set the Mask
bit if *any* bit in desc->masked was set instead of when the Mask bit
itself was set:
pci_restore_state
pci_restore_msi_state
__pci_restore_msix_state
for_each_pci_msi_entry
msix_mask_irq(entry, entry->masked) <-- entire u32 word
__pci_msix_desc_mask_irq(desc, flag)
mask_bits = desc->masked & ~PCI_MSIX_ENTRY_CTRL_MASKBIT
if (flag) <-- testing entire u32, not just bit 0
mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT
writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL)
This means that after resume, MSI-X vectors were masked when they shouldn't
be, which leads to timeouts like this:
nvme nvme0: I/O 978 QID 3 timeout, completion polled
On resume, set the Mask bit only when the saved Mask bit from suspend was
set.
This should remove the need for 19ea025e1d ("nvme: Add quirk for Kingston
NVME SSD running FW E8FK11.T").
[bhelgaas: commit log, move fix to __pci_msix_desc_mask_irq()]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204887
Link: https://lore.kernel.org/r/20191008034238.2503-1-jian-hong@endlessm.com
Fixes: f2440d9acb ("PCI MSI: Refactor interrupt masking code")
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
27e20603c5 ("PCI/MSI: Move D0 check into pci_msi_check_device()")
moved the power state check into pci_msi_check_device(), which was
subsequently renamed to pci_msi_supported(). This didn't change the
behavior, since both callers checked the power state.
However, it doesn't fit the current "pci_msi_supported()" name, which
should return what the device is capable of, independent of the power
state.
Move the power state check back into the callers for readability. No
functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull NTB updates from Jon Mason:
"New feature to add support for NTB virtual MSI interrupts, the ability
to test and use this feature in the NTB transport layer.
Also, bug fixes for the AMD and Switchtec drivers, as well as some
general patches"
* tag 'ntb-5.3' of git://github.com/jonmason/ntb: (22 commits)
NTB: Describe the ntb_msi_test client in the documentation.
NTB: Add MSI interrupt support to ntb_transport
NTB: Add ntb_msi_test support to ntb_test
NTB: Introduce NTB MSI Test Client
NTB: Introduce MSI library
NTB: Rename ntb.c to support multiple source files in the module
NTB: Introduce functions to calculate multi-port resource index
NTB: Introduce helper functions to calculate logical port number
PCI/switchtec: Add module parameter to request more interrupts
PCI/MSI: Support allocating virtual MSI interrupts
ntb_hw_switchtec: Fix setup MW with failure bug
ntb_hw_switchtec: Skip unnecessary re-setup of shared memory window for crosslink case
ntb_hw_switchtec: Remove redundant steps of switchtec_ntb_reinit_peer() function
NTB: correct ntb_dev_ops and ntb_dev comment typos
NTB: amd: Silence shift wrapping warning in amd_ntb_db_vector_mask()
ntb_hw_switchtec: potential shift wrapping bug in switchtec_ntb_init_sndev()
NTB: ntb_transport: Ensure qp->tx_mw_dma_addr is initaliazed
NTB: ntb_hw_amd: set peer limit register
NTB: ntb_perf: Clear stale values in doorbell and command SPAD register
NTB: ntb_perf: Disable NTB link after clearing peer XLAT registers
...
For NTB devices, we want to be able to trigger MSI interrupts
through a memory window. In these cases we may want to use
more interrupts than the NTB PCI device has available in its MSI-X
table.
We allow for this by creating a new 'virtual' interrupt. These
interrupts are allocated as usual but are not programmed into the
MSI-X table (as there may not be space for them).
The MSI address and data will then handled through an NTB MSI library
introduced later in this series.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
In several places in the kernel we find PCI_DEVID used like this:
PCI_DEVID(dev->bus->number, dev->devfn)
Add a "pci_dev_id(struct pci_dev *dev)" helper to simplify callers.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>