The function cookie_check_timestamp(), both called from IPv4/6 context,
is being used to decode the echoed timestamp from the SYN/ACK into TCP
options used for follow-up communication with the peer.
We can remove ECN handling from that function, split it into a separate
one, and simply rename the original function into cookie_decode_options().
cookie_decode_options() just fills in tcp_option struct based on the
echoed timestamp received from the peer. Anything that fails in this
function will actually discard the request socket.
While this is the natural place for decoding options such as ECN which
commit 172d69e63c ("syncookies: add support for ECN") added, we argue
that in particular for ECN handling, it can be checked at a later point
in time as the request sock would actually not need to be dropped from
this, but just ECN support turned off.
Therefore, we split this functionality into cookie_ecn_ok(), which tells
us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled.
This prepares for per-route ECN support: just looking at the tcp_ecn sysctl
won't be enough anymore at that point; if the timestamp indicates ECN
and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric.
This would mean adding a route lookup to cookie_check_timestamp(), which
we definitely want to avoid. As we already do a route lookup at a later
point in cookie_{v4,v6}_check(), we can simply make use of that as well
for the new cookie_ecn_ok() function w/o any additional cost.
Joint work with Daniel Borkmann.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Was a bit more difficult to read than needed due to magic shifts;
add defines and document the used encoding scheme.
Joint work with Daniel Borkmann.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
else is unnecessary after return 0 in __udp4_lib_rcv()
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
remove __inline__ / inline and let compiler decide what to do
with static functions
Inspired-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yaogong replaces TCP out of order receive queue by an RB tree.
As netem already does a private skb->{next/prev/tstamp} union
with a 'struct rb_node', lets do this in a cleaner way.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2014-11-03
This series contains updates to i40e and i40evf.
Akeem adds a check for i40e so that flow director flush and reinit are
not done when flow director is not enabled.
Mitch fixes the i40evf driver to properly handle multiple admin queue
messages, by reinit the msg_size field each time we go through the loop.
Without this, we may receive truncated messages due to the firmware
thinking we have insufficient buffer size. Also fixes the link checking
logic to only check the carrier state if the interface is actually
open, which allows link changes to be reported correctly without spamming
the VFs. Updates i40e to inset the VSI ID in the QTX_CTL register
when configuring queues for VMDq VSIs.
Paul adds support for 10G-base-T in i40evf.
Jesse fixes i40e where the call to irq_dynamic_disable() was turning off
the interrupt completely when trying to set ITR to 0 (for lowest
moderation).
Shannon removes debugfs dump stats function, since it was not being
kept up-to-date and was redundant with the ethtool output. Also, scales
back the LAN MSIx usage to force queue/vector sharing and leave some
vectors for Flow Director, VMDq, etc. when there are more cores than
vectors available to the PF. Cleans up the error reporting for
get_lump() resource tracking errors. Also adds a check for the
debug module parameter earlier to be able to catch the early configuration
phase admin queue messages.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
this is check for dev is unnecessary, as we are already checking dev
after allocating it via alloc_netdev, and jumping to label: out
if it is NULL.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz says:
====================
Mellanox ethernet driver update Oct-30-2014
The 1st patch from Saeed fixes a bug in the last net-next batch where
a VF could get access to set port configuration, the next patch from Amir
fixes a race in the port VPI logic. Next are two performance patches from Ido.
The patch to add checksum complete status on GRE and such packets was
preceded with a patch that converted the driver to only use napi_gro_receive
vs. the current code which goes through napi_gro_frags on it's usual track.
Eric D. has some thoughts and suggestions on that change for which we
want to take the time and consider, so for the time being dropped that
patch and the ones that depend on it.
Changes from V0:
- have the caller to provide the __GFP_COLD hint to the service function
- dropped the patch that changes the GRO logic and the subsequent dependent
patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add code to issue CONFIG_DEV "get" firmware command.
This command is used in order to obtain certain parameters used for
supporting various RX checksumming options and vxlan UDP port.
The GET operation is allowed for VFs too.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Needed in order to get cache cold pages (L3 flushed) for HW scatter.
Otherwise memory may flush those entries when the packet comes from
PCI, causing back pressure resulting in BW decrease.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When IP_ALIGN has a non zero value, hardware will write to a non aligned
address. The only reader from this address is when copying the header
from the first frag into the linear buffer (further access to the IP
address will be from the linear buffer, in which the headers are
aligned). Since the penalty of non align access by the hardware is
greater than the software memcpy, changing the frag_align to always be 0.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to protect set_port_type() for concurrency, as the sysfs code could
call it from mutliple contexts in parallel.
The port_mutex is not enough because we need to protect from concurrent
modification of 'info' and stopping of the port sensing work.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added wrapper to the ACCESS_REG command for handling guest HW
registers access, preventing write operations, but do allow reads.
This will prevent SRIOV guests to change port PTYS configuration,
such as speed/advertised link modes.
Fixes: adbc7ac5c1 ('net/mlx4_core: Introduce ACCESS_REG CMD [...]')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net_rx_action() can mask irqs a single time to transfert sd->poll_list
into a private list, for a very short duration.
Then, napi_complete() can avoid masking irqs again,
and net_rx_action() only needs to mask irq again in slow path.
This patch removes 2 couples of irq mask/unmask per typical NAPI run,
more if multiple napi were triggered.
Note this also allows to give control back to caller (do_softirq())
more often, so that other softirq handlers can be called a bit earlier,
or ksoftirqd can be wakeup earlier under pressure.
This was developed while testing an alternative to RX interrupt
mitigation to reduce latencies while keeping or improving GRO
aggregation on fast NIC.
Idea is to test napi->gro_list at the end of a napi->poll() and
reschedule one NAPI poll, but after servicing a full round of
softirqs (timers, TX, rcu, ...). This will be allowed only if softirq
is currently serviced by idle task or ksoftirqd, and resched not needed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
flow_limit in struct softnet_data is only read from local cpu
and can be moved to fill a hole, reducing softnet_data size by
64 bytes on x86_64
While we are at it, move output_queue, output_queue_tailp and
completion_queue, so that rx / tx paths touch a single cache line.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a few problems with our parsing of the MDET registers:
* Queue IDs are longer than 8 bits
* Queue IDs are absolute for the device and the base queue must be
subtracted out.
* VF IDs are longer than 8 bits
* Use the MASK define to mask the event value, instead of the SHIFT
define.
Change-ID: I3dc7237f480c02e1192a2a8ea782f8a02ab2a8b7
Reported-by: Marc Neustadter <marc.neustadter@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We must insert the VSI ID in the QTX_CTL register when
configuring queues for VMDQ VSIs.
Change-ID: Iedfe36bd42ca0adc90a7cc2b7cf04795a98f4761
Reported-by: Marc Neustadter <marc.neustadter@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Check the debug module parameter earlier to be able to catch the early
configuration phase adminq messages.
Change-ID: Ic84fabd72393489bbf96042de770790a80fd8468
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tweak and homogenize the error reporting for get_lump() resource
tracking errors.
Change-ID: I11330161cc6ad8d04371c499c63071c816171c3b
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When there are more cores than vectors available to the PF, scale back
the LAN msix usage to force queue/vector sharing and leave some vectors
for Flow Director, VMDq, etc.
Change-ID: Ie0317732eb85ad8d851d7da7d9af86b1bf8c21ad
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The debugfs dump stats wasn't being kept up-to-date, was redundant with
the ethtool output, and didn't offer any useful additional info. Rather
than continue trying to keep them aligned, just remove the debugfs command.
Change-ID: Id130ed9aef01c6369ab662c7b4c5ec5b1dbc5b40
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <Jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The call to irq_dynamic_disable was turning off the interrupt completely
when trying to set ITR to 0 (for lowest moderation). Just remove the
call as setting the values to 0 later in this function will suffice.
Change-ID: I47caf1ecbe65653cf63ec833db93094cd83fd84d
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-By: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add 10G-Base-T support in i40evf.
Change-ID: I98a1c3138d7d6572fe7903a7c1c4692cae3260d5
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If the interface is closed, but VFs exist, current code will spam all
the VFs with link messages every second. This is because the link event
code was looking at netif_carrier_ok() without checking to see if the
interface was actually open.
Refactor the logic to only check the carrier state if the interface is
actually open. This allows link changes to be reported correctly without
spamming the VFs.
Change-ID: If136e79bb3820d21ea4e39e332e8a9604efc2b2a
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When we receive an admin queue message, the msg_size field in the event
struct gets overwritten. Because of this, we need to reinit the field
each time we go through the loop. Without this we may receive truncated
messages due to the firmware thinking we have insufficient buffer size.
Change-ID: I21dcca5114d91365d731169965ce3ffec0e4a190
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: Jim Young <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When FD_SB/ATR are not enabled, do not allow flow director flush
and reinit.
Change-ID: Iafe261c1862992981615815551abd1ed9fada0a8
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Patrick Lu <patrick.lu@intel.com>
Tested-by: <jamesx.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Pull input updates from Dmitry Torokhov:
"A bunch of fixes for minor defects reported by Coverity, a few driver
fixups and revert of i8042.nomux change so that we are once again
enable active MUX mode if box claims to support it"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Revert "Input: i8042 - disable active multiplexing by default"
Input: altera_ps2 - use correct type for irq return value
Input: altera_ps2 - write to correct register when disabling interrupts
Input: max77693-haptic - fix potential overflow
Input: psmouse - remove unneeded check in psmouse_reconnect()
Input: vsxxxaa - fix code dropping bytes from queue
Input: ims-pcu - fix dead code in ims_pcu_ofn_reg_addr_store()
Input: opencores-kbd - fix error handling
Input: wm97xx - adapt parameters to tosa touchscreen.
Input: i8042 - quirks for Fujitsu Lifebook A544 and Lifebook AH544
Input: stmpe-keypad - fix valid key line bitmask
Input: soc_button_array - update calls to gpiod_get*()
- Fix a crash on r8a7791/koelsch during resume from system suspend
caused by a recent cpufreq-dt commit (Geert Uytterhoeven).
- Fix an MFD enumeration problem introduced by a recent commit
adding ACPI support to the MFD subsystem that exposed a weakness
in the ACPI core causing ACPI enumeration to be applied to all
devices associated with one ACPI companion object, although it
should be used for one of them only (Mika Westerberg).
- Fix an ACPI EC regression introduced during the 3.17 cycle
causing some Samsung laptops to misbehave as a result of a
workaround targeted at some Acer machines. That includes
a revert of a commit that went too far and a quirk for the
Acer machines in question. From Lv Zheng.
- Fix a regression in the system suspend error code path introduced
during the 3.15 cycle that causes it to fail to take errors from
asychronous execution of "late" suspend callbacks into account
(Imre Deak).
- Fix a long-standing bug in the hibernation resume error code path
that fails to roll back everything correcty on "freeze" callback
errors and leaves some devices in a "suspended" state causing more
breakage to happen subsequently (Imre Deak).
- Make the cpufreq-dt driver disable operation performance points
that are not supported by the VR connected to the CPU voltage
plane with acceptable tolerance instead of constantly failing
voltage scaling later on (Lucas Stach).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJUVAuPAAoJEILEb/54YlRxGfQP/0nFfTqyuDN8cPA2qRIzDIoi
8PTOzlhrRuzUlpMkYdsDijxwFcK2/59LomwtuAKHi7309N6UzUa8vAkb8WrzpY7m
XUU+fhsLEkDnEczMfgmbP5ljtP75eJSSWRO0WIBuk4k79qcsutLNtgGpJV7feYSv
+t7OE9DrBPM8lSpBKM/4qs5gnXzdaWmi4xGH7upQWyxAC6RG9GosKdDUZxVxSJQt
oy/y0O4oxwyjg+8EvPwd22JtoFJ6axoEwCJXXlkn7NbIQNGtxrMR9zcMglsuOklg
bG93g1xJl4YCwLXV8sKfPU2kQkQ1ISY3rYIkwIjvBNIY4QFsQpCg3GYt08OJI0bO
4wDD7kH8C51aD9Zfi9luCdE4MsMyGB7SeNvQJul5uMujuG9ZeI61a8d7P6fmXu5X
lk+GeNl/rMujaESwqQlNgm3DvSYfc5FFEDC6F4Wcu4koomSlJwj//lMlOg2ajIgz
p5En6FeC8yGTuobGqo2dT7yYjmxm+kdX+gTStsto+hkxWA7beNjI1iXXWwPrQa/F
7pzneSrdbTZVdzZ1F9eR9AcGljhRMLBxs2XembXgkviCv+IVjw4qHWWKveDQKkhG
CVtcd3jrFSRHeAaqVNnbsoMu2nOLRY2W+f2+FNEfYKc+13aDJYm7pyAOIjujY7ns
Q1jSP7ZZQBVlxP5j5W5x
=g4QU
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are fixes received after my previous pull request plus one that
has been in the works for quite a while, but its previous version
caused problems to happen, so it's been deferred till now.
Fixed are two recent regressions (MFD enumeration and cpufreq-dt),
ACPI EC regression introduced in 3.17, system suspend error code path
regression introduced in 3.15, an older bug related to recovery from
failing resume from hibernation and a cpufreq-dt driver issue related
to operation performance points.
Specifics:
- Fix a crash on r8a7791/koelsch during resume from system suspend
caused by a recent cpufreq-dt commit (Geert Uytterhoeven).
- Fix an MFD enumeration problem introduced by a recent commit adding
ACPI support to the MFD subsystem that exposed a weakness in the
ACPI core causing ACPI enumeration to be applied to all devices
associated with one ACPI companion object, although it should be
used for one of them only (Mika Westerberg).
- Fix an ACPI EC regression introduced during the 3.17 cycle causing
some Samsung laptops to misbehave as a result of a workaround
targeted at some Acer machines. That includes a revert of a commit
that went too far and a quirk for the Acer machines in question.
From Lv Zheng.
- Fix a regression in the system suspend error code path introduced
during the 3.15 cycle that causes it to fail to take errors from
asychronous execution of "late" suspend callbacks into account
(Imre Deak).
- Fix a long-standing bug in the hibernation resume error code path
that fails to roll back everything correcty on "freeze" callback
errors and leaves some devices in a "suspended" state causing more
breakage to happen subsequently (Imre Deak).
- Make the cpufreq-dt driver disable operation performance points
that are not supported by the VR connected to the CPU voltage plane
with acceptable tolerance instead of constantly failing voltage
scaling later on (Lucas Stach)"
* tag 'pm+acpi-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / EC: Fix regression due to conflicting firmware behavior between Samsung and Acer.
Revert "ACPI / EC: Add support to disallow QR_EC to be issued before completing previous QR_EC"
cpufreq: cpufreq-dt: Restore default cpumask_setall(policy->cpus)
PM / Sleep: fix recovery during resuming from hibernation
PM / Sleep: fix async suspend_late/freeze_late error handling
ACPI: Use ACPI companion to match only the first physical device
cpufreq: cpufreq-dt: disable unsupported OPPs
Sysfs
- Fix "enable" filename change (Greg Kroah-Hartman)
PCI device hotplug
- Revert duplicate merge (Kamal Mostafa)
Freescale i.MX6
- Wait for clocks to stabilize after ref_en (Richard Zhu)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUU7kyAAoJEFmIoMA60/r8Y4gQALZtzaD6UCsFT9Ga+RufrXtA
B5i7gIhgyJWea0uLTATKVljrBH8mO1Cr7AC8Z0MWZ24zC+ot73HF9h6RT3Jl3gSH
ddWBVntF2+qyzP19vcLQzquA8njdXLHefYX66SvtR0/KmEG1H66r3TOUrTgR8VQg
2hzFSW2eWlA8Hf5TTt8JrqHA00z9PKBjD+YvWbd75e9kJDKOkN6yn0ljfFgDqkRZ
CKOa6BRFuX83ZYJipBvQmVQFXpgUzE82VyCd1wy18/R0dUNtChZDOqi01SfHP9mq
NNBHuNZ+zrdgmeB3mhgPZSYSuo27T4Hu+XoKWOrjGrooKC4mmqLvOyj3l1rgsAr7
nURTZ93ecVkSUj7IsrhxSiJOD71YD4Fx7MXfzTN6Yb1+K7z821+UIBWLhE362Thq
8Q+LJAqc6g+Cnh/PkM1iL/KCHUgLWbLO6n/PZNU3Nju2dW10U4dVCmv5QiJy76JM
prDXqwHls/cvKIJ7I2nT60UGeFcYEs9t88J/DE8SumxMK54Dk+VVKvbESzHdetUo
LyLriwBtlzl50IE71DFQLLz1haiasBDuDljlq2FmwN5E20MmkoTl46fhtgpQMyKm
jYsITlScwT/3GEhtnmAGpfUjFihCGJ6zZM1la42NmcVgE85fmCAqneJrHMY8t6mT
r97g/GsGk4u9DmfMMpKh
=Uj0Z
-----END PGP SIGNATURE-----
Merge tag 'pci-v3.18-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"These changes, intended for v3.18, fix:
Sysfs
- Fix "enable" filename change (Greg Kroah-Hartman)
An unintentional sysfs filename change in commit 5136b2da77
("PCI: convert bus code to use dev_groups"), which appeared in
v3.13, changed "enable" to "enabled", and this changes it back.
Old users of "enable" are currently broken and will be helped by
this change. Anything that started to use "enabled" after v3.13
will be broken by this change. If necessary, we can add a symlink
to make both work, but this patch doesn't do that.
PCI device hotplug
- Revert duplicate merge (Kamal Mostafa)
A mistaken duplicate merge that added a check twice. Nothing's
broken; this just removes the unnecessary code.
Freescale i.MX6
- Wait for clocks to stabilize after ref_en (Richard Zhu)
An i.MX6 clock problem that prevents mx6 nitrogen boards from booting"
* tag 'pci-v3.18-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Rename sysfs 'enabled' file back to 'enable'
PCI: imx6: Wait for clocks to stabilize after ref_en
Revert duplicate "PCI: pciehp: Prevent NULL dereference during probe"
Rusty noticed a Really Bad Bug (tm) in my NT fix. The entry code
reads out of bounds, causing the NT fix to be unreliable. But, and
this is much, much worse, if your stack is somehow just below the
top of the direct map (or a hole), you read out of bounds and crash.
Excerpt from the crash:
[ 1.129513] RSP: 0018:ffff88001da4bf88 EFLAGS: 00010296
2b:* f7 84 24 90 00 00 00 testl $0x4000,0x90(%rsp)
That read is deterministically above the top of the stack. I
thought I even single-stepped through this code when I wrote it to
check the offset, but I clearly screwed it up.
Fixes: 8c7aa698ba ("x86_64, entry: Filter RFLAGS.NT on entry from userspace")
Reported-by: Rusty Russell <rusty@ozlabs.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJUVAF/AAoJENNvdpvBGATwEbAQALNiAIChEyJTnQDkAQc2wqqn
dv8NQmFr5aefc63A/+n/yJJGrQZtKs0ceh29ty5ksYLFXzUdc2ctFg6vBmllQfbz
PQawAk2gOkF8zfVuqiQU7X+wTBpGmGXTa8HY+WJTtk0pBfhl+p0PDCYsWXMwZJ1D
tAZpxJ4AmPc7A4hApWOvce6r7Xg24vZk/8UA93Tif9AkeY6VoN272Hx5b/UGmBHY
RCEgpowuiIY38bghtLh5+T0J98/EQNof46cEHgGI9nIDZeXRzgvDojE5bLI0/IS/
K07MjYlm/WFWsLFkgNJkTiqEXgnji9BNYRF1xxUjMMBAR4+fnFLw9kXXgcETrPCx
U7lHOhs8M2FK40cWhUDz/tukvL4S4lQwPEeqBPlRE8J5/twRyXHeZDp4F7LOobwq
mk6AajSJlP+05XwXOuCx7Hcf9uxjw/IpqhBS5IZxy8Nn3T2guPlY9wMhYU1RYFws
54FeE76SJ8EDgjVK/txj7rgh11GggWsjsdXvftSElM2DsKsqYEOKAvDzvwmbm7eV
dsFOlRB6B/X4UpiAC2MiPJynYg9TJ7LkVBzDZeZ/fbm7JhTqChSJDzapqdrmNPIY
SQqwLmFXnHqaw6HNitZ5Bs+fD6nfvKqy85NeImxE3lhLWDuiTt77Y3o80IW30TgN
5bnuXq8Rkukrxs/VDvPq
=kI6P
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bugfixes from Ted Ts'o:
"A set of miscellaneous ext4 bug fixes for 3.18"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: make ext4_ext_convert_to_initialized() return proper number of blocks
ext4: bail early when clearing inode journal flag fails
ext4: bail out from make_indexed_dir() on first error
jbd2: use a better hash function for the revoke table
ext4: prevent bugon on race between write/fcntl
ext4: remove extent status procfs files if journal load fails
ext4: disallow changing journal_csum option during remount
ext4: enable journal checksum when metadata checksum feature enabled
ext4: fix oops when loading block bitmap failed
ext4: fix overflow when updating superblock backups after resize
Pull quota and ext3 fixes from Jan Kara.
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fs, jbd: use a more generic hash function
quota: Properly return errors from dquot_writeback_dquots()
ext3: Don't check quota format when there are no quota files
Pull networking fixes from David Miller:
"A bit has accumulated, but it's been a week or so since my last batch
of post-merge-window fixes, so...
1) Missing module license in netfilter reject module, from Pablo.
Lots of people ran into this.
2) Off by one in mac80211 baserate calculation, from Karl Beldan.
3) Fix incorrect return value from ax88179_178a driver's set_mac_addr
op, which broke use of it with bonding. From Ian Morgan.
4) Checking of skb_gso_segment()'s return value was not all
encompassing, it can return an SKB pointer, a pointer error, or
NULL. Fix from Florian Westphal.
This is crummy, and longer term will be fixed to just return error
pointers or a real SKB.
6) Encapsulation offloads not being handled by
skb_gso_transport_seglen(). From Florian Westphal.
7) Fix deadlock in TIPC stack, from Ying Xue.
8) Fix performance regression from using rhashtable for netlink
sockets. The problem was the synchronize_net() invoked for every
socket destroy. From Thomas Graf.
9) Fix bug in eBPF verifier, and remove the strong dependency of BPF
on NET. From Alexei Starovoitov.
10) In qdisc_create(), use the correct interface to allocate
->cpu_bstats, otherwise the u64_stats_sync member isn't
initialized properly. From Sabrina Dubroca.
11) Off by one in ip_set_nfnl_get_byindex(), from Dan Carpenter.
12) nf_tables_newchain() was erroneously expecting error pointers from
netdev_alloc_pcpu_stats(). It only returna a valid pointer or
NULL. From Sabrina Dubroca.
13) Fix use-after-free in _decode_session6(), from Li RongQing.
14) When we set the TX flow hash on a socket, we mistakenly do so
before we've nailed down the final source port. Move the setting
deeper to fix this. From Sathya Perla.
15) NAPI budget accounting in amd-xgbe driver was counting descriptors
instead of full packets, fix from Thomas Lendacky.
16) Fix total_data_buflen calculation in hyperv driver, from Haiyang
Zhang.
17) Fix bcma driver build with OF_ADDRESS disabled, from Hauke
Mehrtens.
18) Fix mis-use of per-cpu memory in TCP md5 code. The problem is
that something that ends up being vmalloc memory can't be passed
to the crypto hash routines via scatter-gather lists. From Eric
Dumazet.
19) Fix regression in promiscuous mode enabling in cdc-ether, from
Olivier Blin.
20) Bucket eviction and frag entry killing can race with eachother,
causing an unlink of the object from the wrong list. Fix from
Nikolay Aleksandrov.
21) Missing initialization of spinlock in cxgb4 driver, from Anish
Bhatt.
22) Do not cache ipv4 routing failures, otherwise if the sysctl for
forwarding is subsequently enabled this won't be seen. From
Nicolas Cavallari"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (131 commits)
drivers: net: cpsw: Support ALLMULTI and fix IFF_PROMISC in switch mode
drivers: net: cpsw: Fix broken loop condition in switch mode
net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0
stmmac: pci: set default of the filter bins
net: smc91x: Fix gpios for device tree based booting
mpls: Allow mpls_gso to be built as module
mpls: Fix mpls_gso handler.
r8152: stop submitting intr for -EPROTO
netfilter: nft_reject_bridge: restrict reject to prerouting and input
netfilter: nft_reject_bridge: don't use IP stack to reject traffic
netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions
netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions
netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routing
drivers/net: macvtap and tun depend on INET
drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets
drivers/net: Disable UFO through virtio
net: skb_fclone_busy() needs to detect orphaned skb
gre: Use inner mac length when computing tunnel length
mlx4: Avoid leaking steering rules on flow creation error flow
net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN
...
Pull sparc update from David Miller:
"Two changes:
1) It makes no sense to execute a VTOC partition table request in the
Sun virtual block device driver and fail to load if it doesn't
succeed because a) we don't use the result at all and b) it won't
succeed if there is an EFI partition on the disk, for example.
We read the partition table via the normal means in the block layer
anyways, so this is really completely useless, so just remove it.
From Dwight Engen.
2) Hook up new bpf system call"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sunvdc: don't call VD_OP_GET_VTOC
sparc: Hook up bpf system call.
Pull x86 fixes from Ingo Molnar:
"Fixes from all around the place:
- hyper-V 32-bit PAE guest kernel fix
- two IRQ allocation fixes on certain x86 boards
- intel-mid boot crash fix
- intel-quark quirk
- /proc/interrupts duplicate irq chip name fix
- cma boot crash fix
- syscall audit fix
- boot crash fix with certain TSC configurations (seen on Qemu)
- smpboot.c build warning fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi()
x86, intel-mid: Create IRQs for APB timers and RTC timers
x86: Don't enable F00F workaround on Intel Quark processors
x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts
x86, cma: Reserve DMA contiguous area after initmem_init()
i386/audit: stop scribbling on the stack frame
x86, apic: Handle a bad TSC more gracefully
x86: ACPI: Do not translate GSI number if IOAPIC is disabled
x86/smpboot: Move data structure to its primary usage scope
* acpi-scan:
ACPI: Use ACPI companion to match only the first physical device
* acpi-ec:
ACPI / EC: Fix regression due to conflicting firmware behavior between Samsung and Acer.
Revert "ACPI / EC: Add support to disallow QR_EC to be issued before completing previous QR_EC"
Pull scheduler fixes from Ingo Molnar:
"Various scheduler fixes all over the place: three SCHED_DL fixes,
three sched/numa fixes, two generic race fixes and a comment fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/dl: Fix preemption checks
sched: Update comments for CLONE_NEWNS
sched: stop the unbound recursion in preempt_schedule_context()
sched/fair: Fix division by zero sysctl_numa_balancing_scan_size
sched/fair: Care divide error in update_task_scan_period()
sched/numa: Fix unsafe get_task_struct() in task_numa_assign()
sched/deadline: Fix races between rt_mutex_setprio() and dl_task_timer()
sched/deadline: Don't replenish from a !SCHED_DEADLINE entity
sched: Fix race between task_group and sched_task_group
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, plus on the kernel side:
- a revert for a newly introduced PMU driver which isn't complete yet
and where we ran out of time with fixes (to be tried again in
v3.19) - this makes up for a large chunk of the diffstat.
- compilation warning fixes
- a printk message fix
- event_idx usage fixes/cleanups"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf probe: Trivial typo fix for --demangle
perf tools: Fix report -F dso_from for data without branch info
perf tools: Fix report -F dso_to for data without branch info
perf tools: Fix report -F symbol_from for data without branch info
perf tools: Fix report -F symbol_to for data without branch info
perf tools: Fix report -F mispredict for data without branch info
perf tools: Fix report -F in_tx for data without branch info
perf tools: Fix report -F abort for data without branch info
perf tools: Make CPUINFO_PROC an array to support different kernel versions
perf callchain: Use global caching provided by libunwind
perf/x86/intel: Revert incomplete and undocumented Broadwell client support
perf/x86: Fix compile warnings for intel_uncore
perf: Fix typos in sample code in the perf_event.h header
perf: Fix and clean up initialization of pmu::event_idx
perf: Fix bogus kernel printk
perf diff: Add missing hists__init() call at tool start