* 'gpio/next' of git://git.secretlab.ca/git/linux-2.6: (61 commits)
gpio/mxc/mxs: fix build error introduced by the irq_gc_ack() renaming
mcp23s08: add i2c support
mcp23s08: isolate spi specific parts
mcp23s08: get rid of setup/teardown callbacks
gpio/tegra: dt: add binding for gpio polarity
mcp23s08: remove unused work queue
gpio/da9052: remove a redundant assignment for gpio->da9052
gpio/mxc: add device tree probe support
ARM: mxc: use ARCH_NR_GPIOS to define gpio number
gpio/mxc: get rid of the uses of cpu_is_mx()
gpio/mxc: add missing initialization of basic_mmio_gpio shadow variables
gpio: Move mpc5200 gpio driver to drivers/gpio
GPIO: DA9052 GPIO module v3
gpio/tegra: Use engineering names in DT compatible property
of/gpio: Add new method for getting gpios under different property names
gpio/dt: Refine GPIO device tree binding
gpio/ml-ioh: fix off-by-one for displaying variable i in dev_err
gpio/pca953x: Deprecate meaningless device-tree bindings
gpio/pca953x: Remove dynamic platform data pointer
gpio/pca953x: Fix IRQ support.
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
IB/qib: Defer HCA error events to tasklet
mlx4_core: Bump the driver version to 1.0
RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit()
IB/mlx4: Support PMA counters for IBoE
IB/mlx4: Use flow counters on IBoE ports
IB/pma: Add include file for IBA performance counters definitions
mlx4_core: Add network flow counters
mlx4_core: Fix location of counter index in QP context struct
mlx4_core: Read extended capabilities into the flags field
mlx4_core: Extend capability flags to 64 bits
IB/mlx4: Generate GID change events in IBoE code
IB/core: Add GID change event
RDMA/cma: Don't allow IPoIB port space for IBoE
RDMA: Allow for NULL .modify_device() and .modify_port() methods
IB/qib: Update active link width
IB/qib: Fix potential deadlock with link down interrupt
IB/qib: Add sysfs interface to read free contexts
IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP
IB/qib: Remove double define
IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1287 commits)
icmp: Fix regression in nexthop resolution during replies.
net: Fix ppc64 BPF JIT dependencies.
acenic: include NET_SKB_PAD headroom to incoming skbs
ixgbe: convert to ndo_fix_features
ixgbe: only enable WoL for magic packet by default
ixgbe: remove ifdef check for non-existent define
ixgbe: Pass staterr instead of re-reading status and error bits from descriptor
ixgbe: Move interrupt related values out of ring and into q_vector
ixgbe: add structure for containing RX/TX rings to q_vector
ixgbe: inline the ixgbe_maybe_stop_tx function
ixgbe: Update ATR to use recorded TX queues instead of CPU for routing
igb: Fix for DH89xxCC near end loopback test
e1000: always call e1000_check_for_link() on e1000_ce4100 MACs.
netxen: add fw version compatibility check
be2net: request native mode each time the card is reset
ipv4: Constrain UFO fragment sizes to multiples of 8 bytes
virtio_net: Fix panic in virtnet_remove
ipv6: make fragment identifications less predictable
ipv6: unshare inetpeers
can: make function can_get_bittiming static
...
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
lguest: Fix in/out emulation
lguest: Fix translation count about wikipedia's cpuid page
lguest: Fix three simple typos in comments
lguest: update comments
lguest: Simplify device initialization.
lguest: don't rewrite vmcall instructions
lguest: remove remaining vmcall
lguest: use a special 1:1 linear pagetable mode until first switch.
lguest: Do not exit on non-fatal errors
* 'stable/drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pciback: Have 'passthrough' option instead of XEN_PCIDEV_BACKEND_PASS and XEN_PCIDEV_BACKEND_VPCI
xen/pciback: Remove the DEBUG option.
xen/pciback: Drop two backends, squash and cleanup some code.
xen/pciback: Print out the MSI/MSI-X (PIRQ) values
xen/pciback: Don't setup an fake IRQ handler for SR-IOV devices.
xen: rename pciback module to xen-pciback.
xen/pciback: Fine-grain the spinlocks and fix BUG: scheduling while atomic cases.
xen/pciback: Allocate IRQ handler for device that is shared with guest.
xen/pciback: Disable MSI/MSI-X when reseting a device
xen/pciback: guest SR-IOV support for PV guest
xen/pciback: Register the owner (domain) of the PCI device.
xen/pciback: Cleanup the driver based on checkpatch warnings and errors.
xen/pciback: xen pci backend driver.
xen: tmem: self-ballooning and frontswap-selfshrinking
xen: Add module alias to autoload backend drivers
xen: Populate xenbus device attributes
xen: Add __attribute__((format(printf... where appropriate
xen: prepare tmem shim to handle frontswap
xen: allow enable use of VGA console on dom0
* 'stable/pci.cleanups.v1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pci: Use 'acpi_gsi_to_irq' value unconditionally.
xen/pci: Remove 'xen_allocate_pirq_gsi'.
xen/pci: Retire unnecessary #ifdef CONFIG_ACPI
xen/pci: Move the allocation of IRQs when there are no IOAPIC's to the end
xen/pci: Squash pci_xen_initial_domain and xen_setup_pirqs together.
xen/pci: Use the xen_register_pirq for HVM and initial domain users
xen/pci: In xen_register_pirq bind the GSI to the IRQ after the hypercall.
xen/pci: Provide #ifdef CONFIG_ACPI to easy code squashing.
xen/pci: Update comments and fix empty spaces.
xen/pci: Shuffle code around.
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (49 commits)
xfs: add size update tracepoint to IO completion
xfs: convert AIL cursors to use struct list_head
xfs: remove confusing ail cursor wrapper
xfs: use a cursor for bulk AIL insertion
xfs: failure mapping nfs fh to inode should return ESTALE
xfs: Remove the second parameter to xfs_sb_count()
xfs: remove the dead XFS_DABUF_DEBUG code
xfs: remove leftovers of the old btree tracing code
xfs: remove the dead QUOTADEBUG code
xfs: remove the unused xfs_buf_delwri_sort function
xfs: remove wrappers around b_iodone
xfs: remove wrappers around b_fspriv
xfs: add a proper transaction pointer to struct xfs_buf
xfs: factor out xfs_da_grow_inode_int
xfs: factor out xfs_dir2_leaf_find_stale
xfs: cleanup struct xfs_dir2_free
xfs: reshuffle dir2 headers
xfs: start periodic workers later
Revert "xfs: fix filesystsem freeze race in xfs_trans_alloc"
xfs: remove variables that serve no purpose in xfs_alloc_ag_vextent_exact()
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
dlm: don't limit active work items
dlm: use workqueue for callbacks
dlm: remove deadlock debug print
dlm: improve rsb searches
dlm: keep lkbs in idr
dlm: fix kmalloc args
dlm: don't do pointless NULL check, use kzalloc and fix order of arguments
dlm: dump address of unknown node
dlm: use vmalloc for hash tables
dlm: show addresses in configfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus:
hfsplus: ensure bio requests are not smaller than the hardware sectors
hfsplus: Add additional range check to handle on-disk corruptions
hfsplus: Add error propagation for hfsplus_ext_write_extent_locked
hfsplus: add error checking for hfs_find_init()
hfsplus: lift the 2TB size limit
hfsplus: fix overflow in hfsplus_read_wrapper
hfsplus: fix overflow in hfsplus_get_block
hfsplus: assignments inside `if' condition clean-up
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
GFS2: combine duplicated block freeing routines
GFS2: Add S_NOSEC support
GFS2: Automatically adjust glock min hold time
GFS2: Cache dir hash table in a contiguous buffer
* 'linux-next' of git://git.infradead.org/ubi-2.6:
UBI: clarify the volume notification types' doc
UBI: remove dead code
UBI: dump stack when switching to R/O mode
UBI: fix oops in error path
UBI: switch debugging tests knobs to debugfs
UBI: make it possible to use struct ubi_device in debug.h
UBI: prepare debugging stuff to further debugfs conversion
UBI: use debugfs for the extra checks knobs
UBI: change the interface of a debugging check function
* 'linux-next' of git://git.infradead.org/ubifs-2.6: (32 commits)
MAINTAINERS: change e-mail of Adrian Hunter
UBIFS: fix master node recovery
UBIFS: improve power cut emulation testing
UBIFS: rename recovery testing variables
UBIFS: remove custom list of superblocks
UBIFS: stop re-defining UBI operations
UBIFS: switch to I/O helpers
UBIFS: switch to ubifs_leb_write
UBIFS: switch to ubifs_leb_read
UBIFS: introduce more I/O helpers
UBIFS: always print stacktrace when switching to R/O mode
UBIFS: remove unused and unneeded debugging function
UBIFS: add global debugfs knobs
UBIFS: introduce debugfs helpers
UBIFS: re-arrange debugging code a bit
UBIFS: be more informative in failure mode
UBIFS: switch self-check knobs to debugfs
UBIFS: lessen amount of debugging check types
UBIFS: introduce helper functions for debugging checks and tests
UBIFS: amend debugging inode size check function prototype
...
With ib_qib options:
options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4
a run of ib_write_bw -a yields the following:
------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec]
1048576 5000 2910.64 229.80
------------------------------------------------------------------
The top cpu use in a profile is:
CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
of 0x00 (No unit mask) count 1002300
Counted LLC_MISSES events (Last level cache demand requests from this core that
missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
samples % samples % app name symbol name
15237 29.2642 964 17.1195 ib_qib.ko qib_7322intr
12320 23.6618 1040 18.4692 ib_qib.ko handle_7322_errors
4106 7.8860 0 0 vmlinux vsnprintf
Analysis of the stats, profile, the code, and the annotated profile indicate:
- All of the overflow interrupts (one per packet overflow) are
serviced on CPU0 with no mitigation on the frequency.
- All of the receive interrupts are being serviced by CPU0. (That is
the way truescale.cmds statically allocates the kctx IRQs to CPU)
- The code is spending all of its time servicing QIB_I_C_ERROR
RcvEgrFullErr interrupts on CPU0, starving the packet receive
processing.
- The decode_err routine is very inefficient, using a printf variant
to format a "%s" and continues to loop when the errs mask has been
cleared.
- Both qib_7322intr and handle_7322_errors read pci registers, which
is very inefficient.
The fix does the following:
- Adds a tasklet to service QIB_I_C_ERROR
- Replaces the very inefficient scnprintf() with a memcpy(). A field
is added to qib_hwerror_msgs to save the sizeof("string") at
compile time so that a strlen is not needed during err_decode().
- The most frequent errors (Overflows) are serviced first to exit the
loop as early as possible.
- The loop now exits as soon as the errs mask is clear rather than
fruitlessly looping through the msp array.
With this fix the performance changes to:
------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec]
1048576 5000 2990.64 2941.35
------------------------------------------------------------------
During testing of the error handling overflow patch, it was determined
that some CPU's were slower when servicing both overflow and receive
interrupts on CPU0 with different MSI interrupt vectors.
This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI
interrupt for kctx's < 2 and to service them on the default interrupt.
For some CPUs, the cost of the interrupt enter/exit is more costly
than then the additional PCI read in the default handler.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Currently all bio requests are 512 bytes, which may fail for media
whose physical sector size is larger than this. Ensure these
requests are not smaller than the block device logical block size.
BugLink: http://bugs.launchpad.net/bugs/734883
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
'recoff' is read from disk and used for an argument to memcpy, so if
the value read from disk is larger than the page size, it result to
"general protection fault". This patch add additional range check for
the value, so that disk fuzz won't cause such fault.
Signed-off-by: Naohiro Aota <naota@elisp.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
icmp_route_lookup() uses the wrong flow parameters if the reverse
session route lookup isn't used.
So do not commit to the re-decoded flow until we actually make a
final decision to use a real route saved in 'rt2'.
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit c225150b "slab: fix DEBUG_SLAB build",
"if ((unsigned long)objp & (ARCH_SLAB_MINALIGN-1))" is always true if
ARCH_SLAB_MINALIGN == 0. Do not print warning if ARCH_SLAB_MINALIGN == 0.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Some workloads need some headroom (NET_SKB_PAD) to avoid expensive
reallocations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Private rx_csum flags are now duplicate of netdev->features &
NETIF_F_RXCSUM. We remove those duplicates and now use the net_device_ops
ndo_set_features. This was based on the original patch submitted by
Michal Miroslaw <mirq-linux@rere.qmqm.pl>. I also removed the special
case not requiring a reset for X540 hardware. It is needed just as it is
in 82599 hardware.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Martin Wilck <martin.wilck@ts.fujitsu.com> reported that systems using
the ixgbe-driver that were capable of WoL were rebooting almost as soon
as they were shut down. This is because the default WoL settings
enabled magic packet, broadcast, unicast, and multicast.
Other Intel devices seem to use the stored eeprom value for initial WoL
capabilities. The 82578DM (e1000e) and 82576 (igb) the devices I looked
at had only the magic packet enabled in the eeprom, so that seems
appropriate on ixgbe-based devices as well. I set the WoL options on my
82578DM to be the same default as the ixgbe devices (umbg) and saw the
same as Martin -- almost as soon as my box shutdown, it booted again.
This patch changes the default to only be the magic packet. This is the
same as the default for most Intel and non-Intel hardware currently
upstream.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
CC: Martin Wilck <martin.wilck@ts.fujitsu.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is meant to address possible race conditions from the status
and error bits on the RX descriptors being re-read by multiple functions in
the RX cleanup path. To resolve this I have added code that will pass the
staterr value to those functions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change moves work_limit, total_packets, and total_bytes into the ring
container struct of the q_vector. The advantage of this is that it should
reduce the size of memory used in the event of multiple rings being
assigned to a single q_vector. In addition it should help to reduce the
total workload for calculating itr since now total_packets and total_bytes
will be the total work done of the interrupt instead of for the ring.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support for a ring container structure to be used within
the q_vector. The basic idea is to provide a means of separating the RX
and TX rings while maintaining a common structure for their containment.
The advantage to this is that later we should be able to pass this
structure to the update_itr functions without needing to pass individual
rings.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Many features were added to this driver, so the driver version should change too.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The ixgbe_maybe_stop_tx function is only a few lines long and is called
multiple times through the xmit hotpath. In order to streamline things it
makes sense to just inline it.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change is meant to update ATR so that it will use the recorded RX
queue instead of the CPU in the case of routing. This change is meant to
help ixgbe default behavior to more closely match that of the kernel.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
On this chipset it is required to configure the MPHY block for loopback tests. If MPHY is not configured then all loopback tests will report failures.
Signed-off-by: Robert Healy <robert.healy@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Interrupts about link lost or rx sequence errors are not reported by
the ce4100 hardware, leading to transitions from link UP to link DOWN
never being reported.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We were blatting too much of the register. Linux didn't care, but in
theory it might.
Reported-by: Jonas Maebe <jonas.maebe@elis.ugent.be>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The comment is outdated, wikipedia now has six translations of the cpuid
page.
Signed-off-by: Adrian Knoth <adi@drcomp.erfurt.thur.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch fixes three typos I've accidentally spotted.
Signed-off-by: Adrian Knoth <adi@drcomp.erfurt.thur.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (one was already fixed)
We used to notify the Host every time we updated a device's status. However,
it only really needs to know when we're resetting the device, or failed to
initialize it, or when we've finished our feature negotiation.
In particular, we used to wait for VIRTIO_CONFIG_S_DRIVER_OK in the
status byte before starting the device service threads. But this
corresponds to the successful finish of device initialization, which
might (like virtio_blk's partition scanning) use the device. So we
had a hack, if they used the device before we expected we started the
threads anyway.
Now we hook into the finalize_features hook in the Guest: at that
point we tell the Launcher that it can rely on the features we have
acked. On the Launcher side, we look at the status at that point, and
start servicing the device.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We switch back from using vmcall in 091ebf07a2
because it was unreliable under kvm, but I missed one (rarely-used) place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The Host used to create some page tables for the Guest to use at the
top of Guest memory; it would then tell the Guest where this was. In
particular, it created linear mappings for 0 and 0xC0000000 addresses
because lguest used to switch to its real page tables quite late in
boot.
However, since d50d8fe19 Linux initialized boot page tables in
head_32.S even before the "are we lguest?" boot jump. So, now we can
simplify things: the Host pagetable code assumes 1:1 linear mapping
until it first calls the LHCALL_NEW_PGTABLE hypercall, which we now do
before we reach C code.
This also means that the Host doesn't need to know anything about the
Guest's PAGE_OFFSET. (Non-Linux guests might not even have such a
thing).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
o Minimum fw version supported for P3 chip is 4.0.505
o File Fw > 4.0.554 is not supported if flash fw < 4.0.554.
o In mn firmware case, file fw older than flash fw is allowed.
o Change variable names for readability
o Update driver version 4.0.76
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently be3-native mode is requested only in probe(). It must be requested, each time the card is reset either after an EEH error or after
sleep/hibernation.
Also, the be_cmd_check_native_mode() is better named be_cmd_req_native_mode()
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because the ip fragment offset field counts 8-byte chunks, ip
fragments other than the last must contain a multiple of 8 bytes of
payload. ip_ufo_append_data wasn't respecting this constraint and,
depending on the MTU and ip option sizes, could create malformed
non-final fragments.
Google-Bug-Id: 5009328
Signed-off-by: Bill Sommerfeld <wsommerfeld@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>