Commit Graph

426320 Commits

Author SHA1 Message Date
Ben Hutchings
5eed1f6852 sfc: Fail self-test with -EBUSY, not -EIO, if the device is busy
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:53:35 -05:00
Ben Hutchings
17e678d12b sfc: Cosmetic changes to self-test from the out-of-tree driver
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:53:35 -05:00
Ben Hutchings
6a350fdb60 sfc: Update product naming
We don't use 'Solarstorm' or 'Solarflare Communications' in full
any more.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:53:35 -05:00
Ben Hutchings
e0b3ae30a2 sfc: Use canonical pointer type for MAC address in efx_set_mac_address()
Functions such as is_valid_ether_addr() expect u8 *, so use that
instead of char *.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:53:35 -05:00
Ben Hutchings
93413f5058 sfc: Rename 'use_options' variable in tso_start() to clearer 'use_opt_desc'
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:53:35 -05:00
Ben Hutchings
2fa25cf1e1 sfc: Preserve rx_frm_trunc counters when resizing DMA rings
We allocate efx_channel structures with kzalloc() so we don't need to
zero-initialise individual fields in efx_probe_channel().  Further,
this function will be called again during DMA ring resizing and we
should not reset any statistics then.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:53:34 -05:00
Ben Hutchings
aa3930ee8c sfc: Correct comment about number of TX queues used on EF10
EF10 implements option descriptors to switch TX checksum offload
on and off between packets.  We could therefore use a single
hardware TX queue per kernel TX queue, although we don't yet.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:53:34 -05:00
Ben Hutchings
86fc187fcc sfc: Remove unused definitions of EF10 user-mode DMA descriptors
These DMA descriptor types will only be used by the userland
networking stack.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:53:34 -05:00
Ben Hutchings
0bdadad166 sfc: Replace TSOH_OFFSET with the equivalent NET_IP_ALIGN
If CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is defined then NET_IP_ALIGN
will be defined as 0, so this macro is redundant.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:53:34 -05:00
Ben Hutchings
92d8f766ec sfc: Rewrite adjustment of PPS event in a clearer way
There is substantial latency in generation and handling of PPS events
from the NIC, which we have to correct for before passing a host
timestamp to the PPS subsystem.  We compare clocks with the MC,
giving us two offsets to subtract from the timestamp generated by
pps_get_ts():

(a) Time from the last good sync (where we got host and NIC timestamps
    for nearly the same instant) to the time we called pps_get_ts()
(b) Time from NIC top of second to the last good sync

We currently calculate (a) + (b) in a quite confusing way.
Instead, calculate (a) completely, then add (b) to it.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:53:34 -05:00
Ben Hutchings
ce320f44d6 sfc: Cache skb->data in local variable in efx_ptp_rx()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:53:33 -05:00
Laurence Evans
f9fd7ec786 sfc: Removed adhoc scheme to rate limit PTP event queue overflow message
Use conventional net_ratelimit() instead.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:53:33 -05:00
David S. Miller
a146b59197 Merge branch 'ethtool_docs'
Ben Hutchings says:

====================
ethtool: Improve documentation

This series converts most comments in <linux/ethtool.h> into kernel-doc
format, and expands them to describe the semantics as well as I
understand them.  It also fixes some kernel-doc formatting errors.

I started on this at Solarflare, but they graciously allowed me to take
it away and finish it off.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:39 -05:00
Ben Hutchings
073e3cf219 ethtool: Fix unwanted section breaks in kernel-doc
A colon almost unavoidably starts a new section.  The script should be
changed to provide a way to avoid this, but for now reword the
comments to avoid using colons.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:04 -05:00
Ben Hutchings
ba569dc3e8 ethtool: Move kernel-doc comment next to struct ethtool_dump definition
The kernel-doc script does not tolerate the macro definition in between.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:04 -05:00
Ben Hutchings
6e201c857b ethtool: Document the general convention for VLAs in kernel space
Various ethtool command structures are declared with zero-length array
at the end which are intended to be variable-length in userland
(relying on lack of compiler bounds checking).  However, in the kernel
the structure and array are always allocated and passed to driver
operations separately.  Make that explicit.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:03 -05:00
Ben Hutchings
f432c095f7 ethtool: Expand documentation of struct ethtool_perm_addr
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:03 -05:00
Ben Hutchings
590912298c ethtool: Expand documentation of struct ethtool_stats
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:03 -05:00
Ben Hutchings
4e5a62db2b ethtool: Expand documentation of struct ethtool_test
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:03 -05:00
Ben Hutchings
fe5df1b91e ethtool: Expand documentation of string set types
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:03 -05:00
Ben Hutchings
6a7a1081ce ethtool: Update documentation of struct ethtool_pauseparam
Convert the inline comments to kernel-doc format.

Explicitly specify that non-zero autoneg is an error if link
autonegotiation is disabled.

Specify that pause capabilities should be advertised dependent on link
autonegotiation, not the autoneg flag here.  There is no way to
opt-out of pause frame autonegotiation, and this improves behaviour
when the link partner is configured to follow pause frame
autonegotiation and our interface is not.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:02 -05:00
Ben Hutchings
af440a8aed ethtool: Expand documentation of struct ethtool_ringparam
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:02 -05:00
Ben Hutchings
c8364a63f6 ethtool: Expand documentation of struct ethtool_eeprom
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:02 -05:00
Ben Hutchings
09fb8bb068 ethtool: Expand documentation of struct ethtool_regs
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:02 -05:00
Ben Hutchings
02d59f3fdb ethtool: Expand documentation of struct ethtool_wol
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:02 -05:00
Ben Hutchings
daba1b6bc1 ethtool: Expand documentation of struct ethtool_drvinfo
Replace the inline comments (and some others below) with a full
explanation of the semantics, in kernel-doc format.  Specify which
strings may be empty.  Document the relationship with other commands.

Replace the 'deprecation' of some fields with a proper explanation of
the conversion to generalised string sets, as userland programs may
not be able to assume that ETHTOOL_GSSET_INFO is available.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:02 -05:00
Ben Hutchings
bf8fc60a62 ethtool: Expand documentation of struct ethtool_cmd
struct ethtool_cmd has very limited documentation; it contains
several obscure or obsolete fields and several with non-obvious
interpretation.

Replace the inline comments (and some others below) with a full
explanation of the semantics as well as I understand them, in
kernel-doc format.  Formally deprecate some fields that seem to be of
historical use only.

Extend the comment about 32/64-bit compatibility to cover all
ethtool structures.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-12 17:33:01 -05:00
Linus Torvalds
738b52bb98 Microblaze fixes for 3.14-rc3
- Fix two compilation issues - HZ, readq/writeq
 - Fix stack protection support
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iEUEABECAAYFAlL6EgAACgkQykllyylKDCH/4ACYjiUz6pz7esQbcq4HlSN97we4
 QwCcD+VvWFMdGGmvB0NFYB8U3Zu57so=
 =uDeC
 -----END PGP SIGNATURE-----

Merge tag 'microblaze-3.14-rc3' of git://git.monstr.eu/linux-2.6-microblaze

Pull microblaze fixes from Michal Simek:
 - Fix two compilation issues - HZ, readq/writeq
 - Fix stack protection support

* tag 'microblaze-3.14-rc3' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Fix a typo when disabling stack protection
  microblaze: Define readq and writeq IO helper function
  microblaze: Fix missing HZ macro
2014-02-11 12:24:35 -08:00
Linus Torvalds
a87af778d8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 bugfixes from Martin Schwidefsky:
 "A collection a bug fixes.  Most of them are minor but two of them are
  more severe.  The linkage stack bug can be used by user space to force
  an oops, with panic_on_oops this is a denial-of-service.  And the dump
  memory detection issue can cause incomplete memory dumps"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cio: improve cio_commit_config
  s390: fix kernel crash due to linkage stack instructions
  s390/dump: Fix dump memory detection
  s390/appldata: restore missing init_virt_timer()
  s390/qdio: correct program-controlled interruption checking
  s390/qdio: for_each macro correctness
2014-02-11 12:23:50 -08:00
Linus Torvalds
16e5a2ed59 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:

 1) Fix flexcan build on big endian, from Arnd Bergmann

 2) Correctly attach cpsw to GPIO bitbang MDIO drive, from Stefan Roese

 3) udp_add_offload has to use GFP_ATOMIC since it can be invoked from
    non-sleepable contexts.  From Or Gerlitz

 4) vxlan_gro_receive() does not iterate over all possible flows
    properly, fix also from Or Gerlitz

 5) CAN core doesn't use a proper SKB destructor when it hooks up
    sockets to SKBs.  Fix from Oliver Hartkopp

 6) ip_tunnel_xmit() can use an uninitialized route pointer, fix from
    Eric Dumazet

 7) Fix address family assignment in IPVS, from Michal Kubecek

 8) Fix ath9k build on ARM, from Sujith Manoharan

 9) Make sure fail_over_mac only applies for the correct bonding modes,
    from Ding Tianhong

10) The udp offload code doesn't use RCU correctly, from Shlomo Pongratz

11) Handle gigabit features properly in generic PHY code, from Florian
    Fainelli

12) Don't blindly invoke link operations in
    rtnl_link_get_slave_info_data_size, they are optional.  Fix from
    Fernando Luis Vazquez Cao

13) Add USB IDs for Netgear Aircard 340U, from Bjørn Mork

14) Handle netlink packet padding properly in openvswitch, from Thomas
    Graf

15) Fix oops when deleting chains in nf_tables, from Patrick McHardy

16) Fix RX stalls in xen-netback driver, from Zoltan Kiss

17) Fix deadlock in mac80211 stack, from Emmanuel Grumbach

18) inet_nlmsg_size() forgets to consider ifa_cacheinfo, fix from Geert
    Uytterhoeven

19) tg3_change_mtu() can deadlock, fix from Nithin Sujir

20) Fix regression in setting SCTP local source addresses on accepted
    sockets, caused by some generic ipv6 socket changes.  Fix from
    Matija Glavinic Pecotic

21) IPPROTO_* must be pure defines, otherwise module aliases don't get
    constructed properly.  Fix from Jan Moskyto

22) IPV6 netconsole setup doesn't work properly unless an explicit
    source address is specified, fix from Sabrina Dubroca

23) Use __GFP_NORETRY for high order skb page allocations in
    sock_alloc_send_pskb and skb_page_frag_refill.  From Eric Dumazet

24) Fix a regression added in netconsole over bridging, from Cong Wang

25) TCP uses an artificial offset of 1ms for SRTT, but this doesn't jive
    well with TCP pacing which needs the SRTT to be accurate.  Fix from
    Eric Dumazet

26) Several cases of missing header file includes from Rashika Kheria

27) Add ZTE MF667 device ID to qmi_wwan driver, from Raymond Wanyoike

28) TCP Small Queues doesn't handle nonagle properly in some corner
    cases, fix from Eric Dumazet

29) Remove extraneous read_unlock in bond_enslave, whoops.  From Ding
    Tianhong

30) Fix 9p trans_virtio handling of vmalloc buffers, from Richard Yao

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (136 commits)
  6lowpan: fix lockdep splats
  alx: add missing stats_lock spinlock init
  9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers
  bonding: remove unwanted bond lock for enslave processing
  USB2NET : SR9800 : One chip USB2.0 USB2NET SR9800 Device Driver Support
  tcp: tsq: fix nonagle handling
  bridge: Prevent possible race condition in br_fdb_change_mac_address
  bridge: Properly check if local fdb entry can be deleted when deleting vlan
  bridge: Properly check if local fdb entry can be deleted in br_fdb_delete_by_port
  bridge: Properly check if local fdb entry can be deleted in br_fdb_change_mac_address
  bridge: Fix the way to check if a local fdb entry can be deleted
  bridge: Change local fdb entries whenever mac address of bridge device changes
  bridge: Fix the way to find old local fdb entries in br_fdb_change_mac_address
  bridge: Fix the way to insert new local fdb entries in br_fdb_changeaddr
  bridge: Fix the way to find old local fdb entries in br_fdb_changeaddr
  tcp: correct code comment stating 3 min timeout for FIN_WAIT2, we only do 1 min
  net: vxge: Remove unused device pointer
  net: qmi_wwan: add ZTE MF667
  3c59x: Remove unused pointer in vortex_eisa_cleanup()
  net: fix 'ip rule' iif/oif device rename
  ...
2014-02-11 12:05:55 -08:00
Eric Dumazet
20e7c4e80d 6lowpan: fix lockdep splats
When a device ndo_start_xmit() calls again dev_queue_xmit(),
lockdep can complain because dev_queue_xmit() is re-entered and the
spinlocks protecting tx queues share a common lockdep class.

Same issue was fixed for bonding/l2tp/ppp in commits

0daa230302 ("[PATCH] bonding: lockdep annotation")
49ee49202b ("bonding: set qdisc_tx_busylock to avoid LOCKDEP splat")
23d3b8bfb8 ("net: qdisc busylock needs lockdep annotations ")
303c07db48 ("ppp: set qdisc_tx_busylock to avoid LOCKDEP splat ")

Reported-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 17:51:29 -08:00
John Greene
3e5ccc29f7 alx: add missing stats_lock spinlock init
Trivial fix for init time stack trace occuring in
alx_get_stats64 upon start up. Should have been part of
commit adding the spinlock:
f1b6b106 alx: add alx_get_stats64 operation

Signed-off-by: John Greene <jogreene@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 17:50:35 -08:00
Richard Yao
b6f52ae2f0 9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers
The 9p-virtio transport does zero copy on things larger than 1024 bytes
in size. It accomplishes this by returning the physical addresses of
pages to the virtio-pci device. At present, the translation is usually a
bit shift.

That approach produces an invalid page address when we read/write to
vmalloc buffers, such as those used for Linux kernel modules. Any
attempt to load a Linux kernel module from 9p-virtio produces the
following stack.

[<ffffffff814878ce>] p9_virtio_zc_request+0x45e/0x510
[<ffffffff814814ed>] p9_client_zc_rpc.constprop.16+0xfd/0x4f0
[<ffffffff814839dd>] p9_client_read+0x15d/0x240
[<ffffffff811c8440>] v9fs_fid_readn+0x50/0xa0
[<ffffffff811c84a0>] v9fs_file_readn+0x10/0x20
[<ffffffff811c84e7>] v9fs_file_read+0x37/0x70
[<ffffffff8114e3fb>] vfs_read+0x9b/0x160
[<ffffffff81153571>] kernel_read+0x41/0x60
[<ffffffff810c83ab>] copy_module_from_fd.isra.34+0xfb/0x180

Subsequently, QEMU will die printing:

qemu-system-x86_64: virtio: trying to map MMIO memory

This patch enables 9p-virtio to correctly handle this case. This not
only enables us to load Linux kernel modules off virtfs, but also
enables ZFS file-based vdevs on virtfs to be used without killing QEMU.

Special thanks to both Avi Kivity and Alexander Graf for their
interpretation of QEMU backtraces. Without their guidence, tracking down
this bug would have taken much longer. Also, special thanks to Linus
Torvalds for his insightful explanation of why this should use
is_vmalloc_addr() instead of is_vmalloc_or_module_addr():

https://lkml.org/lkml/2014/2/8/272

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 17:48:54 -08:00
dingtianhong
6b8790b500 bonding: remove unwanted bond lock for enslave processing
The bond enslave processing don't hold bond->lock anymore,
so release an unlocked rw lock will cause warning message,
remove the unwanted read_unlock(&bond->lock).

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 16:54:29 -08:00
Liu Junliang
19a38d8e0a USB2NET : SR9800 : One chip USB2.0 USB2NET SR9800 Device Driver Support
Signed-off-by: Liu Junliang <liujunliang_ljl@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-10 16:53:06 -08:00
Linus Torvalds
6792dfe383 Merge branch 'akpm' (patches from Andrew Morton)
Merge misc fixes from Andrew Morton:
 "A bunch of fixes"

* emailed patches fron Andrew Morton <akpm@linux-foundation.org>:
  ocfs2: check existence of old dentry in ocfs2_link()
  ocfs2: update inode size after zeroing the hole
  ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size
  mm/memory-failure.c: move refcount only in !MF_COUNT_INCREASED
  smp.h: fix x86+cpu.c sparse warnings about arch nonboot CPU calls
  mm: fix page leak at nfs_symlink()
  slub: do not assert not having lock in removing freed partial
  gitignore: add all.config
  ocfs2: fix ocfs2_sync_file() if filesystem is readonly
  drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zero
  fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem
  xen: properly account for _PAGE_NUMA during xen pte translations
  mm/slub.c: list_lock may not be held in some circumstances
  drivers/md/bcache/extents.c: use %zi to format size_t
  vmcore: prevent PT_NOTE p_memsz overflow during header update
  drivers/message/i2o/i2o_config.c: fix deadlock in compat_ioctl(I2OGETIOPS)
  Documentation/: update 00-INDEX files
  checkpatch: fix detection of git repository
  get_maintainer: fix detection of git repository
  drivers/misc/sgi-gru/grukdump.c: unlocking should be conditional in gru_dump_context()
2014-02-10 16:03:16 -08:00
Xue jiufei
0e048316ff ocfs2: check existence of old dentry in ocfs2_link()
System call linkat first calls user_path_at(), check the existence of
old dentry, and then calls vfs_link()->ocfs2_link() to do the actual
work.  There may exist a race when Node A create a hard link for file
while node B rm it.

         Node A                          Node B
user_path_at()
  ->ocfs2_lookup(),
find old dentry exist
                                rm file, add inode say inodeA
                                to orphan_dir

call ocfs2_link(),create a
hard link for inodeA.

                                rm the link, add inodeA to orphan_dir
                                again

When orphan_scan work start, it calls ocfs2_queue_orphans() to do the
main work.  It first tranverses entrys in orphan_dir, linking all inodes
in this orphan_dir to a list look like this:

	inodeA->inodeB->...->inodeA

When tranvering this list, it will fall into loop, calling iput() again
and again.  And finally trigger BUG_ON(inode->i_state & I_CLEAR).

Signed-off-by: joyce <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:43 -08:00
Junxiao Bi
c7d2cbc364 ocfs2: update inode size after zeroing the hole
fs-writeback will release the dirty pages without page lock whose offset
are over inode size, the release happens at
block_write_full_page_endio().  If not update, dirty pages in file holes
may be released before flushed to the disk, then file holes will contain
some non-zero data, this will cause sparse file md5sum error.

To reproduce the bug, find a big sparse file with many holes, like vm
image file, its actual size should be bigger than available mem size to
make writeback work more frequently, tar it with -S option, then keep
untar it and check its md5sum again and again until you get a wrong
md5sum.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Younger Liu <younger.liu@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:43 -08:00
Younger Liu
d62e74be12 ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size
The issue scenario is as following:

- Create a small file and fallocate a large disk space for a file with
  FALLOC_FL_KEEP_SIZE option.

- ftruncate the file back to the original size again.  but the disk free
  space is not changed back.  This is a real bug that be fixed in this
  patch.

In order to solve the issue above, we modified ocfs2_setattr(), if
attr->ia_size != i_size_read(inode), It calls ocfs2_truncate_file(), and
truncate disk space to attr->ia_size.

Signed-off-by: Younger Liu <younger.liu@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Tested-by: Jie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Reviewed-by: Jensen <shencanquan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:43 -08:00
Naoya Horiguchi
8d547ff4ac mm/memory-failure.c: move refcount only in !MF_COUNT_INCREASED
mce-test detected a test failure when injecting error to a thp tail
page.  This is because we take page refcount of the tail page in
madvise_hwpoison() while the fix in commit a3e0f9e47d
("mm/memory-failure.c: transfer page count from head page to tail page
after split thp") assumes that we always take refcount on the head page.

When a real memory error happens we take refcount on the head page where
memory_failure() is called without MF_COUNT_INCREASED set, so it seems
to me that testing memory error on thp tail page using madvise makes
little sense.

This patch cancels moving refcount in !MF_COUNT_INCREASED for valid
testing.

[akpm@linux-foundation.org: s/&&/&/]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: <stable@vger.kernel.org>	[3.9+: a3e0f9e47d]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:43 -08:00
Paul Gortmaker
fb37bb04d6 smp.h: fix x86+cpu.c sparse warnings about arch nonboot CPU calls
Use what we already do for arch_disable_smp_support() to fix these:

  arch/x86/kernel/smpboot.c:1155:6: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
  arch/x86/kernel/smpboot.c:1160:6: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?
  kernel/cpu.c:512:13: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
  kernel/cpu.c:516:13: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:42 -08:00
Rafael Aquini
a0b54adda3 mm: fix page leak at nfs_symlink()
Changes in commit a0b8cab3b9 ("mm: remove lru parameter from
__pagevec_lru_add and remove parts of pagevec API") have introduced a
call to add_to_page_cache_lru() which causes a leak in nfs_symlink() as
now the page gets an extra refcount that is not dropped.

Jan Stancek observed and reported the leak effect while running test8
from Connectathon Testsuite.  After several iterations over the test
case, which creates several symlinks on a NFS mountpoint, the test
system was quickly getting into an out-of-memory scenario.

This patch fixes the page leak by dropping that extra refcount
add_to_page_cache_lru() is grabbing.

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: <stable@vger.kernel.org>	[3.11.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:42 -08:00
Steven Rostedt
1e4dd9461f slub: do not assert not having lock in removing freed partial
Vladimir reported the following issue:

Commit c65c1877bd ("slub: use lockdep_assert_held") requires
remove_partial() to be called with n->list_lock held, but free_partial()
called from kmem_cache_close() on cache destruction does not follow this
rule, leading to a warning:

  WARNING: CPU: 0 PID: 2787 at mm/slub.c:1536 __kmem_cache_shutdown+0x1b2/0x1f0()
  Modules linked in:
  CPU: 0 PID: 2787 Comm: modprobe Tainted: G        W    3.14.0-rc1-mm1+ #1
  Hardware name:
   0000000000000600 ffff88003ae1dde8 ffffffff816d9583 0000000000000600
   0000000000000000 ffff88003ae1de28 ffffffff8107c107 0000000000000000
   ffff880037ab2b00 ffff88007c240d30 ffffea0001ee5280 ffffea0001ee52a0
  Call Trace:
    __kmem_cache_shutdown+0x1b2/0x1f0
    kmem_cache_destroy+0x43/0xf0
    xfs_destroy_zones+0x103/0x110 [xfs]
    exit_xfs_fs+0x38/0x4e4 [xfs]
    SyS_delete_module+0x19a/0x1f0
    system_call_fastpath+0x16/0x1b

His solution was to add a spinlock in order to quiet lockdep.  Although
there would be no contention to adding the lock, that lock also requires
disabling of interrupts which will have a larger impact on the system.

Instead of adding a spinlock to a location where it is not needed for
lockdep, make a __remove_partial() function that does not test if the
list_lock is held, as no one should have it due to it being freed.

Also added a __add_partial() function that does not do the lock
validation either, as it is not needed for the creation of the cache.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Vladimir Davydov <vdavydov@parallels.com>
Suggested-by: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:42 -08:00
Borislav Petkov
25fba9bebe gitignore: add all.config
This is used by kbuild to load preset Kconfig options.  We need to
ignore it, otherwise git clean kills it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:42 -08:00
Younger Liu
a987c7ca7f ocfs2: fix ocfs2_sync_file() if filesystem is readonly
If filesystem is readonly, there is no need to flush drive's caches or
force any uncommitted transactions.

[akpm@linux-foundation.org: return -EROFS, not 0]
Signed-off-by: Younger Liu <younger.liucn@gmail.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:42 -08:00
Prarit Bhargava
79040cad3f drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zero
If you do

  echo 0 > /sys/module/edac_core/parameters/edac_mc_poll_msec

the following stack trace is output because the edac module is not
designed to poll with a timeout of zero.

  WARNING: CPU: 12 PID: 0 at lib/list_debug.c:33 __list_add+0xac/0xc0()
  list_add corruption. prev->next should be next (ffff8808291dd1b8), but was           (null). (prev=ffff8808286fe3f8).
  Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
  CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.13.0+ #1
  Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
  Call Trace:
   <IRQ>
    __list_add+0xac/0xc0
    __internal_add_timer+0xab/0x130
    internal_add_timer+0x17/0x40
    mod_timer_pinned+0xca/0x170
    intel_pstate_timer_func+0x28a/0x380
    call_timer_fn+0x36/0x100
    run_timer_softirq+0x1ff/0x2f0
    __do_softirq+0xf5/0x2e0
    irq_exit+0x10d/0x120
    smp_apic_timer_interrupt+0x45/0x60
    apic_timer_interrupt+0x6d/0x80
   <EOI>
    cpuidle_idle_call+0xb9/0x1f0
    arch_cpu_idle+0xe/0x30
    cpu_startup_entry+0x9e/0x240
    start_secondary+0x1e4/0x290

  kernel BUG at kernel/timer.c:1084!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
  CPU: 12 PID: 0 Comm: swapper/12 Tainted: G        W    3.13.0+ #1
  Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
  Call Trace:
   <IRQ>
    run_timer_softirq+0x245/0x2f0
    __do_softirq+0xf5/0x2e0
    irq_exit+0x10d/0x120
    smp_apic_timer_interrupt+0x45/0x60
    apic_timer_interrupt+0x6d/0x80
   <EOI>
    cpuidle_idle_call+0xb9/0x1f0
    arch_cpu_idle+0xe/0x30
    cpu_startup_entry+0x9e/0x240
    start_secondary+0x1e4/0x290
  RIP   cascade+0x93/0xa0

  WARNING: CPU: 36 PID: 1154 at kernel/workqueue.c:1461 __queue_delayed_work+0xed/0x1a0()
  Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
  CPU: 36 PID: 1154 Comm: kworker/u481:3 Tainted: G        W    3.13.0+ #1
  Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
  Workqueue: edac-poller edac_mc_workq_function [edac_core]
  Call Trace:
    dump_stack+0x45/0x56
    warn_slowpath_common+0x7d/0xa0
    warn_slowpath_null+0x1a/0x20
    __queue_delayed_work+0xed/0x1a0
    queue_delayed_work_on+0x27/0x50
    edac_mc_workq_function+0x72/0xa0 [edac_core]
    process_one_work+0x17b/0x460
    worker_thread+0x11b/0x400
    kthread+0xd2/0xf0
    ret_from_fork+0x7c/0xb0

This patch adds a range check in the edac_mc_poll_msec code to check for 0.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:41 -08:00
Eric W. Biederman
96c7a2ff21 fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem
Recently due to a spike in connections per second memcached on 3
separate boxes triggered the OOM killer from accept.  At the time the
OOM killer was triggered there was 4GB out of 36GB free in zone 1.  The
problem was that alloc_fdtable was allocating an order 3 page (32KiB) to
hold a bitmap, and there was sufficient fragmentation that the largest
page available was 8KiB.

I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
but I do agree that order 3 allocations are very likely to succeed.

There are always pathologies where order > 0 allocations can fail when
there are copious amounts of free memory available.  Using the pigeon
hole principle it is easy to show that it requires 1 page more than 50%
of the pages being free to guarantee an order 1 (8KiB) allocation will
succeed, 1 page more than 75% of the pages being free to guarantee an
order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of
the pages being free to guarantee an order 3 allocate will succeed.

A server churning memory with a lot of small requests and replies like
memcached is a common case that if anything can will skew the odds
against large pages being available.

Therefore let's not give external applications a practical way to kill
linux server applications, and specify __GFP_NORETRY to the kmalloc in
alloc_fdmem.  Unless I am misreading the code and by the time the code
reaches should_alloc_retry in __alloc_pages_slowpath (where
__GFP_NORETRY becomes signification).  We have already tried everything
reasonable to allocate a page and the only thing left to do is wait.  So
not waiting and falling back to vmalloc immediately seems like the
reasonable thing to do even if there wasn't a chance of triggering the
OOM killer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:41 -08:00
Mel Gorman
a9c8e4beee xen: properly account for _PAGE_NUMA during xen pte translations
Steven Noonan forwarded a users report where they had a problem starting
vsftpd on a Xen paravirtualized guest, with this in dmesg:

  BUG: Bad page map in process vsftpd  pte:8000000493b88165 pmd:e9cc01067
  page:ffffea00124ee200 count:0 mapcount:-1 mapping:     (null) index:0x0
  page flags: 0x2ffc0000000014(referenced|dirty)
  addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping:          (null) index:7f97eea74
  CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
  Call Trace:
    dump_stack+0x45/0x56
    print_bad_pte+0x22e/0x250
    unmap_single_vma+0x583/0x890
    unmap_vmas+0x65/0x90
    exit_mmap+0xc5/0x170
    mmput+0x65/0x100
    do_exit+0x393/0x9e0
    do_group_exit+0xcc/0x140
    SyS_exit_group+0x14/0x20
    system_call_fastpath+0x1a/0x1f
  Disabling lock debugging due to kernel taint
  BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1
  BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1

The issue could not be reproduced under an HVM instance with the same
kernel, so it appears to be exclusive to paravirtual Xen guests.  He
bisected the problem to commit 1667918b64 ("mm: numa: clear numa
hinting information on mprotect") that was also included in 3.12-stable.

The problem was related to how xen translates ptes because it was not
accounting for the _PAGE_NUMA bit.  This patch splits pte_present to add
a pteval_present helper for use by xen so both bare metal and xen use
the same code when checking if a PTE is present.

[mgorman@suse.de: wrote changelog, proposed minor modifications]
[akpm@linux-foundation.org: fix typo in comment]
Reported-by: Steven Noonan <steven@uplinklabs.net>
Tested-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org>	[3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:41 -08:00
David Rientjes
255d0884f5 mm/slub.c: list_lock may not be held in some circumstances
Commit c65c1877bd ("slub: use lockdep_assert_held") incorrectly
required that add_full() and remove_full() hold n->list_lock.  The lock
is only taken when kmem_cache_debug(s), since that's the only time it
actually does anything.

Require that the lock only be taken under such a condition.

Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:41 -08:00
Geert Uytterhoeven
bd180b4e2b drivers/md/bcache/extents.c: use %zi to format size_t
drivers/md/bcache/extents.c: In function `btree_ptr_bad_expensive':
  drivers/md/bcache/extents.c:196: warning: format `%li' expects type `long int', but argument 4 has type `size_t'

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:40 -08:00