Commit Graph

525882 Commits

Author SHA1 Message Date
Geert Uytterhoeven
8e690ffdbc flow_dissector: Pre-initialize ip_proto in __skb_flow_dissect()
net/core/flow_dissector.c: In function ‘__skb_flow_dissect’:
net/core/flow_dissector.c:132: warning: ‘ip_proto’ may be used uninitialized in this function

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-28 16:53:54 -07:00
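The warning class fixed above can be illustrated with a small userspace sketch. This is not the code of __skb_flow_dissect(); the enum, the function name pick_ip_proto() and its parameters are hypothetical and exist only to show why pre-initializing a variable that is assigned in some switch branches silences a "may be used uninitialized" warning.

```c
#include <stdint.h>

/* Hypothetical protocol tags for this sketch only. */
enum l3_kind { L3_IPV4, L3_IPV6, L3_OTHER };

/*
 * ip_proto is assigned only in some branches. Without the
 * initializer, the compiler cannot prove it is set on every path.
 */
uint8_t pick_ip_proto(enum l3_kind kind, uint8_t v4_proto, uint8_t v6_nexthdr)
{
	uint8_t ip_proto = 0;	/* pre-initialized: defined on every path */

	switch (kind) {
	case L3_IPV4:
		ip_proto = v4_proto;
		break;
	case L3_IPV6:
		ip_proto = v6_nexthdr;
		break;
	default:
		break;		/* no assignment on this path */
	}
	return ip_proto;
}
```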
Li, Liang Z
905726c1c5 xen-netfront: Remove the meaningless code
The function netif_set_real_num_tx_queues() will return -EINVAL if
the second parameter < 1, so calling this function with the second
parameter set to 0 is meaningless.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-28 16:49:14 -07:00
Andy Gospodarek
96ac5cc963 ipv4: fix RCU lockdep warning from linkdown changes
The following lockdep splat was seen due to the wrong context for
grabbing in_dev.

===============================
[ INFO: suspicious RCU usage. ]
4.1.0-next-20150626-dbg-00020-g54a6d91-dirty #244 Not tainted
-------------------------------
include/linux/inetdevice.h:205 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by ip/403:
 #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81453305>] rtnl_lock+0x17/0x19
 #1:  ((inetaddr_chain).rwsem){.+.+.+}, at: [<ffffffff8105c327>] __blocking_notifier_call_chain+0x35/0x6a

stack backtrace:
CPU: 2 PID: 403 Comm: ip Not tainted 4.1.0-next-20150626-dbg-00020-g54a6d91-dirty #244
 0000000000000001 ffff8800b189b728 ffffffff8150a542 ffffffff8107a8b3
 ffff880037bbea40 ffff8800b189b758 ffffffff8107cb74 ffff8800379dbd00
 ffff8800bec85800 ffff8800bf9e13c0 00000000000000ff ffff8800b189b7d8
Call Trace:
 [<ffffffff8150a542>] dump_stack+0x4c/0x6e
 [<ffffffff8107a8b3>] ? up+0x39/0x3e
 [<ffffffff8107cb74>] lockdep_rcu_suspicious+0xf7/0x100
 [<ffffffff814b63c3>] fib_dump_info+0x227/0x3e2
 [<ffffffff814b6624>] rtmsg_fib+0xa6/0x116
 [<ffffffff814b978f>] fib_table_insert+0x316/0x355
 [<ffffffff814b362e>] fib_magic+0xb7/0xc7
 [<ffffffff814b4803>] fib_add_ifaddr+0xb1/0x13b
 [<ffffffff814b4d09>] fib_inetaddr_event+0x36/0x90
 [<ffffffff8105c086>] notifier_call_chain+0x4c/0x71
 [<ffffffff8105c340>] __blocking_notifier_call_chain+0x4e/0x6a
 [<ffffffff8105c370>] blocking_notifier_call_chain+0x14/0x16
 [<ffffffff814a7f50>] __inet_insert_ifa+0x1a5/0x1b3
 [<ffffffff814a894d>] inet_rtm_newaddr+0x350/0x35f
 [<ffffffff81457866>] rtnetlink_rcv_msg+0x17b/0x18a
 [<ffffffff8107e7c3>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff8146965f>] ? netlink_deliver_tap+0x1cb/0x1f7
 [<ffffffff814576eb>] ? rtnl_newlink+0x72a/0x72a
...

This patch resolves that splat.

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-28 16:47:12 -07:00
Jon Paul Maloy
7d967b673c tipc: purge backlog queue counters when broadcast link is reset
In commit 1f66d161ab
("tipc: introduce starvation free send algorithm")
we introduced a counter per priority level for buffers
in the link backlog queue. We also introduced a new
function tipc_link_purge_backlog(), to reset these
counters to zero when the link is reset.

Unfortunately, we failed to call this function when
the broadcast link is reset, with the result that the
values of these counters might be permanently skewed
when new nodes are attached. This may in the worst case
lead to permanent, but spurious, broadcast link
congestion, where no broadcast packets can be sent at
all.

We fix this bug with this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-28 16:43:02 -07:00
David S. Miller
011cb197a8 Merge branch 'bnx2x'
Yuval Mintz says:

====================
bnx2x: various fixes

This patch series contains several small fixes [with the possible
exception of the first 2 link fixes] for various driver flows.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 06:30:43 -07:00
Yuval Mintz
592b9b8d68 bnx2x: Fix linearization for encapsulated packets
Due to FW constraints, driver must make sure that transmitted SKBs will
not be too fragmented, or in the case that they are - that each 'window'
of fragments passed to the FW would contain at least an mss worth of data.

For encapsulated packets the calculation is wrong, since it ignores the
inner headers in the calculation of the headers' length.
This could lead to a FW assertion in case of a too-fragmented encapsulated
packet.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 06:30:38 -07:00
Yuval Mintz
efd38b8f52 bnx2x: Release nvram lock on error flow
During an error flow when trying to access the nvram the driver doesn't
release the hw lock it acquired.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 06:30:38 -07:00
Ariel Elior
dc6a20aa3b bnx2x: Fix statistics gathering on link change
Since the driver statistics flow accesses MACs and those might reset during
link re-configurations, when we're about to change link properties we
have to make sure that statistics are not operational.
Statistics would be re-enabled [i.e., gathering of statistics would
re-commence] once physical link is achieved again.

Since driver employs a link-flap avoidance scheme, there are scenarios
where driver will receive no indication that the new link is up, and
as a result the statistics would not be re-enabled.

Preventing LFA from working in such cases would guarantee that we'll
always receive such indications and thus will fix statistics gathering.

Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 06:30:37 -07:00
Yuval Mintz
2f43b821b5 bnx2x: Fix self-test for 20g devices
20g-capable devices are not configured properly for self-test, using
10g as their speed, which causes the link indication to remain down and
fails the internal loopback test.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 06:30:35 -07:00
Shahed Shaikh
bb9e9c1d20 bnx2x: Fix VF MAC removal
There's a bug in today's driver where VF requests to add/remove MAC filters
always reach the Hypervisor as add requests.
This prevents the VF from changing its MAC address, as it cannot remove the
previously configured MAC and runs out of MAC credits.

Signed-off-by: Shahed Shaikh <Shahed.Shaikh@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 06:30:35 -07:00
Manish Chopra
ad6afbe957 bnx2x: Don't notify about scratchpad parities
The scratchpad is a shared block between all functions of a given device.
Due to HW limitations, we can't properly close its parity notifications
to all functions on legal flows.
E.g., it's possible that while taking a register dump from one function
a parity error would be triggered on other functions.

Today the driver doesn't consider this parity as a 'real' parity unless it is
accompanied by additional indications [which would happen in a real
parity scenario], but it does print notifications for such events in the
system logs.

This eliminates such prints - in case of real parities the driver would have
additional indications, but if this is the only signal, the user will not even
see a parity being logged in the system.

Signed-off-by: Manish Chopra <Manish.Chopra@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 06:30:34 -07:00
Yuval Mintz
9d18d270d7 bnx2x: Prevent false warning when accessing MACs
Each time a flow finishes reads from the classification shadow
configuration in the driver, that flow would check for pending commands
and pass them to FW if possible.
In case there's already a completion pending command, i.e., a ramrod
that has been sent to the FW and is yet to be completed while said flow
tries to configure the pending command we would get a false error message
in logs [and panic if SOE was used for driver compilation] since the
command could not have been completed.

This prevents said print [and panic]; the pending command will be sent by
the time the completion of the current sent command would arrive.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 06:30:34 -07:00
Yuval Mintz
5d67c1c593 bnx2x: Correct speed from baseT into KR.
ethtool shows KR supported/advertised speeds incorrectly as baseT
in cases where the board is in fact KR-based.

Signed-off-by: Yaniv Rosner <Yaniv.Rosner@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 06:30:33 -07:00
Yuval Mintz
1359d73c1d bnx2x: Correct asymmetric flow-control
This fixes several issues relating to asymmetric configuration:
 1. When user requests to disable TX, the local-device needs to
    advertise both PAUSE and ASM_DIR, but to avoid transmitting pause
    frames. In the 578xx, it would ignore the TX disable.

 2. When user advertises RX-only, ASM_DIR was advertised instead of
    PAUSE/ASM_DIR.

 3. When changing mode, the advertised PAUSE/ASM_DIR was not cleared
    before setting new one, so disabling RX or TX had no impact on the
    'advertised' as appeared in the 'ethtool -a' output.

Signed-off-by: Yaniv Rosner <Yaniv.Rosner@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 06:30:33 -07:00
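The three flow-control issues listed above can be summarized in a short sketch. It is not the bnx2x code: the bit values and the function name pause_advertisement() are hypothetical. The RX-only case follows the commit text; the "both directions" and "TX-only" cases assume the usual IEEE 802.3 pause encoding (PAUSE alone for symmetric, ASM_DIR alone for TX-only).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen for this sketch only. */
#define ADV_PAUSE	(1u << 0)
#define ADV_ASM_DIR	(1u << 1)

/*
 * Compute the advertisement word for the requested RX/TX pause
 * settings. Old pause bits are cleared before the new ones are set,
 * so a mode change never leaves stale bits behind (issue 3).
 */
uint32_t pause_advertisement(uint32_t adv, bool rx, bool tx)
{
	adv &= ~(ADV_PAUSE | ADV_ASM_DIR);

	if (rx && tx)
		adv |= ADV_PAUSE;			/* symmetric (assumed) */
	else if (rx)
		adv |= ADV_PAUSE | ADV_ASM_DIR;		/* RX only, TX disabled */
	else if (tx)
		adv |= ADV_ASM_DIR;			/* TX only (assumed) */

	return adv;
}
```

Note that the advertisement only covers what is announced to the link partner; not transmitting pause frames when TX is disabled (issue 1) is a separate MAC setting that this sketch does not model.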
Jamal Hadi Salim
b175c3a44f net: sched: flower fix typo
Fix typo in the validation rules for flower's attributes

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 05:23:02 -07:00
Govindarajulu Varadarajan
f586a33601 enic: use atomic_t instead of spin_lock in busy poll
We use a spinlock to access a single flag. We can avoid spin_locks by using
an atomic variable and atomic_cmpxchg(). Use atomic_cmpxchg() to move the flag
from idle to poll, and a simple atomic_set() to unlock (set idle from poll).

In napi poll, if gro is enabled, we call napi_gro_receive() to deliver the
packets. Before we call napi_complete(), i.e. while re-polling, if low
latency busy poll is called, we use netif_receive_skb() to deliver the packets.
At this point if there are some skb's held in GRO, busy poll could deliver the
packets out of order. So we call napi_gro_flush() to flush skbs before we
move the napi poll to idle.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 05:23:01 -07:00
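The locking scheme described in the enic change above can be sketched in userspace with C11 atomics. This is an analogue only: the kernel's atomic_cmpxchg() and atomic_set() are replaced here by atomic_compare_exchange_strong() and atomic_store(), and the state names, struct rq_poll and both function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical owner states for one receive queue. */
enum poll_state { POLL_IDLE = 0, POLL_NAPI = 1, POLL_BUSY = 2 };

struct rq_poll {
	atomic_int state;
};

/*
 * Try to move the flag from IDLE to the wanted owner. The
 * compare-and-exchange succeeds only if nobody else owns the queue,
 * so no spinlock is needed around the flag.
 */
bool poll_try_lock(struct rq_poll *rq, int want)
{
	int expected = POLL_IDLE;

	return atomic_compare_exchange_strong(&rq->state, &expected, want);
}

/* Release ownership: a plain atomic store back to IDLE is enough. */
void poll_unlock(struct rq_poll *rq)
{
	atomic_store(&rq->state, POLL_IDLE);
}
```

The GRO flush mentioned in the message is not modeled here; in this sketch it would belong immediately before poll_unlock() on the NAPI side.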
Shaohui Xie
1298267b54 net/phy: Add Vitesse 8641 phy ID
Vitesse VSC8641 is compatible with Vitesse 82xx

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:13:05 -07:00
Alison Wang
bbc65bf7e0 net/fsl: remove dependency FSL_SOC for Gianfar
CONFIG_GIANFAR does not depend on FSL_SOC; it
can be built on non-PPC platforms.

Signed-off-by: Alison Wang <alison.wang@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:13:04 -07:00
Dan Carpenter
cbdb97773e cavium/liquidio: fix some error handling in lio_set_phys_id()
There was a missing assignment so the "if (ret)" on the next line is
never true.

Fixes: f21fb3ed36 ('Add support of Cavium Liquidio ethernet adapters')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:13:03 -07:00
Dan Carpenter
5102e23791 renesas: missing unlock on error path
We need to unlock before returning here.

Fixes: a0d2f20650 ('Renesas Ethernet AVB PTP clock driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:13:02 -07:00
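The "unlock before returning" rule from the renesas fix above is commonly handled with a single exit label. The sketch below is a generic illustration, not the driver code: the toy lock, set_value() and its parameters are hypothetical.

```c
#include <errno.h>

/* Toy lock for illustration; "held" counts outstanding acquisitions. */
struct toy_lock {
	int held;
};

void toy_lock_acquire(struct toy_lock *l) { l->held++; }
void toy_lock_release(struct toy_lock *l) { l->held--; }

/*
 * Hypothetical operation that can fail while the lock is held.
 * Every exit goes through the "out" label, so the lock is dropped
 * on the error path as well as on success.
 */
int set_value(struct toy_lock *l, int *slot, int value)
{
	int ret = 0;

	toy_lock_acquire(l);
	if (value < 0) {
		ret = -EINVAL;	/* error path: do not return directly */
		goto out;
	}
	*slot = value;
out:
	toy_lock_release(l);
	return ret;
}
```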
David S. Miller
be35ffa38e Merge branch 'mlx4'
Or Gerlitz says:

====================
mlx4 driver fixes, June 24, 2015

Some fixes that we made recently, all need to go into stable.

patch #1 "net/mlx4_en: Release TX QP when destroying TX ring" and patch #3
"Fix wrong csum complete report when rxvlan offload is disabled" to >= 3.19

patch #2 "Wake TX queues only when there's enough room" addressing a bug
which is there from day one... should go to whatever kernels it's still applicable

patch #4 "mlx4: Disable HA for SRIOV PF RoCE devices" to >= 4.0

The patches are marked with net but are made against net-next,
as the net tree still doesn't contain all the net-next bits.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:06:33 -07:00
Or Gerlitz
7254acffee mlx4: Disable HA for SRIOV PF RoCE devices
When in HA mode, the driver exposes an IB (RoCE) device instance with only
one port. Under SRIOV, the existing implementation doesn't go well with
the PF RoCE driver's role of Special QPs Para-Virtualization, etc.

As such, disable HA for the mlx4 PF RoCE device in SRIOV mode.

Fixes: a575009030 ('IB/mlx4: Add port aggregation support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:06:29 -07:00
Ido Shamay
79a258526c net/mlx4_en: Fix wrong csum complete report when rxvlan offload is disabled
The check_csum() function relied on hwtstamp_rx_filter to know if rxvlan
offload is disabled. This is wrong since rxvlan offload can be switched
on/off regardless of hwtstamp_rx_filter.

Also moved check_csum to query CQE information to identify VLAN packets
and removed the check of IP packets, since it has been validated before.

Fixes: f8c6455bb0 ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:06:28 -07:00
Ido Shamay
488a9b48e3 net/mlx4_en: Wake TX queues only when there's enough room
Indication of a single completed packet, marked by txbbs_skipped
being greater than zero, is not enough in order to wake up a
stopped TX queue. The completed packet may contain a single TXBB,
while next packet to be sent (after the wake up) may have multiple
TXBBs (LSO/TSO packets for example), causing overflow in queue followed
by WQE corruption and TX queue timeout.
Instead, wake the stopped queue only when there's enough room for the
worst case (maximum sized WQE) packet that we should need to handle after
the queue is opened again.

Also created a helper routine - mlx4_en_is_tx_ring_full, which checks
if the current TX ring is full or not. It provides better code readability
and removes code duplication.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:06:27 -07:00
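The wake-up rule described in the mlx4_en change above can be sketched as follows. This is a simplified model, not the driver's implementation: struct tx_ring, its fields, tx_ring_is_full() and tx_complete() are hypothetical names, and ring space is counted in abstract units rather than real TXBBs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring bookkeeping; names are not taken from the driver. */
struct tx_ring {
	uint32_t prod;		/* units posted by the sender */
	uint32_t cons;		/* units completed by hardware */
	uint32_t size;		/* ring capacity in units */
	uint32_t max_desc;	/* worst-case units for one packet */
	bool stopped;
};

/* Full means: not enough free units left for a worst-case packet. */
bool tx_ring_is_full(const struct tx_ring *r)
{
	uint32_t used = r->prod - r->cons;	/* unsigned math handles wrap */

	return r->size - used < r->max_desc;
}

/* On completion, wake the queue only if a worst-case packet fits. */
void tx_complete(struct tx_ring *r, uint32_t units_done)
{
	r->cons += units_done;
	if (r->stopped && !tx_ring_is_full(r))
		r->stopped = false;
}
```

With a 64-unit ring that is completely used and a worst case of 8 units, completing a single unit leaves the queue stopped; it is woken only once 8 units are free.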
Eran Ben Elisha
0eb08514fd net/mlx4_en: Release TX QP when destroying TX ring
TX ring QP wasn't released at mlx4_en_destroy_tx_ring. Instead, the code
used the deprecated base_tx_qpn field. Move TX QP release to
mlx4_en_destroy_tx_ring and remove the base_tx_qpn field.

Fixes: ddae0349fd ('net/mlx4: Change QP allocation scheme')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:06:26 -07:00
Linus Torvalds
aefbef10e3 Merge branch 'akpm' (patches from Andrew)
Merge first patchbomb from Andrew Morton:

 - a few misc things

 - ocfs2 updates

 - kernel/watchdog.c feature work (took ages to get right)

 - most of MM.  A few tricky bits are held up and probably won't make 4.2.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (91 commits)
  mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc()
  mm, thp: respect MPOL_PREFERRED policy with non-local node
  tmpfs: truncate prealloc blocks past i_size
  mm/memory hotplug: print the last vmemmap region at the end of hot add memory
  mm/mmap.c: optimization of do_mmap_pgoff function
  mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan
  mm: kmemleak: avoid deadlock on the kmemleak object insertion error path
  mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup()
  mm: kmemleak: fix delete_object_*() race when called on the same memory block
  mm: kmemleak: allow safe memory scanning during kmemleak disabling
  memcg: convert mem_cgroup->under_oom from atomic_t to int
  memcg: remove unused mem_cgroup->oom_wakeups
  frontswap: allow multiple backends
  x86, mirror: x86 enabling - find mirrored memory ranges
  mm/memblock: allocate boot time data structures from mirrored memory
  mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute
  mm: do not ignore mapping_gfp_mask in page cache allocation paths
  mm/cma.c: fix typos in comments
  mm/oom_kill.c: print points as unsigned int
  mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages
  ...
2015-06-24 20:47:21 -07:00
Linus Torvalds
266da6f142 Miscellaneous pstore improvements

Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull pstore updates from Tony Luck:
 "Miscellaneous pstore improvements"

* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  ramoops: make it possible to change mem_type param.
  pstore/ram: verify ramoops header before saving record
  fs/pstore: Optimization function ramoops_init_przs
  fs/pstore: update the backend parameter in pstore module
  pstore: do not use message compression without lock
2015-06-24 20:42:21 -07:00
Linus Torvalds
cfcc0ad47f Merge tag 'for-f2fs-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "New features:
   - per-file encryption (e.g., ext4)
   - FALLOC_FL_ZERO_RANGE
   - FALLOC_FL_COLLAPSE_RANGE
   - RENAME_WHITEOUT

  Major enhancement/fixes:
   - recover broken superblocks
   - enhance f2fs_trim_fs with a discard_map
   - fix a race condition on dentry block allocation
   - fix a deadlock during summary operation
   - fix a missing fiemap result

  .. and many minor bug fixes and clean-ups were done"

* tag 'for-f2fs-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (83 commits)
  f2fs: do not trim preallocated blocks when truncating after i_size
  f2fs crypto: add alloc_bounce_page
  f2fs crypto: fix to handle errors likewise ext4
  f2fs: drop the volatile_write flag only
  f2fs: skip committing valid superblock
  f2fs: setting discard option in parse_options()
  f2fs: fix to return exact trimmed size
  f2fs: support FALLOC_FL_INSERT_RANGE
  f2fs: hide common code in f2fs_replace_block
  f2fs: disable the discard option when device doesn't support
  f2fs crypto: remove alloc_page for bounce_page
  f2fs: fix a deadlock for summary page lock vs. sentry_lock
  f2fs crypto: clean up error handling in f2fs_fname_setup_filename
  f2fs crypto: avoid f2fs_inherit_context for symlink
  f2fs crypto: do not set encryption policy for non-directory by ioctl
  f2fs crypto: allow setting encryption policy once
  f2fs crypto: check context consistent for rename2
  f2fs: avoid duplicated code by reusing f2fs_read_end_io
  f2fs crypto: use per-inode tfm structure
  f2fs: recovering broken superblock during mount
  ...
2015-06-24 20:38:29 -07:00
Linus Torvalds
a7296b49fb Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes and cleanups from Jan Kara:
 "This contains some small fixes and improvements in error handling for
  UDF.

  Bundled is also one ext3 coding style fix and a fix in quota
  documentation"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: fix udf_load_pvoldesc()
  udf: remove double err declaration in udf_file_write_iter()
  UDF: support NFSv2 export
  fs: ext3: super: fixed a space coding style issue
  quota: Update documentation
  udf: Return error from udf_find_entry()
  udf: Make udf_get_filename() return error instead of 0 length file name
  udf: bug on exotic flag in udf_get_filename()
  udf: improve error management in udf_CS0toNLS()
  udf: improve error management in udf_CS0toUTF8()
  udf: unicode: update function name in comments
  udf: remove unnecessary test in udf_build_ustr_exact()
  udf: Return -ENOMEM when allocation fails in udf_get_filename()
2015-06-24 20:07:10 -07:00
Linus Torvalds
1e467e68e5 Documentation updates for 4.2
The main thing here is Ingo's big subdirectory documenting feature support
 for each architecture.  Beyond that, it's the usual pile of fixes, tweaks,
 and small additions.

Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6

Pull documentation updates from Jonathan Corbet:
 "The main thing here is Ingo's big subdirectory documenting feature
  support for each architecture.  Beyond that, it's the usual pile of
  fixes, tweaks, and small additions"

* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (79 commits)
  doc:md: fix typo in md.txt.
  Documentation/mic/mpssd: don't build x86 userspace when cross compiling
  Documentation/prctl: don't build tsc tests when cross compiling
  Documentation/vDSO: don't build tests when cross compiling
  Doc:ABI/testing: Fix typo in sysfs-bus-fcoe
  Doc: Docbook: Change wikipedia's URL from http to https in scsi.tmpl
  Doc: Change wikipedia's URL from http to https
  Documentation/kernel-parameters: add missing pciserial to the earlyprintk
  Doc:pps: Fix typo in pps.txt
  kbuild : Fix documentation of INSTALL_HDR_PATH
  Documentation: filesystems: updated struct file_operations documentation in vfs.txt
  kbuild: edit explanation of clean-files variable
  Doc: ja_JP: Fix typo in HOWTO
  Move freefall program from Documentation/ to tools/
  Documentation: ARM: EXYNOS: Describe boot loaders interface
  Doc:nfc: Fix typo in nfc-hci.txt
  vfs: Minor documentation fix
  Doc: networking: txtimestamp: fix printf format warning
  Documentation, intel_pstate: Improve legacy mode internal governors description
  Documentation: extend use case for EXPORT_SYMBOL_GPL()
  ...
2015-06-24 20:01:36 -07:00
Linus Torvalds
14738e0331 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem updates from Dmitry Torokhov:
 "Thanks to Samuel Thibault input device (keyboard) LEDs are no longer
  hardwired within the input core but use LED subsystem and so allow use
  of different triggers; Hans de Goede did a large update for the ALPS
  touchpad driver; we have new TI drv2665 haptics driver and DA9063
  OnKey driver, and host of other drivers got various fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (55 commits)
  Input: pixcir_i2c_ts - fix receive error
  MAINTAINERS: remove non existent input mt git tree
  Input: improve usage of gpiod API
  tty/vt/keyboard: define LED triggers for VT keyboard lock states
  tty/vt/keyboard: define LED triggers for VT LED states
  Input: export LEDs as class devices in sysfs
  Input: cyttsp4 - use swap() in cyttsp4_get_touch()
  Input: goodix - do not explicitly set evbits in input device
  Input: goodix - export id and version read from device
  Input: goodix - fix variable length array warning
  Input: goodix - fix alignment issues
  Input: add OnKey driver for DA9063 MFD part
  Input: elan_i2c - add product IDs FW names
  Input: elan_i2c - add support for multi IC type and iap format
  Input: focaltech - report finger width to userspace
  tty: remove platform_sysrq_reset_seq
  Input: synaptics_i2c - use proper boolean values
  Input: psmouse - use true instead of 1 for boolean values
  Input: cyapa - fix a few typos in comments
  Input: stmpe-ts - enforce device tree only mode
  ...
2015-06-24 19:56:58 -07:00
Linus Torvalds
45471cd98d EDAC changes, v2:
* New APM X-Gene SoC EDAC driver (Loc Ho)
 
 * AMD error injection module improvements (Aravind Gopalakrishnan)
 
 * Altera Arria 10 support (Thor Thayer)
 
 * misc fixes and cleanups all over the place

Merge tag 'edac_for_4.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:

 - New APM X-Gene SoC EDAC driver (Loc Ho)

 - AMD error injection module improvements (Aravind Gopalakrishnan)

 - Altera Arria 10 support (Thor Thayer)

 - misc fixes and cleanups all over the place

* tag 'edac_for_4.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (28 commits)
  EDAC: Update Documentation/edac.txt
  EDAC: Fix typos in Documentation/edac.txt
  EDAC, mce_amd_inj: Set MISCV on injection
  EDAC, mce_amd_inj: Move bit preparations before the injection
  EDAC, mce_amd_inj: Cleanup and simplify README
  EDAC, altera: Do not allow suspend when EDAC is enabled
  EDAC, mce_amd_inj: Make inj_type static
  arm: socfpga: dts: Add Arria10 SDRAM EDAC DTS support
  EDAC, altera: Add Arria10 EDAC support
  EDAC, altera: Refactor for Altera CycloneV SoC
  EDAC, altera: Generalize driver to use DT Memory size
  EDAC, mce_amd_inj: Add README file
  EDAC, mce_amd_inj: Add individual permissions field to dfs_node
  EDAC, mce_amd_inj: Modify flags attribute to use string arguments
  EDAC, mce_amd_inj: Read out number of MCE banks from the hardware
  EDAC, mce_amd_inj: Use MCE_INJECT_GET macro for bank node too
  EDAC, xgene: Fix cpuid abuse
  EDAC, mpc85xx: Extend error address to 64 bit
  EDAC, mpc8xxx: Adapt for FSL SoC
  EDAC, edac_stub: Drop arch-specific include
  ...
2015-06-24 19:52:06 -07:00
Linus Torvalds
93a4b1b946 Here is the bulk of pin control changes for the v4.2 series:
- Core functionality:
   - Enable exclusive pin ownership: it is possible to flag a pin
     controller so that GPIO and other functions cannot use a single
     pin simultaneously.
 
 - New drivers:
   - NXP LPC18xx System Control Unit pin controller
   - Imagination Pistachio SoC pin controller
 
 - New subdrivers:
   - Freescale i.MX7d SoC
   - Intel Sunrisepoint-H PCH
   - Renesas PFC R8A7793
   - Renesas PFC R8A7794
   - Mediatek MT6397, MT8127
   - SiRF Atlas 7
   - Allwinner A33
   - Qualcomm MSM8660
   - Marvell Armada 395
   - Rockchip RK3368
 
 - Cleanups:
   - A big cleanup of the Marvell MVEBU driver rectifying it to
     correspond to reality
   - Drop platform device probing from the SH PFC driver, we are now a
     DT only shop for SuperH
   - Drop obsolete multi-platform check for SH PFC
   - Various janitorial: constification, grammar etc
 
 - Improvements:
   - The AT91 GPIO portions now supports the set_multiple() feature
   - Split out SPI pins on the Xilinx Zynq
   - Support DTs without specific function nodes in the i.MX driver

Merge tag 'pinctrl-v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control updates from Linus Walleij:
 "Here is the bulk of pin control changes for the v4.2 series: Quite a
  lot of new SoC subdrivers and two new main drivers this time, apart
  from that business as usual.

  Details:

  Core functionality:
   - Enable exclusive pin ownership: it is possible to flag a pin
     controller so that GPIO and other functions cannot use a single pin
     simultaneously.

  New drivers:
   - NXP LPC18xx System Control Unit pin controller
   - Imagination Pistachio SoC pin controller

  New subdrivers:
   - Freescale i.MX7d SoC
   - Intel Sunrisepoint-H PCH
   - Renesas PFC R8A7793
   - Renesas PFC R8A7794
   - Mediatek MT6397, MT8127
   - SiRF Atlas 7
   - Allwinner A33
   - Qualcomm MSM8660
   - Marvell Armada 395
   - Rockchip RK3368

  Cleanups:
   - A big cleanup of the Marvell MVEBU driver rectifying it to
     correspond to reality
   - Drop platform device probing from the SH PFC driver, we are now a
     DT only shop for SuperH
   - Drop obsolete multi-platform check for SH PFC
   - Various janitorial: constification, grammar etc

  Improvements:
   - The AT91 GPIO portions now supports the set_multiple() feature
   - Split out SPI pins on the Xilinx Zynq
   - Support DTs without specific function nodes in the i.MX driver"

* tag 'pinctrl-v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (99 commits)
  pinctrl: rockchip: add support for the rk3368
  pinctrl: rockchip: generalize perpin driver-strength setting
  pinctrl: sh-pfc: r8a7794: add SDHI pin groups
  pinctrl: sh-pfc: r8a7794: add MMCIF pin groups
  pinctrl: sh-pfc: add R8A7794 PFC support
  pinctrl: make pinctrl_register() return proper error code
  pinctrl: mvebu: armada-39x: add support for Armada 395 variant
  pinctrl: mvebu: armada-39x: add missing SATA functions
  pinctrl: mvebu: armada-39x: add missing PCIe functions
  pinctrl: mvebu: armada-38x: add ptp functions
  pinctrl: mvebu: armada-38x: add ua1 functions
  pinctrl: mvebu: armada-38x: add nand functions
  pinctrl: mvebu: armada-38x: add sata functions
  pinctrl: mvebu: armada-xp: add dram functions
  pinctrl: mvebu: armada-xp: add nand rb function
  pinctrl: mvebu: armada-xp: add spi1 function
  pinctrl: mvebu: armada-39x: normalize ref clock naming
  pinctrl: mvebu: armada-xp: rename spi to spi0
  pinctrl: mvebu: armada-370: align spi1 clock pin naming
  pinctrl: mvebu: armada-370: align VDD cpu-pd pin naming with datasheet
  ...
2015-06-24 19:21:02 -07:00
Linus Torvalds
d59b92f93d == Changes to existing drivers ==
- Supply MODULE_DEVICE_TABLE() to ensure probing
  - Constify struct; da9052_bl
  - Enable compile test; lcd_l4f00242t03, lcd_lms283fg05, backlight_gpio
  - Suspend/resume bugfix; lp855x_bl
  - devm_gpiod_get_optional() API fixup; pwm_bl
  - Error handling fixup; backlight

Merge tag 'backlight-for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight

Pull backlight updates from Lee Jones:
 "Changes to existing drivers:

   - supply MODULE_DEVICE_TABLE() to ensure probing
   - constify struct; da9052_bl
   - enable compile test; lcd_l4f00242t03, lcd_lms283fg05, backlight_gpio
   - suspend/resume bugfix; lp855x_bl
   - devm_gpiod_get_optional() API fixup; pwm_bl
   - error handling fixup; backlight"

* tag 'backlight-for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
  backlight: Change the return type of backlight_update_status() to int
  backlight: pwm_bl: Simplify usage of devm_gpiod_get_optional
  backlight: lp855x: Don't clear level on suspend/blank
  backlight: Allow compile test of GPIO consumers if !GPIOLIB
  video: backlight: da9052: Constify platform_device_id
  gpio-backlight: Discover driver during boot time
2015-06-24 18:57:00 -07:00
Larry Finger
8a8c35fadf mm: kmemleak_alloc_percpu() should follow the gfp from pcpu_alloc()
Beginning at commit d52d3997f8 ("ipv6: Create percpu rt6_info"), the
following INFO splat is logged:

  ===============================
  [ INFO: suspicious RCU usage. ]
  4.1.0-rc7-next-20150612 #1 Not tainted
  -------------------------------
  kernel/sched/core.c:7318 Illegal context switch in RCU-bh read-side critical section!
  other info that might help us debug this:
  rcu_scheduler_active = 1, debug_locks = 0
   3 locks held by systemd/1:
   #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815f0c8f>] rtnetlink_rcv+0x1f/0x40
   #1:  (rcu_read_lock_bh){......}, at: [<ffffffff816a34e2>] ipv6_add_addr+0x62/0x540
   #2:  (addrconf_hash_lock){+...+.}, at: [<ffffffff816a3604>] ipv6_add_addr+0x184/0x540
  stack backtrace:
  CPU: 0 PID: 1 Comm: systemd Not tainted 4.1.0-rc7-next-20150612 #1
  Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20   04/17/2014
  Call Trace:
    dump_stack+0x4c/0x6e
    lockdep_rcu_suspicious+0xe7/0x120
    ___might_sleep+0x1d5/0x1f0
    __might_sleep+0x4d/0x90
    kmem_cache_alloc+0x47/0x250
    create_object+0x39/0x2e0
    kmemleak_alloc_percpu+0x61/0xe0
    pcpu_alloc+0x370/0x630

Additional backtrace lines are truncated.  In addition, the above splat
is followed by several "BUG: sleeping function called from invalid
context at mm/slub.c:1268" outputs.  As suggested by Martin KaFai Lau,
these are the clue to the fix.  Routine kmemleak_alloc_percpu() always
uses GFP_KERNEL for its allocations, whereas it should follow the gfp
from its callers.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@vger.kernel.org>	[3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:46 -07:00
Vlastimil Babka
0867a57c4f mm, thp: respect MPOL_PREFERRED policy with non-local node
Since commit 077fcf116c ("mm/thp: allocate transparent hugepages on
local node"), we handle THP allocations on page fault in a special way -
for non-interleave memory policies, the allocation is only attempted on
the node local to the current CPU, if the policy's nodemask allows the
node.

This is motivated by the assumption that THP benefits cannot offset the
cost of remote accesses, so it's better to fallback to base pages on the
local node (which might still be available, while huge pages are not due
to fragmentation) than to allocate huge pages on a remote node.

The nodemask check prevents us from violating e.g.  MPOL_BIND policies
where the local node is not among the allowed nodes.  However, the
current implementation can still give surprising results for the
MPOL_PREFERRED policy when the preferred node is different than the
current CPU's local node.

In such case we should honor the preferred node and not use the local
node, which is what this patch does.  If hugepage allocation on the
preferred node fails, we fall back to base pages and don't try other
nodes, with the same motivation as is done for the local node hugepage
allocations.  The patch also moves the MPOL_INTERLEAVE check around to
simplify the hugepage specific test.
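
The node selection described above can be sketched roughly as follows. This is a simplified user-space illustration, not the kernel code: the enum, the SK_NO_THP_NODE value and the function sketch_thp_node() are hypothetical names, and the nodemask is reduced to a plain bitmask.

```c
#include <assert.h>

/* Hypothetical policy kinds, standing in for the kernel's MPOL_* modes. */
enum sketch_policy { SK_DEFAULT, SK_PREFERRED, SK_BIND, SK_INTERLEAVE };

#define SK_NO_THP_NODE (-1)  /* no single THP node; use the regular policy path */

/*
 * Pick the single node on which a THP page-fault allocation is attempted.
 * allowed_mask has bit n set when node n is permitted by the policy.
 */
static int sketch_thp_node(enum sketch_policy pol, int preferred_node,
                           unsigned long allowed_mask, int local_node)
{
    assert(local_node >= 0 && local_node < (int)(8 * sizeof(allowed_mask)));

    if (pol == SK_INTERLEAVE)
        return SK_NO_THP_NODE;          /* interleave is handled separately */
    if (pol == SK_PREFERRED && preferred_node >= 0)
        return preferred_node;          /* honor the preferred node */
    if (allowed_mask & (1UL << local_node))
        return local_node;              /* otherwise stay CPU-local if allowed */
    return SK_NO_THP_NODE;
}
```

In this sketch, a preferred node of 0 wins even when the task runs on node 1, which corresponds to the "-p0 -C12" case in the measurements that follow.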

The difference can be demonstrated using in-tree transhuge-stress test
on the following 2-node machine where half memory on one node was
occupied to show the difference.

> numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 24 25 26 27 28 29 30 31 32 33 34 35
node 0 size: 7878 MB
node 0 free: 3623 MB
node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23 36 37 38 39 40 41 42 43 44 45 46 47
node 1 size: 8045 MB
node 1 free: 7818 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

Before the patch:
> numactl -p0 -C0 ./transhuge-stress
transhuge-stress: 2.197 s/loop, 0.276 ms/page,   7249.168 MiB/s 7962 succeed,    0 failed, 1786 different pages

> numactl -p0 -C12 ./transhuge-stress
transhuge-stress: 2.962 s/loop, 0.372 ms/page,   5376.172 MiB/s 7962 succeed,    0 failed, 3873 different pages

Number of successful THP allocations corresponds to free memory on node 0 in
the first case and node 1 in the second case, i.e. -p parameter is ignored and
cpu binding "wins".

After the patch:
> numactl -p0 -C0 ./transhuge-stress
transhuge-stress: 2.183 s/loop, 0.274 ms/page,   7295.516 MiB/s 7962 succeed,    0 failed, 1760 different pages

> numactl -p0 -C12 ./transhuge-stress
transhuge-stress: 2.878 s/loop, 0.361 ms/page,   5533.638 MiB/s 7962 succeed,    0 failed, 1750 different pages

> numactl -p1 -C0 ./transhuge-stress
transhuge-stress: 4.628 s/loop, 0.581 ms/page,   3440.893 MiB/s 7962 succeed,    0 failed, 3918 different pages

The -p parameter is respected regardless of cpu binding.

> numactl -C0 ./transhuge-stress
transhuge-stress: 2.202 s/loop, 0.277 ms/page,   7230.003 MiB/s 7962 succeed,    0 failed, 1750 different pages

> numactl -C12 ./transhuge-stress
transhuge-stress: 3.020 s/loop, 0.379 ms/page,   5273.324 MiB/s 7962 succeed,    0 failed, 3916 different pages

Without -p parameter, hugepage restriction to CPU-local node works as before.

Fixes: 077fcf116c ("mm/thp: allocate transparent hugepages on local node")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org>	[4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:46 -07:00
Josef Bacik
afa2db2fb6 tmpfs: truncate prealloc blocks past i_size
One of the rocksdb people noticed that when you do something like this

    fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10M)
    pwrite(fd, buf, 5M, 0)
    ftruncate(5M)

on tmpfs, the file would still take up 10M: which led to super fun
issues because we were getting ENOSPC before we thought we should be
getting ENOSPC.  This patch fixes the problem, and mirrors what all the
other fs'es do (and was agreed to be the correct behaviour at LSF).

I tested it locally to make sure it worked properly with the following

    xfs_io -f -c "falloc -k 0 10M" -c "pwrite 0 5M" -c "truncate 5M" file

Without the patch we have "Blocks: 20480", with the patch we have the
correct value of "Blocks: 10240".
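
For readers who prefer a C reproducer over xfs_io, the same sequence might look like the sketch below. The helper names blocks_to_bytes() and blocks_after_truncate() are made up for illustration, and the conversion assumes the "Blocks:" figures are in 512-byte units, as st_blocks is on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MiB (1024L * 1024L)

/* Convert an st_blocks count (assumed 512-byte units) to bytes. */
static long long blocks_to_bytes(long long st_blocks)
{
    assert(st_blocks >= 0);
    return st_blocks * 512LL;
}

/*
 * Preallocate 10 MiB without changing i_size, write 5 MiB, truncate to
 * 5 MiB, and return the resulting st_blocks, or -1 on any error.
 */
static long long blocks_after_truncate(const char *path)
{
    static char buf[1 << 16];
    struct stat st;
    long long result = -1;
    off_t off;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

    if (fd < 0)
        return -1;
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10 * MiB) != 0)
        goto out;
    for (off = 0; off < 5 * MiB; off += sizeof(buf))
        if (pwrite(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf))
            goto out;
    if (ftruncate(fd, 5 * MiB) != 0)
        goto out;
    if (fstat(fd, &st) == 0)
        result = (long long)st.st_blocks;
out:
    close(fd);
    return result;
}
```

Calling blocks_after_truncate() on a file that lives on a tmpfs mount and passing the result to blocks_to_bytes() would show whether the preallocated tail was released: 20480 blocks correspond to 10 MiB and 10240 blocks to 5 MiB.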

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Zhu Guihua
c435a39057 mm/memory hotplug: print the last vmemmap region at the end of hot add memory
When hot adding two nodes in succession, we found that the vmemmap region
info is a bit messed up.  The last region of node 2 is printed when node 3
is hot added, as in the following:

  Initmem setup node 2 [mem 0x0000000000000000-0xffffffffffffffff]
   On node 2 totalpages: 0
   Built 2 zonelists in Node order, mobility grouping on.  Total pages: 16090539
   Policy zone: Normal
   init_memory_mapping: [mem 0x40000000000-0x407ffffffff]
    [mem 0x40000000000-0x407ffffffff] page 1G
    [ffffea1000000000-ffffea10001fffff] PMD -> [ffff8a077d800000-ffff8a077d9fffff] on node 2
    [ffffea1000200000-ffffea10003fffff] PMD -> [ffff8a077de00000-ffff8a077dffffff] on node 2
  ...
    [ffffea101f600000-ffffea101f9fffff] PMD -> [ffff8a074ac00000-ffff8a074affffff] on node 2
    [ffffea101fa00000-ffffea101fdfffff] PMD -> [ffff8a074a800000-ffff8a074abfffff] on node 2
  Initmem setup node 3 [mem 0x0000000000000000-0xffffffffffffffff]
   On node 3 totalpages: 0
   Built 3 zonelists in Node order, mobility grouping on.  Total pages: 16090539
   Policy zone: Normal
   init_memory_mapping: [mem 0x60000000000-0x607ffffffff]
    [mem 0x60000000000-0x607ffffffff] page 1G
    [ffffea101fe00000-ffffea101fffffff] PMD -> [ffff8a074a400000-ffff8a074a5fffff] on node 2 <=== node 2 ???
    [ffffea1800000000-ffffea18001fffff] PMD -> [ffff8a074a600000-ffff8a074a7fffff] on node 3
    [ffffea1800200000-ffffea18005fffff] PMD -> [ffff8a074a000000-ffff8a074a3fffff] on node 3
    [ffffea1800600000-ffffea18009fffff] PMD -> [ffff8a0749c00000-ffff8a0749ffffff] on node 3
  ...

The cause is that the last region was missed at the end of hot add memory,
and p_start, p_end, node_start were not reset, so when hot adding memory to
a new node, the code considers the blocks not contiguous and prints
the previous one.  So we print the last vmemmap region at the end of hot
add memory to avoid the confusion.
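
The deferred printing can be modelled with a small tracker, shown here as a user-space sketch. The struct and function names are hypothetical; only p_start, p_end and node_start come from the description, and address 0 is assumed to mean "no pending region".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tracker mirroring the p_start, p_end, node_start state. */
struct region_tracker {
    unsigned long p_start, p_end;
    int node_start;
    int printed;    /* number of regions printed so far */
};

static void print_region(struct region_tracker *t)
{
    printf(" [%lx-%lx] on node %d\n", t->p_start, t->p_end - 1, t->node_start);
    t->printed++;
}

/* Record one mapped block; print the pending region if it is not contiguous. */
static void track_block(struct region_tracker *t, unsigned long start,
                        unsigned long size, int node)
{
    assert(size > 0);
    if (t->p_end != start || t->node_start != node) {
        if (t->p_start)
            print_region(t);
        t->p_start = start;
        t->node_start = node;
    }
    t->p_end = start + size;
}

/* Called at the end of a hot add: flush the pending region and reset. */
static void print_last(struct region_tracker *t)
{
    if (t->p_start) {
        print_region(t);
        t->p_start = 0;
        t->p_end = 0;
        t->node_start = 0;
    }
}
```

Without the print_last() call at the end of the first hot add, the pending node 2 region would only be printed by the first track_block() call made for node 3, which is the confusing output shown in the log.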

Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Piotr Kwapulinski
e37609bb36 mm/mmap.c: optimization of do_mmap_pgoff function
The simple check for zero length memory mapping may be performed
earlier, so that in the case of a zero length memory mapping some
unnecessary code is not executed at all.  It does not make the code less
readable and saves some CPU cycles.
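
The idea is the usual early-exit pattern. The sketch below is not do_mmap_pgoff() itself: sketch_do_mmap() and the counter are hypothetical, and the -EINVAL return for a zero length is taken from the documented behaviour of mmap(2) rather than from this commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

static int sketch_expensive_steps;  /* counts how often the later work runs */

/* Hypothetical stand-in for the work a mapping request goes through. */
static long sketch_do_mmap(unsigned long addr, size_t len)
{
    if (len == 0)
        return -EINVAL;         /* reject first, before any other work */

    sketch_expensive_steps++;   /* placeholder for the remaining setup */
    return (long)addr;
}
```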

Signed-off-by: Piotr Kwapulinski <kwapulinski.piotr@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Catalin Marinas
93ada579b0 mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan
The kmemleak memory scanning uses finer grained object->lock spinlocks
primarily to avoid races with the memory block freeing.  However, the
pointer lookup in the rb tree requires the kmemleak_lock to be held.
This is currently done in the find_and_get_object() function for each
pointer-like location read during scanning.  While this allows a low
latency on kmemleak_*() callbacks on other CPUs, the memory scanning is
slower.

This patch moves the kmemleak_lock outside the scan_block() loop,
acquiring/releasing it only once per scanned memory block.  The
allow_resched logic is moved outside scan_block() and a new
scan_large_block() function is implemented which splits large blocks in
MAX_SCAN_SIZE chunks with cond_resched() calls in-between.  A redundant
(object->flags & OBJECT_NO_SCAN) check is also removed from
scan_object().
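
The chunking can be pictured with the user-space sketch below. It is only an illustration of the structure described above: SKETCH_MAX_SCAN_SIZE is an assumed value, the callback type is hypothetical, and sched_yield() stands in for cond_resched().

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

#define SKETCH_MAX_SCAN_SIZE 4096   /* assumed chunk size, illustration only */

typedef void (*scan_fn)(const char *start, const char *end, void *ctx);

/* Scan [start, end) in chunks, yielding between chunks; returns chunk count. */
static size_t sketch_scan_large_block(const char *start, const char *end,
                                      scan_fn scan, void *ctx)
{
    size_t chunks = 0;

    assert(start <= end && scan != NULL);
    while (start < end) {
        size_t left = (size_t)(end - start);
        const char *next = start +
            (left < SKETCH_MAX_SCAN_SIZE ? left : SKETCH_MAX_SCAN_SIZE);

        scan(start, next, ctx);  /* a lock would be taken once per call here */
        start = next;
        chunks++;
        sched_yield();           /* user-space stand-in for cond_resched() */
    }
    return chunks;
}

/* Example callback: adds up how many bytes it was asked to scan. */
static void sketch_count_bytes(const char *start, const char *end, void *ctx)
{
    *(size_t *)ctx += (size_t)(end - start);
}
```

With a 10000-byte buffer and the assumed 4096-byte chunk size, the callback runs three times, so any lock taken inside it is acquired three times instead of once per pointer-sized word.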

With this patch, the kmemleak scanning performance is significantly
improved: at least 50% with lock debugging disabled and over an order of
magnitude with lock proving enabled (on an arm64 system).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Catalin Marinas
9d5a4c730d mm: kmemleak: avoid deadlock on the kmemleak object insertion error path
While very unlikely (usually kmemleak or sl*b bug), the create_object()
function in mm/kmemleak.c may fail to insert a newly allocated object into
the rb tree.  When this happens, kmemleak disables itself and prints
additional information about the object already found in the rb tree.
Such printing is done with the parent->lock acquired, however the
kmemleak_lock is already held.  This is a potential race with the scanning
thread which acquires object->lock and kmemleak_lock in a

This patch removes the locking around the 'parent' object information
printing.  Such object cannot be freed or removed from object_tree_root
and object_list since kmemleak_lock is already held.  There is a very
small risk that some of the object data is being modified on another CPU
but the only downside is inconsistent information printing.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Catalin Marinas
5f369f374b mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup()
The kmemleak_do_cleanup() work thread already waits for the kmemleak_scan
thread to finish via kthread_stop().  Waiting in kthread_stop() while
scan_mutex is held may lead to deadlock if kmemleak_scan_thread() also
waits to acquire scan_mutex.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Catalin Marinas
e781a9ab48 mm: kmemleak: fix delete_object_*() race when called on the same memory block
Calling delete_object_*() on the same pointer is not a standard use case
(unless there is a bug in the code calling kmemleak_free()).  However,
during kmemleak disabling (error or user triggered via /sys), there is a
potential race between kmemleak_free() calls on a CPU and
__kmemleak_do_cleanup() on a different CPU.

The current delete_object_*() implementation first performs a look-up
holding kmemleak_lock, increments the object->use_count and then
re-acquires kmemleak_lock to remove the object from object_tree_root and
object_list.

This patch simplifies the delete_object_*() mechanism to both look up
and remove an object from the object_tree_root and object_list
atomically (guarded by kmemleak_lock).  This allows safe concurrent
calls to delete_object_*() on the same pointer without additional
locking for synchronising the kmemleak_free_enabled flag.
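
A minimal sketch of "look up and remove under one lock" is given below. It uses a plain linked list and a C11 atomic_flag spinlock as stand-ins for the kernel's rb tree, object_list and kmemleak_lock; all names prefixed with sketch_ are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_object {
    unsigned long ptr;
    struct sketch_object *next;
};

static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;  /* stand-in for kmemleak_lock */
static struct sketch_object *sketch_list;           /* stand-in for the object list */

static void sketch_lock_acquire(void)
{
    while (atomic_flag_test_and_set(&sketch_lock))
        ;   /* spin until the flag is released */
}

static void sketch_lock_release(void)
{
    atomic_flag_clear(&sketch_lock);
}

static void sketch_insert(struct sketch_object *obj)
{
    assert(obj != NULL);
    sketch_lock_acquire();
    obj->next = sketch_list;
    sketch_list = obj;
    sketch_lock_release();
}

/* Look up and unlink in one critical section; NULL if ptr is not tracked. */
static struct sketch_object *sketch_find_and_remove(unsigned long ptr)
{
    struct sketch_object **link, *found = NULL;

    sketch_lock_acquire();
    for (link = &sketch_list; *link; link = &(*link)->next) {
        if ((*link)->ptr == ptr) {
            found = *link;
            *link = found->next;
            found->next = NULL;
            break;
        }
    }
    sketch_lock_release();
    return found;
}
```

Because the lookup and the unlink share one critical section, a second caller racing on the same pointer simply gets NULL back and has nothing to free.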

A side effect is a slight improvement in the delete_object_*() performance
by avoiding acquiring kmemleak_lock twice and incrementing/decrementing
object->use_count.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Catalin Marinas
c5f3b1a51a mm: kmemleak: allow safe memory scanning during kmemleak disabling
The kmemleak scanning thread can run for minutes.  Callbacks like
kmemleak_free() are allowed during this time, the race being taken care
of by the object->lock spinlock.  Such lock also prevents a memory block
from being freed or unmapped while it is being scanned by blocking the
kmemleak_free() -> ...  -> __delete_object() function until the lock is
released in scan_object().

When a kmemleak error occurs (e.g.  it fails to allocate its metadata),
kmemleak_enabled is set and __delete_object() is no longer called on
freed objects.  If kmemleak_scan is running at the same time,
kmemleak_free() no longer waits for the object scanning to complete,
allowing the corresponding memory block to be freed or unmapped (in the
case of vfree()).  This leads to kmemleak_scan potentially triggering a
page fault.

This patch separates the kmemleak_free() enabling/disabling from the
overall kmemleak_enabled knob so that we can defer the disabling of the
object freeing tracking until the scanning thread has completed.  The
kmemleak_free_part() is deliberately ignored by this patch since this is
only called during boot before the scanning thread started.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
Tested-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Tejun Heo
c2b42d3cad memcg: convert mem_cgroup->under_oom from atomic_t to int
memcg->under_oom tracks whether the memcg is under OOM conditions and is
an atomic_t counter managed with mem_cgroup_[un]mark_under_oom().  While
atomic_t appears to be simple synchronization-wise, when used as a
synchronization construct like here, it's trickier and more error-prone
due to weak memory ordering rules, especially around atomic_read(), and
false sense of security.

For example, both non-trivial read sites of memcg->under_oom are a bit
problematic although not being actually broken.

* mem_cgroup_oom_register_event()

  It isn't explicit what guarantees the memory ordering between event
  addition and memcg->under_oom check.  This isn't broken only because
  memcg_oom_lock is used for both event list and memcg->oom_lock.

* memcg_oom_recover()

  The lockless test doesn't have any explanation why this would be
  safe.

mem_cgroup_[un]mark_under_oom() are very cold paths and there's no point
in avoiding locking memcg_oom_lock there.  This patch converts
memcg->under_oom from atomic_t to int, puts their modifications under
memcg_oom_lock and documents why the lockless test in
memcg_oom_recover() is safe.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Tejun Heo
f4b90b70b7 memcg: remove unused mem_cgroup->oom_wakeups
Since commit 4942642080 ("mm: memcg: handle non-error OOM situations
more gracefully"), nobody uses mem_cgroup->oom_wakeups.  Remove it.

While at it, also fold memcg_wakeup_oom() into memcg_oom_recover() which
is its only user.  This cleanup was suggested by Michal.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Dan Streetman
d1dc6f1bcf frontswap: allow multiple backends
Change frontswap single pointer to a singly linked list of frontswap
implementations.  Update Xen tmem implementation as register no longer
returns anything.

Frontswap only keeps track of a single implementation; any
implementation that registers second (or later) will replace the
previously registered implementation, and gets a pointer to the previous
implementation that the new implementation is expected to pass all
frontswap functions to if it can't handle the function itself.  However
that method doesn't really make much sense, as passing that work on to
every implementation adds unnecessary work to implementations; instead,
frontswap should simply keep a list of all registered implementations
and try each implementation for any function.  Most importantly, neither
of the two currently existing frontswap implementations in the kernel
actually do anything with any previous frontswap implementation that
they replace when registering.

This allows frontswap to successfully manage multiple implementations by
keeping a list of them all.
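
The list-based dispatch might be sketched as follows. This is a reduced, hypothetical interface with a single store operation; the struct layout, the function names and the choice to insert new backends at the head of the list are assumptions, not the frontswap API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced backend interface: store returns 0 on success. */
struct sketch_backend {
    int (*store)(unsigned type, unsigned long offset);
    struct sketch_backend *next;
};

static struct sketch_backend *sketch_backends;  /* singly linked list head */

/* Registration returns nothing; the backend is just added to the list. */
static void sketch_register(struct sketch_backend *ops)
{
    assert(ops != NULL && ops->store != NULL);
    ops->next = sketch_backends;
    sketch_backends = ops;
}

/* Try each registered backend in turn until one accepts the page. */
static int sketch_store(unsigned type, unsigned long offset)
{
    struct sketch_backend *ops;

    for (ops = sketch_backends; ops; ops = ops->next)
        if (ops->store(type, offset) == 0)
            return 0;
    return -1;  /* no backend took it */
}

/* Two trivial example backends. */
static int sketch_reject(unsigned type, unsigned long offset)
{
    (void)type; (void)offset;
    return -1;
}

static int sketch_accept(unsigned type, unsigned long offset)
{
    (void)type; (void)offset;
    return 0;
}
```

A backend no longer needs to know about, or forward to, whichever backend registered before it; the core walks the list instead.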

Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Tony Luck
b05b9f5f9d x86, mirror: x86 enabling - find mirrored memory ranges
UEFI GetMemoryMap() uses a new attribute bit to mark mirrored memory
address ranges.  See UEFI 2.5 spec pages 157-158:

  http://www.uefi.org/sites/default/files/resources/UEFI%202_5.pdf

On EFI enabled systems scan the memory map and tell memblock about any
mirrored ranges.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Xiexiuqi <xiexiuqi@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Tony Luck
a3f5bafcc0 mm/memblock: allocate boot time data structures from mirrored memory
Try to allocate all boot time kernel data structures from mirrored
memory.

If we run out of mirrored memory print warnings, but fall back to using
non-mirrored memory to make sure that we still boot.
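
The "mirrored first, then anything" policy can be illustrated with the sketch below. It is a toy allocator, not memblock: the region struct, the SK_MIRROR flag and both functions are hypothetical, and address 0 is used as the failure value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SK_MIRROR 0x1UL     /* hypothetical attribute flag */

struct sk_region {
    unsigned long base, free;   /* next free address and bytes still free */
    unsigned long flags;
};

/* Allocate from the first region whose flags include all bits in want. */
static unsigned long sk_alloc_flags(struct sk_region *r, size_t n,
                                    unsigned long size, unsigned long want)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if ((r[i].flags & want) == want && r[i].free >= size) {
            unsigned long addr = r[i].base;

            r[i].base += size;
            r[i].free -= size;
            return addr;
        }
    }
    return 0;   /* 0 means failure in this sketch */
}

/* Prefer mirrored memory; warn and fall back to any memory if it runs out. */
static unsigned long sk_boot_alloc(struct sk_region *r, size_t n,
                                   unsigned long size)
{
    unsigned long addr;

    assert(size > 0);
    addr = sk_alloc_flags(r, n, size, SK_MIRROR);
    if (!addr) {
        fprintf(stderr, "warning: out of mirrored memory, using non-mirrored\n");
        addr = sk_alloc_flags(r, n, size, 0);
    }
    return addr;
}
```

Passing a flags value of 0 matches every region, which is what lets the fallback reuse the same search routine.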

By number of bytes, most of what we allocate at boot time is the page
structures.  64 bytes per 4K page on x86_64 ...  or about 1.5% of total
system memory.  For workloads where the bulk of memory is allocated to
applications this may represent a useful improvement to system
availability since 1.5% of total memory might be a third of the memory
allocated to the kernel.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Xiexiuqi <xiexiuqi@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Tony Luck
fc6daaf931 mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute
Some high end Intel Xeon systems report uncorrectable memory errors as a
recoverable machine check.  Linux has included code for some time to
process these and just signal the affected processes (or even recover
completely if the error was in a read only page that can be replaced by
reading from disk).

But we have no recovery path for errors encountered during kernel code
execution.  Except for some very specific cases we are unlikely to ever
be able to recover.

Enter memory mirroring. Actually 3rd generation of memory mirroring.

Gen1: All memory is mirrored
	Pro: No s/w enabling - h/w just gets good data from other side of the
	     mirror
	Con: Halves effective memory capacity available to OS/applications

Gen2: Partial memory mirror - just mirror memory behind some memory controllers
	Pro: Keep more of the capacity
	Con: Nightmare to enable. Have to choose between allocating from
	     mirrored memory for safety vs. NUMA local memory for performance

Gen3: Address range partial memory mirror - some mirror on each memory
      controller
	Pro: Can tune the amount of mirror and keep NUMA performance
	Con: I have to write memory management code to implement

The current plan is just to use mirrored memory for kernel allocations.
This has been broken into two phases:

1) This patch series - find the mirrored memory, use it for boot time
   allocations

2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the
   unused mirrored memory from mm/memblock.c and only give it out to
   select kernel allocations (this is still being scoped because
   page_alloc.c is scary).

This patch (of 3):

Add extra "flags" to memblock to allow selection of memory based on
attribute.  No functional changes

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Xiexiuqi <xiexiuqi@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:44 -07:00