Commit Graph

362136 Commits

Author SHA1 Message Date
James Bottomley
871dd9286e block: fix max discard sectors limit
linux-v3.8-rc1 and later support plugging for blkdev_issue_discard with
commit 0cfbcafcae
(block: add plug for blkdev_issue_discard), which allows discard requests
to be merged.

For example,
1) DISCARD rq-1 with size 4GB
2) DISCARD rq-2 with size 1GB

If these 2 discard requests get merged, final request size will be 5GB.

In this case, the request's __data_len field may overflow, as it is an
unsigned int and can store at most 4GB.

This issue was observed while doing mkfs.f2fs on 5GB SD card:
https://lkml.org/lkml/2013/4/1/292

Info: sector size = 512
Info: total sectors = 11370496 (in 512bytes)
Info: zone aligned segment0 blkaddr: 512
[  257.789764] blk_update_request: bio idx 0 >= vcnt 0

mkfs process gets stuck in D state and I see the following in the dmesg:

[  257.789733] __end_that: dev mmcblk0: type=1, flags=122c8081
[  257.789764]   sector 4194304, nr/cnr 2981888/4294959104
[  257.789764]   bio df3840c0, biotail df3848c0, buffer   (null), len 1526726656
[  257.789764] blk_update_request: bio idx 0 >= vcnt 0
[  257.794921] request botched: dev mmcblk0: type=1, flags=122c8081
[  257.794921]   sector 4194304, nr/cnr 2981888/4294959104
[  257.794921]   bio df3840c0, biotail df3848c0, buffer   (null), len 1526726656

This patch fixes this issue.
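
The len value in the log is consistent with a byte count wrapping around in a 32-bit field. The following is a minimal userspace sketch, assuming the whole device (11370496 sectors of 512 bytes) ended up in one merged discard request. The helper names truncated_len() and discard_merge_fits() are hypothetical and are not the kernel's code.

```c
#include <stdint.h>
#include <limits.h>

/* Byte length as it would read back after being stored in a 32-bit field. */
static uint32_t truncated_len(uint64_t sectors, uint32_t sector_size)
{
    return (uint32_t)(sectors * sector_size);
}

/*
 * Hypothetical merge guard: allow two discard requests to merge only if
 * the combined byte count still fits in an unsigned int.
 */
static int discard_merge_fits(uint64_t bytes_a, uint64_t bytes_b)
{
    return bytes_a + bytes_b <= (uint64_t)UINT_MAX;
}
```

With these inputs truncated_len(11370496, 512) yields 1526726656, the same number the log reports as len, and discard_merge_fits() rejects the 4GB + 1GB merge from the example.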

Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-24 08:52:50 -06:00
Jun'ichi Nomura
e5072664f8 blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
Since 749fefe677 in v3.7 ("block: lift the initial queue bypass mode
on blk_register_queue() instead of blk_init_allocated_queue()"),
the following warning appears when multipath is used with CONFIG_PREEMPT=y.

This patch moves blk_queue_bypass_start() before radix_tree_preload()
to avoid the sleeping call while preemption is disabled.

  BUG: scheduling while atomic: multipath/2460/0x00000002
  1 lock held by multipath/2460:
   #0:  (&md->type_lock){......}, at: [<ffffffffa019fb05>] dm_lock_md_type+0x17/0x19 [dm_mod]
  Modules linked in: ...
  Pid: 2460, comm: multipath Tainted: G        W    3.7.0-rc2 #1
  Call Trace:
   [<ffffffff810723ae>] __schedule_bug+0x6a/0x78
   [<ffffffff81428ba2>] __schedule+0xb4/0x5e0
   [<ffffffff814291e6>] schedule+0x64/0x66
   [<ffffffff8142773a>] schedule_timeout+0x39/0xf8
   [<ffffffff8108ad5f>] ? put_lock_stats+0xe/0x29
   [<ffffffff8108ae30>] ? lock_release_holdtime+0xb6/0xbb
   [<ffffffff814289e3>] wait_for_common+0x9d/0xee
   [<ffffffff8107526c>] ? try_to_wake_up+0x206/0x206
   [<ffffffff810c0eb8>] ? kfree_call_rcu+0x1c/0x1c
   [<ffffffff81428aec>] wait_for_completion+0x1d/0x1f
   [<ffffffff810611f9>] wait_rcu_gp+0x5d/0x7a
   [<ffffffff81061216>] ? wait_rcu_gp+0x7a/0x7a
   [<ffffffff8106fb18>] ? complete+0x21/0x53
   [<ffffffff810c0556>] synchronize_rcu+0x1e/0x20
   [<ffffffff811dd903>] blk_queue_bypass_start+0x5d/0x62
   [<ffffffff811ee109>] blkcg_activate_policy+0x73/0x270
   [<ffffffff81130521>] ? kmem_cache_alloc_node_trace+0xc7/0x108
   [<ffffffff811f04b3>] cfq_init_queue+0x80/0x28e
   [<ffffffffa01a1600>] ? dm_blk_ioctl+0xa7/0xa7 [dm_mod]
   [<ffffffff811d8c41>] elevator_init+0xe1/0x115
   [<ffffffff811e229f>] ? blk_queue_make_request+0x54/0x59
   [<ffffffff811dd743>] blk_init_allocated_queue+0x8c/0x9e
   [<ffffffffa019ffcd>] dm_setup_md_queue+0x36/0xaa [dm_mod]
   [<ffffffffa01a60e6>] table_load+0x1bd/0x2c8 [dm_mod]
   [<ffffffffa01a7026>] ctl_ioctl+0x1d6/0x236 [dm_mod]
   [<ffffffffa01a5f29>] ? table_clear+0xaa/0xaa [dm_mod]
   [<ffffffffa01a7099>] dm_ctl_ioctl+0x13/0x17 [dm_mod]
   [<ffffffff811479fc>] do_vfs_ioctl+0x3fb/0x441
   [<ffffffff811b643c>] ? file_has_perm+0x8a/0x99
   [<ffffffff81147aa0>] sys_ioctl+0x5e/0x82
   [<ffffffff812010be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
   [<ffffffff814310d9>] system_call_fastpath+0x16/0x1b
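
The ordering problem behind this trace can be modelled in userspace. This is only a sketch: preempt_depth, fake_sleeping_call() and the two activate_*() helpers are hypothetical stand-ins, and the only kernel behaviour assumed is that radix_tree_preload() returns with preemption disabled while blk_queue_bypass_start() may sleep in synchronize_rcu().

```c
#include <stdbool.h>

static int preempt_depth;      /* stand-in for the kernel's preempt count */
static bool slept_in_atomic;   /* set when a sleeping call runs in atomic context */

static void fake_preempt_disable(void) { preempt_depth++; }
static void fake_preempt_enable(void)  { preempt_depth--; }

/* Stand-in for a call that may sleep, such as one waiting for an RCU grace period. */
static void fake_sleeping_call(void)
{
    if (preempt_depth > 0)
        slept_in_atomic = true;   /* the kernel would warn "scheduling while atomic" */
}

/* Order before the fix: the sleeping call sits inside the atomic region. */
static bool activate_old_order(void)
{
    slept_in_atomic = false;
    fake_preempt_disable();       /* models radix_tree_preload() */
    fake_sleeping_call();         /* models blk_queue_bypass_start() */
    fake_preempt_enable();
    return slept_in_atomic;
}

/* Order after the fix: the sleeping call runs first, then the atomic region. */
static bool activate_new_order(void)
{
    slept_in_atomic = false;
    fake_sleeping_call();
    fake_preempt_disable();
    fake_preempt_enable();
    return slept_in_atomic;
}
```

Only the old ordering trips the check, which is the situation the patch avoids by moving the bypass call ahead of the preload.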

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-09 15:01:21 +02:00
Namjae Jeon
fdc6fdc52e Documentation: cfq-iosched: update documentation help for cfq tunables
Add the documentation text for latency, target_latency & group_idle
tunable parameters in the block/cfq-iosched.txt.
Also fix a few spelling mistakes.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>

Language somewhat modified by Jens.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-09 14:57:06 +02:00
Jens Axboe
64f8de4da7 Merge branch 'writeback-workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq into for-3.10/core
Tejun writes:

-----

This is the pull request for the earlier patchset[1] with the same
name.  It's only three patches (the first one was committed to
workqueue tree) but the merge strategy is a bit involved due to the
dependencies.

* Because the conversion needs features from wq/for-3.10,
  block/for-3.10/core is based on rc3, and wq/for-3.10 has conflicts
  with rc3, I pulled mainline (rc5) into wq/for-3.10 to prevent those
  workqueue conflicts from flaring up in block tree.

* Resolving the issue that Jan and Dave raised about debugging
  requires arch-wide changes.  The patchset is being worked on[2] but
  it'll have to go through -mm after these changes show up in -next,
  and not included in this pull request.

The three commits are located in the following git branch.

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git writeback-workqueue

Pulling it into block/for-3.10/core produces a conflict in
drivers/md/raid5.c between the following two commits.

  e3620a3ad5 ("MD RAID5: Avoid accessing gendisk or queue structs when not available")
  2f6db2a707 ("raid5: use bio_reset()")

The conflict is trivial - one removes an "if ()" conditional while the
other removes "rbi->bi_next = NULL" right above it.  We just need to
remove both.  The merged branch is available at

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git block-test-merge

so that you can use it for verification.  The test merge commit has
proper merge description.

While these changes are a bit of pain to route, they make code simpler
and even have, while minute, measurable performance gain[3] even on a
workload which isn't particularly favorable to showing the benefits of
this conversion.

----

Fixed up the conflict.

Conflicts:
	drivers/md/raid5.c

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-02 10:04:39 +02:00
Tejun Heo
b5c872ddb7 writeback: expose the bdi_wq workqueue
There are cases where userland wants to tweak the priority and
affinity of writeback flushers.  Expose bdi_wq to userland by setting
WQ_SYSFS.  It appears under /sys/bus/workqueue/devices/writeback/ and
allows adjusting maximum concurrency level, cpumask and nice level.
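
A userspace program could read one of these attributes roughly as follows. This is a sketch: read_sysfs_attr() is a hypothetical helper, and the attribute file name used in the note below (cpumask) is an assumption about what appears under that directory.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a sysfs attribute into buf.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
static int read_sysfs_attr(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```

For example, read_sysfs_attr("/sys/bus/workqueue/devices/writeback/cpumask", buf, sizeof(buf)) would return -1 on a kernel that does not expose the file, so callers should treat failure as "not available" rather than as an error.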

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-01 19:08:06 -07:00
Tejun Heo
839a8e8660 writeback: replace custom worker pool implementation with unbound workqueue
Writeback implements its own worker pool - each bdi can be associated
with a worker thread which is created and destroyed dynamically.  The
worker thread for the default bdi is always present and serves as the
"forker" thread which forks off worker threads for other bdis.

There's no reason for writeback to implement its own worker pool when
using unbound workqueue instead is much simpler and more efficient.
This patch replaces custom worker pool implementation in writeback
with an unbound workqueue.

The conversion isn't too complicated but the following are worth
mentioning.

* bdi_writeback->last_active, task and wakeup_timer are removed.
  delayed_work ->dwork is added instead.  Explicit timer handling is
  no longer necessary.  Everything works by either queueing / modding
  / flushing / canceling the delayed_work item.

* bdi_writeback_thread() becomes bdi_writeback_workfn() which runs off
  bdi_writeback->dwork.  On each execution, it processes
  bdi->work_list and reschedules itself if there are more things to
  do.

  The function also handles low-mem condition, which used to be
  handled by the forker thread.  If the function is running off a
  rescuer thread, it only writes out limited number of pages so that
  the rescuer can serve other bdis too.  This preserves the flusher
  creation failure behavior of the forker thread.

* INIT_LIST_HEAD(&bdi->bdi_list) is used to tell
  bdi_writeback_workfn() about on-going bdi unregistration so that it
  always drains work_list even if it's running off the rescuer.  Note
  that the original code was broken in this regard.  Under memory
  pressure, a bdi could finish unregistration with non-empty
  work_list.

* The default bdi is no longer special.  It now is treated the same as
  any other bdi and bdi_cap_flush_forker() is removed.

* BDI_pending is no longer used.  Removed.

* Some tracepoints become non-applicable.  The following TPs are
  removed - writeback_nothread, writeback_wake_thread,
  writeback_wake_forker_thread, writeback_thread_start,
  writeback_thread_stop.

Everything, including devices coming and going away and rescuer
operation under simulated memory pressure, seems to work fine in my
test setup.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
2013-04-01 19:08:06 -07:00
Tejun Heo
181387da2d writeback: remove unused bdi_pending_list
There's no user left.  Remove it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Fengguang Wu <fengguang.wu@intel.com>
2013-04-01 19:08:06 -07:00
Tejun Heo
229641a6f1 Linux 3.9-rc5

Merge tag 'v3.9-rc5' into wq/for-3.10

Writeback conversion to workqueue will be based on top of wq/for-3.10
branch to take advantage of custom attrs and NUMA support for unbound
workqueues.  Mainline currently contains two commits which result in
non-trivial merge conflicts with wq/for-3.10 and because
block/for-3.10/core is based on v3.9-rc3 which contains one of the
conflicting commits, we need a pre-merge-window merge anyway.  Let's
pull v3.9-rc5 into wq/for-3.10 so that the block tree doesn't suffer
from workqueue merge conflicts.

The two conflicts and their resolutions:

* e68035fb65 ("workqueue: convert to idr_alloc()") in mainline changes
  worker_pool_assign_id() to use idr_alloc() instead of the old idr
  interface.  worker_pool_assign_id() goes through multiple locking
  changes in wq/for-3.10 causing the following conflict.

  static int worker_pool_assign_id(struct worker_pool *pool)
  {
	  int ret;

  <<<<<<< HEAD
	  lockdep_assert_held(&wq_pool_mutex);

	  do {
		  if (!idr_pre_get(&worker_pool_idr, GFP_KERNEL))
			  return -ENOMEM;
		  ret = idr_get_new(&worker_pool_idr, pool, &pool->id);
	  } while (ret == -EAGAIN);
  =======
	  mutex_lock(&worker_pool_idr_mutex);
	  ret = idr_alloc(&worker_pool_idr, pool, 0, 0, GFP_KERNEL);
	  if (ret >= 0)
		  pool->id = ret;
	  mutex_unlock(&worker_pool_idr_mutex);
  >>>>>>> c67bf5361e

	  return ret < 0 ? ret : 0;
  }

  We want locking from the former and idr_alloc() usage from the
  latter, which can be combined to the following.

  static int worker_pool_assign_id(struct worker_pool *pool)
  {
	  int ret;

	  lockdep_assert_held(&wq_pool_mutex);

	  ret = idr_alloc(&worker_pool_idr, pool, 0, 0, GFP_KERNEL);
	  if (ret >= 0) {
		  pool->id = ret;
		  return 0;
	  }
	  return ret;
  }

* eb2834285c ("workqueue: fix possible pool stall bug in
  wq_unbind_fn()") updated wq_unbind_fn() such that it has single
  larger for_each_std_worker_pool() loop instead of two separate loops
  with a schedule() call in between.  wq/for-3.10 renamed
  pool->assoc_mutex to pool->manager_mutex causing the following
  conflict (earlier function body and comments omitted for brevity).

  static void wq_unbind_fn(struct work_struct *work)
  {
  ...
		  spin_unlock_irq(&pool->lock);
  <<<<<<< HEAD
		  mutex_unlock(&pool->manager_mutex);
	  }
  =======
		  mutex_unlock(&pool->assoc_mutex);
  >>>>>>> c67bf5361e

		  schedule();

  <<<<<<< HEAD
	  for_each_cpu_worker_pool(pool, cpu)
  =======
  >>>>>>> c67bf5361e
		  atomic_set(&pool->nr_running, 0);

		  spin_lock_irq(&pool->lock);
		  wake_up_worker(pool);
		  spin_unlock_irq(&pool->lock);
	  }
  }

  The resolution is mostly trivial.  We want the control flow of the
  latter with the rename of the former.

  static void wq_unbind_fn(struct work_struct *work)
  {
  ...
		  spin_unlock_irq(&pool->lock);
		  mutex_unlock(&pool->manager_mutex);

		  schedule();

		  atomic_set(&pool->nr_running, 0);

		  spin_lock_irq(&pool->lock);
		  wake_up_worker(pool);
		  spin_unlock_irq(&pool->lock);
	  }
  }

Signed-off-by: Tejun Heo <tj@kernel.org>
2013-04-01 18:45:36 -07:00
Tejun Heo
d55262c4d1 workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity
Unbound workqueues are now NUMA aware.  Let's add some control knobs
and update sysfs interface accordingly.

* Add kernel param workqueue.numa_disable which disables NUMA affinity
  globally.

* Replace sysfs file "pool_id" with "pool_ids" which contains
  node:pool_id pairs.  This change is userland-visible but "pool_id"
  hasn't seen a release yet, so this is okay.

* Add a new sysfs file "numa" which can toggle NUMA affinity on
  individual workqueues.  This is implemented as attrs->no_numa which
  is special in that it isn't part of a pool's attributes.  It only
  affects how apply_workqueue_attrs() picks which pools to use.

After "pool_ids" change, first_pwq() doesn't have any user left.
Removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:38 -07:00
Tejun Heo
4c16bd327c workqueue: implement NUMA affinity for unbound workqueues
Currently, an unbound workqueue has single current, or first, pwq
(pool_workqueue) to which all new work items are queued.  This often
isn't optimal on NUMA machines as workers may jump around across node
boundaries and work items get assigned to workers without any regard
to NUMA affinity.

This patch implements NUMA affinity for unbound workqueues.  Instead
of mapping all entries of numa_pwq_tbl[] to the same pwq,
apply_workqueue_attrs() now creates a separate pwq covering the
intersecting CPUs for each NUMA node which has online CPUs in
@attrs->cpumask.  Nodes which don't have intersecting possible CPUs
are mapped to pwqs covering whole @attrs->cpumask.

As CPUs come up and go down, the pool association is changed
accordingly.  Changing pool association may involve allocating new
pools which may fail.  To avoid failing CPU_DOWN, each workqueue
always keeps a default pwq which covers whole attrs->cpumask which is
used as fallback if pool creation fails during a CPU hotplug
operation.

This ensures that all work items issued on a NUMA node are executed on
the same node as long as the workqueue allows execution on the CPUs of
the node.
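
The per-node mask selection described above can be sketched with plain bitmasks. This is a simplified userspace model, not the kernel implementation: the cpumask_t typedef (one bit per CPU, at most 64 CPUs) and node_pwq_mask() are hypothetical.

```c
#include <stdint.h>

typedef uint64_t cpumask_t;   /* one bit per CPU, up to 64 CPUs */

/*
 * Pick the CPU mask for the pwq serving one NUMA node.
 * If the node has online CPUs inside the workqueue's mask, use the
 * node's possible CPUs intersected with that mask; otherwise fall back
 * to the whole workqueue mask, as the default pwq does.
 */
static cpumask_t node_pwq_mask(cpumask_t wq_mask, cpumask_t node_possible,
                               cpumask_t online)
{
    if ((wq_mask & node_possible & online) == 0)
        return wq_mask;
    return wq_mask & node_possible;
}
```

With two nodes owning CPUs 0-3 and 4-7 and a workqueue mask of 0x3C, node 0 gets 0x0C and node 1 gets 0x30. If node 1 has no online CPUs in the mask, its entry falls back to 0x3C.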

As this maps a workqueue to multiple pwqs and max_active is per-pwq,
this changes the behavior of max_active.  The limit is now per NUMA
node instead of global.  While this is an actual change, max_active is
already per-cpu for per-cpu workqueues and primarily used as safety
mechanism rather than for active concurrency control.  Concurrency is
usually limited from workqueue users by the number of concurrently
active work items and this change shouldn't matter much.

v2: Fixed pwq freeing in apply_workqueue_attrs() error path.  Spotted
    by Lai.

v3: The previous version incorrectly made a workqueue spanning
    multiple nodes spread work items over all online CPUs when some of
    its nodes don't have any desired cpus.  Reimplemented so that NUMA
    affinity is properly updated as CPUs go up and down.  This problem
    was spotted by Lai Jiangshan.

v4: destroy_workqueue() was putting wq->dfl_pwq and then clearing it;
    however, wq may be freed at any time after dfl_pwq is put making
    the clearing use-after-free.  Clear wq->dfl_pwq before putting it.

v5: apply_workqueue_attrs() was leaking @tmp_attrs, @new_attrs and
    @pwq_tbl after success.  Fixed.

    Retry loop in wq_update_unbound_numa_attrs() isn't necessary as
    application of new attrs is excluded via CPU hotplug.  Removed.

    Documentation on CPU affinity guarantee on CPU_DOWN added.

    All changes are suggested by Lai Jiangshan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:36 -07:00
Tejun Heo
dce90d47c4 workqueue: introduce put_pwq_unlocked()
Factor out lock pool, put_pwq(), unlock sequence into
put_pwq_unlocked().  The two existing places are converted and there
will be more with NUMA affinity support.

This is to prepare for NUMA affinity support for unbound workqueues
and doesn't introduce any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:35 -07:00
Tejun Heo
1befcf3073 workqueue: introduce numa_pwq_tbl_install()
Factor out pool_workqueue linking and installation into numa_pwq_tbl[]
from apply_workqueue_attrs() into numa_pwq_tbl_install().  link_pwq()
is made safe to call multiple times.  numa_pwq_tbl_install() links the
pwq, installs it into numa_pwq_tbl[] at the specified node and returns
the old entry.

@last_pwq is removed from link_pwq() as the return value of the new
function can be used instead.
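
The "install and return the old entry" shape can be illustrated in isolation. This sketch leaves out linking, locking and RCU entirely; MAX_NODES, the reduced struct definitions and tbl_install() are hypothetical.

```c
#include <stddef.h>

#define MAX_NODES 4   /* illustrative table size */

struct pwq { int id; };

struct wq {
    struct pwq *numa_pwq_tbl[MAX_NODES];
};

/* Install pwq for the given node and hand back the previous entry (may be NULL). */
static struct pwq *tbl_install(struct wq *wq, int node, struct pwq *pwq)
{
    struct pwq *old = wq->numa_pwq_tbl[node];

    wq->numa_pwq_tbl[node] = pwq;
    return old;
}
```

Returning the displaced entry lets the caller decide what to do with it, such as dropping its reference, without a separate out-parameter like @last_pwq.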

This is to prepare for NUMA affinity support for unbound workqueues.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:35 -07:00
Tejun Heo
e50aba9aea workqueue: use NUMA-aware allocation for pool_workqueues
Use kmem_cache_alloc_node() with @pool->node instead of
kmem_cache_zalloc() when allocating a pool_workqueue so that it's
allocated on the same node as the associated worker_pool.  As there's
no kmem_cache_zalloc_node(), move zeroing to init_pwq().

This was suggested by Lai Jiangshan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:35 -07:00
Tejun Heo
f147f29eb7 workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()
Break init_and_link_pwq() into init_pwq() and link_pwq() and move
unbound-workqueue specific handling into apply_workqueue_attrs().
Also, factor out unbound pool and pool_workqueue allocation into
alloc_unbound_pwq().

This reorganization is to prepare for NUMA affinity and doesn't
introduce any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:35 -07:00
Tejun Heo
df2d5ae499 workqueue: map an unbound workqueues to multiple per-node pool_workqueues
Currently, an unbound workqueue has only one "current" pool_workqueue
associated with it.  It may have multiple pool_workqueues but only the
first pool_workqueue serves new work items.  For NUMA affinity, we
want to change this so that there are multiple current pool_workqueues
serving different NUMA nodes.

Introduce workqueue->numa_pwq_tbl[] which is indexed by NUMA node and
points to the pool_workqueue to use for each possible node.  This
replaces first_pwq() in __queue_work() and workqueue_congested().

numa_pwq_tbl[] is currently initialized to point to the same
pool_workqueue as first_pwq() so this patch doesn't make any behavior
changes.

v2: Use rcu_dereference_raw() in unbound_pwq_by_node() as the function
    may be called only with wq->mutex held.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:35 -07:00
Tejun Heo
2728fd2f09 workqueue: move hot fields of workqueue_struct to the end
Move wq->flags and ->cpu_pwqs to the end of workqueue_struct and align
them to the cacheline.  These two fields are used in the work item
issue path and thus hot.  The scheduled NUMA affinity support will add
dispatch table at the end of workqueue_struct and relocating these two
fields will allow us hitting only single cacheline on hot paths.

Note that wq->pwqs isn't moved although it currently is being used in
the work item issue path for unbound workqueues.  The dispatch table
mentioned above will replace its use in the issue path, so it will
become cold once NUMA support is implemented.
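
The layout idea can be shown with C11 alignment specifiers. This is a sketch, assuming a 64-byte cache line; struct wq_layout and its cold fields are hypothetical and much smaller than the real workqueue_struct.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64   /* assumed cache line size */

struct wq_layout {
    char name[24];                           /* cold: set once at creation */
    void *pwqs_list;                         /* cold bookkeeping */
    alignas(CACHELINE) unsigned int flags;   /* hot: read when queueing work */
    void *cpu_pwqs;                          /* hot: read when queueing work */
};

/* Returns 1 if the hot fields start on a line boundary and share that line. */
static int hot_fields_share_line(void)
{
    size_t a = offsetof(struct wq_layout, flags) / CACHELINE;
    size_t b = offsetof(struct wq_layout, cpu_pwqs) / CACHELINE;

    return offsetof(struct wq_layout, flags) % CACHELINE == 0 && a == b;
}
```

Placing the hot fields last means anything appended after them, such as a dispatch table, can start in the same cache line.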

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:35 -07:00
Tejun Heo
ecf6881ff3 workqueue: make workqueue->name[] fixed len
Currently workqueue->name[] is of flexible length.  We want to use the
flexible field for something more useful and there isn't much benefit
in allowing arbitrary name length anyway.  Make it fixed len capping
at 24 bytes.
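
A fixed-length name field behaves like this in practice. The sketch below uses the 24-byte cap from the message; the constant name WQ_NAME_LEN, struct wq_named and wq_set_name() are assumptions made for illustration.

```c
#include <stdio.h>

#define WQ_NAME_LEN 24   /* cap taken from the commit message */

struct wq_named {
    char name[WQ_NAME_LEN];
};

/* Copy the name into the fixed buffer; longer names are silently truncated. */
static void wq_set_name(struct wq_named *wq, const char *name)
{
    snprintf(wq->name, sizeof(wq->name), "%s", name);
}
```

Names longer than 23 characters lose their tail, since one byte is reserved for the terminating NUL.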

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:34 -07:00
Tejun Heo
6029a91829 workqueue: add workqueue->unbound_attrs
Currently, when exposing attrs of an unbound workqueue via sysfs, the
workqueue_attrs of first_pwq() is used as that should equal the
current state of the workqueue.

The planned NUMA affinity support will make unbound workqueues make
use of multiple pool_workqueues for different NUMA nodes and the above
assumption will no longer hold.  Introduce workqueue->unbound_attrs
which records the current attrs in effect and use it for sysfs instead
of first_pwq()->attrs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:34 -07:00
Tejun Heo
f3f90ad469 workqueue: determine NUMA node of workers accourding to the allowed cpumask
When worker tasks are created using kthread_create_on_node(),
currently only per-cpu ones have the matching NUMA node specified.
All unbound workers are always created with NUMA_NO_NODE.

Now that an unbound worker pool may have an arbitrary cpumask
associated with it, this isn't optimal.  Add pool->node which is
determined by the pool's cpumask.  If the pool's cpumask is contained
inside a NUMA node proper, the pool is associated with that node, and
all workers of the pool are created on that node.

This currently only makes difference for unbound worker pools with
cpumask contained inside single NUMA node, but this will serve as
foundation for making all unbound pools NUMA-affine.
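
Determining a pool's node from its cpumask amounts to a subset test against each node's CPUs. This is a userspace sketch using 64-bit masks; pool_node() and the node_masks array are hypothetical, and NUMA_NO_NODE is defined locally as -1.

```c
#include <stdint.h>

#define NUMA_NO_NODE (-1)

/*
 * Return the node whose CPU set fully contains pool_mask, or
 * NUMA_NO_NODE if the mask is empty or spans more than one node.
 */
static int pool_node(uint64_t pool_mask, const uint64_t *node_masks,
                     int nr_nodes)
{
    if (pool_mask == 0)
        return NUMA_NO_NODE;
    for (int node = 0; node < nr_nodes; node++) {
        if ((pool_mask & ~node_masks[node]) == 0)   /* subset test */
            return node;
    }
    return NUMA_NO_NODE;
}
```

A pool whose mask straddles two nodes keeps NUMA_NO_NODE, so its workers are created without a node preference.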

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:34 -07:00
Tejun Heo
e3c916a4c7 workqueue: drop 'H' from kworker names of unbound worker pools
Currently, all workqueue workers which have negative nice value have
'H' postfixed to their names.  This is necessary for per-cpu workers
as they use the CPU number instead of pool->id to identify the pool
and the 'H' postfix is the only thing distinguishing normal and
highpri workers.

As workers for unbound pools use pool->id, the 'H' postfix is purely
informational.  TASK_COMM_LEN is 16 and after the static part and
delimiters, there are only five characters left for the pool and
worker IDs.  We're expecting to have more unbound pools with the
scheduled NUMA awareness support.  Let's drop the non-essential 'H'
postfix from unbound kworker name.

While at it, restructure kthread_create*() invocation to help future
NUMA related changes.
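
The character budget can be checked with snprintf(). This is a sketch: the "kworker/u<pool>:<worker>" layout and format_unbound_name() are assumptions made for illustration, chosen because nine static characters plus the colon and the terminating NUL leave five of the 16 bytes for the IDs.

```c
#include <stdio.h>

#define TASK_COMM_LEN 16   /* includes the terminating NUL */

/*
 * Format an unbound worker name into comm.
 * Returns 1 if the name fits without truncation, 0 otherwise.
 */
static int format_unbound_name(char *comm, int pool_id, int worker_id,
                               int highpri)
{
    int n = snprintf(comm, TASK_COMM_LEN, "kworker/u%d:%d%s",
                     pool_id, worker_id, highpri ? "H" : "");

    return n > 0 && n < TASK_COMM_LEN;
}
```

With a three-digit pool ID and a two-digit worker ID the name fits exactly, and appending 'H' would push it over the limit.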

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:32 -07:00
Tejun Heo
bce903809a workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
Unbound workqueues are going to be NUMA-affine.  Add wq_numa_tbl_len
and wq_numa_possible_cpumask[] in preparation.  The former is the
highest NUMA node ID + 1 and the latter is masks of possible CPUs for
each NUMA node.

This patch only introduces these.  Future patches will make use of
them.

v2: NUMA initialization moved into wq_numa_init().  Also, the possible
    cpumask array is not created if there aren't multiple nodes on the
    system.  wq_numa_enabled bool added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:32 -07:00
Tejun Heo
a892cacc7f workqueue: move pwq_pool_locking outside of get/put_unbound_pool()
The scheduled NUMA affinity support for unbound workqueues would need
to walk workqueues list and perform pool related operations on each workqueue.

Move wq_pool_mutex locking out of get/put_unbound_pool() to their
callers so that pool operations can be performed while walking the
workqueues list, which is also protected by wq_pool_mutex.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:32 -07:00
Tejun Heo
4862125b02 workqueue: fix memory leak in apply_workqueue_attrs()
apply_workqueue_attrs() wasn't freeing temp attrs variable @new_attrs
in its success path.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-04-01 11:23:31 -07:00
Tejun Heo
13e2e55601 workqueue: fix unbound workqueue attrs hashing / comparison
29c91e9912 ("workqueue: implement attribute-based unbound worker_pool
management") implemented attrs based worker_pool matching.  It tried
to avoid false negative when comparing cpumasks with custom hash
function; unfortunately, the hash and comparison functions fail to
ignore CPUs which are not possible.  It incorrectly assumed that
bitmap_copy() skips leftover bits in the last word of bitmap and
cpumask_equal() ignores impossible CPUs.

This patch updates attrs->cpumask handling such that impossible CPUs
are properly ignored.

* Hash and copy functions no longer do anything special.  They expect
  their callers to clear impossible CPUs.

* alloc_workqueue_attrs() initializes the cpumask to cpu_possible_mask
  instead of setting all bits and explicit cpumask_setall() for
  unbound_std_wq_attrs[] in init_workqueues() is dropped.

* apply_workqueue_attrs() is now responsible for ignoring impossible
  CPUs.  It makes a copy of @attrs and clears impossible CPUs before
  doing anything else.
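
The effect of clearing impossible CPUs before hashing or comparing can be seen with plain bitmasks. This is a userspace sketch using 64-bit masks; clear_impossible() and attrs_mask_equal() are hypothetical helpers, not the kernel functions.

```c
#include <stdint.h>
#include <stdbool.h>

/* Clear bits for CPUs that can never exist on this system. */
static uint64_t clear_impossible(uint64_t mask, uint64_t possible)
{
    return mask & possible;
}

/* Compare two requested masks only after impossible CPUs were cleared. */
static bool attrs_mask_equal(uint64_t a, uint64_t b, uint64_t possible)
{
    return clear_impossible(a, possible) == clear_impossible(b, possible);
}
```

On a system with eight possible CPUs, an all-ones mask and 0xFF describe the same set of CPUs and should map to the same pool. Comparing the raw values would treat them as different.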

Signed-off-by: Tejun Heo <tj@kernel.org>
2013-04-01 11:23:31 -07:00
Tejun Heo
bc0caf099d workqueue: fix race condition in unbound workqueue free path
8864b4e59 ("workqueue: implement get/put_pwq()") implemented pwq
(pool_workqueue) refcnting which frees workqueue when the last pwq
goes away.  It determined whether it was the last pwq by testing
whether wq->pwqs is empty.  Unfortunately, the test was done outside wq->mutex
and multiple pwq release could race and try to free wq multiple times
leading to oops.

Test wq->pwqs emptiness while holding wq->mutex.

Signed-off-by: Tejun Heo <tj@kernel.org>
2013-04-01 11:23:31 -07:00
Linus Torvalds
07961ac7c0 Linux 3.9-rc5
2013-03-31 15:12:43 -07:00
Linus Torvalds
0bb44280b5 Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dmaengine fixes from Vinod Koul:
 "Two fixes for slave-dmaengine.

  The first one is for making slave_id value correct for dw_dmac and
  the other one fixes the endianness in DT parsing"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dw_dmac: adjust slave_id accordingly to request line base
  dmaengine: dw_dma: fix endianess for DT xlate function
2013-03-31 11:41:47 -07:00
Linus Torvalds
a7b436d356 Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
 "Some fixes for Kernel 3.9:
   - subsystem build fix when VIDEO_DEV=y, VIDEO_V4L2=m and I2C=m
   - compilation fix for arm multiarch preventing IR_RX51 to be selected
   - regression fix at bttv crop logic
   - s5p-mfc/m5mols/exynos: a few fixes for cameras on exynos hardware"

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] [REGRESSION] bt8xx: Fix too large height in cropcap
  [media] fix compilation with both V4L2 and I2C as 'm'
  [media] m5mols: Fix bug in stream on handler
  [media] s5p-fimc: Do not attempt to disable not enabled media pipeline
  [media] s5p-mfc: Fix encoder control 15 issue
  [media] s5p-mfc: Fix frame skip bug
  [media] s5p-fimc: send valid m2m ctx to fimc_m2m_job_finish
  [media] exynos-gsc: send valid m2m ctx to gsc_m2m_job_finish
  [media] fimc-lite: Fix the variable type to avoid possible crash
  [media] fimc-lite: Initialize 'step' field in fimc_lite_ctrl structure
  [media] ir: IR_RX51 only works on OMAP2
2013-03-31 11:40:33 -07:00
Linus Torvalds
d299c29039 for-linus-20130331

Merge tag 'for-linus-20130331' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Alright, this time from 10K up in the air.

  Collection of fixes that have been queued up since the merge window
  opened, hence postponed until later in the cycle.  The pull request
  contains:

   - A bunch of fixes for the xen blk front/back driver.

   - A round of fixes for the new IBM RamSan driver, fixing various
     nasty issues.

   - Fixes for multiple drivers from Wei Yongjun, bad handling of return
     values and wrong pointer math.

   - A fix for loop properly killing partitions when being detached."

* tag 'for-linus-20130331' of git://git.kernel.dk/linux-block: (25 commits)
  mg_disk: fix error return code in mg_probe()
  rsxx: remove unused variable
  rsxx: enable error return of rsxx_eeh_save_issued_dmas()
  block: removes dynamic allocation on stack
  Block: blk-flush: Fixed indent code style
  cciss: fix invalid use of sizeof in cciss_find_cfgtables()
  loop: cleanup partitions when detaching loop device
  loop: fix error return code in loop_add()
  mtip32xx: fix error return code in mtip_pci_probe()
  xen-blkfront: remove frame list from blk_shadow
  xen-blkfront: pre-allocate pages for requests
  xen-blkback: don't store dev_bus_addr
  xen-blkfront: switch from llist to list
  xen-blkback: fix foreach_grant_safe to handle empty lists
  xen-blkfront: replace kmalloc and then memcpy with kmemdup
  xen-blkback: fix dispatch_rw_block_io() error path
  rsxx: fix missing unlock on error return in rsxx_eeh_remap_dmas()
  Adding in EEH support to the IBM FlashSystem 70/80 device driver
  block: IBM RamSan 70/80 error message bug fix.
  block: IBM RamSan 70/80 branding changes.
  ...
2013-03-31 11:38:59 -07:00
Paul Walmsley
dbf520a9d7 Revert "lockdep: check that no locks held at freeze time"
This reverts commit 6aa9707099.

Commit 6aa9707099 ("lockdep: check that no locks held at freeze time")
causes problems with NFS root filesystems.  The failures were noticed on
OMAP2 and 3 boards during kernel init:

  [ BUG: swapper/0/1 still has locks held! ]
  3.9.0-rc3-00344-ga937536 #1 Not tainted
  -------------------------------------
  1 lock held by swapper/0/1:
   #0:  (&type->s_umount_key#13/1){+.+.+.}, at: [<c011e84c>] sget+0x248/0x574

  stack backtrace:
    rpc_wait_bit_killable
    __wait_on_bit
    out_of_line_wait_on_bit
    __rpc_execute
    rpc_run_task
    rpc_call_sync
    nfs_proc_get_root
    nfs_get_root
    nfs_fs_mount_common
    nfs_try_mount
    nfs_fs_mount
    mount_fs
    vfs_kern_mount
    do_mount
    sys_mount
    do_mount_root
    mount_root
    prepare_namespace
    kernel_init_freeable
    kernel_init

Although the rootfs mounts, the system is unstable.  Here's a transcript
from a PM test:

  http://www.pwsan.com/omap/testlogs/test_v3.9-rc3/20130317194234/pm/37xxevm/37xxevm_log.txt

Here's what the test log should look like:

  http://www.pwsan.com/omap/testlogs/test_v3.8/20130218214403/pm/37xxevm/37xxevm_log.txt

Mailing list discussion is here:

  http://lkml.org/lkml/2013/3/4/221

Deal with this for v3.9 by reverting the problem commit, until folks can
figure out the right long-term course of action.

Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: <maciej.rutecki@gmail.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ben Chan <benchan@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-31 11:38:33 -07:00
Linus Torvalds
13d2080db3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "This includes the bug-fix for a >= v3.8-rc1 regression specific to
  iscsi-target persistent reservation conflict handling (CC'ed to
  stable), and a tcm_vhost patch to drop VIRTIO_RING_F_EVENT_IDX usage
  so that in-flight qemu vhost-scsi-pci device code can detect the
  proper vhost feature bits.

  Also, there are two more tcm_vhost patches still being discussed by
  MST and Asias for v3.9 that will be required for the in-flight qemu
  vhost-scsi-pci device patch to function properly, and that should
  (hopefully) be the last target fixes for this round."

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Fix RESERVATION_CONFLICT status regression for iscsi-target special case
  tcm_vhost: Avoid VIRTIO_RING_F_EVENT_IDX feature bit
2013-03-30 13:13:05 -07:00
Andy Shevchenko
bce95c63ef dw_dmac: adjust slave_id accordingly to request line base
On some hardware configurations the request line comes with an offset.
This patch introduces a convert_slave_id() helper for those cases. The request line
base comes from the driver data provided by the platform_device_id table.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-03-30 04:34:07 +05:30
Arnd Bergmann
f73bb9b355 dmaengine: dw_dma: fix endianess for DT xlate function
As reported by Wu Fengguang's build robot tracking sparse warnings, the
dma_spec arguments in the dw_dma_xlate are already byte swapped on
little-endian platforms and must not get swapped again. This code is
currently not used anywhere, but will be used in Linux 3.10 when the
ARM SPEAr platform starts using the generic DMA DT binding.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2013-03-30 04:34:07 +05:30
Rafael J. Wysocki
46a1f21a67 PNP: List Rafael Wysocki as a maintainer
Adam Belay's e-mail address in MAINTAINERS under PNP SUPPORT
is no longer valid, and I have in fact been maintaining that code in the
meantime, so list myself as a maintainer of it
along with Bjorn and remove Adam's entry.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-29 15:28:33 -07:00
Linus Torvalds
b92eded4b7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fix from Sage Weil:
 "This fixes a regression introduced during the last merge window when
  mapping non-existent images."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: don't zero-fill non-image object requests
2013-03-29 11:47:43 -07:00
Alex Elder
6e2a4505db rbd: don't zero-fill non-image object requests
A result of ENOENT from a read request for an object that's part of
an rbd image indicates that there is a hole in that portion of the
image.  Similarly, a short read for such an object indicates that
the read should be interpreted as a full read, with
zeros filling out the end of the request.

This behavior is not correct for objects that are not backing rbd
image data.  Currently rbd_img_obj_request_callback() assumes it
should be done for all objects.

Change rbd_img_obj_request_callback() so it only does this zeroing
for image objects.  Encapsulate that special handling in its own
function.  Add an assertion that the image object request is a bio
request, since we assume that (and we currently don't support any
other types).
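
The zero-fill rule described above can be sketched in user-space C as follows.
This is a minimal illustration, not the rbd driver code: struct obj_read,
image_read_fixup() and read_callback() are hypothetical names, and reporting
the full length as the result is a simplification of how the driver tracks
transferred bytes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an object request; not the kernel's struct. */
struct obj_read {
	int is_image_object;	/* nonzero if the object backs rbd image data */
	char *buf;		/* destination buffer */
	size_t length;		/* bytes requested */
	long result;		/* negative errno, or number of bytes read */
};

/* Zero-fill holes and short reads. Only valid for image objects. */
static void image_read_fixup(struct obj_read *req)
{
	assert(req->is_image_object);

	if (req->result == -ENOENT) {
		/* Missing object: the whole range is a hole. */
		memset(req->buf, 0, req->length);
		req->result = (long)req->length;
	} else if (req->result >= 0 && (size_t)req->result < req->length) {
		/* Short read: zero the tail and report a full read. */
		memset(req->buf + req->result, 0,
		       req->length - (size_t)req->result);
		req->result = (long)req->length;
	}
}

/* Completion path: the special handling applies to image objects only. */
static void read_callback(struct obj_read *req)
{
	if (req->is_image_object)
		image_read_fixup(req);
	/* Non-image objects keep their result as-is, including -ENOENT. */
}
```

With this split, a caller reading a non-image object still sees -ENOENT and
can treat it as a real error instead of receiving a buffer of zeros.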

This resolves a problem identified here:
    http://tracker.ceph.com/issues/4559

The regression was introduced by bf0d5f503d.

Reported-by: Dan van der Ster <dan@vanderster.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-off-by: Sage Weil <sage@inktank.com>
2013-03-29 11:32:07 -07:00
Linus Torvalds
3615db41c4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "We've had a busy two weeks of bug fixing.  The biggest patches in here
  are some long standing early-enospc problems (Josef) and a very old
  race where compression and mmap combine forces to lose writes (me).
  I'm fairly sure the mmap bug goes all the way back to the introduction
  of the compression code, which is proof that fsx doesn't trigger every
  possible mmap corner after all.

  I'm sure you'll notice one of these is from this morning, it's a small
  and isolated use-after-free fix in our scrub error reporting.  I
  double checked it here."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: don't drop path when printing out tree errors in scrub
  Btrfs: fix wrong return value of btrfs_lookup_csum()
  Btrfs: fix wrong reservation of csums
  Btrfs: fix double free in the btrfs_qgroup_account_ref()
  Btrfs: limit the global reserve to 512mb
  Btrfs: hold the ordered operations mutex when waiting on ordered extents
  Btrfs: fix space accounting for unlink and rename
  Btrfs: fix space leak when we fail to reserve metadata space
  Btrfs: fix EIO from btrfs send in is_extent_unchanged for punched holes
  Btrfs: fix race between mmap writes and compression
  Btrfs: fix memory leak in btrfs_create_tree()
  Btrfs: fix locking on ROOT_REPLACE operations in tree mod log
  Btrfs: fix missing qgroup reservation before fallocating
  Btrfs: handle a bogus chunk tree nicely
  Btrfs: update to use fs_state bit
2013-03-29 11:13:25 -07:00
Len Brown
ed176886b6 ia64 idle: delete stale (*idle)() function pointer
Commit 3e7fc708eb ("ia64 idle: delete pm_idle") in 3.9-rc1 didn't
finish the job, leaving an un-initialized reference to (*idle)().

[ Haven't seen a crash from this - but seems like we are just being
  lucky that "idle" is zero so it does get initialized before we jump to
  randomland  - Len ]

Reported-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-29 11:12:25 -07:00
Linus Torvalds
67e17c1100 Merge branch 'for-curr' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull arc architecture fixes from Vineet Gupta:
 "This includes fix for a serious bug in DMA mapping API, make
  allyesconfig wreckage, removal of bogus email-list placeholder in
  MAINTAINERS, a typo in ptrace helper code and last remaining changes
  for syscall ABI v3 which we are finally starting to transition to
  internally.

  The request is later than I intended - but I was held up with
  debugging a timer link list corruption, for which a proposed fix to
  generic timer code was sent out to lkml/tglx earlier today."

* 'for-curr' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: Fix the typo in event identifier flags used by ptrace
  arc: fix dma_address assignment during dma_map_sg()
  ARC: Remove SET_PERSONALITY (tracks cross-arch change)
  ARC: ABIv3: fork/vfork wrappers not needed in "no-legacy-syscall" ABI
  ARC: ABIv3: Print the correct ABI ver
  ARC: make allyesconfig build breakages
  ARC: MAINTAINERS update for ARC
2013-03-29 11:00:43 -07:00
Josef Bacik
d8fe29e9de Btrfs: don't drop path when printing out tree errors in scrub
A user reported a panic where we were panicking somewhere in
tree_backref_for_extent from scrub_print_warning.  He only captured the trace
but looking at scrub_print_warning we drop the path right before we mess with
the extent buffer to print out a bunch of stuff, which isn't right.  So fix this
by dropping the path after we use the eb if we need to.  Thanks,

Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-29 10:18:59 -04:00
Nicholas Bellinger
f85eda8d75 target: Fix RESERVATION_CONFLICT status regression for iscsi-target special case
This patch fixes a regression introduced in v3.8-rc1 code where a failed
target_check_reservation() check in target_setup_cmd_from_cdb() was causing
an incorrect SAM_STAT_GOOD status to be returned during a WRITE operation
performed by an unregistered / unreserved iscsi initiator port.

This regression only affects iscsi-target due to a special case check
for TCM_RESERVATION_CONFLICT within iscsi_target_erl1.c:iscsit_execute_cmd(),
and was still correctly disallowing WRITE commands from backend submission
for unregistered / unreserved initiator ports, while returning the incorrect
SAM_STAT_GOOD status due to the missing SAM_STAT_RESERVATION_CONFLICT
assignment.

This regression was first introduced with:

commit de103c93af
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Nov 6 12:24:09 2012 -0800

    target: pass sense_reason as a return value

Go ahead and re-add the missing SAM_STAT_RESERVATION_CONFLICT assignment
during a target_check_reservation() failure, so that iscsi-target code
sends the correct SCSI status.

All other fabrics using target_submit_cmd_*() with a RESERVATION_CONFLICT
call to transport_generic_request_failure() are not affected by this bug.

Reported-by: Jeff Leung <jleung@curriegrad2004.ca>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-03-28 23:42:47 -07:00
Nicholas Bellinger
5dade71050 tcm_vhost: Avoid VIRTIO_RING_F_EVENT_IDX feature bit
This patch adds a VHOST_SCSI_FEATURES mask minus VIRTIO_RING_F_EVENT_IDX
so that vhost-scsi-pci userspace will strip this feature bit once
GET_FEATURES reports it as being unsupported on the host.

This is to avoid a bug where ->handle_kicks() are missed when EVENT_IDX
is enabled by default in userspace code.
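
A rough user-space sketch of the masking idea, assuming the usual bit
positions for the two ring features (28 for indirect descriptors, 29 for
EVENT_IDX). All SKETCH_* and sketch_* names are hypothetical and the base
feature set is illustrative; this is not the tcm_vhost code.

```c
#include <stdint.h>

/* Assumed bit positions for the two ring features used in this sketch. */
#define SKETCH_RING_F_INDIRECT_DESC	28
#define SKETCH_RING_F_EVENT_IDX		29

/* Illustrative "generic" feature set that includes EVENT_IDX. */
#define SKETCH_BASE_FEATURES \
	((1ULL << SKETCH_RING_F_INDIRECT_DESC) | \
	 (1ULL << SKETCH_RING_F_EVENT_IDX))

/* Device-specific set: the generic features minus EVENT_IDX. */
#define SKETCH_SCSI_FEATURES \
	(SKETCH_BASE_FEATURES & ~(1ULL << SKETCH_RING_F_EVENT_IDX))

/* What the host reports when asked for its supported features. */
static uint64_t sketch_get_features(void)
{
	return SKETCH_SCSI_FEATURES;
}

/* Userspace keeps only the bits that the host reported as supported. */
static uint64_t sketch_negotiate(uint64_t wanted)
{
	return wanted & sketch_get_features();
}
```

Because the host never advertises the bit, a userspace that enables EVENT_IDX
by default ends up with it cleared after negotiation.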

(mst: Rename to VHOST_SCSI_FEATURES + add comment)

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Asias He <asias@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-03-28 23:42:47 -07:00
Michel Lespinasse
09a9f1d278 Revert "mm: introduce VM_POPULATE flag to better deal with racy userspace programs"
This reverts commit 1869305009 ("mm: introduce VM_POPULATE flag to
better deal with racy userspace programs").

VM_POPULATE only has any effect when userspace plays racy games with
vmas by trying to unmap and remap memory regions that mmap or mlock are
operating on.

Also, the only effect of VM_POPULATE when userspace plays such games is
that it avoids populating new memory regions that get remapped into the
address range that was being operated on by the original mmap or mlock
calls.

Let's remove VM_POPULATE as there isn't any strong argument to mandate a
new vm_flag.

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-28 17:45:51 -07:00
Linus Torvalds
0776ce03b1 USB fixes for 3.9-rc4
Here are some USB fixes to resolve issues reported recently, as well as a new
 device id for the ftdi_sio driver.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlFUaA0ACgkQMUfUDdst+yk0AACfe9iitBiGERSO4NsyIvypoJ1q
 vOgAoKek8fiPmTKrZl18n79oX28qU9x2
 =Oee3
 -----END PGP SIGNATURE-----

Merge tag 'usb-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg Kroah-Hartman:
 "Here are some USB fixes to resolve issues reported recently, as well
  as a new device id for the ftdi_sio driver."

* tag 'usb-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: ftdi_sio: Add support for Mitsubishi FX-USB-AW/-BD
  usb: Fix compile error by selecting USB_OTG_UTILS
  USB: serial: fix hang when opening port
  USB: EHCI: fix bug in iTD/siTD DMA pool allocation
  xhci: Don't warn on empty ring for suspended devices.
  usb: xhci: Fix TRB transfer length macro used for Event TRB.
  usb/acpi: binding xhci root hub usb port with ACPI
  usb: add find_raw_port_number callback to struct hc_driver()
  usb: xhci: fix build warning
2013-03-28 15:54:25 -07:00
Linus Torvalds
045ecc26a0 TTY/serial fixes for 3.9-rc4
Here are some tty/serial driver fixes for 3.9.
 
 The big thing here is the fix for the huge mess we caused renaming the 8250
 driver accidentally in the 3.7 kernel release, without realizing that there
 were users of the module options that suddenly broke.  This is now resolved,
 and, to top the injury off, we have a backwards-compatible option for those
 users who got used to the new name since 3.7.  Ugh, sorry about that.
 
 Other than that, some other minor fixes for issues that have been reported by
 users.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlFUdtEACgkQMUfUDdst+ylklACgx47C3qGEXaPiu6yh3W/D/x97
 N1kAoMICQC6K1O1Nge3J5uqbNmYDP6Bv
 =wO2Z
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull TTY/serial fixes from Greg Kroah-Hartman:
 "Here are some tty/serial driver fixes for 3.9.

  The big thing here is the fix for the huge mess we caused renaming the
  8250 driver accidentally in the 3.7 kernel release, without realizing
  that there were users of the module options that suddenly broke.  This
  is now resolved, and, to top the injury off, we have a backwards-
  compatible option for those users who got used to the new name since
  3.7.  Ugh, sorry about that.

  Other than that, some other minor fixes for issues that have been
  reported by users."

* tag 'tty-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Xilinx: ARM: UART: clear pending irqs before enabling irqs
  TTY: 8250, deprecated 8250_core.* options
  TTY: 8250, revert module name change
  serial: 8250_pci: Add WCH CH352 quirk to avoid Xscale detection
  tty: atmel_serial_probe(): index of atmel_ports[] fix
2013-03-28 15:53:33 -07:00
Linus Torvalds
865752ed0a Staging driver fixes for 3.9-rc4
Here are two tiny staging driver fixes to resolve issues that have been
 reported.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlFUdV8ACgkQMUfUDdst+ylT7wCg06/tBMtv81D43yyiBBMKo3qy
 86gAoNBbunIiInbIOELTxOOdrwZzpANj
 =AVoV
 -----END PGP SIGNATURE-----

Merge tag 'staging-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg Kroah-Hartman:
 "Here are two tiny staging driver fixes to resolve issues that have
  been reported."

* tag 'staging-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: comedi: s626: fix continuous acquisition
  staging: zcache: fix typo "64_BIT"
2013-03-28 15:52:54 -07:00
Linus Torvalds
97f084b8e6 sysfs fixes for 3.9-rc4
Here are two fixes for sysfs that resolve issues that have been found by the
 Trinity fuzz tool, causing oopses in sysfs.  They both have been in linux-next
 for a while to ensure that they do not cause any other problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlFUdHUACgkQMUfUDdst+ykk+ACfWz6U/DW97ibFusDj+Sys1pEt
 essAn15ZFy/pT5myhCvxqVH0MHrIftup
 =BM+Q
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull sysfs fixes from Greg Kroah-Hartman:
 "Here are two fixes for sysfs that resolve issues that have been found
  by the Trinity fuzz tool, causing oopses in sysfs.  They both have
  been in linux-next for a while to ensure that they do not cause any
  other problems."

* tag 'driver-core-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  sysfs: handle failure path correctly for readdir()
  sysfs: fix race between readdir and lseek
2013-03-28 15:52:14 -07:00
Linus Torvalds
1b6a4db220 char/misc driver fixes for 3.9-rc4
Here are some small char/misc driver fixes that resolve issues recently
 reported against the 3.9-rc kernels.  All have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlFUdOAACgkQMUfUDdst+ykb9QCeKb0WfCxqwPFZDCAbIiyX9AyA
 1OMAoJU7WJo1/wpfyyTLr6RuN8E0X0p/
 =1+Ic
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg Kroah-Hartman:
 "Here are some small char/misc driver fixes that resolve issues
  recently reported against the 3.9-rc kernels.  All have been in
  linux-next for a while."

* tag 'char-misc-3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  VMCI: Fix process-to-process DRGAMs.
  mei: ME hardware reset needs to be synchronized
  mei: add mei_stop function to stop mei device
  extcon: max77693: Initialize register of MUIC device to bring up it without platform data
  extcon: max77693: Fix bug of wrong pointer when platform data is not used
  extcon: max8997: Check the pointer of platform data to protect null pointer error
2013-03-28 15:51:33 -07:00
Linus Torvalds
dfca53fb16 ACPI and power management fixes for 3.9-rc5
- Fix for a recent cpufreq regression related to acpi-cpufreq and
   suspend/resume from Viresh Kumar.
 
 - cpufreq stats reference counting fix from Viresh Kumar.
 
 - intel_pstate driver fixes from Dirk Brandewie and
   Konrad Rzeszutek Wilk.
 
 - New ACPI suspend blacklist entry for Sony Vaio VGN-FW21M from
   Fabio Valentini.
 
 - ACPI Platform Error Interface (APEI) fix from Chen Gong.
 
 - PCI root bridge hotplug locking fix from Yinghai Lu.
 
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJRVETOAAoJEKhOf7ml8uNs30kP/3GsKWacHsaIPdhIiHQC3f91
 HMLabrW7NE7ldrOoXzj1lTHsIc1TQHm722vyI+aF061HErfkF8Jkdi5rkIai8VMq
 IJXe4CtwuuCi0SeKQsV9ymiQanTrgsP/AlGV5x/KM/As8dvAVW/1+Ln/gXAnH0IJ
 /Onqf3eA4NBw/1Hjg7AGHGeCmOlDHvcetHF7eX4MaiYZHEwuy/a7jswH4aNOjwgx
 GZtbrnwUO6OtDKv6ie//1EbP753VrkHDtK3jzIy2lUA5YyLmr0XOTvy4uQh2n/r7
 tVTqsVoNZNA4En0YUspfsWwBruUic3ra9qVTrJqn7Fzymyr+TgyCQQzSUGrOGy2a
 wY0vwMAwm1dMwAsZWPhnui6aqvu0bbg0u7sxCZQs8WapdtjxPdD7iIhRk2YU4wOZ
 omtejW0thUIwEmHWgBPo9rFvfZmxy9hb044UfhkLI9xBmuTVrDb/HqeVPA767ZoO
 k7IVg1DG4Ye6xboCIILfluoUAsc3DvkHpCIvWVujK3pF5j/M9ptt3d8eXDFIzmWD
 J6tm9ARkQoUPRAs6751cG1N0nP++ZlErYseU/h6eXoC0rkeC/WbGyxIumii4xJhg
 Gs6GGeM8OgQ/7Fat68kA2Z7jriY+MTteLbq1Sl3PBlfdURaceOXkTIVrxXo33Itq
 jQiEKa1CbJDi6OBKog8K
 =0bjZ
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael J Wysocki:

 - Fix for a recent cpufreq regression related to acpi-cpufreq and
   suspend/resume from Viresh Kumar.

 - cpufreq stats reference counting fix from Viresh Kumar.

 - intel_pstate driver fixes from Dirk Brandewie and Konrad Rzeszutek
   Wilk.

 - New ACPI suspend blacklist entry for Sony Vaio VGN-FW21M from Fabio
   Valentini.

 - ACPI Platform Error Interface (APEI) fix from Chen Gong.

 - PCI root bridge hotplug locking fix from Yinghai Lu.

* tag 'pm+acpi-3.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PCI / ACPI: hold acpi_scan_lock during root bus hotplug
  ACPI / APEI: fix error status check condition for CPER
  ACPI / PM: fix suspend and resume on Sony Vaio VGN-FW21M
  cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init()
  cpufreq: stats: do cpufreq_cpu_put() corresponding to cpufreq_cpu_get()
  intel-pstate: Use #defines instead of hard-coded values.
  cpufreq / intel_pstate: Fix calculation of current frequency
  cpufreq / intel_pstate: Add function to check that all MSRs are valid
2013-03-28 13:47:31 -07:00
Linus Torvalds
8b1e54c48f Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This removes IPsec ESN support from the talitos/caam drivers since
  they were implemented incorrectly, causing interoperability problems
  if ESN is used with them."

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  Revert "crypto: caam - add IPsec ESN support"
  Revert "crypto: talitos - add IPsec ESN support"
2013-03-28 13:46:20 -07:00