Commit Graph

321327 Commits

Author SHA1 Message Date
Tejun Heo
203b42f731 workqueue: make deferrable delayed_work initializer names consistent
Initializers for deferrable delayed_work are inconsistently named.

* __DEFERRED_WORK_INITIALIZER()
* DECLARE_DEFERRED_WORK()
* INIT_DELAYED_WORK_DEFERRABLE()

Rename them to

* __DEFERRABLE_WORK_INITIALIZER()
* DECLARE_DEFERRABLE_WORK()
* INIT_DEFERRABLE_WORK()

This patch doesn't cause any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-21 13:18:23 -07:00
Tejun Heo
ee64e7f697 workqueue: cosmetic whitespace updates for macro definitions
Consistently use the last tab position for '\' line continuation in
complex macro definitions.  This is to help the following patches.

This patch is cosmetic.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-21 13:18:23 -07:00
Tejun Heo
56e6a08154 Merge branch 'timers/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-3.7
2012-08-21 12:31:12 -07:00
Tejun Heo
c5f66e99b7 timer: Implement TIMER_IRQSAFE
Timer internals are protected with irq-safe locks but timer execution
isn't, so a timer being dequeued for execution and its execution
aren't atomic against IRQs.  This makes it impossible to wait for its
completion from IRQ handlers and difficult to shoot down a timer from
IRQ handlers.

This has caused problems for the delayed_work interface.  Because
there's no way to reliably shoot down delayed_work->timer from IRQ
handlers, __cancel_delayed_work() can't share the logic to steal the
target delayed_work with cancel_delayed_work_sync(), and can only
steal delayed_works which are queued on timer.  Similarly, the
pending mod_delayed_work() can't be used from IRQ handlers.

This patch adds a new timer flag TIMER_IRQSAFE, which causes the timer
to be executed without enabling IRQ after dequeueing such that its
dequeueing and execution are atomic against IRQ handlers.

This makes it safe to wait for the timer's completion from IRQ
handlers, for example, using del_timer_sync().  It can never be
executing on the local CPU and if executing on other CPUs it won't be
interrupted until done.

This will enable simplifying delayed_work cancel/mod interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1344449428-24962-5-git-send-email-tj@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-21 16:28:31 +02:00
Tejun Heo
fc683995a6 timer: Clean up timer initializers
Over time, timer initializers became messy with unnecessarily
duplicated code which is inconsistently spread across timer.h and
timer.c.

This patch cleans up timer initializers.

* timer.c::__init_timer() is renamed to do_init_timer().

* __TIMER_INITIALIZER() added.  It takes @flags and all initializers
  are wrappers around it.

* init_timer[_on_stack]_key() now take @flags.

* __init_timer[_on_stack]() added.  They take @flags and all init
  macros are wrappers around them.

* __setup_timer[_on_stack]() added.  It uses __init_timer() and takes
  @flags.  All setup macros are wrappers around the two.

Note that this patch doesn't add missing init/setup combinations -
e.g. init_timer_deferrable_on_stack().  Adding missing ones is
trivial.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1344449428-24962-4-git-send-email-tj@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-21 16:28:30 +02:00
Tejun Heo
5a9af38d05 timer: Relocate declarations of init_timer_on_stack_key()
init_timer_on_stack_key() is used by init macro definitions.  Move
init_timer_on_stack_key() and destroy_timer_on_stack() declarations
above init macro defs.  This will make the next init cleanup patch
easier to read.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1344449428-24962-3-git-send-email-tj@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-21 16:28:30 +02:00
Tejun Heo
e52b1db37b timer: Generalize timer->base flags handling
To prepare for addition of another flag, generalize timer->base flags
handling.

* Rename from TBASE_*_FLAG to TIMER_* and make them LU constants.

* Define and use TIMER_FLAG_MASK for flags masking so that multiple
  flags can be handled correctly.

* Don't dereference timer->base directly even if
  !tbase_get_deferrable().  Both such places are already passed
  @base, so use it instead.

* Make sure tvec_base's alignment is large enough for timer->base
  flags using BUILD_BUG_ON().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1344449428-24962-2-git-send-email-tj@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-08-21 16:28:30 +02:00
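The flag handling described in e52b1db37b above can be illustrated with a small userspace sketch. It is not the kernel's code: the flag values, the struct tvec_base stand-in, and the helper tbase_pack() are assumptions made for illustration, and the second flag only represents the kind of addition the commit prepares for. The idea shown is that flags live in the low bits of an aligned pointer, a single mask separates them from the pointer, and a compile-time check plays the role of BUILD_BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/* Flag values are assumptions for this sketch, not the kernel's definitions. */
#define TIMER_DEFERRABLE 0x1LU
#define TIMER_IRQSAFE    0x2LU
#define TIMER_FLAG_MASK  0x3LU

/* Stand-in for tvec_base; aligned so the low two pointer bits are zero. */
struct tvec_base {
	_Alignas(4) int placeholder;
};

/* Compile-time counterpart of the BUILD_BUG_ON() alignment check. */
_Static_assert(_Alignof(struct tvec_base) > TIMER_FLAG_MASK,
	       "tvec_base alignment too small for the flag bits");

/* Hypothetical helper: combine a base pointer with flag bits. */
uintptr_t tbase_pack(struct tvec_base *base, unsigned long flags)
{
	assert((flags & ~TIMER_FLAG_MASK) == 0);
	return (uintptr_t)base | flags;
}

/* Mask off every flag bit to recover the pointer. */
struct tvec_base *tbase_get_base(uintptr_t packed)
{
	return (struct tvec_base *)(packed & ~(uintptr_t)TIMER_FLAG_MASK);
}

/* Extract all flag bits at once. */
unsigned long tbase_get_flags(uintptr_t packed)
{
	return packed & TIMER_FLAG_MASK;
}
```

Masking with one TIMER_FLAG_MASK rather than testing a single bit is what lets several flags coexist without each accessor having to know about the others.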
Tejun Heo
3b07e9ca26 workqueue: deprecate system_nrt[_freezable]_wq
system_nrt[_freezable]_wq are now spurious.  Mark them deprecated and
convert all users to system[_freezable]_wq.

If you're cc'd and wondering what's going on: Now all workqueues are
non-reentrant, so there's no reason to use system_nrt[_freezable]_wq.
Please use system[_freezable]_wq instead.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Airlie <airlied@linux.ie>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
2012-08-20 14:51:24 -07:00
Tejun Heo
43829731dd workqueue: deprecate flush[_delayed]_work_sync()
flush[_delayed]_work_sync() are now spurious.  Mark them deprecated
and convert all users to flush[_delayed]_work().

If you're cc'd and wondering what's going on: Now all workqueues are
non-reentrant and the regular flushes guarantee that the work item is
not pending or running on any CPU on return, so there's no reason to
use the sync flushes at all and they're going away.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mattia Dongili <malattia@linux.it>
Cc: Kent Yoder <key@linux.vnet.ibm.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Bryan Wu <bryan.wu@canonical.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: Sangbeom Kim <sbkim73@samsung.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Avi Kivity <avi@redhat.com>
2012-08-20 14:51:24 -07:00
Tejun Heo
ae930e0f4e workqueue: gut system_nrt[_freezable]_wq()
Now that all workqueues are non-reentrant, system[_freezable]_wq() are
equivalent to system_nrt[_freezable]_wq().  Replace the latter with
wrappers around system[_freezable]_wq().  The wrapping goes through
inline functions so that __deprecated can be added easily.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-20 14:51:23 -07:00
Tejun Heo
606a5020b9 workqueue: gut flush[_delayed]_work_sync()
Now that all workqueues are non-reentrant, flush[_delayed]_work_sync()
are equivalent to flush[_delayed]_work().  Drop the separate
implementation and make them thin wrappers around
flush[_delayed]_work().

* start_flush_work() no longer takes @wait_executing as the only
  remaining user - flush_work() - always sets it to %true.

* __cancel_work_timer() uses flush_work() instead of wait_on_work().

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-20 14:51:23 -07:00
Tejun Heo
dbf2576e37 workqueue: make all workqueues non-reentrant
By default, each per-cpu part of a bound workqueue operates separately
and a work item may be executing concurrently on different CPUs.  The
behavior avoids some cross-cpu traffic but leads to subtle weirdities
and not-so-subtle contortions in the API.

* There's no sane usefulness in allowing a single work item to be
  executed concurrently on multiple CPUs.  People just get the
  behavior unintentionally and get surprised after learning about it.
  Most either explicitly synchronize or use non-reentrant/ordered
  workqueue but this is error-prone.

* flush_work() can't wait for multiple instances of the same work item
  on different CPUs.  If a work item is executing on cpu0 and then
  queued on cpu1, flush_work() can only wait for the one on cpu1.

  Unfortunately, work items can easily cross CPU boundaries
  unintentionally when the queueing thread gets migrated.  This means
  that if multiple queuers compete, flush_work() can't even guarantee
  that the instance queued right before it is finished before
  returning.

* flush_work_sync() was added to work around some of the deficiencies
  of flush_work().  In addition to the usual flushing, it ensures that
  all currently executing instances are finished before returning.
  This operation is expensive as it has to walk all CPUs and at the
  same time fails to address competing queuer case.

  Incorrectly using flush_work() when flush_work_sync() is necessary
  is an easy error to make and can lead to bugs which are difficult to
  reproduce.

* Similar problems exist for flush_delayed_work[_sync]().

Other than the cross-cpu access concern, there's no benefit in
allowing parallel execution and it's plain silly to have this level of
contortion for workqueue which is widely used from core code to
extremely obscure drivers.

This patch makes all workqueues non-reentrant.  If a work item is
executing on a different CPU when queueing is requested, it is always
queued to that CPU.  This guarantees that any given work item can be
executing on one CPU at maximum and if a work item is queued and
executing, both are on the same CPU.

The only behavior change which may affect workqueue users negatively
is that non-reentrancy overrides the affinity specified by
queue_work_on().  On a reentrant workqueue, the affinity specified by
queue_work_on() is always followed.  Now, if the work item is
executing on one of the CPUs, the work item will be queued there
regardless of the requested affinity.  I've reviewed all workqueue
users which request explicit affinity, and, fortunately, none seems to
be crazy enough to exploit parallel execution of the same work item.

This adds an additional busy_hash lookup if the work item was
previously queued on a different CPU.  This shouldn't be noticeable
under any sane workload.  Work item queueing isn't a very
high-frequency operation and they don't jump across CPUs all the time.
In a micro benchmark to exaggerate this difference - measuring the
time it takes for two work items to repeatedly jump between two CPUs a
number (10M) of times with busy_hash table densely populated, the
difference was around 3%.

While the overhead is measurable, it is only visible in pathological
cases and the difference isn't huge.  This change brings much needed
sanity to workqueue and makes its behavior consistent with timer.  I
think this is the right tradeoff to make.

This enables significant simplification of workqueue API.
Simplification patches will follow.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-20 14:51:23 -07:00
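The queueing rule introduced by dbf2576e37 above can be reduced to a one-line decision. The following is a minimal userspace sketch, not the kernel implementation: struct work_sketch, NO_CPU, and pick_target_cpu() are hypothetical names, and the busy_hash lookup that finds the executing CPU is replaced by a plain field.

```c
#include <assert.h>

#define NO_CPU (-1)

/* Hypothetical model of a work item; the kernel finds this via busy_hash. */
struct work_sketch {
	int running_cpu;	/* CPU currently executing the item, or NO_CPU */
};

/*
 * Non-reentrant rule: if the item is executing somewhere, queue it there,
 * overriding the requested affinity.  Otherwise honor the request.
 */
int pick_target_cpu(const struct work_sketch *work, int requested_cpu)
{
	assert(requested_cpu >= 0);
	if (work->running_cpu != NO_CPU)
		return work->running_cpu;
	return requested_cpu;
}
```

With this rule a queued instance and an executing instance always share a CPU, which is why a flush only has to look in one place.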
Valentin Ilie
044c782ce3 workqueue: fix checkpatch issues
Fixed some checkpatch warnings.

tj: adapted to wq/for-3.7 and massaged pr_xxx() format strings a bit.

Signed-off-by: Valentin Ilie <valentin.ilie@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <1345326762-21747-1-git-send-email-valentin.ilie@gmail.com>
2012-08-20 13:37:07 -07:00
Joonsoo Kim
7635d2fd7f workqueue: use system_highpri_wq for unbind_work
To speed up CPU down processing, use system_highpri_wq.
Workers on it have a higher scheduling priority than those on system_wq
and it is not contended by other normal work items on this CPU, so work
on it is processed faster than on system_wq.

tj: CPU up/downs care quite a bit about latency these days.  This
    shouldn't hurt anything and makes sense.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-16 14:21:16 -07:00
Joonsoo Kim
e2b6a6d570 workqueue: use system_highpri_wq for highpri workers in rebind_workers()
In rebind_workers(), we insert a work item to rebind busy workers to the CPU.
Currently, only system_wq is used for this.  This can lead to an
error situation as there is a mismatch between cwq->pool and worker->pool.

To prevent this, use system_highpri_wq for highpri workers so that
the two match.  This patch implements it.

tj: Rephrased comment a bit.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-16 14:21:15 -07:00
Joonsoo Kim
1aabe902ca workqueue: introduce system_highpri_wq
Commit 3270476a6c ('workqueue: reimplement
WQ_HIGHPRI using a separate worker_pool') introduced a separate worker pool
for HIGHPRI.  When we handle busy workers for a gcwq, each can be a normal
worker or a highpri worker.  However, rebind_workers() doesn't consider this
difference and uses system_wq even for highpri workers.  This creates a
mismatch between cwq->pool and worker->pool.

This doesn't cause an error in the current implementation, but it could in
the future.  Introduce system_highpri_wq so that the proper cwq can be used
for highpri workers in rebind_workers().  The following patch fixes this
issue properly.

tj: Even apart from rebinding, having system_highpri_wq generally
    makes sense.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-16 14:21:15 -07:00
Joonsoo Kim
e42986de48 workqueue: change value of lcpu in __queue_delayed_work_on()
We assign the CPU id to the work struct's data field in __queue_delayed_work_on().
In the current implementation, when a work item comes in for the first time,
the currently running CPU id is assigned.
If we call __queue_delayed_work_on() for CPU A while running on CPU B,
__queue_work() invoked from delayed_work_timer_fn() goes into
the following sub-optimal path in case of WQ_NON_REENTRANT.

	gcwq = get_gcwq(cpu);
	if (wq->flags & WQ_NON_REENTRANT &&
		(last_gcwq = get_work_gcwq(work)) && last_gcwq != gcwq) {

Change lcpu to @cpu, and set lcpu to the local CPU if it is WORK_CPU_UNBOUND.
This is sufficient to avoid the sub-optimal path.

tj: Slightly rephrased the comment.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-16 14:21:15 -07:00
Joonsoo Kim
b75cac9368 workqueue: correct req_cpu in trace_workqueue_queue_work()
When tracing workqueue_queue_work(), the requested CPU is recorded.
However, if !(@wq->flags & WQ_UNBOUND) and @cpu is WORK_CPU_UNBOUND,
the requested CPU is changed to the local CPU.
In case of @wq->flags & WQ_UNBOUND, this change does not occur,
so it is reasonable to correct it.

Use temporary local variable for storing requested cpu.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-16 14:21:15 -07:00
Joonsoo Kim
330dad5b9c workqueue: use enum value to set array size of pools in gcwq
Commit 3270476a6c ('workqueue: reimplement
WQ_HIGHPRI using a separate worker_pool') introduced a separate worker_pool
for HIGHPRI.  Although there is an NR_WORKER_POOLS enum value which represents
the number of pools, the definition of the worker_pool array in gcwq doesn't
use it.  Using it makes the code more robust and prevents future mistakes,
so change the code to use this enum value.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-16 14:21:15 -07:00
Tejun Heo
23657bb192 workqueue: add missing wmb() in clear_work_data()
Any operation which clears PENDING should be preceded by a wmb to
guarantee that the next PENDING owner sees all the changes made before
PENDING release.

There are only two places where PENDING is cleared -
set_work_cpu_and_clear_pending() and clear_work_data().  The caller of
the former already does smp_wmb() but the latter doesn't have any.

Move the wmb above set_work_cpu_and_clear_pending() into it and add
one to clear_work_data().

There hasn't been any report related to this issue, and, given how
clear_work_data() is used, it is extremely unlikely to have caused any
actual problems on any architecture.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
2012-08-13 17:08:19 -07:00
Tejun Heo
1265057fa0 workqueue: fix CPU binding of flush_delayed_work[_sync]()
delayed_work encodes the workqueue to use and the last CPU in
delayed_work->work.data while it's on timer.  The target CPU is
implicitly recorded as the CPU the timer is queued on and
delayed_work_timer_fn() queues delayed_work->work to the CPU it is
running on.

Unfortunately, this leaves flush_delayed_work[_sync]() no way to find
out which CPU the delayed_work was queued for when they try to
re-queue after killing the timer.  Currently, it chooses the local CPU
flush is running on.  This can unexpectedly move a delayed_work queued
on a specific CPU to another CPU and lead to subtle errors.

There isn't much point in trying to save several bytes in struct
delayed_work, which is already close to a hundred bytes on 64bit with
all debug options turned off.  This patch adds delayed_work->cpu to
remember the CPU it's queued for.

Note that if the timer is migrated during CPU down, the work item
could be queued to the downed global_cwq after this change.  As a
detached global_cwq behaves like an unbound one, this doesn't change
much for the delayed_work.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2012-08-13 16:27:55 -07:00
Tejun Heo
41f63c5359 workqueue: use mod_delayed_work() instead of cancel + queue
Convert delayed_work users doing cancel_delayed_work() followed by
queue_delayed_work() to mod_delayed_work().

Most conversions are straight-forward.  Ones worth mentioning are,

* drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
  use mod_delayed_work() and cancel loop in
  edac_mc_reset_delay_period() is dropped.

* drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
  watchdog is active or not.  @fan_watchdog_active and related code
  dropped.

* drivers/power/charger-manager.c: Seemingly a lot of
  delayed_work_pending() abuse going on here.
  [delayed_]work_pending() are unsynchronized and racy when used like
  this.  I converted one instance in fullbatt_handler().  Please
  convert the rest so that it invokes workqueue APIs for the intended
  target state rather than trying to game work item pending state
  transitions.  e.g. if timer should be modified - call
  mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().

* drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
  simplified.  Note that round_jiffies() calls in this function are
  meaningless.  round_jiffies() work on absolute jiffies not delta
  delay used by delayed_work.

v2: Tomi pointed out that __cancel_delayed_work() users can't be
    safely converted to mod_delayed_work().  They could be calling it
    from irq context and if that happens while delayed_work_timer_fn()
    is running, it could deadlock.  __cancel_delayed_work() users are
    dropped.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Roland Dreier <roland@kernel.org>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
2012-08-13 16:27:37 -07:00
Tejun Heo
8376fe22c7 workqueue: implement mod_delayed_work[_on]()
Workqueue was lacking a mechanism to modify the timeout of an already
pending delayed_work.  delayed_work users have been working around
this using several methods - using an explicit timer + work item,
messing directly with delayed_work->timer, and canceling before
re-queueing, all of which are error-prone and/or ugly.

This patch implements mod_delayed_work[_on]() which behaves similarly
to mod_timer() - if the delayed_work is idle, it's queued with the
given delay; otherwise, its timeout is modified to the new value.
Zero @delay guarantees immediate execution.

v2: Updated to reflect try_to_grab_pending() changes.  Now safe to be
    called from bh context.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
2012-08-03 10:30:47 -07:00
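The semantics described in 8376fe22c7 above can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions: struct dwork_sketch, the three-state enum, and mod_delayed_work_sketch() are hypothetical, and the return convention (true when the item was already pending) is modeled on mod_timer() rather than taken from the commit text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states of a delayed work item in this model. */
enum dwork_state { DWORK_IDLE, DWORK_ON_TIMER, DWORK_QUEUED };

struct dwork_sketch {
	enum dwork_state state;
	unsigned long expires;	/* absolute time at which the item should run */
};

/*
 * Idle item: queue it with the given delay.  Pending item: move its
 * timeout to the new value.  Zero delay: queue for immediate execution.
 * Returns true if the item was already pending (assumed convention).
 */
bool mod_delayed_work_sketch(struct dwork_sketch *dw, unsigned long now,
			     unsigned long delay)
{
	assert(dw);
	bool was_pending = dw->state != DWORK_IDLE;

	if (delay == 0) {
		dw->state = DWORK_QUEUED;
		dw->expires = now;
	} else {
		dw->state = DWORK_ON_TIMER;
		dw->expires = now + delay;
	}
	return was_pending;
}
```

A single call replaces the cancel-then-requeue pattern, so the caller never observes a window in which the item is neither pending nor queued.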
Tejun Heo
bbb68dfaba workqueue: mark a work item being canceled as such
There can be two reasons try_to_grab_pending() can fail with -EAGAIN.
One is when someone else is queueing or dequeueing the work item.  With
the previous patches, it is guaranteed that PENDING and queued state
will soon agree making it safe to busy-retry in this case.

The other is if multiple __cancel_work_timer() invocations are racing
one another.  __cancel_work_timer() grabs PENDING and then waits for
running instances of the target work item on all CPUs while holding
PENDING and !queued.  try_to_grab_pending() invoked from another task
will keep returning -EAGAIN while the current owner is waiting.

Not distinguishing the two cases is okay because __cancel_work_timer()
is the only user of try_to_grab_pending() and it invokes
wait_on_work() whenever grabbing fails.  For the first case, busy
looping should be fine but wait_on_work() doesn't cause any critical
problem.  For the latter case, the new contender usually waits for the
same condition as the current owner, so no unnecessarily extended
busy-looping happens.  Combined, these make __cancel_work_timer()
technically correct even without irq protection while grabbing PENDING
or distinguishing the two different cases.

While the current code is technically correct, not distinguishing the
two cases makes it difficult to use try_to_grab_pending() for other
purposes than canceling because it's impossible to tell whether it's
safe to busy-retry grabbing.

This patch adds a mechanism to mark a work item being canceled.
try_to_grab_pending() now disables irq on success and returns -EAGAIN
to indicate that grabbing failed but PENDING and queued states are
gonna agree soon and it's safe to busy-loop.  It returns -ENOENT if
the work item is being canceled and it may stay PENDING && !queued for
arbitrary amount of time.

__cancel_work_timer() is modified to mark the work canceling with
WORK_OFFQ_CANCELING after grabbing PENDING, thus making
try_to_grab_pending() fail with -ENOENT instead of -EAGAIN.  Also, it
invokes wait_on_work() iff grabbing failed with -ENOENT.  This isn't
necessary for correctness but makes it consistent with other future
users of try_to_grab_pending().

v2: try_to_grab_pending() was testing preempt_count() to ensure that
    the caller has disabled preemption.  This triggers spuriously if
    !CONFIG_PREEMPT_COUNT.  Use preemptible() instead.  Reported by
    Fengguang Wu.

v3: Updated so that try_to_grab_pending() disables irq on success
    rather than requiring preemption disabled by the caller.  This
    makes busy-looping easier and will allow try_to_grab_pending() to
    be used from bh/irq contexts.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
2012-08-03 10:30:46 -07:00
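The two failure cases that bbb68dfaba above separates can be sketched as a small state check. This is a userspace model, not the kernel function: struct grab_sketch and try_to_grab_pending_sketch() are hypothetical, the canceling field stands in for WORK_OFFQ_CANCELING, and the non-negative return values (0 for an idle item, 1 for a stolen one) are an assumption added for completeness.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the bits that matter for grabbing PENDING. */
struct grab_sketch {
	bool pending;	/* PENDING bit is owned by someone */
	bool queued;	/* item sits on a worklist or timer */
	bool canceling;	/* stand-in for WORK_OFFQ_CANCELING */
};

/*
 * Returns 0 if the item was idle and PENDING was claimed, 1 if a queued
 * item was stolen, -ENOENT if a canceler owns it (may last arbitrarily
 * long), and -EAGAIN if the state is transient and busy-retry is safe.
 */
int try_to_grab_pending_sketch(struct grab_sketch *w)
{
	assert(w);
	if (!w->pending) {
		w->pending = true;
		return 0;
	}
	if (w->queued) {
		w->queued = false;	/* steal it; PENDING stays set */
		return 1;
	}
	if (w->canceling)
		return -ENOENT;
	return -EAGAIN;
}
```

A caller can then busy-loop on -EAGAIN and fall back to waiting only on -ENOENT, which is the distinction the commit message argues for.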
Tejun Heo
36e227d242 workqueue: reorganize try_to_grab_pending() and __cancel_timer_work()
* Use bool @is_dwork instead of @timer and let try_to_grab_pending()
  use to_delayed_work() to determine the delayed_work address.

* Move timer handling from __cancel_work_timer() to
  try_to_grab_pending().

* Make try_to_grab_pending() use -EAGAIN instead of -1 for
  busy-looping and drop the ret local variable.

* Add proper function comment to try_to_grab_pending().

This makes the code a bit easier to understand and will ease further
changes.  This patch doesn't make any functional change.

v2: Use @is_dwork instead of @timer.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:46 -07:00
Tejun Heo
7beb2edf44 workqueue: factor out __queue_delayed_work() from queue_delayed_work_on()
This is to prepare for mod_delayed_work[_on]() and doesn't cause any
functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:46 -07:00
Tejun Heo
b549007727 workqueue: introduce WORK_OFFQ_FLAG_*
Low WORK_STRUCT_FLAG_BITS bits of work_struct->data contain
WORK_STRUCT_FLAG_* and flush color.  If the work item is queued, the
rest point to the cpu_workqueue with WORK_STRUCT_CWQ set; otherwise,
WORK_STRUCT_CWQ is clear and the bits contain the last CPU number -
either a real CPU number or one of WORK_CPU_*.

Scheduled addition of mod_delayed_work[_on]() requires an additional
flag, which is used only while a work item is off queue.  There are
more than enough bits to represent off-queue CPU number on both 32 and
64bits.  This patch introduces WORK_OFFQ_FLAG_* which occupy the lower
part of the @work->data high bits while off queue.  This patch doesn't
define any actual OFFQ flag yet.

Off-queue CPU number is now shifted by WORK_OFFQ_CPU_SHIFT, which adds
the number of bits used by OFFQ flags to WORK_STRUCT_FLAG_SHIFT, to
make room for OFFQ flags.

To avoid shift width warning with large WORK_OFFQ_FLAG_BITS, ulong
cast is added to WORK_STRUCT_NO_CPU and, just in case, BUILD_BUG_ON()
to check that there are enough bits to accommodate off-queue CPU number
is added.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:46 -07:00
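The off-queue layout that b549007727 above describes can be sketched as plain bit packing. The constant names follow the commit message, but the bit widths, SKETCH_CPU_BITS, and the helper functions are assumptions chosen for illustration and are not the kernel's values.

```c
#include <assert.h>
#include <limits.h>

enum {
	WORK_STRUCT_FLAG_BITS	= 8,	/* assumed width of the low flag bits */
	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_FLAG_BITS,
	WORK_OFFQ_FLAG_BITS	= 1,	/* assumed number of off-queue flags */
	WORK_OFFQ_CPU_SHIFT	= WORK_OFFQ_FLAG_BASE + WORK_OFFQ_FLAG_BITS,
	SKETCH_CPU_BITS		= 12,	/* assumed bits needed for a CPU number */
};

/* Counterpart of the BUILD_BUG_ON(): the CPU number must still fit. */
_Static_assert(WORK_OFFQ_CPU_SHIFT + SKETCH_CPU_BITS <=
	       sizeof(unsigned long) * CHAR_BIT,
	       "not enough bits for the off-queue CPU number");

/* Hypothetical helper: build the data word of an off-queue work item. */
unsigned long offq_pack(unsigned long cpu, unsigned long offq_flags,
			unsigned long struct_flags)
{
	assert(cpu < (1UL << SKETCH_CPU_BITS));
	assert(offq_flags < (1UL << WORK_OFFQ_FLAG_BITS));
	assert(struct_flags < (1UL << WORK_STRUCT_FLAG_BITS));
	return (cpu << WORK_OFFQ_CPU_SHIFT) |
	       (offq_flags << WORK_OFFQ_FLAG_BASE) | struct_flags;
}

/* The CPU number sits above both flag fields. */
unsigned long offq_get_cpu(unsigned long data)
{
	return data >> WORK_OFFQ_CPU_SHIFT;
}

/* Off-queue flags occupy the lowest of the high bits. */
unsigned long offq_get_flags(unsigned long data)
{
	return (data >> WORK_OFFQ_FLAG_BASE) &
	       ((1UL << WORK_OFFQ_FLAG_BITS) - 1);
}
```

Deriving the CPU shift from the two flag widths means adding an off-queue flag only requires bumping WORK_OFFQ_FLAG_BITS.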
Tejun Heo
bf4ede014e workqueue: move try_to_grab_pending() upwards
try_to_grab_pending() will be used by to-be-implemented
mod_delayed_work[_on]().  Move try_to_grab_pending() and related
functions above queueing functions.

This patch only moves functions around.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:46 -07:00
Tejun Heo
715f130080 workqueue: fix zero @delay handling of queue_delayed_work_on()
If @delay is zero and the delayed_work is idle, queue_delayed_work()
queues it for immediate execution; however, queue_delayed_work_on()
lacks this logic and always goes through timer regardless of @delay.

This patch moves 0 @delay handling logic from queue_delayed_work() to
queue_delayed_work_on() so that both functions behave the same.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:46 -07:00
Tejun Heo
57469821fd workqueue: unify local CPU queueing handling
Queueing functions have been using different methods to determine the
local CPU.

* queue_work() superfluously uses get/put_cpu() to acquire and hold the
  local CPU across queue_work_on().

* delayed_work_timer_fn() uses smp_processor_id().

* queue_delayed_work() calls queue_delayed_work_on() with -1 @cpu
  which is interpreted as the local CPU.

* flush_delayed_work[_sync]() were using raw_smp_processor_id().

* __queue_work() interprets %WORK_CPU_UNBOUND as local CPU if the
  target workqueue is bound one but nobody uses this.

This patch converts all functions to uniformly use %WORK_CPU_UNBOUND
to indicate local CPU and use the local binding feature of
__queue_work().  unlikely() is dropped from %WORK_CPU_UNBOUND handling
in __queue_work().

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:45 -07:00
Tejun Heo
d8e794dfd5 workqueue: set delayed_work->timer function on initialization
delayed_work->timer.function is currently initialized during
queue_delayed_work_on().  Export delayed_work_timer_fn() and set
delayed_work timer function during delayed_work initialization
together with other fields.

This ensures the timer function is always valid on an initialized
delayed_work.  This is to help mod_delayed_work() implementation.

To detect delayed_work users which diddle with the internal timer,
trigger WARN if timer function doesn't match on queue.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:45 -07:00
Tejun Heo
8930caba3d workqueue: disable irq while manipulating PENDING
Queueing operations use WORK_STRUCT_PENDING_BIT to synchronize access
to the target work item.  They first try to claim the bit and proceed
with queueing only after that succeeds and there's a window between
PENDING being set and the actual queueing where the task can be
interrupted or preempted.

There's also a similar window in process_one_work() when clearing
PENDING.  A work item is dequeued, gcwq->lock is released and then
PENDING is cleared and the worker might get interrupted or preempted
between releasing gcwq->lock and clearing PENDING.

cancel[_delayed]_work_sync() tries to claim or steal PENDING.  The
function assumes that a work item with PENDING is either queued or in
the process of being [de]queued.  In the latter case, it busy-loops
until either the work item loses PENDING or is queued.  If canceling
coincides with the above described interrupts or preemptions, the
canceling task will busy-loop while the queueing or executing task is
preempted.

This patch keeps irq disabled across claiming PENDING and actual
queueing and moves PENDING clearing in process_one_work() inside
gcwq->lock so that busy looping from PENDING && !queued doesn't wait
for interrupted/preempted tasks.  Note that, in process_one_work(),
setting last CPU and clearing PENDING got merged into single
operation.

This removes possible long busy-loops and will allow using
try_to_grab_pending() from bh and irq contexts.

v2: __queue_work() was testing preempt_count() to ensure that the
    caller has disabled preemption.  This triggers spuriously if
    !CONFIG_PREEMPT_COUNT.  Use preemptible() instead.  Reported by
    Fengguang Wu.

v3: Disable irq instead of preemption.  IRQ will be disabled while
    grabbing gcwq->lock later anyway and this allows using
    try_to_grab_pending() from bh and irq contexts.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
2012-08-03 10:30:45 -07:00
Tejun Heo
959d1af8cf workqueue: add missing smp_wmb() in process_one_work()
WORK_STRUCT_PENDING is used to claim ownership of a work item and
process_one_work() releases it before starting execution.  When
someone else grabs PENDING, all pre-release updates to the work item
should be visible and all updates made by the new owner should happen
afterwards.

Grabbing PENDING uses test_and_set_bit() and thus has a full barrier;
however, clearing doesn't have a matching wmb.  Given the preceding
spin_unlock and use of clear_bit, I don't believe this can be a
problem on an actual machine and there hasn't been any related report
but it still is theoretically possible for clear_pending to permeate
upwards and happen before work->entry update.

Add an explicit smp_wmb() before work_clear_pending().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
2012-08-03 10:30:45 -07:00
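The ordering requirement in 959d1af8cf above has a close userspace analogue in C11 atomics. The sketch below is an illustration only: struct pending_sketch and both functions are hypothetical, a sequentially consistent exchange stands in for test_and_set_bit(), and a release fence stands in for smp_wmb() (it is at least as strong, since it also orders earlier loads).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical work item: one data field guarded by a PENDING flag. */
struct pending_sketch {
	int entry;		/* stands in for state updated before release */
	atomic_bool pending;
};

/* Claim ownership; the seq_cst exchange acts as a full barrier. */
bool claim_pending(struct pending_sketch *w)
{
	assert(w);
	return !atomic_exchange(&w->pending, true);
}

/* Publish the update, then give up ownership. */
void release_pending(struct pending_sketch *w, int new_entry)
{
	assert(w);
	w->entry = new_entry;
	/* Keep the store above from being reordered after the clear below. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&w->pending, false, memory_order_relaxed);
}
```

Without the fence, the clear could become visible before the entry update, so the next owner might claim the item and still read stale data.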
Tejun Heo
d4283e9378 workqueue: make queueing functions return bool
All queueing functions return 1 on success, 0 if the work item was
already pending.  Update them to return bool instead.  This signifies
better that they don't return 0 / -errno.
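
A minimal single-threaded sketch of the calling convention, assuming a
hypothetical demo_item type and demo_queue() helper (neither exists in
the kernel):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical item with a plain flag standing in for PENDING. */
struct demo_item {
	bool pending;
};

/*
 * Returns true if the item was queued by this call and false if it
 * was already pending.  There is no error path, so a bool describes
 * the result more precisely than an int that looks like 0 / -errno.
 */
static bool demo_queue(struct demo_item *item)
{
	assert(item != NULL);
	if (item->pending)
		return false;
	item->pending = true;
	return true;
}
```

A caller then tests the result as a truth value, for example
"if (!demo_queue(&item))" to detect an already pending item, instead
of comparing against a negative error code.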

This is cleanup and doesn't cause any functional difference.

While at it, fix comment opening for schedule_work_on().

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:44 -07:00
Tejun Heo
0a13c00e9d workqueue: reorder queueing functions so that _on() variants are on top
Currently, queue/schedule[_delayed]_work_on() are located below the
counterparts without the _on postfix even though the latter are usually
implemented using the former.  Swap them.
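
The relationship between the two variants can be sketched as follows.
All names here (demo_work, DEMO_CPU_UNBOUND, demo_queue_work_on(),
demo_queue_work()) are hypothetical and only mirror the shape of the
API; how the real wrappers pick a CPU is not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker meaning "no particular CPU requested". */
#define DEMO_CPU_UNBOUND (-1)

struct demo_work {
	bool pending;
	int cpu;	/* CPU the item was queued on, for illustration */
};

/* The _on variant does the real work, so it is defined first. */
static bool demo_queue_work_on(int cpu, struct demo_work *w)
{
	assert(w != NULL);
	if (w->pending)
		return false;
	w->pending = true;
	w->cpu = cpu;
	return true;
}

/* The plain variant is a thin wrapper around the _on variant. */
static bool demo_queue_work(struct demo_work *w)
{
	return demo_queue_work_on(DEMO_CPU_UNBOUND, w);
}
```

Placing the _on variant on top lets the wrapper be read right after
the function it depends on.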

This is cleanup and doesn't cause any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-03 10:30:44 -07:00
Linus Torvalds
0d7614f09c Linux 3.6-rc1 2012-08-02 16:38:10 -07:00
Linus Torvalds
fc6bdb59a5 Merge branch 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpc
Pull OLPC platform updates from Andres Salomon:
 "These move the OLPC Embedded Controller driver out of
  arch/x86/platform and into drivers/platform/olpc.

  OLPC machines are now ARM-based (which means lots of x86 and ARM
  changes), but are typically pretty self-contained, so it makes more
  sense to go through a separate OLPC tree after getting the appropriate
  review/ACKs."

* 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpc:
  x86: OLPC: move s/r-related EC cmds to EC driver
  Platform: OLPC: move global variables into priv struct
  Platform: OLPC: move debugfs support from x86 EC driver
  x86: OLPC: switch over to using new EC driver on x86
  Platform: OLPC: add a suspended flag to the EC driver
  Platform: OLPC: turn EC driver into a platform_driver
  Platform: OLPC: allow EC cmd to be overridden, and create a workqueue to call it
  drivers: OLPC: update various drivers to include olpc-ec.h
  Platform: OLPC: add a stub to drivers/platform/ for the OLPC EC driver
2012-08-02 11:52:39 -07:00
Linus Torvalds
44d82e2963 ARM: arm-soc Marvell Orion device-tree updates

Merge tag 'dt2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull arm-soc Marvell Orion device-tree updates from Olof Johansson:
 "This contains a set of device-tree conversions for Marvell Orion
  platforms that were staged early but took a few tries to get the
  branch into a format where it was suitable for us to pick up.

  Given that most people working on these platforms are hobbyists with
  limited time, we were a bit more flexible with merging it even though
  it came in late."

* tag 'dt2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (21 commits)
  ARM: Kirkwood: Replace mrvl with marvell
  ARM: Kirkwood: Describe GoFlex Net LEDs and SATA in DT.
  ARM: Kirkwood: Describe Dreamplug LEDs in DT.
  ARM: Kirkwood: Describe iConnects LEDs in DT.
  ARM: Kirkwood: Describe iConnects temperature sensor in DT.
  ARM: Kirkwood: Describe IB62x0 LEDs in DT.
  ARM: Kirkwood: Describe IB62x0 gpio-keys in DT.
  ARM: Kirkwood: Describe DNS32? gpio-keys in DT.
  ARM: Kirkwood: Move common portions into a kirkwood-dnskw.dtsi
  ARM: Kirkwood: Replace DNS-320/DNS-325 leds with dt bindings
  ARM: Kirkwood: Describe DNS325 temperature sensor in DT.
  ARM: Kirkwood: Use DT to configure SATA device.
  ARM: kirkwood: use devicetree for SPI on dreamplug
  ARM: kirkwood: Add LS-XHL and LS-CHLv2 support
  ARM: Kirkwood: Initial DTS support for Kirkwood GoFlex Net
  ARM: Kirkwood: Add basic device tree support for QNAP TS219.
  ATA: sata_mv: Add device tree support
  ARM: Orion: DTify the watchdog timer.
  ARM: Orion: Add arch support needed for I2C via DT.
  ARM: kirkwood: use devicetree for orion-spi
  ...

Conflicts:
	drivers/watchdog/orion_wdt.c
2012-08-02 11:50:24 -07:00
Linus Torvalds
bfdf85dfce ARM: arm-soc: cpuidle enablement for OMAP

Merge tag 'pm2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull arm-soc cpuidle enablement for OMAP from Olof Johansson:
 "Coupled cpuidle was meant to merge for 3.5 through Len Brown's tree,
  but didn't go in because the pull request ended up rejected.  So it
  just got merged, and we got this staged branch that enables the
  coupled cpuidle code on OMAP.

  With a stable git workflow from the other maintainer we could have
  staged this earlier, but that wasn't the case so we have had to merge
  it late.

  The alternative is to hold it off until 3.7 but given that the code is
  well-isolated to OMAP and they are eager to see it go in, I didn't
  push back hard in that direction."

* tag 'pm2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: OMAP4: CPUidle: Open broadcast clock-event device.
  ARM: OMAP4: CPUidle: add synchronization for coupled idle states
  ARM: OMAP4: CPUidle: Use coupled cpuidle states to implement SMP cpuidle.
  ARM: OMAP: timer: allow gp timer clock-event to be used on both cpus
2012-08-02 11:48:54 -07:00
Linus Torvalds
d1494ba8c3 ARM: SoC fixes

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "A few fixes for merge window fallout, and a bugfix for timer resume on
  PRIMA2."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: mmp: add missing irqs.h
  arm: mvebu: fix typo in .dtsi comment for Armada XP SoCs
  ARM: PRIMA2: delete redundant codes to restore LATCHED when timer resumes
  ARM: mxc: Include missing irqs.h header
2012-08-02 11:48:20 -07:00
Linus Torvalds
0a276d1675 SuperH fixes for 3.6-rc1 merge window

Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh

Pull SuperH fixes from Paul Mundt.

* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh: (24 commits)
  sh: explicitly include sh_dma.h in setup-sh7722.c
  sh: ecovec: care CN5 VBUS if USB host mode
  sh: sh7724: fixup renesas_usbhs clock settings
  sh: intc: initial irqdomain support.
  sh: pfc: Fix up init ordering mess.
  serial: sh-sci: fix compilation breakage, when DMA is enabled
  dmaengine: shdma: restore partial transfer calculation
  sh: modify the sh_dmae_slave_config for RSPI in setup-sh7757
  sh: Fix up recursive fault in oops with unset TTB.
  sh: pfc: Build fix for pinctrl_remove_gpio_range() changes.
  sh: select the fixed regulator driver on several boards
  sh: ecovec: switch MMC power control to regulators
  sh: add fixed voltage regulators to se7724
  sh: add fixed voltage regulators to sdk7786
  sh: add fixed voltage regulators to rsk
  sh: add fixed voltage regulators to migor
  sh: add fixed voltage regulators to kfr2r09
  sh: add fixed voltage regulators to ap325rxa
  sh: add fixed voltage regulators to sh7757lcr
  sh: add fixed voltage regulators to sh2007
  ...
2012-08-02 11:45:42 -07:00
Linus Torvalds
25aa6a7ae4 Additional md update for 3.6

Merge tag 'md-3.6' of git://neil.brown.name/md

Pull additional md update from NeilBrown:
 "This contains a few patches that depend on plugging changes in the
  block layer so needed to wait for those.

  It also contains a Kconfig fix for the new RAID10 support in dm-raid."

* tag 'md-3.6' of git://neil.brown.name/md:
  md/dm-raid: DM_RAID should select MD_RAID10
  md/raid1: submit IO from originating thread instead of md thread.
  raid5: raid5d handle stripe in batch way
  raid5: make_request use batch stripe release
2012-08-02 11:34:40 -07:00
Linus Torvalds
c8924234bd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull two ceph fixes from Sage Weil:
 "The first patch fixes up the old crufty open intent code to use the
  atomic_open stuff properly, and the second fixes a possible null deref
  and memory leak with the crypto keys."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: fix crypto key null deref, memory leak
  ceph: simplify+fix atomic_open
2012-08-02 10:57:31 -07:00
Linus Torvalds
410fc4ce8a - Fixes a bug when the lower filesystem mount options include 'acl', but the
eCryptfs mount options do not
 - Cleanups in the messaging code
 - Better handling of empty files in the lower filesystem to improve usability.
   Failed file creations are now cleaned up and empty lower files are converted
   into eCryptfs during open().
 - The write-through cache changes are being reverted due to bugs that are not
   easy to fix. Stability outweighs the performance enhancements here.
 - Improvement to the mount code to catch unsupported ciphers specified in the
   mount options

Merge tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs fixes from Tyler Hicks:
 - Fixes a bug when the lower filesystem mount options include 'acl',
   but the eCryptfs mount options do not
 - Cleanups in the messaging code
 - Better handling of empty files in the lower filesystem to improve
   usability.  Failed file creations are now cleaned up and empty lower
   files are converted into eCryptfs during open().
 - The write-through cache changes are being reverted due to bugs that
   are not easy to fix.  Stability outweighs the performance
   enhancements here.
 - Improvement to the mount code to catch unsupported ciphers specified
   in the mount options

* tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: check for eCryptfs cipher support at mount
  eCryptfs: Revert to a writethrough cache model
  eCryptfs: Initialize empty lower files when opening them
  eCryptfs: Unlink lower inode when ecryptfs_create() fails
  eCryptfs: Make all miscdev functions use daemon ptr in file private_data
  eCryptfs: Remove unused messaging declarations and function
  eCryptfs: Copy up POSIX ACL and read-only flags from lower mount
2012-08-02 10:56:34 -07:00
Linus Torvalds
630103ea2c Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS update from Steve French:
 "Adds SMB2 rmdir/mkdir capability to the SMB2/SMB2.1 support in cifs.

  I am holding up a few more days on merging the remainder of the
  SMB2/SMB2.1 enablement although it is nearing review completion, in
  order to address some review comments from Jeff Layton on a few of the
  subsequent SMB2 patches, and also to debug an unrelated cifs problem
  that Pavel discovered."

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Add SMB2 support for rmdir
  CIFS: Move rmdir code to ops struct
  CIFS: Add SMB2 support for mkdir operation
  CIFS: Separate protocol specific part from mkdir
  CIFS: Simplify cifs_mkdir call
2012-08-02 10:54:11 -07:00
Linus Torvalds
8783b6e2b2 mm: remove node_start_pfn checking in new WARN_ON for now
Borislav Petkov reports that the new warning added in commit
88fdf75d1b ("mm: warn if pg_data_t isn't initialized with zero")
triggers for him, and it is the node_start_pfn field that has already
been initialized once.

The call trace looks like this:

  x86_64_start_kernel ->
    x86_64_start_reservations ->
    start_kernel ->
    setup_arch ->
    paging_init ->
    zone_sizes_init ->
    free_area_init_nodes ->
    free_area_init_node

and (with the warning replaced by debug output), Borislav sees

  On node 0 totalpages: 4193848
    DMA zone: 64 pages used for memmap
    DMA zone: 6 pages reserved
    DMA zone: 3890 pages, LIFO batch:0
    DMA32 zone: 16320 pages used for memmap
    DMA32 zone: 798464 pages, LIFO batch:31
    Normal zone: 52736 pages used for memmap
    Normal zone: 3322368 pages, LIFO batch:31
  free_area_init_node: pgdat->node_start_pfn: 4423680      <----
  On node 1 totalpages: 4194304
    Normal zone: 65536 pages used for memmap
    Normal zone: 4128768 pages, LIFO batch:31
  free_area_init_node: pgdat->node_start_pfn: 8617984      <----
  On node 2 totalpages: 4194304
    Normal zone: 65536 pages used for memmap
    Normal zone: 4128768 pages, LIFO batch:31
  free_area_init_node: pgdat->node_start_pfn: 12812288     <----
  On node 3 totalpages: 4194304
    Normal zone: 65536 pages used for memmap
    Normal zone: 4128768 pages, LIFO batch:31

so remove the bogus warning for now to avoid annoying people.  Minchan
Kim is looking at it.
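
The effect of dropping one field from such a sanity check can be
sketched as below.  This is a hedged illustration: struct demo_pgdat
and demo_pgdat_looks_dirty() are hypothetical, and the fields other
than node_start_pfn are assumptions about what the original warning
tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down node descriptor; not the kernel's pg_data_t. */
struct demo_pgdat {
	int nr_zones;			/* assumed to be covered by the check */
	unsigned long node_start_pfn;	/* may legitimately be set early */
	int classzone_idx;		/* assumed to be covered by the check */
};

/*
 * Returns true if the descriptor does not look freshly zeroed.
 * node_start_pfn is deliberately left out, because it can already
 * hold a valid value by the time the check runs.
 */
static bool demo_pgdat_looks_dirty(const struct demo_pgdat *pgdat)
{
	assert(pgdat != NULL);
	return pgdat->nr_zones != 0 || pgdat->classzone_idx != 0;
}
```

With a value such as node_start_pfn = 4423680 from the debug output
above, this reduced check stays quiet, while a non-zero nr_zones would
still be flagged.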

Reported-by: Borislav Petkov <bp@amd64.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-02 10:37:03 -07:00
Haojian Zhuang
bac6f61550 ARM: mmp: add missing irqs.h
arch/arm/mach-mmp/gplugd.c:195:13: error: ‘MMP_NR_IRQS’ undeclared here
(not in a function)
make[1]: *** [arch/arm/mach-mmp/gplugd.o] Error 1

Include <mach/irqs.h> to fix this issue.

Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
2012-08-02 10:15:59 -07:00
Thomas Petazzoni
10b683cba5 arm: mvebu: fix typo in .dtsi comment for Armada XP SoCs
The comment was wrongly referring to Armada 370 while the file is
related to Armada XP.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
2012-08-02 10:05:32 -07:00
Barry Song
debeaf6c2d ARM: PRIMA2: delete redundant codes to restore LATCHED when timer resumes
The only way to update the LATCHED registers is to write LATCH_BIT to the
LATCH register, which latches COUNTER into LATCHED, e.g.
writel_relaxed(SIRFSOC_TIMER_LATCH_BIT, sirfsoc_timer_base +
	SIRFSOC_TIMER_LATCH);

Writing values to the LATCHED registers directly has no effect.

Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
2012-08-02 10:05:27 -07:00
Sylvain Munaut
f0666b1ac8 libceph: fix crypto key null deref, memory leak
Avoid crashing if the crypto key payload is NULL, which happens when it
was not correctly allocated and initialized.  Also, avoid leaking it.
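
The general pattern can be sketched in userspace C.  This is not the
libceph code: struct demo_key, struct demo_crypto_key and
demo_key_destroy() are hypothetical, and malloc()/free() stand in for
the kernel allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical inner key object that owns a secret buffer. */
struct demo_crypto_key {
	void *secret;
	size_t len;
};

/* Hypothetical outer key whose payload may never have been set up. */
struct demo_key {
	struct demo_crypto_key *payload;
};

/* Tear down a key without dereferencing NULL and without leaking. */
static void demo_key_destroy(struct demo_key *key)
{
	struct demo_crypto_key *ckey;

	assert(key != NULL);
	ckey = key->payload;
	if (ckey == NULL)	/* payload was never allocated: nothing to do */
		return;

	free(ckey->secret);	/* release the buffer the payload owns */
	free(ckey);		/* release the payload itself (the leak fix) */
	key->payload = NULL;	/* guard against a later double free */
}
```

Checking the payload before touching its members covers the crash,
and freeing the payload object in addition to its buffer covers the
leak.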

Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-08-02 09:19:20 -07:00