Preparation patch so more drbd_send_state() usage on the peer
will not confuse drbd in receive_state().
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
no functional change, just using full state instead of just the .conn
part of it for comparisons.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
drbd commit 17c854fea474a5eb3cfa12e4fb019e46debbc4ec
drbd: receiving of big packets, for payloads between 64kByte and 4GByte
introduced a new on-the-wire packet header format. We must no longer
assume either format, but use the result of whatever drbd_recv_header
has decoded.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We used to be16_to_cpu the length field in our received packet header.
drbd commit 17c854fea474a5eb3cfa12e4fb019e46debbc4ec
drbd: receiving of big packets, for payloads between 64kByte and 4GByte
changed this, but forgot to adjust a few places where we relied on
h->length being in native byte order.
This broke the receiving side of the RLE compressed bitmap exchange.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This caused rs_planed to be not in sync with the content of the fifo.
That in turn could cause that the resync comes to a complete halt.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Connections through a compressing proxy might have more bits
on the fly. 500MByte instead of 50MByte
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If we release the page pointed to by md_io_tmpp, we need to zero out the
pointer, too, as that may be used later to decide whether we need to
allocate a new page again.
Impact: a previously freed page may be used and clobbered. Depending on
what that particular page is being used for meanwhile, this may result
in silent data corruption of completely unrelated things.
Only of concern on devices with logical_block_size != 512 byte,
if you re-attach after becoming diskless once.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Two missing corner cases to the "maximum packet size" handshake.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
There are three ways to get IO suspended:
* Loss of any access to data
* Fence-peer-handler running
* User requested to suspend IO
Track those in different bits, so that one condition clearing its
state bit does not interfere with the other two conditions.
Only when the user resumes IO he overrules all three bits.
The fact is hidden from the user, he sees only a single suspend
bit.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Forgot to consider the max size for the resync requests.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If a synctarget lost connection while being WFSyncUUID,
due to "state sanitizing", the attempted state change to SyncTarget
looked like an "invalidate" to after_state_ch() later,
thus caused a full sync on next handshake (Bug #318).
drbd0: PingAck did not arrive in time.
drbd0: peer( Primary -> Unknown ) conn( WFSyncUUID -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
from : { cs:NetworkFailure ro:Secondary/Unknown ds:UpToDate/DUnknown r--- }
to : { cs:SyncTarget ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
after sanizising, resulted in
state: { cs:NetworkFailure ro:Secondary/Unknown ds:Inconsistent/DUnknown r--- }
drbd0: disk( UpToDate -> Inconsistent )
Fix:
don't mask state transition errors in "sanitizing",
so the requested state change to SyncTarget fails,
instead of being implicitly "remaped" to invalidate.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If we cannot satisfy a request (because our disk just broke),
we still need to drain the payload. Or we'll get a protocol error
when interpreting the payload as DRBD packet header.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
BUG trace would look like:
lc_find
drbd_rs_complete_io
got_OVResult
drbd_asender
Could be triggered by explicit, or IO-error policy based,
detach during online-verify.
We may only dereference mdev->resync, if we first get_ldev(), as the
disk may break any time, causing mdev->resync to disappear once all
ldev references have been returned.
Already in flight online-verify requests or replies may still come in,
which we then need to ignore.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Just in case we have some pending meta data changes to sync, do it
before we call our userland helper, as that may take some time,
or even cause a hard reboot.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
addendum to baa33ae4eaa4477b60af7c434c0ddd1d182c1ae7
The race:
drbd_md_sync()
if (!test_and_clear_bit(MD_DIRTY, &mdev->flags))
return;
==> RACE with drbd_md_mark_dirty() rearming the timer.
del_timer(&mdev->md_sync_timer);
Fixed by moving the del_timer before the test_and_clear_bit.
Additionally only rearm the timer in drbd_md_mark_dirty, if MD_DIRTY was
not already set, reduce the grace period from five to one second, and
add an ifdef'ed debuging aid to find code paths missing an explicit
drbd_md_sync, if any, as those are the only relevant ones for this race.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The actual race happened int the drbd_start_resync() function. Where
drbd_resync_finished() -> __drbd_set_state() set STOP_SYNC_TIMER and
armed the timer.
If the timer fired before execution reaches the mod_timer statement
at the end of drbd_start_resync() the latter would cause an
unexpected call to w_make_resync_request().
Removed the STOP_SYNC_TIMER bit, and base it on the connection state.
The STOP_SYNC_TIMER bit probably originates probably the time before
the state engine.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If pacemaker (for example) decided to initialize minor devices not in
the exact sync-after dependency order, the configuration partially
failed with an error "The sync-after minor number is invalid". (Bugz. #322)
We can avoid that by implicitly creating unconfigured minor devices,
if others depend on them.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If a drbd_nl_net_conf hits the small window between the state change
to C_STANDALONE and the corresponding cleanup in after_state_ch,
that cleanup would throw away stuff we now need again,
and later trigger BUG_ON()s.
Fixed by properly serializing the new config request with
any pending cleanup.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When the complete device is marked as out of sync, we can disable
updates of the on disk AL. Currently AL updates are only disabled
if one uses the "invalidate-remote" command on an unconnected,
primary device, or when at attach time all bits in the bitmap are
set.
As of now, AL updated do not get disabled when a all bits becomes
set due to application writes to an unconnected DRBD device.
While this is a missing feature, it is not considered important,
and might get added later.
BTW, after initializing a "one legged" DRBD device
drbdadm create-md resX
drbdadm -- --force primary resX
AL updates also get disabled, until the first connect.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Now we have multiple BIOs per ee, packets with a 32 bit length field,
it gets time to use these goodies.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If we intent to use the block_id member of an epoch entry,
we may not use the digest member.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We now track the data rate of locally submitted resync related requests,
and can thus detect non-resync activity on the lower level device.
If the current sync rate is above c-min-rate, and the lower level device
appears to be busy, we throttle the resyncer.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
also canonicalize the return values of read_for_csum
and drbd_rs_begin_io to return -ESOMETHING, or 0 for success.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The current resync speed as displayed in /proc/drbd fluctuates a lot.
Using an array of rolling marks makes this calculation much more stable.
We used to have this (a long time ago with 0.7), but it got lost somehow.
If "stalled", do not discard the rest of the information, just add a
" (stalled)" tag to the progress line.
This patch also shortens a spinlock critical section somewhat, and
reduces the number of atomic operations in put_ldev.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The commit 288f422ec1
drbd: Track all IO requests on the TL, not writes only
moved a list_add_tail(req, ) into a region where req
may have just been freed due to conflict detection.
Fix this by adding a proper cleanup section for that code path.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We may not free tl_hash when IO is suspended, since we can not wait
until ap_bio_cnt reaches zero.
We can do this after susp reched 0, since then tl_clear was called
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
After disconnect (most likely mdev->net_cnt == 0) and we are
still in an unstable state (!drbd_state_is_stable()). When we
get an IO request in drbd_get_max_buffers() (called from
__inc_ap_bio_cond(), called from inc_ap_bio()) we wake up
misc_wait. Misc_wait is also used in inc_ap_bio() to sleep
until the outcome of __inc_ap_bio_cond() changes. => Busy loop!
Solution: Have a dedicated wait queue for get_net_conf() and
put_net_conf().
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Make sure the state engine can deny two primaries to connect
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When a fencing policy of "resource-and-stonith" is configured,
and DRBD looses connection to it's peer, we can delay the
creation of a new current-UUID until IO gets thawed.
That allows one to deploy fence-peer handlers that actually
commit suicide on the machine they get started.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Since we can not thaw the transfer log, the next logical step is
to allow reconnects while the fence-peer handler runs.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
State transitions in the space of non-allowed states used
to be very noisy. Reduce that, since that has little value
for the majority of the user base.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When no data is accessible (no connection to the peer, nor a local disk)
allow the user to select to freeze all IO operations instead of getting
IO errors.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If IO was frozen for a temporal network outage, resend the
content of the transfer-log into the newly established connection.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
With that the drbd_fail_pending_reads() function becomes obsolete.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This should pass "buf" to bvec_kunmap_irq() instead of "bv". The api is
like kmap_atomic() instead of kmap().
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Must drop reference taken by blk_make_request().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # .35.x
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The block device drivers have all gained new lock_kernel
calls from a recent pushdown, and some of the drivers
were already using the BKL before.
This turns the BKL into a set of per-driver mutexes.
Still need to check whether this is safe to do.
file=$1
name=$2
if grep -q lock_kernel ${file} ; then
if grep -q 'include.*linux.mutex.h' ${file} ; then
sed -i '/include.*<linux\/smp_lock.h>/d' ${file}
else
sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file}
fi
sed -i ${file} \
-e "/^#include.*linux.mutex.h/,$ {
1,/^\(static\|int\|long\)/ {
/^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex);
} }" \
-e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
-e '/[ ]*cycle_kernel_lock();/d'
else
sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \
-e '/cycle_kernel_lock()/d'
fi
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
All these files use the big kernel lock in a trivial
way to serialize their private file operations,
typically resulting from an earlier semi-automatic
pushdown from VFS.
None of these drivers appears to want to lock against
other code, and they all use the BKL as the top-level
lock in their file operations, meaning that there
is no lock-order inversion problem.
Consequently, we can remove the BKL completely,
replacing it with a per-file mutex in every case.
Using a scripted approach means we can avoid
typos.
These drivers do not seem to be under active
maintainance from my brief investigation. Apologies
to those maintainers that I have missed.
file=$1
name=$2
if grep -q lock_kernel ${file} ; then
if grep -q 'include.*linux.mutex.h' ${file} ; then
sed -i '/include.*<linux\/smp_lock.h>/d' ${file}
else
sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file}
fi
sed -i ${file} \
-e "/^#include.*linux.mutex.h/,$ {
1,/^\(static\|int\|long\)/ {
/^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex);
} }" \
-e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
-e '/[ ]*cycle_kernel_lock();/d'
else
sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \
-e '/cycle_kernel_lock()/d'
fi
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The PKT_CTRL_CMD_STATUS device ioctl retrieves a pointer to a
pktcdvd_device from the global pkt_devs array. The index into this
array is provided directly by the user and is a signed integer, so the
comparison to ensure that it falls within the bounds of this array will
fail when provided with a negative index.
This can be used to read arbitrary kernel memory or cause a crash due to
an invalid pointer dereference. This can be exploited by users with
permission to open /dev/pktcdvd/control (on many distributions, this is
readable by group "cdrom").
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
[ Rather than add a cast, just make the function take the right type -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
o Use one request queue per gendisk instead of sharing the queue.
o Don't have hardware. No compile testing or run time testing done. Completely
untested.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
o Use one request queue per gendisk instead of sharing request queue
o Don't have hardware. No compile testing or run time testing done. Completely
untested.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Pretty straight forward conversion. Note that we do round-robin
between the drives that have available requests, before we simply
used the drive that the IO scheduler told us to. Since the IO
scheduler doesn't care about multiple devices per queue, the resulting
sort would not have made sense.
Fixed by Vivek to get rid of a double lock problem in set_next_request()
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
The "h->scatter_list" is allocated inside a for loop. If any of those
allocations fail, then the rest of the list is uninitialized data. When
we free it we should start from the top and free backwards so that we
don't call kfree() on uninitialized pointers.
Also if the allocation for "h->scatter_list" fails then we would get an
Oops here. I should have noticed this when I send: 4ee69851c "cciss:
handle allocation failure." but I didn't. Sorry about that.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
All the blkdev_issue_* helpers can only sanely be used for synchronous
caller. To issue cache flushes or barriers asynchronously the caller needs
to set up a bio by itself with a completion callback to move the asynchronous
state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always
specified when calling blkdev_issue_* and also remove the now unused flags
argument to blkdev_issue_flush and blkdev_issue_zeroout. For
blkdev_issue_discard we need to keep it for the secure discard flag, which
gains a more descriptive name and loses the bitops vs flag confusion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
We have several users of min_not_zero, each of them using their own
definition. Move the define to kernel.h.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: Range check cpu in blk_cpu_to_group
scatterlist: prevent invalid free when alloc fails
writeback: Fix lost wake-up shutting down writeback thread
writeback: do not lose wakeup events when forking bdi threads
cciss: fix reporting of max queue depth since init
block: switch s390 tape_block and mg_disk to elevator_change()
block: add function call to switch the IO scheduler from a driver
fs/bio-integrity.c: return -ENOMEM on kmalloc failure
bio-integrity.c: remove dependency on __GFP_NOFAIL
BLOCK: fix bio.bi_rw handling
block: put dev->kobj in blk_register_queue fail path
cciss: handle allocation failure
cfq-iosched: Documentation help for new tunables
cfq-iosched: blktrace print per slice sector stats
cfq-iosched: Implement tunable group_idle
cfq-iosched: Do group share accounting in IOPS when slice_idle=0
cfq-iosched: Do not idle if slice_idle=0
cciss: disable doorbell reset on reset_devices
blkio: Fix return code for mkdir calls
Remove now unused REQ_HARDBARRIER support. virtio_blk already
supports REQ_FLUSH and the usefulness of REQ_FUA for virtio_blk is
questionable at this point, so there's nothing else to do to support
new REQ_FLUSH/FUA interface.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Deprecate REQ_HARDBARRIER and implement REQ_FLUSH/FUA instead. Also,
instead of checking file->f_op->fsync() directly, look at the value of
vfs_fsync() and ignore -EINVAL return.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
REQ_HARDBARRIER is deprecated. Remove spurious uses in the following
users. Please note that other than osdblk, all other uses were
already spurious before deprecation.
* osdblk: osdblk_rq_fn() won't receive any request with
REQ_HARDBARRIER set. Remove the test for it.
* pktcdvd: use of REQ_HARDBARRIER in pkt_generic_packet() doesn't mean
anything. Removed.
* aic7xxx_old: Setting MSG_ORDERED_Q_TAG on REQ_HARDBARRIER is
spurious. Removed.
* sas_scsi_host: Setting TASK_ATTR_ORDERED on REQ_HARDBARRIER is
spurious. Removed.
* scsi_tcq: The ordered tag path wasn't being used anyway. Removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: James Bottomley <James.Bottomley@suse.de>
Cc: Peter Osterlund <petero2@telia.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
requests. Deprecate barrier. All REQ_HARDBARRIERs are failed with
-EOPNOTSUPP and blk_queue_ordered() is replaced with simpler
blk_queue_flush().
blk_queue_flush() takes combinations of REQ_FLUSH and FUA. If a
device has write cache and can flush it, it should set REQ_FLUSH. If
the device can handle FUA writes, it should also set REQ_FUA.
All blk_queue_ordered() users are converted.
* ORDERED_DRAIN is mapped to 0 which is the default value.
* ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
* ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Nobody is making meaningful use of ORDERED_BY_TAG now and queue
draining for barrier requests will be removed soon which will render
the advantage of tag ordering moot. Kill ORDERED_BY_TAG. The
following users are affected.
* brd: converted to ORDERED_DRAIN.
* virtio_blk: ORDERED_TAG path was already marked deprecated. Removed.
* xen-blkfront: ORDERED_TAG case dropped.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
loop implements FLUSH using fsync but was incorrectly setting its
ordered mode to DRAIN. Change it to DRAIN_FLUSH. In practice, this
doesn't change anything as loop doesn't make use of the block layer
ordered implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The ioctl path and the scsi tape path were not accounting
for their additions to the queue depth.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* 'for-upstream/pvhvm' of git://xenbits.xensource.com/people/ianc/linux-2.6:
xen: pvhvm: make it clearer that XEN_UNPLUG_* define bits in a bitfield
xen: pvhvm: rename xen_emul_unplug=ignore to =unnnecessary
xen: pvhvm: allow user to request no emulated device unplug
Create /sys/block/loopX/loop directory and provide these attributes:
- backing_file
- autoclear
- offset
- sizelimit
This loop directory is present only if loop device is configured.
To be used in util-linux-ng (and possibly elsewhere like udev rules)
where code need to get loop attributes from kernel (and not store
duplicate info in userspace).
Moreover loop ioctls are not even able to provide full backing
file info because of buffer limits.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Now that we have this API, switch the two in-kernel users to it.
Resolves an oops introduced by commit
1abec4fdbb.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
It is not immediately clear what this option causes to become
ignored. The actual meaning is that it is not necessary to unplug the
emulated devices to safely use the PV ones, even if the platform does
not support the unplug protocol. (pressumably the user will only add
this option if they have ensured that their domain configuration is
safe).
I think xen_emul_unplug=unnecessary better captures this.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Return of the bi_rw tests is no longer bool after commit 74450be1. But
results of such tests are stored in bools. This doesn't fit in there
for some compilers (gcc 4.5 here), so either use !! magic to get real
bools or use ulong where the result is assigned somewhere.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
If kmalloc() fails then cleanup and return failure (-1).
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The doorbell reset initially appears to work correctly,
the controller resets, comes up, some i/o can even be
done, but on at least some Smart Arrays in some servers,
it eventually causes a subsequent controller lockup due
to some kind of PCIe error, and kdump can end up leaving
the root filesystem in an unbootable state. For this
reason, until the problem is fixed, or at least isolated
to certain hardware enough to be avoided, the doorbell
reset should not be used at all.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The drivers for Xilinx' SystemACE and physically mapped MTDs were missing
prototypes for of_address_to_resource(). This patch adds the necessary
headers.
Signed-off-by: Graeme Smecher <graeme.smecher@mail.mcgill.ca>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
xen-blkfront: fix missing out label
blkdev: fix blkdev_issue_zeroout return value
block: update request stacking methods to support discards
block: fix missing export of blk_types.h
writeback: fix bad _bh spinlock nesting
drbd: revert "delay probes", feature is being re-implemented differently
drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
drbd: Disable delay probes for the upcomming release
writeback: cleanup bdi_register
writeback: add new tracepoints
writeback: remove unnecessary init_timer call
writeback: optimize periodic bdi thread wakeups
writeback: prevent unnecessary bdi threads wakeups
writeback: move bdi threads exiting logic to the forker thread
writeback: restructure bdi forker loop a little
writeback: move last_active to bdi
writeback: do not remove bdi from bdi_list
writeback: simplify bdi code a little
writeback: do not lose wake-ups in bdi threads
...
Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.
It was a now abandoned attempt to throttle resync bandwidth
based on the delay it causes on the bulk data socket.
It has no userbase yet, and has been disabled by
9173465ccb51c09cc3102a10af93e9f469a0af6f already.
This removes the now unused code.
The basic feature, namely using up "idle" bandwith
of network and disk IO subsystem, with minimal impact
to application IO, is being reimplemented differently.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
put_user() may fail, if so return -EFAULT.
Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
If there's no feature-barrier key in xenstore, then it means its a fairly
old backend which does uncached in-order writes, which means ORDERED_DRAIN
is appropriate.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
When barriers are supported, then use QUEUE_ORDERED_TAG to tell the block
subsystem that it doesn't need to do anything else with the barriers.
Previously we used ORDERED_DRAIN which caused the block subsystem to
drain all pending IO before submitting the barrier, which would be
very expensive.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
The struct cont_t is just a set of virtual function pointers.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Use memdup_user when user data is immediately copied into the
allocated region. Some checkpatch cleanups in nearby code.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@
- to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+ to = memdup_user(from,size);
if (
- to==NULL
+ IS_ERR(to)
|| ...) {
<+... when != goto l1;
- -ENOMEM
+ PTR_ERR(to)
...+>
}
- if (copy_from_user(to, from, size) != 0) {
- <+... when != goto l2;
- -EFAULT
- ...+>
- }
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Chirag Kantharia <chirag.kantharia@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: cleanup interrupt_not_for_us
In the case of MSI/MSIX interrutps, we don't need to check
if the interrupt is for us, and in the case of the intx interrupt
handler, when checking if the interrupt is for us, we don't need
to check if we're using MSI/MSIX, we know we're not.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: change printks to dev_warn, etc.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: separate cmd_alloc() and cmd_special_alloc()
cmd_alloc() took a parameter which caused it to either allocate
from a pre-allocated pool, or allocate using pci_alloc_consistent.
This parameter is always known at compile time, so this would
be better handled by breaking the function into two functions
and differentiating the cases by function names. Same goes
for cmd_free().
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: use consistent variable names
"h", for the hba structure and "c" for the command structures.
and get rid of trivial CCISS_LOCK macro.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: forbid hard reset of 640x boards
The 6402/6404 are two PCI devices -- two Smart Array controllers
-- that fit into one slot. It is possible to reset them independently,
however, they share a battery backed cache module. One of the pair
controls the cache and the 2nd one access the cache through the first
one. If you reset the one controlling the cache, the other one will
not be a happy camper. So we just forbid resetting this conjoined
mess.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: sanitize max commands
Some controllers might try to tell us they support 0 commands
in performant mode. This is a lie told by buggy firmware.
We have to be wary of this lest we try to allocate a negative
number of command blocks, which will be treated as unsigned,
and get an out of memory condition.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: Fix hard reset code.
Smart Array controllers newer than the P600 do not honor the
PCI power state method of resetting the controllers. Instead,
in these cases we can get them to reset via the "doorbell" register.
This escaped notice until we began using "performant" mode because
the fact that the controllers did not reset did not normally
impede subsequent operation, and so things generally appeared to
"work". Once the performant mode code was added, if the controller
does not reset, it remains in performant mode. The code immediately
after the reset presumes the controller is in "simple" mode
(which previously, it had remained in simple mode the whole time).
If the controller remains in performant mode any code which presumes
it is in simple mode will not work. So the reset needs to be fixed.
Unfortunately there are some controllers which cannot be reset by
either method. (eg. p800). We detect these cases by noticing that
the controller seems to remain in performant mode even after a
reset has been attempted. In those cases we ignore the controller,
as any commands outstanding on it will result in stale completions.
To sum up, we try to do a better job of resetting the controller if
"reset_devices" is set, and if it doesn't work, we ignore that
controller.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: factor out cciss_reset_devices()
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Rationale for this is that I will also need to use this code
in fixing kdump host reset code prior to having the hba structure.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: factor out cciss_enter_performant_mode
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: factor out cciss_wait_for_mode_change_ack()
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: make cciss_put_controller_into_performant_mode as __devinit
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: factor out cciss_p600_dma_prefetch_quirk()
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: factor out cciss_enable_scsi_prefetch()
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: factor out CISS_signature_present()
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: factor out cciss_find_board_params
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: fix leak of ioremapped memory
in cciss_pci_init error path.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: factor out cciss_wait_for_board_ready()
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: factor out cciss_find_memory_BAR()
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: remove board_id parameter from cciss_interrupt_mode()
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: factor out cciss_lookup_board_id
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: save pdev pointer in per hba structure early to avoid passing it around so much.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
cciss: Set the performant mode bit in the scsi half of the driver
In a couple of places, the performant mode bit wasn't being set in
the scsi half of the driver, causing commands to seem to hang. Use
enqueue_cmd_and_start_io() where appropriate. This fixes a bug that
echo engage scsi > /proc/driver/cciss/cciss0
would hang.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This is just bd_openers, protected by the bd_mutex.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This is just bd_openers, protected by the bd_mutex.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Same approach as blkfront_closing:
* Grab the bdev safely, holding the info mutex.
* Zap xbdev safely, holding the info mutex.
* Try bdev removal safely, holding bd_mutex.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
We cannot read backend state within bdev operations, because it risks
grabbing the state change before xenbus gets to do it.
Fixed by tracking deferral with a frontend switch to Closing. State
exposure isn't strictly necessary, but the backends won't mind.
For a 'clean' deferral this seems actually a more decent protocol than
raising errors.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
We need not mind if users grab a late handle on a closing disk. We
probably even should not. But we have to make sure it's not a dead
one already
Let the bdev deal with a gendisk deleted under its feet. Takes the
info mutex to decide a race against backend closing.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The bdev .open/.release fops race against backend switches to Closing,
handled by the XenBus thread.
The original code attempted to serialize block device holders and
xenbus only via bd_mutex. This is insufficient, the info->bd pointer
may already be stale (or null) while xenbus tries to bump up the
refcount.
Protect blkfront_info with a dedicated mutex.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
* Current blkfront_closing is rather a xlvbd_release_gendisk.
Renamed in preparation of later patches (need the name again).
* Removed the misleading comment -- this only applied to the backend
switch handler, and the queue is already flushed btw.
* Break out the xenbus call, callers know better when to switch
frontend state.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The call to del_gendisk follows an non-refcounted gd->queue
pointer. We release the last ref in blk_cleanup_queue. Fixed by
reordering releases accordingly.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Fix:
drivers/block/xen-blkfront.c: In function ‘blkfront_connect’:
drivers/block/xen-blkfront.c:933: warning: enumeration value ‘BLKIF_STATE_DISCONNECTED’ not handled in switch
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Support dynamic resizing of virtual block devices. This patch supports
both file backed block devices as well as physical devices that can be
dynamically resized on the host side.
Signed-off-by: K. Y. Srinivasan <ksrinivasan@novell.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Unfortunately commit "blkfront: fixes for 'xm block-detach ... --force'"
still wasn't quite right - there was a reference to freed memory left
from blkfront_closing().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Prevent prematurely freeing 'struct blkfront_info' instances (when the
xenbus data structures are gone, but the Linux ones are still needed).
Prevent adding a disk with the same (major, minor) [and hence the same
name and sysfs entries, which leads to oopses] when the previous
instance wasn't fully de-allocated yet.
This still doesn't address all issues resulting from forced detach:
I/O submitted after the detach still blocks forever, likely preventing
subsequent un-mounting from completing. It's not clear to me (not
knowing much about the block layer) how this can be avoided.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
All Xen frontend drivers have a couple of identically named functions which
makes figuring out which device went wrong from a stacktrace harder than it
needs to be. Rename them to something specificto the device type.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
The open and release block_device_operations are currently
called with the BKL held. In order to change that, we must
first make sure that all drivers that currently rely
on this have no regressions.
This blindly pushes the BKL into all .open and .release
operations for all block drivers to prepare for the
next step. The drivers can subsequently replace the BKL
with their own locks or remove it completely when it can
be shown that it is not needed.
The functions blkdev_get and blkdev_put are the only
remaining users of the big kernel lock in the block
layer, besides a few uses in the ioctl code, none
of which need to serialize with blkdev_{get,put}.
Most of these two functions is also under the protection
of bdev->bd_mutex, including the actual calls to
->open and ->release, and the common code does not
access any global data structures that need the BKL.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
As a preparation for the removal of the big kernel
lock in the block layer, this removes the BKL
from the common ioctl handling code, moving it
into every single driver still using it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
use REQ_FLUSH flag instead.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
REQ_FLUSH flag enables us to kill ps3disk_prepare_flush().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Fix extra brace typo that is causing build errors.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
On compilation, gcc correctly detects that we do not handle
all types:
In function ‘blk_done’:
warning: enumeration value ‘REQ_TYPE_FS’ not handled in switch
warning: enumeration value ‘REQ_TYPE_SENSE’ not handled in switch
warning: enumeration value ‘REQ_TYPE_PM_SUSPEND’ not handled in switch
warning: enumeration value ‘REQ_TYPE_PM_RESUME’ not handled in switch
warning: enumeration value ‘REQ_TYPE_PM_SHUTDOWN’ not handled in switch
warning: enumeration value ‘REQ_TYPE_LINUX_BLOCK’ not handled in switch
warning: enumeration value ‘REQ_TYPE_ATA_TASKFILE’ not handled in switch
warning: enumeration value ‘REQ_TYPE_ATA_PC’ not handled in switch
which is a bit pointless since this is at the end of the request
processessing. Add a default case that just breaks out.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.
Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
struct requests. This allows much easier grepping for different request
types instead of unwinding through macros.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Convert assertions to use WARN(). There are several error checks in the
code for things that should never happen. Convert them to standard
warnings so kerneloops.org will see them.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Convert wait loops to use wait_event_ macros.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Ioctl cmd value is unsigned, so change normalize_ioctl
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
As reported by sparse, cmos attribute is local.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The usage_count was being protected by a lock which was only there to
create an atomic counter.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The first thing the floppy does is read block 0 to test geometry and to
test for disk presence. If disk is not present this causes a console
warning message about failed I/O. Set flag to silence.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
These routines are all big enough that is better to let the compiler
decide to inline or not.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Set debug jiffies offset at initialization. Avoids wierd values showing
up if debugging enabled.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Change the command padding on 32-bit systems to 0 since setting it to 32
has the identical effect.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Remove a debug statement left behind by accident Ths debug statement got
left behind. It was commented out after use but not deleted.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The definition of next_command also ended up in wrong place It ended up
inside an "#ifdef CONFIG_PROCFS". Already caught by Randy Dunlap and a
couple others. Tried to put it somewhere that made sense.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
call to put_controller_in_performant_mode was in the wrong place
The call inadvertently ended up in an error path.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Make sure we register the performant mode interrupt Another blunder.
Seemed to work because the call to put_controller_into_performant_mode was
never called.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Add support for new controllers due out next year. HP must continue to
support new controllers in older distros. All vendors require support be
upstream. These controllers support only 16 commands in simple mode but
can support up to 1024 in performant mode. See patch 5/6/ We have no
marketing names yet.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Add a mode of controller operation called Performant Mode. Even though
cciss has been deprecated in favor of hpsa there are new controllers due
out next year that HP must support in older vendor distros. Vendors
require all fixes/features be upstream. These new controllers support
only 16 commands in simple mode but support up to 1024 in performant mode.
This requires us to add this support at this late date.
The performant mode transport minimizes host PCI accesses by performinf
many completions per read. PCI writes are posted so the host can write
then immediately get off the bus not waiting for the writwe to complete to
the target. In the context of performant mode the host read out to a
controller pulls all posted writes into host memory ensuring the reply
queue is coherent.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Change the return type of our interrupt access routines to bool from
unsigned long. It makes more sense that way.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Check to see if h->msi[x]_vector is set. We need this for a following
patch. Without this check we process one interrupt then stop because in
msi[x] mode the interrupt pending bit is not set. Not sure why we didn't
encounter this before.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Simplify the interrupt handler code to more closely match hpsa and to
hopefully make it easier to follow.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Clean up some code where we subit our io. The same 5 lines appeared
several times. Also helps for a following patch.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
of_device is just an alias for platform_device, so remove it entirely. Also
replace to_of_device() with to_platform_device() and update comment blocks.
This patch was initially generated from the following semantic patch, and then
edited by hand to pick up the bits that coccinelle didn't catch.
@@
@@
-struct of_device
+struct platform_device
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: David S. Miller <davem@davemloft.net>
* 'upstream/xen' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (23 commits)
xen/panic: use xen_reboot and fix smp_send_stop
Xen: register panic notifier to take crashes of xen guests on panic
xen: support large numbers of CPUs with vcpu info placement
xen: drop xen_sched_clock in favour of using plain wallclock time
pvops: do not notify callers from register_xenstore_notifier
Introduce CONFIG_XEN_PVHVM compile option
blkfront: do not create a PV cdrom device if xen_hvm_guest
support multiple .discard.* sections to avoid section type conflicts
xen/pvhvm: fix build problem when !CONFIG_XEN
xenfs: enable for HVM domains too
x86: Call HVMOP_pagetable_dying on exit_mmap.
x86: Unplug emulated disks and nics.
x86: Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.
implement O_NONBLOCK for /proc/xen/xenbus
xen: Fix find_unbound_irq in presence of ioapic irqs.
xen: Add suspend/resume support for PV on HVM guests.
xen: Xen PCI platform device driver.
x86/xen: event channels delivery on HVM.
x86: early PV on HVM features initialization.
xen: Add support for HVM hypercalls.
...
With the availablility of a sysfs device attribute for examining disk serial
numbers the ioctl is no longer needed. The user-space changes for this aren't
upstream yet so we don't have any users to worry about.
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Create a new attribute for virtio-blk devices that will fetch the serial number
of the block device. This attribute can be used by udev to create disk/by-id
symlinks for devices that don't have a UUID (filesystem) associated with them.
ATA_IDENTIFY strings are special in that they can be up to 20 chars long
and aren't required to be nul-terminated. The buffer is also zero-padded
meaning that if the serial is 19 chars or less that we get a nul-terminated
string. When copying this value into a string buffer, we must be careful to
copy up to the nul (if it present) and only 20 if it is longer and not to
attempt to nul terminate; this isn't needed.
Changes since v1:
- Added BUILD_BUG_ON() for PAGE_SIZE check
- Removed min() since BUILD_BUG_ON() handles the check
- Replaced serial_sysfs() by copying id directly to buffer
Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we want to support barriers with the cache=writethrough mode in qemu
we need to tell the block layer that we only need queue drains to
implement a barrier. Follow the model set by SCSI and IDE and assume
that there is no volatile write cache if the host doesn't advertize it.
While this might imply working barriers on old qemu versions or other
hypervisors that actually have a volatile write cache this is only a
cosmetic issue - these hypervisors don't guarantee any data integrity
with or without this patch, but with the patch we at least provide
data ordering.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It is not possible to unplug emulated cdrom devices, and PV cdroms don't
handle media insert, eject and stream, so we are better off disabling PV
cdroms when running as a Xen HVM guest.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Add a xen_emul_unplug command line option to the kernel to unplug
xen emulated disks and nics.
Set the default value of xen_emul_unplug depending on whether or
not the Xen PV frontends and the Xen platform PCI driver have
been compiled for this kernel (modules or built-in are both OK).
The user can specify xen_emul_unplug=ignore to enable PV drivers on HVM
even if the host platform doesn't support unplug.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
pavel@suse.cz no longer works, replace it with working address.
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Cc: Mike Miller <mikem@beardog.cce.hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>