Commit Graph

131525 Commits

Author SHA1 Message Date
Felix Blyakher
43f3f057c5 [XFS] Warn on transaction in flight on read-only remount
Till VFS can correctly support read-only remount without racing,
use WARN_ON instead of BUG_ON on detecting transaction in flight
after quiescing filesystem.

Signed-off-by: Felix Blyakher <felixb@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2009-02-03 11:04:54 -06:00
Dave Chinner
6139a23609 xfs: Check buffer lengths in log recovery
Before trying to obtain, read or write a buffer,
check that the buffer length is actually valid. If
it is not valid, then something read in the recovery
process has been corrupted and we should abort
recovery.

Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-02-03 11:01:32 -06:00
Felix Blyakher
6d2160bfe7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into for-linus 2009-02-03 10:38:41 -06:00
Linus Torvalds
52a84ec2f3 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata: implement HORKAGE_1_5_GBPS and apply it to WD My Book
  libata: add no penalty retry request for EH device handling routines
  libata: improve probe failure handling
  libata: add @spd_limit to sata_down_spd_limit()
  libata: clear dev->ering in smarter way
  libata: check onlineness before using SPD in sata_down_spd_limit()
  libata: move ata_dev_disable() to libata-eh.c
  libata: fix EH device failure handling
  sata_nv: ck804 has borked hardreset too
  ide/libata: fix ata_id_is_cfa() (take 4)
  libata: fix kernel-doc warnings
  ahci: add a module parameter to ignore the SSS flags for async scanning
  sata_mv: Fix chip type for Hightpoint RocketRaid 1740/1742
  [libata] sata_sil: Fix compilation error with libata debugging enabled
2009-02-03 07:39:55 -08:00
David S. Miller
b3ff29d2cc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/Kconfig
2009-02-03 00:15:35 -08:00
David S. Miller
fb53fde976 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 2009-02-02 23:55:27 -08:00
Michael Tokarev
1bded710a5 tun: Check supplemental groups in TUN/TAP driver.
Michael Tokarev wrote:
[]
> 2, and this is the main one: How about supplementary groups?
>
> Here I have a valid usage case: a group of testers running various
> versions of windows using KVM (kernel virtual machine), 1 at a time,
> to test some software.  kvm is set up to use bridge with a tap device
> (there should be a way to connect to the machine).  Anyone on that group
> has to be able to start/stop the virtual machines.
>
> My first attempt - pretty obvious when I saw -g option of tunctl - is
> to add group ownership for the tun device and add a supplementary group
> to each user (their primary group should be different).  But that fails,
> since kernel only checks for egid, not any other group ids.
>
> What's the reasoning to not allow supplementary groups and to only check
> for egid?

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 23:34:56 -08:00
Maciej Sosnowski
eb4400e3a0 dca: redesign locks to fix deadlocks
Change spin_locks to irqsave to prevent dead-locks.
Protect adding and deleting to/from dca_providers list.
Drop the lock during dca_sysfs_add_req() and dca_sysfs_remove_req() calls
as they might sleep (use GFP_KERNEL allocation).

Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 23:26:57 -08:00
Frederic Weisbecker
1a5645bc90 connector: create connector workqueue only while needed once
The netlink connector uses its own workqueue to relay the datas sent
from userspace to the appropriate callback.  If you launch the test
from Documentation/connector and change it a bit to send a high flow
of data, you will see thousands of events coming to the "cqueue"
workqueue by looking at the workqueue tracer.

This flow of events can be sent very quickly. So, to not encumber the
kevent workqueue and delay other jobs, the "cqueue" workqueue should
remain.

But this workqueue is pointless most of the time, it will always be
created (assuming you have built it of course) although only
developpers with specific needs will use it.

So avoid this "most of the time useless task", this patch proposes to
create this workqueue only when needed once.  The first jobs to be
sent to connector callbacks will be sent to kevent while the "cqueue"
thread creation will be scheduled to kevent too.

The following jobs will continue to be scheduled to keventd until the
cqueue workqueue is created, and then the rest of the jobs will
continue to perform as usual, through this dedicated workqueue.

Each time I tested this patch, only the first event was sent to
keventd, the rest has been sent to cqueue which have been created
quickly.

Also, this patch fixes some trailing whitespaces on the connector files.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 23:22:04 -08:00
Roel Kluin
ff01b91636 cassini/sungem: limit reaches -1, but 0 tested
while (limit--)
	if (test())
		break;

if (limit <= 0)
	goto test_failed;

In the last iteration, limit is decremented after the test to 0.
If just thereafter test() succeeds and a break occurs, the goto
still occurs because limit is 0.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 23:19:50 -08:00
Stephen Rothwell
47a4a0e766 sparc: fixup for sparseirq changes
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 22:14:28 -08:00
David S. Miller
eeabac7386 sparc64: Validate kernel generated fault addresses on sparc64.
In order to handle all of the cases of address calculation overflow
properly, we run sparc 32-bit processes in "address masking" mode
when running on a 64-bit kernel.

Address masking mode zeros out the top 32-bits of the address
calculated for every load and store instruction.

However, when we're in privileged mode we have to run with that
address masking mode disabled even when accessing userspace from
the kernel.

To "simulate" the address masking mode we clear the top-bits by
hand for 32-bit processes in the fault handler.

It is the responsibility of code in the compat layer to properly
zero extend addresses used to access userspace.  If this isn't
followed properly we can get into a fault loop.

Say that the user address is 0xf0000000 but for whatever reason
the kernel code sign extends this to 64-bit, and then the kernel
tries to access the result.

In such a case we'll fault on address 0xfffffffff0000000 but the fault
handler will process that fault as if it were to address 0xf0000000.
We'll loop faulting forever because the fault never gets satisfied.

So add a check specifically for this case, when the kernel is faulting
on a user address access and the addresses don't match up.

This code path is sufficiently slow path, and this bug is sufficiently
painful to diagnose, that this kind of bug check is warranted.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 22:08:15 -08:00
David S. Miller
802c64b310 sparc64: On non-Niagara, need to touch NMI watchdog in NOHZ mode.
When we're idling in NOHZ mode, timer interrupts are not running.

Evidence of processing timer interrupts is what the NMI watchdog
uses to determine if the CPU is stuck.

On Niagara, we'll yield the cpu.  This will make the cpu, at
worst, hang out in the hypervisor until an interrupt arrives.
This will prevent the NMI watchdog timer from firing.

However on non-Niagara we just loop executing instructions
which will cause the NMI watchdog to keep firing.  It won't
see timer interrupts happening so it will think the cpu is
stuck.

Fix this by touching the NMI watchdog in the cpu idle loop
on non-Niagara machines.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 21:57:48 -08:00
Roel Kluin
46578a6913 net: variables reach -1, but 0 tested
while (timeout--) { ... }

timeout becomes -1 if the loop isn't ended otherwise, not 0.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 21:39:02 -08:00
Tejun Heo
9062712fa9 libata: implement HORKAGE_1_5_GBPS and apply it to WD My Book
3Gbps is often much more prone to transmission failures.  It's usually
okay to let EH handle speed down after transmission failures but some
WD My Book drives completely shutdown after certain transmission
failures and after it only power cycling can revive them.  Combined
with the fact that external drives often end up with cable assembly
which is longer than usual and more likely to have intervening gender,
this makes these drives very likely to shutdown under certain
configurations virtually rendering them unusable.

This patch implements HOARKGE_1_5_GBPS and applies it to WD My Book
such that 1.5Gbps is forced once the device is identified.

Please take a look at the following bz for related reports.

  http://bugzilla.kernel.org/show_bug.cgi?id=9913

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:04:31 -05:00
Tejun Heo
cf9a590a9e libata: add no penalty retry request for EH device handling routines
Let -EAGAIN from EH device handling routines trigger EH retry without
consuming its tries count.  This will be used to implement link SPD
horkage which requires hardreset to adjust SPD without affecting other
EH decisions.  As it bypasses the forward progress guarantee provided
by the tries count, the requester is responsible for ensuring forward
progress.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:04:19 -05:00
Tejun Heo
c2c7a89c5e libata: improve probe failure handling
When link is flaky at high speed, it isn't uncommon for a device to
repeatedly fail probing sequence early after successfully negotiating
high link speed.  This often leads to consecutive hotplug events
without successful probing.

This patch improves libata EH such that it remembers probing trials
and if there have been more than two unsuccessful trials in the past
60 seconds, slows down link speed to 1.5Gbps.

As link speed negotiation is the duty of the PHY layer proper, the
goal of this fallback mechanism is to provide the last resort when
everything else fails, which unfortunately happens not too
infrequently, so no fancy 6->3->1.5 speeding down or highest
successful transmission speed seen kind of logics (yet).

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:03:34 -05:00
Tejun Heo
a07d499b47 libata: add @spd_limit to sata_down_spd_limit()
Add @spd_limit to sata_down_spd_limit() so that the caller can specify
the SPD limit it wants.  This parameter doesn't get in the way even
when it's too low.  The closest possible limit is applied.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:03:22 -05:00
Tejun Heo
99cf610aa4 libata: clear dev->ering in smarter way
dev->ering used to be cleared together with the rest of ata_device in
ata_dev_init() which is called whenever a probing event occurs.
dev->ering is about to be used to track probing failures so it needs
to remain persistent over multiple porbing events.  This patch
achieves this by doing the following.

* Instead of CLEAR_OFFSET, define CLEAR_BEGIN and CLEAR_END and only
  clear between BEGIN and END.  ering is moved after END.  The split
  of persistent area is to allow hotter items remain at the head.

* ering is explicitly cleared on ata_dev_disable() and when device
  attach succeeds.  So, ering is persistent throug a device's life
  time (unless explicitly cleared of course) and also through periods
  inbetween disablement of an attached device and successful detection
  of the next one.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:03:17 -05:00
Tejun Heo
9913ff8abf libata: check onlineness before using SPD in sata_down_spd_limit()
sata_down_spd_limit() should check whether the link is online before
using the SPD value to determine how to limit the link speed.  Factor
out onlineness test and test it from sata_down_spd_limit().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:03:08 -05:00
Tejun Heo
678afac678 libata: move ata_dev_disable() to libata-eh.c
ata_dev_disable() is about to be more tightly integrated into EH
logic.  Move it to libata-eh.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:03:00 -05:00
Tejun Heo
d89293abd9 libata: fix EH device failure handling
The dev->pio_mode > XFER_PIO_0 test is there to avoid unnecessary
speed down warning messages but it accidentally disabled SATA link spd
down during configuration phase after reset where PIO mode is always
zero.

This patch fixes the problem by moving the test where it belongs.
This makes libata probing sequence behave better when the connection
is flaky at higher link speeds which isn't too uncommon for eSATA
devices.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:02:57 -05:00
Tejun Heo
8d993eaa9c sata_nv: ck804 has borked hardreset too
While playing with nvraid, I found out that rmmoding and insmoding
often trigger hardreset failure on the first port (the second one was
always okay).  Seriously, how diverse can you get with hardreset
behaviors?  Anyways, make ck804 use noclassify variant too.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 23:02:38 -05:00
Sergei Shtylyov
2999b58b79 ide/libata: fix ata_id_is_cfa() (take 4)
When checking for the CFA feature set support, ata_id_is_cfa() tests bit 2 in
word 82 of the identify data instead the word 83;  it also checks the ATA/PI
version support in the word 80 (which the CompactFlash specifications have as
reserved), this having no slightest chance to work on the modern CF cards that
don't have 0x848A in the word 0...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 22:45:10 -05:00
Randy Dunlap
5eb66fe05f libata: fix kernel-doc warnings
Fix libata kernel-doc warnings:

Warning(linux-next-20090120//drivers/ata/libata-core.c:4720): Excess function parameter 'dev' description in 'ata_qc_new'
Warning(linux-next-20090120//drivers/ata/libata-scsi.c:428): No description found for parameter 'ap'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 22:41:45 -05:00
Arjan van de Ven
f3d7f23f87 ahci: add a module parameter to ignore the SSS flags for async scanning
The SSS flag, which directs the OS to spin up one disk at a time
to not have the PSU blow out, sometimes gets set even when not needed.
The effect of this is a longer-than-needed boot time.

This patch adds a module parameter that makes the driver ignore SSS
at least as far as the parallel scan during boot is concerned...

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 22:41:39 -05:00
Mark Lord
4462254ac6 sata_mv: Fix chip type for Hightpoint RocketRaid 1740/1742
Fix chip type for the Highpoint RocketRAID 1740 and 1742 PCI cards.
These really do have Marvell 6042 chips on them, rather than the 5081 chip.

Confirmed by multiple (two) users (for the 1740), and by examining
the product photographs from Highpoint's web site.

Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 22:41:29 -05:00
Pasi Kärkkäinen
41137aa61c [libata] sata_sil: Fix compilation error with libata debugging enabled
I tried compiling 2.6.29-rc1 and 2.6.29-rc3 with libata debugging enabled
and got the following error:

  CC [M]  drivers/ata/sata_sil.o
drivers/ata/sata_sil.c: In function 'sil_fill_sg':
drivers/ata/sata_sil.c:327: error: 'pi' undeclared (first use in this function)
drivers/ata/sata_sil.c:327: error: (Each undeclared identifier is reported only once
drivers/ata/sata_sil.c:327: error: for each function it appears in.)
make[2]: *** [drivers/ata/sata_sil.o] Error 1
make[1]: *** [drivers/ata] Error 2
make: *** [drivers] Error 2

include/linux/libata.h has the following enabled:

#define ATA_DEBUG
#define ATA_VERBOSE_DEBUG
#define ATA_IRQ_TRAP

This fixes the compilation.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-02-02 22:38:29 -05:00
Linus Torvalds
b1792e3670 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI hotplug: Change link order of pciehp & acpiphp
  PCI hotplug: fakephp: Allocate PCI resources before adding the device
  PCI MSI: Fix undefined shift by 32
  PCI PM: Do not wait for buses in B2 or B3 during resume
  PCI PM: Power up devices before restoring their state
  PCI PM: Fix hibernation breakage on EeePC 701
  PCI: irq and pci_ids patch for Intel Tigerpoint DeviceIDs
  PCI PM: Fix suspend error paths and testing facility breakage
2009-02-02 19:28:58 -08:00
Linus Torvalds
859281ff37 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slub: fix per cpu kmem_cache_cpu array memory leak
  kmalloc: return NULL instead of link failure
2009-02-02 19:27:00 -08:00
Linus Torvalds
93bfbd71db Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  fbdev/atyfb: Fix DSP config on some PowerMacs & PowerBooks
  powerpc: Fix oops on some machines due to incorrect pr_debug()
  powerpc/ps3: Printing fixups for l64 to ll64 convserion drivers/net
  powerpc/5200: update device tree binding documentation
  powerpc/5200: Bugfix for PCI mapping of memory and IMMR
  powerpc/5200: update defconfigs
2009-02-02 19:26:44 -08:00
Linus Torvalds
31c952dcf8 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched_rt: don't use first_cpu on cpumask created with cpumask_and
  sched: fix buddie group latency
  sched: clear buddies more aggressively
  sched: symmetric sync vs avg_overlap
  sched: fix sync wakeups
  cpuset: fix possible deadlock in async_rebuild_sched_domains
2009-02-02 19:26:29 -08:00
Linus Torvalds
9e6235e997 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (45 commits)
  V4L/DVB (10411): s5h1409: Perform s5h1409 soft reset after tuning
  V4L/DVB (10403): saa7134-alsa: saa7130 doesn't support digital audio
  V4L/DVB (10229): ivtv: fix memory leak
  V4L/DVB (10385): gspca - main: Fix memory leak when USB disconnection while streaming.
  V4L/DVB (10325): em28xx: Fix for fail to submit URB with IRQs and Pre-emption Disabled
  V4L/DVB (10317): radio-mr800: fix radio->muted and radio->stereo
  V4L/DVB (10314): cx25840: ignore TUNER_SET_CONFIG in the command callback.
  V4L/DVB (10288): af9015: bug fix: stick does not work always when plugged
  V4L/DVB (10287): af9015: fix second FE
  V4L/DVB (10270): saa7146: fix unbalanced mutex_lock/unlock
  V4L/DVB (10265): budget.c driver: Kernel oops: "BUG: unable to handle kernel paging request at ffffffff
  V4L/DVB (10261): em28xx: fix kernel panic on audio shutdown
  V4L/DVB (10257): em28xx: Fix for KWorld 330U Board
  V4L/DVB (10256): em28xx: Fix for KWorld 330U AC97
  V4L/DVB (10254): em28xx: Fix audio URB transfer buffer race condition
  V4L/DVB (10250): cx25840: fix regression: fw not loaded on first use
  V4L/DVB (10248): v4l-dvb: fix a bunch of compile warnings.
  V4L/DVB (10243): em28xx: fix compile warning
  V4L/DVB (10240): Fix obvious swapped names in v4l2_subdev logic
  V4L/DVB (10233): [PATCH] Terratec Cinergy DT XS Diversity new USB ID (0ccd:0081)
  ...
2009-02-02 19:26:06 -08:00
Linus Torvalds
5c350d93ff Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
  pxamci: enable DMA for write ops after CMD/RESP
  pxamci: replace #ifdef CONFIG_PXA27x with if (cpu_is_pxa27x())
  ricoh_mmc: Use suspend_late/resume_early
  mmci: Add support for ST Micro derivate
  mmc: Add a MX2/MX3 specific SDHC driver
2009-02-02 19:24:14 -08:00
Linus Torvalds
017f51788f Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
  icside: fix PCB version 6 support (v2)
  tx4939ide: typo fix and minor cleanup
  ide: add CS5536 host driver (v3)
  ide: Force VIA IDE legacy interrupts for AmigaOne boards
  IDE: Unregister and disable devices if initialization fails.
  ide: fix ide_register_port() failure handling
  ide: struct device - replace bus_id with dev_name(), dev_set_name()
  ide-cd: fix DMA for non bio-backed requests
2009-02-02 19:23:49 -08:00
Linus Torvalds
17294ab2ca Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb
* 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb:
  uwb: lock rc->rsvs_lock with spin_lock_bh()
  wusb: timeout when waiting for ASL/PZL updates in whci-hcd
  uwb: remove unused #include <version.h>'s
  wusb: return -ENOTCONN when resetting a port with no connected device
  uwb: safely remove all reservations
2009-02-02 19:20:17 -08:00
Linus Torvalds
86adf8adfc Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: add text file detailing queue/ sysfs files
  bio.h: If they MUST be inlined, then use __always_inline
  Fix misleading comment in bio.h
  block: fix inconsistent parenthesisation of QUEUE_FLAG_DEFAULT
  block: fix oops in blk_queue_io_stat()
2009-02-02 19:19:50 -08:00
Mark McLoughlin
3fff0179e3 virtio-pci: do not oops on config change if driver not loaded
The host really shouldn't be notifying us of config changes
before the device status is VIRTIO_CONFIG_S_DRIVER or
VIRTIO_CONFIG_S_DRIVER_OK.

However, if we do happen to be interrupted while we're not
attached to a driver, we really shouldn't oops. Prevent
this simply by checking that device->driver is non-NULL
before trying to notify the driver of config changes.

Problem observed by doing a "set_link virtio.0 down" with
QEMU before the net driver had been loaded.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-02 19:17:56 -08:00
Eric Dumazet
720eba31f4 modules: Use a better scheme for refcounting
Current refcounting for modules (done if CONFIG_MODULE_UNLOAD=y) is
using a lot of memory.

Each 'struct module' contains an [NR_CPUS] array of full cache lines.

This patch uses existing infrastructure (percpu_modalloc() &
percpu_modfree()) to allocate percpu space for the refcount storage.

Instead of wasting NR_CPUS*128 bytes (on i386), we now use
nr_cpu_ids*sizeof(local_t) bytes.

On a typical distro, where NR_CPUS=8, shiping 2000 modules, we reduce
size of module files by about 2 Mbytes. (1Kb per module)

Instead of having all refcounters in the same memory node - with TLB misses
because of vmalloc() - this new implementation permits to have better
NUMA properties, since each  CPU will use storage on its preferred node,
thanks to percpu storage.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-02 19:17:55 -08:00
Mark Fasheh
fd4ef23196 ocfs2: add quota call to ocfs2_remove_btree_range()
We weren't reclaiming the clusters which get free'd from this function,
so any user punching holes in a file would still have those bytes accounted
against him/her. Add the call to vfs_dq_free_space_nodirty() to fix this.
Interestingly enough, the journal credits calculation already took this into
account.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Jan Kara <jack@suse.cz>
2009-02-02 14:20:20 -08:00
Sunil Mushran
a4b91965d3 ocfs2: Wakeup the downconvert thread after a successful cancel convert
When two nodes holding PR locks on a resource concurrently attempt to
upconvert the locks to EX, the master sends a BAST to one of the nodes. This
message tells that node to first cancel convert the upconvert request,
followed by downconvert to a NL. Only when this lock is downconverted to NL,
can the master upconvert the first node's lock to EX.

While the fs was doing the cancel convert, it was forgetting to wake up the
dc thread after a successful cancel, leading to a deadlock.

Reported-and-Tested-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-02 14:20:19 -08:00
Tao Ma
554e7f9e04 ocfs2: Access the xattr bucket only before modifying it.
In ocfs2_xattr_value_truncate, we may call b-tree codes which will
extend the journal transaction. It has a potential problem that it
may let the already-accessed-but-not-dirtied buffers gone. So we'd
better access the bucket after we call ocfs2_xattr_value_truncate.
And as for the root buffer for the xattr value, b-tree code will
acess and dirty it, so we don't need to worry about it.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-02 14:20:18 -08:00
Joel Becker
0e0333429a configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item()
When attaching default groups (subdirs) of a new group (in mkdir() or
in configfs_register()), configfs recursively takes inode's mutexes
along the path from the parent of the new group to the default
subdirs. This is needed to ensure that the VFS will not race with
operations on these sub-dirs. This is safe for the following reasons:

- the VFS allows one to lock first an inode and second one of its
  children (The lock subclasses for this pattern are respectively
  I_MUTEX_PARENT and I_MUTEX_CHILD);
- from this rule any inode path can be recursively locked in
  descending order as long as it stays under a single mountpoint and
  does not follow symlinks.

Unfortunately lockdep does not know (yet?) how to handle such
recursion.

I've tried to use Peter Zijlstra's lock_set_subclass() helper to
upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know
that we might recursively lock some of their descendant, but this
usage does not seem to fit the purpose of lock_set_subclass() because
it leads to several i_mutex locked with subclass I_MUTEX_PARENT by
the same task.

>From inside configfs it is not possible to serialize those recursive
locking with a top-level one, because mkdir() and rmdir() are already
called with inodes locked by the VFS. So using some
mutex_lock_nest_lock() is not an option.

I am proposing two solutions:
1) one that wraps recursive mutex_lock()s with
   lockdep_off()/lockdep_on().
2) (as suggested earlier by Peter Zijlstra) one that puts the
   i_mutexes recursively locked in different classes based on their
   depth from the top-level config_group created. This
   induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the
   nesting of configfs default groups whenever lockdep is activated
   but this limit looks reasonably high. Unfortunately, this alos
   isolates VFS operations on configfs default groups from the others
   and thus lowers the chances to detect locking issues.

This patch implements solution 1).

Solution 2) looks better from lockdep's point of view, but fails with
configfs_depend_item(). This needs to rework the locking
scheme of configfs_depend_item() by removing the variable lock recursion
depth, and I think that it's doable thanks to the configfs_dirent_lock.
For now, let's stick to solution 1).

Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-02 14:20:18 -08:00
Jan Kara
f8afead716 ocfs2: Fix possible deadlock in ocfs2_write_dquot()
It could happen that some limit has been set via quotactl() and in parallel
->mark_dirty() is called from another thread doing e.g. dquot_alloc_space(). In
such case ocfs2_write_dquot() must not try to sync the dquot because that needs
global quota lock but that ranks above transaction start.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-02 14:20:17 -08:00
Jan Kara
ea455f8ab6 ocfs2: Push out dropping of dentry lock to ocfs2_wq
Dropping of last reference to dentry lock is a complicated operation involving
dropping of reference to inode. This can get complicated and quota code in
particular needs to obtain some quota locks which leads to potential deadlock.
Thus we defer dropping of inode reference to ocfs2_wq.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-02 14:20:16 -08:00
Ron Mercer
0047e5d240 qlge: bugfix: Add missing netif_napi_del call.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 13:54:31 -08:00
Ron Mercer
e78f5fa7cc qlge: bugfix: Add flash offset for second port.
Without this the 2nd port gets first ports MAC addr.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 13:54:15 -08:00
Ron Mercer
26351479ed qlge: bugfix: Fix endian issue when reading flash.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 13:53:57 -08:00
Eric Dumazet
e408b8dcb5 udp: increments sk_drops in __udp_queue_rcv_skb()
Commit 93821778de (udp: Fix rcv socket
locking) accidentally removed sk_drops increments for UDP IPV4
sockets.

This field can be used to detect incorrect sizing of socket receive
buffers.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 13:41:57 -08:00
David S. Miller
0afd4a21ba net: Fix userland breakage wrt. linux/if_tunnel.h
Reported by Andrew Walrond <andrew@walrond.org>

Changeset c19e654ddb
("gre: Add netlink interface") added an include
of linux/ip.h to linux/if_tunnel.h

We can't really let that get exposed to userspace
because this conflicts with types defined in netinet/ip.h
which userland is almost certainly going to have included
either explicitly or implicitly.

So guard this include with a __KERNEL__ ifdef.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-02 13:27:44 -08:00