Commit Graph

244667 Commits

Author SHA1 Message Date
Trond Myklebust
fd954ae124 NFSv4.1: Don't loop forever in nfs4_proc_create_session
If a server for some reason keeps sending NFS4ERR_DELAY errors, we can end
up looping forever inside nfs4_proc_create_session, and so the usual
mechanisms for detecting if the nfs_client is dead don't work.

Fix this by ensuring that we loop inside the nfs4_state_manager thread
instead.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-24 14:28:18 -04:00
Dan Rosenberg
5f6279da37 [SCSI] pmcraid: reject negative request size
There's a code path in pmcraid that can be reached via device ioctl that
causes all sorts of ugliness, including heap corruption or triggering
the OOM killer due to consecutive allocation of large numbers of pages.
Not especially relevant from a security perspective, since users must
have CAP_SYS_ADMIN to open the character device.

First, the user can call pmcraid_chr_ioctl() with a type
PMCRAID_PASSTHROUGH_IOCTL.  A pmcraid_passthrough_ioctl_buffer
is copied in, and the request_size variable is set to
buffer->ioarcb.data_transfer_length, which is an arbitrary 32-bit signed
value provided by the user.

If a negative value is provided here, bad things can happen.  For
example, pmcraid_build_passthrough_ioadls() is called with this
request_size, which immediately calls pmcraid_alloc_sglist() with a
negative size.  The resulting math on allocating a scatter list can
result in an overflow in the kzalloc() call (if num_elem is 0, the
sglist will be smaller than expected), or if num_elem is unexpectedly
large the subsequent loop will call alloc_pages() repeatedly, a high
number of pages will be allocated and the OOM killer might be invoked.

Prevent this value from being negative in pmcraid_ioctl_passthrough().

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: stable@kernel.org
Cc: Anil Ravindranath <anil_ravindranath@pmc-sierra.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-04-24 12:15:07 -05:00
James Bottomley
86cbfb5607 [SCSI] put stricter guards on queue dead checks
SCSI uses request_queue->queuedata == NULL as a signal that the queue
is dying.  We set this state in the sdev release function.  However,
this allows a small window where we release the last reference but
haven't quite got to this stage yet and so something will try to take
a reference in scsi_request_fn and oops.  It's very rare, but we had a
report here, so we're pushing this as a bug fix

The actual fix is to set request_queue->queuedata to NULL in
scsi_remove_device() before we drop the reference.  This causes
correct automatic rejects from scsi_request_fn as people who hold
additional references try to submit work and prevents anything from
getting a new reference to the sdev that way.

Cc: stable@kernel.org
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-04-24 11:02:17 -05:00
Mike Snitzer
0b8393578c [SCSI] scsi_dh: fix reference counting in scsi_dh_activate error path
Commit db422318cb ([SCSI] scsi_dh:
propagate SCSI device deletion) introduced a regression where the device
reference is not dropped prior to scsi_dh_activate's early return from
the error path.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org # 2.6.38
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-04-24 11:02:09 -05:00
Dan Rosenberg
a1f74ae82d [SCSI] mpt2sas: prevent heap overflows and unchecked reads
At two points in handling device ioctls via /dev/mpt2ctl, user-supplied
length values are used to copy data from userspace into heap buffers
without bounds checking, allowing controllable heap corruption and
subsequently privilege escalation.

Additionally, user-supplied values are used to determine the size of a
copy_to_user() as well as the offset into the buffer to be read, with no
bounds checking, allowing users to read arbitrary kernel memory.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: stable@kernel.org
Acked-by: Eric Moore <eric.moore@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-04-24 11:01:59 -05:00
Linus Torvalds
5dd12af05c Merge branch 'dcache-cleanup'
* dcache-cleanup:
  vfs: get rid of insane dentry hashing rules
2011-04-24 08:51:15 -07:00
Linus Torvalds
8f7544682c Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata: ahci_start_engine compliant to AHCI spec
  ata: pata_at91.c bugfix for initial_timing initialisation
  ata: pata_at91.c bugfix for high master clock
  ahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDs
  ata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDs
  libata: Pioneer DVR-216D can't do SETXFER
  ahci: don't enable port irq before handler is registered
  libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65
  libata: Kill unused ATA_DFLAG_{H|D}IPM flags
  ahci: EM supported message type sysfs attribute
2011-04-24 08:45:37 -07:00
Linus Torvalds
1f91f48b65 Merge branch 'for-linus' of git://git.infradead.org/ubifs-2.6
* 'for-linus' of git://git.infradead.org/ubifs-2.6:
  UBIFS: fix master node recovery
  UBIFS: fix false assertion warning in case of I/O failures
  UBIFS: fix false space checking failure
2011-04-24 08:42:15 -07:00
Jian Peng
270dac35c2 libata: ahci_start_engine compliant to AHCI spec
At the end of section 10.1 of AHCI spec (rev 1.3), it states

Software shall not set PxCMD.ST to 1 until it is determined that
a functoinal device is present on the port as determined by
PxTFD.STS.BSY=0, PxTFD.STS.DRQ=0 and PxSSTS.DET=3h

Even though most AHCI host controller works without this check,
specific controller will fail under this condition.

Signed-off-by: Jian Peng <jipeng2005@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24 11:35:40 -04:00
Igor Plyatov
792d37af35 ata: pata_at91.c bugfix for initial_timing initialisation
The "struct ata_timing" must contain 10 members, but ".dmack_hold" member was
forgotten for "initial_timing" initialisation. This patch fixes such a problem.

Signed-off-by: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24 11:34:06 -04:00
Igor Plyatov
9719b8f5bc ata: pata_at91.c bugfix for high master clock
The AT91SAM9 microcontrollers with master clock higher then 105 MHz
and PIO0, have overflow of the NCS_RD_PULSE value in the MSB. This
lead to "NCS_RD_PULSE" pulse longer then "NRD_CYCLE" pulse and driver
does not detect ATA device.

Signed-off-by: Igor Plyatov <plyatov@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24 11:34:06 -04:00
Seth Heasley
181e3ceaba ahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDs
The previously submitted patch was word-wrapped.

This patch adds the AHCI-mode SATA DeviceIDs for the Intel Panther Point PCH.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24 11:34:05 -04:00
Seth Heasley
4a836c701a ata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDs
The previously submitted patch was word-wrapped.

This patch adds the IDE-mode SATA DeviceIDs for the Intel Panther
Point PCH.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24 11:34:05 -04:00
Jeff Mahoney
d69cf28cd2 libata: Pioneer DVR-216D can't do SETXFER
Commit 4a5610a04d fixed an issue with
 the Pioneer DVR-212D not handling SETXFER correctly. An openSUSE user
 reported a similar issue with his DVR-216D that the NOSETXFER horkage
 worked around for him as well.

 This patch adds the DVR-216D (1.08) to the horkage list for NOSETXFER.

 The issue was reported at:
 https://bugzilla.novell.com/show_bug.cgi?id=679143

Reported-by: Volodymyr Kyrychenko <vladimir.kirichenko@gmail.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24 11:34:05 -04:00
Maxime Bizon
7b3a24c57d ahci: don't enable port irq before handler is registered
The ahci_pmp_attach() & ahci_pmp_detach() unmask port irqs, but they
are also called during port initialization, before ahci host irq
handler is registered. On ce4100 platform, this sometimes triggers
"irq 4: nobody cared" message when loading driver.

Fixed this by not touching the register if the port is in frozen
state, and mark all uninitialized port as frozen.

Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24 11:34:05 -04:00
Tejun Heo
ae01b2493c libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65
NVIDIA mcp65 familiy of controllers cause command timeouts when DIPM
is used.  Implement ATA_FLAG_NO_DIPM and apply it.

This problem was reported by Stefan Bader in the following thread.

 http://thread.gmane.org/gmane.linux.ide/48841

stable: applicable to 2.6.37 and 38.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24 11:32:16 -04:00
Tejun Heo
3f7ac1d667 libata: Kill unused ATA_DFLAG_{H|D}IPM flags
ATA_DFLAG_{H|D}IPM flags are no longer used.  Kill them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24 11:32:03 -04:00
Hannes Reinecke
6e5fe5b12c ahci: EM supported message type sysfs attribute
This patch adds an sysfs attribute 'em_message_supported' to the
ahci host device which prints out the supported enclosure management
message types.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-04-24 11:31:31 -04:00
Ben Hutchings
3ba4162115 kconfig: Avoid buffer underrun in choice input
Commit 40aee729b3 ('kconfig: fix default value for choice input')
fixed some cases where kconfig would select the wrong option from a
choice with a single valid option and thus enter an infinite loop.

However, this broke the test for user input of the form 'N?', because
when kconfig selects the single valid option the input is zero-length
and the test will read the byte before the input buffer.  If this
happens to contain '?' (as it will in a mips build on Debian unstable
today) then kconfig again enters an infinite loop.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@kernel.org [2.6.17+]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-24 08:24:31 -07:00
Linus Torvalds
dea3667bc3 vfs: get rid of insane dentry hashing rules
The dentry hashing rules have been really quite complicated for a long
while, in odd ways.  That made functions like __d_drop() very fragile
and non-obvious.

In particular, whether a dentry was hashed or not was indicated with an
explicit DCACHE_UNHASHED bit.  That's despite the fact that the hash
abstraction that the dentries use actually have a 'is this entry hashed
or not' model (which is a simple test of the 'pprev' pointer).

The reason that was done is because we used the normal 'is this entry
unhashed' model to mark whether the dentry had _ever_ been hashed in the
dentry hash tables, and that logic goes back many years (commit
b3423415fb: "dcache: avoid RCU for never-hashed dentries").

That, in turn, meant that __d_drop had totally different unhashing logic
for the dentry hash table case and for the anonymous dcache case,
because in order to use the "is this dentry hashed" logic as a flag for
whether it had ever been on the RCU hash table, we had to unhash such a
dentry differently so that we'd never think that it wasn't 'unhashed'
and wouldn't be free'd correctly.

That's just insane.  It made the logic really hard to follow, when there
were two different kinds of "unhashed" states, and one of them (the one
that used "list_bl_unhashed()") really had nothing at all to do with
being unhashed per se, but with a very subtle lifetime rule instead.

So turn all of it around, and make it logical.

Instead of having a DENTRY_UNHASHED bit in d_flags to indicate whether
the dentry is on the hash chains or not, use the hash chain unhashed
logic for that.  Suddenly "d_unhashed()" just uses "list_bl_unhashed()",
and everything makes sense.

And for the lifetime rule, just use an explicit DENTRY_RCUACCEES bit.
If we ever insert the dentry into the dentry hash table so that it is
visible to RCU lookup, we mark it DENTRY_RCUACCESS to show that it now
needs the RCU lifetime rules.  Now suddently that test at dentry free
time makes sense too.

And because unhashing now is sane and doesn't depend on where the dentry
got unhashed from (because the dentry hash chain details doesn't have
some subtle side effects), we can re-unify the __d_drop() logic and use
common code for the unhashing.

Also fix one more open-coded hash chain bit_spin_lock() that I missed in
the previous chain locking cleanup commit.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-24 07:58:46 -07:00
Linus Torvalds
686c4cbb10 Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: Add missing syscore_suspend() and syscore_resume() calls
  PM: Fix error code paths executed after failing syscore_suspend()
2011-04-23 22:35:16 -07:00
Linus Torvalds
b07ad9967f vfs: get rid of 'struct dcache_hash_bucket' abstraction
It's a useless abstraction for 'hlist_bl_head', and it doesn't actually
help anything - quite the reverse.  All the users end up having to know
about the hlist_bl_head details anyway, using 'struct hlist_bl_node *'
etc. So it just makes the code look confusing.

And the cost of it is extra '&b->head' syntactic noise, but more
importantly it spuriously makes the hash table dentry list look
different from the per-superblock DCACHE_DISCONNECTED dentry list.

As a result, the code ended up using ad-hoc locking for one case and
special helper functions for what is really another totally identical
case in the very same function.

Make it all look and work the same.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-23 22:32:03 -07:00
Linus Torvalds
0f1d9f78ce Merge branch 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6
* 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
  tty/n_gsm: fix bug in CRC calculation for gsm1 mode
  serial/imx: read cts state only after acking cts change irq
  parport_pc.c: correctly release the requested region for the IT887x
2011-04-22 16:19:19 -07:00
Andi Kleen
8c9e80ed27 SECURITY: Move exec_permission RCU checks into security modules
Right now all RCU walks fall back to reference walk when CONFIG_SECURITY
is enabled, even though just the standard capability module is active.
This is because security_inode_exec_permission unconditionally fails
RCU walks.

Move this decision to the low level security module. This requires
passing the RCU flags down the security hook. This way at least
the capability module and a few easy cases in selinux/smack work
with RCU walks with CONFIG_SECURITY=y

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-22 16:17:29 -07:00
Linus Torvalds
8d082f8f3f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Fix unused warnings when !SND_HDA_NEEDS_RESUME
  ALSA: hda - Add a fix-up for Acer dmic with ALC271x codec
  ASoC: add a module alias to the FSI driver
  ALSA: emu10k1 - Fix "Music" controls to "Synth" controls in documents
  ARM: s3c2440: gta02; Register dfbmcs320 device for BT audio interface
  ASoC: codecs: JZ4740: Fix OOPS
  ASoC: Fix output PGA enabling in wm_hubs CODECs
  ASoC: sn95031: decorate function with __devexit_p()
  ASoC: SAMSUNG: Fix the inverted clocks handling for pcm driver
  ASoC: sst_platform: Fix lock acquring
  ASoC: fsi: driver safely remove for against irq
  ASoC: fsi: modify vague PM control on probe
  ASoC: fsi: take care in failing case of dai register
  MAINTAINERS: Update Samsung ASoC maintainer's id
  ASoC: WM8903: HP and Line out PGA/mixer DAPM fixes
  ASoC: Set left channel volume update bits for WM8994
  ASoC: fix config error path
  ASoC: check channel mismatch between cpu_dai and codec_dai
  ASoC: Tegra: Suspend/resume support
2011-04-22 14:59:07 -07:00
James Bottomley
4a5fa3590f [PARISC] slub: fix panic with DISCONTIGMEM
Slub makes assumptions about page_to_nid() which are violated by
DISCONTIGMEM and !NUMA.  This violation results in a panic because
page_to_nid() can be non-zero for pages in the discontiguous ranges and
this leads to a null return by get_node().  The assertion by the
maintainer is that DISCONTIGMEM should only be allowed when NUMA is also
defined.  However, at least six architectures: alpha, ia64, m32r, m68k,
mips, parisc violate this.  The panic is a regression against slab, so
just mark slub broken in the problem configuration to prevent users
reporting these panics.

Cc: stable@kernel.org
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-04-22 15:42:46 -05:00
Linus Torvalds
258ba6a5a9 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Update/fix Intel Nehalem cache events
  perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow
  x86, perf event: Turn off unstructured raw event access to offcore registers
  perf: Support Xeon E7's via the Westmere PMU driver
2011-04-22 11:31:27 -07:00
Linus Torvalds
d6d61c97e6 Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  xtensa: Fixup irq conversion fallout and nmi_count
2011-04-22 11:31:21 -07:00
Peter Zijlstra
f4929bd372 perf, x86: Update/fix Intel Nehalem cache events
Change the Nehalem cache events to use retired memory instruction counters
(similar to Westmere), this greatly improves the provided stats.

Using:

main ()
{
        int i;

        for (i = 0; i < 1000000000; i++) {
                asm("mov (%%rsp), %%rbx;"
                    "mov %%rbx, (%%rsp);" : : : "rbx");
        }
}

We find:

 $ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores
  Performance counter stats for './loop_1b_loads+stores' (10 runs):
      4,000,081,056 instructions:u           #      0.000 IPC ( +-   0.000% )
      4,999,502,846 l1-dcache-loads:u          ( +-   0.008% )
      1,000,034,832 l1-dcache-stores:u         ( +-   0.000% )
         1.565184942  seconds time elapsed   ( +-   0.005% )

The 5b is surprising - we'd expect 1b:

 $ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores
  Performance counter stats for './loop_1b_loads+stores' (10 runs):
      4,000,081,054 instructions:u           #      0.000 IPC ( +-   0.000% )
      1,000,021,961 r10b:u                     ( +-   0.000% )
      1,000,030,951 l1-dcache-stores:u         ( +-   0.000% )
         1.565055422  seconds time elapsed   ( +-   0.003% )

Which this patch thus fixes.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-22 13:50:27 +02:00
Cyrill Gorcunov
1ea5a6afd9 perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow
It's not enough to simply disable event on overflow the
cpuc->active_mask should be cleared as well otherwise counter
may stall in "active" even in real being already disabled (which
potentially may lead to the situation that user may not use this
counter further).

Don pointed out that:

 " I also noticed this patch fixed some unknown NMIs
   on a P4 when I stressed the box".

Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-22 10:21:34 +02:00
Ingo Molnar
b52c55c6a2 x86, perf event: Turn off unstructured raw event access to offcore registers
Andi Kleen pointed out that the Intel offcore support patches were merged
without user-space tool support to the functionality:

 |
 | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
 | user space bits were not. This made it impossible to set the extra mask
 | and actually do the OFFCORE profiling
 |

Andi submitted a preliminary patch for user-space support, as an
extension to perf's raw event syntax:

 |
 | Some raw events -- like the Intel OFFCORE events -- support additional
 | parameters. These can be appended after a ':'.
 |
 | For example on a multi socket Intel Nehalem:
 |
 |    perf stat -e r1b7:20ff -a sleep 1
 |
 | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
 | that measures any access to DRAM on another socket.
 |

But this kind of usability is absolutely unacceptable - users should not
be expected to type in magic, CPU and model specific incantations to get
access to useful hardware functionality.

The proper solution is to expose useful offcore functionality via
generalized events - that way users do not have to care which specific
CPU model they are using, they can use the conceptual event and not some
model specific quirky hexa number.

We already have such generalization in place for CPU cache events,
and it's all very extensible.

"Offcore" events measure general DRAM access patters along various
parameters. They are particularly useful in NUMA systems.

We want to support them via generalized DRAM events: either as the
fourth level of cache (after the last-level cache), or as a separate
generalization category.

That way user-space support would be very obvious, memory access
profiling could be done via self-explanatory commands like:

  perf record -e dram ./myapp
  perf record -e dram-remote ./myapp

... to measure DRAM accesses or more expensive cross-node NUMA DRAM
accesses.

These generalized events would work on all CPUs and architectures that
have comparable PMU features.

( Note, these are just examples: actual implementation could have more
  sophistication and more parameter - as long as they center around
  similarly simple usecases. )

Now we do not want to revert *all* of the current offcore bits, as they
are still somewhat useful for generic last-level-cache events, implemented
in this commit:

  e994d7d23a: perf: Fix LLC-* events on Intel Nehalem/Westmere

But we definitely do not yet want to expose the unstructured raw events
to user-space, until better generalization and usability is implemented
for these hardware event features.

( Note: after generalization has been implemented raw offcore events can be
  supported as well: there can always be an odd event that is marginally
  useful but not useful enough to generalize. DRAM profiling is definitely
  *not* such a category so generalization must be done first. )

Furthermore, PERF_TYPE_RAW access to these registers was not intended
to go upstream without proper support - it was a side-effect of the above
e994d7d23a commit, not mentioned in the changelog.

As v2.6.39 is nearing release we go for the simplest approach: disable
the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
kernel and becomes an ABI.

Once proper structure is implemented for these hardware events and users
are offered usable solutions we can revisit this issue.

Reported-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-22 10:02:53 +02:00
Andi Kleen
b2508e828d perf: Support Xeon E7's via the Westmere PMU driver
There's a new model number public, 47, for Xeon E7 (aka Westmere EX).

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: a.p.zijlstra@chello.nl
Link: http://lkml.kernel.org/r/1303429715-10202-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-22 08:27:29 +02:00
Linus Torvalds
91e8549bde Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd
  block: don't propagate unlisted DISK_EVENTs to userland
  elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too
2011-04-21 10:50:56 -07:00
Tejun Heo
7eec77a181 ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd
check_events() implementations in both ide-gd and ide-cd are
inadequate for in-kernel event polling.  Both generate media change
events continuously when certain conditions are met causing infinite
event loop between the driver and userland event handler.

As disk event now supports suppression of unlisted events, simply
de-listing DISK_EVENT_MEDIA_CHANGE from disk->events resolves the
problem.  Internal handling around media revalidation will behave the
same while userland will fall back to userland event polling after
detecting the device doesn't support disk events.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jens Axboe <jaxboe@fusionio.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-21 19:43:59 +02:00
Tejun Heo
7c88a168da block: don't propagate unlisted DISK_EVENTs to userland
DISK_EVENT_MEDIA_CHANGE is used for both userland visible event and
internal event for revalidation of removeable devices.  Some legacy
drivers don't implement proper event detection and continuously
generate events under certain circumstances.  For example, ide-cd
generates media changed continuously if there's no media in the drive,
which can lead to infinite loop of events jumping back and forth
between the driver and userland event handler.

This patch updates disk event infrastructure such that it never
propagates events not listed in disk->events to userland.  Those
events are processed the same for internal purposes but uevent
generation is suppressed.

This also ensures that userland only gets events which are advertised
in the @events sysfs node lowering risk of confusion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-21 19:43:58 +02:00
Jens Axboe
3aa72873ff elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too
The sort insert is the one that goes to the IO scheduler. With
the SORT_MERGE addition, we could bypass IO scheduler setup
but still ask the IO scheduler to insert the request. This would
cause an oops on switching IO schedulers through the sysfs
interface, unless the disk just happened to be idle while it
occured.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-21 19:28:35 +02:00
Pavel Shilovsky
4906e50b37 CIFS: Fix memory over bound bug in cifs_parse_mount_options
While password processing we can get out of options array bound if
the next character after array is delimiter. The patch adds a check
if we reach the end.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-04-21 17:22:43 +00:00
Linus Torvalds
37fc67c9f0 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: fix duplicate message output
2011-04-21 10:01:26 -07:00
Linus Torvalds
d20dc4d523 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS
  Revert "x86, NUMA: Fix fakenuma boot failure"
2011-04-21 10:01:03 -07:00
Randy Dunlap
d76c8420c3 raid5: fix build error, sector_t usage
Change <sectors> from unsigned long long to sector_t.
This matches its source field.

  ERROR: "__udivdi3" [drivers/md/raid456.ko] undefined!

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-21 10:00:00 -07:00
Linus Torvalds
83425eee85 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  virtio: console: Enable call to hvc_remove() on console port remove
  virtio_pci: Prevent double-free of pci regions after device hot-unplug
  virtio: Decrement avail idx on buffer detach
2011-04-21 09:58:42 -07:00
Linus Torvalds
8ed54bd565 Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  agp: fix arbitrary kernel memory writes
  agp: fix OOM and buffer overflow
  drm/radeon/kms: fix IH writeback on r6xx+ on big endian machines
2011-04-21 09:57:56 -07:00
Linus Torvalds
25b210371f Merge branch 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6
* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6:
  drm/i915: Initialise g4x watermarks for disabled pipes
  drm/i915: Sanitize the output registers after resume
  drm/i915/tv: Fix modeset flickering introduced in 7f58aabc3
  drm/i915/tv: Only poll for TV connections
  drm/i915/tv: Remember the detected TV type
2011-04-21 09:57:13 -07:00
Linus Torvalds
ec616048ea Merge git://git.infradead.org/iommu-2.6
* git://git.infradead.org/iommu-2.6:
  intel_iommu: disable all VT-d PMRs when TXT launched
  intel-iommu: Fix get_domain_for_dev() error path
  intel-iommu: Unlink domain from iommu
  intel-iommu: Fix use after release during device attach
2011-04-21 09:56:35 -07:00
Jan Kara
df7e130384 vfs: Pass setxattr(2) flags properly
For some reason generic_setxattr() did not pass flags (XATTR_CREATE,
XATTR_REPLACE) to the filesystem specific helper. This caused that
setxattr(2) syscall just ignored these flags.

Fix the bug by passing flags correctly.

Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-21 07:34:44 -07:00
David Rientjes
d9b41e0b54 [PARISC] set memory ranges in N_NORMAL_MEMORY when onlined
When a DISCONTIGMEM memory range is brought online as a NUMA node, it
also needs to have its bet set in N_NORMAL_MEMORY.  This is necessary for
generic kernel code that utilizes N_NORMAL_MEMORY as a subset of N_ONLINE
for memory savings.

These types of hacks can hopefully be removed once DISCONTIGMEM is either
removed or abstracted away from CONFIG_NUMA.

Fixes a panic in the slub code which only initializes structures for
N_NORMAL_MEMORY to save memory:

	Backtrace:
	 [<000000004021c938>] add_partial+0x28/0x98
	 [<000000004021faa0>] __slab_free+0x1d0/0x1d8
	 [<000000004021fd04>] kmem_cache_free+0xc4/0x128
	 [<000000004033bf9c>] ida_get_new_above+0x21c/0x2c0
	 [<00000000402a8980>] sysfs_new_dirent+0xd0/0x238
	 [<00000000402a974c>] create_dir+0x5c/0x168
	 [<00000000402a9ab0>] sysfs_create_dir+0x98/0x128
	 [<000000004033d6c4>] kobject_add_internal+0x114/0x258
	 [<000000004033d9ac>] kobject_add_varg+0x7c/0xa0
	 [<000000004033df20>] kobject_add+0x50/0x90
	 [<000000004033dfb4>] kobject_create_and_add+0x54/0xc8
	 [<00000000407862a0>] cgroup_init+0x138/0x1f0
	 [<000000004077ce50>] start_kernel+0x5a0/0x840
	 [<000000004011fa3c>] start_parisc+0xa4/0xb8
	 [<00000000404bb034>] packet_ioctl+0x16c/0x208
	 [<000000004049ac30>] ip_mroute_setsockopt+0x260/0xf20

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-04-21 09:32:38 -05:00
Amit Shah
afa2689e19 virtio: console: Enable call to hvc_remove() on console port remove
This call was disabled as hot-unplugging one virtconsole port led to
another virtconsole port freezing.

Upon testing it again, this now works, so enable it.

In addition, a bug was found in qemu wherein removing a port of one type
caused the guest output from another port to stop working.  I doubt it
was just this bug that caused it (since disabling the hvc_remove() call
did allow other ports to continue working), but since it's all solved
now, we're fine with hot-unplugging of virtconsole ports.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-04-21 22:57:00 +09:30
Amit Shah
31a3ddda16 virtio_pci: Prevent double-free of pci regions after device hot-unplug
In the case where a virtio-console port is in use (opened by a program)
and a virtio-console device is removed, the port is kept around but all
the virtio-related state is assumed to be gone.

When the port is finally released (close() called), we call
device_destroy() on the port's device.  This results in the parent
device's structures to be freed as well.  This includes the PCI regions
for the virtio-console PCI device.

Once this is done, however, virtio_pci_release_dev() kicks in, as the
last ref to the virtio device is now gone, and attempts to do

     pci_iounmap(pci_dev, vp_dev->ioaddr);
     pci_release_regions(pci_dev);
     pci_disable_device(pci_dev);

which results in a double-free warning.

Move the code that releases regions, etc., to the virtio_pci_remove()
function, and all that's now left in release_dev is the final freeing of
the vp_dev.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-04-21 22:57:00 +09:30
Amit Shah
b3258ff1d6 virtio: Decrement avail idx on buffer detach
When detaching a buffer from a vq, the avail.idx value should be
decremented as well.

This was noticed by hot-unplugging a virtio console port and then
plugging in a new one on the same number (re-using the vqs which were
just 'disowned').  qemu reported

   'Guest moved used index from 0 to 256'

when any IO was attempted on the new port.

CC: stable@kernel.org
Reported-by: juzhang <juzhang@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-04-21 22:57:00 +09:30
Michal Simek
d20ac25282 ftrace: Build without frame pointers on Microblaze
Microblaze doesn't need/support FRAME_POINTERS in order to have a working
function tracer.

The patch remove Kconfig warning.

Warning log:
warning: (LOCKDEP && FAULT_INJECTION_STACKTRACE_FILTER && LATENCYTOP &&
FUNCTION_TRACER && KMEMCHECK) selects FRAME_POINTER which has unmet direct
dependencies (DEBUG_KERNEL && (CRIS || M68K || FRV || UML || AVR32 ||
SUPERH || BLACKFIN || MN10300) || ARCH_WANT_FRAME_POINTERS)

Signed-off-by: Michal Simek <monstr@monstr.eu>
Link: http://lkml.kernel.org/r/1301908812-8119-2-git-send-email-monstr@monstr.eu
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-21 09:06:24 -04:00