When we are finished with return PFNs to the hypervisor, then
populate it back, and also mark the E820 MMIO and E820 gaps
as IDENTITY_FRAMEs, we then call P2M to set areas that can
be used for ballooning. We were off by one, and ended up
over-writting a P2M entry that most likely was an IDENTITY_FRAME.
For example:
1-1 mapping on 40000->40200
1-1 mapping on bc558->bc5ac
1-1 mapping on bc5b4->bc8c5
1-1 mapping on bc8c6->bcb7c
1-1 mapping on bcd00->100000
Released 614 pages of unused memory
Set 277889 page(s) to 1-1 mapping
Populating 40200-40466 pfn range: 614 pages added
=> here we set from 40466 up to bc559 P2M tree to be
INVALID_P2M_ENTRY. We should have done it up to bc558.
The end result is that if anybody is trying to construct
a PTE for PFN bc558 they end up with ~PAGE_PRESENT.
CC: stable@vger.kernel.org
Reported-by-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If P2M leaf is completly packed with INVALID_P2M_ENTRY or with
1:1 PFNs (so IDENTITY_FRAME type PFNs), we can swap the P2M leaf
with either a p2m_missing or p2m_identity respectively. The old
page (which was created via extend_brk or was grafted on from the
mfn_list) can be re-used for setting new PFNs.
This also means we can remove git commit:
5bc6f9888d
xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back
which tried to fix this.
and make the amount that is required to be reserved much smaller.
CC: stable@vger.kernel.org # for 3.5 only.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This reverts commit 00e37bdb01.
During shutdown of PVHVM guests with more than 2VCPUs on certain
machines we can hit the race where the replaced shared_info is not
replaced fast enough and the PV time clock retries reading the same
area over and over without any any success and is stuck in an
infinite loop.
Acked-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When we release pages back during bootup:
Freeing 9d-100 pfn range: 99 pages freed
Freeing 9cf36-9d0d2 pfn range: 412 pages freed
Freeing 9f6bd-9f6bf pfn range: 2 pages freed
Freeing 9f714-9f7bf pfn range: 171 pages freed
Freeing 9f7e0-9f7ff pfn range: 31 pages freed
Freeing 9f800-100000 pfn range: 395264 pages freed
Released 395979 pages of unused memory
We then try to populate those pages back. In the P2M tree however
the space for those leafs must be reserved - as such we use extend_brk.
We reserve 8MB of _brk space, which means we can fit over
1048576 PFNs - which is more than we should ever need.
Without this, on certain compilation of the kernel we would hit:
(XEN) domain_crash_sync called from entry.S
(XEN) CPU: 0
(XEN) RIP: e033:[<ffffffff818aad3b>]
(XEN) RFLAGS: 0000000000000206 EM: 1 CONTEXT: pv guest
(XEN) rax: ffffffff81a7c000 rbx: 000000000000003d rcx: 0000000000001000
(XEN) rdx: ffffffff81a7b000 rsi: 0000000000001000 rdi: 0000000000001000
(XEN) rbp: ffffffff81801cd8 rsp: ffffffff81801c98 r8: 0000000000100000
(XEN) r9: ffffffff81a7a000 r10: 0000000000000001 r11: 0000000000000003
(XEN) r12: 0000000000000004 r13: 0000000000000004 r14: 000000000000003d
(XEN) r15: 00000000000001e8 cr0: 000000008005003b cr4: 00000000000006f0
(XEN) cr3: 0000000125803000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033
(XEN) Guest stack trace from rsp=ffffffff81801c98:
.. which is extend_brk hitting a BUG_ON.
Interestingly enough, most of the time we are not going to hit this
b/c the _brk space is quite large (v3.5):
ffffffff81a25000 B __brk_base
ffffffff81e43000 B __brk_limit
= ~4MB.
vs earlier kernels (with this back-ported), the space is smaller:
ffffffff81a25000 B __brk_base
ffffffff81a7b000 B __brk_limit
= 344 kBytes.
where we would certainly hit this and hit extend_brk.
Note that git commit c3d93f8801
(xen: populate correct number of pages when across mem boundary (v2))
exposed this bug).
[v1: Made it 8MB of _brk space instead of 4MB per Jan's suggestion]
CC: stable@vger.kernel.org #only for 3.5
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Currently kexec in a PVonHVM guest fails with a triple fault because the
new kernel overwrites the shared info page. The exact failure depends on
the size of the kernel image. This patch moves the pfn from RAM into
MMIO space before the kexec boot.
The pfn containing the shared_info is located somewhere in RAM. This
will cause trouble if the current kernel is doing a kexec boot into a
new kernel. The new kernel (and its startup code) can not know where the
pfn is, so it can not reserve the page. The hypervisor will continue to
update the pfn, and as a result memory corruption occours in the new
kernel.
One way to work around this issue is to allocate a page in the
xen-platform pci device's BAR memory range. But pci init is done very
late and the shared_info page is already in use very early to read the
pvclock. So moving the pfn from RAM to MMIO is racy because some code
paths on other vcpus could access the pfn during the small window when
the old pfn is moved to the new pfn. There is even a small window were
the old pfn is not backed by a mfn, and during that time all reads
return -1.
Because it is not known upfront where the MMIO region is located it can
not be used right from the start in xen_hvm_init_shared_info.
To minimise trouble the move of the pfn is done shortly before kexec.
This does not eliminate the race because all vcpus are still online when
the syscore_ops will be called. But hopefully there is no work pending
at this point in time. Also the syscore_op is run last which reduces the
risk further.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
init_hvm_pv_info is called only in PVonHVM context, move it into ifdef.
init_hvm_pv_info does not fail, make it a void function.
remove arguments from init_hvm_pv_info because they are not used by the
caller.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Both have type struct shared_info so no cast is needed.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
While debugging kexec issues in a PVonHVM guest I modified
xen_hvm_platform() to return false to disable all PV drivers. This
caused a crash in platform_pci_init() because it expects certain data
structures to be initialized properly.
To avoid such a crash make sure the driver is initialized only if
running in a Xen guest.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Add xs_reset_watches function to shutdown watches from old kernel after
kexec boot. The old kernel does not unregister all watches in the
shutdown path. They are still active, the double registration can not
be detected by the new kernel. When the watches fire, unexpected events
will arrive and the xenwatch thread will crash (jumps to NULL). An
orderly reboot of a hvm guest will destroy the entire guest with all its
resources (including the watches) before it is rebuilt from scratch, so
the missing unregister is not an issue in that case.
With this change the xenstored is instructed to wipe all active watches
for the guest. However, a patch for xenstored is required so that it
accepts the XS_RESET_WATCHES request from a client (see changeset
23839:42a45baf037d in xen-unstable.hg). Without the patch for xenstored
the registration of watches will fail and some features of a PVonHVM
guest are not available. The guest is still able to boot, but repeated
kexec boots will fail.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When switching tasks in a Xen PV guest, avoid updating the TLS
descriptors if they haven't changed. This improves the speed of
context switches by almost 10% as much of the time the descriptors are
the same or only one is different.
The descriptors written into the GDT by Xen are modified from the
values passed in the update_descriptor hypercall so we keep shadow
copies of the three TLS descriptors to compare against.
lmbench3 test Before After Improvement
--------------------------------------------
lat_ctx -s 32 24 7.19 6.52 9%
lat_pipe 12.56 11.66 7%
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When constructing the initial page tables, if the MFN for a usable PFN
is missing in the p2m then that frame is initially ballooned out. In
this case, zero the PTE (as in decrease_reservation() in
drivers/xen/balloon.c).
This is obviously safe instead of having an valid PTE with an MFN of
INVALID_P2M_ENTRY (~0).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In xen_set_pte() if batching is unavailable (because the caller is in
an interrupt context such as handling a page fault) it would fall back
to using native_set_pte() and trapping and emulating the PTE write.
On 32-bit guests this requires two traps for each PTE write (one for
each dword of the PTE). Instead, do one mmu_update hypercall
directly.
During construction of the initial page tables, continue to use
native_set_pte() because most of the PTEs being set are in writable
and unpinned pages (see phys_pmd_init() in arch/x86/mm/init_64.c) and
using a hypercall for this is very expensive.
This significantly improves page fault performance in 32-bit PV
guests.
lmbench3 test Before After Improvement
----------------------------------------------
lat_pagefault 3.18 us 2.32 us 27%
lat_proc fork 356 us 313.3 us 11%
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Coverity points out that we do not free in one case the
pr_backup - and sure enough we forgot.
Found by Coverity (CID 401970)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If a driver leaves its poll method NULL, the device is assumed to
be both readable and writable without blocking.
This patch add .poll method to xen mcelog device driver, so that
when mcelog use system calls like ppoll or select, it would be
blocked when no data available, and avoid spinning at CPU.
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
copy_to_user might sleep and print a stack trace if it is executed
in an atomic spinlock context. Like this:
(XEN) CMCI: send CMCI to DOM0 through virq
BUG: sleeping function called from invalid context at /home/konradinux/kernel.h:199
in_atomic(): 1, irqs_disabled(): 0, pid: 4581, name: mcelog
Pid: 4581, comm: mcelog Tainted: G O 3.5.0-rc1upstream-00003-g149000b-dirty #1
[<ffffffff8109ad9a>] __might_sleep+0xda/0x100
[<ffffffff81329b0b>] xen_mce_chrdev_read+0xab/0x140
[<ffffffff81148945>] vfs_read+0xc5/0x190
[<ffffffff81148b0c>] sys_read+0x4c/0x90
[<ffffffff815bd039>] system_call_fastpath+0x16
This patch schedule a workqueue for IRQ handler to poll the data,
and use mutex instead of spinlock, so copy_to_user sleep in atomic
context would not occur.
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This patch provide Xen physical cpus online/offline sys interface.
User can use it for their own purpose, like power saving:
by offlining some cpus when light workload it save power greatly.
Its basic workflow is, user online/offline cpu via sys interface,
then hypercall xen to implement, after done xen inject virq back to dom0,
and then dom0 sync cpu status.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When Xen hypervisor inject vMCE to guest, use native mce handler
to handle it
Signed-off-by: Ke, Liping <liping.ke@intel.com>
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
there are 3 funcs which need to be _initcalled in a logic sequence:
1. xen_late_init_mcelog
2. mcheck_init_device
3. threshold_init_device
xen_late_init_mcelog must register xen_mce_chrdev_device before
native mce_chrdev_device registration if running under xen platform;
mcheck_init_device should be inited before threshold_init_device to
initialize mce_device, otherwise a a NULL ptr dereference will cause panic.
so we use following _initcalls
1. device_initcall(xen_late_init_mcelog);
2. device_initcall_sync(mcheck_init_device);
3. late_initcall(threshold_init_device);
when running under xen, the initcall order is 1,2,3;
on baremetal, we skip 1 and we do only 2 and 3.
Acked-and-tested-by: Borislav Petkov <bp@amd64.org>
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When MCA error occurs, it would be handled by Xen hypervisor first,
and then the error information would be sent to initial domain for logging.
This patch gets error information from Xen hypervisor and convert
Xen format error into Linux format mcelog. This logic is basically
self-contained, not touching other kernel components.
By using tools like mcelog tool users could read specific error information,
like what they did under native Linux.
To test follow directions outlined in Documentation/acpi/apei/einj.txt
Acked-and-tested-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Ke, Liping <liping.ke@intel.com>
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Pull btrfs compile warning fixes from Chris Mason.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: cast devid to unsigned long long for printk %llu
Btrfs: init old_generation in get_old_root
Pull arch/tile update from Chris Metcalf:
"This one-line bug fix unbreaks glibc robust mutexes (among other
things no doubt), from code merged in during the 3.5 merge window but
which we had been running internally at Tilera for almost a year."
* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: fix bug in get_user() for 4-byte values
- two fixes for s3c-fb by Jingoo Han
(including a fix for a potential division by zero)
- a couple of randconfig fixes by Arnd Bergmann
- a cleanup for bfin_adv7393fb by Emil Goode
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.12 (GNU/Linux)
iQIcBAABAgAGBQJP3OVbAAoJECSVL5KnPj1P+BoP/jf20UvGevLHaKYLxjB6elWP
cutoJFgfPWWdJcu6qwk4thR3fAZkB9R1vorFEp0EP7m8EkB8poP7pdPK7xGzbLKT
iGhWUy4pENcleyPu0mXE2kqLfq5g3fnPc9M0aiIySwfGAgjop24duP9NYMBL5sfr
GHs7JcrcyqWWVDHQLK3gx7JpVzoQOlN44yy64Su2GR8hfLUiSHC35AVnmK8svkra
S5dTa9RJJwVFowF8/o5bUzFuuMctKc22CbXA2NkR+PUHHc0zAqaRo0XedTr02C8u
bM9w7RWxQ71ZprJJeaBJgUbDev0JO9QxX7kA6eN3MRTHimi2wz9n2jy95R+H3ITm
EXhCl6F2kf+xzOfl9Ta9lB4eUthj2O1yBHGArpc5iJOuWHip2ggW4SrEyjKLrOpn
7AyB+D9gbZppnms4z5kCPM0Q/i/flyMoEccuaV4D3AaWw1miA4leQUtfjL/SGavl
9t6N6Zo2OT/h+Z36GauFnH3ap4lXNGZo3I03sFGp7E4VC0MDA8UnqG70zzdzm7BM
CO/c34HVpNBS0MH6blsUn+64aftJampqVzlQ0uj++5R1fpDGBCo+p+WEkIx9GGqN
bhJbmXR0HPnHtNQtf/LCPv98V/OAnWpfC98EeewPAQ3MTAexpdS8JduuY3PjhdhD
TkLBbb6A5UCU90SV32AD
=0Zkc
-----END PGP SIGNATURE-----
Merge tag 'fbdev-fixes-for-3.5-1' of git://github.com/schandinat/linux-2.6
Pull fbdev fixes from Florian Tobias Schandinat:
- two fixes for s3c-fb by Jingoo Han (including a fix for a potential
division by zero)
- a couple of randconfig fixes by Arnd Bergmann
- a cleanup for bfin_adv7393fb by Emil Goode
* tag 'fbdev-fixes-for-3.5-1' of git://github.com/schandinat/linux-2.6:
video: s3c-fb: fix possible division by zero in s3c_fb_calc_pixclk
video: s3c-fb: clear SHADOWCON register when clearing hardware window registers
drivers/tosa: driver needs I2C and SPI to compile
drivers/savagefb: use mdelay instead of udelay
video/console: automatically select a font
video/ili9320: do not mark exported functions __devexit
drivers/video: use correct __devexit_p annotation
video: bfin_adv7393fb: Convert to kstrtouint_from_user
The definition of 32-bit values in the 64-bit tilegx architecture is that
they should be sign-extended regardless of whether they are considered
signed or unsigned by the compiler. Accordingly, we need to use an
"ld4s" rather than "ld4u" to load and sign-extend for get_user().
This fixes glibc bug 14238 (see http://sourceware.org/bugzilla),
introduced during the 3.5 merge window.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Minchan Kim reports that when a system has many swap areas, and tmpfs
swaps out to the ninth or more, shmem_getpage_gfp()'s attempts to read
back the page cannot locate it, and the read fails with -ENOMEM.
Whoops. Yes, I blindly followed read_swap_header()'s pte_to_swp_entry(
swp_entry_to_pte()) technique for determining maximum usable swap
offset, without stopping to realize that that actually depends upon the
pte swap encoding shifting swap offset to the higher bits and truncating
it there. Whereas our radix_tree swap encoding leaves offset in the
lower bits: it's swap "type" (that is, index of swap area) that was
truncated.
Fix it by reducing the SWP_TYPE_SHIFT() in swapops.h, and removing the
broken radix_to_swp_entry(swp_to_radix_entry()) from read_swap_header().
This does not reduce the usable size of a swap area any further, it
leaves it as claimed when making the original commit: no change from 3.0
on x86_64, nor on i386 without PAE; but 3.0's 512GB is reduced to 128GB
per swapfile on i386 with PAE. It's not a change I would have risked
five years ago, but with x86_64 supported for ten years, I believe it's
appropriate now.
Hmm, and what if some architecture implements its swap pte with offset
encoded below type? That would equally break the maximum usable swap
offset check. Happily, they all follow the same tradition of encoding
offset above type, but I'll prepare a check on that for next.
Reported-and-Reviewed-and-Tested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org [3.1, 3.2, 3.3, 3.4]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a couple of minor fixes, one for a preempt warning in the mpt2sas
driver and one is a config failure with the new sd async domain.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQEcBAABAgAGBQJP2fuGAAoJEDeqqVYsXL0Md88IAMw1nLZkg39TzEM4tAkkCa4C
o3e3xcmUdO1lqswmgUDaD95Z0H9MUBxwHWQGiAQ7fheuyXXyGOLINkBTx7wOGksk
JpsjEHNhVwW2rxFwZDLYxvtBjRy9E6wJPeuBQxQdU2gGVzjmk2bAV6mJnf+/wDkH
uqJKnrmVNU7x5drw1wP2z0i3zm8UZVlB+eW3J5ReZzfQlpBYWmFcwxrSTp8Q03ko
rx+rMNYlQO+5Cp0/UZxdIlTw4TTaU+F6bXC8jPzZ2zj0mUWtBxiF6rYZQlebNM5n
/byCuivfM9OMeWaiwk5ERIQ/1iOUF4L5apCNUIG2PFtS7RY4349luWpBoeYkaVY=
=eNSh
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a couple of minor fixes, one for a preempt warning in the
mpt2sas driver and one is a config failure with the new sd async
domain."
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] Fix sd_probe_domain config problem
[SCSI] mpt2sas: Fix unsafe using smp_processor_id() in preemptible
Highlights include:
- Fix a couple of mount regressions due to the recent cleanups.
- Fix an Oops in the open recovery code
- Fix an rpc_pipefs upcall hang that results from some of the
net namespace work from 3.4.x (stable kernel candidate).
- Fix a couple of write and o_direct regressions that were found
at last weeks Bakeathon testing event in Ann Arbor.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJP2gmaAAoJEGcL54qWCgDyrBMP/RY/T++He8y5k3M9aEqiIv0q
D8ZVMwzID6f4Zgw4xRg96aYr02sBTw0q+0mP5x1EZmg8mK29rnBiVeKHE1iwSfXq
10/SYISlpIjhJC4I4kHXGd2KClgj7qRRCbDKFRWwoIIwYU+kJn8MRnPa9XqdL8kP
q68lrtayW8THSJDR8bk1GQn+ARxGeoY++qzHxm3vpQCbZVVb19VqKMWAWSN4VKqb
epWehOSAzB3iA7HrLRbf8Y8/sDdXewxCQpr9CC/wxuu++l5ifPphR0ToX+k9VZXI
BKFLUojCUZHTMAgCxuxjrFYehMeyClbzL2lLkz5Pgj0gQhOX6Myj+WMXoEg/uWfo
XNf51FH3yBbnfayTaOUs6Y50iuU+dQO7TUTAoWTPpW9V/iT5z/fWAKUVJhDtrPk5
DVDkR6SEgb4P1RqkehZKLq5k5GSAcTR+MZr452eDrFYXJrY8ORDE6o6kP4Rr3Nnd
n8gap0gHxzIYlhBghem6+nLN+HhpZQopWeD8mNub20VuXsChRDr9/+XWuMCSJaZF
2kleVdt2+rTDzi9bJTRYlsX397oaThL0NbRvshHAwnXIDtIQrzxx6+dUyOsEWMEu
go/EdSUUESXGNlsWTqewCBsOjPeE4L5ijI/QglfDkF+CzD5dDjrxl+5i57iMKVfc
Ydste3pQJkS7PiZu1sWA
=unbu
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
- Fix a couple of mount regressions due to the recent cleanups.
- Fix an Oops in the open recovery code
- Fix an rpc_pipefs upcall hang that results from some of the net
namespace work from 3.4.x (stable kernel candidate).
- Fix a couple of write and o_direct regressions that were found at
last weeks Bakeathon testing event in Ann Arbor."
* tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: add an endian notation for sparse
NFSv4.1: integer overflow in decode_cb_sequence_args()
rpc_pipefs: allow rpc_purge_list to take a NULL waitq pointer
NFSv4 do not send an empty SETATTR compound
NFSv2: EOF incorrectly set on short read
NFS: Use the NFS_DEFAULT_VERSION for v2 and v3 mounts
NFS: fix directio refcount bug on commit
NFSv4: Fix unnecessary delegation returns in nfs4_do_open
NFSv4.1: Convert another trivial printk into a dprintk
NFS4: Fix open bug when pnfs module blacklisted
NFS: Remove incorrect BUG_ON in nfs_found_client
NFS: Map minor mismatch error to protocol not support error.
NFS: Fix a commit bug
NFS4: Set parsed mount data version to 4
NFSv4.1: Ensure we clear session state flags after a session creation
NFSv4.1: Convert a trivial printk into a dprintk
NFSv4: Fix up decode_attr_mdsthreshold
NFSv4: Fix an Oops in the open recovery code
NFSv4.1: Fix a request leak on the back channel
Pull DMA-mapping fixes from Marek Szyprowski:
"A set of minor fixes for dma-mapping code (ARM and x86) required for
Contiguous Memory Allocator (CMA) patches merged in v3.5-rc1."
* 'fixes-for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
x86: dma-mapping: fix broken allocation when dma_mask has been provided
ARM: dma-mapping: fix debug messages in dmabounce code
ARM: mm: fix type of the arm_dma_limit global variable
ARM: dma-mapping: Add missing static storage class specifier
Just one commit, and a one-liner at that, but an important one;
without it hard_irq_disable() does nothing on powerpc.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJP2ub9AAoJEJ5CldYF9mzpeYQP/2IobuBukiIR51zs6+iM938L
nX4BThjeKNGbYmGCQNT/q9htSQTC6kMlOu3LmWba3WRcBL/eTmfsQSp2UzZlmi9z
XZLNeBliSBFaUndxJpBPLILqTpwKglBxFuSX4osHGNBi+WhmUSB2vWCcEVWcGE27
G3kXPoYYFfiBDR3iAQZgYDeirj1dQNI9rbaihq95YhmsdUr2VNcofYfo0s5bWVBr
wp2srp3BGP6TqJh71wArX2rcEBS232JT88q2U04MiqJtVJoY2OCsGpNcOBlCmTSX
k+zWgjaw+dt0h/y3jJbs6QeVvE4MpB2EiSpq+Lpo9NjeX2qDgWGF2qCKnoZlpdz0
Vg411irHbl6OwFGI0XKdywIEXv+VUPJq+kxGz49q1dC9/8f0VtQYjEw90RmcBlZB
Zesffw7OYrXkwThC29yhHVQOla6or7jX2Ulun50MeCz50TAeKkRrXvWTHDvOMhC/
FQmQvYTipK9ANjO85G4tYgwvdvuhBcPasA3136AOdgZN382Vu/tlr1uCd7XDiAUi
yieCgxzBa92lP0W98AsHT1AqPmYSAFoiOIUNLu8sriba3L5/B1eGKUqE+IoeAE22
N643CpnUs09ZRQXh0V6Yfsztv/bKP+wWhAJzECu1fu+md9ITS/zLo2sBjGk2f0gz
nFMwpZjurPprn7GyMrAy
=P06G
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
Pull PowerPC fix from Paul Mackerras:
"Just one commit, and a one-liner at that, but an important one;
without it hard_irq_disable() does nothing on powerpc."
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
Make hard_irq_disable() actually hard-disable interrupts
Pull two nfsd bugfixes from J. Bruce Fields.
* 'for-3.5' of git://linux-nfs.org/~bfields/linux:
nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
NFS: hard-code init_net for NFS callback transports
- When booting as PVHVM we would try to use PV console - but would not validate
the parameters causing us to crash during restore b/c we re-use the wrong event
channel.
- When booting on machines with SR-IOV PCI bridge we didn't check for the bridge
and tried to use it.
- Under AMD machines would advertise the APERFMPERF resulting in needless amount
of MSRs from the guest.
- A global value (xen_released_pages) was not subtracted at bootup when pages
were added back in. This resulted in the balloon worker having the wrong
account of how many pages were truly released.
- Fix dead-lock when xen-blkfront is run in the same domain as xen-blkback.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJP2kcmAAoJEFjIrFwIi8fJXLcH/0a2m6KMcyjc4WaCHspAEFDL
9B055QUFDEOlH55wE2QeED/8D+0HUbTYnQBycH126XLKzLfRv1fsrKFKDSA/SWW2
Mh8N316UrY5Wc3KMdxXdCXJCDqDs7VhARTv6JdlUqUlH9oLRYE6CMRO8MujT0iwd
r+uEnNuW0udMFt8x9SnJW7pEaq7u2N5koEGdWEzZhfoumDaCRxm5OKAKXZ0DZlEZ
/BPjTW/N+Pf4u+bJZY+wQq41y4zGMqu7TDo/hOpuGZxeqtVnCE9trBbuGLnp4K+W
n4TfZZs9Y1kovSMj6qTeB0aP0F77tqHyXPb1oPKxm2kWfqT2dFtIRpuLtXYSC+o=
=cQl2
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull five Xen bug-fixes from Konrad Rzeszutek Wilk:
- When booting as PVHVM we would try to use PV console - but would not validate
the parameters causing us to crash during restore b/c we re-use the wrong event
channel.
- When booting on machines with SR-IOV PCI bridge we didn't check for the bridge
and tried to use it.
- Under AMD machines would advertise the APERFMPERF resulting in needless amount
of MSRs from the guest.
- A global value (xen_released_pages) was not subtracted at bootup when pages
were added back in. This resulted in the balloon worker having the wrong
account of how many pages were truly released.
- Fix dead-lock when xen-blkfront is run in the same domain as xen-blkback.
* tag 'stable/for-linus-3.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: mark local pages as FOREIGN in the m2p_override
xen/setup: filter APERFMPERF cpuid feature out
xen/balloon: Subtract from xen_released_pages the count that is populated.
xen/pci: Check for PCI bridge before using it.
xen/events: Add WARN_ON when quick lookup found invalid type.
xen/hvc: Check HVM_PARAM_CONSOLE_[EVTCHN|PFN] for correctness.
xen/hvc: Fix error cases around HVM_PARAM_CONSOLE_PFN
xen/hvc: Collapse error logic.
Here are a bunch of tiny fixes for the USB core and drivers for 3.5-rc3
A bunch of gadget fixes, and new device ids, as well as some fixes for a number
of different regressions that have been reported recently. We also fixed some
PCI host controllers to resolve a long-standing bug with a whole class of host
controllers that have been plaguing people for a number of kernel releases,
preventing their systems from suspending properly.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAk/byIEACgkQMUfUDdst+ykLBwCgiAi/cS4ObblcBEbnBvosPgsA
PqEAoJLMAwhcozeEOWLhb++45Vs+UjKv
=HAWc
-----END PGP SIGNATURE-----
Merge tag 'usb-3.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg Kroah-Hartman:
"Here are a bunch of tiny fixes for the USB core and drivers for
3.5-rc3
A bunch of gadget fixes, and new device ids, as well as some fixes for
a number of different regressions that have been reported recently.
We also fixed some PCI host controllers to resolve a long-standing bug
with a whole class of host controllers that have been plaguing people
for a number of kernel releases, preventing their systems from
suspending properly.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'usb-3.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (41 commits)
USB: fix gathering of interface associations
usb: ehci-sh: fix illegal phy_init() running when platform_data is NULL
usb: cdc-acm: fix devices not unthrottled on open
Fix OMAP EHCI suspend/resume failure (i693)
USB: ohci-hub: Mark ohci_finish_controller_resume() as __maybe_unused
usb: use usb_serial_put in usb_serial_probe errors
USB: EHCI: Fix build warning in xilinx ehci driver
USB: fix PS3 EHCI systems
xHCI: Increase the timeout for controller save/restore state operation
xhci: Don't free endpoints in xhci_mem_cleanup()
xhci: Fix invalid loop check in xhci_free_tt_info()
xhci: Fix error path return value.
USB: Checking the wrong variable in usb_disable_lpm()
usb-storage: Add 090c:1000 to unusal-devs
USB: serial-generic: use a single set of device IDs
USB: serial: Enforce USB driver and USB serial driver match
USB: add NO_D3_DURING_SLEEP flag and revert 151b612847
USB: option: add more YUGA device ids
USB: mos7840: Fix compilation of usb serial driver
USB: option: fix memory leak
...
Pull IDE fixes from David S. Miller:
1) Two fixes to icside, one for a build failure and another for a
warning. From Christian Dietrich.
2) Fix a bit operation that did erroneous masking, from Julia Lawall.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
drivers/ide/ide-cs.c: adjust suspicious bit operation
ide: icside.c: fix printk format string compile warning
ide: icside.c: Fix compile with CONFIG_BLK_DEV_IDEDMA_ICS=n
Pull sparc update from David S. Miller:
"This just removes some sparc headers that were never, ever, used."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: remove two unused headers
gcc was giving an uninit variable warning here. Strictly
speaking we don't need to init it, but this will make things
much less error prone.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Pull perf fixes from Ingo Molnar.
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog: Quiet down the boot messages
perf/x86: Fix broken LBR fixup code
tracing: Have tracing_off() actually turn tracing off
Pull core updates (RCU and locking) from Ingo Molnar:
"Most of the diffstat comes from the RCU slow boot regression fixes,
but there's also a debuggability improvements/fixes."
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
memblock: Document memblock_is_region_{memory,reserved}()
rcu: Precompute RCU_FAST_NO_HZ timer offsets
rcu: Move RCU_FAST_NO_HZ per-CPU variables to rcu_dynticks structure
rcu: Update RCU_FAST_NO_HZ tracing for lazy callbacks
rcu: RCU_FAST_NO_HZ detection of callback adoption
spinlock: Indicate that a lockup is only suspected
kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop()
panic: Make panic_on_oops configurable
Pull target updates from Nicholas Bellinger:
"This series contains post merge qla_target.c / tcm_qla2xxx bugfixes
from the past weeks, including the patch to allow target-core to use
an optional session shutdown callback to help address an active I/O
shutdown bug in tcm_qla2xxx code (Joern).
Also included is a target regression bugfix releated to explict ALUA
target port group CDB emulation that is CC'ed to stable (Roland)."
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
qla2xxx: Remove version.h header file inclusion
tcm_qla2xxx: Handle malformed wwn strings properly
tcm_qla2xxx: tcm_qla2xxx_handle_tmr() can be static
qla2xxx: Don't leak commands we give up on in qlt_do_work()
qla2xxx: Don't crash if we can't find cmd for failed CTIO
tcm_qla2xxx: Don't insert nacls without sessions into the btree
target: Return error to initiator if SET TARGET PORT GROUPS emulation fails
tcm_qla2xxx: Clear session s_id + loop_id earlier during shutdown
tcm_qla2xxx: Convert to TFO->put_session() usage
target: Add TFO->put_session() caller for HW fabric session shutdown
Pull btrfs update from Chris Mason:
"The dates look like I had to rebase this morning because there was a
compiler warning for a printk arg that I had missed earlier.
These are all fixes, including one to prevent using stale pointers for
device names, and lots of fixes around transaction abort cleanups
(Josef, Liu Bo).
Jan Schmidt also sent in a number of fixes for the new reference
number tracking code.
Liu Bo beat me to updating the MAINTAINERS file. Since he thought to
also fix the git url, I kept his commit."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits)
Btrfs: update MAINTAINERS info for BTRFS FILE SYSTEM
Btrfs: destroy the items of the delayed inodes in error handling routine
Btrfs: make sure that we've made everything in pinned tree clean
Btrfs: avoid memory leak of extent state in error handling routine
Btrfs: do not resize a seeding device
Btrfs: fix missing inherited flag in rename
Btrfs: fix incompat flags setting
Btrfs: fix defrag regression
Btrfs: call filemap_fdatawrite twice for compression
Btrfs: keep inode pinned when compressing writes
Btrfs: implement ->show_devname
Btrfs: use rcu to protect device->name
Btrfs: unlock everything properly in the error case for nocow
Btrfs: fix btrfs_destroy_marked_extents
Btrfs: abort the transaction if the commit fails
Btrfs: wake up transaction waiters when aborting a transaction
Btrfs: fix locking in btrfs_destroy_delayed_refs
Btrfs: pass locked_page into extent_clear_unlock_delalloc if theres an error
Btrfs: fix race in tree mod log addition
Btrfs: add btrfs_next_old_leaf
...
Update to the latest btrfs's maintainer mail and git repo.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
the items of the delayed inodes were forgotten to be freed, this patch
fixes it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Since we have two trees for recording pinned extents, we need to go through
both of them to make sure that we've done everything clean.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We've forgotten to clear extent states in pinned tree, which will results in
space counter mismatch and memory leak:
WARNING: at fs/btrfs/extent-tree.c:7537 btrfs_free_block_groups+0x1f3/0x2e0 [btrfs]()
...
space_info 2 has 8380416 free, is not full
space_info total=12582912, used=4096, pinned=4096, reserved=0, may_use=0, readonly=4194304
btrfs state leak: start 29364224 end 29376511 state 1 in tree ffff880075f20090 refs 1
...
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Seeding devices are not supposed to change any more.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
When we move a file into a directory with compression flag, we need to
inherite BTRFS_INODE_COMPRESS and clear BTRFS_INODE_NOCOMPRESS as well.
But if we move a file into a directory without compression flag, we need
to clear both of them.
It is the way how our setflags deals with compression flag, so keep
the same behaviour here.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>