Commit Graph

218446 Commits

Author SHA1 Message Date
Eric Paris
b575156daf IMA: drop the inode opencount since it isn't needed for operation
The opencount was used to help debugging to make sure that everything
which created a struct file also correctly made the IMA calls.  Since we
moved all of that into the VFS this isn't as necessary.  We should be
able to get the same amount of debugging out of just the reader and
write count.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:17 -07:00
Eric Paris
8549164143 IMA: use rbtree instead of radix tree for inode information cache
The IMA code needs to store the number of tasks which have an open fd
granting permission to write a file even when IMA is not in use.  It
needs this information in order to be enabled at a later point in time
without losing it's integrity garantees.

At the moment that means we store a little bit of data about every inode
in a cache.  We use a radix tree key'd on the inode's memory address.
Dave Chinner pointed out that a radix tree is a terrible data structure
for such a sparse key space.  This patch switches to using an rbtree
which should be more efficient.

Bug report from Dave:

 "I just noticed that slabtop was reporting an awfully high usage of
  radix tree nodes:

   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
  4200331 2778082  66%    0.55K 144839       29   2317424K radix_tree_node
  2321500 2060290  88%    1.00K  72581       32   2322592K xfs_inode
  2235648 2069791  92%    0.12K  69864       32    279456K iint_cache

  That is, 2.7M radix tree nodes are allocated, and the cache itself is
  consuming 2.3GB of RAM.  I know that the XFS inodei caches are indexed
  by radix tree node, but for 2 million cached inodes that would mean a
  density of 1 inode per radix tree node, which for a system with 16M
  inodes in the filsystems is an impossibly low density.  The worst I've
  seen in a production system like kernel.org is about 20-25% density,
  which would mean about 150-200k radix tree nodes for that many inodes.
  So it's not the inode cache.

  So I looked up what the iint_cache was.  It appears to used for
  storing per-inode IMA information, and uses a radix tree for indexing.
  It uses the *address* of the struct inode as the indexing key.  That
  means the key space is extremely sparse - for XFS the struct inode
  addresses are approximately 1000 bytes apart, which means the closest
  the radix tree index keys get is ~1000.  Which means that there is a
  single entry per radix tree leaf node, so the radix tree is using
  roughly 550 bytes for every 120byte structure being cached.  For the
  above example, it's probably wasting close to 1GB of RAM...."

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 11:37:17 -07:00
Bryan Schumaker
eb1c86b8b5 NFS: rename nfs.upcall -> nfs.idmap
This patch renames the idmapper upcall program from nfs.upcall to nfs.idmap in
the NFS documentation.  This is because the program has been renamed in the
nfs-utils source.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-26 13:57:10 -04:00
Trond Myklebust
036a107597 NFS: Fix a compile issue in nfs_root
Stephen Rothwell reports:

> /home/test/linux-2.6/fs/nfs/nfsroot.c: In function 'nfs_root_debug':
> /home/test/linux-2.6/fs/nfs/nfsroot.c:110:2: error: 'nfs_debug'
> undeclared (first use in this function)
> /home/test/linux-2.6/fs/nfs/nfsroot.c:110:2: note: each undeclared
> identifier is reported only once for each function it appears in
> make[3]: *** [fs/nfs/nfsroot.o] Error 1
> make[2]: *** [fs/nfs] Error 2
> make[1]: *** [fs] Error 2
> make: *** [sub-make] Error 2

Which is caused by commit 306a075362
(NFS: Allow NFSROOT debugging messages to be enabled dynamically)

Fix is to disable this code when RPC_DEBUG is disabled.

Reported-by: Zimny Lech <napohybelskurwysynom2010@gmail.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-26 13:56:42 -04:00
Linus Torvalds
45352bbf48 Merge git://git.infradead.org/battery-2.6
* git://git.infradead.org/battery-2.6:
  power_supply: Makefile cleanup
  bq27x00_battery: Add missing kfree(di->bus) in bq27x00_battery_remove()
  power_supply: Introduce maximum current property
  power_supply: Add types for USB chargers
  ds2782_battery: Fix units
  power_supply: Add driver for TWL4030/TPS65950 BCI charger
  bq20z75: Add support for more power supply properties
  wm831x_power: Add missing kfree(wm831x_power) in wm831x_power_remove()
  jz4740-battery: Add missing kfree(jz_battery) in jz_battery_remove()
  ds2760_battery: Add missing kfree(di) in ds2760_battery_remove()
  olpc_battery: Fix endian neutral breakage for s16 values
  ds2760_battery: Fix W1 and W1_SLAVE_DS2760 dependency
  pcf50633-charger: Add missing sysfs_remove_group()
  power_supply: Add driver for TI BQ20Z75 gas gauge IC
  wm831x_power: Remove duplicate chg mask
  omap: rx51: Add support for USB chargers
  power_supply: Add isp1704 charger detection driver
2010-10-26 10:14:23 -07:00
Linus Torvalds
da62aa69c1 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (34 commits)
  i7core_edac: return -ENODEV when devices were already probed
  i7core_edac: properly terminate pci_dev_table
  i7core_edac: Avoid PCI refcount to reach zero on successive load/reload
  i7core_edac: Fix refcount error at PCI devices
  i7core_edac: it is safe to i7core_unregister_mci() when mci=NULL
  i7core_edac: Fix an oops at i7core probe
  i7core_edac: Remove unused member channels in i7core_pvt
  i7core_edac: Remove unused arg csrow from get_dimm_config
  i7core_edac: Reduce args of i7core_register_mci
  i7core_edac: Introduce i7core_unregister_mci
  i7core_edac: Use saved pointers
  i7core_edac: Check probe counter in i7core_remove
  i7core_edac: Call pci_dev_put() when alloc_i7core_dev()  failed
  i7core_edac: Fix error path of i7core_register_mci
  i7core_edac: Fix order of lines in i7core_register_mci
  i7core_edac: Always do get/put for all devices
  i7core_edac: Introduce i7core_pci_ctl_create/release
  i7core_edac: Introduce free_i7core_dev
  i7core_edac: Introduce alloc_i7core_dev
  i7core_edac: Reduce args of i7core_get_onedevice
  ...
2010-10-26 10:13:48 -07:00
Linus Torvalds
f1ebdd60cc Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (22 commits)
  Add _addr_lsb field to ia64 siginfo
  Fix migration.c compilation on s390
  HWPOISON: Remove retry loop for try_to_unmap
  HWPOISON: Turn addr_valid from bitfield into char
  HWPOISON: Disable DEBUG by default
  HWPOISON: Convert pr_debugs to pr_info
  HWPOISON: Improve comments in memory-failure.c
  x86: HWPOISON: Report correct address granuality for huge hwpoison faults
  Encode huge page size for VM_FAULT_HWPOISON errors
  Fix build error with !CONFIG_MIGRATION
  hugepage: move is_hugepage_on_freelist inside ifdef to avoid warning
  Clean up __page_set_anon_rmap
  HWPOISON, hugetlb: fix unpoison for hugepage
  HWPOISON, hugetlb: soft offlining for hugepage
  HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED
  hugetlb: move refcounting in hugepage allocation inside hugetlb_lock
  HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page()
  hugetlb: hugepage migration core
  hugetlb: redefine hugepage copy functions
  hugetlb: add allocate function for hugepage migration
  ...
2010-10-26 10:13:10 -07:00
Linus Torvalds
f99d055398 Merge branch 'for_linus' of git://github.com/at91linux/linux-2.6-at91
* 'for_linus' of git://github.com/at91linux/linux-2.6-at91:
  AT91: rtc: enable built-in RTC in Kconfig for at91sam9g45 family
  at91/atmel-mci: inclusion of sd/mmc driver in at91sam9g45 chip and board
  AT91: pm: make sure that r0 is 0 when dealing with cache operations
  AT91: pm: use plain cpu_do_idle() for "wait for interrupt"
  AT91: reset: extend alternate reset procedure to several chips
  AT91: reset routine cleanup, remove not needed icache flush
  AT91: trivial: align comment of at91sam9g20_reset with one more tab
  AT91: Fix AT91SAM9G20 reset as per the errata in the data sheet
  AT91: add board support for Pcontrol_G20
2010-10-26 10:03:40 -07:00
Linus Torvalds
2c518959f0 Merge branch 'for-linus' of git://gitorious.org/linux-omap-dss2/linux
* 'for-linus' of git://gitorious.org/linux-omap-dss2/linux:
  OMAP: DSS2: don't power off a panel twice
  OMAP: DSS2: OMAPFB: Allow usage of def_vrfb only for omap2,3
  OMAP: DSS2: OMAPFB: make VRFB depends on OMAP2,3
  OMAP: DSS2: OMAPFB: Allow FB_OMAP2 to build without VRFB
  arm/omap: simplify conditional
  OMAP: DSS2: DSI: Remove extra iounmap in error path
  OMAP: DSS2: Use dss_features framework on DSS2 code
  OMAP: DSS2: Introduce dss_features files
  video/omap: remove mux.h include
  ARM: omap/fb: move get_fbmem_region() to .init.text
  ARM: omap/fb: move omapfb_reserve_sram to .init.text
  ARM: omap/fb: move omap_init_fb to .init.text
  OMAP: DSS2: OMAPFB: swap front and back porches for both hsync and vsync
  OMAP: DSS2: make filter coefficient tables human readable
  OMAP: DSS2: Add SPI dependency to Kconfig of ACX565AKM panel
2010-10-26 10:02:39 -07:00
Linus Torvalds
4f6876031e Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ]: x86, cpufreq: Mark longrun_get_policy with __cpuinit.
  [CPUFREQ] add sampling_down_factor tunable to improve ondemand performance
  [CPUFREQ] arch/x86/kernel/cpu/cpufreq: Fix unsigned return type
  [CPUFREQ] drivers/cpufreq: Adjust confusing if indentation
2010-10-26 10:00:04 -07:00
Linus Torvalds
4390110fef Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
  svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
  svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
  svcrpc: assume svc_delete_xprt() called only once
  svcrpc: never clear XPT_BUSY on dead xprt
  nfsd4: fix connection allocation in sequence()
  nfsd4: only require krb5 principal for NFSv4.0 callbacks
  nfsd4: move minorversion to client
  nfsd4: delay session removal till free_client
  nfsd4: separate callback change and callback probe
  nfsd4: callback program number is per-session
  nfsd4: track backchannel connections
  nfsd4: confirm only on succesful create_session
  nfsd4: make backchannel sequence number per-session
  nfsd4: use client pointer to backchannel session
  nfsd4: move callback setup into session init code
  nfsd4: don't cache seq_misordered replies
  SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
  SUNRPC: Use conventional switch statement when reclassifying sockets
  sunrpc/xprtrdma: clean up workqueue usage
  sunrpc: Turn list_for_each-s into the ..._entry-s
  ...

Fix up trivial conflicts (two different deprecation notices added in
separate branches) in Documentation/feature-removal-schedule.txt
2010-10-26 09:55:25 -07:00
Linus Torvalds
a4dd8dce14 Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  net/sunrpc: Use static const char arrays
  nfs4: fix channel attribute sanity-checks
  NFSv4.1: Use more sensible names for 'initialize_mountpoint'
  NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure
  NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure
  NFS: client needs to maintain list of inodes with active layouts
  NFS: create and destroy inode's layout cache
  NFSv4.1: pnfs: filelayout: introduce minimal file layout driver
  NFSv4.1: pnfs: full mount/umount infrastructure
  NFS: set layout driver
  NFS: ask for layouttypes during v4 fsinfo call
  NFS: change stateid to be a union
  NFSv4.1: pnfsd, pnfs: protocol level pnfs constants
  SUNRPC: define xdr_decode_opaque_fixed
  NFSD: remove duplicate NFS4_STATEID_SIZE
2010-10-26 09:52:09 -07:00
Nicolas Ferre
24cecc1be6 AT91: rtc: enable built-in RTC in Kconfig for at91sam9g45 family
Enable built-in RTC IP in Kconfig and modify comments and help messages.
RTT as RTC is still available but should not be selected in common case.

Reported-by: Yegor Yefremov <yegor_sub1@visionsystems.de>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2010-10-26 11:32:49 +02:00
Nicolas Ferre
75305d768d at91/atmel-mci: inclusion of sd/mmc driver in at91sam9g45 chip and board
This adds the support of atmel-mci sd/mmc driver in at91sam9g45 devices and
board files. This also configures the DMA controller slave interface for
at_hdmac dmaengine driver.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2010-10-26 11:32:49 +02:00
Nicolas Ferre
a2a571b74a AT91: pm: make sure that r0 is 0 when dealing with cache operations
When using CP15 cache operations (c7), we make sure that Rd (r0)
is actually 0 as ARM 926 TRM is saying.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2010-10-26 11:32:48 +02:00
Nicolas Ferre
8aeeda822f AT91: pm: use plain cpu_do_idle() for "wait for interrupt"
For power management at91_pm_enter() routine, use the cpu_do_idle() for a
rock solid "wait for interrupt" implementation.
For AT91SAM9 ARM 926 based chips, we can exceed the cache line length as
we can access RAM even while in self-refresh mode.
We keep plain access to CP15 for at91rm9200 as this feature is not
available: instructions have to be in a single cache line.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2010-10-26 11:32:48 +02:00
Nicolas Ferre
bb413db591 AT91: reset: extend alternate reset procedure to several chips
Several at91sam9 chips need the alternate reset procedure to be sure to halt
SDRAM smoothly before resetting the chip.
This is an extension of previous patch "Fix AT91SAM9G20 reset" to all chips
affected.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2010-10-26 11:32:48 +02:00
Nicolas Ferre
1345562b44 AT91: reset routine cleanup, remove not needed icache flush
Generalize assembler reset routine to allow use on several at91sam9 chips.
This patch replace double definitions of SDRAM controller registers and RSTC
registers with use of classical header files.

For this rework, we remove the not needed icache flush as it is already
done in the calling function: arm_machine_restart().

Rename at91sam9g20_reset.S to generalize to several chips.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2010-10-26 11:32:48 +02:00
Nicolas Ferre
ef4d63e6f5 AT91: trivial: align comment of at91sam9g20_reset with one more tab
Preparing next patch with longer names

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2010-10-26 11:32:47 +02:00
Peter Horton
184c82e853 AT91: Fix AT91SAM9G20 reset as per the errata in the data sheet
If the SDRAM is not cleanly shutdown before reset it can be left driving
the bus, which then stops the bootloader booting from NAND.

Signed-off-by: Peter Horton <phorton@bitbox.co.uk>
[nicolas.ferre@atmel.com: change file header line order]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2010-10-26 11:32:47 +02:00
Peter Gsellmann
abf0c1bc94 AT91: add board support for Pcontrol_G20
Board is a carrier board for Stamp9G20, with additional peripherals
for a building automation system

Signed-off-by: Peter Gsellmann <pgsellmann@portner-elektronik.at>
[nicolas.ferre@atmel.com: remove machine_desc.io_pg_offst and .phys_io]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
2010-10-26 11:32:47 +02:00
Zhang Rui
b1d248d96c ACPI: install ACPI table handler before any dynamic tables being loaded
ACPI table sysfs I/F is broken by commit

78f1699659
Author: Alex Chiang <achiang@hp.com>
Date:   Sun Dec 20 12:19:09 2009 -0700
    ACPI: processor: call _PDC early

because dynamic SSDT tables may be loaded in _PDC,
before installing the ACPI table handler.
As a result, the sysfs I/F of these dynamic tables are
located at  /sys/firmware/acpi/tables instead of
/sys/firmware/acpi/tables/dynamic, which is not true.

Invoke acpi_sysfs_init() before acpi_early_processor_set_pdc(),
so that the table handler is installed before any dynamic tables loaded.

https://bugzilla.kernel.org/show_bug.cgi?id=21142

CC: Dennis Jansen <dennis.jansen@web.de>
CC: Alex Chiang <achiang@hp.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-26 04:55:29 -04:00
Alex Deucher
881fe6c1d0 drm/radeon/kms: properly compute group_size on 6xx/7xx
Needed for tiled surfaces.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-10-26 14:47:54 +10:00
Alex Deucher
354da65323 drm/radeon/kms: fix 2D tile height alignment in the r600 CS checker
macro tile heights are aligned to num channels, not num banks.

Noticed by Dave Airlie.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-10-26 14:47:44 +10:00
Alex Deucher
2281a378e1 drm/radeon/kms/evergreen: set the clear state to the blit state
The hw stores a default clear state for registers in the context
range that can be initialized when the CP is set up.  Set the
blit state as the default clear state and use the CLEAR_STATE
packet to load the blit state rather than loading it from an IB.
This reduces overhead when doing bo moves using the 3D engine.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-10-26 14:42:39 +10:00
Dave Airlie
c3cceeddf0 drm/radeon/kms: don't poll dac load detect.
This is slightly destructive, cpu intensive and can cause lockups.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-10-26 12:55:52 +10:00
Joe Perches
411b5e0561 net/sunrpc: Use static const char arrays
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-25 22:19:52 -04:00
J. Bruce Fields
43c2e885be nfs4: fix channel attribute sanity-checks
The sanity checks here are incorrect; in the worst case they allow
values that crash the client.

They're also over-reliant on the preprocessor.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-25 22:19:41 -04:00
Linus Torvalds
b18cae4224 Merge branch 'for-next' of git://android.git.kernel.org/kernel/tegra
* 'for-next' of git://android.git.kernel.org/kernel/tegra:
  spi: tegra: fix error setting on timeout
  spi: add spi_tegra driver
  tegra: harmony: enable PCI Express
  tegra: add PCI Express support
  tegra: add PCI Express clocks
  [ARM] tegra: Add APB DMA support
  [ARM] tegra: Add cpufreq support
  [ARM] tegra: common: Update common clock init table
  [ARM] tegra: clock: Add dvfs support, bug fixes, and cleanups
  [ARM] tegra: Add support for reading fuses
  [ARM] tegra: gpio: Add suspend and wake support
  [ARM] tegra: pinmux: add safe values, move tegra2, add suspend
  [ARM] tegra: add suspend and mirror irqs to legacy controller
  [ARM] tegra: Add legacy irq support
  [ARM] tegra: update iomap
2010-10-25 18:42:06 -07:00
Linus Torvalds
4833c16dea Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin:
  Blackfin: fix inverted anomaly 05000481 logic
  Blackfin: drop unused irq_panic()/DEBUG_ICACHE_CHECK
  Blackfin: ppi/spi/twi headers: add missing __BFP undef
  Blackfin: update defconfigs
  Blackfin: bfin_twi.h: start a common TWI header
  netdev: bfin_mac: push settings to platform resources
2010-10-25 18:41:32 -07:00
Al Viro
63997e98a3 split invalidate_inodes()
Pull removal of fsnotify marks into generic_shutdown_super().
Split umount-time work into a new function - evict_inodes().
Make sure that invalidate_inodes() will be able to cope with
I_FREEING once we change locking in iput().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:27:18 -04:00
Christoph Hellwig
9843b76aae fs: skip I_FREEING inodes in writeback_sb_inodes
Skip I_FREEING inodes just like I_WILL_FREE and I_NEW when walking the
writeback lists.  Currenly this can't happen, but once we move from
inode_lock to more fine grained locking we can have an inode that's
still on the writeback lists but has I_FREEING set, and we absolutely
need to skip it here, just like we do for all other inode list walks.

Based on a patch from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:16 -04:00
Christoph Hellwig
a031878670 fs: fold invalidate_list into invalidate_inodes
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:15 -04:00
Christoph Hellwig
d895a1c96a fs: do not drop inode_lock in dispose_list
Despite the comment above it we can not safely drop the lock here.
invalidate_list is called from many other places that just umount.
Also switch to proper list macros now that we never drop the lock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:15 -04:00
Nick Piggin
7ccf19a804 fs: inode split IO and LRU lists
The use of the same inode list structure (inode->i_list) for two
different list constructs with different lifecycles and purposes
makes it impossible to separate the locking of the different
operations. Therefore, to enable the separation of the locking of
the writeback and reclaim lists, split the inode->i_list into two
separate lists dedicated to their specific tracking functions.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:15 -04:00
Dave Chinner
a5491e0c7b fs: switch bdev inode bdi's correctly
bdev inodes can remain dirty even after their last close. Hence the
BDI associated with the bdev->inode gets modified duringthe last
close to point to the default BDI. However, the bdev inode still
needs to be moved to the dirty lists of the new BDI, otherwise it
will corrupt the writeback list is was left on.

Add a new function bdev_inode_switch_bdi() to move all the bdi state
from the old bdi to the new one safely. This is only a temporary
measure until the bdev inode<->bdi lifecycle problems are sorted
out.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:14 -04:00
Christoph Hellwig
99a3891924 fs: fix buffer invalidation in invalidate_list
We must not call invalidate_inode_buffers in invalidate_list unless the
inode can be reclaimed.  If we remove the buffer association of a busy
inode fsync won't find the buffers anymore.  As invalidate_inode_buffers
is called from various others sources than umount this actually does
matter in practice.

While at it change the loop to a more natural form and remove the
WARN_ON for I_NEW, wich we already tested a few lines above.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:14 -04:00
Christoph Hellwig
4d4eb36679 fsnotify: use dget_parent
Use dget_parent instead of opencoding it.  This simplifies the code, but
more importanly prepares for the more complicated locking for a parent
dget in the dcache scale patch series.

It means we do grab a reference to the parent now if need to be watched,
but not with the specified mask.  If this turns out to be a problem
we'll have to revisit it, but for now let's keep as much as possible
dcache internals inside dcache.[ch].

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:14 -04:00
Christoph Hellwig
be9eee2e8b smbfs: use dget_parent
Use dget_parent instead of opencoding it.  This simplifies the code, but
more importanly prepares for the more complicated locking for a parent
dget in the dcache scale patch series.

Note that the d_time assignment in smb_renew_times moves out of d_lock,
but it's a single atomic 32-bit value, and that's what other sites
setting it do already.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:14 -04:00
Christoph Hellwig
0461ee2616 exportfs: use dget_parent
Use dget_parent instead of opencoding it.  This simplifies the code, but
more importanly prepares for the more complicated locking for a parent
dget in the dcache scale patch series.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:13 -04:00
Christoph Hellwig
3825bdb7ed fs: use RCU read side protection in d_validate
d_validate does a purely read lookup in the dentry hash, so use RCU read side
locking instead of dcache_lock.  Split out from a larget patch by
Nick Piggin <npiggin@suse.de>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:13 -04:00
Christoph Hellwig
a4633357ac fs: clean up dentry lru modification
Always do a list_del_init on the LRU to make sure the list_empty invariant for
not beeing on the LRU always holds true, and fold dentry_lru_del_init into
dentry_lru_del.  Replace the dentry_lru_add_tail primitive with a
dentry_lru_move_tail operations that simpler when the dentry already is one
the list, which is always is.  Move the list_empty into dentry_lru_add to
fit the scheme of the other lru helpers, and simplify locking once we
move to a separate LRU lock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:13 -04:00
Christoph Hellwig
3049cfe24e fs: split __shrink_dcache_sb
Currently __shrink_dcache_sb has an extremly awkward calling convention
because it tries to please very different callers.  Split out the
main loop into a shrink_dentry_list helper, which gets called directly
from shrink_dcache_sb for the cases where all dentries need to be pruned,
or from __shrink_dcache_sb for pruning only a certain number of dentries.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:13 -04:00
Nick Piggin
265ac90230 fs: improve DCACHE_REFERENCED usage
dentry referenced bit is only set when installing the dentry back
onto the LRU. However with lazy LRU, the dentry can already be on
the LRU list at dput time, thus missing out on setting the referenced
bit. Fix this.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:12 -04:00
Christoph Hellwig
312d3ca856 fs: use percpu counter for nr_dentry and nr_dentry_unused
The nr_dentry stat is a globally touched cacheline and atomic operation
twice over the lifetime of a dentry. It is used for the benfit of userspace
only. Turn it into a per-cpu counter and always decrement it in d_free instead
of doing various batching operations to reduce lock hold times in the callers.

Based on an earlier patch from Nick Piggin <npiggin@suse.de>.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:12 -04:00
Christoph Hellwig
9c82ab9c9e fs: simplify __d_free
Remove d_callback and always call __d_free with a RCU head.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:12 -04:00
Christoph Hellwig
be148247cf fs: take dcache_lock inside __d_path
All callers take dcache_lock just around the call to __d_path, so
take the lock into it in preparation of getting rid of dcache_lock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:12 -04:00
Christoph Hellwig
85fe4025c6 fs: do not assign default i_ino in new_inode
Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves.  For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:11 -04:00
Eric Dumazet
f991bd2e14 fs: introduce a per-cpu last_ino allocator
new_inode() dirties a contended cache line to get increasing
inode numbers. This limits performance on workloads that cause
significant parallel inode allocation.

Solve this problem by using a per_cpu variable fed by the shared
last_ino in batches of 1024 allocations.  This reduces contention on
the shared last_ino, and give same spreading ino numbers than before
(i.e. same wraparound after 2^32 allocations).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:11 -04:00
Al Viro
7de9c6ee3e new helper: ihold()
Clones an existing reference to inode; caller must already hold one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:11 -04:00