linux

Author	SHA1	Message	Date
Stephen Rothwell	1388cc964e	PCI: don't export linux/io.h from pci.h Move the include of io.h down into the #ifdef __KERNEL__ protected region. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-22 16:42:46 -07:00
Alex Chiang	58319b802a	PCI: Hotplug core: remove 'name' Now that the PCI core manages the 'name' for each individual hotplug driver, and all drivers (except rpaphp) have been converted to use hotplug_slot_name(), there is no need for the PCI hotplug core to drag around its own copy of name either. Cc: kristen.c.accardi@intel.com Cc: matthew@wil.cx Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-22 16:42:43 -07:00
Alex Chiang	0ad772ec46	PCI, PCI Hotplug: introduce slot_name helpers In preparation for cleaning up the various hotplug drivers such that they don't have to manage their own 'name' parameters anymore, we provide the following convenience functions: pci_slot_name() hotplug_slot_name() These helpers will be used by individual hotplug drivers. Cc: kristen.c.accardi@intel.com Cc: matthew@wil.cx Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-22 16:42:40 -07:00
Alex Chiang	828f37683e	PCI: update pci_create_slot() to take a 'hotplug' param Slot detection drivers can co-exist with hotplug drivers. The names of the detected/claimed slots may be different depending on module load order. For legacy reasons, we need to allow hotplug drivers to override the slot name if a detection driver is loaded first (and they find the same slots). Creating and overriding slot names should be an atomic operation, otherwise you get a locking nightmare as various drivers race to call pci_create_slot(). pci_create_slot() is already serialized by grabbing the pci_bus_sem. We update the API and add a 'hotplug' param, which is: set if the caller is a hotplug driver NULL if the caller is a detection driver pci_create_slot() does not actually use the 'hotplug' parameter in this patch. A later patch will add the logic that uses it. Cc: kristen.c.accardi@intel.com Cc: matthew@wil.cx Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-22 16:42:38 -07:00
Alex Chiang	d25b7c8d6b	PCI: rename pci_update_slot_number to pci_renumber_slot The GPL exported symbol pci_update_slot_number has been renamed to pci_renumber_slot. Some of the safety checks were unnecessary and were removed. Cc: kristen.c.accardi@intel.com Cc: matthew@wil.cx Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-22 16:42:37 -07:00
Alex Chiang	1359f2701b	PCI Hotplug core: add 'name' param pci_hp_register interface Update pci_hp_register() to take a const char *name parameter. The motivation for this is to clean up the individual hotplug drivers so that each one does not have to manage its own name. The PCI core should be the place where we manage the name. We update the interface and all callsites first, in a "no functional change" manner, and clean up the drivers later. Cc: kristen.c.accardi@intel.com Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-22 16:42:37 -07:00
Jesse Barnes	64c7f63c1b	PCI: include io.h in pci.h so that ioremap_nocache is defined Ingo pointed out that the m32r build was broken by pci_ioremap. It looks like some files include pci.h w/o including io.h. The latter defines ioremap_* if present, so it makes sense to include it in pci.h now that we have pci_ioremap there. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-22 16:42:36 -07:00
Sheng Yang	8dd7f8036c	PCI: add support for function level reset Sometimes, it's necessary to enable software's ability to quiesce and reset endpoint hardware with function-level granularity, so provide support for it. The patch implement Function Level Reset(FLR) feature following PCI-e spec. And this is the first step. We would add more generic method, like D0/D3, to allow more devices support this function. The patch contains two functions. pcie_reset_function() is the new driver API, and, contains some action to quiesce a device. The other function is a helper: pcie_execute_reset_function() just executes the reset for a particular device function. Current the usage model is in KVM. Function reset is necessary for assigning device to a guest, or moving it between partitions. For Function Level Reset(FLR), please refer to PCI Express spec chapter 6.6.2. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-22 16:42:35 -07:00
Yevgeny Petrilin	7ff93f8b7e	mlx4_core: Multiple port type support Multi-protocol adapters support different port types. Each consumer of mlx4_core queries for supported port types; in particular mlx4_ib can no longer assume that all physical ports belong to it. Port type is configured through a sysfs interface. When the type of a port is changed, all mlx4 interfaces are unregistered, and then registered again with the new port types. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-10-22 15:38:42 -07:00
Zhao Yakui	f5adfaa372	ACPI: Add "acpi.power_nocheck=1" to disable power state check in power transition Maybe the incorrect power state is returned on the bogus bios, which is different with the real power state. For example: the bios returns D0 state and the real power state is D3. OS expects to set the device to D0 state. In such case if OS uses the power state returned by the BIOS and checks the device power state very strictly in power transition, the device can't be transited to the correct power state. So the boot option of "acpi.power_nocheck=1" is added to avoid checking the device power in the course of device power transition. http://bugzilla.kernel.org/show_bug.cgi?id=8049 http://bugzilla.kernel.org/show_bug.cgi?id=11000 Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2008-10-22 17:59:57 -04:00
Yevgeny Petrilin	2a2336f822	mlx4_core: Ethernet MAC/VLAN management Add support for managing MAC and VLAN filters for each port. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Oren Duer <oren@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-10-22 11:44:46 -07:00
Anton Vorontsov	11f1f2afd6	i2c: Add info->archdata field If present the info->archdata is copied into the dev->archdata. Some (OpenFirmware) platforms need it. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2008-10-22 20:21:33 +02:00
Jean Delvare	3ae70deef0	i2c: Clean up <linux/i2c.h> Fix most checkpatch.pl errors and warnings. This includes replacing spaces with tabs in many places, adding and removing spaces, and folding long lines. Also complete a couple prototypes to make it clearer what the parameters represent. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2008-10-22 20:21:32 +02:00
Jean Delvare	c0589d4bc1	i2c: Drop 2-byte address block transfer defines We have no users and no implementers for these transfer types so it makes little sense to define functionality bits for them. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2008-10-22 20:21:31 +02:00
Jean Delvare	7d1d8999b4	i2c: Constify i2c_get_clientdata's parameter i2c_get_clientdata doesn't change the i2c_client it is passed as a parameter, so it can be constified. Same for i2c_get_adapdata. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2008-10-22 20:21:31 +02:00
Wolfram Sang	14f55f7a03	i2c: Make clear what the class field of i2c_adapter is good for Make clear what the class field of i2c_adapter is good for. Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2008-10-22 20:21:30 +02:00
David Miller	30091404af	i2c-algo-pcf: Add adapter hooks around xfer begin and end Some I2C bus implementations need to synchronize with external entities, such as system firmware, which might also be programming the same I2C bus. In order to facilitate this add ->xfer_begin() and ->xfer_end() hooks which are invoked around pcf_xfer(). [JD: Make these hooks optional.] Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2008-10-22 20:21:30 +02:00
David Miller	08e5338d11	i2c-algo-pcf: Pass adapter data into ->waitforpin() method Pass adapter data into ->waitforpin() method. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2008-10-22 20:21:29 +02:00
Yevgeny Petrilin	b79acb49de	mlx4_core: Get ethernet MTU and default address from firmware Get maximum ethernet MTU and default MAC address from the firmware QUERY_DEV_CAP command. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-10-22 10:56:48 -07:00
Yevgeny Petrilin	93fc9e1bb6	mlx4_core: Support multiple pre-reserved QP regions For ethernet support, we need to reserve QPs for the ethernet and fibre channel driver. The QPs are reserved at the end of the QP table. (This way we assure that they are aligned to their size) We need to consider these reserved ranges in bitmap creation, so we extend the mlx4 bitmap utility functions to allow reserved ranges at both the bottom and the top of the range. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-10-22 10:25:29 -07:00
Jiri Slaby	a73a63701f	HID: add hid_type to general hid struct Add type to the hid structure to distinguish to which device type (now only mouse) we are talking to. Needed for per device type ignore list support. Note: this patch leaves the type as unknown for bluetooth devices, there is not support for this in the hidp code. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2008-10-22 14:45:11 +02:00
Catalin Marinas	319edafef6	smc911x: Add IRQ polarity configuration Platforms like ARM Ltd's RealView require the IRQ polarity bit to be set for the SMC9118 chip. This patch allows the dynamic configuration via the smc911x_platdata structure. This patch also changes the smc91x_platdata structure name to the correct smc911x_platdata in the smc911x_drv_probe() function. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-10-22 07:00:38 -04:00
Thomas Gleixner	268a3dcfea	Merge branch 'timers/range-hrtimers' into v28-range-hrtimers-for-linus-v2 Conflicts: kernel/time/tick-sched.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-22 09:48:06 +02:00
Andy Henroid	27471fdb32	i7300_idle driver v1.55 The Intel 7300 Memory Controller supports dynamic throttling of memory which can be used to save power when system is idle. This driver does the memory throttling when all CPUs are idle on such a system. Refer to "Intel 7300 Memory Controller Hub (MCH)" datasheet for the config space description. Signed-off-by: Andy Henroid <andrew.d.henroid@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>	2008-10-21 23:58:41 -04:00
David Brownell	a30d46c042	mfd: twl4030 IRQ handling update - Move it into a separate file; clean and streamline it - Restructure the init code for reuse during secondary dispatch - Support both levels (primary, secondary) of IRQ dispatch - Use a workqueue for irq mask/unmask and trigger configuration Code for two subchips currently share that secondary handler code. One is the power subchip; its IRQs are now handled by this core, courtesy of this patch. The other is the GPIO module, which will be supported through a later patch. There are also minor changes to the header file, mostly related to GPIO support; nothing yet in mainline cares about those. A few references to OMAP-specific symbols are disabled; when they can all be removed, the TWL4030 support ceases being OMAP-specific. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>	2008-10-22 01:19:37 +02:00
Heiko Carstens	0d557dc97f	workqueue: introduce create_rt_workqueue create_rt_workqueue will create a real time prioritized workqueue. This is needed for the conversion of stop_machine to a workqueue based implementation. This patch adds yet another parameter to __create_workqueue_key to tell it that we want an rt workqueue. However it looks like we rather should have something like "int type" instead of singlethread, freezable and rt. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu>	2008-10-22 10:00:25 +11:00
Rusty Russell	67e67ceaac	core_param() for genuinely core kernel parameters There are a lot of one-liner uses of __setup() in the kernel: they're cumbersome and not queryable (definitely not settable) via /sys. Yet it's ugly to simplify them to module_param(), because by default that inserts a prefix of the module name (usually filename). So, introduce a "core_param". The parameter gets no prefix, but appears in /sys/module/kernel/parameters/ (if non-zero perms arg). I thought about using the name "core", but that's more common than "kernel". And if you create a module called "kernel", you will die a horrible death. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-10-22 10:00:23 +11:00
Rusty Russell	9b473de872	param: Fix duplicate module prefixes Instead of insisting each new module_param sysfs entry is unique, handle the case where it already exists (for builtin modules). The current code assumes that all identical prefixes are together in the section: true for normal uses, but not necessarily so if someone overrides MODULE_PARAM_PREFIX. More importantly, it's not true with the new "core_param()" code which uses "kernel" as a prefix. This simplifies the caller for the builtin case, at a slight loss of efficiency (we do the lookup every time to see if the directory exists). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg Kroah-Hartman <gregkh@suse.de>	2008-10-22 10:00:23 +11:00
Rusty Russell	730b69d225	module: check kernel param length at compile time, not runtime The kparam code tries to handle over-length parameter prefixes at runtime. Not only would I bet this has never been tested, it's not clear that truncating names is a good idea either. So let's check at compile time. We need to move the #define to moduleparam.h to do this, though. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-10-22 10:00:22 +11:00
Rusty Russell	5e458cc0f4	module: simplify load_module. Linus' recent catch of stack overflow in load_module lead me to look at the code. A couple of helpers to get a section address and get objects from a section can help clean things up a little. (And in case you're wondering, the stack size also dropped from 328 to 284 bytes). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2008-10-22 10:00:15 +11:00
David Woodhouse	b876d08f81	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/pci/dmar.c	2008-10-21 19:42:20 +01:00
Heinz Mauelshagen	1f965b1943	dm raid1: separate region_hash interface part1 Separate the region hash code from raid1 so it can be shared by forthcoming targets. Use BUG_ON() for failed async dm_io() calls. Signed-off-by: Heinz Mauelshagen <hjm@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-10-21 17:45:06 +01:00
Mikulas Patocka	d63a5ce3c0	dm: publish array_too_big Move array_too_big to include/linux/device-mapper.h because it is used by targets. Remove the test from dm-raid1 as the number of mirror legs is limited such that it can never fail. (Even for stripes it seems rather unlikely.) Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2008-10-21 17:44:57 +01:00
Darron Broad	878595f6e7	V4L/DVB (9335): videobuf: split unregister bus creating self-contained frontend de-allocator This creates a self contained frontend de-allocator for the instances where an adapter has not been registered yet frontend de-allocation may be required. Signed-off-by: Darron Broad <darron@kewl.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2008-10-21 14:32:08 -02:00
Mauro Carvalho Chehab	8a522c916d	V4L/DVB (9331): Remove unused inode parameter from video_ioctl2 inode is never used on video_ioctl2. Remove it and rename the function to __video_ioctl2. This allows its usage directly as a callback at fops.unlocked_ioctl. Since we still need a callback with inode to be used with fops.ioctl, this patch adds video_ioctl2() that is just a call to __video_ioctl2(). Also, this patch adds some comments about video_ioctl2 and __video_ioctl2 usage at v4l2-ioctl.h. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2008-10-21 14:31:49 -02:00
Mauro Carvalho Chehab	b1f88407f3	V4L/DVB (9330): Get rid of inode parameter at v4l_compat_translate_ioctl() The inode parameter at v4l_compat_translate_ioctl() were just passed over several places just to keep compatible with fops.ioctl. However, it weren't used anywere. This patch gets hid of this unused parameter. Cc: Laurent Pinchart <laurent.pinchart@skynet.be> Cc: Mike Isely <isely@pobox.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2008-10-21 14:31:45 -02:00
Hans Verkuil	d5fbf32f38	V4L/DVB (9324): v4l2: add video_ioctl2_unlocked for unlocked_ioctl support. Based on an older patch from Sakari Ailus. Cc: Sakari Ailus <sakari.ailus@nokia.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2008-10-21 14:31:25 -02:00
Sakari Ailus	f496bc7166	V4L/DVB (9323): v4l2-int-if: Add enum_framesizes and enum_frameintervals ioctls. Signed-off-by: Sakari Ailus <sakari.ailus@nokia.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2008-10-21 14:31:22 -02:00
Sakari Ailus	36499e525f	V4L/DVB (9322): v4l2-int-if: Export more interfaces to modules Export v4l2_int_device_try_attach_all. This allows initiating the initialisation of int if device after the drivers have been registered. Also allow drivers to call ioctls if v4l2-int-if was compiled as module. Signed-off-by: Sakari Ailus <sakari.ailus@nokia.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2008-10-21 14:31:20 -02:00
Sakari Ailus	84389910d0	V4L/DVB (9321): v4l2-int-if: Define new power state changes Use enum v4l2_power instead of int as second argument to vidioc_int_s_power. The new functionality is that standby state is also recognised. Signed-off-by: Sakari Ailus <sakari.ailus@nokia.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2008-10-21 14:31:17 -02:00
Sergio Aguirre	733d710b09	V4L/DVB (9320): v4l2: Add 10-bit RAW Bayer formats Add 10-bit raw bayer format expanded to 16 bits. Adds also definition for 10-bit raw bayer format dpcm-compressed to 8 bits. Signed-off-by: Sergio Aguirre <saaguirre@ti.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2008-10-21 14:31:15 -02:00
Sameer Venkatraman	621a3739f3	V4L/DVB (9319): v4l2-int-if: Add cropcap, g_crop and s_crop commands. Signed-off-by: Sameer Venkatraman <sameerv@ti.com> Signed-off-by: Mohit Jalori <mjalori@ti.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2008-10-21 14:31:12 -02:00
Sakari Ailus	96014a957c	V4L/DVB (9318): v4l2-int-if: Add command to get slave private data. vidioc_int_g_priv is used to get master's slave-related private data structure. The structure can contain for example master's configuration specific to slave. Signed-off-by: Sakari Ailus <sakari.ailus@nokia.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2008-10-21 14:31:09 -02:00
Chris Zankel	0025427eee	xtensa: Add config files for Diamond 232L - Rev B processor variant The Diamond 232L processor is a pre-configured Xtensa processor tailored for Linux application. Signed-off-by: Chris Zankel <chris@zankel.net>	2008-10-21 09:11:43 -07:00
Chris Zankel	00c81d23d3	xtensa: Fix io regions The uncached area starts at e000.0000 and spans 1GB. Also add an IOADDR macro to determine the bypass region to access the IO space. Signed-off-by: Chris Zankel <chris@zankel.net>	2008-10-21 09:08:31 -07:00
Ingo Molnar	e9f95e6373	genirq: fix off by one and coding style Fix off-by-one in for_each_irq_desc_reverse(). Impact is near zero in practice, because nothing substantial wants to iterate down to IRQ#0 - but fix it nevertheless. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-21 15:54:40 +02:00
Al Viro	56b26add02	[PATCH] kill the rest of struct file propagation in block ioctls Now we can switch blkdev_ioctl() block_device/mode Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:49:14 -04:00
Al Viro	e436fdae70	[PATCH] get rid of blkdev_driver_ioctl() convert remaining callers to __blkdev_driver_ioctl() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:49:08 -04:00
Al Viro	572c489215	[PATCH] sanitize blkdev_get() and friends * get rid of fake struct file/struct dentry in __blkdev_get() * merge __blkdev_get() and do_open() * get rid of flags argument of blkdev_get() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:49:06 -04:00
Al Viro	e5eb8caa83	[PATCH] remember mode of reiserfs journal Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:49:04 -04:00
Al Viro	30c40d2c01	[PATCH] propagate mode through open_bdev_excl/close_bdev_excl replace open_bdev_excl/close_bdev_excl with variants taking fmode_t. superblock gets the value used to mount it stored in sb->s_mode Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:49:00 -04:00
Al Viro	9a1c354276	[PATCH] pass fmode_t to blkdev_put() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:48:58 -04:00
Al Viro	90b8f2824c	[PATCH] end of methods switch: remove the old ones Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:48:52 -04:00
Al Viro	d4430d62fa	[PATCH] beginning of methods conversion To keep the size of changesets sane we split the switch by drivers; to keep the damn thing bisectable we do the following: 1) rename the affected methods, add ones with correct prototypes, make (few) callers handle both. That's this changeset. 2) for each driver convert to new methods. ALL drivers are converted in this series. 3) kill the old (renamed) methods. Note that it _is_ a flagday; all in-tree drivers are converted and by the end of this series no trace of old methods remain. The only reason why we do that this way is to keep the damn thing bisectable and allow per-driver debugging if anything goes wrong. New methods: open(bdev, mode) release(disk, mode) ioctl(bdev, mode, cmd, arg) /* Called without BKL / compat_ioctl(bdev, mode, cmd, arg) locked_ioctl(bdev, mode, cmd, arg) / Called with BKL, legacy */ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:32 -04:00
Al Viro	badf8082c3	[PATCH] switch ide_disk_ops ->ioctl() to sane prototype Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:30 -04:00
Al Viro	83ff6fe858	[PATCH] don't mess with file in scsi_nonblockable_ioctl() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:28 -04:00
Al Viro	633a08b812	[PATCH] introduce __blkdev_driver_ioctl() Analog of blkdev_driver_ioctl() with sane arguments. For now uses fake struct file, by the end of the series it won't and blkdev_driver_ioctl() will become a wrapper around it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:26 -04:00
Al Viro	bbc1cc9784	[PATCH] switch cdrom_{open,release,ioctl} to sane APIs ... convert to it in callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:22 -04:00
Al Viro	08f8585121	[PATCH] move block_device_operations to blkdev.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:20 -04:00
Al Viro	647b3d0084	[PATCH] lose unused arguments in dm ioctl callbacks Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:18 -04:00
Al Viro	1bddd9e645	[PATCH] lose the unused file argument in generic_ide_ioctl() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:16 -04:00
Al Viro	74f3c8aff3	[PATCH] switch scsi_cmd_ioctl() to passing fmode_t Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:14 -04:00
Al Viro	e915e872ed	[PATCH] switch sg_scsi_ioctl() to passing fmode_t Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:12 -04:00
Al Viro	86d434dede	[PATCH] eliminate use of ->f_flags in block methods store needed information in f_mode Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:08 -04:00
Al Viro	aeb5d72706	[PATCH] introduce fmode_t, do annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:06 -04:00
Benjamin Herrenschmidt	a02efb906d	Merge commit 'origin' into master Manual merge of: arch/powerpc/Kconfig arch/powerpc/include/asm/page.h	2008-10-21 15:52:04 +11:00
Carl Love	a5598ca0d4	powerpc/oprofile: Fix mutex locking for cell spu-oprofile The issue is the SPU code is not holding the kernel mutex lock while adding samples to the kernel buffer. This patch creates per SPU buffers to hold the data. Data is added to the buffers from in interrupt context. The data is periodically pushed to the kernel buffer via a new Oprofile function oprofile_put_buff(). The oprofile_put_buff() function is called via a work queue enabling the funtion to acquire the mutex lock. The existing user controls for adjusting the per CPU buffer size is used to control the size of the per SPU buffers. Similarly, overflows of the SPU buffers are reported by incrementing the per CPU buffer stats. This eliminates the need to have architecture specific controls for the per SPU buffers which is not acceptable to the OProfile user tool maintainer. The export of the oprofile add_event_entry() is removed as it is no longer needed given this patch. Note, this patch has not addressed the issue of indexing arrays by the spu number. This still needs to be fixed as the spu numbering is not guarenteed to be 0 to max_num_spus-1. Signed-off-by: Carl Love <carll@us.ibm.com> Signed-off-by: Maynard Johnson <maynardj@us.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Acked-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-21 15:17:48 +11:00
NeilBrown	3c0ee63a64	md: use sysfs_notify_dirent to notify changes to md/dev-xxx/state The 'state' file for a device reports, for example, when the device has failed. Changes should be reported to userspace ASAP without the possibility of blocking on low-memory. sysfs_notify does have that possibility (as it takes a mutex which can be held across a kmalloc) so use sysfs_notify_dirent instead. Signed-off-by: NeilBrown <neilb@suse.de>	2008-10-21 13:25:28 +11:00
NeilBrown	b62b75905d	md: use sysfs_notify_dirent to notify changes to md/array_state Now that we have sysfs_notify_dirent, use it to notify changes to md/array_state. As sysfs_notify_dirent can be called in atomic context, we can remove the delayed notify and the MD_NOTIFY_ARRAY_STATE flag. Signed-off-by: NeilBrown <neilb@suse.de>	2008-10-21 13:25:21 +11:00
Arjan van de Ven	f6f286f33e	fix WARN() for PPC powerpc doesn't use the generic WARN_ON infrastructure. The newly introduced WARN() as a result didn't print the message, this patch adds the printk for this specific case. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 15:29:57 -07:00
Linus Torvalds	e3d2f927f7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6: parisc: convert to generic compat_sys_ptrace parisc: add rtc platform driver parisc: initialize unwinder much earlier parisc: add new syscalls parisc: hijack jump to start_kernel parisc: add pdc_coproc_cfg_unlocked and set_firmware_width_unlocked parisc: move include/asm-parisc to arch/parisc/include/asm parisc: move pdc_result to real2.S parisc: unify CCIO_COLLECT_STATS implementation parisc: add arch/parisc/kernel/.gitignore parisc: ropes.h - fix <asm-parisc/> -> <asm/> parisc: parisc-agp - fix <asm-parisc/> -> <asm/> Resolve remove/rename conflict: include/asm-parisc/a.out.h is no longer relevant.	2008-10-20 14:40:31 -07:00
Trent Piepho	326bb8a5a1	leds: Make default trigger fields const The default_trigger fields of struct gpio_led and thus struct led_classdev are pretty much always assigned from a string literal, which means the string can't be modified. Which is fine, since there is no reason to modify the string and in fact it never is. But they should be marked const to prevent such code from being added, to prevent warnings if -Wwrite-strings is used, when assigned from a constant string other than a string literal (which produces a warning under current kernel compiler flags), and for general good coding practices. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>	2008-10-20 22:34:12 +01:00
Linus Torvalds	a0bfb673dc	Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (41 commits) PCI: fix pci_ioremap_bar() on s390 PCI: fix AER capability check PCI: use pci_find_ext_capability everywhere PCI: remove #ifdef DEBUG around dev_dbg call PCI hotplug: fix get_##name return value problem PCI: document the pcie_aspm kernel parameter PCI: introduce an pci_ioremap(pdev, barnr) function powerpc/PCI: Add legacy PCI access via sysfs PCI: Add ability to mmap legacy_io on some platforms PCI: probing debug message uniformization PCI: support PCIe ARI capability PCI: centralize the capabilities code in probe.c PCI: centralize the capabilities code in pci-sysfs.c PCI: fix 64-vbit prefetchable memory resource BARs PCI: replace cfg space size (256/4096) by macros. PCI: use resource_size() everywhere. PCI: use same arg names in PCI_VDEVICE comment PCI hotplug: rpaphp: make debug var unique PCI: use %pF instead of print_fn_descriptor_symbol() in quirks.c PCI: fix hotplug get_##name return value problem ...	2008-10-20 13:40:47 -07:00
Linus Torvalds	92b29b86fe	Merge branch 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (131 commits) tracing/fastboot: improve help text tracing/stacktrace: improve help text tracing/fastboot: fix initcalls disposition in bootgraph.pl tracing/fastboot: fix bootgraph.pl initcall name regexp tracing/fastboot: fix issues and improve output of bootgraph.pl tracepoints: synchronize unregister static inline tracepoints: tracepoint_synchronize_unregister() ftrace: make ftrace_test_p6nop disassembler-friendly markers: fix synchronize marker unregister static inline tracing/fastboot: add better resolution to initcall debug/tracing trace: add build-time check to avoid overrunning hex buffer ftrace: fix hex output mode of ftrace tracing/fastboot: fix initcalls disposition in bootgraph.pl tracing/fastboot: fix printk format typo in boot tracer ftrace: return an error when setting a nonexistent tracer ftrace: make some tracers reentrant ring-buffer: make reentrant ring-buffer: move page indexes into page headers tracing/fastboot: only trace non-module initcalls ftrace: move pc counter in irqtrace ... Manually fix conflicts: - init/main.c: initcall tracing - kernel/module.c: verbose level vs tracepoints - scripts/bootgraph.pl: fallout from cherry-picking commits.	2008-10-20 13:35:07 -07:00
Linus Torvalds	9301975ec2	Merge branch 'genirq-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip This merges branches irq/genirq, irq/sparseirq-v4, timers/hpet-percpu and x86/uv. The sparseirq branch is just preliminary groundwork: no sparse IRQs are actually implemented by this tree anymore - just the new APIs are added while keeping the old way intact as well (the new APIs map 1:1 to irq_desc[]). The 'real' sparse IRQ support will then be a relatively small patch ontop of this - with a v2.6.29 merge target. * 'genirq-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (178 commits) genirq: improve include files intr_remapping: fix typo io_apic: make irq_mis_count available on 64-bit too genirq: fix name space collisions of nr_irqs in arch/* genirq: fix name space collision of nr_irqs in autoprobe.c genirq: use iterators for irq_desc loops proc: fixup irq iterator genirq: add reverse iterator for irq_desc x86: move ack_bad_irq() to irq.c x86: unify show_interrupts() and proc helpers x86: cleanup show_interrupts genirq: cleanup the sparseirq modifications genirq: remove artifacts from sparseirq removal genirq: revert dynarray genirq: remove irq_to_desc_alloc genirq: remove sparse irq code genirq: use inline function for irq_to_desc genirq: consolidate nr_irqs and for_each_irq_desc() x86: remove sparse irq from Kconfig genirq: define nr_irqs for architectures with GENERIC_HARDIRQS=n ...	2008-10-20 13:23:01 -07:00
Linus Torvalds	99ebcf8285	Merge branch 'v28-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'v28-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits) fix documentation of sysrq-q really Fix documentation of sysrq-q timer_list: add base address to clock base timer_list: print cpu number of clockevents device timer_list: print real timer address NOHZ: restart tick device from irq_enter() NOHZ: split tick_nohz_restart_sched_tick() NOHZ: unify the nohz function calls in irq_enter() timers: fix itimer/many thread hang, fix timers: fix itimer/many thread hang, v3 ntp: improve adjtimex frequency rounding timekeeping: fix rounding problem during clock update ntp: let update_persistent_clock() sleep hrtimer: reorder struct hrtimer to save 8 bytes on 64bit builds posix-timers: lock_timer: make it readable posix-timers: lock_timer: kill the bogus ->it_id check posix-timers: kill ->it_sigev_signo and ->it_sigev_value posix-timers: sys_timer_create: cleanup the error handling posix-timers: move the initialization of timer->sigq from send to create path posix-timers: sys_timer_create: simplify and s/tasklist/rcu/ ... Fix trivial conflicts due to sysrq-q description clahes in Documentation/sysrq.txt and drivers/char/sysrq.c	2008-10-20 13:19:56 -07:00
Linus Torvalds	72558dde73	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (36 commits) ide: re-add TRM290 fix lost during ide_build_dmatable() cleanup scc_pata: kill unused variables sgiioc4: kill duplicate ioremap() sgiioc4: kill useless address checks delkin_cb: add PM support ide: remove broken hpt34x driver ide-floppy: remove idefloppy_floppy_t typedef sgiioc4: remove maskproc() method hpt366: cleanup maskproc() method ide: mask interrupt in ide_config_drive_speed() hpt366: fix compile warning ide: remove unused macros from <asm-parisc/ide.h> ide: remove M68K_IDE_SWAPW define from <asm-m68k/ide.h> ide: remove dead <asm-arm/arch-sa1100/ide.h> ide: fix support for IDE PCI controllers using MMIO on frv ide-cd: remove stale comment ide-cd: small drive type print fix ide-cd: debug log enhancements ide: add generic ATA/ATAPI disk driver ide: allow device drivers to specify per-device type /proc settings ...	2008-10-20 13:12:39 -07:00
Linus Torvalds	1d9a8a47d6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: implement nonseekable open fuse: add include protectors fuse: config description improvement fuse: add missing fuse_request_free fuse: fix SEEK_END incorrectness	2008-10-20 12:53:27 -07:00
David Woodhouse	b364776ad1	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/pci/intel-iommu.c	2008-10-20 20:19:36 +01:00
Heiko Carstens	96499871f4	PCI: fix pci_ioremap_bar() on s390 s390 doesn't have ioremap_*, so protect the definition of the new pci_ioremap_bar function with CONFIG_HAS_IOMEM to avoid build breakage. Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-20 11:28:34 -07:00
Yu Zhao	270c66be9b	PCI: fix AER capability check The 'use pci_find_ext_capability everywhere' cleanup brought a new bug, which makes the AER stop working. Fix it by actually using find_ext_cap instead of just find_cap. Drop the unused config space size define while we're at it. Signed-off-by: Yu Zhao <yu.zhao@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-20 11:01:52 -07:00
Jesse Barnes	0927678f55	PCI: use pci_find_ext_capability everywhere Remove some open coded (and buggy) versions of pci_find_ext_capability in favor of the real routine in the PCI core. Tested-by: Tomasz Czernecki <czernecki@gmail.com> Acked-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-20 11:01:51 -07:00
Arjan van de Ven	aa42d7c613	PCI: introduce an pci_ioremap(pdev, barnr) function A common thing in many PCI drivers is to ioremap() an entire bar. This is a slightly fragile thing right now, needing both an address and a size, and many driver writers do.. various things there. This patch introduces an pci_ioremap() function taking just a PCI device struct and the bar number as arguments, and figures this all out itself, in one place. In addition, we can add various sanity checks to this function (the patch already checks to make sure that the bar in question really is a MEM bar; few to no drivers do that sort of thing). Hopefully with this type of API we get less chance of mistakes in drivers with ioremap() operations. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-20 11:01:48 -07:00
Yu Zhao	58c3a727cb	PCI: support PCIe ARI capability This patch adds support for PCI Express Alternative Routing-ID Interpretation (ARI) capability. The ARI capability extends the Function Number field of the PCI Express Endpoint by reusing the Device Number which is otherwise hardwired to 0. With ARI, an Endpoint can have up to 256 functions. Signed-off-by: Yu Zhao <yu.zhao@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-20 10:54:32 -07:00
Zhao, Yu	c322b28a04	PCI: use same arg names in PCI_VDEVICE comment This cleanup makes the argument names in PCI_VDEVICE comment consistent with those used in its definition. Signed-off-by: Yu Zhao <yu.zhao@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-20 10:54:28 -07:00
Seth Heasley	37a84ec668	x86/PCI: irq and pci_ids patch for Intel Ibex Peak DeviceIDs This patch updates the Intel Ibex Peak (PCH) LPC and SMBus Controller DeviceIDs. The LPC Controller ID is set by Firmware within the range of 0x3b00-3b1f. This range is included in pci_ids.h using min and max values, and irq.c now has code to handle the range (in lieu of 32 additions to a SWITCH statement). The SMBus Controller ID is a fixed-value and will not change. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-20 10:53:48 -07:00
Yinghai Lu	16dbef4a83	PCI: change MSI-x vector to 32bit We are using 28bit pci (bus/dev/fn + 12 bits) as irq number, so the cache for irq number should be 32 bit too. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-20 10:53:42 -07:00
Rafael J. Wysocki	0235c4fc7f	PCI PM: Introduce function pci_wake_from_d3 Many device drivers use the following sequence of statements to enable the device to wake up the system while being in the D3_hot or D3_cold low power state: pci_enable_wake(pdev, PCI_D3hot, 1); pci_enable_wake(pdev, PCI_D3cold, 1); However, the second call is not necessary if the first one succeeds (the ordering of the statements above doesn't matter here) and it may even be harmful, because we are not supposed to enable PME# after the wake-up power has been enabled for the device. To allow drivers to overcome this problem, introduce function pci_wake_from_d3() that will enable the device to wake up the system from any of D3_hot and D3_cold as long as the wake-up from at least one of them is supported. Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-20 10:53:41 -07:00
Milton Miller	edbc25caaa	PCI: remove dynids.use_driver_data The driver flag dynids.use_driver_data is almost consistently not set, and causes more problems than it solves. It was initially intended as a flag to indicate whether a driver's usage of driver_data had been carefully inspected and was ready for values from userspace. That audit was never done, so most drivers just get a 0 for driver_data when new IDs are added from userspace via sysfs. So remove the flag, allowing drivers to see the data directly (a followon patch validates the passed driver_data value against what the drivers expect). Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-10-20 10:48:34 -07:00
Linus Torvalds	7d67474e50	Merge git://git.infradead.org/battery-2.6 * git://git.infradead.org/battery-2.6: bq27x00_battery: use unaligned access helper power_supply: fix dependency of tosa_battery power_supply: Support for Texas Instruments BQ27200 battery managers power_supply: Add function to return system-wide power state pda_power: Check and handle return value of set_irq_wake	2008-10-20 09:44:30 -07:00
Linus Torvalds	45e4a24f7b	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: (26 commits) 9p: add more conservative locking 9p: fix oops in protocol stat parsing error path. 9p: fix device file handling 9p: Improve debug support 9p: eliminate depricated conv functions 9p: rework client code to use new protocol support functions 9p: remove unnecessary tag field from p9_req_t structure 9p: remove 9p fcall debug prints 9p: add new protocol support code 9p: encapsulate version function 9p: move dirread to fs layer 9p: adjust 9p vfs write operation 9p: move readn meta-function from client to fs layer 9p: consolidate read/write functions 9p: drop broken unused error path from p9_conn_create() 9p: make rpc code common and rework flush code 9p: use the rcall structure passed in the request in trans_fd read_work 9p: apply common request code to trans_fd 9p: apply common tagpool handling to trans_fd 9p: move request management to client code ...	2008-10-20 09:39:47 -07:00
Linus Torvalds	52c6738b7f	Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: use correct fs type for v4 submounts and referrals Make nfs_file_cred more robust. NFS: Enable NFSv4 callback server to listen on AF_INET6 sockets	2008-10-20 09:39:20 -07:00
Linus Torvalds	3b72e44154	Merge branch 'for-next' of git://git.o-hand.com/linux-mfd * 'for-next' of git://git.o-hand.com/linux-mfd: mfd: further unbork the ucb1400 ac97_bus dependencies mfd: ucb1400 needs GPIO mfd: ucb1400 sound driver uses/depends on AC97_BUS: mfd: Don't use NO_IRQ in WM8350 mfd: update TMIO drivers to use the clock API mfd: twl4030-core irq simplification mfd: add base support for Dialog DA9030/DA9034 PMICs mfd: TWL4030 core driver mfd: support tmiofb cell on tc6393xb mfd: add OHCI cell to tc6393xb mfd: Fix htc-egpio compile warning mfd: do tcb6393xb state restore on resume only if requested mfd: provide and use setup hook for tc6393xb mfd: update sm501 debugging/low information messages mfd: reduce stack usage in mfd-core.c	2008-10-20 09:22:47 -07:00
Linus Torvalds	ed402af3c2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (112 commits) sh: Move SH-4 CPU headers down one more level. sh: Only build in gpio.o when CONFIG_GENERIC_GPIO is selected. sh: Migrate common board headers to mach-common/. sh: Move the CPU definition headers from asm/ to cpu/. serial: sh-sci: Add support SCIF of SH7723 video: add sh_mobile_lcdc platform flags video: remove unused sh_mobile_lcdc platform data sh: remove consistent alloc cruft sh: add dynamic crash base address support sh: reduce Migo-R smc91x overruns sh: Fix up some merge damage. Fix debugfs_create_file's error checking method for arch/sh/mm/ Fix debugfs_create_dir's error checking method for arch/sh/kernel/ sh: ap325rxa: Add support RTC RX-8564LC in AP325RXA board sh: Use sh7720 GPIO on magicpanelr2 board sh: Add sh7720 pinmux code sh: Use sh7203 GPIO on rsk7203 board sh: Add sh7203 pinmux code sh: Use sh7723 GPIO on AP325RXA board sh: Add sh7723 pinmux code ...	2008-10-20 09:13:34 -07:00
Linus Torvalds	5fdf11283e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: netfilter: replace old NF_ARP calls with NFPROTO_ARP netfilter: fix compilation error with NAT=n netfilter: xt_recent: use proc_create_data() netfilter: snmp nat leaks memory in case of failure netfilter: xt_iprange: fix range inversion match netfilter: netns: use NFPROTO_NUMPROTO instead of NUMPROTO for tables array netfilter: ctnetlink: remove obsolete NAT dependency from Kconfig pkt_sched: sch_generic: Fix oops in sch_teql dccp: Port redirection support for DCCP tcp: Fix IPv6 fallout from 'Port redirection support for TCP' netdev: change name dropping error codes ipvs: Update CONFIG_IP_VS_IPV6 description and help text	2008-10-20 09:06:35 -07:00
Linus Torvalds	2be508d847	Merge git://git.infradead.org/mtd-2.6 * git://git.infradead.org/mtd-2.6: (69 commits) Revert "[MTD] m25p80.c code cleanup" [MTD] [NAND] GPIO driver depends on ARM... for now. [MTD] [NAND] sh_flctl: fix compile error [MTD] [NOR] AT49BV6416 has swapped erase regions [MTD] [NAND] GPIO NAND flash driver [MTD] cmdlineparts documentation change - explain where mtd-id comes from [MTD] cfi_cmdset_0002.c: Add Macronix CFI V1.0 TopBottom detection [MTD] [NAND] Fix compilation warnings in drivers/mtd/nand/cs553x_nand.c [JFFS2] Write buffer offset adjustment for NOR-ECC (Sibley) flash [MTD] mtdoops: Fix a bug where block may not be erased [MTD] mtdoops: Add a magic number to logged kernel oops [MTD] mtdoops: Fix an off by one error [JFFS2] Correct parameter names of jffs2_compress() in comments [MTD] [NAND] sh_flctl: add support for Renesas SuperH FLCTL [MTD] [NAND] Bug on atmel_nand HW ECC : OOB info not correctly written [MTD] [MAPS] Remove unused variable after ROM API cleanup. [MTD] m25p80.c extended jedec support (v2) [MTD] remove unused mtd parameter in of_mtd_parse_partitions() [MTD] [NAND] remove dead Kconfig associated with !CONFIG_PPC_MERGE [MTD] [NAND] driver extension to support NAND on TQM85xx modules ...	2008-10-20 09:03:12 -07:00
Parag Warudkar	01e8ef11bc	x86: sysfs: kill owner field from attribute Tejun's commit `7b595756ec` made sysfs attribute->owner unnecessary. But the field was left in the structure to ease the merge. It's been over a year since that change and it is now time to start killing attribute->owner along with its users - one arch at a time! This patch is attempt #1 to get rid of attribute->owner only for CONFIG_X86_64 or CONFIG_X86_32 . We will deal with other arches later on as and when possible - avr32 will be the next since that is something I can test. Compile (make allyesconfig / make allmodconfig / custom config) and boot tested. akpm: the idea is that we put the declaration of sttribute.owner inside `#ifndef CONFIG_X86'. But that proved to be too ambitious for now because new usages kept on turning up in subsystem trees. [akpm: remove the ifdef for now] Signed-off-by: Parag Warudkar <parag.lkml@gmail.com> Cc: Greg KH <greg@kroah.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tejun Heo <htejun@gmail.com> Cc: Len Brown <lenb@kernel.org> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: Roland Dreier <rolandd@cisco.com> Cc: David Brownell <david-b@pacbell.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:42 -07:00
Adrian Bunk	5a85a7dda1	include/linux/bcd.h: remove comments - the macros are gone - there's no more code in this file, LGPL + GPL = GPL, and the code that was moved to lib/bcd.c is anyway trivial Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:42 -07:00
Adrian Bunk	a0098efd6e	remove the obsolete BCDBIN/BINBCD macros Remove the following obsolete macros: - BCD2BIN - BIN2BCD - BCD_TO_BIN - BIN_TO_BCD Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:41 -07:00
Adrian Bunk	357c6e6359	rtc: use bcd2bin/bin2bcd Change various rtc related code to use the new bcd2bin/bin2bcd functions instead of the obsolete BCD_TO_BIN/BIN_TO_BCD/BCD2BIN/BIN2BCD macros. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:41 -07:00
Adrian Bunk	fdd2e5f88a	make mm/rmap.c:anon_vma_cachep static This patch makes the needlessly global anon_vma_cachep static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:40 -07:00
Harvey Harrison	1d8cca44b6	byteorder: provide swabb.h generically in asm/byteorder.h This is needed during the transition to the new byteorder headers as the swabb.h functionality will be provided from asm/byteorder.h in the new version. To avoid breakage on arches still using the old implementation, provide swabb.h from asm/byteorder.h as well. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:40 -07:00
Harvey Harrison	acf0108a84	byteorder: use generic C version for value byteswapping This makes the new implementation of the byteorder helpers match the old in how it degraded when an arch-defined version was not available: 1) swab() - look for arch defined - if not, use generic c version 2) swabp() - look for arch-defined - if not, deref pointer and use swab() 3) swabs() - look for arch defined - if not, use swabp Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:40 -07:00
Harvey Harrison	b8e465f494	byteorder: add new headers for make headers-install Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:40 -07:00
Simon Horman	85a0ee342e	kdump: add is_vmcore_usable() and vmcore_unusable() The usage of elfcorehdr_addr has changed recently such that being set to ELFCORE_ADDR_MAX is used by is_kdump_kernel() to indicate if the code is executing in a kernel executed as a crash kernel. However, arch/ia64/kernel/setup.c:reserve_elfcorehdr will rest elfcorehdr_addr to ELFCORE_ADDR_MAX on error, which means any subsequent calls to is_kdump_kernel() will return 0, even though they should return 1. Ok, at this point in time there are no subsequent calls, but I think its fair to say that there is ample scope for error or at the very least confusion. This patch add an extra state, ELFCORE_ADDR_ERR, which indicates that elfcorehdr_addr was passed on the command line, and thus execution is taking place in a crashdump kernel, but vmcore can't be used for some reason. This is tested for using is_vmcore_usable() and set using vmcore_unusable(). A subsequent patch makes use of this new code. To summarise, the states that elfcorehdr_addr can now be in are as follows: ELFCORE_ADDR_MAX: not a crashdump kernel ELFCORE_ADDR_ERR: crashdump kernel but vmcore is unusable any other value: crash dump kernel and vmcore is usable Signed-off-by: Simon Horman <horms@verge.net.au> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:40 -07:00
Vivek Goyal	57cac4d188	kdump: make elfcorehdr_addr independent of CONFIG_PROC_VMCORE o elfcorehdr_addr is used by not only the code under CONFIG_PROC_VMCORE but also by the code which is not inside CONFIG_PROC_VMCORE. For example, is_kdump_kernel() is used by powerpc code to determine if kernel is booting after a panic then use previous kernel's TCE table. So even if CONFIG_PROC_VMCORE is not set in second kernel, one should be able to correctly determine that we are booting after a panic and setup calgary iommu accordingly. o So remove the assumption that elfcorehdr_addr is under CONFIG_PROC_VMCORE. o Move definition of elfcorehdr_addr to arch dependent crash files. (Unfortunately crash dump does not have an arch independent file otherwise that would have been the best place). o kexec.c is not the right place as one can Have CRASH_DUMP enabled in second kernel without KEXEC being enabled. o I don't see sh setup code parsing the command line for elfcorehdr_addr. I am wondering how does vmcore interface work on sh. Anyway, I am atleast defining elfcoredhr_addr so that compilation is not broken on sh. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Simon Horman <horms@verge.net.au> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:39 -07:00
Roland McGrath	656eb2cd5d	add CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS This adds a kconfig option to change the /proc/PID/coredump_filter default. Fedora has been carrying a trivial patch to change the hard-wired value for this default, since Fedora 8. The default default can't change safely because there are old GDB versions out there (all before 6.7) that are confused by the core dump files created by the MMF_DUMP_ELF_HEADERS setting. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:39 -07:00
Adrian Bunk	b747c8c102	make ptrace_untrace() static ptrace_untrace() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:39 -07:00
Lai Jiangshan	c459643540	bitmask: remove bitmap_scnprintf_len() bitmap_scnprintf_len() is not used now, so we remove it. Otherwise we have to maintain it and make its return value always equal to bitmap_scnprintf()'s return value. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:39 -07:00
Lai Jiangshan	3eda201180	seq_file: add seq_cpumask_list(), seq_nodemask_list() seq_cpumask_list(), seq_nodemask_list() are very like seq_cpumask(), seq_nodemask(), but they print human readable string. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:39 -07:00
KAMEZAWA Hiroyuki	52d4b9ac0b	memcg: allocate all page_cgroup at boot Allocate all page_cgroup at boot and remove page_cgroup poitner from struct page. This patch adds an interface as struct page_cgroup lookup_page_cgroup(struct page) All FLATMEM/DISCONTIGMEM/SPARSEMEM and MEMORY_HOTPLUG is supported. Remove page_cgroup pointer reduces the amount of memory by - 4 bytes per PAGE_SIZE. - 8 bytes per PAGE_SIZE if memory controller is disabled. (even if configured.) On usual 8GB x86-32 server, this saves 8MB of NORMAL_ZONE memory. On my x86-64 server with 48GB of memory, this saves 96MB of memory. I think this reduction makes sense. By pre-allocation, kmalloc/kfree in charge/uncharge are removed. This means - we're not necessary to be afraid of kmalloc faiulre. (this can happen because of gfp_mask type.) - we can avoid calling kmalloc/kfree. - we can avoid allocating tons of small objects which can be fragmented. - we can know what amount of memory will be used for this extra-lru handling. I added printk message as "allocated %ld bytes of page_cgroup" "please try cgroup_disable=memory option if you don't want" maybe enough informative for users. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:39 -07:00
Paul Menage	886465f407	cgroups: fix declaration of cgroup_mm_owner_callbacks The choice of real/dummy declaration for cgroup_mm_owner_callbacks() shouldn't be based on CONFIG_MM_OWNER, but on CONFIG_CGROUPS. Otherwise kernel/exit.c fails to compile when something other than a cgroups controller selects CONFIG_MM_OWNER Signed-off-by: Paul Menage <menage@google.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Paul Menage	cc31edceee	cgroups: convert tasks file to use a seq_file with shared pid array Rather than pre-generating the entire text for the "tasks" file each time the file is opened, we instead just generate/update the array of process ids and use a seq_file to report these to userspace. All open file handles on the same "tasks" file can share a pid array, which may be updated any time that no thread is actively reading the array. By sharing the array, the potential for userspace to DoS the system by opening many handles on the same "tasks" file is removed. [Based on a patch by Lai Jiangshan, extended to use seq_file] Signed-off-by: Paul Menage <menage@google.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Lai Jiangshan	146aa1bd05	cgroups: fix probable race with put_css_set[_taskexit] and find_css_set put_css_set_taskexit may be called when find_css_set is called on other cpu. And the race will occur: put_css_set_taskexit side find_css_set side \| atomic_dec_and_test(&kref->refcount) \| /* kref->refcount = 0 / \| .................................................................... \| read_lock(&css_set_lock) \| find_existing_css_set \| get_css_set \| read_unlock(&css_set_lock); .................................................................... __release_css_set \| .................................................................... \| / use a released css_set */ \| [put_css_set is the same. But in the current code, all put_css_set are put into cgroup mutex critical region as the same as find_css_set.] [akpm@linux-foundation.org: repair comments] [menage@google.com: eliminate race in css_set refcounting] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:38 -07:00
Hidehiro Kawai	0e4fb5e283	ext3: add an option to control error handling on file data If the journal doesn't abort when it gets an IO error in file data blocks, the file data corruption will spread silently. Because most of applications and commands do buffered writes without fsync(), they don't notice the IO error. It's scary for mission critical systems. On the other hand, if the journal aborts whenever it gets an IO error in file data blocks, the system will easily become inoperable. So this patch introduces a filesystem option to determine whether it aborts the journal or just call printk() when it gets an IO error in file data. If you mount a ext3 fs with data_err=abort option, it aborts on file data write error. If you mount it with data_err=ignore, it doesn't abort, just call printk(). data_err=ignore is the default. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:37 -07:00
Krzysztof Helt	3e680aae4e	fb: convert lock/unlock_kernel() into local fb mutex Change lock_kernel()/unlock_kernel() to local fb mutex. Each frame buffer instance has its own mutex. The one line try_to_load() function is unrolled to request_module() in two places for readability. [righi.andrea@gmail.com: fb: fix NULL pointer BUG dereference in fb_open()] Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:36 -07:00
Matt Helsley	dc52ddc0e6	container freezer: implement freezer cgroup subsystem This patch implements a new freezer subsystem in the control groups framework. It provides a way to stop and resume execution of all tasks in a cgroup by writing in the cgroup filesystem. The freezer subsystem in the container filesystem defines a file named freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup. Reading will return the current state. * Examples of usage : # mkdir /containers/freezer # mount -t cgroup -ofreezer freezer /containers # mkdir /containers/0 # echo $some_pid > /containers/0/tasks to get status of the freezer subsystem : # cat /containers/0/freezer.state RUNNING to freeze all tasks in the container : # echo FROZEN > /containers/0/freezer.state # cat /containers/0/freezer.state FREEZING # cat /containers/0/freezer.state FROZEN to unfreeze all tasks in the container : # echo RUNNING > /containers/0/freezer.state # cat /containers/0/freezer.state RUNNING This is the basic mechanism which should do the right thing for user space task in a simple scenario. It's important to note that freezing can be incomplete. In that case we return EBUSY. This means that some tasks in the cgroup are busy doing something that prevents us from completely freezing the cgroup at this time. After EBUSY, the cgroup will remain partially frozen -- reflected by freezer.state reporting "FREEZING" when read. The state will remain "FREEZING" until one of these things happens: 1) Userspace cancels the freezing operation by writing "RUNNING" to the freezer.state file 2) Userspace retries the freezing operation by writing "FROZEN" to the freezer.state file (writing "FREEZING" is not legal and returns EIO) 3) The tasks that blocked the cgroup from entering the "FROZEN" state disappear from the cgroup's set of tasks. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: export thaw_process] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Tested-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:34 -07:00
Matt Helsley	8174f1503f	container freezer: make refrigerator always available Now that the TIF_FREEZE flag is available in all architectures, extract the refrigerator() and freeze_task() from kernel/power/process.c and make it available to all. The refrigerator() can now be used in a control group subsystem implementing a control group freezer. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Tested-by: Matt Helsley <matthltc@us.ibm.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:33 -07:00
Matt Helsley	83224b0837	container freezer: add TIF_FREEZE flag to all architectures This patch series introduces a cgroup subsystem that utilizes the swsusp freezer to freeze a group of tasks. It's immediately useful for batch job management scripts. It should also be useful in the future for implementing container checkpoint/restart. The freezer subsystem in the container filesystem defines a cgroup file named freezer.state. Reading freezer.state will return the current state of the cgroup. Writing "FROZEN" to the state file will freeze all tasks in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup. * Examples of usage : # mkdir /containers/freezer # mount -t cgroup -ofreezer freezer /containers # mkdir /containers/0 # echo $some_pid > /containers/0/tasks to get status of the freezer subsystem : # cat /containers/0/freezer.state RUNNING to freeze all tasks in the container : # echo FROZEN > /containers/0/freezer.state # cat /containers/0/freezer.state FREEZING # cat /containers/0/freezer.state FROZEN to unfreeze all tasks in the container : # echo RUNNING > /containers/0/freezer.state # cat /containers/0/freezer.state RUNNING This patch: The first step in making the refrigerator() available to all architectures, even for those without power management. The purpose of such a change is to be able to use the refrigerator() in a new control group subsystem which will implement a control group freezer. [akpm@linux-foundation.org: fix sparc] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Acked-by: Pavel Machek <pavel@suse.cz> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Nigel Cunningham <nigel@tuxonice.net> Tested-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:33 -07:00
KOSAKI Motohiro	e575f111dc	coredump_filter: add hugepage dumping Presently hugepage's vma has a VM_RESERVED flag in order not to be swapped. But a VM_RESERVED vma isn't core dumped because this flag is often used for some kernel vmas (e.g. vmalloc, sound related). Thus hugepages are never dumped and it can't be debugged easily. Many developers want hugepages to be included into core-dump. However, We can't read generic VM_RESERVED area because this area is often IO mapping area. then these area reading may change device state. it is definitly undesiable side-effect. So adding a hugepage specific bit to the coredump filter is better. It will be able to hugepage core dumping and doesn't cause any side-effect to any i/o devices. In additional, libhugetlb use hugetlb private mapping pages as anonymous page. Then, hugepage private mapping pages should be core dumped by default. Then, /proc/[pid]/core_dump_filter has two new bits. - bit 5 mean hugetlb private mapping pages are dumped or not. (default: yes) - bit 6 mean hugetlb shared mapping pages are dumped or not. (default: no) I tested by following method. % ulimit -c unlimited % ./crash_hugepage 50 % ./crash_hugepage 50 -p % ls -lh % gdb ./crash_hugepage core % % echo 0x43 > /proc/self/coredump_filter % ./crash_hugepage 50 % ./crash_hugepage 50 -p % ls -lh % gdb ./crash_hugepage core #include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <sys/mman.h> #include <string.h> #include "hugetlbfs.h" int main(int argc, char** argv){ char* p; int ch; int mmap_flags = MAP_SHARED; int fd; int nr_pages; while((ch = getopt(argc, argv, "p")) != -1) { switch (ch) { case 'p': mmap_flags &= ~MAP_SHARED; mmap_flags \|= MAP_PRIVATE; break; default: /* nothing/ break; } } argc -= optind; argv += optind; if (argc == 0){ printf("need # of pages\n"); exit(1); } nr_pages = atoi(argv[0]); if (nr_pages < 2) { printf("nr_pages must >2\n"); exit(1); } fd = hugetlbfs_unlinked_fd(); p = mmap(NULL, nr_pages gethugepagesize(), PROT_READ\|PROT_WRITE, mmap_flags, fd, 0); sleep(2); (p + gethugepagesize()) = 1; / COW / sleep(2); / crash! / (int*)0 = 1; return 0; } Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: William Irwin <wli@holomorphy.com> Cc: Adam Litke <agl@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:32 -07:00
Nick Piggin	db64fe0225	mm: rewrite vmap layer Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and provide a fast, scalable percpu frontend for small vmaps (requires a slightly different API, though). The biggest problem with vmap is actually vunmap. Presently this requires a global kernel TLB flush, which on most architectures is a broadcast IPI to all CPUs to flush the cache. This is all done under a global lock. As the number of CPUs increases, so will the number of vunmaps a scaled workload will want to perform, and so will the cost of a global TLB flush. This gives terrible quadratic scalability characteristics. Another problem is that the entire vmap subsystem works under a single lock. It is a rwlock, but it is actually taken for write in all the fast paths, and the read locking would likely never be run concurrently anyway, so it's just pointless. This is a rewrite of vmap subsystem to solve those problems. The existing vmalloc API is implemented on top of the rewritten subsystem. The TLB flushing problem is solved by using lazy TLB unmapping. vmap addresses do not have to be flushed immediately when they are vunmapped, because the kernel will not reuse them again (would be a use-after-free) until they are reallocated. So the addresses aren't allocated again until a subsequent TLB flush. A single TLB flush then can flush multiple vunmaps from each CPU. XEN and PAT and such do not like deferred TLB flushing because they can't always handle multiple aliasing virtual addresses to a physical address. They now call vm_unmap_aliases() in order to flush any deferred mappings. That call is very expensive (well, actually not a lot more expensive than a single vunmap under the old scheme), however it should be OK if not called too often. The virtual memory extent information is stored in an rbtree rather than a linked list to improve the algorithmic scalability. There is a per-CPU allocator for small vmaps, which amortizes or avoids global locking. To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces must be used in place of vmap and vunmap. Vmalloc does not use these interfaces at the moment, so it will not be quite so scalable (although it will use lazy TLB flushing). As a quick test of performance, I ran a test that loops in the kernel, linearly mapping then touching then unmapping 4 pages. Different numbers of tests were run in parallel on an 4 core, 2 socket opteron. Results are in nanoseconds per map+touch+unmap. threads vanilla vmap rewrite 1 14700 2900 2 33600 3000 4 49500 2800 8 70631 2900 So with a 8 cores, the rewritten version is already 25x faster. In a slightly more realistic test (although with an older and less scalable version of the patch), I ripped the not-very-good vunmap batching code out of XFS, and implemented the large buffer mapping with vm_map_ram and vm_unmap_ram... along with a couple of other tricks, I was able to speed up a large directory workload by 20x on a 64 CPU system. I believe vmap/vunmap is actually sped up a lot more than 20x on such a system, but I'm running into other locks now. vmap is pretty well blown off the profiles. Before: 1352059 total 0.1401 798784 _write_lock 8320.6667 <- vmlist_lock 529313 default_idle 1181.5022 15242 smp_call_function 15.8771 <- vmap tlb flushing 2472 __get_vm_area_node 1.9312 <- vmap 1762 remove_vm_area 4.5885 <- vunmap 316 map_vm_area 0.2297 <- vmap 312 kfree 0.1950 300 _spin_lock 3.1250 252 sn_send_IPI_phys 0.4375 <- tlb flushing 238 vmap 0.8264 <- vmap 216 find_lock_page 0.5192 196 find_next_bit 0.3603 136 sn2_send_IPI 0.2024 130 pio_phys_write_mmr 2.0312 118 unmap_kernel_range 0.1229 After: 78406 total 0.0081 40053 default_idle 89.4040 33576 ia64_spinlock_contention 349.7500 1650 _spin_lock 17.1875 319 __reg_op 0.5538 281 _atomic_dec_and_lock 1.0977 153 mutex_unlock 1.5938 123 iget_locked 0.1671 117 xfs_dir_lookup 0.1662 117 dput 0.1406 114 xfs_iget_core 0.0268 92 xfs_da_hashname 0.1917 75 d_alloc 0.0670 68 vmap_page_range 0.0462 <- vmap 58 kmem_cache_alloc 0.0604 57 memset 0.0540 52 rb_next 0.1625 50 __copy_user 0.0208 49 bitmap_find_free_region 0.2188 <- vmap 46 ia64_sn_udelay 0.1106 45 find_inode_fast 0.1406 42 memcmp 0.2188 42 finish_task_switch 0.1094 42 __d_lookup 0.0410 40 radix_tree_lookup_slot 0.1250 37 _spin_unlock_irqrestore 0.3854 36 xfs_bmapi 0.0050 36 kmem_cache_free 0.0256 35 xfs_vn_getattr 0.0322 34 radix_tree_lookup 0.1062 33 __link_path_walk 0.0035 31 xfs_da_do_buf 0.0091 30 _xfs_buf_find 0.0204 28 find_get_page 0.0875 27 xfs_iread 0.0241 27 __strncpy_from_user 0.2812 26 _xfs_buf_initialize 0.0406 24 _xfs_buf_lookup_pages 0.0179 24 vunmap_page_range 0.0250 <- vunmap 23 find_lock_page 0.0799 22 vm_map_ram 0.0087 <- vmap 20 kfree 0.0125 19 put_page 0.0330 18 __kmalloc 0.0176 17 xfs_da_node_lookup_int 0.0086 17 _read_lock 0.0885 17 page_waitqueue 0.0664 vmap has gone from being the top 5 on the profiles and flushing the crap out of all TLBs, to using less than 1% of kernel time. [akpm@linux-foundation.org: cleanups, section fix] [akpm@linux-foundation.org: fix build on alpha] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:32 -07:00
Nick Piggin	51b07fc3c5	fs: buffer lock use lock bitops trylock_buffer and unlock_buffer open and close a critical section. Hence, we can use the lock bitops to get the desired memory ordering. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:32 -07:00
Nick Piggin	8413ac9d8c	mm: page lock use lock bitops trylock_page, unlock_page open and close a critical section. Hence, we can use the lock bitops to get the desired memory ordering. Also, mark trylock as likely to succeed (and remove the annotation from callers). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:32 -07:00
Nick Piggin	f45840b5c1	mm: pagecache insertion fewer atomics Setting and clearing the page locked when inserting it into swapcache / pagecache when it has no other references can use non-atomic page flags operations because no other CPU may be operating on it at this time. This saves one atomic operation when inserting a page into pagecache. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:31 -07:00
KOSAKI Motohiro	902d2e8ae0	vmscan: kill unused lru functions Several LRU manupuration function are not used now. So they can be removed. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:31 -07:00
Lee Schermerhorn	985737cf2e	mlock: count attempts to free mlocked page Allow free of mlock()ed pages. This shouldn't happen, but during developement, it occasionally did. This patch allows us to survive that condition, while keeping the statistics and events correct for debug. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:31 -07:00
Lee Schermerhorn	af936a1606	vmscan: unevictable LRU scan sysctl This patch adds a function to scan individual or all zones' unevictable lists and move any pages that have become evictable onto the respective zone's inactive list, where shrink_inactive_list() will deal with them. Adds sysctl to scan all nodes, and per node attributes to individual nodes' zones. Kosaki: If evictable page found in unevictable lru when write /proc/sys/vm/scan_unevictable_pages, print filename and file offset of these pages. [akpm@linux-foundation.org: fix one CONFIG_MMU=n build error] [kosaki.motohiro@jp.fujitsu.com: adapt vmscan-unevictable-lru-scan-sysctl.patch to new sysfs API] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:31 -07:00
Lee Schermerhorn	64d6519dda	swap: cull unevictable pages in fault path In the fault paths that install new anonymous pages, check whether the page is evictable or not using lru_cache_add_active_or_unevictable(). If the page is evictable, just add it to the active lru list [via the pagevec cache], else add it to the unevictable list. This "proactive" culling in the fault path mimics the handling of mlocked pages in Nick Piggin's series to keep mlocked pages off the lru lists. Notes: 1) This patch is optional--e.g., if one is concerned about the additional test in the fault path. We can defer the moving of nonreclaimable pages until when vmscan [shrink_*_list()] encounters them. Vmscan will only need to handle such pages once, but if there are a lot of them it could impact system performance. 2) The 'vma' argument to page_evictable() is require to notice that we're faulting a page into an mlock()ed vma w/o having to scan the page's rmap in the fault path. Culling mlock()ed anon pages is currently the only reason for this patch. 3) We can't cull swap pages in read_swap_cache_async() because the vma argument doesn't necessarily correspond to the swap cache offset passed in by swapin_readahead(). This could [did!] result in mlocking pages in non-VM_LOCKED vmas if [when] we tried to cull in this path. 4) Move set_pte_at() to after where we add page to lru to keep it hidden from other tasks that might walk the page table. We already do it in this order in do_anonymous() page. And, these are COW'd anon pages. Is this safe? [riel@redhat.com: undo an overzealous code cleanup] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:31 -07:00
Nick Piggin	5344b7e648	vmstat: mlocked pages statistics Add NR_MLOCK zone page state, which provides a (conservative) count of mlocked pages (actually, the number of mlocked pages moved off the LRU). Reworked by lts to fit in with the modified mlock page support in the Reclaim Scalability series. [kosaki.motohiro@jp.fujitsu.com: fix incorrect Mlocked field of /proc/meminfo] [lee.schermerhorn@hp.com: mlocked-pages: add event counting with statistics] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:31 -07:00
Nick Piggin	b291f00039	mlock: mlocked pages are unevictable Make sure that mlocked pages also live on the unevictable LRU, so kswapd will not scan them over and over again. This is achieved through various strategies: 1) add yet another page flag--PG_mlocked--to indicate that the page is locked for efficient testing in vmscan and, optionally, fault path. This allows early culling of unevictable pages, preventing them from getting to page_referenced()/try_to_unmap(). Also allows separate accounting of mlock'd pages, as Nick's original patch did. Note: Nick's original mlock patch used a PG_mlocked flag. I had removed this in favor of the PG_unevictable flag + an mlock_count [new page struct member]. I restored the PG_mlocked flag to eliminate the new count field. 2) add the mlock/unevictable infrastructure to mm/mlock.c, with internal APIs in mm/internal.h. This is a rework of Nick's original patch to these files, taking into account that mlocked pages are now kept on unevictable LRU list. 3) update vmscan.c:page_evictable() to check PageMlocked() and, if vma passed in, the vm_flags. Note that the vma will only be passed in for new pages in the fault path; and then only if the "cull unevictable pages in fault path" patch is included. 4) add try_to_unlock() to rmap.c to walk a page's rmap and ClearPageMlocked() if no other vmas have it mlocked. Reuses as much of try_to_unmap() as possible. This effectively replaces the use of one of the lru list links as an mlock count. If this mechanism let's pages in mlocked vmas leak through w/o PG_mlocked set [I don't know that it does], we should catch them later in try_to_unmap(). One hopes this will be rare, as it will be relatively expensive. Original mm/internal.h, mm/rmap.c and mm/mlock.c changes: Signed-off-by: Nick Piggin <npiggin@suse.de> splitlru: introduce __get_user_pages(): New munlock processing need to GUP_FLAGS_IGNORE_VMA_PERMISSIONS. because current get_user_pages() can't grab PROT_NONE pages theresore it cause PROT_NONE pages can't munlock. [akpm@linux-foundation.org: fix this for pagemap-pass-mm-into-pagewalkers.patch] [akpm@linux-foundation.org: untangle patch interdependencies] [akpm@linux-foundation.org: fix things after out-of-order merging] [hugh@veritas.com: fix page-flags mess] [lee.schermerhorn@hp.com: fix munlock page table walk - now requires 'mm'] [kosaki.motohiro@jp.fujitsu.com: build fix] [kosaki.motohiro@jp.fujitsu.com: fix truncate race and sevaral comments] [kosaki.motohiro@jp.fujitsu.com: splitlru: introduce __get_user_pages()] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:30 -07:00
Lee Schermerhorn	89e004ea55	SHM_LOCKED pages are unevictable Shmem segments locked into memory via shmctl(SHM_LOCKED) should not be kept on the normal LRU, since scanning them is a waste of time and might throw off kswapd's balancing algorithms. Place them on the unevictable LRU list instead. Use the AS_UNEVICTABLE flag to mark address_space of SHM_LOCKed shared memory regions as unevictable. Then these pages will be culled off the normal LRU lists during vmscan. Add new wrapper function to clear the mapping's unevictable state when/if shared memory segment is munlocked. Add 'scan_mapping_unevictable_page()' to mm/vmscan.c to scan all pages in the shmem segment's mapping [struct address_space] for evictability now that they're no longer locked. If so, move them to the appropriate zone lru list. Changes depend on [CONFIG_]UNEVICTABLE_LRU. [kosaki.motohiro@jp.fujitsu.com: revert shm change] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:26 -07:00
Lee Schermerhorn	ba9ddf4939	Ramfs and Ram Disk pages are unevictable Christoph Lameter pointed out that ram disk pages also clutter the LRU lists. When vmscan finds them dirty and tries to clean them, the ram disk writeback function just redirties the page so that it goes back onto the active list. Round and round she goes... With the ram disk driver [rd.c] replaced by the newer 'brd.c', this is no longer the case, as ram disk pages are no longer maintained on the lru. [This makes them unmigratable for defrag or memory hot remove, but that can be addressed by a separate patch series.] However, the ramfs pages behave like ram disk pages used to, so: Define new address_space flag [shares address_space flags member with mapping's gfp mask] to indicate that the address space contains all unevictable pages. This will provide for efficient testing of ramfs pages in page_evictable(). Also provide wrapper functions to set/test the unevictable state to minimize #ifdefs in ramfs driver and any other users of this facility. Set the unevictable state on address_space structures for new ramfs inodes. Test the unevictable state in page_evictable() to cull unevictable pages. These changes depend on [CONFIG_]UNEVICTABLE_LRU. [riel@redhat.com: undo the brd.c part] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Debugged-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:26 -07:00
Lee Schermerhorn	bbfd28eee9	unevictable lru: add event counting with statistics Fix to unevictable-lru-page-statistics.patch Add unevictable lru infrastructure vm events to the statistics patch. Rename the "NORECL_" and "noreclaim_" symbols and text strings to "UNEVICTABLE_" and "unevictable_", respectively. Currently, both the infrastructure and the mlocked pages event are added by a single patch later in the series. This makes it difficult to add or rework the incremental patches. The events actually "belong" with the stats, so pull them up to here. Also, restore the event counting to putback_lru_page(). This was removed from previous patch in series where it was "misplaced". The actual events weren't defined that early. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Rik van Riel <riel@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:26 -07:00
Lee Schermerhorn	894bc31041	Unevictable LRU Infrastructure When the system contains lots of mlocked or otherwise unevictable pages, the pageout code (kswapd) can spend lots of time scanning over these pages. Worse still, the presence of lots of unevictable pages can confuse kswapd into thinking that more aggressive pageout modes are required, resulting in all kinds of bad behaviour. Infrastructure to manage pages excluded from reclaim--i.e., hidden from vmscan. Based on a patch by Larry Woodman of Red Hat. Reworked to maintain "unevictable" pages on a separate per-zone LRU list, to "hide" them from vmscan. Kosaki Motohiro added the support for the memory controller unevictable lru list. Pages on the unevictable list have both PG_unevictable and PG_lru set. Thus, PG_unevictable is analogous to and mutually exclusive with PG_active--it specifies which LRU list the page is on. The unevictable infrastructure is enabled by a new mm Kconfig option [CONFIG_]UNEVICTABLE_LRU. A new function 'page_evictable(page, vma)' in vmscan.c tests whether or not a page may be evictable. Subsequent patches will add the various !evictable tests. We'll want to keep these tests light-weight for use in shrink_active_list() and, possibly, the fault path. To avoid races between tasks putting pages [back] onto an LRU list and tasks that might be moving the page from non-evictable to evictable state, the new function 'putback_lru_page()' -- inverse to 'isolate_lru_page()' -- tests the "evictability" of a page after placing it on the LRU, before dropping the reference. If the page has become unevictable, putback_lru_page() will redo the 'putback', thus moving the page to the unevictable list. This way, we avoid "stranding" evictable pages on the unevictable list. [akpm@linux-foundation.org: fix fallout from out-of-order merge] [riel@redhat.com: fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build] [nishimura@mxp.nes.nec.co.jp: remove redundant mapping check] [kosaki.motohiro@jp.fujitsu.com: unevictable-lru-infrastructure: putback_lru_page()/unevictable page handling rework] [kosaki.motohiro@jp.fujitsu.com: kill unnecessary lock_page() in vmscan.c] [kosaki.motohiro@jp.fujitsu.com: revert migration change of unevictable lru infrastructure] [kosaki.motohiro@jp.fujitsu.com: revert to unevictable-lru-infrastructure-kconfig-fix.patch] [kosaki.motohiro@jp.fujitsu.com: restore patch failure of vmstat-unevictable-and-mlocked-pages-vm-events.patch] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Debugged-by: Benjamin Kidwell <benjkidwell@yahoo.com> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:26 -07:00
Lee Schermerhorn	8a7a8544a4	pageflag helpers for configed-out flags Define proper false/noop inline functions for noreclaim page flags when !defined(CONFIG_UNEVICTABLE_LRU) Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:26 -07:00
Rik van Riel	556adecba1	vmscan: second chance replacement for anonymous pages We avoid evicting and scanning anonymous pages for the most part, but under some workloads we can end up with most of memory filled with anonymous pages. At that point, we suddenly need to clear the referenced bits on all of memory, which can take ages on very large memory systems. We can reduce the maximum number of pages that need to be scanned by not taking the referenced state into account when deactivating an anonymous page. After all, every anonymous page starts out referenced, so why check? If an anonymous page gets referenced again before it reaches the end of the inactive list, we move it back to the active list. To keep the maximum amount of necessary work reasonable, we scale the active to inactive ratio with the size of memory, using the formula active:inactive ratio = sqrt(memory in GB * 10). Kswapd CPU use now seems to scale by the amount of pageout bandwidth, instead of by the amount of memory present in the system. [kamezawa.hiroyu@jp.fujitsu.com: fix OOM with memcg] [kamezawa.hiroyu@jp.fujitsu.com: memcg: lru scan fix] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:25 -07:00
Rik van Riel	4f98a2fee8	vmscan: split LRU lists into anon & file sets Split the LRU lists in two, one set for pages that are backed by real file systems ("file") and one for pages that are backed by memory and swap ("anon"). The latter includes tmpfs. The advantage of doing this is that the VM will not have to scan over lots of anonymous pages (which we generally do not want to swap out), just to find the page cache pages that it should evict. This patch has the infrastructure and a basic policy to balance how much we scan the anon lists and how much we scan the file lists. The big policy changes are in separate patches. [lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset] [kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru] [kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page] [hugh@veritas.com: memcg swapbacked pages active] [hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED] [akpm@linux-foundation.org: fix /proc/vmstat units] [nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration] [kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo] [kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:25 -07:00
Rik van Riel	b2e185384f	define page_file_cache() function Define page_file_cache() function to answer the question: is page backed by a file? Originally part of Rik van Riel's split-lru patch. Extracted to make available for other, independent reclaim patches. Moved inline function to linux/mm_inline.h where it will be needed by subsequent "split LRU" and "noreclaim" patches. Unfortunately this needs to use a page flag, since the PG_swapbacked state needs to be preserved all the way to the point where the page is last removed from the LRU. Trying to derive the status from other info in the page resulted in wrong VM statistics in earlier split VM patchsets. The total number of page flags in use on a 32 bit machine after this patch is 19. [akpm@linux-foundation.org: fix up out-of-order merge fallout] [hugh@veritas.com: splitlru: shmem_getpage SetPageSwapBacked sooner[ Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: MinChan Kim <minchan.kim@gmail.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:25 -07:00
Rik van Riel	68a22394c2	vmscan: free swap space on swap-in/activation If vm_swap_full() (swap space more than 50% full), the system will free swap space at swapin time. With this patch, the system will also free the swap space in the pageout code, when we decide that the page is not a candidate for swapout (and just wasting swap space). Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: MinChan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:25 -07:00
KOSAKI Motohiro	f04e9ebbe4	swap: use an array for the LRU pagevecs Turn the pagevecs into an array just like the LRUs. This significantly cleans up the source code and reduces the size of the kernel by about 13kB after all the LRU lists have been created further down in the split VM patch series. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:25 -07:00
Christoph Lameter	b69408e88b	vmscan: Use an indexed array for LRU variables Currently we are defining explicit variables for the inactive and active list. An indexed array can be more generic and avoid repeating similar code in several places in the reclaim code. We are saving a few bytes in terms of code size: Before: text data bss dec hex filename 4097753 573120 4092484 8763357 85b7dd vmlinux After: text data bss dec hex filename 4097729 573120 4092484 8763333 85b7c5 vmlinux Having an easy way to add new lru lists may ease future work on the reclaim code. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:25 -07:00
Nick Piggin	62695a84eb	vmscan: move isolate_lru_page() to vmscan.c On large memory systems, the VM can spend way too much time scanning through pages that it cannot (or should not) evict from memory. Not only does it use up CPU time, but it also provokes lock contention and can leave large systems under memory presure in a catatonic state. This patch series improves VM scalability by: 1) putting filesystem backed, swap backed and unevictable pages onto their own LRUs, so the system only scans the pages that it can/should evict from memory 2) switching to two handed clock replacement for the anonymous LRUs, so the number of pages that need to be scanned when the system starts swapping is bound to a reasonable number 3) keeping unevictable pages off the LRU completely, so the VM does not waste CPU time scanning them. ramfs, ramdisk, SHM_LOCKED shared memory segments and mlock()ed VMA pages are keept on the unevictable list. This patch: isolate_lru_page logically belongs to be in vmscan.c than migrate.c. It is tough, because we don't need that function without memory migration so there is a valid argument to have it in migrate.c. However a subsequent patch needs to make use of it in the core mm, so we can happily move it to vmscan.c. Also, make the function a little more generic by not requiring that it adds an isolated page to a given list. Callers can do that. Note that we now have '__isolate_lru_page()', that does something quite different, visible outside of vmscan.c for use with memory controller. Methinks we need to rationalize these names/purposes. --lts [akpm@linux-foundation.org: fix mm/memory_hotplug.c build] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:25 -07:00
David Vrabel	61e0e79ee3	Merge branch 'master' into for-upstream Conflicts: Documentation/ABI/testing/sysfs-bus-usb drivers/Makefile	2008-10-20 16:07:19 +01:00
Thomas Gleixner	592aa999d6	hrtimers: add missing docbook comments to struct hrtimer Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-20 16:38:19 +02:00
Peter Zijlstra	ffda12a17a	sched: optimize group load balancer I noticed that tg_shares_up() unconditionally takes rq-locks for all cpus in the sched_domain. This hurts. We need the rq-locks whenever we change the weight of the per-cpu group sched entities. To allevate this a little, only change the weight when the new weight is at least shares_thresh away from the old value. This avoids the rq-lock for the top level entries, since those will never be re-weighted, and fuzzes the lower level entries a little to gain performance in semi-stable situations. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-10-20 14:05:02 +02:00
Thomas Gleixner	c465a76af6	Merge branches 'timers/clocksource', 'timers/hrtimers', 'timers/nohz', 'timers/ntp', 'timers/posixtimers' and 'timers/debug' into v28-timers-for-linus	2008-10-20 13:14:06 +02:00
Patrick McHardy	10a03a42d1	netfilter: netns: use NFPROTO_NUMPROTO instead of NUMPROTO for tables array The netfilter families have been decoupled from regular protocol families. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-20 03:31:54 -07:00
Magnus Damm	f400f510df	video: add sh_mobile_lcdc platform flags Add platform data flags for detailed lcd display configuration. Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2008-10-20 11:38:50 +09:00
Magnus Damm	7994b1c55e	video: remove unused sh_mobile_lcdc platform data Remove lddckr from the platform data, these days we calculate the register value from clock source and clock dividers anyway. Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2008-10-20 11:38:47 +09:00
Paul Mundt	4cb40f795a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/kernel-parameters.txt arch/sh/include/asm/elf.h	2008-10-20 11:17:52 +09:00

1 2 3 4 5 ...

27509 Commits