linux

Author	SHA1	Message	Date
Rik van Riel	56e49d2188	vmscan: evict use-once pages first When the file LRU lists are dominated by streaming IO pages, evict those pages first, before considering evicting other pages. This should be safe from deadlocks or performance problems because only three things can happen to an inactive file page: 1) referenced twice and promoted to the active list 2) evicted by the pageout code 3) under IO, after which it will get evicted or promoted The pages freed in this way can either be reused for streaming IO, or allocated for something else. If the pages are used for streaming IO, this pageout pattern continues. Otherwise, we will fall back to the normal pageout pattern. Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Elladan <elladan@eskimo.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:38 -07:00
Wu Fengguang	20a0307c03	mm: introduce PageHuge() for testing huge/gigantic pages A series of patches to enhance the /proc/pagemap interface and to add a userspace executable which can be used to present the pagemap data. Export 10 more flags to end users (and more for kernel developers): 11. KPF_MMAP (pseudo flag) memory mapped page 12. KPF_ANON (pseudo flag) memory mapped page (anonymous) 13. KPF_SWAPCACHE page is in swap cache 14. KPF_SWAPBACKED page is swap/RAM backed 15. KPF_COMPOUND_HEAD () 16. KPF_COMPOUND_TAIL () 17. KPF_HUGE hugeTLB pages 18. KPF_UNEVICTABLE page is in the unevictable LRU list 19. KPF_HWPOISON hardware detected corruption 20. KPF_NOPAGE (pseudo flag) no page frame at the address (*) For compound pages, exporting _both_ head/tail info enables users to tell where a compound page starts/ends, and its order. a simple demo of the page-types tool # ./page-types -h page-types [options] -r\|--raw Raw mode, for kernel developers -a\|--addr addr-spec Walk a range of pages -b\|--bits bits-spec Walk pages with specified bits -l\|--list Show page details in ranges -L\|--list-each Show page details one by one -N\|--no-summary Don't show summay info -h\|--help Show this usage message addr-spec: N one page at offset N (unit: pages) N+M pages range from N to N+M-1 N,M pages range from N to M-1 N, pages range from N to end ,M pages range from 0 to M bits-spec: bit1,bit2 (flags & (bit1\|bit2)) != 0 bit1,bit2=bit1 (flags & (bit1\|bit2)) == bit1 bit1,~bit2 (flags & (bit1\|bit2)) == bit1 =bit1,bit2 flags == (bit1\|bit2) bit-names: locked error referenced uptodate dirty lru active slab writeback reclaim buddy mmap anonymous swapcache swapbacked compound_head compound_tail huge unevictable hwpoison nopage reserved(r) mlocked(r) mappedtodisk(r) private(r) private_2(r) owner_private(r) arch(r) uncached(r) readahead(o) slob_free(o) slub_frozen(o) slub_debug(o) (r) raw mode bits (o) overloaded bits # ./page-types flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000000 487369 1903 _________________________________ 0x0000000000000014 5 0 __R_D____________________________ referenced,dirty 0x0000000000000020 1 0 _____l___________________________ lru 0x0000000000000024 34 0 __R__l___________________________ referenced,lru 0x0000000000000028 3838 14 ___U_l___________________________ uptodate,lru 0x0001000000000028 48 0 ___U_l_______________________I___ uptodate,lru,readahead 0x000000000000002c 6478 25 __RU_l___________________________ referenced,uptodate,lru 0x000100000000002c 47 0 __RU_l_______________________I___ referenced,uptodate,lru,readahead 0x0000000000000040 8344 32 ______A__________________________ active 0x0000000000000060 1 0 _____lA__________________________ lru,active 0x0000000000000068 348 1 ___U_lA__________________________ uptodate,lru,active 0x0001000000000068 12 0 ___U_lA______________________I___ uptodate,lru,active,readahead 0x000000000000006c 988 3 __RU_lA__________________________ referenced,uptodate,lru,active 0x000100000000006c 48 0 __RU_lA______________________I___ referenced,uptodate,lru,active,readahead 0x0000000000004078 1 0 ___UDlA_______b__________________ uptodate,dirty,lru,active,swapbacked 0x000000000000407c 34 0 __RUDlA_______b__________________ referenced,uptodate,dirty,lru,active,swapbacked 0x0000000000000400 503 1 __________B______________________ buddy 0x0000000000000804 1 0 __R________M_____________________ referenced,mmap 0x0000000000000828 1029 4 ___U_l_____M_____________________ uptodate,lru,mmap 0x0001000000000828 43 0 ___U_l_____M_________________I___ uptodate,lru,mmap,readahead 0x000000000000082c 382 1 __RU_l_____M_____________________ referenced,uptodate,lru,mmap 0x000100000000082c 12 0 __RU_l_____M_________________I___ referenced,uptodate,lru,mmap,readahead 0x0000000000000868 192 0 ___U_lA____M_____________________ uptodate,lru,active,mmap 0x0001000000000868 12 0 ___U_lA____M_________________I___ uptodate,lru,active,mmap,readahead 0x000000000000086c 800 3 __RU_lA____M_____________________ referenced,uptodate,lru,active,mmap 0x000100000000086c 31 0 __RU_lA____M_________________I___ referenced,uptodate,lru,active,mmap,readahead 0x0000000000004878 2 0 ___UDlA____M__b__________________ uptodate,dirty,lru,active,mmap,swapbacked 0x0000000000001000 492 1 ____________a____________________ anonymous 0x0000000000005808 4 0 ___U_______Ma_b__________________ uptodate,mmap,anonymous,swapbacked 0x0000000000005868 2839 11 ___U_lA____Ma_b__________________ uptodate,lru,active,mmap,anonymous,swapbacked 0x000000000000586c 30 0 __RU_lA____Ma_b__________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 513968 2007 # ./page-types -r flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000000 468002 1828 _________________________________ 0x0000000100000000 19102 74 _____________________r___________ reserved 0x0000000000008000 41 0 _______________H_________________ compound_head 0x0000000000010000 188 0 ________________T________________ compound_tail 0x0000000000008014 1 0 __R_D__________H_________________ referenced,dirty,compound_head 0x0000000000010014 4 0 __R_D___________T________________ referenced,dirty,compound_tail 0x0000000000000020 1 0 _____l___________________________ lru 0x0000000800000024 34 0 __R__l__________________P________ referenced,lru,private 0x0000000000000028 3794 14 ___U_l___________________________ uptodate,lru 0x0001000000000028 46 0 ___U_l_______________________I___ uptodate,lru,readahead 0x0000000400000028 44 0 ___U_l_________________d_________ uptodate,lru,mappedtodisk 0x0001000400000028 2 0 ___U_l_________________d_____I___ uptodate,lru,mappedtodisk,readahead 0x000000000000002c 6434 25 __RU_l___________________________ referenced,uptodate,lru 0x000100000000002c 47 0 __RU_l_______________________I___ referenced,uptodate,lru,readahead 0x000000040000002c 14 0 __RU_l_________________d_________ referenced,uptodate,lru,mappedtodisk 0x000000080000002c 30 0 __RU_l__________________P________ referenced,uptodate,lru,private 0x0000000800000040 8124 31 ______A_________________P________ active,private 0x0000000000000040 219 0 ______A__________________________ active 0x0000000800000060 1 0 _____lA_________________P________ lru,active,private 0x0000000000000068 322 1 ___U_lA__________________________ uptodate,lru,active 0x0001000000000068 12 0 ___U_lA______________________I___ uptodate,lru,active,readahead 0x0000000400000068 13 0 ___U_lA________________d_________ uptodate,lru,active,mappedtodisk 0x0000000800000068 12 0 ___U_lA_________________P________ uptodate,lru,active,private 0x000000000000006c 977 3 __RU_lA__________________________ referenced,uptodate,lru,active 0x000100000000006c 48 0 __RU_lA______________________I___ referenced,uptodate,lru,active,readahead 0x000000040000006c 5 0 __RU_lA________________d_________ referenced,uptodate,lru,active,mappedtodisk 0x000000080000006c 3 0 __RU_lA_________________P________ referenced,uptodate,lru,active,private 0x0000000c0000006c 3 0 __RU_lA________________dP________ referenced,uptodate,lru,active,mappedtodisk,private 0x0000000c00000068 1 0 ___U_lA________________dP________ uptodate,lru,active,mappedtodisk,private 0x0000000000004078 1 0 ___UDlA_______b__________________ uptodate,dirty,lru,active,swapbacked 0x000000000000407c 34 0 __RUDlA_______b__________________ referenced,uptodate,dirty,lru,active,swapbacked 0x0000000000000400 538 2 __________B______________________ buddy 0x0000000000000804 1 0 __R________M_____________________ referenced,mmap 0x0000000000000828 1029 4 ___U_l_____M_____________________ uptodate,lru,mmap 0x0001000000000828 43 0 ___U_l_____M_________________I___ uptodate,lru,mmap,readahead 0x000000000000082c 382 1 __RU_l_____M_____________________ referenced,uptodate,lru,mmap 0x000100000000082c 12 0 __RU_l_____M_________________I___ referenced,uptodate,lru,mmap,readahead 0x0000000000000868 192 0 ___U_lA____M_____________________ uptodate,lru,active,mmap 0x0001000000000868 12 0 ___U_lA____M_________________I___ uptodate,lru,active,mmap,readahead 0x000000000000086c 800 3 __RU_lA____M_____________________ referenced,uptodate,lru,active,mmap 0x000100000000086c 31 0 __RU_lA____M_________________I___ referenced,uptodate,lru,active,mmap,readahead 0x0000000000004878 2 0 ___UDlA____M__b__________________ uptodate,dirty,lru,active,mmap,swapbacked 0x0000000000001000 492 1 ____________a____________________ anonymous 0x0000000000005008 2 0 ___U________a_b__________________ uptodate,anonymous,swapbacked 0x0000000000005808 4 0 ___U_______Ma_b__________________ uptodate,mmap,anonymous,swapbacked 0x000000000000580c 1 0 __RU_______Ma_b__________________ referenced,uptodate,mmap,anonymous,swapbacked 0x0000000000005868 2839 11 ___U_lA____Ma_b__________________ uptodate,lru,active,mmap,anonymous,swapbacked 0x000000000000586c 29 0 __RU_lA____Ma_b__________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 513968 2007 # ./page-types --raw --list --no-summary --bits reserved offset count flags 0 15 _____________________r___________ 31 4 _____________________r___________ 159 97 _____________________r___________ 4096 2067 _____________________r___________ 6752 2390 _____________________r___________ 9355 3 _____________________r___________ 9728 14526 _____________________r___________ This patch: Introduce PageHuge(), which identifies huge/gigantic pages by their dedicated compound destructor functions. Also move prep_compound_gigantic_page() to hugetlb.c and make __free_pages_ok() non-static. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:36 -07:00
Christoph Lameter	62bc62a873	page allocator: use a pre-calculated value instead of num_online_nodes() in fast paths num_online_nodes() is called in a number of places but most often by the page allocator when deciding whether the zonelist needs to be filtered based on cpusets or the zonelist cache. This is actually a heavy function and touches a number of cache lines. This patch stores the number of online nodes at boot time and updates the value when nodes get onlined and offlined. The value is then used in a number of important paths in place of num_online_nodes(). [rientjes@google.com: do not override definition of node_set_online() with macro] Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:35 -07:00
Mel Gorman	418589663d	page allocator: use allocation flags as an index to the zone watermark ALLOC_WMARK_MIN, ALLOC_WMARK_LOW and ALLOC_WMARK_HIGH determin whether pages_min, pages_low or pages_high is used as the zone watermark when allocating the pages. Two branches in the allocator hotpath determine which watermark to use. This patch uses the flags as an array index into a watermark array that is indexed with WMARK_* defines accessed via helpers. All call sites that use zone->pages_* are updated to use the helpers for accessing the values and the array offsets for setting. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:35 -07:00
Mel Gorman	49255c619f	page allocator: move check for disabled anti-fragmentation out of fastpath On low-memory systems, anti-fragmentation gets disabled as there is nothing it can do and it would just incur overhead shuffling pages between lists constantly. Currently the check is made in the free page fast path for every page. This patch moves it to a slow path. On machines with low memory, there will be small amount of additional overhead as pages get shuffled between lists but it should quickly settle. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:33 -07:00
Mel Gorman	6484eb3e2a	page allocator: do not check NUMA node ID when the caller knows the node is valid Callers of alloc_pages_node() can optionally specify -1 as a node to mean "allocate from the current node". However, a number of the callers in fast paths know for a fact their node is valid. To avoid a comparison and branch, this patch adds alloc_pages_exact_node() that only checks the nid with VM_BUG_ON(). Callers that know their node is valid are then converted. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Paul Mundt <lethal@linux-sh.org> [for the SLOB NUMA bits] Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:32 -07:00
Mel Gorman	b3c466ce51	page allocator: do not sanity check order in the fast path No user of the allocator API should be passing in an order >= MAX_ORDER but we check for it on each and every allocation. Delete this check and make it a VM_BUG_ON check further down the call path. [akpm@linux-foundation.org: s/VM_BUG_ON/WARN_ON_ONCE/] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:32 -07:00
Mel Gorman	d239171e4f	page allocator: replace __alloc_pages_internal() with __alloc_pages_nodemask() The start of a large patch series to clean up and optimise the page allocator. The performance improvements are in a wide range depending on the exact machine but the results I've seen so fair are approximately; kernbench: 0 to 0.12% (elapsed time) 0.49% to 3.20% (sys time) aim9: -4% to 30% (for page_test and brk_test) tbench: -1% to 4% hackbench: -2.5% to 3.45% (mostly within the noise though) netperf-udp -1.34% to 4.06% (varies between machines a bit) netperf-tcp -0.44% to 5.22% (varies between machines a bit) I haven't sysbench figures at hand, but previously they were within the -0.5% to 2% range. On netperf, the client and server were bound to opposite number CPUs to maximise the problems with cache line bouncing of the struct pages so I expect different people to report different results for netperf depending on their exact machine and how they ran the test (different machines, same cpus client/server, shared cache but two threads client/server, different socket client/server etc). I also measured the vmlinux sizes for a single x86-based config with CONFIG_DEBUG_INFO enabled but not CONFIG_DEBUG_VM. The core of the .config is based on the Debian Lenny kernel config so I expect it to be reasonably typical. This patch: __alloc_pages_internal is the core page allocator function but essentially it is an alias of __alloc_pages_nodemask. Naming a publicly available and exported function "internal" is also a big ugly. This patch renames __alloc_pages_internal() to __alloc_pages_nodemask() and deletes the old nodemask function. Warning - This patch renames an exported symbol. No kernel driver is affected by external drivers calling __alloc_pages_internal() should change the call to __alloc_pages_nodemask() without any alteration of parameters. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:32 -07:00
Miao Xie	58568d2a82	cpuset,mm: update tasks' mems_allowed in time Fix allocating page cache/slab object on the unallowed node when memory spread is set by updating tasks' mems_allowed after its cpuset's mems is changed. In order to update tasks' mems_allowed in time, we must modify the code of memory policy. Because the memory policy is applied in the process's context originally. After applying this patch, one task directly manipulates anothers mems_allowed, and we use alloc_lock in the task_struct to protect mems_allowed and memory policy of the task. But in the fast path, we didn't use lock to protect them, because adding a lock may lead to performance regression. But if we don't add a lock,the task might see no nodes when changing cpuset's mems_allowed to some non-overlapping set. In order to avoid it, we set all new allowed nodes, then clear newly disallowed ones. [lee.schermerhorn@hp.com: The rework of mpol_new() to extract the adjusting of the node mask to apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind() with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local allocation. Fix this by adding the check for MPOL_PREFERRED and empty node mask to mpol_new_mpolicy(). Remove the now unneeded 'nodes = NULL' from mpol_new(). Note that mpol_new_mempolicy() is always called with a non-NULL 'nodes' parameter now that it has been removed from mpol_new(). Therefore, we don't need to test nodes for NULL before testing it for 'empty'. However, just to be extra paranoid, add a VM_BUG_ON() to verify this assumption.] [lee.schermerhorn@hp.com: I don't think the function name 'mpol_new_mempolicy' is descriptive enough to differentiate it from mpol_new(). This function applies cpuset set context, usually constraining nodes to those allowed by the cpuset. However, when the 'RELATIVE_NODES flag is set, it also translates the nodes. So I settled on 'mpol_set_nodemask()', because the comment block for mpol_new() mentions that we need to call this function to "set nodes". Some additional minor line length, whitespace and typo cleanup.] Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Paul Menage <menage@google.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:31 -07:00
Nick Piggin	d2bf6be8ab	mm: clean up get_user_pages_fast() documentation Move more documentation for get_user_pages_fast into the new kerneldoc comment. Add some comments for get_user_pages as well. Also, move get_user_pages_fast declaration up to get_user_pages. It wasn't there initially because it was once a static inline function. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Andy Grover <andy.grover@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:30 -07:00
Wu Fengguang	dc566127dd	radix-tree: add radix_tree_prev_hole() The counterpart of radix_tree_next_hole(). To be used by context readahead. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Vladislav Bolkhovitin <vst@vlnb.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:30 -07:00
Wu Fengguang	d30a11004e	readahead: record mmap read-around states in file_ra_state Mmap read-around now shares the same code style and data structure with readahead code. This also removes do_page_cache_readahead(). Its last user, mmap read-around, has been changed to call ra_submit(). The no-readahead-if-congested logic is dumped by the way. Users will be pretty sensitive about the slow loading of executables. So it's unfavorable to disabled mmap read-around on a congested queue. [akpm@linux-foundation.org: coding-style fixes] Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:29 -07:00
Wu Fengguang	1ebf26a9b3	readahead: make mmap_miss an unsigned int This makes the performance impact of possible mmap_miss wrap around to be temporary and tolerable: i.e. MMAP_LOTSAMISS=100 extra readarounds. Otherwise if ever mmap_miss wraps around to negative, it takes INT_MAX cache misses to bring it back to normal state. During the time mmap readaround will be _enabled_ for whatever wild random workload. That's almost permanent performance impact. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:28 -07:00
Alexey Dobriyan	bb1f17b037	mm: consolidate init_mm definition * create mm/init-mm.c, move init_mm there * remove INIT_MM, initialize init_mm with C99 initializer * unexport init_mm on all arches: init_mm is already unexported on x86. One strange place is some OMAP driver (drivers/video/omap/) which won't build modular, but it's already wants get_vm_area() export. Somebody should look there. [akpm@linux-foundation.org: add missing #includes] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Mike Frysinger <vapier.adi@gmail.com> Cc: Americo Wang <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:28 -07:00
Yinghai Lu	3b0fde0fac	firmware_map: fix hang with x86/32bit Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13484 Peer reported: \| The bug is introduced from kernel 2.6.27, if E820 table reserve the memory \| above 4G in 32bit OS(BIOS-e820: 00000000fff80000 - 0000000120000000 \| (reserved)), system will report Int 6 error and hang up. The bug is caused by \| the following code in drivers/firmware/memmap.c, the resource_size_t is 32bit \| variable in 32bit OS, the BUG_ON() will be invoked to result in the Int 6 \| error. I try the latest 32bit Ubuntu and Fedora distributions, all hit this \| bug. \|====== \|static int firmware_map_add_entry(resource_size_t start, resource_size_t end, \| const char type, \| struct firmware_map_entry entry) and it only happen with CONFIG_PHYS_ADDR_T_64BIT is not set. it turns out we need to pass u64 instead of resource_size_t for that. [akpm@linux-foundation.org: add comment] Reported-and-tested-by: Peer Chen <pchen@nvidia.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:28 -07:00
Arnd Bergmann	08604bd993	time: move PIT_TICK_RATE to linux/timex.h PIT_TICK_RATE is currently defined in four architectures, but in three different places. While linux/timex.h is not the perfect place for it, it is still a reasonable replacement for those drivers that traditionally use asm/timex.h to get CLOCK_TICK_RATE and expect it to be the PIT frequency. Note that for Alpha, the actual value changed from 1193182UL to 1193180UL. This is unlikely to make a difference, and probably can only improve accuracy. There was a discussion on the correct value of CLOCK_TICK_RATE a few years ago, after which every existing instance was getting changed to 1193182. According to the specification, it should be 1193181.818181... Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-16 19:47:27 -07:00
Linus Torvalds	19035e5b5d	Merge branch 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers: Logic to move non pinned timers timers: /proc/sys sysctl hook to enable timer migration timers: Identifying the existing pinned timers timers: Framework for identifying pinned timers timers: allow deferrable timers for intervals tv2-tv5 to be deferred Fix up conflicts in kernel/sched.c and kernel/timer.c manually	2009-06-15 10:06:19 -07:00
Linus Torvalds	3f27c0d2a4	Merge branch 'timers-for-linus-clocksource' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-clocksource' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clocksource: prevent selection of low resolution clocksourse also for nohz=on clocksource: sanity check sysfs clocksource changes	2009-06-15 09:58:33 -07:00
Linus Torvalds	9aaa630503	Merge branch 'timers-for-linus-ntp' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-ntp' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ntp: fix comment typos ntp: adjust SHIFT_PLL to improve NTP convergence	2009-06-15 09:43:24 -07:00
Linus Torvalds	2ed0e21b30	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1244 commits) pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US ipv4: Fix fib_trie rebalancing Bluetooth: Fix issue with uninitialized nsh.type in DTL-1 driver Bluetooth: Fix Kconfig issue with RFKILL integration PIM-SM: namespace changes ipv4: update ARPD help text net: use a deferred timer in rt_check_expire ieee802154: fix kconfig bool/tristate muckup bonding: initialization rework bonding: use is_zero_ether_addr bonding: network device names are case sensative bonding: elminate bad refcount code bonding: fix style issues bonding: fix destructor bonding: remove bonding read/write semaphore bonding: initialize before registration bonding: bond_create always called with default parameters x_tables: Convert printk to pr_err netfilter: conntrack: optional reliable conntrack event delivery list_nulls: add hlist_nulls_add_head and hlist_nulls_del ...	2009-06-15 09:40:05 -07:00
Linus Torvalds	0fa213310c	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (103 commits) powerpc: Fix bug in move of altivec code to vector.S powerpc: Add support for swiotlb on 32-bit powerpc/spufs: Remove unused error path powerpc: Fix warning when printing a resource_size_t powerpc/xmon: Remove unused variable in xmon.c powerpc/pseries: Fix warnings when printing resource_size_t powerpc: Shield code specific to 64-bit server processors powerpc: Separate PACA fields for server CPUs powerpc: Split exception handling out of head_64.S powerpc: Introduce CONFIG_PPC_BOOK3S powerpc: Move VMX and VSX asm code to vector.S powerpc: Set init_bootmem_done on NUMA platforms as well powerpc/mm: Fix a AB->BA deadlock scenario with nohash MMU context lock powerpc/mm: Fix some SMP issues with MMU context handling powerpc: Add PTRACE_SINGLEBLOCK support fbdev: Add PLB support and cleanup DCR in xilinxfb driver. powerpc/virtex: Add ml510 reference design device tree powerpc/virtex: Add Xilinx ML510 reference design support powerpc/virtex: refactor intc driver and add support for i8259 cascading powerpc/virtex: Add support for Xilinx PCI host bridge ...	2009-06-15 09:32:52 -07:00
Philipp Zabel	b110a8fb24	regulator/max1586: support increased V3 voltage range The V3 regulator can be configured with an external resistor connected to the feedback pin (R24 in the data sheet) to increase the voltage range. For example, hx4700 has R24 = 3.32 kOhm to achieve a maximum V3 voltage of 1.55 V which is needed for 624 MHz CPU frequency. Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>	2009-06-15 11:18:26 +01:00
Marek Szyprowski	0cbdf7bce5	LP3971 PMIC regulator driver (updated and combined version) This patch adds regulator drivers for National Semiconductors LP3971 PMIC. This LP3971 PMIC controller has 3 DC/DC voltage converters and 5 low drop-out (LDO) regulators. LP3971 PMIC controller uses I2C interface. Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>	2009-06-15 11:18:26 +01:00
Mike Rapoport	1d98cccf7f	regulator: add userspace-consumer driver The userspace-consumer driver allows control of voltage and current regulator state from userspace. This is required for fine-grained power management of devices that are completely controller by userspace applications, e.g. a GPS transciever connected to a serial port. Signed-off-by: Mike Rapoport <mike@compulab.co.il> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>	2009-06-15 11:18:22 +01:00
Robert Jarzmik	55f4fa4e33	Maxim 1586 regulator driver The Maxim 1586 regulator is a voltage regulator with 2 voltage outputs, specially suitable for Marvell PXA chips. One output is in the range of required VCC_CORE by the PXA27x chips, the other in the VCC_USIM required as well by PXA27x chips. The chip is controlled through the I2C bus. Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>	2009-06-15 11:18:22 +01:00
David S. Miller	9cbc1cb8cd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/scsi/fcoe/fcoe.c net/core/drop_monitor.c net/core/net-traces.c	2009-06-15 03:02:23 -07:00
Jarek Poplawski	ca44d6e60f	pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US Let's use TICKS instead of US, so PSCHED_TICKS2NS and PSCHED_NS2TICKS (like in PSCHED_TICKS_PER_SEC already) to avoid misleading. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-15 02:31:47 -07:00
Linus Torvalds	45e3e1935e	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (53 commits) .gitignore: ignore .lzma files kbuild: add generic --set-str option to scripts/config kbuild: simplify argument loop in scripts/config kbuild: handle non-existing options in scripts/config kallsyms: generalize text region handling kallsyms: support kernel symbols in Blackfin on-chip memory documentation: make version fix kbuild: fix a compile warning gitignore: Add GNU GLOBAL files to top .gitignore kbuild: fix delay in setlocalversion on readonly source README: fix misleading pointer to the defconf directory vmlinux.lds.h update kernel-doc: cleanup perl script Improve vmlinux.lds.h support for arch specific linker scripts kbuild: fix headers_exports with boolean expression kbuild/headers_check: refine extern check kbuild: fix "Argument list too long" error for "make headers_check", ignore .patch files Remove bashisms from scripts menu: fix embedded menu presentation ...	2009-06-14 14:12:18 -07:00
Linus Torvalds	cf5046323e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: mlx4_core: Don't double-free IRQs when falling back from MSI-X to INTx IB/mthca: Don't double-free IRQs when falling back from MSI-X to INTx IB/mlx4: Add strong ordering to local inval and fast reg work requests IB/ehca: Remove superfluous bitmasks from QP control block RDMA/cxgb3: Limit fast register size based on T3 limitations RDMA/cxgb3: Report correct port state and MTU mlx4_core: Add module parameter for number of MTTs per segment IB/mthca: Add module parameter for number of MTTs per segment RDMA/nes: Fix off-by-one bugs in reset_adapter_ne020() and init_serdes() infiniband: Remove void casts IB/ehca: Increment version number IB/ehca: Remove unnecessary memory operations for userspace queue pairs IB/ehca: Fall back to vmalloc() for big allocations IB/ehca: Replace vmalloc() with kmalloc() for queue allocation	2009-06-14 13:53:22 -07:00
Samuel Thibault	5a7e3d1281	keyboard: advertise KT_DEAD2 extended diacriticals In addition to KT_DEAD which has limited support for diacriticals, there is KT_DEAD2 that can support 256 criticals, so let's advertise it in <linux/keyboard.h>. This lets userland know abut the drivers/char/keyboard.c function k_dead2, which supports more than the few trivial ones that k_dead supports. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-14 13:50:36 -07:00
Linus Torvalds	2625b10d8c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc: (25 commits) atmel-mci: add MCI2 register definitions atmel-mci: Integrate AT91 specific definition in header file tmio_mmc: allow compilation for ASIC3 mmc_block: do not DMA to stack sdhci: Print ADMA status and pointer on debug tmio_mmc: fix clock setup tmio_mmc: map SD control registers after enabling the MFD cell tmio_mmc: correct probe return value for num_resources != 3 tmio_mmc: don't use set_irq_type tmio_mmc: add bus_shift support MFD,mmc: tmio_mmc: make HCLK configurable mmc_spi: don't use EINVAL for possible transmission errors cb710: more cleanup for the DEBUG case. sdhci: platform driver for SDHCI mxcmmc: remove frequency workaround cb710: handle DEBUG define in Makefile cb710: add missing parenthesis cb710: fix printk format string mmc: Driver for CB710/720 memory card reader (MMC part) pxamci: add regulator support. ...	2009-06-14 13:46:57 -07:00
Linus Torvalds	489f7ab6c1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (31 commits) trivial: remove the trivial patch monkey's name from SubmittingPatches trivial: Fix a typo in comment of addrconf_dad_start() trivial: usb: fix missing space typo in doc trivial: pci hotplug: adding __init/__exit macros to sgi_hotplug trivial: Remove the hyphen from git commands trivial: fix ETIMEOUT -> ETIMEDOUT typos trivial: Kconfig: .ko is normally not included in module names trivial: SubmittingPatches: fix typo trivial: Documentation/dell_rbu.txt: fix typos trivial: Fix Pavel's address in MAINTAINERS trivial: ftrace:fix description of trace directory trivial: unnecessary (void*) cast removal in sound/oss/msnd.c trivial: input/misc: Fix typo in Kconfig trivial: fix grammo in bus_for_each_dev() kerneldoc trivial: rbtree.txt: fix rb_entry() parameters in sample code trivial: spelling fix in ppc code comments trivial: fix typo in bio_alloc kernel doc trivial: Documentation/rbtree.txt: cleanup kerneldoc of rbtree.txt trivial: Miscellaneous documentation typo fixes trivial: fix typo milisecond/millisecond for documentation and source comments. ...	2009-06-14 13:46:25 -07:00
Linus Torvalds	b322b78169	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: fix inverted wheel for bluetooth version of apple mighty mouse HID: no more reinitializtion is needed in post_reset HID: hidraw -- fix comment about accepted devices HID: Multitouch support for the N-Trig touchscreen HID: add new multitouch and digitizer contants HID: autocentering support for Logitech Force 3D Pro HID: fix hid-ff drivers so that devices work even without ff support HID: force feedback support for SmartJoy PLUS PS2/USB adapter HID: Wacom Graphire Bluetooth driver HID: autocentering support for Logitech G25 Racing Wheel	2009-06-14 13:45:49 -07:00
Linus Torvalds	2cf4d4514d	Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm * 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (417 commits) MAINTAINERS: EB110ATX is not ebsa110 MAINTAINERS: update Eric Miao's email address and status fb: add support of LCD display controller on pxa168/910 (base layer) [ARM] 5552/1: ep93xx get_uart_rate(): use EP93XX_SYSCON_PWRCNT and EP93XX_SYSCON_PWRCN [ARM] pxa/sharpsl_pm: zaurus needs generic pxa suspend/resume routines [ARM] 5544/1: Trust PrimeCell resource sizes [ARM] pxa/sharpsl_pm: cleanup of gpio-related code. [ARM] pxa/sharpsl_pm: drop set_irq_type calls [ARM] pxa/sharpsl_pm: merge pxa-specific code into generic one [ARM] pxa/sharpsl_pm: merge the two sharpsl_pm.c since it's now pxa specific [ARM] sa1100: remove unused collie_pm.c [ARM] pxa: fix the conflicting non-static declarations of global_gpios[] [ARM] 5550/1: Add default configure file for w90p910 platform [ARM] 5549/1: Add clock api for w90p910 platform. [ARM] 5548/1: Add gpio api for w90p910 platform [ARM] 5551/1: Add multi-function pin api for w90p910 platform. [ARM] Make ARM_VIC_NR depend on ARM_VIC [ARM] 5546/1: ARM PL022 SSP/SPI driver v3 ARM: OMAP4: SMP: Update defconfig for OMAP4430 ARM: OMAP4: SMP: Enable SMP support for OMAP4430 ...	2009-06-14 13:42:43 -07:00
Sam Ravnborg	7923f90fff	vmlinux.lds.h update Updated after review by Tim Abbott. - Use HEAD_TEXT_SECTION - Drop use of section-names.h and delete file - Introduce EXIT_CALL Deleting section-names.h required a few simple updates of init.h Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Tim Abbott <tabbott@ksplice.com>	2009-06-14 22:10:41 +02:00
Russell King	4c31791c3d	Merge branch 'for-rmk' of git://git.kernel.org/pub/scm/linux/kernel/git/ycmiao/pxa-linux-2.6 into devel	2009-06-14 11:00:16 +01:00
Philipp Zabel	f0e46cc497	MFD,mmc: tmio_mmc: make HCLK configurable The Toshiba parts all have a 24 MHz HCLK, but HTC ASIC3 has a 24.576 MHz HCLK and AMD Imageon w228x's HCLK is 80 MHz. With this patch, the MFD driver provides the HCLK frequency to tmio_mmc via mfd_cell->driver_data. Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Acked-by: Ian Molton <ian@mnementh.co.uk> Acked-by: Samuel Ortiz <sameo@openedhand.com> Signed-off-by: Pierre Ossman <pierre@ossman.eu>	2009-06-13 22:42:59 +02:00
Michał Mirosław	c54f6bc67a	cb710: more cleanup for the DEBUG case. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Pierre Ossman <pierre@ossman.eu>	2009-06-13 22:42:59 +02:00
Pierre Ossman	9bf69a26ad	cb710: handle DEBUG define in Makefile Signed-off-by: Pierre Ossman <pierre@ossman.eu>	2009-06-13 22:42:58 +02:00
Michał Mirosław	5f5bac8272	mmc: Driver for CB710/720 memory card reader (MMC part) The code is divided in two parts. There is a virtual 'bus' driver that handles PCI device and registers three new devices one per card reader type. The other driver handles SD/MMC part of the reader. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Pierre Ossman <pierre@ossman.eu>	2009-06-13 22:42:58 +02:00
Linus Torvalds	84c48e6f43	Merge git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6: avr32: Fix oops on unaligned user access avr32: Add support for Mediama RMTx add-on board for ATNGW100 avr32: Change Atmel ATNGW100 config to add choice of add-on board Fix MIMC200 board LCD init avr32: Fix clash in ATMEL_USART_ flags avr32: remove obsolete hw_interrupt_type avr32: Solves problem with inverted MCI detect pin on Merisc board atmel-mci: Add support for inverted detect pin	2009-06-13 13:18:32 -07:00
Haavard Skinnemoen	fbe0b8d582	Merge branch 'avr32-arch' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6	2009-06-13 15:34:22 +02:00
Pablo Neira Ayuso	dd7669a92c	netfilter: conntrack: optional reliable conntrack event delivery This patch improves ctnetlink event reliability if one broadcast listener has set the NETLINK_BROADCAST_ERROR socket option. The logic is the following: if an event delivery fails, we keep the undelivered events in the missed event cache. Once the next packet arrives, we add the new events (if any) to the missed events in the cache and we try a new delivery, and so on. Thus, if ctnetlink fails to deliver an event, we try to deliver them once we see a new packet. Therefore, we may lose state transitions but the userspace process gets in sync at some point. At worst case, if no events were delivered to userspace, we make sure that destroy events are successfully delivered. Basically, if ctnetlink fails to deliver the destroy event, we remove the conntrack entry from the hashes and we insert them in the dying list, which contains inactive entries. Then, the conntrack timer is added with an extra grace timeout of random32() % 15 seconds to trigger the event again (this grace timeout is tunable via /proc). The use of a limited random timeout value allows distributing the "destroy" resends, thus, avoiding accumulating lots "destroy" events at the same time. Event delivery may re-order but we can identify them by means of the tuple plus the conntrack ID. The maximum number of conntrack entries (active or inactive) is still handled by nf_conntrack_max. Thus, we may start dropping packets at some point if we accumulate a lot of inactive conntrack entries that did not successfully report the destroy event to userspace. During my stress tests consisting of setting a very small buffer of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket flag, and generating lots of very small connections, I noticed very few destroy entries on the fly waiting to be resend. A simple way to test this patch consist of creating a lot of entries, set a very small Netlink buffer in conntrackd (+ a patch which is not in the git tree to set the BROADCAST_ERROR flag) and invoke `conntrack -F'. For expectations, no changes are introduced in this patch. Currently, event delivery is only done for new expectations (no events from expectation expiration, removal and confirmation). In that case, they need a per-expectation event cache to implement the same idea that is exposed in this patch. This patch can be useful to provide reliable flow-accouting. We still have to add a new conntrack extension to store the creation and destroy time. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2009-06-13 12:30:52 +02:00
Pablo Neira Ayuso	d219dce76c	list_nulls: add hlist_nulls_add_head and hlist_nulls_del This patch adds the hlist_nulls_add_head() function which is based on hlist_nulls_add_head_rcu() but without the use of rcu_assign_pointer(). It also adds hlist_nulls_del which is exactly the same like hlist_nulls_del_rcu(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2009-06-13 12:28:57 +02:00
Pablo Neira Ayuso	9858a3ae1d	netfilter: conntrack: move helper destruction to nf_ct_helper_destroy() This patch moves the helper destruction to a function that lives in nf_conntrack_helper.c. This new function is used in the patch to add ctnetlink reliable event delivery. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2009-06-13 12:28:22 +02:00
Pablo Neira Ayuso	a0891aa6a6	netfilter: conntrack: move event caching to conntrack extension infrastructure This patch reworks the per-cpu event caching to use the conntrack extension infrastructure. The main drawback is that we consume more memory per conntrack if event delivery is enabled. This patch is required by the reliable event delivery that follows to this patch. BTW, this patch allows you to enable/disable event delivery via /proc/sys/net/netfilter/nf_conntrack_events in runtime, although you can still disable event caching as compilation option. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2009-06-13 12:26:29 +02:00
Thomas Gleixner	cd6d95d844	clocksource: prevent selection of low resolution clocksourse also for nohz=on commit `3f68535ada` (clocksource: sanity check sysfs clocksource changes) prevents selection of non high resolution capable clocksources when high resolution mode is active, but did not take into account that the same rules apply for highres=off nohz=on. Check the tick device mode instead of hrtimer_hres_active() to verify whether the system needs to be protected from a switch to jiffies or other non highres capable clock sources. Reported-by: Luming Yu <luming.yu@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-06-13 12:00:26 +02:00
Richard Röjfors	dd14be4c27	i2c-ocores: Can add I2C devices to the bus There is sometimes a need for the ocores driver to add devices to the bus when installed. i2c_register_board_info can not always be used, because the I2C devices are not known at an early state, they could for instance be connected on a I2C bus on a PCI device which has the Open Cores IP. i2c_new_device can not be used in all cases either since the resulting bus nummer might be unknown. The solution is the pass a list of I2C devices in the platform data to the Open Cores driver. This is useful for MFD drivers. Signed-off-by: Richard Röjfors <richard.rojfors.ext@mocean-labs.com> Signed-off-by: Ben Dooks <ben-linux@fluff.org>	2009-06-13 10:39:28 +01:00
Linus Torvalds	cd166bd0dd	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: add generic lib/checksum.c asm-generic: add a generic uaccess.h asm-generic: add generic NOMMU versions of some headers asm-generic: add generic atomic.h and io.h asm-generic: add legacy I/O header files asm-generic: add generic versions of common headers asm-generic: make bitops.h usable asm-generic: make pci.h usable directly asm-generic: make get_rtc_time overridable asm-generic: rename page.h and uaccess.h asm-generic: rename atomic.h to atomic-long.h asm-generic: add a generic unistd.h asm-generic: add generic ABI headers asm-generic: add generic sysv ipc headers asm-generic: introduce asm/bitsperlong.h asm-generic: rename termios.h, signal.h and mman.h	2009-06-12 18:15:51 -07:00
Linus Torvalds	6b702462cb	Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (50 commits) drm: include kernel list header file in hashtab header drm: Export hash table functionality. drm: Split out the mm declarations in a separate header. Add atomic operations. drm/radeon: add support for RV790. drm/radeon: add rv740 drm support. drm_calloc_large: check right size, check integer overflow, use GFP_ZERO drm: Eliminate magic I2C frobbing when reading EDID drm/i915: duplicate desired mode for use by fbcon. drm/via: vfree() no need checking before calling it drm: Replace DRM_DEBUG with DRM_DEBUG_DRIVER in i915 driver drm: Replace DRM_DEBUG with DRM_DEBUG_MODE in drm_mode drm/i915: Replace DRM_DEBUG with DRM_DEBUG_KMS in intel_sdvo drm/i915: replace DRM_DEBUG with DRM_DEBUG_KMS in intel_lvds drm: add separate drm debugging levels radeon: remove _DRM_DRIVER from the preadded sarea map drm: don't associate _DRM_DRIVER maps with a master drm: simplify kcalloc() call to kzalloc(). intelfb: fix spelling of "CLOCK" drm: fix LOCK_TEST_WITH_RETURN macro drm/i915: Hook connector to encoder during load detection (fixes tv/vga detect) ...	2009-06-12 18:09:18 -07:00

1 2 3 4 5 ...

31973 Commits