linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-20 02:51:44 +00:00

Author	SHA1	Message	Date
Ingo Molnar	89034bc2c7	Merge branch 'linus' into tracing/core Conflicts: kernel/trace/trace_events_filter.c We use the tracing/core version. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-11 14:19:09 +02:00
Eric Dumazet	bd3f02212d	ring-buffer: Fix memleak in ring_buffer_free() I noticed oprofile memleaked in linux-2.6 current tree, and tracked this ring-buffer leak. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> LKML-Reference: <4A7C06B9.2090302@gmail.com> Cc: stable@kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-08-07 12:46:39 -04:00
Robert Richter	469535a598	ring-buffer: Fix advance of reader in rb_buffer_peek() When calling rb_buffer_peek() from ring_buffer_consume() and a padding event is returned, the function rb_advance_reader() is called twice. This may lead to missing samples or under high workloads to the warning below. This patch fixes this. If a padding event is returned by rb_buffer_peek() it will be consumed by the calling function now. Also, I simplified some code in ring_buffer_consume(). ------------[ cut here ]------------ WARNING: at /dev/shm/.source/linux/kernel/trace/ring_buffer.c:2289 rb_advance_reader+0x2e/0xc5() Hardware name: Anaheim Modules linked in: Pid: 29, comm: events/2 Tainted: G W 2.6.31-rc3-oprofile-x86_64-standard-00059-g5050dc2 #1 Call Trace: [<ffffffff8106776f>] ? rb_advance_reader+0x2e/0xc5 [<ffffffff81039ffe>] warn_slowpath_common+0x77/0x8f [<ffffffff8103a025>] warn_slowpath_null+0xf/0x11 [<ffffffff8106776f>] rb_advance_reader+0x2e/0xc5 [<ffffffff81068bda>] ring_buffer_consume+0xa0/0xd2 [<ffffffff81326933>] op_cpu_buffer_read_entry+0x21/0x9e [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165 [<ffffffff8132749b>] sync_buffer+0xa5/0x401 [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165 [<ffffffff81326c1b>] ? wq_sync_buffer+0x0/0x78 [<ffffffff81326c76>] wq_sync_buffer+0x5b/0x78 [<ffffffff8104aa30>] worker_thread+0x113/0x1ac [<ffffffff8104dd95>] ? autoremove_wake_function+0x0/0x38 [<ffffffff8104a91d>] ? worker_thread+0x0/0x1ac [<ffffffff8104dc9a>] kthread+0x88/0x92 [<ffffffff8100bdba>] child_rip+0xa/0x20 [<ffffffff8104dc12>] ? kthread+0x0/0x92 [<ffffffff8100bdb0>] ? child_rip+0x0/0x20 ---[ end trace f561c0a58fcc89bd ]--- Cc: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@kernel.org> Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-06 14:20:25 +02:00
Steven Rostedt	464e85eb0e	ring-buffer: do not disable ring buffer on oops_in_progress The commit: commit `e0fdace10e` Author: David Miller <davem@davemloft.net> Date: Fri Aug 1 01:11:22 2008 -0700 debug_locks: set oops_in_progress if we will log messages. Otherwise lock debugging messages on runqueue locks can deadlock the system due to the wakeups performed by printk(). Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ingo Molnar <mingo@elte.hu> Will permanently set oops_in_progress on any lockdep failure. When this triggers it will cause any read from the ring buffer to permanently disable the ring buffer (not to mention no locking of printk). This patch removes the check. It keeps the print in NMI which makes sense. This is probably OK, since the ring buffer should not cause something to set oops_in_progress anyway. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-08-05 20:20:00 -04:00
Steven Rostedt	0f2541d299	ring-buffer: fix check of try_to_discard result The function ring_buffer_discard_commit inversed the code path of the result of try_to_discard. It should skip incrementing the entry counter if try_to_discard succeeded. But instead, it increments the entry conder if it succeeded to discard, and does not increment it if it fails. The result of this bug is that filtering will make the stat counters incorrect. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-08-05 20:19:59 -04:00
Lai Jiangshan	da706d8bc8	ring_buffer: Fix warning while ignoring cmpxchg return value kernel/trace/ring_buffer.c: In function 'rb_tail_page_update': kernel/trace/ring_buffer.c:849: warning: value computed is not used kernel/trace/ring_buffer.c:850: warning: value computed is not used Add "(void)"s to fix this warning, because we don't need here to handle the fail case of cmpxchg, it's fine if an interrupt already did the job. Changed from V1: Add a comment(which is written by Steven) for it. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-07-16 18:46:47 -04:00
Steven Rostedt	77ae365eca	ring-buffer: make lockless This patch converts the ring buffers into a completely lockless buffer recording system. The read side still takes locks since we still serialize readers. But the writers are the ones that must be lockless (those can happen in NMIs). The main change is to the "head_page" pointer. We write to the tail, and read from the head. The "head_page" pointer in the cpu buffer is now just a reference to where to look. The real head page is now kept in the head_page->list->prev->next pointer. That is, in the list head of the previous page we set flags. The list pages are allocated to be aligned such that the lowest significant bits are always zero pointing to the list. This gives us play to put in flags to their pointers. bit 0: set when the page is a head page bit 1: set when the writer is moving the page (for overwrite mode) cmpxchg is used to update the pointer. When the writer wraps the buffer and the tail meets the head, in overwrite mode, the writer must move the head page forward. It first uses cmpxchg to change the pointer flag from 1 to 2. Once this is done, the reader on another CPU will not take the page from the buffer. The writers need to protect against interrupts (we don't bother with disabling interrupts because NMIs are allowed to write too). After the writer sets the pointer flag to 2, it takes care to manage interrupts coming in. This is discribed in detail within the comments of the code. Changes in version 2: - Let reader reset entries value of header page. - Fix tail page passing commit page on reader page test. - Always increment entries and write counter in rb_tail_page_update - Add safety check in rb_set_commit_to_write to break out of infinite loop - add mask in rb_is_reader_page [ Impact: lock free writing to the ring buffer ] Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-07-07 18:36:12 -04:00
Steven Rostedt	3adc54fa82	ring-buffer: make the buffer a true circular link list This patch changes the ring buffer data pages from using a link list head pointer, to making each buffer page point to another buffer page and never back to a "head". This makes the handling of the ring buffer less complex, since the traversing of the ring buffer pages no longer needs to account for the head pointer. This change also is needed to make the ring buffer lockless. [ Changes in version 2: - Added change that Lai Jiangshan mentioned. From: Lai Jiangshan <laijs@cn.fujitsu.com> Date: Thu, 11 Jun 2009 11:25:48 +0800 LKML-Reference: <4A30793C.6090208@cn.fujitsu.com> I'm not sure whether these 4 lines: bpage = list_entry(pages.next, struct buffer_page, list); list_del_init(&bpage->list); cpu_buffer->pages = &bpage->list; list_splice(&pages, cpu_buffer->pages); equal to these 2 lines: cpu_buffer->pages = pages.next; list_del(&pages); If there are equivalent, I think the second one are simpler. It may be not a really necessarily cleanup. What I asked is: if there are equivalent, could you use these two line: cpu_buffer->pages = pages.next; list_del(&pages); ] [ Impact: simplify the ring buffer to help make it lockless ] Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-07-07 18:36:10 -04:00
Paul Mundt	1155de47cd	ring-buffer: Make it generally available In hunting down the cause for the hwlat_detector ring buffer spew in my failed -next builds it became obvious that folks are now treating ring_buffer as something that is generic independent of tracing and thus, suitable for public driver consumption. Given that there are only a few minor areas in ring_buffer that have any reliance on CONFIG_TRACING or CONFIG_FUNCTION_TRACER, provide stubs for those and make it generally available. Signed-off-by: Paul Mundt <lethal@linux-sh.org> Cc: Jon Masters <jcm@jonmasters.org> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20090625053012.GB19944@linux-sh.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-25 10:31:30 +02:00
Linus Torvalds	b0b7065b64	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits) tracing/urgent: warn in case of ftrace_start_up inbalance tracing/urgent: fix unbalanced ftrace_start_up function-graph: add stack frame test function-graph: disable when both x86_32 and optimize for size are configured ring-buffer: have benchmark test print to trace buffer ring-buffer: do not grab locks in nmi ring-buffer: add locks around rb_per_cpu_empty ring-buffer: check for less than two in size allocation ring-buffer: remove useless compile check for buffer_page size ring-buffer: remove useless warn on check ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index tracing: update sample event documentation tracing/filters: fix race between filter setting and module unload tracing/filters: free filter_string in destroy_preds() ring-buffer: use commit counters for commit pointer accounting ring-buffer: remove unused variable ring-buffer: have benchmark test handle discarded events ring-buffer: prevent adding write in discarded area tracing/filters: strloc should be unsigned short tracing/filters: operand can be negative ... Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually	2009-06-20 10:56:46 -07:00
Steven Rostedt	8d707e8eb4	ring-buffer: do not grab locks in nmi If ftrace_dump_on_oops is set, and an NMI detects a lockup, then it will need to read from the ring buffer. But the read side of the ring buffer still takes locks. This patch adds a check on the read side that if it is in an NMI, then it will disable the ring buffer and not take any locks. Reads can still happen on a disabled ring buffer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-17 14:16:27 -04:00
Steven Rostedt	d47882078f	ring-buffer: add locks around rb_per_cpu_empty The checking of whether the buffer is empty or not needs to be serialized among the readers. Add the reader spin lock around it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-17 14:16:23 -04:00
Steven Rostedt	5f78abeebb	ring-buffer: check for less than two in size allocation The ring buffer must have at least two pages allocated for the reader page swap to work. The page count check will miss the case of a zero size passed in. Even though a zero size ring buffer would probably fail an allocation, making the min size check for less than two instead of equal to one makes the code a bit more robust. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-17 14:16:20 -04:00
Steven Rostedt	0dcd4d6c3e	ring-buffer: remove useless compile check for buffer_page size The original version of the ring buffer had a hack to map the page struct that held the pages of the buffer to also be the structure that the ring buffer would keep the pages in a link list. This overlap of the page struct was very dangerous and that hack was removed a while ago. But there was a check to make sure the buffer_page never became bigger than the page struct, and would fail the compile if it did. The check was only meaningful when we had the hack. Now that we have separate allocated descriptors for the buffer pages, we can remove this check. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-17 14:16:07 -04:00
Steven Rostedt	c6a9d7b55e	ring-buffer: remove useless warn on check A check if "write > BUF_PAGE_SIZE" is done right after a if (write > BUF_PAGE_SIZE) return ...; Thus the check is actually testing the compiler and not the kernel. This is useless, remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-16 21:19:26 -04:00
Steven Rostedt	22f470f8da	ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index The index of the event is found by masking PAGE_MASK to it and subtracting the header size. Currently the header size is calculate by PAGE_SIZE - BUF_PAGE_SIZE, when we already have a macro BUF_PAGE_HDR_SIZE to define it. If we want to change BUF_PAGE_SIZE to something less than filling the rest of the page (this is done for debugging), then we break the algorithm to find the index. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-16 21:19:23 -04:00
Steven Rostedt	fa7439531d	ring-buffer: use commit counters for commit pointer accounting The ring buffer is made up of three sets of pointers. The head page pointer, which points to the next page for the reader to get. The commit pointer and commit index, which points to the page and index of the last committed write respectively. The tail pointer and tail index, which points to the page and the index of the last reserved data respectively (non committed). The commit pointer is only moved forward by the outer most writer. If a nested writer comes in, it will not move the pointer forward. The current implementation has a flaw. It assumes that the outer most writer successfully reserved data. There's a small race window where the outer most writer could find the tail pointer, but a nested writer could come in (via interrupt) and move the tail forward, and even the commit forward. The outer writer would not realized the commit moved forward and the accounting will break. This patch changes the design to use counters in the per cpu buffers to keep track of commits. The counters are incremented at the start of the commit, and decremented at the end. If the end commit counter is 1, then it moves the commit pointers. A loop is made to check for races between checking and moving the commit pointers. Only the outer commit should move the pointers anyway. The test of knowing if a reserve is equal to the last commit update is still needed to know for time keeping. The time code is much less racey than the commit updates. This change not only solves the mentioned race, but also makes the code simpler. [ Impact: fix commit race and simplify code ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-16 16:25:33 -04:00
Steven Rostedt	263294f3e1	ring-buffer: remove unused variable Fix the compiler error: kernel/trace/ring_buffer.c: In function 'rb_move_tail': kernel/trace/ring_buffer.c:1236: warning: unused variable 'event' Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-16 16:24:39 -04:00
Steven Rostedt	c7b0930857	ring-buffer: prevent adding write in discarded area This a very tight race where an interrupt could come in and not have enough data to put into the end of a buffer page, and that it would fail to write and need to go to the next page. But if this happened when another writer was about to reserver their data, and that writer has smaller data to reserve, then it could succeed even though the interrupt moved the tail page. To pervent that, if we fail to store data, and by subtracting the amount we reserved we still have room for smaller data, we need to fill that space with "discarded" data. [ Impact: prevent race were buffer data may be lost ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-15 11:37:19 -04:00
Rusty Russell	3f237a79dd	cpumask: use new operators in kernel/trace Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <200906122115.30787.rusty@rustcorp.com.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-15 11:36:42 -04:00
Vegard Nossum	1744a21d57	trace: annotate bitfields in struct ring_buffer_event This gets rid of a heap of false-positive warnings from the tracer code due to the use of bitfields. [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 15:49:37 +02:00
Steven Rostedt	f57a8a1911	ring-buffer: fix ret in rb_add_time_stamp The update of ret got mistakenly added to the if statement of rb_try_to_discard. The variable ret should be 1 on commit and zero otherwise. [ Impact: fix compiler warning and real bug ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-09 12:33:30 -04:00
Peter Zijlstra	1f8a6a10fb	ring-buffer: pass in lockdep class key for reader_lock On Sun, 7 Jun 2009, Ingo Molnar wrote: > Testing tracer sched_switch: <6>Starting ring buffer hammer > PASSED > Testing tracer sysprof: PASSED > Testing tracer function: PASSED > Testing tracer irqsoff: > ============================================= > PASSED > Testing tracer preemptoff: PASSED > Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ] > PASSED > Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760 > --------------------------------------------- > rb_consumer/431 is trying to acquire lock: > (&cpu_buffer->reader_lock){......}, at: [<c109eef7>] ring_buffer_reset_cpu+0x37/0x70 > > but task is already holding lock: > (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0 > > other info that might help us debug this: > 1 lock held by rb_consumer/431: > #0: (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0 The ring buffer is a generic structure, and can be used outside of ftrace. If ftrace traces within the use of the ring buffer, it can produce false positives with lockdep. This patch passes in a static lock key into the allocation of the ring buffer, so that different ring buffers will have their own lock class. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1244477919.13761.9042.camel@twins> [ store key in ring buffer descriptor ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-08 18:50:20 -04:00
Steven Rostedt	ea05b57cc1	ring-buffer: discard timestamps that are at the start of the buffer Every buffer page in the ring buffer includes its own time stamp. When an event is recorded to the ring buffer with a delta time greater than what can be held in the event header, a time stamp event is created. If the the create timestamp falls over to the next buffer page, it is redundant because the buffer page holds a full time stamp. This patch will try to discard the time stamp when it falls to the start of the next page. This change also fixes a issues with disarding events. If most events are discarded, timestamps will start to creep into the ring buffer. If we do not discard the timestamps then they can fill up the ring buffer over time and waste space. This change will keep time stamps from filling up over another page. If something is recorded in the buffer page, and the rest is filtered, then the time stamps can only fill up to the end of the page. [ Impact: prevent time stamps from filling ring buffer ] Reported-by: Tim Bird <tim.bird@am.sony.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-03 10:15:25 -04:00
Steven Rostedt	edd813bffc	ring-buffer: try to discard unneeded timestamps There are times that a race may happen that we add a timestamp in a nested write. This timestamp would just contain a zero delta and serves no purpose. Now that we have a way to discard events, this patch will try to discard the timestamp instead of just wasting the space in the ring buffer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-03 10:15:22 -04:00
Tim Bird	a202355640	ring-buffer: fix bug in ring_buffer_discard_commit There's a bug in ring_buffer_discard_commit. The wrong pointer is being compared in order to check if the event can be freed from the buffer rather than discarded (i.e. marked as PAD). I noticed this when I was working on duration filtering. The bug is not deadly - it just results in lots of wasted space in the buffer. All filtered events are left in the buffer and marked as discarded, rather than being removed from the buffer to make space for other events. Unfortunately, when I fixed this bug, I got errors doing a filtered function trace. Multiple TIME_EXTEND events pile up in the buffer, and trigger the following loop overage warning in rb_iter_peek(): again: ... if (RB_WARN_ON(cpu_buffer, ++nr_loops > 10)) return NULL; I'm not sure what the best way is to fix this. I don't know if I should extend the loop threshhold, or if I should make the test more complex (ignore TIME_EXTEND events), or just get rid of this loop check completely. Note that if I implement a workaround for this, then I see another problem from rb_advance_iter(). I haven't tracked that one down yet. In general, it seems like the case of removing filtered events has not been working properly, and so some assumptions about buffer invariant conditions need to be revisited. Here's the patch for the simple fix: Compare correct pointer for checking if an event can be freed rather than left as discarded in the buffer. Signed-off-by: Tim Bird <tim.bird@am.sony.com> LKML-Reference: <4A25BE9E.5090909@am.sony.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-06-03 10:15:06 -04:00
Steven Rostedt	168b6b1d05	ring-buffer: move code around to remove some branches This is a bit of micro-optimizations. But since the ring buffer is used in tracing every function call, it is an extreme hot path. Every nanosecond counts. This change shows over 5% improvement in the ring-buffer-benchmark. [ Impact: more efficient code ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-11 23:33:06 -04:00
Steven Rostedt	88eb012536	ring-buffer: use internal time stamp function The ring_buffer_time_stamp that is exported adds a little more overhead than is needed for using it internally. This patch adds an internal timestamp function that can be inlined (a single line function) and used internally for the ring buffer. [ Impact: a little less overhead to the ring buffer ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-11 23:14:03 -04:00
Steven Rostedt	0f0c85fc80	ring-buffer: small optimizations Doing some small changes in the fast path of the ring buffer recording saves over 3% in the ring-buffer-benchmark test. [ Impact: a little faster ring buffer recording ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-11 23:12:34 -04:00
Steven Rostedt	be957c447f	ring-buffer: move calculation of event length The event length is calculated and passed in to rb_reserve_next_event in two different locations. Having rb_reserve_next_event do the calculations directly makes only one location to do the change and causes the calculation to be inlined by gcc. Before: text data bss dec hex filename 16538 24 12 16574 40be kernel/trace/ring_buffer.o After: text data bss dec hex filename 16490 24 12 16526 408e kernel/trace/ring_buffer.o [ Impact: smaller more efficient code ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-11 14:42:53 -04:00
Steven Rostedt	1cd8d73589	ring-buffer: remove type parameter from rb_reserve_next_event The rb_reserve_next_event is only called for the data type (type = 0). There is no reason to pass in the type to the function. Before: text data bss dec hex filename 16554 24 12 16590 40ce kernel/trace/ring_buffer.o After: text data bss dec hex filename 16538 24 12 16574 40be kernel/trace/ring_buffer.o [ Impact: cleaner, smaller and slightly more efficient code ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-11 14:19:00 -04:00
Steven Rostedt	74f4fd2166	ring-buffer: change WARN_ON from checking preempt_count to preemptible There's a WARN_ON in the ring buffer code that makes sure preemption is disabled. It checks "!preempt_count()". But when CONFIG_PREEMPT is not enabled, preempt_count() is always zero, and this will trigger the warning. [ Impact: prevent false warning on non preemptible kernels ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-07 20:01:11 -04:00
Steven Rostedt	6634ff26cc	ring-buffer: make moving the tail page a separate function Ingo Molnar thought the code would be cleaner if we used a function call instead of a goto for moving the tail page. After implementing this, it seems that gcc still inlines the result and the output is pretty much the same. Since this is considered a cleaner approach, might as well implement it. [ Impact: code clean up ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-06 15:30:07 -04:00
Steven Rostedt	8e7abf1c62	ring-buffer: remove unneeded conditional in rb_reserve_next The code in __rb_reserve_next checks on page overflow if it is the original commiter and then resets the page back to the original setting. Although this is fine, and the code is correct, it is a bit fragil. Some experimental work I did breaks it easily. The better and more robust solution is to have all commiters that overflow the page, simply subtract what they added. [ Impact: more robust ring buffer account management ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-06 12:49:19 -04:00
Steven Rostedt	aa20ae8444	ring-buffer: move big if statement down In the hot path of the ring buffer "__rb_reserve_next" there's a big if statement that does not even return back to the work flow. code; if (cross to next page) { [ lots of code ] return; } more code; The condition is even the unlikely path, although we do not denote it with an unlikely because gcc is fine with it. The condition is true when the write crosses a page boundary, and we need to start at a new page. Having this if statement makes it hard to read, but calling another function to do the work is also not appropriate, because we are using a lot of variables that were set before the if statement, and we do not want to send them as parameters. This patch changes it to a goto: code; if (cross to next page) goto next_page; more code; return; next_page: [ lots of code] This makes the code easier to understand, and a bit more obvious. The output from gcc is practically identical. For some reason, gcc decided to use different registers when I switched it to a goto. But other than that, the logic is the same. [ Impact: easier to read code ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-05 21:16:11 -04:00
Steven Rostedt	41ede23ede	ring-buffer: disable writers when resetting buffers As a precaution, it is best to disable writing to the ring buffers when reseting them. [ Impact: prevent weird things if write happens during reset ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-05 17:22:02 -04:00
Steven Rostedt	afbab76a62	ring-buffer: have read page swap increment counter with page entries In the swap page ring buffer code that is used by the ftrace splice code, we scan the page to increment the counter of entries read. With the number of entries already in the page we simply need to add it. [ Impact: speed up reading page from ring buffer ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-05 16:58:24 -04:00
Steven Rostedt	778c55d44e	ring-buffer: record page entries in buffer page descriptor Currently, when the ring buffer writer overflows the buffer and must write over non consumed data, we increment the overrun counter by reading the entries on the page we are about to overwrite. This reads the entries one by one. This is not very effecient. This patch adds another entry counter into each buffer page descriptor that keeps track of the number of entries on the page. Now on overwrite, the overrun counter simply needs to add the number of entries that is on the page it is about to overwrite. [ Impact: speed up of ring buffer in overwrite mode ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-05 14:28:08 -04:00
Steven Rostedt	e4906eff9e	ring-buffer: convert cpu buffer entries to local_t The entries counter in cpu buffer is not atomic. It can be updated by other interrupts or from another CPU (readers). But making entries into "atomic_t" causes an atomic operation that can hurt performance. Instead we convert it to a local_t that will increment a counter with a local CPU atomic operation (if the arch supports it). Instead of fighting with readers and overwrites that decrement the counter, I added a "read" counter. Every time a reader reads an entry it is incremented. We already have a overrun counter and with that, the entries counter and the read counter, we can calculate the total number of entries in the buffer with: (entries - overrun) - read As long as the total number of entries in the ring buffer is less than the word size, this will work. But since the entries counter was previously a long, this is no different than what we had before. Thanks to Andrew Morton for pointing out in the first version that atomic_t does not replace unsigned long. I switched to atomic_long_t even though it is signed. A negative count is most likely a bug. [ Impact: keep accurate count of cpu buffer entries ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-05 14:25:44 -04:00
Steven Rostedt	f0d2c681ac	ring-buffer: add counters for commit overrun and nmi dropped entries The WARN_ON in the ring buffer when a commit is preempted and the buffer is filled by preceding writes can happen in normal operations. The WARN_ON makes it look like a bug, not to mention, because it does not stop tracing and calls printk which can also recurse, this is prone to deadlock (the WARN_ON is not in a position to recurse). This patch removes the WARN_ON and replaces it with a counter that can be retrieved by a tracer. This counter is called commit_overrun. While at it, I added a nmi_dropped counter to count any time an NMI entry is dropped because the NMI could not take the spinlock. [ Impact: prevent deadlock by printing normal case warning ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-05 13:51:02 -04:00
Steven Rostedt	d6ce96dabe	ring-buffer: export symbols I'm adding a module to do a series of tests on the ring buffer as well as benchmarks. This module needs to have more of the ring buffer API exported. There's nothing wrong with reading the ring buffer from a module. [ Impact: allow modules to read pages from the ring buffer ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-05-05 13:46:33 -04:00
Steven Rostedt	7d7d2b8031	ring-buffer: fix printk output The warning output in trace_recursive_lock uses %d for a long when it should be %ld. [ Impact: fix compile warning ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-04-29 00:42:01 -04:00
Lai Jiangshan	334d4169a6	ring_buffer: compressed event header RB_MAX_SMALL_DATA = 28bytes is too small for most tracers, it wastes an 'u32' to save the actually length for events which data size > 28. This fix uses compressed event header and enlarges RB_MAX_SMALL_DATA. [ Impact: saves about 0%-12.5%(depends on tracer) memory in ring_buffer ] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <49F13189.3090000@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-04-24 00:08:38 -04:00
Steven Rostedt	3554228d42	ring-buffer: only warn on wrap if buffer is bigger than two pages On boot up, to save memory, ftrace allocates the minimum buffer which is two pages. Ftrace also goes through a series of tests (when configured) on boot up. These tests can fill up a page within a single interrupt. The ring buffer also has a WARN_ON when it detects that the buffer was completely filled within a single commit (other commits are allowed to be nested). Combine the small buffer on start up, with the tests that can fill more than a single page within an interrupt, this can trigger the WARN_ON. This patch makes the WARN_ON only happen when the ring buffer consists of more than two pages. [ Impact: prevent false WARN_ON in ftrace startup tests ] Reported-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <20090421094616.GA14561@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-21 16:00:45 +02:00
Steven Rostedt	aa18efb2a2	tracing: use recursive counter over irq level Althought using the irq level (hardirq_count, softirq_count and in_nmi) was nice to detect bad recursion right away, but since the counters are not atomically updated with respect to the interrupts, the function tracer might trigger the test from an interrupt handler before the hardirq_count is updated. This will trigger a false warning. This patch converts the recursive detection to a simple counter. If the depth is greater than 16 then the recursive detection will trigger. 16 is more than enough for any nested interrupts. [ Impact: fix false positive trace recursion detection ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-04-20 16:16:11 -04:00
Steven Rostedt	e395898e98	tracing: remove recursive test from ring_buffer_event_discard The ring_buffer_event_discard is not tied to ring_buffer_lock_reserve. It can be called inside or outside the reserve/commit. Even if it is called inside the reserve/commit the commit part must also be called. Only ring_buffer_discard_commit can be used as a replacement for ring_buffer_unlock_commit. This patch removes the trace_recursive_unlock from ring_buffer_event_discard since it would be the wrong place to do so. [Impact: prevent breakage in trace recursive testing ] Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-04-20 13:32:44 -04:00
Steven Rostedt	17487bfeb6	tracing: fix recursive test level calculation The recursive tests to detect same level recursion in the ring buffers did not account for the hard/softirq_counts to be shifted. Thus the numbers could be larger than then mask to be tested. This patch includes the shift for the calculation of the irq depth. [ Impact: stop false positives in trace recursion detection ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-04-20 13:24:21 -04:00
Frederic Weisbecker	f3b9aae162	tracing/ring-buffer: Add unlock recursion protection on discard The pair of helpers trace_recursive_lock() and trace_recursive_unlock() have been introduced recently to provide generic tracing recursion protection. They are used in a symetric way: - trace_recursive_lock() on buffer reserve - trace_recursive_unlock() on buffer commit However sometimes, we don't commit but discard on entry to the buffer, ie: in case of filter checking. Then we must also unlock the recursion protection on discard time, otherwise the tracing gets definitely deactivated and a warning is raised spuriously, such as: 111.119821] ------------[ cut here ]------------ [ 111.119829] WARNING: at kernel/trace/ring_buffer.c:1498 ring_buffer_lock_reserve+0x1b7/0x1d0() [ 111.119835] Hardware name: AMILO Li 2727 [ 111.119839] Modules linked in: [ 111.119846] Pid: 5731, comm: Xorg Tainted: G W 2.6.30-rc1 #69 [ 111.119851] Call Trace: [ 111.119863] [<ffffffff8025ce68>] warn_slowpath+0xd8/0x130 [ 111.119873] [<ffffffff8028a30f>] ? __lock_acquire+0x19f/0x1ae0 [ 111.119882] [<ffffffff8028a30f>] ? __lock_acquire+0x19f/0x1ae0 [ 111.119891] [<ffffffff802199b0>] ? native_sched_clock+0x20/0x70 [ 111.119899] [<ffffffff80286dee>] ? put_lock_stats+0xe/0x30 [ 111.119906] [<ffffffff80286eb8>] ? lock_release_holdtime+0xa8/0x150 [ 111.119913] [<ffffffff802c8ae7>] ring_buffer_lock_reserve+0x1b7/0x1d0 [ 111.119921] [<ffffffff802cd110>] trace_buffer_lock_reserve+0x30/0x70 [ 111.119930] [<ffffffff802ce000>] trace_current_buffer_lock_reserve+0x20/0x30 [ 111.119939] [<ffffffff802474e8>] ftrace_raw_event_sched_switch+0x58/0x100 [ 111.119948] [<ffffffff808103b7>] __schedule+0x3a7/0x4cd [ 111.119957] [<ffffffff80211b56>] ? ftrace_call+0x5/0x2b [ 111.119964] [<ffffffff80211b56>] ? ftrace_call+0x5/0x2b [ 111.119971] [<ffffffff80810c08>] schedule+0x18/0x40 [ 111.119977] [<ffffffff80810e09>] preempt_schedule+0x39/0x60 [ 111.119985] [<ffffffff80813bd3>] _read_unlock+0x53/0x60 [ 111.119993] [<ffffffff807259d2>] sock_def_readable+0x72/0x80 [ 111.120002] [<ffffffff807ad5ed>] unix_stream_sendmsg+0x24d/0x3d0 [ 111.120011] [<ffffffff807219a3>] sock_aio_write+0x143/0x160 [ 111.120019] [<ffffffff80211b56>] ? ftrace_call+0x5/0x2b [ 111.120026] [<ffffffff80721860>] ? sock_aio_write+0x0/0x160 [ 111.120033] [<ffffffff80721860>] ? sock_aio_write+0x0/0x160 [ 111.120042] [<ffffffff8031c283>] do_sync_readv_writev+0xf3/0x140 [ 111.120049] [<ffffffff80211b56>] ? ftrace_call+0x5/0x2b [ 111.120057] [<ffffffff80276ff0>] ? autoremove_wake_function+0x0/0x40 [ 111.120067] [<ffffffff8045d489>] ? cap_file_permission+0x9/0x10 [ 111.120074] [<ffffffff8045c1e6>] ? security_file_permission+0x16/0x20 [ 111.120082] [<ffffffff8031cab4>] do_readv_writev+0xd4/0x1f0 [ 111.120089] [<ffffffff80211b56>] ? ftrace_call+0x5/0x2b [ 111.120097] [<ffffffff80211b56>] ? ftrace_call+0x5/0x2b [ 111.120105] [<ffffffff8031cc18>] vfs_writev+0x48/0x70 [ 111.120111] [<ffffffff8031cd65>] sys_writev+0x55/0xc0 [ 111.120119] [<ffffffff80211e32>] system_call_fastpath+0x16/0x1b [ 111.120125] ---[ end trace 15605f4e98d5ccb5 ]--- [ Impact: fix spurious warning triggering tracing shutdown ] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-04-20 10:59:20 +02:00
Frederic Weisbecker	e057a5e564	tracing/core: Add current context on tracing recursion warning In case of tracing recursion detection, we only get the stacktrace. But the current context may be very useful to debug the issue. This patch adds the softirq/hardirq/nmi context with the warning using lockdep context display to have a familiar output. v2: Use printk_once() v3: drop {hardirq,softirq}_context which depend on lockdep, only keep what is part of current->trace_recursion, sufficient to debug the warning source. [ Impact: print context necessary to debug recursion ] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-04-19 23:38:12 +02:00
Steven Rostedt	261842b7c9	tracing: add same level recursion detection The tracing infrastructure allows for recursion. That is, an interrupt may interrupt the act of tracing an event, and that interrupt may very well perform its own trace. This is a recursive trace, and is fine to do. The problem arises when there is a bug, and the utility doing the trace calls something that recurses back into the tracer. This recursion is not caused by an external event like an interrupt, but by code that is not expected to recurse. The result could be a lockup. This patch adds a bitmask to the task structure that keeps track of the trace recursion. To find the interrupt depth, the following algorithm is used: level = hardirq_count() + softirq_count() + in_nmi; Here, level will be the depth of interrutps and softirqs, and even handles the nmi. Then the corresponding bit is set in the recursion bitmask. If the bit was already set, we know we had a recursion at the same level and we warn about it and fail the writing to the buffer. After the data has been committed to the buffer, we clear the bit. No atomics are needed. The only races are with interrupts and they reset the bitmask before returning anywy. [ Impact: detect same irq level trace recursion ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-04-17 16:21:32 -04:00

1 2 3

137 Commits