linux

Author	SHA1	Message	Date
Steven Rostedt (Red Hat)	1b22e382ab	tracing: Let tracing_snapshot() be used by modules but not NMI Add EXPORT_SYMBOL_GPL() to let the tracing_snapshot() functions be called from modules. Also add a test to see if the snapshot was called from NMI context and just warn in the tracing buffer if so, and return. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:57 -04:00
Steven Rostedt (Red Hat)	ca268da6e4	tracing: Add internal ftrace trace_puts() for ftrace to use There's a few places that ftrace uses trace_printk() for internal use, but this requires context (normal, softirq, irq, NMI) buffers to keep things lockless. But the trace_puts() does not, as it can write the string directly into the ring buffer. Make a internal helper for trace_puts() and have the internal functions use that. This way the extra context buffers are not used. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:56 -04:00
Steven Rostedt (Red Hat)	09ae72348e	tracing: Add trace_puts() for even faster trace_printk() tracing The trace_printk() is extremely fast and is very handy as it can be used in any context (including NMIs!). But it still requires scanning the fmt string for parsing the args. Even the trace_bprintk() requires a scan to know what args will be saved, although it doesn't copy the format string itself. Several times trace_printk() has no args, and wastes cpu cycles scanning the fmt string. Adding trace_puts() allows the developer to use an even faster tracing method that only saves the pointer to the string in the ring buffer without doing any format parsing at all. This will help remove even more of the "Heisenbug" effect, when debugging. Also fixed up the F_printk()s for the ftrace internal bprint and print events. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:55 -04:00
Steven Rostedt (Red Hat)	153e8ed913	tracing: Fix the branch tracer that broke with buffer change The changce to add the trace_buffer struct to have the trace array have both the main buffer and max buffer broke the branch tracer because the change did not update that code. As the branch tracer adds a significant amount of overhead, and must be selected via a selection (not a allyesconfig) it was missed in testing. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:54 -04:00
Steven Rostedt (Red Hat)	55034cd6e6	tracing: Add alloc_snapshot kernel command line parameter If debugging the kernel, and the developer wants to use tracing_snapshot() in places where tracing_snapshot_alloc() may be difficult (or more likely, the developer is lazy and doesn't want to bother with tracing_snapshot_alloc() at all), then adding alloc_snapshot to the kernel command line parameter will tell ftrace to allocate the snapshot buffer (if configured) when it allocates the main tracing buffer. I also noticed that ring_buffer_expanded and tracing_selftest_disabled had inconsistent use of boolean "true" and "false" with "0" and "1". I cleaned that up too. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:53 -04:00
Steven Rostedt (Red Hat)	f4e781c0a8	tracing: Move the tracing selftest code into its own function Move the tracing startup selftest code into its own function and when not enabled, always have that function succeed. This makes the register_tracer() function much more readable. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:53 -04:00
Steven Rostedt (Red Hat)	f5eb558826	ring-buffer: Do not use schedule_work_on() for current CPU The ring buffer updates when done while the ring buffer is active, needs to be completed on the CPU that is used for the ring buffer per_cpu buffer. To accomplish this, schedule_work_on() is used to schedule work on the given CPU. Now there's no reason to use schedule_work_on() if the process doing the update happens to be on the CPU that it is processing. It has already filled the requirement. Instead, just do the work and continue. This is needed for tracing_snapshot_alloc() where it may be called really early in boot, where the work queues have not been set up yet. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:52 -04:00
Steven Rostedt (Red Hat)	ad909e21bb	tracing: Add internal tracing_snapshot() functions The new snapshot feature is quite handy. It's a way for the user to take advantage of the spare buffer that, until then, only the latency tracers used to "snapshot" the buffer when it hit a max latency. Now users can trigger a "snapshot" manually when some condition is hit in a program. But a snapshot currently can not be triggered by a condition inside the kernel. With the addition of tracing_snapshot() and tracing_snapshot_alloc(), snapshots can now be taking when a condition is hit, and the developer wants to snapshot the case without stopping the trace. Note, any snapshot will overwrite the old one, so take care in how this is done. These new functions are to be used like tracing_on(), tracing_off() and trace_printk() are. That is, they should never be called in the mainline Linux kernel. They are solely for the purpose of debugging. The tracing_snapshot() will not allocate a buffer, but it is safe to be called from any context (except NMIs). But if a snapshot buffer isn't allocated when it is called, it will write to the live buffer, complaining about the lack of a snapshot buffer, and then stop tracing (giving you the "permanent snapshot"). tracing_snapshot_alloc() will allocate the snapshot buffer if it was not already allocated and then take the snapshot. This routine may sleep, and must be called from context that can sleep. The allocation is done with GFP_KERNEL and not atomic. If you need a snapshot in an atomic context, say in early boot, then it is best to call the tracing_snapshot_alloc() before then, where it will allocate the buffer, and then you can use the tracing_snapshot() anywhere you want and still get snapshots. Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:51 -04:00
Steven Rostedt (Red Hat)	a695cb5816	tracing: Prevent deleting instances when they are being read Add a ref count to the trace_array structure and prevent removal of instances that have open descriptors. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:51 -04:00
Steven Rostedt (Red Hat)	121aaee7b0	tracing: Add per_cpu directory into tracing instances Add the per_cpu directory to the created tracing instances: cd /sys/kernel/debug/tracing/instances mkdir foo ls foo/per_cpu/cpu0 buffer_size_kb snapshot_raw trace trace_pipe_raw snapshot stats trace_pipe Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:50 -04:00
Steven Rostedt (Red Hat)	ce9bae5597	tracing: Add snapshot feature to instances Add the "snapshot" file to the the multi-buffer instances. cd /sys/kernel/debug/tracing/instances mkdir foo ls foo buffer_size_kb buffer_total_size_kb events free_buffer set_event snapshot trace trace_clock trace_marker trace_options trace_pipe tracing_on cat foo/snapshot # tracer: nop # # # * Snapshot is freed * # # Snapshot commands: # echo 0 > snapshot : Clears and frees snapshot buffer # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated. # Takes a snapshot of the main buffer. # echo 2 > snapshot : Clears snapshot buffer (but does not allocate) # (Doesn't have to be '2' works with any number that # is not a '0' or '1') Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:49 -04:00
Steven Rostedt (Red Hat)	737223fbca	tracing: Consolidate buffer allocation code There's a bit of duplicate code in creating the trace buffers for the normal trace buffer and the max trace buffer among the instances and the main global_trace. This code can be consolidated and cleaned up a bit making the code cleaner and more readable as well as less duplication. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:49 -04:00
Steven Rostedt (Red Hat)	45ad21ca55	tracing: Have trace_array keep track if snapshot buffer is allocated The snapshot buffer belongs to the trace array not the tracer that is running. The trace array should be the data structure that keeps track of whether or not the snapshot buffer is allocated, not the tracer desciptor. Having the trace array keep track of it makes modifications so much easier. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:48 -04:00
Steven Rostedt (Red Hat)	6de58e6269	tracing: Add snapshot_raw to extract the raw data from snapshot Add a 'snapshot_raw' per_cpu file that allows tools to read the raw binary data of the snapshot buffer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:47 -04:00
Steven Rostedt (Red Hat)	0b85ffc293	tracing: Add config option to allow snapshot to swap per cpu When the preempt or irq latency tracers are enabled, they require the ring buffer to be able to swap the per cpu sub buffers between two main buffers. This adds a slight overhead to tracing as the trace recording needs to perform some checks to synchronize between recording and swaps that might be happening on other CPUs. The config RING_BUFFER_ALLOW_SWAP is set when a user of the ring buffer needs the "swap cpu" feature, otherwise the extra checks are not implemented and removed from the tracing overhead. The snapshot feature will swap per CPU if the RING_BUFFER_ALLOW_SWAP config is set. But that only gets set by things like OPROFILE and the irqs and preempt latency tracers. This config is added to let the user decide to include this feature with the snapshot agnostic from whether or not another user of the ring buffer sets this config. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:47 -04:00
Steven Rostedt (Red Hat)	f1affcaaa8	tracing: Add snapshot in the per_cpu trace directories Add the snapshot file into the per_cpu tracing directories to allow them to be read for an individual cpu. This also allows to clear an individual cpu from the snapshot buffer. If the kernel allows it (CONFIG_RING_BUFFER_ALLOW_SWAP is set), then echoing in '1' into one of the per_cpu snapshot files will do an individual cpu buffer swap instead of the entire file. Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:46 -04:00
Steven Rostedt (Red Hat)	12883efb67	tracing: Consolidate max_tr into main trace_array structure Currently, the way the latency tracers and snapshot feature works is to have a separate trace_array called "max_tr" that holds the snapshot buffer. For latency tracers, this snapshot buffer is used to swap the running buffer with this buffer to save the current max latency. The only items needed for the max_tr is really just a copy of the buffer itself, the per_cpu data pointers, the time_start timestamp that states when the max latency was triggered, and the cpu that the max latency was triggered on. All other fields in trace_array are unused by the max_tr, making the max_tr mostly bloat. This change removes the max_tr completely, and adds a new structure called trace_buffer, that holds the buffer pointer, the per_cpu data pointers, the time_start timestamp, and the cpu where the latency occurred. The trace_array, now has two trace_buffers, one for the normal trace and one for the max trace or snapshot. By doing this, not only do we remove the bloat from the max_trace but the instances of traces can now use their own snapshot feature and not have just the top level global_trace have the snapshot feature and latency tracers for itself. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:35:40 -04:00
Steven Rostedt (Red Hat)	22cffc2bb4	tracing: Enable snapshot when any latency tracer is enabled The snapshot utility is extremely useful, and does not add any more overhead in memory when another latency tracer is enabled. They use the snapshot underneath. There's no reason to hide the snapshot file when a latency tracer has been enabled in the kernel. If any of the latency tracers (irq, preempt or wakeup) is enabled then also select the snapshot facility. Note, snapshot can be enabled without the latency tracers enabled. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:57 -04:00
Steven Rostedt (Red Hat)	873c642f59	tracing: Clear all trace buffers when unloaded module event was used Currently we do not know what buffer a module event was enabled in. On unload, it is safest to clear all buffer instances, not just the top level buffer. Todo: Clear only the buffer that the event was used in. The infrastructure is there to do this, but it makes the code a bit more complex. Lets get the current code vetted before we add that. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:57 -04:00
Steven Rostedt (Red Hat)	575380da8b	tracing: Only clear trace buffer on module unload if event was traced Currently, when a module with events is unloaded, the trace buffer is cleared. This is just a safety net in case the module might have some strange callback when its event is outputted. But there's no reason to reset the buffer if the module didn't have any of its events traced. Add a flag to the event "call" structure called WAS_ENABLED and gets set when the event is ever enabled, and this flag never gets cleared. When a module gets unloaded, if any of its events have this flag set, then the trace buffer will get cleared. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:56 -04:00
Steven Rostedt (Red Hat)	f1dc672588	ring-buffer: Init waitqueue for blocked readers The move of blocked readers to the ring buffer left out the init of the wait queue that is used. Tests missed this due to running stress tests against the buffers, which didn't allow for any readers to end up waiting. Running a simple read and wait triggered a bug. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:55 -04:00
Li Zefan	523c81135b	tracing: Fix some section mismatch warnings As we've added __init annotation to field-defining functions, we should add __refdata annotation to event_call variables, which reference those functions. Link: http://lkml.kernel.org/r/51343C1F.2050502@huawei.com Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:54 -04:00
Steven Rostedt (Red Hat)	315326c16a	tracing: Fix trace events build without modules The new multi-buffers added a descriptor that kept track of module events, and the directories they use, with struct ftace_module_file_ops. This is used to add a ref count to keep modules from unloading while their files are being accessed. As the descriptor is only needed when CONFIG_MODULES is enabled, it is only declared when the config is enabled. But that struct is dereferenced in a few areas outside the #ifdef CONFIG_MODULES. By adding some helper routines and moving code around a little, events can be compiled again without modules. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:53 -04:00
Steven Rostedt (Red Hat)	34ef61b1fa	tracing: Add __per_cpu annotation to trace array percpu data pointer With the conversion of the data array to per cpu, sparse now complains about the use of per_cpu_ptr() on the variable. But The variable is allocated with alloc_percpu() and is fine to use. But since the structure that contains the data variable does not annotate it as such, sparse gives out a lot of false warnings. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:52 -04:00
Li Zefan	b8aae39fc5	tracing/syscalls: Annotate field-defining functions with __init These two functions are called during kernel boot only. Link: http://lkml.kernel.org/r/51258796.7020704@huawei.com Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:52 -04:00
Li Zefan	7e4f44b153	tracing: Annotate event field-defining functions with __init Those functions are called either during kernel boot or module init. Before: $ dmesg \| grep 'Freeing unused kernel memory' Freeing unused kernel memory: 1208k freed Freeing unused kernel memory: 1360k freed Freeing unused kernel memory: 1960k freed After: $ dmesg \| grep 'Freeing unused kernel memory' Freeing unused kernel memory: 1236k freed Freeing unused kernel memory: 1388k freed Freeing unused kernel memory: 1960k freed Link: http://lkml.kernel.org/r/5125877D.5000201@huawei.com Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:51 -04:00
Li Zefan	f71130de5c	tracing: Add a helper function for event print functions Move duplicate code in event print functions to a helper function. This shrinks the size of the kernel by ~13K. text data bss dec hex filename 6596137 1743966 10138672 18478775 `119f6b7` vmlinux.o.old 6583002 1743849 10138672 18465523 119c2f3 vmlinux.o.new Link: http://lkml.kernel.org/r/51258746.2060304@huawei.com Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:51 -04:00
Steven Rostedt (Red Hat)	15693458c4	tracing/ring-buffer: Move poll wake ups into ring buffer code Move the logic to wake up on ring buffer data into the ring buffer code itself. This simplifies the tracing code a lot and also has the added benefit that waiters on one of the instance buffers can be woken only when data is added to that instance instead of data added to any instance. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:50 -04:00
Steven Rostedt	b627344fef	tracing: Fix read blocking on trace_pipe_raw If the ring buffer is empty, a read to trace_pipe_raw wont block. The tracing code has the infrastructure to wake up waiting readers, but the trace_pipe_raw doesn't take advantage of that. When a read is done to trace_pipe_raw without the O_NONBLOCK flag set, have the read block until there's data in the requested buffer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:49 -04:00
Steven Rostedt	cc60cdc952	tracing: Fix polling on trace_pipe_raw The trace_pipe_raw never implemented polling and this was casing issues for several utilities. This is now implemented. Blocked reads still are on the TODO list. Reported-by: Mauro Carvalho Chehab <mchehab@redhat.com> Tested-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:49 -04:00
Steven Rostedt (Red Hat)	189e5784f6	tracing: Do not block on splice if either file or splice NONBLOCK flag is set Currently only the splice NONBLOCK flag is checked to determine if the splice read should block or not. But the file descriptor NONBLOCK flag also needs to be checked. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:48 -04:00
Steven Rostedt	92edca073c	tracing: Use direct field, type and system names The names used to display the field and type in the event format files are copied, as well as the system name that is displayed. All these names are created by constant values passed in. If one of theses values were to be removed by a module, the module would also be required to remove any event it created. By using the strings directly, we can save over 100K of memory. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:47 -04:00
Steven Rostedt	d1a291437f	tracing: Use kmem_cache_alloc instead of kmalloc in trace_events.c The event structures used by the trace events are mostly persistent, but they are also allocated by kmalloc, which is not the best at allocating space for what is used. By converting these kmallocs into kmem_cache_allocs, we can save over 50K of space that is permanently allocated. After boot we have: slab name active allocated size --------- ------ --------- ---- ftrace_event_file 979 1005 56 67 1 ftrace_event_field 2301 2310 48 77 1 The ftrace_event_file has at boot up 979 active objects out of 1005 allocated in the slabs. Each object is 56 bytes. In a normal kmalloc, that would allocate 64 bytes for each object. 1005 - 979 = 26 objects not used 26 * 56 = 1456 bytes wasted But if we used kmalloc: 64 - 56 = 8 bytes unused per allocation 8 * 979 = 7832 bytes wasted 7832 - 1456 = 6376 bytes in savings Doing the same for ftrace_event_field where there's 2301 objects allocated in a slab that can hold 2310 with 48 bytes each we have: 2310 - 2301 = 9 objects not used 9 * 48 = 432 bytes wasted A kmalloc would also use 64 bytes per object: 64 - 48 = 16 bytes unused per allocation 16 * 2301 = 36816 bytes wasted! 36816 - 432 = 36384 bytes in savings This change gives us a total of 42760 bytes in savings. At least on my machine, but as there's a lot of these persistent objects for all configurations that use trace points, this is a net win. Thanks to Ezequiel Garcia for his trace_analyze presentation which pointed out the wasted space in my code. Cc: Ezequiel Garcia <elezegarcia@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:46 -04:00
Steven Rostedt	772482216f	tracing: Get trace_events kernel command line working again With the new descriptors used to allow multiple buffers in the tracing directory added, the kernel command line parameter trace_events=... no longer works. This is because the top level (global) trace array now has a list of descriptors associated with the events and the files in the debugfs directory. But in early bootup, when the command line is processed and the events enabled, the trace array list of events has not been set up yet. Without the list of events in the trace array, the setting of events to record will fail because it would not match any events. The solution is to set up the top level array in two stages. The first is to just add the ftrace file descriptors that just point to the events. This will allow events to be enabled and start tracing. The second stage is called after the filesystem is set up, and this stage will create the debugfs event files and directories associated with the trace array events. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:46 -04:00
Steven Rostedt	0c8916c342	tracing: Add rmdir to remove multibuffer instances Add a method to the hijacked dentry descriptor of the "instances" directory to allow for rmdir to remove an instance of a multibuffer. Example: cd /debug/tracing/instances mkdir hello ls hello/ rmdir hello ls Like the mkdir method, the i_mutex is dropped for the instances directory. The instances directory is created at boot up and can not be renamed or removed. The trace_types_lock mutex is used to synchronize adding and removing of instances. I've run several stress tests with different threads trying to create and delete directories of the same name, and it has stood up fine. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:45 -04:00
Steven Rostedt	277ba04461	tracing: Add interface to allow multiple trace buffers Add the interface ("instances" directory) to add multiple buffers to ftrace. To create a new instance, simply do a mkdir in the instances directory: This will create a directory with the following: # cd instances # mkdir foo # ls foo buffer_size_kb free_buffer trace_clock trace_pipe buffer_total_size_kb set_event trace_marker tracing_enabled events/ trace trace_options tracing_on Currently only events are able to be set, and there isn't a way to delete a buffer when one is created (yet). Note, the i_mutex lock is dropped from the parent "instances" directory during the mkdir operation. As the "instances" directory can not be renamed or deleted (created on boot), I do not see any harm in dropping the lock. The creation of the sub directories is protected by trace_types_lock mutex, which only lets one instance get into the code path at a time. If two tasks try to create or delete directories of the same name, only one will occur and the other will fail with -EEXIST. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:44 -04:00
Steven Rostedt	12ab74ee00	tracing: Make syscall events suitable for multiple buffers Currently the syscall events record into the global buffer. But if multiple buffers are in place, then we need to have syscall events record in the proper buffers. By adding descriptors to pass to the syscall event functions, the syscall events can now record into the buffers that have been assigned to them (one event may be applied to mulitple buffers). This will allow tracing high volume syscalls along with seldom occurring syscalls without losing the seldom syscall events. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:44 -04:00
Steven Rostedt	a7603ff4b5	tracing: Replace the static global per_cpu arrays with allocated per_cpu The global and max-tr currently use static per_cpu arrays for the CPU data descriptors. But in order to get new allocated trace_arrays, they need to be allocated per_cpu arrays. Instead of using the static arrays, switch the global and max-tr to use allocated data. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:43 -04:00
Steven Rostedt	ccb469a198	tracing: Pass the ftrace_file to the buffer lock reserve code Pass the struct ftrace_event_file *ftrace_file to the trace_event_buffer_lock_reserve() (new function that replaces the trace_current_buffer_lock_reserver()). The ftrace_file holds a pointer to the trace_array that is in use. In the case of multiple buffers with different trace_arrays, this allows different events to be recorded into different buffers. Also fixed some of the stale comments in include/trace/ftrace.h Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:42 -04:00
Steven Rostedt	2b6080f28c	tracing: Encapsulate global_trace and remove dependencies on global vars The global_trace variable in kernel/trace/trace.c has been kept 'static' and local to that file so that it would not be used too much outside of that file. This has paid off, even though there were lots of changes to make the trace_array structure more generic (not depending on global_trace). Removal of a lot of direct usages of global_trace is needed to be able to create more trace_arrays such that we can add multiple buffers. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:42 -04:00
Steven Rostedt	ae3b5093ad	tracing: Use RING_BUFFER_ALL_CPUS for TRACE_PIPE_ALL_CPU Both RING_BUFFER_ALL_CPUS and TRACE_PIPE_ALL_CPU are defined as -1 and used to say that all the ring buffers are to be modified or read (instead of just a single cpu, which would be >= 0). There's no reason to keep TRACE_PIPE_ALL_CPU as it is also started to be used for more than what it was created for, and now that the ring buffer code added a generic RING_BUFFER_ALL_CPUS define, we can clean up the trace code to use that instead and remove the TRACE_PIPE_ALL_CPU macro. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:41 -04:00
Steven Rostedt	ae63b31e4d	tracing: Separate out trace events from global variables The trace events for ftrace are all defined via global variables. The arrays of events and event systems are linked to a global list. This prevents multiple users of the event system (what to enable and what not to). By adding descriptors to represent the event/file relation, as well as to which trace_array descriptor they are associated with, allows for more than one set of events to be defined. Once the trace events files have a link between the trace event and the trace_array they are associated with, we can create multiple trace_arrays that can record separate events in separate buffers. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-15 00:34:40 -04:00
Steven Rostedt (Red Hat)	613f04a0f5	tracing: Prevent buffer overwrite disabled for latency tracers The latency tracers require the buffers to be in overwrite mode, otherwise they get screwed up. Force the buffers to stay in overwrite mode when latency tracers are enabled. Added a flag_changed() method to the tracer structure to allow the tracers to see what flags are being changed, and also be able to prevent the change from happing. Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-14 23:40:21 -04:00
Steven Rostedt (Red Hat)	8090282265	tracing: Keep overwrite in sync between regular and snapshot buffers Changing the overwrite mode for the ring buffer via the trace option only sets the normal buffer. But the snapshot buffer could swap with it, and then the snapshot would be in non overwrite mode and the normal buffer would be in overwrite mode, even though the option flag states otherwise. Keep the two buffers overwrite modes in sync. Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-14 23:40:15 -04:00
Steven Rostedt (Red Hat)	69d34da298	tracing: Protect tracer flags with trace_types_lock Seems that the tracer flags have never been protected from synchronous writes. Luckily, admins don't usually modify the tracing flags via two different tasks. But if scripts were to be used to modify them, then they could get corrupted. Move the trace_types_lock that protects against tracers changing to also protect the flags being set. Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-14 13:50:56 -04:00
Ingo Molnar	0b34083f46	Merge branch 'tip/perf/urgent-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent Pull tracing fixes from Steven Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-03-14 08:12:20 +01:00
Tejun Heo	2e109a2855	workqueue: rename workqueue_lock to wq_mayday_lock With the recent locking updates, the only thing protected by workqueue_lock is workqueue->maydays list. Rename workqueue_lock to wq_mayday_lock. This patch is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:40 -07:00
Tejun Heo	794b18bc8a	workqueue: separate out pool_workqueue locking into pwq_lock This patch continues locking cleanup from the previous patch. It breaks out pool_workqueue synchronization from workqueue_lock into a new spinlock - pwq_lock. The followings are protected by pwq_lock. * workqueue->pwqs * workqueue->saved_max_active The conversion is straight-forward. workqueue_lock usages which cover the above two are converted to pwq_lock. New locking label PW added for things protected by pwq_lock and FR is updated to mean flush_mutex + pwq_lock + sched-RCU. This patch shouldn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:40 -07:00
Tejun Heo	5bcab3355a	workqueue: separate out pool and workqueue locking into wq_mutex Currently, workqueue_lock protects most shared workqueue resources - the pools, workqueues, pool_workqueues, draining, ID assignments, mayday handling and so on. The coverage has grown organically and there is no identified bottleneck coming from workqueue_lock, but it has grown a bit too much and scheduled rebinding changes need the pools and workqueues to be protected by a mutex instead of a spinlock. This patch breaks out pool and workqueue synchronization from workqueue_lock into a new mutex - wq_mutex. The followings are protected by wq_mutex. * worker_pool_idr and unbound_pool_hash * pool->refcnt * workqueues list * workqueue->flags, ->nr_drainers Most changes are mostly straight-forward. workqueue_lock is replaced with wq_mutex where applicable and workqueue_lock lock/unlocks are added where wq_mutex conversion leaves data structures not protected by wq_mutex without locking. irq / preemption flippings were added where the conversion affects them. Things worth noting are * New WQ and WR locking lables added along with assert_rcu_or_wq_mutex(). * worker_pool_assign_id() now expects to be called under wq_mutex. * create_mutex is removed from get_unbound_pool(). It now just holds wq_mutex. This patch shouldn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:40 -07:00
Tejun Heo	7d19c5ce66	workqueue: relocate global variable defs and function decls in workqueue.c They're split across debugobj code for some reason. Collect them. This patch is pure relocation. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:40 -07:00
Tejun Heo	cd549687a7	workqueue: better define locking rules around worker creation / destruction When a manager creates or destroys workers, the operations are always done with the manager_mutex held; however, initial worker creation or worker destruction during pool release don't grab the mutex. They are still correct as initial worker creation doesn't require synchronization and grabbing manager_arb provides enough exclusion for pool release path. Still, let's make everyone follow the same rules for consistency and such that lockdep annotations can be added. Update create_and_start_worker() and put_unbound_pool() to grab manager_mutex around thread creation and destruction respectively and add lockdep assertions to create_worker() and destroy_worker(). This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:39 -07:00
Tejun Heo	ebf44d16ec	workqueue: factor out initial worker creation into create_and_start_worker() get_unbound_pool(), workqueue_cpu_up_callback() and init_workqueues() have similar code pieces to create and start the initial worker factor those out into create_and_start_worker(). This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:39 -07:00
Tejun Heo	bc3a1afc92	workqueue: rename worker_pool->assoc_mutex to ->manager_mutex Manager operations are currently governed by two mutexes - pool->manager_arb and ->assoc_mutex. The former is used to decide who gets to be the manager and the latter to exclude the actual manager operations including creation and destruction of workers. Anyone who grabs ->manager_arb must perform manager role; otherwise, the pool might stall. Grabbing ->assoc_mutex blocks everyone else from performing manager operations but doesn't require the holder to perform manager duties as it's merely blocking manager operations without becoming the manager. Because the blocking was necessary when [dis]associating per-cpu workqueues during CPU hotplug events, the latter was named assoc_mutex. The mutex is scheduled to be used for other purposes, so this patch gives it a more fitting generic name - manager_mutex - and updates / adds comments to explain synchronization around the manager role and operations. This patch is pure rename / doc update. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:39 -07:00
Tejun Heo	8425e3d5bd	workqueue: inline trivial wrappers There's no reason to make these trivial wrappers full (exported) functions. Inline the followings. queue_work() queue_delayed_work() mod_delayed_work() schedule_work_on() schedule_work() schedule_delayed_work_on() schedule_delayed_work() keventd_up() Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 16:51:36 -07:00
Tejun Heo	611c92a020	workqueue: rename @id to @pi in for_each_each_pool() Rename @id argument of for_each_pool() to @pi so that it doesn't get reused accidentally when for_each_pool() is used in combination with other iterators. This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 16:51:36 -07:00
Tejun Heo	c5aa87bbf4	workqueue: update comments and a warning message * Update incorrect and add missing synchronization labels. * Update incorrect or misleading comments. Add new comments where clarification is necessary. Reformat / rephrase some comments. * drain_workqueue() can be used separately from destroy_workqueue() but its warning message was incorrectly referring to destruction. Other than the warning message change, this patch doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 16:51:36 -07:00
Tejun Heo	983ca25e73	workqueue: fix max_active handling in init_and_link_pwq() Since `9e8cd2f589` ("workqueue: implement apply_workqueue_attrs()"), init_and_link_pwq() may be called to initialize a new pool_workqueue for a workqueue which is already online, but the function was setting pwq->max_active to wq->saved_max_active without proper synchronization. Fix it by calling pwq_adjust_max_active() under proper locking instead of manually setting max_active. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 16:51:35 -07:00
Tejun Heo	699ce097ef	workqueue: implement and use pwq_adjust_max_active() Rename pwq_set_max_active() to pwq_adjust_max_active() and move pool_workqueue->max_active synchronization and max_active determination logic into it. The new function should be called with workqueue_lock held for stable workqueue->saved_max_active, determines the current max_active value the target pool_workqueue should be using from @wq->saved_max_active and the state of the associated pool, and applies it with proper synchronization. The current two users - workqueue_set_max_active() and thaw_workqueues() - are updated accordingly. In addition, the manual freezing handling in __alloc_workqueue_key() and freeze_workqueues_begin() are replaced with calls to pwq_adjust_max_active(). This centralizes max_active handling so that it's less error-prone. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 16:51:35 -07:00
Tejun Heo	0fbd95aa8a	workqueue: relocate pwq_set_max_active() pwq_set_max_active() is gonna be modified and used during pool_workqueue init. Move it above init_and_link_pwq(). This patch is pure code reorganization and doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 16:51:35 -07:00
Linus Torvalds	842d223f28	Merge branch 'akpm' (fixes from Andrew) Merge misc fixes from Andrew Morton: - A bunch of fixes - Finish off the idr API conversions before someone starts to use the old interfaces again. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: idr: idr_alloc() shouldn't trigger lowmem warning when preloaded UAPI: fix endianness conditionals in M32R's asm/stat.h UAPI: fix endianness conditionals in linux/raid/md_p.h UAPI: fix endianness conditionals in linux/acct.h UAPI: fix endianness conditionals in linux/aio_abi.h decompressors: fix typo "POWERPC" mm/fremap.c: fix oops on error path idr: deprecate idr_pre_get() and idr_get_new[_above]() tidspbridge: convert to idr_alloc() zcache: convert to idr_alloc() mlx4: remove leftover idr_pre_get() call workqueue: convert to idr_alloc() nfsd: convert to idr_alloc() nfsd: remove unused get_new_stid() kernel/signal.c: use __ARCH_HAS_SA_RESTORER instead of SA_RESTORER signal: always clear sa_restorer on execve mm: remove_memory(): fix end_pfn setting include/linux/res_counter.h needs errno.h	2013-03-13 15:21:57 -07:00
Tejun Heo	e68035fb65	workqueue: convert to idr_alloc() idr_get_new*() and friends are about to be deprecated. Convert to the new idr_alloc() interface. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-03-13 15:21:46 -07:00
Andrew Morton	522cff142d	kernel/signal.c: use __ARCH_HAS_SA_RESTORER instead of SA_RESTORER __ARCH_HAS_SA_RESTORER is the preferred conditional for use in 3.9 and later kernels, per Kees. Cc: Emese Revfy <re.emese@gmail.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: PaX Team <pageexec@freemail.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Julien Tinnes <jln@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-03-13 15:21:45 -07:00
Kees Cook	2ca39528c0	signal: always clear sa_restorer on execve When the new signal handlers are set up, the location of sa_restorer is not cleared, leaking a parent process's address space location to children. This allows for a potential bypass of the parent's ASLR by examining the sa_restorer value returned when calling sigaction(). Based on what should be considered "secret" about addresses, it only matters across the exec not the fork (since the VMAs haven't changed until the exec). But since exec sets SIG_DFL and keeps sa_restorer, this is where it should be fixed. Given the few uses of sa_restorer, a "set" function was not written since this would be the only use. Instead, we use __ARCH_HAS_SA_RESTORER, as already done in other places. Example of the leak before applying this patch: $ cat /proc/$$/maps ... 7fb9f3083000-7fb9f3238000 r-xp 00000000 fd:01 404469 .../libc-2.15.so ... $ ./leak ... 7f278bc74000-7f278be29000 r-xp 00000000 fd:01 404469 .../libc-2.15.so ... 1 0 (nil) 0x7fb9f30b94a0 2 4000000 (nil) 0x7f278bcaa4a0 3 4000000 (nil) 0x7f278bcaa4a0 4 0 (nil) 0x7fb9f30b94a0 ... [akpm@linux-foundation.org: use SA_RESTORER for backportability] Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Emese Revfy <re.emese@gmail.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: PaX Team <pageexec@freemail.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Julien Tinnes <jln@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-03-13 15:21:44 -07:00
Eric W. Biederman	e66eded830	userns: Don't allow CLONE_NEWUSER \| CLONE_FS Don't allowing sharing the root directory with processes in a different user namespace. There doesn't seem to be any point, and to allow it would require the overhead of putting a user namespace reference in fs_struct (for permission checks) and incrementing that reference count on practically every call to fork. So just perform the inexpensive test of forbidding sharing fs_struct acrosss processes in different user namespaces. We already disallow other forms of threading when unsharing a user namespace so this should be no real burden in practice. This updates setns, clone, and unshare to disallow multiple user namespaces sharing an fs_struct. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-03-13 15:00:20 -07:00
Steven Rostedt (Red Hat)	740466bc89	tracing: Fix free of probe entry by calling call_rcu_sched() Because function tracing is very invasive, and can even trace calls to rcu_read_lock(), RCU access in function tracing is done with preempt_disable_notrace(). This requires a synchronize_sched() for updates and not a synchronize_rcu(). Function probes (traceon, traceoff, etc) must be freed after a synchronize_sched() after its entry has been removed from the hash. But call_rcu() is used. Fix this by using call_rcu_sched(). Also fix the usage to use hlist_del_rcu() instead of hlist_del(). Cc: stable@vger.kernel.org Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-13 17:57:44 -04:00
Randy Dunlap	6c23cbbd50	futex: fix kernel-doc notation and spello Fix kernel-doc warning in futex.c and convert 'Returns' to the new Return: kernel-doc notation format. Warning(kernel/futex.c:2286): Excess function parameter 'clockrt' description in 'futex_wait_requeue_pi' Fix one spello. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-03-12 20:42:10 -07:00
Randy Dunlap	20f22ab42e	signals: fix new kernel-doc warnings Fix new kernel-doc warnings in kernel/signal.c: Warning(kernel/signal.c:2689): No description found for parameter 'uset' Warning(kernel/signal.c:2689): Excess function parameter 'set' description in 'sys_rt_sigpending' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-03-12 20:42:10 -07:00
Tejun Heo	e626761691	workqueue: implement current_is_workqueue_rescuer() Implement a function which queries whether it currently is running off a workqueue rescuer. This will be used to convert writeback to workqueue. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 17:42:01 -07:00
Li Zefan	80f36c2a1a	cgroup: remove useless code in cgroup_write_event_control() eventfd_poll() never returns POLLHUP. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 15:36:00 -07:00
Li Zefan	6ee211ad0a	cgroup: don't bother to resize pid array When we open cgroup.procs, we'll allocate an buffer and store all tasks' tgid in it, and then duplicate entries will be stripped. If that results in a much smaller pid list, we'll re-allocate a smaller buffer. But we've already sucessfully allocated memory and reading the procs file is a short period and the memory will be freed very soon, so why bother to re-allocate memory. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 15:36:00 -07:00
Li Zefan	d7eeac1913	cgroup: hold cgroup_mutex before calling css_offline() cpuset no longer nests cgroup_mutex inside cpu_hotplug lock, so we don't have to release cgroup_mutex before calling css_offline(). Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 15:35:59 -07:00
Li Zefan	6dc01181ea	cgroup: remove unused variables in cgroup_destroy_locked() Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 15:35:58 -07:00
Li Zefan	e7b2dcc52b	cgroup: remove cgroup_is_descendant() It was used by ns cgroup, and ns cgroup was removed long ago. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 15:35:58 -07:00
Li Zefan	cfb5966bef	cpuset: fix RCU lockdep splat in cpuset_print_task_mems_allowed() Sasha reported a lockdep warning when OOM was triggered. The reason is cgroup_name() should be called with rcu_read_lock() held. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 14:25:25 -07:00
Lai Jiangshan	362f2b098b	async: rename and redefine async_func_ptr A function type is typically defined as typedef ret_type (*func)(args..) but async_func_ptr is not. Redefine it. Also rename async_func_ptr to async_func_t for _func_t suffix is more generic. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com>	2013-03-12 13:59:14 -07:00
Lai Jiangshan	92266d6ef6	async: simplify lowest_in_progress() The code in lowest_in_progress() are duplicated in two branches, simplify them. tj: Minor indentation adjustment. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com>	2013-03-12 13:59:13 -07:00
Tejun Heo	226223ab3c	workqueue: implement sysfs interface for workqueues There are cases where workqueue users want to expose control knobs to userland. e.g. Unbound workqueues with custom attributes are scheduled to be used for writeback workers and depending on configuration it can be useful to allow admins to tinker with the priority or allowed CPUs. This patch implements workqueue_sysfs_register(), which makes the workqueue visible under /sys/bus/workqueue/devices/WQ_NAME. There currently are two attributes common to both per-cpu and unbound pools and extra attributes for unbound pools including nice level and cpumask. If alloc_workqueue*() is called with WQ_SYSFS, workqueue_sysfs_register() is called automatically as part of workqueue creation. This is the preferred method unless the workqueue user wants to apply workqueue_attrs before making the workqueue visible to userland. v2: Disallow exposing ordered workqueues as ordered workqueues can't be tuned in any way. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 11:37:07 -07:00
Tejun Heo	8719dceae2	workqueue: reject adjusting max_active or applying attrs to ordered workqueues Adjusting max_active of or applying new workqueue_attrs to an ordered workqueue breaks its ordering guarantee. The former is obvious. The latter is because applying attrs creates a new pwq (pool_workqueue) and there is no ordering constraint between the old and new pwqs. Make apply_workqueue_attrs() and workqueue_set_max_active() trigger WARN_ON() if those operations are requested on an ordered workqueue and fail / ignore respectively. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:04 -07:00
Tejun Heo	618b01eb42	workqueue: make it clear that WQ_DRAINING is an internal flag We're gonna add another internal WQ flag. Let's make the distinction clear. Prefix WQ_DRAINING with __ and move it to bit 16. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:04 -07:00
Tejun Heo	9e8cd2f589	workqueue: implement apply_workqueue_attrs() Implement apply_workqueue_attrs() which applies workqueue_attrs to the specified unbound workqueue by creating a new pwq (pool_workqueue) linked to worker_pool with the specified attributes. A new pwq is linked at the head of wq->pwqs instead of tail and __queue_work() verifies that the first unbound pwq has positive refcnt before choosing it for the actual queueing. This is to cover the case where creation of a new pwq races with queueing. As base ref on a pwq won't be dropped without making another pwq the first one, __queue_work() is guaranteed to make progress and not add work item to a dead pwq. init_and_link_pwq() is updated to return the last first pwq the new pwq replaced, which is put by apply_workqueue_attrs(). Note that apply_workqueue_attrs() is almost identical to unbound pwq part of alloc_and_link_pwqs(). The only difference is that there is no previous first pwq. apply_workqueue_attrs() is implemented to handle such cases and replaces unbound pwq handling in alloc_and_link_pwqs(). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:04 -07:00
Tejun Heo	c9178087ac	workqueue: perform non-reentrancy test when queueing to unbound workqueues too Because per-cpu workqueues have multiple pwqs (pool_workqueues) to serve the CPUs, to guarantee that a single work item isn't queued on one pwq while still executing another, __queue_work() takes a look at the previous pool the target work item was on and if it's still executing there, queue the work item on that pool. To support changing workqueue_attrs on the fly, unbound workqueues too will have multiple pwqs and thus need non-reentrancy test when queueing. This patch modifies __queue_work() such that the reentrancy test is performed regardless of the workqueue type. per_cpu_ptr(wq->cpu_pwqs, cpu) used to be used to determine the matching pwq for the last pool. This can't be used for unbound workqueues and is replaced with worker->current_pwq which also happens to be simpler. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:04 -07:00
Tejun Heo	75ccf5950f	workqueue: prepare flush_workqueue() for dynamic creation and destrucion of unbound pool_workqueues Unbound pwqs (pool_workqueues) will be dynamically created and destroyed with the scheduled unbound workqueue w/ custom attributes support. This patch synchronizes pwq linking and unlinking against flush_workqueue() so that its operation isn't disturbed by pwqs coming and going. Linking and unlinking a pwq into wq->pwqs is now protected also by wq->flush_mutex and a new pwq's work_color is initialized to wq->work_color during linking. This ensures that pwqs changes don't disturb flush_workqueue() in progress and the new pwq's work coloring stays in sync with the rest of the workqueue. flush_mutex during unlinking isn't strictly necessary but it's simpler to do it anyway. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:04 -07:00
Tejun Heo	8864b4e59f	workqueue: implement get/put_pwq() Add pool_workqueue->refcnt along with get/put_pwq(). Both per-cpu and unbound pwqs have refcnts and any work item inserted on a pwq increments the refcnt which is dropped when the work item finishes. For per-cpu pwqs the base ref is never dropped and destroy_workqueue() frees the pwqs as before. For unbound ones, destroy_workqueue() simply drops the base ref on the first pwq. When the refcnt reaches zero, pwq_unbound_release_workfn() is scheduled on system_wq, which unlinks the pwq, puts the associated pool and frees the pwq and wq as necessary. This needs to be done from a work item as put_pwq() needs to be protected by pool->lock but release can't happen with the lock held - e.g. put_unbound_pool() involves blocking operations. Unbound pool->locks are marked with lockdep subclas 1 as put_pwq() will schedule the release work item on system_wq while holding the unbound pool's lock and triggers recursive locking warning spuriously. This will be used to implement dynamic creation and destruction of unbound pwqs. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:04 -07:00
Tejun Heo	d2c1d40487	workqueue: restructure __alloc_workqueue_key() * Move initialization and linking of pool_workqueues into init_and_link_pwq(). * Make the failure path use destroy_workqueue() once pool_workqueue initialization succeeds. These changes are to prepare for dynamic management of pool_workqueues and don't introduce any functional changes. While at it, convert list_del(&wq->list) to list_del_init() as a precaution as scheduled changes will make destruction more complex. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:04 -07:00
Tejun Heo	493008a8e4	workqueue: drop WQ_RESCUER and test workqueue->rescuer for NULL instead WQ_RESCUER is superflous. WQ_MEM_RECLAIM indicates that the user wants a rescuer and testing wq->rescuer for NULL can answer whether a given workqueue has a rescuer or not. Drop WQ_RESCUER and test wq->rescuer directly. This will help simplifying __alloc_workqueue_key() failure path by allowing it to use destroy_workqueue() on a partially constructed workqueue, which in turn will help implementing dynamic management of pool_workqueues. While at it, clear wq->rescuer after freeing it in destroy_workqueue(). This is a precaution as scheduled changes will make destruction more complex. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:03 -07:00
Tejun Heo	ac6104cdf8	workqueue: add pool ID to the names of unbound kworkers There are gonna be multiple unbound pools. Include pool ID in the name of unbound kworkers. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:03 -07:00
Tejun Heo	f02ae73aaa	workqueue: drop "std" from cpu_std_worker_pools and for_each_std_worker_pool() All per-cpu pools are standard, so there's no need to use both "cpu" and "std" and for_each_std_worker_pool() is confusing in that it can be used only for per-cpu pools. * s/cpu_std_worker_pools/cpu_worker_pools/ * s/for_each_std_worker_pool()/for_each_cpu_worker_pool()/ Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:03 -07:00
Tejun Heo	7a62c2c87e	workqueue: remove unbound_std_worker_pools[] and related helpers Workqueue no longer makes use of unbound_std_worker_pools[]. All unbound worker_pools are created dynamically and there's nothing special about the standard ones. With unbound_std_worker_pools[] unused, workqueue no longer has places where it needs to treat the per-cpu pools-cpu and unbound pools together. Remove unbound_std_worker_pools[] and the helpers wrapping it to present unified per-cpu and unbound standard worker_pools. * for_each_std_worker_pool() now only walks through per-cpu pools. * for_each[_online]_wq_cpu() which don't have any users left are removed. * std_worker_pools() and std_worker_pool_pri() are unused and removed. * get_std_worker_pool() is removed. Its only user - alloc_and_link_pwqs() - only used it for per-cpu pools anyway. Open code per_cpu access in alloc_and_link_pwqs() instead. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:03 -07:00
Tejun Heo	29c91e9912	workqueue: implement attribute-based unbound worker_pool management This patch makes unbound worker_pools reference counted and dynamically created and destroyed as workqueues needing them come and go. All unbound worker_pools are hashed on unbound_pool_hash which is keyed by the content of worker_pool->attrs. When an unbound workqueue is allocated, get_unbound_pool() is called with the attributes of the workqueue. If there already is a matching worker_pool, the reference count is bumped and the pool is returned. If not, a new worker_pool with matching attributes is created and returned. When an unbound workqueue is destroyed, put_unbound_pool() is called which decrements the reference count of the associated worker_pool. If the refcnt reaches zero, the worker_pool is destroyed in sched-RCU safe way. Note that the standard unbound worker_pools - normal and highpri ones with no specific cpumask affinity - are no longer created explicitly during init_workqueues(). init_workqueues() only initializes workqueue_attrs to be used for standard unbound pools - unbound_std_wq_attrs[]. The pools are spawned on demand as workqueues are created. v2: - Comment added to init_worker_pool() explaining that @pool should be in a condition which can be passed to put_unbound_pool() even on failure. - pool->refcnt reaching zero and the pool being removed from unbound_pool_hash should be dynamic. pool->refcnt is converted to int from atomic_t and now manipulated inside workqueue_lock. - Removed an incorrect sanity check on nr_idle in put_unbound_pool() which may trigger spuriously. All changes were suggested by Lai Jiangshan. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:03 -07:00
Tejun Heo	7a4e344c56	workqueue: introduce workqueue_attrs Introduce struct workqueue_attrs which carries worker attributes - currently the nice level and allowed cpumask along with helper routines alloc_workqueue_attrs() and free_workqueue_attrs(). Each worker_pool now carries ->attrs describing the attributes of its workers. All functions dealing with cpumask and nice level of workers are updated to follow worker_pool->attrs instead of determining them from other characteristics of the worker_pool, and init_workqueues() is updated to set worker_pool->attrs appropriately for all standard pools. Note that create_worker() is updated to always perform set_user_nice() and use set_cpus_allowed_ptr() combined with manual assertion of PF_THREAD_BOUND instead of kthread_bind(). This simplifies handling random attributes without affecting the outcome. This patch doesn't introduce any behavior changes. v2: Missing cpumask_var_t definition caused build failure on some archs. linux/cpumask.h included. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: kbuild test robot <fengguang.wu@intel.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:00 -07:00
Tejun Heo	4e1a1f9a05	workqueue: separate out init_worker_pool() from init_workqueues() This will be used to implement unbound pools with custom attributes. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:00 -07:00
Tejun Heo	34a06bd6b6	workqueue: replace POOL_MANAGING_WORKERS flag with worker_pool->manager_arb POOL_MANAGING_WORKERS is used to synchronize the manager role. Synchronizing among workers doesn't need blocking and that's why it's implemented as a flag. It got converted to a mutex a while back to add blocking wait from CPU hotplug path - `6037315269` ("workqueue: use mutex for global_cwq manager exclusion"). Later it turned out that synchronization among workers and cpu hotplug need to be done separately. Eventually, POOL_MANAGING_WORKERS is restored and workqueue->manager_mutex got morphed into workqueue->assoc_mutex - `552a37e936` ("workqueue: restore POOL_MANAGING_WORKERS") and `b2eb83d123` ("workqueue: rename manager_mutex to assoc_mutex"). Now, we're gonna need to be able to lock out managers from destroy_workqueue() to support multiple unbound pools with custom attributes making it again necessary to be able to block on the manager role. This patch replaces POOL_MANAGING_WORKERS with worker_pool->manager_arb. This patch doesn't introduce any behavior changes. v2: s/manager_mutex/manager_arb/ Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 11:30:00 -07:00
Tejun Heo	fa1b54e69b	workqueue: update synchronization rules on worker_pool_idr Make worker_pool_idr protected by workqueue_lock for writes and sched-RCU protected for reads. Lockdep assertions are added to for_each_pool() and get_work_pool() and all their users are converted to either hold workqueue_lock or disable preemption/irq. worker_pool_assign_id() is updated to hold workqueue_lock when allocating a pool ID. As idr_get_new() always performs RCU-safe assignment, this is enough on the writer side. As standard pools are never destroyed, there's nothing to do on that side. The locking is superflous at this point. This is to help implementation of unbound pools/pwqs with custom attributes. This patch doesn't introduce any behavior changes. v2: Updated for_each_pwq() use if/else for the hidden assertion statement instead of just if as suggested by Lai. This avoids confusing the following else clause. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:00 -07:00
Tejun Heo	76af4d9361	workqueue: update synchronization rules on workqueue->pwqs Make workqueue->pwqs protected by workqueue_lock for writes and sched-RCU protected for reads. Lockdep assertions are added to for_each_pwq() and first_pwq() and all their users are converted to either hold workqueue_lock or disable preemption/irq. alloc_and_link_pwqs() is updated to use list_add_tail_rcu() for consistency which isn't strictly necessary as the workqueue isn't visible. destroy_workqueue() isn't updated to sched-RCU release pwqs. This is okay as the workqueue should have on users left by that point. The locking is superflous at this point. This is to help implementation of unbound pools/pwqs with custom attributes. This patch doesn't introduce any behavior changes. v2: Updated for_each_pwq() use if/else for the hidden assertion statement instead of just if as suggested by Lai. This avoids confusing the following else clause. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:00 -07:00
Tejun Heo	7fb98ea79c	workqueue: replace get_pwq() with explicit per_cpu_ptr() accesses and first_pwq() get_pwq() takes @cpu, which can also be WORK_CPU_UNBOUND, and @wq and returns the matching pwq (pool_workqueue). We want to move away from using @cpu for identifying pools and pwqs for unbound pools with custom attributes and there is only one user - workqueue_congested() - which makes use of the WQ_UNBOUND conditional in get_pwq(). All other users already know whether they're dealing with a per-cpu or unbound workqueue. Replace get_pwq() with explicit per_cpu_ptr(wq->cpu_pwqs, cpu) for per-cpu workqueues and first_pwq() for unbound ones, and open-code WQ_UNBOUND conditional in workqueue_congested(). Note that this makes workqueue_congested() behave sligntly differently when @cpu other than WORK_CPU_UNBOUND is specified. It ignores @cpu for unbound workqueues and always uses the first pwq instead of oopsing. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:00 -07:00
Tejun Heo	420c0ddb1f	workqueue: remove workqueue_struct->pool_wq.single workqueue->pool_wq union is used to point either to percpu pwqs (pool_workqueues) or single unbound pwq. As the first pwq can be accessed via workqueue->pwqs list, there's no reason for the single pointer anymore. Use list_first_entry(workqueue->pwqs) to access the unbound pwq and drop workqueue->pool_wq.single pointer and the pool_wq union. It simplifies the code and eases implementing multiple unbound pools w/ custom attributes. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:29:59 -07:00
Tejun Heo	d84ff0512f	workqueue: consistently use int for @cpu variables Workqueue is mixing unsigned int and int for @cpu variables. There's no point in using unsigned int for cpus - many of cpu related APIs take int anyway. Consistently use int for @cpu variables so that we can use negative values to mark special ones. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:29:59 -07:00
Tejun Heo	493a1724fe	workqueue: add wokrqueue_struct->maydays list to replace mayday cpu iterators Similar to how pool_workqueue iteration used to be, raising and servicing mayday requests is based on CPU numbers. It's hairy because cpumask_t may not be able to handle WORK_CPU_UNBOUND and cpumasks are assumed to be always set on UP. This is ugly and can't handle multiple unbound pools to be added for unbound workqueues w/ custom attributes. Add workqueue_struct->maydays. When a pool_workqueue needs rescuing, it gets chained on the list through pool_workqueue->mayday_node and rescuer_thread() consumes the list until it's empty. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:29:59 -07:00
Tejun Heo	24b8a84718	workqueue: restructure pool / pool_workqueue iterations in freeze/thaw functions The three freeze/thaw related functions - freeze_workqueues_begin(), freeze_workqueues_busy() and thaw_workqueues() - need to iterate through all pool_workqueues of all freezable workqueues. They did it by first iterating pools and then visiting all pwqs (pool_workqueues) of all workqueues and process it if its pwq->pool matches the current pool. This is rather backwards and done this way partly because workqueue didn't have fitting iteration helpers and partly to avoid the number of lock operations on pool->lock. Workqueue now has fitting iterators and the locking operation overhead isn't anything to worry about - those locks are unlikely to be contended and the same CPU visiting the same set of locks multiple times isn't expensive. Restructure the three functions such that the flow better matches the logical steps and pwq iteration is done using for_each_pwq() inside workqueue iteration. * freeze_workqueues_begin(): Setting of FREEZING is moved into a separate for_each_pool() iteration. pwq iteration for clearing max_active is updated as described above. * freeze_workqueues_busy(): pwq iteration updated as described above. * thaw_workqueues(): The single for_each_wq_cpu() iteration is broken into three discrete steps - clearing FREEZING, restoring max_active, and kicking workers. The first and last steps use for_each_pool() and the second step uses pwq iteration described above. This makes the code easier to understand and removes the use of for_each_wq_cpu() for walking pwqs, which can't support multiple unbound pwqs which will be needed to implement unbound workqueues with custom attributes. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:29:58 -07:00
Tejun Heo	1711696955	workqueue: introduce for_each_pool() With the scheduled unbound pools with custom attributes, there will be multiple unbound pools, so it wouldn't be able to use for_each_wq_cpu() + for_each_std_worker_pool() to iterate through all pools. Introduce for_each_pool() which iterates through all pools using worker_pool_idr and use it instead of for_each_wq_cpu() + for_each_std_worker_pool() combination in freeze_workqueues_begin(). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:29:58 -07:00
Tejun Heo	49e3cf44df	workqueue: replace for_each_pwq_cpu() with for_each_pwq() Introduce for_each_pwq() which iterates all pool_workqueues of a workqueue using the recently added workqueue->pwqs list and replace for_each_pwq_cpu() usages with it. This is primarily to remove the single unbound CPU assumption from pwq iteration for the scheduled unbound pools with custom attributes support which would introduce multiple unbound pwqs per workqueue; however, it also simplifies iterator users. Note that pwq->pool initialization is moved to alloc_and_link_pwqs() as that now is the only place which is explicitly handling the two pwq types. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:29:58 -07:00
Tejun Heo	30cdf2496d	workqueue: add workqueue_struct->pwqs list Add workqueue_struct->pwqs list and chain all pool_workqueues belonging to a workqueue there. This will be used to implement generic pool_workqueue iteration and handle multiple pool_workqueues for the scheduled unbound pools with custom attributes. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:29:57 -07:00
Tejun Heo	e904e6c266	workqueue: introduce kmem_cache for pool_workqueues pool_workqueues need to be aligned to 1 << WORK_STRUCT_FLAG_BITS as the lower bits of work->data are used for flags when they're pointing to pool_workqueues. Due to historical reasons, unbound pool_workqueues are allocated using kzalloc() with sufficient buffer area for alignment and aligned manually. The original pointer is stored at the end which free_pwqs() retrieves when freeing it. There's no reason for this hackery anymore. Set alignment of struct pool_workqueue to 1 << WORK_STRUCT_FLAG_BITS, add kmem_cache for pool_workqueues with proper alignment and replace the hacky alloc and free implementation with plain kmem_cache_zalloc/free(). In case WORK_STRUCT_FLAG_BITS gets shrunk too much and makes fields of pool_workqueues misaligned, trigger WARN if the alignment of struct pool_workqueue becomes smaller than that of long long. Note that assertion on IS_ALIGNED() is removed from alloc_pwqs(). We already have another one in pwq init loop in __alloc_workqueue_key(). This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:29:57 -07:00
Tejun Heo	e98d5b16cf	workqueue: make workqueue_lock irq-safe workqueue_lock will be used to synchronize areas which require irq-safety and there isn't much benefit in keeping it not irq-safe. Make it irq-safe. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:29:57 -07:00
Tejun Heo	6183c009f6	workqueue: make sanity checks less punshing using WARN_ON[_ONCE]()s Workqueue has been using mostly BUG_ON()s for sanity checks, which fail unnecessarily harshly when the assertion doesn't hold. Most assertions can converted to be less drastic such that things can limp along instead of dying completely. Convert BUG_ON()s to WARN_ON[_ONCE]()s with softer failure behaviors - e.g. if assertion check fails in destroy_worker(), trigger WARN and silently ignore destruction request. Most conversions are trivial. Note that sanity checks in destroy_workqueue() are moved above removal from workqueues list so that it can bail out without side-effects if assertion checks fail. This patch doesn't introduce any visible behavior changes during normal operation. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:29:57 -07:00
Steven Rostedt (Red Hat)	2721e72dd1	tracing: Fix race in snapshot swapping Although the swap is wrapped with a spin_lock, the assignment of the temp buffer used to swap is not within that lock. It needs to be moved into that lock, otherwise two swaps happening on two different CPUs, can end up using the wrong temp buffer to assign in the swap. Luckily, all current callers of the swap function appear to have their own locks. But in case something is added that allows two different callers to call the swap, then there's a chance that this race can trigger and corrupt the buffers. New code is coming soon that will allow for this race to trigger. I've Cc'd stable, so this bug will not show up if someone backports one of the changes that can trigger this bug. Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-12 11:56:33 -04:00
Linus Torvalds	7c6baa304b	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc minor fixes mostly related to tracing" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: s390: Fix a header dependencies related build error tracing: update documentation of snapshot utility tracing: Do not return EINVAL in snapshot when not allocated tracing: Add help of snapshot feature when snapshot is empty ftrace: Update the kconfig for DYNAMIC_FTRACE	2013-03-11 07:54:29 -07:00
Lai Jiangshan	eb2834285c	workqueue: fix possible pool stall bug in wq_unbind_fn() Since multiple pools per cpu have been introduced, wq_unbind_fn() has a subtle bug which may theoretically stall work item processing. The problem is two-fold. * wq_unbind_fn() depends on the worker executing wq_unbind_fn() itself to start unbound chain execution, which works fine when there was only single pool. With multiple pools, only the pool which is running wq_unbind_fn() - the highpri one - is guaranteed to have such kick-off. The other pool could stall when its busy workers block. * The current code is setting WORKER_UNBIND / POOL_DISASSOCIATED of the two pools in succession without initiating work execution inbetween. Because setting the flags requires grabbing assoc_mutex which is held while new workers are created, this could lead to stalls if a pool's manager is waiting for the previous pool's work items to release memory. This is almost purely theoretical tho. Update wq_unbind_fn() such that it sets WORKER_UNBIND / POOL_DISASSOCIATED, goes over schedule() and explicitly kicks off execution for a pool and then moves on to the next one. tj: Updated comments and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org	2013-03-08 15:18:28 -08:00
Arnd Bergmann	dc893e19b5	Revert parts of "hlist: drop the node parameter from iterators" Commit `b67bfe0d42` ("hlist: drop the node parameter from iterators") did a lot of nice changes but also contains two small hunks that seem to have slipped in accidentally and have no apparent connection to the intent of the patch. This reverts the two extraneous changes. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Peter Senna Tschudin <peter.senna@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-03-08 15:05:34 -08:00
Mark Rutland	a7dc19b865	clockevents: Don't allow dummy broadcast timers Currently tick_check_broadcast_device doesn't reject clock_event_devices with CLOCK_EVT_FEAT_DUMMY, and may select them in preference to real hardware if they have a higher rating value. In this situation, the dummy timer is responsible for broadcasting to itself, and the core clockevents code may attempt to call non-existent callbacks for programming the dummy, eventually leading to a panic. This patch makes tick_check_broadcast_device always reject dummy timers, preventing this problem. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Jon Medhurst (Tixy) <tixy@linaro.org> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-03-07 17:16:11 +01:00
Steven Rostedt (Red Hat)	c9960e4854	tracing: Do not return EINVAL in snapshot when not allocated To use the tracing snapshot feature, writing a '1' into the snapshot file causes the snapshot buffer to be allocated if it has not already been allocated and dose a 'swap' with the main buffer, so that the snapshot now contains what was in the main buffer, and the main buffer now writes to what was the snapshot buffer. To free the snapshot buffer, a '0' is written into the snapshot file. To clear the snapshot buffer, any number but a '0' or '1' is written into the snapshot file. But if the file is not allocated it returns -EINVAL error code. This is rather pointless. It is better just to do nothing and return success. Acked-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-07 10:31:38 -05:00
Steven Rostedt (Red Hat)	d8741e2e88	tracing: Add help of snapshot feature when snapshot is empty When cat'ing the snapshot file, instead of showing an empty trace header like the trace file does, show how to use the snapshot feature. Also, this is a good place to show if the snapshot has been allocated or not. Users may want to "pre allocate" the snapshot to have a fast "swap" of the current buffer. Otherwise, a swap would be slow and might fail as it would need to allocate the snapshot buffer, and that might fail under tight memory constraints. Here's what it looked like before: # tracer: nop # # entries-in-buffer/entries-written: 0/0 #P:4 # # _-----=> irqs-off # / _----=> need-resched # \| / _---=> hardirq/softirq # \|\| / _--=> preempt-depth # \|\|\| / delay # TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION # \| \| \| \|\|\|\| \| \| Here's what it looks like now: # tracer: nop # # # * Snapshot is freed * # # Snapshot commands: # echo 0 > snapshot : Clears and frees snapshot buffer # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated. # Takes a snapshot of the main buffer. # echo 2 > snapshot : Clears snapshot buffer (but does not allocate) # (Doesn't have to be '2' works with any number that # is not a '0' or '1') Acked-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-03-07 10:31:22 -05:00
Linus Torvalds	e3b59518c1	Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes and cleanups from Thomas Gleixner: "Commit `e5ab012c32` ("nohz: Make tick_nohz_irq_exit() irq safe") is the first commit in the series and the minimal necessary bugfix, which needs to go back into stable. The remanining commits enforce irq disabling in irq_exit(), sanitize the hardirq/softirq preempt count transition and remove a bunch of no longer necessary conditionals." I personally love getting rid of the very subtle and confusing IRQ_EXIT_OFFSET thing. Even apart from the whole "more lines removed than added" thing. * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irq: Don't re-enable interrupts at the end of irq_exit irq: Remove IRQ_EXIT_OFFSET workaround Revert "nohz: Make tick_nohz_irq_exit() irq safe" irq: Sanitize invoke_softirq irq: Ensure irq_exit() code runs with interrupts disabled nohz: Make tick_nohz_irq_exit() irq safe	2013-03-05 18:10:04 -08:00
Linus Torvalds	6516ab6fdf	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smpboot bugfix from Thomas Gleixner: "A single bugfix for a regression introduced with the conversion of the stop machine threads to the generic smpboot thread management facility" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: stop_machine: Mark per cpu stopper enabled early	2013-03-05 18:07:12 -08:00
Li Zefan	7d8e0bf56a	cgroup: avoid accessing modular cgroup subsys structure without locking subsys[i] is set to NULL in cgroup_unload_subsys() at modular unload, and that's protected by cgroup_mutex, and then the memory *subsys[i] resides will be freed. So this is unsafe without any locking: if (!ss \|\| ss->module) ... v2: - add a comment for enum cgroup_subsys_id - simplify the comment in cgroup_exit() Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-05 09:33:25 -08:00
Li Zefan	f50daa704f	cgroup: no need to check css refs for release notification We no longer fail rmdir() when there're still css refs, so we don't need to check css refs in check_for_release(). This also voids a bug. cgroup_has_css_refs() accesses subsys[i] without cgroup_mutex, so it can race with cgroup_unload_subsys(). cgroup_has_css_refs() ... if (ss == NULL \|\| ss->root != cgrp->root) if ss pointers to net_cls_subsys, and cls_cgroup module is unloaded right after the former check but before the latter, the memory that net_cls_subsys resides has become invalid. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-04 10:04:54 -08:00
Li Zefan	f440d98f8e	cpuset: use cgroup_name() in cpuset_print_task_mems_allowed() Use cgroup_name() instead of cgrp->dentry->name. This makes the code a bit simpler. While at it, remove cpuset_name and make cpuset_nodelist a local variable to cpuset_print_task_mems_allowed(). Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-04 09:50:08 -08:00
Li Zefan	65dff759d2	cgroup: fix cgroup_path() vs rename() race rename() will change dentry->d_name. The result of this race can be worse than seeing partially rewritten name, but we might access a stale pointer because rename() will re-allocate memory to hold a longer name. As accessing dentry->name must be protected by dentry->d_lock or parent inode's i_mutex, while on the other hand cgroup-path() can be called with some irq-safe spinlocks held, we can't generate cgroup path using dentry->d_name. Alternatively we make a copy of dentry->d_name and save it in cgrp->name when a cgroup is created, and update cgrp->name at rename(). v5: use flexible array instead of zero-size array. v4: - allocate root_cgroup_name and all root_cgroup->name points to it. - add cgroup_name() wrapper. v3: use kfree_rcu() instead of synchronize_rcu() in user-visible path. v2: make cgrp->name RCU safe. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-04 09:50:08 -08:00
Lai Jiangshan	b31041042a	workqueue: better define synchronization rule around rescuer->pool updates Rescuers visit different worker_pools to process work items from pools under pressure. Currently, rescuer->pool is updated outside any locking and when an outsider looks at a rescuer, there's no way to tell when and whether rescuer->pool is gonna change. While this doesn't currently cause any problem, it is nasty. With recent worker_maybe_bind_and_lock() changes, we can move rescuer->pool updates inside pool locks such that if rescuer->pool equals a locked pool, it's guaranteed to stay that way until the pool is unlocked. Move rescuer->pool inside pool->lock. This patch doesn't introduce any visible behavior difference. tj: Updated the description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-04 09:44:58 -08:00
Lai Jiangshan	f36dc67b27	workqueue: change argument of worker_maybe_bind_and_lock() to @pool worker_maybe_bind_and_lock() currently takes @worker but only cares about @worker->pool. This patch updates worker_maybe_bind_and_lock() to take @pool instead of @worker. This will be used to better define synchronization rules regarding rescuer->pool updates. This doesn't introduce any functional change. tj: Updated the comments and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-04 09:44:58 -08:00
Lai Jiangshan	f5faa0774e	workqueue: use %current instead of worker->task in worker_maybe_bind_and_lock() worker_maybe_bind_and_lock() uses both @worker->task and @current at the same time. As worker_maybe_bind_and_lock() can only be called by the current worker task, they are always the same. Update worker_maybe_bind_and_lock() to use %current consistently. This doesn't introduce any functional change. tj: Massaged the description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-04 09:44:58 -08:00
Linus Torvalds	56a79b7b02	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more VFS bits from Al Viro: "Unfortunately, it looks like xattr series will have to wait until the next cycle ;-/ This pile contains 9p cleanups and fixes (races in v9fs_fid_add() etc), fixup for nommu breakage in shmem.c, several cleanups and a bit more file_inode() work" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: constify path_get/path_put and fs_struct.c stuff fix nommu breakage in shmem.c cache the value of file_inode() in struct file 9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry 9p: make sure ->lookup() adds fid to the right dentry 9p: untangle ->lookup() a bit 9p: double iput() in ->lookup() if d_materialise_unique() fails 9p: v9fs_fid_add() can't fail now v9fs: get rid of v9fs_dentry 9p: turn fid->dlist into hlist 9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine more file_inode() open-coded instances selinux: opened file can't have NULL or negative ->f_path.dentry (In the meantime, the hlist traversal macros have changed, so this required a semantic conflict fixup for the newly hlistified fid->dlist)	2013-03-03 13:23:03 -08:00
Linus Torvalds	8fd5e7a2d9	ImgTec Meta architecture changes for v3.9-rc1 This adds core architecture support for Imagination's Meta processor cores, followed by some later miscellaneous arch/metag cleanups and fixes which I kept separate to ease review: - Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture - A few fixes all over, particularly for symbol prefixes - A few privilege protection fixes - Several cleanups (setup.c includes, split out a lot of metag_ksyms.c) - Fix some missing exports - Convert hugetlb to use vm_unmapped_area() - Copy device tree to non-init memory - Provide dma_get_sgtable() Signed-off-by: James Hogan <james.hogan@imgtec.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQIcBAABAgAGBQJRMmVXAAoJEKHZs+irPybfivgP/inEXqJyfw59omQdjwvYcU/a /u0MJ3UKSNS3U+HknfaFCy/Nwk1dqPLjqqyVC1V6AbUPBXlaEwGcimlNRx2uRjdq Uh96upMLHsNuF/xiiR477g3RwY0egIJdM1R1bGi3mZ3vVrNQGF+wbni6f61xCWGz M/4rDglpQvE79oLhYdgj6tidZtHQT0YWtERA9W90zkQWXGYmpFPKBKbfZAi5+rKQ U6Gpg26orUugzXNaxltJEYKE8gjLTppEabx8DARnItZ4zCMy4dw5RBJ35RFvQw6e eSmfgTy9w9WqBMY2+QMSgU0KQt1IITCzX7OlOXC0jALQJXoU0WWbOELlBVQLCwF1 T0OcR/5ZP/hIlOk5Dh+e9U3AtbASXdMtqA0ZUe78woH1CBf7Nc/0c0vRg23EdMh8 lnHDJxT/UqskoOYLI4kgWbEdLDy4uTh19U2pVi7VCo7ksLB9Bj9Xc8VSKgscSXTl OwzN+c4Jgtu8FDFTp+Af4AT8pYGJ08j8L2ErsV2sOv3Q44U5WXdrMz3GSgwXj8+4 wZk3HvdkQVkMD5sJCUZgAswaN6BnbB0pHdCz4wMQ8jR/Ogs015Ipk64Ecym9S/4n uES7PnDtt/4lb5EyX2ScbvdnZTAFTaaP7OOhC77BOQvbQjIW1tkAcxWJqRry86uS iM0BFgK6Ohx3geqa5Ft0 =65cR -----END PGP SIGNATURE----- Merge tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag Pull new ImgTec Meta architecture from James Hogan: "This adds core architecture support for Imagination's Meta processor cores, followed by some later miscellaneous arch/metag cleanups and fixes which I kept separate to ease review: - Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture - A few fixes all over, particularly for symbol prefixes - A few privilege protection fixes - Several cleanups (setup.c includes, split out a lot of metag_ksyms.c) - Fix some missing exports - Convert hugetlb to use vm_unmapped_area() - Copy device tree to non-init memory - Provide dma_get_sgtable()" * tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: (61 commits) metag: Provide dma_get_sgtable() metag: prom.h: remove declaration of metag_dt_memblock_reserve() metag: copy devicetree to non-init memory metag: cleanup metag_ksyms.c includes metag: move mm/init.c exports out of metag_ksyms.c metag: move usercopy.c exports out of metag_ksyms.c metag: move setup.c exports out of metag_ksyms.c metag: move kick.c exports out of metag_ksyms.c metag: move traps.c exports out of metag_ksyms.c metag: move irq enable out of irqflags.h on SMP genksyms: fix metag symbol prefix on crc symbols metag: hugetlb: convert to vm_unmapped_area() metag: export clear_page and copy_page metag: export metag_code_cache_flush_all metag: protect more non-MMU memory regions metag: make TXPRIVEXT bits explicit metag: kernel/setup.c: sort includes perf: Enable building perf tools for Meta metag: add boot time LNKGET/LNKSET check metag: add __init to metag_cache_probe() ...	2013-03-03 12:06:09 -08:00
Linus Torvalds	6ec40b4230	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull sigprocmask compat fix from Al Viro: "generic compat_sys_rt_sigprocmask() had a very dumb braino; I'd spent quite a while staring at the offending commit before finally managing to spot the idiocy ;-/" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: fix compat_sys_rt_sigprocmask()	2013-03-02 19:32:06 -08:00
Al Viro	db61ec29fd	fix compat_sys_rt_sigprocmask() Converting bitmask to 32bit granularity is fine, but we'd better _do_ something with the result. Such as "copy it to userland"... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-03-02 20:39:15 -05:00
James Hogan	649508f684	trace/ring_buffer: handle 64bit aligned structs Some 32 bit architectures require 64 bit values to be aligned (for example Meta which has 64 bit read/write instructions). These require 8 byte alignment of event data too, so use !CONFIG_HAVE_64BIT_ALIGNED_ACCESS instead of !CONFIG_64BIT \|\| CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to decide alignment, and align buffer_data_page::data accordingly. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> (previous version subtly different)	2013-03-02 20:09:16 +00:00
Linus Torvalds	3cfb07743a	KGDB/KDB fixes and cleanups Cleanups Remove kdb ssb command - there is no in kernel disassembler to support it Remove kdb ll command - Always caused a kernel oops and there were no bug reports so no one was using this command Use kernel ARRAY_SIZE macro instead of array computations Fixes Stop oops in kdb if user executes kdb_defcmd with args kdb help command truncated text ppc64 support for kgdbts Add missing kconfig option from original kdb port for dealing with catastrophic kernel crashes such that you can reboot automatically on continue from kdb -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRMhJPAAoJEIciOldedpOj25gP+wZn/u3YxOGkhL3CIBOegEwW KFte7FFNeLNzw8tyEzBMVXok85PJ+/lvm+Xi+a4hhtV7/S+WvwGoWeIg0zCDtrAu im4m2X1u00Egcztcr4tCk8Lwc6vNm5KZsBsUsbKKi0K8twaWifkxLMEWeq+3WLzh +1d4TDkziwdZfcLpUHvohmCsCyg1EdjUxTYnWpSJvLrq5lhNvTQBowHWZH06GBcI MmhC+aHp3myk2/Vwd1AuDq+jIRbH3ORV50wfRONBR7OJ58sOoD9nV8Mr+fy8QLRN BPv8WxeRSm7X/Hl6nPm9PX7oNQSg+Ba1+5helIIJMdGGWhJbl9AZslFEU40Zn3yS V7FmS3mwRSA8NCgfZ+CO6311tirSCvtf34+5ttXEntyK1ujaplSPffBP4Ya26y6q TB2wLr91+3L0LhzrVRj20P13qexsWUW4t9inLzqtDApD3Zljl9Kuwd0febCiLg0r mgGkpZjK0VI7S7RfkxpmEyU4bFP2lCTzBOKAKOfIV6O88bfqFdv5QCTHODJArf22 ugDGPX3tw++Lyc9otGxyPbxmc7npYGqTD1UEM8xK/7sEBiWJm71jjwHN1bwRwB3o EC5bcZ6TJgDJnnvYGBTQec74PnUiCu/UnQVFFabzhowQCJTIWSJkPxjnI2BKK1V5 3exPqogclusAhhwopOtG =Hwbq -----END PGP SIGNATURE----- Merge tag 'for_linux-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb Pull KGDB/KDB fixes and cleanups from Jason Wessel: "For a change we removed more code than we added. If people aren't using it we shouldn't be carrying it. :-) Cleanups: - Remove kdb ssb command - there is no in kernel disassembler to support it - Remove kdb ll command - Always caused a kernel oops and there were no bug reports so no one was using this command - Use kernel ARRAY_SIZE macro instead of array computations Fixes: - Stop oops in kdb if user executes kdb_defcmd with args - kdb help command truncated text - ppc64 support for kgdbts - Add missing kconfig option from original kdb port for dealing with catastrophic kernel crashes such that you can reboot automatically on continue from kdb" * tag 'for_linux-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb: kdb: Remove unhandled ssb command kdb: Prevent kernel oops with kdb_defcmd kdb: Remove the ll command kdb_main: fix help print kdb: Fix overlap in buffers with strcpy Fixed dead ifdef block by adding missing Kconfig option. kdb: Setup basic kdb state before invoking commands via kgdb kdb: use ARRAY_SIZE where possible kgdb/kgdbts: support ppc64 kdb: A fix for kdb command table expansion	2013-03-02 08:31:39 -08:00
Linus Torvalds	e23b62256a	Initial ARC Linux port with some fixes on top for 3.9-rc1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRMZYbAAoJEGnX8d3iisJeFj8P/R1hWohDDUc8pG3+ov9Y2Brt g7oIVw1udlKIk3HhVwsyT14/UHunfcTCONKKKGmmbfRrLJSMMsSlXvYoAQLozokf TuaO3Xt5IfERROqTrCDSwdNaAmwIZGsIuI9jWKo4qsXovAL0nc3sR527qMI1o7OE 9X5eIqOJ/rPvOjXYrPqXmvO/DZKYG8PNTH4PYqePes31CsAmDXWBIlgYmgvF3th3 ptwKPgRU/c2wKNpDDJXVQg/bcg9NI2cCnndNrjgXZgyUQrC37ZTdt/IOF5w6FgIW 6i6UbDKXn8MgQAhrXx0Ns/+0kSJZ7eBWmj8hLyrxUzOYlF4rCs/il6ofDRaMO6fv 9LmbNZXYnGICzm1YAxZRK7dm13IbDnltmMc81vISBpJSMTBgqzLWobHnq5/67Wh4 2oUkoc2Tfaw70FnRCewX0x4Qop2YXmXl1KBwdecvzdcKi6Yg+rRH08ur/0yyCyx7 +vAQpPVIuVqCc916qwmCPFaf1UMNnmMStxNH7D1AQHvi1G372NxfXizdYyKFRY9N f5Q+6DTo1xh2AxuGieSZxBoeK0Rlp4DWTOBD4MMz29y7BRX7LK1U2iS+nW0g8uir 3RdYeAqyCxlJtjJNQX9U8ZT54jUPZgvJWU0udesRN1CBdOSQMjM9OyZsLLRtLeX1 ww2tc7zqhBUyjBej6Itg =NKkW -----END PGP SIGNATURE----- Merge tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull new ARC architecture from Vineet Gupta: "Initial ARC Linux port with some fixes on top for 3.9-rc1: I would like to introduce the Linux port to ARC Processors (from Synopsys) for 3.9-rc1. The patch-set has been discussed on the public lists since Nov and has received a fair bit of review, specially from Arnd, tglx, Al and other subsystem maintainers for DeviceTree, kgdb... The arch bits are in arch/arc, some asm-generic changes (acked by Arnd), a minor change to PARISC (acked by Helge). The series is a touch bigger for a new port for 2 main reasons: 1. It enables a basic kernel in first sub-series and adds ptrace/kgdb/.. later 2. Some of the fallout of review (DeviceTree support, multi-platform- image support) were added on top of orig series, primarily to record the revision history. This updated pull request additionally contains - fixes due to our GNU tools catching up with the new syscall/ptrace ABI - some (minor) cross-arch Kconfig updates." * tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (82 commits) ARC: split elf.h into uapi and export it for userspace ARC: Fixup the current ABI version ARC: gdbserver using regset interface possibly broken ARC: Kconfig cleanup tracking cross-arch Kconfig pruning in merge window ARC: make a copy of flat DT ARC: [plat-arcfpga] DT arc-uart bindings change: "baud" => "current-speed" ARC: Ensure CONFIG_VIRT_TO_BUS is not enabled ARC: Fix pt_orig_r8 access ARC: [3.9] Fallout of hlist iterator update ARC: 64bit RTSC timestamp hardware issue ARC: Don't fiddle with non-existent caches ARC: Add self to MAINTAINERS ARC: Provide a default serial.h for uart drivers needing BASE_BAUD ARC: [plat-arcfpga] defconfig for fully loaded ARC Linux ARC: [Review] Multi-platform image #8: platform registers SMP callbacks ARC: [Review] Multi-platform image #7: SMP common code to use callbacks ARC: [Review] Multi-platform image #6: cpu-to-dma-addr optional ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core ARC: [Review] Multi-platform image #4: Isolate platform headers ARC: [Review] Multi-platform image #3: switch to board callback ...	2013-03-02 07:58:56 -08:00
Vincent	36dfea42cc	kdb: Remove unhandled ssb command The 'ssb' command can only be handled when we have a disassembler, to check for branches, so remove the 'ssb' command for now. Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2013-03-02 08:52:20 -06:00
Jason Wessel	a37372f6c3	kdb: Prevent kernel oops with kdb_defcmd The kdb_defcmd can only be used to display the available command aliases while using the kernel debug shell. If you try to define a new macro while the kernel debugger is active it will oops. The debug shell macros must use pre-allocated memory set aside at the time kdb_init() is run, and the kdb_defcmd is restricted to only working at the time that the kdb_init sequence is being run, which only occurs if you actually activate the kernel debugger. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2013-03-02 08:52:19 -06:00
Jason Wessel	1b2caa2dcb	kdb: Remove the ll command Recently some code inspection was done after fixing a problem with kmalloc used while in the kernel debugger context (which is not legal), and it turned up the fact that kdb ll command will oops the kernel. Given that there have been zero bug reports on the command combined with the fact it will oops the kernel it is clearly not being used. Instead of fixing it, it will be removed. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2013-03-02 08:52:19 -06:00
Jason Wessel	074604af21	kdb_main: fix help print The help command was chopping all the usage instructions such that they were not readable. Example: bta [D\|R\|S\|T\|C\|Z\|E\|U\|I\| Backtrace all processes matching state flag per_cpu <sym> [<bytes>] [<c Display per_cpu variables Where as it should look like: bta [D\|R\|S\|T\|C\|Z\|E\|U\|I\|M\|A] Backtrace all processes matching state flag per_cpu <sym> [<bytes>] [<cpu>] Display per_cpu variables All that is needed is to check the how long the cmd_usage is and jump to the next line when appropriate. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2013-03-02 08:52:18 -06:00
Jason Wessel	4eb7a66d94	kdb: Fix overlap in buffers with strcpy Maxime reported that strcpy(s->usage, s->usage+1) has no definitive guarantee that it will work on all archs the same way when you have overlapping memory. The fix is simple for the kdb code because we still have the original string memory in the function scope, so we just have to use that as the argument instead. Reported-by: Maxime Villard <rustyBSD@gmx.fr> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2013-03-02 08:52:18 -06:00
Matt Klein	00370b8f8d	kdb: Setup basic kdb state before invoking commands via kgdb Although invasive kdb commands are not supported via kgdb, some useful non-invasive commands like bt* require basic kdb state to be setup before calling into the kdb code. Factor out some of this code and call it before and after executing kdb commands via kgdb. Signed-off-by: Matt Klein <mklein@twitter.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2013-03-02 08:52:17 -06:00
Sasha Levin	5f784f798c	kdb: use ARRAY_SIZE where possible Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2013-03-02 08:52:17 -06:00
John Blackwood	f7c82d5a3c	kdb: A fix for kdb command table expansion When locally adding in some additional kdb commands, I stumbled across an issue with the dynamic expansion of the kdb command table. When the number of kdb commands exceeds the size of the statically allocated kdb_base_commands[] array, additional space is allocated in the kdb_register_repeat() routine. The unused portion of the newly allocated array was not being initialized to zero properly and this would result in segfaults when help '?' was executed or when a search for a non-existing command would traverse the command table beyond the end of valid command entries and then attempt to use the non-zeroed area as actual command entries. Signed-off-by: John Blackwood <john.blackwood@ccur.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2013-03-02 08:52:16 -06:00
Linus Torvalds	2af78448ff	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management updates from Zhang Rui: "Highlights: - introduction of Dove thermal sensor driver. - introduction of Kirkwood thermal sensor driver. - introduction of intel_powerclamp thermal cooling device driver. - add interrupt and DT support for rcar thermal driver. - add thermal emulation support which allows platform thermal driver to do software/hardware emulation for thermal issues." * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (36 commits) thermal: rcar: remove __devinitconst thermal: return an error on failure to register thermal class Thermal: rename thermal governor Kconfig option to avoid generic naming thermal: exynos: Use the new thermal trend type for quick cooling action. Thermal: exynos: Add support for temperature falling interrupt. Thermal: Dove: Add Themal sensor support for Dove. thermal: Add support for the thermal sensor on Kirkwood SoCs thermal: rcar: add Device Tree support thermal: rcar: remove machine_power_off() from rcar_thermal_notify() thermal: rcar: add interrupt support thermal: rcar: add read/write functions for common/priv data thermal: rcar: multi channel support thermal: rcar: use mutex lock instead of spin lock thermal: rcar: enable CPCTL to use hardware TSC deciding thermal: rcar: use parenthesis on macro Thermal: fix a build warning when CONFIG_THERMAL_EMULATION cleared Thermal: fix a wrong comment thermal: sysfs: Add a new sysfs node emul_temp for thermal emulation PM: intel_powerclamp: off by one in start_power_clamp() thermal: exynos: Miscellaneous fixes to support falling threshold interrupt ...	2013-02-28 19:48:26 -08:00
Linus Torvalds	ee89f81252	Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block Pull block IO core bits from Jens Axboe: "Below are the core block IO bits for 3.9. It was delayed a few days since my workstation kept crashing every 2-8h after pulling it into current -git, but turns out it is a bug in the new pstate code (divide by zero, will report separately). In any case, it contains: - The big cfq/blkcg update from Tejun and and Vivek. - Additional block and writeback tracepoints from Tejun. - Improvement of the should sort (based on queues) logic in the plug flushing. - _io() variants of the wait_for_completion() interface, using io_schedule() instead of schedule() to contribute to io wait properly. - Various little fixes. You'll get two trivial merge conflicts, which should be easy enough to fix up" Fix up the trivial conflicts due to hlist traversal cleanups (commit `b67bfe0d42`: "hlist: drop the node parameter from iterators"). * 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits) block: remove redundant check to bd_openers() block: use i_size_write() in bd_set_size() cfq: fix lock imbalance with failed allocations drivers/block/swim3.c: fix null pointer dereference block: don't select PERCPU_RWSEM block: account iowait time when waiting for completion of IO request sched: add wait_for_completion_io[_timeout] writeback: add more tracepoints block: add block_{touch\|dirty}_buffer tracepoint buffer: make touch_buffer() an exported function block: add @req to bio_{front\|back}_merge tracepoints block: add missing block_bio_complete() tracepoint block: Remove should_sort judgement when flush blk_plug block,elevator: use new hashtable implementation cfq-iosched: add hierarchical cfq_group statistics cfq-iosched: collect stats from dead cfqgs cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats() blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock block: RCU free request_queue blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge() ...	2013-02-28 12:52:24 -08:00
Frederic Weisbecker	4cd5d1115c	irq: Don't re-enable interrupts at the end of irq_exit Commit `74eed0163d` "irq: Ensure irq_exit() code runs with interrupts disabled" restore interrupts flags in the end of irq_exit() for archs that don't define __ARCH_IRQ_EXIT_IRQS_DISABLED. However always returning from irq_exit() with interrupts disabled should not be a problem for these archs. Prior to this commit this was already happening anytime we processed pending softirqs anyway. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-02-28 20:00:43 +01:00
Linus Torvalds	2a7d2b96d5	Merge branch 'akpm' (final batch from Andrew) Merge third patch-bumb from Andrew Morton: "This wraps me up for -rc1. - Lots of misc stuff and things which were deferred/missed from patchbombings 1 & 2. - ocfs2 things - lib/scatterlist - hfsplus - fatfs - documentation - signals - procfs - lockdep - coredump - seqfile core - kexec - Tejun's large IDR tree reworkings - ipmi - partitions - nbd - random() things - kfifo - tools/testing/selftests updates - Sasha's large and pointless hlist cleanup" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (163 commits) hlist: drop the node parameter from iterators kcmp: make it depend on CHECKPOINT_RESTORE selftests: add a simple doc tools/testing/selftests/Makefile: rearrange targets selftests/efivarfs: add create-read test selftests/efivarfs: add empty file creation test selftests: add tests for efivarfs kfifo: fix kfifo_alloc() and kfifo_init() kfifo: move kfifo.c from kernel/ to lib/ arch Kconfig: centralise CONFIG_ARCH_NO_VIRT_TO_BUS w1: add support for DS2413 Dual Channel Addressable Switch memstick: move the dereference below the NULL test drivers/pps/clients/pps-gpio.c: use devm_kzalloc Documentation/DMA-API-HOWTO.txt: fix typo include/linux/eventfd.h: fix incorrect filename is a comment mtd: mtd_stresstest: use prandom_bytes() mtd: mtd_subpagetest: convert to use prandom library mtd: mtd_speedtest: use prandom_bytes mtd: mtd_pagetest: convert to use prandom library mtd: mtd_oobtest: convert to use prandom library ...	2013-02-27 20:58:09 -08:00
Sasha Levin	b67bfe0d42	hlist: drop the node parameter from iterators I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S \| hlist_for_each_entry_continue(a, - b, c) S \| hlist_for_each_entry_from(a, - b, c) S \| hlist_for_each_entry_rcu(a, - b, c, d) S \| hlist_for_each_entry_rcu_bh(a, - b, c, d) S \| hlist_for_each_entry_continue_rcu_bh(a, - b, c) S \| for_each_busy_worker(a, c, - b, d) S \| ax25_uid_for_each(a, - b, c) S \| ax25_for_each(a, - b, c) S \| inet_bind_bucket_for_each(a, - b, c) S \| sctp_for_each_hentry(a, - b, c) S \| sk_for_each(a, - b, c) S \| sk_for_each_rcu(a, - b, c) S \| sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S \| sk_for_each_safe(a, - b, c, d) S \| sk_for_each_bound(a, - b, c) S \| hlist_for_each_entry_safe(a, - b, c, d, e) S \| hlist_for_each_entry_continue_rcu(a, - b, c) S \| nr_neigh_for_each(a, - b, c) S \| nr_neigh_for_each_safe(a, - b, c, d) S \| nr_node_for_each(a, - b, c) S \| nr_node_for_each_safe(a, - b, c, d) S \| - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S \| - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S \| for_each_host(a, - b, c) S \| for_each_host_safe(a, - b, c, d) S \| for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:24 -08:00
Cyrill Gorcunov	1e142b29e2	kcmp: make it depend on CHECKPOINT_RESTORE Since kcmp syscall has been implemented (initially on x86 architecture) a number of other archs wire it up as well: xtensa, sparc, sh, s390, mips, microblaze, m68k (not taking into account those who uses <asm-generic/unistd.h> for syscall numbers definitions). But the Makefile, which turns kcmp.o generation on still depends on former config-x86. Thus get rid of this limitation and make kcmp.o depend on CHECKPOINT_RESTORE option. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:24 -08:00
Stefani Seibold	c759b35e64	kfifo: move kfifo.c from kernel/ to lib/ Move kfifo.c from kernel/ to lib/ Signed-off-by: Stefani Seibold <stefani@seibold.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:23 -08:00
Yuanhan Liu	bf5315366b	kernel/utsname.c: fix wrong comment about clone_uts_ns() Fix the wrong comment about the return value of clone_uts_ns() Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:22 -08:00
Yuanhan Liu	cd89f46b52	kernel/utsname_sysctl.c: put get/get_uts() into CONFIG_PROC_SYSCTL code block Put get/get_uts() into CONFIG_PROC_SYSCTL code block as they are used only when CONFIG_PROC_SYSCTL is enabled. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:22 -08:00
Xi Wang	df1778be1a	sysctl: fix null checking in bin_dn_node_address() The null check of `strchr() + 1' is broken, which is always non-null, leading to OOB read. Instead, check the result of strchr(). Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:21 -08:00
Tejun Heo	ee94d523bf	posix-timers: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:19 -08:00
Tejun Heo	0e9c3be20d	events: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:19 -08:00
Tejun Heo	d228d9ec2c	cgroup: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:19 -08:00
Tejun Heo	c897ff68be	cgroup: don't use idr_remove_all() idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:14 -08:00
Zhang Yanfei	8c333ac2e4	kexec: avoid freeing NULL pointer in image_crash_alloc() Though there is no error if we free a NULL pointer, I think we could avoid this behaviour. Change the code a little in kimage_crash_alloc() could avoid this kind of unnecessary free. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Sasha Levin <sasha.levin@oracle.com> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:12 -08:00
Zhang Yanfei	b92e7e0dae	kexec: fix memory leak in function kimage_normal_alloc If kimage_normal_alloc() fails to alloc pages for image->swap_page, it should call kimage_free_page_list() to free allocated pages in image->control_pages list before it frees image. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Sasha Levin <sasha.levin@oracle.com> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:12 -08:00
Sasha Levin	fe88f2ee33	kexec: prevent double free on image allocation failure If kimage_normal_alloc() fails to initialize an allocated kimage, it will free the image but would still set 'rimage', as a result kexec_load will try to free it again. This would explode as part of the freeing process is accessing internal members which point to uninitialized memory. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:12 -08:00
Mitsuhiro Tanino	0d0bf66741	kexec: export PG_hwpoison flag into vmcoreinfo This patch exports a PG_hwpoison into vmcoreinfo when CONFIG_MEMORY_FAILURE is defined. "makedumpfile" needs to read information of memory, such as 'mem_section', 'zone', 'pageflags' from vmcore. We introduce a function into "makedumpfile" to exclude hwpoison page from vmcore dump. In order to introduce this function, PG_hwpoison flag have to export into vmcoreinfo. Signed-off-by: Mitsuhiro Tanino <mitsuhiro.tanino.gm@hitachi.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Mitsuhiro Tanino <mitsuhiro.tanino.gm@hitachi.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:12 -08:00
Zhang Yanfei	8a525f5e7a	kexec: get rid of duplicate check for hole_end hole_end has been checked to make sure it is <= crash_res.end in the while condition check, so the if condition check is duplicate. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:12 -08:00
Atsushi Kumagai	8d67091ec6	kexec: add the values related to buddy system for filtering free pages. tAdd adds the values related to buddy system to vmcoreinfo data so that makedumpfile (dump filtering command) can filter out all free pages with the new logic. It's faster than the current logic because it can distinguish free page by analyzing page structure at the same time as filtering for other unnecessary pages (e.g. anonymous page). OTOH, the current logic has to trace free_list to distinguish free pages while analyzing page structure to filter out other unnecessary pages. The new logic uses the fact that buddy page is marked by _mapcount == PAGE_BUDDY_MAPCOUNT_VALUE. But, _mapcount shares its memory with other fields for SLAB/SLUB when PG_slab is set, so we need to check if PG_slab is set or not before looking up _mapcount value. And we can get the order of buddy system from private field. To sum it up, the values below are required for this logic. Required values: - OFFSET(page._mapcount) - OFFSET(page.private) - NUMBER(PG_slab) - NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE) Changelog from v1 to v2: 1. remove SIZE(pageflags) The new logic was changed after I sent v1 patch. Accordingly, SIZE(pageflags) has been unnecessary for makedumpfile. What's makedumpfile: makedumpfile creates a small dumpfile by excluding unnecessary pages for the analysis. To distinguish unnecessary pages, makedumpfile gets the vmcoreinfo data which has the minimum debugging information only for dump filtering. Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:12 -08:00
Alan Cox	6f977e6b2f	fork: unshare: remove dead code If new_nsproxy is set we will always call switch_task_namespaces and then set new_nsproxy back to NULL so the reassignment and fall through check are redundant Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:12 -08:00
Mandeep Singh Baines	80d26af89a	coredump: use a freezable_schedule for the coredump_finish wait Prevents hung_task detector from panicing the machine. This is also needed to prevent this wait from blocking suspend. (It doesnt' currently block suspend but it would once the next patch in this series is applied.) [yongjun_wei@trendmicro.com.cn: kernel/exit.c: remove duplicated include] Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Cc: Ben Chan <benchan@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:11 -08:00
Mandeep Singh Baines	6aa9707099	lockdep: check that no locks held at freeze time We shouldn't try_to_freeze if locks are held. Holding a lock can cause a deadlock if the lock is later acquired in the suspend or hibernate path (e.g. by dpm). Holding a lock can also cause a deadlock in the case of cgroup_freezer if a lock is held inside a frozen cgroup that is later acquired by a process outside that group. [akpm@linux-foundation.org: export debug_check_no_locks_held] Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Cc: Ben Chan <benchan@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:11 -08:00
Kees Cook	e579d2c259	coredump: remove redundant defines for dumpable states The existing SUID_DUMP_* defines duplicate the newer SUID_DUMPABLE_* defines introduced in `54b501992d` ("coredump: warn about unsafe suid_dumpable / core_pattern combo"). Remove the new ones, and use the prior values instead. Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Chen Gang <gang.chen@asianux.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: James Morris <james.l.morris@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:11 -08:00
Valdis Kletnieks	5d1fadc147	kernel/signal.c: fix suboptimal printk usage Several printk's were missing KERN_INFO and KERN_CONT flags. In addition, a printk that was outside a #if/#endif should have been inside, which would result in stray blank line on non-x86 boxes. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:11 -08:00
Andrey Vagin	66dd34ad31	signal: allow to send any siginfo to itself The idea is simple. We need to get the siginfo for each signal on checkpointing dump, and then return it back on restore. The first problem is that the kernel doesn't report complete siginfos to userspace. In a signal handler the kernel strips SI_CODE from siginfo. When a siginfo is received from signalfd, it has a different format with fixed sizes of fields. The interface of signalfd was extended. If a signalfd is created with the flag SFD_RAW, it returns siginfo in a raw format. rt_sigqueueinfo looks suitable for restoring signals, but it can't send siginfo with a positive si_code, because these codes are reserved for the kernel. In the real world each person has right to do anything with himself, so I think a process should able to send any siginfo to itself. This patch: The kernel prevents sending of siginfo with positive si_code, because these codes are reserved for kernel. I think we can allow a task to send such a siginfo to itself. This operation should not be dangerous. This functionality is required for restoring signals in checkpoint/restart. Signed-off-by: Andrey Vagin <avagin@openvz.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:11 -08:00
Oleg Nesterov	7ff6764061	usermodehelper: cleanup/fix __orderly_poweroff() && argv_free() __orderly_poweroff() does argv_free() if call_usermodehelper_fns() returns -ENOMEM. As Lucas pointed out, this can be wrong if -ENOMEM was not triggered by the failing call_usermodehelper_setup(), in this case both __orderly_poweroff() and argv_cleanup() can do kfree(). Kill argv_cleanup() and change __orderly_poweroff() to call argv_free() unconditionally like do_coredump() does. This info->cleanup() is not needed (and wrong) since `6c0c0d4d` "fix bug in orderly_poweroff() which did the UMH_NO_WAIT => UMH_WAIT_EXEC change, we can rely on the fact that CLONE_VFORK can't return until do_execve() succeeds/fails. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: hongfeng <hongfeng@marvell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:09 -08:00
Steven Rostedt	db05021d49	ftrace: Update the kconfig for DYNAMIC_FTRACE The prompt to enable DYNAMIC_FTRACE (the ability to nop and enable function tracing at run time) had a confusing statement: "enable/disable ftrace tracepoints dynamically" This was written before tracepoints were added to the kernel, but now that tracepoints have been added, this is very confusing and has confused people enough to give wrong information during presentations. Not only that, I looked at the help text, and it still references that dreaded daemon that use to wake up once a second to update the nop locations and brick NICs, that hasn't been around for over five years. Time to bring the text up to the current decade. Cc: stable@vger.kernel.org Reported-by: Ezequiel Garcia <elezegarcia@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-02-27 21:58:01 -05:00
Al Viro	6131ffaa1f	more file_inode() open-coded instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-27 16:59:05 -05:00
Linus Torvalds	0ca7ffb356	Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild changes from Michal Marek: - Alias generation in modpost is cross-compile safe. - kernel/timeconst.h is now generated using a bc script instead of perl. - scripts/link-vmlinux.sh now works with an alternative $KCONFIG_CONFIG. - destination-y for exported headers is supported in Kbuild files again. - depmod is called with -P $CONFIG_SYMBOL_PREFIX on architectures that need it. - CONFIG_DEBUG_INFO_REDUCED disables var-tracking - scripts/setlocalversion works with too much translated locales ;) * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: kbuild: Fix reading of .config in link-vmlinux.sh kbuild: Unset language specific variables in setlocalversion script Kbuild: Disable var tracking with CONFIG_DEBUG_INFO_REDUCED depmod: pass -P $CONFIG_SYMBOL_PREFIX kbuild: Fix destination-y for installed headers scripts/link-vmlinux.sh: source variables from KCONFIG_CONFIG kernel: Replace timeconst.pl with a bc script mod/file2alias: make modalias generation safe for cross compiling	2013-02-27 12:25:47 -08:00
Linus Torvalds	d895cb1af1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle and ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...	2013-02-26 20:16:07 -08:00
Linus Torvalds	24e55910e4	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar. * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource : Nomadik-mtu : fix missing irq initialization posix-timer: Don't call idr_find() with out-of-range ID	2013-02-26 19:43:20 -08:00
Linus Torvalds	dcad0fceae	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar. * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cputime: Use local_clock() for full dynticks cputime accounting cputime: Constify timeval_to_cputime(timeval) argument sched: Move RR_TIMESLICE from sysctl.h to rt.h sched: Fix /proc/sched_debug failure on very very large systems sched: Fix /proc/sched_stat failure on very very large systems sched/core: Remove the obsolete and unused nr_uninterruptible() function	2013-02-26 19:42:08 -08:00
Linus Torvalds	f8ef15d6b9	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Add Intel IvyBridge event scheduling constraints ftrace: Call ftrace cleanup module notifier after all other notifiers tracing/syscalls: Allow archs to ignore tracing compat syscalls	2013-02-26 19:40:37 -08:00
Thomas Gleixner	46c498c2cd	stop_machine: Mark per cpu stopper enabled early commit `14e568e78` (stop_machine: Use smpboot threads) introduced the following regression: Before this commit the stopper enabled bit was set in the online notifier. CPU0 CPU1 cpu_up cpu online hotplug_notifier(ONLINE) stopper(CPU1)->enabled = true; ... stop_machine() The conversion to smpboot threads moved the enablement to the wakeup path of the parked thread. The majority of users seem to have the following working order: CPU0 CPU1 cpu_up cpu online unpark_threads() wakeup(stopper[CPU1]) .... stopper thread runs stopper(CPU1)->enabled = true; stop_machine() But Konrad and Sander have observed: CPU0 CPU1 cpu_up cpu online unpark_threads() wakeup(stopper[CPU1]) .... stop_machine() stopper thread runs stopper(CPU1)->enabled = true; Now the stop machinery kicks CPU0 into the stop loop, where it gets stuck forever because the queue code saw stopper(CPU1)->enabled == false, so CPU0 waits for CPU1 to enter stomp_machine, but the CPU1 stopper work got discarded due to enabled == false. Add a pre_unpark function to the smpboot thread descriptor and call it before waking the thread. This fixes the problem at hand, but the stop_machine code should be more robust. The stopper->enabled flag smells fishy at best. Thanks to Konrad for going through a loop of debug patches and providing the information to decode this issue. Reported-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302261843240.22263@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-02-26 22:25:17 +01:00
Al Viro	7bb307e894	export kernel_write(), convert open-coded instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-26 02:46:11 -05:00
Al Viro	3dadecce20	switch vfs_getattr() to struct path Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-26 02:46:08 -05:00
Linus Torvalds	fffddfd6c8	Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux Pull drm merge from Dave Airlie: "Highlights: - TI LCD controller KMS driver - TI OMAP KMS driver merged from staging - drop gma500 stub driver - the fbcon locking fixes - the vgacon dirty like zebra fix. - open firmware videomode and hdmi common code helpers - major locking rework for kms object handling - pageflip/cursor won't block on polling anymore! - fbcon helper and prime helper cleanups - i915: all over the map, haswell power well enhancements, valleyview macro horrors cleaned up, killing lots of legacy GTT code, - radeon: CS ioctl unification, deprecated UMS support, gpu reset rework, VM fixes - nouveau: reworked thermal code, external dp/tmds encoder support (anx9805), fences sleep instead of polling, - exynos: all over the driver fixes." Lovely conflict in radeon/evergreen_cs.c between commit `de0babd60d` ("drm/radeon: enforce use of radeon_get_ib_value when reading user cmd") and the new changes that modified that evergreen_dma_cs_parse() function. * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (508 commits) drm/tilcdc: only build on arm drm/i915: Revert hdmi HDP pin checks drm/tegra: Add list of framebuffers to debugfs drm/tegra: Fix color expansion drm/tegra: Split DC_CMD_STATE_CONTROL register write drm/tegra: Implement page-flipping support drm/tegra: Implement VBLANK support drm/tegra: Implement .mode_set_base() drm/tegra: Add plane support drm/tegra: Remove bogus tegra_framebuffer structure drm: Add consistency check for page-flipping drm/radeon: Use generic HDMI infoframe helpers drm/tegra: Use generic HDMI infoframe helpers drm: Add EDID helper documentation drm: Add HDMI infoframe helpers video: Add generic HDMI infoframe helpers drm: Add some missing forward declarations drm: Move mode tables to drm_edid.c drm: Remove duplicate drm_mode_cea_vic() gma500: Fix n, m1 and m2 clock limits for sdvo and lvds ...	2013-02-25 16:46:44 -08:00
Linus Torvalds	94f2f14234	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace and namespace infrastructure changes from Eric W Biederman: "This set of changes starts with a few small enhnacements to the user namespace. reboot support, allowing more arbitrary mappings, and support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the user namespace root. I do my best to document that if you care about limiting your unprivileged users that when you have the user namespace support enabled you will need to enable memory control groups. There is a minor bug fix to prevent overflowing the stack if someone creates way too many user namespaces. The bulk of the changes are a continuation of the kuid/kgid push down work through the filesystems. These changes make using uids and gids typesafe which ensures that these filesystems are safe to use when multiple user namespaces are in use. The filesystems converted for 3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The changes for these filesystems were a little more involved so I split the changes into smaller hopefully obviously correct changes. XFS is the only filesystem that remains. I was hoping I could get that in this release so that user namespace support would be enabled with an allyesconfig or an allmodconfig but it looks like the xfs changes need another couple of days before it they are ready." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits) cifs: Enable building with user namespaces enabled. cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t cifs: Convert struct cifs_sb_info to use kuids and kgids cifs: Modify struct smb_vol to use kuids and kgids cifs: Convert struct cifsFileInfo to use a kuid cifs: Convert struct cifs_fattr to use kuid and kgids cifs: Convert struct tcon_link to use a kuid. cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t cifs: Convert from a kuid before printing current_fsuid cifs: Use kuids and kgids SID to uid/gid mapping cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc cifs: Use BUILD_BUG_ON to validate uids and gids are the same size cifs: Override unmappable incoming uids and gids nfsd: Enable building with user namespaces enabled. nfsd: Properly compare and initialize kuids and kgids nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids nfsd: Modify nfsd4_cb_sec to use kuids and kgids nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion nfsd: Convert nfsxdr to use kuids and kgids nfsd: Convert nfs3xdr to use kuids and kgids ...	2013-02-25 16:00:49 -08:00
Linus Torvalds	9043a2650c	The sweeping change is to make add_taint() explicitly indicate whether to disable lockdep, but it's a mechanical change. Cheers, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRJAcuAAoJENkgDmzRrbjxsw0P/3eXb+LddYnx0V0uHYdKpCUf 4vdW7X0fX3Z+aUK69IWRL/6ahoO4TpaHYGHBDjEoivyQ0GDq14X7JNWsYYt3LdMf 3wmDgRc2cn/mZOJbFeVpNV8ox5l/xc0CUvV+iQ8tMjfQItXMXgWUFZKMECsXKSO6 eex3lrw9M2jAX2uL8LQPp9W8xtKu24nSZRC6tH5riE/8fCzi1cZPPAqfxP5c8Lee ZXtbCRSyAFENZLpKyMe1PC7HvtJyi5NDn9xwOQiXULZV/VOlvP94DGBLIKCM/6dn 4QvZxpG0P0uOlpCgRAVLyh/z7g4XY4VF/fHopLCmEcqLsvgD+V2LQpQ9zWUalLPC Z+pUpz2vu0gIddPU1nR8R6oGpEdJ8O12aJle62p/RSXWZGx12qUQ+Tamu0tgKcv1 AsiJfbUGNDYfxgU6sHsoQjl2f68LTVckCU1C1LqEbW/S104EIORtGx30CHM4LRiO 32kDC5TtgYDBKQAIqJ4bL48ZMh+9W3uX40p7xzOI5khHQjvswUKa3jcxupU0C1uv lx8KXo7pn8WT33QGysWC782wJCgJuzSc2vRn+KQoqoynuHGM6agaEtR59gil3QWO rQEcxH63BBRDgHlg4FM9IkJwwsnC3PWKL8gbX0uAWXAPMbgapJkuuGZAwt0WDGVK +GszxsFkCjlW0mK0egTb =tiSY -----END PGP SIGNATURE----- Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module update from Rusty Russell: "The sweeping change is to make add_taint() explicitly indicate whether to disable lockdep, but it's a mechanical change." * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: MODSIGN: Add option to not sign modules during modules_install MODSIGN: Add -s <signature> option to sign-file MODSIGN: Specify the hash algorithm on sign-file command line MODSIGN: Simplify Makefile with a Kconfig helper module: clean up load_module a little more. modpost: Ignore ARC specific non-alloc sections module: constify within_module_* taint: add explicit flag to show whether lock dep is still OK. module: printk message when module signature fail taints kernel.	2013-02-25 15:41:43 -08:00
Linus Torvalds	89f883372f	Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Marcelo Tosatti: "KVM updates for the 3.9 merge window, including x86 real mode emulation fixes, stronger memory slot interface restrictions, mmu_lock spinlock hold time reduction, improved handling of large page faults on shadow, initial APICv HW acceleration support, s390 channel IO based virtio, amongst others" * tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits) Revert "KVM: MMU: lazily drop large spte" x86: pvclock kvm: align allocation size to page size KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr x86 emulator: fix parity calculation for AAD instruction KVM: PPC: BookE: Handle alignment interrupts booke: Added DBCR4 SPR number KVM: PPC: booke: Allow multiple exception types KVM: PPC: booke: use vcpu reference from thread_struct KVM: Remove user_alloc from struct kvm_memory_slot KVM: VMX: disable apicv by default KVM: s390: Fix handling of iscs. KVM: MMU: cleanup __direct_map KVM: MMU: remove pt_access in mmu_set_spte KVM: MMU: cleanup mapping-level KVM: MMU: lazily drop large spte KVM: VMX: cleanup vmx_set_cr0(). KVM: VMX: add missing exit names to VMX_EXIT_REASONS array KVM: VMX: disable SMEP feature when guest is in non-paging mode KVM: Remove duplicate text in api.txt Revert "KVM: MMU: split kvm_mmu_free_page" ...	2013-02-24 13:07:18 -08:00
Frederic Weisbecker	7f6575f1fb	cputime: Use local_clock() for full dynticks cputime accounting Running the full dynticks cputime accounting with preemptible kernel debugging trigger the following warning: [ 4.488303] BUG: using smp_processor_id() in preemptible [00000000] code: init/1 [ 4.490971] caller is native_sched_clock+0x22/0x80 [ 4.493663] Pid: 1, comm: init Not tainted 3.8.0+ #13 [ 4.496376] Call Trace: [ 4.498996] [<ffffffff813410eb>] debug_smp_processor_id+0xdb/0xf0 [ 4.501716] [<ffffffff8101e642>] native_sched_clock+0x22/0x80 [ 4.504434] [<ffffffff8101db99>] sched_clock+0x9/0x10 [ 4.507185] [<ffffffff81096ccd>] fetch_task_cputime+0xad/0x120 [ 4.509916] [<ffffffff81096dd5>] task_cputime+0x35/0x60 [ 4.512622] [<ffffffff810f146e>] acct_update_integrals+0x1e/0x40 [ 4.515372] [<ffffffff8117d2cf>] do_execve_common+0x4ff/0x5c0 [ 4.518117] [<ffffffff8117cf14>] ? do_execve_common+0x144/0x5c0 [ 4.520844] [<ffffffff81867a10>] ? rest_init+0x160/0x160 [ 4.523554] [<ffffffff8117d457>] do_execve+0x37/0x40 [ 4.526276] [<ffffffff810021a3>] run_init_process+0x23/0x30 [ 4.528953] [<ffffffff81867aac>] kernel_init+0x9c/0xf0 [ 4.531608] [<ffffffff8188356c>] ret_from_fork+0x7c/0xb0 We use sched_clock() to perform and fixup the cputime accounting. However we are calling it with preemption enabled from the read side, which trigger the bug above. To fix this up, use local_clock() instead. It takes care of preemption and also provide a more reliable clock source. This is welcome for this kind of statistic that is widely relied on in userspace. Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: Ingo Molnar <mingo@kernel.org> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Kevin Hilman <khilman@linaro.org> Link: http://lkml.kernel.org/r/1361636925-22288-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-24 12:57:16 +01:00
Linus Torvalds	9e2d59ad58	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal handling cleanups from Al Viro: "This is the first pile; another one will come a bit later and will contain SYSCALL_DEFINE-related patches. - a bunch of signal-related syscalls (both native and compat) unified. - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE (fixing several potential problems with missing argument validation, while we are at it) - a lot of now-pointless wrappers killed - a couple of architectures (cris and hexagon) forgot to save altstack settings into sigframe, even though they used the (uninitialized) values in sigreturn; fixed. - microblaze fixes for delivery of multiple signals arriving at once - saner set of helpers for signal delivery introduced, several architectures switched to using those." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits) x86: convert to ksignal sparc: convert to ksignal arm: switch to struct ksignal * passing alpha: pass k_sigaction and siginfo_t using ksignal pointer burying unused conditionals make do_sigaltstack() static arm64: switch to generic old sigaction() (compat-only) arm64: switch to generic compat rt_sigaction() arm64: switch compat to generic old sigsuspend arm64: switch to generic compat rt_sigqueueinfo() arm64: switch to generic compat rt_sigpending() arm64: switch to generic compat rt_sigprocmask() arm64: switch to generic sigaltstack sparc: switch to generic old sigsuspend sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE sparc: kill sign-extending wrappers for native syscalls kill sparc32_open() sparc: switch to use of generic old sigaction sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE mips: switch to generic sys_fork() and sys_clone() ...	2013-02-23 18:50:11 -08:00
Linus Torvalds	5ce1a70e2f	Merge branch 'akpm' (more incoming from Andrew) Merge second patch-bomb from Andrew Morton: - A little DM fix - the MM queue * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits) ksm: allocate roots when needed mm: cleanup "swapcache" in do_swap_page mm,ksm: swapoff might need to copy mm,ksm: FOLL_MIGRATION do migration_entry_wait ksm: shrink 32-bit rmap_item back to 32 bytes ksm: treat unstable nid like in stable tree ksm: add some comments tmpfs: fix mempolicy object leaks tmpfs: fix use-after-free of mempolicy object mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages mm: export mmu notifier invalidates mm: accelerate mm_populate() treatment of THP pages mm: use long type for page counts in mm_populate() and get_user_pages() mm: accurately document nr_free_*_pages functions with code comments HWPOISON: change order of error_states[]'s elements HWPOISON: fix misjudgement of page_action() for errors on mlocked pages memcg: stop warning on memcg_propagate_kmem net: change type of virtio_chan->p9_max_pages vmscan: change type of vm_total_pages to unsigned long fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used ...	2013-02-23 17:50:35 -08:00
Paul Szabo	75f7ad8e04	page-writeback.c: subtract min_free_kbytes from dirtyable memory When calculating amount of dirtyable memory, min_free_kbytes should be subtracted because it is not intended for dirty pages. Addresses http://bugs.debian.org/695182 [akpm@linux-foundation.org: fix up min_free_kbytes extern declarations] [akpm@linux-foundation.org: fix min() warning] Signed-off-by: Paul Szabo <psz@maths.usyd.edu.au> Acked-by: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:17 -08:00
Tang Chen	aa00d89c27	sched: do not use cpu_to_node() to find an offlined cpu's node. If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu) will return -1. As a result, cpumask_of_node(nid) will return NULL. In this case, find_next_bit() in for_each_cpu will get a NULL pointer and cause panic. Here is a call trace: Call Trace: <IRQ> select_fallback_rq+0x71/0x190 try_to_wake_up+0x2cb/0x2f0 wake_up_process+0x15/0x20 hrtimer_wakeup+0x22/0x30 __run_hrtimer+0x83/0x320 hrtimer_interrupt+0x106/0x280 smp_apic_timer_interrupt+0x69/0x99 apic_timer_interrupt+0x6f/0x80 There is a hrtimer process sleeping, whose cpu has already been offlined. When it is waken up, it tries to find another cpu to run, and get a -1 nid. As a result, cpumask_of_node(-1) returns NULL, and causes ernel panic. This patch fixes this problem by judging if the nid is -1. If nid is not -1, a cpu on the same node will be picked. Else, a online cpu on another node will be picked. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:13 -08:00
Al Viro	496ad9aa8e	new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:31 -05:00
Linus Torvalds	3b5d8510b9	Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "The biggest change is the rwsem lock-steal improvements, both to the assembly optimized and the spinlock based variants. The other notable change is the clean up of the seqlock implementation to be based on the seqcount infrastructure. The rest is assorted smaller debuggability, cleanup and continued -rt locking changes." * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rwsem-spinlock: Implement writer lock-stealing for better scalability futex: Revert "futex: Mark get_robust_list as deprecated" generic: Use raw local irq variant for generic cmpxchg lockdep: Selftest: convert spinlock to raw spinlock seqlock: Use seqcount infrastructure seqlock: Remove unused functions ntp: Make ntp_lock raw intel_idle: Convert i7300_idle_lock to raw_spinlock locking: Various static lock initializer fixes lockdep: Print more info when MAX_LOCK_DEPTH is exceeded rwsem: Implement writer lock-stealing for better scalability lockdep: Silence warning if CONFIG_LOCKDEP isn't set watchdog: Use local_clock for get_timestamp() lockdep: Rename print_unlock_inbalance_bug() to print_unlock_imbalance_bug() locking/stat: Fix a typo	2013-02-22 19:25:09 -08:00
Nathan Zimmer	bbbfeac92b	sched: Fix /proc/sched_debug failure on very very large systems On systems with 4096 cores attemping to read /proc/sched_debug fails because we are trying to push all the data into a single kmalloc buffer. The issue is on these very large machines all the data will not fit in 4mb. A better solution is to not us the single_open mechanism but to provide our own seq_operations and treat each cpu as an individual record. The output should be identical to the previous version. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org>) [ Whitespace fixlet] [ Fix spello in comment] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-22 10:27:25 +01:00
Nathan Zimmer	cb152ff267	sched: Fix /proc/sched_stat failure on very very large systems On systems with 4096 cores doing a cat /proc/sched_stat fails, because we are trying to push all the data into a single kmalloc buffer. The issue is on these very large machines all the data will not fit in 4mb. A better solution is to not use the single_open() mechanism but to provide our own seq_operations. The output should be identical to previous version and thus not need the version number. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wu Fengguang <fengguang.wu@intel.com> [ Fix memleak] [ Fix spello in comment] [ Fix warnings] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-22 10:27:24 +01:00
Linus Torvalds	2ef14f465b	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Peter Anvin: "This is a huge set of several partly interrelated (and concurrently developed) changes, which is why the branch history is messier than one would like. The really big items are two humonguous patchsets mostly developed by Yinghai Lu at my request, which completely revamps the way we create initial page tables. In particular, rather than estimating how much memory we will need for page tables and then build them into that memory -- a calculation that has shown to be incredibly fragile -- we now build them (on 64 bits) with the aid of a "pseudo-linear mode" -- a #PF handler which creates temporary page tables on demand. This has several advantages: 1. It makes it much easier to support things that need access to data very early (a followon patchset uses this to load microcode way early in the kernel startup). 2. It allows the kernel and all the kernel data objects to be invoked from above the 4 GB limit. This allows kdump to work on very large systems. 3. It greatly reduces the difference between Xen and native (Xen's equivalent of the #PF handler are the temporary page tables created by the domain builder), eliminating a bunch of fragile hooks. The patch series also gets us a bit closer to W^X. Additional work in this pull is the 64-bit get_user() work which you were also involved with, and a bunch of cleanups/speedups to __phys_addr()/__pa()." * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits) x86, mm: Move reserving low memory later in initialization x86, doc: Clarify the use of asm("%edx") in uaccess.h x86, mm: Redesign get_user with a __builtin_choose_expr hack x86: Be consistent with data size in getuser.S x86, mm: Use a bitfield to mask nuisance get_user() warnings x86/kvm: Fix compile warning in kvm_register_steal_time() x86-32: Add support for 64bit get_user() x86-32, mm: Remove reference to alloc_remap() x86-32, mm: Remove reference to resume_map_numa_kva() x86-32, mm: Rip out x86_32 NUMA remapping code x86/numa: Use __pa_nodebug() instead x86: Don't panic if can not alloc buffer for swiotlb mm: Add alloc_bootmem_low_pages_nopanic() x86, 64bit, mm: hibernate use generic mapping_init x86, 64bit, mm: Mark data/bss/brk to nx x86: Merge early kernel reserve for 32bit and 64bit x86: Add Crash kernel low reservation x86, kdump: Remove crashkernel range find limit for 64bit memblock: Add memblock_mem_size() x86, boot: Not need to check setup_header version for setup_data ...	2013-02-21 18:06:55 -08:00
Linus Torvalds	27ea6dfdc2	Misc ia64 bits for 3.9 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRJpj1AAoJEKurIx+X31iB9icP/0ps+HenD7ogf1mUVnVLVWIh rC+SQa3/ihqw6H+lokyPQsaGL6UMnd3I/mibnH5OFekM/NMFe/nFJJukUec4W9bW WtZqoi4nVDZiRQo3cemoK7sfGhHNAb/rXvb7KN0donlWcbmft9Eaj7wZUiX1RwXs xkGUP5urmSbhyLDVuQOogSvU0StNc4cVFw0Hu3GOvZuOuhePgxUI0aNzqv//IOMG qTsbMh8ABb0J26GdRbxOLnShVcD7HCEsK1SgQevl5mERKcWUJauXKdeJAwwJ0XE2 s/U7PCzlkVq77Mdaupzdfl78ahurH90Z4eX29PuztYvFNeA+smDHuGM6LytmvVba 8TjrqcbAWfbRoa5F2jbBXu0Bu+mzYz0xIi2SqegF5oA5j2679LZMHa5ymafZlfjy wINph4AehW9mMBZMBlPzR6MZMGAi3xIfZFUu4J91cdmchYxFY9qxqyI7CLbq1he3 cDJNK9cUBv8NKRxVLjom0lO6uO/Q/6KEC+6qIxJjDcAIvG76O+HRmmlV4l+eU57H BqxLl5jt+alfJWs4ElxxFPiNu0RXeYhIcJXfgAdDer3f00NwGUtKNoiw7wipLoI0 KH3qDmfI8Vgd2rrESeYbqnYEiSZ8wZTP44kmwzIvjaS9fgOK9k4WoMHGLqP001oU kQnvI4cHjffCyB9nvppS =zl+k -----END PGP SIGNATURE----- Merge tag 'please-pull-misc-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull misc ia64 bits from Tony Luck. * tag 'please-pull-misc-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: MAINTAINERS: update SGI & ia64 Altix stuff sysctl: Enable IA64 "ignore-unaligned-usertrap" to be used cross-arch	2013-02-21 17:55:48 -08:00
Linus Torvalds	7c2db36e73	Merge branch 'akpm' (incoming from Andrew) Merge misc patches from Andrew Morton: - Florian has vanished so I appear to have become fbdev maintainer again :( - Joel and Mark are distracted to welcome to the new OCFS2 maintainer - The backlight queue - Small core kernel changes - lib/ updates - The rtc queue - Various random bits * akpm: (164 commits) rtc: rtc-davinci: use devm_() functions rtc: rtc-max8997: use devm_request_threaded_irq() rtc: rtc-max8907: use devm_request_threaded_irq() rtc: rtc-da9052: use devm_request_threaded_irq() rtc: rtc-wm831x: use devm_request_threaded_irq() rtc: rtc-tps80031: use devm_request_threaded_irq() rtc: rtc-lp8788: use devm_request_threaded_irq() rtc: rtc-coh901331: use devm_clk_get() rtc: rtc-vt8500: use devm_() functions rtc: rtc-tps6586x: use devm_request_threaded_irq() rtc: rtc-imxdi: use devm_clk_get() rtc: rtc-cmos: use dev_warn()/dev_dbg() instead of printk()/pr_debug() rtc: rtc-pcf8583: use dev_warn() instead of printk() rtc: rtc-sun4v: use pr_warn() instead of printk() rtc: rtc-vr41xx: use dev_info() instead of printk() rtc: rtc-rs5c313: use pr_err() instead of printk() rtc: rtc-at91rm9200: use dev_dbg()/dev_err() instead of printk()/pr_debug() rtc: rtc-rs5c372: use dev_dbg()/dev_warn() instead of printk()/pr_debug() rtc: rtc-ds2404: use dev_err() instead of printk() rtc: rtc-efi: use dev_err()/dev_warn()/pr_err() instead of printk() ...	2013-02-21 17:38:49 -08:00
Yuanhan Liu	d7d48f6216	kernel/nsproxy.c: remove duplicate task_cred_xxx for user_ns We can use user_ns, which is also assigned from task_cred_xxx(tsk, user_ns), at the beginning of copy_namespaces(). Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-21 17:22:26 -08:00
Andrew Morton	f3cbd435b0	sys_prctl(): coding-style cleanup Remove a tabstop from the switch statement, in the usual fashion. A few instances of weirdwrapping were removed as a result. Cc: Chen Gang <gang.chen@asianux.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-21 17:22:20 -08:00
Chen Gang	7fe5e04292	sys_prctl(): arg2 is unsigned long which is never < 0 arg2 will never < 0, for its type is 'unsigned long' Also, use the provided macros. Signed-off-by: Chen Gang <gang.chen@asianux.com> Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-21 17:22:20 -08:00
Shaohua Li	9a46ad6d6d	smp: make smp_call_function_many() use logic similar to smp_call_function_single() I'm testing swapout workload in a two-socket Xeon machine. The workload has 10 threads, each thread sequentially accesses separate memory region. TLB flush overhead is very big in the workload. For each page, page reclaim need move it from active lru list and then unmap it. Both need a TLB flush. And this is a multthread workload, TLB flush happens in 10 CPUs. In X86, TLB flush uses generic smp_call)function. So this workload stress smp_call_function_many heavily. Without patch, perf shows: + 24.49% [k] generic_smp_call_function_interrupt - 21.72% [k] _raw_spin_lock - _raw_spin_lock + 79.80% __page_check_address + 6.42% generic_smp_call_function_interrupt + 3.31% get_swap_page + 2.37% free_pcppages_bulk + 1.75% handle_pte_fault + 1.54% put_super + 1.41% grab_super_passive + 1.36% __swap_duplicate + 0.68% blk_flush_plug_list + 0.62% swap_info_get + 6.55% [k] flush_tlb_func + 6.46% [k] smp_call_function_many + 5.09% [k] call_function_interrupt + 4.75% [k] default_send_IPI_mask_sequence_phys + 2.18% [k] find_next_bit swapout throughput is around 1300M/s. With the patch, perf shows: - 27.23% [k] _raw_spin_lock - _raw_spin_lock + 80.53% __page_check_address + 8.39% generic_smp_call_function_single_interrupt + 2.44% get_swap_page + 1.76% free_pcppages_bulk + 1.40% handle_pte_fault + 1.15% __swap_duplicate + 1.05% put_super + 0.98% grab_super_passive + 0.86% blk_flush_plug_list + 0.57% swap_info_get + 8.25% [k] default_send_IPI_mask_sequence_phys + 7.55% [k] call_function_interrupt + 7.47% [k] smp_call_function_many + 7.25% [k] flush_tlb_func + 3.81% [k] _raw_spin_lock_irqsave + 3.78% [k] generic_smp_call_function_single_interrupt swapout throughput is around 1400M/s. So there is around a 7% improvement, and total cpu utilization doesn't change. Without the patch, cfd_data is shared by all CPUs. generic_smp_call_function_interrupt does read/write cfd_data several times which will create a lot of cache ping-pong. With the patch, the data becomes per-cpu. The ping-pong is avoided. And from the perf data, this doesn't make call_single_queue lock contend. Next step is to remove generic_smp_call_function_interrupt() from arch code. Signed-off-by: Shaohua Li <shli@fusionio.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-21 17:22:20 -08:00
Greg Kroah-Hartman	af3b56289b	time: don't inline EXPORT_SYMBOL functions How is the compiler even handling exported functions that are marked inline? Anyway, these shouldn't be inline because of that, so remove that marking. Based on a larger patch by Mark Charlebois to get LLVM to build the kernel. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mark Charlebois <mcharleb@qualcomm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: hank <pyu@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-21 17:22:19 -08:00
Dan Carpenter	bffea77c08	compat: return -EFAULT on error in waitid() The copy_to_user() call returns the number of bytes remaining but we want to return -EFAULT on error. Fixes "x32: fix waitid()" Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-21 17:22:16 -08:00
Frederic Weisbecker	4d4c4e24cf	irq: Remove IRQ_EXIT_OFFSET workaround The IRQ_EXIT_OFFSET trick was used to make sure the irq doesn't get preempted after we substract the HARDIRQ_OFFSET until we are entirely done with any code in irq_exit(). This workaround was necessary because some archs may call irq_exit() with irqs enabled and there is still some code in the end of this function that is not covered by the HARDIRQ_OFFSET but want to stay non-preemptible. Now that irq are always disabled in irq_exit(), the whole code is guaranteed not to be preempted. We can thus remove this hack. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-02-22 00:05:07 +01:00
Linus Torvalds	b274776c54	arm-soc: cleanups A large number of cleanups, all over the platforms. This is dominated largely by the Samsung platforms (s3c, s5p, exynos) and a few of the others moving code out of arch/arm into more appropriate subsystems. The clocksource and irqchip drivers are now abstracted to the point where platforms that are already cleaned up do not need to even specify the driver they use, it can all get configured from the device tree as we do for normal device drivers. The clocksource changes basically touch every single platform in the process. We further clean up the use of platform specific header files here, with the goal of turning more of the platforms over to being "multiplatform" enabled, which implies that they cannot expose their headers to architecture independent code any more. It is expected that no functional changes are part of the cleanup. The overall reduction in total code lines is mostly the result of removing broken and obsolete code. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIVAwUAUSUyKmCrR//JCVInAQIN8RAAnb/uPytmlMjn5yCksF4Mvb/FVbn/TVwz KRIGpCHOzyKK1q7pM8NRUVWfjW2SZqbXJFqx6zBGKSlDPvFTOhsLyyupU+Tnyu5W IX4eIUBwb+a6H7XDHw0X2YI8uHzi5RNLhne0A1QyDKcnuHs1LDAttXnJHaK4Ap6Y NN2YFt3l3ld7DXWXJtMsw5v8lC10aeIFGTvXefaPDAdeMLivmI57qEUMDXknNr7W Odz/Rc0/cw3BNBVl/zNHA0jw7FOjKAymCYYNUa4xDCJEr+JnIRTqizd0N/YIIC7x aA2xjJ3oKUFyF51yiJE6nFuTyJznhwtehc+uiMOSIkjrPLym52LEHmd7G5Yqlmjz oiei09qBb870q3lGxwfht9iaeIwYgQFYGfD0yW5QWArCO5pxhtCPLPH7YZNZtcQd ZJRSGGqT/ljBz3bm0K9OLESeeTTN7+Nxvtpiz/CD+Piegz0gWJzDYJRTzkJ3UWpA WTVhVQdWUeX2JrNkgM7Z3Tu8iXOe+LIEs7kVXGJZSREmIIZiRvR36UrODZtAkp9I 7YQ+srX/uaR832pgK0RrHK0zY0psU6MmIvhYxJZFbx7keiPA9eH6drb0x7tGqcUD FzEUzvcZvyqppndfBi+R60H/YKAhJDEXdwxzo6dyCpPQaW1T9GnzIqXuE1zin+Aw X7Y8YywMbHI= =DvgJ -----END PGP SIGNATURE----- Merge tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC cleanups from Arnd Bergmann: "A large number of cleanups, all over the platforms. This is dominated largely by the Samsung platforms (s3c, s5p, exynos) and a few of the others moving code out of arch/arm into more appropriate subsystems. The clocksource and irqchip drivers are now abstracted to the point where platforms that are already cleaned up do not need to even specify the driver they use, it can all get configured from the device tree as we do for normal device drivers. The clocksource changes basically touch every single platform in the process. We further clean up the use of platform specific header files here, with the goal of turning more of the platforms over to being "multiplatform" enabled, which implies that they cannot expose their headers to architecture independent code any more. It is expected that no functional changes are part of the cleanup. The overall reduction in total code lines is mostly the result of removing broken and obsolete code." * tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (133 commits) ARM: mvebu: correct gated clock documentation ARM: kirkwood: add missing include for nsa310 ARM: exynos: move exynos4210-combiner to drivers/irqchip mfd: db8500-prcmu: update resource passing drivers/db8500-cpufreq: delete dangling include ARM: at91: remove NEOCORE 926 board sunxi: Cleanup the reset code and add meaningful registers defines ARM: S3C24XX: header mach/regs-mem.h local ARM: S3C24XX: header mach/regs-power.h local ARM: S3C24XX: header mach/regs-s3c2412-mem.h local ARM: S3C24XX: Remove plat-s3c24xx directory in arch/arm/ ARM: S3C24XX: transform s3c2443 subirqs into new structure ARM: S3C24XX: modify s3c2443 irq init to initialize all irqs ARM: S3C24XX: move s3c2443 irq code to irq.c ARM: S3C24XX: transform s3c2416 irqs into new structure ARM: S3C24XX: modify s3c2416 irq init to initialize all irqs ARM: S3C24XX: move s3c2416 irq init to common irq code ARM: S3C24XX: Modify s3c_irq_wake to use the hwirq property ARM: S3C24XX: Move irq syscore-ops to irq-pm clocksource: always define CLOCKSOURCE_OF_DECLARE ...	2013-02-21 14:58:40 -08:00
Linus Torvalds	21eaab6d19	tty/serial patches for 3.9-rc1 Here's the big tty/serial driver patches for 3.9-rc1. More tty port rework and fixes from Jiri here, as well as lots of individual serial driver updates and fixes. All of these have been in the linux-next tree for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlEmZYQACgkQMUfUDdst+ylJDgCg0B0nMevUUdM4hLvxunbbiyXM HUEAoIOedqriNNPvX4Bwy0hjeOEaWx0g =vi6x -----END PGP SIGNATURE----- Merge tag 'tty-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial patches from Greg Kroah-Hartman: "Here's the big tty/serial driver patches for 3.9-rc1. More tty port rework and fixes from Jiri here, as well as lots of individual serial driver updates and fixes. All of these have been in the linux-next tree for a while." * tag 'tty-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (140 commits) tty: mxser: improve error handling in mxser_probe() and mxser_module_init() serial: imx: fix uninitialized variable warning serial: tegra: assume CONFIG_OF TTY: do not update atime/mtime on read/write lguest: select CONFIG_TTY to build properly. ARM defconfigs: add missing inclusions of linux/platform_device.h fb/exynos: include platform_device.h ARM: sa1100/assabet: include platform_device.h directly serial: imx: Fix recursive locking bug pps: Fix build breakage from decoupling pps from tty tty: Remove ancient hardpps() pps: Additional cleanups in uart_handle_dcd_change pps: Move timestamp read into PPS code proper pps: Don't crash the machine when exiting will do pps: Fix a use-after free bug when unregistering a source. pps: Use pps_lookup_dev to reduce ldisc coupling pps: Add pps_lookup_dev() function tty: serial: uartlite: Support uartlite on big and little endian systems tty: serial: uartlite: Fix sparse and checkpatch warnings serial/arc-uart: Miscll DT related updates (Grant's review comments) ... Fix up trivial conflicts, mostly just due to the TTY config option clashing with the EXPERIMENTAL removal.	2013-02-21 13:41:04 -08:00
Linus Torvalds	06991c28f3	Driver core patches for 3.9-rc1 Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL If you need me to provide a merged tree to handle these resolutions, please let me know. Other than those patches, there's not much here, some minor fixes and updates. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9 weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW =yWAQ -----END PGP SIGNATURE----- Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg Kroah-Hartman: "Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL Other than those patches, there's not much here, some minor fixes and updates" Fix up trivial conflicts * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits) base: memory: fix soft/hard_offline_page permissions drivercore: Fix ordering between deferred_probe and exiting initcalls backlight: fix class_find_device() arguments TTY: mark tty_get_device call with the proper const values driver-core: constify data for class_find_device() firmware: Ignore abort check when no user-helper is used firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER firmware: Make user-mode helper optional firmware: Refactoring for splitting user-mode helper code Driver core: treat unregistered bus_types as having no devices watchdog: Convert to devm_ioremap_resource() thermal: Convert to devm_ioremap_resource() spi: Convert to devm_ioremap_resource() power: Convert to devm_ioremap_resource() mtd: Convert to devm_ioremap_resource() mmc: Convert to devm_ioremap_resource() mfd: Convert to devm_ioremap_resource() media: Convert to devm_ioremap_resource() iommu: Convert to devm_ioremap_resource() drm: Convert to devm_ioremap_resource() ...	2013-02-21 12:05:51 -08:00
Thomas Gleixner	af7bdbafe3	Revert "nohz: Make tick_nohz_irq_exit() irq safe" This reverts commit 351429b2e62b6545bb10c756686393f29ba268a1. The extra local_irq_save() is not longer needed as the call site now always calls with interrupts disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linuxfoundation.org>	2013-02-21 20:52:34 +01:00
Thomas Gleixner	facd8b80c6	irq: Sanitize invoke_softirq With the irq protection in irq_exit, we can remove the #ifdeffery and the bh_disable/enable dance in invoke_softirq() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos	2013-02-21 20:52:24 +01:00
Thomas Gleixner	74eed0163d	irq: Ensure irq_exit() code runs with interrupts disabled We had already a few problems with code called from irq_exit() when interrupted from a nesting interrupt. This can happen on architectures which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED. __ARCH_IRQ_EXIT_IRQS_DISABLED should go away and we want to make it mandatory to call irq_exit() with interrupts disabled. As a temporary protection disable interrupts for those architectures which do not define __ARCH_IRQ_EXIT_IRQS_DISABLED and add a WARN_ONCE when an architecture which defines __ARCH_IRQ_EXIT_IRQS_DISABLED calls irq_exit() with interrupts enabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302202155320.22263@ionos	2013-02-21 20:52:24 +01:00
Frederic Weisbecker	e5ab012c32	nohz: Make tick_nohz_irq_exit() irq safe As it stands, irq_exit() may or may not be called with irqs disabled, depending on __ARCH_IRQ_EXIT_IRQS_DISABLED that the arch can define. It makes tick_nohz_irq_exit() unsafe. For example two interrupts can race in tick_nohz_stop_sched_tick(): the inner most one computes the expiring time on top of the timer list, then it's interrupted right before reprogramming the clock. The new interrupt enqueues a new timer list timer, it reprogram the clock to take it into account and it exits. The CPUs resumes the inner most interrupt and performs the clock reprogramming without considering the new timer list timer. This regression has been introduced by: `280f06774a` ("nohz: Separate out irq exit and idle loop dyntick logic") Let's fix it right now with the appropriate protections. A saner long term solution will be to remove __ARCH_IRQ_EXIT_IRQS_DISABLED and mandate that irq_exit() is called with interrupts disabled. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linuxfoundation.org> Cc: <stable@vger.kernel.org> #v3.2+ Link: http://lkml.kernel.org/r/1361373336-11337-1-git-send-email-fweisbec@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-02-21 20:52:24 +01:00
Tejun Heo	e182bb38d7	posix-timer: Don't call idr_find() with out-of-range ID When idr_find() was fed a negative ID, it used to look up the ID ignoring the sign bit before recent ("idr: remove MAX_IDR_MASK and move left MAX_IDR_* into idr.c") patch. Now a negative ID triggers a WARN_ON_ONCE(). __lock_timer() feeds timer_id from userland directly to idr_find() without sanitizing it which can trigger the above malfunctions. Add a range check on @timer_id before invoking idr_find() in __lock_timer(). While timer_t is defined as int by all archs at the moment, Andrew worries that it may be defined as a larger type later on. Make the test cover larger integers too so that it at least is guaranteed to not return the wrong timer. Note that WARN_ON_ONCE() in idr_find() on id < 0 is transitional precaution while moving away from ignoring MSB. Once it's gone we can remove the guard as long as timer_t isn't larger than int. Signed-off-by: Tejun Heo <tj@kernel.org>nnn Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20130220232412.GL3570@htj.dyndns.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-02-21 17:28:29 +01:00
Linus Torvalds	a0b1c42951	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking update from David Miller: 1) Checkpoint/restarted TCP sockets now can properly propagate the TCP timestamp offset. From Andrey Vagin. 2) VMWARE VM VSOCK layer, from Andy King. 3) Much improved support for virtual functions and SR-IOV in bnx2x, from Ariel ELior. 4) All protocols on ipv4 and ipv6 are now network namespace aware, and all the compatability checks for initial-namespace-only protocols is removed. Thanks to Tom Parkin for helping deal with the last major holdout, L2TP. 5) IPV6 support in netpoll and network namespace support in pktgen, from Cong Wang. 6) Multiple Registration Protocol (MRP) and Multiple VLAN Registration Protocol (MVRP) support, from David Ward. 7) Compute packet lengths more accurately in the packet scheduler, from Eric Dumazet. 8) Use per-task page fragment allocator in skb_append_datato_frags(), also from Eric Dumazet. 9) Add support for connection tracking labels in netfilter, from Florian Westphal. 10) Fix default multicast group joining on ipv6, and add anti-spoofing checks to 6to4 and 6rd. From Hannes Frederic Sowa. 11) Make ipv4/ipv6 fragmentation memory limits more reasonable in modern times, rearrange inet frag datastructures for better cacheline locality, and move more operations outside of locking. From Jesper Dangaard Brouer. 12) Instead of strict master <--> slave relationships, allow arbitrary scenerios with "upper device lists". From Jiri Pirko. 13) Improve rate limiting accuracy in TBF and act_police, also from Jiri Pirko. 14) Add a BPF filter netfilter match target, from Willem de Bruijn. 15) Orphan and delete a bunch of pre-historic networking drivers from Paul Gortmaker. 16) Add TSO support for GRE tunnels, from Pravin B SHelar. Although this still needs some minor bug fixing before it's %100 correct in all cases. 17) Handle unresolved IPSEC states like ARP, with a resolution packet queue. From Steffen Klassert. 18) Remove TCP Appropriate Byte Count support (ABC), from Stephen Hemminger. This was long overdue. 19) Support SO_REUSEPORT, from Tom Herbert. 20) Allow locking a socket BPF filter, so that it cannot change after a process drops capabilities. 21) Add VLAN filtering to bridge, from Vlad Yasevich. 22) Bring ipv6 on-par with ipv4 and do not cache neighbour entries in the ipv6 routes, from YOSHIFUJI Hideaki. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1538 commits) ipv6: fix race condition regarding dst->expires and dst->from. net: fix a wrong assignment in skb_split() ip_gre: remove an extra dst_release() ppp: set qdisc_tx_busylock to avoid LOCKDEP splat atl1c: restore buffer state net: fix a build failure when !CONFIG_PROC_FS net: ipv4: fix waring -Wunused-variable net: proc: fix build failed when procfs is not configured Revert "xen: netback: remove redundant xenvif_put" net: move procfs code to net/core/net-procfs.c qmi_wwan, cdc-ether: add ADU960S bonding: set sysfs device_type to 'bond' bonding: fix bond_release_all inconsistencies b44: use netdev_alloc_skb_ip_align() xen: netback: remove redundant xenvif_put net: fec: Do a sanity check on the gpio number ip_gre: propogate target device GSO capability to the tunnel device ip_gre: allow CSUM capable devices to handle packets bonding: Fix initialize after use for 3ad machine state spinlock bonding: Fix race condition between bond_enslave() and bond_3ad_update_lacp_rate() ...	2013-02-20 18:58:50 -08:00
Linus Torvalds	8793422fd9	ACPI and power management updates for 3.9-rc1 - Rework of the ACPI namespace scanning code from Rafael J. Wysocki with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg, Toshi Kani, and Yinghai Lu. - ACPI power resources handling and ACPI device PM update from Rafael J. Wysocki. - ACPICA update to version 20130117 from Bob Moore and Lv Zheng with contributions from Aaron Lu, Chao Guan, Jesper Juhl, and Tim Gardner. - Support for Intel Lynxpoint LPSS from Mika Westerberg. - cpuidle update from Len Brown including Intel Haswell support, C1 state for intel_idle, removal of global pm_idle. - cpuidle fixes and cleanups from Daniel Lezcano. - cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri with contributions from Stratos Karafotis and Rickard Andersson. - Intel P-states driver for Sandy Bridge processors from Dirk Brandewie. - cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn. - cpufreq fixes related to ordering issues between acpi-cpufreq and powernow-k8 from Borislav Petkov and Matthew Garrett. - cpufreq support for Calxeda Highbank processors from Mark Langsdorf and Rob Herring. - cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update from Shawn Guo. - cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat, and Inderpal Singh. - Support for "lightweight suspend" from Zhang Rui. - Removal of the deprecated power trace API from Paul Gortmaker. - Assorted updates from Andreas Fleig, Colin Ian King, Davidlohr Bueso, Joseph Salisbury, Kees Cook, Li Fei, Nishanth Menon, ShuoX Liu, Srinivas Pandruvada, Tejun Heo, Thomas Renninger, and Yasuaki Ishimatsu. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJRIsArAAoJEKhOf7ml8uNsD6MP/j7C4NA+GTq6RdwoJt+Yki0K 9Ep8I4pEuRFoN/oskv24EyQhpGJIk6UxWcJ/DWFBc+1VhmKORta7k2Idv/wlJA77 s7AcDveA9xcDh+TVfbh87TeuiMSXiSdDZbiaQO+wMizWJAF3F84AnjiAqqqyQcSK bA5/Siz/vWlt9PyYDaQtHTVE4lpvPuVcQdYewsdaH2PsmUjvIg/TUzg28CTrdyvv eHOdBK9R0/OLQLhzRbL0VOGJ//wEl+HJRO0QEhTKPgdQ1e/VH/4Zu5WSzF8P/x4C s2f8U4IKQqulDuDHXtpMpelFm7hRWgsOqZLkcyXLs+0dvSM9CTPO6P0ZaImxUctk 5daHWEsXUnCErDQawt1mcZP8l6qnxofMQIfLXyPVzvlSnHyToTmrtXa1v2u4AuL/ hOo4MYWsFNUmRdtGFFGlExGgEDZ4G5NwiYjRBl/6XJ3v4nhnnMbuzxP8scpoe5m1 8tjroJHZFUUs/mFU/H+oRbHzSzXPmp1sddNaTg4OpVmTn3DDh6ljnFhiItd1Ndw0 5ldVbSe6ETq5RoK0TbzvQOeVpa9F3JfqbrXLQPqfd2iz/No41LQYG1uShRYuXKuA wfEcc+c9VMd3FILu05pGwBnU8VS9VbxTYMz7xDxg6b29Ywnb7u+Q1ycCk2gFYtkS E2oZDuyewTJxaskzYsNr =wijn -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: - Rework of the ACPI namespace scanning code from Rafael J. Wysocki with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg, Toshi Kani, and Yinghai Lu. - ACPI power resources handling and ACPI device PM update from Rafael J Wysocki. - ACPICA update to version 20130117 from Bob Moore and Lv Zheng with contributions from Aaron Lu, Chao Guan, Jesper Juhl, and Tim Gardner. - Support for Intel Lynxpoint LPSS from Mika Westerberg. - cpuidle update from Len Brown including Intel Haswell support, C1 state for intel_idle, removal of global pm_idle. - cpuidle fixes and cleanups from Daniel Lezcano. - cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri with contributions from Stratos Karafotis and Rickard Andersson. - Intel P-states driver for Sandy Bridge processors from Dirk Brandewie. - cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn. - cpufreq fixes related to ordering issues between acpi-cpufreq and powernow-k8 from Borislav Petkov and Matthew Garrett. - cpufreq support for Calxeda Highbank processors from Mark Langsdorf and Rob Herring. - cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update from Shawn Guo. - cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat, and Inderpal Singh. - Support for "lightweight suspend" from Zhang Rui. - Removal of the deprecated power trace API from Paul Gortmaker. - Assorted updates from Andreas Fleig, Colin Ian King, Davidlohr Bueso, Joseph Salisbury, Kees Cook, Li Fei, Nishanth Menon, ShuoX Liu, Srinivas Pandruvada, Tejun Heo, Thomas Renninger, and Yasuaki Ishimatsu. * tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (267 commits) PM idle: remove global declaration of pm_idle unicore32 idle: delete stray pm_idle comment openrisc idle: delete pm_idle mn10300 idle: delete pm_idle microblaze idle: delete pm_idle m32r idle: delete pm_idle, and other dead idle code ia64 idle: delete pm_idle cris idle: delete idle and pm_idle ARM64 idle: delete pm_idle ARM idle: delete pm_idle blackfin idle: delete pm_idle sparc idle: rename pm_idle to sparc_idle sh idle: rename global pm_idle to static sh_idle x86 idle: rename global pm_idle to static x86_idle APM idle: register apm_cpu_idle via cpuidle cpufreq / intel_pstate: Add kernel command line option disable intel_pstate. cpufreq / intel_pstate: Change to disallow module build tools/power turbostat: display SMI count by default intel_idle: export both C1 and C1E ACPI / hotplug: Fix concurrency issues and memory leaks ...	2013-02-20 11:26:56 -08:00
Linus Torvalds	9ae46e6702	Merge branch 'for-3.9-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cpuset changes from Tejun Heo: - Synchornization has seen a lot of changes with focus on decoupling cpuset synchronization from cgroup internal locking. After this change, there only remain a couple of mostly trivial dependencies on cgroup_lock outside cgroup core proper. cgroup_lock is scheduled to be unexported in this devel cycle. This will finally remove the fragile locking order around cgroup (cgroup locking wants to / should be one of the outermost but yet has been acquired from deep inside individual controllers). - At this point, Li is most knowlegeable with cpuset and taking over the maintainership of cpuset. * 'for-3.9-cpuset' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cpuset: drop spurious retval assignment in proc_cpuset_show() cpuset: fix RCU lockdep splat cpuset: update MAINTAINERS cpuset: remove cpuset->parent cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre() cpuset: replace cgroup_mutex locking with cpuset internal locking cpuset: schedule hotplug propagation from cpuset_attach() if the cpuset is empty cpuset: pin down cpus and mems while a task is being attached cpuset: make CPU / memory hotplug propagation asynchronous cpuset: drop async_rebuild_sched_domains() cpuset: don't nest cgroup_mutex inside get_online_cpus() cpuset: reorganize CPU / memory hotplug handling cpuset: cleanup cpuset[_can]_attach() cpuset: introduce cpuset_for_each_child() cpuset: introduce CS_ONLINE cpuset: introduce ->css_on/offline() cpuset: remove fast exit path from remove_tasks_in_empty_cpuset() cpuset: remove unused cpuset_unlock()	2013-02-20 09:18:31 -08:00
Linus Torvalds	502b24c23b	Merge branch 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup changes from Tejun Heo: "Nothing too drastic. - Removal of synchronize_rcu() from userland visible paths. - Various fixes and cleanups from Li. - cgroup_rightmost_descendant() added which will be used by cpuset changes (it will be a separate pull request)." * 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fail if monitored file and event_control are in different cgroup cgroup: fix cgroup_rmdir() vs close(eventfd) race cpuset: fix cpuset_print_task_mems_allowed() vs rename() race cgroup: fix exit() vs rmdir() race cgroup: remove bogus comments in cgroup_diput() cgroup: remove synchronize_rcu() from cgroup_diput() cgroup: remove duplicate RCU free on struct cgroup sched: remove redundant NULL cgroup check in task_group_path() sched: split out css_online/css_offline from tg creation/destruction cgroup: initialize cgrp->dentry before css_alloc() cgroup: remove a NULL check in cgroup_exit() cgroup: fix bogus kernel warnings when cgroup_create() failed cgroup: remove synchronize_rcu() from rebind_subsystems() cgroup: remove synchronize_rcu() from cgroup_attach_{task\|proc}() cgroup: use new hashtable implementation cgroups: fix cgroup_event_listener error handling cgroups: move cgroup_event_listener.c to tools/cgroup cgroup: implement cgroup_rightmost_descendant() cgroup: remove unused dummy cgroup_fork_callbacks()	2013-02-20 09:16:21 -08:00
Sha Zhengju	1c3e826482	sched/core: Remove the obsolete and unused nr_uninterruptible() function Signed-off-by: Sha Zhengju <handai.szj@taobao.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1361351678-8065-1-git-send-email-handai.szj@taobao.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-20 11:39:24 +01:00
Ingo Molnar	ff1fb5f6b4	Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent Pull two fixes from Steven Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-20 11:26:21 +01:00
Linus Torvalds	ece8e0b2f9	Merge branch 'for-3.9-async' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull async changes from Tejun Heo: "These are followups for the earlier deadlock issue involving async ending up waiting for itself through block requesting module[1]. The following changes are made by these commits. - Instead of requesting default elevator on each request_queue init, block now requests it once early during boot. - Kmod triggers warning if invoked from an async worker. - Async synchronization implementation has been reimplemented. It's a lot simpler now." * 'for-3.9-async' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: async: initialise list heads to fix crash async: replace list of active domains with global list of pending items async: keep pending tasks on async_domain and remove async_pending async: use ULLONG_MAX for infinity cookie value async: bring sanity to the use of words domain and running async, kmod: warn on synchronous request_module() from async workers block: don't request module during elevator init init, block: try to load default elevator module early during boot	2013-02-19 22:10:26 -08:00
Linus Torvalds	67cb104b4c	Merge branch 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "A lot of reorganization is going on mostly to prepare for worker pools with custom attributes so that workqueue can replace custom pool implementations in places including writeback and btrfs and make CPU assignment in crypto more flexible. workqueue evolved from purely per-cpu design and implementation, so there are a lot of assumptions regarding being bound to CPUs and even unbound workqueues are implemented as an extension of the model - workqueues running on the special unbound CPU. Bulk of changes this round are about promoting worker_pools as the top level abstraction replacing global_cwq (global cpu workqueue). At this point, I'm fairly confident about getting custom worker pools working pretty soon and ready for the next merge window. Lai's patches are replacing the convoluted mb() dancing workqueue has been doing with much simpler mechanism which only depends on assignment atomicity of long. For details, please read the commit message of `0b3dae68ac` ("workqueue: simplify is-work-item-queued-here test"). While the change ends up adding one pointer to struct delayed_work, the inflation in percentage is less than five percent and it decouples delayed_work logic a lot more cleaner from usual work handling, removes the unusual memory barrier dancing, and allows for further simplification, so I think the trade-off is acceptable. There will be two more workqueue related pull requests and there are some shared commits among them. I'll write further pull requests assuming this pull request is pulled first." * 'for-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (37 commits) workqueue: un-GPL function delayed_work_timer_fn() workqueue: rename cpu_workqueue to pool_workqueue workqueue: reimplement is_chained_work() using current_wq_worker() workqueue: fix is_chained_work() regression workqueue: pick cwq instead of pool in __queue_work() workqueue: make get_work_pool_id() cheaper workqueue: move nr_running into worker_pool workqueue: cosmetic update in try_to_grab_pending() workqueue: simplify is-work-item-queued-here test workqueue: make work->data point to pool after try_to_grab_pending() workqueue: add delayed_work->wq to simplify reentrancy handling workqueue: make work_busy() test WORK_STRUCT_PENDING first workqueue: replace WORK_CPU_NONE/LAST with WORK_CPU_END workqueue: post global_cwq removal cleanups workqueue: rename nr_running variables workqueue: remove global_cwq workqueue: remove worker_pool->gcwq workqueue: replace for_each_worker_pool() with for_each_std_worker_pool() workqueue: make freezing/thawing per-pool workqueue: make hotplug processing per-pool ...	2013-02-19 22:01:33 -08:00
Linus Torvalds	1eaec8212e	Merge branch 'for-3.9-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue [delayed_]work_pending() cleanups from Tejun Heo: "This is part of on-going cleanups to remove / minimize usages of workqueue interfaces which are deprecated and/or misleading. This round drops a number of usages of [delayed_]work_pending(), which are dangerous as they lack any form of synchronization and thus often lead to buggy / unnecessary code. There are a couple legitimate use cases in kernel. Hopefully, they can be converted and [delayed_]work_pending() can be removed completely. Even if not, removing most of misuses should make it more difficult to find examples of misuses and thus slow down growth of them. These changes are independent from other workqueue changes." * 'for-3.9-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: wimax/i2400m: fix i2400m->wake_tx_skb handling kprobes: fix wait_for_kprobe_optimizer() ipw2x00: simplify scan_event handling video/exynos: don't use [delayed_]work_pending() tty/max3100: don't use [delayed_]work_pending() x86/mce: don't use [delayed_]work_pending() rfkill: don't use [delayed_]work_pending() wl1251: don't use [delayed_]work_pending() thinkpad_acpi: don't use [delayed_]work_pending() mwifiex: don't use [delayed_]work_pending() sja1000: don't use [delayed_]work_pending()	2013-02-19 21:58:52 -08:00
Linus Torvalds	121027a7a6	Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull two x86 kernel build changes from Ingo Molnar: "The first change modifies how 'make oldconfig' works on cross-bitness situations on x86. It was felt the new behavior of preserving the bitness of the .config is more logical. This is a leftover of the merge. The second change eliminates a Perl warning. (There's another, more complete fix resulting of this warning fix, which second fix in flight to you via the kbuild tree, which will remove the timeconst.pl script altogether.)" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timeconst.pl: Eliminate Perl warning x86: Default to ARCH=x86 to avoid overriding CONFIG_64BIT	2013-02-19 19:12:03 -08:00
Linus Torvalds	5800700f66	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/apic changes from Ingo Molnar: "Main changes: - Multiple MSI support added to the APIC, PCI and AHCI code - acked by all relevant maintainers, by Alexander Gordeev. The advantage is that multiple AHCI ports can have multiple MSI irqs assigned, and can thus spread to multiple CPUs. [ Drivers can make use of this new facility via the pci_enable_msi_block_auto() method ] - x86 IOAPIC code from interrupt remapping cleanups from Joerg Roedel: These patches move all interrupt remapping specific checks out of the x86 core code and replaces the respective call-sites with function pointers. As a result the interrupt remapping code is better abstraced from x86 core interrupt handling code. - Various smaller improvements, fixes and cleanups." * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) x86/intel/irq_remapping: Clean up x2apic opt-out security warning mess x86, kvm: Fix intialization warnings in kvm.c x86, irq: Move irq_remapped out of x86 core code x86, io_apic: Introduce eoi_ioapic_pin call-back x86, msi: Introduce x86_msi.compose_msi_msg call-back x86, irq: Introduce setup_remapped_irq() x86, irq: Move irq_remapped() check into free_remapped_irq x86, io-apic: Remove !irq_remapped() check from __target_IO_APIC_irq() x86, io-apic: Move CONFIG_IRQ_REMAP code out of x86 core x86, irq: Add data structure to keep AMD specific irq remapping information x86, irq: Move irq_remapping_enabled declaration to iommu code x86, io_apic: Remove irq_remapping_enabled check in setup_timer_IRQ0_pin x86, io_apic: Move irq_remapping_enabled checks out of check_timer() x86, io_apic: Convert setup_ioapic_entry to function pointer x86, io_apic: Introduce set_affinity function pointer x86, msi: Use IRQ remapping specific setup_msi_irqs routine x86, hpet: Introduce x86_msi_ops.setup_hpet_msi x86, io_apic: Introduce x86_io_apic_ops.print_entries for debugging x86, io_apic: Introduce x86_io_apic_ops.disable() x86, apic: Mask IO-APIC and PIC unconditionally on LAPIC resume ...	2013-02-19 19:07:27 -08:00
Linus Torvalds	266d7ad7f4	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer changes from Ingo Molnar: "Main changes: - ntp: Add CONFIG_RTC_SYSTOHC: a generic RTC driver facility complementing the existing CONFIG_RTC_HCTOSYS, which uses NTP to keep the hardware clock updated. - posix-timers: Fix clock_adjtime to always return timex data on success. This is changing the ABI, but no breakage was expected and found - caution is warranted nevertheless. - platform persistent clock improvements/cleanups. - clockevents: refactor timer broadcast handling to be more generic and less duplicated with matching architecture code (mostly ARM motivated.) - various fixes and cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/x86/hpet: Use HPET_COUNTER to specify the hpet counter in vread_hpet() posix-cpu-timers: Fix nanosleep task_struct leak clockevents: Fix generic broadcast for FEAT_C3STOP time, Fix setting of hardware clock in NTP code hrtimer: Prevent hrtimer_enqueue_reprogram race clockevents: Add generic timer broadcast function clockevents: Add generic timer broadcast receiver timekeeping: Switch HAS_PERSISTENT_CLOCK to ALWAYS_USE_PERSISTENT_CLOCK x86/time/rtc: Don't print extended CMOS year when reading RTC x86: Select HAS_PERSISTENT_CLOCK on x86 timekeeping: Add CONFIG_HAS_PERSISTENT_CLOCK option rtc: Skip the suspend/resume handling if persistent clock exist timekeeping: Add persistent_clock_exist flag posix-timers: Fix clock_adjtime to always return timex data on success Round the calculated scale factor in set_cyc2ns_scale() NTP: Add a CONFIG_RTC_SYSTOHC configuration MAINTAINERS: Update John Stultz's email time: create __getnstimeofday for WARNless calls	2013-02-19 19:05:45 -08:00
Linus Torvalds	bcbd818c06	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull preparatory smp/hotplug patches from Ingo Molnar: "Some early preparatory changes for the WIP hotplug rework by Thomas Gleixner." * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: stop_machine: Use smpboot threads stop_machine: Store task reference in a separate per cpu variable smpboot: Allow selfparking per cpu threads	2013-02-19 19:04:55 -08:00
Linus Torvalds	d652e1eb8e	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "Main changes: - scheduler side full-dynticks (user-space execution is undisturbed and receives no timer IRQs) preparation changes that convert the cputime accounting code to be full-dynticks ready, from Frederic Weisbecker. - Initial sched.h split-up changes, by Clark Williams - select_idle_sibling() performance improvement by Mike Galbraith: " 1 tbench pair (worst case) in a 10 core + SMT package: pre 15.22 MB/sec 1 procs post 252.01 MB/sec 1 procs " - sched_rr_get_interval() ABI fix/change. We think this detail is not used by apps (so it's not an ABI in practice), but lets keep it under observation. - misc RT scheduling cleanups, optimizations" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) sched/rt: Add <linux/sched/rt.h> header to <linux/init_task.h> cputime: Remove irqsave from seqlock readers sched, powerpc: Fix sched.h split-up build failure cputime: Restore CPU_ACCOUNTING config defaults for PPC64 sched/rt: Move rt specific bits into new header file sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice sched: Move sched.h sysctl bits into separate header sched: Fix signedness bug in yield_to() sched: Fix select_idle_sibling() bouncing cow syndrome sched/rt: Further simplify pick_rt_task() sched/rt: Do not account zero delta_exec in update_curr_rt() cputime: Safely read cputime of full dynticks CPUs kvm: Prepare to add generic guest entry/exit callbacks cputime: Use accessors to read task cputime stats cputime: Allow dynamic switch between tick/virtual based cputime accounting cputime: Generic on-demand virtual cputime accounting cputime: Move default nsecs_to_cputime() to jiffies based cputime file cputime: Librarize per nsecs resolution cputime definitions cputime: Avoid multiplication overflow on utime scaling context_tracking: Export context state for generic vtime ... Fix up conflict in kernel/context_tracking.c due to comment additions.	2013-02-19 18:19:48 -08:00
Linus Torvalds	8f55cea410	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "There are lots of improvements, the biggest changes are: Main kernel side changes: - Improve uprobes performance by adding 'pre-filtering' support, by Oleg Nesterov. - Make some POWER7 events available in sysfs, equivalent to what was done on x86, from Sukadev Bhattiprolu. - tracing updates by Steve Rostedt - mostly misc fixes and smaller improvements. - Use perf/event tracing to report PCI Express advanced errors, by Tony Luck. - Enable northbridge performance counters on AMD family 15h, by Jacob Shin. - This tracing commit: tracing: Remove the extra 4 bytes of padding in events changes the ABI. All involved parties (PowerTop in particular) seem to agree that it's safe to do now with the introduction of libtraceevent, but the devil is in the details ... Main tooling side changes: - Add 'event group view', from Namyung Kim: To use it, 'perf record' should group events when recording. And then perf report parses the saved group relation from file header and prints them together if --group option is provided. You can use the 'perf evlist' command to see event group information: $ perf record -e '{ref-cycles,cycles}' noploop 1 [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ] $ perf evlist --group {ref-cycles,cycles} With this example, default perf report will show you each event separately. You can use --group option to enable event group view: $ perf report --group ... # group: {ref-cycles,cycles} # ======== # Samples: 7K of event 'anon group { ref-cycles, cycles }' # Event count (approx.): 6876107743 # # Overhead Command Shared Object Symbol # ................ ....... ................. .......................... 99.84% 99.76% noploop noploop [.] main 0.07% 0.00% noploop ld-2.15.so [.] strcmp 0.03% 0.00% noploop [kernel.kallsyms] [k] timerqueue_del 0.03% 0.03% noploop [kernel.kallsyms] [k] sched_clock_cpu 0.02% 0.00% noploop [kernel.kallsyms] [k] account_user_time 0.01% 0.00% noploop [kernel.kallsyms] [k] __alloc_pages_nodemask 0.00% 0.00% noploop [kernel.kallsyms] [k] native_write_msr_safe 0.00% 0.11% noploop [kernel.kallsyms] [k] _raw_spin_lock 0.00% 0.06% noploop [kernel.kallsyms] [k] find_get_page 0.00% 0.02% noploop [kernel.kallsyms] [k] rcu_check_callbacks 0.00% 0.02% noploop [kernel.kallsyms] [k] __current_kernel_time As you can see the Overhead column now contains both of ref-cycles and cycles and header line shows group information also - 'anon group { ref-cycles, cycles }'. The output is sorted by period of group leader first. - Initial GTK+ annotate browser, from Namhyung Kim. - Add option for runtime switching perf data file in perf report, just press 's' and a menu with the valid files found in the current directory will be presented, from Feng Tang. - Add support to display whole group data for raw columns, from Jiri Olsa. - Add per processor socket count aggregation in perf stat, from Stephane Eranian. - Add interval printing in 'perf stat', from Stephane Eranian. - 'perf test' improvements - Add support for wildcards in tracepoint system name, from Jiri Olsa. - Add anonymous huge page recognition, from Joshua Zhu. - perf build-id cache now can show DSOs present in a perf.data file that are not in the cache, to integrate with build-id servers being put in place by organizations such as Fedora. - perf top now shares more of the evsel config/creation routines with 'record', paving the way for further integration like 'top' snapshots, etc. - perf top now supports DWARF callchains. - Fix mmap limitations on 32-bit, fix from David Miller. - 'perf bench numa mem' NUMA performance measurement suite - ... and lots of fixes, performance improvements, cleanups and other improvements I failed to list - see the shortlog and git log for details." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (270 commits) perf/x86/amd: Enable northbridge performance counters on AMD family 15h perf/hwbp: Fix cleanup in case of kzalloc failure perf tools: Fix build with bison 2.3 and older. perf tools: Limit unwind support to x86 archs perf annotate: Make it to be able to skip unannotatable symbols perf gtk/annotate: Fail early if it can't annotate perf gtk/annotate: Show source lines with gray color perf gtk/annotate: Support multiple event annotation perf ui/gtk: Implement basic GTK2 annotation browser perf annotate: Fix warning message on a missing vmlinux perf buildid-cache: Add --update option uprobes/perf: Avoid uprobe_apply() whenever possible uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE uprobes/perf: Teach trace_uprobe/perf code to pre-filter uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's uprobes: Introduce uprobe_apply() perf: Introduce hw_perf_event->tp_target and ->tp_list uprobes/perf: Always increment trace_uprobe->nhit uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe uprobes/tracing: Introduce is_trace_uprobe_enabled() ...	2013-02-19 17:49:41 -08:00
Linus Torvalds	b7133a9a10	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq core changes from Ingo Molnar: "The biggest changes are the IRQ-work and printk changes from Frederic Weisbecker, which prepare the code for 'full dynticks' (the ability to stop or slow down the periodic tick arbitrarily, not just in idle time as today): - Don't stop tick with irq works pending. This fix is generally useful and concerns archs that can't raise self IPIs. - Flush irq works before CPU offlining. - Introduce "lazy" irq works that can wait for the next tick to be executed, unless it's stopped. - Implement klogd wake up using irq work. This removes the ad-hoc printk_tick()/printk_needs_cpu() hooks and make it working even in dynticks mode. - Cleanups and fixes." * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Export enable/disable_percpu_irq() arch Kconfig: Remove references to IRQ_PER_CPU irq_work: Remove return value from the irq_work_queue() function genirq: Avoid deadlock in spurious handling printk: Wake up klogd using irq_work irq_work: Make self-IPIs optable irq_work: Warn if there's still work on cpu_down irq_work: Flush work on CPU_DYING irq_work: Don't stop the tick with pending works nohz: Add API to check tick state irq_work: Remove CONFIG_HAVE_IRQ_WORK irq_work: Fix racy check on work pending flag irq_work: Fix racy IRQ_WORK_BUSY flag setting	2013-02-19 17:47:58 -08:00
Linus Torvalds	e84cf5d0fd	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU changes from Ingo Molnar: "SRCU changes: - These include debugging aids, updates that move towards the goal of permitting srcu_read_lock() and srcu_read_unlock() to be used from idle and offline CPUs, and a few small fixes. Changes to rcutorture and to RCU documentation: - Posted to LKML at https://lkml.org/lkml/2013/1/26/188 Enhancements to uniprocessor handling in tiny RCU: - Posted to LKML at https://lkml.org/lkml/2013/1/27/2 Tag RCU callbacks with grace-period number to simplify callback advancement: - Posted to LKML at https://lkml.org/lkml/2013/1/26/203 Miscellaneous fixes: - Posted to LKML at https://lkml.org/lkml/2013/1/26/204" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) srcu: use ACCESS_ONCE() to access sp->completed in srcu_read_lock() srcu: Update synchronize_srcu_expedited()'s comments srcu: Update synchronize_srcu()'s comments srcu: Remove checks preventing idle CPUs from calling srcu_read_lock() srcu: Remove checks preventing offline CPUs from calling srcu_read_lock() srcu: Simple cleanup for cleanup_srcu_struct() srcu: Add might_sleep() annotation to synchronize_srcu() srcu: Simplify __srcu_read_unlock() via this_cpu_dec() rcu: Allow rcutorture to be built at low optimization levels rcu: Make rcutorture's shuffler task shuffle recently added tasks rcu: Allow TREE_PREEMPT_RCU on UP systems rcu: Provide RCU CPU stall warnings for tiny RCU context_tracking: Add comments on interface and internals rcu: Remove obsolete Kconfig option from comment rcu: Remove unused code originally used for context tracking rcu: Consolidate debugging Kconfig options rcu: Correct 'optimized' to 'optimize' in header comment rcu: Trace callback acceleration rcu: Tag callback lists with corresponding grace-period number rcutorture: Don't compare ptr with 0 ...	2013-02-19 17:45:20 -08:00
Konstantin Khlebnikov	1438ade567	workqueue: un-GPL function delayed_work_timer_fn() commit `d8e794dfd5` ("workqueue: set delayed_work->timer function on initialization") exports function delayed_work_timer_fn() only for GPL modules. This makes delayed-works unusable for non-GPL modules, because initialization macro now requires GPL symbol. For example schedule_delayed_work() available for non-GPL. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # 3.7	2013-02-19 10:09:13 -08:00
Thomas Gleixner	fe2b05f7ca	futex: Revert "futex: Mark get_robust_list as deprecated" This reverts commit `ec0c4274e3`. get_robust_list() is in use and a removal would break existing user space. With the permission checks in place it's not longer a security hole. Remove the deprecation warnings. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Richard Weinberger <richard@nod.at> Cc: akpm@linux-foundation.org Cc: paul.gortmaker@windriver.com Cc: davej@redhat.com Cc: keescook@chromium.org Cc: stable@vger.kernel.org Cc: ebiederm@xmission.com	2013-02-19 08:43:38 +01:00
Thomas Gleixner	a6c0c943a1	ntp: Make ntp_lock raw seconds_overflow() is called from hard interrupt context even on Preempt-RT. This requires the lock to be a raw_spinlock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-19 08:43:16 +01:00
Ben Greear	c054060683	lockdep: Print more info when MAX_LOCK_DEPTH is exceeded This helps debug cases where a lock is acquired over and over without being released. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Greear <greearb@candelatech.com> Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1360176979-4421-1-git-send-email-greearb@candelatech.com [ Changed the printout ordering. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-19 08:42:44 +01:00
Namhyung Kim	c06b4f1947	watchdog: Use local_clock for get_timestamp() The get_timestamp() function is always called with current cpu, thus using local_clock() would be more appropriate and it makes the code shorter and cleaner IMHO. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1356576585-28782-1-git-send-email-namhyung@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-19 08:42:40 +01:00
Srivatsa S. Bhat	f86f755482	lockdep: Rename print_unlock_inbalance_bug() to print_unlock_imbalance_bug() Fix the typo in the function name (s/inbalance/imbalance) Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: rostedt@goodmis.org Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/20130108130547.32733.79507.stgit@srivatsabhat.in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-19 08:42:39 +01:00
Thomas Gleixner	cdc4e86b58	cputime: Remove irqsave from seqlock readers The reader side code has no requirement to disable interrupts while sampling data. The sequence counter is enough to ensure consistency. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-02-19 08:05:53 +01:00
David S. Miller	6338a53a2b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net into net Pull in 'net' to take in the bug fixes that didn't make it into 3.8-final. Also, deal with the semantic conflict of the change made to net/ipv6/xfrm6_policy.c A missing rt6->n neighbour release was added to 'net', but in 'net-next' we no longer cache the neighbour entries in the ipv6 routes so that change is not appropriate there. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-18 23:34:21 -05:00
Steven Rostedt (Red Hat)	8c189ea64e	ftrace: Call ftrace cleanup module notifier after all other notifiers Commit: `c1bf08ac` "ftrace: Be first to run code modification on modules" changed ftrace module notifier's priority to INT_MAX in order to process the ftrace nops before anything else could touch them (namely kprobes). This was the correct thing to do. Unfortunately, the ftrace module notifier also contains the ftrace clean up code. As opposed to the set up code, this code should be run after all the module notifiers have run in case a module is doing correct clean-up and unregisters its ftrace hooks. Basically, ftrace needs to do clean up on module removal, as it needs to know about code being removed so that it doesn't try to modify that code. But after it removes the module from its records, if a ftrace user tries to remove a probe, that removal will fail due as the record of that code segment no longer exists. Nothing really bad happens if the probe removal is called after ftrace did the clean up, but the ftrace removal function will return an error. Correct code (such as kprobes) will produce a WARN_ON() if it fails to remove the probe. As people get annoyed by frivolous warnings, it's best to do the ftrace clean up after everything else. By splitting the ftrace_module_notifier into two notifiers, one that does the module load setup that is run at high priority, and the other that is called for module clean up that is run at low priority, the problem is solved. Cc: stable@vger.kernel.org Reported-by: Frank Ch. Eigler <fche@redhat.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-02-18 23:09:26 -05:00
Chris Metcalf	36a5df85e9	genirq: Export enable/disable_percpu_irq() These functions are used by the tilegx onchip network driver, and it's useful to be able to load that driver as a module. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Link: http://lkml.kernel.org/r/201302012043.r11KhNZF024371@farm-0021.internal.tilera.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-02-18 21:42:25 +01:00
Li Zefan	f169007b27	cgroup: fail if monitored file and event_control are in different cgroup If we pass fd of memory.usage_in_bytes of cgroup A to cgroup.event_control of cgroup B, then we won't get memory usage notification from A but B! What's worse, if A and B are in different mount hierarchy, we'll end up accessing NULL pointer! Disallow this kind of invalid usage. Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-02-18 09:31:35 -08:00
Li Zefan	810cbee4fa	cgroup: fix cgroup_rmdir() vs close(eventfd) race commit `205a872bd6` ("cgroup: fix lockdep warning for event_control") solved a deadlock by introducing a new bug. Move cgrp->event_list to a temporary list doesn't mean you can traverse this list locklessly, because at the same time cgroup_event_wake() can be called and remove the event from the list. The result of this race is disastrous. We adopt the way how kvm irqfd code implements race-free event removal, which is now described in the comments in cgroup_event_wake(). v3: - call eventfd_signal() no matter it's eventfd close or cgroup removal that removes the cgroup event. Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-02-18 09:17:24 -08:00
Li Zefan	63f43f55c9	cpuset: fix cpuset_print_task_mems_allowed() vs rename() race rename() will change dentry->d_name. The result of this race can be worse than seeing partially rewritten name, but we might access a stale pointer because rename() will re-allocate memory to hold a longer name. It's safe in the protection of dentry->d_lock. v2: check NULL dentry before acquiring dentry lock. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org	2013-02-18 09:08:15 -08:00
Li Zefan	71b5707e11	cgroup: fix exit() vs rmdir() race In cgroup_exit() put_css_set_taskexit() is called without any lock, which might lead to accessing a freed cgroup: thread1 thread2 --------------------------------------------- exit() cgroup_exit() put_css_set_taskexit() atomic_dec(cgrp->count); rmdir(); /* not safe !! */ check_for_release(cgrp); rcu_read_lock() can be used to make sure the cgroup is alive. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org	2013-02-18 09:08:10 -08:00
H. Peter Anvin	70730bca13	kernel: Replace timeconst.pl with a bc script bc is the standard tool for multi-precision arithmetic. We switched to Perl because akpm reported a hard-to-reproduce build hang, which was very odd because affected and unaffected machines were all running the same version of GNU bc. Unfortunately switching to Perl required a really ugly "canning" mechanism to support Perl < 5.8 installations lacking the Math::BigInt module. It was recently pointed out to me that some very old versions of GNU make had problems with pipes in subshells, which was indeed the construct used in the Makefile rules in that version of the patch; Perl didn't need it so switching to Perl fixed the problem for unrelated reasons. With the problem (hopefully) root-caused, we can switch back to bc and do the arbitrary-precision arithmetic naturally. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Michal Marek <mmarek@suse.cz>	2013-02-16 23:17:25 +01:00
Vineet Gupta	bf14e3b979	sysctl: Enable PARISC "unaligned-trap" to be used cross-arch PARISC defines /proc/sys/kernel/unaligned-trap to runtime toggle unaligned access emulation. The exact mechanics of enablig/disabling are still arch specific, we can make the sysctl usable by other arches. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com>	2013-02-15 23:16:05 +05:30
Vladimir Davydov	686855f5d8	sched: add wait_for_completion_io[_timeout] The only difference between wait_for_completion[_timeout]() and wait_for_completion_io[_timeout]() is that the latter calls io_schedule_timeout() instead of schedule_timeout() so that the caller is accounted as waiting for IO, not just sleeping. These functions can be used for correct iowait time accounting when the completion struct is actually used for waiting for IO (e.g. completion of a bio request in the block layer). Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2013-02-15 16:45:06 +01:00
Rafael J. Wysocki	7113fe74c1	Merge branch 'pm-assorted' * pm-assorted: suspend: enable freeze timeout configuration through sys ACPI: enable ACPI SCI during suspend PM: Introduce suspend state PM_SUSPEND_FREEZE PM / Runtime: Add new helper function: pm_runtime_active() PM / tracing: remove deprecated power trace API PM: don't use [delayed_]work_pending() PM / Domains: don't use [delayed_]work_pending()	2013-02-15 13:58:54 +01:00
Stanislaw Gruszka	e6c42c295e	posix-cpu-timers: Fix nanosleep task_struct leak The trinity fuzzer triggered a task_struct reference leak via clock_nanosleep with CPU_TIMERs. do_cpu_nanosleep() calls posic_cpu_timer_create(), but misses a corresponding posix_cpu_timer_del() which leads to the task_struct reference leak. Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20130215100810.GF4392@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-02-15 11:41:56 +01:00
Daniel Baluta	02e176af92	perf/hwbp: Fix cleanup in case of kzalloc failure Obviously this is a typo and could result in memory leaks if kzalloc fails on a given cpu. Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1360186160-7566-1-git-send-email-dbaluta@ixiacom.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2013-02-14 17:06:39 -03:00
Thomas Gleixner	9f4646d283	Merge branch 'fortglx/3.9/time' of git://git.linaro.org/people/jstultz/linux into timers/core	2013-02-14 19:46:10 +01:00
Thomas Gleixner	14e568e78f	stop_machine: Use smpboot threads Use the smpboot thread infrastructure. Mark the stopper thread selfparking and park it after it has finished the take_cpu_down() work. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Arjan van de Veen <arjan@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Richard Weinberger <rw@linutronix.de> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130131120741.686315164@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-02-14 15:29:38 +01:00
Thomas Gleixner	860a0ffaa3	stop_machine: Store task reference in a separate per cpu variable To allow the stopper thread being managed by the smpboot thread infrastructure separate out the task storage from the stopper data structure. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Arjan van de Veen <arjan@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Richard Weinberger <rw@linutronix.de> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130131120741.626690384@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-02-14 15:29:37 +01:00
Thomas Gleixner	7d7e499f73	smpboot: Allow selfparking per cpu threads The stop machine threads are still killed when a cpu goes offline. The reason is that the thread is used to bring the cpu down, so it can't be parked along with the other per cpu threads. Allow a per cpu thread to be excluded from automatic parking, so it can park itself once it's done Add a create callback function as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Arjan van de Veen <arjan@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Richard Weinberger <rw@linutronix.de> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130131120741.553993267@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2013-02-14 15:29:37 +01:00
Al Viro	d64008a8f3	burying unused conditionals __ARCH_WANT_SYS_RT_SIGACTION, __ARCH_WANT_SYS_RT_SIGSUSPEND, __ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND, __ARCH_WANT_COMPAT_SYS_SCHED_RR_GET_INTERVAL - not used anymore CONFIG_GENERIC_{SIGALTSTACK,COMPAT_RT_SIG{ACTION,QUEUEINFO,PENDING,PROCMASK}} - can be assumed always set.	2013-02-14 09:21:15 -05:00
Al Viro	e9b04b5b67	make do_sigaltstack() static Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-14 09:21:15 -05:00
Tejun Heo	112202d909	workqueue: rename cpu_workqueue to pool_workqueue workqueue has moved away from global_cwqs to worker_pools and with the scheduled custom worker pools, wforkqueues will be associated with pools which don't have anything to do with CPUs. The workqueue code went through significant amount of changes recently and mass renaming isn't likely to hurt much additionally. Let's replace 'cpu' with 'pool' so that it reflects the current design. * s/struct cpu_workqueue_struct/struct pool_workqueue/ * s/cpu_wq/pool_wq/ * s/cwq/pwq/ This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-02-13 19:29:12 -08:00
Tejun Heo	8d03ecfe47	workqueue: reimplement is_chained_work() using current_wq_worker() is_chained_work() was added before current_wq_worker() and implemented its own ham-fisted way of finding out whether %current is a workqueue worker - it iterates through all possible workers. Drop the custom implementation and reimplement using current_wq_worker(). Signed-off-by: Tejun Heo <tj@kernel.org>	2013-02-13 19:29:10 -08:00
Tejun Heo	1dd638149f	workqueue: fix is_chained_work() regression `c9e7cf273f` ("workqueue: move busy_hash from global_cwq to worker_pool") incorrectly converted is_chained_work() to use get_gcwq() inside for_each_gcwq_cpu() while removing get_gcwq(). As cwq might not exist for all possible workqueue CPUs, @cwq can be NULL and the following cwq deferences can lead to oops. Fix it by using for_each_cwq_cpu() instead, which is the better one to use anyway as we only need to check pools that the wq is associated with. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-02-13 19:29:07 -08:00

... 3 4 5 6 7 ...

15406 Commits