This fixes these build errors on powerpc:
In file included from arch/powerpc/mm/fault.c:18:
include/linux/signal.h:239: error: 'struct task_struct' declared inside parameter list
include/linux/signal.h:239: error: its scope is only this definition or declaration, which is probably not what you want
include/linux/signal.h:240: error: 'struct task_struct' declared inside parameter list
..
Exposed by commit e66eed651f ("list: remove prefetching from regular
list iterators"), which removed the include of <linux/prefetch.h> from
<linux/list.h>.
Without that, linux/signal.h no longer accidentally got the declaration
of 'struct task_struct'.
Fix by properly declaring the struct, rather than introducing any new
header file dependency.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Give users the option of completely powering off unoccupied
SATA ports using the existing min_power link_power_management_policy
option. When the use selects this option on an empty port, we
will power the port off by setting DET to off. For occupied ports,
behavior is unchanged.
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
hrtimer: Make lookup table const
RTC: Disable CONFIG_RTC_CLASS from being built as a module
timers: Fix alarmtimer build issues when CONFIG_RTC_CLASS=n
timers: Remove delayed irqwork from alarmtimers implementation
timers: Improve alarmtimer comments and minor fixes
timers: Posix interface for alarm-timers
timers: Introduce in-kernel alarm-timer interface
timers: Add rb_init_node() to allow for stack allocated rb nodes
time: Add timekeeping_inject_sleeptime
* 'timers-clockevents-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: hpet: Cleanup the clockevents init and register code
x86: Convert PIT to clockevents_config_and_register()
clockevents: Provide interface to reconfigure an active clock event device
clockevents: Provide combined configure and register function
clockevents: Restructure clock_event_device members
clocksource: Get rid of the hardcoded 5 seconds sleep time limit
clocksource: Restructure clocksource struct members
* 'timers-clocksource-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clocksource: convert mips to generic i8253 clocksource
clocksource: convert x86 to generic i8253 clocksource
clocksource: convert footbridge to generic i8253 clocksource
clocksource: add common i8253 PIT clocksource
blackfin: convert to clocksource_register_hz
mips: convert to clocksource_register_hz/khz
sparc: convert to clocksource_register_hz/khz
alpha: convert to clocksource_register_hz
microblaze: convert to clocksource_register_hz/khz
ia64: convert to clocksource_register_hz/khz
x86: Convert remaining x86 clocksources to clocksource_register_hz/khz
Make clocksource name const
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
sched: Fix and optimise calculation of the weight-inverse
sched: Avoid going ahead if ->cpus_allowed is not changed
sched, rt: Update rq clock when unthrottling of an otherwise idle CPU
sched: Remove unused parameters from sched_fork() and wake_up_new_task()
sched: Shorten the construction of the span cpu mask of sched domain
sched: Wrap the 'cfs_rq->nr_spread_over' field with CONFIG_SCHED_DEBUG
sched: Remove unused 'this_best_prio arg' from balance_tasks()
sched: Remove noop in alloc_rt_sched_group()
sched: Get rid of lock_depth
sched: Remove obsolete comment from scheduler_tick()
sched: Fix sched_domain iterations vs. RCU
sched: Next buddy hint on sleep and preempt path
sched: Make set_*_buddy() work on non-task entities
sched: Remove need_migrate_task()
sched: Move the second half of ttwu() to the remote cpu
sched: Restructure ttwu() some more
sched: Rename ttwu_post_activation() to ttwu_do_wakeup()
sched: Remove rq argument from ttwu_stat()
sched: Remove rq->lock from the first half of ttwu()
sched: Drop rq->lock from sched_exec()
...
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix rt_rq runtime leakage bug
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (107 commits)
perf stat: Add more cache-miss percentage printouts
perf stat: Add -d -d and -d -d -d options to show more CPU events
ftrace/kbuild: Add recordmcount files to force full build
ftrace: Add self-tests for multiple function trace users
ftrace: Modify ftrace_set_filter/notrace to take ops
ftrace: Allow dynamically allocated function tracers
ftrace: Implement separate user function filtering
ftrace: Free hash with call_rcu_sched()
ftrace: Have global_ops store the functions that are to be traced
ftrace: Add ops parameter to ftrace_startup/shutdown functions
ftrace: Add enabled_functions file
ftrace: Use counters to enable functions to trace
ftrace: Separate hash allocation and assignment
ftrace: Create a global_ops to hold the filter and notrace hashes
ftrace: Use hash instead for FTRACE_FL_FILTER
ftrace: Replace FTRACE_FL_NOTRACE flag with a hash of ignored functions
perf bench, x86: Add alternatives-asm.h wrapper
x86, 64-bit: Fix copy_[to/from]_user() checks for the userspace address limit
x86, mem: memset_64.S: Optimize memset by enhanced REP MOVSB/STOSB
x86, mem: memmove_64.S: Optimize memmove by enhanced REP MOVSB/STOSB
...
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
seqlock: Don't smp_rmb in seqlock reader spin loop
watchdog, hung_task_timeout: Add Kconfig configurable default
lockdep: Remove cmpxchg to update nr_chain_hlocks
lockdep: Print a nicer description for simple irq lock inversions
lockdep: Replace "Bad BFS generated tree" message with something less cryptic
lockdep: Print a nicer description for irq inversion bugs
lockdep: Print a nicer description for simple deadlocks
lockdep: Print a nicer description for normal deadlocks
lockdep: Print a nicer description for irq lock inversions
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, gart: Rename pci-gart_64.c to amd_gart_64.c
x86/amd-iommu: Use threaded interupt handler
arch/x86/kernel/pci-iommu_table.c: Convert sprintf_symbol to %pS
x86/amd-iommu: Add support for invalidate_all command
x86/amd-iommu: Add extended feature detection
x86/amd-iommu: Add ATS enable/disable code
x86/amd-iommu: Add flag to indicate IOTLB support
x86/amd-iommu: Flush device IOTLB if ATS is enabled
x86/amd-iommu: Select PCI_IOV with AMD IOMMU driver
PCI: Move ATS declarations in seperate header file
dma-debug: print information about leaked entry
x86/amd-iommu: Flush all internal TLBs when IOMMUs are enabled
x86/amd-iommu: Rename iommu_flush_device
x86/amd-iommu: Improve handling of full command buffer
x86/amd-iommu: Rename iommu_flush* to domain_flush*
x86/amd-iommu: Remove command buffer resetting logic
x86/amd-iommu: Cleanup completion-wait handling
x86/amd-iommu: Cleanup inv_pages command handling
x86/amd-iommu: Move inv-dte command building to own function
x86/amd-iommu: Move compl-wait command building to own function
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: (34 commits)
PM: Introduce generic prepare and complete callbacks for subsystems
PM: Allow drivers to allocate memory from .prepare() callbacks safely
PM: Remove CONFIG_PM_VERBOSE
Revert "PM / Hibernate: Reduce autotuned default image size"
PM / Hibernate: Add sysfs knob to control size of memory for drivers
PM / Wakeup: Remove useless synchronize_rcu() call
kmod: always provide usermodehelper_disable()
PM / ACPI: Remove acpi_sleep=s4_nonvs
PM / Wakeup: Fix build warning related to the "wakeup" sysfs file
PM: Print a warning if firmware is requested when tasks are frozen
PM / Runtime: Rework runtime PM handling during driver removal
Freezer: Use SMP barriers
PM / Suspend: Do not ignore error codes returned by suspend_enter()
PM: Fix build issue in clock_ops.c for CONFIG_PM_RUNTIME unset
PM: Revert "driver core: platform_bus: allow runtime override of dev_pm_ops"
OMAP1 / PM: Use generic clock manipulation routines for runtime PM
PM: Remove sysdev suspend, resume and shutdown operations
PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM
PM / UNICORE32: Use struct syscore_ops instead of sysdevs for PM
PM / AVR32: Use struct syscore_ops instead of sysdevs for PM
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/qib: Use pci_dev->revision
RDMA/iwcm: Get rid of enum iw_cm_event_status
IB/ipath: Use pci_dev->revision, again
IB/qib: Prevent driver hang with unprogrammed boards
RDMA/cxgb4: EEH errors can hang the driver
RDMA/cxgb4: Reset wait condition atomically
RDMA/cxgb4: Fix missing parentheses
RDMA/cxgb4: Initialization errors can cause crash
RDMA/cxgb4: Don't change QP state outside EP lock
RDMA/cma: Add an ID_REUSEADDR option
RDMA/cma: Fix handling of IPv6 addressing in cma_use_port
* 'stable/backend.base.v3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pci: Fix compiler error when CONFIG_XEN_PRIVILEGED_GUEST is not set.
xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions.
xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override.
xen/irq: The Xen hypervisor cleans up the PIRQs if the other domain forgot.
xen/irq: Export 'xen_pirq_from_irq' function.
xen/irq: Add support to check if IRQ line is shared with other domains.
xen/irq: Check if the PCI device is owned by a domain different than DOMID_SELF.
xen/pci: Add xen_[find|register|unregister]_device_domain_owner functions.
* 'stable/gntalloc.v7' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/gntdev,gntalloc: Remove unneeded VM flags
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
params.c: Use new strtobool function to process boolean inputs
debugfs: move to new strtobool
Add a strtobool function matching semantics of existing in kernel equivalents
modpost: Update 64k section support for binutils 2.18.50
module: Use binary search in lookup_symbol()
module: Use the binary search for symbols resolution
lib: Add generic binary search function to the kernel.
module: Sort exported symbols
module: each_symbol_section instead of each_symbol
module: split unset_section_ro_nx function.
module: undo module RONX protection correctly.
module: zero mod->init_ro_size after init is freed.
minor ANSI prototype sparse fix
module: reorder kparam_array to remove alignment padding on 64 bit builds
module: remove 64 bit alignment padding from struct module with CONFIG_TRACE*
module: do not hide __modver_version_show declaration behind ifdef
module: deal with alignment issues in built-in module versions
This is removes the use of software prefetching from the regular list
iterators. We don't want it. If you do want to prefetch in some
iterator of yours, go right ahead. Just don't expect the iterator to do
it, since normally the downsides are bigger than the upsides.
It also replaces <linux/prefetch.h> with <linux/const.h>, because the
use of LIST_POISON ends up needing it. <linux/poison.h> is sadly not
self-contained, and including prefetch.h just happened to hide that.
Suggested by David Miller (networking has a lot of regular lists that
are often empty or a single entry, and prefetching is not going to do
anything but add useless instructions).
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A driver for the AK4641 codec used in iPAQ hx4700 and Glofiish M800
among others.
Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: Dmitry Artamonow <mad_soft@inbox.ru>
Acked-by: Liam Girdwood <lrg@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
They not only increase the code footprint, they actually make things
slower rather than faster. On internationally acclaimed benchmarks
("make -j16" on an already fully built kernel source tree) the hlist
prefetching slows down the build by up to 1%.
(Almost all of it comes from hlist_for_each_entry_rcu() as used by
avc_has_perm_noaudit(), which is very hot due to all the pathname
lookups to see if there is anything to do).
The cause seems to be two-fold:
- on at least some Intel cores, prefetch(NULL) ends up with some
microarchitectural stall due to the TLB miss that it incurs. The
hlist case triggers this very commonly, since the NULL pointer is the
last entry in the list.
- the prefetch appears to cause more D$ activity, probably because it
prefetches hash list entries that are never actually used (because we
ended the search early due to a hit).
Regardless, the numbers clearly say that the implicit prefetching is
simply a bad idea. If some _particular_ user of the hlist iterators
wants to prefetch the next list entry, they can do so themselves
explicitly, rather than depend on all list iterators doing so
implicitly.
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ipv6 has per device ICMP SNMP counters, taking too much space because
they use percpu storage.
needed size per device is :
(512+4)*sizeof(long)*number_of_possible_cpus*2
On a 32bit kernel, 16 possible cpus, this wastes more than 64kbytes of
memory per ipv6 enabled network device, taken in vmalloc pool.
Since ICMP messages are rare, just use shared counters (atomic_long_t)
Per network space ICMP counters are still using percpu memory, we might
also convert them to shared counters in a future patch.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a new generic gpio rfkill driver to support rfkill switches
which are controlled by gpios. The driver also supports passing in
data about the clock for the radio, so that when rfkill is blocking,
it can disable the clock.
This driver assumes platform data is passed from the board files to
configure it for specific devices.
Original-patch-by: Anantha Idapalapati <aidapalapati@nvidia.com>
Signed-off-by: Rhyland Klein <rklein@nvidia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The sht15 sensor allows validating exchanges to and from the device
using a crc8 function. An utility function to reverse a byte has also
been added.
Signed-off-by: Jerome Oufella <jerome.oufella@savoirfairelinux.com>
Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
* Add support for:
- Heater.
- End of battery notice.
- Ability not to reload from OTP.
- Low resolution (12bit temp, 8bit humidity).
* Add an utility function to read individual bytes from the device.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
* Add a documentation file for the device.
* Respect a bit more the kernel-doc syntax.
* Rename some variables for clarity.
* Use bool type for flags.
* Use an enum for states (actions being done).
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
cfg80211 scan code adds separate BSS entries if the same BSS shows up
on multiple channels. However, sme implementation does not use the
frequency when fetching the BSS entry. Fix this by adding channel
information to cfg80211_roamed() and include it in cfg80211_get_bss()
calls.
Please note that drivers using cfg80211_roamed() need to be modified to
fully implement this fix. This commit includes only minimal changes to
avoid compilation issues; it maintains the old (broken) behavior for
most drivers. ath6kl was the only one that I could test, so I updated
it to provide the operating frequency in the roamed event.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some ARM SoCs have clock event devices which have their frequency
modified due to frequency scaling. Provide an interface which allows
to reconfigure an active device. After reconfiguration reprogram the
current pending event.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/%3C20110518210136.437459958%40linutronix.de%3E
All clockevent devices have the same open coded initialization
functions. Provide an interface which does all necessary
initialization in the core code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/%3C20110518210136.331975870%40linutronix.de%3E
Group the hot path members of struct clock_event_device together so we
have a better cache line footprint. Make it cacheline aligned.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/%3C20110518210136.223607682%40linutronix.de%3E
Group the hot path members of struct clocksource together so we have a
better cache line footprint. Make it cacheline aligned.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/%3C20110518210136.003081882%40linutronix.de%3E
Some embedded devices like the Netgear WNDR3300 have two SSB based cards
without an own sprom on the pci bus. We have to provide two different
fallback sproms for these and this was not possible with the old solution.
In the bcm47xx architecture the sprom data is stored in the nvram in the
main flash storage. The architecture code will be able to fill the sprom
with the stored data based on the bus where the device was found.
The bcm63xx code should do the same thing as before, just using the new
API.
Acked-by: Michael Buesch <mb@bu3sch.de>
Cc: netdev@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/2362/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This is a rename of the usr_strtobool proposal, which was a renamed,
relocated and fixed version of previous kstrtobool RFC
Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There a large number hand-coded binary searches in the kernel (run
"git grep search | grep binary" to find many of them). Since in my
experience, hand-coding binary searches can be error-prone, it seems
worth cleaning this up by providing a generic binary search function.
This generic binary search implementation comes from Ksplice. It has
the same basic API as the C library bsearch() function. Ksplice uses
it in half a dozen places with 4 different comparison functions, and I
think our code is substantially cleaner because of this.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Extra-bikeshedding-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Extra-bikeshedding-by: André Goddard Rosa <andre.goddard@gmail.com>
Extra-bikeshedding-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Alessio Igor Bogani <abogani@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch places every exported symbol in its own section
(i.e. "___ksymtab+printk"). Thus the linker will use its SORT() directive
to sort and finally merge all symbol in the right and final section
(i.e. "__ksymtab").
The symbol prefixed archs use an underscore as prefix for symbols.
To avoid collision we use a different character to create the temporary
section names.
This work was supported by a hardware donation from the CE Linux Forum.
Signed-off-by: Alessio Igor Bogani <abogani@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (folded in '+' fixup)
Tested-by: Dirk Behme <dirk.behme@googlemail.com>
Instead of having a callback function for each symbol in the kernel,
have a callback for each array of symbols.
This eases the logic when we move to sorted symbols and binary search.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Alessio Igor Bogani <abogani@kernel.org>
Reorder structure kparam_array to remove 8 bytes of alignment padding on
64 bit builds, dropping its size from 40 to 32 bytes.
Also update the macro module_param_array_named to initialise the
structure using its member names to allow it to be changed without
touching all its call sites.
'git grep' finds module_param_array in 1037 places so this patch will
save a small amount of data space across many modules.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reorder struct module to remove 24 bytes of alignment padding on 64 bit
builds when the CONFIG_TRACE options are selected. This allows the
structure to fit into one fewer cache lines, and its size drops from 592
to 568 on x86_64.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Doing so prevents the following warning from sparse:
CHECK kernel/params.c
kernel/params.c:817:9: warning: symbol '__modver_version_show' was not
declared. Should it be static?
since kernel/params.c is never compiled with MODULE being set.
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On m68k natural alignment is 2-byte boundary but we are trying to
align structures in __modver section on sizeof(void *) boundary.
This causes trouble when we try to access elements in this section
in array-like fashion when create "version" attributes for built-in
modules.
Moreover, as DaveM said, we can't reliably put structures into
independent objects, put them into a special section, and then expect
array access over them (via the section boundaries) after linking the
objects together to just "work" due to variable alignment choices in
different situations. The only solution that seems to work reliably
is to make an array of plain pointers to the objects in question and
put those pointers in the special section.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since users of the function tracer can now pick and choose which
functions they want to trace agnostically from other users of the
function tracer, we need to pass the ops struct to the ftrace_set_filter()
functions.
The functions ftrace_set_global_filter() and ftrace_set_global_notrace()
is added to keep the old filter functions which are used to modify
the generic function tracers.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
It's way past it's usefulness. And this gets rid of a bunch
of stray ->rt_{dst,src} references.
Even the comment documenting the macro was inaccurate (stated
default was 1 when it's 0).
If reintroduced, it should be done properly, with dynamic debug
facilities.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
drivercore: revert addition of of_match to struct device
of: fix race when matching drivers
Now that functions may be selected individually, it only makes sense
that we should allow dynamically allocated trace structures to
be traced. This will allow perf to allocate a ftrace_ops structure
at runtime and use it to pick and choose which functions that
structure will trace.
Note, a dynamically allocated ftrace_ops will always be called
indirectly instead of being called directly from the mcount in
entry.S. This is because there's no safe way to prevent mcount
from being preempted before calling the function, unless we
modify every entry.S to do so (not likely). Thus, dynamically allocated
functions will now be called by the ftrace_ops_list_func() that
loops through the ops that are allocated if there are more than
one op allocated at a time. This loop is protected with a
preempt_disable.
To determine if an ftrace_ops structure is allocated or not, a new
util function was added to the kernel/extable.c called
core_kernel_data(), which returns 1 if the address is between
_sdata and _edata.
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ftrace_ops that are registered to trace functions can now be
agnostic to each other in respect to what functions they trace.
Each ops has their own hash of the functions they want to trace
and a hash to what they do not want to trace. A empty hash for
the functions they want to trace denotes all functions should
be traced that are not in the notrace hash.
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Every function has its own record that stores the instruction
pointer and flags for the function to be traced. There are only
two flags: enabled and free. The enabled flag states that tracing
for the function has been enabled (actively traced), and the free
flag states that the record no longer points to a function and can
be used by new functions (loaded modules).
These flags are now moved to the MSB of the flags (actually just
the top 32bits). The rest of the bits (30 bits) are now used as
a ref counter. Everytime a tracer register functions to trace,
those functions will have its counter incremented.
When tracing is enabled, to determine if a function should be traced,
the counter is examined, and if it is non-zero it is set to trace.
When a ftrace_ops is registered to trace functions, its hashes
are examined. If the ftrace_ops filter_hash count is zero, then
all functions are set to be traced, otherwise only the functions
in the hash are to be traced. The exception to this is if a function
is also in the ftrace_ops notrace_hash. Then that function's counter
is not incremented for this ftrace_ops.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Combine the filter and notrace hashes to be accessed by a single entity,
the global_ops. The global_ops is a ftrace_ops structure that is passed
to different functions that can read or modify the filtering of the
function tracer.
The ftrace_ops structure was modified to hold a filter and notrace
hashes so that later patches may allow each ftrace_ops to have its own
set of rules to what functions may be filtered.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When multiple users are allowed to have their own set of functions
to trace, having the FTRACE_FL_FILTER flag will not be enough to
handle the accounting of those users. Each user will need their own
set of functions.
Replace the FTRACE_FL_FILTER with a filter_hash instead. This is
temporary until the rest of the function filtering accounting
gets in.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
To prepare for the accounting system that will allow multiple users of
the function tracer, having the FTRACE_FL_NOTRACE as a flag in the
dyn_trace record does not make sense.
All ftrace_ops will soon have a hash of functions they should trace
and not trace. By making a global hash of functions not to trace makes
this easier for the transition.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Commit b826291c, "drivercore/dt: add a match table pointer to struct
device" added an of_match pointer to struct device to cache the
of_match_table entry discovered at driver match time. This was unsafe
because matching is not an atomic operation with probing a driver. If
two or more drivers are attempted to be matched to a driver at the
same time, then the cached matching entry pointer could get
overwritten.
This patch reverts the of_match cache pointer and reworks all users to
call of_match_device() directly instead.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
If two drivers are probing devices at the same time, both will write
their match table result to the dev->of_match cache at the same time.
Only write the result if the device matches.
In a thread titled "SBus devices sometimes detected, sometimes not",
Meelis reported his SBus hme was not detected about 50% of the time.
From the debug suggested by Grant it was obvious another driver matched
some devices between the call to match the hme and the hme discovery
failling.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Milton Miller <miltonm@bga.com>
[grant.likely: modified to only call of_match_device() once]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: don't delay blk_run_queue_async
scsi: remove performance regression due to async queue run
blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup
block: rescan partitions on invalidated devices on -ENOMEDIA too
cdrom: always check_disk_change() on open
block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers
Fix the compile warning, do_sigtimedwait(struct timespec *) in signal.h
needs the forward declaration of timespec.
Reported-and-acked-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
generic_handle_irq() is missing a NULL pointer check for the result of
irq_to_desc. This was a not a big problem, but we want to expose it to
drivers, so we better have sanity checks in place. Add a return value
as well, which indicates that the irq number was valid and the handler
was invoked.
Based on the pure code move from Jonathan Cameron.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Provide a stub for proc_mkdir_mode() when CONFIG_PROC_FS is not
enabled, just like the stub for proc_mkdir().
Fixes this linux-next build error:
drivers/net/wireless/airo.c:4504: error: implicit declaration of function 'proc_mkdir_mode'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce generic .prepare() and .complete() power management
callbacks, currently missing, that can be used by subsystems and
power domains and export them. Provide NULL definitions of all
the generic system sleep callbacks for CONFIG_PM_SLEEP unset.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
If device drivers allocate substantial amounts of memory (above 1 MB)
in their hibernate .freeze() callbacks (or in their legacy suspend
callbcks during hibernation), the subsequent creation of hibernate
image may fail due to the lack of memory. This is the case, because
the drivers' .freeze() callbacks are executed after the hibernate
memory preallocation has been carried out and the preallocated amount
of memory may be too small to cover the new driver allocations.
Unfortunately, the drivers' .prepare() callbacks also are executed
after the hibernate memory preallocation has completed, so they are
not suitable for allocating additional memory either. Thus the only
way a driver can safely allocate memory during hibernation is to use
a hibernate/suspend notifier. However, the notifiers are called
before the freezing of user space and the drivers wanting to use them
for allocating additional memory may not know how much memory needs
to be allocated at that point.
To let device drivers overcome this difficulty rework the hibernation
sequence so that the memory preallocation is carried out after the
drivers' .prepare() callbacks have been executed, so that the
.prepare() callbacks can be used for allocating additional memory
to be used by the drivers' .freeze() callbacks. Update documentation
to match the new behavior of the code.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* power-domains:
PM: Fix build issue in clock_ops.c for CONFIG_PM_RUNTIME unset
PM: Revert "driver core: platform_bus: allow runtime override of dev_pm_ops"
OMAP1 / PM: Use generic clock manipulation routines for runtime PM
PM / Runtime: Generic clock manipulation rountines for runtime PM (v6)
PM / Runtime: Add subsystem data field to struct dev_pm_info
OMAP2+ / PM: move runtime PM implementation to use device power domains
PM / Platform: Use generic runtime PM callbacks directly
shmobile: Use power domains for platform runtime PM
PM: Export platform bus type's default PM callbacks
PM: Make power domain callbacks take precedence over subsystem ones
* syscore:
PM: Remove sysdev suspend, resume and shutdown operations
PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM
PM / UNICORE32: Use struct syscore_ops instead of sysdevs for PM
PM / AVR32: Use struct syscore_ops instead of sysdevs for PM
PM / Blackfin: Use struct syscore_ops instead of sysdevs for PM
ARM / Samsung: Use struct syscore_ops for "core" power management
ARM / PXA: Use struct syscore_ops for "core" power management
ARM / SA1100: Use struct syscore_ops for "core" power management
ARM / Integrator: Use struct syscore_ops for core PM
ARM / OMAP: Use struct syscore_ops for "core" power management
ARM: Use struct syscore_ops instead of sysdevs for PM in common code
We need to prevent kernel-forked processes during system poweroff.
Such processes try to access the filesystem whose disks we are
trying to shutdown at the same time. This causes delays and exceptions
in the storage drivers.
A follow-up patch will add these calls and need usermodehelper_disable()
also on systems without suspend support.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Some drivers erroneously use request_firmware() from their ->resume()
(or ->thaw(), or ->restore()) callbacks, which is not going to work
unless the firmware has been built in. This causes system resume to
stall until the firmware-loading timeout expires, which makes users
think that the resume has failed and reboot their machines
unnecessarily. For this reason, make _request_firmware() print a
warning and return immediately with error code if it has been called
when tasks are frozen and it's impossible to start any new usermode
helpers.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Reviewed-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
If CONFIG_PROC_SYSCTL=n the building process fails:
ping.c:(.text+0x52af3): undefined reference to `inet_get_ping_group_range_net'
Moved inet_get_ping_group_range_net() to ping.c.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit c21e6beb removed our queue request_fn re-enter
protection, and defaulted to always running the queues from
kblockd to be safe. This was a known potential slow down,
but should be safe.
Unfortunately this is causing big performance regressions for
some, so we need to improve this logic. Looking into the details
of the re-enter, the real issue is on requeue of requests.
Requeue of requests upon seeing a BUSY condition from the device
ends up re-running the queue, causing traces like this:
scsi_request_fn()
scsi_dispatch_cmd()
scsi_queue_insert()
__scsi_queue_insert()
scsi_run_queue()
scsi_request_fn()
...
potentially causing the issue we want to avoid. So special
case the requeue re-run of the queue, but improve it to offload
the entire run of local queue and starved queue from a single
workqueue callback. This is a lot better than potentially
kicking off a workqueue run for each device seen.
This also fixes the issue of the local device going into recursion,
since the above mentioned commit never moved that queue run out
of line.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
Revert "mmc: fix a race between card-detect rescan and clock-gate work instances"
The init and exit sections should not be traced and adding a call to
mcount to them is a waste of text and instruction cache. Have the
macro section attributes include notrace to ignore these functions
for tracing from the build.
Link: http://lkml.kernel.org/r/20110421023738.953028219@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The platform_bus_set_pm_ops() operation is deprecated in favor of the
new device power domain infrastructre implemented in commit
7538e3db6e (PM: add support for device
power domains)
Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Currently the devices that have already stripped IEEE 802.11
header from the AMSDU SKB can not use ieee80211_amsdu_to_8023s
routine. This patch enhances ieee80211_amsdu_to_8023s() API by
changing mandatory removing of IEEE 802.11 header from AMSDU
to optional.
Signed-off-by: Yogesh Ashok Powar <yogeshp@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
These definitions need to be exposed now that we can set the peer link
states via NL80211_ATTR_STA_PLINK_STATE. They were already being
(opaquely) reported by NL80211_STA_INFO_PLINK_STATE.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add the ability to advertise interface combinations in nl80211.
This allows the driver to indicate what the combinations are
that it supports. "Combinations" of just a single interface are
implicit, as previously. Note that cfg80211 will enforce that
the restrictions are met, but not for all drivers yet (once all
drivers are updated, we can remove the flag and enforce for all).
When no combinations are actually supported, an empty list will
be exported so that userspace can know if the kernel exported
this info or not (although it isn't clear to me what tools using
the info should do if the kernel didn't export it).
Since some interface types are purely virtual/software and don't
fit the restrictions, those are exposed in a new list of pure SW
types, not subject to restrictions. This mainly exists to handle
AP-VLAN and monitor interfaces in mac80211.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Currently after mount/remount operation on pstore filesystem,
the content on pstore will be lost. It is because current ERST
implementation doesn't support multi-user usage, which moves
internal pointer to the end after accessing it. Adding
multi-user support for pstore usage.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
the return type of function _read_ in pstore is size_t,
but in the callback function of _read_, the logic doesn't
consider it too much, which means if negative value (assuming
error here) is returned, it will be converted to positive because
of type casting. ssize_t is enough for this function.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: Take lock around probes for drm_fb_helper_hotplug_event
drm/i915: Revert i915.semaphore=1 default from 47ae63e0
vga_switcheroo: don't toggle-switch devices
drm/radeon/kms: add some evergreen/ni safe regs
drm/radeon/kms: fix extended lvds info parsing
drm/radeon/kms: fix tiling reg on fusion
We need to hold the dev->mode_config.mutex whilst detecting the output
status. But we also need to drop it for the call into
drm_fb_helper_single_fb_probe(), which indirectly acquires the lock when
attaching the fbcon.
Failure to do so exposes a race with normal output probing. Detected by
adding some warnings that the mutex is held to the backend detect routines:
[ 17.772456] WARNING: at drivers/gpu/drm/i915/intel_crt.c:471 intel_crt_detect+0x3e/0x373 [i915]()
[ 17.772458] Hardware name: Latitude E6400
[ 17.772460] Modules linked in: ....
[ 17.772582] Pid: 11, comm: kworker/0:1 Tainted: G W 2.6.38.4-custom.2 #8
[ 17.772584] Call Trace:
[ 17.772591] [<ffffffff81046af5>] ? warn_slowpath_common+0x78/0x8c
[ 17.772603] [<ffffffffa03f3e5c>] ? intel_crt_detect+0x3e/0x373 [i915]
[ 17.772612] [<ffffffffa0355d49>] ? drm_helper_probe_single_connector_modes+0xbf/0x2af [drm_kms_helper]
[ 17.772619] [<ffffffffa03534d5>] ? drm_fb_helper_probe_connector_modes+0x39/0x4d [drm_kms_helper]
[ 17.772625] [<ffffffffa0354760>] ? drm_fb_helper_hotplug_event+0xa5/0xc3 [drm_kms_helper]
[ 17.772633] [<ffffffffa035577f>] ? output_poll_execute+0x146/0x17c [drm_kms_helper]
[ 17.772638] [<ffffffff81193c01>] ? cfq_init_queue+0x247/0x345
[ 17.772644] [<ffffffffa0355639>] ? output_poll_execute+0x0/0x17c [drm_kms_helper]
[ 17.772648] [<ffffffff8105b540>] ? process_one_work+0x193/0x28e
[ 17.772652] [<ffffffff8105c6bc>] ? worker_thread+0xef/0x172
[ 17.772655] [<ffffffff8105c5cd>] ? worker_thread+0x0/0x172
[ 17.772658] [<ffffffff8105c5cd>] ? worker_thread+0x0/0x172
[ 17.772663] [<ffffffff8105f767>] ? kthread+0x7a/0x82
[ 17.772668] [<ffffffff8100a724>] ? kernel_thread_helper+0x4/0x10
[ 17.772671] [<ffffffff8105f6ed>] ? kthread+0x0/0x82
[ 17.772674] [<ffffffff8100a720>] ? kernel_thread_helper+0x0/0x10
Reported-by: Frederik Himpe <fhimpe@telenet.be>
References: https://bugs.freedesktop.org/show_bug.cgi?id=36394
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Do proper handling of dev_queue_xmit errors in order to
avoid double free of skb and leaks in error conditions.
In cfctrl pending requests are removed when CAIF Link layer goes down.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use struct net to reference CAIF configuration object instead of static variables.
Refactor functions caif_connect_client, caif_disconnect_client and squach
files cfcnfg.c and caif_config_utils.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CAIF Socket Layer and ip-interface registers reference counters
in CAIF service layer. The functions sock_hold, sock_put and
dev_hold, dev_put are used by CAIF Stack to protect from freeing
memory while packets are in-flight.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having reference counts in caif service layers,
we hook into existing refcount handling in socket layer and netdevice.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce Per-cpu reference for lower part of CAIF Stack.
Before freeing payload is disabled, synchronize_rcu() is called,
and then ref-count verified to be zero.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RCU read_lock and refcount is used to protect in-flight packets.
Use RCU and counters to manage freeing lower part of the CAIF stack if
CAIF-link layer is removed. Old solution based on delaying removal of
device is removed.
When CAIF link layer goes down the use of CAIF link layer is disabled
(by calling caif_set_phy_state()), but removal and freeing of the
lower part of the CAIF stack is done when Link layer is unregistered.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace spin_lock with rcu_read_lock when accessing lists to layers
and cache. While packets are in flight rcu_read_lock should not be held,
instead ref-counters are used in combination with RCU.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FS_COW_FL and FS_NOCOW_FL were newly introduced to control per file
COW in btrfs, but FS_NOCOW_FL is sufficient.
The fact is we don't have corresponding BTRFS_INODE_COW flag.
COW is default, and FS_NOCOW_FL can be used to switch off COW for
a single file.
If we mount btrfs with nodatacow, a newly created file will be set with
the FS_NOCOW_FL flag. So to turn on COW for it, we can just clear the
FS_NOCOW_FL flag.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The old IDE cmd64x checks the status of the CNTRL register to see if
the ports are enabled before probing them. pata_cmd64x doesn't do
this, which causes a HPMC on parisc when it tries to poke at the
secondary port because apparently the BAR isn't wired up (and a
non-responding piece of memory causes a HPMC).
Fix this by porting the CNTRL register port detection logic from IDE
cmd64x. In addition, following converns from Alan Cox, add a check to
see if a mobility electronics bridge is the immediate parent and forgo
the check if it is (prevents problems on hotplug controllers).
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Conflicts:
arch/ia64/kernel/cyclone.c
arch/mips/kernel/i8253.c
arch/x86/kernel/i8253.c
Reason: Resolve conflicts so further cleanups do not conflict further
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Convert the footbridge isa-timer code to use generic i8253 clocksource.
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On some arches (x86, sh, arm, unicore, powerpc) the oops message would
print out the last sysfs file accessed.
This was very useful in finding a number of sysfs and driver core bugs
in the 2.5 and early 2.6 development days, but it has been a number of
years since this file has actually helped in debugging anything that
couldn't also be trivially determined from the stack traceback.
So it's time to delete the line. This is good as we need all the space
we can get for oops messages at times on consoles.
Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
bridge: fix forwarding of IPv6
bonding,llc: Fix structure sizeof incompatibility for some PDUs
ipv6: restore correct ECN handling on TCP xmit
ne-h8300: Fix regression caused during net_device_ops conversion
hydra: Fix regression caused during net_device_ops conversion
zorro8390: Fix regression caused during net_device_ops conversion
sfc: Always map MCDI shared memory as uncacheable
ehea: Fix memory hotplug oops
libertas: fix cmdpendingq locking
iwlegacy: fix IBSS mode crashes
ath9k: Fix a warning due to a queued work during S3 state
mac80211: don't start the dynamic ps timer if not associated
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFSv4.1: Ensure that layoutget uses the correct gfp modes
NFSv4.1: remove pnfs_layout_hdr from pnfs_destroy_all_layouts tmp_list
NFSv41: Resend on NFS4ERR_RETRY_UNCACHED_REP
Pass in the sk_buff so that we can fetch the necessary keys from
the packet header when working with input routes.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds IPPROTO_ICMP socket kind. It makes it possible to send
ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
without any special privileges. In other words, the patch makes it
possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In
order not to increase the kernel's attack surface, the new functionality
is disabled by default, but is enabled at bootup by supporting Linux
distributions, optionally with restriction to a group or a group range
(see below).
Similar functionality is implemented in Mac OS X:
http://www.manpagez.com/man/4/icmp/
A new ping socket is created with
socket(PF_INET, SOCK_DGRAM, PROT_ICMP)
Message identifiers (octets 4-5 of ICMP header) are interpreted as local
ports. Addresses are stored in struct sockaddr_in. No port numbers are
reserved for privileged processes, port 0 is reserved for API ("let the
kernel pick a free number"). There is no notion of remote ports, remote
port numbers provided by the user (e.g. in connect()) are ignored.
Data sent and received include ICMP headers. This is deliberate to:
1) Avoid the need to transport headers values like sequence numbers by
other means.
2) Make it easier to port existing programs using raw sockets.
ICMP headers given to send() are checked and sanitized. The type must be
ICMP_ECHO and the code must be zero (future extensions might relax this,
see below). The id is set to the number (local port) of the socket, the
checksum is always recomputed.
ICMP reply packets received from the network are demultiplexed according
to their id's, and are returned by recv() without any modifications.
IP header information and ICMP errors of those packets may be obtained
via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
quenches and redirects are reported as fake errors via the error queue
(IP_RECVERR); the next hop address for redirects is saved to ee_info (in
network order).
socket(2) is restricted to the group range specified in
"/proc/sys/net/ipv4/ping_group_range". It is "1 0" by default, meaning
that nobody (not even root) may create ping sockets. Setting it to "100
100" would grant permissions to the single group (to either make
/sbin/ping g+s and owned by this group or to grant permissions to the
"netadmins" group), "0 4294967295" would enable it for the world, "100
4294967295" would enable it for the users, but not daemons.
The existing code might be (in the unlikely case anyone needs it)
extended rather easily to handle other similar pairs of ICMP messages
(Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
etc.).
Userspace ping util & patch for it:
http://openwall.info/wiki/people/segoon/ping
For Openwall GNU/*/Linux it was the last step on the road to the
setuid-less distro. A revision of this patch (for RHEL5/OpenVZ kernels)
is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
http://mirrors.kernel.org/openwall/Owl/current/iso/
Initially this functionality was written by Pavel Kankovsky for
Linux 2.4.32, but unfortunately it was never made public.
All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I), are tested with
the patch.
PATCH v3:
- switched to flowi4.
- minor changes to be consistent with raw sockets code.
PATCH v2:
- changed ping_debug() to pr_debug().
- removed CONFIG_IP_PING.
- removed ping_seq_fops.owner field (unused for procfs).
- switched to proc_net_fops_create().
- switched to %pK in seq_printf().
PATCH v1:
- fixed checksumming bug.
- CAP_NET_RAW may not create icmp sockets anymore.
RFC v2:
- minor cleanups.
- introduced sysctl'able group range to restrict socket(2).
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With some combinations of arch/compiler (e.g. arm-linux-gcc) the sizeof
operator on structure returns value greater than expected. In cases when the
structure is used for mapping PDU fields it may lead to unexpected results
(such as holes and alignment problems in skb data). __packed prevents this
undesired behavior.
Signed-off-by: Vitalii Demianets <vitas@nppfactor.kiev.ua>
Signed-off-by: David S. Miller <davem@davemloft.net>
If !CONFIG_USERNS, have current_user_ns() defined to (&init_user_ns).
Get rid of _current_user_ns. This requires nsown_capable() to be
defined in capability.c rather than as static inline in capability.h,
so do that.
Request_key needs init_user_ns defined at current_user_ns if
!CONFIG_USERNS, so forward-declare that in cred.h if !CONFIG_USERNS
at current_user_ns() define.
Compile-tested with and without CONFIG_USERNS.
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
[ This makes a huge performance difference for acl_permission_check(),
up to 30%. And that is one of the hottest kernel functions for loads
that are pathname-lookup heavy. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>