Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
This small patchset contains three accumulated Netfilter/IPVS updates,
they are:
1) Refactorize common NAT code by encapsulating it into a helper
function, similarly to what we do in other conntrack extensions,
from Florian Westphal.
2) A minor format string mismatch fix for IPVS, from Masanari Iida.
3) Add quota support to the netfilter accounting infrastructure, now
you can add quotas to accounting objects via the nfnetlink interface
and use them from iptables. You can also listen to quota
notifications from userspace. This enhancement from Mathieu Poirier.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases, when the driver is already using all the channel
contexts it can handle at once, we have to do an in-place switch
(ie. we cannot afford using an extra context temporarily for the
transaction). But some drivers may not support switching the channel
context assigned to a vif on the fly (ie. without unassigning and
assigning it) while others may only work if the context is changed on
the fly, without unassigning it first.
To allow these different scenarios, add a new driver operation that
let's the driver decide how to handle an in-place switch.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Conflicts:
drivers/net/bonding/bond_alb.c
drivers/net/ethernet/altera/altera_msgdma.c
drivers/net/ethernet/altera/altera_sgdma.c
net/ipv6/xfrm6_output.c
Several cases of overlapping changes.
The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.
In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.
Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Define separate fields in the sock structure for configuring disabling
checksums in both TX and RX-- sk_no_check_tx and sk_no_check_rx.
The SO_NO_CHECK socket option only affects sk_no_check_tx. Also,
removed UDP_CSUM_* defines since they are no longer necessary.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It doesn't seem like an protocols are setting anything other
than the default, and allowing to arbitrarily disable checksums
for a whole protocol seems dangerous. This can be done on a per
socket basis.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the mgmt level we have a key type parameter which currently accepts
two possible values: 0x00 for unauthenticated and 0x01 for
authenticated. However, in the internal struct smp_ltk representation we
have an explicit "authenticated" boolean value.
To make this distinction clear, add defines for the possible mgmt values
and do conversion to and from the internal authenticated value.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2014-05-22
This is the last ipsec pull request before I leave for
a three weeks vacation tomorrow. David, can you please
take urgent ipsec patches directly into net/net-next
during this time?
I'll continue to run the ipsec/ipsec-next trees as soon
as I'm back.
1) Simplify the xfrm audit handling, from Tetsuo Handa.
2) Codingstyle cleanup for xfrm_output, from abian Frederick.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Although the implementation probably needs a lot of work, this initial API
allows to implement software TSO in mvneta and mv643xx_eth drivers in a not
so intrusive way.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the NFC pull request for 3.16. We have:
- STMicroeectronics st21nfca support. The st21nfca is an HCI chipset and
thus relies on the HCI stack. This submission provides support for tag
redaer/writer mode (including Type 5) and device tree bindings.
- PM runtime support and a bunch of bug fixes for TI's trf7970a.
- Device tree support for NXP's pn544. Legacy platform data support is
obviously kept intact.
- NFC Tag type 4B support to the NFC Digital stack.
- SOCK_RAW type support to the raw NFC socket, and allow NCI
sniffing from that. This can be extended to report HCI frames and also
proprietarry ones like e.g. the pn533 ones.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTepRlAAoJEIqAPN1PVmxKnF0P/RvfrZs6CbGNJC+dkEbk90p1
nsngy4+4MmPwJYVzObnLz4Br0k1kmFKiOKske6drjMpgzDWeuQelw3B7bd3FYfxD
YkQsc5RC984xrDoDH5pn8mA6VJqmn7whrmcibTYAixrDqTvo8gw6uja4ryAnSdZm
n7cRbh/A5F/sa7O4mPA0bCTdp4jAS/vOP9rGFDOth/b5yJVs99XmC+AZp/Ad9BUx
+/osWGmBV5jshtX7aPTSxIQB4BUaP/lP1DW8yF5whKDjsHC9QyJcAtw9HfZ4tv2h
YNteZZ8yjM+rSjnDw/LvDc2Gp8Z8P1GYf8D3QN3cWhw1ZvXi7CnqKjEnm41sbfaH
L5esIfsRBUdmk6Ika7zALqmOQFI3PzH+ag96punl29qb2gyBDRSnXKVLirv3xxFG
h7vYtQL43Rosn/4pSilRbYReRwyKbSCxW3un/tUJy0Faafs6q+9oMC2aWbIfTT6l
40n4H9EmzYy2OaaXSFckiIIYYgVDAji8GLXTf+dPHb+NrH3QQOR3m27WzHc4rmYk
kUrv0lKoFswA+VLlIcJTrSKNF21FDjwuImzIWiPz6Fx/+rWJ0b4GlQyIynD72LpR
2LkUhTrxuRuRtxVCtvTdkPlL6Bdp3HO7t4qZ0EirgnpmGK6NScBgABoqFJSbz9uS
UUvZbHVIjLrDU9zzoyz8
=cSl+
-----END PGP SIGNATURE-----
Merge tag 'nfc-next-3.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz <sameo@linux.intel.com> says:
"NFC: 3.16: First pull request
This is the NFC pull request for 3.16. We have:
- STMicroeectronics st21nfca support. The st21nfca is an HCI chipset and
thus relies on the HCI stack. This submission provides support for tag
redaer/writer mode (including Type 5) and device tree bindings.
- PM runtime support and a bunch of bug fixes for TI's trf7970a.
- Device tree support for NXP's pn544. Legacy platform data support is
obviously kept intact.
- NFC Tag type 4B support to the NFC Digital stack.
- SOCK_RAW type support to the raw NFC socket, and allow NCI
sniffing from that. This can be extended to report HCI frames and also
proprietarry ones like e.g. the pn533 ones."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Pablo Neira Ayuso says:
====================
Netfilter/nftables updates for net-next
The following patchset contains Netfilter/nftables updates for net-next,
most relevantly they are:
1) Add set element update notification via netlink, from Arturo Borrero.
2) Put all object updates in one single message batch that is sent to
kernel-space. Before this patch only rules where included in the batch.
This series also introduces the generic transaction infrastructure so
updates to all objects (tables, chains, rules and sets) are applied in
an all-or-nothing fashion, these series from me.
3) Defer release of objects via call_rcu to reduce the time required to
commit changes. The assumption is that all objects are destroyed in
reverse order to ensure that dependencies betweem them are fulfilled
(ie. rules and sets are destroyed first, then chains, and finally
tables).
4) Allow to match by bridge port name, from Tomasz Bursztyka. This series
include two patches to prepare this new feature.
5) Implement the proper set selection based on the characteristics of the
data. The new infrastructure also allows you to specify your preferences
in terms of memory and computational complexity so the underlying set
type is also selected according to your needs, from Patrick McHardy.
6) Several cleanup patches for nft expressions, including one minor possible
compilation breakage due to missing mark support, also from Patrick.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Experience with the recent e114a710aa ("tcp: fix cwnd limited
checking to improve congestion control") has shown that there are
common cases where that commit can cause cwnd to be much larger than
necessary. This leads to TSO autosizing cooking skbs that are too
large, among other things.
The main problems seemed to be:
(1) That commit attempted to predict the future behavior of the
connection by looking at the write queue (if TSO or TSQ limit
sending). That prediction sometimes overestimated future outstanding
packets.
(2) That commit always allowed cwnd to grow to twice the number of
outstanding packets (even in congestion avoidance, where this is not
needed).
This commit improves both of these, by:
(1) Switching to a measurement-based approach where we explicitly
track the largest number of packets in flight during the past window
("max_packets_out"), and remember whether we were cwnd-limited at the
moment we finished sending that flight.
(2) Only allowing cwnd to grow to twice the number of outstanding
packets ("max_packets_out") in slow start. In congestion avoidance
mode we now only allow cwnd to grow if it was fully utilized.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Channels in 2.4GHz band overlap, this means that if we
send a probe request on channel 1 and then move to channel
2, we will hear the probe response on channel 2. In this
case, the RSSI will be lower than if we had heard it on
the channel on which it was sent (1 in this case).
The firmware / low level driver can parse the channel in
the DS IE or HT IE and compensate the RSSI so that it will
still have a valid value even if we heard the frame on an
adjacent channel. This can be done up to a certain offset.
Add this offset as a configuration for the low level driver.
A low level driver that can compensate the low RSSI in this
case should assign the maximal offset for which the RSSI
value is still valid.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Implement and export the new cfg80211_get_station() API.
This utility can be used by other kernel modules to obtain
detailed information about a given wireless station.
It will be in particular useful to batman-adv which will
implement a wireless rate based metric.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add get_expected_throughput() API to mac80211 so that each
driver can implement its own version based on the RC
algorithm they are using (might be using an HW RC algo).
The API returns a value expressed in Kbps.
Also, add the new get_expected_throughput() member
to the rate_control_ops structure in order to be
able to query the RC algorithm (this patch provides an
implementation of this API for both minstrel and
minstrel_ht).
The related member in the station_info object is now
filled accordingly when dumping a station.
Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Users may need information about the expected throughput
towards a given peer.
This value is supposed to consider the size overhead
generated by the 802.11 header.
This value is exported in kbps through the get_station() API
by including it into the station_info object.
Moreover, it is sent to user space when replying to the
nl80211 GET_STATION command.
This information will be useful to the batman-adv module
which will use it for its new metric computation.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This allows for a more generic NFC sniffing by using SOCKPROTO_RAW
SOCK_RAW to read RAW NFC frames. This is for sniffing anything but LLCP
(HCI, NCI, etc...).
Signed-off-by: Hiren Tandel <hirent@marvell.com>
Signed-off-by: Rahul Tank <rahult@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
This requires changing the nl80211 parsing code a bit to use
intermediate pointers for the allocation, but clarifies the
API towards the drivers.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This also propagates through the drivers.
The orinoco driver uses the cfg80211 API structs for internal
bookkeeping, and so needs a (void *) cast that removes the
const - but that's OK because it allocates those pointers.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The csa_counter_offs was erroneously described as csa_offs in
the docbook section.
This fixes two warnings when making htmldocs (at least):
Warning(include/net/mac80211.h:3428): No description found for parameter 'csa_counter_offs[IEEE80211_MAX_CSA_COUNTERS_NUM]'
Warning(include/net/mac80211.h:3428): Excess struct/union/enum/typedef member 'csa_offs' description in 'ieee80211_mutable_offsets'
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Move the comment in the structure to a description of the
max_num_csa_counters field in the docbook area.
This fixes a warning when building htmldocs (at least):
Warning(include/net/cfg80211.h:3064): No description found for parameter 'max_num_csa_counters'
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Now that all objects are released in the reverse order via the
transaction infrastructure, we can enqueue the release via
call_rcu to save one synchronize_rcu. For small rule-sets loaded
via nft -f, it now takes around 50ms less here.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Instead of caching the original skbuff that contains the netlink
messages, this stores the netlink message sequence number, the
netlink portID and the report flag. This helps to prepare the
introduction of the object release via call_rcu.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Leave the set content in consistent state if we fail to load the
batch. Use the new generic transaction infrastructure to achieve
this.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch speeds up rule-set updates and it also provides a way
to revert updates and leave things in consistent state in case that
the batch needs to be aborted.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch speeds up rule-set updates and it also introduces a way to
revert chain updates if the batch is aborted. The idea is to store the
changes in the transaction to apply that in the commit step.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch reworks the nf_tables API so set updates are included in
the same batch that contains rule updates. This speeds up rule-set
updates since we skip a dialog of four messages between kernel and
user-space (two on each direction), from:
1) create the set and send netlink message to the kernel
2) process the response from the kernel that contains the allocated name.
3) add the set elements and send netlink message to the kernel.
4) process the response from the kernel (to check for errors).
To:
1) add the set to the batch.
2) add the set elements to the batch.
3) add the rule that points to the set.
4) send batch to the kernel.
This also introduces an internal set ID (NFTA_SET_ID) that is unique
in the batch so set elements and rules can refer to new sets.
Backward compatibility has been only retained in userspace, this
means that new nft versions can talk to the kernel both in the new
and the old fashion.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The patch adds message type to the transaction to simplify the
commit the and abort routines. Yet another step forward in the
generalisation of the transaction infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch generalises the existing rule transaction infrastructure
so it can be used to handle set, table and chain object transactions
as well. The transaction provides a data area that stores private
information depending on the transaction type.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The new transaction infrastructure updates the family, table and chain
objects in the context structure, so let's deconstify them. While at it,
move the context structure initialization routine to the top of the
source file as it will be also used from the table and chain routines.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The 802.15.4-2011 standard states that for each key, a list of devices
that use this key shall be kept. Previous patches have only considered
two options:
* a device "uses" (or may use) all keys, rendering the list useless
* a device is restricted to a certain set of keys
Another option would be that a device *may* use all keys, but need not
do so, and we are interested in the actual set of keys the device uses.
Recording keys used by any given device may have a noticable performance
impact and might not be needed as often. The common case, in which a
device will not switch keys too often, should still perform well.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow datagram sockets to override the security settings of the device
they send from on a per-socket basis. Requires CAP_NET_ADMIN or
CAP_NET_RAW, since raw sockets can send arbitrary packets anyway.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The added structures match 802.15.4-2011 link-layer security PIBs as
closely as is reasonable. Some lists required by the standard were
modeled as bitmaps (frame_types and command_frame_ids in *llsec_key,
802.15.4-2011 7.5/Table 61), since using lists for those seems a bit
excessive and not particularly useful. The DeviceDescriptorHandleList
was inverted and is here a per-device list, since operations on this
list are likely to have both a key and a device at hand, and per-device
lists of keys are shorter than per-key lists of devices.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support to store local maximum TX power level for
connection when reply for HCI_Read_Transmit_Power_Level is received.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for Get Connection Information mgmt command
which can be used to query for information about connection, i.e. RSSI
and local TX power level.
In general values cached in hci_conn are returned as long as they are
considered valid, i.e. do not exceed age limit set in hdev. This limit
is calculated as random value between min/max values to avoid client
trying to guess when to poll for updated information.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds conn_info_min_age and conn_info_max_age parameters to
debugfs which determine lifetime of connection information. Actual
lifetime will be random value between min and max age.
Default values for min and max age are 1000ms and 3000ms respectively.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
RFC 4861 states in 7.2.5:
The IsRouter flag in the cache entry MUST be set based on the
Router flag in the received advertisement. In those cases
where the IsRouter flag changes from TRUE to FALSE as a result
of this update, the node MUST remove that router from the
Default Router List and update the Destination Cache entries
for all destinations using that neighbor as a router as
specified in Section 7.3.3. This is needed to detect when a
node that is used as a router stops forwarding packets due to
being configured as a host.
Currently, when dealing with NA Message which IsRouter flag changes from
TRUE to FALSE, the kernel only removes router from the Default Router List,
and don't update the Destination Cache entries.
Now in order to update those Destination Cache entries, i introduce
function rt6_clean_tohost().
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current mac_cb handling of ieee802154 is rather awkward and limited.
Decompose the single flags field into multiple fields with the meanings
of each subfield of the flags field to make future extensions (for
example, link-layer security) easier. Also don't set the frame sequence
number in upper layers, since that's a thing the MAC is supposed to set
on frame transmit - we set it on header creation, but assuming that
upper layers do not blindly duplicate our headers, this is fine.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When dealing with 802.15.4, one often has to know the maximum payload
size for a given packet. This depends on many factors, one of which is
whether or not a security header is present in the frame. These
definitions and functions provide an easy way for any upper layer to
calculate the maximum payload size for a packet. The first obvious user
for this is 6lowpan, which duplicates this calculation and gets it
partially wrong because it ignores security headers.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
> include/net/ip.h:211:5: warning: "CONFIG_SYSCTL" is not defined [-Wundef]
> #if CONFIG_SYSCTL
> ^
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the beacon CSA counters part of ieee80211_mutable_offsets and don't
decrement CSA counters when generating a beacon template. This permits the
driver to offload the CSA counters handling. Since mac80211 updates the probe
responses with the correct counter, the driver should sync the counter's value
with mac80211 using ieee80211_csa_update_counter function.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a new API ieee80211_beacon_get_template, which doesn't
affect DTIM counter and should be used if the device generates beacon
frames, and new beacon template is needed. In addition set the offsets
to TIM IE for MESH interface.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Change the type of NL80211_ATTR_CSA_C_OFF_BEACON and
NL80211_ATTR_CSA_C_OFF_PRESP to be NLA_BINARY which allows
userspace to use beacons and probe responses with
multiple CSA counters.
This isn't breaking the API since userspace can
continue to use nla_put_u16 for this attributes, which
is equivalent to a single element u16 array.
In addition advertise max number of supported CSA counters.
This is needed when using CSA and eCSA IEs together.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add NL80211_ATTR_CSA_C_OFFSETS_TX which holds an array
of offsets to the CSA counters which should be updated
when sending a management frames with NL80211_CMD_FRAME.
This API should be used by the drivers that wish to keep the
CSA counter updated in probe responses, but do not implement
probe response offloading and so, do not use
ieee80211_proberesp_get function.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Missing a colon on definition use is a bit odd so
change the macro for the 32 bit case to declare an
__attribute__((unused)) and __deprecated variable.
The __deprecated attribute will cause gcc to emit
an error if the variable is actually used.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_local_port_range is already per netns, so should ip_local_reserved_ports
be. And since it is none by default we don't actually need it when we don't
enable CONFIG_SYSCTL.
By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port()
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using mark-based routing, sockets returned from accept()
may need to be marked differently depending on the incoming
connection request.
This is the case, for example, if different socket marks identify
different networks: a listening socket may want to accept
connections from all networks, but each connection should be
marked with the network that the request came in on, so that
subsequent packets are sent on the correct network.
This patch adds a sysctl to mark TCP sockets based on the fwmark
of the incoming SYN packet. If enabled, and an unmarked socket
receives a SYN, then the SYN packet's fwmark is written to the
connection's inet_request_sock, and later written back to the
accepted socket when the connection is established. If the
socket already has a nonzero mark, then the behaviour is the same
as it is today, i.e., the listening socket's fwmark is used.
Black-box tested using user-mode linux:
- IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the
mark of the incoming SYN packet.
- The socket returned by accept() is marked with the mark of the
incoming SYN packet.
- Tested with syncookies=1 and syncookies=2.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kernel-originated IP packets that have no user socket associated
with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
are emitted with a mark of zero. Add a sysctl to make them have
the same mark as the packet they are replying to.
This allows an administrator that wishes to do so to use
mark-based routing, firewalling, etc. for these replies by
marking the original packets inbound.
Tested using user-mode linux:
- ICMP/ICMPv6 echo replies and errors.
- TCP RST packets (IPv4 and IPv6).
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid large code duplication in IPv6, we need to first simplify
the complicate SYN-ACK sending code in tcp_v4_conn_request().
To use tcp_v4(6)_send_synack() to send all SYN-ACKs, we need to
initialize the mini socket's receive window before trying to
create the child socket and/or building the SYN-ACK packet. So we move
that initialization from tcp_make_synack() to tcp_v4_conn_request()
as a new function tcp_openreq_init_req_rwin().
After this refactoring the SYN-ACK sending code is simpler and easier
to implement Fast Open for IPv6.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consolidate various cookie checking and generation code to simplify
the fast open processing. The main goal is to reduce code duplication
in tcp_v4_conn_request() for IPv6 support.
Removes two experimental sysctl flags TFO_SERVER_ALWAYS and
TFO_SERVER_COOKIE_NOT_CHKD used primarily for developmental debugging
purposes.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move common TFO functions that will be used by both v4 and v6
to tcp_fastopen.c. Create a helper tcp_fastopen_queue_check().
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Daniel Lee <longinus00@gmail.com>
Signed-off-by: Jerry Chu <hkchu@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the moment, the ath9k/ath10k DFS module only supports detecting ETSI
radar patterns.
Add a bitmap in the interface combinations, indicating which DFS regions
are supported by the detector. If unset, support for all regions is
assumed.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As suggested by several people, rename local_df to ignore_df,
since it means "ignore df bit if it is set".
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/altera/altera_sgdma.c
net/netlink/af_netlink.c
net/sched/cls_api.c
net/sched/sch_api.c
The netlink conflict dealt with moving to netlink_capable() and
netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
in non-init namespaces. These were simple transformations from
netlink_capable to netlink_ns_capable.
The Altera driver conflict was simply code removal overlapping some
void pointer cast cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support to store local TX power level for connection
when reply for HCI_Read_Transmit_Power_Level is received.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
John W. Linville says:
====================
pull request: wireless 2014-05-08
This one is all from Johannes:
"Here are a few small fixes for the current cycle: radiotap TX flags were
wrong (fix by Bob), Chun-Yeow fixes an SMPS issue with mesh interfaces,
Eliad fixes a locking bug and a cfg80211 state problem and finally
Henning sent me a fix for IBSS rate information."
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When the driver fails during HW restart or resume, the whole
stack goes into a very confused state with interfaces being
up while the hardware is down etc.
Address this by shutting down everything; we'll run into a
lot of warnings in the process but that's better than having
the whole stack get messed up.
Reviewed-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Similarly, when CONFIG_SYSCTL is not set, ping_group_range should still
work, just that no one can change it. Therefore we should move it out of
sysctl_net_ipv4.c. And, it should not share the same seqlock with
ip_local_port_range.
BTW, rename it to ->ping_group_range instead.
Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: Stefan de Konink <stefan@konink.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_SYSCTL is not set, ip_local_port_range should still work,
just that no one can change it. Therefore we should move it out of sysctl_inet.c.
Also, rename it to ->ip_local_ports instead.
Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: Stefan de Konink <stefan@konink.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support to store RSSI for connection when reply for
HCI_Read_RSSI is received.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When trying to generate documentation, at least xmldocs, we get the
following warning:
Warning(include/net/cfg80211.h:461): No description found for parameter 'nl80211_iftype'
Fix it by adding the iftype argument name to the
cfg80211_chandef_dfs_required() function declaration.
Reported-and-tested-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
commit 8f0ea0fe3a (snmp: reduce percpu needs by 50%)
reduced snmp array size to 1, so technically it doesn't have to be
an array any more. What's more, after the following commit:
commit 933393f58f
Date: Thu Dec 22 11:58:51 2011 -0600
percpu: Remove irqsafe_cpu_xxx variants
We simply say that regular this_cpu use must be safe regardless of
preemption and interrupt state. That has no material change for x86
and s390 implementations of this_cpu operations. However, arches that
do not provide their own implementation for this_cpu operations will
now get code generated that disables interrupts instead of preemption.
probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
almost 3 years, no one complains.
So, just convert the array to a single pointer and remove snmp_mib_init()
and snmp_mib_free() as well.
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The quoted text and figure are from RFC 6040 ("Tunnelling of Explicit
Congestion Notification").
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This exports a new cfg80211_stop_iface() function.
This is intended for driver internal interface
combination management and channel switching.
Due to locking issues (it re-enters driver) the
call is asynchronous and uses cfg80211 event
list/worker.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It was possible for tx queues to be stuck stopped
if AP CSA finalization failed. In that case
neither stop_ap nor do_stop woke the queues up.
This means it was impossible to perform tx at all
until driver was reloaded or a successful CSA was
performed later.
It was possible to solve this in a simpler manner
however this is more robust and future proof
(having multi-vif CSA in mind).
New sdata->csa_block_tx is introduced to keep
track of which interfaces requested tx to be
blocked for CSA. This is required because mac80211
stops all tx queues for that purpose. This means
queues must be awoken only when last tx-blocking
CSA interface is finished.
It is still possible to have tx queues stopped
after CSA failure but as soon as offending
interfaces are stopped from userspace (stop_ap or
ifdown) tx queues are woken up properly.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Enabling CONFIG_DEBUG_ATOMIC_SLEEP has shown that some rfcomm functions
acquiring spinlocks call sleeping locks further in the chain. Converting
the offending spinlocks into mutexes makes sleeping safe.
Signed-off-by: Libor Pechacek <lpechacek@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Call skb_checksum_init instead of private functions.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call skb_checksum_init instead of private functions.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
csum_add is really nothing more then add-with-carry which
can be implemented efficiently in some architectures.
Allow architecture to define this protected by HAVE_ARCH_CSUM_ADD.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-05-02
Please pull this batch of updates intended for the 3.16 stream...
For the mac80211 bits, Johannes says:
"In this round we have a large number of small features and
improvements from people too numerous to list here. The only really
bit thing is Michał and Luca's CSA work (including changing how
interface combination verification is done)."
For the Bluetooth bits, Gustavo says:
"Here goes some patches for the -next release. There is nothing
really special for this pull request, just a bunch of refactors,
fixes and clean ups."
For the ath10k/ath6kl bits, Kalle says:
"For ath6kl Kalle fixed a bunch of checkpatch warnings.
In ath10k we had more changes, major ones being:
* fix memory allocation failures after a firmware crash (Michal)
* some rework of DFS configuration to enable it correctly in all cases
(Michal)
* add a new firmware crash option to make it possible to crash 10.1
firmware for testing purposes (Marek P)
* fix RTS/CTS protection in certain cases (Marek K)
* fix wrong RSSI and rate reporting in some cases (Janusz)
* fix firmware stats reporting (Chun, Ben & Bartosz)"
For the iwlwifi bits, Emmanuel says:
"I have here a bunch of unrelated things. I disabled support for
-7.ucode which means that I can removed a lot of code. Eliad has
a brand new feature: we reduce the Tx power when the link allows -
this reduces our power consumption. The regular changes in power and
scan area. One interesting thing though is the patches from Johannes,
we have now GRO which allows to increase our throughput in TCP Rx. The
main advantage is that it reduces the number of TCP Acks - these TCP
Acks are completely useless when we are using A-MPDU since the first
packet of the A-MPDU generates a TCP Ack which is made obsolete by
the next packets."
Along with that, there are a variety of updates to b43, mwifiex,
rtl8180 and wil6210 drivers and a handful of other updates here
and there.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now the core vsock module is the owner of the proto family. This
means there's nothing preventing the transport module from unloading if
there are open sockets, which results in a panic. Fix that by allowing
the transport to be the owner, which will refcount it properly.
Includes version bump to 1.0.1.0-k
Passes checkpatch this time, I swear...
Acked-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose a new tdls flag for the public ieee80211_sta struct.
This can be used in some rate control decisions.
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add locked-version for cfg80211_sched_scan_stopped.
This is used for some users that might want to
call it when rtnl is already locked.
Fixes: d43c6b6 ("mac80211: reschedule sched scan after HW restart")
Cc: stable@vger.kernel.org (3.14+)
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit e114a710aa ("tcp: fix cwnd limited checking to improve
congestion control") obsoleted in_flight parameter from
tcp_is_cwnd_limited() and its callers.
This patch does the removal as promised.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung discovered tcp_is_cwnd_limited() was returning false in
slow start phase even if the application filled the socket write queue.
All congestion modules take into account tcp_is_cwnd_limited()
before increasing cwnd, so this behavior limits slow start from
probing the bandwidth at full speed.
The problem is that even if write queue is full (aka we are _not_
application limited), cwnd can be under utilized if TSO should auto
defer or TCP Small queues decided to hold packets.
So the in_flight can be kept to smaller value, and we can get to the
point tcp_is_cwnd_limited() returns false.
With TCP Small Queues and FQ/pacing, this issue is more visible.
We fix this by having tcp_cwnd_validate(), which is supposed to track
such things, take into account unsent_segs, the number of segs that we
are not sending at the moment due to TSO or TSQ, but intend to send
real soon. Then when we are cwnd-limited, remember this fact while we
are processing the window of ACKs that comes back.
For example, suppose we have a brand new connection with cwnd=10; we
are in slow start, and we send a flight of 9 packets. By the time we
have received ACKs for all 9 packets we want our cwnd to be 18.
We implement this by setting tp->lsnd_pending to 9, and
considering ourselves to be cwnd-limited while cwnd is less than
twice tp->lsnd_pending (2*9 -> 18).
This makes tcp_is_cwnd_limited() more understandable, by removing
the GSO/TSO kludge, that tried to work around the issue.
Note the in_flight parameter can be removed in a followup cleanup
patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This replaces 6 identical code snippets with a call to a new
static inline function.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA drivers have a trick which consists in allocating "priv_size" more
bytes to account for the DSA driver private context. Add a helper
function to access that private context instead of open-coding it in
drivers with (void *)(ds + 1).
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce copy-past a bit by adding a common helper.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This extends NL80211_CMD_SET_CHANNEL to allow dynamic channel bandwidth
changes in AP mode (including P2P GO) during a lifetime of the BSS. This
can be used to implement, e.g., HT 20/40 MHz co-existence rules on the
2.4 GHz band.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When actions are attached to a filter, they are a part of the filter
itself, so when changing a filter we should allow to overwrite the actions
inside as well.
In my specific case, when I tried to _append_ a new action to an existing
filter which already has an action, I got EEXIST since kernel refused
to overwrite the existing one in kernel.
This patch checks if we are changing the filter checking NLM_F_CREATE flag
(Sigh, filters don't use NLM_F_REPLACE...) and then passes the boolean down
to actions. This fixes the problem above.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since there are frequency bands (e.g. 5.9GHz) allowing channels
with only 10 or 5 MHz bandwidth, this patch adds attributes that
allow keeping track about this information.
When channel attributes are reported to user-space, make sure to
not break old tools, i.e. if the 'split wiphy dump' is enabled,
report the extra attributes (if present) describing the bandwidth
restrictions. If the 'split wiphy dump' is not enabled,
completely omit those channels that have flags set to either
IEEE80211_CHAN_NO_10MHZ or IEEE80211_CHAN_NO_20MHZ.
Add the check for new bandwidth restriction flags in
cfg80211_chandef_usable() to comply with the restrictions.
Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some chips can encrypt managment frames in HW, but
require generated IV in the frame. Add a key flag
that allows us to achieve this.
Signed-off-by: Marek Kwaczynski <marek.kwaczynski@tieto.com>
[use BIT(0) to fill that spot, fix indentation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The patch splits cfg80211_check_combinations()
into an iterator function and a simple iteration
user.
This makes it possible for drivers to asses how
many channels can use given iftype setup. This in
turn can be used for future
multi-interface/multi-channel channel switching.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch allows to switch the netns when packet is encapsulated or
decapsulated.
The vxlan socket is openned into the i/o netns, ie into the netns where
encapsulated packets are received. The socket lookup is done into this netns to
find the corresponding vxlan tunnel. After decapsulation, the packet is
injecting into the corresponding interface which may stand to another netns.
When one of the two netns is removed, the tunnel is destroyed.
Configuration example:
ip netns add netns1
ip netns exec netns1 ip link set lo up
ip link add vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
ip link set vxlan10 netns netns1
ip netns exec netns1 ip addr add 192.168.0.249/24 broadcast 192.168.0.255 dev vxlan10
ip netns exec netns1 ip link set vxlan10 up
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_net_capable - The common case, operations that are safe in a network namespace.
sk_capable - Operations that are not known to be safe in a network namespace
sk_ns_capable - The general case for special cases.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes noted this is not needed, all of the fragment
accessors don't need CONFIG_NET_NS. This goes test compiled with
CONFIG_BT_6LOWPAN=y and a disabled CONFIG_NET_NS.
CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: linux-zigbee-devel@lists.sourceforge.net
Cc: David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be useful to create network family dedicated META expression
as for NFPROTO_BRIDGE for instance.
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit f1370cc4 "xfrm: Remove useless secid field from xfrm_audit." changed
"struct xfrm_audit" to have either
{ audit_get_loginuid(current) / audit_get_sessionid(current) } or
{ INVALID_UID / -1 } pair.
This means that we can represent "struct xfrm_audit" as "bool".
This patch replaces "struct xfrm_audit" argument with "bool".
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
It seems to me that commit ab5f5e8b "[XFRM]: xfrm audit calls" is doing
something strange at xfrm_audit_helper_usrinfo().
If secid != 0 && security_secid_to_secctx(secid) != 0, the caller calls
audit_log_task_context() which basically does
secid != 0 && security_secid_to_secctx(secid) == 0 case
except that secid is obtained from current thread's context.
Oh, what happens if secid passed to xfrm_audit_helper_usrinfo() was
obtained from other thread's context? It might audit current thread's
context rather than other thread's context if security_secid_to_secctx()
in xfrm_audit_helper_usrinfo() failed for some reason.
Then, are all the caller of xfrm_audit_helper_usrinfo() passing either
secid obtained from current thread's context or secid == 0?
It seems to me that they are.
If I didn't miss something, we don't need to pass secid to
xfrm_audit_helper_usrinfo() because audit_log_task_context() will
obtain secid from current thread's context.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Add RF tech and framing macros for the ISO/IEC 14443-B Protocol.
Cc: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
load_session allows a CLF to restore the gate <-> pipe table from some
proprietary location.
The main advantage to add this function is to reduce the memory wear by
running pipe creation (and storing) only once.
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Make tcp_cwnd_application_limited() static and move it from tcp_input.c to
tcp_output.c
Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't rely on driver files or other headers having this file included.
CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: linux-zigbee-devel@lists.sourceforge.net
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will simplify the new reassembly backport
with no code changes being required.
CC: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: linux-zigbee-devel@lists.sourceforge.net
Cc: David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, it is possible to create an SCTP socket, then switch
auth_enable via sysctl setting to 1 and crash the system on connect:
Oops[#1]:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
[...]
Call Trace:
[<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80
[<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4
[<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c
[<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8
[<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214
[<ffffffff8043af68>] sctp_rcv+0x588/0x630
[<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24
[<ffffffff803acb50>] ip6_input+0x2c0/0x440
[<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564
[<ffffffff80310650>] process_backlog+0xb4/0x18c
[<ffffffff80313cbc>] net_rx_action+0x12c/0x210
[<ffffffff80034254>] __do_softirq+0x17c/0x2ac
[<ffffffff800345e0>] irq_exit+0x54/0xb0
[<ffffffff800075a4>] ret_from_irq+0x0/0x4
[<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48
[<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148
[<ffffffff805a88b0>] start_kernel+0x37c/0x398
Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0
03e00008 00000000
---[ end trace b530b0551467f2fd ]---
Kernel panic - not syncing: Fatal exception in interrupt
What happens while auth_enable=0 in that case is, that
ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
when endpoint is being created.
After that point, if an admin switches over to auth_enable=1,
the machine can crash due to NULL pointer dereference during
reception of an INIT chunk. When we enter sctp_process_init()
via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
the INIT verification succeeds and while we walk and process
all INIT params via sctp_process_param() we find that
net->sctp.auth_enable is set, therefore do not fall through,
but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
dereference what we have set to NULL during endpoint
initialization phase.
The fix is to make auth_enable immutable by caching its value
during endpoint initialization, so that its original value is
being carried along until destruction. The bug seems to originate
from the very first days.
Fix in joint work with Daniel Borkmann.
Reported-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Joshua Kinard <kumba@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by Julian:
Simply, flowi4_iif must not contain 0, it does not
look logical to ignore all ip rules with specified iif.
because in fib_rule_match() we do:
if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
goto out;
flowi4_iif should be LOOPBACK_IFINDEX by default.
We need to move LOOPBACK_IFINDEX to include/net/flow.h:
1) It is mostly used by flowi_iif
2) Fix the following compile error if we use it in flow.h
by the patches latter:
In file included from include/linux/netfilter.h:277:0,
from include/net/netns/netfilter.h:5,
from include/net/net_namespace.h:21,
from include/linux/netdevice.h:43,
from include/linux/icmpv6.h:12,
from include/linux/ipv6.h:61,
from include/net/ipv6.h:16,
from include/linux/sunrpc/clnt.h:27,
from include/linux/nfs_fs.h:30,
from init/do_mounts.c:32:
include/net/flow.h: In function ‘flowi4_init_output’:
include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the dst->output() path for ipv4, the code assumes the skb it has to
transmit is attached to an inet socket, specifically via
ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
provider of the packet is an AF_PACKET socket.
The dst->output() method gets an additional 'struct sock *sk'
parameter. This needs a cascade of changes so that this parameter can
be propagated from vxlan to final consumer.
Fixes: 8f646c922d ("vxlan: keep original skb ownership")
Reported-by: lucien xin <lucien.xin@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_queue_xmit() assumes the skb it has to transmit is attached to an
inet socket. Commit 31c70d5956 ("l2tp: keep original skb ownership")
changed l2tp to not change skb ownership and thus broke this assumption.
One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
so that we do not assume skb->sk points to the socket used by l2tp
tunnel.
Fixes: 31c70d5956 ("l2tp: keep original skb ownership")
Reported-by: Zhan Jianyu <nasa4836@gmail.com>
Tested-by: Zhan Jianyu <nasa4836@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains three Netfilter fixes for your net tree,
they are:
* Fix missing generation sequence initialization which results in a splat
if lockdep is enabled, it was introduced in the recent works to improve
nf_conntrack scalability, from Andrey Vagin.
* Don't flush the GRE keymap list in nf_conntrack when the pptp helper is
disabled otherwise this crashes due to a double release, from Andrey
Vagin.
* Fix nf_tables cmp fast in big endian, from Patrick McHardy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Francois reported that setting big mtu on loopback device could prevent
tcp sessions making progress.
We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets.
We must limit the IPv6 MTU to (65535 + 40) bytes in theory.
Tested:
ifconfig lo mtu 70000
netperf -H ::1
Before patch : Throughput : 0.05 Mbits
After patch : Throughput : 35484 Mbits
Reported-by: Francois WELLENREITER <f.wellenreiter@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
nft_cmp_fast is used for equality comparisions of size <= 4. For
comparisions of size < 4 byte a mask is calculated that is applied to
both the data from userspace (during initialization) and the register
value (during runtime). Both values are stored using (in effect) memcpy
to a memory area that is then interpreted as u32 by nft_cmp_fast.
This works fine on little endian since smaller types have the same base
address, however on big endian this is not true and the smaller types
are interpreted as a big number with trailing zero bytes.
The mask therefore must not include the lower bytes, but the higher bytes
on big endian. Add a helper function that does a cpu_to_le32 to switch
the bytes on big endian. Since we're dealing with a mask of just consequitive
bits, this works out fine.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull yet more networking updates from David Miller:
1) Various fixes to the new Redpine Signals wireless driver, from
Fariya Fatima.
2) L2TP PPP connect code takes PMTU from the wrong socket, fix from
Dmitry Petukhov.
3) UFO and TSO packets differ in whether they include the protocol
header in gso_size, account for that in skb_gso_transport_seglen().
From Florian Westphal.
4) If VLAN untagging fails, we double free the SKB in the bridging
output path. From Toshiaki Makita.
5) Several call sites of sk->sk_data_ready() were referencing an SKB
just added to the socket receive queue in order to calculate the
second argument via skb->len. This is dangerous because the moment
the skb is added to the receive queue it can be consumed in another
context and freed up.
It turns out also that none of the sk->sk_data_ready()
implementations even care about this second argument.
So just kill it off and thus fix all these use-after-free bugs as a
side effect.
6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti.
7) pktgen needs to do locking properly for LLTX devices, from Daniel
Borkmann.
8) xen-netfront driver initializes TX array entries in RX loop :-) From
Vincenzo Maffione.
9) After refactoring, some tunnel drivers allow a tunnel to be
configured on top itself. Fix from Nicolas Dichtel.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
vti: don't allow to add the same tunnel twice
gre: don't allow to add the same tunnel twice
drivers: net: xen-netfront: fix array initialization bug
pktgen: be friendly to LLTX devices
r8152: check RTL8152_UNPLUG
net: sun4i-emac: add promiscuous support
net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
net: ipv6: Fix oif in TCP SYN+ACK route lookup.
drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts
drivers: net: cpsw: discard all packets received when interface is down
net: Fix use after free by removing length arg from sk_data_ready callbacks.
Drivers: net: hyperv: Address UDP checksum issues
Drivers: net: hyperv: Negotiate suitable ndis version for offload support
Drivers: net: hyperv: Allocate memory for all possible per-pecket information
bridge: Fix double free and memory leak around br_allowed_ingress
bonding: Remove debug_fs files when module init fails
i40evf: program RSS LUT correctly
i40evf: remove open-coded skb_cow_head
ixgb: remove open-coded skb_cow_head
igbvf: remove open-coded skb_cow_head
...
particularly with a focus on RDMA.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: GPGTools - http://gpgtools.org
iQIcBAABAgAGBQJTRXV1AAoJEDZk62b0Tg6xYWAP/0Kfp9/8Z05SQ1fnJveySw1O
nKK//1/GBmquntqFAVHI6yJRLzPJ+Z/Y4u4a0qriwfgpPQvUJQrL77tY/VfEBUPB
VoJG1tj7lpdLrO3p4YyPkPjymyC7YOoFNjGEstWFg7HetwnnqqZL2LB+5yJzyqjx
y9nv1HzsrbAE7j8C4hQ1Nmds5muUb5VhnTtPhjrx4tP1sWWh8XTVJbsVDiEqx6cu
uJXFFTbkONr9jKfv+Ki3H2pZej2yD7w4tU4lkdcGNyij/Q4Xn1iERqroW2/GT6Cl
AXxlIKN24ASjWo4VqW0Wf8gO8vbUtHRChoiZ69DvzNTkbWmAIWSFHyQJ4cinwuyr
UbOQZuccO59QtpNVpBvG/vjnbI54rg+VGLy+xE0vcrBDlyoptc56IAFSg8zJY5UN
ysbyHCGME//9VZ1zeeZvkMjm8z5Enp6x4zmtnUHmufO7DVMTFUePED6U1u9WIyP5
FFy5EboXMSh97yB8REvbIlY2MgBJWYdnyzLKFMeRpzC8fOXJqBoCX8i3Z5SkMYWJ
1FS/pGr7ec/VX1iHXSYi9hhzTJ6o9mEmOIhaO4UcqAuK8Rk2jbCp1Lx0iDhmJtdT
zjofGDe57ro7nOZf8A/TI5z+6BZq8KYVfZrXtYaPqC4rXOzJAox/yHlz9FmAJYgv
ssOIKXX0ujLwqrMBVatj
=yzy/
-----END PGP SIGNATURE-----
Merge tag 'for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p changes from Eric Van Hensbergen:
"A bunch of updates and cleanup within the transport layer,
particularly with a focus on RDMA"
* tag 'for-linus-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9pnet_rdma: check token type before int conversion
9pnet: trans_fd : allocate struct p9_trans_fd and struct p9_conn together.
9pnet: p9_client->conn field is unused. Remove it.
9P: Get rid of REQ_STATUS_FLSH
9pnet_rdma: add cancelled()
9pnet_rdma: update request status during send
9P: Add cancelled() to the transport functions.
net: Mark function as static in 9p/client.c
9P: Add memory barriers to protect request fields over cb/rpc threads handoff
Several spots in the kernel perform a sequence like:
skb_queue_tail(&sk->s_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);
But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up. So this skb->len access is potentially
to freed up memory.
Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.
And finally, no actual implementation of this callback actually uses
the length argument. And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.
So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.
Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.
Signed-off-by: David S. Miller <davem@davemloft.net>
The channels with 5/10MHz bandwidth are not HT. We have to
reflect this in conf_is_ht() function which returns whether the
particular channel is HT or not.
Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Commit a2f73b6c5d ("cfg80211: move regulatory flags to their own variable")
renamed WIPHY_FLAG_CUSTOM_REGULATORY to REGULATORY_CUSTOM_REG, but missed to
update one comment.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
With single-channel drivers, we need to be able to change a running
chanctx if we want to use chanctx reservation. Not all drivers may be
able to do this, so add a flag that indicates support for it.
Changing a running chanctx can also be used as an optimization in
multi-channel drivers when the context needs to be reserved for future
usage.
Introduce IEEE80211_CHANCTX_RESERVED chanctx mode to mark a channel as
reserved so nobody else can use it (since we know it's going to
change). In the future, we may allow several vifs to use the same
reservation as long as they plan to use the chanctx on the same
future channel.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Move the counting part of the interface combination check from
cfg80211 to mac80211.
This is needed to simplify locking when the driver has to perform a
combination check by itself (eg. with channel-switch).
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some interface types don't require DFS (such as STATION, P2P_CLIENT
etc). In order to centralize these decisions, make
cfg80211_chandef_dfs_required() take the iftype into consideration.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Separate the code that counts the interface types and channels from
the code that check the interface combinations. The new function that
checks for combinations is exported so it can be called by the
drivers.
This is done in preparation for moving the interface combinations
checks out of cfg80211.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow GO operation on a channel marked with IEEE80211_CHAN_GO_CONCURRENT
iff there is an active station interface that is associated to
an AP operating on the same channel in the 2 GHz band or the same UNII band
(in the 5 GHz band). This relaxation is not allowed if the channel is
marked with IEEE80211_CHAN_RADAR.
Note that this is a permissive approach to the FCC definitions,
that require a clear assessment that the device operating the AP is
an authorized master, i.e., with radar detection and DFS capabilities.
It is assumed that such restrictions are enforced by user space.
Furthermore, it is assumed, that if the conditions that allowed for
the operation of the GO on such a channel change, i.e., the station
interface disconnected from the AP, it is the responsibility of user
space to evacuate the GO from the channel.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The FCC are clarifying some soft configuration requirements,
which among other include the following:
1. Indoor operation, where a device can use channels requiring indoor
operation, subject to that it can guarantee indoor operation,
i.e., the device is connected to AC Power or the device is under
the control of a local master that is acting as an AP and is
connected to AC Power.
2. Concurrent GO operation, where devices may instantiate a P2P GO
while they are under the guidance of an authorized master. For example,
on a channel on which a BSS is connected to an authorized master, i.e.,
with DFS and radar detection capability in the UNII band.
See https://apps.fcc.gov/eas/comments/GetPublishedDocument.html?id=327&tn=528122
Add support for advertising Indoor-only and GO-Concurrent channel
properties.
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This will allow the low level driver to make decision based
on the vif such as queues etc...
Since the vif might be NULL, we can't add it to the tracing
functions.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[fix staging rtl8821ae driver]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When dynamically creating interfaces from userspace, e.g. for P2P usage,
such interfaces are usually owned by the process that created them, i.e.
wpa_supplicant. Should wpa_supplicant crash, such interfaces will often
cease operating properly and cause problems on restarting the process.
To avoid this problem, introduce an ownership concept for interfaces. If
an interface is owned by a netlink socket, then it will be destroyed if
the netlink socket is closed for any reason, including if the process it
belongs to crashed. This gives us a race-free way to get rid of any such
interfaces.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pull more networking updates from David Miller:
1) If a VXLAN interface is created with no groups, we can crash on
reception of packets. Fix from Mike Rapoport.
2) Missing includes in CPTS driver, from Alexei Starovoitov.
3) Fix string validations in isdnloop driver, from YOSHIFUJI Hideaki
and Dan Carpenter.
4) Missing irq.h include in bnxw2x, enic, and qlcnic drivers. From
Josh Boyer.
5) AF_PACKET transmit doesn't statistically count TX drops, from Daniel
Borkmann.
6) Byte-Queue-Limit enabled drivers aren't handled properly in
AF_PACKET transmit path, also from Daniel Borkmann.
Same problem exists in pktgen, and Daniel fixed it there too.
7) Fix resource leaks in driver probe error paths of new sxgbe driver,
from Francois Romieu.
8) Truesize of SKBs can gradually get more and more corrupted in NAPI
packet recycling path, fix from Eric Dumazet.
9) Fix uniprocessor netfilter build, from Florian Westphal. In the
longer term we should perhaps try to find a way for ARRAY_SIZE() to
work even with zero sized array elements.
10) Fix crash in netfilter conntrack extensions due to mis-estimation of
required extension space. From Andrey Vagin.
11) Since we commit table rule updates before trying to copy the
counters back to userspace (it's the last action we perform), we
really can't signal the user copy with an error as we are beyond the
point from which we can unwind everything. This causes all kinds of
use after free crashes and other mysterious behavior.
From Thomas Graf.
12) Restore previous behvaior of div/mod by zero in BPF filter
processing. From Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
net: sctp: wake up all assocs if sndbuf policy is per socket
isdnloop: several buffer overflows
netdev: remove potentially harmful checks
pktgen: fix xmit test for BQL enabled devices
net/at91_ether: avoid NULL pointer dereference
tipc: Let tipc_release() return 0
at86rf230: fix MAX_CSMA_RETRIES parameter
mac802154: fix duplicate #include headers
sxgbe: fix duplicate #include headers
net: filter: be more defensive on div/mod by X==0
netfilter: Can't fail and free after table replacement
xen-netback: Trivial format string fix
net: bcmgenet: Remove unnecessary version.h inclusion
net: smc911x: Remove unused local variable
bonding: Inactive slaves should keep inactive flag's value
netfilter: nf_tables: fix wrong format in request_module()
netfilter: nf_tables: set names cannot be larger than 15 bytes
netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len
netfilter: Add {ipt,ip6t}_osf aliases for xt_osf
netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks
...
"len" contains sizeof(nf_ct_ext) and size of extensions. In a worst
case it can contain all extensions. Bellow you can find sizes for all
types of extensions. Their sum is definitely bigger than 256.
nf_ct_ext_types[0]->len = 24
nf_ct_ext_types[1]->len = 32
nf_ct_ext_types[2]->len = 24
nf_ct_ext_types[3]->len = 32
nf_ct_ext_types[4]->len = 152
nf_ct_ext_types[5]->len = 2
nf_ct_ext_types[6]->len = 16
nf_ct_ext_types[7]->len = 8
I have seen "len" up to 280 and my host has crashes w/o this patch.
The right way to fix this problem is reducing the size of the ecache
extension (4) and Florian is going to do this, but these changes will
be quite large to be appropriate for a stable tree.
Fixes: 5b423f6a40 (netfilter: nf_conntrack: fix racy timer handling with reliable)
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull cgroup updates from Tejun Heo:
"A lot updates for cgroup:
- The biggest one is cgroup's conversion to kernfs. cgroup took
after the long abandoned vfs-entangled sysfs implementation and
made it even more convoluted over time. cgroup's internal objects
were fused with vfs objects which also brought in vfs locking and
object lifetime rules. Naturally, there are places where vfs rules
don't fit and nasty hacks, such as credential switching or lock
dance interleaving inode mutex and cgroup_mutex with object serial
number comparison thrown in to decide whether the operation is
actually necessary, needed to be employed.
After conversion to kernfs, internal object lifetime and locking
rules are mostly isolated from vfs interactions allowing shedding
of several nasty hacks and overall simplification. This will also
allow implmentation of operations which may affect multiple cgroups
which weren't possible before as it would have required nesting
i_mutexes.
- Various simplifications including dropping of module support,
easier cgroup name/path handling, simplified cgroup file type
handling and task_cg_lists optimization.
- Prepatory changes for the planned unified hierarchy, which is still
a patchset away from being actually operational. The dummy
hierarchy is updated to serve as the default unified hierarchy.
Controllers which aren't claimed by other hierarchies are
associated with it, which BTW was what the dummy hierarchy was for
anyway.
- Various fixes from Li and others. This pull request includes some
patches to add missing slab.h to various subsystems. This was
triggered xattr.h include removal from cgroup.h. cgroup.h
indirectly got included a lot of files which brought in xattr.h
which brought in slab.h.
There are several merge commits - one to pull in kernfs updates
necessary for converting cgroup (already in upstream through
driver-core), others for interfering changes in the fixes branch"
* 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits)
cgroup: remove useless argument from cgroup_exit()
cgroup: fix spurious lockdep warning in cgroup_exit()
cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c
cgroup: break kernfs active_ref protection in cgroup directory operations
cgroup: fix cgroup_taskset walking order
cgroup: implement CFTYPE_ONLY_ON_DFL
cgroup: make cgrp_dfl_root mountable
cgroup: drop const from @buffer of cftype->write_string()
cgroup: rename cgroup_dummy_root and related names
cgroup: move ->subsys_mask from cgroupfs_root to cgroup
cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding
cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}()
cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root
cgroup: reorganize cgroup bootstrapping
cgroup: relocate setting of CGRP_DEAD
cpuset: use rcu_read_lock() to protect task_cs()
cgroup_freezer: document freezer_fork() subtleties
cgroup: update cgroup_transfer_tasks() to either succeed or fail
cgroup: drop task_lock() protection around task->cgroups
cgroup: update how a newly forked task gets associated with css_set
...
Pull networking updates from David Miller:
"Here is my initial pull request for the networking subsystem during
this merge window:
1) Support for ESN in AH (RFC 4302) from Fan Du.
2) Add full kernel doc for ethtool command structures, from Ben
Hutchings.
3) Add BCM7xxx PHY driver, from Florian Fainelli.
4) Export computed TCP rate information in netlink socket dumps, from
Eric Dumazet.
5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
Dichtel.
6) Convert many drivers to pci_enable_msix_range(), from Alexander
Gordeev.
7) Record SKB timestamps more efficiently, from Eric Dumazet.
8) Switch to microsecond resolution for TCP round trip times, also
from Eric Dumazet.
9) Clean up and fix 6lowpan fragmentation handling by making use of
the existing inet_frag api for it's implementation.
10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.
11) Auto size SKB lengths when composing netlink messages based upon
past message sizes used, from Eric Dumazet.
12) qdisc dumps can take a long time, add a cond_resched(), From Eric
Dumazet.
13) Sanitize netpoll core and drivers wrt. SKB handling semantics.
Get rid of never-used-in-tree netpoll RX handling. From Eric W
Biederman.
14) Support inter-address-family and namespace changing in VTI tunnel
driver(s). From Steffen Klassert.
15) Add Altera TSE driver, from Vince Bridgers.
16) Optimizing csum_replace2() so that it doesn't adjust the checksum
by checksumming the entire header, from Eric Dumazet.
17) Expand BPF internal implementation for faster interpreting, more
direct translations into JIT'd code, and much cleaner uses of BPF
filtering in non-socket ocntexts. From Daniel Borkmann and Alexei
Starovoitov"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
net: Add a test to see if a skb is freeable in irq context
qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
net: ptp: move PTP classifier in its own file
net: sxgbe: make "core_ops" static
net: sxgbe: fix logical vs bitwise operation
net: sxgbe: sxgbe_mdio_register() frees the bus
Call efx_set_channels() before efx->type->dimension_resources()
xen-netback: disable rogue vif in kthread context
net/mlx4: Set proper build dependancy with vxlan
be2net: fix build dependency on VxLAN
mac802154: make csma/cca parameters per-wpan
mac802154: allow only one WPAN to be up at any given time
net: filter: minor: fix kdoc in __sk_run_filter
netlink: don't compare the nul-termination in nla_strcmp
can: c_can: Avoid led toggling for every packet.
can: c_can: Simplify TX interrupt cleanup
can: c_can: Store dlc private
can: c_can: Reduce register access
can: c_can: Make the code readable
...
The current set selection simply choses the first set type that provides
the requested features, which always results in the rbtree being chosen
by virtue of being the first set in the list.
What we actually want to do is choose the implementation that can provide
the requested features and is optimal from either a performance or memory
perspective depending on the characteristics of the elements and the
preferences specified by the user.
The elements are not known when creating a set. Even if we would provide
them for anonymous (literal) sets, we'd still have standalone sets where
the elements are not known in advance. We therefore need an abstract
description of the data charcteristics.
The kernel already knows the size of the key, this patch starts by
introducing a nested set description which so far contains only the maximum
amount of elements. Based on this the set implementations are changed to
provide an estimate of the required amount of memory and the lookup
complexity class.
The set ops have a new callback ->estimate() that is invoked during set
selection. It receives a structure containing the attributes known to the
kernel and is supposed to populate a struct nft_set_estimate with the
complexity class and, in case the size is known, the complete amount of
memory required, or the amount of memory required per element otherwise.
Based on the policy specified by the user (performance/memory, defaulting
to performance) the kernel will then select the best suited implementation.
Even if the set implementation would allow to add more than the specified
maximum amount of elements, they are enforced since new implementations
might not be able to add more than maximum based on which they were
selected.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 9b2777d608 (ieee802154: add TX power control to wpan_phy)
and following erroneously added CSMA and CCA parameters for 802.15.4
devices as PHY parameters, while they are actually MAC parameters and
can differ for any two WPAN instances. Since it is now sensible to have
multiple WPAN devices with differing CSMA/CCA parameters, make these
parameters MAC parameters instead.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
pull request: wireless-next 2014-03-31
Please accept this one last round of general wireless updates for
the 3.15 merge window!
For the Bluetooth bits, Gustavo says:
"Here follow another set of patches to 3.15. This is mostly a bug fix
pull request with the exception of one commit from Marcel which adds
tracking to the current configured LE scan type parameter."
Beyond that, notable bits include some final refactoring of rtl8180
and the addition of the rtl8187se driver, fixes for a number of
problems identified by Dan Carpenter and his static analysis tools,
and a handful of other bits here and there.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the whole rt6_need_strict as static inline into ip6_route.h,
so that it can be reused
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch basically does two things, i) removes the extern keyword
from the include/linux/filter.h file to be more consistent with the
rest of Joe's changes, and ii) moves filter accounting into the filter
core framework.
Filter accounting mainly done through sk_filter_{un,}charge() take
care of the case when sockets are being cloned through sk_clone_lock()
so that removal of the filter on one socket won't result in eviction
as it's still referenced by the other.
These functions actually belong to net/core/filter.c and not
include/net/sock.h as we want to keep all that in a central place.
It's also not in fast-path so uninlining them is fine and even allows
us to get rd of sk_filter_release_rcu()'s EXPORT_SYMBOL and a forward
declaration.
Joint work with Alexei Starovoitov.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/marvell/mvneta.c
The mvneta.c conflict is a case of overlapping changes,
a conversion to devm_ioremap_resource() vs. a conversion
to netdev_alloc_pcpu_stats.
Signed-off-by: David S. Miller <davem@davemloft.net>
addrconf_join_solict and addrconf_join_anycast may cause actions which
need rtnl locked, especially on first address creation.
A new DAD state is introduced which defers processing of the initial
DAD processing into a workqueue.
To get rtnl lock we need to push the code paths which depend on those
calls up to workqueues, specifically addrconf_verify and the DAD
processing.
(v2)
addrconf_dad_failure needs to be queued up to the workqueue, too. This
patch introduces a new DAD state and stop the DAD processing in the
workqueue (this is because of the possible ipv6_del_addr processing
which removes the solicited multicast address from the device).
addrconf_verify_lock is removed, too. After the transition it is not
needed any more.
As we are not processing in bottom half anymore we need to be a bit more
careful about disabling bottom half out when we lock spin_locks which are also
used in bh.
Relevant backtrace:
[ 541.030090] RTNL: assertion failed at net/core/dev.c (4496)
[ 541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.10.33-1-amd64-vyatta #1
[ 541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 541.031146] ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
[ 541.031148] 0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
[ 541.031150] 0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
[ 541.031152] Call Trace:
[ 541.031153] <IRQ> [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17
[ 541.031180] [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180
[ 541.031183] [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0
[ 541.031185] [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0
[ 541.031189] [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90
[ 541.031198] [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
[ 541.031208] [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0
[ 541.031212] [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
[ 541.031216] [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6]
[ 541.031219] [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
[ 541.031223] [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
[ 541.031226] [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
[ 541.031229] [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
[ 541.031233] [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6]
[ 541.031241] [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50
[ 541.031244] [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6]
[ 541.031247] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
[ 541.031255] [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90
[ 541.031258] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
Hunks and backtrace stolen from a patch by Stephen Hemminger.
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To be consistent, lets use msec for this timeout as well.
Note: This define value is a minimum scan time taken from BT Core spec 4.0,
Vol 3, Part C, chapter 9.2.6
Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
With this patch it is possible to control discovery interleaved
timeout value from debugfs.
It is for fine tuning of this timeout.
Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Keep msec instead of jiffies in this define. This is needed by following
patch where we want this timeout to be exposed in debugfs.
Note: Value of this timeout comes from recommendation in BT Core Spec.4.0,
Vol 3, Part C, chapter 13.2.1.
Signed-off-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
If an IPv6 host route with metrics exists, an attempt to add a
new route for the same target with different metrics fails but
rewrites the metrics anyway:
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1s
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
RTNETLINK answers: File exists
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 rto_min lock 1.5s
This is caused by all IPv6 host routes using the metrics in
their inetpeer (or the shared default). This also holds for the
new route created in ip6_route_add() which shares the metrics
with the already existing route and thus ip6_route_add()
rewrites the metrics even if the new route ends up not being
used at all.
Another problem is that old metrics in inetpeer can reappear
unexpectedly for a new route, e.g.
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip route del fec0::1
12sp0:~ # ip route add fec0::1 dev eth0
12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10
12sp0:~ # ip -6 route show
fe80::/64 dev eth0 proto kernel metric 256
fec0::1 dev eth0 metric 1024 hoplimit 10 rto_min lock 1s
Resolve the first problem by moving the setting of metrics down
into fib6_add_rt2node() to the point we are sure we are
inserting the new route into the tree. Second problem is
addressed by introducing new flag DST_METRICS_FORCE_OVERWRITE
which is set for a new host route in ip6_route_add() and makes
ipv6_cow_metrics() always overwrite the metrics in inetpeer
(even if they are not "new"); it is reset after that.
v5: use a flag in _metrics member rather than one in flags
v4: fix a typo making a condition always true (thanks to Hannes
Frederic Sowa)
v3: rewritten based on David Miller's idea to move setting the
metrics (and allocation in non-host case) down to the point we
already know the route is to be inserted. Also rebased to
net-next as it is quite late in the cycle.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The packet hash can be considered a property of the packet, not just
on RX path.
This patch changes name of rxhash and l4_rxhash skbuff fields to be
hash and l4_hash respectively. This includes changing uses of the
field in the code which don't call the access functions.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Especially in crowded environments it can become frequent that we have
to send out whatever pending event there is stored. Since user space
has its own filtering of small RSSI changes sending a 0 value will
essentially force user space to wake up the higher layers (e.g. over
D-Bus) even though the RSSI didn't actually change more than the
threshold value.
This patch adds storing also of the RSSI for pending advertising reports
so that we report an as accurate RSSI as possible when we have to send
out the stored information to user space.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The line was incorrectly split between the variable type and its name.
This patch fixes the issue.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To avoid too many events being sent to user space and to help parsing of
all available remote device data it makes sense for us to wait for the
scan response and send a single merged Device Found event to user space.
This patch adds a few new variables to hci_dev to track the last
received ADV_IND/ADV_SCAN_IND, i.e. those which will cause a SCAN_REQ to
be send in the case of active scanning. When the SCAN_RSP is received
the pending data is passed together with the SCAN_RSP to the
mgmt_device_found function which takes care of merging them into a
single Device Found event.
We also need a bit of extra logic to handle situations where we don't
receive a SCAN_RSP after caching some data. In such a scenario we simply
have to send out the pending data as it is and then operate on the new
report as if there was no pending data.
We also need to send out any pending data when scanning stops as
well as ensure that the storage is empty at the start of a new active
scanning session. These both cases are covered by the update to the
hci_cc_le_set_scan_enable function in this patch.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When we're in peripheral mode (HCI_ADVERTISING flag is set) the most
natural mapping of connect() is to perform directed advertising to the
peer device.
This patch does the necessary changes to enable directed advertising and
keeps the hci_conn state as BT_CONNECT in a similar way as is done for
central or BR/EDR connection initiation.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In preparation for being able to merge ADV_IND/ADV_SCAN_IND and SCAN_RSP
together into a single device found event add a second parameter to the
mgmt_device_found function. For now all callers pass NULL as this
parameters since we don't yet have storing of the last received
advertising report.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Conflicts:
Documentation/devicetree/bindings/net/micrel-ks8851.txt
net/core/netpoll.c
The net/core/netpoll.c conflict is a bug fix in 'net' happening
to code which is completely removed in 'net-next'.
In micrel-ks8851.txt we simply have overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Please pull this batch of wireless updates intended for 3.15!
For the mac80211 bits, Johannes says:
"This has a whole bunch of bugfixes for things that went into -next
previously as well as some other bugfixes I didn't want to rush into
3.14 at this point. The rest of it is some cleanups and a few small
features, the biggest of which is probably Janusz's regulatory DFS CAC
time code."
For the Bluetooth bits, Gustavo says:
"One more pull request to 3.15. This is mostly and bug fix pull request, it
contains several fixes and clean up all over the tree, plus some small new
features."
For the NFC bits, Samuel says:
"This is the NFC pull request for 3.15. With this one we have:
- Support for ISO 15693 a.k.a. NFC vicinity a.k.a. Type 5 tags. ISO
15693 are long range (1 - 2 meters) vicinity tags/cards. The kernel
now supports those through the NFC netlink and digital APIs.
- Support for TI's trf7970a chipset. This chipset relies on the NFC
digital layer and the driver currently supports type 2, 4A and 5 tags.
- Support for NXP's pn544 secure firmare download. The pn544 C3 chipsets
relies on a different firmware download protocal than the C2 one. We
now support both and use the right one depending on the version we
detect at runtime.
- Support for 4A tags from the NFC digital layer.
- A bunch of cleanups and minor fixes from Axel Lin and Thierry Escande."
For the iwlwifi bits, Emmanuel says:
"We were sending a host command while the mutex wasn't held. This
led to hard-to-catch races."
And...
"I have a fix for a "merge damage" which is not really a merge
damage: it enables scheduled scan which has been disabled in
wireless.git. Since you merged wireless.git into wireless-next.git,
this can now be fixed in wireless-next.git.
Besides this, Alex made a workaround for a hardware bug. This fix
allows us to consume less power in S3. Arik and Eliad continue to
work on D0i3 which is a run-time power saving feature. Eliad also
contributes a few bits to the rate scaling logic to which Eyal adds his
own contribution. Avri dives deep in the power code - newer firmware
will allow to enable power save in newer scenarios. Johannes made a few
clean-ups. I have the regular amount of BT Coex boring stuff. I disable
uAPSD since we identified firmware bugs that cause packet loss. One
thing that do stand out is the udev event that we now send when the
FW asserts. I hope it will allow us to debug the FW more easily."
Also included is one last iwlwifi pull for a build breakage fix...
For the Atheros bits, Kalle says:
"Michal now did some optimisations and was able to improve throughput by
100 Mbps on our MIPS based AP135 platform. Chun-Yeow added some
workarounds to be able to better use ad-hoc mode. Ben improved log
messages and added support for MSDU chaining. And, as usual, also some
smaller fixes."
Beyond that...
Andrea Merello continues his rtl8180 refactoring, in preparation for
a long-awaited rtl8187 driver. We get a new driver (rsi) for the
RS9113 chip, from Fariya Fatima. And, of course, we get the usual
round of updates for ath9k, brcmfmac, mwifiex, wil6210, etc. as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This request state is mostly useless, and properly implementing it
for RDMA would require an extra lock to be taken in handle_recv()
and in rdma_cancel() to avoid this race:
handle_recv() rdma_cancel()
. .
. if req->state == SENT
req->state = RCVD .
. req->state = FLSH
So just get rid of it.
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
And move transport-specific code out of net/9p/client.c
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
We need barriers to guarantee this pattern works as intended:
[w] req->rc, 1 [r] req->status, 1
wmb rmb
[w] req->status, 1 [r] req->rc
Where the wmb ensures that rc gets written before status,
and the rmb ensures that if you observe status == 1, rc is the new value.
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
ip_rt_dump do nothing after IPv4 route caches removal, so we can remove it.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When changing one 16bit value by another in IP header, we can adjust
the IP checksum by doing a simple operation described in RFC 1624, as
reminded by David.
csum_partial() is a complex function on x86_64, not really suited for
small number of checksummed bytes.
I spotted csum_partial() being in the top 20 most consuming functions
(more than 1 %) in a GRO workload, which was rather unexpected.
The caller was inet_gro_complete() doing a csum_replace2() when
building the new IP header for the GRO packet.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The LE scan type paramter defines if active scanning or passive scanning
is in use. Track the currently set value so it can be used for decision
making from other pieces in the core.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
While it is true that getnstimeofday() uses about 40 cycles if TSC
is available, it can use 1600 cycles if hpet is the clocksource.
Switch to get_jiffies_64(), as this is more than enough, and
go back to 60 seconds periods.
Fixes: 8c27bd75f0 ("tcp: syncookies: reduce cookie lifetime to 128 seconds")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The passkey_notify and user_confirm functions in mgmt.c were expecting
different endianess for the passkey, leading to a big endian bug and
sparse warning in recently added SMP code. This patch converts both
functions to expect host endianess and do the conversion to little
endian only when assigning to the mgmt event struct.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Add a bit in rx_status.vht_flags to let the low level driver
notify mac80211 about a beamformed packet. Propagate this
to the radiotap header.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
On 2.4Ghz band, the channels overlap since the delta
between different channels is 5Mhz while the width of the
receiver is 20Mhz (at least).
This means that we can hear beacons or probe responses from
adjacent channels. These frames will have a significant
lower RSSI which will feed all kinds of logic with inaccurate
data. An obvious example is the roaming algorithm that will
think our AP is getting weak and will try to move to another
AP.
In order to avoid this, update the signal only if the frame
has been heard on the same channel as the one advertised by
the AP in its DS / HT IEs.
We refrain from updating the values only if the AP is
already in the BSS list so that we will still have a valid
(but inaccurate) value if the AP was heard on an adjacent
channel only.
To achieve this, stop taking the channel from DS / HT IEs
in mac80211. The DS / HT IEs is taken into account to
discard the frame if it was received on a disabled channel.
This can happen due to the same phenomenon: the frame is
sent on channel 12, but heard on channel 11 while channel
12 can be disabled on certain devices. Since this check
is done in cfg80211, stop even checking this in mac80211.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[remove unused rx_freq variable]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Drivers might want to have private data in addition
to all other ieee80211_tx_info.status fields.
The current ieee80211_tx_info.rate_driver_data overlaps
with some of the non-rate data (e.g. ampdu_ack_len), so
it might not be good enough.
Since we already know how much free bytes remained,
simply use this size to define (void *) array.
While on it, change ack_signal type from int to the more
explicit s32 type.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Steffen Klassert says:
====================
One patch to rename a newly introduced struct. The rest is
the rework of the IPsec virtual tunnel interface for ipv6 to
support inter address family tunneling and namespace crossing.
1) Rename the newly introduced struct xfrm_filter to avoid a
conflict with iproute2. From Nicolas Dichtel.
2) Introduce xfrm_input_afinfo to access the address family
dependent tunnel callback functions properly.
3) Add and use a IPsec protocol multiplexer for ipv6.
4) Remove dst_entry caching. vti can lookup multiple different
dst entries, dependent of the configured xfrm states. Therefore
it does not make to cache a dst_entry.
5) Remove caching of flow informations. vti6 does not use the the
tunnel endpoint addresses to do route and xfrm lookups.
6) Update the vti6 to use its own receive hook.
7) Remove the now unused xfrm_tunnel_notifier. This was used from vti
and is replaced by the IPsec protocol multiplexer hooks.
8) Support inter address family tunneling for vti6.
9) Check if the tunnel endpoints of the xfrm state and the vti interface
are matching and return an error otherwise.
10) Enable namespace crossing for vti devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next,
most relevantly they are:
* cleanup to remove double semicolon from stephen hemminger.
* calm down sparse warning in xt_ipcomp, from Fan Du.
* nf_ct_labels support for nf_tables, from Florian Westphal.
* new macros to simplify rcu dereferences in the scope of nfnetlink
and nf_tables, from Patrick McHardy.
* Accept queue and drop (including reason for drop) to verdict
parsing in nf_tables, also from Patrick.
* Remove unused random seed initialization in nfnetlink_log, from
Florian Westphal.
* Allow to attach user-specific information to nf_tables rules, useful
to attach user comments to rule, from me.
* Return errors in ipset according to the manpage documentation, from
Jozsef Kadlecsik.
* Fix coccinelle warnings related to incorrect bool type usage for ipset,
from Fengguang Wu.
* Add hash:ip,mark set type to ipset, from Vytas Dauksa.
* Fix message for each spotted by ipset for each netns that is created,
from Ilia Mirkin.
* Add forceadd option to ipset, which evicts a random entry from the set
if it becomes full, from Josh Hunt.
* Minor IPVS cleanups and fixes from Andi Kleen and Tingwei Liu.
* Improve conntrack scalability by removing a central spinlock, original
work from Eric Dumazet. Jesper Dangaard Brouer took them over to address
remaining issues. Several patches to prepare this change come in first
place.
* Rework nft_hash to resolve bugs (leaking chain, missing rcu synchronization
on element removal, etc. from Patrick McHardy.
* Restore context in the rule deletion path, as we now release rule objects
synchronously, from Patrick McHardy. This gets back event notification for
anonymous sets.
* Fix NAT family validation in nft_nat, also from Patrick.
* Improve scalability of xt_connlimit by using an array of spinlocks and
by introducing a rb-tree of hashtables for faster lookup of accounted
objects per network. This patch was preceded by several patches and
refactorizations to accomodate this change including the use of kmem_cache,
from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the NFC pull request for 3.15. With this one we have:
- Support for ISO 15693 a.k.a. NFC vicinity a.k.a. Type 5 tags. ISO
15693 are long range (1 - 2 meters) vicinity tags/cards. The kernel
now supports those through the NFC netlink and digital APIs.
- Support for TI's trf7970a chipset. This chipset relies on the NFC
digital layer and the driver currently supports type 2, 4A and 5 tags.
- Support for NXP's pn544 secure firmare download. The pn544 C3 chipsets
relies on a different firmware download protocal than the C2 one. We
now support both and use the right one depending on the version we
detect at runtime.
- Support for 4A tags from the NFC digital layer.
- A bunch of cleanups and minor fixes from Axel Lin and Thierry Escande.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTI1oQAAoJEIqAPN1PVmxKmxgP/iiwhQwsehiUZ6oXYjAc06WT
e+vmJwHuDgbEnr5QDSIdpeaKuxYtCErbFcG4SD5+thTQNEeiRsmTLYdkwZ0FCDF+
WKU3F59EdUZu4F0GyUXvt8lUC3D96bJSIHMCJdg1YDJtyaJcuITeK7lW325CK4hF
4ik0+HXcv9mt/Dq4k6xgqdgCOSz7o9dJSbzw5NTCN1I+G6CNgYrvYqSmvsKut2C7
DQvFu8KhbUkSlXgY61YIA2qpf1hfATkfgLdSsjGJfYdKvWLUAyKbEwoG8dkFA38Y
yqc49SJ6DJ4FGmSt4Qm2VtRa8HLqwjA30jBEeb7Zog80j7zEBS+th0NJxqDWp/82
eaeNdfrA5d17CHiR9ODuGOFt3VJZeN83u+JCli3FKGF727A/2Z0UI8UgnB5ZfcbZ
+5IxV6mQp/h8H4AaQzGQ88x2xibhnrW/XYmTpoxj7CcT3AdbGw/IerZXf+afXBce
iyJZ+Ktr6d5zFucCbm0QyZDSwAhfYDIYGiM6AIM3oum9n3K31dA0Qxy6saMiiK8c
ArYY6raQbRCaOBVCI8JI6PoQFvw5MNug40v7Q/G88dElte9mSLsGslVVhCNBpAVB
Atdwa+I3gwgsea1oqGJo0rSEaWv6PsOeQcFG8boDoQDUvgapzk17bTCz4k2DlXO5
U0Q/GcJnFEdcDhqVKqP5
=xikL
-----END PGP SIGNATURE-----
Merge tag 'nfc-next-3.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz <sameo@linux.intel.com> says:
"NFC: 3.15: First pull request
This is the NFC pull request for 3.15. With this one we have:
- Support for ISO 15693 a.k.a. NFC vicinity a.k.a. Type 5 tags. ISO
15693 are long range (1 - 2 meters) vicinity tags/cards. The kernel
now supports those through the NFC netlink and digital APIs.
- Support for TI's trf7970a chipset. This chipset relies on the NFC
digital layer and the driver currently supports type 2, 4A and 5 tags.
- Support for NXP's pn544 secure firmare download. The pn544 C3 chipsets
relies on a different firmware download protocal than the C2 one. We
now support both and use the right one depending on the version we
detect at runtime.
- Support for 4A tags from the NFC digital layer.
- A bunch of cleanups and minor fixes from Axel Lin and Thierry Escande."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Conflicts:
drivers/net/usb/r8152.c
drivers/net/xen-netback/netback.c
Both the r8152 and netback conflicts were simple overlapping
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Fragmentation and reassembly information for 6lowpan is independent from
the 802.15.4 stack and used only by the 6lowpan reassembly process. Move
the ieee802154_frag_info struct to a private are, it needn't be in the
802.15.4 skb control block.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change all internal uses of ieee802154_addr_sa to ieee802154_addr,
except for those instances that communicate directly with userspace.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the operations on 802.15.4 header structs introduced in a previous
patch to create and parse all headers in the mac802154 stack. This patch
reduces code duplication between different parts of the mac802154 stack
that needed information from headers, and also fixes a few bugs that
seem to have gone unnoticed until now:
* 802.15.4 dgram sockets would return a slightly incorrect value for
the SIOCINQ ioctl
* mac802154 would not drop frames with the "security enabled" bit set,
even though it does not support security, in violation of the
standard
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides a set of structures to represent 802.15.4 MAC
headers, and a set of operations to push/pull/peek these structs from
skbs. We cannot simply pointer-cast the skb MAC header pointer to these
structs, because 802.15.4 headers are wildly variable - depending on the
first three bytes, virtually all other fields of the header may be
present or not, and be present with different lengths.
The new header creation/parsing routines also support 802.15.4 security
headers, which are currently not supported by the mac802154
implementation of the protocol.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable sparse warnings about endianness, replace the remaining fields
regarding network operations without explicit endianness annotations
with such that are annotated, and propagate this through the entire
stack.
Uses of ieee802154_addr_sa are not changed yet, this patch is only
concerned with all other fields (such as address filters, operation
parameters and the likes).
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a replacement ieee802154_addr struct with proper endianness on
fields. Short address fields are stored as __le16 as on the network,
extended (EUI64) addresses are __le64 as opposed to the u8[8] format
used previously. This disconnect with the netdev address, which is
stored as big-endian u8[8], is intentional.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The struct as currently defined uses host byte order for some fields,
and most big endian/EUI display byte order for other fields. Inside the
stack, endianness should ideally match network byte order where possible
to minimize the number of byteswaps done in critical paths, but this
patch does not address this; it is only preparatory.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This was used from vti and is replaced by the IPsec protocol
multiplexer hooks. It is now unused, so remove it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This patch adds an IPsec protocol multiplexer for ipv6. With
this it is possible to add alternative protocol handlers, as
needed for IPsec virtual tunnel interfaces.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
IPv6 can be build as a module, so we need mechanism to access
the address family dependent callback functions properly.
Therefore we introduce xfrm_input_afinfo, similar to that
what we have for the address family dependent part of
policies and states.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
We leak an active timer, the hotcpu notifier and all allocated
resources when we exit a namespace. Fix this by introducing a
flow_cache_fini() function where we release the resources before
we exit.
Fixes: ca925cf153 ("flowcache: Make flow cache name space aware")
Reported-by: Jakub Kicinski <moorray3@wp.pl>
Tested-by: Jakub Kicinski <moorray3@wp.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not legal to create multiple kmem_cache having the same name.
flowcache can use a single kmem_cache, no need for a per netns
one.
Fixes: ca925cf153 ("flowcache: Make flow cache name space aware")
Reported-by: Jakub Kicinski <moorray3@wp.pl>
Tested-by: Jakub Kicinski <moorray3@wp.pl>
Tested-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the latest draft specification from
the NFC-V committee, ISO/IEC 15693 tags will be
referred to as "Type 5" tags and not "Type V"
tags anymore. Make the code reflect the new
terminology.
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We do not need to switch the net_ns if the target net_ns the same
as the current one, so here we add a pre-check of net_ns to avoid
this as David suggested.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case the pairable option has been disabled, the pairing procedure
does not create keys for bonding. This means that these generated keys
should not be stored persistently.
For LTK and CSRK this is important to tell userspace to not store these
new keys. They will be available for the lifetime of the device, but
after the next power cycle they should not be used anymore.
Inform userspace to actually store the keys persistently only if both
sides request bonding.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The connection signature resolving key (CSRK) is used for attribute
protocol signed write procedures. This change generates a new local
key during pairing and requests the peer key as well.
Newly generated key and received key will be provided to userspace
using the New Signature Resolving Key management event.
The Master CSRK can be used for verification of remote signed write
PDUs and the Slave CSRK can be used for sending signed write PDUs
to the remote device.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
In order to fix set destruction notifications and get rid of unnecessary
members in private data structures, pass the context to expressions'
destructor functions again.
In order to do so, replace various members in the nft_rule_trans structure
by the full context.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nf_conntrack_lock is a monolithic lock and suffers from huge contention
on current generation servers (8 or more core/threads).
Perf locking congestion is clear on base kernel:
- 72.56% ksoftirqd/6 [kernel.kallsyms] [k] _raw_spin_lock_bh
- _raw_spin_lock_bh
+ 25.33% init_conntrack
+ 24.86% nf_ct_delete_from_lists
+ 24.62% __nf_conntrack_confirm
+ 24.38% destroy_conntrack
+ 0.70% tcp_packet
+ 2.21% ksoftirqd/6 [kernel.kallsyms] [k] fib_table_lookup
+ 1.15% ksoftirqd/6 [kernel.kallsyms] [k] __slab_free
+ 0.77% ksoftirqd/6 [kernel.kallsyms] [k] inet_getpeer
+ 0.70% ksoftirqd/6 [nf_conntrack] [k] nf_ct_delete
+ 0.55% ksoftirqd/6 [ip_tables] [k] ipt_do_table
This patch change conntrack locking and provides a huge performance
improvement. SYN-flood attack tested on a 24-core E5-2695v2(ES) with
10Gbit/s ixgbe (with tool trafgen):
Base kernel: 810.405 new conntrack/sec
After patch: 2.233.876 new conntrack/sec
Notice other floods attack (SYN+ACK or ACK) can easily be deflected using:
# iptables -A INPUT -m state --state INVALID -j DROP
# sysctl -w net/netfilter/nf_conntrack_tcp_loose=0
Use an array of hashed spinlocks to protect insertions/deletions of
conntracks into the hash table. 1024 spinlocks seem to give good
results, at minimal cost (4KB memory). Due to lockdep max depth,
1024 becomes 8 if CONFIG_LOCKDEP=y
The hash resize is a bit tricky, because we need to take all locks in
the array. A seqcount_t is used to synchronize the hash table users
with the resizing process.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Netfilter expectations are protected with the same lock as conntrack
entries (nf_conntrack_lock). This patch split out expectations locking
to use it's own lock (nf_conntrack_expect_lock).
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
One spinlock per cpu to protect dying/unconfirmed/template special lists.
(These lists are now per cpu, a bit like the untracked ct)
Add a @cpu field to nf_conn, to make sure we hold the appropriate
spinlock at removal time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Changes while reading through the netfilter code.
Added hint about how conntrack nf_conn refcnt is accessed.
And renamed repl_hash to reply_hash for readability
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
iproute2 already defines a structure with that name, let's use another one to
avoid any conflict.
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This header is used by bluetooth and ieee802154 branch. This patch
move this header to the include/net directory to avoid a use of a
relative path in include.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original documentation was very unclear.
The code fix is presumably related to the formerly unclear
documentation: SOCK_TIMESTAMPING_RX_SOFTWARE has no effect on
__sock_recv_timestamp's behavior, so calling __sock_recv_ts_and_drops
from sock_recv_ts_and_drops if only SOCK_TIMESTAMPING_RX_SOFTWARE is
set is pointless. This should have no user-observable effect.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Can be invoked from non-BH context.
Based upon a patch by Eric Dumazet.
Fixes: f19c29e3e3 ("tcp: snmp stats for Fast Open, SYN rtx, and data pkts")
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e688a60480 ("net: introduce DST_NOPEER dst flag") introduced
DST_NOPEER because because of crashes in ipv6_select_ident called from
udp6_ufo_fragment.
Since commit 916e4cf46d ("ipv6: reuse ip6_frag_id from
ip6_ufo_append_data") we don't call ipv6_select_ident any more from
ip6_ufo_append_data, thus this flag lost its purpose and can be removed.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/wireless/ath/ath9k/recv.c
drivers/net/wireless/mwifiex/pcie.c
net/ipv6/sit.c
The SIT driver conflict consists of a bug fix being done by hand
in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
was created (netdev_alloc_pcpu_stats()) which takes care of this.
The two wireless conflicts were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch drops the current way of 6lowpan fragmentation on receiving
side and replace it with a implementation which use the inet_frag api.
The old fragmentation handling has some race conditions and isn't
rfc4944 compatible. Also adding support to match fragments on
destination address, source address, tag value and datagram_size
which is missing in the current implementation.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds necessary ieee802154 6lowpan namespace to provide the
inet_frag information. This is a initial support for handling 6lowpan
fragmentation with the inet_frag api.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a 6lowpan fragmentation struct into cb of skb which
is necessary to hold fragmentation information.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The stop_scan_complete function was used as an intermediate step before
doing the actual connection creation. Since we're using hci_request
there's no reason to have this extra function around, i.e. we can simply
put both HCI commands into the same request.
The single task that the intermediate function had, i.e. indicating
discovery as stopped is now taken care of by a new
HCI_LE_SCAN_INTERRUPTED flag which allows us to do the discovery state
update when the stop scan command completes.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
LE connection attempts do not have a controller side timeout in the same
way as BR/EDR has (in form of the page timeout). Since we always do
scanning before initiating connections the attempts are always expected
to succeed in some reasonable time.
This patch adds a timer which forces a cancellation of the connection
attempt within 20 seconds if it has not been successful by then. This
way we e.g. ensure that mgmt_pair_device times out eventually and gives
an error response.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds defines for the initiator filter policy parameter values
of the HCI_LE_Create_Connection command. They will be used in a
subsequent patch to check whether we should have a timeout for the
connection attempt or not.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For SMP we need the local and remote addresses (and their types) that
were used to establish the connection. These may be different from the
Identity Addresses or even the current RPA. To guarantee that we have
this information available and it is correct track these values
separately from the very beginning of the connection.
For outgoing connections we set the values as soon as we get a
successful command status for HCI_LE_Create_Connection (for which the
patch adds a command status handler function) and for incoming
connections as soon as we get a LE Connection Complete HCI event.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The random numbers in Bluetooth Low Energy are 64-bit numbers and should
also be little endian since the HCI specification is little endian.
Change the whole Low Energy pairing to use __le64 instead of a byte
array.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
If some of the cleanup commands caused by mgmt_set_powered(off) never
complete we should still force the adapter to be powered down. This is
rather easy to do since hdev->power_off is already a delayed work
struct. This patch schedules this delayed work if at least one HCI
command was sent by the cleanup procedure.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The current LE white list entries require storing in the HCI controller
structure. So provide a storage and access functions for it. In addition
export the current list via debugfs.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Add the definitions for clearing the LE white list, adding entries to
the LE white list and removing entries from the LE white list.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The hci_blacklist_clear function is not used outside of hci_core.c and
can be made static.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Avoid the following sparse __CHECK_ENDIAN__ warnings:
include/net/addrconf.h:318:25: warning: restricted __be64 degrades to integer
include/net/addrconf.h:318:70: warning: restricted __be64 degrades to integer
include/net/addrconf.h:330:25: warning: restricted __be64 degrades to integer
include/net/addrconf.h:330:70: warning: restricted __be64 degrades to integer
include/net/addrconf.h:347:25: warning: restricted __be64 degrades to integer
include/net/addrconf.h:348:26: warning: restricted __be64 degrades to integer
include/net/addrconf.h:349:18: warning: restricted __be64 degrades to integer
The warnings are false but they make it harder to spot real
bugs.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
This is the rework of the IPsec virtual tunnel interface
for ipv4 to support inter address family tunneling and
namespace crossing. The only change to the last RFC version
is a compile fix for an odd configuration where CONFIG_XFRM
is set but CONFIG_INET is not set.
1) Add and use a IPsec protocol multiplexer.
2) Add xfrm_tunnel_skb_cb to the skb common buffer
to store a receive callback there.
3) Make vti work with i_key set by not including the i_key
when comupting the hash for the tunnel lookup in case of
vti tunnels.
4) Update ip_vti to use it's own receive hook.
5) Remove xfrm_tunnel_notifier, this is replaced by the IPsec
protocol multiplexer.
6) We need to be protocol family indepenent, so use the on xfrm_lookup
returned dst_entry instead of the ipv4 rtable in vti_tunnel_xmit().
7) Add support for inter address family tunneling.
8) Check if the tunnel endpoints of the xfrm state and the vti interface
are matching and return an error otherwise.
8) Enable namespace crossing tor vti devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
1) Build fix for ip_vti when NET_IP_TUNNEL is not set.
We need this set to have ip_tunnel_get_stats64()
available.
2) Fix a NULL pointer dereference on sub policy usage.
We try to access a xfrm_state from the wrong array.
3) Take xfrm_state_lock in xfrm_migrate_state_find(),
we need it to traverse through the state lists.
4) Clone states properly on migration, otherwise we crash
when we migrate a state with aead algorithm attached.
5) Fix unlink race when between thread context and timer
when policies are deleted.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of places needing the local Identity Address are starting to
grow so it's better to have a single place for the logic of determining
it. This patch adds a convenience function for getting the Identity
Address and updates the two current places needing this to use it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To know the real source address for incoming connections (needed e.g.
for SMP) we should store the own_address_type parameter that was used
for the last HCI_LE_Write_Advertising_Parameters command. This patch
adds a proper command complete handler for the command and stores the
address type in a new adv_addr_type variable in the hci_dev struct.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This allows us to store user comment strings, but it could be also
used to store any kind of information that the user application needs
to link to the rule.
Scratch 8 bits for the new ulen field that indicates the length the
user data area. 4 bits from the handle (so it's 42 bits long, according
to Patrick, it would last 139 years with 1000 new rules per second)
and 4 bits from dlen (so the expression data area is 4K, which seems
sufficient by now even considering the compatibility layer).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
This patches creates the public hci_req_add_le_passive_scan helper so
it can be re-used outside hci_core.c in the next patch.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We should only accept connection parameters from identity addresses
(public or random static). Thus, we should check the address type
in hci_conn_params_add().
Additionally, since the IRK is removed during unpair, we should also
remove the connection parameters from that device.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduces the LE auto connection options: HCI_AUTO_CONN_
ALWAYS and HCI_AUTO_CONN_LINK_LOSS. Their working mechanism are
described as follows:
The HCI_AUTO_CONN_ALWAYS option configures the kernel to always re-
establish the connection, no matter the reason the connection was
terminated. This feature is required by some LE profiles such as
HID over GATT, Health Thermometer and Blood Pressure. These profiles
require the host autonomously connect to the device as soon as it
enters in connectable mode (start advertising) so the device is able
to delivery notifications or indications.
The BT_AUTO_CONN_LINK_LOSS option configures the kernel to re-
establish the connection in case the connection was terminated due
to a link loss. This feature is required by the majority of LE
profiles such as Proximity, Find Me, Cycling Speed and Cadence and
Time.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduces the LE auto connection infrastructure which
will be used to implement the LE auto connection options.
In summary, the auto connection mechanism works as follows: Once the
first pending LE connection is created, the background scanning is
started. When the target device is found in range, the kernel
autonomously starts the connection attempt. If connection is
established successfully, that pending LE connection is deleted and
the background is stopped.
To achieve that, this patch introduces the hci_update_background_scan()
which controls the background scanning state. This function starts or
stops the background scanning based on the hdev->pend_le_conns list. If
there is no pending LE connection, the background scanning is stopped.
Otherwise, we start the background scanning.
Then, every time a pending LE connection is added we call hci_update_
background_scan() so the background scanning is started (in case it is
not already running). Likewise, every time a pending LE connection is
deleted we call hci_update_background_scan() so the background scanning
is stopped (in case this was the last pending LE connection) or it is
started again (in case we have more pending LE connections). Finally,
we also call hci_update_background_scan() in hci_le_conn_failed() so
the background scan is restarted in case the connection establishment
fails. This way the background scanning keeps running until all pending
LE connection are established.
At this point, resolvable addresses are not support by this
infrastructure. The proper support is added in upcoming patches.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduces the hdev->pend_le_conn list which holds the
device addresses the kernel should autonomously connect. It also
introduces some helper functions to manipulate the list.
The list and helper functions will be used by the next patch which
implements the LE auto connection infrastructure.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
hci_connect() is a very simple and useless wrapper of hci_connect_acl
and hci_connect_le functions. Addtionally, all places where hci_connect
is called the link type value is passed explicitly. This way, we can
safely delete hci_connect, declare hci_connect_acl and hci_connect_le
in hci_core.h and call them directly.
No functionality is changed by this patch.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Some LE controllers don't support scanning and creating a connection
at the same time. So we should always stop scanning in order to
establish the connection.
Since we may prematurely stop the discovery procedure in favor of
the connection establishment, we should also cancel hdev->le_scan_
disable delayed work and set the discovery state to DISCOVERY_STOPPED.
This change does a small improvement since it is not mandatory the
user stops scanning before connecting anymore. Moreover, this change
is required by upcoming LE auto connection mechanism in order to work
properly with controllers that don't support background scanning and
connection establishment at the same time.
In future, we might want to do a small optimization by checking if
controller is able to scan and connect at the same time. For now,
we want the simplest approach so we always stop scanning (even if
the controller is able to carry out both operations).
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds the "hci_" prefix to le_conn_failed() helper and
declares it in hci_core.h so it can be reused in hci_event.c.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves stop LE scanning duplicate code to one single
place and reuses it. This will avoid more duplicate code in
upcoming patches.
Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Upcoming congestion controls for TCP require usec resolution for RTT
estimations. Millisecond resolution is simply not enough these days.
FQ/pacing in DC environments also require this change for finer control
and removal of bimodal behavior due to the current hack in
tcp_update_pacing_rate() for 'small rtt'
TCP_CONG_RTT_STAMP is no longer needed.
As Julian Anastasov pointed out, we need to keep user compatibility :
tcp_metrics used to export RTT and RTTVAR in msec resolution,
so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
to use the new attributes if provided by the kernel.
In this example ss command displays a srtt of 32 usecs (10Gbit link)
lpk51:~# ./ss -i dst lpk52
Netid State Recv-Q Send-Q Local Address:Port Peer
Address:Port
tcp ESTAB 0 1 10.246.11.51:42959
10.246.11.52:64614
cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448
cwnd:10 send
3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559
Updated iproute2 ip command displays :
lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source
10.246.11.51
Old binary displays :
lpk51:~# ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source
10.246.11.51
With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
This option has the same semantic as IP_PMTUDISC_OMIT for IPv4 which
got recently introduced. It doesn't honor the path mtu discovered by the
host but in contrary to IPV6_PMTUDISC_INTERFACE allows the generation of
fragments if the packet size exceeds the MTU of the outgoing interface
MTU.
Fixes: 93b36cf342 ("ipv6: support IPV6_PMTU_INTERFACE on sockets")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_PMTUDISC_INTERFACE has a design error: because it does not allow the
generation of fragments if the interface mtu is exceeded, it is very
hard to make use of this option in already deployed name server software
for which I introduced this option.
This patch adds yet another new IP_MTU_DISCOVER option to not honor any
path mtu information and not accepting new icmp notifications destined for
the socket this option is enabled on. But we allow outgoing fragmentation
in case the packet size exceeds the outgoing interface mtu.
As such this new option can be used as a drop-in replacement for
IP_PMTUDISC_DONT, which is currently in use by most name server software
making the adoption of this option very smooth and easy.
The original advantage of IP_PMTUDISC_INTERFACE is still maintained:
ignoring incoming path MTU updates and not honoring discovered path MTUs
in the output path.
Fixes: 482fc6094a ("ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send Channel Availability Check time as a parameter
of start_radar_detection() callback.
Get CAC time from regulatory database.
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Introduce DFS CAC time as a regd param, configured per REG_RULE and
set per channel in cfg80211. DFS CAC time is close connected with
regulatory database configuration. Instead of using hardcoded values,
get DFS CAC time form regulatory database. Pass DFS CAC time to user
mode (mainly for iw reg get, iw list, iw info). Allow setting DFS CAC
time via CRDA. Add support for internal regulatory database.
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
[rewrap commit log]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This was used from vti and is replaced by the IPsec protocol
multiplexer hooks. It is now unused, so remove it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
IPsec vti_rcv needs to remind the tunnel pointer to
check it later at the vti_rcv_cb callback. So add
this pointer to the IPsec common buffer, initialize
it and check it to avoid transport state matching of
a tunneled packet.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This patch add an IPsec protocol multiplexer. With this
it is possible to add alternative protocol handlers as
needed for IPsec virtual tunnel interfaces.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Steffen Klassert says:
====================
1) Introduce skb_to_sgvec_nomark function to add further data to the sg list
without calling sg_unmark_end first. Needed to add extended sequence
number informations. From Fan Du.
2) Add IPsec extended sequence numbers support to the Authentication Header
protocol for ipv4 and ipv6. From Fan Du.
3) Make the IPsec flowcache namespace aware, from Fan Du.
4) Avoid creating temporary SA for every packet when no key manager is
registered. From Horia Geanta.
5) Support filtering of SA dumps to show only the SAs that match a
given filter. From Nicolas Dichtel.
6) Remove caching of xfrm_policy_sk_bundles. The cached socket policy bundles
are never used, instead we create a new cache entry whenever xfrm_lookup()
is called on a socket policy. Most protocols cache the used routes to the
socket, so this caching is not needed.
7) Fix a forgotten SADB_X_EXT_FILTER length check in pfkey, from Nicolas
Dichtel.
8) Cleanup error handling of xfrm_state_clone.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Once mgmt_set_powered(off) starts doing disconnections we'll need to
care about any disconnections in mgmt.c and not just those with the
MGMT_CONNECTED flag set. Therefore, move the check into mgmt.c from
hci_event.c.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We'll soon need to make decisions on toggling the HCI_ADVERTISING flag
based on pending mgmt_set_powered commands. Therefore, move the handling
from hci_event.c into mgmt.c.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a convenience function to return the number of
connections in the conn_hash list. This will be useful once we update
the power off procedure to disconnect any open connections.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The RPA needs to be stored to know which is the current one. Otherwise
it is impossible to ensure that always the correct RPA can be programmed
into the controller when it is needed.
Current code checks if the address in the controller is a RPA, but that
can potentially lead to using a RPA that can not be resolved with the
IRK that has been distributed.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When running active scanning during LE discovery, do not reveal the own
identity to the peer devices. In case LE privacy has been enabled, then
a resolvable private address is used. If the LE privacy option is off,
then use an unresolvable private address.
The public address or static random address is never used in active
scanning anymore. This ensures that scan request are send using a
random address.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Now that the identity address type is always looked up for all
successful connections, the hdev->own_addr_type variable has become
completely unnecessary. Simply remove it.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a convenience function for updating the local random
address which is needed before advertising, scanning and initiating LE
connections.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds a timer for updating the local RPA periodically. The
default timeout is set to 15 minutes.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds basic mgmt defines for enabling privacy. This includes a
new setting flag as well as the Set Privacy command.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This code adds a HCI_PRIVACY flag to track whether Privacy support is
enabled (meaning we have a local IRK) and makes sure the IRK is
distributed during SMP key distribution in case this flag is set.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fixes two bugs in fastopen :
1) The tcp_sendmsg(..., @size) argument was ignored.
Code was relying on user not fooling the kernel with iovec mismatches
2) When MTU is about 64KB, tcp_send_syn_data() attempts order-5
allocations, which are likely to fail when memory gets fragmented.
Fixes: 783237e8da ("net-tcp: Fast Open client - sending SYN-data")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Tested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the ieee80211_iface_limit and the ieee80211_iface_combination
structures to docbook. Reformat the examples of combinations
slightly, so it looks a bit better on docbook.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
docbook (or one of its friends) gets confused with semi-colons in the
argument descriptions, causing it to think that the semi-colon is
marking a new section in the description of addr_mask in wiphy struct.
Prevent this by using hyphens instead of semi-colons in the mask
example.
Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For testing purposes it is useful to provide an option to change the
advertising channel map. So add a debugfs option to allow this.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Bug introduced by commit 7d442fab0a ("ipv4: Cache dst in tunnels").
Because sit code does not call ip_tunnel_init(), the dst_cache was not
initialized.
CC: Tom Herbert <therbert@google.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to solve races with sched_scan_stop, it is necessary
for the driver to be able to return an error to propagate that
to cfg80211 so it doesn't send an event.
Reviewed-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Due to userspace assumptions, the sched_scan_stop operation must
be synchronous, i.e. once it returns a new scheduled scan must be
able to start immediately. Document this in the API.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We loose a lot of information of the original state if we
clone it with xfrm_state_clone(). In particular, there is
no crypto algorithm attached if the original state uses
an aead algorithm. This patch add the missing information
to the clone state.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
While framing the TDLS Setup Confirmation frame, the driver needs to
know if the TDLS peer is VHT/HT/WMM capable and thus shall construct
the VHT/HT operation / WMM parameter elements accordingly. Supplicant
determines if the TDLS peer is VHT/HT/WMM capable based on the
presence of the respective IEs in the received TDLS Setup Response frame.
The host driver should not need to parse the received TDLS Response
frame and thus, should be able to rely on the supplicant to indicate
the capability of the peer through additional flags while transmitting
the TDLS Setup Confirmation frame through tdls_mgmt operations.
Signed-off-by: Sunil Dutt Undekari <usdutt@qti.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For Bluetooth controllers with LE support, track the value of the
currently configured random address. It is important to know what
the current random address is to avoid unneeded attempts to set
a new address. This will become important when introducing the
LE privacy support in the future.
In addition expose the current configured random address via
debugfs for debugging purposes.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
The own_address_type debugfs option does not providing enough
flexibity for interacting with the upcoming LE privacy support.
What really is needed is an option to force using the static address
compared to the public address. The new force_static_address debugfs
option does exactly that. In addition it is also only available when
the controller does actually have a public address. For single mode
LE only controllers this option will not be available.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
In case we decide in udp6_sendmsg to send the packet down the ipv4
udp_sendmsg path because the destination is either of family AF_INET or
the destination is an ipv4 mapped ipv6 address, we don't honor the
maybe specified ipv4 mapped ipv6 address in IPV6_PKTINFO.
We simply can check for this option in ip_cmsg_send because no calls to
ipv6 module functions are needed to do so.
Reported-by: Gert Doering <gert@space.net>
Cc: Tore Anderson <tore@fud.no>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the necessary helper function to send the New IRK mgmt
event and makes sure that the function is called at when SMP key
distribution has completed. The event is sent before the New LTK event
so user space knows which remote device to associate with the keys.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch moves the SMP Long Term Key notification over mgmt from the
hci_add_ltk function to smp.c when both sides have completed their key
distribution. This way we are also able to update the identity address
into the mgmt_new_ltk event.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
It's simpler (one less if-statement) to just evaluate the appropriate
value for store_hint in the mgmt_new_ltk function than to pass a boolean
parameter to the function. Furthermore, this simplifies moving the mgmt
event emission out from hci_add_ltk in subsequent patches.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The SMP code will need to postpone the mgmt event emission for the IRK
and LTKs. To avoid extra lookups at the end of the key distribution
simply return the added value from the add_ltk and add_irk functions.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fix spelling typo in Documentation/DocBook.
It is because .html and .xml files are generated by make htmldocs,
I have to fix a typo within the source files.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This also adds NF_CT_LABELS_MAX_SIZE so it can be re-used
as BUILD_BUG_ON in nft_ct.
At this time, nft doesn't yet support writing to the label area;
when this changes the label->words handling needs to be moved
out of xt_connlabel.c into nf_conntrack_labels.c.
Also removes a useless run-time check: words cannot grow beyond
4 (32 bit) or 2 (64bit) since xt_connlabel enforces a maximum of
128 labels.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We currently cache socket policy bundles at xfrm_policy_sk_bundles.
These cached bundles are never used. Instead we create and cache
a new one whenever xfrm_lookup() is called on a socket policy.
Most protocols cache the used routes to the socket, so let's
remove the unused caching of socket policy bundles in xfrm.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Conflicts:
drivers/net/bonding/bond_3ad.h
drivers/net/bonding/bond_main.c
Two minor conflicts in bonding, both of which were overlapping
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
The only place this is used outside rtnetlink.c is veth. So provide
wrapper function for this usage.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we initiate pairing through mgmt_pair_device the code has so far
been waiting for a successful HCI Encrypt Change event in order to
respond to the mgmt command. However, putting privacy into the play we
actually want the key distribution to be complete before replying so
that we can include the Identity Address in the mgmt response.
This patch updates the various hci_conn callbacks for LE in mgmt.c to
only respond in the case of failure, and adds a new mgmt_smp_complete
function that the SMP code will call once key distribution has been
completed.
Since the smp_chan_destroy function that's used to indicate completion
and clean up the SMP context can be called from various places,
including outside of smp.c, the easiest way to track failure vs success
is a new flag that we set once key distribution has been successfully
completed.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When we receive a remote identity address during SMP key distribution we
should ensure that any associated L2CAP channel instances get their
address information correspondingly updated (so that e.g. doing
getpeername on associated sockets returns the correct address).
This patch adds a new L2CAP core function l2cap_conn_update_id_addr()
which is used to iterate through all L2CAP channels associated with a
connection and update their address information.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are many situations where we need to check if an LE address is an
RPA and if so try to look up the IRK for it. To simplify such cases this
patch adds a convenience function for the job.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When mgmt_unpair_device is called we should also remove any associated
IRKs. This patch adds a hci_remove_irk convenience function and ensures
that it's called when mgmt_unpair_device is called.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are many functions that never fail but still declare an integer
return value for no reason. This patch converts these functions to use a
void return value to avoid any confusion of whether they can fail or not.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When removing Long Term Keys we should also be checking that the given
address type (public vs random) matches. This patch updates the
hci_remove_ltk function to take an extra parameter and uses it for
address type matching.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch implements the Load IRKs command for the management
interface. The command is used to load the kernel with the initial set
of IRKs. It also sets a HCI_RPA_RESOLVING flag to indicate that we can
start requesting devices to distribute their IRK to us.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When implementing support for Resolvable Private Addresses (RPAs) we'll
need to in several places be able to identify such addresses. This patch
adds a simple convenience function to do the identification of the
address type.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds the initial IRK storage and management functions to the
HCI core. This includes storing a list of IRKs per HCI device and the
ability to add, remove and lookup entries in that list.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Previously the crypto context has only been available for LE SMP
sessions, but now that we'll need to perform operations also during
discovery it makes sense to have this context part of the hci_dev
struct. Later, the context can be removed from the SMP context.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Introduce new netlink attributes for SET_PHY_ATTRS:
* CSMA minimal backoff exponent
* CSMA maximal backoff exponent
* CSMA retry limit
* frame retransmission limit
The CSMA attributes shall correspond to minBE, maxBE and maxCSMABackoffs of
802.15.4, respectively. The frame retransmission shall correspond to
maxFrameRetries of 802.15.4, unless given as -1: then the old behaviour
of the stack shall apply. For RF2xy, the old behaviour is to not do
channel sensing at all and simply send *right now*, which is not
intended behaviour for most applications and actually prohibited for
some channel/page combinations.
For all values except frame retransmission limit, the defaults of
802.15.4 apply. Frame retransmission limits are set to -1 to indicate
backward-compatible behaviour.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since three of the four clear channel assesment modes make use of energy
detection, provide an API to set the energy detection threshold.
Driver support for this is available in at86rf230 for the RF212 chips.
Since for these chips the minimal energy detection threshold depends on
page and channel used, add a field to struct at86rf230_local that stores
the minimal threshold. Actual ED thresholds are configured as offsets
from this value.
For RF212, setting the ED threshold will not work before a channel/page
has been set due to the dependency of energy detection in the chip and
the actual channel/page selected.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The standard describes four modes of clear channel assesment: "energy
above threshold", "carrier found", and the logical and/or of these two.
Support for CCA mode setting is included in the at86rf230 driver,
predicated for RF212 chips.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Listen-before-talk is an alternative to CSMA in uncoordinated networks
and prescribed by european regulations if one wants to have a device
with radio duty cycles above 10% (or less in some bands). Add a phy
property to enable/disable LBT in the phy, including support in the
at86rf230 driver for RF212 chips.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the current u8 transmit_power in wpan_phy with s8 transmit_power.
The u8 field contained the actual tx power and a tolerance field,
which no physical radio every used. Adjust sysfs entries to keep
compatibility with userspace, give tolerances of +-1dB statically there.
This patch only adds support for this in the at86rf230 driver and the
RF212 chip. Configuration calculation for RF212 is also somewhat basic,
but does the job - the RF212 datasheet gives a large table with
suggested values for combinations of TX power and page/channel, if this
does not work well, we might have to copy the whole table.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The goal of this patch is to allow userland to dump only a part of SA by
specifying a filter during the dump.
The kernel is in charge to filter SA, this avoids to generate useless netlink
traffic (it save also some cpu cycles). This is particularly useful when there
is a big number of SA set on the system.
Note that I removed the union in struct xfrm_state_walk to fix a problem on arm.
struct netlink_callback->args is defined as a array of 6 long and the first long
is used in xfrm code to flag the cb as initialized. Hence, we must have:
sizeof(struct xfrm_state_walk) <= sizeof(long) * 5.
With the union, it was false on arm (sizeof(struct xfrm_state_walk) was
sizeof(long) * 7), due to the padding.
In fact, whatever the arch is, this union seems useless, there will be always
padding after it. Removing it will not increase the size of this struct (and
reduce it on arm).
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Implementation of (a)rwnd calculation might lead to severe performance issues
and associations completely stalling. These problems are described and solution
is proposed which improves lksctp's robustness in congestion state.
1) Sudden drop of a_rwnd and incomplete window recovery afterwards
Data accounted in sctp_assoc_rwnd_decrease takes only payload size (sctp data),
but size of sk_buff, which is blamed against receiver buffer, is not accounted
in rwnd. Theoretically, this should not be the problem as actual size of buffer
is double the amount requested on the socket (SO_RECVBUF). Problem here is
that this will have bad scaling for data which is less then sizeof sk_buff.
E.g. in 4G (LTE) networks, link interfacing radio side will have a large portion
of traffic of this size (less then 100B).
An example of sudden drop and incomplete window recovery is given below. Node B
exhibits problematic behavior. Node A initiates association and B is configured
to advertise rwnd of 10000. A sends messages of size 43B (size of typical sctp
message in 4G (LTE) network). On B data is left in buffer by not reading socket
in userspace.
Lets examine when we will hit pressure state and declare rwnd to be 0 for
scenario with above stated parameters (rwnd == 10000, chunk size == 43, each
chunk is sent in separate sctp packet)
Logic is implemented in sctp_assoc_rwnd_decrease:
socket_buffer (see below) is maximum size which can be held in socket buffer
(sk_rcvbuf). current_alloced is amount of data currently allocated (rx_count)
A simple expression is given for which it will be examined after how many
packets for above stated parameters we enter pressure state:
We start by condition which has to be met in order to enter pressure state:
socket_buffer < currently_alloced;
currently_alloced is represented as size of sctp packets received so far and not
yet delivered to userspace. x is the number of chunks/packets (since there is no
bundling, and each chunk is delivered in separate packet, we can observe each
chunk also as sctp packet, and what is important here, having its own sk_buff):
socket_buffer < x*each_sctp_packet;
each_sctp_packet is sctp chunk size + sizeof(struct sk_buff). socket_buffer is
twice the amount of initially requested size of socket buffer, which is in case
of sctp, twice the a_rwnd requested:
2*rwnd < x*(payload+sizeof(struc sk_buff));
sizeof(struct sk_buff) is 190 (3.13.0-rc4+). Above is stated that rwnd is 10000
and each payload size is 43
20000 < x(43+190);
x > 20000/233;
x ~> 84;
After ~84 messages, pressure state is entered and 0 rwnd is advertised while
received 84*43B ~= 3612B sctp data. This is why external observer notices sudden
drop from 6474 to 0, as it will be now shown in example:
IP A.34340 > B.12345: sctp (1) [INIT] [init tag: 1875509148] [rwnd: 81920] [OS: 10] [MIS: 65535] [init TSN: 1096057017]
IP B.12345 > A.34340: sctp (1) [INIT ACK] [init tag: 3198966556] [rwnd: 10000] [OS: 10] [MIS: 10] [init TSN: 902132839]
IP A.34340 > B.12345: sctp (1) [COOKIE ECHO]
IP B.12345 > A.34340: sctp (1) [COOKIE ACK]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057017] [SID: 0] [SSEQ 0] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057017] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057018] [SID: 0] [SSEQ 1] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057018] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057019] [SID: 0] [SSEQ 2] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057019] [a_rwnd 9914] [#gap acks 0] [#dup tsns 0]
<...>
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057098] [SID: 0] [SSEQ 81] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057098] [a_rwnd 6517] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057099] [SID: 0] [SSEQ 82] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057099] [a_rwnd 6474] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057100] [SID: 0] [SSEQ 83] [PPID 0x18]
--> Sudden drop
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
At this point, rwnd_press stores current rwnd value so it can be later restored
in sctp_assoc_rwnd_increase. This however doesn't happen as condition to start
slowly increasing rwnd until rwnd_press is returned to rwnd is never met. This
condition is not met since rwnd, after it hit 0, must first reach rwnd_press by
adding amount which is read from userspace. Let us observe values in above
example. Initial a_rwnd is 10000, pressure was hit when rwnd was ~6500 and the
amount of actual sctp data currently waiting to be delivered to userspace
is ~3500. When userspace starts to read, sctp_assoc_rwnd_increase will be blamed
only for sctp data, which is ~3500. Condition is never met, and when userspace
reads all data, rwnd stays on 3569.
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 1505] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 3010] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057101] [SID: 0] [SSEQ 84] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057101] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]
--> At this point userspace read everything, rwnd recovered only to 3569
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057102] [SID: 0] [SSEQ 85] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057102] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]
Reproduction is straight forward, it is enough for sender to send packets of
size less then sizeof(struct sk_buff) and receiver keeping them in its buffers.
2) Minute size window for associations sharing the same socket buffer
In case multiple associations share the same socket, and same socket buffer
(sctp.rcvbuf_policy == 0), different scenarios exist in which congestion on one
of the associations can permanently drop rwnd of other association(s).
Situation will be typically observed as one association suddenly having rwnd
dropped to size of last packet received and never recovering beyond that point.
Different scenarios will lead to it, but all have in common that one of the
associations (let it be association from 1)) nearly depleted socket buffer, and
the other association blames socket buffer just for the amount enough to start
the pressure. This association will enter pressure state, set rwnd_press and
announce 0 rwnd.
When data is read by userspace, similar situation as in 1) will occur, rwnd will
increase just for the size read by userspace but rwnd_press will be high enough
so that association doesn't have enough credit to reach rwnd_press and restore
to previous state. This case is special case of 1), being worse as there is, in
the worst case, only one packet in buffer for which size rwnd will be increased.
Consequence is association which has very low maximum rwnd ('minute size', in
our case down to 43B - size of packet which caused pressure) and as such
unusable.
Scenario happened in the field and labs frequently after congestion state (link
breaks, different probabilities of packet drop, packet reordering) and with
scenario 1) preceding. Here is given a deterministic scenario for reproduction:
>From node A establish two associations on the same socket, with rcvbuf_policy
being set to share one common buffer (sctp.rcvbuf_policy == 0). On association 1
repeat scenario from 1), that is, bring it down to 0 and restore up. Observe
scenario 1). Use small payload size (here we use 43). Once rwnd is 'recovered',
bring it down close to 0, as in just one more packet would close it. This has as
a consequence that association number 2 is able to receive (at least) one more
packet which will bring it in pressure state. E.g. if association 2 had rwnd of
10000, packet received was 43, and we enter at this point into pressure,
rwnd_press will have 9957. Once payload is delivered to userspace, rwnd will
increase for 43, but conditions to restore rwnd to original state, just as in
1), will never be satisfied.
--> Association 1, between A.y and B.12345
IP A.55915 > B.12345: sctp (1) [INIT] [init tag: 836880897] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 4032536569]
IP B.12345 > A.55915: sctp (1) [INIT ACK] [init tag: 2873310749] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3799315613]
IP A.55915 > B.12345: sctp (1) [COOKIE ECHO]
IP B.12345 > A.55915: sctp (1) [COOKIE ACK]
--> Association 2, between A.z and B.12346
IP A.55915 > B.12346: sctp (1) [INIT] [init tag: 534798321] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 2099285173]
IP B.12346 > A.55915: sctp (1) [INIT ACK] [init tag: 516668823] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3676403240]
IP A.55915 > B.12346: sctp (1) [COOKIE ECHO]
IP B.12346 > A.55915: sctp (1) [COOKIE ACK]
--> Deplete socket buffer by sending messages of size 43B over association 1
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315613] [SID: 0] [SSEQ 0] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315613] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
<...>
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315696] [a_rwnd 6388] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315697] [SID: 0] [SSEQ 84] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315697] [a_rwnd 6345] [#gap acks 0] [#dup tsns 0]
--> Sudden drop on 1
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315698] [SID: 0] [SSEQ 85] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315698] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
--> Here userspace read, rwnd 'recovered' to 3698, now deplete again using
association 1 so there is place in buffer for only one more packet
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315799] [SID: 0] [SSEQ 186] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315799] [a_rwnd 86] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315800] [SID: 0] [SSEQ 187] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]
--> Socket buffer is almost depleted, but there is space for one more packet,
send them over association 2, size 43B
IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403240] [SID: 0] [SSEQ 0] [PPID 0x18]
IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403240] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
--> Immediate drop
IP A.60995 > B.12346: sctp (1) [SACK] [cum ack 387491510] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]
--> Read everything from the socket, both association recover up to maximum rwnd
they are capable of reaching, note that association 1 recovered up to 3698,
and association 2 recovered only to 43
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 1548] [#gap acks 0] [#dup tsns 0]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 3053] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315801] [SID: 0] [SSEQ 188] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315801] [a_rwnd 3698] [#gap acks 0] [#dup tsns 0]
IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403241] [SID: 0] [SSEQ 1] [PPID 0x18]
IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403241] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]
A careful reader might wonder why it is necessary to reproduce 1) prior
reproduction of 2). It is simply easier to observe when to send packet over
association 2 which will push association into the pressure state.
Proposed solution:
Both problems share the same root cause, and that is improper scaling of socket
buffer with rwnd. Solution in which sizeof(sk_buff) is taken into concern while
calculating rwnd is not possible due to fact that there is no linear
relationship between amount of data blamed in increase/decrease with IP packet
in which payload arrived. Even in case such solution would be followed,
complexity of the code would increase. Due to nature of current rwnd handling,
slow increase (in sctp_assoc_rwnd_increase) of rwnd after pressure state is
entered is rationale, but it gives false representation to the sender of current
buffer space. Furthermore, it implements additional congestion control mechanism
which is defined on implementation, and not on standard basis.
Proposed solution simplifies whole algorithm having on mind definition from rfc:
o Receiver Window (rwnd): This gives the sender an indication of the space
available in the receiver's inbound buffer.
Core of the proposed solution is given with these lines:
sctp_assoc_rwnd_update:
if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
else
asoc->rwnd = 0;
We advertise to sender (half of) actual space we have. Half is in the braces
depending whether you would like to observe size of socket buffer as SO_RECVBUF
or twice the amount, i.e. size is the one visible from userspace, that is,
from kernelspace.
In this way sender is given with good approximation of our buffer space,
regardless of the buffer policy - we always advertise what we have. Proposed
solution fixes described problems and removes necessity for rwnd restoration
algorithm. Finally, as proposed solution is simplification, some lines of code,
along with some bytes in struct sctp_association are saved.
Version 2 of the patch addressed comments from Vlad. Name of the function is set
to be more descriptive, and two parts of code are changed, in one removing the
superfluous call to sctp_assoc_rwnd_update since call would not result in update
of rwnd, and the other being reordering of the code in a way that call to
sctp_assoc_rwnd_update updates rwnd. Version 3 corrected change introduced in v2
in a way that existing function is not reordered/copied in line, but it is
correctly called. Thanks Vlad for suggesting.
Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for ATS request and response handling for type 4A tag
activation.
Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Add the header definitions required by upcoming
patches that add support for ISO/IEC 15693.
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The tty driver api design prefers no-fail writes if the driver
write_room() method has previously indicated space is available
to accept writes. Since this is trivially possible for the
RFCOMM tty driver, do so.
Introduce rfcomm_dlc_send_noerror(), which queues but does not
schedule the krfcomm thread if the dlc is not yet connected
(and thus does not error based on the connection state).
The mtu size test is also unnecessary since the caller already
chunks the written data into mtu size.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-By: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Only one session/channel combination may be in use at any one
time. However, the failure does not occur until the tty is
opened (in rfcomm_dlc_open()).
Because these settings are actually bound at rfcomm device
creation (via RFCOMMCREATEDEV ioctl), validate and fail before
creating the rfcomm tty device.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-By: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When RFCOMM_RELEASE_ONHUP is set, the rfcomm tty driver 'takes over'
the initial rfcomm_dev reference created by the RFCOMMCREATEDEV ioctl.
The assumption is that the rfcomm tty driver will release the
rfcomm_dev reference when the tty is freed (in rfcomm_tty_cleanup()).
However, if the tty is never opened, the 'take over' never occurs,
so when RFCOMMRELEASEDEV ioctl is called, the reference is not
released.
Track the state of the reference 'take over' so that the release
is guaranteed by either the RFCOMMRELEASEDEV ioctl or the rfcomm tty
driver.
Note that the synchronous hangup in rfcomm_release_dev() ensures
that rfcomm_tty_install() cannot race with the RFCOMMRELEASEDEV ioctl.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-By: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
No logic prevents an rfcomm_dev from being released multiple
times. For example, if the rfcomm_dev ref count is large due
to pending tx, then multiple RFCOMMRELEASEDEV ioctls may
mistakenly release the rfcomm_dev too many times. Note that
concurrent ioctls are not required to create this condition.
Introduce RFCOMM_DEV_RELEASED status bit which guarantees the
rfcomm_dev can only be released once.
NB: Since the flags are exported to userspace, introduce the status
field to track state for which userspace should not be aware.
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-By: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Now that the LE L2CAP Connection Oriented Channel support has undergone a
decent amount of testing we can make it officially supported. This patch
removes the enable_lecoc module parameter which was previously needed to
enable support for LE L2CAP CoC.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>