Commit Graph

38824 Commits

Author SHA1 Message Date
Nikolay Aleksandrov
c9fd56b34e netpoll: warn on netpoll_send_udp users who haven't disabled irqs
Make sure we catch future netpoll_send_udp users who use it without
disabling irqs and also as a hint for poll_controller users.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 16:24:54 -07:00
Daniel Borkmann
c1b3b19923 net: sched: don't break line in tc_classify loop notification
Just some minor noise follow-up to address some stylistic issues of
commit 3b3ae88026 ("net: sched: consolidate tc_classify{,_compat}").
Accidentally v1 instead of v2 of that commit got applied, so this
patch adds the relative diff.

Suggested-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 14:01:15 -07:00
David S. Miller
8b72ca67fe Included changes:
- code beautification
 - remove obsolete 'deleted' attribute for bat-gw node
 - increase internal version number
 - prevent potential access to netdev object after deregistration
 - set needed_head/tail_room for batman virtual interface
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJV4CMzAAoJEOb/4TMchkvfsjYP/1IckH6GA4vMROXZ6+Y89ZVl
 Ii6uQck+0FUCkzld+sn8RYPDClqmNTB9PcYAVrrKDMFxOZVCdIvK+iLh1s2PQK4o
 eiab+V93VNK3DTlouvszsW0mdWCaBXiKR01PpEcr528d4LxFnGbiCKDGkcr/A0HM
 yMW9ZwmpohFk0/CRZPZlDJjTlTAUsvp+YXqA2vBUdBT1zN8Lb3BB7jK/rXXreLJ9
 nnNNn+AEw6Fy/FXTDc3QKMrCdyFlN+bHnyXds6345p5asy0JCVh0Sk2zuySi2MDh
 LT4cujIP5LtAhp8i52R5ZjOvNOQfGAyRxvBd20gZM/lKfMcIK3KEaMe7Se87NySR
 IhQ1nwE5S4AWnlmU+xfmX4vReojb9sCLhBN8oMC5bm3uqa4kLR9e4CBaPdRSpZex
 P3Z9DPaV1GAe3yyGFz9pRUTXzAuVMtYKjkObDkOVq+oKqBpG5UMf5gwEkycILvzb
 wwqHbjE7QY1izE8TgDe1y+GKPR8Qivwm7X8gwuHqmkKCNCmrqNkGjPkp1W4RGkzg
 BGJmS9FENxxILvt68rXZYqQ2JUg3Z+k/Jiw5gmQGPjwH62JnEMkKD4NMphDf6hxm
 85bT9J8mF2AJJKDvEYwDu+KFEhX0LyhcN38SGq5QYRS+9nf0JoZzWujHJ7oXHJiw
 o7mwRayOYPLzeLLWpjWy
 =KKXM
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
Included changes:
- code beautification
- remove obsolete 'deleted' attribute for bat-gw node
- increase internal version number
- prevent potential access to netdev object after deregistration
- set needed_head/tail_room for batman virtual interface
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:43:33 -07:00
Valentin Rothberg
9723e6abc7 openswitch: fix typo CONFIG_NF_CONNTRACK_LABEL
Fix typo in conntrack.c
s/CONFIG_NF_CONNTRACK_LABEL/CONFIG_NF_CONNTRACK_LABELS/

Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:39:53 -07:00
David Ahern
192132b9a0 net: Add support for VRFs to inetpeer cache
inetpeer caches based on address only, so duplicate IP addresses within
a namespace return the same cached entry. Enhance the ipv4 address key
to contain both the IPv4 address and VRF device index.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:32:36 -07:00
David Ahern
d39d14ffa2 net: Add helper function to compare inetpeer addresses
tcp_metrics and inetpeer both have functions to compare inetpeer
addresses. Consolidate into 1 version.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:32:36 -07:00
David Ahern
3abef286cf net: Add set,get helpers for inetpeer addresses
Use inetpeer set,get helpers in tcp_metrics rather than peeking into
the inetpeer_addr struct.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:32:36 -07:00
David Ahern
72afa352d6 net: Introduce ipv4_addr_hash and use it for tcp metrics
Refactors a common line into helper function.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:32:35 -07:00
Philip Downey
df2cf4a78e IGMP: Inhibit reports for local multicast groups
The range of addresses between 224.0.0.0 and 224.0.0.255 inclusive, is
reserved for the use of routing protocols and other low-level topology
discovery or maintenance protocols, such as gateway discovery and
group membership reporting.  Multicast routers should not forward any
multicast datagram with destination addresses in this range,
regardless of its TTL.

Currently, IGMP reports are generated for this reserved range of
addresses even though a router will ignore this information since it
has no purpose.  However, the presence of reserved group addresses in
an IGMP membership report uses up network bandwidth and can also
obscure addresses of interest when inspecting membership reports using
packet inspection or debug messages.

Although the RFCs for the various version of IGMP (e.g.RFC 3376 for
v3) do not specify that the reserved addresses be excluded from
membership reports, it should do no harm in doing so.  In particular
there should be no adverse effect in any IGMP snooping functionality
since 224.0.0.x is specifically excluded as per RFC 4541 (IGMP and MLD
Snooping Switches Considerations) section 2.1.2. Data Forwarding
Rules:

    2) Packets with a destination IP (DIP) address in the 224.0.0.X
       range which are not IGMP must be forwarded on all ports.

IGMP reports for local multicast groups can now be optionally
inhibited by means of a system control variable (by setting the value
to zero) e.g.:
    echo 0 > /proc/sys/net/ipv4/igmp_link_local_mcast_reports

To retain backwards compatibility the previous behaviour is retained
by default on system boot or reverted by setting the value back to
non-zero e.g.:
    echo 1 >  /proc/sys/net/ipv4/igmp_link_local_mcast_reports

Signed-off-by: Philip Downey <pdowney@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 13:28:47 -07:00
David S. Miller
0d36938bb8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-08-27 21:45:31 -07:00
Linus Torvalds
e001d7084a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Some straggler bug fixes here:

   1) Netlink_sendmsg() doesn't check iterator type properly in mmap
      case, from Ken-ichirou MATSUZAWA.

   2) Don't sleep in atomic context in bcmgenet driver, from Florian
      Fainelli.

   3) The pfkey_broadcast() code patch can't actually ever use anything
      other than GFP_ATOMIC.  And the cases that right now pass
      GFP_KERNEL or similar will currently trigger an RCU splat.  Just
      use GFP_ATOMIC unconditionally.  From David Ahern.

   4) Fix FD bit timings handling in pcan_usb driver, from Marc
      Kleine-Budde.

   5) Cache dst leaked in ip6_gre tunnel removal, fix from Huaibin Wang.

   6) Traversal into drivers/net/ethernet/renesas should be triggered by
      CONFIG_NET_VENDOR_RENESAS, not a particular driver's config
      option.  From Kazuya Mizuguchi.

   7) Fix regression in handling of igmp_join errors in vxlan, from
      Marcelo Ricardo Leitner.

   8) Make phy_{read,write}_mmd_indirect() properly take the mdio_lock
      mutex when programming the registers.  From Russell King.

   9) Fix non-forced handling in u32_destroy(), from WANG Cong.

  10) Test the EVENT_NO_RUNTIME_PM flag before it is cleared in
      usbnet_stop(), from Eugene Shatokhin.

  11) In sfc driver, don't fetch statistics firmware isn't capable of,
      from Bert Kenward.

  12) Verify ASCONF address parameter location in SCTP, from Xin Long"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  sctp: donot reset the overall_error_count in SHUTDOWN_RECEIVE state
  sctp: asconf's process should verify address parameter is in the beginning
  sfc: only use vadaptor stats if firmware is capable
  net: phy: fixed: propagate fixed link values to struct
  usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared
  drivers: net: xgene: fix: Oops in linkwatch_fire_event
  cls_u32: complete the check for non-forced case in u32_destroy()
  net: fec: use reinit_completion() in mdio accessor functions
  net: phy: add locking to phy_read_mmd_indirect()/phy_write_mmd_indirect()
  vxlan: re-ignore EADDRINUSE from igmp_join
  net: compile renesas directory if NET_VENDOR_RENESAS is configured
  ip6_gre: release cached dst on tunnel removal
  phylib: Make PHYs children of their MDIO bus, not the bus' parent.
  can: pcan_usb: don't provide CAN FD bittimings by non-FD adapters
  net: Fix RCU splat in af_key
  net: bcmgenet: fix uncleaned dma flags
  net: bcmgenet: Avoid sleeping in bcmgenet_timeout
  netlink: mmap: fix tx type check
2015-08-27 17:52:38 -07:00
Phil Sutter
3e692f2153 net: sched: simplify attach_one_default_qdisc()
Now that noqueue qdisc can be attached just like any other qdisc, no
special treatment is necessary anymore when attaching it as default
qdisc.

This change has the added benefit that 'tc qdisc show' prints noqueue
instead of nothing for devices defaulting to noqueue.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 17:14:30 -07:00
Phil Sutter
d66d6c3152 net: sched: register noqueue qdisc
This way users can attach noqueue just like any other qdisc using tc
without having to mess with tx_queue_len first.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 17:14:30 -07:00
Phil Sutter
db4094bca7 net: sched: ignore tx_queue_len when assigning default qdisc
Since alloc_netdev_mqs() sets IFF_NO_QUEUE for drivers not initializing
tx_queue_len, it is safe to assume that if tx_queue_len is zero,
dev->priv flags always contains IFF_NO_QUEUE.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 17:14:29 -07:00
Phil Sutter
f84bb1eac0 net: fix IFF_NO_QUEUE for drivers using alloc_netdev
Printing a warning in alloc_netdev_mqs() if tx_queue_len is zero and
IFF_NO_QUEUE not set is not appropriate since drivers may use one of the
alloc_netdev* macros instead of alloc_etherdev*, thereby not
intentionally leaving tx_queue_len uninitialized. Instead check here if
tx_queue_len is zero and set IFF_NO_QUEUE, so the value of tx_queue_len
can be ignored in net/sched_generic.c.

Fixes: 906470c ("net: warn if drivers set tx_queue_len = 0")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 17:14:29 -07:00
Jean Sacren
69dba9bbc5 sock: fix kernel doc error
The symbol '__sk_reclaim' is not present in the current tree. Apparently
'__sk_reclaim' was meant to be '__sk_mem_reclaim', so fix it with the
right symbol name for the kernel doc.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Hideo Aoki <haoki@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 17:12:41 -07:00
lucien
f648f807f6 sctp: donot reset the overall_error_count in SHUTDOWN_RECEIVE state
Commit f8d9605243 ("sctp: Enforce retransmission limit during shutdown")
fixed a problem with excessive retransmissions in the SHUTDOWN_PENDING by not
resetting the association overall_error_count.  This allowed the association
to better enforce assoc.max_retrans limit.

However, the same issue still exists when the association is in SHUTDOWN_RECEIVED
state.  In this state, HB-ACKs will continue to reset the overall_error_count
for the association would extend the lifetime of association unnecessarily.

This patch solves this by resetting the overall_error_count whenever the current
state is small then SCTP_STATE_SHUTDOWN_PENDING.  As a small side-effect, we
end up also handling SCTP_STATE_SHUTDOWN_ACK_SENT and SCTP_STATE_SHUTDOWN_SENT
states, but they are not really impacted because we disable Heartbeats in those
states.

Fixes: Commit f8d9605243 ("sctp: Enforce retransmission limit during shutdown")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 17:11:44 -07:00
Nikolay Aleksandrov
b22fbf22f8 bridge: fdb: rearrange net_bridge_fdb_entry
While looking into fixing the local entries scalability issue I noticed
that the structure is badly arranged because vlan_id would fall in a
second cache line while keeping rcu which is used only when deleting
in the first, so re-arrange the structure and push rcu to the end so we
can get 16 bytes which can be used for other fields (by pushing rcu
fully in the second 64 byte chunk). With this change all the core
necessary information when doing fdb lookups will be available in a
single cache line.

pahole before (note vlan_id):
struct net_bridge_fdb_entry {
	struct hlist_node          hlist;                /*     0    16 */
	struct net_bridge_port *   dst;                  /*    16     8 */
	struct callback_head       rcu;                  /*    24    16 */
	long unsigned int          updated;              /*    40     8 */
	long unsigned int          used;                 /*    48     8 */
	mac_addr                   addr;                 /*    56     6 */
	unsigned char              is_local:1;           /*    62: 7  1 */
	unsigned char              is_static:1;          /*    62: 6  1 */
	unsigned char              added_by_user:1;      /*    62: 5  1 */
	unsigned char              added_by_external_learn:1; /*    62: 4  1 */

	/* XXX 4 bits hole, try to pack */
	/* XXX 1 byte hole, try to pack */

	/* --- cacheline 1 boundary (64 bytes) --- */
	__u16                      vlan_id;              /*    64     2 */

	/* size: 72, cachelines: 2, members: 11 */
	/* sum members: 65, holes: 1, sum holes: 1 */
	/* bit holes: 1, sum bit holes: 4 bits */
	/* padding: 6 */
	/* last cacheline: 8 bytes */
}

pahole after (note vlan_id):
struct net_bridge_fdb_entry {
	struct hlist_node          hlist;                /*     0    16 */
	struct net_bridge_port *   dst;                  /*    16     8 */
	long unsigned int          updated;              /*    24     8 */
	long unsigned int          used;                 /*    32     8 */
	mac_addr                   addr;                 /*    40     6 */
	__u16                      vlan_id;              /*    46     2 */
	unsigned char              is_local:1;           /*    48: 7  1 */
	unsigned char              is_static:1;          /*    48: 6  1 */
	unsigned char              added_by_user:1;      /*    48: 5  1 */
	unsigned char              added_by_external_learn:1; /*    48: 4  1 */

	/* XXX 4 bits hole, try to pack */
	/* XXX 7 bytes hole, try to pack */

	struct callback_head       rcu;                  /*    56    16 */
	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */

	/* size: 72, cachelines: 2, members: 11 */
	/* sum members: 65, holes: 1, sum holes: 7 */
	/* bit holes: 1, sum bit holes: 4 bits */
	/* last cacheline: 8 bytes */
}

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 16:38:52 -07:00
Joe Stringer
7b85b4dff2 openvswitch: Include ip6_fib.h.
kbuild test robot reports that certain configurations will not
automatically pick up on the "struct rt6_info" definition, so explicitly
include the header for this structure.

Fixes: 7f8a436 "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 16:35:51 -07:00
Jiri Pirko
35d4e17252 net: add netif_is_ovs_master helper with IFF_OPENVSWITCH private flag
Add this helper so code can easily figure out if netdev is openswitch.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 16:28:35 -07:00
Jiri Pirko
0e4ead9d7b net: introduce change upper device notifier change info
Add info that is passed along with NETDEV_CHANGEUPPER event.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 16:28:34 -07:00
Pravin B Shelar
371bd1061d geneve: Consolidate Geneve functionality in single module.
geneve_core module handles send and receive functionality.
This way OVS could use the Geneve API. Now with use of
tunnel meatadata mode OVS can directly use Geneve netdevice.
So there is no need for separate module for Geneve. Following
patch consolidates Geneve protocol processing in single module.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 15:42:48 -07:00
Pravin B Shelar
6b001e682e openvswitch: Use Geneve device.
With help of tunnel metadata mode OVS can directly use
Geneve devices to implement Geneve tunnels.
This patch removes all of the OVS specific Geneve code
and make OVS use a Geneve net_device. Basic geneve vport
is still there to handle compatibility with current
userspace application.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 15:42:47 -07:00
Pravin B Shelar
c29a70d2ca tunnel: introduce udp_tun_rx_dst()
Introduce function udp_tun_rx_dst() to initialize tunnel dst on
receive path.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 15:42:47 -07:00
Toshiaki Makita
d2d427b392 bridge: Add netlink support for vlan_protocol attribute
This enables bridge vlan_protocol to be configured through netlink.

When CONFIG_BRIDGE_VLAN_FILTERING is disabled, kernel behaves the
same way as this feature is not implemented.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 15:35:33 -07:00
Daniel Borkmann
3b3ae88026 net: sched: consolidate tc_classify{,_compat}
For classifiers getting invoked via tc_classify(), we always need an
extra function call into tc_classify_compat(), as both are being
exported as symbols and tc_classify() itself doesn't do much except
handling of reclassifications when tp->classify() returned with
TC_ACT_RECLASSIFY.

CBQ and ATM are the only qdiscs that directly call into tc_classify_compat(),
all others use tc_classify(). When tc actions are being configured
out in the kernel, tc_classify() effectively does nothing besides
delegating.

We could spare this layer and consolidate both functions. pktgen on
single CPU constantly pushing skbs directly into the netif_receive_skb()
path with a dummy classifier on ingress qdisc attached, improves
slightly from 22.3Mpps to 23.1Mpps.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 14:18:48 -07:00
lucien
ce7b4ccc4f sctp: asconf's process should verify address parameter is in the beginning
in sctp_process_asconf(), we get address parameter from the beginning of
the addip params. but we never check if it's really there. if the addr
param is not there, it still can pass sctp_verify_asconf(), then to be
handled by sctp_process_asconf(), it will not be safe.

so add a code in sctp_verify_asconf() to check the address parameter is in
the beginning, or return false to send abort.

note that this can also detect multiple address parameters, and reject it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 13:59:33 -07:00
Joe Stringer
cae3a26275 openvswitch: Allow attaching helpers to ct action
Add support for using conntrack helpers to assist protocol detection.
The new OVS_CT_ATTR_HELPER attribute of the CT action specifies a helper
to be used for this connection. If no helper is specified, then helpers
will be automatically applied as per the sysctl configuration of
net.netfilter.nf_conntrack_helper.

The helper may be specified as part of the conntrack action, eg:
ct(helper=ftp). Initial packets for related connections should be
committed to allow later packets for the flow to be considered
established.

Example ovs-ofctl flows allowing FTP connections from ports 1->2:
in_port=1,tcp,action=ct(helper=ftp,commit),2
in_port=2,tcp,ct_state=-trk,action=ct(recirc)
in_port=2,tcp,ct_state=+trk-new+est,action=1
in_port=2,tcp,ct_state=+trk+rel,action=1

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 11:40:43 -07:00
Joe Stringer
c2ac667358 openvswitch: Allow matching on conntrack label
Allow matching and setting the ct_label field. As with ct_mark, this is
populated by executing the CT action. The label field may be modified by
specifying a label and mask nested under the CT action. It is stored as
metadata attached to the connection. Label modification occurs after
lookup, and will only persist when the conntrack entry is committed by
providing the COMMIT flag to the CT action. Labels are currently fixed
to 128 bits in size.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 11:40:43 -07:00
Joe Stringer
86ca02e774 netfilter: connlabels: Export setting connlabel length
Add functions to change connlabel length into nf_conntrack_labels.c so
they may be reused by other modules like OVS and nftables without
needing to jump through xt_match_check() hoops.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 11:40:43 -07:00
Joe Stringer
55e5713f2b netfilter: Always export nf_connlabels_replace()
The following patches will reuse this code from OVS.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 11:40:43 -07:00
Joe Stringer
182e3042e1 openvswitch: Allow matching on conntrack mark
Allow matching and setting the ct_mark field. As with ct_state and
ct_zone, these fields are populated when the CT action is executed. To
write to this field, a value and mask can be specified as a nested
attribute under the CT action. This data is stored with the conntrack
entry, and is executed after the lookup occurs for the CT action. The
conntrack entry itself must be committed using the COMMIT flag in the CT
action flags for this change to persist.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 11:40:43 -07:00
Joe Stringer
7f8a436eaa openvswitch: Add conntrack action
Expose the kernel connection tracker via OVS. Userspace components can
make use of the CT action to populate the connection state (ct_state)
field for a flow. This state can be subsequently matched.

Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.

When the CT action is executed by itself, it will send the packet
through the connection tracker and populate the ct_state field with one
or more of the connection state flags above. The CT action will always
set the TRACKED bit.

When the COMMIT flag is passed to the conntrack action, this specifies
that information about the connection should be stored. This allows
subsequent packets for the same (or related) connections to be
correlated with this connection. Sending subsequent packets for the
connection through conntrack allows the connection tracker to consider
the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

The CT action may optionally take a zone to track the flow within. This
allows connections with the same 5-tuple to be kept logically separate
from connections in other zones. If the zone is specified, then the
"ct_zone" match field will be subsequently populated with the zone id.

IP fragments are handled by transparently assembling them as part of the
CT action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.

IP frag handling contributed by Andy Zhou.

Based on original design by Justin Pettit.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 11:40:43 -07:00
Joe Stringer
5b49004724 ipv6: Export nf_ct_frag6_gather()
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 11:40:42 -07:00
Joe Stringer
be26b9a88f openvswitch: Move MASKED* macros to datapath.h
This will allow the ovs-conntrack code to reuse these macros.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 11:40:42 -07:00
Joe Stringer
8e2fed1c0c openvswitch: Serialize acts with original netlink len
Previously, we used the kernel-internal netlink actions length to
calculate the size of messages to serialize back to userspace.
However,the sw_flow_actions may not be formatted exactly the same as the
actions on the wire, so store the original actions length when
de-serializing and re-use the original length when serializing.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-27 11:40:42 -07:00
Marek Lindner
ed29266347 batman-adv: turn batadv_neigh_node_get() into local function
commit c214ebe1eb29 ("batman-adv: move neigh_node list add into
batadv_neigh_node_new()") removed external calls to
batadv_neigh_node_get().

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-27 20:15:34 +02:00
Sven Eckelmann
7bca68c784 batman-adv: Add lower layer needed_(head|tail)room to own ones
The maximum of hard_header_len and maximum of all needed_(head|tail)room of
all slave interfaces of a batman-adv device must be used to define the
batman-adv device needed_(head|tail)room. This is required to avoid too
small buffer problems when these slave devices try to send the encapsulated
packet in a tx path without the possibility to resize the skbuff.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-27 20:15:34 +02:00
Antonio Quartulli
a5256f7e74 batman-adv: don't access unregistered net_device object
In batadv_hardif_disable_interface() there is a call to
batadv_softif_destroy_sysfs() which in turns invokes
unregister_netdevice() on the soft_iface.
After this point we cannot rely on the soft_iface object
anymore because it might get free'd by the netdev periodic
routine at any time.

For this reason the netdev_upper_dev_unlink(.., soft_iface) call
is moved before the invocation of batadv_softif_destroy_sysfs() so
that we can be sure that the soft_iface object is still valid.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2015-08-27 20:15:33 +02:00
Simon Wunderlich
07c48eca16 batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-27 20:15:33 +02:00
Simon Wunderlich
1e3b4669e7 batman-adv: fix gateway client style issues
commit 0511575c4d03 ("batman-adv: remove obsolete deleted attribute for
gateway node") incorrectly added an empy line and forgot to remove an
include.

Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-27 20:15:32 +02:00
Marek Lindner
3f32f8a687 batman-adv: rearrange batadv_neigh_node_new() arguments to follow convention
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Acked-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-27 20:15:32 +02:00
Simon Wunderlich
bd3524c14b batman-adv: remove obsolete deleted attribute for gateway node
With rcu, the gateway node deleted attribute is not needed anymore. In
fact, it may delay the free of the gateway node and its referenced
structures. Therefore remove it altogether and simplify purging as well.

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-27 20:15:32 +02:00
Marek Lindner
741aa06bfb batman-adv: move neigh_node list add into batadv_neigh_node_new()
All batadv_neigh_node_* functions expect the neigh_node list item to be part
of the orig_node->neigh_list, therefore the constructor of said list item
should be adding the newly created neigh_node to the respective list.

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Acked-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-27 20:15:31 +02:00
Marek Lindner
39bf7618f0 batman-adv: remove redundant hard_iface assignment
The batadv_neigh_node_new() function already sets the hard_iface pointer.

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Acked-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-27 20:15:31 +02:00
Marek Lindner
f729dc70da batman-adv: move hardif refcount inc to batadv_neigh_node_new()
The batadv_neigh_node cleanup function 'batadv_neigh_node_free_rcu()'
takes care of reducing the hardif refcounter, hence it's only logical
to assume the creating function of that same object
'batadv_neigh_node_new()' takes care of increasing the same refcounter.

Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Acked-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2015-08-27 20:15:30 +02:00
Alexei Starovoitov
1dd34b5ad8 bpf: fix bpf_skb_set_tunnel_key() helper
Make sure to indicate to tunnel driver that key.tun_id is set,
otherwise gre won't recognize the metadata.

Fixes: d3aa45ce6b ("bpf: add helpers to access tunnel metadata")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26 17:38:13 -07:00
Alexei Starovoitov
cff82457c5 net_sched: act_bpf: remove spinlock in fast path
Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing
with extra care taken to free bpf programs after rcu grace period.
Replacement of existing act_bpf (very rare) is done with synchronize_rcu()
and final destruction is done from tc_action_ops->cleanup() callback that is
called from tcf_exts_destroy()->tcf_action_destroy()->__tcf_hash_release() when
bind and refcnt reach zero which is only possible when classifier is destroyed.
Previous two patches fixed the last two classifiers (tcindex and rsvp) to
call tcf_exts_destroy() from rcu callback.

Similar to gact/mirred there is a race between prog->filter and
prog->tcf_action. Meaning that the program being replaced may use
previous default action if it happened to return TC_ACT_UNSPEC.
act_mirred race betwen tcf_action and tcfm_dev is similar.
In all cases the race is harmless.
Long term we may want to improve the situation by replacing the whole
tc_action->priv as single pointer instead of updating inner fields one by one.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26 11:01:45 -07:00
Alexei Starovoitov
9e528d8915 net_sched: convert rsvp to call tcf_exts_destroy from rcu callback
Adjust destroy path of cls_rsvp to call tcf_exts_destroy() after
rcu grace period.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26 11:01:45 -07:00
Alexei Starovoitov
ed7aa879ce net_sched: convert tcindex to call tcf_exts_destroy from rcu callback
Adjust destroy path of cls_tcindex to call tcf_exts_destroy() after
rcu grace period.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-26 11:01:44 -07:00