Commit Graph

42161 Commits

Author SHA1 Message Date
Wu Fengguang
fa54cc70ed rxrpc: fix ptr_ret.cocci warnings
net/rxrpc/rxkad.c:1165:1-3: WARNING: PTR_ERR_OR_ZERO can be used

 Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

CC: David Howells <dhowells@redhat.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:30:21 -07:00
Sowmini Varadhan
9c79440e2c RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one()
The send path needs to be quiesced before resetting callbacks from
rds_tcp_accept_one(), and commit eb19284026 ("RDS:TCP: Synchronize
rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves
this using the c_state and RDS_IN_XMIT bit following the pattern
used by rds_conn_shutdown(). However this leaves the possibility
of a race window as shown in the sequence below
    take t_conn_lock in rds_tcp_conn_connect
    send outgoing syn to peer
    drop t_conn_lock in rds_tcp_conn_connect
    incoming from peer triggers rds_tcp_accept_one, conn is
	marked CONNECTING
    wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads
    call rds_tcp_reset_callbacks
    [.. race-window where incoming syn-ack can cause the conn
	to be marked UP from rds_tcp_state_change ..]
    lock_sock called from rds_tcp_reset_callbacks, and we set
	t_sock to null
As soon as the conn is marked UP in the race-window above, rds_send_xmit()
threads will proceed to rds_tcp_xmit and may encounter a null-pointer
deref on the t_sock.

Given that rds_tcp_state_change() is invoked in softirq context, whereas
rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT
after lock_sock could result in a deadlock with tcp_sendmsg, this
commit fixes the race by using a new c_state, RDS_TCP_RESETTING, which
will prevent a transition to RDS_CONN_UP from rds_tcp_state_change().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:10:15 -07:00
Sowmini Varadhan
0b6f760cff RDS: TCP: Retransmit half-sent datagrams when switching sockets in rds_tcp_reset_callbacks
When we switch a connection's sockets in rds_tcp_rest_callbacks,
any partially sent datagram must be retransmitted on the new
socket so that the receiver can correctly reassmble the RDS
datagram. Use rds_send_reset() which is designed for this purpose.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:10:15 -07:00
Sowmini Varadhan
335b48d980 RDS: TCP: Add/use rds_tcp_reset_callbacks to reset tcp socket safely
When rds_tcp_accept_one() has to replace the existing tcp socket
with a newer tcp socket (duelling-syn resolution), it must lock_sock()
to suppress the rds_tcp_data_recv() path while callbacks are being
changed.  Also, existing RDS datagram reassembly state must be reset,
so that the next datagram on the new socket  does not have corrupted
state. Similarly when resetting the newly accepted socket, appropriate
locks and synchronization is needed.

This commit ensures correct synchronization by invoking
kernel_sock_shutdown to reset a newly accepted sock, and by taking
appropriate lock_sock()s (for old and new sockets) when resetting
existing callbacks.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 15:10:15 -07:00
Eric Dumazet
80e509db54 fq_codel: fix NET_XMIT_CN behavior
My prior attempt to fix the backlogs of parents failed.

If we return NET_XMIT_CN, our parents wont increase their backlog,
so our qdisc_tree_reduce_backlog() should take this into account.

v2: Florian Westphal pointed out that we could drop the packet,
so we need to save qdisc_pkt_len(skb) in a temp variable before
calling fq_codel_drop()

Fixes: 9d18562a22 ("fq_codel: add batch ability to fq_codel_drop()")
Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07 14:49:56 -07:00
WANG Cong
a27758ffaf net_sched: keep backlog updated with qlen
For gso_skb we only update qlen, backlog should be updated too.

Note, it is correct to just update these stats at one layer,
because the gso_skb is cached there.

Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-06 21:14:29 -04:00
Helge Deller
1957598840 soreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPF
Commit 538950a1b7 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
missed to add the compat case for the SO_ATTACH_REUSEPORT_CBPF option.

Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-06 15:21:04 -07:00
WANG Cong
8d5958f424 sch_tbf: update backlog as well
Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:24:04 -04:00
WANG Cong
d7f4f332f0 sch_red: update backlog as well
Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:24:04 -04:00
WANG Cong
6a73b571b6 sch_drr: update backlog as well
Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:24:04 -04:00
WANG Cong
6529d75ad9 sch_prio: update backlog as well
We need to update backlog too when we update qlen.

Joint work with Stas.

Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Tested-by: Stas Nichiporovich <stasn77@gmail.com>
Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:24:04 -04:00
WANG Cong
357cc9b4a8 sch_hfsc: always keep backlog updated
hfsc updates backlog lazily, that is only when we
dump the stats. This is problematic after we begin to
update backlog in qdisc_tree_reduce_backlog().

Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Tested-by: Stas Nichiporovich <stasn77@gmail.com>
Fixes: 2ccccf5fb4 ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03 19:24:04 -04:00
Kangjie Lu
4116def233 rds: fix an infoleak in rds_inc_info_copy
The last field "flags" of object "minfo" is not initialized.
Copying this object out may leak kernel stack data.
Assign 0 to it to avoid leak.

Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-02 21:32:37 -07:00
Kangjie Lu
5d2be1422e tipc: fix an infoleak in tipc_nl_compat_link_dump
link_info.str is a char array of size 60. Memory after the NULL
byte is not initialized. Sending the whole object out can cause
a leak.

Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-02 21:32:37 -07:00
Eric Dumazet
ce25d66ad5 Possible problem with e6afc8ac ("udp: remove headers from UDP packets before queueing")
Paul Moore tracked a regression caused by a recent commit, which
mistakenly assumed that sk_filter() could be avoided if socket
had no current BPF filter.

The intent was to avoid udp_lib_checksum_complete() overhead.

But sk_filter() also checks skb_pfmemalloc() and
security_sock_rcv_skb(), so better call it.

Fixes: e6afc8ace6 ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Paul Moore <paul@paul-moore.com>
Tested-by: Paul Moore <paul@paul-moore.com>
Tested-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: samanthakumar <samanthakumar@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-02 18:29:49 -04:00
David S. Miller
fc14963f24 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Fix incorrect timestamp in nfnetlink_queue introduced when addressing
   y2038 safe timestamp, from Florian Westphal.

2) Get rid of leftover conntrack definition from the previous merge
   window, oneliner from Florian.

3) Make nf_queue handler pernet to resolve race on dereferencing the
   hook state structure with netns removal, from Eric Biederman.

4) Ensure clean exit on unregistered helper ports, from Taehee Yoo.

5) Restore FLOWI_FLAG_KNOWN_NH in nf_dup_ipv6. This got lost while
   generalizing xt_TEE to add packet duplication support in nf_tables,
   from Paolo Abeni.

6) Insufficient netlink NFTA_SET_TABLE attribute check in
   nf_tables_getset(), from Phil Turnbull.

7) Reject helper registration on duplicated ports via modparams.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-01 17:54:19 -07:00
David S. Miller
31843af4dc Three small fixes for the current cycle:
* missing netlink attribute check in hwsim wmediumd (Martin)
  * fast xmit structure alignment fix (Felix)
  * mesh path flush/synchronisation fix (Bob)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJXTrQgAAoJEGt7eEactAAd/cwP/jZDwNFqDNKoMjLrjOJsIvKv
 YIjysh56ZBik6h5PtwPakUN5zwV9+PnkRVQ16ApEuCGiitZEFbLUdldm/82OQBBd
 sOjjISqPrlLgsJwF6v3Yn16R3yffkXyi86j6XkzRcyqdUgluU0Uu6Hi96IwdlA/X
 eC8jHKbW+eb+46pbFU4dHOEqVM4cxg3+BAG74OARxhE9Lp81pLlbzB5dI6UldY5I
 k2kKLbsFJo3WZUS8R1t/xzv2fKAYgjJ3g9yMRlVC3HrUhvzYiQ1wZJ65LwhGaDr+
 LOl8chktlLWaZK4XzNbXBO+iz+aTP5yCd1s3hbMGlm/UgEgN9WDfgETeyw5u+Jw3
 +DyzIWsK0LvWiKsEV+fno2ZbR8ibgtdwgdY500mONFdaeQLl6lrYKu6OKLZWjGic
 fRW0q/9iQJetcoAQCjej4v275prLRfl02eso6k74NQL1gMt2BqmaHGb4NNCBoUG5
 sweEkAp4IDgJvrgLx48D8NKIZBb8hK9k2QAzY341coJg5bjNK4KG8L3XHMxmYwdZ
 +o/T0fCn92gcGYIfh1VcuipPiyAsc/DIDtnardwGHO1z+hhsKKaPDxB/Ie9XxnO5
 juBEzVaQm5Y40SONuT7Q+lZjZZF8GJAnfCw+7aynYvFAkHNxhZVijt3PWyzrBC4f
 0eIDgbLy7bRemKva3Rcx
 =6VCx
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2016-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Three small fixes for the current cycle:
 * missing netlink attribute check in hwsim wmediumd (Martin)
 * fast xmit structure alignment fix (Felix)
 * mesh path flush/synchronisation fix (Bob)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-01 17:53:19 -07:00
Linus Torvalds
6b15d6650c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix negative error code usage in ATM layer, from Stefan Hajnoczi.

 2) If CONFIG_SYSCTL is disabled, the default TTL is not initialized
    properly.  From Ezequiel Garcia.

 3) Missing spinlock init in mvneta driver, from Gregory CLEMENT.

 4) Missing unlocks in hwmb error paths, also from Gregory CLEMENT.

 5) Fix deadlock on team->lock when propagating features, from Ivan
    Vecera.

 6) Work around buffer offset hw bug in alx chips, from Feng Tang.

 7) Fix double listing of SCTP entries in sctp_diag dumps, from Xin
    Long.

 8) Various statistics bug fixes in mlx4 from Eric Dumazet.

 9) Fix some randconfig build errors wrt fou ipv6 from Arnd Bergmann.

10) All of l2tp was namespace aware, but the ipv6 support code was not
    doing so.  From Shmulik Ladkani.

11) Handle on-stack hrtimers properly in pktgen, from Guenter Roeck.

12) Propagate MAC changes properly through VLAN devices, from Mike
    Manning.

13) Fix memory leak in bnx2x_init_one(), from Vitaly Kuznetsov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
  sfc: Track RPS flow IDs per channel instead of per function
  usbnet: smsc95xx: fix link detection for disabled autonegotiation
  virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv
  bnx2x: avoid leaking memory on bnx2x_init_one() failures
  fou: fix IPv6 Kconfig options
  openvswitch: update checksum in {push,pop}_mpls
  sctp: sctp_diag should dump sctp socket type
  net: fec: update dirty_tx even if no skb
  vlan: Propagate MAC address to VLANs
  atm: iphase: off by one in rx_pkt()
  atm: firestream: add more reserved strings
  vxlan: Accept user specified MTU value when create new vxlan link
  net: pktgen: Call destroy_hrtimer_on_stack()
  timer: Export destroy_hrtimer_on_stack()
  net: l2tp: Make l2tp_ip6 namespace aware
  Documentation: ip-sysctl.txt: clarify secure_redirects
  sfc: use flow dissector helpers for aRFS
  ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr
  net: nps_enet: Disable interrupts before napi reschedule
  net/lapb: tuse %*ph to dump buffers
  ...
2016-05-31 22:28:28 -07:00
Arnd Bergmann
95e4daa820 fou: fix IPv6 Kconfig options
The Kconfig options I added to work around broken compilation ended
up screwing up things more, as I used the wrong symbol to control
compilation of the file, resulting in IPv6 fou support to never be built
into the kernel.

Changing CONFIG_NET_FOU_IPV6_TUNNELS to CONFIG_IPV6_FOU fixes that
problem, I had renamed the symbol in one location but not the other,
and as the file is never being used by other kernel code, this did not
lead to a build failure that I would have caught.

After that fix, another issue with the same patch becomes obvious, as we
'select INET6_TUNNEL', which is related to IPV6_TUNNEL, but not the same,
and this can still cause the original build failure when IPV6_TUNNEL is
not built-in but IPV6_FOU is. The fix is equally trivial, we just need
to select the right symbol.

I have successfully build 350 randconfig kernels with this patch
and verified that the driver is now being built.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
Fixes: fabb13db44 ("fou: add Kconfig options for IPv6 support")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31 14:07:49 -07:00
Simon Horman
bc7cc5999f openvswitch: update checksum in {push,pop}_mpls
In the case of CHECKSUM_COMPLETE the skb checksum should be updated in
{push,pop}_mpls() as they the type in the ethernet header.

As suggested by Pravin Shelar.

Cc: Pravin Shelar <pshelar@nicira.com>
Fixes: 25cd9ba0ab ("openvswitch: Add basic MPLS support to kernel")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31 13:51:42 -07:00
Xin Long
40eb90e9cc sctp: sctp_diag should dump sctp socket type
Now we cannot distinguish that one sk is a udp or sctp style when
we use ss to dump sctp_info. it's necessary to dump it as well.

For sctp_diag, ss support is not officially available, thus there
are no official users of this yet, so we can add this field in the
middle of sctp_info without breaking user API.

v1->v2:
  - move 'sctpi_s_type' field to the end of struct sctp_info, so
    that it won't cause incompatibility with applications already
    built.
  - add __reserved3 in sctp_info to make sure sctp_info is 8-byte
    alignment.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31 11:59:06 -07:00
Mike Manning
308453aa91 vlan: Propagate MAC address to VLANs
The MAC address of the physical interface is only copied to the VLAN
when it is first created, resulting in an inconsistency after MAC
address changes of only newly created VLANs having an up-to-date MAC.

The VLANs should continue inheriting the MAC address of the physical
interface until the VLAN MAC address is explicitly set to any value.
This allows IPv6 EUI64 addresses for the VLAN to reflect any changes
to the MAC of the physical interface and thus for DAD to behave as
expected.

Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31 11:56:48 -07:00
Guenter Roeck
bcf91bdb44 net: pktgen: Call destroy_hrtimer_on_stack()
If CONFIG_DEBUG_OBJECTS_TIMERS=y, hrtimer_init_on_stack() requires
a matching call to destroy_hrtimer_on_stack() to clean up timer
debug objects.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31 11:44:08 -07:00
Felix Fietkau
6fe04128f1 mac80211: fix fast_tx header alignment
The header field is defined as u8[] but also accessed as struct
ieee80211_hdr. Enforce an alignment of 2 to prevent unnecessary
unaligned accesses, which can be very harmful for performance on many
platforms.

Fixes: e495c24731 ("mac80211: extend fast-xmit for more ciphers")
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-05-31 12:14:04 +02:00
Bob Copeland
fe7a7c5762 mac80211: mesh: flush mesh paths unconditionally
Currently, the mesh paths associated with a nexthop station are cleaned
up in the following code path:

    __sta_info_destroy_part1
    synchronize_net()
    __sta_info_destroy_part2
     -> cleanup_single_sta
       -> mesh_sta_cleanup
         -> mesh_plink_deactivate
           -> mesh_path_flush_by_nexthop

However, there are a couple of problems here:

1) the paths aren't flushed at all if the MPM is running in userspace
   (e.g. when using wpa_supplicant or authsae)

2) there is no synchronize_rcu between removing the path and readers
   accessing the nexthop, which means the following race is possible:

CPU0                            CPU1
~~~~                            ~~~~
                                sta_info_destroy_part1()
                                synchronize_net()
rcu_read_lock()
mesh_nexthop_resolve()
  mpath = mesh_path_lookup()
                                [...] -> mesh_path_flush_by_nexthop()
  sta = rcu_dereference(
    mpath->next_hop)
                                kfree(sta)
  access sta <-- CRASH

Fix both of these by unconditionally flushing paths before destroying
the sta, and by adding a synchronize_net() after path flush to ensure
no active readers can still dereference the sta.

Fixes this crash:

[  348.529295] BUG: unable to handle kernel paging request at 00020040
[  348.530014] IP: [<f929245d>] ieee80211_mps_set_frame_flags+0x40/0xaa [mac80211]
[  348.530014] *pde = 00000000
[  348.530014] Oops: 0000 [#1] PREEMPT
[  348.530014] Modules linked in: drbg ansi_cprng ctr ccm ppp_generic slhc ipt_MASQUERADE nf_nat_masquerade_ipv4 8021q ]
[  348.530014] CPU: 0 PID: 20597 Comm: wget Tainted: G           O 4.6.0-rc5-wt=V1 #1
[  348.530014] Hardware name: To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080016  11/07/2014
[  348.530014] task: f64fa280 ti: f4f9c000 task.ti: f4f9c000
[  348.530014] EIP: 0060:[<f929245d>] EFLAGS: 00010246 CPU: 0
[  348.530014] EIP is at ieee80211_mps_set_frame_flags+0x40/0xaa [mac80211]
[  348.530014] EAX: f4ce63e0 EBX: 00000088 ECX: f3788416 EDX: 00020008
[  348.530014] ESI: 00000000 EDI: 00000088 EBP: f6409a4c ESP: f6409a40
[  348.530014]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[  348.530014] CR0: 80050033 CR2: 00020040 CR3: 33190000 CR4: 00000690
[  348.530014] Stack:
[  348.530014]  00000000 f4ce63e0 f5f9bd80 f6409a64 f9291d80 0000ce67 f5d51e00 f4ce63e0
[  348.530014]  f3788416 f6409a80 f9291dc1 f4ce8320 f4ce63e0 f5d51e00 f4ce63e0 f4ce8320
[  348.530014]  f6409a98 f9277f6f 00000000 00000000 0000007c 00000000 f6409b2c f9278dd1
[  348.530014] Call Trace:
[  348.530014]  [<f9291d80>] mesh_nexthop_lookup+0xbb/0xc8 [mac80211]
[  348.530014]  [<f9291dc1>] mesh_nexthop_resolve+0x34/0xd8 [mac80211]
[  348.530014]  [<f9277f6f>] ieee80211_xmit+0x92/0xc1 [mac80211]
[  348.530014]  [<f9278dd1>] __ieee80211_subif_start_xmit+0x807/0x83c [mac80211]
[  348.530014]  [<c04df012>] ? sch_direct_xmit+0xd7/0x1b3
[  348.530014]  [<c022a8c6>] ? __local_bh_enable_ip+0x5d/0x7b
[  348.530014]  [<f956870c>] ? nf_nat_ipv4_out+0x4c/0xd0 [nf_nat_ipv4]
[  348.530014]  [<f957e036>] ? iptable_nat_ipv4_fn+0xf/0xf [iptable_nat]
[  348.530014]  [<c04c6f45>] ? netif_skb_features+0x14d/0x30a
[  348.530014]  [<f9278e10>] ieee80211_subif_start_xmit+0xa/0xe [mac80211]
[  348.530014]  [<c04c769c>] dev_hard_start_xmit+0x1f8/0x267
[  348.530014]  [<c04c7261>] ?  validate_xmit_skb.isra.120.part.121+0x10/0x253
[  348.530014]  [<c04defc6>] sch_direct_xmit+0x8b/0x1b3
[  348.530014]  [<c04c7a9c>] __dev_queue_xmit+0x2c8/0x513
[  348.530014]  [<c04c7cfb>] dev_queue_xmit+0xa/0xc
[  348.530014]  [<f91bfc7a>] batadv_send_skb_packet+0xd6/0xec [batman_adv]
[  348.530014]  [<f91bfdc4>] batadv_send_unicast_skb+0x15/0x4a [batman_adv]
[  348.530014]  [<f91b5938>] batadv_dat_send_data+0x27e/0x310 [batman_adv]
[  348.530014]  [<f91c30b5>] ? batadv_tt_global_hash_find.isra.11+0x8/0xa [batman_adv]
[  348.530014]  [<f91b63f3>] batadv_dat_snoop_outgoing_arp_request+0x208/0x23d [batman_adv]
[  348.530014]  [<f91c0cd9>] batadv_interface_tx+0x206/0x385 [batman_adv]
[  348.530014]  [<c04c769c>] dev_hard_start_xmit+0x1f8/0x267
[  348.530014]  [<c04c7261>] ?  validate_xmit_skb.isra.120.part.121+0x10/0x253
[  348.530014]  [<c04defc6>] sch_direct_xmit+0x8b/0x1b3
[  348.530014]  [<c04c7a9c>] __dev_queue_xmit+0x2c8/0x513
[  348.530014]  [<f80cbd2a>] ? igb_xmit_frame+0x57/0x72 [igb]
[  348.530014]  [<c04c7cfb>] dev_queue_xmit+0xa/0xc
[  348.530014]  [<f843a326>] br_dev_queue_push_xmit+0xeb/0xfb [bridge]
[  348.530014]  [<f843a35f>] br_forward_finish+0x29/0x74 [bridge]
[  348.530014]  [<f843a23b>] ? deliver_clone+0x3b/0x3b [bridge]
[  348.530014]  [<f843a714>] __br_forward+0x89/0xe7 [bridge]
[  348.530014]  [<f843a336>] ? br_dev_queue_push_xmit+0xfb/0xfb [bridge]
[  348.530014]  [<f843a234>] deliver_clone+0x34/0x3b [bridge]
[  348.530014]  [<f843a68b>] ? br_flood+0x95/0x95 [bridge]
[  348.530014]  [<f843a66d>] br_flood+0x77/0x95 [bridge]
[  348.530014]  [<f843a809>] br_flood_forward+0x13/0x1a [bridge]
[  348.530014]  [<f843a68b>] ? br_flood+0x95/0x95 [bridge]
[  348.530014]  [<f843b877>] br_handle_frame_finish+0x392/0x3db [bridge]
[  348.530014]  [<c04e9b2b>] ? nf_iterate+0x2b/0x6b
[  348.530014]  [<f843baa6>] br_handle_frame+0x1e6/0x240 [bridge]
[  348.530014]  [<f843b4e5>] ? br_handle_local_finish+0x6a/0x6a [bridge]
[  348.530014]  [<c04c4ba0>] __netif_receive_skb_core+0x43a/0x66b
[  348.530014]  [<f843b8c0>] ? br_handle_frame_finish+0x3db/0x3db [bridge]
[  348.530014]  [<c023cea4>] ? resched_curr+0x19/0x37
[  348.530014]  [<c0240707>] ? check_preempt_wakeup+0xbf/0xfe
[  348.530014]  [<c0255dec>] ? ktime_get_with_offset+0x5c/0xfc
[  348.530014]  [<c04c4fc1>] __netif_receive_skb+0x47/0x55
[  348.530014]  [<c04c57ba>] netif_receive_skb_internal+0x40/0x5a
[  348.530014]  [<c04c61ef>] napi_gro_receive+0x3a/0x94
[  348.530014]  [<f80ce8d5>] igb_poll+0x6fd/0x9ad [igb]
[  348.530014]  [<c0242bd8>] ? swake_up_locked+0x14/0x26
[  348.530014]  [<c04c5d29>] net_rx_action+0xde/0x250
[  348.530014]  [<c022a743>] __do_softirq+0x8a/0x163
[  348.530014]  [<c022a6b9>] ? __hrtimer_tasklet_trampoline+0x19/0x19
[  348.530014]  [<c021100f>] do_softirq_own_stack+0x26/0x2c
[  348.530014]  <IRQ>
[  348.530014]  [<c022a957>] irq_exit+0x31/0x6f
[  348.530014]  [<c0210eb2>] do_IRQ+0x8d/0xa0
[  348.530014]  [<c058152c>] common_interrupt+0x2c/0x40
[  348.530014] Code: e7 8c 00 66 81 ff 88 00 75 12 85 d2 75 0e b2 c3 b8 83 e9 29 f9 e8 a7 5f f9 c6 eb 74 66 81 e3 8c 005
[  348.530014] EIP: [<f929245d>] ieee80211_mps_set_frame_flags+0x40/0xaa [mac80211] SS:ESP 0068:f6409a40
[  348.530014] CR2: 0000000000020040
[  348.530014] ---[ end trace 48556ac26779732e ]---
[  348.530014] Kernel panic - not syncing: Fatal exception in interrupt
[  348.530014] Kernel Offset: disabled

Cc: stable@vger.kernel.org
Reported-by: Fred Veldini <fred.veldini@gmail.com>
Tested-by: Fred Veldini <fred.veldini@gmail.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-05-31 12:12:53 +02:00
Pablo Neira Ayuso
893e093c78 netfilter: nf_ct_helper: bail out on duplicated helpers
Don't allow registration of helpers using the same tuple:

	{ l3proto, l4proto, src-port }

We lookup for the helper from the packet path using this tuple through
__nf_ct_helper_find(). Therefore, we have to avoid having two helpers
with the same tuple to ensure predictible behaviour.

Don't compare the helper string names anymore since it is valid to
register two helpers with the same name, but using different tuples.
This is also implicitly fixing up duplicated helper registration via
ports= modparam since the name comparison was defeating the tuple
duplication validation.

Reported-by: Feng Gao <gfree.wind@gmail.com>
Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-31 11:57:18 +02:00
Phil Turnbull
eaa2bcd6d1 netfilter: nf_tables: validate NFTA_SET_TABLE parameter
If the NFTA_SET_TABLE parameter is missing and the NLM_F_DUMP flag is
not set, then a NULL pointer dereference is triggered in
nf_tables_set_lookup because ctx.table is NULL.

Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-30 12:21:23 +02:00
Paolo Abeni
83170f3bec netfilter: nf_dup_ipv6: set again FLOWI_FLAG_KNOWN_NH at flowi6_flags
With the commit 48e8aa6e31 ("ipv6: Set FLOWI_FLAG_KNOWN_NH at
flowi6_flags") ip6_pol_route() callers were asked to to set the
FLOWI_FLAG_KNOWN_NH properly and xt_TEE was updated accordingly,
but with the later refactor in commit bbde9fc182 ("netfilter:
factor out packet duplication for IPv4/IPv6") the flowi6_flags
update was lost.
This commit re-add it just before the routing decision.

Fixes: bbde9fc182 ("netfilter: factor out packet duplication for IPv4/IPv6")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-30 12:21:23 +02:00
Taehee Yoo
b7a8daa9f3 netfilter: nf_ct_helper: Fix helper unregister count.
helpers should unregister the only registered ports.
but, helper cannot have correct registered ports value when
failed to register.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-30 12:21:22 +02:00
Shmulik Ladkani
0e6b525982 net: l2tp: Make l2tp_ip6 namespace aware
l2tp_ip6 tunnel and session lookups were still using init_net, although
the l2tp core infrastructure already supports lookups keyed by 'net'.

As a result, l2tp_ip6_recv discarded packets for tunnels/sessions
created in namespaces other than the init_net.

Fix, by using dev_net(skb->dev) or sock_net(sk) where appropriate.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-30 00:03:53 -07:00
Baozeng Ding
421eeea10d ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr
Fix a logic error to avoid potential null pointer dereference.

Signed-off-by: Baozeng Ding <sploving1@gmail.com>
Reviewed-by: Stefan Schmidt<stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-29 22:36:25 -07:00
Andy Shevchenko
0d08df6c49 net/lapb: tuse %*ph to dump buffers
Use %*ph specifier to dump small buffers in hex format instead doing this
byte-by-byte.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-29 22:33:25 -07:00
Arnd Bergmann
fabb13db44 fou: add Kconfig options for IPv6 support
A previous patch added the fou6.ko module, but that failed to link
in a couple of configurations:

net/built-in.o: In function `ip6_tnl_encap_add_fou_ops':
net/ipv6/fou6.c:88: undefined reference to `ip6_tnl_encap_add_ops'
net/ipv6/fou6.c:94: undefined reference to `ip6_tnl_encap_add_ops'
net/ipv6/fou6.c:97: undefined reference to `ip6_tnl_encap_del_ops'
net/built-in.o: In function `ip6_tnl_encap_del_fou_ops':
net/ipv6/fou6.c:106: undefined reference to `ip6_tnl_encap_del_ops'
net/ipv6/fou6.c:107: undefined reference to `ip6_tnl_encap_del_ops'

If CONFIG_IPV6=m, ip6_tnl_encap_add_ops/ip6_tnl_encap_del_ops
are in a module, but fou6.c can still be built-in, and that
obviously fails to link.

Also, if CONFIG_IPV6=y, but CONFIG_IPV6_TUNNEL=m or
CONFIG_IPV6_TUNNEL=n, the same problem happens for a different
reason.

This adds two new silent Kconfig symbols to work around both
problems:

- CONFIG_IPV6_FOU is now always set to 'm' if either CONFIG_NET_FOU=m
  or CONFIG_IPV6=m
- CONFIG_IPV6_FOU_TUNNEL is set implicitly when IPV6_FOU is enabled
  and NET_FOU_IP_TUNNELS is also turned out, and it will ensure
  that CONFIG_IPV6_TUNNEL is also available.

The options could be made user-visible as well, to give additional
room for configuration, but it seems easier not to bother users
with more choice here.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: aa3463d65e ("fou: Add encap ops for IPv6 tunnels")
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-29 22:24:21 -07:00
Arnd Bergmann
287980e49f remove lots of IS_ERR_VALUE abuses
Most users of IS_ERR_VALUE() in the kernel are wrong, as they
pass an 'int' into a function that takes an 'unsigned long'
argument. This happens to work because the type is sign-extended
on 64-bit architectures before it gets converted into an
unsigned type.

However, anything that passes an 'unsigned short' or 'unsigned int'
argument into IS_ERR_VALUE() is guaranteed to be broken, as are
8-bit integers and types that are wider than 'unsigned long'.

Andrzej Hajda has already fixed a lot of the worst abusers that
were causing actual bugs, but it would be nice to prevent any
users that are not passing 'unsigned long' arguments.

This patch changes all users of IS_ERR_VALUE() that I could find
on 32-bit ARM randconfig builds and x86 allmodconfig. For the
moment, this doesn't change the definition of IS_ERR_VALUE()
because there are probably still architecture specific users
elsewhere.

Almost all the warnings I got are for files that are better off
using 'if (err)' or 'if (err < 0)'.
The only legitimate user I could find that we get a warning for
is the (32-bit only) freescale fman driver, so I did not remove
the IS_ERR_VALUE() there but changed the type to 'unsigned long'.
For 9pfs, I just worked around one user whose calling conventions
are so obscure that I did not dare change the behavior.

I was using this definition for testing:

 #define IS_ERR_VALUE(x) ((unsigned long*)NULL == (typeof (x)*)NULL && \
       unlikely((unsigned long long)(x) >= (unsigned long long)(typeof(x))-MAX_ERRNO))

which ends up making all 16-bit or wider types work correctly with
the most plausible interpretation of what IS_ERR_VALUE() was supposed
to return according to its users, but also causes a compile-time
warning for any users that do not pass an 'unsigned long' argument.

I suggested this approach earlier this year, but back then we ended
up deciding to just fix the users that are obviously broken. After
the initial warning that caused me to get involved in the discussion
(fs/gfs2/dir.c) showed up again in the mainline kernel, Linus
asked me to send the whole thing again.

[ Updated the 9p parts as per Al Viro  - Linus ]

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.org/lkml/2016/1/7/363
Link: https://lkml.org/lkml/2016/5/27/486
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> # For nvmem part
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 15:26:11 -07:00
Linus Torvalds
a10c38a4f3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "This changeset has a few main parts:

   - Ilya has finished a huge refactoring effort to sync up the
     client-side logic in libceph with the user-space client code, which
     has evolved significantly over the last couple years, with lots of
     additional behaviors (e.g., how requests are handled when cluster
     is full and transitions from full to non-full).

     This structure of the code is more closely aligned with userspace
     now such that it will be much easier to maintain going forward when
     behavior changes take place.  There are some locking improvements
     bundled in as well.

   - Zheng adds multi-filesystem support (multiple namespaces within the
     same Ceph cluster)

   - Zheng has changed the readdir offsets and directory enumeration so
     that dentry offsets are hash-based and therefore stable across
     directory fragmentation events on the MDS.

   - Zheng has a smorgasbord of bug fixes across fs/ceph"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
  ceph: fix wake_up_session_cb()
  ceph: don't use truncate_pagecache() to invalidate read cache
  ceph: SetPageError() for writeback pages if writepages fails
  ceph: handle interrupted ceph_writepage()
  ceph: make ceph_update_writeable_page() uninterruptible
  libceph: make ceph_osdc_wait_request() uninterruptible
  ceph: handle -EAGAIN returned by ceph_update_writeable_page()
  ceph: make fault/page_mkwrite return VM_FAULT_OOM for -ENOMEM
  ceph: block non-fatal signals for fault/page_mkwrite
  ceph: make logical calculation functions return bool
  ceph: tolerate bad i_size for symlink inode
  ceph: improve fragtree change detection
  ceph: keep leaf frag when updating fragtree
  ceph: fix dir_auth check in ceph_fill_dirfrag()
  ceph: don't assume frag tree splits in mds reply are sorted
  ceph: fix inode reference leak
  ceph: using hash value to compose dentry offset
  ceph: don't forbid marking directory complete after forward seek
  ceph: record 'offset' for each entry of readdir result
  ceph: define 'end/complete' in readdir reply as bit flags
  ...
2016-05-26 14:10:32 -07:00
Linus Torvalds
ea8ea737c4 NFS client updates for Linux 4.7
Highlights include:
 
 Features:
 - Add support for the NFS v4.2 COPY operation
 - Add support for NFS/RDMA over IPv6
 
 Bugfixes and cleanups:
 - Avoid race that crashes nfs_init_commit()
 - Fix oops in callback path
 - Fix LOCK/OPEN race when unlinking an open file
 - Choose correct stateids when using delegations in setattr, read and write
 - Don't send empty SETATTR after OPEN_CREATE
 - xprtrdma: Prevent server from writing a reply into memory client has released
 - xprtrdma: Support using Read list and Reply chunk in one RPC call
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXRu76AAoJENfLVL+wpUDrDVoQAKPKv1tEVJMRUQA3UVoKoixd
 KjmmZMjl6GfpISwTZl+a8W549jyGuYH7Gl8vSbMaE9/FI+kJW6XZQniTYfFqY8/a
 LbMSdNx1+yURisbkyO0vPqqwKw9r6UmsfGeUT8SpS3ff61yp4Oj436ra2qcPJsZ3
 cWl/lHItzX7oKFAWmr0Nmq2X8ac/8+NFyK29+V/QGfwtp3qAPbpA8XM5HrHw3rA2
 uk5uNSr3hwqz7P3+Hi7ZoO2m4nQTAbQnEunfYpxlOwz4IaM7qcGnntT6Jhwq1pGE
 /1YasG7bHeiWjhynmZZ4CWuMkogau2UJ/G68Cz7ehLhPNr8rH/ZFCJZ+XX0e0CgI
 1d+AwxZvgszIQVBY3S7sg8ezVSCPBXRFJ8rtzggGscqC53aP7L+rLfUFH+OKrhMg
 6n7RQiq4EmGDJGviB/R2HixI9CpdOf2puNhDKSJmPOqiSS7UuHMw8QCq++vdru+1
 GLGunGyO7D70yTV92KtsdzJlFlnfa/g+FIJrmaMpL3HH1h0stTctWX5xlTYmqEL3
 z3aUuT8RySk2t1FTabSj6KRWqE/krK5BMZbX91kpF27WL4c/olXFaZPqBDsj0q4u
 2rm1fIrc8RxLXctJan9ro092s/e9dup/1JxV5XWMq/EGS1ezvf+0XkCOtURaAWp3
 2aPHlx7M8iuq2SouL6f7
 =QMmY
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Highlights include:

  Features:
   - Add support for the NFS v4.2 COPY operation
   - Add support for NFS/RDMA over IPv6

  Bugfixes and cleanups:
   - Avoid race that crashes nfs_init_commit()
   - Fix oops in callback path
   - Fix LOCK/OPEN race when unlinking an open file
   - Choose correct stateids when using delegations in setattr, read and
     write
   - Don't send empty SETATTR after OPEN_CREATE
   - xprtrdma: Prevent server from writing a reply into memory client
     has released
   - xprtrdma: Support using Read list and Reply chunk in one RPC call"

* tag 'nfs-for-4.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (61 commits)
  pnfs: pnfs_update_layout needs to consider if strict iomode checking is on
  nfs/flexfiles: Use the layout segment for reading unless it a IOMODE_RW and reading is disabled
  nfs/flexfiles: Helper function to detect FF_FLAGS_NO_READ_IO
  nfs: avoid race that crashes nfs_init_commit
  NFS: checking for NULL instead of IS_ERR() in nfs_commit_file()
  pnfs: make pnfs_layout_process more robust
  pnfs: rework LAYOUTGET retry handling
  pnfs: lift retry logic from send_layoutget to pnfs_update_layout
  pnfs: fix bad error handling in send_layoutget
  flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_ds
  flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED
  pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args
  pnfs: keep track of the return sequence number in pnfs_layout_hdr
  pnfs: record sequence in pnfs_layout_segment when it's created
  pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set
  pNFS/flexfiles: When initing reads or writes, we might have to retry connecting to DSes
  pNFS/flexfiles: When checking for available DSes, conditionally check for MDS io
  pNFS/flexfile: Fix erroneous fall back to read/write through the MDS
  NFS: Reclaim writes via writepage are opportunistic
  NFSv4: Use the right stateid for delegations in setattr, read and write
  ...
2016-05-26 10:33:33 -07:00
Xin Long
bed187b540 sctp: fix double EPs display in sctp_diag
We have this situation: that EP hash table, contains only the EPs
that are listening, while the transports one, has the opposite.
We have to traverse both to dump all.

But when we traverse the transports one we will also get EPs that are
in the EP hash if they are listening. In this case, the EP is dumped
twice.

We will fix it by checking if the endpoint that is in the endpoint
hash table contains any ep->asoc in there, as it means we will also
find it via transport hash, and thus we can/should skip it, depending
on the filters used, like 'ss -l'.

Still, we should NOT skip it if the user is listing only listening
endpoints, because then we are not traversing the transport hash.
so we have to check idiag_states there also.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:14:31 -07:00
Yan, Zheng
0e76abf21e libceph: make ceph_osdc_wait_request() uninterruptible
Ceph_osdc_wait_request() is used when cephfs issues sync IO. In most
cases, the sync IO should be uninterruptible. The fix is use killale
wait function in ceph_osdc_wait_request().

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-05-26 01:15:40 +02:00
Zhang Zhuoyu
3b33f692c8 ceph: make logical calculation functions return bool
This patch makes serverl logical caculation functions return bool to
improve readability due to these particular functions only using 0/1
as their return value.

No functional change.

Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
2016-05-26 01:15:39 +02:00
Ilya Dryomov
737cc81ead libceph: support for subscribing to "mdsmap.<id>" maps
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:15:30 +02:00
Ilya Dryomov
7cca78c9dc libceph: replace ceph_monc_request_next_osdmap()
... with a wrapper around maybe_request_map() - no need for two
osdmap-specific functions.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:15:30 +02:00
Ilya Dryomov
b4f3479569 libceph: take osdc->lock in osdmap_show() and dump flags in hex
There is now about a dozen CEPH_OSDMAP_* flags.  This is a debugging
interface, so just dump in hex instead of spelling each flag out.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:15:29 +02:00
Ilya Dryomov
4609245e26 libceph: pool deletion detection
This adds the "map check" infrastructure for sending osdmap version
checks on CALC_TARGET_POOL_DNE and completing in-flight requests with
-ENOENT if the target pool doesn't exist or has just been deleted.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:15:29 +02:00
Ilya Dryomov
d0b19705e9 libceph: async MON client generic requests
For map check, we are going to need to send CEPH_MSG_MON_GET_VERSION
messages asynchronously and get a callback on completion.  Refactor MON
client to allow firing off generic requests asynchronously and add an
async variant of ceph_monc_get_version().  ceph_monc_do_statfs() is
switched over and remains sync.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:15:29 +02:00
Ilya Dryomov
b07d3c4bd7 libceph: support for checking on status of watch
Implement ceph_osdc_watch_check() to be able to check on status of
watch.  Note that the time it takes for a watch/notify event to get
delivered through the notify_wq is taken into account.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:15:28 +02:00
Ilya Dryomov
1907920324 libceph: support for sending notifies
Implement ceph_osdc_notify() for sending notifies.

Due to the fact that the current messenger can't do read-in into
pagelists (it can only do write-out from them), I had to go with a page
vector for a NOTIFY_COMPLETE payload, for now.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:15:28 +02:00
Ilya Dryomov
922dab6134 libceph, rbd: ceph_osd_linger_request, watch/notify v2
This adds support and switches rbd to a new, more reliable version of
watch/notify protocol.  As with the OSD client update, this is mostly
about getting the right structures linked into the right places so that
reconnects are properly sent when needed.  watch/notify v2 also
requires sending regular pings to the OSDs - send_linger_ping().

A major change from the old watch/notify implementation is the
introduction of ceph_osd_linger_request - linger requests no longer
piggy back on ceph_osd_request.  ceph_osd_event has been merged into
ceph_osd_linger_request.

All the details are now hidden within libceph, the interface consists
of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack().
ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep
the lifetime management simple.

ceph_osdc_notify_ack() accepts an optional data payload, which is
relayed back to the notifier.

Portions of this patch are loosely based on work by Douglas Fuller
<dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:15:02 +02:00
Ilya Dryomov
42b0696527 libceph: wait_request_timeout()
The unwatch timeout is currently implemented in rbd.  With
watch/unwatch code moving into libceph, we are going to need
a ceph_osdc_wait_request() variant with a timeout.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:14:06 +02:00
Ilya Dryomov
3540bfdb30 libceph: request_init() and request_release_checks()
These are going to be used by request_reinit() code.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:14:05 +02:00
Ilya Dryomov
5aea3dcd50 libceph: a major OSD client update
This is a major sync up, up to ~Jewel.  The highlights are:

- per-session request trees (vs a global per-client tree)
- per-session locking (vs a global per-client rwlock)
- homeless OSD session
- no ad-hoc global per-client lists
- support for pool quotas
- foundation for watch/notify v2 support
- foundation for map check (pool deletion detection) support

The switchover is incomplete: lingering requests can be setup and
teared down but aren't ever reestablished.  This functionality is
restored with the introduction of the new lingering infrastructure
(ceph_osd_linger_request, linger_work, etc) in a later commit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:14:03 +02:00