Bart Van Assche recently reported a warning to me:
<IRQ> [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
[<ffffffff814761dd>] mutex_trylock+0x16d/0x180
[<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
[<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
[<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
[<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
[<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
[<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
[<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
[<ffffffff8103fb76>] console_unlock+0x2d6/0x450
[<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
[<ffffffff8146f9f6>] printk+0x4d/0x4f
[<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
This resulted from my commit ca99ca14c which introduced a mutex_trylock
operation in a path that could execute in interrupt context. When mutex
debugging is enabled, the above warns the user when we are in fact
exectuting in interrupt context
interrupt context.
After some discussion, It seems that a semaphore is the proper mechanism to use
here. While mutexes are defined to be unusable in interrupt context, no such
condition exists for semaphores (save for the fact that the non blocking api
calls, like up and down_trylock must be used when in irq context).
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Bart Van Assche <bvanassche@acm.org>
CC: Bart Van Assche <bvanassche@acm.org>
CC: David Miller <davem@davemloft.net>
CC: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f9c2288837 (netlink:
implement memory mapped recvmsg) increamented skb->users
ref count twice for a dump op which does not look right.
Following patch fixes that.
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Deal with changes in newer xtables while maintaining backward
compatibility. Thanks to Jan Engelhardt for suggestions.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bridge can crash while trying to send topology change packet.
This happens if root port can't be found. This was reported by user
but currently unable to reproduce it easily. The STP conditions that cause
this are not known yet, but the problem doesn't have to be fatal.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
drivers/net/ethernet/emulex/benet/be.h
include/net/tcp.h
net/mac802154/mac802154.h
Most conflicts were minor overlapping stuff.
The be2net driver brought in some fixes that added __vlan_put_tag
calls, which in net-next take an additional argument.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, peeking on a unix stream socket with an offset larger than len of
the data in the sk receive queue returns immediately with bogus data.
This patch fixes this so that the behavior is the same as peeking with no
offset on an empty queue: the caller blocks.
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, peeking on a unix datagram socket with an offset larger than len of
the data in the sk receive queue returns immediately with bogus data. That's
because *off is not reset between each skb_queue_walk().
This patch fixes this so that the behavior is the same as peeking with no
offset on an empty queue: the caller blocks.
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"77c1090 net: fix infinite loop in __skb_recv_datagram()" (v3.8) introduced a
regression:
After that commit, recv can no longer peek beyond a 0-sized skb in the queue.
__skb_recv_datagram() instead stops at the first skb with len == 0 and results
in the system call failing with -EFAULT via skb_copy_datagram_iovec().
When peeking at an offset with 0-sized skb(s), each one of those is received
only once, in sequence. The offset starts moving forward again after receiving
datagrams with len > 0.
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only user is get_dpifindex(), no need to redirect via the port
operations.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use consume_skb() to free the original skb that is successfully transmitted
as gso segmented skbs so that it is not treated as a drop due to an error.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linux immediately returns SYNACK on (spurious) SYN retransmits, but
keeps the SYNACK timer running independently. Thus the timer may
fire right after the SYNACK retransmit and causes a SYN-SYNACK
cross-fire burst.
Adopt the fast retransmit/recovery idea in established state by
re-arming the SYNACK timer after the fast (SYNACK) retransmit. The
timer may fire late up to 500ms due to the current SYNACK timer wheel,
but it's OK to be conservative when network is congested. Eric's new
listener design should address this issue.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add MIB counters for checksum errors in IP layer,
and TCP/UDP/ICMP layers, to help diagnose problems.
$ nstat -a | grep Csum
IcmpInCsumErrors 72 0.0
TcpInCsumErrors 382 0.0
UdpInCsumErrors 463221 0.0
Icmp6InCsumErrors 75 0.0
Udp6InCsumErrors 173442 0.0
IpExtInCsumErrors 10884 0.0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of feeding net_secret[] at boot time, defer the init
at the point first socket is created.
This permits some platforms to use better entropy sources than
the ones available at boot time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- One patch for moving the LLCP code into net/nfc.
It fixes a build annoyance reported by Dave Miller caused by the fact
that the LLCP code object targets are not in the same directory as the
Makefile trying to build them is. It prevents us from doing e.g.
make net/nfc/llcp/sock.o
Moving the LLCP code into net/nfc and not making it optional anymore
makes sense as LLCP is a fundamental piece of the NFC specifications
and thus should be in the core NFC directory.
- One patch that fixes the missing dependency against RFKILL. Without it NFC
fails to properly build when it's builtin and CONFIG_RFKILL=m.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRewn5AAoJEIqAPN1PVmxKeDsQAJPGgwKPymgMwRceB6bwSQnx
hlbHkYVxgJd7wTK3Kz5oXf1i/Krz3K39845482qetS0I02/YwmAkrMHJ0YpbAvpq
+TCjKk8sYvJniq157GV05zHG5mCjMr+rh2yQ1D/2GQwiI39KHnXaJsnKV0hhlwl9
5/Z+RB1y83Qaz/ug5Y/ZlFmRxlVTIjIs/+KncC2v1KpKw050Amst+C8ScuAfpDny
ywvweCDUXh4aUteN1gC9+nelwczJDq2rbeP4oSoZctsoUmcO2WLtBuVc/lKSJBSE
8Ttje4IDRcSyn1zPAAwWV8O/zx0urBhskuyv7DNIuNDRjTIK3KdpVuNwEnUG0bO1
+DJ3GfsGQLSwZCi2YQ1ABUQc2kqlUFM10UvVObRrbD2Gwq0BbpvrkKsvbJUWpqWh
9zQe8G9lLHW//MF/sAvXcR+AezU8lZ/uU/qNVse/Yz0vBFUMYXthOd67ajybaSnW
cHf139buAn6kaW9LnxJs4vbHv1qKlVP6a4aAWC2elaHy8nv+UBx62vlrEJg5Jyle
kO7kHzesTK7NnC5Up0xQWbxQfvuoqINNhBI6IyMqb49L27MFDci+yN3aojfu+xGt
i/AH1fwltopRYd5bK+BIjpeo+G3n1FonxW/w1OOaVHr+4aEjSfAerUk59yoC+nMo
kdk7eBpn8uXVbYNjc+Ij
=8VFM
-----END PGP SIGNATURE-----
Merge tag 'nfc-next-3.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz <sameo@linux.intel.com> says:
"With this one we have:
- One patch for moving the LLCP code into net/nfc.
It fixes a build annoyance reported by Dave Miller caused by the fact
that the LLCP code object targets are not in the same directory as the
Makefile trying to build them is. It prevents us from doing e.g.
make net/nfc/llcp/sock.o
Moving the LLCP code into net/nfc and not making it optional anymore
makes sense as LLCP is a fundamental piece of the NFC specifications
and thus should be in the core NFC directory.
- One patch that fixes the missing dependency against RFKILL. Without it NFC
fails to properly build when it's builtin and CONFIG_RFKILL=m."
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Pablo Neira Ayuso says:
====================
The following patchset contains relevant updates for the Netfilter
tree, they are:
* Enhancements for ipset: Add the counter extension for sets, this
information can be used from the iptables set match, to change
the matching behaviour. Jozsef required to add the extension
infrastructure and moved the existing timeout support upon it.
This also includes a change in net/sched/em_ipset to adapt it to
the new extension structure.
* Enhancements for performance boosting in nfnetlink_queue: Add new
configuration flags that allows user-space to receive big packets (GRO)
and to disable checksumming calculation. This were proposed by Eric
Dumazet during the Netfilter Workshop 2013 in Copenhagen. Florian
Westphal was kind enough to find the time to materialize the proposal.
* A sparse fix from Simon, he noticed it in the SCTP NAT helper, the fix
required a change in the interface of sctp_end_cksum.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the type of the crc32 parameter of sctp_end_cksum()
from __be32 to __u32 to reflect that fact that it is passed
to cpu_to_le32().
There are five in-tree users of sctp_end_cksum().
The following four had warnings flagged by sparse which are
no longer present with this change.
net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_nat_csum()
net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_csum_check()
net/sctp/input.c:sctp_rcv_checksum()
net/sctp/output.c:sctp_packet_transmit()
The fifth user is net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt().
It has been updated to pass a __u32 instead of a __be32,
the value in question was already calculated in cpu byte-order.
net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt() has also
been updated to assign the return value of sctp_end_cksum()
directly to a variable of type __le32, matching the
type of the return value. Previously the return value
was assigned to a variable of type __be32 and then that variable
was finally assigned to another variable of type __le32.
Problems flagged by sparse.
Compile and sparse tested only.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Userspace can now indicate that it can cope with larger-than-mtu sized
packets and packets that have invalid ipv4/tcp checksums.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Once we allow userspace to receive gso/gro packets, userspace
needs to be able to determine when checksums appear to be
broken, but are not.
NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel
later, pretend they are ok'.
NFQA_SKB_GSO could be used for statistics, or to determine when
packet size exceeds mtu.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
skb_gso_segment is expensive, so it would be nice if we could
avoid it in the future. However, userspace needs to be prepared
to receive larger-than-mtu-packets (which will also have incorrect
l3/l4 checksums), so we cannot simply remove it.
The plan is to add a per-queue feature flag that userspace can
set when binding the queue.
The problem is that in nf_queue, we only have a queue number,
not the queue context/configuration settings.
This patch should have no impact other than the skb_gso_segment
call now being in a function that has access to the queue config
data.
A new size attribute in nf_queue_entry is needed so
nfnetlink_queue can duplicate the entry of the gso skb
when segmenting the skb while also copying the route key.
The follow up patch adds switch to disable skb_gso_segment when
queue config says so.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
required by future patch that will need to duplicate the
nf_queue_entry, bumping refcounts of the copy.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The new revision of the set match supports to match the counters
and to suppress updating the counters at matching too.
At the set:list types, the updating of the subcounters can be
suppressed as well.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Introduce extensions to elements in the core and prepare timeout as
the first one.
This patch also modifies the em_ipset classifier to use the new
extension struct layout.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Following patch adds icmp-registration module for ipv6. It allows
ipv6 protocol to register icmp_sender which is used for sending
ipv6 icmp msgs. This extra layer allows us to kill ipv6 dependency
for sending icmp packets.
This patch also fixes ip_tunnel compilation problem when ip_tunnel
is statically compiled in kernel but ipv6 is module
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows to dump BPF filters attached to a socket with
SO_ATTACH_FILTER.
Note that we check CAP_SYS_ADMIN before allowing to dump this info.
For now, only AF_PACKET sockets use this feature.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_rmem_alloc is disclosed via /proc/net/packet but not via netlink messages.
The goal is to have the same level of information.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This value is disclosed via /proc/net/packet but not via netlink messages.
The goal is to have the same level of information.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change MAC802154_CHAN_NONE from ~(u8)0 to 0xff, or the comparison in
mac802154_wpan_xmit() for ``chan == MAC802154_CHAN_NONE'' will not
succeed.
This bug can be boiled down to ``u8 foo = 0xff; if (foo == ~(u8)0)
[...] else [...]'' where the condition will always take the else
branch.
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current bridge fdb update code does not seem to update the port
during fdb update. This patch adds a check for fdb dst (port)
change during fdb update. Also rearranges the call to
fdb_notify to send only one notification for create and update.
Changelog:
v2 - Change notify flag to bool
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
'attrbuf' is malloced in genl_family_rcv_msg() when family->maxattr &&
family->parallel_ops, thus should be freed before leaving from the error
handling cases, otherwise it will cause memory leak.
Introduced by commit def3117493
(genl: Allow concurrent genl callbacks.)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the NFC subsystem gained RFKILL support, it needs to be able
to build properly with whatever option for RFKILL has been selected.
on i386:
net/built-in.o: In function `nfc_unregister_device':
(.text+0x6a36d): undefined reference to `rfkill_unregister'
net/built-in.o: In function `nfc_unregister_device':
(.text+0x6a378): undefined reference to `rfkill_destroy'
net/built-in.o: In function `nfc_register_device':
(.text+0x6a493): undefined reference to `rfkill_alloc'
net/built-in.o: In function `nfc_register_device':
(.text+0x6a4a4): undefined reference to `rfkill_register'
net/built-in.o: In function `nfc_register_device':
(.text+0x6a4b3): undefined reference to `rfkill_destroy'
net/built-in.o: In function `nfc_dev_up':
(.text+0x6a8e8): undefined reference to `rfkill_blocked'
when CONFIG_RFKILL=m but NFC is builtin.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
And stop making it optional. LLCP is a fundamental part of the NFC
specifications and making it optional does not make much sense.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
If gcc (e.g. 4.1.2) decides not to inline vsock_init_tables(), this will
cause a section mismatch:
WARNING: net/vmw_vsock/vsock.o(.text+0x1bc): Section mismatch in reference from the function __vsock_core_init() to the function .init.text:vsock_init_tables()
The function __vsock_core_init() references
the function __init vsock_init_tables().
This is often because __vsock_core_init lacks a __init
annotation or the annotation of vsock_init_tables is wrong.
This may cause crashes if VSOCKETS=y and VMWARE_VMCI_VSOCKETS=m.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we call vsock_core_init to init VSOCK the second time,
vsock_device.minor still points to the old dynamically allocated minor
number. misc_register will allocate it for us successfully as if we were
asking for a static one. However, when other user call misc_register to
allocate a dynamic minor number, it will give the one used by
vsock_core_init(), causing this:
[ 405.470687] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xcc/0xf0()
[ 405.470689] Hardware name: OptiPlex 790
[ 405.470690] sysfs: cannot create duplicate filename '/dev/char/10:54'
Always set vsock_device.minor to MISC_DYNAMIC_MINOR before we
register.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andy King <acking@vmware.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: Reilly Grant <grantr@vmware.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6681712d67
vxlan: generalize forwarding tables
relaxed the address checks in rtnl_fdb_del() to use is_zero_ether_addr().
This allows users to add multicast addresses using the fdb API. However,
the check in rtnl_fdb_del() still uses a more strict
is_valid_ether_addr() which rejects multicast addresses. Thus it
is possible to add an fdb that can not be later removed.
Relax the check in rtnl_fdb_del() as well.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Need remove erroneous semicolon, which is found by EXTRA_CFLAGS=-W,
the related commit number: c544193214
("GRE: Refactor GRE tunneling code")
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sockaddr_nfc_llcp struct has as hole between ->sa_family and
->dev_idx so I've added a memset() to clear it and prevent an
information leak.
Also the ->nfc_protocol element wasn't set so I've added that.
"uaddr->sa_family" and "llcp_addr->sa_family" are the same thing but
it's less confusing to use llcp_addr consistently throughout.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>