18098 Commits

Author SHA1 Message Date
David S. Miller
1a898592b2 xfrm: Mark flowi arg to xfrm_init_tempstate() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 18:22:34 -08:00
David S. Miller
4a08ab0fe4 xfrm: Mark flowi arg to xfrm_state_look_at() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 18:21:31 -08:00
David S. Miller
e1ad2ab2cf xfrm: Mark flowi arg to xfrm_selector_match() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 18:07:39 -08:00
David S. Miller
8f029de281 xfrm: Mark flowi arg to xfrm_type->reject() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 17:59:59 -08:00
David S. Miller
73e5ebb20f xfrm: Mark flowi arg to ->init_tempsel() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 17:51:44 -08:00
David S. Miller
0c7b3eefb4 xfrm: Mark flowi arg to ->fill_dst() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 17:48:57 -08:00
David S. Miller
05d8402576 xfrm: Mark flowi arg to ->get_tos() const.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 17:47:10 -08:00
stephen hemminger
86fce3ba1e cls_u32: fix sparse warnings
The variable _data is used in asm-generic to define sections
which causes sparse warnings, so just rename the variable.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 11:22:33 -08:00
David S. Miller
2a3bcfdde6 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next-2.6
2011-02-22 10:21:36 -08:00
Eric Dumazet
eaefd1105b net: add __rcu annotations to sk_wq and wq
Add proper RCU annotations/verbs to sk_wq and wq members

Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access)

Fix sunrpc sk_sleep() abuse too

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-22 10:19:31 -08:00
Shan Wei
59ed5aba9c sctp: fix compile warnings in sctp_tsnmap_num_gabs
net/sctp/tsnmap.c: In function ‘sctp_tsnmap_num_gabs’:
net/sctp/tsnmap.c:347: warning: ‘start’ may be used uninitialized in this function
net/sctp/tsnmap.c:347: warning: ‘end’ may be used uninitialized in this function

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-20 11:10:15 -08:00
Shan Wei
089c34827e tcp: Remove debug macro of TCP_CHECK_TIMER
TCP_CHECK_TIMER is no longer used for debugging; it does nothing.
It has been that way for several years, maybe 6 years.

Remove it to keep code clearer.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-20 11:10:14 -08:00
David S. Miller
da935c66ba Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/e1000e/netdev.c
	net/xfrm/xfrm_policy.c
2011-02-19 19:17:35 -08:00
Jiri Bohac
2205a6ea93 sctp: fix reporting of unknown parameters
commit 5fa782c2f5 re-worked the
handling of unknown parameters. sctp_init_cause_fixed() can now
return -ENOSPC if there is not enough tailroom in the error
chunk skb. When this happens, the error header is not appended to
the error chunk. In that case, the payload of the unknown parameter
should not be appended either.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-19 19:06:55 -08:00
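The rule described in 2205a6ea93 (no error header means no payload either) can be illustrated with a small userspace sketch. This is not the SCTP code: `struct chunk`, `add_cause_header()`, `report_unknown_param()`, the 32-byte buffer and the 4-byte header length are all assumptions made up for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an error chunk with limited tailroom. */
struct chunk {
	unsigned char data[32];
	size_t len;
};

static size_t tailroom(const struct chunk *c)
{
	return sizeof(c->data) - c->len;
}

/* Append a (zeroed) cause header, or fail if header + payload will not fit. */
static int add_cause_header(struct chunk *c, size_t payload_len)
{
	const size_t hdr_len = 4;	/* assumed header size */
	size_t room = tailroom(c);

	if (payload_len > room || room - payload_len < hdr_len)
		return -ENOSPC;
	memset(c->data + c->len, 0, hdr_len);
	c->len += hdr_len;
	return 0;
}

static int report_unknown_param(struct chunk *c, const void *param, size_t param_len)
{
	int err = add_cause_header(c, param_len);

	if (err)
		return err;	/* header was not appended: skip the payload too */
	memcpy(c->data + c->len, param, param_len);
	c->len += param_len;
	return 0;
}
```

When the header call fails, the chunk length is left unchanged, so a later, smaller report can still use the remaining space.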
Eric Dumazet
91035f0b7d tcp: fix inet_twsk_deschedule()
Eric W. Biederman reported a lockdep splat in inet_twsk_deschedule()

This is caused by inet_twsk_purge(), run from process context,
and commit 575f4cd5a5 (net: Use rcu lookups in inet_twsk_purge.)
removed the BH disabling that was necessary.

Re-add the BH disabling, but fine grained: right before calling
inet_twsk_deschedule(), instead of around the whole function.

With help from Linus Torvalds and Eric W. Biederman

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Daniel Lezcano <daniel.lezcano@free.fr>
CC: Pavel Emelyanov <xemul@openvz.org>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: stable <stable@kernel.org> (# 2.6.33+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-19 18:59:04 -08:00
David S. Miller
ece639caa3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
2011-02-19 16:42:37 -08:00
Linus Torvalds
4c3021da45 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
  net: deinit automatic LIST_HEAD
  net: dont leave active on stack LIST_HEAD
  net: provide default_advmss() methods to blackhole dst_ops
  tg3: Restrict phy ioctl access
  drivers/net: Call netif_carrier_off at the end of the probe
  ixgbe: work around for DDP last buffer size
  ixgbe: fix panic due to uninitialised pointer
  e1000e: flush all writebacks before unload
  e1000e: check down flag in tasks
  isdn: hisax: Use l2headersize() instead of dup (and buggy) func.
  arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS.
  cxgb4vf: Use defined Mailbox Timeout
  cxgb4vf: Quiesce Virtual Interfaces on shutdown ...
  cxgb4vf: Behave properly when CONFIG_DEBUG_FS isn't defined ...
  cxgb4vf: Check driver parameters in the right place ...
  pch_gbe: Fix the MAC Address load issue.
  iwlwifi: Delete iwl3945_good_plcp_health.
  net/can/softing: make CAN_SOFTING_CS depend on CAN_SOFTING
  netfilter: nf_iterate: fix incorrect RCU usage
  pch_gbe: Fix the issue that the receiving data is not normal.
  ...
2011-02-18 14:15:05 -08:00
David S. Miller
9435eb1cf0 ipv4: Implement __ip_dev_find using new interface address hash.
Much quicker than going through the FIB tables.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-18 12:43:09 -08:00
David S. Miller
fd23c3b311 ipv4: Add hash table of interface addresses.
This will be used to optimize __ip_dev_find() and friends.

With help from Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-18 12:42:28 -08:00
Eric Dumazet
ceaaec98ad net: deinit automatic LIST_HEAD
commit 9b5e383c11 (net: Introduce
unregister_netdevice_many()) left an active LIST_HEAD() in
rollback_registered(), with possible memory corruption.

Even if device is freed without touching its unreg_list (and therefore
touching the previous memory location holding LIST_HEAD(single)), better
close the bug for good, since it's really subtle.

(Same fix for default_device_exit_batch() for completeness)

Reported-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Eric W. Biderman <ebiderman@xmission.com>
Tested-by: Eric W. Biderman <ebiderman@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Octavian Purdila <opurdila@ixiacom.com>
CC: stable <stable@kernel.org> [.33+]
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-18 11:49:36 -08:00
Linus Torvalds
f87e6f4793 net: dont leave active on stack LIST_HEAD
Eric W. Biderman and Michal Hocko reported various memory corruptions
that we suspected to be related to a LIST head located on stack, that
was manipulated after thread left function frame (and eventually exited,
so its stack was freed and reused).

Eric Dumazet suggested the problem was probably coming from commit
443457242b (net: factorize
sync-rcu call in unregister_netdevice_many)

This patch fixes __dev_close() and dev_close() to properly deinit their
respective LIST_HEAD(single) before exiting.

References: https://lkml.org/lkml/2011/2/16/304
References: https://lkml.org/lkml/2011/2/14/223

Reported-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Eric W. Biderman <ebiderman@xmission.com>
Tested-by: Eric W. Biderman <ebiderman@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-18 11:49:35 -08:00
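The on-stack LIST_HEAD problem described in f87e6f4793 and ceaaec98ad can be sketched in userspace as follows. The list helpers are minimal re-implementations written for this example, and `struct fake_dev` and `close_one()` are hypothetical names, not the kernel's `__dev_close()` or `dev_close()`.

```c
/* Minimal circular doubly linked list, modeled loosely on the kernel's. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Hypothetical device with a list node that outlives any one function call. */
struct fake_dev {
	struct list_head unreg_list;
	int closed;
};

static void close_one(struct fake_dev *dev)
{
	struct list_head single;	/* exists only in this stack frame */

	list_init(&single);
	list_add(&dev->unreg_list, &single);
	dev->closed = 1;		/* stand-in for the batched work */

	/*
	 * Without this, dev->unreg_list would still point at 'single'
	 * after return, i.e. into a dead stack frame.
	 */
	list_del(&single);
}
```

Unlinking the head leaves the device's node pointing only at itself, so nothing refers to the stack frame once the function returns.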
Eric Dumazet
214f45c91b net: provide default_advmss() methods to blackhole dst_ops
Commit 0dbaee3b37 (net: Abstract default ADVMSS behind an
accessor.) introduced a possible crash in tcp_connect_init(), when
dst->default_advmss() is called from dst_metric_advmss()

Reported-by: George Spelvin <linux@horizon.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-18 11:39:01 -08:00
David S. Miller
982721f391 ipv4: Use const'ify fib_result deep in the route call chains.
The only troublesome bit here is __mkroute_output which wants
to override res->fi and res->type, compute those in local
variables instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 15:54:42 -08:00
David S. Miller
3b004569d8 ipv4: Avoid use of signed integers in fib_trie code.
GCC emits all kinds of crazy zero extensions when we go from signed
int, to unsigned short, etc. etc.

This transformation has to be legal because:

1) In tkey_extract_bits() in mask_pfx(), the values are used to
   perform shifts, on which negative values are undefined by C.

2) In fib_table_lookup() we perform comparisons with unsigned
   values, constants, and additions.  None of which should
   encounter negative values.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 15:49:26 -08:00
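The reasoning in 3b004569d8 (shift counts and comparisons never need negative values, so unsigned types are safe) can be illustrated with a simplified userspace sketch. `extract_bits()` and `mask_prefix()` are hypothetical helpers loosely modeled on the functions the message names; the 32-bit key type and the extra range guards are assumptions of this example.

```c
#include <stdint.h>

typedef uint32_t t_key;
#define KEYLENGTH (8 * sizeof(t_key))

/* Return 'bits' bits of 'a', starting 'offset' bits from the top. */
static t_key extract_bits(t_key a, unsigned int offset, unsigned int bits)
{
	/* Unsigned counts cannot be negative; only the upper bound needs care. */
	if (bits == 0 || bits > KEYLENGTH || offset >= KEYLENGTH)
		return 0;
	return (t_key)(a << offset) >> (KEYLENGTH - bits);
}

/* Keep the top 'len' bits of 'k' and clear the rest. */
static t_key mask_prefix(t_key k, unsigned int len)
{
	if (len == 0)
		return 0;
	if (len >= KEYLENGTH)
		return k;
	return k >> (KEYLENGTH - len) << (KEYLENGTH - len);
}
```

With signed parameters the compiler must also account for negative inputs, even though a negative shift count is undefined in C and can never be a valid argument here.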
David S. Miller
3c7bd1a140 net: Add initial_ref arg to dst_alloc().
This allows avoiding multiple writes to the initial __refcnt.

The simplest cases of wanting an initial reference of "1"
in ipv4 and ipv6 have been converted; the rest have been left
alone and kept at the existing "0".

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 15:44:00 -08:00
David S. Miller
0c4dcd58fd ipv4: Consolidate ipv4 dst allocation logic.
This also allows us to combine all the dst->flags settings and avoid
read/modify/write sequences to this struct member.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 15:42:37 -08:00
David S. Miller
010c2708e5 ipv4: Move rcu_read_{lock,unlock}() into ip_route_output_slow().
Simplifies tail of __ip_route_output_key().

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 15:37:09 -08:00
David S. Miller
5ada552746 ipv4: Simplify output route creation call sequence.
There's a lot of redundancy and unnecessary stack frames
in the output route creation path.

1) Make __mkroute_output() return error pointers.

2) Eliminate ip_mkroute_output() entirely, made possible by #1.

3) Call __mkroute_output() directly and handle the returned error
   pointers in ip_route_output_slow().

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 15:29:00 -08:00
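The error-pointer convention that 5ada552746 relies on can be sketched in userspace. The three helpers below are re-implementations modeled on the kernel's `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()`; `struct route`, `make_route()` and `route_output()` are hypothetical names for this example and do not reflect the real `__mkroute_output()` signature.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top of the pointer range. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct route {
	int oif;
};

static struct route route_slot;	/* toy storage, one route at a time */

/* Returns either a valid route or an encoded error, never NULL. */
static struct route *make_route(int oif)
{
	if (oif < 0)
		return ERR_PTR(-EINVAL);
	route_slot.oif = oif;
	return &route_slot;
}

/* The caller unpacks the error itself, so no wrapper function is needed. */
static int route_output(int oif, struct route **rp)
{
	struct route *rt = make_route(oif);

	if (IS_ERR(rt))
		return (int)PTR_ERR(rt);
	*rp = rt;
	return 0;
}
```

Because one return value carries both the result and the error, an intermediate function whose only job was to translate between the two becomes unnecessary.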
Michał Mirosław
e83d360d9a net: introduce NETIF_F_RXCSUM
Introduce NETIF_F_RXCSUM to replace device-private flags for RX checksum
offload. Integrate it with ndo_fix_features.

ethtool_op_get_rx_csum() is removed altogether as nothing in-tree uses it.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:35 -08:00
Michał Mirosław
da8ac86c4a net: use ndo_fix_features for ethtool_ops->set_flags
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:34 -08:00
Michał Mirosław
86794881c2 net: ethtool: use ndo_fix_features for offload setting
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:34 -08:00
Michał Mirosław
5455c6998d net: Introduce new feature setting ops
This introduces a new framework to handle device features setting.
It consists of:
  - new fields in struct net_device:
	+ hw_features - features that hw/driver supports toggling
	+ wanted_features - features that user wants enabled, when possible
  - new netdev_ops:
	+ feat = ndo_fix_features(dev, feat) - API checking constraints for
		enabling features or their combinations
	+ ndo_set_features(dev) - API updating hardware state to match
		changed dev->features
  - new ethtool commands:
	+ ETHTOOL_GFEATURES/ETHTOOL_SFEATURES: get/set dev->wanted_features
		and trigger device reconfiguration if resulting dev->features
		changed
	+ ETHTOOL_GSTRINGS(ETH_SS_FEATURES): get feature bits names (meaning)

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:33 -08:00
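The flow described in 5455c6998d (combine wanted and toggleable features, let the driver fix up the result, reconfigure only on change) might look roughly like the userspace sketch below. `struct fake_netdev`, `update_features()`, the two demo callbacks and both feature bits are hypothetical; only the field names `features`, `hw_features` and `wanted_features` come from the commit message.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_BASE      0x1u
#define FEAT_DEPENDENT 0x2u	/* assumed to be usable only with FEAT_BASE */

struct fake_netdev {
	uint32_t features;		/* currently active */
	uint32_t hw_features;		/* what may be toggled */
	uint32_t wanted_features;	/* what the user asked for */
	uint32_t (*fix_features)(struct fake_netdev *dev, uint32_t feat);
	int (*set_features)(struct fake_netdev *dev);
	int reconfig_count;		/* demo counter, not a real field */
};

/* Driver-side constraint check: drop the dependent bit without its base. */
static uint32_t demo_fix_features(struct fake_netdev *dev, uint32_t feat)
{
	(void)dev;
	if (!(feat & FEAT_BASE))
		feat &= ~FEAT_DEPENDENT;
	return feat;
}

/* Driver-side hardware update; here it only counts invocations. */
static int demo_set_features(struct fake_netdev *dev)
{
	dev->reconfig_count++;
	return 0;
}

static int update_features(struct fake_netdev *dev)
{
	/* Non-toggleable bits stay as they are; toggleable ones follow 'wanted'. */
	uint32_t feat = (dev->features & ~dev->hw_features) |
			(dev->wanted_features & dev->hw_features);

	if (dev->fix_features)
		feat = dev->fix_features(dev, feat);
	if (feat == dev->features)
		return 0;		/* nothing changed, no reconfiguration */

	dev->features = feat;
	return dev->set_features ? dev->set_features(dev) : 0;
}
```

Keeping `wanted_features` separate from `features` means a request that cannot be honoured yet is remembered and can take effect once its constraint is satisfied.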
Michał Mirosław
0a41770477 ethtool: factorize get/set_one_feature
This allows enabling GRO even if RX csum is disabled. GRO will not
be used for packets without hardware checksum anyway.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:33 -08:00
Michał Mirosław
340ae1654c ethtool: factorize ethtool_get_strings() and ethtool_get_sset_count()
This is needed for unified offloads patch.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:32 -08:00
Michał Mirosław
212b573f55 ethtool: enable GSO and GRO by default
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:32 -08:00
Michał Mirosław
9a279ea3a7 ethtool: move EXPORT_SYMBOL(ethtool_op_set_tx_csum) to correct place
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-17 14:16:31 -08:00
Joerg Marx
0af320fb46 netfilter: ip6t_LOG: fix a flaw in printing the MAC
The flaw was in skipping the second byte in MAC header due to increasing
the pointer AND indexed access starting at '1'.

Signed-off-by: Joerg Marx <joerg.marx@secunet.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-17 16:23:40 +01:00
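The flaw described in 0af320fb46 comes from mixing two ways of walking a buffer. The sketch below is a userspace illustration, not the ip6t_LOG code: `format_mac()` is a hypothetical helper that shows a form using the index alone, with the flawed pattern described in a comment.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Flawed pattern: print *p++ for the first byte, then loop over p[i]
 * starting at i = 1. After the increment, p[1] is already the third
 * byte, so the second byte is skipped and the final read lands one
 * byte past the end of the header.
 *
 * Form used here: advance with the index only, never both.
 */
static int format_mac(const unsigned char *p, size_t len, char *out, size_t outsz)
{
	size_t used = 0;
	size_t i;
	int n;

	if (len == 0 || outsz == 0)
		return -1;

	for (i = 0; i < len; i++) {
		n = snprintf(out + used, outsz - used,
			     i ? ":%02x" : "%02x", p[i]);
		if (n < 0 || (size_t)n >= outsz - used)
			return -1;	/* output buffer too small */
		used += (size_t)n;
	}
	return (int)used;
}
```

An equivalent alternative would be to use `*p++` for every byte and drop the index; what matters is picking one of the two.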
Florian Westphal
d503b30bd6 netfilter: tproxy: do not assign timewait sockets to skb->sk
Assigning a socket in timewait state to skb->sk can trigger
kernel oops, e.g. in nfnetlink_log, which does:

if (skb->sk) {
        read_lock_bh(&skb->sk->sk_callback_lock);
        if (skb->sk->sk_socket && skb->sk->sk_socket->file) ...

in the timewait case, accessing sk->sk_callback_lock and sk->sk_socket
is invalid.

Either all of these spots will need to add a test for sk->sk_state != TCP_TIME_WAIT,
or xt_TPROXY must not assign a timewait socket to skb->sk.

This does the latter.

If a TW socket is found, assign the tproxy nfmark, but skip the skb->sk assignment,
thus mimicking behaviour of a '-m socket .. -j MARK/ACCEPT' re-routing rule.

The 'SYN to TW socket' case is left unchanged -- we try to redirect to the
listener socket.

Cc: Balazs Scheidler <bazsi@balabit.hu>
Cc: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-17 11:32:38 +01:00
Ben Hutchings
69a19ee60d net: RPS: Make hardware-accelerated RFS conditional on NETIF_F_NTUPLE
For testing and debugging purposes it is useful to be able to disable
hardware acceleration of RFS without disabling RFS altogether.  Since
this is a similar feature to 'n-tuple' flow steering through the
ethtool API, test the same feature flag that controls that.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2011-02-15 20:36:11 +00:00
David S. Miller
f878b995b0 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next-2.6
2011-02-15 12:25:19 -08:00
Ben Hutchings
5c56580b74 net: Adjust TX queue kobjects if number of queues changes during unregister
If the root qdisc for a net device is mqprio, and the driver's
ndo_setup_tc() operation dynamically adds and removes TX queues,
netif_set_real_num_tx_queues() will be called during device
unregistration to remove the extra TX queues when the qdisc is
destroyed.  Currently this causes the corresponding kobjects
to be leaked, and the device's reference count never drops to 0.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2011-02-15 19:45:33 +00:00
David S. Miller
f39925dbde ipv4: Cache learned redirect information in inetpeer.
Note that we do not generate the redirect netevent any longer,
because we don't create a new cached route.

Instead, once the new neighbour is bound to the cached route,
we emit a neigh update event instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-14 21:33:27 -08:00
David S. Miller
2c8cec5c10 ipv4: Cache learned PMTU information in inetpeer.
The general idea is that if we learn new PMTU information, we
bump the peer genid.

This triggers the dst_ops->check() code to validate and if
necessary propagate the new PMTU value into the metrics.

Learned PMTU information self-expires.

This means that it is not necessary to kill a cached route
entry just because the PMTU information is too old.

As a consequence:

1) When the path appears unreachable (dst_ops->link_failure
   or dst_ops->negative_advice) we unwind the PMTU state if
   it is out of date, instead of killing the cached route.

   A redirected route will still be invalidated in these
   situations.

2) rt_check_expire(), rt_worker_func(), et al. are no longer
   necessary at all.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-14 21:33:07 -08:00
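The generation-counter idea described in 2c8cec5c10 can be sketched in userspace. Everything here is an assumption made for illustration: `struct peer`, `struct cached_route`, `peer_learn_pmtu()` and `route_check()` are hypothetical, time is a plain counter, and wraparound is ignored.

```c
/* Per-destination state shared by all cached routes to that peer. */
struct peer {
	unsigned int genid;		/* bumped whenever new info is learned */
	unsigned int pmtu_learned;	/* 0 means nothing learned */
	unsigned long pmtu_expires;	/* time after which the value is stale */
};

struct cached_route {
	struct peer *peer;
	unsigned int peer_genid;	/* genid seen at the last check */
	unsigned int mtu;		/* effective MTU */
	unsigned int orig_mtu;		/* MTU before any learned PMTU */
};

static void peer_learn_pmtu(struct peer *p, unsigned int pmtu,
			    unsigned long expires)
{
	p->pmtu_learned = pmtu;
	p->pmtu_expires = expires;
	p->genid++;			/* tells every cached route to revalidate */
}

/* Validate a cached route in place; the route itself is never discarded. */
static void route_check(struct cached_route *rt, unsigned long now)
{
	struct peer *p = rt->peer;

	if (rt->peer_genid != p->genid) {
		if (p->pmtu_learned && now < p->pmtu_expires &&
		    p->pmtu_learned < rt->mtu)
			rt->mtu = p->pmtu_learned;
		rt->peer_genid = p->genid;
	}

	/* Learned PMTU self-expires: unwind it, keep the route. */
	if (p->pmtu_learned && now >= p->pmtu_expires) {
		rt->mtu = rt->orig_mtu;
		p->pmtu_learned = 0;
	}
}
```

Comparing one integer per lookup stands in for a periodic scan of the route cache, which is in the spirit of why the message can call the expiry workers unnecessary.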
Ian Campbell
d11327ad66 arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS.
NETDEV_NOTIFY_PEERS is an explicit request by the driver to send a link
notification while NETDEV_UP/NETDEV_CHANGEADDR generate link
notifications as a sort of side effect.

In the latter cases the sysctl option is present because link
notification events can have undesired effects e.g. if the link is
flapping. I don't think this applies in the case of an explicit
request from a driver.

This patch makes NETDEV_NOTIFY_PEERS unconditional. If preferred, we
could add a new sysctl for this case which defaults to on.

This change causes Xen post-migration ARP notifications (which cause
switches to relearn their MAC tables etc) to be sent by default.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-14 17:47:15 -08:00
Bernard Pidoux
68aa3fd551 ROSE: AX25: finding routes simplification
With the previous patch, the rose_get_neigh() routine
searches the full list of neighbor nodes for an
already connected node, whether it is called locally
or through a level 3 transit frame.
If no route is open through an adjacent connected node,
a classical connect request is attempted.

There is therefore no longer any reason for an extra loop such
as the one removed by this patch.

Signed-off-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-14 13:33:49 -08:00
Bernard Pidoux
c5d8b24ad0 ROSE: rose AX25 packet routing improvement
FPAC AX25 packet application is using Linux kernel ROSE
routing skills in order to connect or send packets to remote stations
knowing their ROSE address via a network of interconnected nodes.

Each FPAC node has a ROSE routing table that Linux ROSE module is
looking at each time a ROSE frame is relayed by the node or when
a connect request to a neighbor node is received.

A previous patch improved the system time response by looking at
already established routes each time the system was looking for a
route to relay a frame. If a neighbor node routing the destination
address was already connected, then the frame would be sent
through it. If not, a connection request would be issued.

The present patch extends the same routing capability to a connect
request asked by a user locally connected into an FPAC node.
Without this patch, a connect request was not well handled unless it
was directed to an immediate connected neighbor of the local node.

Implemented at a number of ROSE FPAC node stations, the present patch
dramatically improved FPAC ROSE routing response time and efficiency.

Signed-off-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-14 13:31:09 -08:00
David S. Miller
8bc26a008f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2011-02-14 12:51:42 -08:00
Eric Dumazet
31d409373c ipv4: fix rcu lock imbalance in fib_select_default()
Commit 0c838ff1ad (ipv4: Consolidate all default route selection
implementations.) forgot to remove one rcu_read_unlock() from
fib_select_default().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-14 11:23:04 -08:00
David S. Miller
af756e9d88 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
2011-02-14 11:16:12 -08:00
Ben Hutchings
ac7100ba93 sch_mqprio: Always set num_tc to 0 in mqprio_destroy()
All the cleanup code in mqprio_destroy() is currently conditional on
priv->qdiscs being non-null, but that condition should only apply to
the per-queue qdisc cleanup.  We should always set the number of
traffic classes back to 0 here.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
2011-02-14 19:07:58 +00:00