Commit Graph

337612 Commits

Author SHA1 Message Date
Paul Gortmaker
0fef8f205f tipc: refactor accept() code for improved readability
In TIPC's accept() routine, there is a large block of code relating
to initialization of a new socket, all within an if condition checking
if the allocation succeeded.

Here, we simply flip the check of the if, so that the main execution
path stays at the same indentation level, which improves readability.
If the allocation fails, we jump to an already existing exit label.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:24 -05:00
Ying Xue
258f8667a2 tipc: add lock nesting notation to quiet lockdep warning
TIPC accept() call grabs the socket lock on a newly allocated
socket while holding the socket lock on an old socket. But lockdep
worries that this might be a recursive lock attempt:

  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  kworker/u:0/6 is trying to acquire lock:
  (sk_lock-AF_TIPC){+.+.+.}, at: [<c8c1226c>] accept+0x15c/0x310 [tipc]

  but task is already holding lock:
  (sk_lock-AF_TIPC){+.+.+.}, at: [<c8c12138>] accept+0x28/0x310 [tipc]

  other info that might help us debug this:
  Possible unsafe locking scenario:

          CPU0
          ----
          lock(sk_lock-AF_TIPC);
          lock(sk_lock-AF_TIPC);

          *** DEADLOCK ***

  May be due to missing lock nesting notation
  [...]

Tell lockdep that this locking is safe by using lock_sock_nested().
This is similar to what was done in commit 5131a184a3 for
SCTP code ("SCTP: lock_sock_nested in sctp_sock_migrate").

Also note that this is isn't something that is seen normally,
as it was uncovered with some experimental work-in-progress
code not yet ready for mainline.  So no need for stable
backports or similar of this commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:23 -05:00
Ying Xue
cbab368790 tipc: eliminate connection setup for implied connect in recv_msg()
As connection setup is now completed asynchronously in BH context,
in the function filter_connect(), the corresponding code in recv_msg()
becomes redundant.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:22 -05:00
Ying Xue
584d24b396 tipc: introduce non-blocking socket connect
TIPC has so far only supported blocking connect(), meaning that a call
to connect() doesn't return until either the connection is fully
established, or an error occurs. This has proved insufficient for many
users, so we now introduce non-blocking connect(), analogous to how
this is done in TCP and other protocols.

With this feature, if a connection cannot be established instantly,
connect() will return the error code "-EINPROGRESS".
If the user later calls connect() again, he will either have the
return code "-EALREADY" or "-EISCONN", depending on whether the
connection has been established or not.

The user must have explicitly set the socket to be non-blocking
(SOCK_NONBLOCK or O_NONBLOCK, depending on method used), so unless
for some reason they had set this already (the socket would anyway
remain blocking in current TIPC) this change should be completely
backwards compatible.

It is also now possible to call select() or poll() to wait for the
completion of a connection.

An effect of the above is that the actual completion of a connection
may now be performed asynchronously, independent of the calls from
user space. Therefore, we now execute this code in BH context, in
the function filter_rcv(), which is executed upon reception of
messages in the socket.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: minor refactoring for improved connect/disconnect function names]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:21 -05:00
Ying Xue
7e6c131e15 tipc: consolidate connection-oriented message reception in one function
Handling of connection-related message reception is currently scattered
around at different places in the code. This makes it harder to verify
that things are handled correctly in all possible scenarios.
So we consolidate the existing processing of connection-oriented
message reception in a single routine.  In the process, we convert the
chain of if/else into a switch/case for improved readability.

A cast on the socket_state in the switch is needed to avoid compile
warnings on 32 bit, like "net/tipc/socket.c:1252:2: warning: case value
‘4294967295’ not in enumerated type".  This happens because existing
tipc code pseudo extends the default linux socket state values with:

	#define SS_LISTENING    -1      /* socket is listening */
	#define SS_READY        -2      /* socket is connectionless */

It may make sense to add these as _positive_ values to the existing
socket state enum list someday, vs. these already existing defines.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: add cast to fix warning; remove returns from middle of switch]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:20 -05:00
Paul Gortmaker
bc879117d4 tipc: standardize across connect/disconnect function naming
Currently we have tipc_disconnect and tipc_disconnect_port.  It is
not clear from the names alone, what they do or how they differ.
It turns out that tipc_disconnect just deals with the port locking
and then calls tipc_disconnect_port which does all the work.

If we rename as follows: tipc_disconnect_port --> __tipc_disconnect
then we will be following typical linux convention, where:

   __tipc_disconnect: "raw" function that does all the work.

   tipc_disconnect: wrapper that deals with locking and then calls
		    the real core __tipc_disconnect function

With this, the difference is immediately evident, and locking
violations are more apt to be spotted by chance while working on,
or even just while reading the code.

On the connect side of things, we currently only have the single
"tipc_connect2port" function.  It does both the locking at enter/exit,
and the core of the work.  Pending changes will make it desireable to
have the connect be a two part locking wrapper + worker function,
just like the disconnect is already.

Here, we make the connect look just like the updated disconnect case,
for the above reason, and for consistency.  In the process, we also
get rid of the "2port" suffix that was on the original name, since
it adds no descriptive value.

On close examination, one might notice that the above connect
changes implicitly move the call to tipc_link_get_max_pkt() to be
within the scope of tipc_port_lock() protected region; when it was
not previously.  We don't see any issues with this, and it is in
keeping with __tipc_connect doing the work and tipc_connect just
handling the locking.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:19 -05:00
Jon Maloy
e643df156a tipc: change sk_receive_queue upper limit
The sk_recv_queue upper limit for connectionless sockets has empirically
turned out to be too low. When we double the current limit we get much
fewer rejected messages and no noticable negative side-effects.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 17:23:18 -05:00
Ying Xue
9da3d47587 tipc: eliminate aggregate sk_receive_queue limit
As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
global atomic counter for the sum of sk_recv_queue sizes across all
tipc sockets. When incremented, the counter is compared to an upper
threshold value, and if this is reached, the message is rejected
with error code TIPC_OVERLOAD.

This check was originally meant to protect the node against
buffer exhaustion and general CPU overload. However, all experience
indicates that the feature not only is redundant on Linux, but even
harmful. Users run into the limit very often, causing disturbances
for their applications, while removing it seems to have no negative
effects at all. We have also seen that overall performance is
boosted significantly when this bottleneck is removed.

Furthermore, we don't see any other network protocols maintaining
such a mechanism, something strengthening our conviction that this
control can be eliminated.

As a result, the atomic variable tipc_queue_size is now unused
and so it can be deleted.  There is a getsockopt call that used
to allow reading it; we retain that but just return zero for
maximum compatibility.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
[PG: phase out tipc_queue_size as pointed out by Neil Horman]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-07 14:19:52 -05:00
Erik Hugne
c008413850 tipc: remove obsolete flush of stale reassembly buffer
Each link instance has a periodic job checking if there is a stale
ongoing message reassembly associated to the link. If no new
fragment has been received during the last 4*[link_tolerance] period,
it is assumed the missing fragment will never arrive. As a consequence,
the reassembly buffer is discarded, and a gap in the message sequence
occurs.

This assumption is wrong. After we abandoned our ambition to develop
packet routing for multi-cluster networks, only single-hop packet
transfer remains as an option. For those, all packets are guaranteed
to be delivered in sequence to the defragmentation layer. Any failure
to achieve sequenced delivery will eventually lead to link reset, and
the reassembly buffer will be flushed anyway.

So we just remove this periodic check, which is now obsolete.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: also delete get/inc_timer count, since they are now unused]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-12-06 17:20:19 -05:00
Cong Wang
b93196dc5a net: fix some compiler warning in net/core/neighbour.c
net/core/neighbour.c:65:12: warning: 'zero' defined but not used [-Wunused-variable]
net/core/neighbour.c:66:12: warning: 'unres_qlen_max' defined but not used [-Wunused-variable]

These variables are only used when CONFIG_SYSCTL is defined,
so move them under #ifdef CONFIG_SYSCTL.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-05 21:50:37 -05:00
David S. Miller
c2d3babfaf bridge: implement multicast fast leave
V3: make it a flag
V2: make the toggle per-port

Fast leave allows bridge to immediately stops the multicast
traffic on the port receives IGMP Leave when IGMP snooping is enabled,
no timeouts are observed.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-12-05 16:24:45 -05:00
Eddie Wai
0d650ec75a cnic: Fix rare race condition during iSCSI disconnect.
If the initiator and target try to close the connection at about the same
time, there is a race condition in the termination sequence for bnx2x.
Fix the problem by waiting for the remote termination to complete before
deleting the Connection ID.  This will prevent the firmware assert.

Update version to 2.5.15.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-05 16:01:28 -05:00
Michael Chan
caa9e931fe cnic: Reset iSCSI EQ during shutdown.
Without the reset, reloading the cnic driver can cause the iSCSI
Event Queue to be out of sync with the driver and cause intermittent
crash.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Acked-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-05 16:01:28 -05:00
Eric Dumazet
0e1efe9d5e ipv6: avoid taking locks at socket dismantle
ipv6_sock_mc_close() is called for ipv6 sockets at close time, and most
of them don't use multicast.

Add a test to avoid contention on a shared spinlock.

Same heuristic applies for ipv6_sock_ac_close(), to avoid contention
on a shared rwlock.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-05 16:01:28 -05:00
Shan Wei
cc86802805 net: doc: add default value for neighbour parameters
Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-05 16:01:28 -05:00
Shan Wei
ce46cc64d4 net: neighbour: prohibit negative value for unres_qlen_bytes parameter
unres_qlen_bytes and unres_qlen are int type.
But multiple relation(unres_qlen_bytes = unres_qlen * SKB_TRUESIZE(ETH_FRAME_LEN))
will cause type overflow when seting unres_qlen. e.g.

$ echo 1027506 > /proc/sys/net/ipv4/neigh/eth1/unres_qlen
$ cat /proc/sys/net/ipv4/neigh/eth1/unres_qlen
1182657265
$ cat /proc/sys/net/ipv4/neigh/eth1/unres_qlen_bytes
-2147479756

The gutted value is not that we setting。
But user/administrator don't know this is caused by int type overflow.

what's more, it is meaningless and even dangerous that unres_qlen_bytes is set
with negative number. Because, for unresolved neighbour address, kernel will cache packets
without limit in __neigh_event_send()(e.g. (u32)-1 = 2GB).

Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-05 16:01:28 -05:00
Stephan Gatzka
1642182ea0 net/phy: Add interrupt support for dp83640 phy.
Added functions for ack_interrupt and config_intr. Tested on an mpc5200b
powerpc board.

Signed-off-by: Stephan Gatzka <stephan.gatzka@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-05 16:01:28 -05:00
Andrew Gallatin
59e955edb7 myri10ge: fix most sparse warnings
- convert remaining htonl/ntohl +__raw_read/__raw_writel to
  swab32 + readl/writel
- add missing __iomem qualifier in myri10ge_open()
- fix  dubious: x & !y warning by switching from logical to bitwise not

The swab32 conversion fixes a bug in myri10ge_led() where
big-endian machines would write the wrong pattern.

The only remaining warning (lock context imbalance) is due to
the use of __netif_tx_trylock(), and cannot easily be fixed.

Signed-off-by: Andrew Gallatin <gallatin@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-05 16:01:27 -05:00
Amerigo Wang
50426b5925 bridge: implement multicast fast leave
V2: make the toggle per-port

Fast leave allows bridge to immediately stops the multicast
traffic on the port receives IGMP Leave when IGMP snooping is enabled,
no timeouts are observed.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-05 16:01:27 -05:00
Jan Glauber
a573ea56a9 3com: make 3c59x depend on HAS_IOPORT
The 3com driver for 3c59x requires ioport_map. Since not all
architectures support IO port mapping make 3c59x dependent on HAS_IOPORT.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-05 15:36:35 -05:00
David S. Miller
b1afce9538 ipv6: Protect ->mc_forwarding access with CONFIG_IPV6_MROUTE
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 14:46:34 -05:00
Nicolas Dichtel
193c1e478c ip6mr: fix rtm_family of rtnl msg
We talk about IPv6, hence the family is RTNL_FAMILY_IP6MR!
rtnl_register() is already called with RTNL_FAMILY_IP6MR.

The bug is here since the beginning of this function (commit 5b285cac35).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:27:24 -05:00
Serge Hallyn
4e66ae2ea3 net: dev_change_net_namespace: send a KOBJ_REMOVED/KOBJ_ADD
When a new nic is created in namespace ns1, the kernel sends a KOBJ_ADD uevent
to ns1.  When the nic is moved to ns2, we only send a KOBJ_MOVE to ns2, and
nothing to ns1.

This patch changes that behavior so that when moving a nic from ns1 to ns2, we
send a KOBJ_REMOVED to ns1 and KOBJ_ADD to ns2.  (The KOBJ_MOVE is still
sent to ns2).

The effects of this can be seen when starting and stopping containers in
an upstart based host.  Lxc will create a pair of veth nics, the kernel
sends KOBJ_ADD, and upstart starts network-instance jobs for each.  When
one nic is moved to the container, because no KOBJ_REMOVED event is
received, the network-instance job for that veth never goes away.  This
was reported at https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1065589
With this patch the networ-instance jobs properly go away.

The other oddness solved here is that if a nic is passed into a running
upstart-based container, without this patch no network-instance job is
started in the container.  But when the container creates a new nic
itself (ip link add new type veth) then network-interface jobs are
created.  With this patch, behavior comes in line with a regular host.

v2: also send KOBJ_ADD to new netns.  There will then be a
_MOVE event from the device_rename() call, but that should
be innocuous.

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:25:57 -05:00
Wei Yongjun
008d845cf6 net: neterion: use for_each_pci_dev to simplify the code
Use for_each_pci_dev to simplify the code.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:19:40 -05:00
Nicolas Dichtel
812e44dd18 ip6mr: advertise new mfc entries via rtnl
This patch allows to monitor mf6c activities via rtnetlink.
To avoid parsing two times the mf6c oifs, we use maxvif to allocate the rtnl
msg, thus we may allocate some superfluous space.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:11 -05:00
Nicolas Dichtel
8cd3ac9f9b ipmr: advertise new mfc entries via rtnl
This patch allows to monitor mfc activities via rtnetlink.
To avoid parsing two times the mfc oifs, we use maxvif to allocate the rtnl
msg, thus we may allocate some superfluous space.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:11 -05:00
Nicolas Dichtel
1eb99af52c ipmr/ip6mr: allow to get unresolved cache via netlink
/proc/net/ip[6]_mr_cache allows to get all mfc entries, even if they are put in
the unresolved list (mfc[6]_unres_queue). But only the table RT_TABLE_DEFAULT is
displayed.
This patch adds the parsing of the unresolved list when the dump is made via
rtnetlink, hence each table can be checked.

In IPv6, we set rtm_type in ip6mr_fill_mroute(), because in case of unresolved
mfc __ip6mr_fill_mroute() will not set it. In IPv4, it is already done.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:11 -05:00
Nicolas Dichtel
9a68ac72a4 ipmr/ip6mr: report origin of mfc entry into rtnl msg
A mfc entry can be static or not (added via the mroute_sk socket). The patch
reports MFC_STATIC flag into rtm_protocol by setting rtm_protocol to
RTPROT_STATIC or RTPROT_MROUTED.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:11 -05:00
Nicolas Dichtel
adfa85e45d ipmr/ip6mr: advertise mfc stats via rtnetlink
These statistics can be checked only via /proc/net/ip_mr_cache or
SIOCGETSGCNT[_IN6] and thus only for the table RT_TABLE_DEFAULT.
Advertising them via rtnetlink allows to get statistics for all cache entries,
whatever the table is.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:10 -05:00
Nicolas Dichtel
70b386a0cc ip6mr: use nla_nest_* helpers
This patch removes the skb manipulations when nested attributes are added by
using standard helpers.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:10 -05:00
Nicolas Dichtel
d67b8c616b netconf: advertise mc_forwarding status
This patch advertise the MC_FORWARDING status for IPv4 and IPv6.
This field is readonly, only multicast engine in the kernel updates it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:10 -05:00
David S. Miller
e8ad1a8fab Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
* Remove limitation in the maximum number of supported sets in ipset.
  Now ipset automagically increments the number of slots in the array
  of sets by 64 new spare slots, from Jozsef Kadlecsik.

* Partially remove the generic queue infrastructure now that ip_queue
  is gone. Its only client is nfnetlink_queue now, from Florian
  Westphal.

* Add missing attribute policy checkings in ctnetlink, from Florian
  Westphal.

* Automagically kill conntrack entries that use the wrong output
  interface for the masquerading case in case of routing changes,
  from Jozsef Kadlecsik.

* Two patches two improve ct object traceability. Now ct objects are
  always placed in any of the existing lists. This allows us to dump
  the content of unconfirmed and dying conntracks via ctnetlink as
  a way to provide more instrumentation in case you suspect leaks,
  from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:01:19 -05:00
Sony Chacko
099f7aa740 qlcnic: rename module params with module_param_named
Add qlcnic prefix to qlcnic driver module parameters.

Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:00:12 -05:00
Sony Chacko
5796bd040e qlcnic: fix bug in LRO descriptor access macro
Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:00:12 -05:00
Sony Chacko
bff57d8e1d qlcnic: update NIC partition interface routines
Refactor 82xx driver to support new adapter
Update routines to support variable number of NIC partitions

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:00:12 -05:00
Sony Chacko
229997989b qlcnic: get board name API
Cleanup get board information API.

Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:00:12 -05:00
Sony Chacko
15087c2b31 qlcnic: modify PCI and register access routines
Refactor 82xx driver to support new adapter
Update PCI and hardware access routines

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:00:12 -05:00
Sony Chacko
797884509d qlcnic: move HW specific data to seperate structure
Move HW specific data to a seperate structure as part of
refactoring 82xx adapter driver.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:00:11 -05:00
Sony Chacko
97ee45eb09 qlcnic: add 82xx adapter specific checks
Add 82xx adapter ID check before 82xx specific operations as part of
refactoring the driver to enable support for new adapter.

Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:00:11 -05:00
Matt Carlson
fb4ce8ad80 tg3: PTP - Enable the timestamping feature in hardware and fill skb tx/rx timestamps
This patch implements the hardware timestamping as described in
Documentation/networking/timestamping.txt

Update version to 3.128.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 12:58:50 -05:00
Matt Carlson
0a633ac228 tg3: PTP - Add the hardware timestamp ioctl
This patch implements the SIOCSHWTSTAMP ioctl as described in
Documentation/networking/timestamping.txt

[Removed HWTSTAMP_FILTER_ALL handling by returning -ERANGE based on input
 from Richard Cochran.]

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 12:58:50 -05:00
Matt Carlson
7d41e49ac2 tg3: PTP - Implement the ptp api and ethtool functions
This patch adds the ptp_caps structure, ptp api implementation,
reference clock read and register/unregister functions.  All the basic
clock operations as described in Documentation/ptp/ptp.txt are
supported.

Frequency adjustment is performed using hardware with a 24 bit
accumulator and a programmable correction value. On each clk, the
correction value gets added to the accumulator and when it overflows,
the time counter is incremented/decremented and the accumulator reset.

So conversion from ppb to correction value is
	ppb * (1 << 24) / 1000000000

[Re-organized to put the ptp_clock_info struct declaration in one patch,
 added ptp_clock_info.name, and added locking to tg3_ptp_adjtime() based
 on input from Richard Cochran.]

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 12:58:49 -05:00
Matt Carlson
be947307b5 tg3: PTP - Add header definitions, initialization and hw access functions.
This patch adds code to write the reference clock. If a chip reset is
performed, the hwclock is reinitialized with the adjusted kernel time

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 12:58:49 -05:00
Nithin Nayak Sujir
357630668a tg3: Fix inconsistent locking for tg3_netif_start().
Every caller holds tp->lock when calling tg3_netif_start() except
tg3_io_resume().  Fix it so that it is all consistent.  The subsequent
PTP patches add tg3_ptp_resume() to tg3_netif_start() and the tp->lock
is required.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 12:58:49 -05:00
David S. Miller
682d7978ae Networking: Remove __dev* markings from the networking drivers
This is a series of patches that remove the dev* attributes for all
 networking drivers, with the exception of wireless drivers, those are in
 a different branch.
 
 Use of __devinit, __devexit_p, __devinitdata, __devinitconst, and
 __devexit are no longer needed since CONFIG_HOTPLUG is being removed as
 an option.
 
 Note, there are some devinit compiler section mismatch warnings due to
 this series, but they are fixed up when merged with my driver-next
 branch, which fixes the PCI device id warnings, and removes the modpost
 detection, as it's no longer needed.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlC9COEACgkQMUfUDdst+ykOEgCeKZsrtKrbjMFM5jEROqnk33FI
 SwEAoMSZWFJ7M1/27FGRkylmmypWXi/p
 =HH3A
 -----END PGP SIGNATURE-----

Merge tag 'dev_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/net-next

Networking:  Remove __dev* markings from the networking drivers

This is a series of patches that remove the dev* attributes for all
networking drivers, with the exception of wireless drivers, those are in
a different branch.

Use of __devinit, __devexit_p, __devinitdata, __devinitconst, and
__devexit are no longer needed since CONFIG_HOTPLUG is being removed as
an option.

Note, there are some devinit compiler section mismatch warnings due to
this series, but they are fixed up when merged with my driver-next
branch, which fixes the PCI device id warnings, and removes the modpost
detection, as it's no longer needed.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 15:35:28 -05:00
Paul Marks
a5a81f0b90 ipv6: Fix default route failover when CONFIG_IPV6_ROUTER_PREF=n
I believe this commit from 2008 was incorrect:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=398bcbebb6f721ac308df1e3d658c0029bb74503

When CONFIG_IPV6_ROUTER_PREF is disabled, the kernel should follow
RFC4861 section 6.3.6: if no route is NUD_VALID, then traffic should be
sprayed across all routers (indirectly triggering NUD) until one of them
becomes NUD_VALID.

However, the following experiment demonstrates that this does not work:

1) Connect to an IPv6 network.
2) Change the router's MAC (and link-local) address.

The kernel will lock onto the first router and never try the new one, even
if the first becomes unreachable.  This patch fixes the problem by
allowing rt6_check_neigh() to return 0; if all routers return 0, then
rt6_select() will fall back to round-robin behavior.

This patch should have no effect when CONFIG_IPV6_ROUTER_PREF=y.

Note that rt6_check_neigh() is only used in a boolean context, so I've
changed its return type accordingly.

Signed-off-by: Paul Marks <pmarks@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 15:34:47 -05:00
Michael S. Tsirkin
5d09710925 tun: only queue packets on device
Historically tun supported two modes of operation:
- in default mode, a small number of packets would get queued
  at the device, the rest would be queued in qdisc
- in one queue mode, all packets would get queued at the device

This might have made sense up to a point where we made the
queue depth for both modes the same and set it to
a huge value (500) so unless the consumer
is stuck the chance of losing packets is small.

Thus in practice both modes behave the same, but the
default mode has some problems:
- if packets are never consumed, fragments are never orphaned
  which cases a DOS for sender using zero copy transmit
- overrun errors are hard to diagnose: fifo error is incremented
  only once so you can not distinguish between
  userspace that is stuck and a transient failure,
  tcpdump on the device does not show any traffic

Userspace solves this simply by enabling IFF_ONE_QUEUE
but there seems to be little point in not doing the
right thing for everyone, by default.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 15:07:36 -05:00
Bill Pemberton
9f9a12f8ca net/intel: remove __dev* attributes
CONFIG_HOTPLUG is going away as an option.  As result the __dev*
markings will be going away.

Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst,
and __devexit.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Cc: Alex Duyck <alexander.h.duyck@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Cc: Tushar Dave <tushar.n.dave@intel.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-03 11:17:07 -08:00
Bill Pemberton
0329aba137 bnx2x: remove __dev* attributes
CONFIG_HOTPLUG is going away as an option.  As result the __dev*
markings will be going away.

Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst,
and __devexit.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Cc: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-03 11:17:07 -08:00
Bill Pemberton
047fc56614 net/broadcom: remove __dev* attributes
CONFIG_HOTPLUG is going away as an option.  As result the __dev*
markings will be going away.

Remove use of __devinit, __devexit_p, __devinitdata, __devinitconst,
and __devexit.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-03 11:17:07 -08:00