Commit Graph

18854 Commits

Author SHA1 Message Date
Gustavo F. Padovan
af6bcd8205 Bluetooth: move l2cap_sock_bind()/listen() to l2cap_sock.c
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:43:30 -02:00
Gustavo F. Padovan
554f05bb8a Bluetooth: move l2cap_sock_release() to l2cap_sock.c
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:43:30 -02:00
Gustavo F. Padovan
65390587c7 Bluetooth: move l2cap_sock_ops to l2cap_sock.c
First step to move all l2cap_sock_ops function to l2cap_sock.c

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:43:30 -02:00
Gustavo F. Padovan
bb58f747e5 Bluetooth: Initial work for L2CAP split.
This patch tries to do the minimal to move l2cap_sock_create() and its
dependencies to l2cap_sock.c. It create a API to initialize and cleanup
the L2CAP sockets from l2cap_core.c through l2cap_init_sockets() and
l2cap_cleanup_sockets().

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:43:30 -02:00
Gustavo F. Padovan
0a708f8fc4 Bluetooth: Rename l2cap.c to l2cap_core.c
In a preparation to the the L2CAP code split in many files.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:43:15 -02:00
Andrei Emeltchenko
d37f50e190 Bluetooth: fix crash by disabling tasklet in sock accept
Crash can happen when tasklet handling connect/disconnect requests
preempts socket accept. Can be reproduced with "l2test -r" on one
side and several "l2test -c -b 1000 -i hci0 -P 10 <bdaddr>" on the
other side.

disable taskets in socket accept and change lock_sock and release_sock
to bh_lock_sock and bh_unlock_sock since we have to use spinlocks and
there is no need to mark sock as owned by user.

...
[ 3555.897247] Unable to handle kernel NULL pointer dereference at virtual
address 000000bc
[ 3555.915039] pgd = cab9c000
[ 3555.917785] [000000bc] *pgd=8bf3d031, *pte=00000000, *ppte=00000000
[ 3555.928314] Internal error: Oops: 17 [#1] PREEMPT
[ 3555.999786] CPU: 0    Not tainted  (2.6.32.21-13874-g67918ef #65)
...
[ 3556.005981] PC is at bt_accept_unlink+0x20/0x58 [bluetooth]
[ 3556.011627] LR is at bt_accept_dequeue+0x3c/0xe8 [bluetooth]
...
[ 3556.161285] [<bf0007fc>] (bt_accept_unlink+0x20/0x58 [bluetooth]) from
[<bf000870>] (bt_accept_dequeue+0x3c/0xe8 [bluetooth])
[ 3556.172729] [<bf000870>] (bt_accept_dequeue+0x3c/0xe8 [bluetooth]) from
[<bf324df8>] (l2cap_sock_accept+0x100/0x15c [l2cap])
[ 3556.184082] [<bf324df8>] (l2cap_sock_accept+0x100/0x15c [l2cap]) from
[<c026a0a8>] (sys_accept4+0x120/0x1e0)
[ 3556.193969] [<c026a0a8>] (sys_accept4+0x120/0x1e0) from [<c002c9a0>]
(ret_fast_syscall+0x0/0x2c)
[ 3556.202819] Code: e5813000 e5901164 e580c160 e580c15c (e1d13bbc)
...

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:08 -02:00
Andrei Emeltchenko
5a08ecceda Bluetooth: Do not use assignments in IF conditions
Fix checkpatch warnings concerning assignments in if conditions.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:08 -02:00
Johan Hedberg
17fa4b9dff Bluetooth: Add set_io_capability management command
This patch adds a new set_io_capability management command which is used
to set the IO capability for Secure Simple Pairing (SSP) as well as the
Security Manager Protocol (SMP). The value is per hci_dev and each
hci_conn object inherits it upon creation.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:08 -02:00
Johan Hedberg
980e1a537f Bluetooth: Add support for PIN code handling in the management interface
This patch adds the necessary commands and events needed to communicate
PIN code related actions between the kernel and userspace. This includes
a pin_code_request event as well as pin_code_reply and
pin_code_negative_reply commands.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:07 -02:00
Johan Hedberg
a38528f111 Bluetooth: Create common cmd_complete function for mgmt.c
A lot of management code needs to generate command complete events so it
makes sense to have a helper function for this.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:07 -02:00
Johan Hedberg
2784eb41b1 Bluetooth: Add get_connections managment interface command
This patch adds a get_connections command to the management interface.
With this command userspace can get the current list of connected
devices. Typically this command would only be used once when enumerating
existing adapters. After that the connected and disconnected events are
used to track connections.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:07 -02:00
Johan Hedberg
17d5c04cb5 Bluetooth: Add support for connect failed management event
This patch add a new connect failed management event to track failures
in connecting to remote devices. It is particularly useful for security
mode 3 scenarios when we don't have a connected state while pairing but
still need to detect when the connect attempt failed.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:07 -02:00
Johan Hedberg
8962ee74be Bluetooth: Add disconnect managment command
This patch adds a disconnect command to the managment interface. Using
this command user space is able to force the disconnection of connected
devices. The command maps directly to the Disconnect HCI command.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:07 -02:00
Johan Hedberg
f7520543ab Bluetooth: Add connected/disconnected management events
This patch adds connected and disconnected managment events to track the
connection status to remote devices. The events map directly to
successful connection complete and disconnection complete HCI events for
ACL links.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:07 -02:00
Johan Hedberg
55ed8ca10f Bluetooth: Implement link key handling for the management interface
This patch adds a management commands to feed the kernel with all stored
link keys as well as remove specific ones or all of them. Once the
load_keys command has been called the kernel takes over link key
replies. A new_key event is also added to inform userspace of newly
created link keys that should be stored permanently.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:07 -02:00
Johan Hedberg
1aff6f0949 Bluetooth: Add class of device control to the management interface
This patch adds the possibility for user space to fully control the
Class of Device value of local adapters. To control the service class
bits each UUID that's added comes with a service class "hint" which acts
as a mask of bits that the UUID needs to have enabled. The
set_service_cache management command is used to make sure we queue up
all UUID changes as user space initializes its drivers and then send a
single HCI_Write_Class_of_Device command when initialization is
complete.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:06 -02:00
Johan Hedberg
d5859e22cd Bluetooth: Implement a more complete adapter initialization sequence
Using the managment interface means that user space doesn't need to do
any HCI command sending at all. This patch moves the remaining
initialization commands from user space to the kernel side. The patch
makes use of the new feature of __hci_request which allows the request
to be dynamically modified while it is ongoing (something that is needed
to react appropriately to the local features and the version of the
adapter).

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:06 -02:00
Johan Hedberg
d835060036 Bluetooth: Remove page timeout setting from HCI init sequence
User space should set the page timeout so there's no need to explicitly
set it in the HCI init sequence. Even if user space fails to set it the
controller default value will be used.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:06 -02:00
Johan Hedberg
b0916ea0d9 Bluetooth: Add controller side link key clearing to hci_init_req
The controller may have link keys in its own memory and these keys could
be used for secure connections. However, since the interface to access
these keys doesn't provide information about the key types (which would
be needed to infer the level of security each key provides) using these
keys is rather useless. Therefore, simply clear the controller side list
in the initialization procedure.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:06 -02:00
Johan Hedberg
a5040efa20 Bluetooth: Add special handling with __hci_request and HCI_INIT
To support a more dynamic HCI initialization sequence the __hci_request
behavior requires some more changes. Particularly, the init sequence
should be able to have conditionals in it (sending some HCI commands
depending on the outcome of a previous command) instead of being a fixed
list as it is right now.

The reasons for these additional requirements are the moving all
previously user space driven initialization commands to the kernel side
as well as the support the Low Energy controllers.

To fulfull these requirements the init sequence is made the only special
case for multi-command requests and req_last_cmd is renamed to
init_last_cmd. The hci_send_cmd function is changed to update
init_last_cmd as long as the HCI_INIT flag is set.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:06 -02:00
Johan Hedberg
03b555e119 Bluetooth: Reject pairing requests when in non-pairable mode
This patch adds the necessary logic to act accordingly when the
HCI_PAIRABLE flag is not set. In that case PIN code replies as well as
Secure Simple Pairing requests without a NoBonding requirement need to
be rejected.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:06 -02:00
Johan Hedberg
930e13363f Bluetooth: Implement debugfs support for listing UUIDs
This patch adds a debugfs entry to list the UUIDs that have been
registered through the management interface.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:06 -02:00
Johan Hedberg
2aeb9a1ae0 Bluetooth: Implement UUID handling through the management interface
This patch adds methods to the management interface for userspace to
notify the kernel of which services have been registered for specific
adapters. This information is needed for setting the appropriate Class
of Device value as well as the Extended Inquiry Response value. This
patch doesn't actually implement setting of these values but just
provides the storage of the UUIDs so the needed functionality can be
built on top of it.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:05 -02:00
Johan Hedberg
c542a06c29 Bluetooth: Implement set_pairable managment command
This patch implements a new set_pairable management command to control
the pairable state of local adapters. The state is represented using a
new HCI_PAIRABLE flag in the hci_dev struct.

For backwards compatibility with older user space versions the
HCI_PAIRABLE flag gets automatically set when the existence of an
adapter is reported to user space through legacy methods and the
HCI_MGMT flag is not set.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:05 -02:00
Johan Hedberg
053f0211d3 Bluetooth: Add send_mode_rsp convenience function for mgmt.c
Several management commands have similar responses but they are not
always sent asynchronously. To enable synchronous sending (from the
managment command handler function) a send_mode_rsp function is added.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:05 -02:00
Johan Hedberg
ebc99feba7 Bluetooth: Add flag to track managment controlled adapters
This patch adds a HCI_MGMT flag to track adapters which are under the
control of the management interface. This is needed to make sure that
new kernels will work with old user space versions. I.e. behaviour which
could break old user space versions (but is needed by the management
interface) should not be exhibited when the HCI_MGMT flag is not set.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:05 -02:00
Johan Hedberg
72a734ec1a Bluetooth: Unify mode related management messages to a single struct
The powered, connectable and discoverable messages all have the same
format. By using a single struct for all of them a lot of code can be
simplified and reused.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:05 -02:00
Johan Hedberg
9fbcbb455d Bluetooth: Add set_connectable management command
This patch adds a set_connectable command as well as a corresponding
event to the management interface. It's mainly useful for setting an
adapter as connectable from a non-initialized state as well as setting
an already initialized adapter as non-connectable (mostly useful for
qualification purposes).

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:05 -02:00
Johan Hedberg
73f22f6238 Bluetooth: Add support for set_discoverable management command
This patch adds a set_discoverable command to the management interface
as well as the corresponding event. The command is used to control the
discoverable state of adapters.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:04 -02:00
Johan Hedberg
eec8d2bcc8 Bluetooth: Add support for set_powered management command
This patch adds a set_powered command to the management interface
through which the powered state of local adapters can be controlled.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:04 -02:00
Johan Hedberg
5add6af8fc Bluetooth: Add support for management powered event
This patch adds support for the powered event that's used to indicate to
userspace when the powered state of a local adapter changes.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:04 -02:00
Johan Hedberg
ab81cbf99c Bluetooth: Implement automatic setup procedure for local adapters
This patch implements automatic initialization of basic information
about newly registered Bluetooth adapters. E.g. the address and features
are always needed so it makes sense for the kernel to automatically
power on adapters and read this information. A new HCI_SETUP flag is
added to track this state.

In order to not consume unnecessary amounts of power if there isn't a
user space available that could switch the adapter back off, a timer is
added to do this automatically as long as no Bluetooth user space seems
to be present. A new HCI_AUTO_OFF flag is added that user space needs to
clear to avoid the automatic power off.

Additionally, the management interface index_added event is moved to the
end of the HCI_SETUP stage so a user space supporting the managment
inteface has all the necessary information available for fetching when
it gets notified of a new adapter. The HCI_DEV_REG event is kept in the
same place as before since existing HCI raw socket based user space
versions depend on seeing the kernels initialization sequence
(hci_init_req) to determine when the adapter is ready for use.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:04 -02:00
Gustavo F. Padovan
7990681c40 Bluetooth: Fix setting of MTU for ERTM and Streaming Mode
The desired MTU should be sent in an Config_Req for all modes.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:04 -02:00
Andrei Emeltchenko
e702112ff6 Bluetooth: Use non-flushable by default L2CAP data packets
Modification of Nick Pelly <npelly@google.com> patch.

With Bluetooth 2.1 ACL packets can be flushable or non-flushable. This commit
makes ACL data packets non-flushable by default on compatible chipsets, and
adds the BT_FLUSHABLE socket option to explicitly request flushable ACL
data packets for a given L2CAP socket. This is useful for A2DP data which can
be safely discarded if it can not be delivered within a short time (while
other ACL data should not be discarded).

Note that making ACL data flushable has no effect unless the automatic flush
timeout for that ACL link is changed from its default of 0 (infinite).

Default packet types (for compatible chipsets):
Frame 34: 13 bytes on wire (104 bits), 13 bytes captured (104 bits)
Bluetooth HCI H4
Bluetooth HCI ACL Packet
    .... 0000 0000 0010 = Connection Handle: 0x0002
    ..00 .... .... .... = PB Flag: First Non-automatically Flushable Packet (0)
    00.. .... .... .... = BC Flag: Point-To-Point (0)
    Data Total Length: 8
Bluetooth L2CAP Packet

After setting BT_FLUSHABLE
(sock.setsockopt(274 /*SOL_BLUETOOTH*/, 8 /* BT_FLUSHABLE */, 1 /* flush */))
Frame 34: 13 bytes on wire (104 bits), 13 bytes captured (104 bits)
Bluetooth HCI H4
Bluetooth HCI ACL Packet
    .... 0000 0000 0010 = Connection Handle: 0x0002
    ..10 .... .... .... = PB Flag: First Automatically Flushable Packet (2)
    00.. .... .... .... = BC Flag: Point-To-Point (0)
    Data Total Length: 8
Bluetooth L2CAP Packet

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:04 -02:00
Jesper Juhl
b2c60d42db Bluetooth: Fix failure to release lock in read_index_list()
If alloc_skb() fails in read_index_list() we'll return -ENOMEM without
releasing 'hci_dev_list_lock'.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-08 01:40:04 -02:00
Gustavo F. Padovan
80f5585a29 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6 into wireless 2011-02-08 01:38:50 -02:00
Sven Eckelmann
531c9da8c8 batman-adv: Linearize fragment packets before merge
We access the data inside the skbs of two fragments directly using memmove
during the merge. The data of the skb could span over multiple skb pages. An
direct access without knowledge about the pages would lead to an invalid memory
access.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
[lindner_marek@yahoo.de: Move return from function to the end]
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-02-08 00:54:31 +01:00
andrew hendry
95c3043008 x25: possible skb leak on bad facilities
Originally x25_parse_facilities returned
-1 for an error
 0 meaning 0 length facilities
>0 the length of the facilities parsed.

5ef41308f9 ("x25: Prevent crashing when parsing bad X.25 facilities") introduced more
error checking in x25_parse_facilities however used 0 to indicate bad parsing
a6331d6f9a ("memory corruption in X.25 facilities parsing") followed this further for
DTE facilities, again using 0 for bad parsing.

The meaning of 0 got confused in the callers.
If the facilities are messed up we can't determine where the data starts.
So patch makes all parsing errors return -1 and ensures callers close and don't use the skb further.

Reported-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-07 13:41:38 -08:00
Dan Carpenter
3ad97fbcc2 mac80211: remove unneeded check
"ap" is the address of sdata->u.ap so it can never be NULL here.  Also
we dereferenced it on the previous line.  I removed the check.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-07 16:18:28 -05:00
Mohammed Shafi Shajakhan
38f37be209 mac80211: Update comments on radiotap MCS index
mac80211 now supports passing MCS index to radiotap, so update the
comments regarding this

Signed-off-by: Mohammed Shafi Shajakhan <mshajakhan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-07 16:18:28 -05:00
Felix Fietkau
4f3123366f mac80211: as a 4-addr station, do not receive packets for other stations
Since 4-addr frames completely override the source address which will
make it into the converted 802.3 frames, receiving frames for other
4-addr stations will confuse the bridging code.

To be able to handle traffic for all connected devices, the bridge
code will automatically turn on promiscuous mode, which triggers
this problem.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Reported-by: Steve Brown <sbrown@cortland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-07 16:18:27 -05:00
Ben Greear
180205bdb2 mac80211: Make some mlme timers module paramaters.
This allows users to tune the connection-loss algorithms
to be more or less lenient.  In particular, larger
null-func retries helps when using lots of virtual
stations on a loaded network.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-07 16:18:27 -05:00
Felix Fietkau
fc7c976dc7 mac80211: fix the skb cloned check in the tx path
Using skb_header_cloned to check if it's safe to write to the skb is not
enough - mac80211 also touches the tailroom of the skb.
Initially this check was only used to increase a counter, however this
commit changed the code to also skip skb data reallocation if no extra
head/tailroom was needed:

commit 4cd06a344d
mac80211: skip unnecessary pskb_expand_head calls

It added a regression at least with iwl3945, which is fixed by this patch.

Reported-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Tested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-07 16:02:14 -05:00
Dan Carpenter
7c9989a76e IPVS: precedence bug in ip_vs_sync_switch_mode()
'!' has higher precedence than '&'.  IP_VS_STATE_MASTER is 0x1 so
the original code is equivelent to if (!ipvs->sync_state) ...

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-02-07 20:40:00 +09:00
David S. Miller
92d8682926 inetpeer: Move ICMP rate limiting state into inet_peer entries.
Like metrics, the ICMP rate limiting bits are cached state about
a destination.  So move it into the inet_peer entries.

If an inet_peer cannot be bound (the reason is memory allocation
failure or similar), the policy is to allow.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-04 15:59:53 -08:00
David S. Miller
0131ba451e ipv4: Don't miss existing cached metrics in new routes.
Always lookup to see if we have an existing inetpeer entry for
a route.  Let FLOWI_FLAG_PRECOW_METRICS merely influence the
"create" argument to rt_bind_peer().

Also, call rt_bind_peer() unconditionally since it is not
possible for rt->peer to be non-NULL at this point.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-04 14:37:30 -08:00
David S. Miller
bd4a6974cc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-02-04 14:28:58 -08:00
Ben Greear
b23b025fe2 mac80211: Optimize scans on current operating channel.
This should decrease un-necessary flushes, on/off channel work,
and channel changes in cases where the only scanned channel is
the current operating channel.

* Removes SCAN_OFF_CHANNEL flag, uses SDATA_STATE_OFFCHANNEL
  and is-scanning flags instead.

* Add helper method to determine if we are currently configured
  for the operating channel.

* Do no blindly go off/on channel in work.c  Instead, only call
  appropriate on/off code when we really need to change channels.
  Always enable offchannel-ps mode when starting work,
  and disable it when we are done.

* Consolidate ieee80211_offchannel_stop_station and
  ieee80211_offchannel_stop_beaconing, call it
  ieee80211_offchannel_stop_vifs instead.

* Accept non-beacon frames when scanning on operating channel.

* Scan state machine optimized to minimize on/off channel
  transitions.  Also, when going on-channel, go ahead and
  re-enable beaconing.  We're going to be there for 200ms,
  so seems like some useful beaconing could happen.
  Always enable offchannel-ps mode when starting software
  scan, and disable it when we are done.

* Grab local->mtx earlier in __ieee80211_scan_completed_finish
  so that we are protected when calling hw_config(), etc.

* Pass probe-responses up the stack if scanning on local
  channel, so that mlme can take a look.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-04 16:30:32 -05:00
Felix Fietkau
b1f93314bf mac80211: do not send duplicate data frames to the cooked monitor interface
I can't think of a valid use case for this aside from debugging (which can
also be done with a real monitor interface), and dropping these frames saves
some precious CPU cycles.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-04 16:29:52 -05:00
Rajkumar Manoharan
8c99f69182 mac80211: do not restart ps timer during scan or offchannel
While leaving oper channel, STA informs sleep state to AP to
stop sending data. Till sending ack for the nullfunc, AP
continues to send the data to STA which restarts ps_timer that
is causing unnecessary nullfunc exchange on timer expiry
when the STA was already moved to offchannel. So don't restart ps_timer
on data reception during scan. This issue was identified by
the following warning.

WARNING: at net/mac80211/tx.c:661 invoke_tx_handlers+0xf07/0x1330 [mac80211]
wlan0: Dropped data frame as no usable bitrate found while scanning and
associated. Target station: 00:03:7f:0b:a6:1b on 5 GHz band
Call Trace:
  [<ffffffffa0413ba7>] invoke_tx_handlers+0xf07/0x1330 [mac80211]
  [<ffffffffa0414056>] ieee80211_tx+0x86/0x2c0 [mac80211]
  [<ffffffffa0414345>] ieee80211_xmit+0xb5/0x1d0 [mac80211]
  [<ffffffffa04037e0>] ieee80211_dynamic_ps_enable_work+0x0/0xb0 [mac80211]
  [<ffffffffa04158cf>] ieee80211_tx_skb+0x4f/0x60 [mac80211]
  [<ffffffffa04026e6>] ieee80211_send_nullfunc+0x46/0x60 [mac80211]
  [<ffffffffa0403885>] ieee80211_dynamic_ps_enable_work+0xa5/0xb0 [mac80211]

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rajkumar Manoharan <rmanoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-04 16:29:51 -05:00
Christian Lamparter
512119b36f mac80211: fix race between next beacon dtim and ieee80211_get_buffered_bc
On review of 'zd1211rw: implement beacon fetching and handling
ieee80211_get_buffered_bc()', Christian Lamparter noted that [1]:

   Since zd_beacon_done also uploads the next beacon so long in advance,
   there could be an equally long race between the outdated state of the
   next beacon's DTIM broadcast traffic indicator (802.11-2007 7.3.2.6)
   which -in your case- was uploaded almost a beacon interval ago and
   the xmit of ieee80211_get_buffered_bc *now*.

   The dtim bc/mc bit might be not set, when a mc/bc arrived after the
   beacon was uploaded, but before the "beacon done event" from the
   hardware. So, dozing stations don't expect the broadcast traffic
   and of course, they might miss it completely.

   It's probably better to fix this in mac80211 (see the attached hack).

[1] http://marc.info/?l=linux-wireless&m=129435041117256&w=2

CC: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-04 16:29:49 -05:00
Linus Torvalds
44f2c5c841 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (68 commits)
  net: can: janz-ican3: world-writable sysfs termination file
  net: can: at91_can: world-writable sysfs files
  MAINTAINERS: update email ids of the be2net driver maintainers.
  bridge: Don't put partly initialized fdb into hash
  r8169: prevent RxFIFO induced loops in the irq handler.
  r8169: RxFIFO overflow oddities with 8168 chipsets.
  r8169: use RxFIFO overflow workaround for 8168c chipset.
  include/net/genetlink.h: Allow genlmsg_cancel to accept a NULL argument
  net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6.
  net: Support compat SIOCGETVIFCNT ioctl in ipv4.
  net: Fix bug in compat SIOCGETSGCNT handling.
  niu: Fix races between up/down and get_stats.
  tcp_ecn is an integer not a boolean
  atl1c: Add missing PCI device ID
  s390: Fix possibly wrong size in strncmp (smsgiucv)
  s390: Fix wrong size in memcmp (netiucv)
  qeth: allow OSA CHPARM change in suspend state
  qeth: allow HiperSockets framesize change in suspend
  qeth: add more strict MTU checking
  qeth: show new mac-address if its setting fails
  ...
2011-02-04 13:20:01 -08:00
Pavel Emelyanov
1158f762e5 bridge: Don't put partly initialized fdb into hash
The fdb_create() puts a new fdb into hash with only addr set. This is
not good, since there are callers, that search the hash w/o the lock
and access all the other its fields.

Applies to current netdev tree.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-04 13:02:36 -08:00
David S. Miller
e2d57766e6 net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-03 18:05:29 -08:00
David S. Miller
ca6b8bb097 net: Support compat SIOCGETVIFCNT ioctl in ipv4.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-03 17:24:28 -08:00
David S. Miller
0033d5ad27 net: Fix bug in compat SIOCGETSGCNT handling.
Commit 709b46e8d9 ("net: Add compat
ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT") added the
correct plumbing to handle SIOCGETSGCNT properly.

However, whilst definiting a proper "struct compat_sioc_sg_req" it
isn't actually used in ipmr_compat_ioctl().

Correct this oversight.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-03 17:21:31 -08:00
Jouni Malinen
681d119047 mac80211: Add testing functionality for TKIP
TKIP countermeasures depend on devices being able to detect Michael
MIC failures on received frames and for stations to report errors to
the AP. In order to test that behavior, it is useful to be able to
send out TKIP frames with incorrect Michael MIC. This testing behavior
has minimal effect on the TX path, so it can be added to mac80211 for
convenient use.

The interface for using this functionality is a file in mac80211
netdev debugfs (tkip_mic_test). Writing a MAC address to the file
makes mac80211 generate a dummy data frame that will be sent out using
invalid Michael MIC value. In AP mode, the address needs to be for one
of the associated stations or ff:ff:ff:ff:ff:ff to use a broadcast
frame. In station mode, the address can be anything, e.g., the current
BSSID. It should be noted that this functionality works correctly only
when associated and using TKIP.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-03 16:45:29 -05:00
Jouni Malinen
747d753df7 mac80211: Remove obsolete TKIP flexibility
The TKIP implementation was originally prepared to be a bit more
flexible in the way Michael MIC TX/RX keys are configured. However, we
are now taking care of the TX/RX MIC key swapping in user space, so
this code will not be needed. Similarly, there were some remaining WPA
testing code that won't be used in their current form. Remove the
unneeded extra complexity.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-03 16:45:29 -05:00
Johannes Berg
e9d7732eaf mac80211: allow GO to scan like AP
There's no point in disallowing scanning for a
GO interface when it's not beaconing yet.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-03 16:44:44 -05:00
Arik Nemtsov
771bbd09f7 mac80211: pass up beacons from external BSS when operating as AP
Beacons from external BSSes are required for updating overlapping BSS
info (i.e. ERP protection). Pass them up unconditionally.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-03 16:44:44 -05:00
Arik Nemtsov
d057e5a381 mac80211: add HW flag for disabling auto link-PS in AP mode
When operating in AP mode the wl1271 hardware filters out null-data
packets as well as management packets. This makes it impossible for
mac80211 to monitor the PS mode by using the PM bit of incoming frames.

Implement a HW flag to indicate that mac80211 should ignore the PM bit.
In addition, expose ieee80211_sta_ps_transition() to make low-level
drivers capable of controlling PS-mode.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-03 16:44:44 -05:00
Arik Nemtsov
8fd369eeaa mac80211: do not calc frame duration when using HW rate-control
When rate-control is performed in HW, we cannot calculate frame
duration as we do not have the skb transmission rate in SW.

ieee80211_tx_h_calculate_duration() should only be called when
ieee80211_tx_h_rate_ctrl() has been called before to initialize data
in skb->cb. This doesn't happen for drivers with HW rate-control.

Fixes the following warning when operating in AP-mode
in a driver with HW rate-control.

WARNING: at net/mac80211/tx.c:57 ieee80211_duration+0x54/0x1d8 [mac80211]()
Modules linked in: wl1271_sdio wl1271 firmware_class crc7 mac80211 cfg80211
[<c0046090>] (unwind_backtrace+0x0/0x124) from [<c0064c10>] (warn_slowpath_common+0x4c/0x64)
[<c0064c10>] (warn_slowpath_common+0x4c/0x64) from [<c0064c40>] (warn_slowpath_null+0x18/0x1c)
[<c0064c40>] (warn_slowpath_null+0x18/0x1c) from [<bf040e34>] (ieee80211_duration+0x54/0x1d8 [mac80211])
[<bf040e34>] (ieee80211_duration+0x54/0x1d8 [mac80211]) from [<bf04200c>] (invoke_tx_handlers+0xfa0/0x1088 [mac80211])
[<bf04200c>] (invoke_tx_handlers+0xfa0/0x1088 [mac80211]) from [<bf042178>] (ieee80211_tx+0x84/0x248 [mac80211])
[<bf042178>] (ieee80211_tx+0x84/0x248 [mac80211]) from [<bf042f44>] (ieee80211_tx_pending+0x12c/0x278 [mac80211])
[<bf042f44>] (ieee80211_tx_pending+0x12c/0x278 [mac80211]) from [<c0069a9c>] (tasklet_action+0x68/0xbc)
[<c0069a9c>] (tasklet_action+0x68/0xbc) from [<c006a044>] (__do_softirq+0x84/0x114)
[<c006a044>] (__do_softirq+0x84/0x114) from [<c006a1b8>] (do_softirq+0x48/0x54)
[<c006a1b8>] (do_softirq+0x48/0x54) from [<c006a4f8>] (local_bh_enable+0x98/0xcc)
[<c006a4f8>] (local_bh_enable+0x98/0xcc) from [<bf074e60>] (wl1271_rx+0x2e8/0x3a4 [wl1271])
[<bf074e60>] (wl1271_rx+0x2e8/0x3a4 [wl1271]) from [<bf071ae4>] (wl1271_irq_work+0x230/0x310 [wl1271])
[<bf071ae4>] (wl1271_irq_work+0x230/0x310 [wl1271]) from [<c0076864>] (process_one_work+0x208/0x350)
[<c0076864>] (process_one_work+0x208/0x350) from [<c0076e14>] (worker_thread+0x1cc/0x300)
[<c0076e14>] (worker_thread+0x1cc/0x300) from [<c007bb88>] (kthread+0x84/0x8c)
[<c007bb88>] (kthread+0x84/0x8c) from [<c0041494>] (kernel_thread_exit+0x0/0x8)

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-03 16:44:44 -05:00
Ben Greear
2cf22b897c mac80211: Recalculate channel-type on iface removal.
When a vif goes away, it could cause the super-chan
to be recalculated differently, so do that calculation
on iface removal.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-03 16:44:43 -05:00
Juuso Oikarinen
bf6a0579f6 cfg80211: Fix power save state after interface type change
Currently cfg80211 only configures the PSM state to the driver upon creation
of a new virtual interface, but not after interface type change. The mac80211
on the other hand reinitializes its sdata structure every time the interface
type is changed, losing the PSM configuration.

Hence, if the interface type is changed to, say, ad-hoc and then back to
managed, "iw wlan0 get power_save" will claim that PSM is enabled, when in
fact on mac80211 level it is not.

Fix this in cfg80211 by configuring the PSM state to the driver each time
the interface is brought up instead of just when the interface is created.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-03 16:44:42 -05:00
Ben Greear
172710bf83 mac80211: Warn users if HT fails because of freq mismatch.
I have a netgear WNDR3700 that appears to have an off-by-four
bug in how it fills out the hti->control_chan (I configure the
AP to channel 11, it reports 15 as control_chan).

Poke a message into the kernel logs to give users a
clue as to why they are not getting the expected
channel-type or rate.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-03 16:42:44 -05:00
Ben Greear
0fa025f0a2 mac80211: Show configured channel-type in netdev debugfs.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-03 16:42:43 -05:00
Ben Greear
eeabee7e53 mac80211: Be more careful when changing channels.
If we cannot set the channel type, set the channel back to the
original.

Don't update the driver hardware if nothing actually changed.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-03 16:38:26 -05:00
David S. Miller
fd95240568 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2011-02-03 13:06:43 -08:00
Simon Horman
8525d6f84f IPVS: Use correct lock in SCTP module
Use sctp_app_lock instead of tcp_app_lock in the SCTP protocol module.

This appears to be a typo introduced by the netns changes.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
2011-02-03 20:45:55 +09:00
David S. Miller
cdfb74d4c2 sch_choke: Need linux/vmalloc.h
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02 23:06:31 -08:00
stephen hemminger
45e144339a sched: CHOKe flow scheduler
CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
packet scheduler based on the Random Exponential Drop (RED) algorithm.

The core idea is:
  For every packet arrival:
  	Calculate Qave
	if (Qave < minth)
	     Queue the new packet
	else
	     Select randomly a packet from the queue
	     if (both packets from same flow)
	     then Drop both the packets
	     else if (Qave > maxth)
	          Drop packet
	     else
	       	  Admit packet with proability p (same as RED)

See also:
  Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
   queue management scheme for approximating fair bandwidth allocation",
  Proceeding of INFOCOM'2000, March 2000.

Help from:
     Eric Dumazet <eric.dumazet@gmail.com>
     Patrick McHardy <kaber@trash.net>

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02 20:52:42 -08:00
stephen hemminger
119b3d3869 sfq: deadlock in error path
The change to allow divisor to be a parameter (in 2.6.38-rc1)
 commit 817fb15dfd
introduced a possible deadlock caught by sparse.

The scheduler tree lock was left locked in the case of an incorrect
divisor value. Simplest fix is to move test outside of lock
which also solves problem of partial update.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02 20:51:20 -08:00
David S. Miller
b299e4f001 ipv4: Fix fib_trie build in some configurations.
If we end up including include/linux/node.h (either explicitly
or implicitly) that header has a definition of "structt node"
too.

So rename the one we use in fib_trie to "rt_trie_node" to avoid
the conflict.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02 20:48:47 -08:00
David S. Miller
442b9635c5 tcp: Increase the initial congestion window to 10.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Nandita Dukkipati <nanditad@google.com>
2011-02-02 20:48:47 -08:00
David S. Miller
0bc0be7f20 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2011-02-02 15:52:23 -08:00
David S. Miller
8fe73503fa Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2011-02-02 15:24:48 -08:00
Patrick McHardy
9291747f11 netfilter: xtables: add device group match
Add a new 'devgroup' match to match on the device group of the
incoming and outgoing network device of a packet.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-03 00:05:43 +01:00
Jozsef Kadlecsik
5f52bc3cdd netfilter: ipset: send error message manually
When a message carries multiple commands and one of them triggers
an error, we have to report to the userspace which one was that.
The line number of the command plays this role and there's an attribute
reserved in the header part of the message to be filled out with the error
line number. In order not to modify the original message received from
the userspace, we construct a new, complete netlink error message and
modifies the attribute there, then send it.
Netlink is notified not to send its ACK/error message.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-02 23:56:00 +01:00
Andy Gospodarek
6d152e23ad gro: reset skb_iif on reuse
Like Herbert's change from a few days ago:

66c46d741e gro: Reset dev pointer on reuse

this may not be necessary at this point, but we should still clean up
the skb->skb_iif.  If not we may end up with an invalid valid for
skb->skb_iif when the skb is reused and the check is done in
__netif_receive_skb.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02 14:53:25 -08:00
Patrick McHardy
724bab476b netfilter: ipset: fix linking with CONFIG_IPV6=n
Add a dummy ip_set_get_ip6_port function that unconditionally
returns false for CONFIG_IPV6=n and convert the real function
to ipv6_skip_exthdr() to avoid pulling in the ip6_tables module
when loading ipset.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-02 23:50:01 +01:00
Johannes Berg
4334ec8518 mac80211: fix TX status cookie in HW offload case
When the off-channel TX is done with remain-on-channel
offloaded to hardware, the reported cookie is wrong as
in that case we shouldn't use the SKB as the cookie but
need to instead use the corresponding r-o-c cookie
(XOR'ed with 2 to prevent API mismatches).

Fix this by keeping track of the hw_roc_skb pointer
just for the status processing and use the correct
cookie to report in this case. We can't use the
hw_roc_skb pointer itself because it is NULL'ed when
the frame is transmitted to prevent it being used
twice.

This fixes a bug where the P2P state machine in the
supplicant gets stuck because it never gets a correct
result for its transmitted frame.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-02 16:38:59 -05:00
Bao Liang
e733fb6208 Bluetooth: Set conn state to BT_DISCONN to avoid multiple responses
This patch fixes a minor issue that two connection responses will be sent
for one L2CAP connection request. If the L2CAP connection request is first
blocked due to security reason and responded with reason "security block",
the state of the connection remains BT_CONNECT2. If a pairing procedure
completes successfully before the ACL connection is down, local host will
send another connection complete response. See the following packets
captured by hcidump.

2010-12-07 22:21:24.928096 < ACL data: handle 12 flags 0x00 dlen 16
    0000: 0c 00 01 00 03 19 08 00  41 00 53 00 03 00 00 00  ........A.S.....
... ...

2010-12-07 22:21:35.791747 > HCI Event: Auth Complete (0x06) plen 3
    status 0x00 handle 12
... ...

2010-12-07 22:21:35.872372 > ACL data: handle 12 flags 0x02 dlen 16
    L2CAP(s): Connect rsp: dcid 0x0054 scid 0x0040 result 0 status 0
      Connection successful

Signed-off-by: Liang Bao <tim.bao@gmail.com>
Acked-by: Ville Tervo <ville.tervo@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-02 12:47:59 -02:00
Patrick McHardy
316ed38880 netfilter: ipset: add missing break statemtns in ip_set_get_ip_port()
Don't fall through in the switch statement, otherwise IPv4 headers
are incorrectly parsed again as IPv6 and the return value will always
be 'false'.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-02 09:31:37 +01:00
David S. Miller
123b9731b1 ipv4: Rename fib_hash_* locals in fib_semantics.c
To avoid confusion with the recently deleted fib_hash.c
code, use "fib_info_hash_*" instead of plain "fib_hash_*".

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-01 15:35:26 -08:00
David S. Miller
5348ba85a0 ipv4: Update some fib_hash centric interface names.
fib_hash_init() --> fib_trie_init()
fib_hash_table() --> fib_trie_table()

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-01 15:35:25 -08:00
David S. Miller
3630b7c050 ipv4: Remove fib_hash.
The time has finally come to remove the hash based routing table
implementation in ipv4.

FIB Trie is mature, well tested, and I've done an audit of it's code
to confirm that it implements insert, delete, and lookup with the same
identical semantics as fib_hash did.

If there are any semantic differences found in fib_trie, we should
simply fix them.

I've placed the trie statistic config option under advanced router
configuration.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
2011-02-01 15:35:25 -08:00
Simon Horman
ed3d1e7b72 IPVS: Remove ip_vs_sync_cleanup from section __exit
ip_vs_sync_cleanup() may be called from ip_vs_init() on error
and thus needs to be accesible from section __init

Reporte-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Tested-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 18:30:26 +01:00
Simon Horman
0443929ff0 IPVS: Allow compilation with CONFIG_SYSCTL disabled
This is a rather naieve approach to allowing PVS to compile with
CONFIG_SYSCTL disabled.  I am working on a more comprehensive patch which
will remove compilation of all sysctl-related IPVS code when CONFIG_SYSCTL
is disabled.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Tested-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 18:29:04 +01:00
Simon Horman
258e958b85 IPVS: remove duplicate initialisation or rs_table
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Tested-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 18:24:09 +01:00
Simon Horman
a870c8c5cb IPVS: use z modifier for sizeof() argument
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Tested-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 18:21:53 +01:00
Patrick McHardy
a00f1f3686 netfilter: ctnetlink: fix ctnetlink_parse_tuple() warning
net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_parse_tuple':
net/netfilter/nf_conntrack_netlink.c:832:11: warning: comparison between 'enum ctattr_tuple' and 'enum ctattr_type'

Use ctattr_type for the 'type' parameter since that's the type of all attributes
passed to this function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 17:26:37 +01:00
Patrick McHardy
582e1fc85c netfilter: ipset: remove unnecessary includes
None of the set types need uaccess.h since this is handled centrally
in ip_set_core. Most set types additionally don't need bitops.h and
spinlock.h since they use neither. tcp.h is only needed by those
using before(), udp.h is not needed at all.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 16:57:37 +01:00
Patrick McHardy
8da560ced5 netfilter: ipset: use nla_parse_nested()
Replace calls of the form:

nla_parse(tb, ATTR_MAX, nla_data(attr), nla_len(attr), policy)

by:

nla_parse_nested(tb, ATTR_MAX, attr, policy)

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 16:27:25 +01:00
Pablo Neira Ayuso
3db7e93d33 netfilter: ecache: always set events bits, filter them later
For the following rule:

iptables -I PREROUTING -t raw -j CT --ctevents assured

The event delivered looks like the following:

 [UPDATE] tcp      6 src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]

Note that the TCP protocol state is not included. For that reason
the CT event filtering is not very useful for conntrackd.

To resolve this issue, instead of conditionally setting the CT events
bits based on the ctmask, we always set them and perform the filtering
in the late stage, just before the delivery.

Thus, the event delivered looks like the following:

 [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 16:06:30 +01:00
Pablo Neira Ayuso
9d0db8b6b1 netfilter: arpt_mangle: fix return values of checkentry
In 135367b "netfilter: xtables: change xt_target.checkentry return type",
the type returned by checkentry was changed from boolean to int, but the
return values where not adjusted.

arptables: Input/output error

This broke arptables with the mangle target since it returns true
under success, which is interpreted by xtables as >0, thus
returning EIO.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 16:03:46 +01:00
Jozsef Kadlecsik
d956798d82 netfilter: xtables: "set" match and "SET" target support
The patch adds the combined module of the "SET" target and "set" match
to netfilter. Both the previous and the current revisions are supported.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:56:00 +01:00
Jozsef Kadlecsik
f830837f0e netfilter: ipset: list:set set type support
The module implements the list:set type support in two flavours:
without and with timeout. The sets has two sides: for the userspace,
they store the names of other (non list:set type of) sets: one can add,
delete and test set names. For the kernel, it forms an ordered union of
the member sets: the members sets are tried in order when elements are
added, deleted and tested and the process stops at the first success.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:54:59 +01:00
Jozsef Kadlecsik
21f45020a3 netfilter: ipset: hash:net,port set type support
The module implements the hash:net,port type support in four flavours:
for IPv4 and IPv6, both without and with timeout support. The elements
are two dimensional: IPv4/IPv6 network address/prefix and protocol/port
pairs.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:53:55 +01:00
Jozsef Kadlecsik
b38370299e netfilter: ipset: hash:net set type support
The module implements the hash:net type support in four flavours:
for IPv4 and IPv6, both without and with timeout support. The elements
are one dimensional: IPv4/IPv6 network address/prefixes.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:52:54 +01:00
Jozsef Kadlecsik
41d22f7b2e netfilter: ipset: hash:ip,port,net set type support
The module implements the hash:ip,port,net type support in four flavours:
for IPv4 and IPv6, both without and with timeout support. The elements
are three dimensional: IPv4/IPv6 address, protocol/port and IPv4/IPv6
network address/prefix triples. The different prefixes are searched/matched
from the longest prefix to the shortes one (most specific to least).
In other words the processing time linearly grows with the number of
different prefixes in the set.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:51:00 +01:00
Jozsef Kadlecsik
5663bc30e6 netfilter: ipset: hash:ip,port,ip set type support
The module implements the hash:ip,port,ip type support in four flavours:
for IPv4 and IPv6, both without and with timeout support. The elements
are three dimensional: IPv4/IPv6 address, protocol/port and IPv4/IPv6
address triples.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:41:26 +01:00
Jozsef Kadlecsik
07896ed37b netfilter: ipset: hash:ip,port set type support
The module implements the hash:ip,port type support in four flavours:
for IPv4 and IPv6, both without and with timeout support. The elements
are two dimensional: IPv4/IPv6 address and protocol/port pairs. The port
is interpeted for TCP, UPD, ICMP and ICMPv6 (at the latters as type/code
of course).

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:39:52 +01:00
Jozsef Kadlecsik
6c02788969 netfilter: ipset: hash:ip set type support
The module implements the hash:ip type support in four flavours:
for IPv4 or IPv6, both without and with timeout support.

All the hash types are based on the "array hash" or ahash structure
and functions as a good compromise between minimal memory footprint
and speed. The hashing uses arrays to resolve clashes. The hash table
is resized (doubled) when searching becomes too long. Resizing can be
triggered by userspace add commands only and those are serialized by
the nfnl mutex. During resizing the set is read-locked, so the only
possible concurrent operations are the kernel side readers. Those are
protected by RCU locking.

Because of the four flavours and the other hash types, the functions
are implemented in general forms in the ip_set_ahash.h header file
and the real functions are generated before compiling by macro expansion.
Thus the dereferencing of low-level functions and void pointer arguments
could be avoided: the low-level functions are inlined, the function
arguments are pointers of type-specific structures.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:38:36 +01:00
Jozsef Kadlecsik
543261907d netfilter: ipset; bitmap:port set type support
The module implements the bitmap:port type in two flavours, without
and with timeout support to store TCP/UDP ports from a range.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:37:04 +01:00
Jozsef Kadlecsik
de76021a1b netfilter: ipset: bitmap:ip,mac type support
The module implements the bitmap:ip,mac set type in two flavours,
without and with timeout support. In this kind of set one can store
IPv4 address and (source) MAC address pairs. The type supports elements
added without the MAC part filled out: when the first matching from kernel
happens, the MAC part is automatically filled out. The timing out of the
elements stars when an element is complete in the IP,MAC pair.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:35:12 +01:00
Jozsef Kadlecsik
72205fc68b netfilter: ipset: bitmap:ip set type support
The module implements the bitmap:ip set type in two flavours, without
and with timeout support. In this kind of set one can store IPv4
addresses (or network addresses) from a given range.

In order not to waste memory, the timeout version does not rely on
the kernel timer for every element to be timed out but on garbage
collection. All set types use this mechanism.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:33:17 +01:00
Jozsef Kadlecsik
a7b4f989a6 netfilter: ipset: IP set core support
The patch adds the IP set core support to the kernel.

The IP set core implements a netlink (nfnetlink) based protocol by which
one can create, destroy, flush, rename, swap, list, save, restore sets,
and add, delete, test elements from userspace. For simplicity (and backward
compatibilty and for not to force ip(6)tables to be linked with a netlink
library) reasons a small getsockopt-based protocol is also kept in order
to communicate with the ip(6)tables match and target.

The netlink protocol passes all u16, etc values in network order with
NLA_F_NET_BYTEORDER flag. The protocol enforces the proper use of the
NLA_F_NESTED and NLA_F_NET_BYTEORDER flags.

For other kernel subsystems (netfilter match and target) the API contains
the functions to add, delete and test elements in sets and the required calls
to get/put refereces to the sets before those operations can be performed.

The set types (which are implemented in independent modules) are stored
in a simple RCU protected list. A set type may have variants: for example
without timeout or with timeout support, for IPv4 or for IPv6. The sets
(i.e. the pointers to the sets) are stored in an array. The sets are
identified by their index in the array, which makes possible easy and
fast swapping of sets. The array is protected indirectly by the nfnl
mutex from nfnetlink. The content of the sets are protected by the rwlock
of the set.

There are functional differences between the add/del/test functions
for the kernel and userspace:

- kernel add/del/test: works on the current packet (i.e. one element)
- kernel test: may trigger an "add" operation  in order to fill
  out unspecified parts of the element from the packet (like MAC address)
- userspace add/del: works on the netlink message and thus possibly
  on multiple elements from the IPSET_ATTR_ADT container attribute.
- userspace add: may trigger resizing of a set

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 15:28:35 +01:00
Tejun Heo
c534a107e8 rds/ib: use system_wq instead of rds_ib_fmr_wq
With cmwq, there's no reason to use dedicated rds_ib_fmr_wq - it's not
in the memory reclaim path and the maximum number of concurrent work
items is bound by the number of devices.  Drop it and use system_wq
instead.  This rds_ib_fmr_init/exit() noops.  Both removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andy Grover <andy.grover@oracle.com>
2011-02-01 11:42:43 +01:00
Tejun Heo
aa70c585b1 net/9p: replace p9_poll_task with a work
Now that cmwq can handle high concurrency, it's more efficient to use
work than a dedicated kthread.  Convert p9_poll_proc() to a work
function for p9_poll_work and make p9_pollwake() schedule it on each
poll event.  The work is sync flushed on module exit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: v9fs-developer@lists.sourceforge.net
2011-02-01 11:42:43 +01:00
Tejun Heo
61edeeed91 net/9p: use system_wq instead of p9_mux_wq
With cmwq, there's no reason to use a dedicated workqueue in trans_fd.
Drop p9_mux_wq and use system_wq instead.  The used work items are
already sync canceled in p9_conn_destroy() and doesn't require further
synchronization.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: v9fs-developer@lists.sourceforge.net
2011-02-01 11:42:43 +01:00
Eric W. Biederman
bf36076a67 net: Fix ipv6 neighbour unregister_sysctl_table warning
In my testing of 2.6.37 I was occassionally getting a warning about
sysctl table entries being unregistered in the wrong order.  Digging
in it turns out this dates back to the last great sysctl reorg done
where Al Viro introduced the requirement that sysctl directories
needed to be created before and destroyed after the files in them.

It turns out that in that great reorg /proc/sys/net/ipv6/neigh was
overlooked.  So this patch fixes that oversight and makes an annoying
warning message go away.

>------------[ cut here ]------------
>WARNING: at kernel/sysctl.c:1992 unregister_sysctl_table+0x134/0x164()
>Pid: 23951, comm: kworker/u:3 Not tainted 2.6.37-350888.2010AroraKernelBeta.fc14.x86_64 #1
>Call Trace:
> [<ffffffff8103e034>] warn_slowpath_common+0x80/0x98
> [<ffffffff8103e061>] warn_slowpath_null+0x15/0x17
> [<ffffffff810452f8>] unregister_sysctl_table+0x134/0x164
> [<ffffffff810e7834>] ? kfree+0xc4/0xd1
> [<ffffffff813439b2>] neigh_sysctl_unregister+0x22/0x3a
> [<ffffffffa02cd14e>] addrconf_ifdown+0x33f/0x37b [ipv6]
> [<ffffffff81331ec2>] ? skb_dequeue+0x5f/0x6b
> [<ffffffffa02ce4a5>] addrconf_notify+0x69b/0x75c [ipv6]
> [<ffffffffa02eb953>] ? ip6mr_device_event+0x98/0xa9 [ipv6]
> [<ffffffff813d2413>] notifier_call_chain+0x32/0x5e
> [<ffffffff8105bdea>] raw_notifier_call_chain+0xf/0x11
> [<ffffffff8133cdac>] call_netdevice_notifiers+0x45/0x4a
> [<ffffffff8133d2b0>] rollback_registered_many+0x118/0x201
> [<ffffffff8133d3af>] unregister_netdevice_many+0x16/0x6d
> [<ffffffff8133d571>] default_device_exit_batch+0xa4/0xb8
> [<ffffffff81337c42>] ? cleanup_net+0x0/0x194
> [<ffffffff81337a2a>] ops_exit_list+0x4e/0x56
> [<ffffffff81337d36>] cleanup_net+0xf4/0x194
> [<ffffffff81053318>] process_one_work+0x187/0x280
> [<ffffffff8105441b>] worker_thread+0xff/0x19f
> [<ffffffff8105431c>] ? worker_thread+0x0/0x19f
> [<ffffffff8105776d>] kthread+0x7d/0x85
> [<ffffffff81003824>] kernel_thread_helper+0x4/0x10
> [<ffffffff810576f0>] ? kthread+0x0/0x85
> [<ffffffff81003820>] ? kernel_thread_helper+0x0/0x10
>---[ end trace 8a7e9310b35e9486 ]---

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31 20:54:17 -08:00
Tom Herbert
8587523640 net: Check rps_flow_table when RPS map length is 1
In get_rps_cpu, add check that the rps_flow_table for the device is
NULL when trying to take fast path when RPS map length is one.
Without this, RFS is effectively disabled if map length is one which
is not correct.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31 16:23:42 -08:00
David S. Miller
0c838ff1ad ipv4: Consolidate all default route selection implementations.
Both fib_trie and fib_hash have a local implementation of
fib_table_select_default().  This is completely unnecessary
code duplication.

Since we now remember the fib_table and the head of the fib
alias list of the default route, we can implement one single
generic version of this routine.

Looking at the fib_hash implementation you may get the impression
that it's possible for there to be multiple top-level routes in
the table for the default route.  The truth is, it isn't, the
insert code will only allow one entry to exist in the zero
prefix hash table, because all keys evaluate to zero and all
keys in a hash table must be unique.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31 16:16:50 -08:00
David S. Miller
5b4704419c ipv4: Remember FIB alias list head and table in lookup results.
This will be used later to implement fib_select_default() in a
completely generic manner, instead of the current situation where the
default route is re-looked up in the TRIE/HASH table and then the
available aliases are analyzed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31 16:10:03 -08:00
Linus Torvalds
0fd08c5545 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: NFSv4 readdir loses entries
  NFS: Micro-optimize nfs4_decode_dirent()
  NFS: Fix an NFS client lockdep issue
  NFS construct consistent co_ownerid for v4.1
  NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount
  NFS improve pnfs_put_deviceid_cache debug print
  NFS fix cb_sequence error processing
  NFS do not find client in NFSv4 pg_authenticate
  NLM: Fix "kernel BUG at fs/lockd/host.c:417!" or ".../host.c:283!"
  NFS: Prevent memory allocation failure in nfsacl_encode()
  NFS: nfsacl_{encode,decode} should return signed integer
  NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"
  NFS: Fix "kernel BUG at fs/aio.c:554!"
  NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds().
  NFS: fix handling of malloc failure during nfs_flush_multi()
2011-02-01 09:41:02 +10:00
David S. Miller
a5e3c2aae2 Merge branch 'batman-adv/next' of git://git.open-mesh.org/ecsv/linux-merge 2011-01-31 13:24:56 -08:00
Roland Dreier
ec831ea72e net: Add default_mtu() methods to blackhole dst_ops
When an IPSEC SA is still being set up, __xfrm_lookup() will return
-EREMOTE and so ip_route_output_flow() will return a blackhole route.
This can happen in a sndmsg call, and after d33e455337 ("net: Abstract
default MTU metric calculation behind an accessor.") this leads to a
crash in ip_append_data() because the blackhole dst_ops have no
default_mtu() method and so dst_mtu() calls a NULL pointer.

Fix this by adding default_mtu() methods (that simply return 0, matching
the old behavior) to the blackhole dst_ops.

The IPv4 part of this patch fixes a crash that I saw when using an IPSEC
VPN; the IPv6 part is untested because I don't have an IPv6 VPN, but it
looks to be needed as well.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31 13:16:00 -08:00
David S. Miller
5403c8a295 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-01-31 13:13:24 -08:00
David S. Miller
c79b9e4936 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2011-01-31 12:31:24 -08:00
Rajkumar Manoharan
8c7914dec2 mac80211: disable power save if an infra AP vif exists
PS should not be enabled if an infra AP vif exists in
the interface list. So while recalculating PS,
AP vif type should be taken into account.

Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rajkumar Manoharan <rmanoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-31 15:06:26 -05:00
Sven Eckelmann
64afe35398 batman-adv: Update copyright years
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:12 +01:00
Sven Eckelmann
1299bdaa1c batman-adv: Remove unused variables
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:12 +01:00
Sven Eckelmann
fb86d7648f batman-adv: Remove declaration of batman_skb_recv
batman_skb_recv can be defined in hard-interface.c as static because it is
never used outside of that file.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:11 +01:00
Sven Eckelmann
335f94c981 batman-adv: Remove unused definitions
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:10 +01:00
Sven Eckelmann
633979b43f batman-adv: Remove dangling declaration of hash_remove_element
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:10 +01:00
Simon Wunderlich
74ef115359 batman-adv: remove unused parameters
Some function parameters are obsolete now and can be removed.

Reported-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:09 +01:00
Sven Eckelmann
ae361ce19f batman-adv: Calculate correct size for merged packets
The routing algorithm must be able to decide if a fragment can be merged with
the missing part and still be passed to a forwarding interface. The fragments
can only differ by one byte in case that the original payload had an uneven
length. In that situation the sender has to inform all possible receivers that
the tail is one byte longer using the flag UNI_FRAG_LARGETAIL.

The combination of UNI_FRAG_LARGETAIL and UNI_FRAG_HEAD flag makes it possible
to calculate the correct length for even and uneven sized payloads.

The original formula missed to add the unicast header at all and forgot to
remove the fragment header of the second fragment. This made the results highly
unreliable and only useful for machines with large differences between the
configured MTUs.

Reported-by: Russell Senior <russell@personaltelco.net>
Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:08 +01:00
Sven Eckelmann
5c77d8bb8a batman-adv: Create roughly equal sized fragments
The routing algorithm must know how large two fragments are to be able to
decide that it is safe to merge them or if it should resubmit without waiting
for the second part. When these two fragments have a too different size, it is
not possible to guess right in every situation.

The user could easily configure the MTU of the attached cards so that one
fragment is forwarded and the other one is added to the fragments table to wait
for the missing part.

For even sized packets, it is possible to split it so that the resulting
packages are equal sized by ignoring the old non-fragment header at the
beginning of the original packet.

This still creates different sized fragments for uneven sized packets.

Reported-by: Russell Senior <russell@personaltelco.net>
Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-31 14:57:08 +01:00
David S. Miller
81c2bdb688 Merge branch 'batman-adv/merge-oopsonly' of git://git.open-mesh.org/ecsv/linux-merge 2011-01-30 22:16:34 -08:00
Sven Eckelmann
1181e1daac batman-adv: Make vis info stack traversal threadsafe
The batman-adv vis server has to a stack which stores all information
about packets which should be send later. This stack is protected
with a spinlock that is used to prevent concurrent write access to it.

The send_vis_packets function has to take all elements from the stack
and send them to other hosts over the primary interface. The send will
be initiated without the lock which protects the stack.

The implementation using list_for_each_entry_safe has the problem that
it stores the next element as "safe ptr" to allow the deletion of the
current element in the list. The list may be modified during the
unlock/lock pair in the loop body which may make the safe pointer
not pointing to correct next element.

It is safer to remove and use the first element from the stack until no
elements are available. This does not need reduntant information which
would have to be validated each time the lock was removed.

Reported-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-30 10:32:08 +01:00
Sven Eckelmann
dda9fc6b2c batman-adv: Remove vis info element in free_info
The free_info function will be called when no reference to the info
object exists anymore. It must be ensured that the allocated memory
gets freed and not only the elements which are managed by the info
object.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-30 10:32:06 +01:00
Sven Eckelmann
2674c15870 batman-adv: Remove vis info on hashing errors
A newly created vis info object must be removed when it couldn't be
added to the hash. The old_info which has to be replaced was already
removed and isn't related to the hash anymore.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-30 10:32:02 +01:00
Eric W. Biederman
709b46e8d9 net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT
SIOCGETSGCNT is not a unique ioctl value as it it maps tio SIOCPROTOPRIVATE +1,
which unfortunately means the existing infrastructure for compat networking
ioctls is insufficient.  A trivial compact ioctl implementation would conflict
with:

SIOCAX25ADDUID
SIOCAIPXPRISLT
SIOCGETSGCNT_IN6
SIOCGETSGCNT
SIOCRSSCAUSE
SIOCX25SSUBSCRIP
SIOCX25SDTEFACILITIES

To make this work I have updated the compat_ioctl decode path to mirror the
the normal ioctl decode path.  I have added an ipv4 inet_compat_ioctl function
so that I can have ipv4 specific compat ioctls.   I have added a compat_ioctl
function into struct proto so I can break out ioctls by which kind of ip socket
I am using.  I have added a compat_raw_ioctl function because SIOCGETSGCNT only
works on raw sockets.  I have added a ipmr_compat_ioctl that mirrors the normal
ipmr_ioctl.

This was necessary because unfortunately the struct layout for the SIOCGETSGCNT
has unsigned longs in it so changes between 32bit and 64bit kernels.

This change was sufficient to run a 32bit ip multicast routing daemon on a
64bit kernel.

Reported-by: Bill Fenner <fenner@aristanetworks.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-30 01:14:38 -08:00
Eric W. Biederman
13ad17745c net: Fix ip link add netns oops
Ed Swierk <eswierk@bigswitch.com> writes:
> On 2.6.35.7
>  ip link add link eth0 netns 9999 type macvlan
> where 9999 is a nonexistent PID triggers an oops and causes all network functions to hang:
> [10663.821898] BUG: unable to handle kernel NULL pointer dereference at 000000000000006d
>  [10663.821917] IP: [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
>  [10663.821933] PGD 1d3927067 PUD 22f5c5067 PMD 0
>  [10663.821944] Oops: 0000 [#1] SMP
>  [10663.821953] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
>  [10663.821959] CPU 3
>  [10663.821963] Modules linked in: macvlan ip6table_filter ip6_tables rfcomm ipt_MASQUERADE binfmt_misc iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack sco ipt_REJECT bnep l2cap xt_tcpudp iptable_filter ip_tables x_tables bridge stp vboxnetadp vboxnetflt vboxdrv kvm_intel kvm parport_pc ppdev snd_hda_codec_intelhdmi snd_hda_codec_conexant arc4 iwlagn iwlcore mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi i915 snd_seq_midi_event snd_seq thinkpad_acpi drm_kms_helper btusb tpm_tis nvram uvcvideo snd_timer snd_seq_device bluetooth videodev v4l1_compat v4l2_compat_ioctl32 tpm drm tpm_bios snd cfg80211 psmouse serio_raw intel_ips soundcore snd_page_alloc intel_agp i2c_algo_bit video output netconsole configfs lp parport usbhid hid e1000e sdhci_pci ahci libahci sdhci led_class
>  [10663.822155]
>  [10663.822161] Pid: 6000, comm: ip Not tainted 2.6.35-23-generic #41-Ubuntu 2901CTO/2901CTO
>  [10663.822167] RIP: 0010:[<ffffffff8149c2fa>] [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
>  [10663.822177] RSP: 0018:ffff88014aebf7b8 EFLAGS: 00010286
>  [10663.822182] RAX: 00000000fffffff4 RBX: ffff8801ad900800 RCX: 0000000000000000
>  [10663.822187] RDX: ffff880000000000 RSI: 0000000000000000 RDI: ffff88014ad63000
>  [10663.822191] RBP: ffff88014aebf808 R08: 0000000000000041 R09: 0000000000000041
>  [10663.822196] R10: 0000000000000000 R11: dead000000200200 R12: ffff88014aebf818
>  [10663.822201] R13: fffffffffffffffd R14: ffff88014aebf918 R15: ffff88014ad62000
>  [10663.822207] FS: 00007f00c487f700(0000) GS:ffff880001f80000(0000) knlGS:0000000000000000
>  [10663.822212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [10663.822216] CR2: 000000000000006d CR3: 0000000231f19000 CR4: 00000000000026e0
>  [10663.822221] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  [10663.822226] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>  [10663.822231] Process ip (pid: 6000, threadinfo ffff88014aebe000, task ffff88014afb16e0)
>  [10663.822236] Stack:
>  [10663.822240] ffff88014aebf808 ffffffff814a2bb5 ffff88014aebf7e8 00000000a00ee8d6
>  [10663.822251] <0> 0000000000000000 ffffffffa00ef940 ffff8801ad900800 ffff88014aebf818
>  [10663.822265] <0> ffff88014aebf918 ffff8801ad900800 ffff88014aebf858 ffffffff8149c413
>  [10663.822281] Call Trace:
>  [10663.822290] [<ffffffff814a2bb5>] ? dev_addr_init+0x75/0xb0
>  [10663.822298] [<ffffffff8149c413>] dev_alloc_name+0x43/0x90
>  [10663.822307] [<ffffffff814a85ee>] rtnl_create_link+0xbe/0x1b0
>  [10663.822314] [<ffffffff814ab2aa>] rtnl_newlink+0x48a/0x570
>  [10663.822321] [<ffffffff814aafcc>] ? rtnl_newlink+0x1ac/0x570
>  [10663.822332] [<ffffffff81030064>] ? native_x2apic_icr_read+0x4/0x20
>  [10663.822339] [<ffffffff814a8c17>] rtnetlink_rcv_msg+0x177/0x290
>  [10663.822346] [<ffffffff814a8aa0>] ? rtnetlink_rcv_msg+0x0/0x290
>  [10663.822354] [<ffffffff814c25d9>] netlink_rcv_skb+0xa9/0xd0
>  [10663.822360] [<ffffffff814a8a85>] rtnetlink_rcv+0x25/0x40
>  [10663.822367] [<ffffffff814c223e>] netlink_unicast+0x2de/0x2f0
>  [10663.822374] [<ffffffff814c303e>] netlink_sendmsg+0x1fe/0x2e0
>  [10663.822383] [<ffffffff81488533>] sock_sendmsg+0xf3/0x120
>  [10663.822391] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
>  [10663.822400] [<ffffffff81168656>] ? __d_lookup+0x136/0x150
>  [10663.822406] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
>  [10663.822414] [<ffffffff812b7a0d>] ? _atomic_dec_and_lock+0x4d/0x80
>  [10663.822422] [<ffffffff8116ea90>] ? mntput_no_expire+0x30/0x110
>  [10663.822429] [<ffffffff81486ff5>] ? move_addr_to_kernel+0x65/0x70
>  [10663.822435] [<ffffffff81493308>] ? verify_iovec+0x88/0xe0
>  [10663.822442] [<ffffffff81489020>] sys_sendmsg+0x240/0x3a0
> [10663.822450] [<ffffffff8111e2a9>] ? __do_fault+0x479/0x560
>  [10663.822457] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
>  [10663.822465] [<ffffffff8116cf4a>] ? alloc_fd+0x10a/0x150
>  [10663.822473] [<ffffffff8158d76e>] ? do_page_fault+0x15e/0x350
>  [10663.822482] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
>  [10663.822487] Code: 90 48 8d 78 02 be 25 00 00 00 e8 92 1d e2 ff 48 85 c0 75 cf bf 20 00 00 00 e8 c3 b1 c6 ff 49 89 c7 b8 f4 ff ff ff 4d 85 ff 74 bd <4d> 8b 75 70 49 8d 45 70 48 89 45 b8 49 83 ee 58 eb 28 48 8d 55
>  [10663.822618] RIP [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
>  [10663.822627] RSP <ffff88014aebf7b8>
>  [10663.822631] CR2: 000000000000006d
>  [10663.822636] ---[ end trace 3dfd6c3ad5327ca7 ]---

This bug was introduced in:
commit 81adee47df
Author: Eric W. Biederman <ebiederm@aristanetworks.com>
Date:   Sun Nov 8 00:53:51 2009 -0800

    net: Support specifying the network namespace upon device creation.

    There is no good reason to not support userspace specifying the
    network namespace during device creation, and it makes it easier
    to create a network device and pass it to a child network namespace
    with a well known name.

    We have to be careful to ensure that the target network namespace
    for the new device exists through the life of the call.  To keep
    that logic clear I have factored out the network namespace grabbing
    logic into rtnl_link_get_net.

    In addtion we need to continue to pass the source network namespace
    to the rtnl_link_ops.newlink method so that we can find the base
    device source network namespace.

    Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Where apparently I forgot to add error handling to the path where we create
a new network device in a new network namespace, and pass in an invalid pid.

Cc: stable@kernel.org
Reported-by: Ed Swierk <eswierk@bigswitch.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-30 01:14:15 -08:00
Herbert Xu
66c46d741e gro: Reset dev pointer on reuse
On older kernels the VLAN code may zero skb->dev before dropping
it and causing it to be reused by GRO.

Unfortunately we didn't reset skb->dev in that case which causes
the next GRO user to get a bogus skb->dev pointer.

This particular problem no longer happens with the current upstream
kernel due to changes in VLAN processing.

However, for correctness we should still reset the skb->dev pointer
in the GRO reuse function in case a future user does the same thing.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-29 22:36:24 -08:00
David S. Miller
b8dad61cc7 ipv4: If fib metrics are default, no need to grab ref to FIB info.
The fib metric memory in this case is static in the kernel image,
so we don't need to reference count it since it's never going
to go away on us.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-28 14:07:16 -08:00
David S. Miller
725d1e1b45 ipv4: Attach FIB info to dst_default_metrics when possible
If there are no explicit metrics attached to a route, hook
fi->fib_info up to dst_default_metrics.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-28 14:05:05 -08:00
David S. Miller
9c150e82ac ipv4: Allocate fib metrics dynamically.
This is the initial gateway towards super-sharing metrics
if they are all set to zero for a route.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-28 14:01:25 -08:00
John W. Linville
3e11210d46 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
Conflicts:
	drivers/net/wireless/ath/ath9k/init.c
2011-01-28 16:23:14 -05:00
Julia Lawall
efe1cf0c57 net/wireless/nl80211.c: Avoid call to genlmsg_cancel
genlmsg_cancel subtracts some constants from its second argument before
calling nlmsg_cancel.  nlmsg_cancel then calls nlmsg_trim on the same
arguments.  nlmsg_trim tests for NULL before doing any computation, but a
NULL second argument to genlmsg_cancel is no longer NULL due to the initial
subtraction.  Nothing else happens in this execution, so the call to
genlmsg_cancel is simply unnecessary in this case.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression data;
@@

if (data == NULL) { ...
* genlmsg_cancel(..., data);
  ...
  return ...;
}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-28 15:46:23 -05:00
Ben Greear
4914b3bb7f mac80211: Add sdata state and flags to debugfs.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-28 15:46:23 -05:00
Johannes Berg
6d744bacee mac80211: add MCS information to radiotap
This adds the MCS information we currently get
from the drivers into radiotap.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-28 15:44:29 -05:00
Juuso Oikarinen
45cbad6a12 cfg80211: Allow non-zero indexes for device specific pair-wise ciphers
Some vendor specific cipher suites require non-zero key indexes for pairwise
keys, but as of currently, the cfg80211 does not allow it.

As validating they cipher parameters for vendor specific cipher suites is the
job of the driver or hardware/firmware, change the cfg80211 to allow also
non-zero pairwise key indexes for vendor specific ciphers.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-28 15:44:27 -05:00
Thomas Jacob
6a4ddef2a3 netfilter: xt_iprange: add IPv6 match debug print code
Signed-off-by: Thomas Jacob <jacob@internet24.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-28 19:33:13 +01:00
David S. Miller
a4daad6b09 net: Pre-COW metrics for TCP.
TCP is going to record metrics for the connection,
so pre-COW the route metrics at route cache entry
creation time.

This avoids several atomic operations that have to
occur if we COW the metrics after the entry reaches
global visibility.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 22:01:53 -08:00
David S. Miller
8571a19c4a Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2011-01-27 16:00:37 -08:00
Eric Dumazet
ccf434380d net: fix dev_seq_next()
Commit c6d14c8456 (net: Introduce for_each_netdev_rcu() iterator)
added a race in dev_seq_next().

The rcu_dereference() call should be done _before_ testing the end of
list, or we might return a wrong net_device if a concurrent thread
changes net_device list under us.

Note : discovered thanks to a sparse warning :

net/core/dev.c:3919:9: error: incompatible types in comparison expression
(different address spaces)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 15:02:56 -08:00
David S. Miller
065825402c net: Store ipv4/ipv6 COW'd metrics in inetpeer cache.
Please note that the IPSEC dst entry metrics keep using
the generic metrics COW'ing mechanism using kmalloc/kfree.

This gives the IPSEC routes an opportunity to use metrics
which are unique to their encapsulated paths.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:59:31 -08:00
David S. Miller
1397e171f1 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-01-27 14:59:08 -08:00
David S. Miller
8f2771f2b8 ipv6: Remove route peer binding assertions.
They are bogus.  The basic idea is that I wanted to make sure
that prefixed routes never bind to peers.

The test I used was whether RTF_CACHE was set.

But first of all, the RTF_CACHE flag is set at different spots
depending upon which ip6_rt_copy() caller you're talking about.

I've validated all of the code paths, and even in the future
where we bind peers more aggressively (for route metric COW'ing)
we never bind to prefix'd routes, only fully specified ones.
This even applies when addrconf or icmp6 routes are allocated.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:55:22 -08:00
Eric Dumazet
c2aa3665cf net: add kmemcheck annotation in __alloc_skb()
pskb_expand_head() triggers a kmemcheck warning when copy of
skb_shared_info is done in pskb_expand_head()

This is because destructor_arg field is not necessarily initialized at
this point. Add kmemcheck_annotate_variable() call in __alloc_skb() to
instruct kmemcheck this is a normal situation.

Resolves bugzilla.kernel.org 27212

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=27212
Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:41:06 -08:00
Kurt Van Dijck
6d3a9a6854 net: fix validate_link_af in rtnetlink core
I'm testing an API that uses IFLA_AF_SPEC attribute.
In the rtnetlink core , the set_link_af() member
of the rtnl_af_ops struct receives the nested attribute
(as I expected), but the validate_link_af() member
receives the parent attribute.
IMO, this patch fixes this.

Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:39:21 -08:00
Eric Dumazet
389f2a18c6 econet: remove compiler warnings
net/econet/af_econet.c: In function ‘econet_sendmsg’:
net/econet/af_econet.c:494: warning: label ‘error’ defined but not used
net/econet/af_econet.c:268: warning: unused variable ‘sk’

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:15:54 -08:00
David S. Miller
144001bddc inetpeer: Mark metrics as "new" in fresh inetpeer entries.
Set the RTAX_LOCKED metric to INETPEER_METRICS_NEW (basically,
all ones) on fresh inetpeer entries.

This way code can determine if default metrics have been loaded
in from a routing table entry already.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 13:52:16 -08:00
Thomas Jacob
705ca14717 netfilter: xt_iprange: typo in IPv4 match debug print code
Signed-off-by: Thomas Jacob <jacob@internet24.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-27 10:56:32 +01:00
David S. Miller
62fa8a846d net: Implement read-only protection and COW'ing of metrics.
Routing metrics are now copy-on-write.

Initially a route entry points it's metrics at a read-only location.
If a routing table entry exists, it will point there.  Else it will
point at the all zero metric place-holder called 'dst_default_metrics'.

The writeability state of the metrics is stored in the low bits of the
metrics pointer, we have two bits left to spare if we want to store
more states.

For the initial implementation, COW is implemented simply via kmalloc.
However future enhancements will change this to place the writable
metrics somewhere else, in order to increase sharing.  Very likely
this "somewhere else" will be the inetpeer cache.

Note also that this means that metrics updates may transiently fail
if we cannot COW the metrics successfully.

But even by itself, this patch should decrease memory usage and
increase cache locality especially for routing workloads.  In those
cases the read-only metric copies stay in place and never get written
to.

TCP workloads where metrics get updated, and those rare cases where
PMTU triggers occur, will take a very slight performance hit.  But
that hit will be alleviated when the long-term writable metrics
move to a more sharable location.

Since the metrics storage went from a u32 array of RTAX_MAX entries to
what is essentially a pointer, some retooling of the dst_entry layout
was necessary.

Most importantly, we need to preserve the alignment of the reference
count so that it doesn't share cache lines with the read-mostly state,
as per Eric Dumazet's alignment assertion checks.

The only non-trivial bit here is the move of the 'flags' member into
the writeable cacheline.  This is OK since we are always accessing the
flags around the same moment when we made a modification to the
reference count.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-26 20:51:05 -08:00
David S. Miller
b4e69ac670 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-01-26 13:49:30 -08:00
David S. Miller
7cc2edb834 xfrm6: Don't forget to propagate peer into ipsec route.
Like ipv4, we have to propagate the ipv6 route peer into
the ipsec top-level route during instantiation.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-26 13:41:03 -08:00
Johannes Berg
ba99d93b3d mac80211: use DECLARE_EVENT_CLASS
For events that include only the local struct as
their parameter, we can use DECLARE_EVENT_CLASS
and save quite some binary size across segments
as well lines of code.

   text	   data	    bss	    dec	    hex	filename
 375745	  19296	    916	 395957	  60ab5	mac80211.ko.before
 367473	  17888	    916	 386277	  5e4e5	mac80211.ko.after
  -8272   -1408       0   -9680   -25d0 delta

Some more tracepoints with identical arguments
could be combined like this but for now this is
the one that benefits most.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-26 16:15:45 -05:00
Eric Dumazet
144ce879b0 net_sched: sch_mqprio: dont leak kernel memory
mqprio_dump() should make sure all fields of struct tc_mqprio_qopt are
initialized.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-26 13:15:29 -08:00
David S. Miller
9b6941d8b1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-01-26 11:49:49 -08:00
Patrick McHardy
2e0348c449 Merge branch 'connlimit' of git://dev.medozas.de/linux 2011-01-26 16:28:45 +01:00
Jan Engelhardt
ad86e1f27a netfilter: xt_connlimit: pick right dstaddr in NAT scenario
xt_connlimit normally records the "original" tuples in a hashlist
(such as "1.2.3.4 -> 5.6.7.8"), and looks in this list for iph->daddr
when counting.

When the user however uses DNAT in PREROUTING, looking for
iph->daddr -- which is now 192.168.9.10 -- will not match. Thus in
daddr mode, we need to record the reverse direction tuple
("192.168.9.10 -> 1.2.3.4") instead. In the reverse tuple, the dst
addr is on the src side, which is convenient, as count_them still uses
&conn->tuple.src.u3.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2011-01-26 13:01:39 +01:00
Linus Lüssing
dd58ddc692 batman-adv: Fix kernel panic when fetching vis data on a vis server
The hash_iterate removal introduced a bug leading to a kernel panic when
fetching the vis data on a vis server. That commit forgot to rename one
variable name, which this commit fixes now.

Reported-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-25 23:58:33 +01:00
Jerry Chu
44f5324b5d TCP: fix a bug that triggers large number of TCP RST by mistake
This patch fixes a bug that causes TCP RST packets to be generated
on otherwise correctly behaved applications, e.g., no unread data
on close,..., etc. To trigger the bug, at least two conditions must
be met:

1. The FIN flag is set on the last data packet, i.e., it's not on a
separate, FIN only packet.
2. The size of the last data chunk on the receive side matches
exactly with the size of buffer posted by the receiver, and the
receiver closes the socket without any further read attempt.

This bug was first noticed on our netperf based testbed for our IW10
proposal to IETF where a large number of RST packets were observed.
netperf's read side code meets the condition 2 above 100%.

Before the fix, tcp_data_queue() will queue the last skb that meets
condition 1 to sk_receive_queue even though it has fully copied out
(skb_copy_datagram_iovec()) the data. Then if condition 2 is also met,
tcp_recvmsg() often returns all the copied out data successfully
without actually consuming the skb, due to a check
"if ((chunk = len - tp->ucopy.len) != 0) {"
and
"len -= chunk;"
after tcp_prequeue_process() that causes "len" to become 0 and an
early exit from the big while loop.

I don't see any reason not to free the skb whose data have been fully
consumed in tcp_data_queue(), regardless of the FIN flag.  We won't
get there if MSG_PEEK is on. Am I missing some arcane cases related
to urgent data?

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-25 13:46:30 -08:00
Felix Fietkau
eb3e554b4b mac80211: fix a crash in ieee80211_beacon_get_tim on change_interface
Some drivers (e.g. ath9k) do not always disable beacons when they're
supposed to. When an interface is changed using the change_interface op,
the mode specific sdata part is in an undefined state and trying to
get a beacon at this point can produce weird crashes.

To fix this, add a check for ieee80211_sdata_running before using
anything from the sdata.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-25 16:28:56 -05:00
Eric Dumazet
26ad787962 pktgen: speedup fragmented skbs
We spend lot of time clearing pages in pktgen.
(Or not clearing them on ipv6 and leaking kernel memory)

Since we dont modify them, we can use one zeroed page, and get
references on it. This page can use NUMA affinity as well.

Define pktgen_finalize_skb() helper, used both in ipv4 and ipv6

Results using skbs with one frag :

Before patch :

Result: OK: 608980458(c608978520+d1938) nsec, 1000000000
(100byte,1frags)
  1642088pps 1313Mb/sec (1313670400bps) errors: 0

After patch :

Result: OK: 345285014(c345283891+d1123) nsec, 1000000000
(100byte,1frags)
  2896158pps 2316Mb/sec (2316926400bps) errors: 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-25 13:26:05 -08:00
David S. Miller
73a8bd74e2 ipv6: Revert 'administrative down' address handling changes.
This reverts the following set of commits:

d1ed113f16 ("ipv6: remove duplicate neigh_ifdown")
29ba5fed1b ("ipv6: don't flush routes when setting loopback down")
9d82ca98f7 ("ipv6: fix missing in6_ifa_put in addrconf")
2de7957072 ("ipv6: addrconf: don't remove address state on ifdown if the address is being kept")
8595805aaf ("IPv6: only notify protocols if address is compeletely gone")
27bdb2abcc ("IPv6: keep tentative addresses in hash table")
93fa159abe ("IPv6: keep route for tentative address")
8f37ada5b5 ("IPv6: fix race between cleanup and add/delete address")
84e8b803f1 ("IPv6: addrconf notify when address is unavailable")
dc2b99f71e ("IPv6: keep permanent addresses on admin down")

because the core semantic change to ipv6 address handling on ifdown
has broken some things, in particular "disable_ipv6" sysctl handling.

Stephen has made several attempts to get things back in working order,
but nothing has restored disable_ipv6 fully yet.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Tested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-25 12:49:08 -08:00
Andy Adamson
778be232a2 NFS do not find client in NFSv4 pg_authenticate
The information required to find the nfs_client cooresponding to the incoming
back channel request is contained in the NFS layer. Perform minimal checking
in the RPC layer pg_authenticate method, and push more detailed checking into
the NFS layer where the nfs_client can be found.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-25 15:26:51 -05:00
Sage Weil
42961d2333 libceph: fix socket write error handling
Pass errors from writing to the socket up the stack.  If we get -EAGAIN,
return 0 from the helper to simplify the callers' checks.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-25 08:19:34 -08:00
Sage Weil
98bdb0aa00 libceph: fix socket read error handling
If we get EAGAIN when trying to read from the socket, it is not an error.
Return 0 from the helper in this case to simplify the error handling cases
in the caller (indirectly, try_read).

Fix try_read to pass any error to it's caller (con_work) instead of almost
always returning 0.  This let's us respond to things like socket
disconnects.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-25 08:17:48 -08:00
Tejun Heo
ada609ee2a workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER
WQ_RESCUER is now an internal flag and should only be used in the
workqueue implementation proper.  Use WQ_MEM_RECLAIM instead.

This doesn't introduce any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>
2011-01-25 14:35:54 +01:00
Changli Gao
9f4e1ccd80 netfilter: ipvs: fix compiler warnings
Fix compiler warnings when IP_VS_DBG() isn't defined.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-25 23:17:51 +10:00
Vlad Dogaru
a512b92b3a net: add sysfs entry for device group
The group of a network device can be queried or changed from userspace
using sysfs.

For example, considering sysfs mounted in /sys, one can change the group
that interface lo belongs to:
	echo 1 > /sys/class/net/lo/group

Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 23:23:28 -08:00
Eugene Teo
b7c7d01aae net: clear heap allocation for ethtool_get_regs()
There is a conflict between commit b00916b1 and a77f5db3. This patch resolves
the conflict by clearing the heap allocation in ethtool_get_regs().

Cc: stable@kernel.org
Signed-off-by: Eugene Teo <eugeneteo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 21:05:17 -08:00
Hans Schillstrom
07924709f6 IPVS netns BUG, register sysctl for root ns
The newly created table was not used when register sysctl for a new namespace.
I.e. sysctl doesn't work for other than root namespace (init_net)

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-25 12:13:08 +10:00
David S. Miller
d80bc0fd26 ipv6: Always clone offlink routes.
Do not handle PMTU vs. route lookup creation any differently
wrt. offlink routes, always clone them.

Reported-by: PK <runningdoglackey@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 16:01:58 -08:00
Michał Mirosław
acd1130e87 net: reduce and unify printk level in netdev_fix_features()
Reduce printk() levels to KERN_INFO in netdev_fix_features() as this will
be used by ethtool and might spam dmesg unnecessarily.

This converts the function to use netdev_info() instead of plain printk().

As a side effect, bonding and bridge devices will now log dropped features
on every slave device change.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 15:45:15 -08:00
Michał Mirosław
04ed3e741d net: change netdev->features to u32
Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.

Occurences found by `grep -r` on net/, drivers/net, include/

[ Move features and vlan_features next to each other in
  struct netdev, as per Eric Dumazet's suggestion -DaveM ]

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 15:32:47 -08:00
Michał Mirosław
57422dc530 net: Move check of checksum features to netdev_fix_features()
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 15:29:11 -08:00
John Fastabend
3dce38a02d dcbnl: make get_app handling symmetric for IEEE and CEE DCBx
The IEEE get/set app handlers use generic routines and do not
require the net_device to implement the dcbnl_ops routines. This
patch makes it symmetric so user space and drivers do not have
to handle the CEE version and IEEE DCBx versions differently.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 15:19:55 -08:00
Ben Hutchings
c445477d74 net: RPS: Enable hardware acceleration of RFS
Allow drivers for multiqueue hardware with flow filter tables to
accelerate RFS.  The driver must:

1. Set net_device::rx_cpu_rmap to a cpu_rmap of the RX completion
IRQs (in queue order).  This will provide a mapping from CPUs to the
queues for which completions are handled nearest to them.

2. Implement net_device_ops::ndo_rx_flow_steer.  This operation adds
or replaces a filter steering the given flow to the given RX queue, if
possible.

3. Periodically remove filters for which rps_may_expire_flow() returns
true.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 14:53:01 -08:00
Eric Dumazet
fd0273c503 tcp: fix bug in listening_get_next()
commit a8b690f98b (tcp: Fix slowness in read /proc/net/tcp)
introduced a bug in handling of SYN_RECV sockets.

st->offset represents number of sockets found since beginning of
listening_hash[st->bucket].

We should not reset st->offset when iterating through
syn_table[st->sbucket], or else if more than ~25 sockets (if
PAGE_SIZE=4096) are in SYN_RECV state, we exit from listening_get_next()
with a too small st->offset

Next time we enter tcp_seek_last_pos(), we are not able to seek past
already found sockets.

Reported-by: PK <runningdoglackey@yahoo.com>
CC: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 14:41:20 -08:00
David S. Miller
3408404a4c inetpeer: Use correct AVL tree base pointer in inet_getpeer().
Family was hard-coded to AF_INET but should be daddr->family.

This fixes crashes when unlinking ipv6 peer entries, since the
unlink code was looking up the base pointer properly.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 14:38:09 -08:00
Michal Schmidt
d1dc7abf2f GRO: fix merging a paged skb after non-paged skbs
Suppose that several linear skbs of the same flow were received by GRO. They
were thus merged into one skb with a frag_list. Then a new skb of the same flow
arrives, but it is a paged skb with data starting in its frags[].

Before adding the skb to the frag_list skb_gro_receive() will of course adjust
the skb to throw away the headers. It correctly modifies the page_offset and
size of the frag, but it leaves incorrect information in the skb:
 ->data_len is not decreased at all.
 ->len is decreased only by headlen, as if no change were done to the frag.
Later in a receiving process this causes skb_copy_datagram_iovec() to return
-EFAULT and this is seen in userspace as the result of the recv() syscall.

In practice the bug can be reproduced with the sfc driver. By default the
driver uses an adaptive scheme when it switches between using
napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
reproduced when under rx load with enough successful GRO merging the driver
decides to switch from the former to the latter.

Manual control is also possible, so reproducing this is easy with netcat:
 - on machine1 (with sfc): nc -l 12345 > /dev/null
 - on machine2: nc machine1 12345 < /dev/zero
 - on machine1:
   echo 1 > /sys/module/sfc/parameters/rx_alloc_method  # use skbs
   echo 2 > /sys/module/sfc/parameters/rx_alloc_method  # use pages
 - See that nc has quit suddenly.

[v2: Modified by Eric Dumazet to avoid advancing skb->data past the end
     and to use a temporary variable.]

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 14:27:18 -08:00
David S. Miller
5bdc22a565 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/sched/sch_hfsc.c
	net/sched/sch_htb.c
	net/sched/sch_tbf.c
2011-01-24 14:09:35 -08:00
David S. Miller
e92427b289 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 2011-01-24 13:17:06 -08:00
Eric Dumazet
c506653d35 net: arp_ioctl() must hold RTNL
Commit 941666c2e3 "net: RCU conversion of dev_getbyhwaddr() and
arp_ioctl()" introduced a regression, reported by Jamie Heilman.
"arp -Ds 192.168.2.41 eth0 pub" triggered the ASSERT_RTNL() assert
in pneigh_lookup()

Removing RTNL requirement from arp_ioctl() was a mistake, just revert
that part.

Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24 13:16:16 -08:00
Thomas Jacob
08b5194b5d netfilter: xt_iprange: Incorrect xt_iprange boundary check for IPv6
iprange_ipv6_sub was substracting 2 unsigned ints and then casting
the result to int to find out whether they are lt, eq or gt each
other, this doesn't work if the full 32 bits of each part
can be used in IPv6 addresses. Patch should remedy that without
significant performance penalties. Also number of ntohl
calls can be reduced this way (Jozsef Kadlecsik).

Signed-off-by: Thomas Jacob <jacob@internet24.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-24 21:35:36 +01:00
Pablo Neira Ayuso
c71caf4114 netfilter: ctnetlink: fix missing refcount increment during dumps
In 13ee6ac netfilter: fix race in conntrack between dump_table and
destroy, we recovered spinlocks to protect the dump of the conntrack
table according to reports from Stephen and acknowledgments on the
issue from Eric.

In that patch, the refcount bump that allows to keep a reference
to the current ct object was removed. However, we still decrement
the refcount for that object in the output path of
ctnetlink_dump_table():

        if (last)
                nf_ct_put(last)

Cc: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-24 19:01:07 +01:00
Rusty Russell
577d6a7c3a module: fix missing semicolons in MODULE macro usage
You always needed them when you were a module, but the builtin versions
of the macros used to be more lenient.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-01-24 14:32:54 +10:30
Simon Horman
4b3fd57138 IPVS: Change sock_create_kernel() to __sock_create()
The recent netns changes omitted to change
sock_create_kernel() to __sock_create() in ip_vs_sync.c

The effect of this is that the interface will be selected in the
root-namespace, from my point of view it's a major bug.

Reported-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-22 13:48:01 +11:00
Changli Gao
091bb34c14 netfilter: ipvs: fix compiler warnings
Fix compiler warnings when no transport protocol load balancing support
is configured.

[horms@verge.net.au: removed suprious __ip_vs_cleanup() clean-up hunk]
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-22 13:19:36 +11:00
Eric Dumazet
23624935e0 net_sched: TCQ_F_CAN_BYPASS generalization
Now qdisc stab is handled before TCQ_F_CAN_BYPASS test in
__dev_xmit_skb(), we can generalize TCQ_F_CAN_BYPASS to other qdiscs
than pfifo_fast : pfifo, bfifo, pfifo_head_drop and sfq

SFQ is special because it can have external classifiers, and in these
cases, we cannot bypass queue discipline (packet could be dropped by
classifier) without admin asking it, or further changes.

Its worth doing this, especially for SFQ, avoiding dirtying memory in
case no packets are already waiting in queue.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-21 16:26:09 -08:00
Eric Dumazet
bb134d2298 net: netif_setup_tc() is static
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-21 13:08:27 -08:00
Bruno Randolf
59eb21a650 cfg80211: Extend channel to frequency mapping for 802.11j
Extend channel to frequency mapping for 802.11j Japan 4.9GHz band, according to
IEEE802.11 section 17.3.8.3.2 and Annex J. Because there are now overlapping
channel numbers in the 2GHz and 5GHz band we can't map from channel to
frequency without knowing the band. This is no problem as in most contexts we
know the band. In places where we don't know the band (and WEXT compatibility)
we assume the 2GHz band for channels below 14.

This patch does not implement all channel to frequency mappings defined in
802.11, it's just an extension for 802.11j 20MHz channels. 5MHz and 10MHz
channels as well as 802.11y channels have been omitted.

The following drivers have been updated to reflect the API changes:
iwl-3945, iwl-agn, iwmc3200wifi, libertas, mwl8k, rt2x00, wl1251, wl12xx.
The drivers have been compile-tested only.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: Brian Prodoehl <bprodoehl@gmail.com>
Acked-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-21 15:34:17 -05:00
Ben Greear
b305dae488 mac80211: Fix skb-copy failure debug message.
This particular error isn't about multicast.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-21 15:32:21 -05:00
Eric Dumazet
9190b3b320 net_sched: accurate bytes/packets stats/rates
In commit 44b8288308 (net_sched: pfifo_head_drop problem), we fixed
a problem with pfifo_head drops that incorrectly decreased
sch->bstats.bytes and sch->bstats.packets

Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
previously enqueued packet, and bstats cannot be changed, so
bstats/rates are not accurate (over estimated)

This patch changes the qdisc_bstats updates to be done at dequeue() time
instead of enqueue() time. bstats counters no longer account for dropped
frames, and rates are more correct, since enqueue() bursts dont have
effect on dequeue() rate.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 23:31:33 -08:00
Patrick McHardy
ffa934f192 rtnetlink: fix link attribute validation with IFLA_GROUP
rtnl_group_changelink() is invoked by rtnl_newlink() before the link
attributes have been validated. Additionally the group changes are
performed even if NLM_F_CREATE is specified and a new link is
created, while more reasonable semantics would be to set the group
value on the newly created link.

Fix both problems by moving the rtnl_group_changelink() invocation
down to the handling of non-existant links without NLM_F_CREATE()
and add a dev_set_group() call to rtnl_create_link().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Vlad Dogaru <ddvlad@rosedu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 23:28:54 -08:00
David Rientjes
6a108a14fa kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.

This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel.  A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).

Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.

Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <holt@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
Eric Dumazet
f2eda47df4 ipv6: raw: rcu annotations
Remove sparse warnings, using a function typedef to be able to use __rcu
annotation on mh_filter pointer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:34 -08:00
Eric Dumazet
6193d2be29 neigh: __rcu annotations
fix some minor issues and sparse (__rcu) warnings

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:34 -08:00
Eric Dumazet
753ea8e962 net: ipv6: sit: fix rcu annotations
Fix minor __rcu annotations and remove sparse warnings

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:33 -08:00
Eric Dumazet
a2da570d62 net_sched: RCU conversion of stab
This patch converts stab qdisc management to RCU, so that we can perform
the qdisc_calculate_pkt_len() call before getting qdisc lock.

This shortens the lock's held time in __dev_xmit_skb().

This permits more qdiscs to get TCQ_F_CAN_BYPASS status, avoiding lot of
cache misses and so reducing latencies.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:32 -08:00
Eric Dumazet
fd245a4adb net_sched: move TCQ_F_THROTTLED flag
In commit 3711210576 (net: QDISC_STATE_RUNNING dont need atomic bit
ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
the cache line containing qdisc lock and often dirtied fields.

I now move TCQ_F_THROTTLED bit too, so that we let first cache line read
mostly, and shared by all cpus. This should speedup HTB/CBQ for example.

Not using test_bit()/__clear_bit()/__test_and_set_bit allows to use an
"unsigned int" for __state container, reducing by 8 bytes Qdisc size.

Introduce helpers to hide implementation details.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:32 -08:00
Eric Dumazet
817fb15dfd net_sched: sfq: allow divisor to be a parameter
SFQ currently uses a 1024 slots hash table, and its internal structure
(sfq_sched_data) allocation needs order-1 page on x86_64

Allow tc command to specify a divisor value (hash table size), between 1
and 65536.
If no value is provided, assume the 1024 default size.

This allows admins to setup smaller (or bigger) SFQ for specific needs.

This also brings back sfq_sched_data allocations to order-0 ones, saving
3KB per SFQ qdisc.

Jesper uses ~55.000 SFQ in one machine, this patch should free 165 MB of
memory.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:59:16 -08:00
Eric Dumazet
3fbd8758b0 net: dev_close_many() is static
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Octavian Purdila <opurdila@ixiacom.com>
Reviewed-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-20 16:55:30 -08:00
Eric Dumazet
bced94ed5e netfilter: add a missing include in nf_conntrack_reasm.c
After commit ae90bdeaea (netfilter: fix compilation when conntrack is
disabled but tproxy is enabled) we have following warnings :

net/ipv6/netfilter/nf_conntrack_reasm.c:520:16: warning: symbol
'nf_ct_frag6_gather' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:591:6: warning: symbol
'nf_ct_frag6_output' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:612:5: warning: symbol
'nf_ct_frag6_init' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:640:6: warning: symbol
'nf_ct_frag6_cleanup' was not declared. Should it be static?

Fix this including net/netfilter/ipv6/nf_defrag_ipv6.h

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-20 21:00:38 +01:00
Changli Gao
41a7cab6d3 netfilter: nf_nat: place conntrack in source hash after SNAT is done
If SNAT isn't done, the wrong info maybe got by the other cts.

As the filter table is after DNAT table, the packets dropped in filter
table also bother bysource hash table.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-20 15:49:52 +01:00
Patrick McHardy
82d800d8e7 Merge branch 'connlimit' of git://dev.medozas.de/linux
Conflicts:
	Documentation/feature-removal-schedule.txt

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-20 10:33:55 +01:00
Florian Westphal
28a51ba59a netfilter: do not omit re-route check on NF_QUEUE verdict
ret != NF_QUEUE only works in the "--queue-num 0" case; for
queues > 0 the test should be '(ret & NF_VERDICT_MASK) != NF_QUEUE'.

However, NF_QUEUE no longer DROPs the skb unconditionally if queueing
fails (due to NF_VERDICT_FLAG_QUEUE_BYPASS verdict flag), so the
re-route test should also be performed if this flag is set in the
verdict.

The full test would then look something like

&& ((ret & NF_VERDICT_MASK) == NF_QUEUE && (ret & NF_VERDICT_FLAG_QUEUE_BYPASS))

This is rather ugly, so just remove the NF_QUEUE test altogether.

The only effect is that we might perform an unnecessary route lookup
in the NF_QUEUE case.

ip6table_mangle did not have such a check.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-20 10:23:26 +01:00
David S. Miller
a07aa004c8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2011-01-20 00:06:15 -08:00
Eric Dumazet
cc7ec456f8 net_sched: cleanups
Cleanup net/sched code to current CodingStyle and practices.

Reduce inline abuse

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:12 -08:00
Alban Crequy
7180a03118 af_unix: coding style: remove one level of indentation in unix_shutdown()
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:11 -08:00
John Fastabend
b8970f0bfc net_sched: implement a root container qdisc sch_mqprio
This implements a mqprio queueing discipline that by default creates
a pfifo_fast qdisc per tx queue and provides the needed configuration
interface.

Using the mqprio qdisc the number of tcs currently in use along
with the range of queues alloted to each class can be configured. By
default skbs are mapped to traffic classes using the skb priority.
This mapping is configurable.

Configurable parameters,

struct tc_mqprio_qopt {
	__u8    num_tc;
	__u8    prio_tc_map[TC_BITMASK + 1];
	__u8    hw;
	__u16   count[TC_MAX_QUEUE];
	__u16   offset[TC_MAX_QUEUE];
};

Here the count/offset pairing give the queue alignment and the
prio_tc_map gives the mapping from skb->priority to tc.

The hw bit determines if the hardware should configure the count
and offset values. If the hardware bit is set then the operation
will fail if the hardware does not implement the ndo_setup_tc
operation. This is to avoid undetermined states where the hardware
may or may not control the queue mapping. Also minimal bounds
checking is done on the count/offset to verify a queue does not
exceed num_tx_queues and that queue ranges do not overlap. Otherwise
it is left to user policy or hardware configuration to create
useful mappings.

It is expected that hardware QOS schemes can be implemented by
creating appropriate mappings of queues in ndo_tc_setup().

One expected use case is drivers will use the ndo_setup_tc to map
queue ranges onto 802.1Q traffic classes. This provides a generic
mechanism to map network traffic onto these traffic classes and
removes the need for lower layer drivers to know specifics about
traffic types.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:11 -08:00
John Fastabend
4f57c087de net: implement mechanism for HW based QOS
This patch provides a mechanism for lower layer devices to
steer traffic using skb->priority to tx queues. This allows
for hardware based QOS schemes to use the default qdisc without
incurring the penalties related to global state and the qdisc
lock. While reliably receiving skbs on the correct tx ring
to avoid head of line blocking resulting from shuffling in
the LLD. Finally, all the goodness from txq caching and xps/rps
can still be leveraged.

Many drivers and hardware exist with the ability to implement
QOS schemes in the hardware but currently these drivers tend
to rely on firmware to reroute specific traffic, a driver
specific select_queue or the queue_mapping action in the
qdisc.

By using select_queue for this drivers need to be updated for
each and every traffic type and we lose the goodness of much
of the upstream work. Firmware solutions are inherently
inflexible. And finally if admins are expected to build a
qdisc and filter rules to steer traffic this requires knowledge
of how the hardware is currently configured. The number of tx
queues and the queue offsets may change depending on resources.
Also this approach incurs all the overhead of a qdisc with filters.

With the mechanism in this patch users can set skb priority using
expected methods ie setsockopt() or the stack can set the priority
directly. Then the skb will be steered to the correct tx queues
aligned with hardware QOS traffic classes. In the normal case with
single traffic class and all queues in this class everything
works as is until the LLD enables multiple tcs.

To steer the skb we mask out the lower 4 bits of the priority
and allow the hardware to configure upto 15 distinct classes
of traffic. This is expected to be sufficient for most applications
at any rate it is more then the 8021Q spec designates and is
equal to the number of prio bands currently implemented in
the default qdisc.

This in conjunction with a userspace application such as
lldpad can be used to implement 8021Q transmission selection
algorithms one of these algorithms being the extended transmission
selection algorithm currently being used for DCB.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:10 -08:00
Vlad Dogaru
e7ed828f10 netlink: support setting devgroup parameters
If a rtnetlink request specifies a negative or zero ifindex and has no
interface name attribute, but has a group attribute, then the chenges
are made to all the interfaces belonging to the specified group.

Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:10 -08:00
Vlad Dogaru
cbda10fa97 net_device: add support for network device groups
Net devices can now be grouped, enabling simpler manipulation from
userspace. This patch adds a group field to the net_device structure, as
well as rtnetlink support to query and modify it.

Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:31:09 -08:00
Shan Wei
441c793a56 net: cleanup unused macros in net directory
Clean up some unused macros in net/*.
1. be left for code change. e.g. PGV_FROM_VMALLOC, PGV_FROM_VMALLOC, KMEM_SAFETYZONE.
2. never be used since introduced to kernel.
   e.g. P9_RDMA_MAX_SGE, UTIL_CTRL_PKT_SIZE.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 23:20:04 -08:00
Linus Torvalds
1268afe676 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (41 commits)
  sctp: user perfect name for Delayed SACK Timer option
  net: fix can_checksum_protocol() arguments swap
  Revert "netlink: test for all flags of the NLM_F_DUMP composite"
  gianfar: Fix misleading indentation in startup_gfar()
  net/irda/sh_irda: return to RX mode when TX error
  net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan.
  USB CDC NCM: tx_fixup() race condition fix
  ns83820: Avoid bad pointer deref in ns83820_init_one().
  ipv6: Silence privacy extensions initialization
  bnx2x: Update bnx2x version to 1.62.00-4
  bnx2x: Fix AER setting for BCM57712
  bnx2x: Fix BCM84823 LED behavior
  bnx2x: Mark full duplex on some external PHYs
  bnx2x: Fix BCM8073/BCM8727 microcode loading
  bnx2x: LED fix for BCM8727 over BCM57712
  bnx2x: Common init will be executed only once after POR
  bnx2x: Swap BCM8073 PHY polarity if required
  iwlwifi: fix valid chain reading from EEPROM
  ath5k: fix locking in tx_complete_poll_work
  ath9k_hw: do PA offset calibration only on longcal interval
  ...
2011-01-19 20:25:45 -08:00
Shan Wei
4580ccc04d sctp: user perfect name for Delayed SACK Timer option
The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK,
not SCTP_DELAYED_ACK.

Left SCTP_DELAYED_ACK be concomitant with SCTP_DELAYED_SACK,
for making compatibility with existing applications.

Reference:
8.1.19.  Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK)
(http://tools.ietf.org/html/draft-ietf-tsvwg-sctpsocket-25)

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 16:51:29 -08:00
Patrick McHardy
14f0290ba4 Merge branch 'master' of /repos/git/net-next-2.6 2011-01-19 23:51:37 +01:00
Eric Dumazet
d402786ea4 net: fix can_checksum_protocol() arguments swap
commit 0363466866 (net offloading: Convert checksums to use
centrally computed features.) mistakenly swapped can_checksum_protocol()
arguments.

This broke IPv6 on bnx2 for instance, on NIC without TCPv6 checksum
offloads.

Reported-by: Hans de Bruin <jmdebruin@xmsnet.nl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 14:15:21 -08:00
David S. Miller
b8f3ab4290 Revert "netlink: test for all flags of the NLM_F_DUMP composite"
This reverts commit 0ab03c2b14.

It breaks several things including the avahi daemon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19 13:34:20 -08:00
Patrick McHardy
f5c88f56b3 netfilter: nf_conntrack: fix lifetime display for disabled connections
When no tstamp extension exists, ct_delta_time() returns -1, which is
then assigned to an u64 and tested for negative values to decide
whether to display the lifetime. This obviously doesn't work, use
a s64 and merge the two minor functions into one.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-19 19:10:49 +01:00
Jan Engelhardt
cc4fc02257 netfilter: xtables: connlimit revision 1
This adds destination address-based selection. The old "inverse"
member is overloaded (memory-wise) with a new "flags" variable,
similar to how J.Park did it with xt_string rev 1. Since revision 0
userspace only sets flag 0x1, no great changes are made to explicitly
test for different revisions.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2011-01-19 18:27:46 +01:00
Johan Hedberg
765c2a964b Bluetooth: Fix race condition with conn->sec_level
The conn->sec_level value is supposed to represent the current level of
security that the connection has. However, by assigning to it before
requesting authentication it will have the wrong value during the
authentication procedure. To fix this a pending_sec_level variable is
added which is used to track the desired security level while making
sure that sec_level always represents the current level of security.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:43:11 -02:00
Johan Hedberg
d00ef24fc2 Bluetooth: Fix authentication request for L2CAP raw sockets
When there is an existing connection l2cap_check_security needs to be
called to ensure that the security level of the new socket is fulfilled.
Normally l2cap_do_start takes care of this, but that function doesn't
get called for SOCK_RAW type sockets. This patch adds the necessary
l2cap_check_security call to the appropriate branch in l2cap_do_connect.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:43 -02:00
Johan Hedberg
8556edd32f Bluetooth: Create a unified auth_type evaluation function
The logic for determining the needed auth_type for an L2CAP socket is
rather complicated and has so far been duplicated in
l2cap_check_security as well as l2cap_do_connect. Additionally the
l2cap_check_security code was completely missing the handling of
SOCK_RAW type sockets. This patch creates a unified function for the
evaluation and makes l2cap_do_connect and l2cap_check_security use that
function.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:43 -02:00
Johan Hedberg
65cf686ee1 Bluetooth: Fix MITM protection requirement preservation
If an existing connection has a MITM protection requirement (the first
bit of the auth_type) then that requirement should not be cleared by new
sockets that reuse the ACL but don't have that requirement.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:43 -02:00
Johan Hedberg
88644bb9fe Revert "Bluetooth: Update sec_level/auth_type for already existing connections"
This reverts commit 045309820a. That
commit is wrong for two reasons:

- The conn->sec_level shouldn't be updated without performing
authentication first (as it's supposed to represent the level of
security that the existing connection has)

- A higher auth_type value doesn't mean "more secure" like the commit
seems to assume. E.g. dedicated bonding with MITM protection is 0x03
whereas general bonding without MITM protection is 0x04. hci_conn_auth
already takes care of updating conn->auth_type so hci_connect doesn't
need to do it.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:42 -02:00
Lukáš Turek
683d949a7f Bluetooth: Never deallocate a session when some DLC points to it
Fix a bug introduced in commit 9cf5b0ea3a:
function rfcomm_recv_ua calls rfcomm_session_put without checking that
the session is not referenced by some DLC. If the session is freed, that
DLC would refer to deallocated memory, causing an oops later, as shown
in this bug report: https://bugzilla.kernel.org/show_bug.cgi?id=15994

Signed-off-by: Lukas Turek <8an@praha12.net>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:42 -02:00
Johan Hedberg
e2e0cacbd4 Bluetooth: Fix leaking blacklist when unregistering a hci device
The blacklist should be freed before the hci device gets unregistered.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:42 -02:00
David Sterba
4571928fc7 Bluetooth: l2cap: fix misuse of logical operation in place of bitop
CC: Marcel Holtmann <marcel@holtmann.org>
CC: "Gustavo F. Padovan" <padovan@profusion.mobi>
CC: João Paulo Rechi Vita <jprvita@profusion.mobi>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-01-19 14:40:42 -02:00
Felix Fietkau
fbb327c594 mac80211: drop non-auth 3-addr data frames when running as a 4-addr station
When running as a 4-addr station against an AP that has the 4-addr VLAN
interface and the main 3-addr AP interface bridged together, sometimes
frames originating from the station were looping back from the 3-addr AP
interface, causing the bridge code to emit warnings about receiving frames
with its own source address.
I'm not sure why this is happening yet, but I think it's a good idea to
drop all frames (except 802.1x/EAP frames) that do not match the configured
addressing mode, including 4-address frames sent to a 3-address station.
User test reports indicate that the problem goes away with this patch.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:12 -05:00
Johannes Berg
5dd36bc933 mac80211: allow advertising correct maximum aggregate size
Currently, mac80211 always advertises that it may send
up to 64 subframes in an aggregate. This is fine, since
it's the max, but might as well be set to zero instead
since it doesn't have any information.

However, drivers might have that information, so allow
them to set a variable giving it, which will then be
used. The default of zero will be fine since to the
peer that means we don't know and it will just use its
own limit for the buffer size.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:12 -05:00
Johannes Berg
0b01f030d3 mac80211: track receiver's aggregation reorder buffer size
The aggregation code currently doesn't implement the
buffer size negotiation. It will always request a max
buffer size (which is fine, if a little pointless, as
the mac80211 code doesn't know and might just use 0
instead), but if the peer requests a smaller size it
isn't possible to honour this request.

In order to fix this, look at the buffer size in the
addBA response frame, keep track of it and pass it to
the driver in the ampdu_action callback when called
with the IEEE80211_AMPDU_TX_OPERATIONAL action. That
way the driver can limit the number of subframes in
aggregates appropriately.

Note that this doesn't fix any drivers apart from the
addition of the new argument -- they all need to be
updated separately to use this variable!

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:11 -05:00
Johannes Berg
ac1bd8464f mac80211: don't return beacons when mesh is disabled
When mesh is disabled, mac80211 was returning
beacons with an empty mesh ID. That isn't
desirable, even if drivers shouldn't be trying
to get beacons to start with.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:11 -05:00
Ben Greear
bfc31df33b mac80211: Show max retry-counts in kernel messages.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:09 -05:00
Wey-Yi Guy
0a65169b1f mac80211: mesh only parameter mppath maybe unused
mppath is mesh related parameter and maybe unused

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:09 -05:00
Luciano Coelho
df6ba5d80d mac80211: add hw configuration for max ampdu buffer size
Some devices don't support the maximum AMDPU buffer size of 64, so we
need to add an option to configure this in the hardware configuration.
This value will be used in the ADDBA response instead of the value
suggested in the request, if the latter is greater than the max
supported.

Signed-off-by: Luciano Coelho <coelho@ti.com>
Tested-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:09 -05:00
Nick Ledovskikh
dcac908bab mac80211:mesh_mpp_table_grow call should depend on MESH_WORK_GROW_MPP_TABLE flag.
Replace MESH_WORK_GROW_MPATH_TABLE by MESH_WORK_GROW_MPP_TABLE in
mesh_mpp_table_grow call condition.

(Clearly the original was a typo... -- JWL)

Signed-off-by: Nickolay Ledovskikh <nledovskikh@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:08 -05:00
Joel A Fernandes
9d52501b42 mac80211: Rewrote code for checking if destinations are proxied.
Rewrote code for checking if the destination is proxied by a mesh portal, to facilitate better
understanding of the functionality.

Signed-off-by: Joel A Fernandes <agnel.joel@gmail.com>
Acked-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-19 11:36:07 -05:00
Pablo Neira Ayuso
a992ca2a04 netfilter: nf_conntrack_tstamp: add flow-based timestamp extension
This patch adds flow-based timestamping for conntracks. This
conntrack extension is disabled by default. Basically, we use
two 64-bits variables to store the creation timestamp once the
conntrack has been confirmed and the other to store the deletion
time. This extension is disabled by default, to enable it, you
have to:

echo 1 > /proc/sys/net/netfilter/nf_conntrack_timestamp

This patch allows to save memory for user-space flow-based
loogers such as ulogd2. In short, ulogd2 does not need to
keep a hashtable with the conntrack in user-space to know
when they were created and destroyed, instead we use the
kernel timestamp. If we want to have a sane IPFIX implementation
in user-space, this nanosecs resolution timestamps are also
useful. Other custom user-space applications can benefit from
this via libnetfilter_conntrack.

This patch modifies the /proc output to display the delta time
in seconds since the flow start. You can also obtain the
flow-start date by means of the conntrack-tools.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-19 16:00:07 +01:00
Jesper Juhl
42b16b3fbb Kill off warning: ‘inline’ is not at beginning of declaration
Fix a bunch of
	warning: ‘inline’ is not at beginning of declaration
messages when building a 'make allyesconfig' kernel with -Wextra.

These warnings are trivial to kill, yet rather annoying when building with
-Wextra.
The more we can cut down on pointless crap like this the better (IMHO).

A previous patch to do this for a 'allnoconfig' build has already been
merged. This just takes the cleanup a little further.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-01-19 15:43:08 +01:00
Eric Dumazet
80f8f1027b net: filter: dont block softirqs in sk_run_filter()
Packet filter (BPF) doesnt need to disable softirqs, being fully
re-entrant and lock-less.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-18 21:33:05 -08:00
Alban Crequy
d6ae3bae3d af_unix: implement socket filter
Linux Socket Filters can already be successfully attached and detached on unix
sockets with setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...).
See: Documentation/networking/filter.txt

But the filter was never used in the unix socket code so it did not work. This
patch uses sk_filter() to filter buffers before delivery.

This short program demonstrates the problem on SOCK_DGRAM.

int main(void) {
  int i, j, ret;
  int sv[2];
  struct pollfd fds[2];
  char *message = "Hello world!";
  char buffer[64];
  struct sock_filter ins[32] = {{0,},};
  struct sock_fprog filter;

  socketpair(AF_UNIX, SOCK_DGRAM, 0, sv);

  for (i = 0 ; i < 2 ; i++) {
    fds[i].fd = sv[i];
    fds[i].events = POLLIN;
    fds[i].revents = 0;
  }

  for(j = 1 ; j < 13 ; j++) {

    /* Set a socket filter to truncate the message */
    memset(ins, 0, sizeof(ins));
    ins[0].code = BPF_RET|BPF_K;
    ins[0].k = j;
    filter.len = 1;
    filter.filter = ins;
    setsockopt(sv[1], SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter));

    /* send a message */
    send(sv[0], message, strlen(message) + 1, 0);

    /* The filter should let the message pass but truncated. */
    poll(fds, 2, 0);

    /* Receive the truncated message*/
    ret = recv(sv[1], buffer, 64, 0);
    printf("received %d bytes, expected %d\n", ret, j);
  }

    for (i = 0 ; i < 2 ; i++)
      close(sv[i]);

  return 0;
}

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-18 21:33:05 -08:00
David S. Miller
a5db219f4c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-01-18 16:28:31 -08:00
Jesse Gross
6ee400aafb net offloading: Do not mask out NETIF_F_HW_VLAN_TX for vlan.
In netif_skb_features() we return only the features that are valid for vlans
if we have a vlan packet.  However, we should not mask out NETIF_F_HW_VLAN_TX
since it enables transmission of vlan tags and is obviously valid.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-18 16:13:50 -08:00
Romain Francoise
2fdc1c8093 ipv6: Silence privacy extensions initialization
When a network namespace is created (via CLONE_NEWNET), the loopback
interface is automatically added to the new namespace, triggering a
printk in ipv6_add_dev() if CONFIG_IPV6_PRIVACY is set.

This is problematic for applications which use CLONE_NEWNET as
part of a sandbox, like Chromium's suid sandbox or recent versions of
vsftpd. On a busy machine, it can lead to thousands of useless
"lo: Disabled Privacy Extensions" messages appearing in dmesg.

It's easy enough to check the status of privacy extensions via the
use_tempaddr sysctl, so just removing the printk seems like the most
sensible solution.

Signed-off-by: Romain Francoise <romain@orebokech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-18 16:13:49 -08:00
David S. Miller
f966a13f92 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-01-18 12:50:19 -08:00
Jiri Olsa
93557f53e1 netfilter: nf_conntrack: nf_conntrack snmp helper
Adding support for SNMP broadcast connection tracking. The SNMP
broadcast requests are now paired with the SNMP responses.
Thus allowing using SNMP broadcasts with firewall enabled.

Please refer to the following conversation:
http://marc.info/?l=netfilter-devel&m=125992205006600&w=2

Patrick McHardy wrote:
> > The best solution would be to add generic broadcast tracking, the
> > use of expectations for this is a bit of abuse.
> > The second best choice I guess would be to move the help() function
> > to a shared module and generalize it so it can be used for both.
This patch implements the "second best choice".

Since the netbios-ns conntrack module uses the same helper
functionality as the snmp, only one helper function is added
for both snmp and netbios-ns modules into the new object -
nf_conntrack_broadcast.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 18:12:24 +01:00
Eric Dumazet
94d117a1c7 netfilter: ipt_CLUSTERIP: remove "no conntrack!"
When a packet is meant to be handled by another node of the cluster,
silently drop it instead of flooding kernel log.

Note : INVALID packets are also dropped without notice.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 16:27:56 +01:00
Patrick McHardy
a8fc0d9b34 Merge branch 'master' of git://dev.medozas.de/linux 2011-01-18 16:20:53 +01:00
Florian Westphal
94b27cc361 netfilter: allow NFQUEUE bypass if no listener is available
If an skb is to be NF_QUEUE'd, but no program has opened the queue, the
packet is dropped.

This adds a v2 target revision of xt_NFQUEUE that allows packets to
continue through the ruleset instead.

Because the actual queueing happens outside of the target context, the
'bypass' flag has to be communicated back to the netfilter core.

Unfortunately the only choice to do this without adding a new function
argument is to use the target function return value (i.e. the verdict).

In the NF_QUEUE case, the upper 16bit already contain the queue number
to use.  The previous patch reduced NF_VERDICT_MASK to 0xff, i.e.
we now have extra room for a new flag.

If a hook issued a NF_QUEUE verdict, then the netfilter core will
continue packet processing if the queueing hook
returns -ESRCH (== "this queue does not exist") and the new
NF_VERDICT_FLAG_QUEUE_BYPASS flag is set in the verdict value.

Note: If the queue exists, but userspace does not consume packets fast
enough, the skb will still be dropped.

Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 16:08:30 +01:00
Florian Westphal
f615df76ed netfilter: reduce NF_VERDICT_MASK to 0xff
NF_VERDICT_MASK is currently 0xffff. This is because the upper
16 bits are used to store errno (for NF_DROP) or the queue number
(NF_QUEUE verdict).

As there are up to 0xffff different queues available, there is no more
room to store additional flags.

At the moment there are only 6 different verdicts, i.e. we can reduce
NF_VERDICT_MASK to 0xff to allow storing additional flags in the 0xff00 space.

NF_VERDICT_BITS would then be reduced to 8, but because the value is
exported to userspace, this might cause breakage; e.g.:

e.g. 'queuenr = (1 << NF_VERDICT_BITS) | NF_QUEUE'  would now break.

Thus, remove NF_VERDICT_BITS usage in the kernel and move the old value
to the 'userspace compat' section.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 15:52:14 +01:00
Florian Westphal
06cdb6349c netfilter: nfnetlink_queue: do not free skb on error
Move free responsibility from nf_queue to caller.
This enables more flexible error handling; we can now accept the skb
instead of freeing it.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 15:28:38 +01:00
Florian Westphal
f158508618 netfilter: nfnetlink_queue: return error number to caller
instead of returning -1 on error, return an error number to allow the
caller to handle some errors differently.

ECANCELED is used to indicate that the hook is going away and should be
ignored.

A followup patch will introduce more 'ignore this hook' conditions,
(depending on queue settings) and will move kfree_skb responsibility
to the caller.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 15:27:28 +01:00
Florian Westphal
5f2cafe736 netfilter: Kconfig: NFQUEUE is useless without NETFILTER_NETLINK_QUEUE
NFLOG already does the same thing for NETFILTER_NETLINK_LOG.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 15:18:08 +01:00
Changli Gao
45eec34195 netfilter: nf_conntrack: remove an atomic bit operation
As this ct won't be seen by the others, we don't need to set the
IPS_CONFIRMED_BIT in atomic way.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 15:08:13 +01:00
Changli Gao
a7c2f4d7da netfilter: nf_nat: fix conversion to non-atomic bit ops
My previous patch (netfilter: nf_nat: don't use atomic bit operation)
made a mistake when converting atomic_set to a normal bit 'or'.
IPS_*_BIT should be replaced with IPS_*.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-18 15:02:48 +01:00
Richard Weinberger
1cc34c30be netfilter: xt_connlimit: use hotdrop jump mark
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2011-01-18 06:50:41 +01:00
Jan Engelhardt
f1e231a356 netfilter: xtables: add missing aliases for autoloading via iptables
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2011-01-18 06:33:54 +01:00
Thomas Graf
fbabf31e4d netfilter: create audit records for x_tables replaces
The setsockopt() syscall to replace tables is already recorded
in the audit logs. This patch stores additional information
such as table name and netfilter protocol.

Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-16 18:12:59 +01:00
Thomas Graf
43f393caec netfilter: audit target to record accepted/dropped packets
This patch adds a new netfilter target which creates audit records
for packets traversing a certain chain.

It can be used to record packets which are rejected administraively
as follows:

  -N AUDIT_DROP
  -A AUDIT_DROP -j AUDIT --type DROP
  -A AUDIT_DROP -j DROP

a rule which would typically drop or reject a packet would then
invoke the new chain to record packets before dropping them.

  -j AUDIT_DROP

The module is protocol independant and works for iptables, ip6tables
and ebtables.

The following information is logged:
 - netfilter hook
 - packet length
 - incomming/outgoing interface
 - MAC src/dst/proto for ethernet packets
 - src/dst/protocol address for IPv4/IPv6
 - src/dst port for TCP/UDP/UDPLITE
 - icmp type/code

Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-16 18:10:28 +01:00
Dan Carpenter
01a859014b caif: checking the wrong variable
In the original code we check if (servl == NULL) twice.  The first time
should print the message that cfmuxl_remove_uplayer() failed and set
"ret" correctly, but instead it just returns success.  The second check
should be checking the value of "ret" instead of "servl".

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-15 20:58:11 -08:00
Kurt Van Dijck
5e50732803 can: test size of struct sockaddr in sendmsg
This patch makes the CAN socket code conform to the manpage of sendmsg.

Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-15 20:56:42 -08:00
David S. Miller
d78c68efa8 Merge branch 'for-david' of git://git.open-mesh.org/ecsv/linux-merge 2011-01-15 20:48:28 -08:00
Sven Eckelmann
aa0adb1a85 batman-adv: Use "__attribute__" shortcut macros
Linux 2.6.21 defines different macros for __attribute__ which are also
used inside batman-adv. The next version of checkpatch.pl warns about
the usage of __attribute__((packed))).

Linux 2.6.33 defines an extra macro __always_unused which is used to
assist source code analyzers and can be used to removed the last
existing __attribute__ inside the source code.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-16 03:25:19 +01:00
Linus Torvalds
d018b6f4f1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
  GRETH: resolve SMP issues and other problems
  GRETH: handle frame error interrupts
  GRETH: avoid writing bad speed/duplex when setting transfer mode
  GRETH: fixed skb buffer memory leak on frame errors
  GRETH: GBit transmit descriptor handling optimization
  GRETH: fix opening/closing
  GRETH: added raw AMBA vendor/device number to match against.
  cassini: Fix build bustage on x86.
  e1000e: consistent use of Rx/Tx vs. RX/TX/rx/tx in comments/logs
  e1000e: update Copyright for 2011
  e1000: Avoid unhandled IRQ
  r8169: keep firmware in memory.
  netdev: tilepro: Use is_unicast_ether_addr helper
  etherdevice.h: Add is_unicast_ether_addr function
  ks8695net: Use default implementation of ethtool_ops::get_link
  ks8695net: Disable non-working ethtool operations
  USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable.
  vxge: Remember to release firmware after upgrading firmware
  netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr
  ipsec: update MAX_AH_AUTH_LEN to support sha512
  ...
2011-01-14 13:25:30 -08:00
Linus Torvalds
18bce371ae Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits)
  nfsd4: fix callback restarting
  nfsd: break lease on unlink, link, and rename
  nfsd4: break lease on nfsd setattr
  nfsd: don't support msnfs export option
  nfsd4: initialize cb_per_client
  nfsd4: allow restarting callbacks
  nfsd4: simplify nfsd4_cb_prepare
  nfsd4: give out delegations more quickly in 4.1 case
  nfsd4: add helper function to run callbacks
  nfsd4: make sure sequence flags are set after destroy_session
  nfsd4: re-probe callback on connection loss
  nfsd4: set sequence flag when backchannel is down
  nfsd4: keep finer-grained callback status
  rpc: allow xprt_class->setup to return a preexisting xprt
  rpc: keep backchannel xprt as long as server connection
  rpc: move sk_bc_xprt to svc_xprt
  nfsd4: allow backchannel recovery
  nfsd4: support BIND_CONN_TO_SESSION
  nfsd4: modify session list under cl_lock
  Documentation: fl_mylease no longer exists
  ...

Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work.  The
vfs-scale work touched some msnfs cases, and this merge removes support
for that entirely, so the conflict was trivial to resolve.
2011-01-14 13:17:26 -08:00
Tejun Heo
e1fcc7e2a7 rxrpc: rxrpc_workqueue isn't used during memory reclaim
rxrpc_workqueue isn't depended upon while reclaiming memory.  Convert
to alloc_workqueue() without WQ_MEM_RECLAIM.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: linux-afs@lists.infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-14 09:25:11 -08:00
Patrick McHardy
d862a6622e netfilter: nf_conntrack: use is_vmalloc_addr()
Use is_vmalloc_addr() in nf_ct_free_hashtable() and get rid of
the vmalloc flags to indicate that a hash table has been allocated
using vmalloc().

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-14 15:45:56 +01:00
Patrick McHardy
0134e89c7b Merge branch 'master' of git://1984.lsi.us.es/net-next-2.6
Conflicts:
	net/ipv4/route.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-14 14:12:37 +01:00
Patrick McHardy
c7066f70d9 netfilter: fix Kconfig dependencies
Fix dependencies of netfilter realm match: it depends on NET_CLS_ROUTE,
which itself depends on NET_SCHED; this dependency is missing from netfilter.

Since matching on realms is also useful without having NET_SCHED enabled and
the option really only controls whether the tclassid member is included in
route and dst entries, rename the config option to IP_ROUTE_CLASSID and move
it outside of traffic scheduling context to get rid of the NET_SCHED dependeny.

Reported-by: Vladis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-14 13:36:42 +01:00
Eric Dumazet
1ac9ad1394 net: remove dev_txq_stats_fold()
After recent changes, (percpu stats on vlan/tunnels...), we dont need
anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.

Only remaining users are ixgbe, sch_teql, gianfar & macvlan :

1) ixgbe can be converted to use existing tx_ring counters.

2) macvlan incremented txq->tx_dropped, it can use the
dev->stats.tx_dropped counter.

3) sch_teql : almost revert ab35cd4b8f (Use net_device internal stats)
    Now we have ndo_get_stats64(), use it, even for "unsigned long"
fields (No need to bring back a struct net_device_stats)

4) gianfar adds a stats structure per tx queue to hold
tx_bytes/tx_packets

This removes a lockdep warning (and possible lockup) in rndis gadget,
calling dev_get_stats() from hard IRQ context.

Ref: http://www.spinics.net/lists/netdev/msg149202.html

Reported-by: Neil Jones <neiljay@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Sandeep Gopalpet <sandeep.kumar@freescale.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-13 21:44:34 -08:00
Jesper Juhl
ed7809d9c4 batman-adv: Even Batman should not dereference NULL pointers
There's a problem in net/batman-adv/unicast.c::frag_send_skb().
dev_alloc_skb() allocates memory and may fail, thus returning NULL. If
this happens we'll pass a NULL pointer on to skb_split() which in turn
hands it to skb_split_inside_header() from where it gets passed to
skb_put() that lets skb_tail_pointer() play with it and that function
dereferences it. And thus the bat dies.

While I was at it I also moved the call to dev_alloc_skb() above the
assignment to 'unicast_packet' since there's no reason to do that
assignment if the memory allocation fails.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-13 22:11:12 +01:00
Luciano Coelho
82694f764d mac80211: use maximum number of AMPDU frames as default in BA RX
When the buffer size is set to zero in the block ack parameter set
field, we should use the maximum supported number of subframes.  The
existing code was bogus and was doing some unnecessary calculations
that lead to wrong values.

Thanks Johannes for helping me figure this one out.

Cc: stable@kernel.org
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luciano Coelho <coelho@ti.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-13 15:46:45 -05:00
Johannes Berg
681c4d07dd mac80211: fix lockdep warning
Since the introduction of the fixes for the
reorder timer, mac80211 will cause lockdep
warnings because lockdep confuses
local->skb_queue and local->rx_skb_queue
and treats their lock as the same.

However, their locks are different, and are
valid in different contexts (the former is
used in IRQ context, the latter in BH only)
and the only thing to be done is mark the
former as a different lock class so that
lockdep can tell the difference.

Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Sujith <m.sujith@gmail.com>
Reported-by: Miles Lane <miles.lane@gmail.com>
Tested-by: Sujith <m.sujith@gmail.com>
Tested-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-13 15:46:45 -05:00
David S. Miller
1949e084bf Merge branch 'master' of git://1984.lsi.us.es/net-2.6 2011-01-13 12:34:21 -08:00
Linus Torvalds
b2034d474b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits)
  fs: add documentation on fallocate hole punching
  Gfs2: fail if we try to use hole punch
  Btrfs: fail if we try to use hole punch
  Ext4: fail if we try to use hole punch
  Ocfs2: handle hole punching via fallocate properly
  XFS: handle hole punching via fallocate properly
  fs: add hole punching to fallocate
  vfs: pass struct file to do_truncate on O_TRUNC opens (try #2)
  fix signedness mess in rw_verify_area() on 64bit architectures
  fs: fix kernel-doc for dcache::prepend_path
  fs: fix kernel-doc for dcache::d_validate
  sanitize ecryptfs ->mount()
  switch afs
  move internal-only parts of ncpfs headers to fs/ncpfs
  switch ncpfs
  switch 9p
  pass default dentry_operations to mount_pseudo()
  switch hostfs
  switch affs
  switch configfs
  ...
2011-01-13 10:27:28 -08:00
Linus Torvalds
27d189c02b Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (46 commits)
  hwrng: via_rng - Fix memory scribbling on some CPUs
  crypto: padlock - Move padlock.h into include/crypto
  hwrng: via_rng - Fix asm constraints
  crypto: n2 - use __devexit not __exit in n2_unregister_algs
  crypto: mark crypto workqueues CPU_INTENSIVE
  crypto: mv_cesa - dont return PTR_ERR() of wrong pointer
  crypto: ripemd - Set module author and update email address
  crypto: omap-sham - backlog handling fix
  crypto: gf128mul - Remove experimental tag
  crypto: af_alg - fix af_alg memory_allocated data type
  crypto: aesni-intel - Fixed build with binutils 2.16
  crypto: af_alg - Make sure sk_security is initialized on accept()ed sockets
  net: Add missing lockdep class names for af_alg
  include: Install linux/if_alg.h for user-space crypto API
  crypto: omap-aes - checkpatch --file warning fixes
  crypto: omap-aes - initialize aes module once per request
  crypto: omap-aes - unnecessary code removed
  crypto: omap-aes - error handling implementation improved
  crypto: omap-aes - redundant locking is removed
  crypto: omap-aes - DMA initialization fixes for OMAP off mode
  ...
2011-01-13 10:25:58 -08:00
Linus Torvalds
a170315420 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: fix cleanup when trying to mount inexistent image
  net/ceph: make ceph_msgr_wq non-reentrant
  ceph: fsc->*_wq's aren't used in memory reclaim path
  ceph: Always free allocated memory in osdmap_decode()
  ceph: Makefile: Remove unnessary code
  ceph: associate requests with opening sessions
  ceph: drop redundant r_mds field
  ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS
  ceph: add dir_layout to inode
2011-01-13 10:25:24 -08:00
Linus Torvalds
008d23e485 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  Documentation/trace/events.txt: Remove obsolete sched_signal_send.
  writeback: fix global_dirty_limits comment runtime -> real-time
  ppc: fix comment typo singal -> signal
  drivers: fix comment typo diable -> disable.
  m68k: fix comment typo diable -> disable.
  wireless: comment typo fix diable -> disable.
  media: comment typo fix diable -> disable.
  remove doc for obsolete dynamic-printk kernel-parameter
  remove extraneous 'is' from Documentation/iostats.txt
  Fix spelling milisec -> ms in snd_ps3 module parameter description
  Fix spelling mistakes in comments
  Revert conflicting V4L changes
  i7core_edac: fix typos in comments
  mm/rmap.c: fix comment
  sound, ca0106: Fix assignment to 'channel'.
  hrtimer: fix a typo in comment
  init/Kconfig: fix typo
  anon_inodes: fix wrong function name in comment
  fix comment typos concerning "consistent"
  poll: fix a typo in comment
  ...

Fix up trivial conflicts in:
 - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
 - fs/ext4/ext4.h

Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-13 10:05:56 -08:00
Pablo Neira Ayuso
f31e8d4982 netfilter: ctnetlink: fix loop in ctnetlink_get_conntrack()
This patch fixes a loop in ctnetlink_get_conntrack() that can be
triggered if you use the same socket to receive events and to
perform a GET operation. Under heavy load, netlink_unicast()
may return -EAGAIN, this error code is reserved in nfnetlink for
the module load-on-demand. Instead, we return -ENOBUFS which is
the appropriate error code that has to be propagated to
user-space.

Reported-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-13 17:03:39 +01:00
Florian Westphal
6faee60a4e netfilter: ebt_ip6: allow matching on ipv6-icmp types/codes
To avoid adding a new match revision icmp type/code are stored
in the sport/dport area.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Holger Eitzenberger <holger@eitzenberger.org>
Reviewed-by: Bart De Schuymer<bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-13 12:05:12 +01:00
Eric Dumazet
255d0dc340 netfilter: x_table: speedup compat operations
One iptables invocation with 135000 rules takes 35 seconds of cpu time
on a recent server, using a 32bit distro and a 64bit kernel.

We eventually trigger NMI/RCU watchdog.

INFO: rcu_sched_state detected stall on CPU 3 (t=6000 jiffies)

COMPAT mode has quadratic behavior and consume 16 bytes of memory per
rule.

Switch the xt_compat algos to use an array instead of list, and use a
binary search to locate an offset in the sorted array.

This halves memory need (8 bytes per rule), and removes quadratic
behavior [ O(N*N) -> O(N*log2(N)) ]

Time of iptables goes from 35 s to 150 ms.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-13 12:05:12 +01:00
Patrick McHardy
b017900aac netfilter: xt_conntrack: support matching on port ranges
Add a new revision 3 that contains port ranges for all of origsrc,
origdst, replsrc and repldst. The high ports are appended to the
original v2 data structure to allow sharing most of the code with
v1 and v2. Use of the revision specific port matching function is
made dependant on par->match->revision.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-13 12:05:12 +01:00
Randy Dunlap
3806b4f3b6 eth: fix new kernel-doc warning
Fix new kernel-doc warning (copy-paste typo):

Warning(net/ethernet/eth.c:366): No description found for parameter 'rxqs'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-12 19:00:40 -08:00
David S. Miller
464143c911 Merge branch 'master' of git://1984.lsi.us.es/net-2.6 2011-01-12 18:58:40 -08:00
Alexey Kuznetsov
72b43d0898 inet6: prevent network storms caused by linux IPv6 routers
Linux IPv6 forwards unicast packets, which are link layer multicasts...
The hole was present since day one. I was 100% this check is there, but it is not.

The problem shows itself, f.e. when Microsoft Network Load Balancer runs on a network.
This software resolves IPv6 unicast addresses to multicast MAC addresses.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-12 18:51:55 -08:00
Hans Schillstrom
c6d2d445d8 IPVS: netns, final patch enabling network name space.
all init_net removed, (except for some alloc related
that needs to be there)

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:29 +09:00
Hans Schillstrom
4a98480bcc IPVS: netns, misc init_net removal in core.
init_net removed in __ip_vs_addr_is_local_v6, and got net as param.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:29 +09:00
Hans Schillstrom
763f8d0ed4 IPVS: netns, svc counters moved in ip_vs_ctl,c
Last two global vars to be moved,
ip_vs_ftpsvc_counter and ip_vs_nullsvc_counter.

[horms@verge.net.au: removed whitespace-change-only hunk]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:28 +09:00
Hans Schillstrom
f2431e6e92 IPVS: netns, trash handling
trash list per namspace,
and reordering of some params in dst struct.

[ horms@verge.net.au: Use cancel_delayed_work_sync() instead of
	              cancel_rearming_delayed_work(). Found during
		      merge conflict resoliution ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:28 +09:00
Hans Schillstrom
f6340ee0c6 IPVS: netns, defense work timer.
This patch makes defense work timer per name-space,
A net ptr had to be added to the ipvs struct,
since it's needed by defense_work_handler.

[ horms@verge.net.au: Use cancel_delayed_work_sync() instead of
	              cancel_rearming_delayed_work(). Found during
		      merge conflict resoliution ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:28 +09:00
Hans Schillstrom
a0840e2e16 IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.
Moving global vars to ipvs struct, except for svc table lock.
Next patch for ctl will be drop-rate handling.

*v3
__ip_vs_mutex remains global
 ip_vs_conntrack_enabled(struct netns_ipvs *ipvs)

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:28 +09:00
Hans Schillstrom
6e67e586e7 IPVS: netns, connection hash got net as param.
Connection hash table is now name space aware.
i.e. net ptr >> 8 is xor:ed to the hash,
and this is the first param to be compared.
The net struct is 0xa40 in size ( a little bit smaller for 32 bit arch:s)
and cache-line aligned, so a ptr >> 5 might be a more clever solution ?

All lookups where net is compared uses net_eq() which returns 1 when netns
is disabled, and the compiler seems to do something clever in that case.

ip_vs_conn_fill_param() have *net as first param now.

Three new inlines added to keep conn struct smaller
when names space is disabled.
- ip_vs_conn_net()
- ip_vs_conn_net_set()
- ip_vs_conn_net_eq()

*v3
  moved net compare to the end in "fast path"

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:28 +09:00
Hans Schillstrom
b17fc9963f IPVS: netns, ip_vs_stats and its procfs
The statistic counter locks for every packet are now removed,
and that statistic is now per CPU, i.e. no locks needed.
However summing is made in ip_vs_est into ip_vs_stats struct
which is moved to ipvs struc.

procfs, ip_vs_stats now have a "per cpu" count and a grand total.
A new function seq_file_single_net() in ip_vs.h created for handling of
single_open_net() since it does not place net ptr in a struct, like others.

/var/lib/lxc # cat /proc/net/ip_vs_stats_percpu
       Total Incoming Outgoing         Incoming         Outgoing
CPU    Conns  Packets  Packets            Bytes            Bytes
  0        0        3        1               9D               34
  1        0        1        2               49               70
  2        0        1        2               34               76
  3        1        2        2               70               74
  ~        1        7        7              18A              18E

     Conns/s   Pkts/s   Pkts/s          Bytes/s          Bytes/s
           0        0        0                0                0

*v3
ip_vs_stats reamains as before, instead ip_vs_stats_percpu is added.
u64 seq lock added

*v4
Bug correction inbytes and outbytes as own vars..
per_cpu counter for all stats now as suggested by Julian.

[horms@verge.net.au: removed whitespace-change-only hunk]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:28 +09:00
Hans Schillstrom
f131315fa2 IPVS: netns awareness to ip_vs_sync
All global variables moved to struct ipvs,
most external changes fixed (i.e. init_net removed)
in sync_buf create  + 4 replaced by sizeof(struct..)

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:28 +09:00
Hans Schillstrom
29c2026fd4 IPVS: netns awareness to ip_vs_est
All variables moved to struct ipvs,
most external changes fixed (i.e. init_net removed)

*v3
 timer per ns instead of a common timer in estimator.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:28 +09:00
Hans Schillstrom
ab8a5e8408 IPVS: netns awareness to ip_vs_app
All variables moved to struct ipvs,
most external changes fixed (i.e. init_net removed)

in ip_vs_protocol param struct net *net added to:
 - register_app()
 - unregister_app()
This affected almost all proto_xxx.c files

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:28 +09:00
Hans Schillstrom
9bbac6a904 IPVS: netns, common protocol changes and use of appcnt.
appcnt and timeout_table moved from struct ip_vs_protocol to
ip_vs proto_data.

struct net *net added as first param to
 - register_app()
 - unregister_app()
 - app_conn_bind()
 - ip_vs_conn_new()

[horms@verge.net.au: removed cosmetic-change-only hunk]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:27 +09:00
Hans Schillstrom
9330419d9a IPVS: netns, use ip_vs_proto_data as param.
ip_vs_protocol *pp is replaced by ip_vs_proto_data *pd in
function call in ip_vs_protocol struct i.e. :,
 - timeout_change()
 - state_transition()

ip_vs_protocol_timeout_change() got ipvs as param, due to above
and a upcoming patch - defence work

Most of this changes are triggered by Julians comment:
"tcp_timeout_change should work with the new struct ip_vs_proto_data
        so that tcp_state_table will go to pd->state_table
        and set_tcp_state will get pd instead of pp"

*v3
Mostly comments from Julian
The pp -> pd conversion should start from functions like
ip_vs_out() that use pp = ip_vs_proto_get(iph.protocol),
now they should use ip_vs_proto_data_get(net, iph.protocol).
conn_in_get() and conn_out_get() unused param *pp, removed.

*v4
ip_vs_protocol_timeout_change() walk the proto_data path.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:27 +09:00
Hans Schillstrom
88fe2d3727 IPVS: netns preparation for proto_ah_esp
In this phase (one), all local vars will be moved to ipvs struct.

Remaining work, add param struct net *net to a couple of
functions that common for all protos.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:27 +09:00
Hans Schillstrom
9d934878e7 IPVS: netns preparation for proto_sctp
In this phase (one), all local vars will be moved to ipvs struct.

Remaining work, add param struct net *net to a couple of
functions that is common for all protos and use ip_vs_proto_data

*v3
 Removed unuset function set_state_timeout()

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:27 +09:00
Hans Schillstrom
78b16bde10 IPVS: netns preparation for proto_udp
In this phase (one), all local vars will be moved to ipvs struct.

Remaining work, add param struct net *net to a couple of
functions that is common for all protos and use ip_vs_proto_data

*v3
Removed unused function set_state_timeout()

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:27 +09:00
Hans Schillstrom
4a85b96c08 IPVS: netns preparation for proto_tcp
In this phase (one), all local vars will be moved to ipvs struct.

Remaining work, add param struct net *net to a couple of
functions that is common for all protos and use all
ip_vs_proto_data

*v3
Removed unused function as sugested by Simon

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:27 +09:00
Hans Schillstrom
252c641032 IPVS: netns, prepare protocol
Add support for protocol data per name-space.
in struct ip_vs_protocol, appcnt will be removed when all protos
are modified for network name-space.

This patch causes warnings of unused functions, they will be used
when next patch will be applied.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:27 +09:00
Hans Schillstrom
b6e885ddb9 IPVS: netns awarness to lblc sheduler
var sysctl_ip_vs_lblc_expiration moved to ipvs struct as
    sysctl_lblc_expiration

procfs updated to handle this.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:27 +09:00
Hans Schillstrom
d0a1eef9c3 IPVS: netns awarness to lblcr sheduler
var sysctl_ip_vs_lblcr_expiration moved to ipvs struct as
    sysctl_lblcr_expiration

procfs updated to handle this.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:27 +09:00
Hans Schillstrom
fc723250c9 IPVS: netns to services part 1
Services hash tables got netns ptr a hash arg,
While Real Servers (rs) has been moved to ipvs struct.
Two new inline functions added to get net ptr from skb.

Since ip_vs is called from different contexts there is two
places to dig for the net ptr skb->dev or skb->sk
this is handled in skb_net() and skb_sknet()

Global functions, ip_vs_service_get() ip_vs_lookup_real_service()
etc have got  struct net *net as first param.
If possible get net ptr skb etc,
 - if not &init_net is used at this early stage of patching.

ip_vs_ctl.c  procfs not ready for netns yet.

*v3
 Comments by Julian
- __ip_vs_service_find and __ip_vs_svc_fwm_find are fast path,
  net_eq(svc->net, net) so the check is at the end now.
- net = skb_net(skb) in ip_vs_out moved after check for skb_dst.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:26 +09:00
Hans Schillstrom
61b1ab4583 IPVS: netns, add basic init per netns.
Preparation for network name-space init, in this stage
some empty functions exists.

In most files there is a check if it is root ns i.e. init_net
if (!net_eq(net, &init_net))
        return ...
this will be removed by the last patch, when enabling name-space.

*v3
 ip_vs_conn.c merge error corrected.
 net_ipvs #ifdef removed as sugested by Jan Engelhardt

[ horms@verge.net.au: Removed whitespace-change-only hunks ]
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-01-13 10:30:26 +09:00
Simon Horman
fee1cc0895 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 into HEAD 2011-01-13 10:29:21 +09:00
Al Viro
c74a1cbb3c pass default dentry_operations to mount_pseudo()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:03:43 -05:00
Tejun Heo
f363e45fd1 net/ceph: make ceph_msgr_wq non-reentrant
ceph messenger code does a rather complex dancing around multithread
workqueue to make sure the same work item isn't executed concurrently
on different CPUs.  This restriction can be provided by workqueue with
WQ_NON_REENTRANT.

Make ceph_msgr_wq non-reentrant workqueue with the default concurrency
level and remove the QUEUED/BUSY logic.

* This removes backoff handling in con_work() but it couldn't reliably
  block execution of con_work() to begin with - queue_con() can be
  called after the work started but before BUSY is set.  It seems that
  it was an optimization for a rather cold path and can be safely
  removed.

* The number of concurrent work items is bound by the number of
  connections and connetions are independent from each other.  With
  the default concurrency level, different connections will be
  executed independently.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Sage Weil <sage@newdream.net>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12 15:15:14 -08:00
Jesper Juhl
b0aee3516d ceph: Always free allocated memory in osdmap_decode()
Always free memory allocated to 'pi' in
net/ceph/osdmap.c::osdmap_decode().

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12 15:15:14 -08:00
Sage Weil
6c0f3af72c ceph: add dir_layout to inode
Add a ceph_dir_layout to the inode, and calculate dentry hash values based
on the parent directory's specified dir_hash function.  This is needed
because the old default Linux dcache hash function is extremely week and
leads to a poor distribution of files among dir fragments.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12 15:15:12 -08:00
KOVACS Krisztian
2fc72c7b84 netfilter: fix compilation when conntrack is disabled but tproxy is enabled
The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
failed to update the #ifdef stanzas guarding the defragmentation related
fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.

This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
without connection tracking.

Original report:
http://marc.info/?l=linux-netdev&m=129010118516341&w=2

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-12 20:25:08 +01:00
Kees Cook
5b919f833d net: ax25: fix information leak to userland harder
Commit fe10ae5338 adds a memset() to clear
the structure being sent back to userspace, but accidentally used the
wrong size.

Reported-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-12 00:34:49 -08:00
Linus Torvalds
4162cf6497 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (67 commits)
  cxgb4vf: recover from failure in cxgb4vf_open()
  netfilter: ebtables: make broute table work again
  netfilter: fix race in conntrack between dump_table and destroy
  ah: reload pointers to skb data after calling skb_cow_data()
  ah: update maximum truncated ICV length
  xfrm: check trunc_len in XFRMA_ALG_AUTH_TRUNC
  ehea: Increase the skb array usage
  net/fec: remove config FEC2 as it's used nowhere
  pcnet_cs: add new_id
  tcp: disallow bind() to reuse addr/port
  net/r8169: Update the function of parsing firmware
  net: ppp: use {get,put}_unaligned_be{16,32}
  CAIF: Fix IPv6 support in receive path for GPRS/3G
  arp: allow to invalidate specific ARP entries
  net_sched: factorize qdisc stats handling
  mlx4: Call alloc_etherdev to allocate RX and TX queues
  net: Add alloc_netdev_mqs function
  caif: don't set connection request param size before copying data
  cxgb4vf: fix mailbox data/control coherency domain race
  qlcnic: change module parameter permissions
  ...
2011-01-11 16:32:41 -08:00
David S. Miller
60dbb011df Merge branch 'master' of git://1984.lsi.us.es/net-2.6 2011-01-11 15:43:03 -08:00
Linus Torvalds
b9d919a4ac Merge branch 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (89 commits)
  NFS fix the setting of exchange id flag
  NFS: Don't use vm_map_ram() in readdir
  NFSv4: Ensure continued open and lockowner name uniqueness
  NFS: Move cl_delegations to the nfs_server struct
  NFS: Introduce nfs_detach_delegations()
  NFS: Move cl_state_owners and related fields to the nfs_server struct
  NFS: Allow walking nfs_client.cl_superblocks list outside client.c
  pnfs: layout roc code
  pnfs: update nfs4_callback_recallany to handle layouts
  pnfs: add CB_LAYOUTRECALL handling
  pnfs: CB_LAYOUTRECALL xdr code
  pnfs: change lo refcounting to atomic_t
  pnfs: check that partial LAYOUTGET return is ignored
  pnfs: add layout to client list before sending rpc
  pnfs: serialize LAYOUTGET(openstateid)
  pnfs: layoutget rpc code cleanup
  pnfs: change how lsegs are removed from layout list
  pnfs: change layout state seqlock to a spinlock
  pnfs: add prefix to struct pnfs_layout_hdr fields
  pnfs: add prefix to struct pnfs_layout_segment fields
  ...
2011-01-11 15:11:56 -08:00
Stephen Hemminger
13ee6ac579 netfilter: fix race in conntrack between dump_table and destroy
The netlink interface to dump the connection tracking table has a race
when entries are deleted at the same time. A customer reported a crash
and the backtrace showed thatctnetlink_dump_table was running while a
conntrack entry was being destroyed.
(see https://bugzilla.vyatta.com/show_bug.cgi?id=6402).

According to RCU documentation, when using hlist_nulls the reader
must handle the case of seeing a deleted entry and not proceed
further down the linked list.  The old code would continue
which caused the scan to walk into the free list.

This patch uses locking (rather than RCU) for this operation which
is guaranteed safe, and no longer requires getting reference while
doing dump operation.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-11 23:54:42 +01:00
Dang Hongwu
4b0ef1f223 ah: reload pointers to skb data after calling skb_cow_data()
skb_cow_data() may allocate a new data buffer, so pointers on
skb should be set after this function.

Bug was introduced by commit dff3bb06 ("ah4: convert to ahash")
and 8631e9bd ("ah6: convert to ahash").

Signed-off-by: Wang Xuefu <xuefu.wang@6wind.com>
Acked-by: Krzysztof Witek <krzysztof.witek@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-11 14:03:10 -08:00
Nicolas Dichtel
fa6dd8a2c8 xfrm: check trunc_len in XFRMA_ALG_AUTH_TRUNC
Maximum trunc length is defined by MAX_AH_AUTH_LEN (in bytes)
and need to be checked when this value is set (in bits) by
the user. In ah4.c and ah6.c a BUG_ON() checks this condiftion.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-11 14:03:09 -08:00
Eric Dumazet
c191a836a9 tcp: disallow bind() to reuse addr/port
inet_csk_bind_conflict() logic currently disallows a bind() if
it finds a friend socket (a socket bound on same address/port)
satisfying a set of conditions :

1) Current (to be bound) socket doesnt have sk_reuse set
OR
2) other socket doesnt have sk_reuse set
OR
3) other socket is in LISTEN state

We should add the CLOSE state in the 3) condition, in order to avoid two
REUSEADDR sockets in CLOSE state with same local address/port, since
this can deny further operations.

Note : a prior patch tried to address the problem in a different (and
buggy) way. (commit fda48a0d7a tcp: bind() fix when many ports
are bound).

Reported-by: Gaspar Chilingarov <gasparch@gmail.com>
Reported-by: Daniel Baluta <daniel.baluta@gmail.com>
Tested-by: Daniel Baluta <daniel.baluta@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-11 14:03:07 -08:00
J. Bruce Fields
f0418aa4b1 rpc: allow xprt_class->setup to return a preexisting xprt
This allows us to reuse the xprt associated with a server connection if
one has already been set up.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:10 -05:00
J. Bruce Fields
99de8ea962 rpc: keep backchannel xprt as long as server connection
Multiple backchannels can share the same tcp connection; from rfc 5661 section
2.10.3.1:

	A connection's association with a session is not exclusive.  A
	connection associated with the channel(s) of one session may be
	simultaneously associated with the channel(s) of other sessions
	including sessions associated with other client IDs.

However, multiple backchannels share a connection, they must all share
the same xid stream (hence the same rpc_xprt); the only way we have to
match replies with calls at the rpc layer is using the xid.

So, keep the rpc_xprt around as long as the connection lasts, in case
we're asked to use the connection as a backchannel again.

Requests to create new backchannel clients over a given server
connection should results in creating new clients that reuse the
existing rpc_xprt.

But to start, just reject attempts to associate multiple rpc_xprt's with
the same underlying bc_xprt.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:10 -05:00
J. Bruce Fields
d75faea330 rpc: move sk_bc_xprt to svc_xprt
This seems obviously transport-level information even if it's currently
used only by the server socket code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:10 -05:00
J. Bruce Fields
a2c50f6916 Merge commit 'v2.6.37' into for-2.6.38-incoming
I made a slight mess of Documentation/filesystems/Locking; resolve
conflicts with upstream before fixing it up.
2011-01-11 15:02:19 -05:00
M. Mohan Kumar
219fd58be6 net/9p: Use proper data types
Use proper data types for storing the count of the binary blob and
length of a string. Without this patch length calculation of string will
always result in -1 because of comparision between signed and unsigned
integer.

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2011-01-11 09:58:07 -06:00
Kumar Sanghvi
d7b92affba CAIF: Fix IPv6 support in receive path for GPRS/3G
Checks version field of IP in the receive path for GPRS/3G data
and appropriately sets the value of skb->protocol.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-10 16:12:00 -08:00
Maxim Levitsky
545ecdc3b3 arp: allow to invalidate specific ARP entries
IPv4 over firewire needs to be able to remove ARP entries
from the ARP cache that belong to nodes that are removed, because
IPv4 over firewire uses ARP packets for private information
about nodes.

This information becomes invalid as soon as node drops
off the bus and when it reconnects, its only possible
to start talking to it after it responded to an ARP packet.
But ARP cache prevents such packets from being sent.

Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-10 16:10:37 -08:00
Eric Dumazet
bfe0d0298f net_sched: factorize qdisc stats handling
HTB takes into account skb is segmented in stats updates.
Generalize this to all schedulers.

They should use qdisc_bstats_update() helper instead of manipulating
bstats.bytes and bstats.packets

Add bstats_update() helper too for classes that use
gnet_stats_basic_packed fields.

Note : Right now, TCQ_F_CAN_BYPASS shortcurt can be taken only if no
stab is setup on qdisc.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-10 16:07:54 -08:00
Tom Herbert
36909ea438 net: Add alloc_netdev_mqs function
Added alloc_netdev_mqs function which allows the number of transmit and
receive queues to be specified independenty.  alloc_netdev_mq was
changed to a macro to call the new function.  Also added
alloc_etherdev_mqs with same purpose.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-10 16:05:30 -08:00
Dan Rosenberg
91b5c98c2e caif: don't set connection request param size before copying data
The size field should not be set until after the data is successfully
copied in.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-10 16:00:54 -08:00
Dan Carpenter
facb4edc1e phonet: some signedness bugs
Dan Rosenberg pointed out that there were some signed comparison bugs
in the phonet protocol.

http://marc.info/?l=full-disclosure&m=129424528425330&w=2

The problem is that we check for array overflows but "protocol" is
signed and we don't check for array underflows.  If you have already
have CAP_SYS_ADMIN then you could use the bugs to get root, or someone
could cause an oops by mistake.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-10 13:33:17 -08:00
Trond Myklebust
68c404b18f Merge branch 'bugfixes' into nfs-for-2.6.38
Conflicts:
	fs/nfs/nfs2xdr.c
	fs/nfs/nfs3xdr.c
	fs/nfs/nfs4xdr.c
2011-01-10 14:48:02 -05:00
Trond Myklebust
6650239a4b NFS: Don't use vm_map_ram() in readdir
vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherrent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.

The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: stable@kernel.org  [2.6.37]
2011-01-10 14:45:01 -05:00
Eric Dumazet
83723d6071 netfilter: x_tables: dont block BH while reading counters
Using "iptables -L" with a lot of rules have a too big BH latency.
Jesper mentioned ~6 ms and worried of frame drops.

Switch to a per_cpu seqlock scheme, so that taking a snapshot of
counters doesnt need to block BH (for this cpu, but also other cpus).

This adds two increments on seqlock sequence per ipt_do_table() call,
its a reasonable cost for allowing "iptables -L" not block BH
processing.

Reported-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-01-10 20:11:38 +01:00
Jesse Gross
0363466866 net offloading: Convert checksums to use centrally computed features.
In order to compute the features for other offloads (primarily
scatter/gather), we need to first check the ability of the NIC to
offload the checksum for the packet.  Since we have already computed
this, we can directly use the result instead of figuring it out
again.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-09 23:35:35 -08:00
Jesse Gross
02932ce9e2 net offloading: Convert skb_need_linearize() to use precomputed features.
This switches skb_need_linearize() to use the features that have
been centrally computed.  In doing so, this fixes a problem where
scatter/gather should not be used because the card does not support
checksum offloading on that type of packet.  On device registration
we only check that some form of checksum offloading is available if
scatter/gatther is enabled but we must also check at transmission
time.  Examples of this include IPv6 or vlan packets on a NIC that
only supports IPv4 offloading.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-09 23:35:35 -08:00
Jesse Gross
91ecb63c07 net offloading: Convert dev_gso_segment() to use precomputed features.
This switches dev_gso_segment() to use the device features computed
by the centralized routine.  In doing so, it fixes a problem where
it would always use dev->features, instead of those appropriate
to the number of vlan tags if any are present.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-09 23:35:34 -08:00
Jesse Gross
fc741216db net offloading: Pass features into netif_needs_gso().
Now that there is a single function that can compute the device
features relevant to a packet, we don't want to run it for each
offload.  This converts netif_needs_gso() to take the features
of the device, rather than computing them itself.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-09 23:35:34 -08:00
Jesse Gross
f01a5236bd net offloading: Generalize netif_get_vlan_features().
netif_get_vlan_features() is currently only used by netif_needs_gso(),
so it only concerns itself with GSO features.  However, several other
places also should take into account the contents of the packet when
deciding whether to offload to hardware.  This generalizes the function
to return features about all of the various forms of offloading.  Since
offloads tend to be linked together, this avoids duplicating the logic
in each location (i.e. the scatter/gather code also needs the checksum
logic).

Suggested-by: Michał Mirosław <mirqus@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-09 23:35:33 -08:00
Jesse Gross
9497a0518e net offloading: Accept NETIF_F_HW_CSUM for all protocols.
We currently only have software fallback for one type of checksum: the
TCP/UDP one's complement.  This means that a protocol that uses hardware
offloading for a different type of checksum (FCoE, SCTP) must directly
check the device's features and do the right thing ahead of time.  By
the time we get to dev_can_checksum(), we're only deciding whether to
apply the one algorithm in software or hardware.  NETIF_F_HW_CSUM has the
same capabilities as the software version, so we should always use it if
present.  The primary advantage of this is multiply tagged vlans can use
hardware checksumming.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-09 23:35:33 -08:00
Randy Dunlap
697d0e338c net: fix kernel-doc warning in core/filter.c
Fix new kernel-doc notation warning in net/core/filter.c:

Warning(net/core/filter.c:172): No description found for parameter 'fentry'
Warning(net/core/filter.c:172): Excess function parameter 'filter' description in 'sk_run_filter'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-09 16:26:51 -08:00
Jan Engelhardt
0ab03c2b14 netlink: test for all flags of the NLM_F_DUMP composite
Due to NLM_F_DUMP is composed of two bits, NLM_F_ROOT | NLM_F_MATCH,
when doing "if (x & NLM_F_DUMP)", it tests for _either_ of the bits
being set. Because NLM_F_MATCH's value overlaps with NLM_F_EXCL,
non-dump requests with NLM_F_EXCL set are mistaken as dump requests.

Substitute the condition to test for _all_ bits being set.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-09 16:25:03 -08:00
David S. Miller
14934efab6 Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6 2011-01-09 16:16:57 -08:00
Linus Torvalds
23d69b09b7 Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (33 commits)
  usb: don't use flush_scheduled_work()
  speedtch: don't abuse struct delayed_work
  media/video: don't use flush_scheduled_work()
  media/video: explicitly flush request_module work
  ioc4: use static work_struct for ioc4_load_modules()
  init: don't call flush_scheduled_work() from do_initcalls()
  s390: don't use flush_scheduled_work()
  rtc: don't use flush_scheduled_work()
  mmc: update workqueue usages
  mfd: update workqueue usages
  dvb: don't use flush_scheduled_work()
  leds-wm8350: don't use flush_scheduled_work()
  mISDN: don't use flush_scheduled_work()
  macintosh/ams: don't use flush_scheduled_work()
  vmwgfx: don't use flush_scheduled_work()
  tpm: don't use flush_scheduled_work()
  sonypi: don't use flush_scheduled_work()
  hvsi: don't use flush_scheduled_work()
  xen: don't use flush_scheduled_work()
  gdrom: don't use flush_scheduled_work()
  ...

Fixed up trivial conflict in drivers/media/video/bt8xx/bttv-input.c
as per Tejun.
2011-01-07 16:58:04 -08:00
Linus Torvalds
fb5131e188 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (65 commits)
  [S390] prevent unneccesary loops_per_jiffy recalculation
  [S390] cpuinfo: use get_online_cpus() instead of preempt_disable()
  [S390] smp: remove cpu hotplug messages
  [S390] mutex: enable spinning mutex on s390
  [S390] mutex: Introduce arch_mutex_cpu_relax()
  [S390] cio: fix ccwgroup unregistration race condition
  [S390] perf: add DWARF register lookup for s390
  [S390] cleanup ftrace backend functions
  [S390] ptrace cleanup
  [S390] smp/idle: call init_idle() before starting a new cpu
  [S390] smp: delay idle task creation
  [S390] dasd: Correct retry counter for terminated I/O.
  [S390] dasd: Add support for raw ECKD access.
  [S390] dasd: Prevent deadlock during suspend/resume.
  [S390] dasd: Improve handling of stolen DASD reservation
  [S390] dasd: do path verification for paths added at runtime
  [S390] dasd: add High Performance FICON multitrack support
  [S390] cio: reduce memory consumption of itcw structures
  [S390] nmi: enable machine checks early
  [S390] qeth: buffer count imbalance
  ...
2011-01-07 14:50:50 -08:00
Linus Torvalds
b4a45f5fe8 Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
  fs: scale mntget/mntput
  fs: rename vfsmount counter helpers
  fs: implement faster dentry memcmp
  fs: prefetch inode data in dcache lookup
  fs: improve scalability of pseudo filesystems
  fs: dcache per-inode inode alias locking
  fs: dcache per-bucket dcache hash locking
  bit_spinlock: add required includes
  kernel: add bl_list
  xfs: provide simple rcu-walk ACL implementation
  btrfs: provide simple rcu-walk ACL implementation
  ext2,3,4: provide simple rcu-walk ACL implementation
  fs: provide simple rcu-walk generic_check_acl implementation
  fs: provide rcu-walk aware permission i_ops
  fs: rcu-walk aware d_revalidate method
  fs: cache optimise dentry and inode for rcu-walk
  fs: dcache reduce branches in lookup path
  fs: dcache remove d_mounted
  fs: fs_struct use seqlock
  fs: rcu-walk for path lookup
  ...
2011-01-07 08:56:33 -08:00
Gerrit Renker
bfbb23466a dccp: make upper bound for seq_window consistent on 32/64 bit
The 'seq_window' sysctl sets the initial value for the DCCP Sequence Window,
which may range from 32..2^46-1 (RFC 4340, 7.5.2). The patch sets the upper
bound consistently to 2^32-1 on both 32 and 64 bit systems, which should be
sufficient - with a RTT of 1sec and 1-byte packets, a seq_window of 2^32-1
corresponds to a link speed of 34 Gbps.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-01-07 12:22:44 +01:00
Samuel Jero
763dadd47c dccp: fix bug in updating the GSR
Currently dccp_check_seqno allows any valid packet to update the Greatest
Sequence Number Received, even if that packet's sequence number is less than
the current GSR. This patch adds a check to make sure that the new packet's
sequence number is greater than GSR.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-01-07 12:22:43 +01:00
Samuel Jero
2cf5be93d1 dccp: fix return value for sequence-invalid packets
Currently dccp_check_seqno returns 0 (indicating a valid packet) if the
acknowledgment number is out of bounds and the sync that RFC 4340 mandates at
this point is currently being rate-limited. This function should return -1,
indicating an invalid packet.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-01-07 12:22:43 +01:00
Nick Piggin
b3e19d924b fs: scale mntget/mntput
The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
which often go to the same mount point.

The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.

We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.

- check the global sum once every interval (this will delay zero detection
  for some interval, so it's probably a showstopper for vfsmounts).

- keep a local count and only taking the global sum when local reaches 0 (this
  is difficult for vfsmounts, because we can't hold preempt off for the life of
  a reference, so a counter would need to be per-thread or tied strongly to a
  particular CPU which requires more locking).

- keep a local difference of increments and decrements, which allows us to sum
  the total difference and hence find the refcount when summing all CPUs. Then,
  keep a single integer "long" refcount for slow and long lasting references,
  and only take the global sum of local counters when the long refcount is 0.

This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.

This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.

This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:33 +11:00
Nick Piggin
4b936885ab fs: improve scalability of pseudo filesystems
Regardless of how much we possibly try to scale dcache, there is likely
always going to be some fundamental contention when adding or removing children
under the same parent. Pseudo filesystems do not seem need to have connected
dentries because by definition they are disconnected.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:32 +11:00
Nick Piggin
fb045adb99 fs: dcache reduce branches in lookup path
Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:28 +11:00
Nick Piggin
ff0c7d15f9 fs: avoid inode RCU freeing for pseudo fs
Pseudo filesystems that don't put inode on RCU list or reachable by
rcu-walk dentries do not need to RCU free their inodes.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:26 +11:00
Nick Piggin
fa0d7e3de6 fs: icache RCU free inodes
RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:26 +11:00
Nick Piggin
fe15ce446b fs: change d_delete semantics
Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:18 +11:00
Andy Adamson
4a19de0f4b NFS rename client back channel transport field
Differentiate from server backchannel

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:25 -05:00
Andy Adamson
2c2618c6f2 NFS associate sessionid with callback connection
The sessions based callback service is started prior to the CREATE_SESSION call
so that it can handle CB_NULL requests which can be sent before the
CREATE_SESSION call returns and the session ID is known.

Set the callback sessionid after a sucessful CREATE_SESSION.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:24 -05:00
Andy Adamson
16b2d1e1d1 SUNRPC register and unregister the back channel transport
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:23 -05:00
Andy Adamson
1f11a034cd SUNRPC new transport for the NFSv4.1 shared back channel
Move the current sock create and destroy routines into the new transport ops.
Back channel socket will be destroyed by the svc_closs_all call in svc_destroy.

Added check: only TCP supported on shared back channel.

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:23 -05:00
Andy Adamson
71e161a6a9 SUNRPC fix bc_send print
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:23 -05:00
Andy Adamson
4b5b3ba16b SUNRPC move svc_drop to caller of svc_process_common
The NFSv4.1 shared back channel does not need to call svc_drop because the
callback service never outlives the single connection it services, and it
reuses it's buffers and keeps the trasport.

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:23 -05:00
Changli Gao
f88de8de5a net: bridge: check the length of skb after nf_bridge_maybe_copy_header()
Since nf_bridge_maybe_copy_header() may change the length of skb,
we should check the length of skb after it to handle the ppoe skbs.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-06 11:33:05 -08:00
Pablo Neira Ayuso
cba85b532e netfilter: fix export secctx error handling
In 1ae4de0cdf, the secctx was exported
via the /proc/net/netfilter/nf_conntrack and ctnetlink interfaces
instead of the secmark.

That patch introduced the use of security_secid_to_secctx() which may
return a non-zero value on error.

In one of my setups, I have NF_CONNTRACK_SECMARK enabled but no
security modules. Thus, security_secid_to_secctx() returns a negative
value that results in the breakage of the /proc and `conntrack -L'
outputs. To fix this, we skip the inclusion of secctx if the
aforementioned function fails.

This patch also fixes the dynamic netlink message size calculation
if security_secid_to_secctx() returns an error, since its logic is
also wrong.

This problem exists in Linux kernel >= 2.6.37.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-06 11:25:00 -08:00
Changli Gao
f682cefa5a netfilter: fix the race when initializing nf_ct_expect_hash_rnd
Since nf_ct_expect_dst_hash() may be called without nf_conntrack_lock
locked, nf_ct_expect_hash_rnd should be initialized in the atomic way.

In this patch, we use nf_conntrack_hash_rnd instead of
nf_ct_expect_hash_rnd.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-06 11:22:20 -08:00
Eric Dumazet
6623e3b24a ipv4: IP defragmentation must be ECN aware
RFC3168 (The Addition of Explicit Congestion Notification to IP)
states :

5.3.  Fragmentation

   ECN-capable packets MAY have the DF (Don't Fragment) bit set.
   Reassembly of a fragmented packet MUST NOT lose indications of
   congestion.  In other words, if any fragment of an IP packet to be
   reassembled has the CE codepoint set, then one of two actions MUST be
   taken:

      * Set the CE codepoint on the reassembled packet.  However, this
        MUST NOT occur if any of the other fragments contributing to
        this reassembly carries the Not-ECT codepoint.

      * The packet is dropped, instead of being reassembled, for any
        other reason.

This patch implements this requirement for IPv4, choosing the first
action :

If one fragment had NO-ECT codepoint
        reassembled frame has NO-ECT
ElIf one fragment had CE codepoint
        reassembled frame has CE

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-06 11:21:30 -08:00
Dan Carpenter
2a8fe00374 dcb: use after free in dcb_flushapp()
The original code has a use after free bug because it's not using the
_safe() version of the list_for_each_entry() macro.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-06 11:16:54 -08:00
Dan Carpenter
70bfa2d2e1 dcb: unlock on error in dcbnl_ieee_get()
There is a "goto nla_put_failure" hidden inside the NLA_PUT() macro, but
we're holding the dcb_lock so we need to unlock first.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-06 11:16:54 -08:00
David S. Miller
5f9251cb93 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2011-01-06 10:55:42 -08:00
Eric Dumazet
2c6607c611 net: add POLLPRI to sock_def_readable()
Leonardo Chiquitto found poll() could block forever on tcp sockets and
Urgent data was received, if the event flag only contains POLLPRI.

He did a bisection and found commit 4938d7e023 (poll: avoid extra
wakeups in select/poll) was the source of the problem.

Problem is TCP sockets use standard sock_def_readable() function for
their sk_data_ready() handler, and sock_def_readable() doesnt signal
POLLPRI.

Only TCP is affected by the problem. Adding POLLPRI to the list of flags
might trigger unnecessary schedules, but URGENT handling is such a
seldom used feature this seems a good compromise.

Thanks a lot to Leonardo for providing the bisection result and a test
program as well.

Reference : http://www.spinics.net/lists/netdev/msg151793.html

Reported-and-bisected-by: Leonardo Chiquitto <leonardo.lists@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-06 10:54:29 -08:00
David S. Miller
3610cda53f af_unix: Avoid socket->sk NULL OOPS in stream connect security hooks.
unix_release() can asynchornously set socket->sk to NULL, and
it does so without holding the unix_state_lock() on "other"
during stream connects.

However, the reverse mapping, sk->sk_socket, is only transitioned
to NULL under the unix_state_lock().

Therefore make the security hooks follow the reverse mapping instead
of the forward mapping.

Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-05 15:38:53 -08:00
Eric Dumazet
44b8288308 net_sched: pfifo_head_drop problem
commit 57dbb2d83d (sched: add head drop fifo queue)
introduced pfifo_head_drop, and broke the invariant that
sch->bstats.bytes and sch->bstats.packets are COUNTER (increasing
counters only)

This can break estimators because est_timer() handles unsigned deltas
only. A decreasing counter can then give a huge unsigned delta.

My mid term suggestion would be to change things so that
sch->bstats.bytes and sch->bstats.packets are incremented in dequeue()
only, not at enqueue() time. We also could add drop_bytes/drop_packets
and provide estimations of drop rates.

It would be more sensible anyway for very low speeds, and big bursts.
Right now, if we drop packets, they still are accounted in byte/packets
abolute counters and rate estimators.

Before this mid term change, this patch makes pfifo_head_drop behavior
similar to other qdiscs in case of drops :
Dont decrement sch->bstats.bytes and sch->bstats.packets

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-05 13:39:17 -08:00
Johannes Berg
06778b1c38 mac80211: remove stray extern
Somehow this snuck into my earlier patch, and
only now did I see a compiler warning:

net/mac80211/led.c:218:13: warning: function '__ieee80211_create_tpt_led_trigger' with external linkage has definition

Remove the stray extern.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-05 16:07:12 -05:00
Johannes Berg
90fc4b3a5b mac80211: implement off-channel TX using hw r-o-c offload
When the driver has remain-on-channel offload,
implement off-channel transmission using that
primitive.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-05 16:07:12 -05:00
Johannes Berg
21f8358964 mac80211: implement hardware offload for remain-on-channel
This allows drivers to support remain-on-channel
offload if they implement smarter timing or need
to use a device implementation like iwlwifi.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-05 16:07:12 -05:00
John W. Linville
c96e96354a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
	net/bluetooth/Makefile
2011-01-05 16:06:25 -05:00
John W. Linville
6303710d7a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-01-05 14:35:41 -05:00
Heiko Carstens
052ff461c8 [S390] irq: have detailed statistics for interrupt types
Up to now /proc/interrupts only has statistics for external and i/o
interrupts but doesn't split up them any further.
This patch adds a line for every single interrupt source so that it
is possible to easier tell what the machine is/was doing.
Part of the output now looks like this;

           CPU0       CPU2       CPU4
EXT:       3898       4232       2305
I/O:        782        315        245
CLK:       1029       1964        727   [EXT] Clock Comparator
IPI:       2868       2267       1577   [EXT] Signal Processor
TMR:          0          0          0   [EXT] CPU Timer
TAL:          0          0          0   [EXT] Timing Alert
PFL:          0          0          0   [EXT] Pseudo Page Fault
[...]
NMI:          0          1          1   [NMI] Machine Checks

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-05 12:47:25 +01:00
J. Bruce Fields
fdef7aa5d4 svcrpc: ensure cache_check caller sees updated entry
Supposes cache_check runs simultaneously with an update on a different
CPU:

	cache_check			task doing update
	^^^^^^^^^^^			^^^^^^^^^^^^^^^^^

	1. test for CACHE_VALID		1'. set entry->data
	   & !CACHE_NEGATIVE

	2. use entry->data		2'. set CACHE_VALID

If the two memory writes performed in step 1' and 2' appear misordered
with respect to the reads in step 1 and 2, then the caller could get
stale data at step 2 even though it saw CACHE_VALID set on the cache
entry.

Add memory barriers to prevent this.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-04 16:49:25 -05:00
J. Bruce Fields
6bab93f87e svcrpc: take lock on turning entry NEGATIVE in cache_check
We attempt to turn a cache entry negative in place.  But that entry may
already have been filled in by some other task since we last checked
whether it was valid, so we could be modifying an already-valid entry.
If nothing else there's a likely leak in such a case when the entry is
eventually put() and contents are not freed because it has
CACHE_NEGATIVE set.

So, take the cache_lock just as sunrpc_cache_update() does.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-04 16:49:24 -05:00
J. Bruce Fields
9e701c6109 svcrpc: simpler request dropping
Currently we use -EAGAIN returns to determine when to drop a deferred
request.  On its own, that is error-prone, as it makes us treat -EAGAIN
returns from other functions specially to prevent inadvertent dropping.

So, use a flag on the request instead.

Returning an error on request deferral is still required, to prevent
further processing, but we no longer need worry that an error return on
its own could result in a drop.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-04 16:49:22 -05:00
J. Bruce Fields
d76d1815f3 svcrpc: avoid double reply caused by deferral race
Commit d29068c431 "sunrpc: Simplify cache_defer_req and related
functions." asserted that cache_check() could determine success or
failure of cache_defer_req() by checking the CACHE_PENDING bit.

This isn't quite right.

We need to know whether cache_defer_req() created a deferred request,
in which case sending an rpc reply has become the responsibility of the
deferred request, and it is important that we not send our own reply,
resulting in two different replies to the same request.

And the CACHE_PENDING bit doesn't tell us that; we could have
succesfully created a deferred request at the same time as another
thread cleared the CACHE_PENDING bit.

So, partially revert that commit, to ensure that cache_check() returns
-EAGAIN if and only if a deferred request has been created.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: NeilBrown <neilb@suse.de>
2011-01-04 16:49:21 -05:00
J. Bruce Fields
bdd5f05d91 SUNRPC: Remove more code when NFSD_DEPRECATED is not configured
Signed-off-by: NeilBrown <neilb@suse.de>
[bfields@redhat.com: moved svcauth_unix_purge outside ifdef's.]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-04 16:48:02 -05:00
J. Bruce Fields
31f7aa65f5 svcrpc: modifying valid sunrpc cache entries is racy
Once a sunrpc cache entry is VALID, we should be replacing it (and
allowing any concurrent users to destroy it on last put) instead of
trying to update it in place.

Otherwise someone referencing the ip_map we're modifying here could try
to use the m_client just as we're putting the last reference.

The bug should only be seen by users of the legacy nfsd interfaces.

(Thanks to Neil for suggestion to use sunrpc_invalidate.)

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-04 16:47:29 -05:00
David S. Miller
dbbe68bb12 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-01-04 11:57:25 -08:00
Johannes Berg
b5c34f662a mac80211: fix some key comments and code
The key documentation is slightly out of date, fix
that. Also, the list entry in the key struct is no
longer used that way, so list_del_init() isn't
necessary any more there.

Finally, ieee80211_key_link() is no longer invoked
under RCU read lock, but rather with an appropriate
station lock held.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-04 14:46:14 -05:00
Christian Lamparter
707e634326 Revert "mac80211: temporarily disable reorder release timer"
This reverts enables the reorder release timer once again.

The issues laid out in:
<http://www.spinics.net/lists/linux-wireless/msg57214.html>

Have been addressed by:
	mac80211: serialize rx path workers
	mac80211: ignore PSM bit of reordered frames

Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-04 14:46:13 -05:00
Christian Lamparter
24a8fdad35 mac80211: serialize rx path workers
This patch addresses the issue of serialization between
the main rx path and various reorder release timers.

<http://www.spinics.net/lists/linux-wireless/msg57214.html>

It converts the previously local "frames" queue into
a global rx queue [rx_skb_queue]. This way, everyone
(be it the main rx-path or some reorder release timeout)
can add frames to it.

Only one active rx handler worker [ieee80211_rx_handlers]
is needed. All other threads which have lost the race of
"runnning_rx_handler" can now simply "return", knowing that
the thread who had the "edge" will also take care of their
workload.

Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-04 14:46:13 -05:00
Bob Copeland
ff039c6fb3 cfg80211: fix transposition of words in printk
Fixes the misplaced article in the following:

"cfg80211: Updating information on frequency 5785 MHz for
    20 a MHz width channel with regulatory rule:"

Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-04 14:43:01 -05:00
Joel A Fernandes
f76b57b47e mac80211: Fix mesh portal communication with other mesh nodes.
Fixed a bug where if a mesh interface has a different MAC address from its bridge
interface, then it would not be able to send data traffic to any other mesh node.
This also adds support for communication between mesh nodes and external bridged
nodes by using a 6 address format if the source is a node within the mesh and the
destination is an external node proxied by a mesh portal.

Signed-off-by: Joel A Fernandes <agnel.joel@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-04 14:43:01 -05:00
Christian Lamparter
4cfda47b69 mac80211: ignore PSM bit of reordered frames
This patch tackles one of the problems of my
reorder release timer patch from August.

<http://www.spinics.net/lists/linux-wireless/msg57214.html>
=>
What if the reorder release triggers and ap_sta_ps_end
(called by ieee80211_rx_h_sta_process) accidentally clears
the WLAN_STA_PS_STA flag, because 100ms ago - when the STA
was still active - frames were put into the reorder buffer.

Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-04 14:35:15 -05:00
Joel Sing
9fc3bbb4a7 ipv4/route.c: respect prefsrc for local routes
The preferred source address is currently ignored for local routes,
which results in all local connections having a src address that is the
same as the local dst address. Fix this by respecting the preferred source
address when it is provided for local routes.

This bug can be demonstrated as follows:

 # ifconfig dummy0 192.168.0.1
 # ip route show table local | grep local.*dummy0
 local 192.168.0.1 dev dummy0  proto kernel  scope host  src 192.168.0.1
 # ip route change table local local 192.168.0.1 dev dummy0 \
     proto kernel scope host src 127.0.0.1
 # ip route show table local | grep local.*dummy0
 local 192.168.0.1 dev dummy0  proto kernel  scope host  src 127.0.0.1

We now establish a local connection and verify the source IP
address selection:

 # nc -l 192.168.0.1 3128 &
 # nc 192.168.0.1 3128 &
 # netstat -ant | grep 192.168.0.1:3128.*EST
 tcp        0      0 192.168.0.1:3128        192.168.0.1:33228 ESTABLISHED
 tcp        0      0 192.168.0.1:33228       192.168.0.1:3128  ESTABLISHED

Signed-off-by: Joel Sing <jsing@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-04 11:35:12 -08:00
John W. Linville
782a9e31e8 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/padovan/bluetooth-next-2.6 2011-01-04 14:25:28 -05:00
Johannes Berg
d2460f4b2f mac80211: add missing synchronize_rcu
commit ad0e2b5a00
Author: Johannes Berg <johannes.berg@intel.com>
Date:   Tue Jun 1 10:19:19 2010 +0200

    mac80211: simplify key locking

removed the synchronization against RCU and thus
opened a race window where we can use a key for
TX while it is already freed. Put a synchronisation
into the right place to close that window.

Reported-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Cc: stable@kernel.org [2.6.36+]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-04 14:17:23 -05:00
Milton Miller
919bbad580 mac80211: fix mesh forwarding when ratelimited too
Commit b51aff057c said:

    Under memory pressure, the mac80211 mesh code
    may helpfully print a message that it failed
    to clone a mesh frame and then will proceed
    to crash trying to use it anyway. Fix that.

Avoid the reference whenever the frame copy is unsuccessful
regardless of the debug message being suppressed or printed.

Cc: stable@kernel.org [2.6.27+]
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-01-04 14:17:23 -05:00
Trond Myklebust
beb0f0a9fb kernel panic when mount NFSv4
On Tue, 2010-12-14 at 16:58 +0800, Mi Jinlong wrote:
> Hi,
>
> When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
> at NFS client's __rpc_create_common function.
>
> The panic place is:
>   rpc_mkpipe
>     __rpc_lookup_create()          <=== find pipefile *idmap*
>     __rpc_mkpipe()                 <=== pipefile is *idmap*
>       __rpc_create_common()
>        ******  BUG_ON(!d_unhashed(dentry)); ******    *panic*
>
> It means that the dentry's d_flags have be set DCACHE_UNHASHED,
> but it should not be set here.
>
> Is someone known this bug? or give me some idea?
>
> A reproduce program is append, but it can't reproduce the bug every time.
> the export is: "/nfsroot       *(rw,no_root_squash,fsid=0,insecure)"
>
> And the panic message is append.
>
> ============================================================================
> #!/bin/sh
>
> LOOPTOTAL=768
> LOOPCOUNT=0
> ret=0
>
> while [ $LOOPCOUNT -ne $LOOPTOTAL ]
> do
> 	((LOOPCOUNT += 1))
> 	service nfs restart
> 	/usr/sbin/rpc.idmapd
> 	mount -t nfs4 127.0.0.1:/ /mnt|| return 1;
> 	ls -l /var/lib/nfs/rpc_pipefs/nfs/*/
> 	umount /mnt
> 	echo $LOOPCOUNT
> done
>
> ===============================================================================
> Code: af 60 01 00 00 89 fa 89 f0 e8 64 cf 89 f0 e8 5c 7c 64 cf 31 c0 8b 5c 24 10 8b
> 74 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 <0f> 0b eb fc 8b 46 28 c7 44 24 08 20
> de ee f0 c7 44 24 04 56 ea
> EIP:[<f0ee92ea>] __rpc_create_common+0x8a/0xc0 [sunrpc] SS:ESP 0068:eccb5d28
> ---[ end trace 8f5606cd08928ed2]---
> Kernel panic - not syncing: Fatal exception
> Pid:7131, comm: mount.nfs4 Tainted: G     D   -------------------2.6.32 #1
> Call Trace:
>  [<c080ad18>] ? panic+0x42/0xed
>  [<c080e42c>] ? oops_end+0xbc/0xd0
>  [<c040b090>] ? do_invalid_op+0x0/0x90
>  [<c040b10f>] ? do_invalid_op+0x7f/0x90
>  [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
>  [<f0edc433>] ? rpc_free_task+0x33/0x70[sunrpc]
>  [<f0ed6508>] ? prc_call_sync+0x48/0x60[sunrpc]
>  [<f0ed656e>] ? rpc_ping+0x4e/0x60[sunrpc]
>  [<f0ed6eaf>] ? rpc_create+0x38f/0x4f0[sunrpc]
>  [<c080d80b>] ? error_code+0x73/0x78
>  [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
>  [<c0532bda>] ? d_lookup+0x2a/0x40
>  [<f0ee94b1>] ? rpc_mkpipe+0x111/0x1b0[sunrpc]
>  [<f10a59f4>] ? nfs_create_rpc_client+0xb4/0xf0[nfs]
>  [<f10d6c6d>] ? nfs_fscache_get_client_cookie+0x1d/0x50[nfs]
>  [<f10d3fcb>] ? nfs_idmap_new+0x7b/0x140[nfs]
>  [<c05e76aa>] ? strlcpy+0x3a/0x60
>  [<f10a60ca>] ? nfs4_set_client+0xea/0x2b0[nfs]
>  [<f10a6d0c>] ? nfs4_create_server+0xac/0x1b0[nfs]
>  [<c04f1400>] ? krealloc+0x40/0x50
>  [<f10b0e8b>] ? nfs4_remote_get_sb+0x6b/0x250[nfs]
>  [<c04f14ec>] ? kstrdup+0x3c/0x60
>  [<c0520739>] ? vfs_kern_mount+0x69/0x170
>  [<f10b1a3c>] ? nfs_do_root_mount+0x6c/0xa0[nfs]
>  [<f10b1b47>] ? nfs4_try_mount+0x37/0xa0[nfs]
>  [<f10afe6d>] ? nfs4_validate_text_mount_data+-x7d/0xf0[nfs]
>  [<f10b1c42>] ? nfs4_get_sb+0x92/0x2f0
>  [<c0520739>] ? vfs_kern_mount+0x69/0x170
>  [<c05366d2>] ? get_fs_type+0x32/0xb0
>  [<c052089f>] ? do_kern_mount+0x3f/0xe0
>  [<c053954f>] ? do_mount+0x2ef/0x740
>  [<c0537740>] ? copy_mount_options+0xb0/0x120
>  [<c0539a0e>] ? sys_mount+0x6e/0xa0

Hi,

Does the following patch fix the problem?

Cheers
  Trond

--------------------------
SUNRPC: Fix a BUG in __rpc_create_common

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Mi Jinlong reports:

When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
at NFS client's __rpc_create_common function.

The panic place is:
  rpc_mkpipe
      __rpc_lookup_create()          <=== find pipefile *idmap*
      __rpc_mkpipe()                 <=== pipefile is *idmap*
        __rpc_create_common()
         ******  BUG_ON(!d_unhashed(dentry)); ****** *panic*

The test is wrong: we can find ourselves with a hashed negative dentry here
if the idmapper tried to look up the file before we got round to creating
it.

Just replace the BUG_ON() with a d_drop(dentry).

Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-04 13:10:38 -05:00
Tomas Winkler
1a9180a20f net/bridge: fix trivial sparse errors
net/bridge//br_stp_if.c:148:66: warning: conversion of
net/bridge//br_stp_if.c:148:66:     int to
net/bridge//br_stp_if.c:148:66:     int enum umh_wait

net/bridge//netfilter/ebtables.c:1150:30: warning: Using plain integer as NULL pointer

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-03 13:29:18 -08:00
Eric Dumazet
0dfb33a0d7 sch_red: report backlog information
Provide child qdisc backlog (byte count) information so that "tc -s
qdisc" can report it to user.

packet count is already correctly provided.

qdisc red 11: parent 1:11 limit 60Kb min 15Kb max 45Kb ecn
 Sent 3116427684 bytes 1415782 pkt (dropped 8, overlimits 7866 requeues 0)
 rate 242385Kbit 13630pps backlog 13560b 8p requeues 0
  marked 7865 early 1 pdrop 7 other 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-03 12:13:15 -08:00
Shmulik Ravid
7f891cf1fc dcbnl: more informed return values for new dcbnl routines
More accurate return values for the following (new) dcbnl routines:
dcbnl_getdcbx()
dcbnl_setdcbx()
dcbnl_getfeatcfg()
dcbnl_setfeatcfg()

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-03 12:12:11 -08:00
Florian Westphal
e6f26129eb bridge: stp: ensure mac header is set
commit bf9ae5386b
(llc: use dev_hard_header) removed the
skb_reset_mac_header call from llc_mac_hdr_init.

This seems fine itself, but br_send_bpdu() invokes ebtables LOCAL_OUT.

We oops in ebt_basic_match() because it assumes eth_hdr(skb) returns
a meaningful result.

Cc: acme@ghostprotocols.net
References: https://bugzilla.kernel.org/show_bug.cgi?id=24532
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-03 12:09:33 -08:00
Tomas Winkler
9d89081d69 bridge: fix br_multicast_ipv6_rcv for paged skbs
use pskb_may_pull to access ipv6 header correctly for paged skbs
It was omitted in the bridge code leading to crash in blind
__skb_pull

since the skb is cloned undonditionally we also simplify the
the exit path

this fixes bug https://bugzilla.kernel.org/show_bug.cgi?id=25202

Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 IEEE 802.11: authenticated
Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 IEEE 802.11: associated (aid 2)
Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 RADIUS: starting accounting session 4D0608A3-00000005
Dec 15 14:36:41 User-PC kernel: [175576.120287] ------------[ cut here ]------------
Dec 15 14:36:41 User-PC kernel: [175576.120452] kernel BUG at include/linux/skbuff.h:1178!
Dec 15 14:36:41 User-PC kernel: [175576.120609] invalid opcode: 0000 [#1] SMP
Dec 15 14:36:41 User-PC kernel: [175576.120749] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
Dec 15 14:36:41 User-PC kernel: [175576.121035] Modules linked in: approvals binfmt_misc bridge stp llc parport_pc ppdev arc4 iwlagn snd_hda_codec_realtek iwlcore i915 snd_hda_intel mac80211 joydev snd_hda_codec snd_hwdep snd_pcm snd_seq_midi drm_kms_helper snd_rawmidi drm snd_seq_midi_event snd_seq snd_timer snd_seq_device cfg80211 eeepc_wmi usbhid psmouse intel_agp i2c_algo_bit intel_gtt uvcvideo agpgart videodev sparse_keymap snd shpchp v4l1_compat lp hid video serio_raw soundcore output snd_page_alloc ahci libahci atl1c
Dec 15 14:36:41 User-PC kernel: [175576.122712]
Dec 15 14:36:41 User-PC kernel: [175576.122769] Pid: 0, comm: kworker/0:0 Tainted: G        W   2.6.37-rc5-wl+ #3 1015PE/1016P
Dec 15 14:36:41 User-PC kernel: [175576.123012] EIP: 0060:[<f83edd65>] EFLAGS: 00010283 CPU: 1
Dec 15 14:36:41 User-PC kernel: [175576.123193] EIP is at br_multicast_rcv+0xc95/0xe1c [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.123362] EAX: 0000001c EBX: f5626318 ECX: 00000000 EDX: 00000000
Dec 15 14:36:41 User-PC kernel: [175576.123550] ESI: ec512262 EDI: f5626180 EBP: f60b5ca0 ESP: f60b5bd8
Dec 15 14:36:41 User-PC kernel: [175576.123737]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Dec 15 14:36:41 User-PC kernel: [175576.123902] Process kworker/0:0 (pid: 0, ti=f60b4000 task=f60a8000 task.ti=f60b0000)
Dec 15 14:36:41 User-PC kernel: [175576.124137] Stack:
Dec 15 14:36:41 User-PC kernel: [175576.124181]  ec556500 f6d06800 f60b5be8 c01087d8 ec512262 00000030 00000024 f5626180
Dec 15 14:36:41 User-PC kernel: [175576.124181]  f572c200 ef463440 f5626300 3affffff f6d06dd0 e60766a4 000000c4 f6d06860
Dec 15 14:36:41 User-PC kernel: [175576.124181]  ffffffff ec55652c 00000001 f6d06844 f60b5c64 c0138264 c016e451 c013e47d
Dec 15 14:36:41 User-PC kernel: [175576.124181] Call Trace:
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01087d8>] ? sched_clock+0x8/0x10
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0138264>] ? enqueue_entity+0x174/0x440
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c016e451>] ? sched_clock_cpu+0x131/0x190
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c013e47d>] ? select_task_rq_fair+0x2ad/0x730
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0524fc1>] ? nf_iterate+0x71/0x90
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4914>] ? br_handle_frame_finish+0x184/0x220 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4790>] ? br_handle_frame_finish+0x0/0x220 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e46e9>] ? br_handle_frame+0x189/0x230 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4790>] ? br_handle_frame_finish+0x0/0x220 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f83e4560>] ? br_handle_frame+0x0/0x230 [bridge]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04ff026>] ? __netif_receive_skb+0x1b6/0x5b0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04f7a30>] ? skb_copy_bits+0x110/0x210
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0503a7f>] ? netif_receive_skb+0x6f/0x80
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82cb74c>] ? ieee80211_deliver_skb+0x8c/0x1a0 [mac80211]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82cc836>] ? ieee80211_rx_handlers+0xeb6/0x1aa0 [mac80211]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04ff1f0>] ? __netif_receive_skb+0x380/0x5b0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c016e242>] ? sched_clock_local+0xb2/0x190
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c012b688>] ? default_spin_lock_flags+0x8/0x10
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d83df>] ? _raw_spin_lock_irqsave+0x2f/0x50
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82cd621>] ? ieee80211_prepare_and_rx_handle+0x201/0xa90 [mac80211]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f82ce154>] ? ieee80211_rx+0x2a4/0x830 [mac80211]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f815a8d6>] ? iwl_update_stats+0xa6/0x2a0 [iwlcore]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f8499212>] ? iwlagn_rx_reply_rx+0x292/0x3b0 [iwlagn]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d83df>] ? _raw_spin_lock_irqsave+0x2f/0x50
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f8483697>] ? iwl_rx_handle+0xe7/0x350 [iwlagn]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<f8486ab7>] ? iwl_irq_tasklet+0xf7/0x5c0 [iwlagn]
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01aece1>] ? __rcu_process_callbacks+0x201/0x2d0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0150d05>] ? tasklet_action+0xc5/0x100
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0150a07>] ? __do_softirq+0x97/0x1d0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d910c>] ? nmi_stack_correct+0x2f/0x34
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0150970>] ? __do_softirq+0x0/0x1d0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  <IRQ>
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01508f5>] ? irq_exit+0x65/0x70
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05df062>] ? do_IRQ+0x52/0xc0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c01036b0>] ? common_interrupt+0x30/0x38
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c03a1fc2>] ? intel_idle+0xc2/0x160
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c04daebb>] ? cpuidle_idle_call+0x6b/0x100
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c0101dea>] ? cpu_idle+0x8a/0xf0
Dec 15 14:36:41 User-PC kernel: [175576.124181]  [<c05d2702>] ? start_secondary+0x1e8/0x1ee

Cc: David Miller <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-03 11:26:34 -08:00
Paul Gortmaker
40cd201e37 tipc: update log.h re-include protection to reflect new name
The tipc/dbg.h file was recently renamed to tipc/log.h,
but the re-include define was not updated accordingly.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 14:56:18 -08:00
Allan Stephens
a016892cd6 tipc: remove extraneous braces from single statements
Cleans up TIPC's source code to eliminate the presence of unnecessary
use of {} around single statements.

These changes are purely cosmetic and do not alter the operation of TIPC
in any way.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:57 -08:00
Allan Stephens
e3ec9c7d5e tipc: remove zeroing assignments to static global variables
Cleans up TIPC's source code to eliminate the needless initialization
of static variables to zero.

These changes are purely cosmetic and do not alter the operation of TIPC
in any way.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:57 -08:00
Allan Stephens
2db9983a43 tipc: split variable assignments out of conditional expressions
Cleans up TIPC's source code to eliminate assigning values to variables
within conditional expressions, improving code readability and reducing
warnings from various code checker tools.

These changes are purely cosmetic and do not alter the operation of TIPC
in any way.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:56 -08:00
Allan Stephens
0e65967e33 tipc: cleanup various cosmetic whitespace issues
Cleans up TIPC's source code to eliminate deviations from generally
accepted coding conventions relating to leading/trailing white space
and white space around commas, braces, cases, and sizeof.

These changes are purely cosmetic and do not alter the operation of TIPC
in any way.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:56 -08:00
Paul Gortmaker
25860c3bd5 tipc: recode getsockopt error handling for better readability
The existing code for the copy to user and error handling at the
end of getsockopt isn't easy to follow, due to the excessive use
of if/else.  By simply using return where appropriate, it can be
made smaller and easier to follow at the same time.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:55 -08:00
Allan Stephens
e83504f724 tipc: remove pointless check for NULL prior to kfree
It is acceptable to call kfree() with NULL, so these checks are not
serving any useful purpose.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:55 -08:00
Allan Stephens
886ef52a8c tipc: remove redundant #includes
Eliminates a number of #include statements that no longer serve any
useful purpose.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:54 -08:00
Allan Stephens
6e7e309c62 tipc: Finish streamlining of debugging code
Completes the simplification of TIPC's debugging capabilities. By default
TIPC includes no debugging code, and any debugging code added by developers
that calls the dbg() and dbg_macros() is compiled out. If debugging support
is enabled, TIPC prints out some additional data about its internal state
when certain abnormal conditions occur, and any developer-added calls to the
TIPC debug macros are compiled in.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:54 -08:00
Allan Stephens
8d64a5ba58 tipc: Prune down link-specific debugging code
Eliminates most link-specific debugging code in TIPC, which is now
largely unnecessary. All calls to the link-specific debugging macros
have been removed, as are the macros themselves; in addition, the optional
allocation of print buffers to hold debugging information for each link
endpoint has been removed. The ability for TIPC to print out helpful
diagnostic information when link retransmit failures occur has been
retained for the time being, as an aid in tracking down the cause of
such failures.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:53 -08:00
Allan Stephens
7ced6890bf tipc: remove dump() and tipc_dump_dbg()
Eliminates calls to two debugging macros that are being completely obsoleted,
as well as any associated debugging routines that are no longer required.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:53 -08:00
Allan Stephens
b29f142849 tipc: remove calls to dbg() and msg_dbg()
Eliminates obsolete calls to two of TIPC's main debugging macros, as well
as a pair of associated debugging routines that are no longer required.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:52 -08:00
Allan Stephens
f5e75269f5 tipc: rename dbg.[ch] to log.[ch]
As the first step in removing obsolete debugging code from TIPC the
files that implement TIPC's non-debug-related log buffer subsystem
are renamed to better reflect their true nature.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:51 -08:00
Allan Stephens
5af5479296 tipc: Remove internal linked list of node objects
Eliminates a sorted list TIPC uses to keep track of the neighboring
nodes it has links to, since this duplicates information already present
in the internal array of node object pointers.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:51 -08:00
Allan Stephens
b0c1e928c8 tipc: Remove user registry subsystem
Eliminates routines, data structures, and files that make up TIPC's
user registry. The user registry is no longer needed since the native
API routines that utilized it no longer exist and there are no longer
any internal TIPC services that use it.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:51 -08:00
Allan Stephens
aa70200e00 tipc: Eliminate use of user registry by topology service
Simplifies TIPC's network topology service so that it no longer registers
its ports with the user registry, since the service doesn't take advantage
of any of the registry's capabilities.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:50 -08:00
Allan Stephens
7a488fd3d4 tipc: Eliminate use of user registry by configuration service
Simplifies TIPC's configuration service so that it no longer registers
its port with the user registry, since the service doesn't take advantage
of any of the registry's capabilities.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:50 -08:00
Allan Stephens
8f92df6ad4 tipc: Remove prototype code for supporting multiple clusters
Eliminates routines, data structures, and files that were intended
to allow TIPC to support a network containing multiple clusters.
Currently, TIPC supports only networks consisting of a single cluster
within a single zone, so this code is unnecessary.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:49 -08:00
Allan Stephens
51a8e4dee7 tipc: Remove prototype code for supporting inter-cluster routing
Eliminates routines and data structures that were intended to allow
TIPC to route messages to other clusters. Currently, TIPC supports only
networks consisting of a single cluster within a single zone, so this
code is unnecessary.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:48 -08:00
Allan Stephens
08c80e9a03 tipc: Remove prototype code for supporting slave nodes
Simplifies routines and data structures that were intended to allow
TIPC to support slave nodes (i.e. nodes that did not have links to
all of the other nodes in its cluster, forcing TIPC to route messages
that it could not deliver directly through a non-slave node).

Currently, TIPC supports only networks containing non-slave nodes,
so this code is unnecessary.

Note: The latest edition of the TIPC 2.0 Specification has eliminated
the concept of slave nodes entirely.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:48 -08:00
Allan Stephens
51f98a8d70 tipc: Remove prototype code for supporting multiple zones
Eliminates routines, data structures, and files that were intended
to allows TIPC to support a network containing multiple zones.
Currently, TIPC supports only networks consisting of a single cluster
within a single zone, so this code is unnecessary.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-01 13:57:47 -08:00
Eric Dumazet
18c8d82ae5 sfq: fix slot_dequeue_head()
slot_dequeue_head() should make sure slot skb chain is correct in both
ways, or we can crash if all possible flows are in use.

Jarek pointed out slot_queue_init() can now be done in sfq_init() once,
instead each time a flow is setup.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-31 12:48:55 -08:00
Eric Dumazet
eeaeb068f1 sch_sfq: allow big packets and be fair
SFQ is currently 'limited' to small packets, because it uses a 15bit
allotment number per flow. Introduce a scale by 8, so that we can handle
full size TSO/GRO packets.

Use appropriate handling to make sure allot is positive before a new
packet is dequeued, so that fairness is respected.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-31 12:47:37 -08:00
Dan Rosenberg
9f260e0efa CAN: Use inode instead of kernel address for /proc file
Since the socket address is just being used as a unique identifier, its
inode number is an alternative that does not leak potentially sensitive
information.

CC-ing stable because MITRE has assigned CVE-2010-4565 to the issue.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-31 11:13:27 -08:00
Shmulik Ravid
7c14c3f10e dcbnl: cleanup
A couple of small cleanups for patches:
[net-next-2.6 PATCH 1/3] dcbnl: add support for ieee8021Qaz attributes
[net-next-2.6 PATCH 2/3] dcbnl: add appliction tlv handlers
[net-next-2.6 PATCH 3/3] net_dcb: add application notifiers

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-31 10:52:09 -08:00
Shmulik Ravid
ea45fe4e17 dcbnl: adding DCBX feature flags get-set
Adding a pair of set-get routines to dcbnl for setting the negotiation
flags of the various DCB features. Conforms to the CEE flavor of DCBX
The user sets these flags (enable, advertise, willing) for each feature
to be used by the DCBX engine. The 'get' routine returns which of the
features is enabled after the negotiation.

This patch is dependent on the following patches:
[net-next-2.6 PATCH 1/3] dcbnl: add support for ieee8021Qaz attributes
[net-next-2.6 PATCH 2/3] dcbnl: add appliction tlv handlers
[net-next-2.6 PATCH 3/3] net_dcb: add application notifiers

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-31 10:50:54 -08:00
Shmulik Ravid
6241b6259b dcbnl: adding DCBX engine capability
Adding an optional DCBX capability and a pair for get-set routines for
setting the device DCBX mode. The DCBX capability is a bit field of
supported attributes. The user is expected to set the DCBX mode with a
subset of the advertised attributes.

This patch is dependent on the following patches:
[net-next-2.6 PATCH 1/3] dcbnl: add support for ieee8021Qaz attributes
[net-next-2.6 PATCH 2/3] dcbnl: add appliction tlv handlers
[net-next-2.6 PATCH 3/3] net_dcb: add application notifiers

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-31 10:50:54 -08:00
John Fastabend
96b99684e3 net_dcb: add application notifiers
DCBx applications priorities can be changed dynamically. If
application stacks are expected to keep the skb priority
consistent with the dcbx priority the stack will need to
be notified when these changes occur.

This patch adds application notifiers for the stack to register
with.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-31 10:47:46 -08:00
John Fastabend
9ab933ab2c dcbnl: add appliction tlv handlers
This patch adds application tlv handlers. Networking stacks
may use the application priority to set the skb priority of
their stack using the negoatiated dcbx priority.

This patch provides the dcb_{get|set}app() routines for the
stack to query these parameters. Notice lower layer drivers
can use the dcbnl_ops routines if additional handling is
needed. Perhaps in the firmware case for example

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-31 10:47:45 -08:00
John Fastabend
3e29027af4 dcbnl: add support for ieee8021Qaz attributes
The IEEE8021Qaz is the IEEE standard version of CEE. The
standard has had enough significant changes from the CEE
version that many of the CEE attributes have no meaning
in the new spec or do not easily map to IEEE standards.

Rather then attempt to create a complicated mapping
between CEE and IEEE standards this patch adds a nested
IEEE attribute to the list of DCB attributes. The policy
is,

	[DCB_ATTR_IFNAME]
	[DCB_ATTR_STATE]
	...
	[DCB_ATTR_IEEE]
		[DCB_ATTR_IEEE_ETS]
		[DCB_ATTR_IEEE_PFC]
		[DCB_ATTR_IEEE_APP_TABLE]
			[DCB_ATTR_IEEE_APP]
			...

The following dcbnl_rtnl_ops routines were added to handle
the IEEE standard,

	int (*ieee_getets) (struct net_device *, struct ieee_ets *);
	int (*ieee_setets) (struct net_device *, struct ieee_ets *);
	int (*ieee_getpfc) (struct net_device *, struct ieee_pfc *);
	int (*ieee_setpfc) (struct net_device *, struct ieee_pfc *);
	int (*ieee_getapp) (struct net_device *, struct dcb_app *);
	int (*ieee_setapp) (struct net_device *, struct dcb_app *);

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-31 10:47:45 -08:00
David S. Miller
17f7f4d9fc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	net/ipv4/fib_frontend.c
2010-12-26 22:37:05 -08:00
Linus Torvalds
d7c1255a3a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
  ipv4: dont create routes on down devices
  epic100: hamachi: yellowfin: Fix skb allocation size
  sundance: Fix oopses with corrupted skb_shared_info
  Revert "ipv4: Allow configuring subnets as local addresses"
  USB: mcs7830: return negative if auto negotiate fails
  irda: prevent integer underflow in IRLMP_ENUMDEVICES
  tcp: fix listening_get_next()
  atl1c: Do not use legacy PCI power management
  mac80211: fix mesh forwarding
  MAINTAINERS: email address change
  net: Fix range checks in tcf_valid_offset().
  net_sched: sch_sfq: fix allot handling
  hostap: remove netif_stop_queue from init
  mac80211/rt2x00: add ieee80211_tx_status_ni()
  typhoon: memory corruption in typhoon_get_drvinfo()
  net: Add USB PID for new MOSCHIP USB ethernet controller MCS7832 variant
  net_sched: always clone skbs
  ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed.
  netlink: fix gcc -Wconversion compilation warning
  asix: add USB ID for Logitec LAN-GTJ U2A
  ...
2010-12-26 12:06:56 -08:00
Eric Dumazet
fc75fc8339 ipv4: dont create routes on down devices
In ip_route_output_slow(), instead of allowing a route to be created on
a not UPed device, report -ENETUNREACH immediately.

# ip tunnel add mode ipip remote 10.16.0.164 local
10.16.0.72 dev eth0
# (Note : tunl1 is down)
# ping -I tunl1 10.1.2.3
PING 10.1.2.3 (10.1.2.3) from 192.168.18.5 tunl1: 56(84) bytes of data.
(nothing)
# ./a.out tunl1
# ip tunnel del tunl1
Message from syslogd@shelby at Dec 22 10:12:08 ...
  kernel: unregister_netdevice: waiting for tunl1 to become free.
Usage count = 3

After patch:
# ping -I tunl1 10.1.2.3
connect: Network is unreachable

Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-25 20:05:31 -08:00
Tejun Heo
7f6b0db9f6 net/dsa: don't use flush_scheduled_work()
flush_scheduled_work() is deprecated and scheduled to be removed.
Directly flush dst->link_poll_work on remove instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
2010-12-24 15:59:06 +01:00
David S. Miller
e058464990 Revert "ipv4: Allow configuring subnets as local addresses"
This reverts commit 4465b46900.

Conflicts:

	net/ipv4/fib_frontend.c

As reported by Ben Greear, this causes regressions:

> Change 4465b46900 caused rules
> to stop matching the input device properly because the
> FLOWI_FLAG_MATCH_ANY_IIF is always defined in ip_dev_find().
>
> This breaks rules such as:
>
> ip rule add pref 512 lookup local
> ip rule del pref 0 lookup local
> ip link set eth2 up
> ip -4 addr add 172.16.0.102/24 broadcast 172.16.0.255 dev eth2
> ip rule add to 172.16.0.102 iif eth2 lookup local pref 10
> ip rule add iif eth2 lookup 10001 pref 20
> ip route add 172.16.0.0/24 dev eth2 table 10001
> ip route add unreachable 0/0 table 10001
>
> If you had a second interface 'eth0' that was on a different
> subnet, pinging a system on that interface would fail:
>
>   [root@ct503-60 ~]# ping 192.168.100.1
>   connect: Invalid argument

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-23 12:03:57 -08:00
David S. Miller
a130883d95 Merge branch 'for-davem' of ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-12-23 10:13:30 -08:00
Dan Rosenberg
fdac1e0697 irda: prevent integer underflow in IRLMP_ENUMDEVICES
If the user-provided len is less than the expected offset, the
IRLMP_ENUMDEVICES getsockopt will do a copy_to_user() with a very large
size value.  While this isn't be a security issue on x86 because it will
get caught by the access_ok() check, it may leak large amounts of kernel
heap on other architectures.  In any event, this patch fixes it.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-23 10:09:43 -08:00
Jiri Kosina
d9f4fbaf70 tcp: cleanup of cwnd initialization in tcp_init_metrics()
Commit 86bcebafc5 ("tcp: fix >2 iw selection") fixed a case
when congestion window initialization has been mistakenly omitted
by introducing cwnd label and putting backwards goto from the
end of the function.

This makes the code unnecessarily tricky to read and understand
on a first sight.

Shuffle the code around a little bit to make it more obvious.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-23 09:54:26 -08:00
Eric Dumazet
1bde5ac493 tcp: fix listening_get_next()
Alexey Vlasov found /proc/net/tcp could sometime loop and display
millions of sockets in LISTEN state.

In 2.6.29, when we converted TCP hash tables to RCU, we left two
sk_next() calls in listening_get_next().

We must instead use sk_nulls_next() to properly detect an end of chain.

Reported-by: Alexey Vlasov <renton@renton.name>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-23 09:32:46 -08:00
David S. Miller
b7e03ec9a6 Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-12-22 17:34:40 -08:00
Gustavo F. Padovan
17f9cc3124 Bluetooth: Improve handling of HCI control channel in bind
Does not allow any channel different of HCI_CHANNEL_RAW and
HCI_CHANNEL_CONTROL to bind.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-22 23:00:34 -02:00
Johan Hedberg
23bb57633d Bluetooth: Fix __hci_request synchronization for hci_open_dev
The initialization function used by hci_open_dev (hci_init_req) sends
many different HCI commands. The __hci_request function should only
return when all of these commands have completed (or a timeout occurs).
Several of these commands cause hci_req_complete to be called which
causes __hci_request to return prematurely.

This patch fixes the issue by adding a new hdev->req_last_cmd variable
which is set during the initialization procedure. The hci_req_complete
function will no longer mark the request as complete until the command
matching hdev->req_last_cmd completes.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-22 22:58:07 -02:00
Johan Hedberg
c71e97bfaa Bluetooth: Add management events for controller addition & removal
This patch adds Bluetooth Management interface events for controller
addition and removal. The events correspond to the existing HCI_DEV_REG
and HCI_DEV_UNREG stack internal events.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-22 22:58:00 -02:00
Johan Hedberg
f7b64e69c7 Bluetooth: Add read_info management command
This patch implements the read_info command which is used to fetch basic
info about an adapter.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-22 22:57:51 -02:00
Johan Hedberg
faba42eb2a Bluetooth: Add read_index_list management command
This patch implements the read_index_list command through which
userspace can get a list of current adapter indices.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-22 22:57:44 -02:00
Johan Hedberg
02d981292a Bluetooth: Add read_version management command
This patch implements the initial read_version command that userspace
will use before any other management interface operations.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-22 22:57:37 -02:00
Johan Hedberg
e41d8b4e13 Bluetooth: Add error handling for managment command handlers
The command handlers for bluetooth management messaging should be able
to report errors (such as memory allocation failures) to the higher
levels in the call stack.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-22 22:56:56 -02:00
Johannes Berg
172128468f mac80211: cleanup select_queue
There's a redundant rcu_read_lock/unlock pair, a
redundant variable, and a few redundant accesses
to the 1d_to_ac array. Fix this to make the code
neater and easier to follow.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-22 15:44:22 -05:00
Eric Dumazet
ee09b3c1cf sfq: fix sfq class stats handling
sfq_walk() runs without qdisc lock. By the time it selects a non empty
hash slot and sfq_dump_class_stats() is run (with lock held), slot might
have been freed : We then access q->slots[SFQ_EMPTY_SLOT], out of
bounds, and crash in slot_queue_walk()

On previous kernels, bug is here but out of bounds qs[SFQ_DEPTH] and
allot[SFQ_DEPTH] are located in struct sfq_sched_data, so no illegal
memory access happens, only possibly wrong data reported to user.

Also, slot_dequeue_tail() should make sure slot skb chain is correctly
terminated, or sfq_dump_class_stats() can access freed skbs.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-22 11:39:59 -08:00
Luciano Coelho
65a6538a56 mac80211: check for CONFIG_MAC80211_LEDS in the tpt_led_trigger declaration
If CONFIG_MAC80211_LEDS is not set, ieee80211_i.h was failing to compile,
because struct led_trigger is only declared when CONFIG_LEDS_TRIGGERS is
set.

This patch adds ifdefs around the tpt_led_trigger declaration in
ieee80211_i.h to avoid the problem.

Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-22 14:36:05 -05:00
Johannes Berg
67408c8c7b mac80211: selective throughput LED trigger active
The throughput LED trigger was always active when
the radio was enabled. In most cases that's likely
the desired behaviour, but iwlwifi requires it to
be only active when one of the virtual interfaces
is actually "connected" in some way.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-22 14:33:37 -05:00
Johannes Berg
e1e5406854 mac80211: add throughput based LED blink trigger
iwlwifi and other drivers like to blink their LED
based on throughput. Implement this generically in
mac80211, based on a throughput table the driver
specifies. That way, drivers can set the blink
frequencies depending on their desired behaviour
and max throughput.

All the drivers need to do is provide an LED class
device, best with blink hardware offload.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-22 14:33:37 -05:00
Johannes Berg
fe67c913f1 mac80211: make LED trigger names available early
The throughput trigger will require doing LED
classdev/trigger handling before register_hw(),
so drivers should have access to the trigger
names before it. If trigger registration fails,
this will still make the trigger name available,
but that's not a big problem since the default
trigger will the simply not be found.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-22 14:33:37 -05:00
John W. Linville
63e35cd9bd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-1000.c
	drivers/net/wireless/iwlwifi/iwl-6000.c
	drivers/net/wireless/iwlwifi/iwl-core.h
2010-12-22 14:27:21 -05:00
Johannes Berg
b51aff057c mac80211: fix mesh forwarding
Under memory pressure, the mac80211 mesh code
may helpfully print a message that it failed
to clone a mesh frame and then will proceed
to crash trying to use it anyway. Fix that.

Cc: stable@kernel.org [2.6.27+]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-22 13:36:35 -05:00
Jiri Kosina
4b7bd36470 Merge branch 'master' into for-next
Conflicts:
	MAINTAINERS
	arch/arm/mach-omap2/pm24xx.c
	drivers/scsi/bfa/bfa_fcpim.c

Needed to update to apply fixes for which the old branch was too
outdated.
2010-12-22 18:57:02 +01:00
Eric Dumazet
12b16dadbc filter: optimize accesses to ancillary data
We can translate pseudo load instructions at filter check time to
dedicated instructions to speed up filtering and avoid one switch().
libpcap currently uses SKF_AD_PROTOCOL, but custom filters probably use
other ancillary accesses.

Note : I made the assertion that ancillary data was always accessed with
BPF_LD|BPF_?|BPF_ABS instructions, not with BPF_LD|BPF_?|BPF_IND ones
(offset given by K constant, not by K + X register)

On x86_64, this saves a few bytes of text :

# size net/core/filter.o.*
   text	   data	    bss	    dec	    hex	filename
   4864	      0	      0	   4864	   1300	net/core/filter.o.new
   4944	      0	      0	   4944	   1350	net/core/filter.o.old

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-21 12:30:12 -08:00
Eric Dumazet
70978182d4 net: timestamp cloned packet in dev_queue_xmit_nit
Le vendredi 17 décembre 2010 à 10:26 +0100, Eric Dumazet a écrit :

>
> I think we can add this after latest Changli patch :
>
> He does one skb_clone() before calling the sniffers.
> We could set timestamp on this clone, instead of original skb.
>
> Problem solved.
>

[PATCH net-next-2.6] net: timestamp cloned packet in dev_queue_xmit_nit

Now we do one clone of skb if at least one sniffer might take packet,
we also can do the skb timestamping on the clone and let original packet
unchanged.

This is a generalization of commit 8caf153974 (net: sch_netem: Fix an
inconsistency in ingress netem timestamps.)

This way, we can have a good idea when packets are delivered to our
stack (tcpdump -i ifb0), while a tcpdump on original device gives
timestamps right before ingressing.

This also speedup our stack, avoiding taking timestamps if not needed.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Changli Gao <xiaosuo@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-21 10:50:38 -08:00
Joe Perches
b3bcedadf2 net/sunrpc/clnt.c: Convert sprintf_symbol to %ps
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-21 11:51:26 -05:00
Nandita Dukkipati
356f039822 TCP: increase default initial receive window.
This patch changes the default initial receive window to 10 mss
(defined constant). The default window is limited to the maximum
of 10*1460 and 2*mss (when mss > 1460).

draft-ietf-tcpm-initcwnd-00 is a proposal to the IETF that recommends
increasing TCP's initial congestion window to 10 mss or about 15KB.
Leading up to this proposal were several large-scale live Internet
experiments with an initial congestion window of 10 mss (IW10), where
we showed that the average latency of HTTP responses improved by
approximately 10%. This was accompanied by a slight increase in
retransmission rate (0.5%), most of which is coming from applications
opening multiple simultaneous connections. To understand the extreme
worst case scenarios, and fairness issues (IW10 versus IW3), we further
conducted controlled testbed experiments. We came away finding minimal
negative impact even under low link bandwidths (dial-ups) and small
buffers.  These results are extremely encouraging to adopting IW10.

However, an initial congestion window of 10 mss is useless unless a TCP
receiver advertises an initial receive window of at least 10 mss.
Fortunately, in the large-scale Internet experiments we found that most
widely used operating systems advertised large initial receive windows
of 64KB, allowing us to experiment with a wide range of initial
congestion windows. Linux systems were among the few exceptions that
advertised a small receive window of 6KB. The purpose of this patch is
to fix this shortcoming.

References:
1. A comprehensive list of all IW10 references to date.
http://code.google.com/speed/protocols/tcpm-IW10.html

2. Paper describing results from large-scale Internet experiments with IW10.
http://ccr.sigcomm.org/drupal/?q=node/621

3. Controlled testbed experiments under worst case scenarios and a
fairness study.
http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf

4. Raw test data from testbed experiments (Linux senders/receivers)
with initial congestion and receive windows of both 10 mss.
http://research.csc.ncsu.edu/netsrv/?q=content/iw10

5. Internet-Draft. Increasing TCP's Initial Window.
https://datatracker.ietf.org/doc/draft-ietf-tcpm-initcwnd/

Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-20 21:33:00 -08:00
Eric Dumazet
eda83e3b63 net_sched: sch_sfq: better struct layouts
Here is a respin of patch.

I'll send a short patch to make SFQ more fair in presence of large
packets as well.

Thanks

[PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts

This patch shrinks sizeof(struct sfq_sched_data)
from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and
reduce text size as well.

   text    data     bss     dec     hex filename
   4821     152       0    4973    136d old/net/sched/sch_sfq.o
   4627     136       0    4763    129b new/net/sched/sch_sfq.o

All data for a slot/flow is now grouped in a compact and cache friendly
structure, instead of being spreaded in many different points.

struct sfq_slot {
        struct sk_buff  *skblist_next;
        struct sk_buff  *skblist_prev;
        sfq_index       qlen; /* number of skbs in skblist */
        sfq_index       next; /* next slot in sfq chain */
        struct sfq_head dep; /* anchor in dep[] chains */
        unsigned short  hash; /* hash value (index in ht[]) */
        short           allot; /* credit for this slot */
};

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jarek Poplawski <jarkao2@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-20 21:32:59 -08:00
Linus Torvalds
9d5004fcf6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: handle partial result from get_user_pages
  ceph: mark user pages dirty on direct-io reads
  ceph: fix null pointer dereference in ceph_init_dentry for nfs reexport
  ceph: fix direct-io on non-page-aligned buffers
  ceph: fix msgr_init error path
2010-12-20 21:32:20 -08:00
David S. Miller
d9993be65a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-12-20 13:24:14 -08:00
Eric Dumazet
aa3e219997 net_sched: sch_sfq: fix allot handling
When deploying SFQ/IFB here at work, I found the allot management was
pretty wrong in sfq, even changing allot from short to int...

We should init allot for each new flow, not using a previous value found
in slot.

Before patch, I saw bursts of several packets per flow, apparently
denying the default "quantum 1514" limit I had on my SFQ class.

class sfq 11:1 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 7p requeues 0 
 allot 11546 

class sfq 11:46 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 1p requeues 0 
 allot -23873 

class sfq 11:78 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 5p requeues 0 
 allot 11393 

After patch, better fairness among each flow, allot limit being
respected, allot is positive :

class sfq 11:e parent 11: 
 (dropped 0, overlimits 0 requeues 86) 
 backlog 0b 3p requeues 86 
 allot 596 

class sfq 11:94 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 3p requeues 0 
 allot 1468 

class sfq 11:a4 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 4p requeues 0 
 allot 650 

class sfq 11:bb parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 3p requeues 0 
 allot 596 

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-20 13:18:16 -08:00
Eric Dumazet
c426626324 net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()
We currently return for each active SFQ slot the number of packets in
queue. We can also give number of bytes accounted for these packets.

tc -s class show dev ifb0

Before patch :

class sfq 11:3d9 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 3p requeues 0
 allot 1266

After patch :

class sfq 11:3e4 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 4380b 3p requeues 0
 allot 1212

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-20 13:13:56 -08:00
Felix Fietkau
f8a0a78148 mac80211: fix potentially redundant skb data copying
When an skb is shared, it needs to be duplicated, along with its data buffer.
If the skb does not have enough headroom, using skb_copy might cause the data
buffer to be copied twice (once by skb_copy and once by pskb_expand_head).
Fix this by using skb_clone initially and letting ieee80211_skb_resize sort
out the rest.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-20 14:52:17 -05:00
Felix Fietkau
4cd06a344d mac80211: skip unnecessary pskb_expand_head calls
If the skb is not cloned and we don't need any extra headroom, there
is no point in reallocating the skb head.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-20 14:52:17 -05:00
Felix Fietkau
489ee9195a mac80211: fix initialization of skb->cb in ieee80211_subif_start_xmit
The change 'mac80211: Fix BUG in pskb_expand_head when transmitting shared skbs'
added a check for copying the skb if it's shared, however the tx info variable
still points at the cb of the old skb

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Acked-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-20 14:52:17 -05:00
Javier Cardona
61ad539459 mac80211: Remove unused third address from mesh address extension header.
The Mesh Control header only includes 0, 1 or 2 addresses. If there is
one address, it should be interpreted as Address 4.  If there are 2,
they are interpreted as Addresses 5 and 6 (Address 4 being the 4th
address in the 802.11 header).

The address extension used to hold up to 3 addresses instead of the current 2.
I'm not sure which draft version changed this, but it is very unlikely that it
will change again given the state of the approval process of this draft.  See
section 7.1.3.6.3 in current draft (8.0).

Also, note that the extra address that I'm removing was not being used, so this
change has no effect on over-the-air frame formats.  But I thought I better
remove it before someone does start using it.

Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-20 14:49:47 -05:00
Bruno Randolf
39fd5de447 nl80211: Export available antennas
Export the information which antennas are available for configuration as TX or
RX antennas via nl80211.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-20 14:49:47 -05:00
Bruno Randolf
7f531e03ab cfg80211: Separate available antennas for RX and TX
As has been pointed out by Daniel Halperin some devices (e.g. Intel IWL5100)
can only TX from a subset of RX antennas, so use separate availability masks
for RX and TX.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-20 14:46:58 -05:00
Javier Cardona
c7108a7111 mac80211: Send mesh non-HWMP path selection frames to userspace
Let path selection frames for protocols other than HWMP be sent to
userspace via NL80211_CMD_REGISTER_FRAME.  Also allow userspace to send
and receive mesh path selection frames.

Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-20 14:46:58 -05:00
Javier Cardona
c80d545da3 mac80211: Let userspace enable and configure vendor specific path selection.
Userspace will now be allowed to toggle between the default path
selection algorithm (HWMP, implemented in the kernel), and a vendor
specific alternative.  Also in the same patch, allow userspace to add
information elements to mesh beacons.  This is accordance with the
Extensible Path Selection Framework specified in version 7.0 of the
802.11s draft.

Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-20 14:46:57 -05:00
Javier Cardona
24bdd9f4c9 mac80211: Rename mesh_params to mesh_config to prepare for mesh_setup
Mesh parameters can be to setup a mesh or to configure it.
This patch renames the ambiguous name mesh_params to mesh_config
in preparation for mesh_setup.

Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-20 14:46:57 -05:00
David S. Miller
6561a3b12d ipv4: Flush per-ns routing cache more sanely.
Flush the routing cache only of entries that match the
network namespace in which the purge event occurred.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-12-20 10:37:19 -08:00
Sven Eckelmann
53320fe3bb batman-adv: Return hna count on local buffer fill
hna_local_fill_buffer must return the number of added hna entries and
not the last checked hash bucket.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-20 10:32:03 -08:00
Joe Perches
58231186c4 pktgen: Remove unnecessary prefix from pr_<level>
Remove "pktgen: " prefix string from one pr_info.
pr_fmt adds it, so this is a duplicate.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-20 10:30:06 -08:00
Shan Wei
4c306a9291 net: kill unused macros
These macros never be used, so remove them.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-19 21:59:35 -08:00
Changli Gao
71d9dec24d net: increase skb->users instead of skb_clone()
In dev_queue_xmit_nit(), we have to clone skbs as we need to mangle skbs,
however, we don't need to clone skbs for all the packet_types.

Except for the first packet_type, we increase skb->users instead of
skb_clone().

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-19 21:50:20 -08:00
David Stevens
ad0081e43a ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed.
This patch modifies IPsec6 to fragment IPv6 packets that are
locally generated as needed.

This version of the patch only fragments in tunnel mode, so that fragment
headers will not be obscured by ESP in transport mode.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-19 20:22:23 -08:00
stephen hemminger
d1ed113f16 ipv6: remove duplicate neigh_ifdown
When device is being set to down, neigh_ifdown was being called
twice. Once from addrconf notifier and once from ndisc notifier.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-18 22:01:17 -08:00
stephen hemminger
bc3ef6605e ipv6: fib6_ifdown cleanup
Remove (unnecessary) casts to make code cleaner.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-18 22:01:16 -08:00
Joe Perches
f3c0ceea83 net/sunrpc/auth_gss/gss_krb5_crypto.c: Use normal negative error value return
And remove unnecessary double semicolon too.

No effect to code, as test is != 0.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-12-17 15:48:22 -05:00
Shan Wei
66c941f4aa net: sunrpc: kill unused macros
These macros never be used for several years.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-12-17 15:48:21 -05:00
NeilBrown
3942302ea9 sunrpc: svc_sock_names should hold ref to socket being closed.
Currently svc_sock_names calls svc_close_xprt on a svc_sock to
which it does not own a reference.
As soon as svc_close_xprt sets XPT_CLOSE, the socket could be
freed by a separate thread (though this is a very unlikely race).

It is safer to hold a reference while calling svc_close_xprt.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-12-17 15:48:19 -05:00
NeilBrown
7c96aef759 sunrpc: remove xpt_pool
The xpt_pool field is only used for reporting BUGs.
And it isn't used correctly.

In particular, when it is cleared in svc_xprt_received before
XPT_BUSY is cleared, there is no guarantee that either the
compiler or the CPU might not re-order to two assignments, just
setting xpt_pool to NULL after XPT_BUSY is cleared.

If a different cpu were running svc_xprt_enqueue at this moment,
it might see XPT_BUSY clear and then xpt_pool non-NULL, and
so BUG.

This could be fixed by calling
  smp_mb__before_clear_bit()
before the clear_bit.  However as xpt_pool isn't really used,
it seems safest to simply remove xpt_pool.

Another alternate would be to change the clear_bit to
clear_bit_unlock, and the test_and_set_bit to test_and_set_bit_lock.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-12-17 15:48:18 -05:00
David S. Miller
b4aa9e05a6 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x/bnx2x.h
	drivers/net/wireless/iwlwifi/iwl-1000.c
	drivers/net/wireless/iwlwifi/iwl-6000.c
	drivers/net/wireless/iwlwifi/iwl-core.h
	drivers/vhost/vhost.c
2010-12-17 12:27:22 -08:00
J. Bruce Fields
ec66ee3797 Merge commit 'v2.6.37-rc6' into for-2.6.38 2010-12-17 13:29:07 -05:00
Henry C Chang
361cf40519 ceph: handle partial result from get_user_pages
The get_user_pages() helper can return fewer than the requested pages.
Error out in that case, and clean up the partial result.

Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-17 09:55:59 -08:00
Henry C Chang
b6aa5901c7 ceph: mark user pages dirty on direct-io reads
For read operation, we have to set the argument _write_ of get_user_pages
to 1 since we will write data to pages. Also, we need to SetPageDirty before
releasing these pages.

Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-17 09:54:40 -08:00
stephen hemminger
29ba5fed1b ipv6: don't flush routes when setting loopback down
When loopback device is being brought down, then keep the route table
entries because they are special. The entries in the local table for
linklocal routes and ::1 address should not be purged.

This is a sub optimal solution to the problem and should be replaced
by a better fix in future.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16 18:26:26 -08:00
Wei Yongjun
7d743b7e95 sctp: fix the return value of getting the sctp partial delivery point
Get the sctp partial delivery point using SCTP_PARTIAL_DELIVERY_POINT
socket option should return 0 if success, not -ENOTSUPP.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16 14:48:44 -08:00
Michał Mirosław
55508d601d net: Use skb_checksum_start_offset()
Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset().

Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb).

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16 14:43:14 -08:00
David Stevens
76d661586c bridge: fix IPv6 queries for bridge multicast snooping
This patch fixes a missing ntohs() for bridge IPv6 multicast snooping.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16 14:41:23 -08:00
Octavian Purdila
fcbdf09d96 net: fix nulls list corruptions in sk_prot_alloc
Special care is taken inside sk_port_alloc to avoid overwriting
skc_node/skc_nulls_node. We should also avoid overwriting
skc_bind_node/skc_portaddr_node.

The patch fixes the following crash:

 BUG: unable to handle kernel paging request at fffffffffffffff0
 IP: [<ffffffff812ec6dd>] udp4_lib_lookup2+0xad/0x370
 [<ffffffff812ecc22>] __udp4_lib_lookup+0x282/0x360
 [<ffffffff812ed63e>] __udp4_lib_rcv+0x31e/0x700
 [<ffffffff812bba45>] ? ip_local_deliver_finish+0x65/0x190
 [<ffffffff812bbbf8>] ? ip_local_deliver+0x88/0xa0
 [<ffffffff812eda35>] udp_rcv+0x15/0x20
 [<ffffffff812bba45>] ip_local_deliver_finish+0x65/0x190
 [<ffffffff812bbbf8>] ip_local_deliver+0x88/0xa0
 [<ffffffff812bb2cd>] ip_rcv_finish+0x32d/0x6f0
 [<ffffffff8128c14c>] ? netif_receive_skb+0x99c/0x11c0
 [<ffffffff812bb94b>] ip_rcv+0x2bb/0x350
 [<ffffffff8128c14c>] netif_receive_skb+0x99c/0x11c0

Signed-off-by: Leonard Crestez <lcrestez@ixiacom.com>
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16 14:26:56 -08:00
Octavian Purdila
443457242b net: factorize sync-rcu call in unregister_netdevice_many
Add dev_close_many and dev_deactivate_many to factorize another
sync-rcu operation on the netdevice unregister path.

$ modprobe dummy numdummies=10000
$ ip link set dev dummy* up
$ time rmmod dummy

Without the patch           With the patch

real    0m 24.63s           real    0m 5.15s
user    0m 0.00s            user    0m 0.00s
sys     0m 6.05s            sys     0m 5.14s

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16 14:04:44 -08:00
Sven Eckelmann
c6c8fea297 net: Add batman-adv meshing protocol
B.A.T.M.A.N. (better approach to mobile ad-hoc networking) is a routing
protocol for multi-hop ad-hoc mesh networks. The networks may be wired or
wireless. See http://www.open-mesh.org/ for more information and user space
tools.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16 13:44:24 -08:00
Changli Gao
b236da6931 net: use NUMA_NO_NODE instead of the magic number -1
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16 13:16:06 -08:00
Vladislav Zolotarov
a3d22a68d7 bnx2x: Take the distribution range definition out of skb_tx_hash()
Move the calcualation of the Tx hash for a given hash range into a separate
function and define the skb_tx_hash(), which calculates a Tx hash for a
[0; dev->real_num_tx_queues - 1] hash values range, using this
function (__skb_tx_hash()).

Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16 13:15:53 -08:00
Andrey Vagin
d3052b557a ipv6: delete expired route in ip6_pmtu_deliver
The first big packets sent to a "low-MTU" client correctly
triggers the creation of a temporary route containing the reduced MTU.

But after the temporary route has expired, new ICMP6 "packet too big"
will be sent, rt6_pmtu_discovery will find the previous EXPIRED route
check that its mtu isn't bigger then in icmp packet and do nothing
before the temporary route will not deleted by gc.

I make the simple experiment:
while :; do
    time ( dd if=/dev/zero bs=10K count=1 | ssh hostname dd of=/dev/null ) || break;
done

The "time" reports real 0m0.197s if a temporary route isn't expired, but
it reports real 0m52.837s (!!!!) immediately after a temporare route has
expired.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16 12:28:13 -08:00
Luis R. Rodriguez
2784fe915c cfg80211: fix null pointer dereference with a custom regulatory request
Once we moved the core regulatory request to the queue and let
the scheduler process it last_request will have been left NULL
until the schedular decides to process the first request. When
this happens and we are loading a driver with a custom regulatory
request like all Atheros drivers we end up with a NULL pointer
dereference. We fix this by checking if the request was a
custom one.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [<ffffffffa016de87>] freq_reg_info_regd.clone.2+0x27/0x130 [cfg80211]
PGD 71f91067 PUD 712b2067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1d.7/usb2/2-1/firmware/2-1/loading
CPU 0
Modules linked in: ath9k_htc(+) ath9k_common ath9k_hw ath <etc>
Pid: 3094, comm: insmod Tainted: G        W   2.6.37-rc5-wl #16 INVALID/28427ZQ
RIP: 0010:[<ffffffffa016de87>]  [<ffffffffa016de87>] freq_reg_info_regd.clone.2+0x27/0x130 [cfg80211]
RSP: 0018:ffff88007045db78  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffffa047d9a0 RCX: ffff88007045dbd0
RDX: 0000000000004e20 RSI: 000000000024cde0 RDI: ffff8800700483e0
RBP: ffff88007045db98 R08: ffffffffa02f5b40 R09: 0000000000000001
R10: 000000000000000e R11: 0000000000000001 R12: 0000000000000000
R13: ffff88007004e3b0 R14: 0000000000000000 R15: ffff880070048340
FS:  00007f635a707700(0000) GS:ffff880077400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000004 CR3: 00000000708a9000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process insmod (pid: 3094, threadinfo ffff88007045c000, task ffff8800713e3ec0)
Stack:
 ffffffffa047d9a0 0000000000000000 ffff88007004e3b0 0000000000000000
 ffff88007045dc08 ffffffffa016e147 000000007045dc08 0000000000000002
 ffff8800700483e0 ffffffffa02f5b40 ffff88007045dbd8 0000000000000000
Call Trace:
 [<ffffffffa016e147>] wiphy_apply_custom_regulatory+0x137/0x1d0 [cfg80211]
 [<ffffffffa047a690>] ? ath9k_reg_notifier+0x0/0x50 [ath9k_htc]
 [<ffffffffa02f47f7>] ath_regd_init+0x347/0x430 [ath]
 [<ffffffffa047b1f5>] ath9k_htc_probe_device+0x6c5/0x960 [ath9k_htc]
 [<ffffffffa0472a2c>] ath9k_htc_hw_init+0xc/0x30 [ath9k_htc]
 [<ffffffffa04747e6>] ath9k_hif_usb_probe+0x216/0x3b0 [ath9k_htc]
 [<ffffffffa03bb6bc>] usb_probe_interface+0x10c/0x210 [usbcore]
 [<ffffffff812aec26>] driver_probe_device+0x96/0x1c0
 [<ffffffff812aedf3>] __driver_attach+0xa3/0xb0
 [<ffffffff812aed50>] ? __driver_attach+0x0/0xb0
 [<ffffffff812adaae>] bus_for_each_dev+0x5e/0x90
 [<ffffffff812ae8c9>] driver_attach+0x19/0x20
 [<ffffffff812ae438>] bus_add_driver+0x168/0x320
 [<ffffffff812af071>] driver_register+0x71/0x140
 [<ffffffff811fc4a8>] ? __raw_spin_lock_init+0x38/0x70
 [<ffffffffa03ba39c>] usb_register_driver+0xdc/0x190 [usbcore]
 [<ffffffffa03a2000>] ? ath9k_htc_init+0x0/0x4f [ath9k_htc]
 [<ffffffffa047499e>] ath9k_hif_usb_init+0x1e/0x20 [ath9k_htc]
 [<ffffffffa03a202b>] ath9k_htc_init+0x2b/0x4f [ath9k_htc]
 [<ffffffff8100212f>] do_one_initcall+0x3f/0x180
 [<ffffffff8109ef5b>] sys_init_module+0xbb/0x200
 [<ffffffff8100bf52>] system_call_fastpath+0x16/0x1b
Code: <etc, who cares>
RIP  [<ffffffffa016de87>] freq_reg_info_regd.clone.2+0x27/0x130 [cfg80211]
 RSP <ffff88007045db78>
CR2: 0000000000000004
---[ end trace 79e4193601c8b713 ]---

Reported-by: Sujith Manoharan <Sujith.Manoharan@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-16 15:22:31 -05:00
Jouni Malinen
cf4e594ea7 nl80211: Add notification for dropped Deauth/Disassoc
Add a new notification to indicate that a received, unprotected
Deauthentication or Disassociation frame was dropped due to
management frame protection being in use. This notification is
needed to allow user space (e.g., wpa_supplicant) to implement
SA Query procedure to recover from association state mismatch
between an AP and STA.

This is needed to avoid getting stuck in non-working state when MFP
(IEEE 802.11w) is used and a protected Deauthentication or
Disassociation frame is dropped for any reason. After that, the
station would silently discard any unprotected Deauthentication or
Disassociation frame that could be indicating that the AP does not
have association for the STA (when the Reason Code would be 6 or 7).
IEEE Std 802.11w-2009, 11.13 describes this recovery mechanism.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-16 15:22:30 -05:00
Chuck Lever
bf2695516d SUNRPC: New xdr_streams XDR decoder API
Now that all client-side XDR decoder routines use xdr_streams, there
should be no need to support the legacy calling sequence [rpc_rqst *,
__be32 *, RPC res *] anywhere.  We can construct an xdr_stream in the
generic RPC code, instead of in each decoder function.

This is a refactoring change.  It should not cause different behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:25 -05:00
Chuck Lever
9f06c719f4 SUNRPC: New xdr_streams XDR encoder API
Now that all client-side XDR encoder routines use xdr_streams, there
should be no need to support the legacy calling sequence [rpc_rqst *,
__be32 *, RPC arg *] anywhere.  We can construct an xdr_stream in the
generic RPC code, instead of in each encoder function.

Also, all the client-side encoder functions return 0 now, making a
return value superfluous.  Take this opportunity to convert them to
return void instead.

This is a refactoring change.  It should not cause different behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:25 -05:00
Chuck Lever
1ac7c23e4a SUNRPC: Determine value of "nrprocs" automatically
Clean up.

Just fixed a panic where the nrprocs field in a different upper layer
client was set by hand incorrectly.  Use the compiler-generated method
used by the other upper layer protocols.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:25 -05:00
Chuck Lever
4129ccf303 SUNRPC: Avoid return code checking in rpcbind XDR encoder functions
Clean up.

The trend in the other XDR encoder functions is to BUG() when encoding
problems occur, since a problem here is always due to a local coding
error.  Then, instead of a status, zero is unconditionally returned.

Update the rpcbind XDR encoders to behave this way.

To finish the update, use the new-style be32_to_cpup() and
cpu_to_be32() macros, and compute the buffer sizes using raw integers
instead of sizeof().  This matches the conventions used in other XDR
functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:25 -05:00
KOVACS Krisztian
ae90bdeaea netfilter: fix compilation when conntrack is disabled but tproxy is enabled
The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
failed to update the #ifdef stanzas guarding the defragmentation related
fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.

This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
without connection tracking.

Original report:
http://marc.info/?l=linux-netdev&m=129010118516341&w=2

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-12-15 23:53:41 +01:00
Shan Wei
38cd6b4f52 wireless:mac80211: kill unuse macro MESH_CFG_CMP_LEN in mesh.h
Commit 00d3f14c has removed the references of this macro,
but left it only. So remove this definition.

commit 00d3f14cf9
Author: Johannes Berg <johannes@sipsolutions.net>
Date:   Tue Feb 10 21:26:00 2009 +0100

    mac80211: use cfg80211s BSS infrastructure

    Remove all the code from mac80211 to keep track of BSSes
    and use the cfg80211-provided code completely.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-15 17:04:08 -05:00
Sujith Manoharan
bd2ce6e43f mac80211: Add timeout to BA session start API
Allow drivers or rate control algorithms to specify BlockAck session
timeout when initiating an ADDBA transaction. This is useful in cases
where maintaining persistent BA sessions does not incur any overhead.

The current timeout value of 5000 TUs is retained for all non ath9k/ath9k_htc
drivers.

Signed-off-by: Sujith Manoharan <Sujith.Manoharan@atheros.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-15 17:03:59 -05:00
Johannes Berg
a293911d4f nl80211: advertise maximum remain-on-channel duration
With the upcoming hardware offload implementation,
some devices will have a different maximum duration
for the remain-on-channel command. Advertise the
maximum duration in mac80211, and make mac80211 set
it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-15 17:03:56 -05:00
John W. Linville
1fcfe76a76 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-1000.c
	drivers/net/wireless/iwlwifi/iwl-6000.c
	drivers/net/wireless/iwlwifi/iwl-core.h
2010-12-15 16:33:28 -05:00
David S. Miller
82cc4f5cb8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-12-15 09:43:13 -08:00
Tejun Heo
afe2c511fb workqueue: convert cancel_rearming_delayed_work[queue]() users to cancel_delayed_work_sync()
cancel_rearming_delayed_work[queue]() has been superceded by
cancel_delayed_work_sync() quite some time ago.  Convert all the
in-kernel users.  The conversions are completely equivalent and
trivial.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: netdev@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alex Elder <aelder@sgi.com>
Cc: xfs-masters@oss.sgi.com
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: netfilter-devel@vger.kernel.org
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
2010-12-15 10:56:11 +01:00
Linus Torvalds
b4fe2a0342 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (75 commits)
  pppoe.c: Fix kernel panic caused by __pppoe_xmit
  WAN: Fix a TX IRQ causing BUG() in PC300 and PCI200SYN drivers.
  bnx2x: Advance a version number to 1.60.01-0
  bnx2x: Fixed a compilation warning
  bnx2x: LSO code was broken on BE platforms
  qlge: Fix deadlock when cancelling worker.
  net: fix skb_defer_rx_timestamp()
  cxgb4vf: Ingress Queue Entry Size needs to be 64 bytes
  phy: add the IC+ IP1001 driver
  atm: correct sysfs 'device' link creation and parent relationships
  MAINTAINERS: remove me from tulip
  SCTP: Fix SCTP_SET_PEER_PRIMARY_ADDR to accpet v4mapped address
  enic: Bug Fix: Pass napi reference to the isr that services receive queue
  ipv6: fix nl group when advertising a new link
  connector: add module alias
  net: Document the kernel_recvmsg() function
  r8169: Fix runtime power management
  hso: IP checksuming doesn't work on GE0301 option cards
  xfrm: Fix xfrm_state_migrate leak
  net: Convert netpoll blocking api in bonding driver to be a counter
  ...
2010-12-14 17:33:40 -08:00
David S. Miller
d33e455337 net: Abstract default MTU metric calculation behind an accessor.
Like RTAX_ADVMSS, make the default calculation go through a dst_ops
method rather than caching the computation in the routing cache
entries.

Now dst metrics are pretty much left as-is when new entries are
created, thus optimizing metric sharing becomes a real possibility.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-14 13:01:14 -08:00
David S. Miller
6389aa73ab Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-12-14 10:52:54 -08:00
Sage Weil
d96c9043d1 ceph: fix msgr_init error path
create_workqueue() returns NULL on failure.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-13 20:30:28 -08:00
David S. Miller
0dbaee3b37 net: Abstract default ADVMSS behind an accessor.
Make all RTAX_ADVMSS metric accesses go through a new helper function,
dst_metric_advmss().

Leave the actual default metric as "zero" in the real metric slot,
and compute the actual default value dynamically via a new dst_ops
AF specific callback.

For stacked IPSEC routes, we use the advmss of the path which
preserves existing behavior.

Unlike ipv4/ipv6, DecNET ties the advmss to the mtu and thus updates
advmss on pmtu updates.  This inconsistency in advmss handling
results in more raw metric accesses than I wish we ended up with.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-13 12:52:14 -08:00
Johannes Berg
207aba6018 mac80211: support IBSS RSN with SW crypto
When software crypto is used, mac80211 will
support IBSS RSN, it doesn't depend on the
driver in that case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-13 15:23:30 -05:00
Tim Harvey
91f44b0299 mac80211 default tx_last_beacon false (congestion)
The 802.11 spec states that the STA that generated the last Beacon frame shall
be the STA that response to a probe request.  This is important for congestion
reduction when a probe request is received - only 1 node in an adhoc BSS
will transmit a response.  While mac80211 drivers should provide the
tx_last_beacon function to report if they transmitted the last beacon many
do not.  As an attempt to reduce probe response congestion default this
to 0 such that a node not implementing this capability does not contribute
to unnecessary congestion.

In a modern medium sized office environment I see upwards of 100 probe
requests per second received at a given node from various hardware/OS/drivers
doing zeroconf 'active probing' as opposed to passively listening for beacons.
With a modest 10-node adhoc network consisting of drivers that do not implement
this tx_last_beacon feature, I have seen this result in the simultaneous xmit
of probe responses accumulating to 500 probe responses per second because of
collisions which brings the adhoc network to its knees as well as causes
needless congestion.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-13 15:23:29 -05:00
Johannes Berg
f7e0104c1a mac80211: support separate default keys
Add support for split default keys (unicast
and multicast) in mac80211.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-13 15:23:29 -05:00
Johannes Berg
dbd2fd656f cfg80211/nl80211: separate unicast/multicast default TX keys
Allow userspace to specify that a given key
is default only for unicast and/or multicast
transmissions. Only WEP keys are for both,
WPA/RSN keys set here are GTKs for multicast
only. For more future flexibility, allow to
specify all combiations.

Wireless extensions can only set both so use
nl80211; WEP keys (connect keys) must be set
as default for both (but 802.1X WEP is still
possible).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-13 15:23:28 -05:00
Johannes Berg
897bed8b43 mac80211: clean up RX key checks
Using the default key for "any key set" isn't
quite what we should do. It works, but with the
upcoming changes it makes life unnecessarily
complex, so do something better here and really
check for "any key".

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-13 15:23:28 -05:00
Sven Neumann
01123e2331 cfg80211: update information elements in cached BSS struct
When a cached BSS struct is updated because a new beacon was received,
the code replaces the cached information elements by the IEs from the
new beacon. However it did not update the pub.information_elements
and pub.len_information_elements fields leaving them either pointing
to the old beacon IEs or in an inconsistent state where the data is
replaced by the new beacon IEs but len_information_elements still has
its value from the first beacon.

Fix this by updating the information elements fields if they are
pointing to beacon IEs.

Signed-off-by: Sven Neumann <s.neumann@raumfeld.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-13 15:23:28 -05:00
Bruno Randolf
a7ffac9591 cfg80211: Add antenna availability information
Add a field to wiphy for the hardware to report the availble antennas for
configuration. Only if this is set to something bigger than zero, will the
anntenna configuration ops be executed.

Allthough this could be a simple number of antennas, I defined it as a bitmap
of antennas which are available for configuration, since it's more consistent
with the rest of the antenna API and there could be cases where the
hardware allows only configuration of certain antennas. As it does not make
much of a difference in size or normal usage, I think it's better to be able to
support this, in case the need arises.

The antenna configuration is now also checked against the availabe antennas and
rejected if it does not match.

Signed-off-by: Bruno Randolf <br1@einfach.org>

--
v3:	always apply available antenna mask (for "all" antennas case).

v2:	reject antenna configurations which don't match the available antennas
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-13 15:23:27 -05:00
John W. Linville
1d212aa96e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-12-13 15:20:45 -05:00
Eric Dumazet
249fab773d net: add limits to ip_default_ttl
ip_default_ttl should be between 1 and 255

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-13 12:16:14 -08:00
Herton Ronaldo Krzesinski
8808f64171 mac80211: avoid calling ieee80211_work_work unconditionally
On suspend, there might be usb wireless drivers which wrongly trigger
the warning in ieee80211_work_work. If an usb driver doesn't have a
suspend hook, the usb stack will disconnect the device. On disconnect,
a mac80211 driver calls ieee80211_unregister_hw, which calls dev_close,
which calls ieee80211_stop, and in the end calls ieee80211_work_purge->
ieee80211_work_work.

The problem is that this call to ieee80211_work_purge comes after
mac80211 is suspended, triggering the warning even when we don't have
work queued in work_list (the expected case when already suspended),
because it always calls ieee80211_work_work.

So, just call ieee80211_work_work in ieee80211_work_purge if we really
have to abort work. This addresses the warning reported at
https://bugzilla.kernel.org/show_bug.cgi?id=24402

Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-13 14:55:08 -05:00
Tim Harvey
c926d006c1 mac80211: Fix NULL-pointer deference on ibss merge when not ready
dev_open will eventually call ieee80211_ibss_join which sets up the
skb used for beacons/probe-responses however it is possible to
receive beacons that attempt to merge before this occurs causing
a null pointer dereference.  Check ssid_len as that is the last
thing set in ieee80211_ibss_join.

This occurs quite easily in the presence of adhoc nodes with hidden SSID's

revised previous patch to check further up based on irc feedback

Signed-off-by: Tim Harvey <harvey.tim@gmail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-13 14:53:46 -05:00
John W. Linville
10c38c3306 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6 2010-12-13 14:41:23 -05:00
David S. Miller
323e126f0c ipv4: Don't pre-seed hoplimit metric.
Always go through a new ip4_dst_hoplimit() helper, just like ipv6.

This allowed several simplifications:

1) The interim dst_metric_hoplimit() can go as it's no longer
   userd.

2) The sysctl_ip_default_ttl entry no longer needs to use
   ipv4_doint_and_flush, since the sysctl is not cached in
   routing cache metrics any longer.

3) ipv4_doint_and_flush no longer needs to be exported and
   therefore can be marked static.

When ipv4_doint_and_flush_strategy was removed some time ago,
the external declaration in ip.h was mistakenly left around
so kill that off too.

We have to move the sysctl_ip_default_ttl declaration into
ipv4's route cache definition header net/route.h, because
currently net/ip.h (where the declaration lives now) has
a back dependency on net/route.h

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-12 22:08:17 -08:00
David S. Miller
a02e4b7dae ipv6: Demark default hoplimit as zero.
This is for consistency with ipv4.  Using "-1" makes
no sense.

It was made this way a long time ago merely to be consistent
with how the ipv6 socket hoplimit "default" is stored.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-12 21:39:02 -08:00
David S. Miller
5170ae824d net: Abstract RTAX_HOPLIMIT metric accesses behind helper.
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-12 21:35:57 -08:00
David S. Miller
abbf46ae0e ipv6: Use ip6_dst_hoplimit() instead of direct dst_metric() calls.
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-12 21:14:46 -08:00
Eric Dumazet
a19faf0250 net: fix skb_defer_rx_timestamp()
After commit c1f19b51d1 (net: support time stamping in phy devices.),
kernel might crash if CONFIG_NETWORK_PHY_TIMESTAMPING=y and
skb_defer_rx_timestamp() handles a packet without an ethernet header.

Fixes kernel bugzilla #24102

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=24102
Reported-and-tested-by: Andrew Watts <akwatts@ymail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 16:20:56 -08:00
Herbert Xu
deef4b522b bridge: Use consistent NF_DROP returns in nf_pre_routing
The nf_pre_routing functions in bridging have collected two
distinct ways of returning NF_DROP over the years, inline and
via goto.  There is no reason for preferring either one.

So this patch arbitrarily picks the inline variant and converts
the all the gotos.

Also removes a redundant comment.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 16:04:53 -08:00
Changli Gao
c053fd96d0 af_packet: use swap() instead of the open coded macro XC()
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 16:02:20 -08:00
Ben Hutchings
e596e6e4d5 ethtool: Report link-down while interface is down
While an interface is down, many implementations of
ethtool_ops::get_link, including the default, ethtool_op_get_link(),
will report the last link state seen while the interface was up.  In
general the current physical link state is not available if the
interface is down.

Define ETHTOOL_GLINK to reflect whether the interface *and* any
physical port have a working link, and consistently return 0 when the
interface is down.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 15:55:23 -08:00
Dan Williams
d9ca676bcb atm: correct sysfs 'device' link creation and parent relationships
The ATM subsystem was incorrectly creating the 'device' link for ATM
nodes in sysfs.  This led to incorrect device/parent relationships
exposed by sysfs and udev.  Instead of rolling the 'device' link by hand
in the generic ATM code, pass each ATM driver's bus device down to the
sysfs code and let sysfs do this stuff correctly.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 15:45:05 -08:00
Junchang Wang
a8d764b983 pktgen: adding prefetchw() call
We know for sure pktgen is going to write skb->data right after
*_alloc_skb, causing unnecessary cache misses.

Idea is to add a prefetchw() call to prefetch the first cache line
indicated by skb->data. On systems with Adjacent Cache Line Prefetch,
it's probably two cache lines are prefetched.

With this patch, pktgen on Intel SR1625 server with two E5530
quad-core processors and a single ixgbe-based NIC went from 8.63Mpps
to 9.03Mpps, with 4.6% improvement.

Signed-off-by: Junchang Wang <junchangwang@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 15:36:52 -08:00
Wei Yongjun
40a010395c SCTP: Fix SCTP_SET_PEER_PRIMARY_ADDR to accpet v4mapped address
SCTP_SET_PEER_PRIMARY_ADDR does not accpet v4mapped address, using
v4mapped address in SCTP_SET_PEER_PRIMARY_ADDR socket option will
get -EADDRNOTAVAIL error if v4map is enabled. This patch try to
fix it by mapping v4mapped address to v4 address if allowed.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 15:29:49 -08:00
Tobias Klauser
376d940ee9 inet6: Remove redundant unlikely()
IS_ERR() already implies unlikely(), so it can be omitted here.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 14:57:34 -08:00
Martin Willi
040253c931 xfrm: Traffic Flow Confidentiality for IPv6 ESP
Add TFC padding to all packets smaller than the boundary configured
on the xfrm state. If the boundary is larger than the PMTU, limit
padding to the PMTU.

Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 14:43:59 -08:00
Martin Willi
d979e20f2b xfrm: Traffic Flow Confidentiality for IPv4 ESP
Add TFC padding to all packets smaller than the boundary configured
on the xfrm state. If the boundary is larger than the PMTU, limit
padding to the PMTU.

Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 14:43:59 -08:00
Martin Willi
35d2856b46 xfrm: Add Traffic Flow Confidentiality padding XFRM attribute
The XFRMA_TFCPAD attribute for XFRM state installation configures
Traffic Flow Confidentiality by padding ESP packets to a specified
length.

Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 14:43:58 -08:00
Jiri Pirko
c07224005d net/ipv6/udp.c: fix typo in flush_stack()
skb1 should be passed as parameter to sk_rcvqueues_full() here.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 14:05:09 -08:00
David S. Miller
457de4383e ipv6: Fix 'release_it' logic in tcp_v6_get_peer()
We accidently set it to "true" for the case where we
are using a route bound peer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 13:16:09 -08:00
Tobias Klauser
4c0833bcd4 bridge: Fix return values of br_multicast_add_group/br_multicast_new_group
If br_multicast_new_group returns NULL, we would return 0 (no error) to
the caller of br_multicast_add_group, which is not what we want. Instead
br_multicast_new_group should return ERR_PTR(-ENOMEM) in this case.
Also propagate the error number returned by br_mdb_rehash properly.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 13:00:39 -08:00
David S. Miller
e91db5cd6f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-12-10 12:51:02 -08:00
Nicolas Dichtel
5f75a1042f ipv6: fix nl group when advertising a new link
New idev are advertised with NL group RTNLGRP_IPV6_IFADDR, but
should use RTNLGRP_IPV6_IFINFO.
Bug was introduced by commit 8d7a76c9.

Signed-off-by: Wang Xuefu <xuefu.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 12:49:36 -08:00
David S. Miller
eaa7dcde1d Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6 2010-12-10 11:22:57 -08:00
Martin Lucina
c1249c0aae net: Document the kernel_recvmsg() function
Signed-off-by: Martin Lucina <mato@kotelna.sk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-10 11:13:18 -08:00
David S. Miller
1e13f863ca Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
Conflicts:
	drivers/net/wireless/ath/ath9k/ar9003_eeprom.c
2010-12-10 09:50:47 -08:00
Uwe Kleine-König
a34f0b3139 fix comment typos concerning "consistent"
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-12-10 16:04:28 +01:00
Shan Wei
b7ec19af63 dccp: remove unused macros
Remove macros which have been unused since the initial implementation
(commit 7c657876b6, [DCCP]: Initial
 implementation from Tue Aug 9 20:14:34 2005 -0700).

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-12-10 12:49:23 +01:00
Eric Dumazet
4bc65dd8d8 filter: use size of fetched data in __load_pointer()
__load_pointer() checks data we fetch from skb is included in head
portion, but assumes we fetch one byte, instead of up to four.

This wont crash because we have extra bytes (struct skb_shared_info)
after head, but this can read uninitialized bytes.

Fix this using size of the data (1, 2, 4 bytes) in the test.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-09 20:47:04 -08:00
Thomas Egerer
78347c8c6b xfrm: Fix xfrm_state_migrate leak
xfrm_state_migrate calls kfree instead of xfrm_state_put to free
a failed state. According to git commit 553f9118 this can cause
memory leaks.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-09 20:35:27 -08:00
Eric Dumazet
68835aba4d net: optimize INET input path further
Followup of commit b178bb3dfc (net: reorder struct sock fields)

Optimize INET input path a bit further, by :

1) moving sk_refcnt close to sk_lock.

This reduces number of dirtied cache lines by one on 64bit arches (and
64 bytes cache line size).

2) moving inet_daddr & inet_rcv_saddr at the beginning of sk

(same cache line than hash / family / bound_dev_if / nulls_node)

This reduces number of accessed cache lines in lookups by one, and dont
increase size of inet and timewait socks.
inet and tw sockets now share same place-holder for these fields.

Before patch :

offsetof(struct sock, sk_refcnt) = 0x10
offsetof(struct sock, sk_lock) = 0x40
offsetof(struct sock, sk_receive_queue) = 0x60
offsetof(struct inet_sock, inet_daddr) = 0x270
offsetof(struct inet_sock, inet_rcv_saddr) = 0x274

After patch :

offsetof(struct sock, sk_refcnt) = 0x44
offsetof(struct sock, sk_lock) = 0x48
offsetof(struct sock, sk_receive_queue) = 0x68
offsetof(struct inet_sock, inet_daddr) = 0x0
offsetof(struct inet_sock, inet_rcv_saddr) = 0x4

compute_score() (udp or tcp) now use a single cache line per ignored
item, instead of two.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-09 20:05:58 -08:00
David S. Miller
defb3519a6 net: Abstract away all dst_entry metrics accesses.
Use helper functions to hide all direct accesses, especially writes,
to dst_entry metrics values.

This will allow us to:

1) More easily change how the metrics are stored.

2) Implement COW for metrics.

In particular this will help us put metrics into the inetpeer
cache if that is what we end up doing.  We can make the _metrics
member a pointer instead of an array, initially have it point
at the read-only metrics in the FIB, and then on the first set
grab an inetpeer entry and point the _metrics member there.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-12-09 10:46:36 -08:00
David S. Miller
4e085e76cb econet: Fix crash in aun_incoming().
Unconditional use of skb->dev won't work here,
try to fetch the econet device via skb_dst()->dev
instead.

Suggested by Eric Dumazet.

Reported-by: Nelson Elhage <nelhage@ksplice.com>
Tested-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 20:51:15 -08:00
David S. Miller
fe6c791570 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/ath/ath9k/ar9003_eeprom.c
	net/llc/af_llc.c
2010-12-08 13:47:38 -08:00
John W. Linville
393934c6b5 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
Conflicts:
	drivers/net/wireless/ath/ath9k/ath9k.h
	drivers/net/wireless/ath/ath9k/main.c
	drivers/net/wireless/ath/ath9k/xmit.c
2010-12-08 16:23:31 -05:00
Ben Greear
2f88675011 mac80211: Show max number of probe tries in debug message.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-08 15:38:44 -05:00
Helmut Schaa
80d7e403c9 mac80211: Apply ht_opmode changes in ieee80211_change_bss
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-08 15:38:43 -05:00
Helmut Schaa
50b12f597b cfg80211: Add new BSS attribute ht_opmode
Add a new BSS attribute to allow hostapd to set the current HT opmode.
Otherwise drivers won't be able to set up protection for HT rates in
AP mode.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-08 15:38:43 -05:00
Eric Dumazet
f19872575f tcp: protect sysctl_tcp_cookie_size reads
Make sure sysctl_tcp_cookie_size is read once in
tcp_cookie_size_check(), or we might return an illegal value to caller
if sysctl_tcp_cookie_size is changed by another cpu.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: William Allen Simpson <william.allen.simpson@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 12:34:09 -08:00
Eric Dumazet
ad9f4f50fe tcp: avoid a possible divide by zero
sysctl_tcp_tso_win_divisor might be set to zero while one cpu runs in
tcp_tso_should_defer(). Make sure we dont allow a divide by zero by
reading sysctl_tcp_tso_win_divisor exactly once.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 12:34:08 -08:00
Helmut Schaa
7e24470756 mac80211: Fix BUG in pskb_expand_head when transmitting shared skbs
mac80211 doesn't handle shared skbs correctly at the moment. As a result
a possible resize can trigger a BUG in pskb_expand_head.

[  676.030000] Kernel bug detected[#1]:
[  676.030000] Cpu 0
[  676.030000] $ 0   : 00000000 00000000 819662ff 00000002
[  676.030000] $ 4   : 81966200 00000020 00000000 00000020
[  676.030000] $ 8   : 819662e0 800043c0 00000002 00020000
[  676.030000] $12   : 3b9aca00 00000000 00000000 00470000
[  676.030000] $16   : 80ea2000 00000000 00000000 00000000
[  676.030000] $20   : 818aa200 80ea2018 80ea2000 00000008
[  676.030000] $24   : 00000002 800ace5c
[  676.030000] $28   : 8199a000 8199bd20 81938f88 80f180d4
[  676.030000] Hi    : 0000026e
[  676.030000] Lo    : 0000757e
[  676.030000] epc   : 801245e4 pskb_expand_head+0x44/0x1d8
[  676.030000]     Not tainted
[  676.030000] ra    : 80f180d4 ieee80211_skb_resize+0xb0/0x114 [mac80211]
[  676.030000] Status: 1000a403    KERNEL EXL IE
[  676.030000] Cause : 10800024
[  676.030000] PrId  : 0001964c (MIPS 24Kc)
[  676.030000] Modules linked in: mac80211_hwsim rt2800lib rt2x00soc rt2x00pci rt2x00lib mac80211 crc_itu_t crc_ccitt cfg80211 compat arc4 aes_generic deflate ecb cbc [last unloaded: rt2800pci]
[  676.030000] Process kpktgend_0 (pid: 97, threadinfo=8199a000, task=81879f48, tls=00000000)
[  676.030000] Stack : ffffffff 00000000 00000000 00000014 00000004 80ea2000 00000000 00000000
[  676.030000]         818aa200 80f180d4 ffffffff 0000000a 81879f78 81879f48 81879f48 00000018
[  676.030000]         81966246 80ea2000 818432e0 80f1a420 80203050 81814d98 00000001 81879f48
[  676.030000]         81879f48 00000018 81966246 818432e0 0000001a 8199bdd4 0000001c 80f1b72c
[  676.030000]         80203020 8001292c 80ef4aa2 7f10b55d 801ab5b8 81879f48 00000188 80005c90
[  676.030000]         ...
[  676.030000] Call Trace:
[  676.030000] [<801245e4>] pskb_expand_head+0x44/0x1d8
[  676.030000] [<80f180d4>] ieee80211_skb_resize+0xb0/0x114 [mac80211]
[  676.030000] [<80f1a420>] ieee80211_xmit+0x150/0x22c [mac80211]
[  676.030000] [<80f1b72c>] ieee80211_subif_start_xmit+0x6f4/0x73c [mac80211]
[  676.030000] [<8014361c>] pktgen_thread_worker+0xfac/0x16f8
[  676.030000] [<8002ebe8>] kthread+0x7c/0x88
[  676.030000] [<80008e0c>] kernel_thread_helper+0x10/0x18
[  676.030000]
[  676.030000]
[  676.030000] Code: 24020001  10620005  2502001f <0200000d> 0804917a  00000000  2502001f  00441023  00531021

Fix this by making a local copy of shared skbs prior to mangeling them.
To avoid copying the skb unnecessarily move the skb_copy call below the
checks that don't need write access to the skb.

Also, move the assignment of nh_pos and h_pos below the skb_copy to point
to the correct skb.

It would be possible to avoid another resize of the copied skb by using
skb_copy_expand instead of skb_copy but that would make the patch more
complex. Also, shared skbs are a corner case right now, so the resize
shouldn't matter much.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-08 15:23:48 -05:00
Tom Herbert
67631510a3 tcp: Replace time wait bucket msg by counter
Rather than printing the message to the log, use a mib counter to keep
track of the count of occurences of time wait bucket overflow.  Reduces
spam in logs.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 12:16:33 -08:00
Apollon Oikonomopoulos
171995e5d8 x25: decrement netdev reference counts on unload
x25 does not decrement the network device reference counts on module unload.
Thus unregistering any pre-existing interface after unloading the x25 module
hangs and results in

 unregister_netdevice: waiting for tap0 to become free. Usage count = 1

This patch decrements the reference counts of all interfaces in x25_link_free,
the way it is already done in x25_link_device_down for NETDEV_DOWN events.

Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 12:13:44 -08:00
Michal Marek
e8d34a884e l2tp: Fix modalias of l2tp_ip
Using the SOCK_DGRAM enum results in
"net-pf-2-proto-SOCK_DGRAM-type-115", so use the numeric value like it
is done in net/dccp.

Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 12:13:43 -08:00
Nelson Elhage
0c62fc6dd0 econet: Do the correct cleanup after an unprivileged SIOCSIFADDR.
We need to drop the mutex and do a dev_put, so set an error code and break like
the other paths, instead of returning directly.

Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 12:13:42 -08:00
Changli Gao
920b8d913b af_packet: fix freeing pg_vec twice on error path
It is introduced in:
        commit 0e3125c755
        Author: Neil Horman <nhorman@tuxdriver.com>
        Date:   Tue Nov 16 10:26:47 2010 -0800

        packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4)

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 10:43:41 -08:00
Changli Gao
f6dafa95d1 af_packet: eliminate pgv_to_page on some arches
Some arches don't need flush_dcache_page(), and don't implement it, so
we can eliminate pgv_to_page() calls on those arches.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 10:43:41 -08:00
Eric Dumazet
15c2d75f49 net: call dev_queue_xmit_nit() after skb_dst_drop()
Avoid some atomic ops on dst refcount, calling dev_queue_xmit_nit()
after skb_dst_drop() in dev_hard_start_xmit().

When queueing a packet into af_packet socket, we drop dst anyway.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 10:39:54 -08:00
Eric Dumazet
62ab081213 filter: constify sk_run_filter()
sk_run_filter() doesnt write on skb, change its prototype to reflect
this.

Fix two af_packet comments.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 10:30:34 -08:00
Eric Dumazet
941666c2e3 net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()
Le dimanche 05 décembre 2010 à 09:19 +0100, Eric Dumazet a écrit :

> Hmm..
>
> If somebody can explain why RTNL is held in arp_ioctl() (and therefore
> in arp_req_delete()), we might first remove RTNL use in arp_ioctl() so
> that your patch can be applied.
>
> Right now it is not good, because RTNL wont be necessarly held when you
> are going to call arp_invalidate() ?

While doing this analysis, I found a refcount bug in llc, I'll send a
patch for net-2.6

Meanwhile, here is the patch for net-next-2.6

Your patch then can be applied after mine.

Thanks

[PATCH] net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()

dev_getbyhwaddr() was called under RTNL.

Rename it to dev_getbyhwaddr_rcu() and change all its caller to now use
RCU locking instead of RTNL.

Change arp_ioctl() to use RCU instead of RTNL locking.

Note: this fix a dev refcount bug in llc

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 10:07:24 -08:00
David S. Miller
a2d4b65d47 Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6 2010-12-08 10:01:00 -08:00
Eric Dumazet
35d9b0c906 llc: fix a device refcount imbalance
Le dimanche 05 décembre 2010 à 12:23 +0100, Eric Dumazet a écrit :
> Le dimanche 05 décembre 2010 à 09:19 +0100, Eric Dumazet a écrit :
>
> > Hmm..
> >
> > If somebody can explain why RTNL is held in arp_ioctl() (and therefore
> > in arp_req_delete()), we might first remove RTNL use in arp_ioctl() so
> > that your patch can be applied.
> >
> > Right now it is not good, because RTNL wont be necessarly held when you
> > are going to call arp_invalidate() ?
>
> While doing this analysis, I found a refcount bug in llc, I'll send a
> patch for net-2.6

Oh well, of course I must first fix the bug in net-2.6, and wait David
pull the fix in net-next-2.6 before sending this rcu conversion.

Note: this patch should be sent to stable teams (2.6.34 and up)

[PATCH net-2.6] llc: fix a device refcount imbalance

commit abf9d537fe (llc: add support for SO_BINDTODEVICE) added one
refcount imbalance in llc_ui_bind(), because dev_getbyhwaddr() doesnt
take a reference on device, while dev_get_by_index() does.

Fix this using RCU locking. And since an RCU conversion will be done for
2.6.38 for dev_getbyhwaddr(), put the rcu_read_lock/unlock exactly at
their final place.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@kernel.org
Cc: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 09:58:44 -08:00
Thiago Farina
01b0c5cfb2 net/9p/protocol.c: Remove duplicated macros.
Use the macros already provided by kernel.h file.

Signed-off-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 09:56:28 -08:00
Changli Gao
aa94210411 net: init ingress queue
The dev field of ingress queue is forgot to initialized, then NULL
pointer dereference happens in qdisc_alloc().

Move inits of tx queues to netif_alloc_netdev_queues().

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 09:43:27 -08:00
Nandita Dukkipati
b1afde60f2 tcp: Bug fix in initialization of receive window.
The bug has to do with boundary checks on the initial receive window.
If the initial receive window falls between init_cwnd and the
receive window specified by the user, the initial window is incorrectly
brought down to init_cwnd. The correct behavior is to allow it to
remain unchanged.

Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-08 09:38:37 -08:00
David S. Miller
4f58605e6b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-12-08 08:13:01 -08:00
Miloslav Trmač
6f107b5861 net: Add missing lockdep class names for af_alg
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2010-12-08 14:35:34 +08:00
NeilBrown
ed2849d3ec sunrpc: prevent use-after-free on clearing XPT_BUSY
When an xprt is created, it has a refcount of 1, and XPT_BUSY is set.
The refcount is *not* owned by the thread that created the xprt
(as is clear from the fact that creators never put the reference).
Rather, it is owned by the absence of XPT_DEAD.  Once XPT_DEAD is set,
(And XPT_BUSY is clear) that initial reference is dropped and the xprt
can be freed.

So when a creator clears XPT_BUSY it is dropping its only reference and
so must not touch the xprt again.

However svc_recv, after calling ->xpo_accept (and so getting an XPT_BUSY
reference on a new xprt), calls svc_xprt_recieved.  This clears
XPT_BUSY and then svc_xprt_enqueue - this last without owning a reference.
This is dangerous and has been seen to leave svc_xprt_enqueue working
with an xprt containing garbage.

So we need to hold an extra counted reference over that call to
svc_xprt_received.

For safety, any time we clear XPT_BUSY and then use the xprt again, we
first get a reference, and the put it again afterwards.

Note that svc_close_all does not need this extra protection as there are
no threads running, and the final free can only be called asynchronously
from such a thread.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-12-07 20:39:55 -05:00
Johan Hedberg
a40c406cbd Bluetooth: Make hci_send_to_sock usable for management control sockets
In order to send data to management control sockets the function should:

  - skip checks intended for raw HCI data and stack internal events
  - make sure RAW HCI data or stack internal events don't go to
    management control sockets

In order to accomplish this the patch adds a new member to the bluetooth
skb private data to flag skb's that are destined for management control
sockets.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-07 23:03:39 -02:00
Johan Hedberg
0381101fd6 Bluetooth: Add initial Bluetooth Management interface callbacks
Add initial code for handling Bluetooth Management interface messages.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-07 23:03:38 -02:00
Javier Cardona
7659a193f9 mac80211: Fix compilation error when mesh is disabled
Wrap mesh sections inside CONFIG_MAC80211_MESH to fix compilation
problems reported by Stephen Rothwell, Larry Finger and Bruno Randolf.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-07 17:08:09 -05:00
Felix Fietkau
c658e5db01 mac80211: fix a compiler warning
net/mac80211/mlme.c: In function 'ieee80211_sta_work':
net/mac80211/mlme.c:1981: warning: too many arguments for format

Introduced by commit 04ac3c0ee2
("mac80211: speed up AP probing using nullfunc frames").

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-07 17:08:06 -05:00
Eliad Peller
0ab82b04ac mac80211: fix dynamic-ps/pm_qos magic numbers
mac80211 uses pm_qos (/dev/network_latency) in order to determine the
dynamic ps timeout (or disable the dynamic-ps at all in some cases).

commit ff616381 added a comparison for the current network_latency
against one high value (1900ms), and against the default value
(2000sec, rather than the commented 2sec).

however, the representation of 1900ms was incorrect:
1900ms = 1900000us ( != 1900000000 )

fix it by using USEC_TO_MSEC/SEC consts.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-07 16:09:13 -05:00
Bruno Randolf
541a45a142 nl80211/mac80211: Report signal average
Extend nl80211 to report an exponential weighted moving average (EWMA) of the
signal value. Since the signal value usually fluctuates between different
packets, an average can be more useful than the value of the last packet.

This uses the recently added generic EWMA library function.

--
v2:	fix ABI breakage and change factor to be a power of 2.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-07 16:09:12 -05:00
Tracey Dent
ff2109f5f9 Net: bluetooth: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-07 13:52:39 -02:00
Tomasz Grobelny
0491026507 dccp qpolicy: Parameter checking of cmsg qpolicy parameters
Ensure that cmsg->cmsg_type value is valid for qpolicy
that is currently in use.

Signed-off-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-12-07 13:47:12 +01:00
Tomasz Grobelny
871a2c16c2 dccp: Policy-based packet dequeueing infrastructure
This patch adds a generic infrastructure for policy-based dequeueing of
TX packets and provides two policies:
 * a simple FIFO policy (which is the default) and
 * a priority based policy (set via socket options).
Both policies honour the tx_qlen sysctl for the maximum size of the write
queue (can be overridden via socket options).

The priority policy uses skb->priority internally to assign an u32 priority
identifier, using the same ranking as SO_PRIORITY. The skb->priority field
is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
data using cmsg(3), the patch also provides the requisite parsing routines.

Signed-off-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-12-07 13:47:12 +01:00
David Shwatrz
8917a3c0b7 Fix a typo in datagram.c and sctp/socket.c.
Hi,
This patch fixes a typo in net/core/datagram.c and in net/sctp/socket.c

Regards,
David Shwartz

Signed-off-by: David Shwartz <dshwatrz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-06 13:10:11 -08:00
Johannes Berg
29cbe68c51 cfg80211/mac80211: add mesh join/leave commands
Instead of tying mesh activity to interface up,
add join and leave commands for mesh. Since we
must be backward compatible, let cfg80211 handle
joining a mesh if a mesh ID was pre-configured
when the device goes up.

Note that this therefore must modify mac80211 as
well since mac80211 needs to lose the logic to
start the mesh on interface up.

We now allow querying mesh parameters before the
mesh is connected, which simply returns defaults.
Setting them (internally renamed to "update") is
only allowed while connected. Specify them with
the new mesh join command instead where needed.

In mac80211, beaconing must now also follow the
mesh enabled/not enabled state, which is done
by testing the mesh ID.

Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-06 16:01:29 -05:00
Johannes Berg
bd90fdcc5f nl80211: refactor mesh parameter parsing
I'm going to need this in a new place later.

Tested-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-06 16:01:29 -05:00
Johannes Berg
f9e10ce4cf cfg80211: require add_virtual_intf to return new dev
cfg80211 used to do all its bookkeeping in
the notifier, but some new stuff will have
to use local variables so make the callback
return the netdev pointer.

Tested-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-06 16:01:28 -05:00
Johannes Berg
09b1747026 mac80211: move mesh filter adjusting
Logically, the filter adjusting belongs with
starting/stopping mesh, not interface up/down,
so move it there.

Tested-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-06 16:01:28 -05:00
Javier Cardona
45904f2165 nl80211/mac80211: define and allow configuring mesh element TTL
The TTL in path selection information elements is different from
the mesh ttl used in mesh data frames.  Version 7.03 of the 11s
draft calls this ttl 'Element TTL'.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-06 16:01:28 -05:00
Eric Dumazet
2d5311e4e8 filter: add a security check at install time
We added some security checks in commit 57fe93b374
(filter: make sure filters dont read uninitialized memory) to close a
potential leak of kernel information to user.

This added a potential extra cost at run time, while we can perform a
check of the filter itself, to make sure a malicious user doesnt try to
abuse us.

This patch adds a check_loads() function, whole unique purpose is to
make this check, allocating a temporary array of mask. We scan the
filter and propagate a bitmask information, telling us if a load M(K) is
allowed because a previous store M(K) is guaranteed. (So that
sk_run_filter() can possibly not read unitialized memory)

Note: this can uncover application bug, denying a filter attach,
previously allowed.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: Changli Gao <xiaosuo@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-06 12:59:09 -08:00
Changli Gao
ae9c416d68 net: arp: use assignment
Only when dont_send is 0, arp_filter() is consulted, so we can simply
assign the return value of arp_filter() to dont_send instead.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-06 12:59:09 -08:00
Changli Gao
c56b4d9012 af_packet: remove pgv.flags
As we can check if an address is vmalloc address with is_vmalloc_addr(),
we remove pgv.flags. Then we may get more pg_vecs.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-06 12:59:07 -08:00
Changli Gao
0af55bb58f af_packet: use vmalloc_to_page() instead for the addresss returned by vmalloc()
The following commit causes the pgv->buffer may point to the memory
returned by vmalloc(). And we can't use virt_to_page() for the vmalloc
address.

This patch introduces a new inline function pgv_to_page(), which calls
vmalloc_to_page() for the vmalloc address, and virt_to_page() for the
__get_free_pages address.

We used to increase page pointer to get the next page at the next page
address, after Neil's patch, it is wrong, as the physical address may
be not continuous. This patch also fixes this issue.

    commit 0e3125c755
    Author: Neil Horman <nhorman@tuxdriver.com>
    Date:   Tue Nov 16 10:26:47 2010 -0800

    packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4)

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-06 12:59:06 -08:00
Eric Dumazet
f7fce74e38 net: kill an RCU warning in inet_fill_link_af()
commits 9f0f7272 (ipv4: AF_INET link address family) and cf7afbfeb8
(rtnl: make link af-specific updates atomic) used incorrect
__in_dev_get_rcu() in RTNL protected contexts, triggering PROVE_RCU
warnings.

Switch to __in_dev_get_rtnl(), wich is more appropriate, since we hold
RTNL.

Based on a report and initial patch from Amerigo Wang.

Reported-by: Amerigo Wang <amwang@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Reviewed-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-06 12:59:06 -08:00
Eric Dumazet
da2033c282 filter: add SKF_AD_RXHASH and SKF_AD_CPU
Add SKF_AD_RXHASH and SKF_AD_CPU to filter ancillary mechanism,
to be able to build advanced filters.

This can help spreading packets on several sockets with a fast
selection, after RPS dispatch to N cpus for example, or to catch a
percentage of flows in one queue.

tcpdump -s 500 "cpu = 1" :

[0] ld CPU
[1] jeq #1  jt 2  jf 3
[2] ret #500
[3] ret #0

# take 12.5 % of flows (average)
tcpdump -s 1000 "rxhash & 7 = 2" :

[0] ld RXHASH
[1] and #7
[2] jeq #2  jt 3  jf 4
[3] ret #1000
[4] ret #0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rui <wirelesser@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-06 12:59:05 -08:00
Michał Mirosław
7903264402 net: Fix too optimistic NETIF_F_HW_CSUM features
NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM+NETIF_F_IPV6_CSUM, but
some drivers miss the difference. Fix this and also fix UFO dependency
on checksumming offload as it makes the same mistake in assumptions.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-06 12:59:04 -08:00
Felix Fietkau
04ac3c0ee2 mac80211: speed up AP probing using nullfunc frames
If the nullfunc frame used to probe the AP was not acked, there is no point
in waiting for the probe timeout, so advance to the next try (or disconnect)
immediately.
If we do reach the probe timeout without having received a tx status, the
connection is probably really bad and worth disconnecting.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-06 15:58:44 -05:00
Felix Fietkau
75706d0e9d mac80211: remove a redundant check
ieee80211_is_nullfunc() implies ieee80211_is_data()

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-06 15:58:43 -05:00
Helmut Schaa
c1ce5a74d1 mac80211: Update last_tx_rate only for data frames
The last_tx_rate field was also updated for non-data frames that are
often sent with a lower rate (for example management frames at 1 Mbps).
This is confusing when the data rate is actually much higher.

Hence, only update the last_tx_rate field with tx rate information
gathered from last data frames.

If the rate control algorithm filled in txrc.reported_rate we don't need
to verify this information.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-12-06 15:58:43 -05:00
John W. Linville
f435d9eea0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-12-06 15:35:34 -05:00
Johan Hedberg
183f732c3f Bluetooth: Fix initial RFCOMM DLC security level
Due to commit 63ce0900 connections initiated through TTYs created with
"rfcomm bind ..." would have security level BT_SECURITY_SDP instead of
BT_SECURITY_LOW. This would cause instant connection failure between any
two SSP capable devices due to the L2CAP connect request to RFCOMM being
sent before authentication has been performed. This patch fixes the
regression by always initializing the DLC security level to
BT_SECURITY_LOW.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Luiz Augusto von Dentz <luiz.dentz-von@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-06 15:47:44 -02:00
Gustavo F. Padovan
df6bd743b6 Bluetooth: Don't accept ConfigReq if we aren't in the BT_CONFIG state
If such event happens we shall reply with a Command Reject, because we are
not expecting any configure request.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-12-06 15:37:50 -02:00
Eric Dumazet
46bcf14f44 filter: fix sk_filter rcu handling
Pavel Emelyanov tried to fix a race between sk_filter_(de|at)tach and
sk_clone() in commit 47e958eac2

Problem is we can have several clones sharing a common sk_filter, and
these clones might want to sk_filter_attach() their own filters at the
same time, and can overwrite old_filter->rcu, corrupting RCU queues.

We can not use filter->rcu without being sure no other thread could do
the same thing.

Switch code to a more conventional ref-counting technique : Do the
atomic decrement immediately and queue one rcu call back when last
reference is released.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-06 09:29:43 -08:00
Changli Gao
ca44ac3861 net: don't reallocate skb->head unless the current one hasn't the needed extra size or is shared
skb head being allocated by kmalloc(), it might be larger than what
actually requested because of discrete kmem caches sizes. Before
reallocating a new skb head, check if the current one has the needed
extra size.

Do this check only if skb head is not shared.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-03 10:59:47 -08:00
Johannes Berg
0bae35e14b leds: fix up dependencies
It's not useful to build LED triggers when there's no LEDs that can be
triggered by them.  Therefore, fix up the dependencies so that this
cannot happen, and fix a few users that select triggers to depend on
LEDS_CLASS as well (there is also one user that also selects LEDS_CLASS,
which is OK).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Arnd Hannemann <arnd@arndnet.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Richard Purdie <rpurdie@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-02 14:51:15 -08:00
Allan Stephens
b924dcf003 tipc: Delete tipc_ownidentity()
Moves the content of the native API routine tipc_ownidentity() into the
sole routine that calls it, since it can no longer be called in isolation.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 13:34:06 -08:00
Allan Stephens
12bae479ee tipc: Eliminate obsolete native API forwarding routines
Moves the content of each native API message forwarding routine
into the sole routine that calls it, since the forwarding routines
no longer be called in isolation. Also removes code in each routine
that altered the outgoing message's importance level since this is
now no longer possible.

The previous function mapping (parent function, and child API) was
as follows:

   tipc_send2name
       \--tipc_forward2name

   tipc_send2port
       \--tipc_forward2port

   tipc_send_buf2port
       \--tipc_forward_buf2port

After this commit, the children don't exist and their functionality
is completely in the respective parent.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 13:34:06 -08:00
Allan Stephens
471450f7ec tipc: Eliminate an unused symbolic constant in link code
Removes a symbol that is not referenced anywhere by TIPC's link code.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 13:34:05 -08:00
Allan Stephens
52fe7b725e tipc: Eliminate useless initialization when creating subscriber
Removes initialization of a local variable that is always assigned
a different value before it is referenced.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 13:34:05 -08:00
Allan Stephens
38f232eae2 tipc: Remove unused domain argument from multicast send routine
Eliminates an unused argument from tipc_multicast(), now that this
routine can no longer be called by kernel-based applications.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 13:34:04 -08:00
Allan Stephens
a5c2af9922 tipc: Remove support for TIPC mode change callback
Eliminates support for the callback routine invoked when TIPC
changes its mode of operation from inactive to standalone or from
standalone to networked. This callback was part of TIPC's obsolete
native API and is not used by TIPC internally.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 13:34:04 -08:00
Allan Stephens
528c771e87 tipc: Delete useless function prototypes
Removes several function declarations that aren't used anywhere,
either because they reference routines that no longer exist or
because all users of the function reference it after it has already
been defined.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 13:34:03 -08:00
Allan Stephens
28cc937eac tipc: Eliminate useless return value when disabling a bearer
Modifies bearer_disable() to return void since it always indicates
success anyway.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 13:34:03 -08:00
Allan Stephens
8d71919d7a tipc: Delete unused configuration service structure definition
Removes a structure definition that is no longer used by TIPC's
configuration service.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 13:34:02 -08:00
Allan Stephens
c802628297 tipc: Remove obsolete inclusions of header files
Gets rid of #include statements that are no longer required as a
result of the merging of obsolete native API header file content
into other TIPC include files.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 13:34:02 -08:00
Allan Stephens
d265fef6dd tipc: Remove obsolete native API files and exports
As part of the removal of TIPC's native API support it is no longer
necessary for TIPC to export symbols for routines that can be called
by kernel-based applications, nor for it to have header files that
kernel-based applications can include to access the declarations for
those routines. This commit eliminates the exporting of symbols by
TIPC and migrates the contents of each obsolete native API include
file into its corresponding non-native API equivalent.

The code which was migrated in this commit was migrated intact, in
that there are no technical changes combined with the relocation.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 13:34:01 -08:00
Shan Wei
b672083ed3 ipv6: use ND_REACHABLE_TIME and ND_RETRANS_TIMER instead of magic number
ND_REACHABLE_TIME and ND_RETRANS_TIMER have defined
since v2.6.12-rc2, but never been used.
So use them instead of magic number.

This patch also changes original code style to read comfortably .

Thank YOSHIFUJI Hideaki for pointing it out.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 13:27:32 -08:00
Shan Wei
97b1ce25e8 tcp: use TCP_BASE_MSS to set basic mss value
TCP_BASE_MSS is defined, but not used.
commit 5d424d5a introduce this macro, so use
it to initial sysctl_tcp_base_mss.

commit 5d424d5a67
Author: John Heffner <jheffner@psc.edu>
Date:   Mon Mar 20 17:53:41 2006 -0800

    [TCP]: MTU probing

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 13:27:32 -08:00
John W. Linville
09f921f83f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
Conflicts:
	drivers/net/wireless/ath/ath9k/ar9003_eeprom.c
2010-12-02 15:46:37 -05:00
David S. Miller
db3949c450 tcp: Implement ipv6 ->get_peer() and ->tw_get_peer().
Now ipv6 timewait recycling is fully implemented and
enabled.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 12:14:37 -08:00
David S. Miller
493f377d6d tcp: Add timewait recycling bits to ipv6 connect code.
This will also improve handling of ipv6 tcp socket request
backlog when syncookies are not enabled.  When backlog
becomes very deep, last quarter of backlog is limited to
validated destinations.  Previously only ipv4 implemented
this logic, but now ipv6 does too.

Now we are only one step away from enabling timewait
recycling for ipv6, and that step is simply filling in
the implementation of tcp_v6_get_peer() and
tcp_v6_tw_get_peer().

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 12:14:29 -08:00
John W. Linville
1937721f56 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6 2010-12-02 14:00:51 -05:00
David S. Miller
ae4694b2d3 ipv6: Create inet6_csk_route_req().
Brother of ipv4's inet_csk_route_req().

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 10:59:22 -08:00
David S. Miller
ccb7c410dd timewait_sock: Create and use getpeer op.
The only thing AF-specific about remembering the timestamp
for a time-wait TCP socket is getting the peer.

Abstract that behind a new timewait_sock_ops vector.

Support for real IPV6 sockets is not filled in yet, but
curiously this makes timewait recycling start to work
for v4-mapped ipv6 sockets.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 18:09:13 -08:00
David S. Miller
8790ca172a inetpeer: Kill use of inet_peer_address_t typedef.
They are verboten these days.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 17:28:18 -08:00
Andrei Emeltchenko
70f23020e6 Bluetooth: clean up hci code
Do not use assignment in IF condition, remove extra spaces,
fixing typos, simplify code.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:43 -02:00
Andrei Emeltchenko
894718a6be Bluetooth: clean up l2cap code
Do not initialize static vars to zero, macros with complex values
shall be enclosed with (), remove unneeded braces.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:43 -02:00
Andrei Emeltchenko
285b4e9031 Bluetooth: clean up rfcomm code
Remove extra spaces, assignments in if statement, zeroing static
variables, extra braces. Fix includes.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:43 -02:00
Andrei Emeltchenko
735cbc4784 Bluetooth: clean up sco code
Do not use assignments in IF condition, remove extra spaces

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:43 -02:00
Anderson Lizardo
b78d7b4f20 Bluetooth: Fix error handling for l2cap_init()
create_singlethread_workqueue() may fail with errors such as -ENOMEM. If
this happens, the return value is not set to a negative value and the
module load will succeed. It will then crash on module unload because of
a destroy_workqueue() call on a NULL pointer.

Additionally, the _busy_wq workqueue is not being destroyed if any
errors happen on l2cap_init().

Signed-off-by: Anderson Lizardo <anderson.lizardo@openbossa.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:43 -02:00
Gustavo F. Padovan
eeb366564b Bluetooth: Get rid of __rfcomm_get_sock_by_channel()
rfcomm_get_sock_by_channel() was the only user of this function, so I merged
both into rfcomm_get_sock_by_channel(). The socket lock now should be hold
outside of rfcomm_get_sock_by_channel() once we hold and release it inside the
same function now.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:43 -02:00
Gustavo F. Padovan
e0f0cb5636 Bluetooth: Get rid of __l2cap_get_sock_by_psm()
l2cap_get_sock_by_psm() was the only user of this function, so I merged
both into l2cap_get_sock_by_psm(). The socket lock now should be hold
outside of l2cap_get_sock_by_psm() once we hold and release it inside the
same function now.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:42 -02:00
Andrei Emeltchenko
cc11b9c14d Bluetooth: do not use assignment in if condition
Fix checkpatch errors like:
"ERROR: do not use assignment in if condition"
Simplify code and fix one long line.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Acked-by: Ville Tervo <ville.tervo@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:36 -02:00
Andrei Emeltchenko
940a9eea80 Bluetooth: timer check sk is not owned before freeing
In timer context we might delete l2cap channel used by krfcommd.
The check makes sure that sk is not owned. If sk is owned we
restart timer for HZ/5.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:36 -02:00
Andrei Emeltchenko
a49184c229 Bluetooth: Check sk is not owned before freeing l2cap_conn
Check that socket sk is not locked in user process before removing
l2cap connection handler.

lock_sock and release_sock do not hold a normal spinlock directly but
instead hold the owner field. This means bh_lock_sock can still execute
even if the socket is "locked". More info can be found here:
http://www.linuxfoundation.org/collaborate/workgroups/networking/socketlocks

krfcommd kernel thread may be preempted with l2cap tasklet which remove
l2cap_conn structure. If krfcommd is in process of sending of RFCOMM reply
(like "RFCOMM UA" reply to "RFCOMM DISC") then kernel crash happens.

...
[  694.175933] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  694.184936] pgd = c0004000
[  694.187683] [00000000] *pgd=00000000
[  694.191711] Internal error: Oops: 5 [#1] PREEMPT
[  694.196350] last sysfs file: /sys/devices/platform/hci_h4p/firmware/hci_h4p/loading
[  694.260375] CPU: 0    Not tainted  (2.6.32.10 #1)
[  694.265106] PC is at l2cap_sock_sendmsg+0x43c/0x73c [l2cap]
[  694.270721] LR is at 0xd7017303
...
[  694.525085] Backtrace:
[  694.527587] [<bf266be0>] (l2cap_sock_sendmsg+0x0/0x73c [l2cap]) from [<c02f2cc8>] (sock_sendmsg+0xb8/0xd8)
[  694.537292] [<c02f2c10>] (sock_sendmsg+0x0/0xd8) from [<c02f3044>] (kernel_sendmsg+0x48/0x80)

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:36 -02:00
Vasiliy Kulikov
d31dbf6e59 Bluetooth: hidp: fix information leak to userland
Structure hidp_conninfo is copied to userland with version, product,
vendor and name fields unitialized if both session->input and session->hid
are NULL.  It leads to leaking of contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:36 -02:00
Vasiliy Kulikov
3185fbd9d7 Bluetooth: cmtp: fix information leak to userland
Structure cmtp_conninfo is copied to userland with some padding fields
unitialized.  It leads to leaking of contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:35 -02:00
Vasiliy Kulikov
5520d20f68 Bluetooth: bnep: fix information leak to userland
Structure bnep_conninfo is copied to userland with the field "device"
that has the last elements unitialized.  It leads to leaking of
contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:35 -02:00
Johan Hedberg
127178d24c Bluetooth: Automate remote name requests
In Bluetooth there are no automatic updates of remote device names when
they get changed on the remote side. Instead, it is a good idea to do a
manual name request when a new connection gets created (for whatever
reason) since at this point it is very cheap (no costly baseband
connection creation needed just for the sake of the name request).

So far userspace has been responsible for this extra name request but
tighter control is needed in order not to flood Bluetooth controllers
with two many commands during connection creation. It has been shown
that some controllers simply fail to function correctly if they get too
many (almost) simultaneous commands during connection creation. The
simplest way to acheive better control of these commands is to move
their sending completely to the kernel side.

This patch inserts name requests into the sequence of events that the
kernel performs during connection creation. It does this after the
remote features have been successfully requested and before any pending
authentication requests are performed. The code will work sub-optimally
with userspace versions that still do the name requesting themselves (it
shouldn't break anything though) so it is recommended to combine this
with a userspace software version that doesn't have automated name
requests.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:35 -02:00
Johan Hedberg
392599b95d Bluetooth: Create a unified authentication request function
This patch adds a single function that's responsible for requesting
authentication for outgoing connections. This is preparation for the
next patch which will add automated name requests and thereby move the
authentication requests to a different location.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:35 -02:00
Johan Hedberg
ccd556fe33 Bluetooth: Simplify remote features callback function logic
The current remote and remote extended features event callbacks logic
can be made simpler by using a label and goto statements instead of the
current multiple levels of nested if statements.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:35 -02:00
Gustavo F. Padovan
17f490bced Merge git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6 into test 2010-12-01 21:04:09 -02:00
David McCullough
6dcdd1b369 net/ipv6/sit.c: return unhandled skb to tunnel4_rcv
I found a problem using an IPv6 over IPv4 tunnel.  When CONFIG_IPV6_SIT
was enabled, the packets would be rejected as net/ipv6/sit.c was catching
all IPPROTO_IPV6 packets and returning an ICMP port unreachable error.

I think this patch fixes the problem cleanly.  I believe the code in
net/ipv4/tunnel4.c:tunnel4_rcv takes care of it properly if none of the
handlers claim the skb.

Signed-off-by: David McCullough <david_mccullough@mcafee.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 13:19:34 -08:00
stephen hemminger
8afe7c8acd ipip: add module alias for tunl0 tunnel device
If ipip is built as a module the 'ip tunnel add' command would fail because
the ipip module was not being autoloaded.  Adding an alias for
the tunl0 device name cause dev_load() to autoload it when needed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 12:53:23 -08:00
stephen hemminger
4da6a738ff gre: add module alias for gre0 tunnel device
If gre is built as a module the 'ip tunnel add' command would fail because
the ip_gre module was not being autoloaded.  Adding an alias for
the gre0 device name cause dev_load() to autoload it when needed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 12:53:22 -08:00
stephen hemminger
407d6fcbfd gre: minor cleanups
Use strcpy() rather the sprintf() for the case where name is getting
generated.  Fix indentation.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 12:53:22 -08:00
Eric Dumazet
f2cd2d3e9b net sched: use xps information for qdisc NUMA affinity
Allocate qdisc memory according to NUMA properties of cpus included in
xps map.

To be effective, qdisc should be (re)setup after changes
of /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus

I added a numa_node field in struct netdev_queue, containing NUMA node
if all cpus included in xps_cpus share same node, else -1.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 12:47:42 -08:00
Anders Franzen
381601e5bb Make the ip6_tunnel reflect the true mtu.
The ip6_tunnel always assumes it consumes 40 bytes (ip6 hdr) of the mtu of the
underlaying device. So for a normal ethernet bearer, the mtu of the ip6_tunnel is
1460.
However, when creating a tunnel the encap limit option is enabled by default, and it
consumes 8 bytes more, so the true mtu shall be 1452.

I dont really know if this breaks some statement in some RFC, so this is a request for
comments.

Signed-off-by: Anders Franzen <anders.franzen@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 10:55:47 -08:00
David S. Miller
3f419d2d48 inet: Turn ->remember_stamp into ->get_peer in connection AF ops.
Then we can make a completely generic tcp_remember_stamp()
that uses ->get_peer() as a helper, minimizing the AF specific
code and minimizing the eventual code duplication when we implement
the ipv6 side of TW recycling.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 12:28:06 -08:00
David S. Miller
b341936380 ipv6: Add infrastructure to bind inet_peer objects to routes.
They are only allowed on cached ipv6 routes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 12:27:11 -08:00
David S. Miller
021e929911 inetpeer: Add v6 peers tree, abstract root properly.
Add the ipv6 peer tree instance, and adapt remaining
direct references to 'v4_peers' as needed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 12:12:23 -08:00
David S. Miller
0266304502 inetpeer: Abstract address comparisons.
Now v4 and v6 addresses will both work properly.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 12:08:53 -08:00
David S. Miller
b534ecf1cd inetpeer: Make inet_getpeer() take an inet_peer_adress_t pointer.
And make an inet_getpeer_v4() helper, update callers.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 11:54:19 -08:00
David S. Miller
582a72da9a inetpeer: Introduce inet_peer_address_t.
Currently only the v4 aspect is used, but this will change.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 11:53:55 -08:00
David S. Miller
98158f5a85 inetpeer: Abstract out the tree root accesses.
Instead of directly accessing "peer", change to code to
operate using a "struct inet_peer_base *" pointer.

This will facilitate the addition of a seperate tree for
ipv6 peer entries.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 11:41:59 -08:00
Helmut Schaa
08ca944eb2 mac80211: Minor optimization in ieee80211_rx_h_data
Remove a superfluous ieee80211_is_data check as that was checked a few
lines before already and we wont't get here for non-data frames at all.

Second, the frame was already converted to 802.3 header format and
reading the fc and addr1 fields was only possible because the 802.3
header is short enough and didn't overwrite the relevant parts of the
802.11 header. Make the code more obvious by checking the ethernet
header's h_dest field.

Furthermore reorder the conditions to reduce the number of checks
when dynamic powersave is not needed (AP mode for example).

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-30 13:58:07 -05:00
Senthil Balasubramanian
8e26d5ad2f mac80211: Fix STA disconnect due to MIC failure
Th commit titled "mac80211: clean up rx handling wrt. found_sta"
removed found_sta variable which caused a MIC failure event
to be reported twice for a single failure to supplicant resulted
in STA disconnect.

This should fix WPA specific countermeasures WiFi test case (5.2.17)
issues with mac80211 based drivers which report MIC failure events in
rx status.

Cc: Stable <stable@kernel.org> (2.6.37)
Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-30 13:45:02 -05:00
Christian Lamparter
2c31333a8f mac80211: ignore non-bcast mcast deauth/disassoc franes
This patch fixes an curious issue due to insufficient
rx frame filtering.

Saqeb Akhter reported frequent disconnects while streaming
videos over samba: <http://marc.info/?m=128600031109136>
> [ 1166.512087] wlan1: deauthenticated from 30:46:9a:10:49:f7 (Reason: 7)
> [ 1526.059997] wlan1: deauthenticated from 30:46:9a:10:49:f7 (Reason: 7)
> [ 2125.324356] wlan1: deauthenticated from 30:46:9a:10:49:f7 (Reason: 7)
> [...]

The reason is that the device generates frames with slightly
bogus SA/TA addresses.

e.g.:
 [ 2314.402316] Ignore 9f:1f:31:f8:64:ff
 [ 2314.402321] Ignore 9f:1f:31:f8:64:ff
 [ 2352.453804] Ignore 0d:1f:31:f8:64:ff
 [ 2352.453808] Ignore 0d:1f:31:f8:64:ff
 					   ^^ the group-address flag is set!
 (the correct SA/TA would be: 00:1f:31:f8:64:ff)

Since the AP does not know from where the frames come, it
generates a DEAUTH response for the (invalid) mcast address.
This mcast deauth frame then passes through all filters and
tricks the stack into thinking that the AP brutally kicked
us!

This patch fixes the problem by simply ignoring
non-broadcast, group-addressed deauth/disassoc frames.

Cc: Jouni Malinen <j@w1.fi>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Saqeb Akhter <saqeb.akhter@gmail.com>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-30 13:23:06 -05:00
Linus Torvalds
a01af8e4a4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
  af_unix: limit recursion level
  pch_gbe driver: The wrong of initializer entry
  pch_gbe dreiver: chang author
  ucc_geth: fix ucc halt problem in half duplex mode
  inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners
  ehea: Add some info messages and fix an issue
  hso: fix disable_net
  NET: wan/x25_asy, move lapb_unregister to x25_asy_close_tty
  cxgb4vf: fix setting unicast/multicast addresses ...
  net, ppp: Report correct error code if unit allocation failed
  DECnet: don't leak uninitialized stack byte
  au1000_eth: fix invalid address accessing the MAC enable register
  dccp: fix error in updating the GAR
  tcp: restrict net.ipv4.tcp_adv_win_scale (#20312)
  netns: Don't leak others' openreq-s in proc
  Net: ceph: Makefile: Remove unnessary code
  vhost/net: fix rcu check usage
  econet: fix CVE-2010-3848
  econet: fix CVE-2010-3850
  econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849
  ...
2010-11-29 14:36:33 -08:00
Johannes Berg
dd318575ff mac80211: fix RX aggregation locking
The RX aggregation locking documentation was
wrong, which led Christian to also code the
timer timeout handling for it somewhat wrongly.

Fix the documentation, the two places that
need to hold the reorder lock across accesses
to the structure, and the debugfs code that
should just use RCU.

Also, remove acquiring the sta->lock across
reorder timeouts since it isn't necessary, and
change a few places to GFP_KERNEL because the
code path here doesn't need atomic allocations
as I noticed when reviewing all this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-29 15:30:30 -05:00
Johannes Berg
f30221e4ec mac80211: implement off-channel mgmt TX
This implements the new off-channel TX API
in mac80211 with a new work item type. The
operation doesn't add a new work item when
we're on the right channel and there's no
wait time so that for example p2p probe
responses will be transmitted without delay.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-29 15:24:35 -05:00
Johannes Berg
f7ca38dfe5 nl80211/cfg80211: extend mgmt-tx API for off-channel
With p2p, it is sometimes necessary to transmit
a frame (typically an action frame) on another
channel than the current channel. Enable this
through the CMD_FRAME API, and allow it to wait
for a response. A new command allows that wait
to be aborted.

However, allow userspace to specify whether or
not it wants to allow off-channel TX, it may
actually want to use the same channel only.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-29 15:24:35 -05:00
Jouni Malinen
7dff312553 mac80211: Fix frame injection using non-AP vif
In order for frame injection to work properly for some use cases
(e.g., finding the station entry and keys for encryption), mac80211
needs to find the correct sdata entry. This works when the main vif
is in AP mode, but commit a2c1e3dad5
broke this particular use case for station main vif. While this type of
injection is quite unusual operation, it has some uses and we should fix
it. Do this by changing the monitor vif sdata selection to allow station
vif to be selected instead of limiting it to just AP vifs. We still need
to skip some iftypes to avoid selecting unsuitable vif for injection.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-29 14:41:28 -05:00
David S. Miller
77148625e1 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-11-29 11:19:09 -08:00
Eric Dumazet
25888e3031 af_unix: limit recursion level
Its easy to eat all kernel memory and trigger NMI watchdog, using an
exploit program that queues unix sockets on top of others.

lkml ref : http://lkml.org/lkml/2010/11/25/8

This mechanism is used in applications, one choice we have is to have a
recursion limit.

Other limits might be needed as well (if we queue other types of files),
since the passfd mechanism is currently limited by socket receive queue
sizes only.

Add a recursion_level to unix socket, allowing up to 4 levels.

Each time we send an unix socket through sendfd mechanism, we copy its
recursion level (plus one) to receiver. This recursion level is cleared
when socket receive queue is emptied.

Reported-by: Марк Коренберг <socketpair@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-29 09:45:15 -08:00
Eric Dumazet
a417786948 xps: add __rcu annotations
Avoid sparse warnings : add __rcu annotations and use
rcu_dereference_protected() where necessary.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-29 09:43:13 -08:00
Eric Dumazet
b02038a17b xps: NUMA allocations for per cpu data
store_xps_map() allocates maps that are used by single cpu, it makes
sense to use NUMA allocations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-29 09:43:13 -08:00
Tom Herbert
bf26414510 xps: Add CONFIG_XPS
This patch adds XPS_CONFIG option to enable and disable XPS.  This is
done in the same manner as RPS_CONFIG.  This is also fixes build
failure in XPS code when SMP is not enabled.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 18:24:14 -08:00
Nagendra Tomar
b4ff3c90e6 inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners
inet sockets corresponding to passive connections are added to the bind hash
using ___inet_inherit_port(). These sockets are later removed from the bind
hash using __inet_put_port(). These two functions are not exactly symmetrical.
__inet_put_port() decrements hashinfo->bsockets and tb->num_owners, whereas
___inet_inherit_port() does not increment them. This results in both of these
going to -ve values.

This patch fixes this by calling inet_bind_hash() from ___inet_inherit_port(),
which does the right thing.

'bsockets' and 'num_owners' were introduced by commit a9d8f9110d
(inet: Allowing more than 64k connections and heavily optimize bind(0))

Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 18:18:44 -08:00
Dan Rosenberg
3c6f27bf33 DECnet: don't leak uninitialized stack byte
A single uninitialized padding byte is leaked to userspace.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:32:30 -08:00
Gerrit Renker
0ac7887022 dccp: fix error in updating the GAR
This fixes a bug in updating the Greatest Acknowledgment number Received (GAR):
the current implementation does not track the greatest received value -
lower values in the range AWL..AWH (RFC 4340, 7.5.1) erase higher ones.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:29:27 -08:00
Jozsef Kadlecsik
82a39eb6b3 ipv6: Prepare the tree for un-inlined jhash.
jhash is widely used in the kernel and because the functions
are inlined, the cost in size is significant. Also, the new jhash
functions are slightly larger than the previous ones so better un-inline.
As a preparation step, the calls to the internal macros are replaced
with the plain jhash function calls.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:26:21 -08:00
andrew hendry
3f0a069a1d X25 remove bkl in call user data length ioctl
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:22 -08:00
andrew hendry
74a7e44080 X25 remove bkl from causediag ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:21 -08:00
andrew hendry
5b7958dfa5 X25 remove bkl from calluserdata ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:21 -08:00
andrew hendry
f90de66067 X25 remove bkl in facility ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:20 -08:00
andrew hendry
5595a1a599 X25 remove bkl in subscription ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:20 -08:00
John Fastabend
19eb5cc559 8021q: vlan device is lockless do not transfer real_num_{tx|rx}_queues
Now that the vlan device is lockless and single queue do not
transfer the real num queues. This is causing a BUG_ON to occur.

kernel BUG at net/8021q/vlan.c:345!
Call Trace:
[<ffffffff813fd6e8>] ? fib_rules_event+0x28/0x1b0
[<ffffffff814ad2b5>] notifier_call_chain+0x55/0x80
[<ffffffff81089156>] raw_notifier_call_chain+0x16/0x20
[<ffffffff813e5af7>] call_netdevice_notifiers+0x37/0x70
[<ffffffff813e6756>] netdev_features_change+0x16/0x20
[<ffffffffa02995be>] ixgbe_fcoe_enable+0xae/0x100 [ixgbe]
[<ffffffffa01da06a>] vlan_dev_fcoe_enable+0x2a/0x30 [8021q]
[<ffffffffa02d08c3>] fcoe_create+0x163/0x630 [fcoe]
[<ffffffff811244d5>] ? mmap_region+0x255/0x5a0
[<ffffffff81080ef0>] param_attr_store+0x50/0x80
[<ffffffff810809b6>] module_attr_store+0x26/0x30
[<ffffffff811b9db2>] sysfs_write_file+0xf2/0x180
[<ffffffff8114fc88>] vfs_write+0xc8/0x190
[<ffffffff81150621>] sys_write+0x51/0x90
[<ffffffff8100c0b2>] system_call_fastpath+0x16/0x1b

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:47:19 -08:00
Eric Dumazet
5a0d2268d2 net: add netif_tx_queue_frozen_or_stopped
When testing struct netdev_queue state against FROZEN bit, we also test
XOFF bit. We can test both bits at once and save some cycles.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:47:18 -08:00
Shan Wei
d3c15cab21 ipv6: kill two unused macro definition
1. IPV6_TLV_TEL_DST_SIZE
This has not been using for several years since created.

2. RT6_INFO_LEN
commit 33120b30 kill all RT6_INFO_LEN's references, but only this definition remained.

commit 33120b30cc
Author: Alexey Dobriyan <adobriyan@sw.ru>
Date:   Tue Nov 6 05:27:11 2007 -0800

    [IPV6]: Convert /proc/net/ipv6_route to seq_file interface

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:47:17 -08:00
Uwe Kleine-König
a40c9f88b5 net: add some KERN_CONT markers to continuation lines
Cc: netdev@vger.kernel.org
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:47:17 -08:00
Alexey Dobriyan
0147fc058d tcp: restrict net.ipv4.tcp_adv_win_scale (#20312)
tcp_win_from_space() does the following:

      if (sysctl_tcp_adv_win_scale <= 0)
              return space >> (-sysctl_tcp_adv_win_scale);
      else
              return space - (space >> sysctl_tcp_adv_win_scale);

"space" is int.

As per C99 6.5.7 (3) shifting int for 32 or more bits is
undefined behaviour.

Indeed, if sysctl_tcp_adv_win_scale is exactly 32,
space >> 32 equals space and function returns 0.

Which means we busyloop in tcp_fixup_rcvbuf().

Restrict net.ipv4.tcp_adv_win_scale to [-31, 31].

Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312

Steps to reproduce:

      echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale
      wget www.kernel.org
      [softlockup]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:39:45 -08:00
Pavel Emelyanov
8475ef9fd1 netns: Don't leak others' openreq-s in proc
The /proc/net/tcp leaks openreq sockets from other namespaces.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-27 22:57:48 -08:00
Thomas Graf
cf7afbfeb8 rtnl: make link af-specific updates atomic
As David pointed out correctly, updates to af-specific attributes
are currently not atomic. If multiple changes are requested and
one of them fails, previous updates may have been applied already
leaving the link behind in a undefined state.

This patch splits the function parse_link_af() into two functions
validate_link_af() and set_link_at(). validate_link_af() is placed
to validate_linkmsg() check for errors as early as possible before
any changes to the link have been made. set_link_af() is called to
commit the changes later.

This method is not fail proof, while it is currently sufficient
to make set_link_af() inerrable and thus 100% atomic, the
validation function method will not be able to detect all error
scenarios in the future, there will likely always be errors
depending on states which are f.e. not protected by rtnl_mutex
and thus may change between validation and setting.

Also, instead of silently ignoring unknown address families and
config blocks for address families which did not register a set
function the errors EAFNOSUPPORT respectively EOPNOSUPPORT are
returned to avoid comitting 4 out of 5 update requests without
notifying the user.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-27 22:56:08 -08:00
Tracey Dent
4cb6a614ba Net: ceph: Makefile: Remove unnessary code
Remove the if and else conditional because the code is in mainline and there
is no need in it being there.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-27 17:39:29 -08:00
Linus Torvalds
19650e8580 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Ensure we return the dirent->d_type when it is known
  NFS: Correct the array bound calculation in nfs_readdir_add_to_array
  NFS: Don't ignore errors from nfs_do_filldir()
  NFS: Fix the error handling in "uncached_readdir()"
  NFS: Fix a page leak in uncached_readdir()
  NFS: Fix a page leak in nfs_do_filldir()
  NFS: Assume eof if the server returns no readdir records
  NFS: Buffer overflow in ->decode_dirent() should not be fatal
  Pure nfs client performance using odirect.
  SUNRPC: Fix an infinite loop in call_refresh/call_refreshresult
2010-11-27 07:30:30 +09:00
Hans Schillstrom
b880c1f077 IPVS: Backup, adding version 0 sending capabilities
This patch adds a sysclt net.ipv4.vs.sync_version
that can be used to send sync msg in version 0 or 1 format.

sync_version value is logical,
     Value 1 (default) New version
           0 Plain old version

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:59 +09:00
Hans Schillstrom
986a075795 IPVS: Backup, Change sending to Version 1 format
Enable sending and removal of version 0 sending
Affected functions,

ip_vs_sync_buff_create()
ip_vs_sync_conn()

ip_vs_core.c removal of IPv4 check.

*v5
 Just check cp->pe_data_len in ip_vs_sync_conn
 Check if padding needed before adding a new sync_conn
 to the buffer, i.e. avoid sending padding at the end.

*v4
 moved sanity check and pe_name_len after sloop.
 use cp->pe instead of cp->dest->svc->pe
 real length in each sync_conn, not padded length
 however total size of a sync_msg includes padding.

*v3
 Sending ip_vs_sync_conn_options in network order.
 Sending Templates for ONE_PACKET conn.
 Renaming of ip_vs_sync_mesg to ip_vs_sync_mesg_v0

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:59 +09:00
Hans Schillstrom
fe5e7a1efb IPVS: Backup, Adding Version 1 receive capability
Functionality improvements
 * flags  changed from 16 to 32 bits
 * fwmark added (32 bits)
 * timeout in sec. added (32 bits)
 * pe data added (Variable length)
 * IPv6 capabilities (3x16 bytes for addr.)
 * Version and type in every conn msg.

ip_vs_process_message() now handles Version 1 messages
and will call ip_vs_process_message_v0() for version 0 messages.

ip_vs_proc_conn() is common for both version, and handles the update of
connection hash.

ip_vs_conn_fill_param_sync()    - Version 1 messages only
ip_vs_conn_fill_param_sync_v0() - Version 0 messages only

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:59 +09:00
Hans Schillstrom
2981bc9a63 IPVS: Backup, Adding structs for new sync format
New structs defined for version 1 of sync.

 * ip_vs_sync_v4       Ipv4 base format struct
 * ip_vs_sync_v6       Ipv6 base format struct

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:59 +09:00
Hans Schillstrom
a5959d53d6 IPVS: Handle Scheduling errors.
If ip_vs_conn_fill_param_persist return an error to ip_vs_sched_persist,
this error must propagate as ignored=-1 to ip_vs_schedule().
Errors from ip_vs_conn_new() in ip_vs_sched_persist() and ip_vs_schedule()
should also return *ignored=-1;

This patch just relies on the fact that ignored is 1 before calling
ip_vs_sched_persist().

Sent from Julian:
  "The new case when ip_vs_conn_fill_param_persist fails
   should set *ignored = -1, so that we can use NF_DROP,
   see below. *ignored = -1 should be also used for ip_vs_conn_new
   failure in ip_vs_sched_persist() and ip_vs_schedule().
   The new negative value should be handled in tcp,udp,sctp"

"To summarize:

- *ignored = 1:
      protocol tried to schedule (eg. on SYN), found svc but the
      svc/scheduler decides that this packet should be accepted with
      NF_ACCEPT because it must not be scheduled.

- *ignored = 0:
      scheduler can not find destination, so try bypass or
      return ICMP and then NF_DROP (ip_vs_leave).

- *ignored = -1:
      scheduler tried to schedule but fatal error occurred, eg.
      ip_vs_conn_new failure (ENOMEM) or ip_vs_sip_fill_param
      failure such as missing Call-ID, ENOMEM on skb_linearize
      or pe_data. In this case we should return NF_DROP without
      any attempts to send ICMP with ip_vs_leave."

More or less all ideas and input to this patch is work from
Julian Anastasov

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:59 +09:00
Hans Schillstrom
3716522653 IPVS: skb defrag in L7 helpers
L7 helpers like sip needs skb defrag
since L7 data can be fragmented.

This patch requires "IPVS Break ports-2 into src_port and dst_port" patch

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:58 +09:00
Hans Schillstrom
ce144f249f IPVS: Split ports[2] into src_port and dst_port
Avoid sending invalid pointer due to skb_linearize() call.
This patch prepares for next patch where skb_linearize is a part.

In ip_vs_sched_persist() params the ports ptr will be replaced by
src and dst port.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:58 +09:00
Hans Schillstrom
0e051e683b IPVS: Backup, Prepare for transferring firewall marks (fwmark) to the backup daemon.
One struct will have fwmark added:
 * ip_vs_conn

ip_vs_conn_new() and ip_vs_find_dest()
will have an extra param - fwmark
The effects of that, is in this patch.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-25 10:42:58 +09:00
John W. Linville
51cce8a590 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-11-24 16:49:20 -05:00
Johannes Berg
99ba2a1428 mac80211: implement packet loss notification
For drivers that have accurate TX status reporting
we can report the number of consecutive lost packets
to userspace using the new cfg80211 CQM event. The
threshold is fixed right now, this may need to be
improved in the future.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:36 -05:00
Johannes Berg
c063dbf52b cfg80211: allow using CQM event to notify packet loss
This adds the ability for drivers to use CQM events
to notify about packet loss for specific stations
(which could be the AP for the managed mode case).
Since the threshold might be determined by the
driver (it isn't passed in right now) it will be
passed out of the driver to userspace in the event.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:36 -05:00
Luis R. Rodriguez
48124d1a91 mac80211: avoid aggregation for VO traffic
This should help with latency issues which can happen when
using aggregation.

Cc: Felix Fietkau <nbd@openwrt.org>
Cc: Matt Smith <matt.smith@atheros.com>
Cc: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:36 -05:00
Felix Fietkau
72a8a3edd6 mac80211: reduce the number of retries for nullfunc probing
Since nullfunc frames are transmitted as unicast frames, they're more
reliable than the broadcast probe requests, so we need fewer retries
to figure out whether the AP is really gone.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:35 -05:00
Felix Fietkau
4e5ff37692 mac80211: use nullfunc instead of probe request for connection monitoring
nullfunc frames are better for connection monitoring, because probe requests
are answered even if the AP has already dropped the connection, whereas
nullfunc frames from an unassociated station will trigger a disassoc/deauth
frame from the AP (WLAN_REASON_CLASS3_FRAME_FROM_NONASSOC_STA), which allows
the station to reconnect immediately instead of waiting until it attempts to
transmit the next unicast frame.

This only works on hardware with reliable tx ACK reporting, any other hardware
needs to fall back to the probe request method.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:35 -05:00
Felix Fietkau
dd5b4cc71c cfg80211/mac80211: improve ad-hoc multicast rate handling
- store the multicast rate as an index instead of the rate value
  (reduces cpu overhead in a hotpath)
- validate the rate values (must match a bitrate in at least one sband)

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:35 -05:00
Felix Fietkau
46090979a5 mac80211: probe the AP when resuming
Check the connection by probing the AP (either using nullfunc or a
probe request). If nullfunc probing is supported and the assoc is no
longer valid, the AP will send a disassoc/deauth immediately.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:34 -05:00
Felix Fietkau
7ccc8bd759 mac80211: calculate beacon loss time accurately
Instead of using a fixed 2 second timeout, calculate beacon loss interval
from the advertised beacon interval and a frame count.  With this beacon
loss happens after N (default 7) consecutive frames are missed which
for a typical setup (100TU beacon interval) is ~700ms (or ~1/3 previous).

Signed-off-by: Sam Leffler <sleffler@chromium.org>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:34 -05:00
Felix Fietkau
c8a7972c3b mac80211: restart beacon miss timer on system resume from suspend
Signed-off-by: Paul Stewart <pstew@google.com>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:34 -05:00
Joe Perches
e9c0268f02 net/wireless: Use pr_<level> and netdev_<level>
No change in output for pr_<level> prefixes.
netdev_<level> output is different, arguably improved.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:33 -05:00
John W. Linville
d7a066c923 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-11-24 16:19:24 -05:00
John W. Linville
ccb1435401 Revert "nl80211/mac80211: Report signal average"
This reverts commit 86107fd170.

This patch inadvertantly changed the userland ABI.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:18:36 -05:00
Phil Blundell
a27e13d370 econet: fix CVE-2010-3848
Don't declare variable sized array of iovecs on the stack since this
could cause stack overflow if msg->msgiovlen is large.  Instead, coalesce
the user-supplied data into a new buffer and use a single iovec for it.

Signed-off-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:51:47 -08:00
Phil Blundell
16c41745c7 econet: fix CVE-2010-3850
Add missing check for capable(CAP_NET_ADMIN) in SIOCSIFADDR operation.

Signed-off-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:49:53 -08:00
Phil Blundell
fa0e846494 econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849
Later parts of econet_sendmsg() rely on saddr != NULL, so return early
with EINVAL if NULL was passed otherwise an oops may occur.

Signed-off-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:49:19 -08:00
David S. Miller
c39508d6f1 tcp: Make TCP_MAXSEG minimum more correct.
Use TCP_MIN_MSS instead of constant 64.

Reported-by: Min Zhang <mzhang@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:47:22 -08:00
Tom Herbert
1d24eb4815 xps: Transmit Packet Steering
This patch implements transmit packet steering (XPS) for multiqueue
devices.  XPS selects a transmit queue during packet transmission based
on configuration.  This is done by mapping the CPU transmitting the
packet to a queue.  This is the transmit side analogue to RPS-- where
RPS is selecting a CPU based on receive queue, XPS selects a queue
based on the CPU (previously there was an XPS patch from Eric
Dumazet, but that might more appropriately be called transmit completion
steering).

Each transmit queue can be associated with a number of CPUs which will
use the queue to send packets.  This is configured as a CPU mask on a
per queue basis in:

/sys/class/net/eth<n>/queues/tx-<n>/xps_cpus

The mappings are stored per device in an inverted data structure that
maps CPUs to queues.  In the netdevice structure this is an array of
num_possible_cpu structures where each structure holds and array of
queue_indexes for queues which that CPU can use.

The benefits of XPS are improved locality in the per queue data
structures.  Also, transmit completions are more likely to be done
nearer to the sending thread, so this should promote locality back
to the socket on free (e.g. UDP).  The benefits of XPS are dependent on
cache hierarchy, application load, and other factors.  XPS would
nominally be configured so that a queue would only be shared by CPUs
which are sharing a cache, the degenerative configuration woud be that
each CPU has it's own queue.

Below are some benchmark results which show the potential benfit of
this patch.  The netperf test has 500 instances of netperf TCP_RR test
with 1 byte req. and resp.

bnx2x on 16 core AMD
   XPS (16 queues, 1 TX queue per CPU)  1234K at 100% CPU
   No XPS (16 queues)                   996K at 100% CPU

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:44:20 -08:00
Tom Herbert
3853b5841c xps: Improvements in TX queue selection
In dev_pick_tx, don't do work in calculating queue
index or setting
the index in the sock unless the device has more than one queue.  This
allows the sock to be set only with a queue index of a multi-queue
device which is desirable if device are stacked like in a tunnel.

We also allow the mapping of a socket to queue to be changed.  To
maintain in order packet transmission a flag (ooo_okay) has been
added to the sk_buff structure.  If a transport layer sets this flag
on a packet, the transmit queue can be changed for the socket.
Presumably, the transport would set this if there was no possbility
of creating OOO packets (for instance, there are no packets in flight
for the socket).  This patch includes the modification in TCP output
for setting this flag.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:44:19 -08:00
Eric Dumazet
bba14de987 scm: lower SCM_MAX_FD
Lower SCM_MAX_FD from 255 to 253 so that allocations for scm_fp_list are
halved. (commit f8d570a4 added two pointers in this structure)

scm_fp_dup() should not copy whole structure (and trigger kmemcheck
warnings), but only the used part. While we are at it, only allocate
needed size.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:16:43 -08:00
Eric Dumazet
456b61bca8 ipv6: mcast: RCU conversion
ipv6_sk_mc_lock rwlock becomes a spinlock.

readers (inet6_mc_check()) now takes rcu_read_lock() instead of read
lock. Writers dont need to disable BH anymore.

struct ipv6_mc_socklist objects are reclaimed after one RCU grace
period.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:16:42 -08:00
Eric Dumazet
9915672d41 af_unix: limit unix_tot_inflight
Vegard Nossum found a unix socket OOM was possible, posting an exploit
program.

My analysis is we can eat all LOWMEM memory before unix_gc() being
called from unix_release_sock(). Moreover, the thread blocked in
unix_gc() can consume huge amount of time to perform cleanup because of
huge working set.

One way to handle this is to have a sensible limit on unix_tot_inflight,
tested from wait_for_unix_gc() and to force a call to unix_gc() if this
limit is hit.

This solves the OOM and also reduce overall latencies, and should not
slowdown normal workloads.

Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 09:15:27 -08:00
Linus Torvalds
3cbaa0f7a7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  of/phylib: Use device tree properties to initialize Marvell PHYs.
  phylib: Add support for Marvell 88E1149R devices.
  phylib: Use common page register definition for Marvell PHYs.
  qlge: Fix incorrect usage of module parameters and netdev msg level
  ipv6: fix missing in6_ifa_put in addrconf
  SuperH IrDA: correct Baud rate error correction
  atl1c: Fix hardware type check for enabling OTP CLK
  net: allow GFP_HIGHMEM in __vmalloc()
  bonding: change list contact to netdev@vger.kernel.org
  e1000: fix screaming IRQ
2010-11-24 08:22:34 +09:00
Helmut Schaa
18890d4b89 mac80211: Disable hw crypto for GTKs on AP VLAN interfaces
When using AP VLAN interfaces, each VLAN interface should be in its own
broadcast domain. Hostapd achieves this by assigning different GTKs to
different AP VLAN interfaces.

However, mac80211 drivers are not aware of AP VLAN interfaces and as
such mac80211 sends the GTK to the driver in the context of the base AP
mode interface. This causes problems when multiple AP VLAN interfaces
are used since the driver will use the same key slot for the different
GTKs (there's no way for the driver to distinguish the different GTKs
from different AP VLAN interfaces). Thus, only the clients associated
to one AP VLAN interface (the one that was created last) can actually
use broadcast traffic.

Fix this by not programming any GTKs for AP VLAN interfaces into the hw
but fall back to using software crypto. The GTK for the underlying AP
interface is still sent to the driver.

That means, broadcast traffic to stations associated to an AP VLAN
interface is encrypted in software whereas broadcast traffic to
stations associated to the non-VLAN AP interface is encrypted in
hardware.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-22 15:48:51 -05:00
Luis R. Rodriguez
b2e253cf30 cfg80211: Fix regulatory bug with multiple cards and delays
When two cards are connected with the same regulatory domain
if CRDA had a delayed response then cfg80211's own set regulatory
domain would still be the world regulatory domain. There was a bug
on cfg80211's logic such that it assumed that once you pegged a
request as the last request it was already the currently set
regulatory domain. This would mean we would race setting a stale
regulatory domain to secondary cards which had the same regulatory
domain since the alpha2 would match.

We fix this by processing each regulatory request atomically,
and only move on to the next one once we get it fully processed.
In the case CRDA is not present we will simply world roam.

This issue is only present when you have a slow system and the
CRDA processing is delayed. Because of this it is not a known
regression.

Without this fix when a delay is present with CRDA the second card
would end up with an intersected regulatory domain and not allow it
to use the channels it really is designed for. When two cards with
two different regulatory domains were inserted you'd end up
rejecting the second card's regulatory domain request.
This fails with mac80211_hswim's regtest=2 (two requests, same alpha2)
and regtest=3 (two requests, different alpha2) module parameter
options.

This was reproduced and tested against mac80211_hwsim using this
CRDA delayer:

       #!/bin/bash
       echo $COUNTRY >> /tmp/log
       sleep 2
       /sbin/crda.orig

And these regulatory tests:

       modprobe mac80211_hwsim regtest=2
       modprobe mac80211_hwsim regtest=3

Reported-by: Mark Mentovai <mark@moxienet.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Tested-by: Mark Mentovai <mark@moxienet.com>
Tested-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-22 15:48:51 -05:00
Luis R. Rodriguez
b0e2880b05 cfg80211: move mutex locking to reg_process_pending_hints()
This will be required in the next patch and it makes the
next patch easier to review.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Tested-by: Mark Mentovai <mark@moxienet.com>
Tested-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-22 15:48:50 -05:00
Luis R. Rodriguez
f333a7a2f4 cfg80211: move reg_work and reg_todo above
These will be used earlier in the next few patches.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Tested-by: Mark Mentovai <mark@moxienet.com>
Tested-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-22 15:48:50 -05:00
Luis R. Rodriguez
31e99729ae cfg80211: put core regulatory request into queue
This will simplify the synchronization for pending requests.
Without this we have a race between the core and when we
restore regulatory settings, although this is unlikely
its best to just avoid that race altogether.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Tested-by: Mark Mentovai <mark@moxienet.com>
Tested-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-22 15:48:50 -05:00
Gustavo F. Padovan
c89ad73722 Bluetooth: Fix not returning proper error in SCO
Return 0 in that situation could lead to errors in the caller.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-11-22 18:23:18 -02:00
Trond Myklebust
5fc43978a7 SUNRPC: Fix an infinite loop in call_refresh/call_refreshresult
If the rpcauth_refreshcred() call returns an error other than
EACCES, ENOMEM or ETIMEDOUT, we currently end up looping forever
between call_refresh and call_refreshresult.

The correct thing to do here is to exit on all errors except
EAGAIN and ETIMEDOUT, for which case we retry 3 times, then
return EACCES.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:22:39 -05:00
Tracey Dent
e5700c740d Net: wanrouter: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:16 -08:00
Tracey Dent
fdb26195f4 Net: sunrpc: auth_gss: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:16 -08:00
Tracey Dent
9ed05ad3c0 Net: rxrpc: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:15 -08:00
Tracey Dent
094f2faaa2 Net: rds: Makefile: Remove deprecated items
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Also, use the ccflags-$ flag instead of EXTRA_CFLAGS because EXTRA_CFLAGS is
deprecated and should now be switched.

Last but not least, took out if-conditionals.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:15 -08:00
Tracey Dent
927a41f50c Net: phonet: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:14 -08:00
Tracey Dent
64387df8da Net: lapb: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:14 -08:00
Tracey Dent
94ee288e94 Net: irda: irnet: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:13 -08:00
Tracey Dent
cd30c62024 Net: irda: irlan: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:13 -08:00
Tracey Dent
f91c4ae498 Net: irda: ircomm: Makefile: Remove deprecated kbuild goal defintions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:12 -08:00
Tracey Dent
4de58dfebe Net: ipv6: netfiliter: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:12 -08:00
Tracey Dent
6b8ff8c517 Net: ipv4: netfilter: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:11 -08:00
Tracey Dent
cc0bdac399 Net: econet: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:11 -08:00
Tracey Dent
22674a24b4 Net: dns_resolver: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:10 -08:00
Tracey Dent
fa13bc3daa Net: ceph: Makefile: remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:10 -08:00
Tracey Dent
bac14e0178 Net: can: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:09 -08:00
Tracey Dent
a3106d032f Net: caif: Makefile: Remove deprecated items
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Also, use the ccflags-$ flag instead of EXTRA_CFLAGS because EXTRA_CFLAGS is
deprecated and should now be switched.

Last but not least, took out if-conditionals.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:09 -08:00
Tracey Dent
87217502d4 Net: bluetooth: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:08 -08:00
John Fastabend
88b2a9a3d9 ipv6: fix missing in6_ifa_put in addrconf
Fix ref count bug introduced by

commit 2de7957072
Author: Lorenzo Colitti <lorenzo@google.com>
Date:   Wed Oct 27 18:16:49 2010 +0000

ipv6: addrconf: don't remove address state on ifdown if the address
is being kept

Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr
refcnt correctly with in6_ifa_put().

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 07:37:36 -08:00
Eric Dumazet
551eaff1b3 pktgen: allow faster module unload
Unloading pktgen module needs ~6 seconds on a 64 cpus machine, to stop
64 kthreads.

Add a pktgen_exiting variable to let kernel threads die faster, so that
kthread_stop() doesnt have to wait too long for them. This variable is
not tested in fast path.

Note : Before exiting from pktgen_thread_worker(), we must make sure
kthread_stop() is waiting for this thread to be stopped, like its done
in kernel/softirq.c

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-21 10:26:44 -08:00
Eric Dumazet
7a1c8e5ab1 net: allow GFP_HIGHMEM in __vmalloc()
We forgot to use __GFP_HIGHMEM in several __vmalloc() calls.

In ceph, add the missing flag.

In fib_trie.c, xfrm_hash.c and request_sock.c, using vzalloc() is
cleaner and allows using HIGHMEM pages as well.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-21 10:04:04 -08:00
Eric Dumazet
bbce5a59e4 packet: use vzalloc()
alloc_one_pg_vec_page() is supposed to return zeroed memory, so use
vzalloc() instead of vmalloc()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-21 10:01:42 -08:00
J. Bruce Fields
9c335c0b8d svcrpc: fix wspace-checking race
We call svc_xprt_enqueue() after something happens which we think may
require handling from a server thread.  To avoid such events being lost,
svc_xprt_enqueue() must guarantee that there will be a svc_serv() call
from a server thread following any such event.  It does that by either
waking up a server thread itself, or checking that XPT_BUSY is set (in
which case somebody else is doing it).

But the check of XPT_BUSY could occur just as someone finishes
processing some other event, and just before they clear XPT_BUSY.

Therefore it's important not to clear XPT_BUSY without subsequently
doing another svc_export_enqueue() to check whether the xprt should be
requeued.

The xpo_wspace() check in svc_xprt_enqueue() breaks this rule, allowing
an event to be missed in situations like:

	data arrives
	call svc_tcp_data_ready():
	call svc_xprt_enqueue():
	set BUSY
	find no write space
				svc_reserve():
				free up write space
				call svc_enqueue():
				test BUSY
	clear BUSY

So, instead, check wspace in the same places that the state flags are
checked: before taking BUSY, and in svc_receive().

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-11-19 18:35:12 -05:00
J. Bruce Fields
b176331627 svcrpc: svc_close_xprt comment
Neil Brown had to explain to me why we do this here; record the answer
for posterity.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-11-19 18:35:12 -05:00
J. Bruce Fields
f8c0d226fe svcrpc: simplify svc_close_all
There's no need to be fooling with XPT_BUSY now that all the threads
are gone.

The list_del_init() here could execute at the same time as the
svc_xprt_enqueue()'s list_add_tail(), with undefined results.  We don't
really care at this point, but it might result in a spurious
list-corruption warning or something.

And svc_close() isn't adding any value; just call svc_delete_xprt()
directly.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-11-19 18:35:11 -05:00
J. Bruce Fields
ca7896cd83 nfsd4: centralize more calls to svc_xprt_received
Follow up on b48fa6b991 by moving all the
svc_xprt_received() calls for the main xprt to one place.  The clearing
of XPT_BUSY here is critical to the correctness of the server, so I'd
prefer it to be obvious where we do it.

The only substantive result is moving svc_xprt_received() after
svc_receive_deferred().  Other than a (likely insignificant) delay
waking up the next thread, that should be harmless.

Also reshuffle the exit code a little to skip a few other steps that we
don't care about the in the svc_delete_xprt() case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-11-19 18:35:11 -05:00
J. Bruce Fields
62bac4af3d svcrpc: don't set then immediately clear XPT_DEFERRED
There's no harm to doing this, since the only caller will immediately
call svc_enqueue() afterwards, ensuring we don't miss the remaining
deferred requests just because XPT_DEFERRED was briefly cleared.

But why not just do this the simple way?

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-11-19 18:35:11 -05:00
Linus Torvalds
76db8ac45f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: fix readdir EOVERFLOW on 32-bit archs
  ceph: fix frag offset for non-leftmost frags
  ceph: fix dangling pointer
  ceph: explicitly specify page alignment in network messages
  ceph: make page alignment explicit in osd interface
  ceph: fix comment, remove extraneous args
  ceph: fix update of ctime from MDS
  ceph: fix version check on racing inode updates
  ceph: fix uid/gid on resent mds requests
  ceph: fix rdcache_gen usage and invalidate
  ceph: re-request max_size if cap auth changes
  ceph: only let auth caps update max_size
  ceph: fix open for write on clustered mds
  ceph: fix bad pointer dereference in ceph_fill_trace
  ceph: fix small seq message skipping
  Revert "ceph: update issue_seq on cap grant"
2010-11-19 15:32:22 -08:00
Linus Torvalds
caf8394524 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (31 commits)
  net: fix kernel-doc for sk_filter_rcu_release
  be2net: Fix to avoid firmware update when interface is not open.
  netfilter: fix IP_VS dependencies
  net: irda: irttp: sync error paths of data- and udata-requests
  ipv6: Expose reachable and retrans timer values as msecs
  ipv6: Expose IFLA_PROTINFO timer values in msecs instead of jiffies
  3c59x: fix build failure on !CONFIG_PCI
  ipg.c: remove id [SUNDANCE, 0x1021]
  net: caif: spi: fix potential NULL dereference
  ath9k_htc: Avoid setting QoS control for non-QoS frames
  net: zero kobject in rx_queue_release
  net: Fix duplicate volatile warning.
  MAINTAINERS: Add stmmac maintainer
  bonding: fix a race in IGMP handling
  cfg80211: fix can_beacon_sec_chan, reenable HT40
  gianfar: fix signedness issue
  net: bnx2x: fix error value sign
  8139cp: fix checksum broken
  r8169: fix checksum broken
  rds: Integer overflow in RDS cmsg handling
  ...
2010-11-19 15:25:59 -08:00
David S. Miller
24912420e9 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bonding/bond_main.c
	net/core/net-sysfs.c
	net/ipv6/addrconf.c
2010-11-19 13:13:47 -08:00
andrew hendry
0670b8ae66 X25: remove bkl in routing ioctls
Routing doesn't use the socket data and is protected by x25_route_list_lock

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 12:40:02 -08:00
andrew hendry
54aafbd498 X25: remove bkl in inq and outq ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 12:40:01 -08:00
andrew hendry
1ecd66bf2c X25: remove bkl in timestamp ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 12:40:01 -08:00
andrew hendry
70be998c2b X25: pushdown bkl in ioctls
Push down the bkl in the ioctls so they can be removed one at a time.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 12:40:00 -08:00
Eric Dumazet
c26aed40f4 filter: use reciprocal divide
At compile time, we can replace the DIV_K instruction (divide by a
constant value) by a reciprocal divide.

At exec time, the expensive divide is replaced by a multiply, a less
expensive operation on most processors.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 10:06:54 -08:00
Eric Dumazet
8c1592d68b filter: cleanup codes[] init
Starting the translated instruction to 1 instead of 0 allows us to
remove one descrement at check time and makes codes[] array init
cleaner.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 10:04:23 -08:00
Eric Dumazet
93aaae2e01 filter: optimize sk_run_filter
Remove pc variable to avoid arithmetic to compute fentry at each filter
instruction. Jumps directly manipulate fentry pointer.

As the last instruction of filter[] is guaranteed to be a RETURN, and
all jumps are before the last instruction, we dont need to check filter
bounds (number of instructions in filter array) at each iteration, so we
remove it from sk_run_filter() params.

On x86_32 remove f_k var introduced in commit 57fe93b374
(filter: make sure filters dont read uninitialized memory)

Note : We could use a CONFIG_ARCH_HAS_{FEW|MANY}_REGISTERS in order to
avoid too many ifdefs in this code.

This helps compiler to use cpu registers to hold fentry and A
accumulator.

On x86_32, this saves 401 bytes, and more important, sk_run_filter()
runs much faster because less register pressure (One less conditional
branch per BPF instruction)

# size net/core/filter.o net/core/filter_pre.o
   text    data     bss     dec     hex filename
   2948       0       0    2948     b84 net/core/filter.o
   3349       0       0    3349     d15 net/core/filter_pre.o

on x86_64 :
# size net/core/filter.o net/core/filter_pre.o
   text    data     bss     dec     hex filename
   5173       0       0    5173    1435 net/core/filter.o
   5224       0       0    5224    1468 net/core/filter_pre.o

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 09:49:59 -08:00
Randy Dunlap
0302b8622c net: fix kernel-doc for sk_filter_rcu_release
Fix kernel-doc warning for sk_filter_rcu_release():

Warning(net/core/filter.c:586): missing initial short description on line:
 * 	sk_filter_rcu_release: Release a socket filter by rcu_head

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc:	"David S. Miller" <davem@davemloft.net>
Cc:	netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 09:27:15 -08:00
Patrick McHardy
dba4490d22 netfilter: fix IP_VS dependencies
When NF_CONNTRACK is enabled, IP_VS uses conntrack symbols.
Therefore IP_VS can't be linked statically when conntrack
is built modular.

Reported-by: Justin P. Mattock <justinmattock@gmail.com>
Tested-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 13:14:33 -08:00
Wolfram Sang
925e277f52 net: irda: irttp: sync error paths of data- and udata-requests
irttp_data_request() returns meaningful errorcodes, while irttp_udata_request()
just returns -1 in similar situations. Sync the two and the loglevels of the
accompanying output.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 12:24:25 -08:00
Thomas Graf
18a31e1e28 ipv6: Expose reachable and retrans timer values as msecs
Expose reachable and retrans timer values in msecs instead of jiffies.
Both timer values are already exposed as msecs in the neighbour table
netlink interface.

The creation timestamp format with increased precision is kept but
cleaned up.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 12:08:36 -08:00
David S. Miller
07bfa524d4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-11-18 11:56:09 -08:00
Bruno Randolf
86107fd170 nl80211/mac80211: Report signal average
Extend nl80211 to report an exponential weighted moving average (EWMA) of the
signal value. Since the signal value usually fluctuates between different
packets, an average can be more useful than the value of the last packet.

This uses the recently added generic EWMA library function.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-18 14:22:20 -05:00
Thomas Graf
93908d1926 ipv6: Expose IFLA_PROTINFO timer values in msecs instead of jiffies
IFLA_PROTINFO exposes timer related per device settings in jiffies.
Change it to expose these values in msecs like the sysctl interface
does.

I did not find any users of IFLA_PROTINFO which rely on any of these
values and even if there are, they are likely already broken because
there is no way for them to reliably convert such a value to another
time format.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 11:05:01 -08:00
Eric Dumazet
57e1ab6ead igmp: refine skb allocations
IGMP allocates MTU sized skbs. This may fail for large MTU (order-2
allocations), so add a fallback to try lower sizes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 11:02:23 -08:00
Changli Gao
4c3710afbc net: move definitions of BPF_S_* to net/core/filter.c
BPF_S_* are used internally, should not be exposed to the others.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 10:59:51 -08:00
Tetsuo Handa
cba328fc5e filter: Optimize instruction revalidation code.
Since repeating u16 value to u8 value conversion using switch() clause's
case statement is wasteful, this patch introduces u16 to u8 mapping table
and removes most of case statements. As a result, the size of net/core/filter.o
is reduced by about 29% on x86.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 10:58:35 -08:00
John Fastabend
9e50e3ac5a net: add priority field to pktgen
Add option to set skb priority to pktgen. Useful for testing
QOS features. Also by running pktgen on the vlan device the
qdisc on the real device can be tested.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 10:43:07 -08:00
John Fastabend
7d8e76bf9a net: zero kobject in rx_queue_release
netif_set_real_num_rx_queues() can decrement and increment
the number of rx queues. For example ixgbe does this as
features and offloads are toggled. Presumably this could
also happen across down/up on most devices if the available
resources changed (cpu offlined).

The kobject needs to be zero'd in this case so that the
state is not preserved across kobject_put()/kobject_init_and_add().

This resolves the following error report.

ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong.
Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169
Call Trace:
 [<ffffffff8121c940>] kobject_init+0x3a/0x83
 [<ffffffff8121cf77>] kobject_init_and_add+0x23/0x57
 [<ffffffff8107b800>] ? mark_lock+0x21/0x267
 [<ffffffff813c6d11>] net_rx_queue_update_kobjects+0x63/0xc6
 [<ffffffff813b5e0e>] netif_set_real_num_rx_queues+0x5f/0x78
 [<ffffffffa0261d49>] ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe]
 [<ffffffffa0262509>] ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe]
 [<ffffffffa0274596>] ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe]

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 09:41:40 -08:00
Gerrit Renker
f72f2f4cde dccp ccid-2: whitespace fix-up
This fixes whitespace noise introduced in commit "dccp ccid-2: Algorithm to
update buffer state", 5753fdfe8b, 14 Nov 2010.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 09:37:07 -08:00
Eric Dumazet
866f3b25a2 bonding: IGMP handling cleanup
Instead of iterating in_dev->mc_list from bonding driver, its better
to call a helper function provided by igmp.c
Details of implementation (locking) are private to igmp code.

ip_mc_rejoin_group(struct ip_mc_list *im) becomes
ip_mc_rejoin_groups(struct in_device *in_dev);

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 09:33:19 -08:00
Mark Mentovai
09a02fdb91 cfg80211: fix can_beacon_sec_chan, reenable HT40
This follows wireless-testing 9236d838c9
("cfg80211: fix extension channel checks to initiate communication") and
fixes accidental case fall-through. Without this fix, HT40 is entirely
blocked.

Signed-off-by: Mark Mentovai <mark@moxienet.com>
Cc: stable@kernel.org
Acked-by: Luis R. Rodriguez <lrodriguez@atheros.com
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-18 11:35:05 -05:00
Johannes Berg
50a9432dae mac80211: fix powersaving clients races
The code to handle powersaving stations has a race:
when the powersave flag is lifted from a station,
we could transmit a packet that is being processed
for TX at the same time right away, even if there
are other frames queued for it. This would cause
frame reordering. To fix this, lift the flag only
under the appropriate lock that blocks TX.

Additionally, the code to allow drivers to block a
station while frames for it are on the HW queue is
never re-enabled the station, so traffic would get
stuck indefinitely. Fix this by clearing the flag
for this appropriately.

Finally, as an optimisation, don't do anything if
the driver unblocks an already unblocked station.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-17 16:19:33 -05:00
Johannes Berg
4bce22b9b8 mac80211: defines for AC numbers
In many places we've just hardcoded the
AC numbers -- which is a relic from the
original mac80211 (d80211). Add constants
for them so we know what we're talking
about.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-17 16:19:31 -05:00
Vasiliy Kulikov
dda0b38692 net: ipv4: tcp_probe: cleanup snprintf() use
snprintf() returns number of bytes that were copied if there is no overflow.
This code uses return value as number of copied bytes.  Theoretically format
string '%lu.%09lu %pI4:%u %pI4:%u %d %#x %#x %u %u %u %u\n' may be expanded
up to 163 bytes.  In reality tv.tv_sec is just few bytes instead of 20, 2 ports
are just 5 bytes each instead of 10, length is 5 bytes instead of 10.  The rest
is an unstrusted input.  Theoretically if tv_sec is big then copy_to_user() would
overflow tbuf.

tbuf was increased to fit in 163 bytes.  snprintf() is used to follow return
value semantic.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 12:27:46 -08:00
John Fastabend
9ea19481db net: zero kobject in rx_queue_release
netif_set_real_num_rx_queues() can decrement and increment
the number of rx queues. For example ixgbe does this as
features and offloads are toggled. Presumably this could
also happen across down/up on most devices if the available
resources changed (cpu offlined).

The kobject needs to be zero'd in this case so that the
state is not preserved across kobject_put()/kobject_init_and_add().

This resolves the following error report.

ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong.
Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169
Call Trace:
 [<ffffffff8121c940>] kobject_init+0x3a/0x83
 [<ffffffff8121cf77>] kobject_init_and_add+0x23/0x57
 [<ffffffff8107b800>] ? mark_lock+0x21/0x267
 [<ffffffff813c6d11>] net_rx_queue_update_kobjects+0x63/0xc6
 [<ffffffff813b5e0e>] netif_set_real_num_rx_queues+0x5f/0x78
 [<ffffffffa0261d49>] ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe]
 [<ffffffffa0262509>] ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe]
 [<ffffffffa0274596>] ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe]

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 12:27:46 -08:00
Changli Gao
5811662b15 net: use the macros defined for the members of flowi
Use the macros defined for the members of flowi to clean the code up.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 12:27:45 -08:00
Dan Rosenberg
218854af84 rds: Integer overflow in RDS cmsg handling
In rds_cmsg_rdma_args(), the user-provided args->nr_local value is
restricted to less than UINT_MAX.  This seems to need a tighter upper
bound, since the calculation of total iov_size can overflow, resulting
in a small sock_kmalloc() allocation.  This would probably just result
in walking off the heap and crashing when calling rds_rdma_pages() with
a high count value.  If it somehow doesn't crash here, then memory
corruption could occur soon after.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 12:20:52 -08:00
Thomas Graf
b382b191ea ipv6: AF_INET6 link address family
IPv6 already exposes some address family data via netlink in the
IFLA_PROTINFO attribute if RTM_GETLINK request is sent with the
address family set to AF_INET6. We take over this format and
reuse all the code.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 11:28:26 -08:00
Thomas Graf
9f0f7272ac ipv4: AF_INET link address family
Implements the AF_INET link address family exposing the per
device configuration settings via netlink using the attribute
IFLA_INET_CONF.

The format of IFLA_INET_CONF differs depending on the direction
the attribute is sent. The attribute sent by the kernel consists
of a u32 array, basically a 1:1 copy of in_device->cnf.data[].
The attribute expected by the kernel must consist of a sequence
of nested u32 attributes, each representing a change request,
e.g.
	[IFLA_INET_CONF] = {
		[IPV4_DEVCONF_FORWARDING] = 1,
		[IPV4_DEVCONF_NOXFRM] = 0,
	}

libnl userspace API documentation and example available from:
http://www.infradead.org/~tgr/libnl/doc-git/group__link__inet.html

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 11:28:25 -08:00
Thomas Graf
f8ff182c71 rtnetlink: Link address family API
Each net_device contains address family specific data such as
per device settings and statistics. We already expose this data
via procfs/sysfs and partially netlink.

The netlink method requires the requester to send one RTM_GETLINK
request for each address family it wishes to receive data of
and then merge this data itself.

This patch implements a new API which combines all address family
specific link data in a new netlink attribute IFLA_AF_SPEC.
IFLA_AF_SPEC contains a sequence of nested attributes, one for each
address family which in turn defines the structure of its own
attribute. Example:

   [IFLA_AF_SPEC] = {
       [AF_INET] = {
           [IFLA_INET_CONF] = ...,
       },
       [AF_INET6] = {
           [IFLA_INET6_FLAGS] = ...,
           [IFLA_INET6_CONF] = ...,
       }
   }

The API also allows for address families to implement a function
which parses the IFLA_AF_SPEC attribute sent by userspace to
implement address family specific link options.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 11:28:24 -08:00
Eric Paris
ee58681195 network: tcp_connect should return certain errors up the stack
The current tcp_connect code completely ignores errors from sending an skb.
This makes sense in many situations (like -ENOBUFFS) but I want to be able to
immediately fail connections if they are denied by the SELinux netfilter hook.
Netfilter does not normally return ECONNREFUSED when it drops a packet so we
respect that error code as a final and fatal error that can not be recovered.

Based-on-patch-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 10:54:35 -08:00
Eric Paris
da68365004 netfilter: allow hooks to pass error code back up the stack
SELinux would like to pass certain fatal errors back up the stack.  This patch
implements the generic netfilter support for this functionality.

Based-on-patch-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 10:54:34 -08:00
Joe Perches
37d6680042 net/atm: Remove unnecessary casts of netdev_priv
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 10:37:52 -08:00
Arnd Bergmann
451a3c24b0 BKL: remove extraneous #include <smp_lock.h>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-17 08:59:32 -08:00
Felix Fietkau
8f0729b16a mac80211: add support for setting the ad-hoc multicast rate
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:39:08 -05:00
Felix Fietkau
885a46d0f7 cfg80211: add support for setting the ad-hoc multicast rate
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:39:08 -05:00
Juuso Oikarinen
a619a4c0e1 mac80211: Add function to get probe request template for current AP
Chipsets with hardware based connection monitoring need to autonomically
send directed probe-request frames to the AP (in the event of beacon loss,
for example.)

For the hardware to be able to do this, it requires a template for the frame
to transmit to the AP, filled in with the BSSID and SSID of the AP, but also
the supported rate IE's.

This patch adds a function to mac80211, which allows the hardware driver to
fetch this template after association, so it can be configured to the hardware.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:37:08 -05:00
Bruno Randolf
15d9675321 mac80211: Add antenna configuration
Allow antenna configuration by calling driver's function for it.

We disallow antenna configuration if the wiphy is already running, mainly to
make life easier for 802.11n drivers which need to recalculate HT capabilites.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:37:05 -05:00
Bruno Randolf
afe0cbf875 cfg80211: Add nl80211 antenna configuration
Allow setting of TX and RX antennas configuration via nl80211.

The antenna configuration is defined as a bitmap of allowed antennas to use.
This API can be used to mask out antennas which are not attached or should not
be used for other reasons like regulatory concerns or special setups.

Separate bitmaps are used for RX and TX to allow configuring different antennas
for receiving and transmitting. Each bitmap is 32 bit long, each bit
representing one antenna, starting with antenna 1 at the first bit. If an
antenna bit is set, this means the driver is allowed to use this antenna for RX
or TX respectively; if the bit is not set the hardware is not allowed to use
this antenna.

Using bitmaps has the benefit of allowing for a flexible configuration
interface which can support many different configurations and which can be used
for 802.11n as well as non-802.11n devices. Instead of relying on some hardware
specific assumptions, drivers can use this information to know which antennas
are actually attached to the system and derive their capabilities based on
that.

802.11n devices should enable or disable chains, based on which antennas are
present (If all antennas belonging to a particular chain are disabled, the
entire chain should be disabled). HT capabilities (like STBC, TX Beamforming,
Antenna selection) should be calculated based on the available chains after
applying the antenna masks. Should a 802.11n device have diversity antennas
attached to one of their chains, diversity can be enabled or disabled based on
the antenna information.

Non-802.11n drivers can use the antenna masks to select RX and TX antennas and
to enable or disable antenna diversity.

While covering chainmasks for 802.11n and the standard "legacy" modes "fixed
antenna 1", "fixed antenna 2" and "diversity" this API also allows more rare,
but useful configurations as follows:

1) Send on antenna 1, receive on antenna 2 (or vice versa). This can be used to
have a low gain antenna for TX in order to keep within the regulatory
constraints and a high gain antenna for RX in order to receive weaker signals
("speak softly, but listen harder"). This can be useful for building long-shot
outdoor links. Another usage of this setup is having a low-noise pre-amplifier
on antenna 1 and a power amplifier on the other antenna. This way transmit
noise is mostly kept out of the low noise receive channel.
(This would be bitmaps: tx 1 rx 2).

2) Another similar setup is: Use RX diversity on both antennas, but always send
on antenna 1. Again that would allow us to benefit from a higher gain RX
antenna, while staying within the legal limits.
(This would be: tx 0 rx 3).

3) And finally there can be special experimental setups in research and
development even with pre 802.11n hardware where more than 2 antennas are
available. It's good to keep the API simple, yet flexible.

Signed-off-by: Bruno Randolf <br1@einfach.org>

--
v7:	Made bitmasks 32 bit wide and rebased to latest wireless-testing.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:37:05 -05:00
Arik Nemtsov
f23a478075 mac80211: support hardware TX fragmentation offload
The lower driver is notified when the fragmentation threshold changes
and upon a reconfig of the interface.

If the driver supports hardware TX fragmentation, don't fragment
packets in the stack.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:37:04 -05:00
Luis R. Rodriguez
9236d838c9 cfg80211: fix extension channel checks to initiate communication
When operating in a mode that initiates communication and using
HT40 we should fail if we cannot use both primary and secondary
channels to initiate communication. Our current ht40 allowmap
only covers STA mode of operation, for beaconing modes we need
a check on the fly as the mode of operation is dynamic and
there other flags other than disable which we should read
to check if we can initiate communication.

Do not allow for initiating communication if our secondary HT40
channel has is either disabled, has a passive scan flag, a
no-ibss flag or is a radar channel. Userspace now has similar
checks but this is also needed in-kernel.

Reported-by: Jouni Malinen <jouni.malinen@atheros.com>
Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 15:59:39 -05:00
Ulrich Weber
7d98ffd8c2 xfrm: update flowi saddr in icmp_send if unset
otherwise xfrm_lookup will fail to find correct policy

Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 11:43:39 -08:00
Eric Dumazet
c31504dc0d udp: use atomic_inc_not_zero_hint
UDP sockets refcount is usually 2, unless an incoming frame is going to
be queued in receive or backlog queue.

Using atomic_inc_not_zero_hint() permits to reduce latency, because
processor issues less memory transactions.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 11:17:43 -08:00
Eric Dumazet
213b15ca81 vlan: remove ndo_select_queue() logic
Now vlan are lockless, we dont need special ndo_select_queue() logic.
dev_pick_tx() will do the multiqueue stuff on the real device transmit.

Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 11:17:42 -08:00
Eric Dumazet
4af429d29b vlan: lockless transmit path
vlan is a stacked device, like tunnels. We should use the lockless
mechanism we are using in tunnels and loopback.

This patch completely removes locking in TX path.

tx stat counters are added into existing percpu stat structure, renamed
from vlan_rx_stats to vlan_pcpu_stats.

Note : this partially reverts commit 2e59af3dcb (vlan: multiqueue vlan
device)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 11:15:08 -08:00
Neil Horman
0e3125c755 packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Version 4 of this patch.

Change notes:
1) Removed extra memset.  Didn't think kcalloc added a GFP_ZERO the way kzalloc did :)

Summary:
It was shown to me recently that systems under high load were driven very deep
into swap when tcpdump was run.  The reason this happened was because the
AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space
application to specify how many entries an AF_PACKET socket will have and how
large each entry will be.  It seems the default setting for tcpdump is to set
the ring buffer to 32 entries of 64 Kb each, which implies 32 order 5
allocation.  Thats difficult under good circumstances, and horrid under memory
pressure.

I thought it would be good to make that a bit more usable.  I was going to do a
simple conversion of the ring buffer from contigous pages to iovecs, but
unfortunately, the metadata which AF_PACKET places in these buffers can easily
span a page boundary, and given that these buffers get mapped into user space,
and the data layout doesn't easily allow for a change to padding between frames
to avoid that, a simple iovec change is just going to break user space ABI
consistency.

So I've done this, I've added a three tiered mechanism to the af_packet set_ring
socket option.  It attempts to allocate memory in the following order:

1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without
digging into swap

2) Using vmalloc

3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as
needed to get the memory

The effect is that we don't disturb the system as much when we're under load,
while still being able to conduct tcpdumps effectively.

Tested successfully by me.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 10:26:47 -08:00
Wolfram Sang
4c62ab9c53 irda: irttp: allow zero byte packets
Sending zero byte packets is not neccessarily an error (AF_INET accepts it,
too), so just apply a shortcut. This was discovered because of a non-working
software with WINE. See

  http://bugs.winehq.org/show_bug.cgi?id=19397#c86
  http://thread.gmane.org/gmane.linux.irda.general/1643

for very detailed debugging information and a testcase. Kudos to Wolfgang for
those!

Reported-by: Wolfgang Schwotzer <wolfgang.schwotzer@gmx.net>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Mike Evans <mike.evans@cardolan.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 09:50:47 -08:00
John Fastabend
9d82ca98f7 ipv6: fix missing in6_ifa_put in addrconf
Fix ref count bug introduced by

commit 2de7957072
Author: Lorenzo Colitti <lorenzo@google.com>
Date:   Wed Oct 27 18:16:49 2010 +0000

ipv6: addrconf: don't remove address state on ifdown if the address
is being kept

Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr
refcnt correctly with in6_ifa_put().

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 09:24:58 -08:00
David S. Miller
b5e4156743 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-11-16 09:17:12 -08:00
Jesper Juhl
94f58df8e5 SUNRPC: Simplify rpc_alloc_iostats by removing pointless local variable
Hi,

We can simplify net/sunrpc/stats.c::rpc_alloc_iostats() a bit by getting
rid of the unneeded local variable 'new'.

Please CC me on replies.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-16 11:58:51 -05:00
Patrick McHardy
2c2bf08614 Merge branch 'for-patrick' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6 2010-11-16 10:21:27 +01:00
Eric Dumazet
3bfd45f93c netfilter: nf_conntrack: one less atomic op in nf_ct_expect_insert()
Instead of doing atomic_inc(&exp->use) twice,
call atomic_add(2, &exp->use);

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-16 10:19:18 +01:00
David S. Miller
6b35308850 net: Export netif_get_vlan_features().
ERROR: "netif_get_vlan_features" [drivers/net/xen-netfront.ko] undefined!

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 20:15:03 -08:00
Simon Horman
8f1b03a4c1 ipvs: allow transmit of GRO aggregated skbs
Attempt at allowing LVS to transmit skbs of greater than MTU length that
have been aggregated by GRO and can thus be deaggregated by GSO.

Cc: Julian Anastasov <ja@ssi.bg>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:08 +09:00
Eric Dumazet
a333e2ec05 ipvs: remove shadow rt variable
Remove a sparse warning about rt variable.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:08 +09:00
Eric Dumazet
4ecd29447e ipvs: add static and read_mostly attributes
ip_vs_conn_tab_bits & ip_vs_conn_tab_mask are static to
ipvs/ip_vs_conn.c

ip_vs_conn_tab_size, ip_vs_conn_tab_mask, ip_vs_conn_tab [the pointer],
ip_vs_conn_rnd are mostly read.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:08 +09:00
Simon Horman
8aadf93c9c IPVS: buffer argument to ip_vs_process_message() should not be const
It is assigned to a non-const variable and its contents are modified.

Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:08 +09:00
Simon Horman
7ae246a15a IPVS: Remove useless { } block from ip_vs_process_message()
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:08 +09:00
Simon Horman
d494262b8a IPVS: Make the cp argument to ip_vs_sync_conn() static
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:07 +09:00
Simon Horman
ea2c73afc2 IPVS: Only match pe_data created by the same pe
Only match persistence engine data if it was
created by the same persistence engine.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:07 +09:00
Simon Horman
e9e5eee873 IPVS: Add persistence engine to connection entry
The dest of a connection may not exist if it has been created as the result
of connection synchronisation. But in order for connection entries for
templates with persistence engine data created through connection
synchronisation to be valid access to the persistence engine pointer is
required.  So add the persistence engine to the connection itself.

Signed-off-by: Simon Horman <horms@verge.net.au>
2010-11-16 08:13:07 +09:00
Michael Witten
c996d8b9a8 Docs/Kconfig: Update: osdl.org -> linuxfoundation.org
Some of the documentation refers to web pages under
the domain `osdl.org'. However, `osdl.org' now
redirects to `linuxfoundation.org'.

Rather than rely on redirections, this patch updates
the addresses appropriately; for the most part, only
documentation that is meant to be current has been
updated.

The patch should be pretty quick to scan and check;
each new web-page url was gotten by trying out the
original URL in a browser and then simply copying the
the redirected URL (formatting as necessary).

There is some conflict as to which one of these domain
names is preferred:

  linuxfoundation.org
  linux-foundation.org

So, I wrote:

  info@linuxfoundation.org

and got this reply:

  Message-ID: <4CE17EE6.9040807@linuxfoundation.org>
  Date: Mon, 15 Nov 2010 10:41:42 -0800
  From: David Ames <david@linuxfoundation.org>

  ...

  linuxfoundation.org is preferred. The canonical name for our web site is
  www.linuxfoundation.org. Our list site is actually
  lists.linux-foundation.org.

  Regarding email linuxfoundation.org is preferred there are a few people
  who choose to use linux-foundation.org for their own reasons.

Consequently, I used `linuxfoundation.org' for web pages and
`lists.linux-foundation.org' for mailing-list web pages and email addresses;
the only personal email address I updated from `@osdl.org' was that of
Andrew Morton, who prefers `linux-foundation.org' according `git log'.

Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-11-15 23:50:13 +01:00
Eric Dumazet
ec1e5610c0 bridge: add RCU annotations to bridge port lookup
br_port_get() renamed to br_port_get_rtnl() to make clear RTNL is held.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:13:18 -08:00
stephen hemminger
b5ed54e94d bridge: fix RCU races with bridge port
The macro br_port_exists() is not enough protection when only
RCU is being used. There is a tiny race where other CPU has cleared port
handler hook, but is bridge port flag might still be set.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:13:17 -08:00
Eric Dumazet
a386f99025 bridge: add proper RCU annotation to should_route_hook
Add br_should_route_hook_t typedef, this is the only way we can
get a clean RCU implementation for function pointer.

Move route_hook to location where it is used.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:13:16 -08:00
Eric Dumazet
e805168800 bridge: add RCU annotation to bridge multicast table
Add modern __rcu annotatations to bridge multicast table.
Use newer hlist macros to avoid direct access to hlist internals.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:13:16 -08:00
Joe Perches
8a22c99a80 net/ipv6/mcast.c: Remove unnecessary semicolons
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:07:16 -08:00
David S. Miller
6c1b6c6b87 Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6 2010-11-15 10:59:49 -08:00
Tom Herbert
fe8222406c net: Simplify RX queue allocation
This patch move RX queue allocation to alloc_netdev_mq and freeing of
the queues to free_netdev (symmetric to TX queue allocation).  Each
kobject RX queue takes a reference to the queue's device so that the
device can't be freed before all the kobjects have been released-- this
obviates the need for reference counts specific to RX queues.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 10:57:28 -08:00
Tom Herbert
ed9af2e839 net: Move TX queue allocation to alloc_netdev_mq
TX queues are now allocated in alloc_netdev_mq and freed in
free_netdev.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 10:56:54 -08:00
Eric Dumazet
c5d277d29a netfilter: rcu sparse cleanups
Use RCU helpers to reduce number of sparse warnings
(CONFIG_SPARSE_RCU_POINTER=y), and adds lockdep checks.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-15 19:45:13 +01:00
Timo Teräs
cc9ff19da9 xfrm: use gre key as flow upper protocol info
The GRE Key field is intended to be used for identifying an individual
traffic flow within a tunnel. It is useful to be able to have XFRM
policy selector matches to have different policies for different
GRE tunnels.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 10:44:04 -08:00
David S. Miller
e1f2d8c2cc vlan: Fix build warning in vlandev_seq_show()
net/8021q/vlanproc.c: In function 'vlandev_seq_show':
net/8021q/vlanproc.c:283:20: warning: unused variable 'fmt'

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 10:37:30 -08:00
Jesper Juhl
ffa56e540c mac80211: Remove redundant checks for NULL before calls to crypto_free_cipher()
crypto_free_cipher() is a wrapper around crypto_free_tfm() which is a
wrapper around crypto_destroy_tfm() and the latter can handle being passed
a NULL pointer, so checking for NULL in the
ieee80211_aes_key_free()/ieee80211_aes_cmac_key_free() wrappers around
crypto_free_cipher() is pointless and just increase object code size
needlesly and makes us execute extra test/branch instructions that we
don't need.
Btw; don't we have to many wrappers around wrappers ad nauseam here?
Anyway, this patch removes the redundant conditionals.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:26:11 -05:00
Eliad Peller
07caf9d6c9 mac80211: refactor debugfs function generation code
refactor mac80211 debugfs code by using a format&copy function, instead of
duplicating the code for each generated function.

this change reduces about 600B from mac80211.ko

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:48 -05:00
Felix Fietkau
c7317e41df mac80211: minstrel_ht - reduce the overhead of rate sampling
- reduce the number of retransmission attempts for sample rates
- sample lower rates less often
- do not use RTS/CTS for sampling frames
- increase the time between sampling attempts

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:21 -05:00
Luis R. Rodriguez
d91e41b690 cfg80211: prefix REG_DBG_PRINT() with cfg80211
Everyone's doing it, its the cool thing.

Cc: Easwar Krishnan <easwar.krishnan@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:15 -05:00
Luis R. Rodriguez
e702d3cf29 cfg80211: add debug print when processing a channel
In the worst case you are seeing really odd things you want
more information than what is provided right now, for those
that insist and want debug info through CONFIG_CFG80211_REG_DEBUG
provide a print of when we are processing a channel and with what
regulatory rule.

Cc: Easwar Krishnan <easwar.krishnan@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:14 -05:00