linux

Author	SHA1	Message	Date
David S. Miller	f966a13f92	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2011-01-18 12:50:19 -08:00
Jiri Olsa	93557f53e1	netfilter: nf_conntrack: nf_conntrack snmp helper Adding support for SNMP broadcast connection tracking. The SNMP broadcast requests are now paired with the SNMP responses. Thus allowing using SNMP broadcasts with firewall enabled. Please refer to the following conversation: http://marc.info/?l=netfilter-devel&m=125992205006600&w=2 Patrick McHardy wrote: > > The best solution would be to add generic broadcast tracking, the > > use of expectations for this is a bit of abuse. > > The second best choice I guess would be to move the help() function > > to a shared module and generalize it so it can be used for both. This patch implements the "second best choice". Since the netbios-ns conntrack module uses the same helper functionality as the snmp, only one helper function is added for both snmp and netbios-ns modules into the new object - nf_conntrack_broadcast. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 18:12:24 +01:00
Eric Dumazet	94d117a1c7	netfilter: ipt_CLUSTERIP: remove "no conntrack!" When a packet is meant to be handled by another node of the cluster, silently drop it instead of flooding kernel log. Note : INVALID packets are also dropped without notice. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 16:27:56 +01:00
Patrick McHardy	a8fc0d9b34	Merge branch 'master' of git://dev.medozas.de/linux	2011-01-18 16:20:53 +01:00
Florian Westphal	94b27cc361	netfilter: allow NFQUEUE bypass if no listener is available If an skb is to be NF_QUEUE'd, but no program has opened the queue, the packet is dropped. This adds a v2 target revision of xt_NFQUEUE that allows packets to continue through the ruleset instead. Because the actual queueing happens outside of the target context, the 'bypass' flag has to be communicated back to the netfilter core. Unfortunately the only choice to do this without adding a new function argument is to use the target function return value (i.e. the verdict). In the NF_QUEUE case, the upper 16bit already contain the queue number to use. The previous patch reduced NF_VERDICT_MASK to 0xff, i.e. we now have extra room for a new flag. If a hook issued a NF_QUEUE verdict, then the netfilter core will continue packet processing if the queueing hook returns -ESRCH (== "this queue does not exist") and the new NF_VERDICT_FLAG_QUEUE_BYPASS flag is set in the verdict value. Note: If the queue exists, but userspace does not consume packets fast enough, the skb will still be dropped. Signed-off-by: Florian Westphal <fwestphal@astaro.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 16:08:30 +01:00
Florian Westphal	f615df76ed	netfilter: reduce NF_VERDICT_MASK to 0xff NF_VERDICT_MASK is currently 0xffff. This is because the upper 16 bits are used to store errno (for NF_DROP) or the queue number (NF_QUEUE verdict). As there are up to 0xffff different queues available, there is no more room to store additional flags. At the moment there are only 6 different verdicts, i.e. we can reduce NF_VERDICT_MASK to 0xff to allow storing additional flags in the 0xff00 space. NF_VERDICT_BITS would then be reduced to 8, but because the value is exported to userspace, this might cause breakage; e.g.: e.g. 'queuenr = (1 << NF_VERDICT_BITS) \| NF_QUEUE' would now break. Thus, remove NF_VERDICT_BITS usage in the kernel and move the old value to the 'userspace compat' section. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 15:52:14 +01:00
Florian Westphal	06cdb6349c	netfilter: nfnetlink_queue: do not free skb on error Move free responsibility from nf_queue to caller. This enables more flexible error handling; we can now accept the skb instead of freeing it. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 15:28:38 +01:00
Florian Westphal	f158508618	netfilter: nfnetlink_queue: return error number to caller instead of returning -1 on error, return an error number to allow the caller to handle some errors differently. ECANCELED is used to indicate that the hook is going away and should be ignored. A followup patch will introduce more 'ignore this hook' conditions, (depending on queue settings) and will move kfree_skb responsibility to the caller. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 15:27:28 +01:00
Florian Westphal	5f2cafe736	netfilter: Kconfig: NFQUEUE is useless without NETFILTER_NETLINK_QUEUE NFLOG already does the same thing for NETFILTER_NETLINK_LOG. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 15:18:08 +01:00
Changli Gao	45eec34195	netfilter: nf_conntrack: remove an atomic bit operation As this ct won't be seen by the others, we don't need to set the IPS_CONFIRMED_BIT in atomic way. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Cc: Tim Gardner <tim.gardner@canonical.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 15:08:13 +01:00
Changli Gao	a7c2f4d7da	netfilter: nf_nat: fix conversion to non-atomic bit ops My previous patch (netfilter: nf_nat: don't use atomic bit operation) made a mistake when converting atomic_set to a normal bit 'or'. IPS__BIT should be replaced with IPS_. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Cc: Tim Gardner <tim.gardner@canonical.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-18 15:02:48 +01:00
Richard Weinberger	1cc34c30be	netfilter: xt_connlimit: use hotdrop jump mark Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Jan Engelhardt <jengelh@medozas.de>	2011-01-18 06:50:41 +01:00
Jan Engelhardt	f1e231a356	netfilter: xtables: add missing aliases for autoloading via iptables Signed-off-by: Jan Engelhardt <jengelh@medozas.de>	2011-01-18 06:33:54 +01:00
Thomas Graf	fbabf31e4d	netfilter: create audit records for x_tables replaces The setsockopt() syscall to replace tables is already recorded in the audit logs. This patch stores additional information such as table name and netfilter protocol. Cc: Patrick McHardy <kaber@trash.net> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-16 18:12:59 +01:00
Thomas Graf	43f393caec	netfilter: audit target to record accepted/dropped packets This patch adds a new netfilter target which creates audit records for packets traversing a certain chain. It can be used to record packets which are rejected administraively as follows: -N AUDIT_DROP -A AUDIT_DROP -j AUDIT --type DROP -A AUDIT_DROP -j DROP a rule which would typically drop or reject a packet would then invoke the new chain to record packets before dropping them. -j AUDIT_DROP The module is protocol independant and works for iptables, ip6tables and ebtables. The following information is logged: - netfilter hook - packet length - incomming/outgoing interface - MAC src/dst/proto for ethernet packets - src/dst/protocol address for IPv4/IPv6 - src/dst port for TCP/UDP/UDPLITE - icmp type/code Cc: Patrick McHardy <kaber@trash.net> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-16 18:10:28 +01:00
Dan Carpenter	01a859014b	caif: checking the wrong variable In the original code we check if (servl == NULL) twice. The first time should print the message that cfmuxl_remove_uplayer() failed and set "ret" correctly, but instead it just returns success. The second check should be checking the value of "ret" instead of "servl". Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-15 20:58:11 -08:00
Kurt Van Dijck	5e50732803	can: test size of struct sockaddr in sendmsg This patch makes the CAN socket code conform to the manpage of sendmsg. Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-15 20:56:42 -08:00
David S. Miller	d78c68efa8	Merge branch 'for-david' of git://git.open-mesh.org/ecsv/linux-merge	2011-01-15 20:48:28 -08:00
Sven Eckelmann	aa0adb1a85	batman-adv: Use "__attribute__" shortcut macros Linux 2.6.21 defines different macros for __attribute__ which are also used inside batman-adv. The next version of checkpatch.pl warns about the usage of __attribute__((packed))). Linux 2.6.33 defines an extra macro __always_unused which is used to assist source code analyzers and can be used to removed the last existing __attribute__ inside the source code. Signed-off-by: Sven Eckelmann <sven@narfation.org>	2011-01-16 03:25:19 +01:00
Linus Torvalds	d018b6f4f1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits) GRETH: resolve SMP issues and other problems GRETH: handle frame error interrupts GRETH: avoid writing bad speed/duplex when setting transfer mode GRETH: fixed skb buffer memory leak on frame errors GRETH: GBit transmit descriptor handling optimization GRETH: fix opening/closing GRETH: added raw AMBA vendor/device number to match against. cassini: Fix build bustage on x86. e1000e: consistent use of Rx/Tx vs. RX/TX/rx/tx in comments/logs e1000e: update Copyright for 2011 e1000: Avoid unhandled IRQ r8169: keep firmware in memory. netdev: tilepro: Use is_unicast_ether_addr helper etherdevice.h: Add is_unicast_ether_addr function ks8695net: Use default implementation of ethtool_ops::get_link ks8695net: Disable non-working ethtool operations USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable. vxge: Remember to release firmware after upgrading firmware netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr ipsec: update MAX_AH_AUTH_LEN to support sha512 ...	2011-01-14 13:25:30 -08:00
Linus Torvalds	18bce371ae	Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux * 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits) nfsd4: fix callback restarting nfsd: break lease on unlink, link, and rename nfsd4: break lease on nfsd setattr nfsd: don't support msnfs export option nfsd4: initialize cb_per_client nfsd4: allow restarting callbacks nfsd4: simplify nfsd4_cb_prepare nfsd4: give out delegations more quickly in 4.1 case nfsd4: add helper function to run callbacks nfsd4: make sure sequence flags are set after destroy_session nfsd4: re-probe callback on connection loss nfsd4: set sequence flag when backchannel is down nfsd4: keep finer-grained callback status rpc: allow xprt_class->setup to return a preexisting xprt rpc: keep backchannel xprt as long as server connection rpc: move sk_bc_xprt to svc_xprt nfsd4: allow backchannel recovery nfsd4: support BIND_CONN_TO_SESSION nfsd4: modify session list under cl_lock Documentation: fl_mylease no longer exists ... Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work. The vfs-scale work touched some msnfs cases, and this merge removes support for that entirely, so the conflict was trivial to resolve.	2011-01-14 13:17:26 -08:00
Tejun Heo	e1fcc7e2a7	rxrpc: rxrpc_workqueue isn't used during memory reclaim rxrpc_workqueue isn't depended upon while reclaiming memory. Convert to alloc_workqueue() without WQ_MEM_RECLAIM. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Cc: linux-afs@lists.infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-14 09:25:11 -08:00
Patrick McHardy	d862a6622e	netfilter: nf_conntrack: use is_vmalloc_addr() Use is_vmalloc_addr() in nf_ct_free_hashtable() and get rid of the vmalloc flags to indicate that a hash table has been allocated using vmalloc(). Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-14 15:45:56 +01:00
Patrick McHardy	0134e89c7b	Merge branch 'master' of git://1984.lsi.us.es/net-next-2.6 Conflicts: net/ipv4/route.c Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-14 14:12:37 +01:00
Patrick McHardy	c7066f70d9	netfilter: fix Kconfig dependencies Fix dependencies of netfilter realm match: it depends on NET_CLS_ROUTE, which itself depends on NET_SCHED; this dependency is missing from netfilter. Since matching on realms is also useful without having NET_SCHED enabled and the option really only controls whether the tclassid member is included in route and dst entries, rename the config option to IP_ROUTE_CLASSID and move it outside of traffic scheduling context to get rid of the NET_SCHED dependeny. Reported-by: Vladis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-14 13:36:42 +01:00
Eric Dumazet	1ac9ad1394	net: remove dev_txq_stats_fold() After recent changes, (percpu stats on vlan/tunnels...), we dont need anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters. Only remaining users are ixgbe, sch_teql, gianfar & macvlan : 1) ixgbe can be converted to use existing tx_ring counters. 2) macvlan incremented txq->tx_dropped, it can use the dev->stats.tx_dropped counter. 3) sch_teql : almost revert `ab35cd4b8f` (Use net_device internal stats) Now we have ndo_get_stats64(), use it, even for "unsigned long" fields (No need to bring back a struct net_device_stats) 4) gianfar adds a stats structure per tx queue to hold tx_bytes/tx_packets This removes a lockdep warning (and possible lockup) in rndis gadget, calling dev_get_stats() from hard IRQ context. Ref: http://www.spinics.net/lists/netdev/msg149202.html Reported-by: Neil Jones <neiljay@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Sandeep Gopalpet <sandeep.kumar@freescale.com> CC: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-13 21:44:34 -08:00
Jesper Juhl	ed7809d9c4	batman-adv: Even Batman should not dereference NULL pointers There's a problem in net/batman-adv/unicast.c::frag_send_skb(). dev_alloc_skb() allocates memory and may fail, thus returning NULL. If this happens we'll pass a NULL pointer on to skb_split() which in turn hands it to skb_split_inside_header() from where it gets passed to skb_put() that lets skb_tail_pointer() play with it and that function dereferences it. And thus the bat dies. While I was at it I also moved the call to dev_alloc_skb() above the assignment to 'unicast_packet' since there's no reason to do that assignment if the memory allocation fails. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Sven Eckelmann <sven@narfation.org>	2011-01-13 22:11:12 +01:00
Luciano Coelho	82694f764d	mac80211: use maximum number of AMPDU frames as default in BA RX When the buffer size is set to zero in the block ack parameter set field, we should use the maximum supported number of subframes. The existing code was bogus and was doing some unnecessary calculations that lead to wrong values. Thanks Johannes for helping me figure this one out. Cc: stable@kernel.org Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Luciano Coelho <coelho@ti.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-01-13 15:46:45 -05:00
Johannes Berg	681c4d07dd	mac80211: fix lockdep warning Since the introduction of the fixes for the reorder timer, mac80211 will cause lockdep warnings because lockdep confuses local->skb_queue and local->rx_skb_queue and treats their lock as the same. However, their locks are different, and are valid in different contexts (the former is used in IRQ context, the latter in BH only) and the only thing to be done is mark the former as a different lock class so that lockdep can tell the difference. Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-by: Sujith <m.sujith@gmail.com> Reported-by: Miles Lane <miles.lane@gmail.com> Tested-by: Sujith <m.sujith@gmail.com> Tested-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-01-13 15:46:45 -05:00
David S. Miller	1949e084bf	Merge branch 'master' of git://1984.lsi.us.es/net-2.6	2011-01-13 12:34:21 -08:00
Linus Torvalds	b2034d474b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits) fs: add documentation on fallocate hole punching Gfs2: fail if we try to use hole punch Btrfs: fail if we try to use hole punch Ext4: fail if we try to use hole punch Ocfs2: handle hole punching via fallocate properly XFS: handle hole punching via fallocate properly fs: add hole punching to fallocate vfs: pass struct file to do_truncate on O_TRUNC opens (try #2) fix signedness mess in rw_verify_area() on 64bit architectures fs: fix kernel-doc for dcache::prepend_path fs: fix kernel-doc for dcache::d_validate sanitize ecryptfs ->mount() switch afs move internal-only parts of ncpfs headers to fs/ncpfs switch ncpfs switch 9p pass default dentry_operations to mount_pseudo() switch hostfs switch affs switch configfs ...	2011-01-13 10:27:28 -08:00
Linus Torvalds	27d189c02b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (46 commits) hwrng: via_rng - Fix memory scribbling on some CPUs crypto: padlock - Move padlock.h into include/crypto hwrng: via_rng - Fix asm constraints crypto: n2 - use __devexit not __exit in n2_unregister_algs crypto: mark crypto workqueues CPU_INTENSIVE crypto: mv_cesa - dont return PTR_ERR() of wrong pointer crypto: ripemd - Set module author and update email address crypto: omap-sham - backlog handling fix crypto: gf128mul - Remove experimental tag crypto: af_alg - fix af_alg memory_allocated data type crypto: aesni-intel - Fixed build with binutils 2.16 crypto: af_alg - Make sure sk_security is initialized on accept()ed sockets net: Add missing lockdep class names for af_alg include: Install linux/if_alg.h for user-space crypto API crypto: omap-aes - checkpatch --file warning fixes crypto: omap-aes - initialize aes module once per request crypto: omap-aes - unnecessary code removed crypto: omap-aes - error handling implementation improved crypto: omap-aes - redundant locking is removed crypto: omap-aes - DMA initialization fixes for OMAP off mode ...	2011-01-13 10:25:58 -08:00
Linus Torvalds	a170315420	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix cleanup when trying to mount inexistent image net/ceph: make ceph_msgr_wq non-reentrant ceph: fsc->*_wq's aren't used in memory reclaim path ceph: Always free allocated memory in osdmap_decode() ceph: Makefile: Remove unnessary code ceph: associate requests with opening sessions ceph: drop redundant r_mds field ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS ceph: add dir_layout to inode	2011-01-13 10:25:24 -08:00
Linus Torvalds	008d23e485	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.	2011-01-13 10:05:56 -08:00
Pablo Neira Ayuso	f31e8d4982	netfilter: ctnetlink: fix loop in ctnetlink_get_conntrack() This patch fixes a loop in ctnetlink_get_conntrack() that can be triggered if you use the same socket to receive events and to perform a GET operation. Under heavy load, netlink_unicast() may return -EAGAIN, this error code is reserved in nfnetlink for the module load-on-demand. Instead, we return -ENOBUFS which is the appropriate error code that has to be propagated to user-space. Reported-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2011-01-13 17:03:39 +01:00
Florian Westphal	6faee60a4e	netfilter: ebt_ip6: allow matching on ipv6-icmp types/codes To avoid adding a new match revision icmp type/code are stored in the sport/dport area. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Holger Eitzenberger <holger@eitzenberger.org> Reviewed-by: Bart De Schuymer<bdschuym@pandora.be> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2011-01-13 12:05:12 +01:00
Eric Dumazet	255d0dc340	netfilter: x_table: speedup compat operations One iptables invocation with 135000 rules takes 35 seconds of cpu time on a recent server, using a 32bit distro and a 64bit kernel. We eventually trigger NMI/RCU watchdog. INFO: rcu_sched_state detected stall on CPU 3 (t=6000 jiffies) COMPAT mode has quadratic behavior and consume 16 bytes of memory per rule. Switch the xt_compat algos to use an array instead of list, and use a binary search to locate an offset in the sorted array. This halves memory need (8 bytes per rule), and removes quadratic behavior [ O(NN) -> O(Nlog2(N)) ] Time of iptables goes from 35 s to 150 ms. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2011-01-13 12:05:12 +01:00
Patrick McHardy	b017900aac	netfilter: xt_conntrack: support matching on port ranges Add a new revision 3 that contains port ranges for all of origsrc, origdst, replsrc and repldst. The high ports are appended to the original v2 data structure to allow sharing most of the code with v1 and v2. Use of the revision specific port matching function is made dependant on par->match->revision. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2011-01-13 12:05:12 +01:00
Randy Dunlap	3806b4f3b6	eth: fix new kernel-doc warning Fix new kernel-doc warning (copy-paste typo): Warning(net/ethernet/eth.c:366): No description found for parameter 'rxqs' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-12 19:00:40 -08:00
David S. Miller	464143c911	Merge branch 'master' of git://1984.lsi.us.es/net-2.6	2011-01-12 18:58:40 -08:00
Alexey Kuznetsov	72b43d0898	inet6: prevent network storms caused by linux IPv6 routers Linux IPv6 forwards unicast packets, which are link layer multicasts... The hole was present since day one. I was 100% this check is there, but it is not. The problem shows itself, f.e. when Microsoft Network Load Balancer runs on a network. This software resolves IPv6 unicast addresses to multicast MAC addresses. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-12 18:51:55 -08:00
Hans Schillstrom	c6d2d445d8	IPVS: netns, final patch enabling network name space. all init_net removed, (except for some alloc related that needs to be there) Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:29 +09:00
Hans Schillstrom	4a98480bcc	IPVS: netns, misc init_net removal in core. init_net removed in __ip_vs_addr_is_local_v6, and got net as param. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:29 +09:00
Hans Schillstrom	763f8d0ed4	IPVS: netns, svc counters moved in ip_vs_ctl,c Last two global vars to be moved, ip_vs_ftpsvc_counter and ip_vs_nullsvc_counter. [horms@verge.net.au: removed whitespace-change-only hunk] Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:28 +09:00
Hans Schillstrom	f2431e6e92	IPVS: netns, trash handling trash list per namspace, and reordering of some params in dst struct. [ horms@verge.net.au: Use cancel_delayed_work_sync() instead of cancel_rearming_delayed_work(). Found during merge conflict resoliution ] Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:28 +09:00
Hans Schillstrom	f6340ee0c6	IPVS: netns, defense work timer. This patch makes defense work timer per name-space, A net ptr had to be added to the ipvs struct, since it's needed by defense_work_handler. [ horms@verge.net.au: Use cancel_delayed_work_sync() instead of cancel_rearming_delayed_work(). Found during merge conflict resoliution ] Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:28 +09:00
Hans Schillstrom	a0840e2e16	IPVS: netns, ip_vs_ctl local vars moved to ipvs struct. Moving global vars to ipvs struct, except for svc table lock. Next patch for ctl will be drop-rate handling. v3 __ip_vs_mutex remains global ip_vs_conntrack_enabled(struct netns_ipvs ipvs) Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:28 +09:00
Hans Schillstrom	6e67e586e7	IPVS: netns, connection hash got net as param. Connection hash table is now name space aware. i.e. net ptr >> 8 is xor:ed to the hash, and this is the first param to be compared. The net struct is 0xa40 in size ( a little bit smaller for 32 bit arch:s) and cache-line aligned, so a ptr >> 5 might be a more clever solution ? All lookups where net is compared uses net_eq() which returns 1 when netns is disabled, and the compiler seems to do something clever in that case. ip_vs_conn_fill_param() have net as first param now. Three new inlines added to keep conn struct smaller when names space is disabled. - ip_vs_conn_net() - ip_vs_conn_net_set() - ip_vs_conn_net_eq() v3 moved net compare to the end in "fast path" Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:28 +09:00
Hans Schillstrom	b17fc9963f	IPVS: netns, ip_vs_stats and its procfs The statistic counter locks for every packet are now removed, and that statistic is now per CPU, i.e. no locks needed. However summing is made in ip_vs_est into ip_vs_stats struct which is moved to ipvs struc. procfs, ip_vs_stats now have a "per cpu" count and a grand total. A new function seq_file_single_net() in ip_vs.h created for handling of single_open_net() since it does not place net ptr in a struct, like others. /var/lib/lxc # cat /proc/net/ip_vs_stats_percpu Total Incoming Outgoing Incoming Outgoing CPU Conns Packets Packets Bytes Bytes 0 0 3 1 9D 34 1 0 1 2 49 70 2 0 1 2 34 76 3 1 2 2 70 74 ~ 1 7 7 18A 18E Conns/s Pkts/s Pkts/s Bytes/s Bytes/s 0 0 0 0 0 v3 ip_vs_stats reamains as before, instead ip_vs_stats_percpu is added. u64 seq lock added v4 Bug correction inbytes and outbytes as own vars.. per_cpu counter for all stats now as suggested by Julian. [horms@verge.net.au: removed whitespace-change-only hunk] Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:28 +09:00
Hans Schillstrom	f131315fa2	IPVS: netns awareness to ip_vs_sync All global variables moved to struct ipvs, most external changes fixed (i.e. init_net removed) in sync_buf create + 4 replaced by sizeof(struct..) Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:28 +09:00
Hans Schillstrom	29c2026fd4	IPVS: netns awareness to ip_vs_est All variables moved to struct ipvs, most external changes fixed (i.e. init_net removed) *v3 timer per ns instead of a common timer in estimator. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:28 +09:00
Hans Schillstrom	ab8a5e8408	IPVS: netns awareness to ip_vs_app All variables moved to struct ipvs, most external changes fixed (i.e. init_net removed) in ip_vs_protocol param struct net *net added to: - register_app() - unregister_app() This affected almost all proto_xxx.c files Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:28 +09:00
Hans Schillstrom	9bbac6a904	IPVS: netns, common protocol changes and use of appcnt. appcnt and timeout_table moved from struct ip_vs_protocol to ip_vs proto_data. struct net *net added as first param to - register_app() - unregister_app() - app_conn_bind() - ip_vs_conn_new() [horms@verge.net.au: removed cosmetic-change-only hunk] Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:27 +09:00
Hans Schillstrom	9330419d9a	IPVS: netns, use ip_vs_proto_data as param. ip_vs_protocol pp is replaced by ip_vs_proto_data pd in function call in ip_vs_protocol struct i.e. :, - timeout_change() - state_transition() ip_vs_protocol_timeout_change() got ipvs as param, due to above and a upcoming patch - defence work Most of this changes are triggered by Julians comment: "tcp_timeout_change should work with the new struct ip_vs_proto_data so that tcp_state_table will go to pd->state_table and set_tcp_state will get pd instead of pp" v3 Mostly comments from Julian The pp -> pd conversion should start from functions like ip_vs_out() that use pp = ip_vs_proto_get(iph.protocol), now they should use ip_vs_proto_data_get(net, iph.protocol). conn_in_get() and conn_out_get() unused param pp, removed. *v4 ip_vs_protocol_timeout_change() walk the proto_data path. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:27 +09:00
Hans Schillstrom	88fe2d3727	IPVS: netns preparation for proto_ah_esp In this phase (one), all local vars will be moved to ipvs struct. Remaining work, add param struct net *net to a couple of functions that common for all protos. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:27 +09:00
Hans Schillstrom	9d934878e7	IPVS: netns preparation for proto_sctp In this phase (one), all local vars will be moved to ipvs struct. Remaining work, add param struct net net to a couple of functions that is common for all protos and use ip_vs_proto_data v3 Removed unuset function set_state_timeout() Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:27 +09:00
Hans Schillstrom	78b16bde10	IPVS: netns preparation for proto_udp In this phase (one), all local vars will be moved to ipvs struct. Remaining work, add param struct net net to a couple of functions that is common for all protos and use ip_vs_proto_data v3 Removed unused function set_state_timeout() Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:27 +09:00
Hans Schillstrom	4a85b96c08	IPVS: netns preparation for proto_tcp In this phase (one), all local vars will be moved to ipvs struct. Remaining work, add param struct net net to a couple of functions that is common for all protos and use all ip_vs_proto_data v3 Removed unused function as sugested by Simon Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:27 +09:00
Hans Schillstrom	252c641032	IPVS: netns, prepare protocol Add support for protocol data per name-space. in struct ip_vs_protocol, appcnt will be removed when all protos are modified for network name-space. This patch causes warnings of unused functions, they will be used when next patch will be applied. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:27 +09:00
Hans Schillstrom	b6e885ddb9	IPVS: netns awarness to lblc sheduler var sysctl_ip_vs_lblc_expiration moved to ipvs struct as sysctl_lblc_expiration procfs updated to handle this. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:27 +09:00
Hans Schillstrom	d0a1eef9c3	IPVS: netns awarness to lblcr sheduler var sysctl_ip_vs_lblcr_expiration moved to ipvs struct as sysctl_lblcr_expiration procfs updated to handle this. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:27 +09:00
Hans Schillstrom	fc723250c9	IPVS: netns to services part 1 Services hash tables got netns ptr a hash arg, While Real Servers (rs) has been moved to ipvs struct. Two new inline functions added to get net ptr from skb. Since ip_vs is called from different contexts there is two places to dig for the net ptr skb->dev or skb->sk this is handled in skb_net() and skb_sknet() Global functions, ip_vs_service_get() ip_vs_lookup_real_service() etc have got struct net net as first param. If possible get net ptr skb etc, - if not &init_net is used at this early stage of patching. ip_vs_ctl.c procfs not ready for netns yet. v3 Comments by Julian - __ip_vs_service_find and __ip_vs_svc_fwm_find are fast path, net_eq(svc->net, net) so the check is at the end now. - net = skb_net(skb) in ip_vs_out moved after check for skb_dst. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:26 +09:00
Hans Schillstrom	61b1ab4583	IPVS: netns, add basic init per netns. Preparation for network name-space init, in this stage some empty functions exists. In most files there is a check if it is root ns i.e. init_net if (!net_eq(net, &init_net)) return ... this will be removed by the last patch, when enabling name-space. *v3 ip_vs_conn.c merge error corrected. net_ipvs #ifdef removed as sugested by Jan Engelhardt [ horms@verge.net.au: Removed whitespace-change-only hunks ] Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-01-13 10:30:26 +09:00
Simon Horman	fee1cc0895	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 into HEAD	2011-01-13 10:29:21 +09:00
Al Viro	c74a1cbb3c	pass default dentry_operations to mount_pseudo() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-12 20:03:43 -05:00
Tejun Heo	f363e45fd1	net/ceph: make ceph_msgr_wq non-reentrant ceph messenger code does a rather complex dancing around multithread workqueue to make sure the same work item isn't executed concurrently on different CPUs. This restriction can be provided by workqueue with WQ_NON_REENTRANT. Make ceph_msgr_wq non-reentrant workqueue with the default concurrency level and remove the QUEUED/BUSY logic. * This removes backoff handling in con_work() but it couldn't reliably block execution of con_work() to begin with - queue_con() can be called after the work started but before BUSY is set. It seems that it was an optimization for a rather cold path and can be safely removed. * The number of concurrent work items is bound by the number of connections and connetions are independent from each other. With the default concurrency level, different connections will be executed independently. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Sage Weil <sage@newdream.net> Cc: ceph-devel@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:14 -08:00
Jesper Juhl	b0aee3516d	ceph: Always free allocated memory in osdmap_decode() Always free memory allocated to 'pi' in net/ceph/osdmap.c::osdmap_decode(). Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:14 -08:00
Sage Weil	6c0f3af72c	ceph: add dir_layout to inode Add a ceph_dir_layout to the inode, and calculate dentry hash values based on the parent directory's specified dir_hash function. This is needed because the old default Linux dcache hash function is extremely week and leads to a poor distribution of files among dir fragments. Signed-off-by: Sage Weil <sage@newdream.net>	2011-01-12 15:15:12 -08:00
KOVACS Krisztian	2fc72c7b84	netfilter: fix compilation when conntrack is disabled but tproxy is enabled The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but failed to update the #ifdef stanzas guarding the defragmentation related fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c. This patch adds the required #ifdefs so that IPv6 tproxy can truly be used without connection tracking. Original report: http://marc.info/?l=linux-netdev&m=129010118516341&w=2 Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2011-01-12 20:25:08 +01:00
Kees Cook	5b919f833d	net: ax25: fix information leak to userland harder Commit `fe10ae5338` adds a memset() to clear the structure being sent back to userspace, but accidentally used the wrong size. Reported-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Kees Cook <kees.cook@canonical.com> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-12 00:34:49 -08:00
Linus Torvalds	4162cf6497	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (67 commits) cxgb4vf: recover from failure in cxgb4vf_open() netfilter: ebtables: make broute table work again netfilter: fix race in conntrack between dump_table and destroy ah: reload pointers to skb data after calling skb_cow_data() ah: update maximum truncated ICV length xfrm: check trunc_len in XFRMA_ALG_AUTH_TRUNC ehea: Increase the skb array usage net/fec: remove config FEC2 as it's used nowhere pcnet_cs: add new_id tcp: disallow bind() to reuse addr/port net/r8169: Update the function of parsing firmware net: ppp: use {get,put}_unaligned_be{16,32} CAIF: Fix IPv6 support in receive path for GPRS/3G arp: allow to invalidate specific ARP entries net_sched: factorize qdisc stats handling mlx4: Call alloc_etherdev to allocate RX and TX queues net: Add alloc_netdev_mqs function caif: don't set connection request param size before copying data cxgb4vf: fix mailbox data/control coherency domain race qlcnic: change module parameter permissions ...	2011-01-11 16:32:41 -08:00
David S. Miller	60dbb011df	Merge branch 'master' of git://1984.lsi.us.es/net-2.6	2011-01-11 15:43:03 -08:00
Linus Torvalds	b9d919a4ac	Merge branch 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (89 commits) NFS fix the setting of exchange id flag NFS: Don't use vm_map_ram() in readdir NFSv4: Ensure continued open and lockowner name uniqueness NFS: Move cl_delegations to the nfs_server struct NFS: Introduce nfs_detach_delegations() NFS: Move cl_state_owners and related fields to the nfs_server struct NFS: Allow walking nfs_client.cl_superblocks list outside client.c pnfs: layout roc code pnfs: update nfs4_callback_recallany to handle layouts pnfs: add CB_LAYOUTRECALL handling pnfs: CB_LAYOUTRECALL xdr code pnfs: change lo refcounting to atomic_t pnfs: check that partial LAYOUTGET return is ignored pnfs: add layout to client list before sending rpc pnfs: serialize LAYOUTGET(openstateid) pnfs: layoutget rpc code cleanup pnfs: change how lsegs are removed from layout list pnfs: change layout state seqlock to a spinlock pnfs: add prefix to struct pnfs_layout_hdr fields pnfs: add prefix to struct pnfs_layout_segment fields ...	2011-01-11 15:11:56 -08:00
Stephen Hemminger	13ee6ac579	netfilter: fix race in conntrack between dump_table and destroy The netlink interface to dump the connection tracking table has a race when entries are deleted at the same time. A customer reported a crash and the backtrace showed thatctnetlink_dump_table was running while a conntrack entry was being destroyed. (see https://bugzilla.vyatta.com/show_bug.cgi?id=6402). According to RCU documentation, when using hlist_nulls the reader must handle the case of seeing a deleted entry and not proceed further down the linked list. The old code would continue which caused the scan to walk into the free list. This patch uses locking (rather than RCU) for this operation which is guaranteed safe, and no longer requires getting reference while doing dump operation. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2011-01-11 23:54:42 +01:00
Dang Hongwu	4b0ef1f223	ah: reload pointers to skb data after calling skb_cow_data() skb_cow_data() may allocate a new data buffer, so pointers on skb should be set after this function. Bug was introduced by commit `dff3bb06` ("ah4: convert to ahash") and `8631e9bd` ("ah6: convert to ahash"). Signed-off-by: Wang Xuefu <xuefu.wang@6wind.com> Acked-by: Krzysztof Witek <krzysztof.witek@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-11 14:03:10 -08:00
Nicolas Dichtel	fa6dd8a2c8	xfrm: check trunc_len in XFRMA_ALG_AUTH_TRUNC Maximum trunc length is defined by MAX_AH_AUTH_LEN (in bytes) and need to be checked when this value is set (in bits) by the user. In ah4.c and ah6.c a BUG_ON() checks this condiftion. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-11 14:03:09 -08:00
Eric Dumazet	c191a836a9	tcp: disallow bind() to reuse addr/port inet_csk_bind_conflict() logic currently disallows a bind() if it finds a friend socket (a socket bound on same address/port) satisfying a set of conditions : 1) Current (to be bound) socket doesnt have sk_reuse set OR 2) other socket doesnt have sk_reuse set OR 3) other socket is in LISTEN state We should add the CLOSE state in the 3) condition, in order to avoid two REUSEADDR sockets in CLOSE state with same local address/port, since this can deny further operations. Note : a prior patch tried to address the problem in a different (and buggy) way. (commit `fda48a0d7a` tcp: bind() fix when many ports are bound). Reported-by: Gaspar Chilingarov <gasparch@gmail.com> Reported-by: Daniel Baluta <daniel.baluta@gmail.com> Tested-by: Daniel Baluta <daniel.baluta@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-11 14:03:07 -08:00
J. Bruce Fields	f0418aa4b1	rpc: allow xprt_class->setup to return a preexisting xprt This allows us to reuse the xprt associated with a server connection if one has already been set up. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-11 15:04:10 -05:00
J. Bruce Fields	99de8ea962	rpc: keep backchannel xprt as long as server connection Multiple backchannels can share the same tcp connection; from rfc 5661 section 2.10.3.1: A connection's association with a session is not exclusive. A connection associated with the channel(s) of one session may be simultaneously associated with the channel(s) of other sessions including sessions associated with other client IDs. However, multiple backchannels share a connection, they must all share the same xid stream (hence the same rpc_xprt); the only way we have to match replies with calls at the rpc layer is using the xid. So, keep the rpc_xprt around as long as the connection lasts, in case we're asked to use the connection as a backchannel again. Requests to create new backchannel clients over a given server connection should results in creating new clients that reuse the existing rpc_xprt. But to start, just reject attempts to associate multiple rpc_xprt's with the same underlying bc_xprt. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-11 15:04:10 -05:00
J. Bruce Fields	d75faea330	rpc: move sk_bc_xprt to svc_xprt This seems obviously transport-level information even if it's currently used only by the server socket code. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-11 15:04:10 -05:00
J. Bruce Fields	a2c50f6916	Merge commit 'v2.6.37' into for-2.6.38-incoming I made a slight mess of Documentation/filesystems/Locking; resolve conflicts with upstream before fixing it up.	2011-01-11 15:02:19 -05:00
M. Mohan Kumar	219fd58be6	net/9p: Use proper data types Use proper data types for storing the count of the binary blob and length of a string. Without this patch length calculation of string will always result in -1 because of comparision between signed and unsigned integer. Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-01-11 09:58:07 -06:00
Kumar Sanghvi	d7b92affba	CAIF: Fix IPv6 support in receive path for GPRS/3G Checks version field of IP in the receive path for GPRS/3G data and appropriately sets the value of skb->protocol. Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-10 16:12:00 -08:00
Maxim Levitsky	545ecdc3b3	arp: allow to invalidate specific ARP entries IPv4 over firewire needs to be able to remove ARP entries from the ARP cache that belong to nodes that are removed, because IPv4 over firewire uses ARP packets for private information about nodes. This information becomes invalid as soon as node drops off the bus and when it reconnects, its only possible to start talking to it after it responded to an ARP packet. But ARP cache prevents such packets from being sent. Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-10 16:10:37 -08:00
Eric Dumazet	bfe0d0298f	net_sched: factorize qdisc stats handling HTB takes into account skb is segmented in stats updates. Generalize this to all schedulers. They should use qdisc_bstats_update() helper instead of manipulating bstats.bytes and bstats.packets Add bstats_update() helper too for classes that use gnet_stats_basic_packed fields. Note : Right now, TCQ_F_CAN_BYPASS shortcurt can be taken only if no stab is setup on qdisc. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-10 16:07:54 -08:00
Tom Herbert	36909ea438	net: Add alloc_netdev_mqs function Added alloc_netdev_mqs function which allows the number of transmit and receive queues to be specified independenty. alloc_netdev_mq was changed to a macro to call the new function. Also added alloc_etherdev_mqs with same purpose. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-10 16:05:30 -08:00
Dan Rosenberg	91b5c98c2e	caif: don't set connection request param size before copying data The size field should not be set until after the data is successfully copied in. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-10 16:00:54 -08:00
Dan Carpenter	facb4edc1e	phonet: some signedness bugs Dan Rosenberg pointed out that there were some signed comparison bugs in the phonet protocol. http://marc.info/?l=full-disclosure&m=129424528425330&w=2 The problem is that we check for array overflows but "protocol" is signed and we don't check for array underflows. If you have already have CAP_SYS_ADMIN then you could use the bugs to get root, or someone could cause an oops by mistake. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-10 13:33:17 -08:00
Trond Myklebust	68c404b18f	Merge branch 'bugfixes' into nfs-for-2.6.38 Conflicts: fs/nfs/nfs2xdr.c fs/nfs/nfs3xdr.c fs/nfs/nfs4xdr.c	2011-01-10 14:48:02 -05:00
Trond Myklebust	6650239a4b	NFS: Don't use vm_map_ram() in readdir vm_map_ram() is not available on NOMMU platforms, and causes trouble on incoherrent architectures such as ARM when we access the page data through both the direct and the virtual mapping. The alternative is to use the direct mapping to access page data for the case when we are not crossing a page boundary, but to copy the data into a linear scratch buffer when we are accessing data that spans page boundaries. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Marc Kleine-Budde <mkl@pengutronix.de> Cc: stable@kernel.org [2.6.37]	2011-01-10 14:45:01 -05:00
Eric Dumazet	83723d6071	netfilter: x_tables: dont block BH while reading counters Using "iptables -L" with a lot of rules have a too big BH latency. Jesper mentioned ~6 ms and worried of frame drops. Switch to a per_cpu seqlock scheme, so that taking a snapshot of counters doesnt need to block BH (for this cpu, but also other cpus). This adds two increments on seqlock sequence per ipt_do_table() call, its a reasonable cost for allowing "iptables -L" not block BH processing. Reported-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2011-01-10 20:11:38 +01:00
Jesse Gross	0363466866	net offloading: Convert checksums to use centrally computed features. In order to compute the features for other offloads (primarily scatter/gather), we need to first check the ability of the NIC to offload the checksum for the packet. Since we have already computed this, we can directly use the result instead of figuring it out again. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-09 23:35:35 -08:00
Jesse Gross	02932ce9e2	net offloading: Convert skb_need_linearize() to use precomputed features. This switches skb_need_linearize() to use the features that have been centrally computed. In doing so, this fixes a problem where scatter/gather should not be used because the card does not support checksum offloading on that type of packet. On device registration we only check that some form of checksum offloading is available if scatter/gatther is enabled but we must also check at transmission time. Examples of this include IPv6 or vlan packets on a NIC that only supports IPv4 offloading. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-09 23:35:35 -08:00
Jesse Gross	91ecb63c07	net offloading: Convert dev_gso_segment() to use precomputed features. This switches dev_gso_segment() to use the device features computed by the centralized routine. In doing so, it fixes a problem where it would always use dev->features, instead of those appropriate to the number of vlan tags if any are present. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-09 23:35:34 -08:00
Jesse Gross	fc741216db	net offloading: Pass features into netif_needs_gso(). Now that there is a single function that can compute the device features relevant to a packet, we don't want to run it for each offload. This converts netif_needs_gso() to take the features of the device, rather than computing them itself. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-09 23:35:34 -08:00
Jesse Gross	f01a5236bd	net offloading: Generalize netif_get_vlan_features(). netif_get_vlan_features() is currently only used by netif_needs_gso(), so it only concerns itself with GSO features. However, several other places also should take into account the contents of the packet when deciding whether to offload to hardware. This generalizes the function to return features about all of the various forms of offloading. Since offloads tend to be linked together, this avoids duplicating the logic in each location (i.e. the scatter/gather code also needs the checksum logic). Suggested-by: Michał Mirosław <mirqus@gmail.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-09 23:35:33 -08:00
Jesse Gross	9497a0518e	net offloading: Accept NETIF_F_HW_CSUM for all protocols. We currently only have software fallback for one type of checksum: the TCP/UDP one's complement. This means that a protocol that uses hardware offloading for a different type of checksum (FCoE, SCTP) must directly check the device's features and do the right thing ahead of time. By the time we get to dev_can_checksum(), we're only deciding whether to apply the one algorithm in software or hardware. NETIF_F_HW_CSUM has the same capabilities as the software version, so we should always use it if present. The primary advantage of this is multiply tagged vlans can use hardware checksumming. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-09 23:35:33 -08:00
Randy Dunlap	697d0e338c	net: fix kernel-doc warning in core/filter.c Fix new kernel-doc notation warning in net/core/filter.c: Warning(net/core/filter.c:172): No description found for parameter 'fentry' Warning(net/core/filter.c:172): Excess function parameter 'filter' description in 'sk_run_filter' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-09 16:26:51 -08:00
Jan Engelhardt	0ab03c2b14	netlink: test for all flags of the NLM_F_DUMP composite Due to NLM_F_DUMP is composed of two bits, NLM_F_ROOT \| NLM_F_MATCH, when doing "if (x & NLM_F_DUMP)", it tests for _either_ of the bits being set. Because NLM_F_MATCH's value overlaps with NLM_F_EXCL, non-dump requests with NLM_F_EXCL set are mistaken as dump requests. Substitute the condition to test for _all_ bits being set. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-09 16:25:03 -08:00
David S. Miller	14934efab6	Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6	2011-01-09 16:16:57 -08:00
Linus Torvalds	23d69b09b7	Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (33 commits) usb: don't use flush_scheduled_work() speedtch: don't abuse struct delayed_work media/video: don't use flush_scheduled_work() media/video: explicitly flush request_module work ioc4: use static work_struct for ioc4_load_modules() init: don't call flush_scheduled_work() from do_initcalls() s390: don't use flush_scheduled_work() rtc: don't use flush_scheduled_work() mmc: update workqueue usages mfd: update workqueue usages dvb: don't use flush_scheduled_work() leds-wm8350: don't use flush_scheduled_work() mISDN: don't use flush_scheduled_work() macintosh/ams: don't use flush_scheduled_work() vmwgfx: don't use flush_scheduled_work() tpm: don't use flush_scheduled_work() sonypi: don't use flush_scheduled_work() hvsi: don't use flush_scheduled_work() xen: don't use flush_scheduled_work() gdrom: don't use flush_scheduled_work() ... Fixed up trivial conflict in drivers/media/video/bt8xx/bttv-input.c as per Tejun.	2011-01-07 16:58:04 -08:00
Linus Torvalds	fb5131e188	Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6 * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (65 commits) [S390] prevent unneccesary loops_per_jiffy recalculation [S390] cpuinfo: use get_online_cpus() instead of preempt_disable() [S390] smp: remove cpu hotplug messages [S390] mutex: enable spinning mutex on s390 [S390] mutex: Introduce arch_mutex_cpu_relax() [S390] cio: fix ccwgroup unregistration race condition [S390] perf: add DWARF register lookup for s390 [S390] cleanup ftrace backend functions [S390] ptrace cleanup [S390] smp/idle: call init_idle() before starting a new cpu [S390] smp: delay idle task creation [S390] dasd: Correct retry counter for terminated I/O. [S390] dasd: Add support for raw ECKD access. [S390] dasd: Prevent deadlock during suspend/resume. [S390] dasd: Improve handling of stolen DASD reservation [S390] dasd: do path verification for paths added at runtime [S390] dasd: add High Performance FICON multitrack support [S390] cio: reduce memory consumption of itcw structures [S390] nmi: enable machine checks early [S390] qeth: buffer count imbalance ...	2011-01-07 14:50:50 -08:00
Linus Torvalds	b4a45f5fe8	Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits) fs: scale mntget/mntput fs: rename vfsmount counter helpers fs: implement faster dentry memcmp fs: prefetch inode data in dcache lookup fs: improve scalability of pseudo filesystems fs: dcache per-inode inode alias locking fs: dcache per-bucket dcache hash locking bit_spinlock: add required includes kernel: add bl_list xfs: provide simple rcu-walk ACL implementation btrfs: provide simple rcu-walk ACL implementation ext2,3,4: provide simple rcu-walk ACL implementation fs: provide simple rcu-walk generic_check_acl implementation fs: provide rcu-walk aware permission i_ops fs: rcu-walk aware d_revalidate method fs: cache optimise dentry and inode for rcu-walk fs: dcache reduce branches in lookup path fs: dcache remove d_mounted fs: fs_struct use seqlock fs: rcu-walk for path lookup ...	2011-01-07 08:56:33 -08:00
Gerrit Renker	bfbb23466a	dccp: make upper bound for seq_window consistent on 32/64 bit The 'seq_window' sysctl sets the initial value for the DCCP Sequence Window, which may range from 32..2^46-1 (RFC 4340, 7.5.2). The patch sets the upper bound consistently to 2^32-1 on both 32 and 64 bit systems, which should be sufficient - with a RTT of 1sec and 1-byte packets, a seq_window of 2^32-1 corresponds to a link speed of 34 Gbps. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2011-01-07 12:22:44 +01:00
Samuel Jero	763dadd47c	dccp: fix bug in updating the GSR Currently dccp_check_seqno allows any valid packet to update the Greatest Sequence Number Received, even if that packet's sequence number is less than the current GSR. This patch adds a check to make sure that the new packet's sequence number is greater than GSR. Signed-off-by: Samuel Jero <sj323707@ohio.edu> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2011-01-07 12:22:43 +01:00
Samuel Jero	2cf5be93d1	dccp: fix return value for sequence-invalid packets Currently dccp_check_seqno returns 0 (indicating a valid packet) if the acknowledgment number is out of bounds and the sync that RFC 4340 mandates at this point is currently being rate-limited. This function should return -1, indicating an invalid packet. Signed-off-by: Samuel Jero <sj323707@ohio.edu> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2011-01-07 12:22:43 +01:00
Nick Piggin	b3e19d924b	fs: scale mntget/mntput The problem that this patch aims to fix is vfsmount refcounting scalability. We need to take a reference on the vfsmount for every successful path lookup, which often go to the same mount point. The fundamental difficulty is that a "simple" reference count can never be made scalable, because any time a reference is dropped, we must check whether that was the last reference. To do that requires communication with all other CPUs that may have taken a reference count. We can make refcounts more scalable in a couple of ways, involving keeping distributed counters, and checking for the global-zero condition less frequently. - check the global sum once every interval (this will delay zero detection for some interval, so it's probably a showstopper for vfsmounts). - keep a local count and only taking the global sum when local reaches 0 (this is difficult for vfsmounts, because we can't hold preempt off for the life of a reference, so a counter would need to be per-thread or tied strongly to a particular CPU which requires more locking). - keep a local difference of increments and decrements, which allows us to sum the total difference and hence find the refcount when summing all CPUs. Then, keep a single integer "long" refcount for slow and long lasting references, and only take the global sum of local counters when the long refcount is 0. This last scheme is what I implemented here. Attached mounts and process root and working directory references are "long" references, and everything else is a short reference. This allows scalable vfsmount references during path walking over mounted subtrees and unattached (lazy umounted) mounts with processes still running in them. This results in one fewer atomic op in the fastpath: mntget is now just a per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock and non-atomic decrement in the common case. However code is otherwise bigger and heavier, so single threaded performance is basically a wash. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:33 +11:00
Nick Piggin	4b936885ab	fs: improve scalability of pseudo filesystems Regardless of how much we possibly try to scale dcache, there is likely always going to be some fundamental contention when adding or removing children under the same parent. Pseudo filesystems do not seem need to have connected dentries because by definition they are disconnected. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:32 +11:00
Nick Piggin	fb045adb99	fs: dcache reduce branches in lookup path Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])d_op([[:space:]])=' \| xargs sed -e 's/$[^\t ]$->d_op = $.$;/d_set_d_op(\1, \2);/' -e 's/$[^\t ]$\.d_op = $.$;/d_set_d_op(\&\1, \2);/' -i Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:28 +11:00
Nick Piggin	ff0c7d15f9	fs: avoid inode RCU freeing for pseudo fs Pseudo filesystems that don't put inode on RCU list or reachable by rcu-walk dentries do not need to RCU free their inodes. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:26 +11:00
Nick Piggin	fa0d7e3de6	fs: icache RCU free inodes RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:26 +11:00
Nick Piggin	fe15ce446b	fs: change d_delete semantics Change d_delete from a dentry deletion notification to a dentry caching advise, more like ->drop_inode. Require it to be constant and idempotent, and not take d_lock. This is how all existing filesystems use the callback anyway. This makes fine grained dentry locking of dput and dentry lru scanning much simpler. Signed-off-by: Nick Piggin <npiggin@kernel.dk>	2011-01-07 17:50:18 +11:00
Andy Adamson	4a19de0f4b	NFS rename client back channel transport field Differentiate from server backchannel Signed-off-by: Andy Adamson <andros@netapp.com> Acked-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:25 -05:00
Andy Adamson	2c2618c6f2	NFS associate sessionid with callback connection The sessions based callback service is started prior to the CREATE_SESSION call so that it can handle CB_NULL requests which can be sent before the CREATE_SESSION call returns and the session ID is known. Set the callback sessionid after a sucessful CREATE_SESSION. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:24 -05:00
Andy Adamson	16b2d1e1d1	SUNRPC register and unregister the back channel transport Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:23 -05:00
Andy Adamson	1f11a034cd	SUNRPC new transport for the NFSv4.1 shared back channel Move the current sock create and destroy routines into the new transport ops. Back channel socket will be destroyed by the svc_closs_all call in svc_destroy. Added check: only TCP supported on shared back channel. Signed-off-by: Andy Adamson <andros@netapp.com> Acked-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:23 -05:00
Andy Adamson	71e161a6a9	SUNRPC fix bc_send print Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:23 -05:00
Andy Adamson	4b5b3ba16b	SUNRPC move svc_drop to caller of svc_process_common The NFSv4.1 shared back channel does not need to call svc_drop because the callback service never outlives the single connection it services, and it reuses it's buffers and keeps the trasport. Signed-off-by: Andy Adamson <andros@netapp.com> Acked-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-06 14:46:23 -05:00
Changli Gao	f88de8de5a	net: bridge: check the length of skb after nf_bridge_maybe_copy_header() Since nf_bridge_maybe_copy_header() may change the length of skb, we should check the length of skb after it to handle the ppoe skbs. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-06 11:33:05 -08:00
Pablo Neira Ayuso	cba85b532e	netfilter: fix export secctx error handling In `1ae4de0cdf`, the secctx was exported via the /proc/net/netfilter/nf_conntrack and ctnetlink interfaces instead of the secmark. That patch introduced the use of security_secid_to_secctx() which may return a non-zero value on error. In one of my setups, I have NF_CONNTRACK_SECMARK enabled but no security modules. Thus, security_secid_to_secctx() returns a negative value that results in the breakage of the /proc and `conntrack -L' outputs. To fix this, we skip the inclusion of secctx if the aforementioned function fails. This patch also fixes the dynamic netlink message size calculation if security_secid_to_secctx() returns an error, since its logic is also wrong. This problem exists in Linux kernel >= 2.6.37. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-06 11:25:00 -08:00
Changli Gao	f682cefa5a	netfilter: fix the race when initializing nf_ct_expect_hash_rnd Since nf_ct_expect_dst_hash() may be called without nf_conntrack_lock locked, nf_ct_expect_hash_rnd should be initialized in the atomic way. In this patch, we use nf_conntrack_hash_rnd instead of nf_ct_expect_hash_rnd. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-06 11:22:20 -08:00
Eric Dumazet	6623e3b24a	ipv4: IP defragmentation must be ECN aware RFC3168 (The Addition of Explicit Congestion Notification to IP) states : 5.3. Fragmentation ECN-capable packets MAY have the DF (Don't Fragment) bit set. Reassembly of a fragmented packet MUST NOT lose indications of congestion. In other words, if any fragment of an IP packet to be reassembled has the CE codepoint set, then one of two actions MUST be taken: * Set the CE codepoint on the reassembled packet. However, this MUST NOT occur if any of the other fragments contributing to this reassembly carries the Not-ECT codepoint. * The packet is dropped, instead of being reassembled, for any other reason. This patch implements this requirement for IPv4, choosing the first action : If one fragment had NO-ECT codepoint reassembled frame has NO-ECT ElIf one fragment had CE codepoint reassembled frame has CE Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-06 11:21:30 -08:00
Dan Carpenter	2a8fe00374	dcb: use after free in dcb_flushapp() The original code has a use after free bug because it's not using the _safe() version of the list_for_each_entry() macro. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-06 11:16:54 -08:00
Dan Carpenter	70bfa2d2e1	dcb: unlock on error in dcbnl_ieee_get() There is a "goto nla_put_failure" hidden inside the NLA_PUT() macro, but we're holding the dcb_lock so we need to unlock first. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-06 11:16:54 -08:00
David S. Miller	5f9251cb93	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2011-01-06 10:55:42 -08:00
Eric Dumazet	2c6607c611	net: add POLLPRI to sock_def_readable() Leonardo Chiquitto found poll() could block forever on tcp sockets and Urgent data was received, if the event flag only contains POLLPRI. He did a bisection and found commit `4938d7e023` (poll: avoid extra wakeups in select/poll) was the source of the problem. Problem is TCP sockets use standard sock_def_readable() function for their sk_data_ready() handler, and sock_def_readable() doesnt signal POLLPRI. Only TCP is affected by the problem. Adding POLLPRI to the list of flags might trigger unnecessary schedules, but URGENT handling is such a seldom used feature this seems a good compromise. Thanks a lot to Leonardo for providing the bisection result and a test program as well. Reference : http://www.spinics.net/lists/netdev/msg151793.html Reported-and-bisected-by: Leonardo Chiquitto <leonardo.lists@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-06 10:54:29 -08:00
David S. Miller	3610cda53f	af_unix: Avoid socket->sk NULL OOPS in stream connect security hooks. unix_release() can asynchornously set socket->sk to NULL, and it does so without holding the unix_state_lock() on "other" during stream connects. However, the reverse mapping, sk->sk_socket, is only transitioned to NULL under the unix_state_lock(). Therefore make the security hooks follow the reverse mapping instead of the forward mapping. Reported-by: Jeremy Fitzhardinge <jeremy@goop.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-05 15:38:53 -08:00
Eric Dumazet	44b8288308	net_sched: pfifo_head_drop problem commit `57dbb2d83d` (sched: add head drop fifo queue) introduced pfifo_head_drop, and broke the invariant that sch->bstats.bytes and sch->bstats.packets are COUNTER (increasing counters only) This can break estimators because est_timer() handles unsigned deltas only. A decreasing counter can then give a huge unsigned delta. My mid term suggestion would be to change things so that sch->bstats.bytes and sch->bstats.packets are incremented in dequeue() only, not at enqueue() time. We also could add drop_bytes/drop_packets and provide estimations of drop rates. It would be more sensible anyway for very low speeds, and big bursts. Right now, if we drop packets, they still are accounted in byte/packets abolute counters and rate estimators. Before this mid term change, this patch makes pfifo_head_drop behavior similar to other qdiscs in case of drops : Dont decrement sch->bstats.bytes and sch->bstats.packets Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-05 13:39:17 -08:00
Johannes Berg	06778b1c38	mac80211: remove stray extern Somehow this snuck into my earlier patch, and only now did I see a compiler warning: net/mac80211/led.c:218:13: warning: function '__ieee80211_create_tpt_led_trigger' with external linkage has definition Remove the stray extern. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-01-05 16:07:12 -05:00
Johannes Berg	90fc4b3a5b	mac80211: implement off-channel TX using hw r-o-c offload When the driver has remain-on-channel offload, implement off-channel transmission using that primitive. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-01-05 16:07:12 -05:00
Johannes Berg	21f8358964	mac80211: implement hardware offload for remain-on-channel This allows drivers to support remain-on-channel offload if they implement smarter timing or need to use a device implementation like iwlwifi. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-01-05 16:07:12 -05:00
John W. Linville	c96e96354a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem Conflicts: net/bluetooth/Makefile	2011-01-05 16:06:25 -05:00
John W. Linville	6303710d7a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2011-01-05 14:35:41 -05:00
Heiko Carstens	052ff461c8	[S390] irq: have detailed statistics for interrupt types Up to now /proc/interrupts only has statistics for external and i/o interrupts but doesn't split up them any further. This patch adds a line for every single interrupt source so that it is possible to easier tell what the machine is/was doing. Part of the output now looks like this; CPU0 CPU2 CPU4 EXT: 3898 4232 2305 I/O: 782 315 245 CLK: 1029 1964 727 [EXT] Clock Comparator IPI: 2868 2267 1577 [EXT] Signal Processor TMR: 0 0 0 [EXT] CPU Timer TAL: 0 0 0 [EXT] Timing Alert PFL: 0 0 0 [EXT] Pseudo Page Fault [...] NMI: 0 1 1 [NMI] Machine Checks Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2011-01-05 12:47:25 +01:00
J. Bruce Fields	fdef7aa5d4	svcrpc: ensure cache_check caller sees updated entry Supposes cache_check runs simultaneously with an update on a different CPU: cache_check task doing update ^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^ 1. test for CACHE_VALID 1'. set entry->data & !CACHE_NEGATIVE 2. use entry->data 2'. set CACHE_VALID If the two memory writes performed in step 1' and 2' appear misordered with respect to the reads in step 1 and 2, then the caller could get stale data at step 2 even though it saw CACHE_VALID set on the cache entry. Add memory barriers to prevent this. Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-04 16:49:25 -05:00
J. Bruce Fields	6bab93f87e	svcrpc: take lock on turning entry NEGATIVE in cache_check We attempt to turn a cache entry negative in place. But that entry may already have been filled in by some other task since we last checked whether it was valid, so we could be modifying an already-valid entry. If nothing else there's a likely leak in such a case when the entry is eventually put() and contents are not freed because it has CACHE_NEGATIVE set. So, take the cache_lock just as sunrpc_cache_update() does. Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-04 16:49:24 -05:00
J. Bruce Fields	9e701c6109	svcrpc: simpler request dropping Currently we use -EAGAIN returns to determine when to drop a deferred request. On its own, that is error-prone, as it makes us treat -EAGAIN returns from other functions specially to prevent inadvertent dropping. So, use a flag on the request instead. Returning an error on request deferral is still required, to prevent further processing, but we no longer need worry that an error return on its own could result in a drop. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-04 16:49:22 -05:00
J. Bruce Fields	d76d1815f3	svcrpc: avoid double reply caused by deferral race Commit `d29068c431` "sunrpc: Simplify cache_defer_req and related functions." asserted that cache_check() could determine success or failure of cache_defer_req() by checking the CACHE_PENDING bit. This isn't quite right. We need to know whether cache_defer_req() created a deferred request, in which case sending an rpc reply has become the responsibility of the deferred request, and it is important that we not send our own reply, resulting in two different replies to the same request. And the CACHE_PENDING bit doesn't tell us that; we could have succesfully created a deferred request at the same time as another thread cleared the CACHE_PENDING bit. So, partially revert that commit, to ensure that cache_check() returns -EAGAIN if and only if a deferred request has been created. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: NeilBrown <neilb@suse.de>	2011-01-04 16:49:21 -05:00
J. Bruce Fields	bdd5f05d91	SUNRPC: Remove more code when NFSD_DEPRECATED is not configured Signed-off-by: NeilBrown <neilb@suse.de> [bfields@redhat.com: moved svcauth_unix_purge outside ifdef's.] Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-04 16:48:02 -05:00
J. Bruce Fields	31f7aa65f5	svcrpc: modifying valid sunrpc cache entries is racy Once a sunrpc cache entry is VALID, we should be replacing it (and allowing any concurrent users to destroy it on last put) instead of trying to update it in place. Otherwise someone referencing the ip_map we're modifying here could try to use the m_client just as we're putting the last reference. The bug should only be seen by users of the legacy nfsd interfaces. (Thanks to Neil for suggestion to use sunrpc_invalidate.) Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-01-04 16:47:29 -05:00
David S. Miller	dbbe68bb12	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2011-01-04 11:57:25 -08:00
Johannes Berg	b5c34f662a	mac80211: fix some key comments and code The key documentation is slightly out of date, fix that. Also, the list entry in the key struct is no longer used that way, so list_del_init() isn't necessary any more there. Finally, ieee80211_key_link() is no longer invoked under RCU read lock, but rather with an appropriate station lock held. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-01-04 14:46:14 -05:00
Christian Lamparter	707e634326	Revert "mac80211: temporarily disable reorder release timer" This reverts enables the reorder release timer once again. The issues laid out in: <http://www.spinics.net/lists/linux-wireless/msg57214.html> Have been addressed by: mac80211: serialize rx path workers mac80211: ignore PSM bit of reordered frames Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-01-04 14:46:13 -05:00
Christian Lamparter	24a8fdad35	mac80211: serialize rx path workers This patch addresses the issue of serialization between the main rx path and various reorder release timers. <http://www.spinics.net/lists/linux-wireless/msg57214.html> It converts the previously local "frames" queue into a global rx queue [rx_skb_queue]. This way, everyone (be it the main rx-path or some reorder release timeout) can add frames to it. Only one active rx handler worker [ieee80211_rx_handlers] is needed. All other threads which have lost the race of "runnning_rx_handler" can now simply "return", knowing that the thread who had the "edge" will also take care of their workload. Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-01-04 14:46:13 -05:00
Bob Copeland	ff039c6fb3	cfg80211: fix transposition of words in printk Fixes the misplaced article in the following: "cfg80211: Updating information on frequency 5785 MHz for 20 a MHz width channel with regulatory rule:" Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-01-04 14:43:01 -05:00
Joel A Fernandes	f76b57b47e	mac80211: Fix mesh portal communication with other mesh nodes. Fixed a bug where if a mesh interface has a different MAC address from its bridge interface, then it would not be able to send data traffic to any other mesh node. This also adds support for communication between mesh nodes and external bridged nodes by using a 6 address format if the source is a node within the mesh and the destination is an external node proxied by a mesh portal. Signed-off-by: Joel A Fernandes <agnel.joel@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-01-04 14:43:01 -05:00
Christian Lamparter	4cfda47b69	mac80211: ignore PSM bit of reordered frames This patch tackles one of the problems of my reorder release timer patch from August. <http://www.spinics.net/lists/linux-wireless/msg57214.html> => What if the reorder release triggers and ap_sta_ps_end (called by ieee80211_rx_h_sta_process) accidentally clears the WLAN_STA_PS_STA flag, because 100ms ago - when the STA was still active - frames were put into the reorder buffer. Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-01-04 14:35:15 -05:00
Joel Sing	9fc3bbb4a7	ipv4/route.c: respect prefsrc for local routes The preferred source address is currently ignored for local routes, which results in all local connections having a src address that is the same as the local dst address. Fix this by respecting the preferred source address when it is provided for local routes. This bug can be demonstrated as follows: # ifconfig dummy0 192.168.0.1 # ip route show table local \| grep local.dummy0 local 192.168.0.1 dev dummy0 proto kernel scope host src 192.168.0.1 # ip route change table local local 192.168.0.1 dev dummy0 \ proto kernel scope host src 127.0.0.1 # ip route show table local \| grep local.dummy0 local 192.168.0.1 dev dummy0 proto kernel scope host src 127.0.0.1 We now establish a local connection and verify the source IP address selection: # nc -l 192.168.0.1 3128 & # nc 192.168.0.1 3128 & # netstat -ant \| grep 192.168.0.1:3128.*EST tcp 0 0 192.168.0.1:3128 192.168.0.1:33228 ESTABLISHED tcp 0 0 192.168.0.1:33228 192.168.0.1:3128 ESTABLISHED Signed-off-by: Joel Sing <jsing@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-04 11:35:12 -08:00
John W. Linville	782a9e31e8	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/padovan/bluetooth-next-2.6	2011-01-04 14:25:28 -05:00
Johannes Berg	d2460f4b2f	mac80211: add missing synchronize_rcu commit `ad0e2b5a00` Author: Johannes Berg <johannes.berg@intel.com> Date: Tue Jun 1 10:19:19 2010 +0200 mac80211: simplify key locking removed the synchronization against RCU and thus opened a race window where we can use a key for TX while it is already freed. Put a synchronisation into the right place to close that window. Reported-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Cc: stable@kernel.org [2.6.36+] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-01-04 14:17:23 -05:00
Milton Miller	919bbad580	mac80211: fix mesh forwarding when ratelimited too Commit `b51aff057c` said: Under memory pressure, the mac80211 mesh code may helpfully print a message that it failed to clone a mesh frame and then will proceed to crash trying to use it anyway. Fix that. Avoid the reference whenever the frame copy is unsuccessful regardless of the debug message being suppressed or printed. Cc: stable@kernel.org [2.6.27+] Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-01-04 14:17:23 -05:00
Trond Myklebust	beb0f0a9fb	kernel panic when mount NFSv4 On Tue, 2010-12-14 at 16:58 +0800, Mi Jinlong wrote: > Hi, > > When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic > at NFS client's __rpc_create_common function. > > The panic place is: > rpc_mkpipe > __rpc_lookup_create() <=== find pipefile idmap > __rpc_mkpipe() <=== pipefile is idmap > __rpc_create_common() > **** BUG_ON(!d_unhashed(dentry)); **** panic > > It means that the dentry's d_flags have be set DCACHE_UNHASHED, > but it should not be set here. > > Is someone known this bug? or give me some idea? > > A reproduce program is append, but it can't reproduce the bug every time. > the export is: "/nfsroot (rw,no_root_squash,fsid=0,insecure)" > > And the panic message is append. > > ============================================================================ > #!/bin/sh > > LOOPTOTAL=768 > LOOPCOUNT=0 > ret=0 > > while [ $LOOPCOUNT -ne $LOOPTOTAL ] > do > ((LOOPCOUNT += 1)) > service nfs restart > /usr/sbin/rpc.idmapd > mount -t nfs4 127.0.0.1:/ /mnt\|\| return 1; > ls -l /var/lib/nfs/rpc_pipefs/nfs// > umount /mnt > echo $LOOPCOUNT > done > > =============================================================================== > Code: af 60 01 00 00 89 fa 89 f0 e8 64 cf 89 f0 e8 5c 7c 64 cf 31 c0 8b 5c 24 10 8b > 74 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 <0f> 0b eb fc 8b 46 28 c7 44 24 08 20 > de ee f0 c7 44 24 04 56 ea > EIP:[<f0ee92ea>] __rpc_create_common+0x8a/0xc0 [sunrpc] SS:ESP 0068:eccb5d28 > ---[ end trace 8f5606cd08928ed2]--- > Kernel panic - not syncing: Fatal exception > Pid:7131, comm: mount.nfs4 Tainted: G D -------------------2.6.32 #1 > Call Trace: > [<c080ad18>] ? panic+0x42/0xed > [<c080e42c>] ? oops_end+0xbc/0xd0 > [<c040b090>] ? do_invalid_op+0x0/0x90 > [<c040b10f>] ? do_invalid_op+0x7f/0x90 > [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc] > [<f0edc433>] ? rpc_free_task+0x33/0x70[sunrpc] > [<f0ed6508>] ? prc_call_sync+0x48/0x60[sunrpc] > [<f0ed656e>] ? rpc_ping+0x4e/0x60[sunrpc] > [<f0ed6eaf>] ? rpc_create+0x38f/0x4f0[sunrpc] > [<c080d80b>] ? error_code+0x73/0x78 > [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc] > [<c0532bda>] ? d_lookup+0x2a/0x40 > [<f0ee94b1>] ? rpc_mkpipe+0x111/0x1b0[sunrpc] > [<f10a59f4>] ? nfs_create_rpc_client+0xb4/0xf0[nfs] > [<f10d6c6d>] ? nfs_fscache_get_client_cookie+0x1d/0x50[nfs] > [<f10d3fcb>] ? nfs_idmap_new+0x7b/0x140[nfs] > [<c05e76aa>] ? strlcpy+0x3a/0x60 > [<f10a60ca>] ? nfs4_set_client+0xea/0x2b0[nfs] > [<f10a6d0c>] ? nfs4_create_server+0xac/0x1b0[nfs] > [<c04f1400>] ? krealloc+0x40/0x50 > [<f10b0e8b>] ? nfs4_remote_get_sb+0x6b/0x250[nfs] > [<c04f14ec>] ? kstrdup+0x3c/0x60 > [<c0520739>] ? vfs_kern_mount+0x69/0x170 > [<f10b1a3c>] ? nfs_do_root_mount+0x6c/0xa0[nfs] > [<f10b1b47>] ? nfs4_try_mount+0x37/0xa0[nfs] > [<f10afe6d>] ? nfs4_validate_text_mount_data+-x7d/0xf0[nfs] > [<f10b1c42>] ? nfs4_get_sb+0x92/0x2f0 > [<c0520739>] ? vfs_kern_mount+0x69/0x170 > [<c05366d2>] ? get_fs_type+0x32/0xb0 > [<c052089f>] ? do_kern_mount+0x3f/0xe0 > [<c053954f>] ? do_mount+0x2ef/0x740 > [<c0537740>] ? copy_mount_options+0xb0/0x120 > [<c0539a0e>] ? sys_mount+0x6e/0xa0 Hi, Does the following patch fix the problem? Cheers Trond -------------------------- SUNRPC: Fix a BUG in __rpc_create_common From: Trond Myklebust <Trond.Myklebust@netapp.com> Mi Jinlong reports: When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic at NFS client's __rpc_create_common function. The panic place is: rpc_mkpipe __rpc_lookup_create() <=== find pipefile idmap __rpc_mkpipe() <=== pipefile is idmap __rpc_create_common() **** BUG_ON(!d_unhashed(dentry)); **** panic The test is wrong: we can find ourselves with a hashed negative dentry here if the idmapper tried to look up the file before we got round to creating it. Just replace the BUG_ON() with a d_drop(dentry). Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-04 13:10:38 -05:00
Tomas Winkler	1a9180a20f	net/bridge: fix trivial sparse errors net/bridge//br_stp_if.c:148:66: warning: conversion of net/bridge//br_stp_if.c:148:66: int to net/bridge//br_stp_if.c:148:66: int enum umh_wait net/bridge//netfilter/ebtables.c:1150:30: warning: Using plain integer as NULL pointer Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-03 13:29:18 -08:00
Eric Dumazet	0dfb33a0d7	sch_red: report backlog information Provide child qdisc backlog (byte count) information so that "tc -s qdisc" can report it to user. packet count is already correctly provided. qdisc red 11: parent 1:11 limit 60Kb min 15Kb max 45Kb ecn Sent 3116427684 bytes 1415782 pkt (dropped 8, overlimits 7866 requeues 0) rate 242385Kbit 13630pps backlog 13560b 8p requeues 0 marked 7865 early 1 pdrop 7 other 0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-03 12:13:15 -08:00
Shmulik Ravid	7f891cf1fc	dcbnl: more informed return values for new dcbnl routines More accurate return values for the following (new) dcbnl routines: dcbnl_getdcbx() dcbnl_setdcbx() dcbnl_getfeatcfg() dcbnl_setfeatcfg() Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-03 12:12:11 -08:00
Florian Westphal	e6f26129eb	bridge: stp: ensure mac header is set commit `bf9ae5386b` (llc: use dev_hard_header) removed the skb_reset_mac_header call from llc_mac_hdr_init. This seems fine itself, but br_send_bpdu() invokes ebtables LOCAL_OUT. We oops in ebt_basic_match() because it assumes eth_hdr(skb) returns a meaningful result. Cc: acme@ghostprotocols.net References: https://bugzilla.kernel.org/show_bug.cgi?id=24532 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-03 12:09:33 -08:00
Tomas Winkler	9d89081d69	bridge: fix br_multicast_ipv6_rcv for paged skbs use pskb_may_pull to access ipv6 header correctly for paged skbs It was omitted in the bridge code leading to crash in blind __skb_pull since the skb is cloned undonditionally we also simplify the the exit path this fixes bug https://bugzilla.kernel.org/show_bug.cgi?id=25202 Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 IEEE 802.11: authenticated Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 IEEE 802.11: associated (aid 2) Dec 15 14:36:40 User-PC hostapd: wlan0: STA 00:15:00:60:5d:34 RADIUS: starting accounting session 4D0608A3-00000005 Dec 15 14:36:41 User-PC kernel: [175576.120287] ------------[ cut here ]------------ Dec 15 14:36:41 User-PC kernel: [175576.120452] kernel BUG at include/linux/skbuff.h:1178! Dec 15 14:36:41 User-PC kernel: [175576.120609] invalid opcode: 0000 [#1] SMP Dec 15 14:36:41 User-PC kernel: [175576.120749] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent Dec 15 14:36:41 User-PC kernel: [175576.121035] Modules linked in: approvals binfmt_misc bridge stp llc parport_pc ppdev arc4 iwlagn snd_hda_codec_realtek iwlcore i915 snd_hda_intel mac80211 joydev snd_hda_codec snd_hwdep snd_pcm snd_seq_midi drm_kms_helper snd_rawmidi drm snd_seq_midi_event snd_seq snd_timer snd_seq_device cfg80211 eeepc_wmi usbhid psmouse intel_agp i2c_algo_bit intel_gtt uvcvideo agpgart videodev sparse_keymap snd shpchp v4l1_compat lp hid video serio_raw soundcore output snd_page_alloc ahci libahci atl1c Dec 15 14:36:41 User-PC kernel: [175576.122712] Dec 15 14:36:41 User-PC kernel: [175576.122769] Pid: 0, comm: kworker/0:0 Tainted: G W 2.6.37-rc5-wl+ #3 1015PE/1016P Dec 15 14:36:41 User-PC kernel: [175576.123012] EIP: 0060:[<f83edd65>] EFLAGS: 00010283 CPU: 1 Dec 15 14:36:41 User-PC kernel: [175576.123193] EIP is at br_multicast_rcv+0xc95/0xe1c [bridge] Dec 15 14:36:41 User-PC kernel: [175576.123362] EAX: 0000001c EBX: f5626318 ECX: 00000000 EDX: 00000000 Dec 15 14:36:41 User-PC kernel: [175576.123550] ESI: ec512262 EDI: f5626180 EBP: f60b5ca0 ESP: f60b5bd8 Dec 15 14:36:41 User-PC kernel: [175576.123737] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Dec 15 14:36:41 User-PC kernel: [175576.123902] Process kworker/0:0 (pid: 0, ti=f60b4000 task=f60a8000 task.ti=f60b0000) Dec 15 14:36:41 User-PC kernel: [175576.124137] Stack: Dec 15 14:36:41 User-PC kernel: [175576.124181] ec556500 f6d06800 f60b5be8 c01087d8 ec512262 00000030 00000024 f5626180 Dec 15 14:36:41 User-PC kernel: [175576.124181] f572c200 ef463440 f5626300 3affffff f6d06dd0 e60766a4 000000c4 f6d06860 Dec 15 14:36:41 User-PC kernel: [175576.124181] ffffffff ec55652c 00000001 f6d06844 f60b5c64 c0138264 c016e451 c013e47d Dec 15 14:36:41 User-PC kernel: [175576.124181] Call Trace: Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c01087d8>] ? sched_clock+0x8/0x10 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c0138264>] ? enqueue_entity+0x174/0x440 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c016e451>] ? sched_clock_cpu+0x131/0x190 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c013e47d>] ? select_task_rq_fair+0x2ad/0x730 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c0524fc1>] ? nf_iterate+0x71/0x90 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<f83e4914>] ? br_handle_frame_finish+0x184/0x220 [bridge] Dec 15 14:36:41 User-PC kernel: [175576.124181] [<f83e4790>] ? br_handle_frame_finish+0x0/0x220 [bridge] Dec 15 14:36:41 User-PC kernel: [175576.124181] [<f83e46e9>] ? br_handle_frame+0x189/0x230 [bridge] Dec 15 14:36:41 User-PC kernel: [175576.124181] [<f83e4790>] ? br_handle_frame_finish+0x0/0x220 [bridge] Dec 15 14:36:41 User-PC kernel: [175576.124181] [<f83e4560>] ? br_handle_frame+0x0/0x230 [bridge] Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c04ff026>] ? __netif_receive_skb+0x1b6/0x5b0 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c04f7a30>] ? skb_copy_bits+0x110/0x210 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c0503a7f>] ? netif_receive_skb+0x6f/0x80 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<f82cb74c>] ? ieee80211_deliver_skb+0x8c/0x1a0 [mac80211] Dec 15 14:36:41 User-PC kernel: [175576.124181] [<f82cc836>] ? ieee80211_rx_handlers+0xeb6/0x1aa0 [mac80211] Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c04ff1f0>] ? __netif_receive_skb+0x380/0x5b0 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c016e242>] ? sched_clock_local+0xb2/0x190 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c012b688>] ? default_spin_lock_flags+0x8/0x10 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c05d83df>] ? _raw_spin_lock_irqsave+0x2f/0x50 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<f82cd621>] ? ieee80211_prepare_and_rx_handle+0x201/0xa90 [mac80211] Dec 15 14:36:41 User-PC kernel: [175576.124181] [<f82ce154>] ? ieee80211_rx+0x2a4/0x830 [mac80211] Dec 15 14:36:41 User-PC kernel: [175576.124181] [<f815a8d6>] ? iwl_update_stats+0xa6/0x2a0 [iwlcore] Dec 15 14:36:41 User-PC kernel: [175576.124181] [<f8499212>] ? iwlagn_rx_reply_rx+0x292/0x3b0 [iwlagn] Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c05d83df>] ? _raw_spin_lock_irqsave+0x2f/0x50 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<f8483697>] ? iwl_rx_handle+0xe7/0x350 [iwlagn] Dec 15 14:36:41 User-PC kernel: [175576.124181] [<f8486ab7>] ? iwl_irq_tasklet+0xf7/0x5c0 [iwlagn] Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c01aece1>] ? __rcu_process_callbacks+0x201/0x2d0 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c0150d05>] ? tasklet_action+0xc5/0x100 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c0150a07>] ? __do_softirq+0x97/0x1d0 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c05d910c>] ? nmi_stack_correct+0x2f/0x34 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c0150970>] ? __do_softirq+0x0/0x1d0 Dec 15 14:36:41 User-PC kernel: [175576.124181] <IRQ> Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c01508f5>] ? irq_exit+0x65/0x70 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c05df062>] ? do_IRQ+0x52/0xc0 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c01036b0>] ? common_interrupt+0x30/0x38 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c03a1fc2>] ? intel_idle+0xc2/0x160 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c04daebb>] ? cpuidle_idle_call+0x6b/0x100 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c0101dea>] ? cpu_idle+0x8a/0xf0 Dec 15 14:36:41 User-PC kernel: [175576.124181] [<c05d2702>] ? start_secondary+0x1e8/0x1ee Cc: David Miller <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-03 11:26:34 -08:00
Paul Gortmaker	40cd201e37	tipc: update log.h re-include protection to reflect new name The tipc/dbg.h file was recently renamed to tipc/log.h, but the re-include define was not updated accordingly. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 14:56:18 -08:00
Allan Stephens	a016892cd6	tipc: remove extraneous braces from single statements Cleans up TIPC's source code to eliminate the presence of unnecessary use of {} around single statements. These changes are purely cosmetic and do not alter the operation of TIPC in any way. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:57 -08:00
Allan Stephens	e3ec9c7d5e	tipc: remove zeroing assignments to static global variables Cleans up TIPC's source code to eliminate the needless initialization of static variables to zero. These changes are purely cosmetic and do not alter the operation of TIPC in any way. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:57 -08:00
Allan Stephens	2db9983a43	tipc: split variable assignments out of conditional expressions Cleans up TIPC's source code to eliminate assigning values to variables within conditional expressions, improving code readability and reducing warnings from various code checker tools. These changes are purely cosmetic and do not alter the operation of TIPC in any way. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:56 -08:00
Allan Stephens	0e65967e33	tipc: cleanup various cosmetic whitespace issues Cleans up TIPC's source code to eliminate deviations from generally accepted coding conventions relating to leading/trailing white space and white space around commas, braces, cases, and sizeof. These changes are purely cosmetic and do not alter the operation of TIPC in any way. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:56 -08:00
Paul Gortmaker	25860c3bd5	tipc: recode getsockopt error handling for better readability The existing code for the copy to user and error handling at the end of getsockopt isn't easy to follow, due to the excessive use of if/else. By simply using return where appropriate, it can be made smaller and easier to follow at the same time. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:55 -08:00
Allan Stephens	e83504f724	tipc: remove pointless check for NULL prior to kfree It is acceptable to call kfree() with NULL, so these checks are not serving any useful purpose. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:55 -08:00
Allan Stephens	886ef52a8c	tipc: remove redundant #includes Eliminates a number of #include statements that no longer serve any useful purpose. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:54 -08:00
Allan Stephens	6e7e309c62	tipc: Finish streamlining of debugging code Completes the simplification of TIPC's debugging capabilities. By default TIPC includes no debugging code, and any debugging code added by developers that calls the dbg() and dbg_macros() is compiled out. If debugging support is enabled, TIPC prints out some additional data about its internal state when certain abnormal conditions occur, and any developer-added calls to the TIPC debug macros are compiled in. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:54 -08:00
Allan Stephens	8d64a5ba58	tipc: Prune down link-specific debugging code Eliminates most link-specific debugging code in TIPC, which is now largely unnecessary. All calls to the link-specific debugging macros have been removed, as are the macros themselves; in addition, the optional allocation of print buffers to hold debugging information for each link endpoint has been removed. The ability for TIPC to print out helpful diagnostic information when link retransmit failures occur has been retained for the time being, as an aid in tracking down the cause of such failures. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:53 -08:00
Allan Stephens	7ced6890bf	tipc: remove dump() and tipc_dump_dbg() Eliminates calls to two debugging macros that are being completely obsoleted, as well as any associated debugging routines that are no longer required. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:53 -08:00
Allan Stephens	b29f142849	tipc: remove calls to dbg() and msg_dbg() Eliminates obsolete calls to two of TIPC's main debugging macros, as well as a pair of associated debugging routines that are no longer required. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:52 -08:00
Allan Stephens	f5e75269f5	tipc: rename dbg.[ch] to log.[ch] As the first step in removing obsolete debugging code from TIPC the files that implement TIPC's non-debug-related log buffer subsystem are renamed to better reflect their true nature. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:51 -08:00
Allan Stephens	5af5479296	tipc: Remove internal linked list of node objects Eliminates a sorted list TIPC uses to keep track of the neighboring nodes it has links to, since this duplicates information already present in the internal array of node object pointers. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:51 -08:00
Allan Stephens	b0c1e928c8	tipc: Remove user registry subsystem Eliminates routines, data structures, and files that make up TIPC's user registry. The user registry is no longer needed since the native API routines that utilized it no longer exist and there are no longer any internal TIPC services that use it. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:51 -08:00
Allan Stephens	aa70200e00	tipc: Eliminate use of user registry by topology service Simplifies TIPC's network topology service so that it no longer registers its ports with the user registry, since the service doesn't take advantage of any of the registry's capabilities. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:50 -08:00
Allan Stephens	7a488fd3d4	tipc: Eliminate use of user registry by configuration service Simplifies TIPC's configuration service so that it no longer registers its port with the user registry, since the service doesn't take advantage of any of the registry's capabilities. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:50 -08:00
Allan Stephens	8f92df6ad4	tipc: Remove prototype code for supporting multiple clusters Eliminates routines, data structures, and files that were intended to allow TIPC to support a network containing multiple clusters. Currently, TIPC supports only networks consisting of a single cluster within a single zone, so this code is unnecessary. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:49 -08:00
Allan Stephens	51a8e4dee7	tipc: Remove prototype code for supporting inter-cluster routing Eliminates routines and data structures that were intended to allow TIPC to route messages to other clusters. Currently, TIPC supports only networks consisting of a single cluster within a single zone, so this code is unnecessary. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:48 -08:00
Allan Stephens	08c80e9a03	tipc: Remove prototype code for supporting slave nodes Simplifies routines and data structures that were intended to allow TIPC to support slave nodes (i.e. nodes that did not have links to all of the other nodes in its cluster, forcing TIPC to route messages that it could not deliver directly through a non-slave node). Currently, TIPC supports only networks containing non-slave nodes, so this code is unnecessary. Note: The latest edition of the TIPC 2.0 Specification has eliminated the concept of slave nodes entirely. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:48 -08:00
Allan Stephens	51f98a8d70	tipc: Remove prototype code for supporting multiple zones Eliminates routines, data structures, and files that were intended to allows TIPC to support a network containing multiple zones. Currently, TIPC supports only networks consisting of a single cluster within a single zone, so this code is unnecessary. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-01 13:57:47 -08:00
Eric Dumazet	18c8d82ae5	sfq: fix slot_dequeue_head() slot_dequeue_head() should make sure slot skb chain is correct in both ways, or we can crash if all possible flows are in use. Jarek pointed out slot_queue_init() can now be done in sfq_init() once, instead each time a flow is setup. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-31 12:48:55 -08:00
Eric Dumazet	eeaeb068f1	sch_sfq: allow big packets and be fair SFQ is currently 'limited' to small packets, because it uses a 15bit allotment number per flow. Introduce a scale by 8, so that we can handle full size TSO/GRO packets. Use appropriate handling to make sure allot is positive before a new packet is dequeued, so that fairness is respected. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Jarek Poplawski <jarkao2@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-31 12:47:37 -08:00
Dan Rosenberg	9f260e0efa	CAN: Use inode instead of kernel address for /proc file Since the socket address is just being used as a unique identifier, its inode number is an alternative that does not leak potentially sensitive information. CC-ing stable because MITRE has assigned CVE-2010-4565 to the issue. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-31 11:13:27 -08:00
Shmulik Ravid	7c14c3f10e	dcbnl: cleanup A couple of small cleanups for patches: [net-next-2.6 PATCH 1/3] dcbnl: add support for ieee8021Qaz attributes [net-next-2.6 PATCH 2/3] dcbnl: add appliction tlv handlers [net-next-2.6 PATCH 3/3] net_dcb: add application notifiers Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-31 10:52:09 -08:00
Shmulik Ravid	ea45fe4e17	dcbnl: adding DCBX feature flags get-set Adding a pair of set-get routines to dcbnl for setting the negotiation flags of the various DCB features. Conforms to the CEE flavor of DCBX The user sets these flags (enable, advertise, willing) for each feature to be used by the DCBX engine. The 'get' routine returns which of the features is enabled after the negotiation. This patch is dependent on the following patches: [net-next-2.6 PATCH 1/3] dcbnl: add support for ieee8021Qaz attributes [net-next-2.6 PATCH 2/3] dcbnl: add appliction tlv handlers [net-next-2.6 PATCH 3/3] net_dcb: add application notifiers Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-31 10:50:54 -08:00
Shmulik Ravid	6241b6259b	dcbnl: adding DCBX engine capability Adding an optional DCBX capability and a pair for get-set routines for setting the device DCBX mode. The DCBX capability is a bit field of supported attributes. The user is expected to set the DCBX mode with a subset of the advertised attributes. This patch is dependent on the following patches: [net-next-2.6 PATCH 1/3] dcbnl: add support for ieee8021Qaz attributes [net-next-2.6 PATCH 2/3] dcbnl: add appliction tlv handlers [net-next-2.6 PATCH 3/3] net_dcb: add application notifiers Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-31 10:50:54 -08:00
John Fastabend	96b99684e3	net_dcb: add application notifiers DCBx applications priorities can be changed dynamically. If application stacks are expected to keep the skb priority consistent with the dcbx priority the stack will need to be notified when these changes occur. This patch adds application notifiers for the stack to register with. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-31 10:47:46 -08:00
John Fastabend	9ab933ab2c	dcbnl: add appliction tlv handlers This patch adds application tlv handlers. Networking stacks may use the application priority to set the skb priority of their stack using the negoatiated dcbx priority. This patch provides the dcb_{get\|set}app() routines for the stack to query these parameters. Notice lower layer drivers can use the dcbnl_ops routines if additional handling is needed. Perhaps in the firmware case for example Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-31 10:47:45 -08:00
John Fastabend	3e29027af4	dcbnl: add support for ieee8021Qaz attributes The IEEE8021Qaz is the IEEE standard version of CEE. The standard has had enough significant changes from the CEE version that many of the CEE attributes have no meaning in the new spec or do not easily map to IEEE standards. Rather then attempt to create a complicated mapping between CEE and IEEE standards this patch adds a nested IEEE attribute to the list of DCB attributes. The policy is, [DCB_ATTR_IFNAME] [DCB_ATTR_STATE] ... [DCB_ATTR_IEEE] [DCB_ATTR_IEEE_ETS] [DCB_ATTR_IEEE_PFC] [DCB_ATTR_IEEE_APP_TABLE] [DCB_ATTR_IEEE_APP] ... The following dcbnl_rtnl_ops routines were added to handle the IEEE standard, int (ieee_getets) (struct net_device , struct ieee_ets ); int (ieee_setets) (struct net_device , struct ieee_ets ); int (ieee_getpfc) (struct net_device , struct ieee_pfc ); int (ieee_setpfc) (struct net_device , struct ieee_pfc ); int (ieee_getapp) (struct net_device , struct dcb_app ); int (ieee_setapp) (struct net_device , struct dcb_app ); Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-31 10:47:45 -08:00
David S. Miller	17f7f4d9fc	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/fib_frontend.c	2010-12-26 22:37:05 -08:00
Linus Torvalds	d7c1255a3a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits) ipv4: dont create routes on down devices epic100: hamachi: yellowfin: Fix skb allocation size sundance: Fix oopses with corrupted skb_shared_info Revert "ipv4: Allow configuring subnets as local addresses" USB: mcs7830: return negative if auto negotiate fails irda: prevent integer underflow in IRLMP_ENUMDEVICES tcp: fix listening_get_next() atl1c: Do not use legacy PCI power management mac80211: fix mesh forwarding MAINTAINERS: email address change net: Fix range checks in tcf_valid_offset(). net_sched: sch_sfq: fix allot handling hostap: remove netif_stop_queue from init mac80211/rt2x00: add ieee80211_tx_status_ni() typhoon: memory corruption in typhoon_get_drvinfo() net: Add USB PID for new MOSCHIP USB ethernet controller MCS7832 variant net_sched: always clone skbs ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed. netlink: fix gcc -Wconversion compilation warning asix: add USB ID for Logitec LAN-GTJ U2A ...	2010-12-26 12:06:56 -08:00
Eric Dumazet	fc75fc8339	ipv4: dont create routes on down devices In ip_route_output_slow(), instead of allowing a route to be created on a not UPed device, report -ENETUNREACH immediately. # ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0 # (Note : tunl1 is down) # ping -I tunl1 10.1.2.3 PING 10.1.2.3 (10.1.2.3) from 192.168.18.5 tunl1: 56(84) bytes of data. (nothing) # ./a.out tunl1 # ip tunnel del tunl1 Message from syslogd@shelby at Dec 22 10:12:08 ... kernel: unregister_netdevice: waiting for tunl1 to become free. Usage count = 3 After patch: # ping -I tunl1 10.1.2.3 connect: Network is unreachable Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-25 20:05:31 -08:00
Tejun Heo	7f6b0db9f6	net/dsa: don't use flush_scheduled_work() flush_scheduled_work() is deprecated and scheduled to be removed. Directly flush dst->link_poll_work on remove instead. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>	2010-12-24 15:59:06 +01:00
David S. Miller	e058464990	Revert "ipv4: Allow configuring subnets as local addresses" This reverts commit `4465b46900`. Conflicts: net/ipv4/fib_frontend.c As reported by Ben Greear, this causes regressions: > Change `4465b46900` caused rules > to stop matching the input device properly because the > FLOWI_FLAG_MATCH_ANY_IIF is always defined in ip_dev_find(). > > This breaks rules such as: > > ip rule add pref 512 lookup local > ip rule del pref 0 lookup local > ip link set eth2 up > ip -4 addr add 172.16.0.102/24 broadcast 172.16.0.255 dev eth2 > ip rule add to 172.16.0.102 iif eth2 lookup local pref 10 > ip rule add iif eth2 lookup 10001 pref 20 > ip route add 172.16.0.0/24 dev eth2 table 10001 > ip route add unreachable 0/0 table 10001 > > If you had a second interface 'eth0' that was on a different > subnet, pinging a system on that interface would fail: > > [root@ct503-60 ~]# ping 192.168.100.1 > connect: Invalid argument Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-23 12:03:57 -08:00
David S. Miller	a130883d95	Merge branch 'for-davem' of ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2010-12-23 10:13:30 -08:00
Dan Rosenberg	fdac1e0697	irda: prevent integer underflow in IRLMP_ENUMDEVICES If the user-provided len is less than the expected offset, the IRLMP_ENUMDEVICES getsockopt will do a copy_to_user() with a very large size value. While this isn't be a security issue on x86 because it will get caught by the access_ok() check, it may leak large amounts of kernel heap on other architectures. In any event, this patch fixes it. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-23 10:09:43 -08:00
Jiri Kosina	d9f4fbaf70	tcp: cleanup of cwnd initialization in tcp_init_metrics() Commit `86bcebafc5` ("tcp: fix >2 iw selection") fixed a case when congestion window initialization has been mistakenly omitted by introducing cwnd label and putting backwards goto from the end of the function. This makes the code unnecessarily tricky to read and understand on a first sight. Shuffle the code around a little bit to make it more obvious. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-23 09:54:26 -08:00
Eric Dumazet	1bde5ac493	tcp: fix listening_get_next() Alexey Vlasov found /proc/net/tcp could sometime loop and display millions of sockets in LISTEN state. In 2.6.29, when we converted TCP hash tables to RCU, we left two sk_next() calls in listening_get_next(). We must instead use sk_nulls_next() to properly detect an end of chain. Reported-by: Alexey Vlasov <renton@renton.name> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-23 09:32:46 -08:00
David S. Miller	b7e03ec9a6	Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-12-22 17:34:40 -08:00
Gustavo F. Padovan	17f9cc3124	Bluetooth: Improve handling of HCI control channel in bind Does not allow any channel different of HCI_CHANNEL_RAW and HCI_CHANNEL_CONTROL to bind. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-22 23:00:34 -02:00
Johan Hedberg	23bb57633d	Bluetooth: Fix __hci_request synchronization for hci_open_dev The initialization function used by hci_open_dev (hci_init_req) sends many different HCI commands. The __hci_request function should only return when all of these commands have completed (or a timeout occurs). Several of these commands cause hci_req_complete to be called which causes __hci_request to return prematurely. This patch fixes the issue by adding a new hdev->req_last_cmd variable which is set during the initialization procedure. The hci_req_complete function will no longer mark the request as complete until the command matching hdev->req_last_cmd completes. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-22 22:58:07 -02:00
Johan Hedberg	c71e97bfaa	Bluetooth: Add management events for controller addition & removal This patch adds Bluetooth Management interface events for controller addition and removal. The events correspond to the existing HCI_DEV_REG and HCI_DEV_UNREG stack internal events. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-22 22:58:00 -02:00
Johan Hedberg	f7b64e69c7	Bluetooth: Add read_info management command This patch implements the read_info command which is used to fetch basic info about an adapter. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-22 22:57:51 -02:00
Johan Hedberg	faba42eb2a	Bluetooth: Add read_index_list management command This patch implements the read_index_list command through which userspace can get a list of current adapter indices. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-22 22:57:44 -02:00
Johan Hedberg	02d981292a	Bluetooth: Add read_version management command This patch implements the initial read_version command that userspace will use before any other management interface operations. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-22 22:57:37 -02:00
Johan Hedberg	e41d8b4e13	Bluetooth: Add error handling for managment command handlers The command handlers for bluetooth management messaging should be able to report errors (such as memory allocation failures) to the higher levels in the call stack. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-22 22:56:56 -02:00
Johannes Berg	172128468f	mac80211: cleanup select_queue There's a redundant rcu_read_lock/unlock pair, a redundant variable, and a few redundant accesses to the 1d_to_ac array. Fix this to make the code neater and easier to follow. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-22 15:44:22 -05:00
Eric Dumazet	ee09b3c1cf	sfq: fix sfq class stats handling sfq_walk() runs without qdisc lock. By the time it selects a non empty hash slot and sfq_dump_class_stats() is run (with lock held), slot might have been freed : We then access q->slots[SFQ_EMPTY_SLOT], out of bounds, and crash in slot_queue_walk() On previous kernels, bug is here but out of bounds qs[SFQ_DEPTH] and allot[SFQ_DEPTH] are located in struct sfq_sched_data, so no illegal memory access happens, only possibly wrong data reported to user. Also, slot_dequeue_tail() should make sure slot skb chain is correctly terminated, or sfq_dump_class_stats() can access freed skbs. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-22 11:39:59 -08:00
Luciano Coelho	65a6538a56	mac80211: check for CONFIG_MAC80211_LEDS in the tpt_led_trigger declaration If CONFIG_MAC80211_LEDS is not set, ieee80211_i.h was failing to compile, because struct led_trigger is only declared when CONFIG_LEDS_TRIGGERS is set. This patch adds ifdefs around the tpt_led_trigger declaration in ieee80211_i.h to avoid the problem. Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-22 14:36:05 -05:00
Johannes Berg	67408c8c7b	mac80211: selective throughput LED trigger active The throughput LED trigger was always active when the radio was enabled. In most cases that's likely the desired behaviour, but iwlwifi requires it to be only active when one of the virtual interfaces is actually "connected" in some way. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-22 14:33:37 -05:00
Johannes Berg	e1e5406854	mac80211: add throughput based LED blink trigger iwlwifi and other drivers like to blink their LED based on throughput. Implement this generically in mac80211, based on a throughput table the driver specifies. That way, drivers can set the blink frequencies depending on their desired behaviour and max throughput. All the drivers need to do is provide an LED class device, best with blink hardware offload. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-22 14:33:37 -05:00
Johannes Berg	fe67c913f1	mac80211: make LED trigger names available early The throughput trigger will require doing LED classdev/trigger handling before register_hw(), so drivers should have access to the trigger names before it. If trigger registration fails, this will still make the trigger name available, but that's not a big problem since the default trigger will the simply not be found. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-22 14:33:37 -05:00
John W. Linville	63e35cd9bd	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem Conflicts: drivers/net/wireless/iwlwifi/iwl-1000.c drivers/net/wireless/iwlwifi/iwl-6000.c drivers/net/wireless/iwlwifi/iwl-core.h	2010-12-22 14:27:21 -05:00
Johannes Berg	b51aff057c	mac80211: fix mesh forwarding Under memory pressure, the mac80211 mesh code may helpfully print a message that it failed to clone a mesh frame and then will proceed to crash trying to use it anyway. Fix that. Cc: stable@kernel.org [2.6.27+] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Javier Cardona <javier@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-22 13:36:35 -05:00
Jiri Kosina	4b7bd36470	Merge branch 'master' into for-next Conflicts: MAINTAINERS arch/arm/mach-omap2/pm24xx.c drivers/scsi/bfa/bfa_fcpim.c Needed to update to apply fixes for which the old branch was too outdated.	2010-12-22 18:57:02 +01:00
Eric Dumazet	12b16dadbc	filter: optimize accesses to ancillary data We can translate pseudo load instructions at filter check time to dedicated instructions to speed up filtering and avoid one switch(). libpcap currently uses SKF_AD_PROTOCOL, but custom filters probably use other ancillary accesses. Note : I made the assertion that ancillary data was always accessed with BPF_LD\|BPF_?\|BPF_ABS instructions, not with BPF_LD\|BPF_?\|BPF_IND ones (offset given by K constant, not by K + X register) On x86_64, this saves a few bytes of text : # size net/core/filter.o.* text data bss dec hex filename 4864 0 0 4864 1300 net/core/filter.o.new 4944 0 0 4944 1350 net/core/filter.o.old Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-21 12:30:12 -08:00
Eric Dumazet	70978182d4	net: timestamp cloned packet in dev_queue_xmit_nit Le vendredi 17 décembre 2010 à 10:26 +0100, Eric Dumazet a écrit : > > I think we can add this after latest Changli patch : > > He does one skb_clone() before calling the sniffers. > We could set timestamp on this clone, instead of original skb. > > Problem solved. > [PATCH net-next-2.6] net: timestamp cloned packet in dev_queue_xmit_nit Now we do one clone of skb if at least one sniffer might take packet, we also can do the skb timestamping on the clone and let original packet unchanged. This is a generalization of commit `8caf153974` (net: sch_netem: Fix an inconsistency in ingress netem timestamps.) This way, we can have a good idea when packets are delivered to our stack (tcpdump -i ifb0), while a tcpdump on original device gives timestamps right before ingressing. This also speedup our stack, avoiding taking timestamps if not needed. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Changli Gao <xiaosuo@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-21 10:50:38 -08:00
Joe Perches	b3bcedadf2	net/sunrpc/clnt.c: Convert sprintf_symbol to %ps Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-21 11:51:26 -05:00
Nandita Dukkipati	356f039822	TCP: increase default initial receive window. This patch changes the default initial receive window to 10 mss (defined constant). The default window is limited to the maximum of 101460 and 2mss (when mss > 1460). draft-ietf-tcpm-initcwnd-00 is a proposal to the IETF that recommends increasing TCP's initial congestion window to 10 mss or about 15KB. Leading up to this proposal were several large-scale live Internet experiments with an initial congestion window of 10 mss (IW10), where we showed that the average latency of HTTP responses improved by approximately 10%. This was accompanied by a slight increase in retransmission rate (0.5%), most of which is coming from applications opening multiple simultaneous connections. To understand the extreme worst case scenarios, and fairness issues (IW10 versus IW3), we further conducted controlled testbed experiments. We came away finding minimal negative impact even under low link bandwidths (dial-ups) and small buffers. These results are extremely encouraging to adopting IW10. However, an initial congestion window of 10 mss is useless unless a TCP receiver advertises an initial receive window of at least 10 mss. Fortunately, in the large-scale Internet experiments we found that most widely used operating systems advertised large initial receive windows of 64KB, allowing us to experiment with a wide range of initial congestion windows. Linux systems were among the few exceptions that advertised a small receive window of 6KB. The purpose of this patch is to fix this shortcoming. References: 1. A comprehensive list of all IW10 references to date. http://code.google.com/speed/protocols/tcpm-IW10.html 2. Paper describing results from large-scale Internet experiments with IW10. http://ccr.sigcomm.org/drupal/?q=node/621 3. Controlled testbed experiments under worst case scenarios and a fairness study. http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf 4. Raw test data from testbed experiments (Linux senders/receivers) with initial congestion and receive windows of both 10 mss. http://research.csc.ncsu.edu/netsrv/?q=content/iw10 5. Internet-Draft. Increasing TCP's Initial Window. https://datatracker.ietf.org/doc/draft-ietf-tcpm-initcwnd/ Signed-off-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-20 21:33:00 -08:00
Eric Dumazet	eda83e3b63	net_sched: sch_sfq: better struct layouts Here is a respin of patch. I'll send a short patch to make SFQ more fair in presence of large packets as well. Thanks [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts This patch shrinks sizeof(struct sfq_sched_data) from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and reduce text size as well. text data bss dec hex filename 4821 152 0 4973 136d old/net/sched/sch_sfq.o 4627 136 0 4763 129b new/net/sched/sch_sfq.o All data for a slot/flow is now grouped in a compact and cache friendly structure, instead of being spreaded in many different points. struct sfq_slot { struct sk_buff skblist_next; struct sk_buff skblist_prev; sfq_index qlen; /* number of skbs in skblist / sfq_index next; / next slot in sfq chain / struct sfq_head dep; / anchor in dep[] chains / unsigned short hash; / hash value (index in ht[]) / short allot; / credit for this slot */ }; Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jarek Poplawski <jarkao2@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-20 21:32:59 -08:00
Linus Torvalds	9d5004fcf6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: handle partial result from get_user_pages ceph: mark user pages dirty on direct-io reads ceph: fix null pointer dereference in ceph_init_dentry for nfs reexport ceph: fix direct-io on non-page-aligned buffers ceph: fix msgr_init error path	2010-12-20 21:32:20 -08:00
David S. Miller	d9993be65a	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-12-20 13:24:14 -08:00
Eric Dumazet	aa3e219997	net_sched: sch_sfq: fix allot handling When deploying SFQ/IFB here at work, I found the allot management was pretty wrong in sfq, even changing allot from short to int... We should init allot for each new flow, not using a previous value found in slot. Before patch, I saw bursts of several packets per flow, apparently denying the default "quantum 1514" limit I had on my SFQ class. class sfq 11:1 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 7p requeues 0 allot 11546 class sfq 11:46 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 1p requeues 0 allot -23873 class sfq 11:78 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 5p requeues 0 allot 11393 After patch, better fairness among each flow, allot limit being respected, allot is positive : class sfq 11:e parent 11: (dropped 0, overlimits 0 requeues 86) backlog 0b 3p requeues 86 allot 596 class sfq 11:94 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 3p requeues 0 allot 1468 class sfq 11:a4 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 4p requeues 0 allot 650 class sfq 11:bb parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 3p requeues 0 allot 596 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-20 13:18:16 -08:00
Eric Dumazet	c426626324	net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() We currently return for each active SFQ slot the number of packets in queue. We can also give number of bytes accounted for these packets. tc -s class show dev ifb0 Before patch : class sfq 11:3d9 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 3p requeues 0 allot 1266 After patch : class sfq 11:3e4 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 4380b 3p requeues 0 allot 1212 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-20 13:13:56 -08:00
Felix Fietkau	f8a0a78148	mac80211: fix potentially redundant skb data copying When an skb is shared, it needs to be duplicated, along with its data buffer. If the skb does not have enough headroom, using skb_copy might cause the data buffer to be copied twice (once by skb_copy and once by pskb_expand_head). Fix this by using skb_clone initially and letting ieee80211_skb_resize sort out the rest. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-20 14:52:17 -05:00
Felix Fietkau	4cd06a344d	mac80211: skip unnecessary pskb_expand_head calls If the skb is not cloned and we don't need any extra headroom, there is no point in reallocating the skb head. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-20 14:52:17 -05:00
Felix Fietkau	489ee9195a	mac80211: fix initialization of skb->cb in ieee80211_subif_start_xmit The change 'mac80211: Fix BUG in pskb_expand_head when transmitting shared skbs' added a check for copying the skb if it's shared, however the tx info variable still points at the cb of the old skb Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-20 14:52:17 -05:00
Javier Cardona	61ad539459	mac80211: Remove unused third address from mesh address extension header. The Mesh Control header only includes 0, 1 or 2 addresses. If there is one address, it should be interpreted as Address 4. If there are 2, they are interpreted as Addresses 5 and 6 (Address 4 being the 4th address in the 802.11 header). The address extension used to hold up to 3 addresses instead of the current 2. I'm not sure which draft version changed this, but it is very unlikely that it will change again given the state of the approval process of this draft. See section 7.1.3.6.3 in current draft (8.0). Also, note that the extra address that I'm removing was not being used, so this change has no effect on over-the-air frame formats. But I thought I better remove it before someone does start using it. Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-20 14:49:47 -05:00
Bruno Randolf	39fd5de447	nl80211: Export available antennas Export the information which antennas are available for configuration as TX or RX antennas via nl80211. Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-20 14:49:47 -05:00
Bruno Randolf	7f531e03ab	cfg80211: Separate available antennas for RX and TX As has been pointed out by Daniel Halperin some devices (e.g. Intel IWL5100) can only TX from a subset of RX antennas, so use separate availability masks for RX and TX. Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-20 14:46:58 -05:00
Javier Cardona	c7108a7111	mac80211: Send mesh non-HWMP path selection frames to userspace Let path selection frames for protocols other than HWMP be sent to userspace via NL80211_CMD_REGISTER_FRAME. Also allow userspace to send and receive mesh path selection frames. Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-20 14:46:58 -05:00
Javier Cardona	c80d545da3	mac80211: Let userspace enable and configure vendor specific path selection. Userspace will now be allowed to toggle between the default path selection algorithm (HWMP, implemented in the kernel), and a vendor specific alternative. Also in the same patch, allow userspace to add information elements to mesh beacons. This is accordance with the Extensible Path Selection Framework specified in version 7.0 of the 802.11s draft. Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-20 14:46:57 -05:00
Javier Cardona	24bdd9f4c9	mac80211: Rename mesh_params to mesh_config to prepare for mesh_setup Mesh parameters can be to setup a mesh or to configure it. This patch renames the ambiguous name mesh_params to mesh_config in preparation for mesh_setup. Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-20 14:46:57 -05:00
David S. Miller	6561a3b12d	ipv4: Flush per-ns routing cache more sanely. Flush the routing cache only of entries that match the network namespace in which the purge event occurred. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>	2010-12-20 10:37:19 -08:00
Sven Eckelmann	53320fe3bb	batman-adv: Return hna count on local buffer fill hna_local_fill_buffer must return the number of added hna entries and not the last checked hash bucket. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-20 10:32:03 -08:00
Joe Perches	58231186c4	pktgen: Remove unnecessary prefix from pr_<level> Remove "pktgen: " prefix string from one pr_info. pr_fmt adds it, so this is a duplicate. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-20 10:30:06 -08:00
Shan Wei	4c306a9291	net: kill unused macros These macros never be used, so remove them. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-19 21:59:35 -08:00
Changli Gao	71d9dec24d	net: increase skb->users instead of skb_clone() In dev_queue_xmit_nit(), we have to clone skbs as we need to mangle skbs, however, we don't need to clone skbs for all the packet_types. Except for the first packet_type, we increase skb->users instead of skb_clone(). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-19 21:50:20 -08:00
David Stevens	ad0081e43a	ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed. This patch modifies IPsec6 to fragment IPv6 packets that are locally generated as needed. This version of the patch only fragments in tunnel mode, so that fragment headers will not be obscured by ESP in transport mode. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-19 20:22:23 -08:00
stephen hemminger	d1ed113f16	ipv6: remove duplicate neigh_ifdown When device is being set to down, neigh_ifdown was being called twice. Once from addrconf notifier and once from ndisc notifier. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-18 22:01:17 -08:00
stephen hemminger	bc3ef6605e	ipv6: fib6_ifdown cleanup Remove (unnecessary) casts to make code cleaner. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-18 22:01:16 -08:00
Joe Perches	f3c0ceea83	net/sunrpc/auth_gss/gss_krb5_crypto.c: Use normal negative error value return And remove unnecessary double semicolon too. No effect to code, as test is != 0. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-12-17 15:48:22 -05:00
Shan Wei	66c941f4aa	net: sunrpc: kill unused macros These macros never be used for several years. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-12-17 15:48:21 -05:00
NeilBrown	3942302ea9	sunrpc: svc_sock_names should hold ref to socket being closed. Currently svc_sock_names calls svc_close_xprt on a svc_sock to which it does not own a reference. As soon as svc_close_xprt sets XPT_CLOSE, the socket could be freed by a separate thread (though this is a very unlikely race). It is safer to hold a reference while calling svc_close_xprt. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-12-17 15:48:19 -05:00
NeilBrown	7c96aef759	sunrpc: remove xpt_pool The xpt_pool field is only used for reporting BUGs. And it isn't used correctly. In particular, when it is cleared in svc_xprt_received before XPT_BUSY is cleared, there is no guarantee that either the compiler or the CPU might not re-order to two assignments, just setting xpt_pool to NULL after XPT_BUSY is cleared. If a different cpu were running svc_xprt_enqueue at this moment, it might see XPT_BUSY clear and then xpt_pool non-NULL, and so BUG. This could be fixed by calling smp_mb__before_clear_bit() before the clear_bit. However as xpt_pool isn't really used, it seems safest to simply remove xpt_pool. Another alternate would be to change the clear_bit to clear_bit_unlock, and the test_and_set_bit to test_and_set_bit_lock. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-12-17 15:48:18 -05:00
David S. Miller	b4aa9e05a6	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x.h drivers/net/wireless/iwlwifi/iwl-1000.c drivers/net/wireless/iwlwifi/iwl-6000.c drivers/net/wireless/iwlwifi/iwl-core.h drivers/vhost/vhost.c	2010-12-17 12:27:22 -08:00
J. Bruce Fields	ec66ee3797	Merge commit 'v2.6.37-rc6' into for-2.6.38	2010-12-17 13:29:07 -05:00
Henry C Chang	361cf40519	ceph: handle partial result from get_user_pages The get_user_pages() helper can return fewer than the requested pages. Error out in that case, and clean up the partial result. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-17 09:55:59 -08:00
Henry C Chang	b6aa5901c7	ceph: mark user pages dirty on direct-io reads For read operation, we have to set the argument _write_ of get_user_pages to 1 since we will write data to pages. Also, we need to SetPageDirty before releasing these pages. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-17 09:54:40 -08:00
stephen hemminger	29ba5fed1b	ipv6: don't flush routes when setting loopback down When loopback device is being brought down, then keep the route table entries because they are special. The entries in the local table for linklocal routes and ::1 address should not be purged. This is a sub optimal solution to the problem and should be replaced by a better fix in future. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 18:26:26 -08:00
Wei Yongjun	7d743b7e95	sctp: fix the return value of getting the sctp partial delivery point Get the sctp partial delivery point using SCTP_PARTIAL_DELIVERY_POINT socket option should return 0 if success, not -ENOTSUPP. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 14:48:44 -08:00
Michał Mirosław	55508d601d	net: Use skb_checksum_start_offset() Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset(). Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb). Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 14:43:14 -08:00
David Stevens	76d661586c	bridge: fix IPv6 queries for bridge multicast snooping This patch fixes a missing ntohs() for bridge IPv6 multicast snooping. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 14:41:23 -08:00
Octavian Purdila	fcbdf09d96	net: fix nulls list corruptions in sk_prot_alloc Special care is taken inside sk_port_alloc to avoid overwriting skc_node/skc_nulls_node. We should also avoid overwriting skc_bind_node/skc_portaddr_node. The patch fixes the following crash: BUG: unable to handle kernel paging request at fffffffffffffff0 IP: [<ffffffff812ec6dd>] udp4_lib_lookup2+0xad/0x370 [<ffffffff812ecc22>] __udp4_lib_lookup+0x282/0x360 [<ffffffff812ed63e>] __udp4_lib_rcv+0x31e/0x700 [<ffffffff812bba45>] ? ip_local_deliver_finish+0x65/0x190 [<ffffffff812bbbf8>] ? ip_local_deliver+0x88/0xa0 [<ffffffff812eda35>] udp_rcv+0x15/0x20 [<ffffffff812bba45>] ip_local_deliver_finish+0x65/0x190 [<ffffffff812bbbf8>] ip_local_deliver+0x88/0xa0 [<ffffffff812bb2cd>] ip_rcv_finish+0x32d/0x6f0 [<ffffffff8128c14c>] ? netif_receive_skb+0x99c/0x11c0 [<ffffffff812bb94b>] ip_rcv+0x2bb/0x350 [<ffffffff8128c14c>] netif_receive_skb+0x99c/0x11c0 Signed-off-by: Leonard Crestez <lcrestez@ixiacom.com> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 14:26:56 -08:00
Octavian Purdila	443457242b	net: factorize sync-rcu call in unregister_netdevice_many Add dev_close_many and dev_deactivate_many to factorize another sync-rcu operation on the netdevice unregister path. $ modprobe dummy numdummies=10000 $ ip link set dev dummy* up $ time rmmod dummy Without the patch With the patch real 0m 24.63s real 0m 5.15s user 0m 0.00s user 0m 0.00s sys 0m 6.05s sys 0m 5.14s Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 14:04:44 -08:00
Sven Eckelmann	c6c8fea297	net: Add batman-adv meshing protocol B.A.T.M.A.N. (better approach to mobile ad-hoc networking) is a routing protocol for multi-hop ad-hoc mesh networks. The networks may be wired or wireless. See http://www.open-mesh.org/ for more information and user space tools. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 13:44:24 -08:00
Changli Gao	b236da6931	net: use NUMA_NO_NODE instead of the magic number -1 Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 13:16:06 -08:00
Vladislav Zolotarov	a3d22a68d7	bnx2x: Take the distribution range definition out of skb_tx_hash() Move the calcualation of the Tx hash for a given hash range into a separate function and define the skb_tx_hash(), which calculates a Tx hash for a [0; dev->real_num_tx_queues - 1] hash values range, using this function (__skb_tx_hash()). Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 13:15:53 -08:00
Andrey Vagin	d3052b557a	ipv6: delete expired route in ip6_pmtu_deliver The first big packets sent to a "low-MTU" client correctly triggers the creation of a temporary route containing the reduced MTU. But after the temporary route has expired, new ICMP6 "packet too big" will be sent, rt6_pmtu_discovery will find the previous EXPIRED route check that its mtu isn't bigger then in icmp packet and do nothing before the temporary route will not deleted by gc. I make the simple experiment: while :; do time ( dd if=/dev/zero bs=10K count=1 \| ssh hostname dd of=/dev/null ) \|\| break; done The "time" reports real 0m0.197s if a temporary route isn't expired, but it reports real 0m52.837s (!!!!) immediately after a temporare route has expired. Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 12:28:13 -08:00
Luis R. Rodriguez	2784fe915c	cfg80211: fix null pointer dereference with a custom regulatory request Once we moved the core regulatory request to the queue and let the scheduler process it last_request will have been left NULL until the schedular decides to process the first request. When this happens and we are loading a driver with a custom regulatory request like all Atheros drivers we end up with a NULL pointer dereference. We fix this by checking if the request was a custom one. BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [<ffffffffa016de87>] freq_reg_info_regd.clone.2+0x27/0x130 [cfg80211] PGD 71f91067 PUD 712b2067 PMD 0 Oops: 0000 [#1] PREEMPT SMP last sysfs file: /sys/devices/pci0000:00/0000:00:1d.7/usb2/2-1/firmware/2-1/loading CPU 0 Modules linked in: ath9k_htc(+) ath9k_common ath9k_hw ath <etc> Pid: 3094, comm: insmod Tainted: G W 2.6.37-rc5-wl #16 INVALID/28427ZQ RIP: 0010:[<ffffffffa016de87>] [<ffffffffa016de87>] freq_reg_info_regd.clone.2+0x27/0x130 [cfg80211] RSP: 0018:ffff88007045db78 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffa047d9a0 RCX: ffff88007045dbd0 RDX: 0000000000004e20 RSI: 000000000024cde0 RDI: ffff8800700483e0 RBP: ffff88007045db98 R08: ffffffffa02f5b40 R09: 0000000000000001 R10: 000000000000000e R11: 0000000000000001 R12: 0000000000000000 R13: ffff88007004e3b0 R14: 0000000000000000 R15: ffff880070048340 FS: 00007f635a707700(0000) GS:ffff880077400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000004 CR3: 00000000708a9000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process insmod (pid: 3094, threadinfo ffff88007045c000, task ffff8800713e3ec0) Stack: ffffffffa047d9a0 0000000000000000 ffff88007004e3b0 0000000000000000 ffff88007045dc08 ffffffffa016e147 000000007045dc08 0000000000000002 ffff8800700483e0 ffffffffa02f5b40 ffff88007045dbd8 0000000000000000 Call Trace: [<ffffffffa016e147>] wiphy_apply_custom_regulatory+0x137/0x1d0 [cfg80211] [<ffffffffa047a690>] ? ath9k_reg_notifier+0x0/0x50 [ath9k_htc] [<ffffffffa02f47f7>] ath_regd_init+0x347/0x430 [ath] [<ffffffffa047b1f5>] ath9k_htc_probe_device+0x6c5/0x960 [ath9k_htc] [<ffffffffa0472a2c>] ath9k_htc_hw_init+0xc/0x30 [ath9k_htc] [<ffffffffa04747e6>] ath9k_hif_usb_probe+0x216/0x3b0 [ath9k_htc] [<ffffffffa03bb6bc>] usb_probe_interface+0x10c/0x210 [usbcore] [<ffffffff812aec26>] driver_probe_device+0x96/0x1c0 [<ffffffff812aedf3>] __driver_attach+0xa3/0xb0 [<ffffffff812aed50>] ? __driver_attach+0x0/0xb0 [<ffffffff812adaae>] bus_for_each_dev+0x5e/0x90 [<ffffffff812ae8c9>] driver_attach+0x19/0x20 [<ffffffff812ae438>] bus_add_driver+0x168/0x320 [<ffffffff812af071>] driver_register+0x71/0x140 [<ffffffff811fc4a8>] ? __raw_spin_lock_init+0x38/0x70 [<ffffffffa03ba39c>] usb_register_driver+0xdc/0x190 [usbcore] [<ffffffffa03a2000>] ? ath9k_htc_init+0x0/0x4f [ath9k_htc] [<ffffffffa047499e>] ath9k_hif_usb_init+0x1e/0x20 [ath9k_htc] [<ffffffffa03a202b>] ath9k_htc_init+0x2b/0x4f [ath9k_htc] [<ffffffff8100212f>] do_one_initcall+0x3f/0x180 [<ffffffff8109ef5b>] sys_init_module+0xbb/0x200 [<ffffffff8100bf52>] system_call_fastpath+0x16/0x1b Code: <etc, who cares> RIP [<ffffffffa016de87>] freq_reg_info_regd.clone.2+0x27/0x130 [cfg80211] RSP <ffff88007045db78> CR2: 0000000000000004 ---[ end trace 79e4193601c8b713 ]--- Reported-by: Sujith Manoharan <Sujith.Manoharan@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-16 15:22:31 -05:00
Jouni Malinen	cf4e594ea7	nl80211: Add notification for dropped Deauth/Disassoc Add a new notification to indicate that a received, unprotected Deauthentication or Disassociation frame was dropped due to management frame protection being in use. This notification is needed to allow user space (e.g., wpa_supplicant) to implement SA Query procedure to recover from association state mismatch between an AP and STA. This is needed to avoid getting stuck in non-working state when MFP (IEEE 802.11w) is used and a protected Deauthentication or Disassociation frame is dropped for any reason. After that, the station would silently discard any unprotected Deauthentication or Disassociation frame that could be indicating that the AP does not have association for the STA (when the Reason Code would be 6 or 7). IEEE Std 802.11w-2009, 11.13 describes this recovery mechanism. Signed-off-by: Jouni Malinen <j@w1.fi> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-16 15:22:30 -05:00
Chuck Lever	bf2695516d	SUNRPC: New xdr_streams XDR decoder API Now that all client-side XDR decoder routines use xdr_streams, there should be no need to support the legacy calling sequence [rpc_rqst , __be32 , RPC res *] anywhere. We can construct an xdr_stream in the generic RPC code, instead of in each decoder function. This is a refactoring change. It should not cause different behavior. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:25 -05:00
Chuck Lever	9f06c719f4	SUNRPC: New xdr_streams XDR encoder API Now that all client-side XDR encoder routines use xdr_streams, there should be no need to support the legacy calling sequence [rpc_rqst , __be32 , RPC arg *] anywhere. We can construct an xdr_stream in the generic RPC code, instead of in each encoder function. Also, all the client-side encoder functions return 0 now, making a return value superfluous. Take this opportunity to convert them to return void instead. This is a refactoring change. It should not cause different behavior. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:25 -05:00
Chuck Lever	1ac7c23e4a	SUNRPC: Determine value of "nrprocs" automatically Clean up. Just fixed a panic where the nrprocs field in a different upper layer client was set by hand incorrectly. Use the compiler-generated method used by the other upper layer protocols. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:25 -05:00
Chuck Lever	4129ccf303	SUNRPC: Avoid return code checking in rpcbind XDR encoder functions Clean up. The trend in the other XDR encoder functions is to BUG() when encoding problems occur, since a problem here is always due to a local coding error. Then, instead of a status, zero is unconditionally returned. Update the rpcbind XDR encoders to behave this way. To finish the update, use the new-style be32_to_cpup() and cpu_to_be32() macros, and compute the buffer sizes using raw integers instead of sizeof(). This matches the conventions used in other XDR functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-12-16 12:37:25 -05:00
KOVACS Krisztian	ae90bdeaea	netfilter: fix compilation when conntrack is disabled but tproxy is enabled The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but failed to update the #ifdef stanzas guarding the defragmentation related fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c. This patch adds the required #ifdefs so that IPv6 tproxy can truly be used without connection tracking. Original report: http://marc.info/?l=linux-netdev&m=129010118516341&w=2 Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-12-15 23:53:41 +01:00
Shan Wei	38cd6b4f52	wireless:mac80211: kill unuse macro MESH_CFG_CMP_LEN in mesh.h Commit `00d3f14c` has removed the references of this macro, but left it only. So remove this definition. commit `00d3f14cf9` Author: Johannes Berg <johannes@sipsolutions.net> Date: Tue Feb 10 21:26:00 2009 +0100 mac80211: use cfg80211s BSS infrastructure Remove all the code from mac80211 to keep track of BSSes and use the cfg80211-provided code completely. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-15 17:04:08 -05:00
Sujith Manoharan	bd2ce6e43f	mac80211: Add timeout to BA session start API Allow drivers or rate control algorithms to specify BlockAck session timeout when initiating an ADDBA transaction. This is useful in cases where maintaining persistent BA sessions does not incur any overhead. The current timeout value of 5000 TUs is retained for all non ath9k/ath9k_htc drivers. Signed-off-by: Sujith Manoharan <Sujith.Manoharan@atheros.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-15 17:03:59 -05:00
Johannes Berg	a293911d4f	nl80211: advertise maximum remain-on-channel duration With the upcoming hardware offload implementation, some devices will have a different maximum duration for the remain-on-channel command. Advertise the maximum duration in mac80211, and make mac80211 set it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-15 17:03:56 -05:00
John W. Linville	1fcfe76a76	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-1000.c drivers/net/wireless/iwlwifi/iwl-6000.c drivers/net/wireless/iwlwifi/iwl-core.h	2010-12-15 16:33:28 -05:00
David S. Miller	82cc4f5cb8	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-12-15 09:43:13 -08:00
Tejun Heo	afe2c511fb	workqueue: convert cancel_rearming_delayed_work[queue]() users to cancel_delayed_work_sync() cancel_rearming_delayed_work[queue]() has been superceded by cancel_delayed_work_sync() quite some time ago. Convert all the in-kernel users. The conversions are completely equivalent and trivial. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: "David S. Miller" <davem@davemloft.net> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: netdev@vger.kernel.org Cc: Anton Vorontsov <cbou@mail.ru> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Alex Elder <aelder@sgi.com> Cc: xfs-masters@oss.sgi.com Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: netfilter-devel@vger.kernel.org Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: linux-nfs@vger.kernel.org	2010-12-15 10:56:11 +01:00
Linus Torvalds	b4fe2a0342	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (75 commits) pppoe.c: Fix kernel panic caused by __pppoe_xmit WAN: Fix a TX IRQ causing BUG() in PC300 and PCI200SYN drivers. bnx2x: Advance a version number to 1.60.01-0 bnx2x: Fixed a compilation warning bnx2x: LSO code was broken on BE platforms qlge: Fix deadlock when cancelling worker. net: fix skb_defer_rx_timestamp() cxgb4vf: Ingress Queue Entry Size needs to be 64 bytes phy: add the IC+ IP1001 driver atm: correct sysfs 'device' link creation and parent relationships MAINTAINERS: remove me from tulip SCTP: Fix SCTP_SET_PEER_PRIMARY_ADDR to accpet v4mapped address enic: Bug Fix: Pass napi reference to the isr that services receive queue ipv6: fix nl group when advertising a new link connector: add module alias net: Document the kernel_recvmsg() function r8169: Fix runtime power management hso: IP checksuming doesn't work on GE0301 option cards xfrm: Fix xfrm_state_migrate leak net: Convert netpoll blocking api in bonding driver to be a counter ...	2010-12-14 17:33:40 -08:00
David S. Miller	d33e455337	net: Abstract default MTU metric calculation behind an accessor. Like RTAX_ADVMSS, make the default calculation go through a dst_ops method rather than caching the computation in the routing cache entries. Now dst metrics are pretty much left as-is when new entries are created, thus optimizing metric sharing becomes a real possibility. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-14 13:01:14 -08:00
David S. Miller	6389aa73ab	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2010-12-14 10:52:54 -08:00
Sage Weil	d96c9043d1	ceph: fix msgr_init error path create_workqueue() returns NULL on failure. Signed-off-by: Sage Weil <sage@newdream.net>	2010-12-13 20:30:28 -08:00
David S. Miller	0dbaee3b37	net: Abstract default ADVMSS behind an accessor. Make all RTAX_ADVMSS metric accesses go through a new helper function, dst_metric_advmss(). Leave the actual default metric as "zero" in the real metric slot, and compute the actual default value dynamically via a new dst_ops AF specific callback. For stacked IPSEC routes, we use the advmss of the path which preserves existing behavior. Unlike ipv4/ipv6, DecNET ties the advmss to the mtu and thus updates advmss on pmtu updates. This inconsistency in advmss handling results in more raw metric accesses than I wish we ended up with. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-13 12:52:14 -08:00
Johannes Berg	207aba6018	mac80211: support IBSS RSN with SW crypto When software crypto is used, mac80211 will support IBSS RSN, it doesn't depend on the driver in that case. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-13 15:23:30 -05:00
Tim Harvey	91f44b0299	mac80211 default tx_last_beacon false (congestion) The 802.11 spec states that the STA that generated the last Beacon frame shall be the STA that response to a probe request. This is important for congestion reduction when a probe request is received - only 1 node in an adhoc BSS will transmit a response. While mac80211 drivers should provide the tx_last_beacon function to report if they transmitted the last beacon many do not. As an attempt to reduce probe response congestion default this to 0 such that a node not implementing this capability does not contribute to unnecessary congestion. In a modern medium sized office environment I see upwards of 100 probe requests per second received at a given node from various hardware/OS/drivers doing zeroconf 'active probing' as opposed to passively listening for beacons. With a modest 10-node adhoc network consisting of drivers that do not implement this tx_last_beacon feature, I have seen this result in the simultaneous xmit of probe responses accumulating to 500 probe responses per second because of collisions which brings the adhoc network to its knees as well as causes needless congestion. Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-13 15:23:29 -05:00
Johannes Berg	f7e0104c1a	mac80211: support separate default keys Add support for split default keys (unicast and multicast) in mac80211. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-13 15:23:29 -05:00
Johannes Berg	dbd2fd656f	cfg80211/nl80211: separate unicast/multicast default TX keys Allow userspace to specify that a given key is default only for unicast and/or multicast transmissions. Only WEP keys are for both, WPA/RSN keys set here are GTKs for multicast only. For more future flexibility, allow to specify all combiations. Wireless extensions can only set both so use nl80211; WEP keys (connect keys) must be set as default for both (but 802.1X WEP is still possible). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-13 15:23:28 -05:00
Johannes Berg	897bed8b43	mac80211: clean up RX key checks Using the default key for "any key set" isn't quite what we should do. It works, but with the upcoming changes it makes life unnecessarily complex, so do something better here and really check for "any key". Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-13 15:23:28 -05:00
Sven Neumann	01123e2331	cfg80211: update information elements in cached BSS struct When a cached BSS struct is updated because a new beacon was received, the code replaces the cached information elements by the IEs from the new beacon. However it did not update the pub.information_elements and pub.len_information_elements fields leaving them either pointing to the old beacon IEs or in an inconsistent state where the data is replaced by the new beacon IEs but len_information_elements still has its value from the first beacon. Fix this by updating the information elements fields if they are pointing to beacon IEs. Signed-off-by: Sven Neumann <s.neumann@raumfeld.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-13 15:23:28 -05:00
Bruno Randolf	a7ffac9591	cfg80211: Add antenna availability information Add a field to wiphy for the hardware to report the availble antennas for configuration. Only if this is set to something bigger than zero, will the anntenna configuration ops be executed. Allthough this could be a simple number of antennas, I defined it as a bitmap of antennas which are available for configuration, since it's more consistent with the rest of the antenna API and there could be cases where the hardware allows only configuration of certain antennas. As it does not make much of a difference in size or normal usage, I think it's better to be able to support this, in case the need arises. The antenna configuration is now also checked against the availabe antennas and rejected if it does not match. Signed-off-by: Bruno Randolf <br1@einfach.org> -- v3: always apply available antenna mask (for "all" antennas case). v2: reject antenna configurations which don't match the available antennas Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-13 15:23:27 -05:00
John W. Linville	1d212aa96e	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem	2010-12-13 15:20:45 -05:00
Eric Dumazet	249fab773d	net: add limits to ip_default_ttl ip_default_ttl should be between 1 and 255 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-13 12:16:14 -08:00
Herton Ronaldo Krzesinski	8808f64171	mac80211: avoid calling ieee80211_work_work unconditionally On suspend, there might be usb wireless drivers which wrongly trigger the warning in ieee80211_work_work. If an usb driver doesn't have a suspend hook, the usb stack will disconnect the device. On disconnect, a mac80211 driver calls ieee80211_unregister_hw, which calls dev_close, which calls ieee80211_stop, and in the end calls ieee80211_work_purge-> ieee80211_work_work. The problem is that this call to ieee80211_work_purge comes after mac80211 is suspended, triggering the warning even when we don't have work queued in work_list (the expected case when already suspended), because it always calls ieee80211_work_work. So, just call ieee80211_work_work in ieee80211_work_purge if we really have to abort work. This addresses the warning reported at https://bugzilla.kernel.org/show_bug.cgi?id=24402 Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-13 14:55:08 -05:00
Tim Harvey	c926d006c1	mac80211: Fix NULL-pointer deference on ibss merge when not ready dev_open will eventually call ieee80211_ibss_join which sets up the skb used for beacons/probe-responses however it is possible to receive beacons that attempt to merge before this occurs causing a null pointer dereference. Check ssid_len as that is the last thing set in ieee80211_ibss_join. This occurs quite easily in the presence of adhoc nodes with hidden SSID's revised previous patch to check further up based on irc feedback Signed-off-by: Tim Harvey <harvey.tim@gmail.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-13 14:53:46 -05:00
John W. Linville	10c38c3306	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6	2010-12-13 14:41:23 -05:00
David S. Miller	323e126f0c	ipv4: Don't pre-seed hoplimit metric. Always go through a new ip4_dst_hoplimit() helper, just like ipv6. This allowed several simplifications: 1) The interim dst_metric_hoplimit() can go as it's no longer userd. 2) The sysctl_ip_default_ttl entry no longer needs to use ipv4_doint_and_flush, since the sysctl is not cached in routing cache metrics any longer. 3) ipv4_doint_and_flush no longer needs to be exported and therefore can be marked static. When ipv4_doint_and_flush_strategy was removed some time ago, the external declaration in ip.h was mistakenly left around so kill that off too. We have to move the sysctl_ip_default_ttl declaration into ipv4's route cache definition header net/route.h, because currently net/ip.h (where the declaration lives now) has a back dependency on net/route.h Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-12 22:08:17 -08:00
David S. Miller	a02e4b7dae	ipv6: Demark default hoplimit as zero. This is for consistency with ipv4. Using "-1" makes no sense. It was made this way a long time ago merely to be consistent with how the ipv6 socket hoplimit "default" is stored. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-12 21:39:02 -08:00
David S. Miller	5170ae824d	net: Abstract RTAX_HOPLIMIT metric accesses behind helper. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-12 21:35:57 -08:00
David S. Miller	abbf46ae0e	ipv6: Use ip6_dst_hoplimit() instead of direct dst_metric() calls. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-12 21:14:46 -08:00
Eric Dumazet	a19faf0250	net: fix skb_defer_rx_timestamp() After commit `c1f19b51d1` (net: support time stamping in phy devices.), kernel might crash if CONFIG_NETWORK_PHY_TIMESTAMPING=y and skb_defer_rx_timestamp() handles a packet without an ethernet header. Fixes kernel bugzilla #24102 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=24102 Reported-and-tested-by: Andrew Watts <akwatts@ymail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 16:20:56 -08:00
Herbert Xu	deef4b522b	bridge: Use consistent NF_DROP returns in nf_pre_routing The nf_pre_routing functions in bridging have collected two distinct ways of returning NF_DROP over the years, inline and via goto. There is no reason for preferring either one. So this patch arbitrarily picks the inline variant and converts the all the gotos. Also removes a redundant comment. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 16:04:53 -08:00
Changli Gao	c053fd96d0	af_packet: use swap() instead of the open coded macro XC() Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 16:02:20 -08:00
Ben Hutchings	e596e6e4d5	ethtool: Report link-down while interface is down While an interface is down, many implementations of ethtool_ops::get_link, including the default, ethtool_op_get_link(), will report the last link state seen while the interface was up. In general the current physical link state is not available if the interface is down. Define ETHTOOL_GLINK to reflect whether the interface and any physical port have a working link, and consistently return 0 when the interface is down. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 15:55:23 -08:00
Dan Williams	d9ca676bcb	atm: correct sysfs 'device' link creation and parent relationships The ATM subsystem was incorrectly creating the 'device' link for ATM nodes in sysfs. This led to incorrect device/parent relationships exposed by sysfs and udev. Instead of rolling the 'device' link by hand in the generic ATM code, pass each ATM driver's bus device down to the sysfs code and let sysfs do this stuff correctly. Signed-off-by: Dan Williams <dcbw@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 15:45:05 -08:00
Junchang Wang	a8d764b983	pktgen: adding prefetchw() call We know for sure pktgen is going to write skb->data right after *_alloc_skb, causing unnecessary cache misses. Idea is to add a prefetchw() call to prefetch the first cache line indicated by skb->data. On systems with Adjacent Cache Line Prefetch, it's probably two cache lines are prefetched. With this patch, pktgen on Intel SR1625 server with two E5530 quad-core processors and a single ixgbe-based NIC went from 8.63Mpps to 9.03Mpps, with 4.6% improvement. Signed-off-by: Junchang Wang <junchangwang@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 15:36:52 -08:00
Wei Yongjun	40a010395c	SCTP: Fix SCTP_SET_PEER_PRIMARY_ADDR to accpet v4mapped address SCTP_SET_PEER_PRIMARY_ADDR does not accpet v4mapped address, using v4mapped address in SCTP_SET_PEER_PRIMARY_ADDR socket option will get -EADDRNOTAVAIL error if v4map is enabled. This patch try to fix it by mapping v4mapped address to v4 address if allowed. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 15:29:49 -08:00
Tobias Klauser	376d940ee9	inet6: Remove redundant unlikely() IS_ERR() already implies unlikely(), so it can be omitted here. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 14:57:34 -08:00
Martin Willi	040253c931	xfrm: Traffic Flow Confidentiality for IPv6 ESP Add TFC padding to all packets smaller than the boundary configured on the xfrm state. If the boundary is larger than the PMTU, limit padding to the PMTU. Signed-off-by: Martin Willi <martin@strongswan.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 14:43:59 -08:00
Martin Willi	d979e20f2b	xfrm: Traffic Flow Confidentiality for IPv4 ESP Add TFC padding to all packets smaller than the boundary configured on the xfrm state. If the boundary is larger than the PMTU, limit padding to the PMTU. Signed-off-by: Martin Willi <martin@strongswan.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 14:43:59 -08:00
Martin Willi	35d2856b46	xfrm: Add Traffic Flow Confidentiality padding XFRM attribute The XFRMA_TFCPAD attribute for XFRM state installation configures Traffic Flow Confidentiality by padding ESP packets to a specified length. Signed-off-by: Martin Willi <martin@strongswan.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 14:43:58 -08:00
Jiri Pirko	c07224005d	net/ipv6/udp.c: fix typo in flush_stack() skb1 should be passed as parameter to sk_rcvqueues_full() here. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 14:05:09 -08:00
David S. Miller	457de4383e	ipv6: Fix 'release_it' logic in tcp_v6_get_peer() We accidently set it to "true" for the case where we are using a route bound peer. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 13:16:09 -08:00
Tobias Klauser	4c0833bcd4	bridge: Fix return values of br_multicast_add_group/br_multicast_new_group If br_multicast_new_group returns NULL, we would return 0 (no error) to the caller of br_multicast_add_group, which is not what we want. Instead br_multicast_new_group should return ERR_PTR(-ENOMEM) in this case. Also propagate the error number returned by br_mdb_rehash properly. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 13:00:39 -08:00
David S. Miller	e91db5cd6f	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-12-10 12:51:02 -08:00
Nicolas Dichtel	5f75a1042f	ipv6: fix nl group when advertising a new link New idev are advertised with NL group RTNLGRP_IPV6_IFADDR, but should use RTNLGRP_IPV6_IFINFO. Bug was introduced by commit `8d7a76c9`. Signed-off-by: Wang Xuefu <xuefu.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 12:49:36 -08:00
David S. Miller	eaa7dcde1d	Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6	2010-12-10 11:22:57 -08:00
Martin Lucina	c1249c0aae	net: Document the kernel_recvmsg() function Signed-off-by: Martin Lucina <mato@kotelna.sk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-10 11:13:18 -08:00
David S. Miller	1e13f863ca	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 Conflicts: drivers/net/wireless/ath/ath9k/ar9003_eeprom.c	2010-12-10 09:50:47 -08:00
Uwe Kleine-König	a34f0b3139	fix comment typos concerning "consistent" Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-12-10 16:04:28 +01:00
Shan Wei	b7ec19af63	dccp: remove unused macros Remove macros which have been unused since the initial implementation (commit `7c657876b6`, [DCCP]: Initial implementation from Tue Aug 9 20:14:34 2005 -0700). Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-12-10 12:49:23 +01:00
Eric Dumazet	4bc65dd8d8	filter: use size of fetched data in __load_pointer() __load_pointer() checks data we fetch from skb is included in head portion, but assumes we fetch one byte, instead of up to four. This wont crash because we have extra bytes (struct skb_shared_info) after head, but this can read uninitialized bytes. Fix this using size of the data (1, 2, 4 bytes) in the test. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-09 20:47:04 -08:00
Thomas Egerer	78347c8c6b	xfrm: Fix xfrm_state_migrate leak xfrm_state_migrate calls kfree instead of xfrm_state_put to free a failed state. According to git commit `553f9118` this can cause memory leaks. Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-09 20:35:27 -08:00
Eric Dumazet	68835aba4d	net: optimize INET input path further Followup of commit `b178bb3dfc` (net: reorder struct sock fields) Optimize INET input path a bit further, by : 1) moving sk_refcnt close to sk_lock. This reduces number of dirtied cache lines by one on 64bit arches (and 64 bytes cache line size). 2) moving inet_daddr & inet_rcv_saddr at the beginning of sk (same cache line than hash / family / bound_dev_if / nulls_node) This reduces number of accessed cache lines in lookups by one, and dont increase size of inet and timewait socks. inet and tw sockets now share same place-holder for these fields. Before patch : offsetof(struct sock, sk_refcnt) = 0x10 offsetof(struct sock, sk_lock) = 0x40 offsetof(struct sock, sk_receive_queue) = 0x60 offsetof(struct inet_sock, inet_daddr) = 0x270 offsetof(struct inet_sock, inet_rcv_saddr) = 0x274 After patch : offsetof(struct sock, sk_refcnt) = 0x44 offsetof(struct sock, sk_lock) = 0x48 offsetof(struct sock, sk_receive_queue) = 0x68 offsetof(struct inet_sock, inet_daddr) = 0x0 offsetof(struct inet_sock, inet_rcv_saddr) = 0x4 compute_score() (udp or tcp) now use a single cache line per ignored item, instead of two. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-09 20:05:58 -08:00
David S. Miller	defb3519a6	net: Abstract away all dst_entry metrics accesses. Use helper functions to hide all direct accesses, especially writes, to dst_entry metrics values. This will allow us to: 1) More easily change how the metrics are stored. 2) Implement COW for metrics. In particular this will help us put metrics into the inetpeer cache if that is what we end up doing. We can make the _metrics member a pointer instead of an array, initially have it point at the read-only metrics in the FIB, and then on the first set grab an inetpeer entry and point the _metrics member there. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>	2010-12-09 10:46:36 -08:00
David S. Miller	4e085e76cb	econet: Fix crash in aun_incoming(). Unconditional use of skb->dev won't work here, try to fetch the econet device via skb_dst()->dev instead. Suggested by Eric Dumazet. Reported-by: Nelson Elhage <nelhage@ksplice.com> Tested-by: Nelson Elhage <nelhage@ksplice.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 20:51:15 -08:00
David S. Miller	fe6c791570	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath/ath9k/ar9003_eeprom.c net/llc/af_llc.c	2010-12-08 13:47:38 -08:00
John W. Linville	393934c6b5	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 Conflicts: drivers/net/wireless/ath/ath9k/ath9k.h drivers/net/wireless/ath/ath9k/main.c drivers/net/wireless/ath/ath9k/xmit.c	2010-12-08 16:23:31 -05:00
Ben Greear	2f88675011	mac80211: Show max number of probe tries in debug message. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-08 15:38:44 -05:00
Helmut Schaa	80d7e403c9	mac80211: Apply ht_opmode changes in ieee80211_change_bss Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-08 15:38:43 -05:00
Helmut Schaa	50b12f597b	cfg80211: Add new BSS attribute ht_opmode Add a new BSS attribute to allow hostapd to set the current HT opmode. Otherwise drivers won't be able to set up protection for HT rates in AP mode. Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-08 15:38:43 -05:00
Eric Dumazet	f19872575f	tcp: protect sysctl_tcp_cookie_size reads Make sure sysctl_tcp_cookie_size is read once in tcp_cookie_size_check(), or we might return an illegal value to caller if sysctl_tcp_cookie_size is changed by another cpu. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: William Allen Simpson <william.allen.simpson@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 12:34:09 -08:00
Eric Dumazet	ad9f4f50fe	tcp: avoid a possible divide by zero sysctl_tcp_tso_win_divisor might be set to zero while one cpu runs in tcp_tso_should_defer(). Make sure we dont allow a divide by zero by reading sysctl_tcp_tso_win_divisor exactly once. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 12:34:08 -08:00
Helmut Schaa	7e24470756	mac80211: Fix BUG in pskb_expand_head when transmitting shared skbs mac80211 doesn't handle shared skbs correctly at the moment. As a result a possible resize can trigger a BUG in pskb_expand_head. [ 676.030000] Kernel bug detected[#1]: [ 676.030000] Cpu 0 [ 676.030000] $ 0 : 00000000 00000000 819662ff 00000002 [ 676.030000] $ 4 : 81966200 00000020 00000000 00000020 [ 676.030000] $ 8 : 819662e0 800043c0 00000002 00020000 [ 676.030000] $12 : 3b9aca00 00000000 00000000 00470000 [ 676.030000] $16 : 80ea2000 00000000 00000000 00000000 [ 676.030000] $20 : 818aa200 80ea2018 80ea2000 00000008 [ 676.030000] $24 : 00000002 800ace5c [ 676.030000] $28 : 8199a000 8199bd20 81938f88 80f180d4 [ 676.030000] Hi : 0000026e [ 676.030000] Lo : 0000757e [ 676.030000] epc : 801245e4 pskb_expand_head+0x44/0x1d8 [ 676.030000] Not tainted [ 676.030000] ra : 80f180d4 ieee80211_skb_resize+0xb0/0x114 [mac80211] [ 676.030000] Status: 1000a403 KERNEL EXL IE [ 676.030000] Cause : 10800024 [ 676.030000] PrId : 0001964c (MIPS 24Kc) [ 676.030000] Modules linked in: mac80211_hwsim rt2800lib rt2x00soc rt2x00pci rt2x00lib mac80211 crc_itu_t crc_ccitt cfg80211 compat arc4 aes_generic deflate ecb cbc [last unloaded: rt2800pci] [ 676.030000] Process kpktgend_0 (pid: 97, threadinfo=8199a000, task=81879f48, tls=00000000) [ 676.030000] Stack : ffffffff 00000000 00000000 00000014 00000004 80ea2000 00000000 00000000 [ 676.030000] 818aa200 80f180d4 ffffffff 0000000a 81879f78 81879f48 81879f48 00000018 [ 676.030000] 81966246 80ea2000 818432e0 80f1a420 80203050 81814d98 00000001 81879f48 [ 676.030000] 81879f48 00000018 81966246 818432e0 0000001a 8199bdd4 0000001c 80f1b72c [ 676.030000] 80203020 8001292c 80ef4aa2 7f10b55d 801ab5b8 81879f48 00000188 80005c90 [ 676.030000] ... [ 676.030000] Call Trace: [ 676.030000] [<801245e4>] pskb_expand_head+0x44/0x1d8 [ 676.030000] [<80f180d4>] ieee80211_skb_resize+0xb0/0x114 [mac80211] [ 676.030000] [<80f1a420>] ieee80211_xmit+0x150/0x22c [mac80211] [ 676.030000] [<80f1b72c>] ieee80211_subif_start_xmit+0x6f4/0x73c [mac80211] [ 676.030000] [<8014361c>] pktgen_thread_worker+0xfac/0x16f8 [ 676.030000] [<8002ebe8>] kthread+0x7c/0x88 [ 676.030000] [<80008e0c>] kernel_thread_helper+0x10/0x18 [ 676.030000] [ 676.030000] [ 676.030000] Code: 24020001 10620005 2502001f <0200000d> 0804917a 00000000 2502001f 00441023 00531021 Fix this by making a local copy of shared skbs prior to mangeling them. To avoid copying the skb unnecessarily move the skb_copy call below the checks that don't need write access to the skb. Also, move the assignment of nh_pos and h_pos below the skb_copy to point to the correct skb. It would be possible to avoid another resize of the copied skb by using skb_copy_expand instead of skb_copy but that would make the patch more complex. Also, shared skbs are a corner case right now, so the resize shouldn't matter much. Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-08 15:23:48 -05:00
Tom Herbert	67631510a3	tcp: Replace time wait bucket msg by counter Rather than printing the message to the log, use a mib counter to keep track of the count of occurences of time wait bucket overflow. Reduces spam in logs. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 12:16:33 -08:00
Apollon Oikonomopoulos	171995e5d8	x25: decrement netdev reference counts on unload x25 does not decrement the network device reference counts on module unload. Thus unregistering any pre-existing interface after unloading the x25 module hangs and results in unregister_netdevice: waiting for tap0 to become free. Usage count = 1 This patch decrements the reference counts of all interfaces in x25_link_free, the way it is already done in x25_link_device_down for NETDEV_DOWN events. Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 12:13:44 -08:00
Michal Marek	e8d34a884e	l2tp: Fix modalias of l2tp_ip Using the SOCK_DGRAM enum results in "net-pf-2-proto-SOCK_DGRAM-type-115", so use the numeric value like it is done in net/dccp. Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 12:13:43 -08:00
Nelson Elhage	0c62fc6dd0	econet: Do the correct cleanup after an unprivileged SIOCSIFADDR. We need to drop the mutex and do a dev_put, so set an error code and break like the other paths, instead of returning directly. Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 12:13:42 -08:00
Changli Gao	920b8d913b	af_packet: fix freeing pg_vec twice on error path It is introduced in: commit `0e3125c755` Author: Neil Horman <nhorman@tuxdriver.com> Date: Tue Nov 16 10:26:47 2010 -0800 packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4) Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 10:43:41 -08:00
Changli Gao	f6dafa95d1	af_packet: eliminate pgv_to_page on some arches Some arches don't need flush_dcache_page(), and don't implement it, so we can eliminate pgv_to_page() calls on those arches. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 10:43:41 -08:00
Eric Dumazet	15c2d75f49	net: call dev_queue_xmit_nit() after skb_dst_drop() Avoid some atomic ops on dst refcount, calling dev_queue_xmit_nit() after skb_dst_drop() in dev_hard_start_xmit(). When queueing a packet into af_packet socket, we drop dst anyway. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 10:39:54 -08:00
Eric Dumazet	62ab081213	filter: constify sk_run_filter() sk_run_filter() doesnt write on skb, change its prototype to reflect this. Fix two af_packet comments. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 10:30:34 -08:00
Eric Dumazet	941666c2e3	net: RCU conversion of dev_getbyhwaddr() and arp_ioctl() Le dimanche 05 décembre 2010 à 09:19 +0100, Eric Dumazet a écrit : > Hmm.. > > If somebody can explain why RTNL is held in arp_ioctl() (and therefore > in arp_req_delete()), we might first remove RTNL use in arp_ioctl() so > that your patch can be applied. > > Right now it is not good, because RTNL wont be necessarly held when you > are going to call arp_invalidate() ? While doing this analysis, I found a refcount bug in llc, I'll send a patch for net-2.6 Meanwhile, here is the patch for net-next-2.6 Your patch then can be applied after mine. Thanks [PATCH] net: RCU conversion of dev_getbyhwaddr() and arp_ioctl() dev_getbyhwaddr() was called under RTNL. Rename it to dev_getbyhwaddr_rcu() and change all its caller to now use RCU locking instead of RTNL. Change arp_ioctl() to use RCU instead of RTNL locking. Note: this fix a dev refcount bug in llc Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 10:07:24 -08:00
David S. Miller	a2d4b65d47	Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6	2010-12-08 10:01:00 -08:00
Eric Dumazet	35d9b0c906	llc: fix a device refcount imbalance Le dimanche 05 décembre 2010 à 12:23 +0100, Eric Dumazet a écrit : > Le dimanche 05 décembre 2010 à 09:19 +0100, Eric Dumazet a écrit : > > > Hmm.. > > > > If somebody can explain why RTNL is held in arp_ioctl() (and therefore > > in arp_req_delete()), we might first remove RTNL use in arp_ioctl() so > > that your patch can be applied. > > > > Right now it is not good, because RTNL wont be necessarly held when you > > are going to call arp_invalidate() ? > > While doing this analysis, I found a refcount bug in llc, I'll send a > patch for net-2.6 Oh well, of course I must first fix the bug in net-2.6, and wait David pull the fix in net-next-2.6 before sending this rcu conversion. Note: this patch should be sent to stable teams (2.6.34 and up) [PATCH net-2.6] llc: fix a device refcount imbalance commit `abf9d537fe` (llc: add support for SO_BINDTODEVICE) added one refcount imbalance in llc_ui_bind(), because dev_getbyhwaddr() doesnt take a reference on device, while dev_get_by_index() does. Fix this using RCU locking. And since an RCU conversion will be done for 2.6.38 for dev_getbyhwaddr(), put the rcu_read_lock/unlock exactly at their final place. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@kernel.org Cc: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 09:58:44 -08:00
Thiago Farina	01b0c5cfb2	net/9p/protocol.c: Remove duplicated macros. Use the macros already provided by kernel.h file. Signed-off-by: Thiago Farina <tfransosi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 09:56:28 -08:00
Changli Gao	aa94210411	net: init ingress queue The dev field of ingress queue is forgot to initialized, then NULL pointer dereference happens in qdisc_alloc(). Move inits of tx queues to netif_alloc_netdev_queues(). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 09:43:27 -08:00
Nandita Dukkipati	b1afde60f2	tcp: Bug fix in initialization of receive window. The bug has to do with boundary checks on the initial receive window. If the initial receive window falls between init_cwnd and the receive window specified by the user, the initial window is incorrectly brought down to init_cwnd. The correct behavior is to allow it to remain unchanged. Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-08 09:38:37 -08:00
David S. Miller	4f58605e6b	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-12-08 08:13:01 -08:00
Miloslav Trmač	6f107b5861	net: Add missing lockdep class names for af_alg Signed-off-by: Miloslav Trmač <mitr@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2010-12-08 14:35:34 +08:00
NeilBrown	ed2849d3ec	sunrpc: prevent use-after-free on clearing XPT_BUSY When an xprt is created, it has a refcount of 1, and XPT_BUSY is set. The refcount is not owned by the thread that created the xprt (as is clear from the fact that creators never put the reference). Rather, it is owned by the absence of XPT_DEAD. Once XPT_DEAD is set, (And XPT_BUSY is clear) that initial reference is dropped and the xprt can be freed. So when a creator clears XPT_BUSY it is dropping its only reference and so must not touch the xprt again. However svc_recv, after calling ->xpo_accept (and so getting an XPT_BUSY reference on a new xprt), calls svc_xprt_recieved. This clears XPT_BUSY and then svc_xprt_enqueue - this last without owning a reference. This is dangerous and has been seen to leave svc_xprt_enqueue working with an xprt containing garbage. So we need to hold an extra counted reference over that call to svc_xprt_received. For safety, any time we clear XPT_BUSY and then use the xprt again, we first get a reference, and the put it again afterwards. Note that svc_close_all does not need this extra protection as there are no threads running, and the final free can only be called asynchronously from such a thread. Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-12-07 20:39:55 -05:00
Johan Hedberg	a40c406cbd	Bluetooth: Make hci_send_to_sock usable for management control sockets In order to send data to management control sockets the function should: - skip checks intended for raw HCI data and stack internal events - make sure RAW HCI data or stack internal events don't go to management control sockets In order to accomplish this the patch adds a new member to the bluetooth skb private data to flag skb's that are destined for management control sockets. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-07 23:03:39 -02:00
Johan Hedberg	0381101fd6	Bluetooth: Add initial Bluetooth Management interface callbacks Add initial code for handling Bluetooth Management interface messages. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-07 23:03:38 -02:00
Javier Cardona	7659a193f9	mac80211: Fix compilation error when mesh is disabled Wrap mesh sections inside CONFIG_MAC80211_MESH to fix compilation problems reported by Stephen Rothwell, Larry Finger and Bruno Randolf. Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-07 17:08:09 -05:00
Felix Fietkau	c658e5db01	mac80211: fix a compiler warning net/mac80211/mlme.c: In function 'ieee80211_sta_work': net/mac80211/mlme.c:1981: warning: too many arguments for format Introduced by commit `04ac3c0ee2` ("mac80211: speed up AP probing using nullfunc frames"). Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-07 17:08:06 -05:00
Eliad Peller	0ab82b04ac	mac80211: fix dynamic-ps/pm_qos magic numbers mac80211 uses pm_qos (/dev/network_latency) in order to determine the dynamic ps timeout (or disable the dynamic-ps at all in some cases). commit `ff616381` added a comparison for the current network_latency against one high value (1900ms), and against the default value (2000sec, rather than the commented 2sec). however, the representation of 1900ms was incorrect: 1900ms = 1900000us ( != 1900000000 ) fix it by using USEC_TO_MSEC/SEC consts. Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-07 16:09:13 -05:00
Bruno Randolf	541a45a142	nl80211/mac80211: Report signal average Extend nl80211 to report an exponential weighted moving average (EWMA) of the signal value. Since the signal value usually fluctuates between different packets, an average can be more useful than the value of the last packet. This uses the recently added generic EWMA library function. -- v2: fix ABI breakage and change factor to be a power of 2. Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-07 16:09:12 -05:00
Tracey Dent	ff2109f5f9	Net: bluetooth: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-07 13:52:39 -02:00
Tomasz Grobelny	0491026507	dccp qpolicy: Parameter checking of cmsg qpolicy parameters Ensure that cmsg->cmsg_type value is valid for qpolicy that is currently in use. Signed-off-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-12-07 13:47:12 +01:00
Tomasz Grobelny	871a2c16c2	dccp: Policy-based packet dequeueing infrastructure This patch adds a generic infrastructure for policy-based dequeueing of TX packets and provides two policies: * a simple FIFO policy (which is the default) and * a priority based policy (set via socket options). Both policies honour the tx_qlen sysctl for the maximum size of the write queue (can be overridden via socket options). The priority policy uses skb->priority internally to assign an u32 priority identifier, using the same ranking as SO_PRIORITY. The skb->priority field is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary data using cmsg(3), the patch also provides the requisite parsing routines. Signed-off-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-12-07 13:47:12 +01:00
David Shwatrz	8917a3c0b7	Fix a typo in datagram.c and sctp/socket.c. Hi, This patch fixes a typo in net/core/datagram.c and in net/sctp/socket.c Regards, David Shwartz Signed-off-by: David Shwartz <dshwatrz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-06 13:10:11 -08:00
Johannes Berg	29cbe68c51	cfg80211/mac80211: add mesh join/leave commands Instead of tying mesh activity to interface up, add join and leave commands for mesh. Since we must be backward compatible, let cfg80211 handle joining a mesh if a mesh ID was pre-configured when the device goes up. Note that this therefore must modify mac80211 as well since mac80211 needs to lose the logic to start the mesh on interface up. We now allow querying mesh parameters before the mesh is connected, which simply returns defaults. Setting them (internally renamed to "update") is only allowed while connected. Specify them with the new mesh join command instead where needed. In mac80211, beaconing must now also follow the mesh enabled/not enabled state, which is done by testing the mesh ID. Signed-off-by: Javier Cardona <javier@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-06 16:01:29 -05:00
Johannes Berg	bd90fdcc5f	nl80211: refactor mesh parameter parsing I'm going to need this in a new place later. Tested-by: Javier Cardona <javier@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-06 16:01:29 -05:00
Johannes Berg	f9e10ce4cf	cfg80211: require add_virtual_intf to return new dev cfg80211 used to do all its bookkeeping in the notifier, but some new stuff will have to use local variables so make the callback return the netdev pointer. Tested-by: Javier Cardona <javier@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-06 16:01:28 -05:00
Johannes Berg	09b1747026	mac80211: move mesh filter adjusting Logically, the filter adjusting belongs with starting/stopping mesh, not interface up/down, so move it there. Tested-by: Javier Cardona <javier@cozybit.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-06 16:01:28 -05:00
Javier Cardona	45904f2165	nl80211/mac80211: define and allow configuring mesh element TTL The TTL in path selection information elements is different from the mesh ttl used in mesh data frames. Version 7.03 of the 11s draft calls this ttl 'Element TTL'. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-06 16:01:28 -05:00
Eric Dumazet	2d5311e4e8	filter: add a security check at install time We added some security checks in commit `57fe93b374` (filter: make sure filters dont read uninitialized memory) to close a potential leak of kernel information to user. This added a potential extra cost at run time, while we can perform a check of the filter itself, to make sure a malicious user doesnt try to abuse us. This patch adds a check_loads() function, whole unique purpose is to make this check, allocating a temporary array of mask. We scan the filter and propagate a bitmask information, telling us if a load M(K) is allowed because a previous store M(K) is guaranteed. (So that sk_run_filter() can possibly not read unitialized memory) Note: this can uncover application bug, denying a filter attach, previously allowed. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dan Rosenberg <drosenberg@vsecurity.com> Cc: Changli Gao <xiaosuo@gmail.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-06 12:59:09 -08:00
Changli Gao	ae9c416d68	net: arp: use assignment Only when dont_send is 0, arp_filter() is consulted, so we can simply assign the return value of arp_filter() to dont_send instead. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-06 12:59:09 -08:00
Changli Gao	c56b4d9012	af_packet: remove pgv.flags As we can check if an address is vmalloc address with is_vmalloc_addr(), we remove pgv.flags. Then we may get more pg_vecs. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-06 12:59:07 -08:00
Changli Gao	0af55bb58f	af_packet: use vmalloc_to_page() instead for the addresss returned by vmalloc() The following commit causes the pgv->buffer may point to the memory returned by vmalloc(). And we can't use virt_to_page() for the vmalloc address. This patch introduces a new inline function pgv_to_page(), which calls vmalloc_to_page() for the vmalloc address, and virt_to_page() for the __get_free_pages address. We used to increase page pointer to get the next page at the next page address, after Neil's patch, it is wrong, as the physical address may be not continuous. This patch also fixes this issue. commit `0e3125c755` Author: Neil Horman <nhorman@tuxdriver.com> Date: Tue Nov 16 10:26:47 2010 -0800 packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4) Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-06 12:59:06 -08:00
Eric Dumazet	f7fce74e38	net: kill an RCU warning in inet_fill_link_af() commits `9f0f7272` (ipv4: AF_INET link address family) and `cf7afbfeb8` (rtnl: make link af-specific updates atomic) used incorrect __in_dev_get_rcu() in RTNL protected contexts, triggering PROVE_RCU warnings. Switch to __in_dev_get_rtnl(), wich is more appropriate, since we hold RTNL. Based on a report and initial patch from Amerigo Wang. Reported-by: Amerigo Wang <amwang@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Reviewed-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-06 12:59:06 -08:00
Eric Dumazet	da2033c282	filter: add SKF_AD_RXHASH and SKF_AD_CPU Add SKF_AD_RXHASH and SKF_AD_CPU to filter ancillary mechanism, to be able to build advanced filters. This can help spreading packets on several sockets with a fast selection, after RPS dispatch to N cpus for example, or to catch a percentage of flows in one queue. tcpdump -s 500 "cpu = 1" : [0] ld CPU [1] jeq #1 jt 2 jf 3 [2] ret #500 [3] ret #0 # take 12.5 % of flows (average) tcpdump -s 1000 "rxhash & 7 = 2" : [0] ld RXHASH [1] and #7 [2] jeq #2 jt 3 jf 4 [3] ret #1000 [4] ret #0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Rui <wirelesser@gmail.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-06 12:59:05 -08:00
Michał Mirosław	7903264402	net: Fix too optimistic NETIF_F_HW_CSUM features NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM+NETIF_F_IPV6_CSUM, but some drivers miss the difference. Fix this and also fix UFO dependency on checksumming offload as it makes the same mistake in assumptions. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Jon Mason <jon.mason@exar.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-06 12:59:04 -08:00
Felix Fietkau	04ac3c0ee2	mac80211: speed up AP probing using nullfunc frames If the nullfunc frame used to probe the AP was not acked, there is no point in waiting for the probe timeout, so advance to the next try (or disconnect) immediately. If we do reach the probe timeout without having received a tx status, the connection is probably really bad and worth disconnecting. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-06 15:58:44 -05:00
Felix Fietkau	75706d0e9d	mac80211: remove a redundant check ieee80211_is_nullfunc() implies ieee80211_is_data() Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-06 15:58:43 -05:00
Helmut Schaa	c1ce5a74d1	mac80211: Update last_tx_rate only for data frames The last_tx_rate field was also updated for non-data frames that are often sent with a lower rate (for example management frames at 1 Mbps). This is confusing when the data rate is actually much higher. Hence, only update the last_tx_rate field with tx rate information gathered from last data frames. If the rate control algorithm filled in txrc.reported_rate we don't need to verify this information. Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-12-06 15:58:43 -05:00
John W. Linville	f435d9eea0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem	2010-12-06 15:35:34 -05:00
Johan Hedberg	183f732c3f	Bluetooth: Fix initial RFCOMM DLC security level Due to commit `63ce0900` connections initiated through TTYs created with "rfcomm bind ..." would have security level BT_SECURITY_SDP instead of BT_SECURITY_LOW. This would cause instant connection failure between any two SSP capable devices due to the L2CAP connect request to RFCOMM being sent before authentication has been performed. This patch fixes the regression by always initializing the DLC security level to BT_SECURITY_LOW. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Acked-by: Luiz Augusto von Dentz <luiz.dentz-von@nokia.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-06 15:47:44 -02:00
Gustavo F. Padovan	df6bd743b6	Bluetooth: Don't accept ConfigReq if we aren't in the BT_CONFIG state If such event happens we shall reply with a Command Reject, because we are not expecting any configure request. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2010-12-06 15:37:50 -02:00
Eric Dumazet	46bcf14f44	filter: fix sk_filter rcu handling Pavel Emelyanov tried to fix a race between sk_filter_(de\|at)tach and sk_clone() in commit `47e958eac2` Problem is we can have several clones sharing a common sk_filter, and these clones might want to sk_filter_attach() their own filters at the same time, and can overwrite old_filter->rcu, corrupting RCU queues. We can not use filter->rcu without being sure no other thread could do the same thing. Switch code to a more conventional ref-counting technique : Do the atomic decrement immediately and queue one rcu call back when last reference is released. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-06 09:29:43 -08:00
Changli Gao	ca44ac3861	net: don't reallocate skb->head unless the current one hasn't the needed extra size or is shared skb head being allocated by kmalloc(), it might be larger than what actually requested because of discrete kmem caches sizes. Before reallocating a new skb head, check if the current one has the needed extra size. Do this check only if skb head is not shared. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-03 10:59:47 -08:00
Johannes Berg	0bae35e14b	leds: fix up dependencies It's not useful to build LED triggers when there's no LEDs that can be triggered by them. Therefore, fix up the dependencies so that this cannot happen, and fix a few users that select triggers to depend on LEDS_CLASS as well (there is also one user that also selects LEDS_CLASS, which is OK). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Tested-by: Ingo Molnar <mingo@elte.hu> Cc: Arnd Hannemann <arnd@arndnet.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Richard Purdie <rpurdie@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-12-02 14:51:15 -08:00
Allan Stephens	b924dcf003	tipc: Delete tipc_ownidentity() Moves the content of the native API routine tipc_ownidentity() into the sole routine that calls it, since it can no longer be called in isolation. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 13:34:06 -08:00
Allan Stephens	12bae479ee	tipc: Eliminate obsolete native API forwarding routines Moves the content of each native API message forwarding routine into the sole routine that calls it, since the forwarding routines no longer be called in isolation. Also removes code in each routine that altered the outgoing message's importance level since this is now no longer possible. The previous function mapping (parent function, and child API) was as follows: tipc_send2name \--tipc_forward2name tipc_send2port \--tipc_forward2port tipc_send_buf2port \--tipc_forward_buf2port After this commit, the children don't exist and their functionality is completely in the respective parent. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 13:34:06 -08:00
Allan Stephens	471450f7ec	tipc: Eliminate an unused symbolic constant in link code Removes a symbol that is not referenced anywhere by TIPC's link code. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 13:34:05 -08:00
Allan Stephens	52fe7b725e	tipc: Eliminate useless initialization when creating subscriber Removes initialization of a local variable that is always assigned a different value before it is referenced. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 13:34:05 -08:00
Allan Stephens	38f232eae2	tipc: Remove unused domain argument from multicast send routine Eliminates an unused argument from tipc_multicast(), now that this routine can no longer be called by kernel-based applications. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 13:34:04 -08:00
Allan Stephens	a5c2af9922	tipc: Remove support for TIPC mode change callback Eliminates support for the callback routine invoked when TIPC changes its mode of operation from inactive to standalone or from standalone to networked. This callback was part of TIPC's obsolete native API and is not used by TIPC internally. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 13:34:04 -08:00
Allan Stephens	528c771e87	tipc: Delete useless function prototypes Removes several function declarations that aren't used anywhere, either because they reference routines that no longer exist or because all users of the function reference it after it has already been defined. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 13:34:03 -08:00
Allan Stephens	28cc937eac	tipc: Eliminate useless return value when disabling a bearer Modifies bearer_disable() to return void since it always indicates success anyway. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 13:34:03 -08:00
Allan Stephens	8d71919d7a	tipc: Delete unused configuration service structure definition Removes a structure definition that is no longer used by TIPC's configuration service. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 13:34:02 -08:00
Allan Stephens	c802628297	tipc: Remove obsolete inclusions of header files Gets rid of #include statements that are no longer required as a result of the merging of obsolete native API header file content into other TIPC include files. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 13:34:02 -08:00
Allan Stephens	d265fef6dd	tipc: Remove obsolete native API files and exports As part of the removal of TIPC's native API support it is no longer necessary for TIPC to export symbols for routines that can be called by kernel-based applications, nor for it to have header files that kernel-based applications can include to access the declarations for those routines. This commit eliminates the exporting of symbols by TIPC and migrates the contents of each obsolete native API include file into its corresponding non-native API equivalent. The code which was migrated in this commit was migrated intact, in that there are no technical changes combined with the relocation. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 13:34:01 -08:00
Shan Wei	b672083ed3	ipv6: use ND_REACHABLE_TIME and ND_RETRANS_TIMER instead of magic number ND_REACHABLE_TIME and ND_RETRANS_TIMER have defined since v2.6.12-rc2, but never been used. So use them instead of magic number. This patch also changes original code style to read comfortably . Thank YOSHIFUJI Hideaki for pointing it out. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 13:27:32 -08:00
Shan Wei	97b1ce25e8	tcp: use TCP_BASE_MSS to set basic mss value TCP_BASE_MSS is defined, but not used. commit `5d424d5a` introduce this macro, so use it to initial sysctl_tcp_base_mss. commit `5d424d5a67` Author: John Heffner <jheffner@psc.edu> Date: Mon Mar 20 17:53:41 2006 -0800 [TCP]: MTU probing Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 13:27:32 -08:00
John W. Linville	09f921f83f	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 Conflicts: drivers/net/wireless/ath/ath9k/ar9003_eeprom.c	2010-12-02 15:46:37 -05:00
David S. Miller	db3949c450	tcp: Implement ipv6 ->get_peer() and ->tw_get_peer(). Now ipv6 timewait recycling is fully implemented and enabled. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 12:14:37 -08:00
David S. Miller	493f377d6d	tcp: Add timewait recycling bits to ipv6 connect code. This will also improve handling of ipv6 tcp socket request backlog when syncookies are not enabled. When backlog becomes very deep, last quarter of backlog is limited to validated destinations. Previously only ipv4 implemented this logic, but now ipv6 does too. Now we are only one step away from enabling timewait recycling for ipv6, and that step is simply filling in the implementation of tcp_v6_get_peer() and tcp_v6_tw_get_peer(). Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 12:14:29 -08:00
John W. Linville	1937721f56	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6	2010-12-02 14:00:51 -05:00
David S. Miller	ae4694b2d3	ipv6: Create inet6_csk_route_req(). Brother of ipv4's inet_csk_route_req(). Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-02 10:59:22 -08:00
David S. Miller	ccb7c410dd	timewait_sock: Create and use getpeer op. The only thing AF-specific about remembering the timestamp for a time-wait TCP socket is getting the peer. Abstract that behind a new timewait_sock_ops vector. Support for real IPV6 sockets is not filled in yet, but curiously this makes timewait recycling start to work for v4-mapped ipv6 sockets. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-01 18:09:13 -08:00
David S. Miller	8790ca172a	inetpeer: Kill use of inet_peer_address_t typedef. They are verboten these days. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-01 17:28:18 -08:00
Andrei Emeltchenko	70f23020e6	Bluetooth: clean up hci code Do not use assignment in IF condition, remove extra spaces, fixing typos, simplify code. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:43 -02:00
Andrei Emeltchenko	894718a6be	Bluetooth: clean up l2cap code Do not initialize static vars to zero, macros with complex values shall be enclosed with (), remove unneeded braces. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:43 -02:00
Andrei Emeltchenko	285b4e9031	Bluetooth: clean up rfcomm code Remove extra spaces, assignments in if statement, zeroing static variables, extra braces. Fix includes. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:43 -02:00
Andrei Emeltchenko	735cbc4784	Bluetooth: clean up sco code Do not use assignments in IF condition, remove extra spaces Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:43 -02:00
Anderson Lizardo	b78d7b4f20	Bluetooth: Fix error handling for l2cap_init() create_singlethread_workqueue() may fail with errors such as -ENOMEM. If this happens, the return value is not set to a negative value and the module load will succeed. It will then crash on module unload because of a destroy_workqueue() call on a NULL pointer. Additionally, the _busy_wq workqueue is not being destroyed if any errors happen on l2cap_init(). Signed-off-by: Anderson Lizardo <anderson.lizardo@openbossa.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:43 -02:00
Gustavo F. Padovan	eeb366564b	Bluetooth: Get rid of __rfcomm_get_sock_by_channel() rfcomm_get_sock_by_channel() was the only user of this function, so I merged both into rfcomm_get_sock_by_channel(). The socket lock now should be hold outside of rfcomm_get_sock_by_channel() once we hold and release it inside the same function now. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:43 -02:00
Gustavo F. Padovan	e0f0cb5636	Bluetooth: Get rid of __l2cap_get_sock_by_psm() l2cap_get_sock_by_psm() was the only user of this function, so I merged both into l2cap_get_sock_by_psm(). The socket lock now should be hold outside of l2cap_get_sock_by_psm() once we hold and release it inside the same function now. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:42 -02:00
Andrei Emeltchenko	cc11b9c14d	Bluetooth: do not use assignment in if condition Fix checkpatch errors like: "ERROR: do not use assignment in if condition" Simplify code and fix one long line. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com> Acked-by: Ville Tervo <ville.tervo@nokia.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:36 -02:00
Andrei Emeltchenko	940a9eea80	Bluetooth: timer check sk is not owned before freeing In timer context we might delete l2cap channel used by krfcommd. The check makes sure that sk is not owned. If sk is owned we restart timer for HZ/5. Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:36 -02:00
Andrei Emeltchenko	a49184c229	Bluetooth: Check sk is not owned before freeing l2cap_conn Check that socket sk is not locked in user process before removing l2cap connection handler. lock_sock and release_sock do not hold a normal spinlock directly but instead hold the owner field. This means bh_lock_sock can still execute even if the socket is "locked". More info can be found here: http://www.linuxfoundation.org/collaborate/workgroups/networking/socketlocks krfcommd kernel thread may be preempted with l2cap tasklet which remove l2cap_conn structure. If krfcommd is in process of sending of RFCOMM reply (like "RFCOMM UA" reply to "RFCOMM DISC") then kernel crash happens. ... [ 694.175933] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 694.184936] pgd = c0004000 [ 694.187683] [00000000] *pgd=00000000 [ 694.191711] Internal error: Oops: 5 [#1] PREEMPT [ 694.196350] last sysfs file: /sys/devices/platform/hci_h4p/firmware/hci_h4p/loading [ 694.260375] CPU: 0 Not tainted (2.6.32.10 #1) [ 694.265106] PC is at l2cap_sock_sendmsg+0x43c/0x73c [l2cap] [ 694.270721] LR is at 0xd7017303 ... [ 694.525085] Backtrace: [ 694.527587] [<bf266be0>] (l2cap_sock_sendmsg+0x0/0x73c [l2cap]) from [<c02f2cc8>] (sock_sendmsg+0xb8/0xd8) [ 694.537292] [<c02f2c10>] (sock_sendmsg+0x0/0xd8) from [<c02f3044>] (kernel_sendmsg+0x48/0x80) Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:36 -02:00
Vasiliy Kulikov	d31dbf6e59	Bluetooth: hidp: fix information leak to userland Structure hidp_conninfo is copied to userland with version, product, vendor and name fields unitialized if both session->input and session->hid are NULL. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:36 -02:00
Vasiliy Kulikov	3185fbd9d7	Bluetooth: cmtp: fix information leak to userland Structure cmtp_conninfo is copied to userland with some padding fields unitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:35 -02:00
Vasiliy Kulikov	5520d20f68	Bluetooth: bnep: fix information leak to userland Structure bnep_conninfo is copied to userland with the field "device" that has the last elements unitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:35 -02:00
Johan Hedberg	127178d24c	Bluetooth: Automate remote name requests In Bluetooth there are no automatic updates of remote device names when they get changed on the remote side. Instead, it is a good idea to do a manual name request when a new connection gets created (for whatever reason) since at this point it is very cheap (no costly baseband connection creation needed just for the sake of the name request). So far userspace has been responsible for this extra name request but tighter control is needed in order not to flood Bluetooth controllers with two many commands during connection creation. It has been shown that some controllers simply fail to function correctly if they get too many (almost) simultaneous commands during connection creation. The simplest way to acheive better control of these commands is to move their sending completely to the kernel side. This patch inserts name requests into the sequence of events that the kernel performs during connection creation. It does this after the remote features have been successfully requested and before any pending authentication requests are performed. The code will work sub-optimally with userspace versions that still do the name requesting themselves (it shouldn't break anything though) so it is recommended to combine this with a userspace software version that doesn't have automated name requests. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:35 -02:00
Johan Hedberg	392599b95d	Bluetooth: Create a unified authentication request function This patch adds a single function that's responsible for requesting authentication for outgoing connections. This is preparation for the next patch which will add automated name requests and thereby move the authentication requests to a different location. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:35 -02:00
Johan Hedberg	ccd556fe33	Bluetooth: Simplify remote features callback function logic The current remote and remote extended features event callbacks logic can be made simpler by using a label and goto statements instead of the current multiple levels of nested if statements. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-12-01 21:04:35 -02:00
Gustavo F. Padovan	17f490bced	Merge git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6 into test	2010-12-01 21:04:09 -02:00
David McCullough	6dcdd1b369	net/ipv6/sit.c: return unhandled skb to tunnel4_rcv I found a problem using an IPv6 over IPv4 tunnel. When CONFIG_IPV6_SIT was enabled, the packets would be rejected as net/ipv6/sit.c was catching all IPPROTO_IPV6 packets and returning an ICMP port unreachable error. I think this patch fixes the problem cleanly. I believe the code in net/ipv4/tunnel4.c:tunnel4_rcv takes care of it properly if none of the handlers claim the skb. Signed-off-by: David McCullough <david_mccullough@mcafee.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-01 13:19:34 -08:00
stephen hemminger	8afe7c8acd	ipip: add module alias for tunl0 tunnel device If ipip is built as a module the 'ip tunnel add' command would fail because the ipip module was not being autoloaded. Adding an alias for the tunl0 device name cause dev_load() to autoload it when needed. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-01 12:53:23 -08:00
stephen hemminger	4da6a738ff	gre: add module alias for gre0 tunnel device If gre is built as a module the 'ip tunnel add' command would fail because the ip_gre module was not being autoloaded. Adding an alias for the gre0 device name cause dev_load() to autoload it when needed. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-01 12:53:22 -08:00
stephen hemminger	407d6fcbfd	gre: minor cleanups Use strcpy() rather the sprintf() for the case where name is getting generated. Fix indentation. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-01 12:53:22 -08:00
Eric Dumazet	f2cd2d3e9b	net sched: use xps information for qdisc NUMA affinity Allocate qdisc memory according to NUMA properties of cpus included in xps map. To be effective, qdisc should be (re)setup after changes of /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus I added a numa_node field in struct netdev_queue, containing NUMA node if all cpus included in xps_cpus share same node, else -1. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-01 12:47:42 -08:00
Anders Franzen	381601e5bb	Make the ip6_tunnel reflect the true mtu. The ip6_tunnel always assumes it consumes 40 bytes (ip6 hdr) of the mtu of the underlaying device. So for a normal ethernet bearer, the mtu of the ip6_tunnel is 1460. However, when creating a tunnel the encap limit option is enabled by default, and it consumes 8 bytes more, so the true mtu shall be 1452. I dont really know if this breaks some statement in some RFC, so this is a request for comments. Signed-off-by: Anders Franzen <anders.franzen@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-01 10:55:47 -08:00
David S. Miller	3f419d2d48	inet: Turn ->remember_stamp into ->get_peer in connection AF ops. Then we can make a completely generic tcp_remember_stamp() that uses ->get_peer() as a helper, minimizing the AF specific code and minimizing the eventual code duplication when we implement the ipv6 side of TW recycling. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-30 12:28:06 -08:00
David S. Miller	b341936380	ipv6: Add infrastructure to bind inet_peer objects to routes. They are only allowed on cached ipv6 routes. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-30 12:27:11 -08:00
David S. Miller	021e929911	inetpeer: Add v6 peers tree, abstract root properly. Add the ipv6 peer tree instance, and adapt remaining direct references to 'v4_peers' as needed. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-30 12:12:23 -08:00
David S. Miller	0266304502	inetpeer: Abstract address comparisons. Now v4 and v6 addresses will both work properly. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-30 12:08:53 -08:00
David S. Miller	b534ecf1cd	inetpeer: Make inet_getpeer() take an inet_peer_adress_t pointer. And make an inet_getpeer_v4() helper, update callers. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-30 11:54:19 -08:00
David S. Miller	582a72da9a	inetpeer: Introduce inet_peer_address_t. Currently only the v4 aspect is used, but this will change. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-30 11:53:55 -08:00
David S. Miller	98158f5a85	inetpeer: Abstract out the tree root accesses. Instead of directly accessing "peer", change to code to operate using a "struct inet_peer_base *" pointer. This will facilitate the addition of a seperate tree for ipv6 peer entries. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-30 11:41:59 -08:00
Helmut Schaa	08ca944eb2	mac80211: Minor optimization in ieee80211_rx_h_data Remove a superfluous ieee80211_is_data check as that was checked a few lines before already and we wont't get here for non-data frames at all. Second, the frame was already converted to 802.3 header format and reading the fc and addr1 fields was only possible because the 802.3 header is short enough and didn't overwrite the relevant parts of the 802.11 header. Make the code more obvious by checking the ethernet header's h_dest field. Furthermore reorder the conditions to reduce the number of checks when dynamic powersave is not needed (AP mode for example). Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-30 13:58:07 -05:00
Senthil Balasubramanian	8e26d5ad2f	mac80211: Fix STA disconnect due to MIC failure Th commit titled "mac80211: clean up rx handling wrt. found_sta" removed found_sta variable which caused a MIC failure event to be reported twice for a single failure to supplicant resulted in STA disconnect. This should fix WPA specific countermeasures WiFi test case (5.2.17) issues with mac80211 based drivers which report MIC failure events in rx status. Cc: Stable <stable@kernel.org> (2.6.37) Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-30 13:45:02 -05:00
Christian Lamparter	2c31333a8f	mac80211: ignore non-bcast mcast deauth/disassoc franes This patch fixes an curious issue due to insufficient rx frame filtering. Saqeb Akhter reported frequent disconnects while streaming videos over samba: <http://marc.info/?m=128600031109136> > [ 1166.512087] wlan1: deauthenticated from 30:46:9a:10:49:f7 (Reason: 7) > [ 1526.059997] wlan1: deauthenticated from 30:46:9a:10:49:f7 (Reason: 7) > [ 2125.324356] wlan1: deauthenticated from 30:46:9a:10:49:f7 (Reason: 7) > [...] The reason is that the device generates frames with slightly bogus SA/TA addresses. e.g.: [ 2314.402316] Ignore 9f:1f:31:f8:64:ff [ 2314.402321] Ignore 9f:1f:31:f8:64:ff [ 2352.453804] Ignore 0d:1f:31:f8:64:ff [ 2352.453808] Ignore 0d:1f:31:f8:64:ff ^^ the group-address flag is set! (the correct SA/TA would be: 00:1f:31:f8:64:ff) Since the AP does not know from where the frames come, it generates a DEAUTH response for the (invalid) mcast address. This mcast deauth frame then passes through all filters and tricks the stack into thinking that the AP brutally kicked us! This patch fixes the problem by simply ignoring non-broadcast, group-addressed deauth/disassoc frames. Cc: Jouni Malinen <j@w1.fi> Cc: Johannes Berg <johannes@sipsolutions.net> Reported-by: Saqeb Akhter <saqeb.akhter@gmail.com> Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-30 13:23:06 -05:00
Linus Torvalds	a01af8e4a4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits) af_unix: limit recursion level pch_gbe driver: The wrong of initializer entry pch_gbe dreiver: chang author ucc_geth: fix ucc halt problem in half duplex mode inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners ehea: Add some info messages and fix an issue hso: fix disable_net NET: wan/x25_asy, move lapb_unregister to x25_asy_close_tty cxgb4vf: fix setting unicast/multicast addresses ... net, ppp: Report correct error code if unit allocation failed DECnet: don't leak uninitialized stack byte au1000_eth: fix invalid address accessing the MAC enable register dccp: fix error in updating the GAR tcp: restrict net.ipv4.tcp_adv_win_scale (#20312) netns: Don't leak others' openreq-s in proc Net: ceph: Makefile: Remove unnessary code vhost/net: fix rcu check usage econet: fix CVE-2010-3848 econet: fix CVE-2010-3850 econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849 ...	2010-11-29 14:36:33 -08:00
Johannes Berg	dd318575ff	mac80211: fix RX aggregation locking The RX aggregation locking documentation was wrong, which led Christian to also code the timer timeout handling for it somewhat wrongly. Fix the documentation, the two places that need to hold the reorder lock across accesses to the structure, and the debugfs code that should just use RCU. Also, remove acquiring the sta->lock across reorder timeouts since it isn't necessary, and change a few places to GFP_KERNEL because the code path here doesn't need atomic allocations as I noticed when reviewing all this. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-29 15:30:30 -05:00
Johannes Berg	f30221e4ec	mac80211: implement off-channel mgmt TX This implements the new off-channel TX API in mac80211 with a new work item type. The operation doesn't add a new work item when we're on the right channel and there's no wait time so that for example p2p probe responses will be transmitted without delay. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-29 15:24:35 -05:00
Johannes Berg	f7ca38dfe5	nl80211/cfg80211: extend mgmt-tx API for off-channel With p2p, it is sometimes necessary to transmit a frame (typically an action frame) on another channel than the current channel. Enable this through the CMD_FRAME API, and allow it to wait for a response. A new command allows that wait to be aborted. However, allow userspace to specify whether or not it wants to allow off-channel TX, it may actually want to use the same channel only. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-29 15:24:35 -05:00
Jouni Malinen	7dff312553	mac80211: Fix frame injection using non-AP vif In order for frame injection to work properly for some use cases (e.g., finding the station entry and keys for encryption), mac80211 needs to find the correct sdata entry. This works when the main vif is in AP mode, but commit `a2c1e3dad5` broke this particular use case for station main vif. While this type of injection is quite unusual operation, it has some uses and we should fix it. Do this by changing the monitor vif sdata selection to allow station vif to be selected instead of limiting it to just AP vifs. We still need to skip some iftypes to avoid selecting unsuitable vif for injection. Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-29 14:41:28 -05:00
David S. Miller	77148625e1	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2010-11-29 11:19:09 -08:00
Eric Dumazet	25888e3031	af_unix: limit recursion level Its easy to eat all kernel memory and trigger NMI watchdog, using an exploit program that queues unix sockets on top of others. lkml ref : http://lkml.org/lkml/2010/11/25/8 This mechanism is used in applications, one choice we have is to have a recursion limit. Other limits might be needed as well (if we queue other types of files), since the passfd mechanism is currently limited by socket receive queue sizes only. Add a recursion_level to unix socket, allowing up to 4 levels. Each time we send an unix socket through sendfd mechanism, we copy its recursion level (plus one) to receiver. This recursion level is cleared when socket receive queue is emptied. Reported-by: Марк Коренберг <socketpair@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-29 09:45:15 -08:00
Eric Dumazet	a417786948	xps: add __rcu annotations Avoid sparse warnings : add __rcu annotations and use rcu_dereference_protected() where necessary. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-29 09:43:13 -08:00
Eric Dumazet	b02038a17b	xps: NUMA allocations for per cpu data store_xps_map() allocates maps that are used by single cpu, it makes sense to use NUMA allocations. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-29 09:43:13 -08:00
Tom Herbert	bf26414510	xps: Add CONFIG_XPS This patch adds XPS_CONFIG option to enable and disable XPS. This is done in the same manner as RPS_CONFIG. This is also fixes build failure in XPS code when SMP is not enabled. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 18:24:14 -08:00
Nagendra Tomar	b4ff3c90e6	inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners inet sockets corresponding to passive connections are added to the bind hash using ___inet_inherit_port(). These sockets are later removed from the bind hash using __inet_put_port(). These two functions are not exactly symmetrical. __inet_put_port() decrements hashinfo->bsockets and tb->num_owners, whereas ___inet_inherit_port() does not increment them. This results in both of these going to -ve values. This patch fixes this by calling inet_bind_hash() from ___inet_inherit_port(), which does the right thing. 'bsockets' and 'num_owners' were introduced by commit `a9d8f9110d` (inet: Allowing more than 64k connections and heavily optimize bind(0)) Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 18:18:44 -08:00
Dan Rosenberg	3c6f27bf33	DECnet: don't leak uninitialized stack byte A single uninitialized padding byte is leaked to userspace. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> CC: stable <stable@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 11:32:30 -08:00
Gerrit Renker	0ac7887022	dccp: fix error in updating the GAR This fixes a bug in updating the Greatest Acknowledgment number Received (GAR): the current implementation does not track the greatest received value - lower values in the range AWL..AWH (RFC 4340, 7.5.1) erase higher ones. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 11:29:27 -08:00
Jozsef Kadlecsik	82a39eb6b3	ipv6: Prepare the tree for un-inlined jhash. jhash is widely used in the kernel and because the functions are inlined, the cost in size is significant. Also, the new jhash functions are slightly larger than the previous ones so better un-inline. As a preparation step, the calls to the internal macros are replaced with the plain jhash function calls. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 11:26:21 -08:00
andrew hendry	3f0a069a1d	X25 remove bkl in call user data length ioctl Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 11:12:22 -08:00
andrew hendry	74a7e44080	X25 remove bkl from causediag ioctls Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 11:12:21 -08:00
andrew hendry	5b7958dfa5	X25 remove bkl from calluserdata ioctls Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 11:12:21 -08:00
andrew hendry	f90de66067	X25 remove bkl in facility ioctls Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 11:12:20 -08:00
andrew hendry	5595a1a599	X25 remove bkl in subscription ioctls Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 11:12:20 -08:00
John Fastabend	19eb5cc559	8021q: vlan device is lockless do not transfer real_num_{tx\|rx}_queues Now that the vlan device is lockless and single queue do not transfer the real num queues. This is causing a BUG_ON to occur. kernel BUG at net/8021q/vlan.c:345! Call Trace: [<ffffffff813fd6e8>] ? fib_rules_event+0x28/0x1b0 [<ffffffff814ad2b5>] notifier_call_chain+0x55/0x80 [<ffffffff81089156>] raw_notifier_call_chain+0x16/0x20 [<ffffffff813e5af7>] call_netdevice_notifiers+0x37/0x70 [<ffffffff813e6756>] netdev_features_change+0x16/0x20 [<ffffffffa02995be>] ixgbe_fcoe_enable+0xae/0x100 [ixgbe] [<ffffffffa01da06a>] vlan_dev_fcoe_enable+0x2a/0x30 [8021q] [<ffffffffa02d08c3>] fcoe_create+0x163/0x630 [fcoe] [<ffffffff811244d5>] ? mmap_region+0x255/0x5a0 [<ffffffff81080ef0>] param_attr_store+0x50/0x80 [<ffffffff810809b6>] module_attr_store+0x26/0x30 [<ffffffff811b9db2>] sysfs_write_file+0xf2/0x180 [<ffffffff8114fc88>] vfs_write+0xc8/0x190 [<ffffffff81150621>] sys_write+0x51/0x90 [<ffffffff8100c0b2>] system_call_fastpath+0x16/0x1b Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 10:47:19 -08:00
Eric Dumazet	5a0d2268d2	net: add netif_tx_queue_frozen_or_stopped When testing struct netdev_queue state against FROZEN bit, we also test XOFF bit. We can test both bits at once and save some cycles. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 10:47:18 -08:00
Shan Wei	d3c15cab21	ipv6: kill two unused macro definition 1. IPV6_TLV_TEL_DST_SIZE This has not been using for several years since created. 2. RT6_INFO_LEN commit `33120b30` kill all RT6_INFO_LEN's references, but only this definition remained. commit `33120b30cc` Author: Alexey Dobriyan <adobriyan@sw.ru> Date: Tue Nov 6 05:27:11 2007 -0800 [IPV6]: Convert /proc/net/ipv6_route to seq_file interface Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 10:47:17 -08:00
Uwe Kleine-König	a40c9f88b5	net: add some KERN_CONT markers to continuation lines Cc: netdev@vger.kernel.org Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 10:47:17 -08:00
Alexey Dobriyan	0147fc058d	tcp: restrict net.ipv4.tcp_adv_win_scale (#20312 ) tcp_win_from_space() does the following: if (sysctl_tcp_adv_win_scale <= 0) return space >> (-sysctl_tcp_adv_win_scale); else return space - (space >> sysctl_tcp_adv_win_scale); "space" is int. As per C99 6.5.7 (3) shifting int for 32 or more bits is undefined behaviour. Indeed, if sysctl_tcp_adv_win_scale is exactly 32, space >> 32 equals space and function returns 0. Which means we busyloop in tcp_fixup_rcvbuf(). Restrict net.ipv4.tcp_adv_win_scale to [-31, 31]. Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312 Steps to reproduce: echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale wget www.kernel.org [softlockup] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 10:39:45 -08:00
Pavel Emelyanov	8475ef9fd1	netns: Don't leak others' openreq-s in proc The /proc/net/tcp leaks openreq sockets from other namespaces. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-27 22:57:48 -08:00
Thomas Graf	cf7afbfeb8	rtnl: make link af-specific updates atomic As David pointed out correctly, updates to af-specific attributes are currently not atomic. If multiple changes are requested and one of them fails, previous updates may have been applied already leaving the link behind in a undefined state. This patch splits the function parse_link_af() into two functions validate_link_af() and set_link_at(). validate_link_af() is placed to validate_linkmsg() check for errors as early as possible before any changes to the link have been made. set_link_af() is called to commit the changes later. This method is not fail proof, while it is currently sufficient to make set_link_af() inerrable and thus 100% atomic, the validation function method will not be able to detect all error scenarios in the future, there will likely always be errors depending on states which are f.e. not protected by rtnl_mutex and thus may change between validation and setting. Also, instead of silently ignoring unknown address families and config blocks for address families which did not register a set function the errors EAFNOSUPPORT respectively EOPNOSUPPORT are returned to avoid comitting 4 out of 5 update requests without notifying the user. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-27 22:56:08 -08:00
Tracey Dent	4cb6a614ba	Net: ceph: Makefile: Remove unnessary code Remove the if and else conditional because the code is in mainline and there is no need in it being there. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-27 17:39:29 -08:00
Linus Torvalds	19650e8580	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Ensure we return the dirent->d_type when it is known NFS: Correct the array bound calculation in nfs_readdir_add_to_array NFS: Don't ignore errors from nfs_do_filldir() NFS: Fix the error handling in "uncached_readdir()" NFS: Fix a page leak in uncached_readdir() NFS: Fix a page leak in nfs_do_filldir() NFS: Assume eof if the server returns no readdir records NFS: Buffer overflow in ->decode_dirent() should not be fatal Pure nfs client performance using odirect. SUNRPC: Fix an infinite loop in call_refresh/call_refreshresult	2010-11-27 07:30:30 +09:00
Hans Schillstrom	b880c1f077	IPVS: Backup, adding version 0 sending capabilities This patch adds a sysclt net.ipv4.vs.sync_version that can be used to send sync msg in version 0 or 1 format. sync_version value is logical, Value 1 (default) New version 0 Plain old version Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-25 10:42:59 +09:00
Hans Schillstrom	986a075795	IPVS: Backup, Change sending to Version 1 format Enable sending and removal of version 0 sending Affected functions, ip_vs_sync_buff_create() ip_vs_sync_conn() ip_vs_core.c removal of IPv4 check. v5 Just check cp->pe_data_len in ip_vs_sync_conn Check if padding needed before adding a new sync_conn to the buffer, i.e. avoid sending padding at the end. v4 moved sanity check and pe_name_len after sloop. use cp->pe instead of cp->dest->svc->pe real length in each sync_conn, not padded length however total size of a sync_msg includes padding. *v3 Sending ip_vs_sync_conn_options in network order. Sending Templates for ONE_PACKET conn. Renaming of ip_vs_sync_mesg to ip_vs_sync_mesg_v0 Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-25 10:42:59 +09:00
Hans Schillstrom	fe5e7a1efb	IPVS: Backup, Adding Version 1 receive capability Functionality improvements * flags changed from 16 to 32 bits * fwmark added (32 bits) * timeout in sec. added (32 bits) * pe data added (Variable length) * IPv6 capabilities (3x16 bytes for addr.) * Version and type in every conn msg. ip_vs_process_message() now handles Version 1 messages and will call ip_vs_process_message_v0() for version 0 messages. ip_vs_proc_conn() is common for both version, and handles the update of connection hash. ip_vs_conn_fill_param_sync() - Version 1 messages only ip_vs_conn_fill_param_sync_v0() - Version 0 messages only Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-25 10:42:59 +09:00
Hans Schillstrom	2981bc9a63	IPVS: Backup, Adding structs for new sync format New structs defined for version 1 of sync. * ip_vs_sync_v4 Ipv4 base format struct * ip_vs_sync_v6 Ipv6 base format struct Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-25 10:42:59 +09:00
Hans Schillstrom	a5959d53d6	IPVS: Handle Scheduling errors. If ip_vs_conn_fill_param_persist return an error to ip_vs_sched_persist, this error must propagate as ignored=-1 to ip_vs_schedule(). Errors from ip_vs_conn_new() in ip_vs_sched_persist() and ip_vs_schedule() should also return ignored=-1; This patch just relies on the fact that ignored is 1 before calling ip_vs_sched_persist(). Sent from Julian: "The new case when ip_vs_conn_fill_param_persist fails should set ignored = -1, so that we can use NF_DROP, see below. ignored = -1 should be also used for ip_vs_conn_new failure in ip_vs_sched_persist() and ip_vs_schedule(). The new negative value should be handled in tcp,udp,sctp" "To summarize: - ignored = 1: protocol tried to schedule (eg. on SYN), found svc but the svc/scheduler decides that this packet should be accepted with NF_ACCEPT because it must not be scheduled. - ignored = 0: scheduler can not find destination, so try bypass or return ICMP and then NF_DROP (ip_vs_leave). - ignored = -1: scheduler tried to schedule but fatal error occurred, eg. ip_vs_conn_new failure (ENOMEM) or ip_vs_sip_fill_param failure such as missing Call-ID, ENOMEM on skb_linearize or pe_data. In this case we should return NF_DROP without any attempts to send ICMP with ip_vs_leave." More or less all ideas and input to this patch is work from Julian Anastasov Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-25 10:42:59 +09:00
Hans Schillstrom	3716522653	IPVS: skb defrag in L7 helpers L7 helpers like sip needs skb defrag since L7 data can be fragmented. This patch requires "IPVS Break ports-2 into src_port and dst_port" patch Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-25 10:42:58 +09:00
Hans Schillstrom	ce144f249f	IPVS: Split ports[2] into src_port and dst_port Avoid sending invalid pointer due to skb_linearize() call. This patch prepares for next patch where skb_linearize is a part. In ip_vs_sched_persist() params the ports ptr will be replaced by src and dst port. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-25 10:42:58 +09:00
Hans Schillstrom	0e051e683b	IPVS: Backup, Prepare for transferring firewall marks (fwmark) to the backup daemon. One struct will have fwmark added: * ip_vs_conn ip_vs_conn_new() and ip_vs_find_dest() will have an extra param - fwmark The effects of that, is in this patch. Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-25 10:42:58 +09:00
John W. Linville	51cce8a590	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem	2010-11-24 16:49:20 -05:00
Johannes Berg	99ba2a1428	mac80211: implement packet loss notification For drivers that have accurate TX status reporting we can report the number of consecutive lost packets to userspace using the new cfg80211 CQM event. The threshold is fixed right now, this may need to be improved in the future. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-24 16:19:36 -05:00
Johannes Berg	c063dbf52b	cfg80211: allow using CQM event to notify packet loss This adds the ability for drivers to use CQM events to notify about packet loss for specific stations (which could be the AP for the managed mode case). Since the threshold might be determined by the driver (it isn't passed in right now) it will be passed out of the driver to userspace in the event. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-24 16:19:36 -05:00
Luis R. Rodriguez	48124d1a91	mac80211: avoid aggregation for VO traffic This should help with latency issues which can happen when using aggregation. Cc: Felix Fietkau <nbd@openwrt.org> Cc: Matt Smith <matt.smith@atheros.com> Cc: Senthil Balasubramanian <senthilkumar@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-24 16:19:36 -05:00
Felix Fietkau	72a8a3edd6	mac80211: reduce the number of retries for nullfunc probing Since nullfunc frames are transmitted as unicast frames, they're more reliable than the broadcast probe requests, so we need fewer retries to figure out whether the AP is really gone. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-24 16:19:35 -05:00
Felix Fietkau	4e5ff37692	mac80211: use nullfunc instead of probe request for connection monitoring nullfunc frames are better for connection monitoring, because probe requests are answered even if the AP has already dropped the connection, whereas nullfunc frames from an unassociated station will trigger a disassoc/deauth frame from the AP (WLAN_REASON_CLASS3_FRAME_FROM_NONASSOC_STA), which allows the station to reconnect immediately instead of waiting until it attempts to transmit the next unicast frame. This only works on hardware with reliable tx ACK reporting, any other hardware needs to fall back to the probe request method. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-24 16:19:35 -05:00
Felix Fietkau	dd5b4cc71c	cfg80211/mac80211: improve ad-hoc multicast rate handling - store the multicast rate as an index instead of the rate value (reduces cpu overhead in a hotpath) - validate the rate values (must match a bitrate in at least one sband) Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-24 16:19:35 -05:00
Felix Fietkau	46090979a5	mac80211: probe the AP when resuming Check the connection by probing the AP (either using nullfunc or a probe request). If nullfunc probing is supported and the assoc is no longer valid, the AP will send a disassoc/deauth immediately. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-24 16:19:34 -05:00
Felix Fietkau	7ccc8bd759	mac80211: calculate beacon loss time accurately Instead of using a fixed 2 second timeout, calculate beacon loss interval from the advertised beacon interval and a frame count. With this beacon loss happens after N (default 7) consecutive frames are missed which for a typical setup (100TU beacon interval) is ~700ms (or ~1/3 previous). Signed-off-by: Sam Leffler <sleffler@chromium.org> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-24 16:19:34 -05:00
Felix Fietkau	c8a7972c3b	mac80211: restart beacon miss timer on system resume from suspend Signed-off-by: Paul Stewart <pstew@google.com> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-24 16:19:34 -05:00
Joe Perches	e9c0268f02	net/wireless: Use pr_<level> and netdev_<level> No change in output for pr_<level> prefixes. netdev_<level> output is different, arguably improved. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-24 16:19:33 -05:00
John W. Linville	d7a066c923	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-11-24 16:19:24 -05:00
John W. Linville	ccb1435401	Revert "nl80211/mac80211: Report signal average" This reverts commit `86107fd170`. This patch inadvertantly changed the userland ABI. Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-24 16:18:36 -05:00
Phil Blundell	a27e13d370	econet: fix CVE-2010-3848 Don't declare variable sized array of iovecs on the stack since this could cause stack overflow if msg->msgiovlen is large. Instead, coalesce the user-supplied data into a new buffer and use a single iovec for it. Signed-off-by: Phil Blundell <philb@gnu.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-24 11:51:47 -08:00
Phil Blundell	16c41745c7	econet: fix CVE-2010-3850 Add missing check for capable(CAP_NET_ADMIN) in SIOCSIFADDR operation. Signed-off-by: Phil Blundell <philb@gnu.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-24 11:49:53 -08:00
Phil Blundell	fa0e846494	econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849 Later parts of econet_sendmsg() rely on saddr != NULL, so return early with EINVAL if NULL was passed otherwise an oops may occur. Signed-off-by: Phil Blundell <philb@gnu.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-24 11:49:19 -08:00
David S. Miller	c39508d6f1	tcp: Make TCP_MAXSEG minimum more correct. Use TCP_MIN_MSS instead of constant 64. Reported-by: Min Zhang <mzhang@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-24 11:47:22 -08:00
Tom Herbert	1d24eb4815	xps: Transmit Packet Steering This patch implements transmit packet steering (XPS) for multiqueue devices. XPS selects a transmit queue during packet transmission based on configuration. This is done by mapping the CPU transmitting the packet to a queue. This is the transmit side analogue to RPS-- where RPS is selecting a CPU based on receive queue, XPS selects a queue based on the CPU (previously there was an XPS patch from Eric Dumazet, but that might more appropriately be called transmit completion steering). Each transmit queue can be associated with a number of CPUs which will use the queue to send packets. This is configured as a CPU mask on a per queue basis in: /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus The mappings are stored per device in an inverted data structure that maps CPUs to queues. In the netdevice structure this is an array of num_possible_cpu structures where each structure holds and array of queue_indexes for queues which that CPU can use. The benefits of XPS are improved locality in the per queue data structures. Also, transmit completions are more likely to be done nearer to the sending thread, so this should promote locality back to the socket on free (e.g. UDP). The benefits of XPS are dependent on cache hierarchy, application load, and other factors. XPS would nominally be configured so that a queue would only be shared by CPUs which are sharing a cache, the degenerative configuration woud be that each CPU has it's own queue. Below are some benchmark results which show the potential benfit of this patch. The netperf test has 500 instances of netperf TCP_RR test with 1 byte req. and resp. bnx2x on 16 core AMD XPS (16 queues, 1 TX queue per CPU) 1234K at 100% CPU No XPS (16 queues) 996K at 100% CPU Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-24 11:44:20 -08:00
Tom Herbert	3853b5841c	xps: Improvements in TX queue selection In dev_pick_tx, don't do work in calculating queue index or setting the index in the sock unless the device has more than one queue. This allows the sock to be set only with a queue index of a multi-queue device which is desirable if device are stacked like in a tunnel. We also allow the mapping of a socket to queue to be changed. To maintain in order packet transmission a flag (ooo_okay) has been added to the sk_buff structure. If a transport layer sets this flag on a packet, the transmit queue can be changed for the socket. Presumably, the transport would set this if there was no possbility of creating OOO packets (for instance, there are no packets in flight for the socket). This patch includes the modification in TCP output for setting this flag. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-24 11:44:19 -08:00
Eric Dumazet	bba14de987	scm: lower SCM_MAX_FD Lower SCM_MAX_FD from 255 to 253 so that allocations for scm_fp_list are halved. (commit `f8d570a4` added two pointers in this structure) scm_fp_dup() should not copy whole structure (and trigger kmemcheck warnings), but only the used part. While we are at it, only allocate needed size. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-24 11:16:43 -08:00
Eric Dumazet	456b61bca8	ipv6: mcast: RCU conversion ipv6_sk_mc_lock rwlock becomes a spinlock. readers (inet6_mc_check()) now takes rcu_read_lock() instead of read lock. Writers dont need to disable BH anymore. struct ipv6_mc_socklist objects are reclaimed after one RCU grace period. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-24 11:16:42 -08:00
Eric Dumazet	9915672d41	af_unix: limit unix_tot_inflight Vegard Nossum found a unix socket OOM was possible, posting an exploit program. My analysis is we can eat all LOWMEM memory before unix_gc() being called from unix_release_sock(). Moreover, the thread blocked in unix_gc() can consume huge amount of time to perform cleanup because of huge working set. One way to handle this is to have a sensible limit on unix_tot_inflight, tested from wait_for_unix_gc() and to force a call to unix_gc() if this limit is hit. This solves the OOM and also reduce overall latencies, and should not slowdown normal workloads. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-24 09:15:27 -08:00
Linus Torvalds	3cbaa0f7a7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: of/phylib: Use device tree properties to initialize Marvell PHYs. phylib: Add support for Marvell 88E1149R devices. phylib: Use common page register definition for Marvell PHYs. qlge: Fix incorrect usage of module parameters and netdev msg level ipv6: fix missing in6_ifa_put in addrconf SuperH IrDA: correct Baud rate error correction atl1c: Fix hardware type check for enabling OTP CLK net: allow GFP_HIGHMEM in __vmalloc() bonding: change list contact to netdev@vger.kernel.org e1000: fix screaming IRQ	2010-11-24 08:22:34 +09:00
Helmut Schaa	18890d4b89	mac80211: Disable hw crypto for GTKs on AP VLAN interfaces When using AP VLAN interfaces, each VLAN interface should be in its own broadcast domain. Hostapd achieves this by assigning different GTKs to different AP VLAN interfaces. However, mac80211 drivers are not aware of AP VLAN interfaces and as such mac80211 sends the GTK to the driver in the context of the base AP mode interface. This causes problems when multiple AP VLAN interfaces are used since the driver will use the same key slot for the different GTKs (there's no way for the driver to distinguish the different GTKs from different AP VLAN interfaces). Thus, only the clients associated to one AP VLAN interface (the one that was created last) can actually use broadcast traffic. Fix this by not programming any GTKs for AP VLAN interfaces into the hw but fall back to using software crypto. The GTK for the underlying AP interface is still sent to the driver. That means, broadcast traffic to stations associated to an AP VLAN interface is encrypted in software whereas broadcast traffic to stations associated to the non-VLAN AP interface is encrypted in hardware. Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-22 15:48:51 -05:00
Luis R. Rodriguez	b2e253cf30	cfg80211: Fix regulatory bug with multiple cards and delays When two cards are connected with the same regulatory domain if CRDA had a delayed response then cfg80211's own set regulatory domain would still be the world regulatory domain. There was a bug on cfg80211's logic such that it assumed that once you pegged a request as the last request it was already the currently set regulatory domain. This would mean we would race setting a stale regulatory domain to secondary cards which had the same regulatory domain since the alpha2 would match. We fix this by processing each regulatory request atomically, and only move on to the next one once we get it fully processed. In the case CRDA is not present we will simply world roam. This issue is only present when you have a slow system and the CRDA processing is delayed. Because of this it is not a known regression. Without this fix when a delay is present with CRDA the second card would end up with an intersected regulatory domain and not allow it to use the channels it really is designed for. When two cards with two different regulatory domains were inserted you'd end up rejecting the second card's regulatory domain request. This fails with mac80211_hswim's regtest=2 (two requests, same alpha2) and regtest=3 (two requests, different alpha2) module parameter options. This was reproduced and tested against mac80211_hwsim using this CRDA delayer: #!/bin/bash echo $COUNTRY >> /tmp/log sleep 2 /sbin/crda.orig And these regulatory tests: modprobe mac80211_hwsim regtest=2 modprobe mac80211_hwsim regtest=3 Reported-by: Mark Mentovai <mark@moxienet.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Tested-by: Mark Mentovai <mark@moxienet.com> Tested-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-22 15:48:51 -05:00
Luis R. Rodriguez	b0e2880b05	cfg80211: move mutex locking to reg_process_pending_hints() This will be required in the next patch and it makes the next patch easier to review. Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Tested-by: Mark Mentovai <mark@moxienet.com> Tested-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-22 15:48:50 -05:00
Luis R. Rodriguez	f333a7a2f4	cfg80211: move reg_work and reg_todo above These will be used earlier in the next few patches. Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Tested-by: Mark Mentovai <mark@moxienet.com> Tested-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-22 15:48:50 -05:00
Luis R. Rodriguez	31e99729ae	cfg80211: put core regulatory request into queue This will simplify the synchronization for pending requests. Without this we have a race between the core and when we restore regulatory settings, although this is unlikely its best to just avoid that race altogether. Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Tested-by: Mark Mentovai <mark@moxienet.com> Tested-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-22 15:48:50 -05:00
Gustavo F. Padovan	c89ad73722	Bluetooth: Fix not returning proper error in SCO Return 0 in that situation could lead to errors in the caller. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-11-22 18:23:18 -02:00
Trond Myklebust	5fc43978a7	SUNRPC: Fix an infinite loop in call_refresh/call_refreshresult If the rpcauth_refreshcred() call returns an error other than EACCES, ENOMEM or ETIMEDOUT, we currently end up looping forever between call_refresh and call_refreshresult. The correct thing to do here is to exit on all errors except EAGAIN and ETIMEDOUT, for which case we retry 3 times, then return EACCES. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-22 13:22:39 -05:00
Tracey Dent	e5700c740d	Net: wanrouter: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:16 -08:00
Tracey Dent	fdb26195f4	Net: sunrpc: auth_gss: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:16 -08:00
Tracey Dent	9ed05ad3c0	Net: rxrpc: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:15 -08:00
Tracey Dent	094f2faaa2	Net: rds: Makefile: Remove deprecated items Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Also, use the ccflags-$ flag instead of EXTRA_CFLAGS because EXTRA_CFLAGS is deprecated and should now be switched. Last but not least, took out if-conditionals. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:15 -08:00
Tracey Dent	927a41f50c	Net: phonet: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:14 -08:00
Tracey Dent	64387df8da	Net: lapb: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:14 -08:00
Tracey Dent	94ee288e94	Net: irda: irnet: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:13 -08:00
Tracey Dent	cd30c62024	Net: irda: irlan: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:13 -08:00
Tracey Dent	f91c4ae498	Net: irda: ircomm: Makefile: Remove deprecated kbuild goal defintions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:12 -08:00
Tracey Dent	4de58dfebe	Net: ipv6: netfiliter: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:12 -08:00
Tracey Dent	6b8ff8c517	Net: ipv4: netfilter: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:11 -08:00
Tracey Dent	cc0bdac399	Net: econet: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:11 -08:00
Tracey Dent	22674a24b4	Net: dns_resolver: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:10 -08:00
Tracey Dent	fa13bc3daa	Net: ceph: Makefile: remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:10 -08:00
Tracey Dent	bac14e0178	Net: can: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:09 -08:00
Tracey Dent	a3106d032f	Net: caif: Makefile: Remove deprecated items Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Also, use the ccflags-$ flag instead of EXTRA_CFLAGS because EXTRA_CFLAGS is deprecated and should now be switched. Last but not least, took out if-conditionals. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:09 -08:00
Tracey Dent	87217502d4	Net: bluetooth: Makefile: Remove deprecated kbuild goal definitions Changed Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and not mentioned in Documentation/kbuild/makefiles.txt. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 08:16:08 -08:00
John Fastabend	88b2a9a3d9	ipv6: fix missing in6_ifa_put in addrconf Fix ref count bug introduced by commit `2de7957072` Author: Lorenzo Colitti <lorenzo@google.com> Date: Wed Oct 27 18:16:49 2010 +0000 ipv6: addrconf: don't remove address state on ifdown if the address is being kept Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr refcnt correctly with in6_ifa_put(). Reported-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-22 07:37:36 -08:00
Eric Dumazet	551eaff1b3	pktgen: allow faster module unload Unloading pktgen module needs ~6 seconds on a 64 cpus machine, to stop 64 kthreads. Add a pktgen_exiting variable to let kernel threads die faster, so that kthread_stop() doesnt have to wait too long for them. This variable is not tested in fast path. Note : Before exiting from pktgen_thread_worker(), we must make sure kthread_stop() is waiting for this thread to be stopped, like its done in kernel/softirq.c Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-21 10:26:44 -08:00
Eric Dumazet	7a1c8e5ab1	net: allow GFP_HIGHMEM in __vmalloc() We forgot to use __GFP_HIGHMEM in several __vmalloc() calls. In ceph, add the missing flag. In fib_trie.c, xfrm_hash.c and request_sock.c, using vzalloc() is cleaner and allows using HIGHMEM pages as well. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-21 10:04:04 -08:00
Eric Dumazet	bbce5a59e4	packet: use vzalloc() alloc_one_pg_vec_page() is supposed to return zeroed memory, so use vzalloc() instead of vmalloc() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-21 10:01:42 -08:00
J. Bruce Fields	9c335c0b8d	svcrpc: fix wspace-checking race We call svc_xprt_enqueue() after something happens which we think may require handling from a server thread. To avoid such events being lost, svc_xprt_enqueue() must guarantee that there will be a svc_serv() call from a server thread following any such event. It does that by either waking up a server thread itself, or checking that XPT_BUSY is set (in which case somebody else is doing it). But the check of XPT_BUSY could occur just as someone finishes processing some other event, and just before they clear XPT_BUSY. Therefore it's important not to clear XPT_BUSY without subsequently doing another svc_export_enqueue() to check whether the xprt should be requeued. The xpo_wspace() check in svc_xprt_enqueue() breaks this rule, allowing an event to be missed in situations like: data arrives call svc_tcp_data_ready(): call svc_xprt_enqueue(): set BUSY find no write space svc_reserve(): free up write space call svc_enqueue(): test BUSY clear BUSY So, instead, check wspace in the same places that the state flags are checked: before taking BUSY, and in svc_receive(). Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-11-19 18:35:12 -05:00
J. Bruce Fields	b176331627	svcrpc: svc_close_xprt comment Neil Brown had to explain to me why we do this here; record the answer for posterity. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-11-19 18:35:12 -05:00
J. Bruce Fields	f8c0d226fe	svcrpc: simplify svc_close_all There's no need to be fooling with XPT_BUSY now that all the threads are gone. The list_del_init() here could execute at the same time as the svc_xprt_enqueue()'s list_add_tail(), with undefined results. We don't really care at this point, but it might result in a spurious list-corruption warning or something. And svc_close() isn't adding any value; just call svc_delete_xprt() directly. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-11-19 18:35:11 -05:00
J. Bruce Fields	ca7896cd83	nfsd4: centralize more calls to svc_xprt_received Follow up on `b48fa6b991` by moving all the svc_xprt_received() calls for the main xprt to one place. The clearing of XPT_BUSY here is critical to the correctness of the server, so I'd prefer it to be obvious where we do it. The only substantive result is moving svc_xprt_received() after svc_receive_deferred(). Other than a (likely insignificant) delay waking up the next thread, that should be harmless. Also reshuffle the exit code a little to skip a few other steps that we don't care about the in the svc_delete_xprt() case. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-11-19 18:35:11 -05:00
J. Bruce Fields	62bac4af3d	svcrpc: don't set then immediately clear XPT_DEFERRED There's no harm to doing this, since the only caller will immediately call svc_enqueue() afterwards, ensuring we don't miss the remaining deferred requests just because XPT_DEFERRED was briefly cleared. But why not just do this the simple way? Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-11-19 18:35:11 -05:00
Linus Torvalds	76db8ac45f	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix readdir EOVERFLOW on 32-bit archs ceph: fix frag offset for non-leftmost frags ceph: fix dangling pointer ceph: explicitly specify page alignment in network messages ceph: make page alignment explicit in osd interface ceph: fix comment, remove extraneous args ceph: fix update of ctime from MDS ceph: fix version check on racing inode updates ceph: fix uid/gid on resent mds requests ceph: fix rdcache_gen usage and invalidate ceph: re-request max_size if cap auth changes ceph: only let auth caps update max_size ceph: fix open for write on clustered mds ceph: fix bad pointer dereference in ceph_fill_trace ceph: fix small seq message skipping Revert "ceph: update issue_seq on cap grant"	2010-11-19 15:32:22 -08:00
Linus Torvalds	caf8394524	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (31 commits) net: fix kernel-doc for sk_filter_rcu_release be2net: Fix to avoid firmware update when interface is not open. netfilter: fix IP_VS dependencies net: irda: irttp: sync error paths of data- and udata-requests ipv6: Expose reachable and retrans timer values as msecs ipv6: Expose IFLA_PROTINFO timer values in msecs instead of jiffies 3c59x: fix build failure on !CONFIG_PCI ipg.c: remove id [SUNDANCE, 0x1021] net: caif: spi: fix potential NULL dereference ath9k_htc: Avoid setting QoS control for non-QoS frames net: zero kobject in rx_queue_release net: Fix duplicate volatile warning. MAINTAINERS: Add stmmac maintainer bonding: fix a race in IGMP handling cfg80211: fix can_beacon_sec_chan, reenable HT40 gianfar: fix signedness issue net: bnx2x: fix error value sign 8139cp: fix checksum broken r8169: fix checksum broken rds: Integer overflow in RDS cmsg handling ...	2010-11-19 15:25:59 -08:00
David S. Miller	24912420e9	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bonding/bond_main.c net/core/net-sysfs.c net/ipv6/addrconf.c	2010-11-19 13:13:47 -08:00
andrew hendry	0670b8ae66	X25: remove bkl in routing ioctls Routing doesn't use the socket data and is protected by x25_route_list_lock Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-19 12:40:02 -08:00
andrew hendry	54aafbd498	X25: remove bkl in inq and outq ioctls Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-19 12:40:01 -08:00
andrew hendry	1ecd66bf2c	X25: remove bkl in timestamp ioctls Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-19 12:40:01 -08:00
andrew hendry	70be998c2b	X25: pushdown bkl in ioctls Push down the bkl in the ioctls so they can be removed one at a time. Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-19 12:40:00 -08:00
Eric Dumazet	c26aed40f4	filter: use reciprocal divide At compile time, we can replace the DIV_K instruction (divide by a constant value) by a reciprocal divide. At exec time, the expensive divide is replaced by a multiply, a less expensive operation on most processors. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-19 10:06:54 -08:00
Eric Dumazet	8c1592d68b	filter: cleanup codes[] init Starting the translated instruction to 1 instead of 0 allows us to remove one descrement at check time and makes codes[] array init cleaner. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-19 10:04:23 -08:00
Eric Dumazet	93aaae2e01	filter: optimize sk_run_filter Remove pc variable to avoid arithmetic to compute fentry at each filter instruction. Jumps directly manipulate fentry pointer. As the last instruction of filter[] is guaranteed to be a RETURN, and all jumps are before the last instruction, we dont need to check filter bounds (number of instructions in filter array) at each iteration, so we remove it from sk_run_filter() params. On x86_32 remove f_k var introduced in commit `57fe93b374` (filter: make sure filters dont read uninitialized memory) Note : We could use a CONFIG_ARCH_HAS_{FEW\|MANY}_REGISTERS in order to avoid too many ifdefs in this code. This helps compiler to use cpu registers to hold fentry and A accumulator. On x86_32, this saves 401 bytes, and more important, sk_run_filter() runs much faster because less register pressure (One less conditional branch per BPF instruction) # size net/core/filter.o net/core/filter_pre.o text data bss dec hex filename 2948 0 0 2948 b84 net/core/filter.o 3349 0 0 3349 d15 net/core/filter_pre.o on x86_64 : # size net/core/filter.o net/core/filter_pre.o text data bss dec hex filename 5173 0 0 5173 1435 net/core/filter.o 5224 0 0 5224 1468 net/core/filter_pre.o Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-19 09:49:59 -08:00
Randy Dunlap	0302b8622c	net: fix kernel-doc for sk_filter_rcu_release Fix kernel-doc warning for sk_filter_rcu_release(): Warning(net/core/filter.c:586): missing initial short description on line: * sk_filter_rcu_release: Release a socket filter by rcu_head Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-19 09:27:15 -08:00
Patrick McHardy	dba4490d22	netfilter: fix IP_VS dependencies When NF_CONNTRACK is enabled, IP_VS uses conntrack symbols. Therefore IP_VS can't be linked statically when conntrack is built modular. Reported-by: Justin P. Mattock <justinmattock@gmail.com> Tested-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-18 13:14:33 -08:00
Wolfram Sang	925e277f52	net: irda: irttp: sync error paths of data- and udata-requests irttp_data_request() returns meaningful errorcodes, while irttp_udata_request() just returns -1 in similar situations. Sync the two and the loglevels of the accompanying output. Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-18 12:24:25 -08:00
Thomas Graf	18a31e1e28	ipv6: Expose reachable and retrans timer values as msecs Expose reachable and retrans timer values in msecs instead of jiffies. Both timer values are already exposed as msecs in the neighbour table netlink interface. The creation timestamp format with increased precision is kept but cleaned up. Signed-off-by: Thomas Graf <tgraf@infradead.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-18 12:08:36 -08:00
David S. Miller	07bfa524d4	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-11-18 11:56:09 -08:00
Bruno Randolf	86107fd170	nl80211/mac80211: Report signal average Extend nl80211 to report an exponential weighted moving average (EWMA) of the signal value. Since the signal value usually fluctuates between different packets, an average can be more useful than the value of the last packet. This uses the recently added generic EWMA library function. Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-18 14:22:20 -05:00
Thomas Graf	93908d1926	ipv6: Expose IFLA_PROTINFO timer values in msecs instead of jiffies IFLA_PROTINFO exposes timer related per device settings in jiffies. Change it to expose these values in msecs like the sysctl interface does. I did not find any users of IFLA_PROTINFO which rely on any of these values and even if there are, they are likely already broken because there is no way for them to reliably convert such a value to another time format. Signed-off-by: Thomas Graf <tgraf@infradead.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-18 11:05:01 -08:00
Eric Dumazet	57e1ab6ead	igmp: refine skb allocations IGMP allocates MTU sized skbs. This may fail for large MTU (order-2 allocations), so add a fallback to try lower sizes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-18 11:02:23 -08:00
Changli Gao	4c3710afbc	net: move definitions of BPF_S_* to net/core/filter.c BPF_S_* are used internally, should not be exposed to the others. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-18 10:59:51 -08:00
Tetsuo Handa	cba328fc5e	filter: Optimize instruction revalidation code. Since repeating u16 value to u8 value conversion using switch() clause's case statement is wasteful, this patch introduces u16 to u8 mapping table and removes most of case statements. As a result, the size of net/core/filter.o is reduced by about 29% on x86. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-18 10:58:35 -08:00
John Fastabend	9e50e3ac5a	net: add priority field to pktgen Add option to set skb priority to pktgen. Useful for testing QOS features. Also by running pktgen on the vlan device the qdisc on the real device can be tested. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-18 10:43:07 -08:00
John Fastabend	7d8e76bf9a	net: zero kobject in rx_queue_release netif_set_real_num_rx_queues() can decrement and increment the number of rx queues. For example ixgbe does this as features and offloads are toggled. Presumably this could also happen across down/up on most devices if the available resources changed (cpu offlined). The kobject needs to be zero'd in this case so that the state is not preserved across kobject_put()/kobject_init_and_add(). This resolves the following error report. ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong. Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169 Call Trace: [<ffffffff8121c940>] kobject_init+0x3a/0x83 [<ffffffff8121cf77>] kobject_init_and_add+0x23/0x57 [<ffffffff8107b800>] ? mark_lock+0x21/0x267 [<ffffffff813c6d11>] net_rx_queue_update_kobjects+0x63/0xc6 [<ffffffff813b5e0e>] netif_set_real_num_rx_queues+0x5f/0x78 [<ffffffffa0261d49>] ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe] [<ffffffffa0262509>] ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe] [<ffffffffa0274596>] ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe] Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-18 09:41:40 -08:00
Gerrit Renker	f72f2f4cde	dccp ccid-2: whitespace fix-up This fixes whitespace noise introduced in commit "dccp ccid-2: Algorithm to update buffer state", `5753fdfe8b`, 14 Nov 2010. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-18 09:37:07 -08:00
Eric Dumazet	866f3b25a2	bonding: IGMP handling cleanup Instead of iterating in_dev->mc_list from bonding driver, its better to call a helper function provided by igmp.c Details of implementation (locking) are private to igmp code. ip_mc_rejoin_group(struct ip_mc_list im) becomes ip_mc_rejoin_groups(struct in_device in_dev); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-18 09:33:19 -08:00
Mark Mentovai	09a02fdb91	cfg80211: fix can_beacon_sec_chan, reenable HT40 This follows wireless-testing `9236d838c9` ("cfg80211: fix extension channel checks to initiate communication") and fixes accidental case fall-through. Without this fix, HT40 is entirely blocked. Signed-off-by: Mark Mentovai <mark@moxienet.com> Cc: stable@kernel.org Acked-by: Luis R. Rodriguez <lrodriguez@atheros.com Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-18 11:35:05 -05:00
Johannes Berg	50a9432dae	mac80211: fix powersaving clients races The code to handle powersaving stations has a race: when the powersave flag is lifted from a station, we could transmit a packet that is being processed for TX at the same time right away, even if there are other frames queued for it. This would cause frame reordering. To fix this, lift the flag only under the appropriate lock that blocks TX. Additionally, the code to allow drivers to block a station while frames for it are on the HW queue is never re-enabled the station, so traffic would get stuck indefinitely. Fix this by clearing the flag for this appropriately. Finally, as an optimisation, don't do anything if the driver unblocks an already unblocked station. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-17 16:19:33 -05:00
Johannes Berg	4bce22b9b8	mac80211: defines for AC numbers In many places we've just hardcoded the AC numbers -- which is a relic from the original mac80211 (d80211). Add constants for them so we know what we're talking about. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-17 16:19:31 -05:00
Vasiliy Kulikov	dda0b38692	net: ipv4: tcp_probe: cleanup snprintf() use snprintf() returns number of bytes that were copied if there is no overflow. This code uses return value as number of copied bytes. Theoretically format string '%lu.%09lu %pI4:%u %pI4:%u %d %#x %#x %u %u %u %u\n' may be expanded up to 163 bytes. In reality tv.tv_sec is just few bytes instead of 20, 2 ports are just 5 bytes each instead of 10, length is 5 bytes instead of 10. The rest is an unstrusted input. Theoretically if tv_sec is big then copy_to_user() would overflow tbuf. tbuf was increased to fit in 163 bytes. snprintf() is used to follow return value semantic. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-17 12:27:46 -08:00
John Fastabend	9ea19481db	net: zero kobject in rx_queue_release netif_set_real_num_rx_queues() can decrement and increment the number of rx queues. For example ixgbe does this as features and offloads are toggled. Presumably this could also happen across down/up on most devices if the available resources changed (cpu offlined). The kobject needs to be zero'd in this case so that the state is not preserved across kobject_put()/kobject_init_and_add(). This resolves the following error report. ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong. Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169 Call Trace: [<ffffffff8121c940>] kobject_init+0x3a/0x83 [<ffffffff8121cf77>] kobject_init_and_add+0x23/0x57 [<ffffffff8107b800>] ? mark_lock+0x21/0x267 [<ffffffff813c6d11>] net_rx_queue_update_kobjects+0x63/0xc6 [<ffffffff813b5e0e>] netif_set_real_num_rx_queues+0x5f/0x78 [<ffffffffa0261d49>] ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe] [<ffffffffa0262509>] ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe] [<ffffffffa0274596>] ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe] Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-17 12:27:46 -08:00
Changli Gao	5811662b15	net: use the macros defined for the members of flowi Use the macros defined for the members of flowi to clean the code up. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-17 12:27:45 -08:00
Dan Rosenberg	218854af84	rds: Integer overflow in RDS cmsg handling In rds_cmsg_rdma_args(), the user-provided args->nr_local value is restricted to less than UINT_MAX. This seems to need a tighter upper bound, since the calculation of total iov_size can overflow, resulting in a small sock_kmalloc() allocation. This would probably just result in walking off the heap and crashing when calling rds_rdma_pages() with a high count value. If it somehow doesn't crash here, then memory corruption could occur soon after. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-17 12:20:52 -08:00
Thomas Graf	b382b191ea	ipv6: AF_INET6 link address family IPv6 already exposes some address family data via netlink in the IFLA_PROTINFO attribute if RTM_GETLINK request is sent with the address family set to AF_INET6. We take over this format and reuse all the code. Signed-off-by: Thomas Graf <tgraf@infradead.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-17 11:28:26 -08:00
Thomas Graf	9f0f7272ac	ipv4: AF_INET link address family Implements the AF_INET link address family exposing the per device configuration settings via netlink using the attribute IFLA_INET_CONF. The format of IFLA_INET_CONF differs depending on the direction the attribute is sent. The attribute sent by the kernel consists of a u32 array, basically a 1:1 copy of in_device->cnf.data[]. The attribute expected by the kernel must consist of a sequence of nested u32 attributes, each representing a change request, e.g. [IFLA_INET_CONF] = { [IPV4_DEVCONF_FORWARDING] = 1, [IPV4_DEVCONF_NOXFRM] = 0, } libnl userspace API documentation and example available from: http://www.infradead.org/~tgr/libnl/doc-git/group__link__inet.html Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-17 11:28:25 -08:00
Thomas Graf	f8ff182c71	rtnetlink: Link address family API Each net_device contains address family specific data such as per device settings and statistics. We already expose this data via procfs/sysfs and partially netlink. The netlink method requires the requester to send one RTM_GETLINK request for each address family it wishes to receive data of and then merge this data itself. This patch implements a new API which combines all address family specific link data in a new netlink attribute IFLA_AF_SPEC. IFLA_AF_SPEC contains a sequence of nested attributes, one for each address family which in turn defines the structure of its own attribute. Example: [IFLA_AF_SPEC] = { [AF_INET] = { [IFLA_INET_CONF] = ..., }, [AF_INET6] = { [IFLA_INET6_FLAGS] = ..., [IFLA_INET6_CONF] = ..., } } The API also allows for address families to implement a function which parses the IFLA_AF_SPEC attribute sent by userspace to implement address family specific link options. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-17 11:28:24 -08:00
Eric Paris	ee58681195	network: tcp_connect should return certain errors up the stack The current tcp_connect code completely ignores errors from sending an skb. This makes sense in many situations (like -ENOBUFFS) but I want to be able to immediately fail connections if they are denied by the SELinux netfilter hook. Netfilter does not normally return ECONNREFUSED when it drops a packet so we respect that error code as a final and fatal error that can not be recovered. Based-on-patch-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-17 10:54:35 -08:00
Eric Paris	da68365004	netfilter: allow hooks to pass error code back up the stack SELinux would like to pass certain fatal errors back up the stack. This patch implements the generic netfilter support for this functionality. Based-on-patch-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-17 10:54:34 -08:00
Joe Perches	37d6680042	net/atm: Remove unnecessary casts of netdev_priv Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-17 10:37:52 -08:00
Arnd Bergmann	451a3c24b0	BKL: remove extraneous #include <smp_lock.h> The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-17 08:59:32 -08:00
Felix Fietkau	8f0729b16a	mac80211: add support for setting the ad-hoc multicast rate Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-16 16:39:08 -05:00
Felix Fietkau	885a46d0f7	cfg80211: add support for setting the ad-hoc multicast rate Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-16 16:39:08 -05:00
Juuso Oikarinen	a619a4c0e1	mac80211: Add function to get probe request template for current AP Chipsets with hardware based connection monitoring need to autonomically send directed probe-request frames to the AP (in the event of beacon loss, for example.) For the hardware to be able to do this, it requires a template for the frame to transmit to the AP, filled in with the BSSID and SSID of the AP, but also the supported rate IE's. This patch adds a function to mac80211, which allows the hardware driver to fetch this template after association, so it can be configured to the hardware. Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-16 16:37:08 -05:00
Bruno Randolf	15d9675321	mac80211: Add antenna configuration Allow antenna configuration by calling driver's function for it. We disallow antenna configuration if the wiphy is already running, mainly to make life easier for 802.11n drivers which need to recalculate HT capabilites. Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-16 16:37:05 -05:00
Bruno Randolf	afe0cbf875	cfg80211: Add nl80211 antenna configuration Allow setting of TX and RX antennas configuration via nl80211. The antenna configuration is defined as a bitmap of allowed antennas to use. This API can be used to mask out antennas which are not attached or should not be used for other reasons like regulatory concerns or special setups. Separate bitmaps are used for RX and TX to allow configuring different antennas for receiving and transmitting. Each bitmap is 32 bit long, each bit representing one antenna, starting with antenna 1 at the first bit. If an antenna bit is set, this means the driver is allowed to use this antenna for RX or TX respectively; if the bit is not set the hardware is not allowed to use this antenna. Using bitmaps has the benefit of allowing for a flexible configuration interface which can support many different configurations and which can be used for 802.11n as well as non-802.11n devices. Instead of relying on some hardware specific assumptions, drivers can use this information to know which antennas are actually attached to the system and derive their capabilities based on that. 802.11n devices should enable or disable chains, based on which antennas are present (If all antennas belonging to a particular chain are disabled, the entire chain should be disabled). HT capabilities (like STBC, TX Beamforming, Antenna selection) should be calculated based on the available chains after applying the antenna masks. Should a 802.11n device have diversity antennas attached to one of their chains, diversity can be enabled or disabled based on the antenna information. Non-802.11n drivers can use the antenna masks to select RX and TX antennas and to enable or disable antenna diversity. While covering chainmasks for 802.11n and the standard "legacy" modes "fixed antenna 1", "fixed antenna 2" and "diversity" this API also allows more rare, but useful configurations as follows: 1) Send on antenna 1, receive on antenna 2 (or vice versa). This can be used to have a low gain antenna for TX in order to keep within the regulatory constraints and a high gain antenna for RX in order to receive weaker signals ("speak softly, but listen harder"). This can be useful for building long-shot outdoor links. Another usage of this setup is having a low-noise pre-amplifier on antenna 1 and a power amplifier on the other antenna. This way transmit noise is mostly kept out of the low noise receive channel. (This would be bitmaps: tx 1 rx 2). 2) Another similar setup is: Use RX diversity on both antennas, but always send on antenna 1. Again that would allow us to benefit from a higher gain RX antenna, while staying within the legal limits. (This would be: tx 0 rx 3). 3) And finally there can be special experimental setups in research and development even with pre 802.11n hardware where more than 2 antennas are available. It's good to keep the API simple, yet flexible. Signed-off-by: Bruno Randolf <br1@einfach.org> -- v7: Made bitmasks 32 bit wide and rebased to latest wireless-testing. Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-16 16:37:05 -05:00
Arik Nemtsov	f23a478075	mac80211: support hardware TX fragmentation offload The lower driver is notified when the fragmentation threshold changes and upon a reconfig of the interface. If the driver supports hardware TX fragmentation, don't fragment packets in the stack. Signed-off-by: Arik Nemtsov <arik@wizery.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-16 16:37:04 -05:00
Luis R. Rodriguez	9236d838c9	cfg80211: fix extension channel checks to initiate communication When operating in a mode that initiates communication and using HT40 we should fail if we cannot use both primary and secondary channels to initiate communication. Our current ht40 allowmap only covers STA mode of operation, for beaconing modes we need a check on the fly as the mode of operation is dynamic and there other flags other than disable which we should read to check if we can initiate communication. Do not allow for initiating communication if our secondary HT40 channel has is either disabled, has a passive scan flag, a no-ibss flag or is a radar channel. Userspace now has similar checks but this is also needed in-kernel. Reported-by: Jouni Malinen <jouni.malinen@atheros.com> Cc: stable@kernel.org Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-16 15:59:39 -05:00
Ulrich Weber	7d98ffd8c2	xfrm: update flowi saddr in icmp_send if unset otherwise xfrm_lookup will fail to find correct policy Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-16 11:43:39 -08:00
Eric Dumazet	c31504dc0d	udp: use atomic_inc_not_zero_hint UDP sockets refcount is usually 2, unless an incoming frame is going to be queued in receive or backlog queue. Using atomic_inc_not_zero_hint() permits to reduce latency, because processor issues less memory transactions. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-16 11:17:43 -08:00
Eric Dumazet	213b15ca81	vlan: remove ndo_select_queue() logic Now vlan are lockless, we dont need special ndo_select_queue() logic. dev_pick_tx() will do the multiqueue stuff on the real device transmit. Suggested-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-16 11:17:42 -08:00
Eric Dumazet	4af429d29b	vlan: lockless transmit path vlan is a stacked device, like tunnels. We should use the lockless mechanism we are using in tunnels and loopback. This patch completely removes locking in TX path. tx stat counters are added into existing percpu stat structure, renamed from vlan_rx_stats to vlan_pcpu_stats. Note : this partially reverts commit `2e59af3dcb` (vlan: multiqueue vlan device) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-16 11:15:08 -08:00
Neil Horman	0e3125c755	packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Version 4 of this patch. Change notes: 1) Removed extra memset. Didn't think kcalloc added a GFP_ZERO the way kzalloc did :) Summary: It was shown to me recently that systems under high load were driven very deep into swap when tcpdump was run. The reason this happened was because the AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space application to specify how many entries an AF_PACKET socket will have and how large each entry will be. It seems the default setting for tcpdump is to set the ring buffer to 32 entries of 64 Kb each, which implies 32 order 5 allocation. Thats difficult under good circumstances, and horrid under memory pressure. I thought it would be good to make that a bit more usable. I was going to do a simple conversion of the ring buffer from contigous pages to iovecs, but unfortunately, the metadata which AF_PACKET places in these buffers can easily span a page boundary, and given that these buffers get mapped into user space, and the data layout doesn't easily allow for a change to padding between frames to avoid that, a simple iovec change is just going to break user space ABI consistency. So I've done this, I've added a three tiered mechanism to the af_packet set_ring socket option. It attempts to allocate memory in the following order: 1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without digging into swap 2) Using vmalloc 3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as needed to get the memory The effect is that we don't disturb the system as much when we're under load, while still being able to conduct tcpdumps effectively. Tested successfully by me. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Maciej Żenczykowski <zenczykowski@gmail.com> Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-16 10:26:47 -08:00
Wolfram Sang	4c62ab9c53	irda: irttp: allow zero byte packets Sending zero byte packets is not neccessarily an error (AF_INET accepts it, too), so just apply a shortcut. This was discovered because of a non-working software with WINE. See http://bugs.winehq.org/show_bug.cgi?id=19397#c86 http://thread.gmane.org/gmane.linux.irda.general/1643 for very detailed debugging information and a testcase. Kudos to Wolfgang for those! Reported-by: Wolfgang Schwotzer <wolfgang.schwotzer@gmx.net> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Tested-by: Mike Evans <mike.evans@cardolan.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-16 09:50:47 -08:00
John Fastabend	9d82ca98f7	ipv6: fix missing in6_ifa_put in addrconf Fix ref count bug introduced by commit `2de7957072` Author: Lorenzo Colitti <lorenzo@google.com> Date: Wed Oct 27 18:16:49 2010 +0000 ipv6: addrconf: don't remove address state on ifdown if the address is being kept Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr refcnt correctly with in6_ifa_put(). Reported-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-16 09:24:58 -08:00
David S. Miller	b5e4156743	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2010-11-16 09:17:12 -08:00
Jesper Juhl	94f58df8e5	SUNRPC: Simplify rpc_alloc_iostats by removing pointless local variable Hi, We can simplify net/sunrpc/stats.c::rpc_alloc_iostats() a bit by getting rid of the unneeded local variable 'new'. Please CC me on replies. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-11-16 11:58:51 -05:00
Patrick McHardy	2c2bf08614	Merge branch 'for-patrick' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6	2010-11-16 10:21:27 +01:00
Eric Dumazet	3bfd45f93c	netfilter: nf_conntrack: one less atomic op in nf_ct_expect_insert() Instead of doing atomic_inc(&exp->use) twice, call atomic_add(2, &exp->use); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-16 10:19:18 +01:00
David S. Miller	6b35308850	net: Export netif_get_vlan_features(). ERROR: "netif_get_vlan_features" [drivers/net/xen-netfront.ko] undefined! Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 20:15:03 -08:00
Simon Horman	8f1b03a4c1	ipvs: allow transmit of GRO aggregated skbs Attempt at allowing LVS to transmit skbs of greater than MTU length that have been aggregated by GRO and can thus be deaggregated by GSO. Cc: Julian Anastasov <ja@ssi.bg> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-16 08:13:08 +09:00
Eric Dumazet	a333e2ec05	ipvs: remove shadow rt variable Remove a sparse warning about rt variable. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-16 08:13:08 +09:00
Eric Dumazet	4ecd29447e	ipvs: add static and read_mostly attributes ip_vs_conn_tab_bits & ip_vs_conn_tab_mask are static to ipvs/ip_vs_conn.c ip_vs_conn_tab_size, ip_vs_conn_tab_mask, ip_vs_conn_tab [the pointer], ip_vs_conn_rnd are mostly read. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-16 08:13:08 +09:00
Simon Horman	8aadf93c9c	IPVS: buffer argument to ip_vs_process_message() should not be const It is assigned to a non-const variable and its contents are modified. Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-16 08:13:08 +09:00
Simon Horman	7ae246a15a	IPVS: Remove useless { } block from ip_vs_process_message() Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-16 08:13:08 +09:00
Simon Horman	d494262b8a	IPVS: Make the cp argument to ip_vs_sync_conn() static Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-16 08:13:07 +09:00
Simon Horman	ea2c73afc2	IPVS: Only match pe_data created by the same pe Only match persistence engine data if it was created by the same persistence engine. Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-16 08:13:07 +09:00
Simon Horman	e9e5eee873	IPVS: Add persistence engine to connection entry The dest of a connection may not exist if it has been created as the result of connection synchronisation. But in order for connection entries for templates with persistence engine data created through connection synchronisation to be valid access to the persistence engine pointer is required. So add the persistence engine to the connection itself. Signed-off-by: Simon Horman <horms@verge.net.au>	2010-11-16 08:13:07 +09:00
Michael Witten	c996d8b9a8	Docs/Kconfig: Update: osdl.org -> linuxfoundation.org Some of the documentation refers to web pages under the domain `osdl.org'. However, `osdl.org' now redirects to `linuxfoundation.org'. Rather than rely on redirections, this patch updates the addresses appropriately; for the most part, only documentation that is meant to be current has been updated. The patch should be pretty quick to scan and check; each new web-page url was gotten by trying out the original URL in a browser and then simply copying the the redirected URL (formatting as necessary). There is some conflict as to which one of these domain names is preferred: linuxfoundation.org linux-foundation.org So, I wrote: info@linuxfoundation.org and got this reply: Message-ID: <4CE17EE6.9040807@linuxfoundation.org> Date: Mon, 15 Nov 2010 10:41:42 -0800 From: David Ames <david@linuxfoundation.org> ... linuxfoundation.org is preferred. The canonical name for our web site is www.linuxfoundation.org. Our list site is actually lists.linux-foundation.org. Regarding email linuxfoundation.org is preferred there are a few people who choose to use linux-foundation.org for their own reasons. Consequently, I used `linuxfoundation.org' for web pages and `lists.linux-foundation.org' for mailing-list web pages and email addresses; the only personal email address I updated from `@osdl.org' was that of Andrew Morton, who prefers `linux-foundation.org' according `git log'. Signed-off-by: Michael Witten <mfwitten@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-11-15 23:50:13 +01:00
Eric Dumazet	ec1e5610c0	bridge: add RCU annotations to bridge port lookup br_port_get() renamed to br_port_get_rtnl() to make clear RTNL is held. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 11:13:18 -08:00
stephen hemminger	b5ed54e94d	bridge: fix RCU races with bridge port The macro br_port_exists() is not enough protection when only RCU is being used. There is a tiny race where other CPU has cleared port handler hook, but is bridge port flag might still be set. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 11:13:17 -08:00
Eric Dumazet	a386f99025	bridge: add proper RCU annotation to should_route_hook Add br_should_route_hook_t typedef, this is the only way we can get a clean RCU implementation for function pointer. Move route_hook to location where it is used. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 11:13:16 -08:00
Eric Dumazet	e805168800	bridge: add RCU annotation to bridge multicast table Add modern __rcu annotatations to bridge multicast table. Use newer hlist macros to avoid direct access to hlist internals. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 11:13:16 -08:00
Joe Perches	8a22c99a80	net/ipv6/mcast.c: Remove unnecessary semicolons Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 11:07:16 -08:00
David S. Miller	6c1b6c6b87	Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6	2010-11-15 10:59:49 -08:00
Tom Herbert	fe8222406c	net: Simplify RX queue allocation This patch move RX queue allocation to alloc_netdev_mq and freeing of the queues to free_netdev (symmetric to TX queue allocation). Each kobject RX queue takes a reference to the queue's device so that the device can't be freed before all the kobjects have been released-- this obviates the need for reference counts specific to RX queues. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 10:57:28 -08:00
Tom Herbert	ed9af2e839	net: Move TX queue allocation to alloc_netdev_mq TX queues are now allocated in alloc_netdev_mq and freed in free_netdev. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 10:56:54 -08:00
Eric Dumazet	c5d277d29a	netfilter: rcu sparse cleanups Use RCU helpers to reduce number of sparse warnings (CONFIG_SPARSE_RCU_POINTER=y), and adds lockdep checks. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-15 19:45:13 +01:00
Timo Teräs	cc9ff19da9	xfrm: use gre key as flow upper protocol info The GRE Key field is intended to be used for identifying an individual traffic flow within a tunnel. It is useful to be able to have XFRM policy selector matches to have different policies for different GRE tunnels. Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 10:44:04 -08:00
David S. Miller	e1f2d8c2cc	vlan: Fix build warning in vlandev_seq_show() net/8021q/vlanproc.c: In function 'vlandev_seq_show': net/8021q/vlanproc.c:283:20: warning: unused variable 'fmt' Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 10:37:30 -08:00
Jesper Juhl	ffa56e540c	mac80211: Remove redundant checks for NULL before calls to crypto_free_cipher() crypto_free_cipher() is a wrapper around crypto_free_tfm() which is a wrapper around crypto_destroy_tfm() and the latter can handle being passed a NULL pointer, so checking for NULL in the ieee80211_aes_key_free()/ieee80211_aes_cmac_key_free() wrappers around crypto_free_cipher() is pointless and just increase object code size needlesly and makes us execute extra test/branch instructions that we don't need. Btw; don't we have to many wrappers around wrappers ad nauseam here? Anyway, this patch removes the redundant conditionals. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-15 13:26:11 -05:00
Eliad Peller	07caf9d6c9	mac80211: refactor debugfs function generation code refactor mac80211 debugfs code by using a format&copy function, instead of duplicating the code for each generated function. this change reduces about 600B from mac80211.ko Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-15 13:24:48 -05:00
Felix Fietkau	c7317e41df	mac80211: minstrel_ht - reduce the overhead of rate sampling - reduce the number of retransmission attempts for sample rates - sample lower rates less often - do not use RTS/CTS for sampling frames - increase the time between sampling attempts Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-15 13:24:21 -05:00
Luis R. Rodriguez	d91e41b690	cfg80211: prefix REG_DBG_PRINT() with cfg80211 Everyone's doing it, its the cool thing. Cc: Easwar Krishnan <easwar.krishnan@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-15 13:24:15 -05:00
Luis R. Rodriguez	e702d3cf29	cfg80211: add debug print when processing a channel In the worst case you are seeing really odd things you want more information than what is provided right now, for those that insist and want debug info through CONFIG_CFG80211_REG_DEBUG provide a print of when we are processing a channel and with what regulatory rule. Cc: Easwar Krishnan <easwar.krishnan@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-15 13:24:14 -05:00
Luis R. Rodriguez	a65185367f	cfg80211: add debug print when disabling a channel on a custom regd Cc: Easwar Krishnan <easwar.krishnan@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-15 13:24:13 -05:00
Luis R. Rodriguez	926a0a094d	cfg80211: add debug prints for when we ignore regulatory hints This can help with debugging issues. You will only see these with CONFIG_CFG80211_REG_DEBUG enabled. Cc: Easwar Krishnan <easwar.krishnan@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-15 13:24:12 -05:00
Luis R. Rodriguez	ca4ffe8f28	cfg80211: fix disabling channels based on hints After a module loads you will have loaded the world roaming regulatory domain or a custom regulatory domain. Further regulatory hints are welcomed and should be respected unless the regulatory hint is coming from a country IE as the IEEE spec allows for a country IE to be a subset of what is allowed by the local regulatory agencies. So disable all channels that do not fit a regulatory domain sent from a unless the hint is from a country IE and the country IE had no information about the band we are currently processing. This fixes a few regulatory issues, for example for drivers that depend on CRDA and had no 5 GHz freqencies allowed were not properly disabling 5 GHz at all, furthermore it also allows users to restrict devices further as was intended. If you recieve a country IE upon association we will also disable the channels that are not allowed if the country IE had at least one channel on the respective band we are procesing. This was the original intention behind this design but it was completely overlooked... Cc: David Quan <david.quan@atheros.com> Cc: Jouni Malinen <jouni.malinen@atheros.com> cc: Easwar Krishnan <easwar.krishnan@atheros.com> Cc: stable@kernel.org Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-15 13:24:10 -05:00
Luis R. Rodriguez	749b527b21	cfg80211: fix allowing country IEs for WIPHY_FLAG_STRICT_REGULATORY We should be enabling country IE hints for WIPHY_FLAG_STRICT_REGULATORY even if we haven't yet recieved regulatory domain hint for the driver if it needed one. Without this Country IEs are not passed on to drivers that have set WIPHY_FLAG_STRICT_REGULATORY, today this is just all Atheros chipset drivers: ath5k, ath9k, ar9170, carl9170. This was part of the original design, however it was completely overlooked... Cc: Easwar Krishnan <easwar.krishnan@atheros.com> Cc: stable@kernel.org Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-15 13:24:09 -05:00
Luis R. Rodriguez	7ca43d03b1	cfg80211: pass the reg hint initiator to helpers This is required later. Cc: Easwar Krishnan <easwar.krishnan@atheros.com> Cc: stable@kernel.org signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-15 13:24:08 -05:00
Stephen Hemminger	2e48928d8a	rfkill: remove dead code The following code is defined but never used. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-15 13:24:06 -05:00
John Fastabend	636e19a342	net: consolidate 8021q tagging Now that VLAN packets are tagged in dev_hard_start_xmit() at the bottom of the stack we no longer need to tag them in the 8021Q module (Except in the !VLAN_FLAG_REORDER_HDR case). This allows the accel path and non accel paths to be consolidated. Here the vlan_tci in the skb is always set and we allow the stack to add the actual tag in dev_hard_start_xmit(). Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 10:11:40 -08:00
John Fastabend	8f5549f381	net: remove check for headroom in vlan_dev_create It is possible for the headroom to be smaller then the hard_header_len for a short period of time after toggling the vlan offload setting. This is not a hard error and skb_cow_head is called in __vlan_put_tag() to resolve this. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 10:11:40 -08:00
John Fastabend	029f5fc31c	8021q: set hard_header_len when VLAN offload features are toggled Toggling the vlan tx\|rx hw offloads needs to set the hard_header_len as well otherwise we end up using LL_RESERVED_SPACE incorrectly. This results in pskb_expand_head() being used unnecessarily. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 10:11:39 -08:00
Eric Dumazet	ab0cba2512	netfilter: nf_nat_amanda: rename a variable Avoid a sparse warning about 'ret' variable shadowing Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-15 18:45:12 +01:00
Eric Dumazet	eb733162ae	netfilter: add __rcu annotations Use helpers to reduce number of sparse warnings (CONFIG_SPARSE_RCU_POINTER=y) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-15 18:43:59 +01:00
Jesse Gross	58e998c6d2	offloading: Force software GSO for multiple vlan tags. We currently use vlan_features to check for TSO support if there is a vlan tag. However, it's quite likely that the NIC is not able to do TSO when there is an arbitrary number of tags. Therefore if there is more than one tag (in-band or out-of-band), fall back to software emulation. Signed-off-by: Jesse Gross <jesse@nicira.com> CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 09:22:53 -08:00
Jesse Gross	c8d5bcd1af	offloading: Support multiple vlan tags in GSO. We assume that hardware TSO can't support multiple levels of vlan tags but we allow it to be done. Therefore, enable GSO to parse these tags so we can fallback to software. Signed-off-by: Jesse Gross <jesse@nicira.com> CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 09:22:53 -08:00
Jesse Gross	e1e78db628	offloading: Make scatter/gather more tolerant of vlans. When checking if it is necessary to linearize a packet, we currently use vlan_features if the packet contains either an in-band or out- of-band vlan tag. However, in-band tags aren't special in any way for scatter/gather since they are part of the packet buffer and are simply more data to DMA. Therefore, only use vlan_features for out- of-band tags, which could potentially have some interaction with scatter/gather. Signed-off-by: Jesse Gross <jesse@nicira.com> CC: Ben Hutchings <bhutchings@solarflare.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 09:22:52 -08:00
Eric Dumazet	be9e9163af	netfilter: nf_ct_frag6_sysctl_table is static Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-15 18:18:29 +01:00
Eric Dumazet	0e60ebe04c	netfilter: add __rcu annotations Add some __rcu annotations and use helpers to reduce number of sparse warnings (CONFIG_SPARSE_RCU_POINTER=y) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-15 18:17:21 +01:00
David S. Miller	d9aa93804e	ipv4: Fix build with multicast disabled. net/ipv4/igmp.c: In function 'ip_mc_inc_group': net/ipv4/igmp.c:1228: error: implicit declaration of function 'for_each_pmc_rtnl' net/ipv4/igmp.c:1228: error: expected ';' before '{' token net/ipv4/igmp.c: In function 'ip_mc_unmap': net/ipv4/igmp.c:1333: error: expected ';' before 'igmp_group_dropped' ... Move for_each_pmc_rcu and for_each_pmc_rtnl macro definitions outside of multicast ifdef protection. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-15 08:52:02 -08:00
Fr�d�ric Leroy	9811600f7c	netfilter: xt_CLASSIFY: add ARP support, allow CLASSIFY target on any table Signed-off-by: Fr�d�ric Leroy <fredo@starox.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-15 13:57:56 +01:00
Changli Gao	76a2d3bcfc	netfilter: nf_nat: don't use atomic bit operation As we own the conntrack and the others can't see it until we confirm it, we don't need to use atomic bit operation on ct->status. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-15 11:59:03 +01:00
Changli Gao	3b23688069	netfilter: ct_extend: fix the wrong alloc_size In function update_alloc_size(), sizeof(struct nf_ct_ext) is added twice wrongly. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-15 11:47:52 +01:00
Jan Engelhardt	b468645d72	netfilter: xt_LOG: do print MAC header on FORWARD I am observing consistent behavior even with bridges, so let's unlock this. xt_mac is already usable in FORWARD, too. Section 9 of http://ebtables.sourceforge.net/br_fw_ia/br_fw_ia.html#section9 says the MAC source address is changed, but my observation does not match that claim -- the MAC header is retained. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> [Patrick; code inspection seems to confirm this] Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-15 11:23:06 +01:00
Gerrit Renker	7e87fe8430	dccp ccid-2: Separate option parsing from CCID processing This patch replaces an almost identical replication of code: large parts of dccp_parse_options() re-appeared as ccid2_ackvector() in ccid2.c. Apart from the duplication, this caused two more problems: 1. CCIDs should not need to be concerned with parsing header options; 2. one can not assume that Ack Vectors appear as a contiguous area within an skb, it is legal to insert other options and/or padding in between. The current code would throw an error and stop reading in such a case. Since Ack Vectors provide CCID-specific information, they are now processed by the CCID directly, separating this functionality from the main DCCP code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-11-15 07:12:01 +01:00
Gerrit Renker	52394eecec	dccp ccid-2: Remove old infrastructure This removes * functions for which updates have been provided in the preceding patches and * the @av_vec_len field - it is no longer necessary since the buffer length is now always computed dynamically. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-11-15 07:12:00 +01:00
Gerrit Renker	d83447f094	dccp ccid-2: Schedule Sync as out-of-band mechanism The problem with Ack Vectors is that i) their length is variable and can in principle grow quite large, ii) it is hard to predict exactly how large they will be. Due to the second point it seems not a good idea to reduce the MPS; in particular when on average there is enough room for the Ack Vector and an increase in length is momentarily due to some burst loss, after which the Ack Vector returns to its normal/average length. The solution taken by this patch is to subtract a minimum-expected Ack Vector length from the MPS, and to defer any larger Ack Vectors onto a separate Sync - but only if indeed there is no space left on the skb. This patch provides the infrastructure to schedule Sync-packets for transporting (urgent) out-of-band data. Its signalling is quicker than scheduling an Ack, since it does not need to wait for new application data. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-11-15 07:12:00 +01:00
Gerrit Renker	18219463c8	dccp ccid-2: Consolidate Ack-Vector processing within main DCCP module This aggregates Ack Vector processing (handling input and clearing old state) into one function, for the following reasons and benefits: * all Ack Vector-specific processing is now in one place; * duplicated code is removed; * ensuring sanity: from an Ack Vector point of view, it is better to clear the old state first before entering new state; * Ack Event handling happens mostly within the CCIDs, not the main DCCP module. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-11-15 07:11:59 +01:00
Gerrit Renker	3802408644	dccp ccid-2: Update code for the Ack Vector input/registration routine This patch updates the code which registers new packets as received, using the new circular buffer interface. It contributes a new algorithm which * supports both tail/head pointers and buffer wrap-around and * deals with overflow (head/tail move in lock-step). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-11-15 07:11:59 +01:00
Gerrit Renker	5753fdfe8b	dccp ccid-2: Algorithm to update buffer state This provides a routine to consistently update the buffer state when the peer acknowledges receipt of Ack Vectors; updating state in the list of Ack Vectors as well as in the circular buffer. While based on RFC 4340, several additional (and necessary) precautions were added to protect the consistency of the buffer state. These additions are essential, since analysis and experience showed that the basic algorithm was insufficient for this task (which lead to problems that were hard to debug). The algorithm now * deals with HC-sender acknowledging to HC-receiver and vice versa, * keeps track of the last unacknowledged but received seqno in tail_ackno, * has special cases to reset the overflow condition when appropriate, * is protected against receiving older information (would mess up buffer state). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-11-15 07:11:59 +01:00
David S. Miller	c25ecd0a21	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-11-14 11:57:05 -08:00
Linus Torvalds	9457b24a09	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (66 commits) can-bcm: fix minor heap overflow gianfar: Do not call device_set_wakeup_enable() under a spinlock ipv6: Warn users if maximum number of routes is reached. docs: Add neigh/gc_thresh3 and route/max_size documentation. axnet_cs: fix resume problem for some Ax88790 chip ipv6: addrconf: don't remove address state on ifdown if the address is being kept tcp: Don't change unlocked socket state in tcp_v4_err(). x25: Prevent crashing when parsing bad X.25 facilities cxgb4vf: add call to Firmware to reset VF State. cxgb4vf: Fail open if link_start() fails. cxgb4vf: flesh out PCI Device ID Table ... cxgb4vf: fix some errors in Gather List to skb conversion cxgb4vf: fix bug in Generic Receive Offload cxgb4vf: don't implement trivial (and incorrect) ndo_select_queue() ixgbe: Look inside vlan when determining offload protocol. bnx2x: Look inside vlan when determining checksum proto. vlan: Add function to retrieve EtherType from vlan packets. virtio-net: init link state correctly ucc_geth: Fix deadlock ucc_geth: Do not bring the whole IF down when TX failure. ...	2010-11-12 17:17:55 -08:00
Oliver Hartkopp	0597d1b99f	can-bcm: fix minor heap overflow On 64-bit platforms the ASCII representation of a pointer may be up to 17 bytes long. This patch increases the length of the buffer accordingly. http://marc.info/?l=linux-netdev&m=128872251418192&w=2 Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> CC: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-12 14:07:14 -08:00
Ben Greear	4038565327	ipv6: Warn users if maximum number of routes is reached. This gives users at least some clue as to what the problem might be and how to go about fixing it. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-12 14:03:24 -08:00
Lorenzo Colitti	2de7957072	ipv6: addrconf: don't remove address state on ifdown if the address is being kept Currently, addrconf_ifdown does not delete statically configured IPv6 addresses when the interface is brought down. The intent is that when the interface comes back up the address will be usable again. However, this doesn't actually work, because the system stops listening on the corresponding solicited-node multicast address, so the address cannot respond to neighbor solicitations and thus receive traffic. Also, the code notifies the rest of the system that the address is being deleted (e.g, RTM_DELADDR), even though it is not. Fix it so that none of this state is updated if the address is being kept on the interface. Tested: Added a statically configured IPv6 address to an interface, started ping, brought link down, brought link up again. When link came up ping kept on going and "ip -6 maddr" showed that the host was still subscribed to there Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-12 13:44:24 -08:00
David S. Miller	8f49c2703b	tcp: Don't change unlocked socket state in tcp_v4_err(). Alexey Kuznetsov noticed a regression introduced by commit `f1ecd5d9e7` ("Revert Backoff [v3]: Revert RTO on ICMP destination unreachable") The RTO and timer modification code added to tcp_v4_err() doesn't check sock_owned_by_user(), which if true means we don't have exclusive access to the socket and therefore cannot modify it's critical state. Just skip this new code block if sock_owned_by_user() is true and eliminate the now superfluous sock_owned_by_user() code block contained within. Reported-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net> CC: Damian Lukowski <damian@tvk.rwth-aachen.de> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>	2010-11-12 13:35:00 -08:00
Eric Dumazet	190683a9d5	net: net_families __rcu annotations Use modern RCU API / annotations for net_families array. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-12 13:27:25 -08:00
Eric Dumazet	1d7138de87	igmp: RCU conversion of in_dev->mc_list in_dev->mc_list is protected by one rwlock (in_dev->mc_list_lock). This can easily be converted to a RCU protection. Writers hold RTNL, so mc_list_lock is removed, not replaced by a spinlock. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Cypher Wu <cypher.w@gmail.com> Cc: Américo Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-12 13:18:57 -08:00
Dan Rosenberg	5ef41308f9	x25: Prevent crashing when parsing bad X.25 facilities Now with improved comma support. On parsing malformed X.25 facilities, decrementing the remaining length may cause it to underflow. Since the length is an unsigned integer, this will result in the loop continuing until the kernel crashes. This patch adds checks to ensure decrementing the remaining length does not cause it to wrap around. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-12 12:44:42 -08:00
Mariusz Kozlowski	1f18b7176e	net: Fix header size check for GSO case in recvmsg (af_packet) Parameter 'len' is size_t type so it will never get negative. Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-12 11:06:46 -08:00
David S. Miller	7c13a0d9a1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6	2010-11-12 11:04:26 -08:00
Thomas Graf	369cf77a6a	rtnetlink: Fix message size calculation for link messages nlmsg_total_size() calculates the length of a netlink message including header and alignment. nla_total_size() calculates the space an individual attribute consumes which was meant to be used in this context. Also, ensure to account for the attribute header for the IFLA_INFO_XSTATS attribute as implementations of get_xstats_size() seem to assume that we do so. The addition of two message headers minus the missing attribute header resulted in a calculated message size that was larger than required. Therefore we never risked running out of skb tailroom. Signed-off-by: Thomas Graf <tgraf@infradead.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-12 10:53:09 -08:00
Changli Gao	ca36181050	netfilter: xt_NFQUEUE: remove modulo operations Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-12 17:34:17 +01:00
Changli Gao	e5fc9e7a66	netfilter: nf_conntrack: don't always initialize ct->proto ct->proto is big(60 bytes) due to structure ip_ct_tcp, and we don't need to initialize the whole for all the other protocols. This patch moves proto to the end of structure nf_conn, and pushes the initialization down to the individual protocols. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-12 17:33:17 +01:00
Shan Wei	22e091e525	netfilter: ipv6: fix overlap check for fragments The type of FRAG6_CB(prev)->offset is int, skb->len is unsigned int, and offset is int. Without this patch, type conversion occurred to this expression, when (FRAG6_CB(prev)->offset + prev->len) is less than offset. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-12 08:51:55 +01:00
David S. Miller	c753796769	ipv4: Make rt->fl.iif tests lest obscure. When we test rt->fl.iif against zero, we're seeing if it's an output or an input route. Make that explicit with some helper functions. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-11 17:07:48 -08:00
David S. Miller	ed1deb7021	Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6	2010-11-11 10:43:30 -08:00
Eric Dumazet	72cdd1d971	net: get rid of rtable->idev It seems idev field in struct rtable has no special purpose, but adding extra atomic ops. We hold refcounts on the device itself (using percpu data, so pretty cheap in current kernel). infiniband case is solved using dst.dev instead of idev->dev Removal of this field means routing without route cache is now using shared data, percpu data, and only potential contention is a pair of atomic ops on struct neighbour per forwarded packet. About 5% speedup on routing test. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-11 10:29:40 -08:00
David S. Miller	8877870f8a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-11-10 22:15:31 -08:00
David S. Miller	7a1abd08d5	tcp: Increase TCP_MAXSEG socket option minimum. As noted by Steve Chen, since commit `f5fff5dc8a` ("tcp: advertise MSS requested by user") we can end up with a situation where tcp_select_initial_window() does a divide by a zero (or even negative) mss value. The problem is that sometimes we effectively subtract TCPOLEN_TSTAMP_ALIGNED and/or TCPOLEN_MD5SIG_ALIGNED from the mss. Fix this by increasing the minimum from 8 to 64. Reported-by: Steve Chen <schen@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-10 21:35:37 -08:00
Gerrit Renker	b3d14bff12	dccp ccid-2: Implementation of circular Ack Vector buffer with overflow handling This completes the implementation of a circular buffer for Ack Vectors, by extending the current (linear array-based) implementation. The changes are: (a) An `overflow' flag to deal with the case of overflow. As before, dynamic growth of the buffer will not be supported; but code will be added to deal robustly with overflowing Ack Vector buffers. (b) A `tail_seqno' field. When naively implementing the algorithm of Appendix A in RFC 4340, problems arise whenever subsequent Ack Vector records overlap, which can bring the entire run length calculation completely out of synch. (This is documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/\ ack_vectors/tracking_tail_ackno/ .) (c) The buffer length is now computed dynamically (i.e. current fill level), as the span between head to tail. As a result, dccp_ackvec_pending() is now simpler - the #ifdef is no longer necessary since buf_empty is always true when IP_DCCP_ACKVEC is not configured. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-11-10 21:21:35 +01:00
Gerrit Renker	7d87093660	dccp ccid-2: Separate internals of Ack Vectors from option-parsing code This patch * separates Ack Vector housekeeping code from option-insertion code; * shifts option-specific code from ackvec.c into options.c; * introduces a dedicated routine to take care of the Ack Vector records; * simplifies the dccp_ackvec_insert_avr() routine: the BUG_ON was redundant, since the list is automatically arranged in descending order of ack_seqno. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-11-10 21:21:02 +01:00
Gerrit Renker	f17a37c9b8	dccp ccid-2: Ack Vector interface clean-up This patch brings the Ack Vector interface up to date. Its main purpose is to lay the basis for the subsequent patches of this set, which will use the new data structure fields and routines. There are no real algorithmic changes, rather an adaptation: (1) Replaced the static Ack Vector size (2) with a #define so that it can be adapted (with low loss / Ack Ratio, a value of 1 works, so 2 seems to be sufficient for the moment) and added a solution so that computing the ECN nonce will continue to work - even with larger Ack Vectors. (2) Replaced the #defines for Ack Vector states with a complete enum. (3) Replaced #defines to compute Ack Vector length and state with general purpose routines (inlines), and updated code to use these. (4) Added a `tail' field (conversion to circular buffer in subsequent patch). (5) Updated the (outdated) documentation for Ack Vector struct. (6) All sequence number containers now trimmed to 48 bits. (7) Removal of unused bits: * removed dccpav_ack_nonce from struct dccp_ackvec, since this is already redundantly stored in the `dccpavr_ack_nonce' (of Ack Vector record); * removed Elapsed Time for Ack Vectors (it was nowhere used); * replaced semantics of dccpavr_sent_len with dccpavr_ack_runlen, since the code needs to be able to remember the old run length; * reduced the de-/allocation routines (redundant / duplicate tests). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-11-10 21:20:07 +01:00
Eric Dumazet	8d987e5c75	net: avoid limits overflow Robin Holt tried to boot a 16TB machine and found some limits were reached : sysctl_tcp_mem[2], sysctl_udp_mem[2] We can switch infrastructure to use long "instead" of "int", now atomic_long_t primitives are available for free. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Robin Holt <holt@sgi.com> Reviewed-by: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-10 12:12:00 -08:00
Vasiliy Kulikov	67286640f6	net: packet: fix information leak to userland packet_getname_spkt() doesn't initialize all members of sa_data field of sockaddr struct if strlen(dev->name) < 13. This structure is then copied to userland. It leads to leaking of contents of kernel stack memory. We have to fully fill sa_data with strncpy() instead of strlcpy(). The same with packet_getname(): it doesn't initialize sll_pkttype field of sockaddr_ll. Set it to zero. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-10 12:09:10 -08:00
David S. Miller	57fe93b374	filter: make sure filters dont read uninitialized memory There is a possibility malicious users can get limited information about uninitialized stack mem array. Even if sk_run_filter() result is bound to packet length (0 .. 65535), we could imagine this can be used by hostile user. Initializing mem[] array, like Dan Rosenberg suggested in his patch is expensive since most filters dont even use this array. Its hard to make the filter validation in sk_chk_filter(), because of the jumps. This might be done later. In this patch, I use a bitmap (a single long var) so that only filters using mem[] loads/stores pay the price of added security checks. For other filters, additional cost is a single instruction. [ Since we access fentry->k a lot now, cache it in a local variable and mark filter entry pointer as const. -DaveM ] Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-10 10:38:24 -08:00
Vasiliy Kulikov	fe10ae5338	net: ax25: fix information leak to userland Sometimes ax25_getname() doesn't initialize all members of fsa_digipeater field of fsa struct, also the struct has padding bytes between sax25_call and sax25_ndigis fields. This structure is then copied to userland. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-10 10:14:33 -08:00
Sage Weil	c5c6b19d4b	ceph: explicitly specify page alignment in network messages The alignment used for reading data into or out of pages used to be taken from the data_off field in the message header. This only worked as long as the page alignment matched the object offset, breaking direct io to non-page aligned offsets. Instead, explicitly specify the page alignment next to the page vector in the ceph_msg struct, and use that instead of the message header (which probably shouldn't be trusted). The alloc_msg callback is responsible for filling in this field properly when it sets up the page vector. Signed-off-by: Sage Weil <sage@newdream.net>	2010-11-09 12:43:17 -08:00
Sage Weil	b7495fc2ff	ceph: make page alignment explicit in osd interface We used to infer alignment of IOs within a page based on the file offset, which assumed they matched. This broke with direct IO that was not aligned to pages (e.g., 512-byte aligned IO). We were also trusting the alignment specified in the OSD reply, which could have been adjusted by the server. Explicitly specify the page alignment when setting up OSD IO requests. Signed-off-by: Sage Weil <sage@newdream.net>	2010-11-09 12:43:12 -08:00
Sage Weil	e98b6fed84	ceph: fix comment, remove extraneous args The offset/length arguments aren't used. Signed-off-by: Sage Weil <sage@newdream.net>	2010-11-09 12:24:53 -08:00
Eric Dumazet	332dd96f7a	net/dst: dst_dev_event() called after other notifiers Followup of commit `ef885afbf8` (net: use rcu_barrier() in rollback_registered_many) dst_dev_event() scans a garbage dst list that might be feeded by various network notifiers at device dismantle time. Its important to call dst_dev_event() after other notifiers, or we might enter the infamous msleep(250) in netdev_wait_allrefs(), and wait one second before calling again call_netdevice_notifiers(NETDEV_UNREGISTER, dev) to properly remove last device references. Use priority -10 to let dst_dev_notifier be called after other network notifiers (they have the default 0 priority) Reported-by: Ben Greear <greearb@candelatech.com> Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: Octavian Purdila <opurdila@ixiacom.com> Reported-by: Benjamin LaHaise <bcrl@kvack.org> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-09 12:17:16 -08:00
Kulikov Vasiliy	88f8a5e3e7	net: tipc: fix information leak to userland Structure sockaddr_tipc is copied to userland with padding bytes after "id" field in union field "name" unitialized. It leads to leaking of contents of kernel stack memory. We have to initialize them to zero. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-09 09:25:46 -08:00
Joe Perches	2af6fd8b18	net/ipv4/tcp.c: Update WARN uses Coalesce long formats. Align arguments. Remove KERN_<level>. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-09 09:22:32 -08:00
Joe Perches	b194a3674f	net/core/dev.c: Update WARN uses Coalesce long formats. Add missing newlines. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-09 09:22:31 -08:00
Eric Dumazet	18943d292f	inet: fix ip_mc_drop_socket() commit `8723e1b4ad` (inet: RCU changes in inetdev_by_index()) forgot one call site in ip_mc_drop_socket() We should not decrease idev refcount after inetdev_by_index() call, since refcount is not increased anymore. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Miles Lane <miles.lane@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-09 08:26:42 -08:00
Luiz Augusto von Dentz	63ce0900d7	Bluetooth: fix not setting security level when creating a rfcomm session This cause 'No Bonding' to be used if userspace has not yet been paired with remote device since the l2cap socket used to create the rfcomm session does not have any security level set. Signed-off-by: Luiz Augusto von Dentz <luiz.dentz-von@nokia.com> Acked-by: Ville Tervo <ville.tervo@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-11-09 00:56:10 -02:00
Gustavo F. Padovan	4f8b691c9f	Bluetooth: fix endianness conversion in L2CAP Last commit added a wrong endianness conversion. Fixing that. Reported-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-11-09 00:56:09 -02:00
steven miao	bfaaeb3ed5	Bluetooth: fix unaligned access to l2cap conf data In function l2cap_get_conf_opt() and l2cap_add_conf_opt() the address of opt->val sometimes is not at the edge of 2-bytes/4-bytes, so 2-bytes/4 bytes access will cause data misalignment exeception. Use get_unaligned_le16/32 and put_unaligned_le16/32 function to avoid data misalignment execption. Signed-off-by: steven miao <realmz6@gmail.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-11-09 00:56:00 -02:00
Johan Hedberg	bdb7524a75	Bluetooth: Fix non-SSP auth request for HIGH security level sockets When initiating dedicated bonding a L2CAP raw socket with HIGH security level is used. The kernel is supposed to trigger the authentication request in this case but this doesn't happen currently for non-SSP (pre-2.1) devices. The reason is that the authentication request happens in the remote extended features callback which never gets called for non-SSP devices. This patch fixes the issue by requesting also authentiation in the (normal) remote features callback in the case of non-SSP devices. This rule is applied only for HIGH security level which might at first seem unintuitive since on the server socket side MEDIUM is already enough for authentication. However, for the clients we really want to prefer the server side to decide the authentication requrement in most cases, and since most client sockets use MEDIUM it's better to be avoided on the kernel side for these sockets. The important socket to request it for is the dedicated bonding one and that socket uses HIGH security level. The patch is based on the initial investigation and patch proposal from Andrei Emeltchenko <endrei.emeltchenko@nokia.com>. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-11-09 00:55:27 -02:00
Randy Dunlap	96c99b473a	Bluetooth: fix hidp kconfig dependency warning Fix kconfig dependency warning to satisfy dependencies: warning: (BT_HIDP && NET && BT && BT_L2CAP && INPUT \|\| USB_HID && HID_SUPPORT && USB && INPUT) selects HID which has unmet direct dependencies (HID_SUPPORT && INPUT) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-11-09 00:55:27 -02:00
Brian Cavagnolo	352ffad646	mac80211: unset SDATA_STATE_OFFCHANNEL when cancelling a scan For client STA interfaces, ieee80211_do_stop unsets the relevant interface's SDATA_STATE_RUNNING state bit prior to cancelling an interrupted scan. When ieee80211_offchannel_return is invoked as part of cancelling the scan, it doesn't bother unsetting the SDATA_STATE_OFFCHANNEL bit because it sees that the interface is down. Normally this doesn't matter because when the client STA interface is brought back up, it will probably issue a scan. But in some cases (e.g., the user changes the interface type while it is down), the SDATA_STATE_OFFCHANNEL bit will remain set. This prevents the interface queues from being started. So we cancel the scan before unsetting the SDATA_STATE_RUNNING bit. Signed-off-by: Brian Cavagnolo <brian@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-08 16:53:47 -05:00
Felix Fietkau	3cc25e510d	cfg80211: fix a crash in dev lookup on dump commands IS_ERR and PTR_ERR were called with the wrong pointer, leading to a crash when cfg80211_get_dev_from_ifindex fails. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-11-08 16:53:47 -05:00
Eric Dumazet	973a34aa85	af_unix: optimize unix_dgram_poll() unix_dgram_poll() is pretty expensive to check POLLOUT status, because it has to lock the socket to get its peer, take a reference on the peer to check its receive queue status, and queue another poll_wait on peer_wait. This all can be avoided if the process calling unix_dgram_poll() is not interested in POLLOUT status. It makes unix_dgram_recvmsg() faster by not queueing irrelevant pollers in peer_wait. On a test program provided by Alan Crequy : Before: real 0m0.211s user 0m0.000s sys 0m0.208s After: real 0m0.044s user 0m0.000s sys 0m0.040s Suggested-by: Davide Libenzi <davidel@xmailserver.org> Reported-by: Alban Crequy <alban.crequy@collabora.co.uk> Acked-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 13:50:09 -08:00
Eric Dumazet	5456f09aaf	af_unix: fix unix_dgram_poll() behavior for EPOLLOUT event Alban Crequy reported a problem with connected dgram af_unix sockets and provided a test program. epoll() would miss to send an EPOLLOUT event when a thread unqueues a packet from the other peer, making its receive queue not full. This is because unix_dgram_poll() fails to call sock_poll_wait(file, &unix_sk(other)->peer_wait, wait); if the socket is not writeable at the time epoll_ctl(ADD) is called. We must call sock_poll_wait(), regardless of 'writable' status, so that epoll can be notified later of states changes. Misc: avoids testing twice (sk->sk_shutdown & RCV_SHUTDOWN) Reported-by: Alban Crequy <alban.crequy@collabora.co.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 13:50:09 -08:00
Eric Dumazet	67426b756c	af_unix: use keyed wakeups Instead of wakeup all sleepers, use wake_up_interruptible_sync_poll() to wakeup only ones interested into writing the socket. This patch is a specialization of commit `37e5540b3c` (epoll keyed wakeups: make sockets use keyed wakeups). On a test program provided by Alan Crequy : Before: real 0m3.101s user 0m0.000s sys 0m6.104s After: real 0m0.211s user 0m0.000s sys 0m0.208s Reported-by: Alban Crequy <alban.crequy@collabora.co.uk> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 13:50:08 -08:00
Eric Dumazet	fc766e4c49	decnet: RCU conversion and get rid of dev_base_lock While tracking dev_base_lock users, I found decnet used it in dnet_select_source(), but for a wrong purpose: Writers only hold RTNL, not dev_base_lock, so readers must use RCU if they cannot use RTNL. Adds an rcu_head in struct dn_ifaddr and handle proper RCU management. Adds __rcu annotation in dn_route as well. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 13:50:08 -08:00
David S. Miller	d0eaeec8e8	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-11-08 12:38:28 -08:00
Pavel Emelyanov	aa58163a76	rds: Fix rds message leak in rds_message_map_pages The sgs allocation error path leaks the allocated message. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 12:17:09 -08:00
Junchang Wang	eb589063ed	pktgen: correct uninitialized queue_map This fix a bug reported by backyes. Right the first time pktgen's using queue_map that's not been initialized by set_cur_queue_map(pkt_dev); Signed-off-by: Junchang Wang <junchangwang@gmail.com> Signed-off-by: Backyes <backyes@mail.ustc.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 12:17:07 -08:00
Shan Wei	f46421416f	ipv6: fix overlap check for fragments The type of FRAG6_CB(prev)->offset is int, skb->len is unsigned int, and offset is int. Without this patch, type conversion occurred to this expression, when (FRAG6_CB(prev)->offset + prev->len) is less than offset. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 12:17:06 -08:00
stephen hemminger	4c46ee5258	classifier: report statistics for basic classifier The basic classifier keeps statistics but does not report it to user space. This showed up when using basic classifier (with police) as a default catch all on ingress; no statistics were reported. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 12:17:05 -08:00
Dmitry Torokhov	86c2c0a8a4	NET: pktgen - fix compile warning This should fix the following warning: net/core/pktgen.c: In function ‘pktgen_if_write’: net/core/pktgen.c:890: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Reviewed-by: Nelson Elhage <nelhage@ksplice.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-07 05:28:01 -08:00
Linus Torvalds	4b4a2700f4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (41 commits) inet_diag: Make sure we actually run the same bytecode we audited. netlink: Make nlmsg_find_attr take a const nlmsghdr*. fib: fib_result_assign() should not change fib refcounts netfilter: ip6_tables: fix information leak to userspace cls_cgroup: Fix crash on module unload memory corruption in X.25 facilities parsing net dst: fix percpu_counter list corruption and poison overwritten rds: Remove kfreed tcp conn from list rds: Lost locking in loop connection freeing de2104x: fix panic on load atl1 : fix panic on load netxen: remove unused firmware exports caif: Remove noisy printout when disconnecting caif socket caif: SPI-driver bugfix - incorrect padding. caif: Bugfix for socket priority, bindtodev and dbg channel. smsc911x: Set Ethernet EEPROM size to supported device's size ipv4: netfilter: ip_tables: fix information leak to userland ipv4: netfilter: arp_tables: fix information leak to userland cxgb4vf: remove call to stop TX queues at load time. cxgb4: remove call to stop TX queues at load time. ...	2010-11-05 15:25:48 -07:00
Nelson Elhage	22e76c849d	inet_diag: Make sure we actually run the same bytecode we audited. We were using nlmsg_find_attr() to look up the bytecode by attribute when auditing, but then just using the first attribute when actually running bytecode. So, if we received a message with two attribute elements, where only the second had type INET_DIAG_REQ_BYTECODE, we would validate and run different bytecode strings. Fix this by consistently using nlmsg_find_attr everywhere. Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-04 12:26:34 -07:00
Eric Dumazet	1f1b9c9990	fib: fib_result_assign() should not change fib refcounts After commit `ebc0ffae5` (RCU conversion of fib_lookup()), fib_result_assign() should not change fib refcounts anymore. Thanks to Michael who did the bisection and bug report. Reported-by: Michael Ellerman <michael@ellerman.id.au> Tested-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-04 12:05:32 -07:00
Jan Engelhardt	cccbe5ef85	netfilter: ip6_tables: fix information leak to userspace Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:55:39 -07:00
David S. Miller	758cb41106	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6	2010-11-03 18:52:32 -07:00
Herbert Xu	c00b2c9e79	cls_cgroup: Fix crash on module unload Somewhere along the lines net_cls_subsys_id became a macro when cls_cgroup is built as a module. Not only did it make cls_cgroup completely useless, it also causes it to crash on module unload. This patch fixes this by removing that macro. Thanks to Eric Dumazet for diagnosing this problem. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:50 -07:00
andrew hendry	a6331d6f9a	memory corruption in X.25 facilities parsing Signed-of-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:50 -07:00
Xiaotian Feng	41bb78b4b9	net dst: fix percpu_counter list corruption and poison overwritten There're some percpu_counter list corruption and poison overwritten warnings in recent kernel, which is resulted by `fc66f95c`. commit `fc66f95c` switches to use percpu_counter, in ip6_route_net_init, kernel init the percpu_counter for dst entries, but, the percpu_counter is never destroyed in ip6_route_net_exit. So if the related data is freed by kernel, the freed percpu_counter is still on the list, then if we insert/remove other percpu_counter, list corruption resulted. Also, if the insert/remove option modifies the ->prev,->next pointer of the freed value, the poison overwritten is resulted then. With the following patch, the percpu_counter list corruption and poison overwritten warnings disappeared. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:07 -07:00
Pavel Emelyanov	8200a59f24	rds: Remove kfreed tcp conn from list All the rds_tcp_connection objects are stored list, but when being freed it should be removed from there. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:07 -07:00
Pavel Emelyanov	58c490babd	rds: Lost locking in loop connection freeing The conn is removed from list in there and this requires proper lock protection. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:06 -07:00
sjur.brandeland@stericsson.com	47d1ff1765	caif: Remove noisy printout when disconnecting caif socket Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:04 -07:00
André Carvalho de Matos	f2527ec436	caif: Bugfix for socket priority, bindtodev and dbg channel. Changes: o Bugfix: SO_PRIORITY for SOL_SOCKET could not be handled in caif's setsockopt, using the struct sock attribute priority instead. o Bugfix: SO_BINDTODEVICE for SOL_SOCKET could not be handled in caif's setsockopt, using the struct sock attribute ifindex instead. o Wrong assert statement for RFM layer segmentation. o CAIF Debug channels was not working over SPI, caif_payload_info containing padding info must be initialized. o Check on pointer before dereferencing when unregister dev in caif_dev.c Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:03 -07:00
Vasiliy Kulikov	b5f15ac4f8	ipv4: netfilter: ip_tables: fix information leak to userland Structure ipt_getinfo is copied to userland with the field "name" that has the last elements unitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-03 08:45:06 +01:00
Vasiliy Kulikov	1a8b7a6722	ipv4: netfilter: arp_tables: fix information leak to userland Structure arpt_getinfo is copied to userland with the field "name" that has the last elements unitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-11-03 08:44:12 +01:00
Sage Weil	df9f86faf3	ceph: fix small seq message skipping If the client gets out of sync with the server message sequence number, we normally skip low seq messages (ones we already received). The skip code was also incrementing the expected seq, such that all subsequent messages also appeared old and got skipped, and an eventual timeout on the osd connection. This resulted in some lagging requests and console messages like [233480.882885] ceph: skipping osd22 10.138.138.13:6804 seq 2016, expected 2017 [233480.882919] ceph: skipping osd22 10.138.138.13:6804 seq 2017, expected 2018 [233480.882963] ceph: skipping osd22 10.138.138.13:6804 seq 2018, expected 2019 [233480.883488] ceph: skipping osd22 10.138.138.13:6804 seq 2019, expected 2020 [233485.219558] ceph: skipping osd22 10.138.138.13:6804 seq 2020, expected 2021 [233485.906595] ceph: skipping osd22 10.138.138.13:6804 seq 2021, expected 2022 [233490.379536] ceph: skipping osd22 10.138.138.13:6804 seq 2022, expected 2023 [233495.523260] ceph: skipping osd22 10.138.138.13:6804 seq 2023, expected 2024 [233495.923194] ceph: skipping osd22 10.138.138.13:6804 seq 2024, expected 2025 [233500.534614] ceph: tid 6023602 timed out on osd22, will reset osd Reported-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sage Weil <sage@newdream.net>	2010-11-01 15:49:23 -07:00
Tom Herbert	df32cc193a	net: check queue_index from sock is valid for device In dev_pick_tx recompute the queue index if the value stored in the socket is greater than or equal to the number of real queues for the device. The saved index in the sock structure is not guaranteed to be appropriate for the egress device (this could happen on a route change or in presence of tunnelling). The result of the queue index being bad would be to return a bogus queue (crash could prersumably follow). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-01 12:55:52 -07:00
Uwe Kleine-König	b595076a18	tree-wide: fix comment/printk typos "gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-11-01 15:38:34 -04:00
Dr. David Alan Gilbert	6f9b901823	l2tp: kzalloc with swapped params in l2tp_dfs_seq_open 'sparse' spotted that the parameters to kzalloc in l2tp_dfs_seq_open were swapped. Tested on current git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git at `1792f17b72` build, boots and I can see that directory, but there again I could see /sys/kernel/debug/l2tp with it swapped; I don't have any l2tp in use. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-01 06:56:02 -07:00
Thomas Graf	5ec1cea057	text ematch: check for NULL pointer before destroying textsearch config While validating the configuration em_ops is already set, thus the individual destroy functions are called, but the ematch data has not been allocated and associated with the ematch yet. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-31 09:37:38 -07:00
Linus Torvalds	3985c7ce85	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: isdn: mISDN: socket: fix information leak to userland netdev: can: Change mail address of Hans J. Koch pcnet_cs: add new_id net: Truncate recvfrom and sendto length to INT_MAX. RDS: Let rds_message_alloc_sgs() return NULL RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace RDS: Clean up error handling in rds_cmsg_rdma_args RDS: Return -EINVAL if rds_rdma_pages returns an error net: fix rds_iovec page count overflow can: pch_can: fix section mismatch warning by using a whitelisted name can: pch_can: fix sparse warning netxen_nic: Fix the tx queue manipulation bug in netxen_nic_probe ip_gre: fix fallback tunnel setup vmxnet: trivial annotation of protocol constant vmxnet3: remove unnecessary byteswapping in BAR writing macros ipv6/udp: report SndbufErrors and RcvbufErrors phy/marvell: rename 88ec048 to 88e1318s and fix mscr1 addr	2010-10-30 18:42:58 -07:00
Linus Torvalds	253eacc070	net: Truncate recvfrom and sendto length to INT_MAX. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:44:07 -07:00
Andy Grover	d139ff0907	RDS: Let rds_message_alloc_sgs() return NULL Even with the previous fix, we still are reading the iovecs once to determine SGs needed, and then again later on. Preallocating space for sg lists as part of rds_message seemed like a good idea but it might be better to not do this. While working to redo that code, this patch attempts to protect against userspace rewriting the rds_iovec array between the first and second accesses. The consequences of this would be either a too-small or too-large sg list array. Too large is not an issue. This patch changes all callers of message_alloc_sgs to handle running out of preallocated sgs, and fail gracefully. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:34:18 -07:00
Andy Grover	fc8162e3c0	RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace Change rds_rdma_pages to take a passed-in rds_iovec array instead of doing copy_from_user itself. Change rds_cmsg_rdma_args to copy rds_iovec array once only. This eliminates the possibility of userspace changing it after our sanity checks. Implement stack-based storage for small numbers of iovecs, based on net/socket.c, to save an alloc in the extremely common case. Although this patch reduces iovec copies in cmsg_rdma_args to 1, we still do another one in rds_rdma_extra_size. Getting rid of that one will be trickier, so it'll be a separate patch. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:34:17 -07:00
Andy Grover	f4a3fc03c1	RDS: Clean up error handling in rds_cmsg_rdma_args We don't need to set ret = 0 at the end -- it's initialized to 0. Also, don't increment s_send_rdma stat if we're exiting with an error. Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:34:17 -07:00
Andy Grover	a09f69c49b	RDS: Return -EINVAL if rds_rdma_pages returns an error rds_cmsg_rdma_args would still return success even if rds_rdma_pages returned an error (or overflowed). Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:34:16 -07:00
Linus Torvalds	1b1f693d7a	net: fix rds_iovec page count overflow As reported by Thomas Pollet, the rdma page counting can overflow. We get the rdma sizes in 64-bit unsigned entities, but then limit it to UINT_MAX bytes and shift them down to pages (so with a possible "+1" for an unaligned address). So each individual page count fits comfortably in an 'unsigned int' (not even close to overflowing into signed), but as they are added up, they might end up resulting in a signed return value. Which would be wrong. Catch the case of tot_pages turning negative, and return the appropriate error code. Reported-by: Thomas Pollet <thomas.pollet@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:34:16 -07:00
Eric Dumazet	3285ee3bb2	ip_gre: fix fallback tunnel setup Before making the fallback tunnel visible to lookups, we should make sure it is completely setup, once ipgre_tunnel_init() had been called and tstats per_cpu pointer allocated. move rcu_assign_pointer(ign->tunnels_wc[0], tunnel); from ipgre_fb_tunnel_init() to ipgre_init_net() Based on a patch from Pavel Emelyanov Reported-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:21:28 -07:00
Eric Dumazet	870be39258	ipv6/udp: report SndbufErrors and RcvbufErrors commit `a18135eb93` (Add UDP_MIB_{SND,RCV}BUFERRORS handling.) forgot to make the necessary changes in net/ipv6/proc.c to report additional counters in /proc/net/snmp6 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-30 16:17:23 -07:00
Linus Torvalds	1840897ab5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits) b43: Fix warning at drivers/mmc/core/core.c:237 in mmc_wait_for_cmd mac80211: fix failure to check kmalloc return value in key_key_read libertas: Fix sd8686 firmware reload ath9k: Fix incorrect access of rate flags in RC netfilter: xt_socket: Make tproto signed in socket_mt6_v1(). stmmac: enable/disable rx/tx in the core with a single write. net: atarilance - flags should be unsigned long netxen: fix kdump pktgen: Limit how much data we copy onto the stack. net: Limit socket I/O iovec total length to INT_MAX. USB: gadget: fix ethernet gadget crash in gether_setup fib: Fix fib zone and its hash leak on namespace stop cxgb3: Fix panic in free_tx_desc() cxgb3: fix crash due to manipulating queues before registration 8390: Don't oops on starting dev queue dccp ccid-2: Stop polling dccp: Refine the wait-for-ccid mechanism dccp: Extend CCID packet dequeueing interface dccp: Return-value convention of hc_tx_send_packet() igbvf: fix panic on load ...	2010-10-29 14:17:12 -07:00
David S. Miller	a4765fa7bf	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-10-29 12:23:15 -07:00
Jesper Juhl	520efd1ace	mac80211: fix failure to check kmalloc return value in key_key_read I noticed two small issues in mac80211/debugfs_key.c::key_key_read while reading through the code. Patch below. The key_key_read() function returns ssize_t and the value that's actually returned is the return value of simple_read_from_buffer() which also returns ssize_t, so let's hold the return value in a ssize_t local variable rather than a int one. Also, memory is allocated dynamically with kmalloc() which can fail, but the return value of kmalloc() is not checked, so we may end up operating on a null pointer further on. So check for a NULL return and bail out with -ENOMEM in that case. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-29 14:33:26 -04:00
Eric Dumazet	d817d29d0b	netfilter: fix nf_conntrack_l4proto_register() While doing __rcu annotations work on net/netfilter I found following bug. On some arches, it is possible we publish a table while its content is not yet committed to memory, and lockless reader can dereference wild pointer. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-29 19:59:40 +02:00
Patrick McHardy	64e4674922	netfilter: nf_nat: fix compiler warning with CONFIG_NF_CT_NETLINK=n net/ipv4/netfilter/nf_nat_core.c:52: warning: 'nf_nat_proto_find_get' defined but not used net/ipv4/netfilter/nf_nat_core.c:66: warning: 'nf_nat_proto_put' defined but not used Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-29 16:28:07 +02:00
Al Viro	51139adac9	convert get_sb_pseudo() users Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:16:33 -04:00
Al Viro	fc14f2fef6	convert get_sb_single() users Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-29 04:16:28 -04:00
David S. Miller	089282fb02	netfilter: xt_socket: Make tproto signed in socket_mt6_v1(). Otherwise error indications from ipv6_find_hdr() won't be noticed. This required making the protocol argument to extract_icmp6_fields() signed too. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 12:59:53 -07:00
Nelson Elhage	448d7b5daf	pktgen: Limit how much data we copy onto the stack. A program that accidentally writes too much data to the pktgen file can overflow the kernel stack and oops the machine. This is only triggerable by root, so there's no security issue, but it's still an unfortunate bug. printk() won't print more than 1024 bytes in a single call, anyways, so let's just never copy more than that much data. We're on a fairly shallow stack, so that should be safe even with CONFIG_4KSTACKS. Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 11:47:53 -07:00
David S. Miller	8acfe468b0	net: Limit socket I/O iovec total length to INT_MAX. This helps protect us from overflow issues down in the individual protocol sendmsg/recvmsg handlers. Once we hit INT_MAX we truncate out the rest of the iovec by setting the iov_len members to zero. This works because: 1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial writes are allowed and the application will just continue with another write to send the rest of the data. 2) For datagram oriented sockets, where there must be a one-to-one correspondance between write() calls and packets on the wire, INT_MAX is going to be far larger than the packet size limit the protocol is going to check for and signal with -EMSGSIZE. Based upon a patch by Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 11:47:52 -07:00
Pavel Emelyanov	4aa2c466a7	fib: Fix fib zone and its hash leak on namespace stop When we stop a namespace we flush the table and free one, but the added fn_zone-s (and their hashes if grown) are leaked. Need to free. Tries releases all its stuff in the flushing code. Shame on us - this bug exists since the very first make-fib-per-net patches in 2.6.27 :( Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 10:27:03 -07:00
Gerrit Renker	1c0e0a0569	dccp ccid-2: Stop polling This updates CCID-2 to use the CCID dequeuing mechanism, converting from previous continuous-polling to a now event-driven mechanism. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 10:27:01 -07:00
Gerrit Renker	b1fcf55eea	dccp: Refine the wait-for-ccid mechanism This extends the existing wait-for-ccid routine so that it may be used with different types of CCID, addressing the following problems: 1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for example has a full TX queue and becomes network-limited just as the application wants to close, then waiting for CCID-2 to become unblocked could lead to an indefinite delay (i.e., application "hangs"). 2) Since each TX CCID in turn uses a feedback mechanism, there may be changes in its sending policy while the queue is being drained. This can lead to further delays during which the application will not be able to terminate. 3) The minimum wait time for CCID-3/4 can be expected to be the queue length times the current inter-packet delay. For example if tx_qlen=100 and a delay of 15 ms is used for each packet, then the application would have to wait for a minimum of 1.5 seconds before being allowed to exit. 4) There is no way for the user/application to control this behaviour. It would be good to use the timeout argument of dccp_close() as an upper bound. Then the maximum time that an application is willing to wait for its CCIDs to can be set via the SO_LINGER option. These problems are addressed by giving the CCID a grace period of up to the `timeout' value. The wait-for-ccid function is, as before, used when the application (a) has read all the data in its receive buffer and (b) if SO_LINGER was set with a non-zero linger time, or (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ state (client application closes after receiving CloseReq). In addition, there is a catch-all case of __skb_queue_purge() after waiting for the CCID. This is necessary since the write queue may still have data when (a) the host has been passively-closed, (b) abnormal termination (unread data, zero linger time), (c) wait-for-ccid could not finish within the given time limit. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 10:27:01 -07:00
Gerrit Renker	dc841e30ea	dccp: Extend CCID packet dequeueing interface This extends the packet dequeuing interface of dccp_write_xmit() to allow 1. CCIDs to take care of timing when the next packet may be sent; 2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds). The main purpose is to take CCID-2 out of its polling mode (when it is network- limited, it tries every millisecond to send, without interruption). The mode of operation for (2) is as follows: * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(), * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full), * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER', * dccp_write_xmit() returns without further action; * after some time the wait-condition for CCID becomes true, * that CCID schedules the tasklet, * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(), * since the wait-condition is now true, ccid_hc_tx_packet() returns "send now", * packet is sent, and possibly more (since dccp_write_xmit() loops). Code reuse: the taskled function calls dccp_write_xmit(), the timer function reduces to a wrapper around the same code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 10:27:00 -07:00
Gerrit Renker	fe84f4140f	dccp: Return-value convention of hc_tx_send_packet() This patch reorganises the return value convention of the CCID TX sending function, to permit more flexible schemes, as required by subsequent patches. Currently the convention is * values < 0 mean error, * a value == 0 means "send now", and * a value x > 0 means "send in x milliseconds". The patch provides symbolic constants and a function to interpret return values. In addition, it caps the maximum positive return value to 0xFFFF milliseconds, corresponding to 65.535 seconds. This is possible since in CCID-3/4 the maximum possible inter-packet gap is fixed at t_mbi = 64 sec. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-28 10:27:00 -07:00
Sanchit Garg	f6ac55b6c1	net/9p: Return error on read with NULL buffer This patch ensures that a read(fd, NULL, 10) returns EFAULT on a 9p file. Signed-off-by: Sanchit Garg <sancgarg@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:49 -05:00
Venkateswararao Jujjuri (JV)	b165d60145	9p: Add datasync to client side TFSYNC/RFSYNC for dotl SYNOPSIS size[4] Tfsync tag[2] fid[4] datasync[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. If datasync flag is specified data will be fleshed but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:49 -05:00
Aneesh Kumar K.V	7b3bb3fe16	net/9p: Return error if we fail to encode protocol data We need to return error in case we fail to encode data in protocol buffer. This patch also return error in case of a failed copy_from_user. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:49 -05:00
Venkateswararao Jujjuri (JV)	52f44e0d08	net/9p: Add waitq to VirtIO transport. If there is not enough space for the PDU on the VirtIO ring, current code returns -EIO propagating the error to user. This patch introduced a wqit_queue on the channel, and lets the process wait on this queue until VirtIO ring frees up. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:48 -05:00
Venkateswararao Jujjuri (JV)	419b39561e	[net/9p]Serialize virtqueue operations to make VirtIO transport SMP safe. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:48 -05:00
M. Mohan Kumar	329176cc2c	9p: Implement TREADLINK operation for 9p2000.L Synopsis size[4] TReadlink tag[2] fid[4] size[4] RReadlink tag[2] target[s] Description Readlink is used to return the contents of the symoblic link referred by fid. Contents of symboic link is returned as a response. target[s] - Contents of the symbolic link referred by fid. Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:48 -05:00
M. Mohan Kumar	1d769cd192	9p: Implement TGETLOCK Synopsis size[4] TGetlock tag[2] fid[4] getlock[n] size[4] RGetlock tag[2] getlock[n] Description TGetlock is used to test for the existence of byte range posix locks on a file identified by given fid. The reply contains getlock structure. If the lock could be placed it returns F_UNLCK in type field of getlock structure. Otherwise it returns the details of the conflicting locks in the getlock structure getlock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK start[8] - Starting offset for lock length[8] - Number of bytes to check for the lock If length is 0, check for lock in all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock/owns the task in case of reply client[4] - Client id of the system that owns the process which has the conflicting lock Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:47 -05:00
M. Mohan Kumar	a099027c77	9p: Implement TLOCK Synopsis size[4] TLock tag[2] fid[4] flock[n] size[4] RLock tag[2] status[1] Description Tlock is used to acquire/release byte range posix locks on a file identified by given fid. The reply contains status of the lock request flock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK flags[4] - Flags could be either of P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a conflicting lock exists, wait for that lock to be released. P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is trying to reclaim a lock after a server restrart (due to crash) start[8] - Starting offset for lock length[8] - Number of bytes to lock If length is 0, lock all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock client_id[4] - Unique client id status[1] - Status of the lock request, can be P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or P9_LOCK_GRACE(3) P9_LOCK_SUCCESS - Request was successful P9_LOCK_BLOCKED - A conflicting lock is held by another process P9_LOCK_ERROR - Error while processing the lock request P9_LOCK_GRACE - Server is in grace period, it can't accept new lock requests in this period (except locks with P9_LOCK_FLAGS_RECLAIM flag set) Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:47 -05:00
Venkateswararao Jujjuri (JV)	920e65dc69	[9p] Introduce client side TFSYNC/RFSYNC for dotl. SYNOPSIS size[4] Tfsync tag[2] fid[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:47 -05:00
jvrao	8e44a0805f	net/9p: Add a Warning to catch NULL fids passed to p9_client_clunk(). Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:45 -05:00
Arun R Bharadwaj	4f7ebe8072	net/9p: This patch implements TLERROR/RLERROR on the 9P client. Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2010-10-28 09:08:45 -05:00
Eric Dumazet	6b1686a71e	netfilter: nf_conntrack: allow nf_ct_alloc_hashtable() to get highmem pages commit `ea781f197d` (use SLAB_DESTROY_BY_RCU and get rid of call_rcu()) did a mistake in __vmalloc() call in nf_ct_alloc_hashtable(). I forgot to add __GFP_HIGHMEM, so pages were taken from LOWMEM only. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-28 12:34:21 +02:00
Linus Torvalds	22cdbd1d57	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits) ehea: Fixing statistics bonding: Fix lockdep warning after bond_vlan_rx_register() tunnels: Fix tunnels change rcu protection caif-u5500: Build config for CAIF shared mem driver caif-u5500: CAIF shared memory mailbox interface caif-u5500: CAIF shared memory transport protocol caif-u5500: Adding shared memory include drivers/isdn: delete double assignment drivers/net/typhoon.c: delete double assignment drivers/net/sb1000.c: delete double assignment qlcnic: define valid vlan id range qlcnic: reduce rx ring size qlcnic: fix mac learning ehea: fix use after free inetpeer: __rcu annotations fib_rules: __rcu annotates ctarget tunnels: add __rcu annotations net: add __rcu annotations to protocol ipv4: add __rcu annotations to routes.c qlge: bugfix: Restoring the vlan setting. ...	2010-10-27 18:28:00 -07:00
Pavel Emelyanov	74b0b85b88	tunnels: Fix tunnels change rcu protection After making rcu protection for tunnels (ipip, gre, sit and ip6) a bug was introduced into the SIOCCHGTUNNEL code. The tunnel is first unlinked, then addresses change, then it is linked back probably into another bucket. But while changing the parms, the hash table is unlocked to readers and they can lookup the improper tunnel. Respective commits are `b7285b79` (ipip: get rid of ipip_lock), `1507850b` (gre: get rid of ipgre_lock), `3a43be3c` (sit: get rid of ipip6_lock) and `94767632` (ip6tnl: get rid of ip6_tnl_lock). The quick fix is to wait for quiescent state to pass after unlinking, but if it is inappropriate I can invent something better, just let me know. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 14:20:08 -07:00
Jouni Malinen	dc9f48ce7c	mac80211: Fix scan_ies_len to include DS Params Commit `651b52254f` added DS Parameter Set information into Probe Request frames that are transmitted on 2.4 GHz band, but it failed to increment local->scan_ies_len to cover this new information. This variable needs to be updated to match the maximum IE data length so that the extra buffer need gets reduced from the driver limit. Signed-off-by: Jouni Malinen <j@w1.fi> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-27 15:46:51 -04:00
Eric Dumazet	b914c4ea92	inetpeer: __rcu annotations Adds __rcu annotations to inetpeer (struct inet_peer)->avl_left (struct inet_peer)->avl_right This is a tedious cleanup, but removes one smp_wmb() from link_to_pool() since we now use more self documenting rcu_assign_pointer(). Note the use of RCU_INIT_POINTER() instead of rcu_assign_pointer() in all cases we dont need a memory barrier. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:33 -07:00
Eric Dumazet	7a2b03c517	fib_rules: __rcu annotates ctarget Adds __rcu annotation to (struct fib_rule)->ctarget Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:32 -07:00
Eric Dumazet	b33eab0844	tunnels: add __rcu annotations Add __rcu annotations to : (struct ip_tunnel)->prl (struct ip_tunnel_prl_entry)->next (struct xfrm_tunnel)->next struct xfrm_tunnel tunnel4_handlers struct xfrm_tunnel tunnel64_handlers And use appropriate rcu primitives to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:32 -07:00
Eric Dumazet	e0ad61ec86	net: add __rcu annotations to protocol Add __rcu annotations to : struct net_protocol inet_protos struct net_protocol inet6_protos And use appropriate casts to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:31 -07:00
Eric Dumazet	1c31720a74	ipv4: add __rcu annotations to routes.c Add __rcu annotations to : (struct dst_entry)->rt_next (struct rt_hash_bucket)->chain And use appropriate rcu primitives to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:31 -07:00
Ursula Braun	853dc2e03d	ipv6: fix refcnt problem related to POSTDAD state After running this bonding setup script modprobe bonding miimon=100 mode=0 max_bonds=1 ifconfig bond0 10.1.1.1/16 ifenslave bond0 eth1 ifenslave bond0 eth3 on s390 with qeth-driven slaves, modprobe -r fails with this message unregister_netdevice: waiting for bond0 to become free. Usage count = 1 due to twice detection of duplicate address. Problem is caused by a missing decrease of ifp->refcnt in addrconf_dad_failure. An extra call of in6_ifa_put(ifp) solves it. Problem has been introduced with commit `f2344a131b`. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:30 -07:00
Ben Hutchings	66c68bcc48	net: NETIF_F_HW_CSUM does not imply FCoE CRC offload NETIF_F_HW_CSUM indicates the ability to update an TCP/IP-style 16-bit checksum with the checksum of an arbitrary part of the packet data, whereas the FCoE CRC is something entirely different. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Cc: stable@kernel.org [2.6.32+] Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:29 -07:00
Ben Hutchings	af1905dbec	net: Fix some corner cases in dev_can_checksum() dev_can_checksum() incorrectly returns true in these cases: 1. The skb has both out-of-band and in-band VLAN tags and the device supports checksum offload for the encapsulated protocol but only with one layer of encapsulation. 2. The skb has a VLAN tag and the device supports generic checksumming but not in conjunction with VLAN encapsulation. Rearrange the VLAN tag checks to avoid these. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-27 11:37:29 -07:00
Linus Torvalds	426e1f5cec	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits) split invalidate_inodes() fs: skip I_FREEING inodes in writeback_sb_inodes fs: fold invalidate_list into invalidate_inodes fs: do not drop inode_lock in dispose_list fs: inode split IO and LRU lists fs: switch bdev inode bdi's correctly fs: fix buffer invalidation in invalidate_list fsnotify: use dget_parent smbfs: use dget_parent exportfs: use dget_parent fs: use RCU read side protection in d_validate fs: clean up dentry lru modification fs: split __shrink_dcache_sb fs: improve DCACHE_REFERENCED usage fs: use percpu counter for nr_dentry and nr_dentry_unused fs: simplify __d_free fs: take dcache_lock inside __d_path fs: do not assign default i_ino in new_inode fs: introduce a per-cpu last_ino allocator new helper: ihold() ...	2010-10-26 17:58:44 -07:00
Eric Dumazet	518de9b39e	fs: allow for more than 2^31 files Robin Holt tried to boot a 16TB system and found af_unix was overflowing a 32bit value : <quote> We were seeing a failure which prevented boot. The kernel was incapable of creating either a named pipe or unix domain socket. This comes down to a common kernel function called unix_create1() which does: atomic_inc(&unix_nr_socks); if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) goto out; The function get_max_files() is a simple return of files_stat.max_files. files_stat.max_files is a signed integer and is computed in fs/file_table.c's files_init(). n = (mempages * (PAGE_SIZE / 1024)) / 10; files_stat.max_files = n; In our case, mempages (total_ram_pages) is approx 3,758,096,384 (0xe0000000). That leaves max_files at approximately 1,503,238,553. This causes 2 * get_max_files() to integer overflow. </quote> Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long integers, and change af_unix to use an atomic_long_t instead of atomic_t. get_max_files() is changed to return an unsigned long. get_nr_files() is changed to return a long. unix_nr_socks is changed from atomic_t to atomic_long_t, while not strictly needed to address Robin problem. Before patch (on a 64bit kernel) : # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max -18446744071562067968 After patch: # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max 2147483648 # cat /proc/sys/fs/file-nr 704 0 2147483648 Reported-by: Robin Holt <holt@sgi.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Reviewed-by: Robin Holt <holt@sgi.com> Tested-by: Robin Holt <holt@sgi.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:15 -07:00
Glenn Wurster	7a876b0efc	IPv6: Temp addresses are immediately deleted. There is a bug in the interaction between ipv6_create_tempaddr and addrconf_verify. Because ipv6_create_tempaddr uses the cstamp and tstamp from the public address in creating a private address, if we have not received a router advertisement in a while, tstamp + temp_valid_lft might be < now. If this happens, the new address is created inside ipv6_create_tempaddr, then the loop within addrconf_verify starts again and the address is immediately deleted. We are left with no temporary addresses on the interface, and no more will be created until the public IP address is updated. To avoid this, set the expiry time to be the minimum of the time left on the public address or the config option PLUS the current age of the public interface. Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-26 12:35:13 -07:00
Glenn Wurster	aed65501e8	IPv6: Create temporary address if none exists. If privacy extentions are enabled, but no current temporary address exists, then create one when we get a router advertisement. Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-26 12:35:12 -07:00
Eric Dumazet	ded85aa86b	fib_hash: fix rcu sparse and logical errors While fixing CONFIG_SPARSE_RCU_POINTER errors, I had to fix accesses to fz->fz_hash for real. - &fz->fz_hash[fn_hash(f->fn_key, fz)] + rcu_dereference(fz->fz_hash) + fn_hash(f->fn_key, fz) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-26 11:42:39 -07:00
Eric Dumazet	ebb9fed2de	fib: fix fib_nl_newrule() Some panic reports in fib_rules_lookup() show a rule could have a NULL pointer as a next pointer in the rules_list. This can actually happen because of a bug in fib_nl_newrule() : It checks if current rule is the destination of unresolved gotos. (Other rules have gotos to this about to be inserted rule) Problem is it does the resolution of the gotos before the rule is inserted in the rules_list (and has a valid next pointer) Fix this by moving the rules_list insertion before the changes on gotos. A lockless reader can not any more follow a ctarget pointer, unless destination is ready (has a valid next pointer) Reported-by: Oleg A. Arkhangelsky <sysoleg@yandex.ru> Reported-by: Joe Buehler <aspam@cox.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-26 11:42:38 -07:00
David S. Miller	78fd9c4491	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-10-26 11:32:28 -07:00
Linus Torvalds	4390110fef	Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linux * 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits) svcrpc: svc_tcp_sendto XPT_DEAD check is redundant svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue svcrpc: assume svc_delete_xprt() called only once svcrpc: never clear XPT_BUSY on dead xprt nfsd4: fix connection allocation in sequence() nfsd4: only require krb5 principal for NFSv4.0 callbacks nfsd4: move minorversion to client nfsd4: delay session removal till free_client nfsd4: separate callback change and callback probe nfsd4: callback program number is per-session nfsd4: track backchannel connections nfsd4: confirm only on succesful create_session nfsd4: make backchannel sequence number per-session nfsd4: use client pointer to backchannel session nfsd4: move callback setup into session init code nfsd4: don't cache seq_misordered replies SUNRPC: Properly initialize sock_xprt.srcaddr in all cases SUNRPC: Use conventional switch statement when reclassifying sockets sunrpc/xprtrdma: clean up workqueue usage sunrpc: Turn list_for_each-s into the ..._entry-s ... Fix up trivial conflicts (two different deprecation notices added in separate branches) in Documentation/feature-removal-schedule.txt	2010-10-26 09:55:25 -07:00
Linus Torvalds	a4dd8dce14	Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: net/sunrpc: Use static const char arrays nfs4: fix channel attribute sanity-checks NFSv4.1: Use more sensible names for 'initialize_mountpoint' NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure NFS: client needs to maintain list of inodes with active layouts NFS: create and destroy inode's layout cache NFSv4.1: pnfs: filelayout: introduce minimal file layout driver NFSv4.1: pnfs: full mount/umount infrastructure NFS: set layout driver NFS: ask for layouttypes during v4 fsinfo call NFS: change stateid to be a union NFSv4.1: pnfsd, pnfs: protocol level pnfs constants SUNRPC: define xdr_decode_opaque_fixed NFSD: remove duplicate NFS4_STATEID_SIZE	2010-10-26 09:52:09 -07:00
David S. Miller	7932c2e55c	netfilter: Add missing CONFIG_SYSCTL checks in ipv6's nf_conntrack_reasm.c Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-26 09:08:53 -07:00
Joe Perches	411b5e0561	net/sunrpc: Use static const char arrays Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-25 22:19:52 -04:00
Christoph Hellwig	85fe4025c6	fs: do not assign default i_ino in new_inode Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:11 -04:00
Al Viro	7de9c6ee3e	new helper: ihold() Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:11 -04:00
Eric Dumazet	7e360c38ab	fs: allow for more than 2^31 files Andrew, Could you please review this patch, you probably are the right guy to take it, because it crosses fs and net trees. Note : /proc/sys/fs/file-nr is a read-only file, so this patch doesnt depend on previous patch (sysctl: fix min/max handling in __do_proc_doulongvec_minmax()) Thanks ! [PATCH V4] fs: allow for more than 2^31 files Robin Holt tried to boot a 16TB system and found af_unix was overflowing a 32bit value : <quote> We were seeing a failure which prevented boot. The kernel was incapable of creating either a named pipe or unix domain socket. This comes down to a common kernel function called unix_create1() which does: atomic_inc(&unix_nr_socks); if (atomic_read(&unix_nr_socks) > 2 * get_max_files()) goto out; The function get_max_files() is a simple return of files_stat.max_files. files_stat.max_files is a signed integer and is computed in fs/file_table.c's files_init(). n = (mempages * (PAGE_SIZE / 1024)) / 10; files_stat.max_files = n; In our case, mempages (total_ram_pages) is approx 3,758,096,384 (0xe0000000). That leaves max_files at approximately 1,503,238,553. This causes 2 * get_max_files() to integer overflow. </quote> Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long integers, and change af_unix to use an atomic_long_t instead of atomic_t. get_max_files() is changed to return an unsigned long. get_nr_files() is changed to return a long. unix_nr_socks is changed from atomic_t to atomic_long_t, while not strictly needed to address Robin problem. Before patch (on a 64bit kernel) : # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max -18446744071562067968 After patch: # echo 2147483648 >/proc/sys/fs/file-max # cat /proc/sys/fs/file-max 2147483648 # cat /proc/sys/fs/file-nr 704 0 2147483648 Reported-by: Robin Holt <holt@sgi.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Reviewed-by: Robin Holt <holt@sgi.com> Tested-by: Robin Holt <holt@sgi.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:18:20 -04:00
J. Bruce Fields	42d7ba3d6d	svcrpc: svc_tcp_sendto XPT_DEAD check is redundant The only caller (svc_send) has already checked XPT_DEAD. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-25 17:59:34 -04:00
J. Bruce Fields	01dba075d5	svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue If any xprt marked DEAD is also left BUSY for the rest of its life, then the XPT_DEAD check here is superfluous--we'll get the same result from the XPT_BUSY check just after. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-25 17:59:33 -04:00
J. Bruce Fields	ac9303eb74	svcrpc: assume svc_delete_xprt() called only once As long as DEAD exports are left BUSY, and svc_delete_xprt is called only with BUSY held, then svc_delete_xprt() will never be called on an xprt that is already DEAD. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-25 17:59:32 -04:00
J. Bruce Fields	7e4fdd0744	svcrpc: never clear XPT_BUSY on dead xprt Once an xprt has been deleted, there's no reason to allow it to be enqueued--at worst, that might cause the xprt to be re-added to some global list, resulting in later corruption. Also, note this leaves us with no need for the reference-count manipulation here. Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-25 17:58:40 -04:00
Eric Dumazet	43a951e999	ipv4: add __rcu annotations to ip_ra_chain Add __rcu annotations to : (struct ip_ra_chain)->next struct ip_ra_chain *ip_ra_chain; And use appropriate rcu primitives. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-25 14:18:28 -07:00
Eric Dumazet	0d7da9ddd9	net: add __rcu annotation to sk_filter Add __rcu annotation to : (struct sock)->sk_filter And use appropriate rcu primitives to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-25 14:18:28 -07:00
Eric Dumazet	1c87733d06	net_ns: add __rcu annotations add __rcu annotation to (struct net)->gen, and use rcu_dereference_protected() in net_assign_generic() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-25 14:18:27 -07:00
Eric Dumazet	6e3f7faf3e	rps: add __rcu annotations Add __rcu annotations to : (struct netdev_rx_queue)->rps_map (struct netdev_rx_queue)->rps_flow_table struct rps_sock_flow_table *rps_sock_flow_table; And use appropriate rcu primitives. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-25 14:18:27 -07:00
KOVACS Krisztian	f6318e5588	netfilter: fix module dependency issues with IPv6 defragmentation, ip6tables and xt_TPROXY One of the previous tproxy related patches split IPv6 defragmentation and connection tracking, but did not correctly add Kconfig stanzas to handle the new dependencies correctly. This patch fixes that by making the config options mirror the setup we have for IPv4: a distinct config option for defragmentation that is automatically selected by both connection tracking and xt_TPROXY/xt_socket. The patch also changes the #ifdefs enclosing IPv6 specific code in xt_socket and xt_TPROXY: we only compile these in case we have ip6tables support enabled. Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-25 13:58:36 -07:00
Linus Torvalds	74eb94b218	Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (67 commits) SUNRPC: Cleanup duplicate assignment in rpcauth_refreshcred nfs: fix unchecked value Ask for time_delta during fsinfo probe Revalidate caches on lock SUNRPC: After calling xprt_release(), we must restart from call_reserve NFSv4: Fix up the 'dircount' hint in encode_readdir NFSv4: Clean up nfs4_decode_dirent NFSv4: nfs4_decode_dirent must clear entry->fattr->valid NFSv4: Fix a regression in decode_getfattr NFSv4: Fix up decode_attr_filehandle() to handle the case of empty fh pointer NFS: Ensure we check all allocation return values in new readdir code NFS: Readdir plus in v4 NFS: introduce generic decode_getattr function NFS: check xdr_decode for errors NFS: nfs_readdir_filler catch all errors NFS: readdir with vmapped pages NFS: remove page size checking code NFS: decode_dirent should use an xdr_stream SUNRPC: Add a helper function xdr_inline_peek NFS: remove readdir plus limit ...	2010-10-25 13:48:29 -07:00
Eric Dumazet	6f0bcf1525	tunnels: add _rcu annotations (struct ip6_tnl)->next is rcu protected : (struct ip_tunnel)->next is rcu protected : (struct xfrm6_tunnel)->next is rcu protected : add __rcu annotation and proper rcu primitives. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-25 13:09:45 -07:00
Eric Dumazet	3cc77ec74e	net/802: add __rcu annotations (struct net_device)->garp_port is rcu protected : (struct garp_port)->applicants is rcu protected : add __rcu annotation and proper rcu primitives. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-25 13:09:44 -07:00
Eric Dumazet	198caeca3e	ipv6: ip6_ptr rcu annotations (struct net_device)->ip6_ptr is rcu protected : add __rcu annotation and proper rcu primitives. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-25 13:09:43 -07:00
Eric Dumazet	b616b09afa	vlan: rcu annotations (struct net_device)->vlgrp is rcu protected : add __rcu annotation and proper rcu primitives. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-25 13:09:43 -07:00
David S. Miller	11a766ce91	net: Increase xmit RECURSION_LIMIT to 10. Three is definitely too low, and we know from reports that GRE tunnels stacked as deeply as 37 levels cause stack overflows, so pick some reasonable value between those two. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-25 12:51:55 -07:00
Rajkumar Manoharan	c8716d9dc1	mac80211: Fix ibss station got expired immediately Station addition in ieee80211_ibss_rx_queued_mgmt is not updating sta->last_rx which is causing station expiry in ieee80211_ibss_work path. So sta addition and deletion happens repeatedly. CC: stable@kernel.org Signed-off-by: Rajkumar Manoharan <rmanoharan@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-25 14:45:56 -04:00
Luis R. Rodriguez	a171fba491	cfg80211: fix regression on processing country IEs The patch `4f366c5`: wireless: only use alpha2 regulatory information from country IE removed some complex intersection we were always doing between the AP's country IE info and what we got from CRDA. When CRDA sent us back a regulatory domain we would do some sanity checks on that regulatory domain response we just got. Part of these sanity checks included checking that we already had performed an intersection for the request of NL80211_REGDOM_SET_BY_COUNTRY_IE type. This mean that cfg80211 was only processing country IEs for cases where we already had an intersection, but since we removed enforcing this this is no longer required, we should just apply the country IE country hint with the data received from CRDA. This patch has fixes intended for kernels >= 2.6.36. Cc: stable@kernel.org Reported-by: Easwar Krishnan <easwar.krishnan@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-25 14:43:14 -04:00
Christian Lamparter	5f4e6b2d3c	mac80211: don't sanitize invalid rates I found this bug while poking around with a pure-gn AP. Commit: cfg80211/mac80211: Use more generic bitrate mask for rate control Added some sanity checks to ensure that each tx rate index is included in the configured mask and it would change any rate indexes if it wasn't. But, the current implementation doesn't take into account that the invalid rate index "-1" has a special meaning (= no further attempts) and it should not be "changed". Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-25 14:43:13 -04:00
Tejun Heo	99b88a0ecb	mac80211: cancel restart_work explicitly instead of depending on flush_scheduled_work() iee80211_hw->restart_work is the only work which uses the system workqueue. Instead of calling flush_scheduled_work() during iee80211_exit(), cancel the work during unregistration. This is to prepare for the deprecation and removal of flush_scheduled_work(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-25 14:43:12 -04:00
Martin Schwidefsky	f6649a7e5a	[S390] cleanup lowcore access from external interrupts Read external interrupts parameters from the lowcore in the first level interrupt handler in entry[64].S. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2010-10-25 16:10:19 +02:00
Stephen Rothwell	e341b2ddc1	l2tp: static functions should not be exported Causes these build failures on PowerPC: net/l2tp/l2tp_core.c:1228: error: __ksymtab_l2tp_tunnel_closeall causes a section type conflict net/l2tp/l2tp_core.c:1228: error: __ksymtab_l2tp_tunnel_closeall causes a section type conflict net/l2tp/l2tp_core.c:1006: error: __ksymtab_l2tp_xmit_core causes a section type conflict net/l2tp/l2tp_core.c:1006: error: __ksymtab_l2tp_xmit_core causes a section type conflict net/l2tp/l2tp_core.c:847: error: __ksymtab_l2tp_udp_recv_core causes a section type conflict net/l2tp/l2tp_core.c:847: error: __ksymtab_l2tp_udp_recv_core causes a section type conflict Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-24 22:26:41 -07:00
Eric Dumazet	5c398dc8f5	netlink: fix netlink_change_ngroups() commit `6c04bb18dd` (netlink: use call_rcu for netlink_change_ngroups) used a somewhat convoluted and racy way to perform call_rcu(). The old block of memory is freed after a grace period, but the rcu_head used to track it is located in new block. This can clash if we call two times or more netlink_change_ngroups(), and a block is freed before another. call_rcu() called on different cpus makes no guarantee in order of callbacks. Fix this using a more standard way of handling this : Each block of memory contains its own rcu_head, so that no 'use after free' can happens. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Johannes Berg <johannes@sipsolutions.net> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-24 16:25:39 -07:00
Balazs Scheidler	b889416b54	tproxy: Add missing CAP_NET_ADMIN check to ipv6 side IP_TRANSPARENT requires root (more precisely CAP_NET_ADMIN privielges) for IPV6. However as I see right now this check was missed from the IPv6 implementation. Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-24 16:07:50 -07:00
Anders Franzen	7e223de84b	ip6_tunnel dont update the mtu on the route. The ip6_tunnel device did not unset the flag, IFF_XMIT_DST_RELEASE. This will make the dev layer to release the dst before calling the tunnel. The tunnel will not update any mtu/pmtu info, since it does not have a dst on the skb. Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-24 15:23:36 -07:00
Paul Gortmaker	d618222352	pktgen: clean up handling of local/transient counter vars The temporary variable "i" is needlessly initialized to zero in two distinct cases in this file: 1) where it is set to zero and then used as an argument in an addition before being assigned a non-zero value. 2) where it is only used in a standard/typical loop counter For (1), simply delete assignment to zero and usages while still zero; for (2) simply make the loop start at zero as per standard practice as seen everywhere else in the same file. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-24 15:23:35 -07:00
Trond Myklebust	9a84d38031	SUNRPC: Cleanup duplicate assignment in rpcauth_refreshcred Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-24 18:00:46 -04:00
stephen hemminger	fc130840d7	l2tp: make local function static Also moved the refcound inlines from l2tp_core.h to l2tp_core.c since only used in that one file. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-24 14:55:47 -07:00
Trond Myklebust	118df3d17f	SUNRPC: After calling xprt_release(), we must restart from call_reserve Rob Leslie reports seeing the following Oops after his Kerberos session expired. BUG: unable to handle kernel NULL pointer dereference at 00000058 IP: [<e186ed94>] rpcauth_refreshcred+0x11/0x12c [sunrpc] *pde = 00000000 Oops: 0000 [#1] last sysfs file: /sys/devices/platform/pc87360.26144/temp3_input Modules linked in: autofs4 authenc esp4 xfrm4_mode_transport ipt_LOG ipt_REJECT xt_limit xt_state ipt_REDIRECT xt_owner xt_HL xt_hl xt_tcpudp xt_mark cls_u32 cls_tcindex sch_sfq sch_htb sch_dsmark geodewdt deflate ctr twofish_generic twofish_i586 twofish_common camellia serpent blowfish cast5 cbc xcbc rmd160 sha512_generic sha1_generic hmac crypto_null af_key rpcsec_gss_krb5 nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc ip_gre sit tunnel4 dummy ext3 jbd nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables pc8736x_gpio nsc_gpio pc87360 hwmon_vid loop aes_i586 aes_generic sha256_generic dm_crypt cs5535_gpio serio_raw cs5535_mfgpt hifn_795x des_generic geode_rng rng_core led_class ext4 mbcache jbd2 crc16 dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sd_mod crc_t10dif ide_pci_generic cs5536 amd74xx ide_core pata_cs5536 ata_generic libata usb_storage via_rhine mii scsi_mod btrfs zlib_deflate crc32c libcrc32c [last unloaded: scsi_wait_scan] Pid: 12875, comm: sudo Not tainted 2.6.36-net5501 #1 / EIP: 0060:[<e186ed94>] EFLAGS: 00010292 CPU: 0 EIP is at rpcauth_refreshcred+0x11/0x12c [sunrpc] EAX: 00000000 EBX: defb13a0 ECX: 00000006 EDX: e18683b8 ESI: defb13a0 EDI: 00000000 EBP: 00000000 ESP: de571d58 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Process sudo (pid: 12875, ti=de570000 task=decd1430 task.ti=de570000) Stack: e186e008 00000000 defb13a0 0000000d deda6000 e1868f22 e196f12b defb13a0 <0> defb13d8 00000000 00000000 e186e0aa 00000000 defb13a0 de571dac 00000000 <0> e186956c de571e34 debea5c0 de571dc8 e186967a 00000000 debea5c0 de571e34 Call Trace: [<e186e008>] ? rpc_wake_up_next+0x114/0x11b [sunrpc] [<e1868f22>] ? call_decode+0x24a/0x5af [sunrpc] [<e196f12b>] ? nfs4_xdr_dec_access+0x0/0xa2 [nfs] [<e186e0aa>] ? __rpc_execute+0x62/0x17b [sunrpc] [<e186956c>] ? rpc_run_task+0x91/0x97 [sunrpc] [<e186967a>] ? rpc_call_sync+0x40/0x5b [sunrpc] [<e1969ca2>] ? nfs4_proc_access+0x10a/0x176 [nfs] [<e19572fa>] ? nfs_do_access+0x2b1/0x2c0 [nfs] [<e186ed61>] ? rpcauth_lookupcred+0x62/0x84 [sunrpc] [<e19573b6>] ? nfs_permission+0xad/0x13b [nfs] [<c0177824>] ? exec_permission+0x15/0x4b [<c0177fbd>] ? link_path_walk+0x4f/0x456 [<c017867d>] ? path_walk+0x4c/0xa8 [<c0179678>] ? do_path_lookup+0x1f/0x68 [<c017a3fb>] ? user_path_at+0x37/0x5f [<c016359c>] ? handle_mm_fault+0x229/0x55b [<c0170a2d>] ? sys_faccessat+0x93/0x146 [<c0170aef>] ? sys_access+0xf/0x13 [<c02cf615>] ? syscall_call+0x7/0xb Code: 0f 94 c2 84 d2 74 09 8b 44 24 0c e8 6a e9 8b de 83 c4 14 89 d8 5b 5e 5f 5d c3 55 57 56 53 83 ec 1c fc 89 c6 8b 40 10 89 44 24 04 <8b> 58 58 85 db 0f 85 d4 00 00 00 0f b7 46 70 8b 56 20 89 c5 83 EIP: [<e186ed94>] rpcauth_refreshcred+0x11/0x12c [sunrpc] SS:ESP 0068:de571d58 CR2: 0000000000000058 This appears to be caused by the function rpc_verify_header() first calling xprt_release(), then doing a call_refresh. If we release the transport slot, we should _always_ jump back to call_reserve before calling anything else. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2010-10-24 17:27:14 -04:00
Linus Torvalds	229aebb873	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) Update broken web addresses in arch directory. Update broken web addresses in the kernel. Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget Revert "Fix typo: configuation => configuration" partially ida: document IDA_BITMAP_LONGS calculation ext2: fix a typo on comment in ext2/inode.c drivers/scsi: Remove unnecessary casts of private_data drivers/s390: Remove unnecessary casts of private_data net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data drivers/infiniband: Remove unnecessary casts of private_data drivers/gpu/drm: Remove unnecessary casts of private_data kernel/pm_qos_params.c: Remove unnecessary casts of private_data fs/ecryptfs: Remove unnecessary casts of private_data fs/seq_file.c: Remove unnecessary casts of private_data arm: uengine.c: remove C99 comments arm: scoop.c: remove C99 comments Fix typo configue => configure in comments Fix typo: configuation => configuration Fix typo interrest[ing\|ed] => interest[ing\|ed] Fix various typos of valid in comments ... Fix up trivial conflicts in: drivers/char/ipmi/ipmi_si_intf.c drivers/usb/gadget/rndis.c net/irda/irnet/irnet_ppp.c	2010-10-24 13:41:39 -07:00
Trond Myklebust	ba8e452a4f	SUNRPC: Add a helper function xdr_inline_peek We sometimes need to be able to read ahead in an xdr_stream without incrementing the current pointer position. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2010-10-23 15:27:32 -04:00
Linus Torvalds	5f05647dd8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits) bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL. vlan: Calling vlan_hwaccel_do_receive() is always valid. tproxy: use the interface primary IP address as a default value for --on-ip tproxy: added IPv6 support to the socket match cxgb3: function namespace cleanup tproxy: added IPv6 support to the TPROXY target tproxy: added IPv6 socket lookup function to nf_tproxy_core be2net: Changes to use only priority codes allowed by f/w tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled tproxy: added tproxy sockopt interface in the IPV6 layer tproxy: added udp6_lib_lookup function tproxy: added const specifiers to udp lookup functions tproxy: split off ipv6 defragmentation to a separate module l2tp: small cleanup nf_nat: restrict ICMP translation for embedded header can: mcp251x: fix generation of error frames can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set can-raw: add msg_flags to distinguish local traffic 9p: client code cleanup rds: make local functions/variables static ... Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and drivers/net/wireless/ath/ath9k/debug.c as per David	2010-10-23 11:47:02 -07:00
Linus Torvalds	73ecf3a6e3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (49 commits) serial8250: ratelimit "too much work" error serial: bfin_sport_uart: speed up sport RX sample rate to be 3% faster serial: abstraction for 8250 legacy ports serial/imx: check that the buffer is non-empty before sending it out serial: mfd: add more baud rates support jsm: Remove the uart port on errors Alchemy: Add UART PM methods. 8250: allow platforms to override PM hook. altera_uart: Don't use plain integer as NULL pointer altera_uart: Fix missing prototype for registering an early console altera_uart: Fixup type usage of port flags altera_uart: Make it possible to use Altera UART and 8250 ports together altera_uart: Add support for different address strides altera_uart: Add support for getting mapbase and IRQ from resources altera_uart: Add support for polling mode (IRQ-less) serial: Factor out uart_poll_timeout() from 8250 driver serial: mark the 8250 driver as maintained serial: 8250: Don't delay after transmitter is ready. tty: MAINTAINERS: add drivers/serial/jsm/ as maintained driver vcs: invoke the vt update callback when /dev/vcs* is written to ...	2010-10-22 19:59:04 -07:00
Linus Torvalds	092e0e7e52	Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek	2010-10-22 10:52:56 -07:00
Linus Torvalds	5704e44d28	Merge branch 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: BKL: introduce CONFIG_BKL. dabusb: remove the BKL sunrpc: remove the big kernel lock init/main.c: remove BKL notations blktrace: remove the big kernel lock rtmutex-tester: make it build without BKL dvb-core: kill the big kernel lock dvb/bt8xx: kill the big kernel lock tlclk: remove big kernel lock fix rawctl compat ioctls breakage on amd64 and itanic uml: kill big kernel lock parisc: remove big kernel lock cris: autoconvert trivial BKL users alpha: kill big kernel lock isapnp: BKL removal s390/block: kill the big kernel lock hpet: kill BKL, add compat_ioctl	2010-10-22 10:43:11 -07:00
Alan Cox	0587102cf9	tty: icount changeover for other main devices Again basically cut and paste Convert the main driver set to use the hooks for GICOUNT Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2010-10-22 10:20:05 -07:00
Linus Torvalds	bc4016f481	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits) sched: Export account_system_vtime() sched: Call tick_check_idle before __irq_enter sched: Remove irq time from available CPU power sched: Do not account irq time to current task x86: Add IRQ_TIME_ACCOUNTING sched: Add IRQ_TIME_ACCOUNTING, finer accounting of irq time sched: Add a PF flag for ksoftirqd identification sched: Consolidate account_system_vtime extern declaration sched: Fix softirq time accounting sched: Drop group_capacity to 1 only if local group has extra capacity sched: Force balancing on newidle balance if local group has capacity sched: Set group_imb only a task can be pulled from the busiest cpu sched: Do not consider SCHED_IDLE tasks to be cache hot sched: Drop all load weight manipulation for RT tasks sched: Create special class for stop/migrate work sched: Unindent labels sched: Comment updates: fix default latency and granularity numbers tracing/sched: Add sched_pi_setprio tracepoint sched: Give CPU bound RT tasks preference sched: Try not to migrate higher priority RT tasks ...	2010-10-21 12:55:43 -07:00
Linus Torvalds	5d70f79b5e	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (163 commits) tracing: Fix compile issue for trace_sched_wakeup.c [S390] hardirq: remove pointless header file includes [IA64] Move local_softirq_pending() definition perf, powerpc: Fix power_pmu_event_init to not use event->ctx ftrace: Remove recursion between recordmcount and scripts/mod/empty jump_label: Add COND_STMT(), reducer wrappery perf: Optimize sw events perf: Use jump_labels to optimize the scheduler hooks jump_label: Add atomic_t interface jump_label: Use more consistent naming perf, hw_breakpoint: Fix crash in hw_breakpoint creation perf: Find task before event alloc perf: Fix task refcount bugs perf: Fix group moving irq_work: Add generic hardirq context callbacks perf_events: Fix transaction recovery in group_sched_in() perf_events: Fix bogus AMD64 generic TLB events perf_events: Fix bogus context time tracking tracing: Remove parent recording in latency tracer graph options tracing: Use one prologue for the preempt irqs off tracer function tracers ...	2010-10-21 12:54:49 -07:00
Linus Torvalds	888a6f77e0	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (52 commits) sched: fix RCU lockdep splat from task_group() rcu: using ACCESS_ONCE() to observe the jiffies_stall/rnp->qsmask value sched: suppress RCU lockdep splat in task_fork_fair net: suppress RCU lockdep false positive in sock_update_classid rcu: move check from rcu_dereference_bh to rcu_read_lock_bh_held rcu: Add advice to PROVE_RCU_REPEATEDLY kernel config parameter rcu: Add tracing data to support queueing models rcu: fix sparse errors in rcutorture.c rcu: only one evaluation of arg in rcu_dereference_check() unless sparse kernel: Remove undead ifdef CONFIG_DEBUG_LOCK_ALLOC rcu: fix _oddness handling of verbose stall warnings rcu: performance fixes to TINY_PREEMPT_RCU callback checking rcu: upgrade stallwarn.txt documentation for CPU-bound RT processes vhost: add __rcu annotations rcu: add comment stating that list_empty() applies to RCU-protected lists rcu: apply TINY_PREEMPT_RCU read-side speedup to TREE_PREEMPT_RCU rcu: combine duplicate code, courtesy of CONFIG_PREEMPT_RCU rcu: Upgrade srcu_read_lock() docbook about SRCU grace periods rcu: document ways of stalling updates in low-memory situations rcu: repair code-duplication FIXMEs ...	2010-10-21 12:54:12 -07:00
Linus Torvalds	a8fe150098	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (26 commits) selinux: include vmalloc.h for vmalloc_user secmark: fix config problem when CONFIG_NF_CONNTRACK_SECMARK is not set selinux: implement mmap on /selinux/policy SELinux: allow userspace to read policy back out of the kernel SELinux: drop useless (and incorrect) AVTAB_MAX_SIZE SELinux: deterministic ordering of range transition rules kernel: roundup should only reference arguments once kernel: rounddown helper function secmark: export secctx, drop secmark in procfs conntrack: export lsm context rather than internal secid via netlink security: secid_to_secctx returns len when data is NULL secmark: make secmark object handling generic secmark: do not return early if there was no error AppArmor: Ensure the size of the copy is < the buffer allocated to hold it TOMOYO: Print URL information before panic(). security: remove unused parameter from security_task_setscheduler() tpm: change 'tpm_suspend_pcr' to be module parameter selinux: fix up style problem on /selinux/status selinux: change to new flag variable selinux: really fix dependency causing parallel compile failure. ...	2010-10-21 12:41:19 -07:00
Linus Torvalds	2017bd1945	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (22 commits) ceph: do not carry i_lock for readdir from dcache fs/ceph/xattr.c: Use kmemdup rbd: passing wrong variable to bvec_kunmap_irq() rbd: null vs ERR_PTR ceph: fix num_pages_free accounting in pagelist ceph: add CEPH_MDS_OP_SETDIRLAYOUT and associated ioctl. ceph: don't crash when passed bad mount options ceph: fix debugfs warnings block: rbd: removing unnecessary test block: rbd: fixed may leaks ceph: switch from BKL to lock_flocks() ceph: preallocate flock state without locks held ceph: add pagelist_reserve, pagelist_truncate, pagelist_set_cursor ceph: use mapping->nrpages to determine if mapping is empty ceph: only invalidate on check_caps if we actually have pages ceph: do not hide .snap in root directory rbd: introduce rados block device (rbd), based on libceph ceph: factor out libceph from Ceph file system ceph-rbd: osdc support for osd call and rollback operations ceph: messenger and osdc changes for rbd ...	2010-10-21 12:38:28 -07:00
David S. Miller	2198a10b50	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/core/dev.c	2010-10-21 08:43:05 -07:00
David S. Miller	9941fb6276	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2010-10-21 08:21:34 -07:00
Patrick McHardy	3b1a1ce6f4	Merge branch 'for-patrick' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6	2010-10-21 16:25:51 +02:00
Balazs Scheidler	cc6eb43385	tproxy: use the interface primary IP address as a default value for --on-ip The REDIRECT target and the older TProxy versions used the primary address of the incoming interface as the default value of the --on-ip parameter. This was unintentionally changed during the initial TProxy submission and caused confusion among users. Since IPv6 has no notion of primary address, we just select the first address on the list: this way the socket lookup finds wildcard bound sockets properly and we cannot really do better without the user telling us the IPv6 address of the proxy. This is implemented for both IPv4 and IPv6. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-21 16:21:10 +02:00
Balazs Scheidler	b64c9256a9	tproxy: added IPv6 support to the socket match The ICMP extraction bits were contributed by Harry Mason. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-21 16:19:42 +02:00
Balazs Scheidler	6ad7889327	tproxy: added IPv6 support to the TPROXY target This requires a new revision as the old target structure was IPv4 specific. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-21 16:17:26 +02:00
Chuck Lever	9247685088	SUNRPC: Properly initialize sock_xprt.srcaddr in all cases The source address field in the transport's sock_xprt is initialized ONLY IF the RPC application passed a pointer to a source address during the call to rpc_create(). However, xs_bind() subsequently uses the value of this field without regard to whether the source address was initialized during transport creation or not. So far we've been lucky: the uninitialized value of this field is zeroes. xs_bind(), until recently, used only the sin[6]_addr field in this sockaddr, and all zeroes is a valid value for this: it means ANYADDR. This is a happy coincidence. However, xs_bind() now wants to use the sa_family field as well, and expects it to be initialized to something other than zero. Therefore, the source address sockaddr field should be fully initialized at transport create time in _every_ case, not just when the RPC application wants to use a specific bind address. Bruce added a workaround for this missing initialization by adjusting commit `6bc9638a`, but the "right" way to do this is to ensure that the source address sockaddr is always correctly initialized from the get-go. This patch doesn't introduce a behavior change. It's simply a clean-up of Bruce's fix, to prevent future problems of this kind. It may look like overkill, but a) it clearly documents the default initial value of this field, b) it doesn't assume that the sockaddr_storage memory is first initialized to any particular value, and c) it will fail verbosely if some unknown address family is passed in Originally introduced by commit `d3bc9a1d`. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-21 10:11:47 -04:00
Chuck Lever	4232e8634a	SUNRPC: Use conventional switch statement when reclassifying sockets Clean up. Defensive coding: If "family" is ever something that is neither AF_INET nor AF_INET6, xs_reclassify_socket6() is not the appropriate default action. Choose to do nothing in that case. Introduced by commit `6bc9638a`. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-21 10:11:46 -04:00
Tejun Heo	a25e758c5f	sunrpc/xprtrdma: clean up workqueue usage * Create and use svc_rdma_wq instead of using the system workqueue and flush_scheduled_work(). This workqueue is necessary to serve as flushing domain for rdma->sc_work which is used to destroy itself and thus can't be flushed explicitly. * Replace cancel_delayed_work() + flush_scheduled_work() with cancel_delayed_work_sync(). * Implement synchronous connect in xprt_rdma_connect() using flush_delayed_work() on the rdma_connect work instead of using flush_scheduled_work(). This is to prepare for the deprecation and removal of flush_scheduled_work(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-21 10:11:45 -04:00
Balazs Scheidler	0a513f6af9	tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-21 16:10:03 +02:00
Balazs Scheidler	6c46862280	tproxy: added tproxy sockopt interface in the IPV6 layer Support for IPV6_RECVORIGDSTADDR sockopt for UDP sockets were contributed by Harry Mason. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-21 16:08:28 +02:00
Balazs Scheidler	aa976fc011	tproxy: added udp6_lib_lookup function Just like with IPv4, we need access to the UDP hash table to look up local sockets, but instead of exporting the global udp_table, export a lookup function. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-21 16:05:41 +02:00
Balazs Scheidler	88440ae70e	tproxy: added const specifiers to udp lookup functions The parameters for various UDP lookup functions were non-const, even though they could be const. TProxy has some const references and instead of downcasting it, I added const specifiers along the path. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-21 16:04:33 +02:00
Balazs Scheidler	e97c3e278e	tproxy: split off ipv6 defragmentation to a separate module Like with IPv4, TProxy needs IPv6 defragmentation but does not require connection tracking. Since defragmentation was coupled with conntrack, I split off the two, creating an nf_defrag_ipv6 module, similar to the already existing nf_defrag_ipv4. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-21 16:03:43 +02:00
Arnd Bergmann	6de5bd128d	BKL: introduce CONFIG_BKL. With all the patches we have queued in the BKL removal tree, only a few dozen modules are left that actually rely on the BKL, and even there are lots of low-hanging fruit. We need to decide what to do about them, this patch illustrates one of the options: Every user of the BKL is marked as 'depends on BKL' in Kconfig, and the CONFIG_BKL becomes a user-visible option. If it gets disabled, no BKL using module can be built any more and the BKL code itself is compiled out. The one exception is file locking, which is practically always enabled and does a 'select BKL' instead. This effectively forces CONFIG_BKL to be enabled until we have solved the fs/lockd mess and can apply the patch that removes the BKL from fs/locks.c. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-21 15:44:13 +02:00
Eric Dumazet	e83726b046	l2tp: small cleanup Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 04:39:09 -07:00
Julian Anastasov	b0aeef3043	nf_nat: restrict ICMP translation for embedded header Skip ICMP translation of embedded protocol header if NAT bits are not set. Needed for IPVS to see the original embedded addresses because for IPVS traffic the IPS_SRC_NAT_BIT and IPS_DST_NAT_BIT bits are not set. It happens when IPVS performs DNAT for client packets after using nf_conntrack_alter_reply to expect replies from real server. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-21 13:30:02 +02:00
Oliver Hartkopp	1e55659ce6	can-raw: add msg_flags to distinguish local traffic CAN has no addressing scheme. It is currently impossible for userspace to tell is a received CAN frame comes from another process on the local host, or from a remote CAN device. This patch add support for userspace applications to distinguish between 'own', 'local' and 'remote' CAN traffic. The distinction is made by returning flags in msg->msg_flags in the call to recvmsg(). The added documentation explains the introduced flags. Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 04:27:03 -07:00
stephen hemminger	32a875adcd	9p: client code cleanup Make p9_client_version static since only used in one file. Remove p9_client_auth because it is defined but never used. Compile tested only. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 04:26:39 -07:00
stephen hemminger	ff51bf8415	rds: make local functions/variables static The RDS protocol has lots of functions that should be declared static. rds_message_get/add_version_extension is removed since it defined but never used. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 04:26:39 -07:00
stephen hemminger	d0c2b0d265	napi: unexport napi_reuse_skb The function napi_reuse_skb is only used inside core. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 04:26:38 -07:00
Allan Stephens	a0e00369f1	tipc: delete needless memset from bearer enabling. Eliminates zeroing out of the new bearer structure at the start of activation, since it is already in that state. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 04:25:51 -07:00
Benjamin Poirier	1e253c3b8a	bridge: Forward reserved group addresses if !STP Make all frames sent to reserved group MAC addresses (01:80:c2:00:00:00 to 01:80:c2:00:00:0f) be forwarded if STP is disabled. This enables forwarding EAPOL frames, among other things. Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 04:25:48 -07:00
Tejun Heo	a5c30b349b	net/neighbour: cancel_delayed_work() + flush_scheduled_work() -> cancel_delayed_work_sync() flush_scheduled_work() is going away. Prepare for it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 04:25:48 -07:00
Neil Horman	db5a753bf1	Revert `d88dca79d3` TIPC needs to have its endianess issues fixed. Unfortunately, the format of a subscriber message is passed in directly from user space, so requiring this message to be in network byte order breaks user space ABI. Revert this change until such time as we can determine how to do this in a backwards compatible manner. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 04:11:08 -07:00
Neil Horman	8c97443808	Revert `c6537d6742` Backout the tipc changes to the flags int he subscription message. These changees, while reasonable on the surface, interefere with user space ABI compatibility which is a no-no. This was part of the changes to fix the endianess issues in the TIPC protocol, which would be really nice to do but we need to do so in a way that is backwards compatible with user space. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 04:11:07 -07:00
Balazs Scheidler	093d282321	tproxy: fix hash locking issue when using port redirection in __inet_inherit_port() When __inet_inherit_port() is called on a tproxy connection the wrong locks are held for the inet_bind_bucket it is added to. __inet_inherit_port() made an implicit assumption that the listener's port number (and thus its bind bucket). Unfortunately, if you're using the TPROXY target to redirect skbs to a transparent proxy that assumption is not true anymore and things break. This patch adds code to __inet_inherit_port() so that it can handle this case by looking up or creating a new bind bucket for the child socket and updates callers of __inet_inherit_port() to gracefully handle __inet_inherit_port() failing. Reported by and original patch from Stephen Buck <stephen.buck@exinda.com>. See http://marc.info/?t=128169268200001&r=1&w=2 for the original discussion. Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-21 13:06:43 +02:00
Ben Greear	d2ed817766	net/core: Allow tagged VLAN packets to flow through VETH devices. When there are VLANs on a VETH device, the packets being transmitted through the VETH device may be 4 bytes bigger than MTU. A check in dev_forward_skb did not take this into account and so dropped these packets. This patch is needed at least as far back as 2.6.34.7 and should be considered for -stable. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 04:06:29 -07:00
Balazs Scheidler	6006db84a9	tproxy: add lookup type checks for UDP in nf_tproxy_get_sock_v4() Also, inline this function as the lookup_type is always a literal and inlining removes branches performed at runtime. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-21 12:47:34 +02:00
Balazs Scheidler	106e4c26b1	tproxy: kick out TIME_WAIT sockets in case a new connection comes in with the same tuple Without tproxy redirections an incoming SYN kicks out conflicting TIME_WAIT sockets, in order to handle clients that reuse ports within the TIME_WAIT period. The same mechanism didn't work in case TProxy is involved in finding the proper socket, as the time_wait processing code looked up the listening socket assuming that the listener addr/port matches those of the established connection. This is not the case with TProxy as the listener addr/port is possibly changed with the tproxy rule. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-21 12:45:14 +02:00
Changli Gao	3511c9132f	net_sched: remove the unused parameter of qdisc_create_dflt() The first parameter dev isn't in use in qdisc_create_dflt(). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 03:09:47 -07:00
stephen hemminger	1c4c40c42d	xfrm: make xfrm_bundle_ok local Only used in one place. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 03:09:46 -07:00
stephen hemminger	8d8a0b1cc2	rtnetlink: remove rtnl_kill_links The function rtnl_kill_links is defined but never used. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 03:09:45 -07:00
stephen hemminger	6f747aca5e	xfrm6: make xfrm6_tunnel_free_spi local Function only defined and used in one file. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 03:09:45 -07:00
stephen hemminger	d028023281	bridge: make br_parse_ip_options static Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 03:09:42 -07:00
stephen hemminger	11165f1457	socket: localize functions A couple of functions in socket.c are only used there and should be localized. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 03:09:42 -07:00
Eric Dumazet	9b0c290e78	fib: introduce fib_alias_accessed() helper Perf tools session at NFWS 2010 pointed out a false sharing on struct fib_alias that can be avoided pretty easily, if we set FA_S_ACCESSED bit only if needed (ie : not already set) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 03:09:41 -07:00
Eric Dumazet	7b5edbc4cf	net/sched: fix missing spinlock init Under network load, doing : tc qdisc del dev eth0 root triggers : [ 167.193087] BUG: spinlock bad magic on CPU#3, udpflood/4928 [ 167.193139] lock: c15bc324, .magic: 00000000, .owner: <none>/-1, .owner_cpu: -1 [ 167.193193] Pid: 4928, comm: udpflood Not tainted 2.6.36-rc7-11417-g215340c-dirty #323 [ 167.193245] Call Trace: [ 167.193292] [<c13abaa0>] ? printk+0x18/0x20 [ 167.193342] [<c11afb53>] spin_bug+0xa3/0xf0 [ 167.193389] [<c11afcdd>] do_raw_spin_lock+0x7d/0x160 [ 167.193440] [<c1313d4e>] ? __dev_xmit_skb+0x27e/0x2b0 [ 167.193496] [<c107382b>] ? trace_hardirqs_on+0xb/0x10 [ 167.193545] [<c13ae99a>] _raw_spin_lock+0x3a/0x40 [ 167.193593] [<c1313d4e>] ? __dev_xmit_skb+0x27e/0x2b0 [ 167.193641] [<c1313d4e>] __dev_xmit_skb+0x27e/0x2b0 commit `79640a4ca6` (add additional lock to qdisc to increase throughput) forgot to initialize noop_qdisc and noqueue_qdisc busylock Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 03:09:40 -07:00
Julian Anastasov	0d79641a96	ipvs: provide address family for debugging As skb->protocol is not valid in LOCAL_OUT add parameter for address family in packet debugging functions. Even if ports are not present in AH and ESP change them to use ip_vs_tcpudp_debug_packet to show at least valid addresses as before. This patch removes the last user of skb->protocol in IPVS. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-21 11:04:43 +02:00
Julian Anastasov	3233759be7	ipvs: inherit forwarding method in backup Connections in backup server should inherit the forwarding method from real server. It is a way to fix a problem where the forwarding method in backup connection is damaged by logical OR operation with the real server's connection flags. And the change is needed for setups where the backup server uses different forwarding method for the same real servers. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-21 11:04:30 +02:00
Julian Anastasov	cb59155f21	ipvs: changes for local client This patch deals with local client processing. Prefer LOCAL_OUT hook for scheduling connections from local clients. LOCAL_IN is still supported if the packets are not marked as processed in LOCAL_OUT. The idea to process requests in LOCAL_OUT is to alter conntrack reply before it is confirmed at POST_ROUTING. If the local requests are processed in LOCAL_IN the conntrack can not be updated and matching by state is impossible. Add the following handlers: - ip_vs_reply[46] at LOCAL_IN:99 to process replies from remote real servers to local clients. Now when both replies from remote real servers (ip_vs_reply) and local real servers (ip_vs_local_reply) are handled it is safe to remove the conn_out_get call from ip_vs_in because it does not support related ICMP packets. - ip_vs_local_request[46] at LOCAL_OUT:-98 to process requests from local client Handling in LOCAL_OUT causes some changes: - as skb->dev, skb->protocol and skb->pkt_type are not defined in LOCAL_OUT make sure we set skb->dev before calling icmpv6_send, prefer skb_dst(skb) for struct net and remove the skb->protocol checks from TUN transmitters. [ horms@verge.net.au: removed trailing whitespace ] Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-21 11:04:01 +02:00
Julian Anastasov	fc60476761	ipvs: changes for local real server This patch deals with local real servers: - Add support for DNAT to local address (different real server port). It needs ip_vs_out hook in LOCAL_OUT for both families because skb->protocol is not set for locally generated packets and can not be used to set 'af'. - Skip packets in ip_vs_in marked with skb->ipvs_property because ip_vs_out processing can be executed in LOCAL_OUT but we still have the conn_out_get check in ip_vs_in. - Ignore packets with inet->nodefrag from local stack - Require skb_dst(skb) != NULL because we use it to get struct net - Add support for changing the route to local IPv4 stack after DNAT depending on the source address type. Local client sets output route and the remote client sets input route. It looks like IPv6 does not need such rerouting because the replies use addresses from initial incoming header, not from skb route. - All transmitters now have strict checks for the destination address type: redirect from non-local address to local real server requires NAT method, local address can not be used as source address when talking to remote real server. - Now LOCALNODE is not set explicitly as forwarding method in real server to allow the connections to provide correct forwarding method to the backup server. Not sure if this breaks tools that expect to see 'Local' real server type. If needed, this can be supported with new flag IP_VS_DEST_F_LOCAL. Now it should be possible connections in backup that lost their fwmark information during sync to be forwarded properly to their daddr, even if it is local address in the backup server. By this way backup could be used as real server for DR or TUN, for NAT there are some restrictions because tuple collisions in conntracks can create problems for the traffic. - Call ip_vs_dst_reset when destination is updated in case some real server IP type is changed between local and remote. [ horms@verge.net.au: removed trailing whitespace ] Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-21 11:03:46 +02:00
Julian Anastasov	f5a41847ac	ipvs: move ip_route_me_harder for ICMP Currently, ip_route_me_harder after ip_vs_out_icmp is called even if packet is not related to IPVS connection. Move it into handle_response_icmp. Also, force rerouting if sending to local client because IPv4 stack uses addresses from the route. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-21 10:51:43 +02:00
Julian Anastasov	1ca5bb5450	ipvs: create ip_vs_defrag_user Create new function ip_vs_defrag_user to return correct IP_DEFRAG_xxx user depending on the hooknum. It will be needed when we add handlers in LOCAL_OUT. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-21 10:51:28 +02:00
Julian Anastasov	4256f1aaa6	ipvs: fix CHECKSUM_PARTIAL for TUN method The recent change in IP_VS_XMIT_TUNNEL to set CHECKSUM_NONE is not correct. After adding IPIP header skb->csum becomes invalid but the CHECKSUM_PARTIAL case must be supported. So, use skb_forward_csum() which is most suitable for us to allow local clients to send IPIP to remote real server. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-21 10:51:11 +02:00
Julian Anastasov	489fdedaed	ipvs: stop ICMP from FORWARD to local Delivering locally ICMP from FORWARD hook is not supported. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-21 10:50:57 +02:00
Julian Anastasov	190ecd27cd	ipvs: do not schedule conns from real servers This patch is needed to avoid scheduling of packets from local real server when we add ip_vs_in in LOCAL_OUT hook to support local client. Currently, when ip_vs_in can not find existing connection it tries to create new one by calling ip_vs_schedule. The default indication from ip_vs_schedule was if connection was scheduled to real server. If real server is not available we try to use the bypass forwarding method or to send ICMP error. But in some cases we do not want to use the bypass feature. So, add flag 'ignored' to indicate if the scheduler ignores this packet. Make sure we do not create new connections from replies. We can hit this problem for persistent services and local real server when ip_vs_in is added to LOCAL_OUT hook to handle local clients. Also, make sure ip_vs_schedule ignores SYN packets for Active FTP DATA from local real server. The FTP DATA connection should be created on SYN+ACK from client to assign correct connection daddr. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-21 10:50:41 +02:00
Julian Anastasov	cf356d69db	ipvs: switch to notrack mode Change skb->ipvs_property semantic. This is preparation to support ip_vs_out processing in LOCAL_OUT. ipvs_property=1 will be used to avoid expensive lookups for traffic sent by transmitters. Now when conntrack support is not used we call ip_vs_notrack method to avoid problems in OUTPUT and POST_ROUTING hooks instead of exiting POST_ROUTING as before. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-21 10:50:20 +02:00
Julian Anastasov	8b27b10f58	ipvs: optimize checksums for apps Avoid full checksum calculation for apps that can provide info whether csum was broken after payload mangling. For now only ip_vs_ftp mangles payload and it updates the csum, so the full recalculation is avoided for all packets. Add CHECKSUM_UNNECESSARY for snat_handler (TCP and UDP). It is needed to support SNAT from local address for the case when csum is fully recalculated. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-21 10:50:02 +02:00
Julian Anastasov	5bc9068e9d	ipvs: fix CHECKSUM_PARTIAL for TCP, UDP Fix CHECKSUM_PARTIAL handling. Tested for IPv4 TCP, UDP not tested because it needs network card with HW CSUM support. May be fixes problem where IPVS can not be used in virtual boxes. Problem appears with DNAT to local address when the local stack sends reply in CHECKSUM_PARTIAL mode. Fix tcp_dnat_handler and udp_dnat_handler to provide vaddr and daddr in right order (old and new IP) when calling tcp_partial_csum_update/udp_partial_csum_update (CHECKSUM_PARTIAL). Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-21 10:49:39 +02:00
Jesse Gross	361ff8a6cf	bridge: Add support for TX vlan offload. If some of the underlying devices support it, enable vlan offload on transmit for bridge devices. This allows senders to take advantage of the hardware support, similar to other forms of acceleration. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 01:26:54 -07:00
Jesse Gross	d5dbda2380	ethtool: Add support for vlan accleration. Now that vlan acceleration is handled consistently regardless of usage, it is possible to enable and disable it at will. This adds support for Ethtool operations that change the offloading status for debugging purposes, similar to other forms of hardware acceleration. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 01:26:54 -07:00
Jesse Gross	3701e51382	vlan: Centralize handling of hardware acceleration. Currently each driver that is capable of vlan hardware acceleration must be aware of the vlan groups that are configured and then pass the stripped tag to a specialized receive function. This is different from other types of hardware offload in that it places a significant amount of knowledge in the driver itself rather keeping it in the networking core. This makes vlan offloading function more similarly to other forms of offloading (such as checksum offloading or TSO) by doing the following: * On receive, stripped vlans are passed directly to the network core, without attempting to check for vlan groups or reconstructing the header if no group * vlans are made less special by folding the logic into the main receive routines * On transmit, the device layer will add the vlan header in software if the hardware doesn't support it, instead of spreading that logic out in upper layers, such as bonding. There are a number of advantages to this: * Fixes all bugs with drivers incorrectly dropping vlan headers at once. * Avoids having to disable VLAN acceleration when in promiscuous mode (good for bridging since it always puts devices in promiscuous mode). * Keeps VLAN tag separate until given to ultimate consumer, which avoids needing to do header reconstruction as in tg3 unless absolutely necessary. * Consolidates common code in core networking. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 01:26:53 -07:00
Jesse Gross	65ac6a5fa6	vlan: Avoid hash table lookup to find group. A struct net_device always maps to zero or one vlan groups and we always know the device when we are looking up a group. We currently do a hash table lookup on the device to find the group but it is much simpler to just store a pointer. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 01:26:53 -07:00
Jesse Gross	7b9c609037	vlan: Enable software emulation for vlan accleration. Currently users of hardware vlan accleration need to know whether the device supports it before generating packets. However, vlan acceleration will soon be available in a more flexible manner so knowing ahead of time becomes much more difficult. This adds a software fallback path for vlan packets on devices without the necessary offloading support, similar to other types of hardware accleration. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 01:26:52 -07:00
Jesse Gross	b738127dfb	vlan: Rename VLAN_GROUP_ARRAY_LEN to VLAN_N_VID. VLAN_GROUP_ARRAY_LEN is simply the number of possible vlan VIDs. Since vlan groups will soon be more of an implementation detail for vlan devices, rename the constant to be descriptive of its actual purpose. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 01:26:50 -07:00
Jesse Gross	13937911f9	ebtables: Allow filtering of hardware accelerated vlan frames. An upcoming commit will allow packets with hardware vlan acceleration information to be passed though more parts of the network stack, including packets trunked through the bridge. This adds support for matching and filtering those packets through ebtables. Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 01:26:49 -07:00
David S. Miller	4993e0d211	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6	2010-10-21 00:54:29 -07:00
Eric Paris	ff660c80d0	secmark: fix config problem when CONFIG_NF_CONNTRACK_SECMARK is not set When CONFIG_NF_CONNTRACK_SECMARK is not set we accidentally attempt to use the secmark fielf of struct nf_conn. Problem is when that config isn't set the field doesn't exist. whoops. Wrap the incorrect usage in the config. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>	2010-10-21 10:13:00 +11:00
Eric Paris	1ae4de0cdf	secmark: export secctx, drop secmark in procfs The current secmark code exports a secmark= field which just indicates if there is special labeling on a packet or not. We drop this field as it isn't particularly useful and instead export a new field secctx= which is the actual human readable text label. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: James Morris <jmorris@namei.org>	2010-10-21 10:12:52 +11:00
Eric Paris	1cc63249ad	conntrack: export lsm context rather than internal secid via netlink The conntrack code can export the internal secid to userspace. These are dynamic, can change on lsm changes, and have no meaning in userspace. We should instead be sending lsm contexts to userspace instead. This patch sends the secctx (rather than secid) to userspace over the netlink socket. We use a new field CTA_SECCTX and stop using the the old CTA_SECMARK field since it did not send particularly useful information. Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Paul Moore <paul.moore@hp.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: James Morris <jmorris@namei.org>	2010-10-21 10:12:51 +11:00
Eric Paris	2606fd1fa5	secmark: make secmark object handling generic Right now secmark has lots of direct selinux calls. Use all LSM calls and remove all SELinux specific knowledge. The only SELinux specific knowledge we leave is the mode. The only point is to make sure that other LSMs at least test this generic code before they assume it works. (They may also have to make changes if they do not represent labels as strings) Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Paul Moore <paul.moore@hp.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: James Morris <jmorris@namei.org>	2010-10-21 10:12:48 +11:00
Eric Paris	15714f7b58	secmark: do not return early if there was no error Commit `4a5a5c73` attempted to pass decent error messages back to userspace for netfilter errors. In xt_SECMARK.c however the patch screwed up and returned on 0 (aka no error) early and didn't finish setting up secmark. This results in a kernel BUG if you use SECMARK. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2010-10-21 10:12:47 +11:00
Sage Weil	240634e9b3	ceph: fix num_pages_free accounting in pagelist Decrement the free page counter when removing a page from the free_list. Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:23 -07:00
Yehuda Sadeh	010e3b48fc	ceph: don't crash when passed bad mount options This only happened when parse_extra_token was not passed to ceph_parse_option() (hence, only happened in rbd). Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2010-10-20 15:38:22 -07:00
Greg Farnum	ac0b74d8a1	ceph: add pagelist_reserve, pagelist_truncate, pagelist_set_cursor These facilitate preallocation of pages so that we can encode into the pagelist in an atomic context. Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:16 -07:00
Yehuda Sadeh	602adf4002	rbd: introduce rados block device (rbd), based on libceph The rados block device (rbd), based on osdblk, creates a block device that is backed by objects stored in the Ceph distributed object storage cluster. Each device consists of a single metadata object and data striped over many data objects. The rbd driver supports read-only snapshots. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:38:13 -07:00
Yehuda Sadeh	3d14c5d2b6	ceph: factor out libceph from Ceph file system This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well: - ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces. - Mount option parsing and debugfs setup is correspondingly broken into two pieces. - The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case). - The basic supported/required feature bits can be expanded (and are by ceph_fs_client). No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process. Signed-off-by: Sage Weil <sage@newdream.net>	2010-10-20 15:37:28 -07:00
Eric Dumazet	27b75c95f1	net: avoid RCU for NOCACHE dst There is no point using RCU for dst we allocate for a very short time (used once). Change dst_release() to take DST_NOCACHE into account, but also change skb_dst_set_noref() to force a refcount increment for such dst. This is a _huge_ gain, because we dont waste memory to store xx thousand of dsts. Instead of queueing them to RCU, we can free them instantly. CPU caches can stay hot, re-using same memory blocks to hold temporary dsts. Note : remove unneeded smp_mb__before_atomic_dec(); in dst_release(), since atomic_dec_return() implies a full memory barrier. Stress test, 160.000.000 udp frames sent, IP route cache disabled (DDOS). Before: real 0m38.091s user 0m13.189s sys 7m53.018s After: real 0m29.946s user 0m12.157s sys 7m40.605s For reference, if IP route cache was enabled : real 0m32.030s user 0m10.521s sys 8m15.243s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-20 03:02:23 -07:00
Tom Herbert	e6484930d7	net: allocate tx queues in register_netdevice This patch introduces netif_alloc_netdev_queues which is called from register_device instead of alloc_netdev_mq. This makes TX queue allocation symmetric with RX allocation. Also, queue locks allocation is done in netdev_init_one_queue. Change set_real_num_tx_queues to fail if requested number < 1 or greater than number of allocated queues. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-20 02:27:59 -07:00
Tom Herbert	bd25fa7ba5	net: cleanups in RX queue allocation Clean up in RX queue allocation. In netif_set_real_num_rx_queues return error on attempt to set zero queues, or requested number is greater than number of allocated queues. In netif_alloc_rx_queues, do BUG_ON if queue_count is zero. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-20 02:27:59 -07:00
Tom Herbert	55513fb428	net: fail alloc_netdev_mq if queue count < 1 In alloc_netdev_mq fail if requested queue_count < 1. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-20 02:27:58 -07:00
David S. Miller	5eeaa2db16	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2010-10-20 01:59:48 -07:00
Changli Gao	c5e90f5620	phonet: remove the unused variable pn Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-20 01:55:54 -07:00
Neil Horman	f13d493d9c	netpoll: Revert napi_poll fix for bonding driver In an erlier patch I modified napi_poll so that devices with IFF_MASTER polled the per_cpu list instead of the device list for napi. I did this because the bonding driver has no napi instances to poll, it instead expects to check the slave devices napi instances, which napi_poll was unaware of. Looking at this more closely however, I now see this isn't strictly needed. As the bond driver poll_controller calls the slaves poll_controller via netpoll_poll_dev, which recursively calls poll_napi on each slave, allowing those napi instances to get serviced. The earlier patch isn't at all harmfull, its just not needed, so lets revert it to make the code cleaner. Sorry for the noise, Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-20 01:44:30 -07:00
Eduardo Blanco	d86bef73b4	Fixed race condition at ip_vs.ko module init. Lists were initialized after the module was registered. Multiple ipvsadm processes at module load triggered a race condition that resulted in a null pointer dereference in do_ip_vs_get_ctl(). As a result, __ip_vs_mutex was left locked preventing all further ipvsadm commands. Signed-off-by: Eduardo J. Blanco <ejblanco@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2010-10-19 17:13:16 +02:00
Pavel Emelyanov	8f3a6de313	sunrpc: Turn list_for_each-s into the ..._entry-s Saves some lines of code and some branticks when reading one. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:16 -04:00
Pavel Emelyanov	50fa0d40a9	sunrpc: Remove dead "else" branch from bc xprt creation Since the xprt in question is forcibly set to be bound the else branch of this check is unneeded. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:16 -04:00
Pavel Emelyanov	c636b572e0	sunrpc: Don't return NULL from rpcb_create > The reason for this is in the future, we may want to support additional > address family types. We should, therefore, ensure that every piece of > code that is sensitive to address families fail in some orderly manner > to let developers know where a change is needed. Makes sense. I was under impression, that AF-s other than INET are not cared about at all :( Here's a fixed version of the patch. Log: Its callers check for ERR_PTR. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:16 -04:00
Pavel Emelyanov	f10fef38d2	sunrpc: Remove useless if (task == NULL) from xprt_reserve_xprt The task in question is dereferenced above (and is actually never NULL). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:16 -04:00
Pavel Emelyanov	8c14ff2aaf	sunrpc: Remove UDP worker wrappers Same for UDP sockets creation paths. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:15 -04:00
Pavel Emelyanov	cdd518d524	sunrpc: Remove TCP worker wrappers The v4 and the v6 wrappers only pass the respective family to the xs_tcp_setup_socket. This family can be taken from the xprt's sockaddr. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:15 -04:00
Pavel Emelyanov	7dfe1fc362	sunrpc: Pass family to setup_socket calls Now we have a single socket creation routine and can call it directly from the setup_socket routines. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:15 -04:00
Pavel Emelyanov	6bc9638ab4	sunrpc: Merge xs_create_sock code After xs_bind is merged it's easy to merge its callers. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [bfields@redhat.com: fix address family initialization] Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:15 -04:00
Pavel Emelyanov	beb59b6828	sunrpc: Merge the xs_bind code There's the only difference betseen the xs_bind4 and the xs_bind6 - the size of sockaddr structure they use. Fortunatelly its size can be indirectly get from the transport. Change since v1: * use sockaddr_storage instead of sockaddr * use rpc_set_port instead of manual port assigning Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [bfields@redhat.com: fix address family initialization] Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:15 -04:00
Pavel Emelyanov	573018c07e	sunrpc: Call xs_create_sockX directly from setup_socket Remove now unneeded wrappers that just add type and protocol to socket creation callback. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:15 -04:00
Pavel Emelyanov	22d44a7d8a	sunrpc: Factor out v6 sockets creation Same patch for v6 protocols. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:15 -04:00
Pavel Emelyanov	22f793268d	sunrpc: Factor out v4 sockets creation The UDPv4 and TCPv4 socket creation callbacks now look very similar. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:14 -04:00
Pavel Emelyanov	b65c031061	sunrpc: Factor out udp sockets creation Make it look like the TCP sockets creation. Unfortunately the git diff made the patch look messy :( Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:14 -04:00
Pavel Emelyanov	58dddac9c5	sunrpc: Remove duplicate xprt/transport arguments from calls The xs_tcp_reuse_connection takes the xprt only to pass it down to the xs_abort_connection. The later one can get it from the given transport itself. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:14 -04:00
Pavel Emelyanov	a9f5f0f7bf	sunrpc: Get xprt pointer once in xs_tcp_setup_socket Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:14 -04:00
Pavel Emelyanov	baaf4e487a	sunrpc: Remove unused sock arg from xs_next_srcport Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:14 -04:00
Pavel Emelyanov	5d4ec93297	sunrpc: Remove unused sock arg from xs_get_srcport Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-19 10:48:14 -04:00
Eric Dumazet	8723e1b4ad	inet: RCU changes in inetdev_by_index() Convert inetdev_by_index() to not increment in_dev refcount. Callers hold RCU or RTNL, and should not decrement in_dev refcount. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-19 03:50:48 -07:00
Eric Dumazet	9e917dca74	net: avoid a dev refcount in ip_mc_find_dev() We hold RTNL in ip_mc_find_dev(), no need to touch device refcount. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-19 03:50:47 -07:00
Arnd Bergmann	a6f8dbc654	sunrpc: remove the big kernel lock The sunrpc cache_ioctl function does not need the big kernel lock because it uses its own queue_lock already. rpc_pipe_ioctl apparently should be using i_lock like the other operations on the pipe file descriptor do. Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-19 11:29:59 +02:00
Hans Schillstrom	714f095f74	ipvs: IPv6 tunnel mode IPv6 encapsulation uses a bad source address for the tunnel. i.e. VIP will be used as local-addr and encap. dst addr. Decapsulation will not accept this. Example LVS (eth1 2003::2:0:1/96, VIP 2003::2:0:100) (eth0 2003::1:0:1/96) RS (ethX 2003::1:0:5/96) tcpdump 2003::2:0:100 > 2003::1:0:5: IP6 (hlim 63, next-header TCP (6) payload length: 40) 2003::3:0:10.50991 > 2003::2:0:100.http: Flags [S], cksum 0x7312 (correct), seq 3006460279, win 5760, options [mss 1440,sackOK,TS val 1904932 ecr 0,nop,wscale 3], length 0 In Linux IPv6 impl. you can't have a tunnel with an any cast address receiving packets (I have not tried to interpret RFC 2473) To have receive capabilities the tunnel must have: - Local address set as multicast addr or an unicast addr - Remote address set as an unicast addr. - Loop back addres or Link local address are not allowed. This causes us to setup a tunnel in the Real Server with the LVS as the remote address, here you can't use the VIP address since it's used inside the tunnel. Solution Use outgoing interface IPv6 address (match against the destination). i.e. use ip6_route_output() to look up the route cache and then use ipv6_dev_get_saddr(...) to set the source address of the encapsulated packet. Additionally, cache the results in new destination fields: dst_cookie and dst_saddr and properly check the returned dst from ip6_route_output. We now add xfrm_lookup call only for the tunneling method where the source address is a local one. Signed-off-by:Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-19 10:38:48 +02:00
Pablo Neira Ayuso	ebbf41df4a	netfilter: ctnetlink: add expectation deletion events This patch allows to listen to events that inform about expectations destroyed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-19 10:19:06 +02:00
Tom Tucker	4a84386fc2	svcrdma: Cleanup DMA unmapping in error paths. There are several error paths in the code that do not unmap DMA. This patch adds calls to svc_rdma_unmap_dma to free these DMA contexts. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-18 19:51:32 -04:00
Tom Tucker	b432e6b3d9	svcrdma: Change DMA mapping logic to avoid the page_address kernel API There was logic in the send path that assumed that a page containing data to send to the client has a KVA. This is not always the case and can result in data corruption when page_address returns zero and we end up DMA mapping zero. This patch changes the bus mapping logic to avoid page_address() where necessary and converts all calls from ib_dma_map_single to ib_dma_map_page in order to keep the map/unmap calls symmetric. Signed-off-by: Tom Tucker <tom@ogc.us> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-18 19:51:31 -04:00
Venkatesh Pallipadi	75e1056f5c	sched: Fix softirq time accounting Peter Zijlstra found a bug in the way softirq time is accounted in VIRT_CPU_ACCOUNTING on this thread: http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html The problem is, softirq processing uses local_bh_disable internally. There is no way, later in the flow, to differentiate between whether softirq is being processed or is it just that bh has been disabled. So, a hardirq when bh is disabled results in time being wrongly accounted as softirq. Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING as well. As account_system_time() in normal tick based accouting also uses softirq_count, which will be set even when not in softirq with bh disabled. Peter also suggested solution of using 2*SOFTIRQ_OFFSET as irq count for local_bh_{disable,enable} and using just SOFTIRQ_OFFSET while softirq processing. The patch below does that and adds API in_serving_softirq() which returns whether we are currently processing softirq or not. Also changes one of the usages of softirq_count in net/sched/cls_cgroup.c to in_serving_softirq. Looks like many usages of in_softirq really want in_serving_softirq. Those changes can be made individually on a case by case basis. Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-2-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-18 20:52:20 +02:00
Neil Horman	990c3d6f9c	bonding: Fix napi poll for bonding driver Usually the netpoll path, when preforming a napi poll can get away with just polling all the napi instances of the configured device. Thats not the case for the bonding driver however, as the napi instances which may wind up getting flagged as needing polling after the poll_controller call don't belong to the bonded device, but rather to the slave devices. Fix this by checking the device in question for the IFF_MASTER flag, if set, we know we need to check the full poll list for this cpu, rather than just the devices napi instance list. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-18 08:32:08 -07:00
Neil Horman	c2355e1ab9	bonding: Fix bonding drivers improper modification of netpoll structure The bonding driver currently modifies the netpoll structure in its xmit path while sending frames from netpoll. This is racy, as other cpus can access the netpoll structure in parallel. Since the bonding driver points np->dev to a slave device, other cpus can inadvertently attempt to send data directly to slave devices, leading to improper locking with the bonding master, lost frames, and deadlocks. This patch fixes that up. This patch also removes the real_dev pointer from the netpoll structure as that data is really only used by bonding in the poll_controller, and we can emulate its behavior by check each slave for IS_UP. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-18 08:32:07 -07:00
Andy Walls	27a954bd56	IPv4: route.c: Change checks against 0xffffffff to ipv4_is_lbcast() Change a few checks against the hardcoded broadcast address, 0xffffffff, to ipv4_is_lbcast(). Remove some existing checks using ipv4_is_lbcast() that are now obviously superfluous. Signed-off-by: Andy Walls <awalls@md.metrocast.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-18 07:22:50 -07:00
Randy Dunlap	76b6717bc6	netfilter: fix kconfig unmet dependency warning Fix netfilter kconfig unmet dependencies warning & spell out "compatible" while there. warning: (IP_NF_TARGET_TTL && NET && INET && NETFILTER && IP_NF_IPTABLES && NETFILTER_ADVANCED \|\| IP6_NF_TARGET_HL && NET && INET && IPV6 && NETFILTER && IP6_NF_IPTABLES && NETFILTER_ADVANCED) selects NETFILTER_XT_TARGET_HL which has unmet direct dependencies ((IP_NF_MANGLE \|\| IP6_NF_MANGLE) && NETFILTER_ADVANCED) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-18 11:13:30 +02:00
Justin P. Mattock	631dd1a885	Update broken web addresses in the kernel. The patch below updates broken web addresses in the kernel Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Finn Thain <fthain@telegraphics.com.au> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Matt Turner <mattst88@gmail.com> Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com> Cc: Mike Frysinger <vapier.adi@gmail.com> Acked-by: Ben Pfaff <blp@cs.stanford.edu> Acked-by: Hans J. Koch <hjk@linutronix.de> Reviewed-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-10-18 11:03:14 +02:00
Allan Stephens	ccc901ee58	tipc: Simplify bearer shutdown logic Optimize processing in TIPC's bearer shutdown code, including: 1. Remove an unnecessary check to see if TIPC bearer's can exist. 2. Don't release spinlocks before calling a media-specific disabling routine, since the routine can't sleep. 3. Make bearer_disable() operate directly on a struct bearer, instead of needlessly taking a name and then mapping that to the struct. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-18 01:50:49 -07:00
David S. Miller	724829b3ad	tipc: Kill tipc_get_mode() completely. It's completely unused and exporting a static symbol makes no sense and breaks the build. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-18 01:06:20 -07:00
Nathan Holstein	d793fe8caa	Bluetooth: fix oops in l2cap_connect_req In error cases when the ACL is insecure or we fail to allocate a new struct sock, we jump to the "response" label. If so, "sk" will be null and the kernel crashes. Signed-off-by: Nathan Holstein <nathan.holstein@gmail.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-17 21:19:19 -02:00
Eric Dumazet	19f572565e	fib_hash: RCU conversion phase 2 Get rid of fib_hash_lock rwlock. The fn_zone hash table resize is the noticeable part of this patch. I added a seqlock per fn_zone, so that readers can restart their lookup in the (very rare) case a writer expanded the hash table. Add rcu heads in fib_alias and fib_node, use call_rcu() to defer their freeing, and use appropriate _rcu list manipulations. Stress test (160.000.000 udp frames sent, IP route cache disabled to mimic DDOS attack, FIB_HASH) Before: real 0m41.191s user 0m13.137s sys 8m55.241s After: real 0m38.091s user 0m13.189s sys 7m53.018s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-17 13:53:16 -07:00
Eric Dumazet	117a8cdea3	fib_hash: RCU conversion phase 1 First step for RCU conversion of fib_hash : struct fn_zone are created and never deleted. Very classic conversion, using rcu_assign_pointer(), rcu_dereference() and rtnl_dereference() verbs. __rcu markers on fz_next and fn_zone_list They are created under RTNL, we dont need fib_hash_lock anymore in fn_new_zone(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-17 13:53:15 -07:00
Eric Dumazet	9bef83edfb	fib_hash: embed initial hash table in fn_zone While looking for false sharing problems, I noticed sizeof(struct fn_zone) was small (28 bytes) and possibly sharing a cache line with an often written kernel structure. Most of the time, fn_zone uses its initial hash table of 16 slots. We can avoid the false sharing problem by embedding this initial hash table in fn_zone itself, so that sizeof(fn_zone) > L1_CACHE_BYTES We did a similar optimization in commit `a6501e080c` (Reduce memory needs and speedup lookups) Add a fz_revorder field to speedup fn_hash() a bit. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-17 13:53:15 -07:00
Ilpo Järvinen	c60ce4e265	tcp: use correct counters in CA_CWR state too As CWR is stronger than CA_Disorder state, we can miscount SACK/Reno failure into other timeouts. Not a bad problem as it can happen only due to ECN, FRTO detecting spurious RTO or xmit error which are the only callers of tcp_enter_cwr. And even then losses and RTO must still follow thereafter to actually end up into the relevant code paths. Compile tested. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-17 13:46:33 -07:00
Ilpo Järvinen	1fdb936101	tcp: sack lost marking fixes When only fast rexmit should be done, tcp_mark_head_lost marks L too far. Also, sacked_upto below 1 is perfectly valid number, the packets == 0 then needs to be trapped elsewhere. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-17 13:46:33 -07:00
stephen hemminger	31e3c3f6f1	tipc: cleanup function namespace Do some cleanups of TIPC based on make namespacecheck 1. Don't export unused symbols 2. Eliminate dead code 3. Make functions and variables local 4. Rename buf_acquire to tipc_buf_acquire since it is used in several files Compile tested only. This make break out of tree kernel modules that depend on TIPC routines. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-16 11:13:24 -07:00
Eric Dumazet	10da66f755	fib: avoid false sharing on fib_table_hash While doing profile analysis, I found fib_hash_table was sometime in a cache line shared by a possibly often written kernel structure. (CONFIG_IP_ROUTE_MULTIPATH \|\| !CONFIG_IPV6_MULTIPLE_TABLES) It's hard to detect because not easily reproductible. Make sure we allocate a full cache line to keep this shared in all cpus caches. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-16 11:13:23 -07:00
Eric Dumazet	874ffa8f72	fib_trie: use fls() instead of open coded loop fib_table_lookup() might use fls() to speedup an open coded loop. Noticed while doing a profile analysis. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-16 11:13:22 -07:00
Eric Dumazet	a0a4a85a15	fib: remove a useless synchronize_rcu() call fib_nl_delrule() calls synchronize_rcu() for no apparent reason, while rtnl is held. I suspect it was done to avoid an atomic_inc_not_zero() in fib_rules_lookup(), which commit `7fa7cb7109` added anyway. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-16 11:13:22 -07:00
Eric Dumazet	2c1c00040a	fib6: use FIB_LOOKUP_NOREF in fib6_rule_lookup() Avoid two atomic ops on found rule in fib6_rule_lookup() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-16 11:13:21 -07:00
Eric Dumazet	564824b0c5	net: allocate skbs on local node commit `b30973f877` (node-aware skb allocation) spread a wrong habit of allocating net drivers skbs on a given memory node : The one closest to the NIC hardware. This is wrong because as soon as we try to scale network stack, we need to use many cpus to handle traffic and hit slub/slab management on cross-node allocations/frees when these cpus have to alloc/free skbs bound to a central node. skb allocated in RX path are ephemeral, they have a very short lifetime : Extra cost to maintain NUMA affinity is too expensive. What appeared as a nice idea four years ago is in fact a bad one. In 2010, NIC hardwares are multiqueue, or we use RPS to spread the load, and two 10Gb NIC might deliver more than 28 million packets per second, needing all the available cpus. Cost of cross-node handling in network and vm stacks outperforms the small benefit hardware had when doing its DMA transfert in its 'local' memory node at RX time. Even trying to differentiate the two allocations done for one skb (the sk_buff on local node, the data part on NIC hardware node) is not enough to bring good performance. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-16 11:13:19 -07:00
John W. Linville	c64557d666	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem	2010-10-15 16:11:56 -04:00
John W. Linville	1a63c353c8	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-next-2.6 into for-davem	2010-10-15 16:00:02 -04:00
Johannes Berg	9ebad4ab87	radiotap: fix vendor namespace parsing There's a bug with radiotap vendor namespace parsing if you don't register for the given namespace extensions. Fix this by passing only the unknown vendor namespaces and the registered data to frontends, but not both. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-15 15:57:34 -04:00
Linus Torvalds	799c10559d	De-pessimize rds_page_copy_user Don't try to "optimize" rds_page_copy_user() by using kmap_atomic() and the unsafe atomic user mode accessor functions. It's actually slower than the straightforward code on any reasonable modern CPU. Back when the code was written (although probably not by the time it was actually merged, though), 32-bit x86 may have been the dominant architecture. And there kmap_atomic() can be a lot faster than kmap() (unless you have very good locality, in which case the virtual address caching by kmap() can overcome all the downsides). But these days, x86-64 may not be more populous, but it's getting there (and if you care about performance, it's definitely already there - you'd have upgraded your CPU's already in the last few years). And on x86-64, the non-kmap_atomic() version is faster, simply because the code is simpler and doesn't have the "re-try page fault" case. People with old hardware are not likely to care about RDS anyway, and the optimization for the 32-bit case is simply buggy, since it doesn't verify the user addresses properly. Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-15 11:09:28 -07:00
Arnd Bergmann	6038f373a3	llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode i, struct file f) { <+... ( nonseekable_open(...) \| nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable / }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, / open uses nonseekable / }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, / we have seq_read / }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, / readdir is present / }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, / read accesses f_pos / }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, / write accesses f_pos / }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, / read and write both use no f_pos / }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, / write uses no f_pos / }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, / read uses no f_pos / }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, / no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-15 15:53:27 +02:00
Kumar Sanghvi	b3d6255388	Phonet: 'connect' socket implementation for Pipe controller Based on suggestion by Rémi Denis-Courmont to implement 'connect' for Pipe controller logic, this patch implements 'connect' socket call for the Pipe controller logic. The patch does following:- - Removes setsockopts for PNPIPE_CREATE and PNPIPE_DESTROY - Adds setsockopt for setting the Pipe handle value - Implements connect socket call - Updates the Pipe controller logic User-space should now follow below sequence with Pipe controller:- -socket -bind -setsockopt for PNPIPE_PIPE_HANDLE -connect -setsockopt for PNPIPE_ENCAP_IP -setsockopt for PNPIPE_ENABLE GPRS/3G data has been tested working fine with this. Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com> Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-13 14:40:34 -07:00
Paul Gortmaker	7368ddf144	tipc: clean out all instances of #if 0'd unused code Remove all instances of legacy, or as yet to be implemented code that is currently living within an #if 0 ... #endif block. In the rare instance that some of it be needed in the future, it can still be dragged out of history, but there is no need for it to sit in mainline. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-13 14:27:34 -07:00
Johannes Berg	e4b55957eb	mac80211: fix SMPS request It looks like I submitted a different patch than I tested, because clearly the code in mac80211 is missing actually propagating the requested SMPS mode. Fix that! Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-13 15:45:23 -04:00
Johannes Berg	7be5086d4c	mac80211: add probe request filter flag Using the frame registration notification, we can see when probe requests are requested and notify the low-level driver via filtering. The flag is also set in AP and IBSS modes. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-13 15:45:22 -04:00
Johannes Berg	271733cf84	cfg80211: notify drivers about frame registrations Drivers may need to adjust their filters according to frame registrations, so notify them about them. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-13 15:45:22 -04:00
Simon Horman	a91fd267e3	IPVS: ip_vs_dbg_callid() is only needed for debugging ip_vs_dbg_callid() and IP_VS_DEBUG_CALLID() are only needed it CONFIG_IP_VS_DEBUG is defined. This resolves the following build warning when CONFIG_IP_VS_DEBUG is not defined. net/netfilter/ipvs/ip_vs_pe_sip.c:11: warning: 'ip_vs_dbg_callid' defined but not used Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-13 21:22:35 +02:00
Jan Engelhardt	243bf6e29e	netfilter: xtables: resolve indirect macros 3/3	2010-10-13 18:00:46 +02:00
Jan Engelhardt	87a2e70db6	netfilter: xtables: resolve indirect macros 2/3 Signed-off-by: Jan Engelhardt <jengelh@medozas.de>	2010-10-13 18:00:41 +02:00
Jan Engelhardt	12b00c2c02	netfilter: xtables: resolve indirect macros 1/3 Many of the used macros are just there for userspace compatibility. Substitute the in-kernel code to directly use the terminal macro and stuff the defines into #ifndef __KERNEL__ sections. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>	2010-10-13 18:00:36 +02:00
Ben Greear	cfd8e12f42	wireless: Print wiphy name in sysfs. The index cannot be used to reliably reconstruct a phy name, so explicitly add the phy name to sysfs so that scripts can figure out the parent phy device for a particular wireless interface. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-12 16:05:29 -04:00
Eric Dumazet	29b4433d99	net: percpu net_device refcount We tried very hard to remove all possible dev_hold()/dev_put() pairs in network stack, using RCU conversions. There is still an unavoidable device refcount change for every dst we create/destroy, and this can slow down some workloads (routers or some app servers, mmap af_packet) We can switch to a percpu refcount implementation, now dynamic per_cpu infrastructure is mature. On a 64 cpus machine, this consumes 256 bytes per device. On x86, dev_hold(dev) code : before lock incl 0x280(%ebx) after: movl 0x260(%ebx),%eax incl fs:(%eax) Stress bench : (Sending 160.000.000 UDP frames, IP route cache disabled, dual E5540 @2.53GHz, 32bit kernel, FIB_TRIE) Before: real 1m1.662s user 0m14.373s sys 12m55.960s After: real 0m51.179s user 0m15.329s sys 10m15.942s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-12 12:35:25 -07:00
David S. Miller	8fa6e3d454	Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6	2010-10-12 11:43:42 -07:00
Andrei Emeltchenko	534c92fde7	Bluetooth: clean up rfcomm code Remove dead code and unused rfcomm thread events Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:53 -03:00
Haijun Liu	ab3e571564	Bluetooth: Update conf_state before send config_req out Update conf_state with L2CAP_CONF_REQ_SENT before send config_req out in l2cap_config_req(). Signed-off-by: Haijun Liu <haijun.liu@atheros.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:53 -03:00
Gustavo F. Padovan	0175d629e0	Bluetooth: Use the proper error value from bt_skb_send_alloc() &err points to the proper error set by bt_skb_send_alloc() when it fails. Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:52 -03:00
Gustavo F. Padovan	d6b2eb2f89	Bluetooth: make batostr() print in the right order The Bluetooth core uses the the BD_ADDR in the opposite order from the human readable order. So we are changing batostr() to print in the correct order and then removing some baswap(), as they are not needed anymore. Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:52 -03:00
Gustavo F. Padovan	cb810a189d	Bluetooth: remove unused variable from cmtp A value was attributed to 'src', but no one was using. Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:52 -03:00
Andrei Emeltchenko	aae7fe22a8	Bluetooth: check for l2cap header in start fragment BLUETOOTH SPECIFICATION Version 4.0 [Vol 3] page 36 mentioned "Note: Start Fragments always begin with the Basic L2CAP header of a PDU." Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:52 -03:00
Andrei Emeltchenko	8979481328	Bluetooth: check L2CAP length in first ACL fragment Current Bluetooth code assembles fragments of big L2CAP packets in l2cap_recv_acldata and then checks allowed L2CAP size in assemled L2CAP packet (pi->imtu < skb->len). The patch moves allowed L2CAP size check to the early stage when we receive the first fragment of L2CAP packet. We do not need to reserve and keep L2CAP fragments for bad packets. Updated version after comments from Mat Martineau <mathewm@codeaurora.org> and Gustavo Padovan <padovan@profusion.mobi>. Trace below is received when using stress tools sending big fragmented L2CAP packets. ... [ 1712.798492] swapper: page allocation failure. order:4, mode:0x4020 [ 1712.804809] [<c0031870>] (unwind_backtrace+0x0/0xdc) from [<c00a1f70>] (__alloc_pages_nodemask+0x4) [ 1712.814666] [<c00a1f70>] (__alloc_pages_nodemask+0x47c/0x4d4) from [<c00a1fd8>] (__get_free_pages+) [ 1712.824645] [<c00a1fd8>] (__get_free_pages+0x10/0x3c) from [<c026eb5c>] (__alloc_skb+0x4c/0xfc) [ 1712.833465] [<c026eb5c>] (__alloc_skb+0x4c/0xfc) from [<bf28c738>] (l2cap_recv_acldata+0xf0/0x1f8 ) [ 1712.843322] [<bf28c738>] (l2cap_recv_acldata+0xf0/0x1f8 [l2cap]) from [<bf0094ac>] (hci_rx_task+0x) ... Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:52 -03:00
Johan Hedberg	80e2c88803	Bluetooth: Don't clear the blacklist when closing the HCI device Clearing the blacklist in hci_dev_do_close() would mean that user space needs to do extra work to re-block devices after a DEVDOWN-DEVUP cycle. This patch removes the clearing of the blacklist in this case and thereby saves user space from the extra work. Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com> Acked-by: Ville Tervo <ville.tervo@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:52 -03:00
Andrei Emeltchenko	5017d8dde1	Bluetooth: remove extra newline from debug output Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com> Acked-by: Ville Tervo <ville.tervo@nokia.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:51 -03:00
Mat Martineau	6fdf482bb3	Bluetooth: Use a stream-oriented recvmsg with SOCK_STREAM L2CAP sockets. L2CAP ERTM sockets can be opened with the SOCK_STREAM socket type, which is a mandatory request for ERTM mode. However, these sockets still have SOCK_SEQPACKET read semantics when bt_sock_recvmsg() is used to pull data from the receive queue. If the application is only reading part of a frame, then the unread portion of the frame is discarded. If the application requests more bytes than are in the current frame, only the current frame's data is returned. This patch utilizes common code derived from RFCOMM's recvmsg() function to make L2CAP SOCK_STREAM reads behave like RFCOMM reads (and other SOCK_STREAM sockets in general). The application may read one byte at a time from the input stream and not lose any data, and may also read across L2CAP frame boundaries. Signed-off-by: Mat Martineau <mathewm@codeaurora.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:51 -03:00
Mat Martineau	3d7d01dffe	Bluetooth: Use common SOCK_STREAM receive code in RFCOMM To reduce code duplication, have rfcomm_sock_recvmsg() call bt_sock_stream_recvmsg(). The common bt_sock_stream_recvmsg() code is nearly identical, with the RFCOMM-specific functionality for deferred setup and connection unthrottling left in rfcomm_sock_recvmsg(). Signed-off-by: Mat Martineau <mathewm@codeaurora.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:51 -03:00
Mat Martineau	796c86eec8	Bluetooth: Add common code for stream-oriented recvmsg() This commit adds a bt_sock_stream_recvmsg() function for use by any Bluetooth code that uses SOCK_STREAM sockets. This code is copied from rfcomm_sock_recvmsg() with minimal modifications to remove RFCOMM-specific functionality and improve readability. L2CAP (with the SOCK_STREAM socket type) and RFCOMM have common needs when it comes to reading data. Proper stream read semantics require that applications can read from a stream one byte at a time and not lose any data. The RFCOMM code already operated on and pulled data from the underlying L2CAP socket, so very few changes were required to make the code more generic for use with non-RFCOMM data over L2CAP. Applications that need more awareness of L2CAP frame boundaries are still free to use SOCK_SEQPACKET sockets, and may verify that they connection did not fall back to basic mode by calling getsockopt(). Signed-off-by: Mat Martineau <mathewm@codeaurora.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:51 -03:00
Mat Martineau	0fba2558cb	Bluetooth: Validate PSM values in calls to connect() and bind() Valid L2CAP PSMs are odd numbers, and the least significant bit of the most significant byte must be 0. Signed-off-by: Mat Martineau <mathewm@codeaurora.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:51 -03:00
Yuri Kululin	08601469a5	Bluetooth: Fix RFCOMM RPN negotiation According to the ETSI 3GPP TS 07.10 the default bit rate value for RFCOMM is 9600 bit/s. Return this bit rate in case of RPN request and accept other sane bit rates proposed by the sender in RPM command. Signed-off-by: Yuri Kululin <ext-yuri.kululin@nokia.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:51 -03:00
David Vrabel	8f1e174223	Bluetooth: HCI devices are either BR/EDR or AMP radios HCI transport drivers may not know what type of radio an AMP device has so only say whether they're BR/EDR or AMP devices. Signed-off-by: David Vrabel <david.vrabel@csr.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-12 12:44:51 -03:00
Gerrit Renker	2f34b32977	dccp: cosmetics - warning format This omits the redundant "DCCP:" in warning messages, since DCCP_WARN() already echoes the function name, avoiding messages like kernel: [10988.766503] dccp_close: DCCP: ABORT -- 209 bytes unread Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-10-12 06:57:43 +02:00
Gerrit Renker	ecdfbdabbe	dccp: schedule an Ack when receiving timestamps This schedules an Ack when receiving a timestamp, exploiting the existing inet_csk_schedule_ack() function, saving one case in the `dccp_ack_pending()' function. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-10-12 06:57:43 +02:00
Ivo Calado	d196c9a5d4	dccp: generalise data-loss condition This patch generalises the task of determining data loss from RFC 4340, 7.7.1. Let S_A, S_B be sequence numbers such that S_B is "after" S_A, and let N_B be the NDP count of packet S_B. Then, using modulo-2^48 arithmetic, D = S_B - S_A - 1 is an upper bound of the number of lost data packets, D - N_B is an approximation of the number of lost data packets (there are cases where this is not exact). The patch implements this as dccp_loss_count(S_A, S_B, N_B) := max(S_B - S_A - 1 - N_B, 0) Signed-off-by: Ivo Calado <ivocalado@embedded.ufcg.edu.br> Signed-off-by: Erivaldo Xavier <desadoc@gmail.com> Signed-off-by: Leandro Sales <leandroal@gmail.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-10-12 06:57:42 +02:00
Gerrit Renker	baf9e782e1	dccp: remove unused argument in CCID tx function This removes the argument `more' from ccid_hc_tx_packet_sent, since it was nowhere used in the entire code. (Btw, this argument was not even used in the original KAME code where the function initially came from; compare the variable moreToSend in the freebsd61-dccp-kame-28.08.2006.patch kept by Emmanuel Lochin.) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-10-12 06:57:41 +02:00
Gerrit Renker	93344af44c	dccp: merge now-reduced connect_init() function After moving the assignment of GAR/ISS from dccp_connect_init() to dccp_transmit_skb(), the former function becomes very small, so that a merger with dccp_connect() suggests itself. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-10-12 06:57:40 +02:00
Gerrit Renker	0b53d4604a	dccp: fix the adjustments to AWL and SWL This fixes a problem and a potential loophole with regard to seqno/ackno validity: currently the initial adjustments to AWL/SWL are only performed once at the begin of the connection, during the handshake. Since the Sequence Window feature is always greater than Wmin=32 (7.5.2), it is however necessary to perform these adjustments at least for the first W/W' (variables as per 7.5.1) packets in the lifetime of a connection. This requirement is complicated by the fact that W/W' can change at any time during the lifetime of a connection. Therefore it is better to perform that safety check each time SWL/AWL are updated, as implemented by the patch. A second problem solved by this patch is that the remote/local Sequence Window feature values (which set the bounds for AWL/SWL/SWH) are undefined until the feature negotiation has completed. During the initial handshake we have more stringent sequence number protection; the changes added by this patch effect that {A,S}W{L,H} are within the correct bounds at the instant that feature negotiation completes (since the SeqWin feature activation handlers call dccp_update_gsr/gss()). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2010-10-12 06:57:40 +02:00
Pavel Emelyanov	70dc78da2c	sunrpc: Use helper to set v4 mapped addr in ip_map_parse Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-11 20:00:17 -04:00
NeilBrown	e33534d54f	sunrpc/cache: centralise handling of size limit on deferred list. We limit the number of 'defer' requests to DFR_MAX. The imposition of this limit is spread about a bit - sometime we don't add new things to the list, sometimes we remove old things. Also it is currently applied to requests which we are 'waiting' for rather than 'deferring'. This doesn't seem ideal as 'waiting' requests are naturally limited by the number of threads. So gather the DFR_MAX handling code to one place and only apply it to requests that are actually being deferred. This means that not all 'cache_deferred_req' structures go on the 'cache_defer_list, so we need to be careful when adding and removing things. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-11 19:30:28 -04:00
NeilBrown	d29068c431	sunrpc: Simplify cache_defer_req and related functions. The return value from cache_defer_req is somewhat confusing. Various different error codes are returned, but the single caller is only interested in success or failure. In fact it can measure this success or failure itself by checking CACHE_PENDING, which makes the point of the code more explicit. So change cache_defer_req to return 'void' and test CACHE_PENDING after it completes, to see if the request was actually deferred or not. Similarly setup_deferral and cache_wait_req don't need a return value, so make them void and remove some code. The call to cache_revisit_request (to guard against a race) is only needed for the second call to setup_deferral, so move it out of setup_deferral to after that second call. With the first call the race is handled differently (by explicitly calling 'wait_for_completion'). Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-11 19:30:27 -04:00
Eric Dumazet	fc66f95c68	net dst: use a percpu_counter to track entries struct dst_ops tracks number of allocated dst in an atomic_t field, subject to high cache line contention in stress workload. Switch to a percpu_counter, to reduce number of time we need to dirty a central location. Place it on a separate cache line to avoid dirtying read only fields. Stress test : (Sending 160.000.000 UDP frames, IP route cache disabled, dual E5540 @2.53GHz, 32bit kernel, FIB_TRIE, SLUB/NUMA) Before: real 0m51.179s user 0m15.329s sys 10m15.942s After: real 0m45.570s user 0m15.525s sys 9m56.669s With a small reordering of struct neighbour fields, subject of a following patch, (to separate refcnt from other read mostly fields) real 0m41.841s user 0m15.261s sys 8m45.949s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-11 13:06:53 -07:00
Eric Dumazet	0ed8ddf404	neigh: Protect neigh->ha[] with a seqlock Add a seqlock in struct neighbour to protect neigh->ha[], and avoid dirtying neighbour in stress situation (many different flows / dsts) Dirtying takes place because of read_lock(&n->lock) and n->used writes. Switching to a seqlock, and writing n->used only on jiffies changes permits less dirtying. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-11 12:54:04 -07:00
David S. Miller	d122179a3c	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/core/ethtool.c	2010-10-11 12:30:34 -07:00
Kees Cook	b00916b189	net: clear heap allocations for privileged ethtool actions Several other ethtool functions leave heap uncleared (potentially) by drivers. Some interfaces appear safe (eeprom, etc), in that the sizes are well controlled. In some situations (e.g. unchecked error conditions), the heap will remain unchanged in areas before copying back to userspace. Note that these are less of an issue since these all require CAP_NET_ADMIN. Cc: stable@kernel.org Signed-off-by: Kees Cook <kees.cook@canonical.com> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-11 12:23:25 -07:00
Björn Smedman	15d46f38df	mac80211: minstrel_ht A-MPDU fix This patch fixes two problems with the minstrel_ht rate control algorithms handling of A-MPDU frames: 1. The ampdu_len field of the tx status is not always initialized for non-HT frames (and it would probably be unreasonable to require all drivers to do so). This could cause rate control statistics to be corrupted. We now trust the ampdu_len and ampdu_ack_len fields only when the frame is marked with the IEEE80211_TX_STAT_AMPDU flag. 2. Successful transmission attempts where only recognized when the A-MPDU subframe carrying the rate control status information was marked with the IEEE80211_TX_STAT_ACK flag. If this information happed to be carried on a frame that failed to be ACKed then the other subframes (which may have succeeded) where not correctly registered. We now update rate control statistics regardless of whether the subframe carrying the information was ACKed or not. Cc: <stable@kernel.org> Signed-off-by: Björn Smedman <bjorn.smedman@venatech.se> Acked-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-11 15:04:24 -04:00
Johannes Berg	730bd83b03	mac80211: don't kmalloc 16 bytes Since this small buffer isn't used for DMA, we can simply allocate it on the stack, it just needs to be 16 bytes of which only 8 will be used for WEP40 keys. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-11 15:04:23 -04:00
Felix Fietkau	8610c29a2c	cfg80211: add channel utilization stats to the survey command Using these, user space can calculate a relative channel utilization with arbitrary intervals by regularly taking snapshots of the survey results. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-11 15:04:20 -04:00
Christian Lamparter	15943a72c7	mac80211: temporarily disable reorder release timer Several serve threading problems in the current release reorder timer implementation have been discovered. A lengthy discussion - which lists some of the pitfalls and possible solutions - can be found at: http://marc.info/?t=128635927000001 But due to the complicated nature of the subject and the imminent advent of a new -rc cycle, it was decided to disable the feature for the time being. Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-11 15:04:20 -04:00
Christian Lamparter	d12c74528e	mac80211: fix possible null-pointer de-reference This patch not only fixes a null-pointer de-reference that would be triggered by a PLINK_OPEN frame with mis- matching/incompatible mesh configuration, but also responds correctly to non-compatible PLINK_OPEN frames by generating a PLINK_CLOSE with the right reason code. The original bug was detected by smatch. ( http://repo.or.cz/w/smatch.git ) net/mac80211/mesh_plink.c +574 mesh_rx_plink_frame(168) error: we previously assumed 'sta' could be null. Cc: <stable@kernel.org> Reviewed-and-Tested-by: Steve deRosier <steve@cozybit.com> Reviewed-and-Tested-by: Javier Cardona <javier@cozybit.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-11 15:04:20 -04:00
Ben Greear	5a5c731aa5	wireless: Set some stats used by /proc/net/wireless (wext) Some stats for /proc/net/wireless (and wext in general) are not being set. This patch addresses a few of those with values easily obtained from mac80211 core. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-11 15:04:19 -04:00
Ben Greear	b38afa8769	mac80211: Improve mlme probe response log messages. Old messages didn't mention the device in question. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-11 15:04:19 -04:00
Johannes Berg	7623225f90	Revert "wireless: Use first phyX name available when registering phy devices." This reverts commit `5a254ffe3f`. The commit failed to take into account that allocated wireless devices (wiphys) are not added into the device list upon allocation, but only when they are registered. Therefore, it opened up a race between allocating and registering a name, so that if two processes allocate and register concurrently ("alloc, alloc, register, register" rather than "alloc, register, alloc, register") the code will attempt to use the same name twice. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2010-10-11 14:46:52 -04:00
Jiri Slaby	5518b29f22	ATM: mpc, fix use after free Stanse found that mpc_push frees skb and then it dereferences it. It is a typo, new_skb should be dereferenced there. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-11 11:05:42 -07:00
Eric Dumazet	34d101dd62	neigh: speedup neigh_hh_init() When a new dst is used to send a frame, neigh_resolve_output() tries to associate an struct hh_cache to this dst, calling neigh_hh_init() with the neigh rwlock write locked. Most of the time, hh_cache is already known and linked into neighbour, so we find it and increment its refcount. This patch changes the logic so that we call neigh_hh_init() with neighbour lock read locked only, so that fast path can be run in parallel by concurrent cpus. This brings part of the speedup we got with commit `c7d4426a98` (introduce DST_NOCACHE flag) for non cached dsts, even for cached ones, removing one of the contention point that routers hit on multiqueue enabled machines. Further improvements would need to use a seqlock instead of an rwlock to protect neigh->ha[], to not dirty neigh too often and remove two atomic ops. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-11 09:16:57 -07:00
Samuel Ortiz	37f9fc452d	irda: Fix heap memory corruption in iriap.c While parsing the GetValuebyClass command frame, we could potentially write passed the skb->data pointer. Cc: stable@kernel.org Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org>	2010-10-11 02:12:26 +02:00
Samuel Ortiz	efc463eb50	irda: Fix parameter extraction stack overflow Cc: stable@kernel.org Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Samuel Ortiz <samuel@sortiz.org>	2010-10-11 02:12:17 +02:00
Samuel Ortiz	f8cba16cad	irda: Remove BKL instances from irnet The code intends to lock the irnet_socket, so adding a mutex to it allows for a complet BKL removal. Signed-off-by: Samuel Ortiz <samuel@sortiz.org>	2010-10-11 02:11:34 +02:00
Samuel Ortiz	5b40964ead	irda: Remove BKL instances from af_irda.c Most of the times, lock_kernel() was pointless or could simply be replaced by lock_sock(). Signed-off-by: Samuel Ortiz <samuel@sortiz.org>	2010-10-11 02:11:23 +02:00
Changli Gao	e18434c457	net_sched: use __TCA_HTB_MAX and TCA_HTB_MAX Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-09 09:22:53 -07:00
Tom Herbert	4315d834c1	net: Fix rxq ref counting The rx->count reference is used to track reference counts to the number of rx-queue kobjects created for the device. This patch eliminates initialization of the counter in netif_alloc_rx_queues and instead increments the counter each time a kobject is created. This is now symmetric with the decrement that is done when an object is released. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-08 14:34:32 -07:00
Rémi Denis-Courmont	a131d82266	Phonet: mark the pipe controller as EXPERIMENTAL There are a bunch of issues that need to be fixed, including: - GFP_KERNEL allocations from atomic context (and GFP_ATOMIC in process context), - abuse of the setsockopt() call convention, - unprotected/unlocked static variables... IMHO, we will need to alter the userspace ABI when we fix it. So mark the configuration option as EXPERIMENTAL for the time being (or should it be BROKEN instead?). Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-08 14:09:10 -07:00
Rémi Denis-Courmont	03789f2672	Phonet: cleanup pipe enable socket option The current code works like this: int garbage, status; socklen_t len = sizeof(status); /* enable pipe / setsockopt(fd, SOL_PNPIPE, PNPIPE_ENABLE, &garbage, sizeof(garbage)); / disable pipe / setsockopt(fd, SOL_PNPIPE, PNPIPE_DISABLE, &garbage, sizeof(garbage)); / get status / getsockopt(fd, SOL_PNPIPE, PNPIPE_INQ, &status, &len); ...which does not follow the usual socket option pattern. This patch merges all three "options" into a single gettable&settable option, before Linux 2.6.37 gets out: int status; socklen_t len = sizeof(status); / enable pipe / status = 1; setsockopt(fd, SOL_PNPIPE, PNPIPE_ENABLE, &status, sizeof(status)); / disable pipe / status = 0; setsockopt(fd, SOL_PNPIPE, PNPIPE_ENABLE, &status, sizeof(status)); / get status */ getsockopt(fd, SOL_PNPIPE, PNPIPE_ENABLE, &status, &len); This also fixes the error code from EFAULT to ENOTCONN. Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Cc: Kumar Sanghvi <kumar.sanghvi@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-08 14:09:10 -07:00
Rémi Denis-Courmont	6d8e74ed37	Phonet: advise against enabling the pipe controller As it currently is, the new code path is not compatible with existing Nokia modems. This would break existing userspace for Nokia modem, such as the existing oFono ISI driver. Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-08 14:09:09 -07:00
David S. Miller	9cf8d1a3b8	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2010-10-08 13:51:11 -07:00
John W. Linville	e9a68707d7	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wireless/ipw2x00/ipw2200.c	2010-10-08 15:39:28 -04:00
Dimitris Michailidis	8391d07b80	ipv4: Remove leftover rcu_read_unlock calls from __mkroute_output() Commit "fib: RCU conversion of fib_lookup()" removed rcu_read_lock() from __mkroute_output but left a couple of calls to rcu_read_unlock() in there. This causes lockdep to complain that the rcu_read_unlock() call in __ip_route_output_key causes a lock inbalance and quickly crashes the kernel. The below fixes this for me. Signed-off-by: Dimitris Michailidis <dm@chelsio.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-08 10:51:08 -07:00
Kees Cook	ae6df5f96a	net: clear heap allocation for ETHTOOL_GRXCLSRLALL Calling ETHTOOL_GRXCLSRLALL with a large rule_cnt will allocate kernel heap without clearing it. For the one driver (niu) that implements it, it will leave the unused portion of heap unchanged and copy the full contents back to userspace. Signed-off-by: Kees Cook <kees.cook@canonical.com> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-08 10:48:28 -07:00
David S. Miller	94b105723a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-10-08 10:36:51 -07:00
Ben Hutchings	4e7f79511e	net: Update kernel-doc for netif_set_real_num_rx_queues() Synchronise the comment with the preceding implementation change. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-08 10:33:39 -07:00
Ingo Molnar	7cd2541cf2	Merge commit 'v2.6.36-rc7' into perf/core Conflicts: arch/x86/kernel/module.c Merge reason: Resolve the conflict, pick up fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 10:46:27 +02:00
Johannes Berg	388ac775be	cfg80211: constify WDS address There's no need for the WDS peer address to not be const, so make it const. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-07 14:41:28 -04:00
Johannes Berg	43b19952de	nl80211: use new genl helpers for WDS Bill Jordan's patch to allow setting the WDS peer crossed with my patch removing all the boilerplate code in nl80211, and consequently he didn't make use of it yet. Fix that. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-07 14:41:27 -04:00
Johannes Berg	7b99a7c2da	mac80211: fix sw scan locking The recent scan overhaul broke locking because now we can jump to code that attempts to unlock, while we don't have the mutex held. Fix this by holding the mutex around all the relevant code. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-07 14:41:27 -04:00
John W. Linville	7573eac762	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-10-07 14:39:03 -04:00
Paul E. McKenney	1144182a87	net: suppress RCU lockdep false positive in sock_update_classid > =================================================== > [ INFO: suspicious rcu_dereference_check() usage. ] > --------------------------------------------------- > include/linux/cgroup.h:542 invoked rcu_dereference_check() without protection! > > other info that might help us debug this: > > > rcu_scheduler_active = 1, debug_locks = 0 > 1 lock held by swapper/1: > #0: (net_mutex){+.+.+.}, at: [<ffffffff813e9010>] > register_pernet_subsys+0x1f/0x47 > > stack backtrace: > Pid: 1, comm: swapper Not tainted 2.6.35.4-28.fc14.x86_64 #1 > Call Trace: > [<ffffffff8107bd3a>] lockdep_rcu_dereference+0xaa/0xb3 > [<ffffffff813e04b9>] sock_update_classid+0x7c/0xa2 > [<ffffffff813e054a>] sk_alloc+0x6b/0x77 > [<ffffffff8140b281>] __netlink_create+0x37/0xab > [<ffffffff813f941c>] ? rtnetlink_rcv+0x0/0x2d > [<ffffffff8140cee1>] netlink_kernel_create+0x74/0x19d > [<ffffffff8149c3ca>] ? __mutex_lock_common+0x339/0x35b > [<ffffffff813f7e9c>] rtnetlink_net_init+0x2e/0x48 > [<ffffffff813e8d7a>] ops_init+0xe9/0xff > [<ffffffff813e8f0d>] register_pernet_operations+0xab/0x130 > [<ffffffff813e901f>] register_pernet_subsys+0x2e/0x47 > [<ffffffff81db7bca>] rtnetlink_init+0x53/0x102 > [<ffffffff81db835c>] netlink_proto_init+0x126/0x143 > [<ffffffff81db8236>] ? netlink_proto_init+0x0/0x143 > [<ffffffff810021b8>] do_one_initcall+0x72/0x186 > [<ffffffff81d78ebc>] kernel_init+0x23b/0x2c9 > [<ffffffff8100aae4>] kernel_thread_helper+0x4/0x10 > [<ffffffff8149e2d0>] ? restore_args+0x0/0x30 > [<ffffffff81d78c81>] ? kernel_init+0x0/0x2c9 > [<ffffffff8100aae0>] ? kernel_thread_helper+0x0/0x10 The sock_update_classid() function calls task_cls_classid(current), but the calling task cannot go away, so there is no danger of the associated structures disappearing. Insert an RCU read-side critical section to suppress the false positive. Reported-by: Subrata Modak <subrata@linux.vnet.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2010-10-07 10:02:28 -07:00
John W. Linville	4efe7f51be	Revert "mac80211: use netif_receive_skb in ieee80211_tx_status callpath" This reverts commit `5ed3bc7288`. It turns-out that not all drivers are calling ieee80211_tx_status from a compatible context. Revert this for now and try again later... Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-07 11:35:40 -04:00
David S. Miller	fb3dbece26	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6	2010-10-07 00:59:39 -07:00
Ingo Molnar	556ef63255	Merge commit 'v2.6.36-rc7' into core/rcu Merge reason: Update from -rc3 to -rc7. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-07 09:43:45 +02:00
Ingo Molnar	d4f8f217b8	Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/rcu	2010-10-07 09:43:11 +02:00
John Fastabend	3d3211ef5c	net: netif_set_real_num_rx_queues may cap num_rx_queues at init time Do not set num_rx_queues in netif_set_real_num_rx_queues() some drivers will increase the real_num_rx_queues later due to a feature changes or available interrupts increasing. By setting num_rx_queues here this ends up creating a cap on the number of rx queues available. For example the ixgbe driver sets the max number of queues it intends to use ever then sets the current number in use with the netif_set_num_{rx\|tx}_queues calls. With the current implementation the number of rx queues gets limited so when a feature such as DCB or FCoE is enabled the queues are no longer available. kobjects will only be allocated for real_num_rx_queues so the waste in memory is minimal. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-06 23:35:15 -07:00
stephen hemminger	1f4f0f645c	dccp: Kill dead code and add static markers. Remove dead code and make some functions static. Compile tested only. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-06 23:12:07 -07:00
John Heffner	9c6d5e5537	TCP: Fix setting of snd_ssthresh in tcp_mtu_probe_success This looks like a simple typo that has gone unnoticed for some time. The impact is relatively low but it's clearly wrong. Signed-off-by: John Heffner <johnwheffner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-06 21:18:02 -07:00
David S. Miller	69259abb64	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/pcmcia/pcnet_cs.c net/caif/caif_socket.c	2010-10-06 19:39:31 -07:00
David S. Miller	12e94471b2	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6	2010-10-06 19:11:17 -07:00
Eric Dumazet	767e97e1e0	neigh: RCU conversion of struct neighbour This is the second step for neighbour RCU conversion. (first was commit `d6bf7817` : RCU conversion of neigh hash table) neigh_lookup() becomes lockless, but still take a reference on found neighbour. (no more read_lock()/read_unlock() on tbl->lock) struct neighbour gets an additional rcu_head field and is freed after an RCU grace period. Future work would need to eventually not take a reference on neighbour for temporary dst (DST_NOCACHE), but this would need dst->_neighbour to use a noref bit like we did for skb->_dst. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-06 18:01:33 -07:00
John W. Linville	494486f8fd	mac80211: avoid uninitialized var warning in ieee80211_scan_cancel net/mac80211/scan.c: In function ‘ieee80211_scan_cancel’: net/mac80211/scan.c:794: warning: ‘finish’ may be used uninitialized in this function Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-06 16:40:40 -04:00
Johannes Berg	3207390a8b	cfg80211: fix BSS double-unlinking When multiple interfaces are actively trying to associate with the same BSS, they may both find that the BSS isn't there and then try to unlink it. This can cause errors since the unlinking code can't currently deal with items that have already been unlinked. Normally this doesn't happen as most people don't try to use multiple station interfaces that associate at the same time too. Fix this by using the list entry as a flag to see if the item is still on a list. Cc: stable@kernel.org Reported-by: Ben Greear <greearb@candelatech.com> Tested-by: Hun-Kyi Wynn <hkwynn@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-06 16:30:43 -04:00
Bruno Randolf	b206b4ef06	nl80211/mac80211: Add retry and failed transmission count to station info This information is already available in mac80211, we just need to export it via cfg80211 and nl80211. Signed-off-by: Bruno Randolf <br1@einfach.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-06 16:30:43 -04:00
Stanislaw Gruszka	3aed49ef17	mac80211: compete scan to cfg80211 if deferred scan fail to start We nulify local->scan_req on failure in __ieee80211_start_scan, so __ieee80211_scan_completed will not call cfg80211_scan_done. Fix that. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-06 16:30:42 -04:00
Stanislaw Gruszka	6eb11a9a31	mac80211: do not requeue scan work when not needed When performing hw scan and not abort it, __ieee80211_scan_completed() is currently called from scan work, so does not need to reschedule work to call drv_hw_scan(). Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-06 16:30:42 -04:00
Stanislaw Gruszka	4136c4224c	mac80211: assure we also cancel deferred scan request This is partial revert and fix for commit `85f72bc839` "mac80211: only cancel software-based scans on suspend" When cfg80211 request the scan and mac80211 perform some management work, we defer the scan request. We do not canceling such requests when calling ieee80211_scan_cancel(), because of SCAN_SW_SCANNING bit check just before the call. So fix that problem. Another problem, which commit `85f72bc839` tries to solve, is we can not cancel HW scan. Hence patch make ieee80211_scan_cancel() ignore HW scan (see code comments). Keeping local->mtx lock assures that the deferred scan will not become "working" HW scan. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-06 16:30:42 -04:00
Stanislaw Gruszka	e229f844d7	mac80211: keep lock when calling __ieee80211_scan_completed() We are taking local->mtx inside __ieee80211_scan_completed(), but just before call to that function we drop the lock. Dropping/taking lock is not good, because can lead to hard to understand race conditions. Patch split scan_completed() code into two functions, first must be called with local->mtx taken and second without it. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-06 16:30:42 -04:00
Stanislaw Gruszka	259b62e35b	mac80211: reduce number of __ieee80211_scan_completed calls Use goto instruction to call __ieee80211_scan_completed only ones in ieee80211_scan_work. This is prepare for the next patch. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-06 16:30:41 -04:00
Johannes Berg	d537f5fdfc	nl80211: fix error in generic netif_running check Yikes! The error return keeps a netdev reference and the rdev mutex locked, fix that! Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-06 16:30:41 -04:00
Johannes Berg	e31b82136d	cfg80211/mac80211: allow per-station GTKs This adds API to allow adding per-station GTKs, updates mac80211 to support it, and also allows drivers to remove a key from hwaccel again when this may be necessary due to multiple GTKs. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-06 16:30:40 -04:00
Johannes Berg	53f73c09d6	mac80211: avoid transmitting delBA to old AP When roaming while we have active BA session, we can end up transmitting delBA frames to the old AP while we're already on the new AP's channel, which can cause warnings. Simply avoid sending those frames, but still tear down the internal session state, since they are not really necessary anyway as we will implicitly disassociate when sending the association to the new AP. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-06 16:30:40 -04:00
John W. Linville	373426cac0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2010-10-06 16:25:52 -04:00
Johannes Berg	44271488b9	mac80211: delete AddBA response timer We never delete the addBA response timer, which is typically fine, but if the station it belongs to is deleted very quickly after starting the BA session, before the peer had a chance to reply, the timer may fire after the station struct has been freed already. Therefore, we need to delete the timer in a suitable spot -- best when the session is being stopped (which will happen even then) in which case the delete will be a no-op most of the time. I've reproduced the scenario and tested the fix. This fixes the crash reported at http://mid.gmane.org/4CAB6F96.6090701@candelatech.com Cc: stable@kernel.org Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-06 15:58:29 -04:00
Eric Dumazet	ebc0ffae5d	fib: RCU conversion of fib_lookup() fib_lookup() converted to be called in RCU protected context, no reference taken and released on a contended cache line (fib_clntref) fib_table_lookup() and fib_semantic_match() get an additional parameter. struct fib_info gets an rcu_head field, and is freed after an rcu grace period. Stress test : (Sending 160.000.000 UDP frames on same neighbour, IP route cache disabled, dual E5540 @2.53GHz, 32bit kernel, FIB_HASH) (about same results for FIB_TRIE) Before patch : real 1m31.199s user 0m13.761s sys 23m24.780s After patch: real 1m5.375s user 0m14.997s sys 15m50.115s Before patch Profile : 13044.00 15.4% __ip_route_output_key vmlinux 8438.00 10.0% dst_destroy vmlinux 5983.00 7.1% fib_semantic_match vmlinux 5410.00 6.4% fib_rules_lookup vmlinux 4803.00 5.7% neigh_lookup vmlinux 4420.00 5.2% _raw_spin_lock vmlinux 3883.00 4.6% rt_set_nexthop vmlinux 3261.00 3.9% _raw_read_lock vmlinux 2794.00 3.3% fib_table_lookup vmlinux 2374.00 2.8% neigh_resolve_output vmlinux 2153.00 2.5% dst_alloc vmlinux 1502.00 1.8% _raw_read_lock_bh vmlinux 1484.00 1.8% kmem_cache_alloc vmlinux 1407.00 1.7% eth_header vmlinux 1406.00 1.7% ipv4_dst_destroy vmlinux 1298.00 1.5% __copy_from_user_ll vmlinux 1174.00 1.4% dev_queue_xmit vmlinux 1000.00 1.2% ip_output vmlinux After patch Profile : 13712.00 15.8% dst_destroy vmlinux 8548.00 9.9% __ip_route_output_key vmlinux 7017.00 8.1% neigh_lookup vmlinux 4554.00 5.3% fib_semantic_match vmlinux 4067.00 4.7% _raw_read_lock vmlinux 3491.00 4.0% dst_alloc vmlinux 3186.00 3.7% neigh_resolve_output vmlinux 3103.00 3.6% fib_table_lookup vmlinux 2098.00 2.4% _raw_read_lock_bh vmlinux 2081.00 2.4% kmem_cache_alloc vmlinux 2013.00 2.3% _raw_spin_lock vmlinux 1763.00 2.0% __copy_from_user_ll vmlinux 1763.00 2.0% ip_output vmlinux 1761.00 2.0% ipv4_dst_destroy vmlinux 1631.00 1.9% eth_header vmlinux 1440.00 1.7% _raw_read_unlock_bh vmlinux Reference results, if IP route cache is enabled : real 0m29.718s user 0m10.845s sys 7m37.341s 25213.00 29.5% __ip_route_output_key vmlinux 9011.00 10.5% dst_release vmlinux 4817.00 5.6% ip_push_pending_frames vmlinux 4232.00 5.0% ip_finish_output vmlinux 3940.00 4.6% udp_sendmsg vmlinux 3730.00 4.4% __copy_from_user_ll vmlinux 3716.00 4.4% ip_route_output_flow vmlinux 2451.00 2.9% __xfrm_lookup vmlinux 2221.00 2.6% ip_append_data vmlinux 1718.00 2.0% _raw_spin_lock_bh vmlinux 1655.00 1.9% __alloc_skb vmlinux 1572.00 1.8% sock_wfree vmlinux 1345.00 1.6% kfree vmlinux Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 20:39:38 -07:00
Eric Dumazet	79315068f4	caif: fix two caif_connect() bugs caif_connect() might dereference a netdevice after dev_put() it. It also doesnt check dev_get_by_index() return value and could dereference a NULL pointer. Fix it, using RCU to avoid taking a reference. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 20:35:53 -07:00
Flavio Leitner	e12b453904	bonding: fix to rejoin multicast groups immediately The IGMP specs states that if the system receives a membership report, it shouldn't send another for the next minute. However, if a link failure happens right after that, the backup slave and the switch connected to this slave will not know about the multicast and the traffic will hang for about a minute. This patch fixes it to rejoin multicast groups immediately after a failover restoring the multicast traffic. Signed-off-by: Flavio Leitner <fleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 20:26:57 -07:00
Alban Crequy	3f66116e89	AF_UNIX: Implement SO_TIMESTAMP and SO_TIMETAMPNS on Unix sockets Userspace applications can already request to receive timestamps with: setsockopt(sockfd, SOL_SOCKET, SO_TIMESTAMP, ...) Although setsockopt() returns zero (success), timestamps are not added to the ancillary data. This patch fixes that on SOCK_DGRAM and SOCK_SEQPACKET Unix sockets. Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 14:54:36 -07:00
Eric Dumazet	d6bf781712	net neigh: RCU conversion of neigh hash table David This is the first step for RCU conversion of neigh code. Next patches will convert hash_buckets[] and "struct neighbour" to RCU protected objects. Thanks [PATCH net-next] net neigh: RCU conversion of neigh hash table Instead of storing hash_buckets, hash_mask and hash_rnd in "struct neigh_table", a new structure is defined : struct neigh_hash_table { struct neighbour *hash_buckets; unsigned int hash_mask; __u32 hash_rnd; struct rcu_head rcu; }; And "struct neigh_table" has an RCU protected pointer to such a neigh_hash_table. This means the signature of (hash)() function changed: We need to add a third parameter with the actual hash_rnd value, since this is not anymore a neigh_table field. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 14:54:36 -07:00
Eric Dumazet	110b249937	net neigh: neigh_delete() and neigh_add() changes neigh_delete() and neigh_add() dont need to touch device refcount, we hold RTNL when calling them, so device cannot disappear under us. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 14:54:35 -07:00
Eric Dumazet	caf586e5f2	net: add a core netdev->rx_dropped counter In various situations, a device provides a packet to our stack and we drop it before it enters protocol stack : - softnet backlog full (accounted in /proc/net/softnet_stat) - bad vlan tag (not accounted) - unknown/unregistered protocol (not accounted) We can handle a per-device counter of such dropped frames at core level, and automatically adds it to the device provided stats (rx_dropped), so that standard tools can be used (ifconfig, ip link, cat /proc/net/dev) This is a generalization of commit `8990f468a` (net: rx_dropped accounting), thus reverting it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 14:47:55 -07:00
Luis R. Rodriguez	e7480bbb92	mac80211: fix channel assumption for association done work Be consistent and use the wk->chan instead of the local->hw.conf.channel for the association done work. This prevents any possible races against channel changes while we run this work. In the case that the race did happen we would be initializing the bit rates for the new AP under the assumption of a wrong channel and in the worst case, wrong band. This could lead to trying to assuming we could use CCK frames on 5 GHz, for example. This patch has a fix for kernels >= v2.6.34 Cc: stable@kernel.org Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:37:52 -04:00
Johannes Berg	025e6be220	mac80211: fix deadlock with multiple interfaces The locking around ieee80211_recalc_smps is buggy -- it cannot acquire another interface's mutex while the iflist mutex is held because another code path could be holding the iface mutex and trying to acquire the iflist mutex. But the locking is also unnecessary, we only check "ifmgd->associated" as a bool, and don't use the pointer (in check_mgd_smps). Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:37:51 -04:00
Johannes Berg	6774889314	nl80211: reduce dumping boilerplate Consolidate boilerplate code needed for .dumpit calls operating on netdevs. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:37:51 -04:00
Johannes Berg	4126571481	nl80211: use generic check for netif_running Use a new flag that requires the netdev to be UP and use it to check instead of coding the check into all functions that require it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:37:51 -04:00
Johannes Berg	4c47699106	nl80211: use the new genetlink pre/post_doit hooks This makes nl80211 use the new genetlink pre_doit/post_doit hooks for locking and checking the interface/wiphy index. This significantly reduces the code size and the likelihood of locking errors. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:37:51 -04:00
Johannes Berg	ff4c92d85c	genetlink: introduce pre_doit/post_doit hooks Each family may have some amount of boilerplate locking code that applies to most, or even all, commands. This allows a family to handle such things in a more generic way, by allowing it to a) include private flags in each operation b) specify a pre_doit hook that is called, before an operation's doit() callback and may return an error directly, c) specify a post_doit hook that can undo locking or similar things done by pre_doit, and finally d) include two private pointers in each info struct passed between all these operations including doit(). (It's two because I'll need two in nl80211 -- can be extended.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:35:30 -04:00
Bruno Randolf	9eba612549	mac80211: Add WME information element for IBSS Enable WME QoS in IBSS mode by adding a WME information element to beacons and probe respones and by checking for it and marking stations as WME capable if it is present. Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:35:29 -04:00
Helmut Schaa	78be49ec2a	mac80211: distinct between max rates and the number of rates the hw can report Some drivers cannot handle multiple retry rates specified by the rc algorithm but instead use their own retry table (for example rt2800). However, if such a device registers itself with a max_rates value of 1 the rc algorithm cannot make use of the extended information the device can provide about retried rates. On the other hand, if a device registers itself with a max_rates value > 1 the rc algorithm assumes that the device can handle multi rate retries. Fix this issue by introducing another hw parameter max_report_rates that can be set to a different value then max_rates to indicate if a device is capable of reporting more rates then specified in max_rates. Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:35:28 -04:00
Bill Jordan	1be7fe8de9	mac80211: fix for WDS interfaces Initialize the rate table for WDS interfaces, and add cases to allow WDS packets to pass the xmit and receive tests. Signed-off-by: Bill Jordan <bjordan@rajant.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:35:25 -04:00
Bill Jordan	e8347ebad2	cfg80211: patches to allow setting the WDS peer Added a nl interface to set the peer bssid of a WDS interface. Signed-off-by: Bill Jordan <bjordan@rajant.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:35:24 -04:00
Juuso Oikarinen	d8ec44335c	mac80211: Add validity check for beacon_crc value On association to an AP, after receiving beacons, the beacon_crc value is set. The beacon_crc value is not reset in disassociation, but the BSS data may be expired at a later point. When associating again, it's possible that a beacon for the AP is not received, resulting in the beacon_ies to remain NULL. After association, further beacons will not update the beacon data, as the crc value of the beacon has not changed, and the beacon_crc still holds a value matching the beacon. The beacon_ies will remain forever null. One of the results of this is that WLAN power save cannot be entered, the STA will remain foreven in active mode. Fix this by adding a validation flag for the beacon_crc, which is cleared on association. Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:35:24 -04:00
Stanislaw Gruszka	bc86863de6	mac80211: perform scan cancel in hw reset work Move ieee80211_scan_cancel() and all other related code to ieee80211_restart_work() as ieee80211_restart_hw() is intended to be callable from any context. Fix a bug that RTNL lock is not taken during ieee80211_cancel_scan(). Take local->mtx before WARN(test_bit(SCAN_HW_SCANNING, &local->scanning) to prevent the race condition with __ieee80211_start_scan() described here: http://marc.info/?l=linux-wireless&m=128516716810537&w=2 Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:35:24 -04:00
Johannes Berg	2234362c42	cfg80211: fix locking Add missing unlocking of the wiphy in set_channel, and don't try to unlock a non-existing wiphy in set_cqm. Cc: stable@kernel.org [2.6.35+] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:35:23 -04:00
Johannes Berg	663fcafd97	cfg80211/mac80211: allow management frame TX in AP mode Enable management frame transmission and subscribing to management frames through nl80211 in both cfg80211 and mac80211. Also update a few places that I forgot to update for P2P-client mode previously, and fix a small bug with non-action frames in this API. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:35:23 -04:00
Felix Fietkau	17e5a80828	nl80211: allow drivers to indicate whether the survey data channel is in use Some user space applications only want to display survey data for the operating channel, however there is no API to get that yet. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:35:22 -04:00
Christian Lamparter	85416a4fa1	mac80211: fix rx monitor filter refcounters This patch fixes an refcounting bug. Previously it was possible to corrupt the per-device recv. filter and monitor management counters when: iw dev wlanX set monitor [new flags] was issued on an active monitor interface. Acked-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:35:21 -04:00
Ben Greear	5a254ffe3f	wireless: Use first phyX name available when registering phy devices. Choose first available phyX name when creating phy devices. This means that reloading a wifi driver will not cause a change in the name of it's phy device. Also, allow users to rename a phy to any un-used name, including phy%d. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-10-05 13:35:21 -04:00
stephen hemminger	c61393ea83	ipv6: make __ipv6_isatap_ifid static Another exported symbol only used in one file Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 00:47:39 -07:00
stephen hemminger	1df9916e46	fib: fib_rules_cleanup can be static fib_rules_cleanup_ups is only defined and used in one place. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 00:47:39 -07:00
Eric Dumazet	6a31d2a97c	fib: cleanups Code style cleanups before upcoming functional changes. C99 initializer for fib_props array. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 00:47:38 -07:00
Nicolas Kaiser	ee624599d3	caif: remove duplicated include Remove duplicated include. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 00:47:37 -07:00
Dan Carpenter	4e18b3edf7	cls_u32: signedness bug skb_headroom() is unsigned so "skb_headroom(skb) + toff" is also unsigned and can't be less than zero. This test was added in `66d50d25`: "u32: negative offset fix" It was supposed to fix a regression. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 00:40:39 -07:00
David S. Miller	13f5bf18ba	ipvs: Use frag walker helper in SCTP proto support. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Simon Horman <horms@verge.net.au>	2010-10-05 00:27:05 -07:00
Eric Dumazet	24824a09e3	net: dynamic ingress_queue allocation ingress being not used very much, and net_device->ingress_queue being quite a big object (128 or 256 bytes), use a dynamic allocation if needed (tc qdisc add dev eth0 ingress ...) dev_ingress_queue(dev) helper should be used only with RTNL taken. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 00:23:44 -07:00
Gustavo F. Padovan	eaa71b318c	Bluetooth: Disallow to change L2CAP_OPTIONS values when connected L2CAP doesn't permit change like MTU, FCS, TxWindow values while the connection is alive, we can only set that before the connection/configuration process. That can lead to bugs in the L2CAP operation. Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>	2010-10-04 19:28:52 -03:00
Changli Gao	f68c53015c	netfilter: unregister nf hooks, matches and targets in the reverse order Since we register nf hooks, matches and targets in order, we'd better unregister them in the reverse order. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-04 22:24:12 +02:00
Nicolas Kaiser	e55df53dd6	netfilter: remove duplicated include Remove duplicated include. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-04 21:00:42 +02:00
David S. Miller	21a180cda0	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/Kconfig net/ipv4/tcp_timer.c	2010-10-04 11:56:38 -07:00
Eric Dumazet	a8defca048	netfilter: ipt_LOG: add bufferisation to call printk() once ipt_LOG & ip6t_LOG use lot of calls to printk() and use a lock in a hope several cpus wont mix their output in syslog. printk() being very expensive [1], its better to call it once, on a prebuilt and complete line. Also, with mixed IPv4 and IPv6 trafic, separate IPv4/IPv6 locks dont avoid garbage. I used an allocation of a 1024 bytes structure, sort of seq_printf() but with a fixed size limit. Use a static buffer if dynamic allocation failed. Emit a once time alert if buffer size happens to be too short. [1]: printk() has various features like printk_delay()... Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-04 20:56:05 +02:00
Stephen Hemminger	0c200d9353	netfilter: nf_nat: make find/put static The functions nf_nat_proto_find_get and nf_nat_proto_put are only used internally in nf_nat_core. This might break some out of tree NAT module. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-10-04 20:53:18 +02:00
Linus Torvalds	4a73a43741	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: vlan: dont drop packets from unknown vlans in promiscuous mode Phonet: Correct header retrieval after pskb_may_pull um: Proper Fix for `f25c80a4`: remove duplicate structure field initialization ip_gre: Fix dependencies wrt. ipv6. net-2.6: SYN retransmits: Add new parameter to retransmits_timed_out() iwl3945: queue the right work if the scan needs to be aborted mac80211: fix use-after-free	2010-10-04 11:11:01 -07:00
Simon Horman	758ff03387	IPVS: sip persistence engine Add the SIP callid as a key for persistence. This allows multiple connections from the same IP address to be differentiated on the basis of the callid. When used in conjunction with the persistence mask, it allows connections from different IP addresses to be aggregated on the basis of the callid. It is envisaged that a persistence mask of 0.0.0.0 will be a useful setting. That is, ignore the source IP address when checking for persistence. It is envisaged that this option will be used in conjunction with one-packet scheduling. This only works with UDP and cannot be made to work with TCP within the current framework. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2010-10-04 22:45:24 +09:00
Simon Horman	f71499aa11	IPVS: Fallback if persistence engine fails Fall back to normal persistence handling if the persistence engine fails to recognise a packet. This way, at least the packet will go somewhere. It is envisaged that iptables could be used to block packets such if this is not desired although nf_conntrack_sip would likely need to be enhanced first. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2010-10-04 22:45:24 +09:00
Simon Horman	0d1e71b04a	IPVS: Allow configuration of persistence engines Allow the persistence engine of a virtual service to be set, edited and unset. This feature only works with the netlink user-space interface. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2010-10-04 22:45:24 +09:00
Simon Horman	8be67a6617	IPVS: management of persistence engine modules This is based heavily on the scheduler management code Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2010-10-04 22:45:24 +09:00
Simon Horman	a3c918acd2	IPVS: Add persistence engine data to /proc/net/ip_vs_conn This shouldn't break compatibility with userspace as the new data is at the end of the line. I have confirmed that this doesn't break ipvsadm, the main (only?) user-space user of this data. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2010-10-04 22:45:24 +09:00
Simon Horman	85999283a2	IPVS: Add struct ip_vs_pe Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2010-10-04 22:45:24 +09:00
Simon Horman	2fabf35bfc	IPVS: ip_vs_{un,}bind_scheduler NULL arguments In general NULL arguments aren't passed by the few callers that exist, so don't test for them. The exception is to make passing NULL to ip_vs_unbind_scheduler() a noop. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2010-10-04 22:45:24 +09:00
Simon Horman	6e08bfb879	IPVS: Allow null argument to ip_vs_scheduler_put() This simplifies caller logic sightly. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2010-10-04 22:45:24 +09:00
Simon Horman	f11017ec2d	IPVS: Add struct ip_vs_conn_param Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2010-10-04 22:45:24 +09:00
Simon Horman	5b57a98c1f	IPVS: compact ip_vs_sched_persist() Compact ip_vs_sched_persist() by setting up parameters and calling functions once. Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2010-10-04 22:45:23 +09:00
Simon Horman	001985b2c0	netfilter: nf_conntrack_sip: Add callid parser Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2010-10-04 22:45:23 +09:00
Simon Horman	5adbb9fb0c	netfilter: nf_conntrack_sip: Allow ct_sip_get_header() to be called with a null ct argument Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2010-10-04 22:45:23 +09:00
Eric Dumazet	c7d4426a98	net: introduce DST_NOCACHE flag While doing stress tests with IP route cache disabled, and multi queue devices, I noticed a very high contention on one rwlock used in neighbour code. When many cpus are trying to send frames (possibly using a high performance multiqueue device) to the same neighbour, they fight for the neigh->lock rwlock in order to call neigh_hh_init(), and fight on hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test()) But we dont need to call neigh_hh_init() for dst that are used only once. It costs four atomic operations at least, on two contended cache lines, plus the high contention on neigh->lock rwlock. Introduce a new dst flag, DST_NOCACHE, that is set when dst was not inserted in route cache. With the stress test bench, sending 160000000 frames on one neighbour, results are : Before patch: real 2m28.406s user 0m11.781s sys 36m17.964s After patch: real 1m26.532s user 0m12.185s sys 20m3.903s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 22:17:54 -07:00
David S. Miller	9a7241c21b	sctp: Fix break indentation in sctp_ioctl(). Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 22:14:37 -07:00
David S. Miller	7282907126	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2010-10-03 22:09:32 -07:00
Dan Rosenberg	51e97a12be	sctp: Fix out-of-bounds reading in sctp_asoc_get_hmac() The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids array and attempts to ensure that only a supported hmac entry is returned. The current code fails to do this properly - if the last id in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the id integer remains set after exiting the loop, and the address of an out-of-bounds entry will be returned and subsequently used in the parent function, causing potentially ugly memory corruption. This patch resets the id integer to 0 on encountering an invalid id so that NULL will be returned after finishing the loop if no valid ids are found. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 21:58:49 -07:00
Dan Rosenberg	d7e0d19aa0	sctp: prevent reading out-of-bounds memory Two user-controlled allocations in SCTP are subsequently dereferenced as sockaddr structs, without checking if the dereferenced struct members fall beyond the end of the allocated chunk. There doesn't appear to be any information leakage here based on how these members are used and additional checking, but it's still worth fixing. [akpm@linux-foundation.org: remove unfashionable newlines, fix gmail tab->space conversion] Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 21:58:48 -07:00
David Stevens	5b7c840667	ipv4: correct IGMP behavior on v3 query during v2-compatibility mode A recent patch to allow IGMPv2 responses to IGMPv3 queries bypasses length checks for valid query lengths, incorrectly resets the v2_seen timer, and does not support IGMPv1. The following patch responds with a v2 report as required by IGMPv2 while correcting the other problems introduced by the patch. Signed-Off-By: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 21:58:47 -07:00
Eric Dumazet	a8cb16dd9c	ipmr: cleanups Various code style cleanups Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 21:50:53 -07:00
Eric Dumazet	a8c9486b81	ipmr: RCU protection for mfc_cache_array Use RCU & RTNL protection for mfc_cache_array[] ipmr_cache_find() is called under rcu_read_lock(); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 21:50:53 -07:00
Eric Dumazet	4c9687098f	ipmr: RCU conversion of mroute_sk Use RCU and RTNL to protect (struct mr_table)->mroute_sk Readers use RCU, writers use RTNL. ip_ra_control() already use an RCU grace period before ip_ra_destroy_rcu(), so we dont need synchronize_rcu() in mrtsock_destruct() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 21:50:52 -07:00
Eric Dumazet	55747a0a73	ipmr: __pim_rcv() is called under rcu_read_lock No need to get a reference on reg_dev and release it, we are in a rcu_read_lock() protected section. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 21:50:52 -07:00
stephen hemminger	ddcb4541e9	gre: protocol table can be static This table is only used in gre.c Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 21:50:51 -07:00
Ben Hutchings	c5d3557103	Revert "ipv4: Make INET_LRO a bool instead of tristate." This reverts commit `e81963b180`. LRO is now deprecated in favour of GRO, and only a few drivers use it, so it is desirable to build it as a module in distribution kernels. The original change to prevent building it as a module was made in an attempt to avoid the case where some dependents are set to y and some to m, and INET_LRO can be set to m rather than y. However, the Kconfig system will reliably set INET_LRO=y in this case. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 21:46:23 -07:00
Nagendra Tomar	482964e56e	net: Fix the condition passed to sk_wait_event() This patch fixes the condition (3rd arg) passed to sk_wait_event() in sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory() causes the following soft lockup in tcp_sendmsg() when the global tcp memory pool has exhausted. >>> snip <<< localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429] localhost kernel: CPU 3: localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200] [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200 localhost kernel: localhost kernel: Call Trace: localhost kernel: [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200 localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40 localhost kernel: [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0 localhost kernel: [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140 localhost kernel: [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130 localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40 localhost kernel: [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170 localhost kernel: [vfs_write+0x185/0x190] vfs_write+0x185/0x190 localhost kernel: [sys_write+0x50/0x90] sys_write+0x50/0x90 localhost kernel: [system_call+0x7e/0x83] system_call+0x7e/0x83 >>> snip <<< What is happening is, that the sk_wait_event() condition passed from sk_stream_wait_memory() evaluates to true for the case of tcp global memory exhaustion. This is because both sk_stream_memory_free() and vm_wait are true which causes sk_wait_event() to not call schedule_timeout(). Hence sk_stream_wait_memory() returns immediately to the caller w/o sleeping. This causes the caller to again try allocation, which again fails and again calls sk_stream_wait_memory(), and so on. [ Bug introduced by commit `c1cbe4b7ad` ("[NET]: Avoid atomic xchg() for non-error case") -DaveM ] Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 20:41:32 -07:00
Maciej Żenczykowski	ae878ae280	net: Fix IPv6 PMTU disc. w/ asymmetric routes Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-03 14:49:00 -07:00
J. Bruce Fields	edc7a89403	nfsd: provide callbacks on svc_xprt deletion NFSv4.1 needs warning when a client tcp connection goes down, if that connection is being used as a backchannel, so that it can warn the client that it has lost the backchannel connection. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2010-10-01 19:29:44 -04:00
J. Bruce Fields	1e7af1b806	nfsd4: remove spkm3 Unfortunately, spkm3 never got very far; while interoperability with one other implementation was demonstrated at some point, problems were found with the spec that were deemed not worth fixing. The kernel code is useless on its own without nfs-utils patches which were never merged into nfs-utils, and were only ever available from citi.umich.edu. They appear not to have been updated since 2005. Therefore it seems safe to assume that this code has no users, and never will. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 18:09:55 -04:00
NeilBrown	277f68dbba	sunrpc: fix race in new cache_wait code. If we set up to wait for a cache item to be filled in, and then find that it is no longer pending, it could be that some other thread is in 'cache_revisit_request' and has moved our request to its 'pending' list. So when our setup_deferral calls cache_revisit_request it will find nothing to put on the pending list, and do nothing. We then return from cache_wait_req, thus leaving the 'sleeper' on-stack structure open to being corrupted by subsequent stack usage. However that 'sleeper' could still be on the 'pending' list that the other thread is looking at and so any corruption could cause it to behave badly. To avoid this race we simply take the same path as if the 'wait_for_completion_interruptible_timeout' was interrupted and if the sleeper is no longer on the list (which it won't be) we wait on the completion - which will ensure that any other cache_revisit_request will have let go of the sleeper. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 18:09:54 -04:00
Pavel Emelyanov	14ec63c333	sunrpc: Create sockets in net namespaces The context is already known in all the sock_create callers. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 17:19:00 -04:00
Pavel Emelyanov	721db93a55	net: Export __sock_create Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 17:18:59 -04:00
Pavel Emelyanov	37aa213373	sunrpc: Tag rpc_xprt with net The net is known from the xprt_create and this tagging will also give un the context in the conntection workers where real sockets are created. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 17:18:58 -04:00
Pavel Emelyanov	9a23e332ec	sunrpc: Add net to xprt_create Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 17:18:57 -04:00
Pavel Emelyanov	c653ce3f0a	sunrpc: Add net to rpc_create_args Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 17:18:56 -04:00
Pavel Emelyanov	62832c039e	sunrpc: Pull net argument downto svc_create_socket After this the socket creation in it knows the context. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 17:18:55 -04:00
Pavel Emelyanov	fc5d00b04a	sunrpc: Add net argument to svc_create_xprt Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 17:18:54 -04:00
Pavel Emelyanov	e204e621b4	sunrpc: Factor out rpc_xprt freeing Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 17:18:53 -04:00
Pavel Emelyanov	bd1722d431	sunrpc: Factor out rpc_xprt allocation Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2010-10-01 17:18:52 -04:00
John W. Linville	41f4a6f71f	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem	2010-10-01 11:12:36 -04:00
Eric Dumazet	0197aa38df	ipv4: rcu conversion in ip_route_output_slow ip_route_output_slow() is enclosed in an rcu_read_lock() protected section, so that no references are taken/released on device, thanks to __ip_dev_find() & dev_get_by_index_rcu() Tested with ip route cache disabled, and a stress test : Before patch: elapsed time : real 1m38.347s user 0m11.909s sys 23m51.501s Profile: 13788.00 22.7% ip_route_output_slow [kernel] 7875.00 13.0% dst_destroy [kernel] 3925.00 6.5% fib_semantic_match [kernel] 3144.00 5.2% fib_rules_lookup [kernel] 3061.00 5.0% dst_alloc [kernel] 2276.00 3.7% rt_set_nexthop [kernel] 1762.00 2.9% fib_table_lookup [kernel] 1538.00 2.5% _raw_read_lock [kernel] 1358.00 2.2% ip_output [kernel] After patch: real 1m28.808s user 0m13.245s sys 20m37.293s 10950.00 17.2% ip_route_output_slow [kernel] 10726.00 16.9% dst_destroy [kernel] 5170.00 8.1% fib_semantic_match [kernel] 3937.00 6.2% dst_alloc [kernel] 3635.00 5.7% rt_set_nexthop [kernel] 2900.00 4.6% fib_rules_lookup [kernel] 2240.00 3.5% fib_table_lookup [kernel] 1427.00 2.2% _raw_read_lock [kernel] 1157.00 1.8% kmem_cache_alloc [kernel] Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-30 21:16:06 -07:00
Eric Dumazet	82efee1499	ipv4: introduce __ip_dev_find() ip_dev_find(net, addr) finds a device given an IPv4 source address and takes a reference on it. Introduce __ip_dev_find(), taking a third argument, to optionally take the device reference. Callers not asking the reference to be taken should be in an rcu_read_lock() protected section. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-30 21:16:05 -07:00
Eric Dumazet	173e79fb70	vlan: dont drop packets from unknown vlans in promiscuous mode Roger Luethi noticed packets for unknown VLANs getting silently dropped even in promiscuous mode. Check for promiscuous mode in __vlan_hwaccel_rx() and vlan_gro_common() before drops. As suggested by Patrick, mark such packets to have skb->pkt_type set to PACKET_OTHERHOST to make sure they are dropped by IP stack. Reported-by: Roger Luethi <rl@hellgate.ch> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-30 18:04:21 -07:00
Eric Dumazet	dd28d1a0b5	ipv4: __mkroute_output() speedup While doing stress tests with a disabled IP route cache, I found __mkroute_output() was touching three times in_device atomic refcount. Use RCU to touch it once to reduce cache line ping pongs. Before patch time to perform the test real 1m42.009s user 0m12.545s sys 25m0.726s Profile : 16109.00 26.4% ip_route_output_slow vmlinux 7434.00 12.2% dst_destroy vmlinux 3280.00 5.4% fib_rules_lookup vmlinux 3252.00 5.3% fib_semantic_match vmlinux 2622.00 4.3% fib_table_lookup vmlinux 2535.00 4.1% dst_alloc vmlinux 1750.00 2.9% _raw_read_lock vmlinux 1532.00 2.5% rt_set_nexthop vmlinux After patch real 1m36.503s user 0m12.977s sys 23m25.608s 14234.00 22.4% ip_route_output_slow vmlinux 8717.00 13.7% dst_destroy vmlinux 4052.00 6.4% fib_rules_lookup vmlinux 3951.00 6.2% fib_semantic_match vmlinux 3191.00 5.0% dst_alloc vmlinux 1764.00 2.8% fib_table_lookup vmlinux 1692.00 2.7% _raw_read_lock vmlinux 1605.00 2.5% rt_set_nexthop vmlinux Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-30 17:59:30 -07:00
Rémi Denis-Courmont	e1a5964f0c	Phonet: restore flow control credits when sending fails This patch restores the below flow control patch submitted by Rémi Denis-Courmont, which accidentaly got lost due to Pipe controller patch on Phonet. commit `1a98214fee` Author: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Date: Mon Aug 30 12:57:03 2010 +0000 Phonet: restore flow control credits when sending fails Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com> Acked-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-30 17:57:30 -07:00

... 20 21 22 23 24 ...

18854 Commits