Commit Graph

3010 Commits

Author SHA1 Message Date
Herbert Xu
6abaaaae6d [IPSEC]: Fix tunnel error handling in ipcomp6
The error handling in ipcomp6_tunnel_create is broken in two ways:

1) If we fail to allocate an SPI (this should never happen in practice
since there are plenty of 32-bit SPI values for us to use), we will
still go ahead and create the SA.

2) When xfrm_init_state fails, we first of all may trigger the BUG_TRAP
in __xfrm_state_destroy because we didn't set the state to DEAD.  More
importantly we end up returning the freed state as if we succeeded!

This patch fixes them both.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-26 17:37:54 -08:00
Matthew Dobson
93d2341c75 [PATCH] mempool: use mempool_create_slab_pool()
Modify well over a dozen mempool users to call mempool_create_slab_pool()
rather than calling mempool_create() with extra arguments, saving about 30
lines of code and increasing readability.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:00 -08:00
Ingo Molnar
14cc3e2b63 [PATCH] sem2mutex: misc static one-file mutexes
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jens Axboe <axboe@suse.de>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:55 -08:00
Linus Torvalds
1b9a391736 Merge branch 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits)
  [PATCH] fix audit_init failure path
  [PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format
  [PATCH] sem2mutex: audit_netlink_sem
  [PATCH] simplify audit_free() locking
  [PATCH] Fix audit operators
  [PATCH] promiscuous mode
  [PATCH] Add tty to syscall audit records
  [PATCH] add/remove rule update
  [PATCH] audit string fields interface + consumer
  [PATCH] SE Linux audit events
  [PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c
  [PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL
  [PATCH] Fix IA64 success/failure indication in syscall auditing.
  [PATCH] Miscellaneous bug and warning fixes
  [PATCH] Capture selinux subject/object context information.
  [PATCH] Exclude messages by message type
  [PATCH] Collect more inode information during syscall processing.
  [PATCH] Pass dentry, not just name, in fsnotify creation hooks.
  [PATCH] Define new range of userspace messages.
  [PATCH] Filter rule comparators
  ...

Fixed trivial conflict in security/selinux/hooks.c
2006-03-25 09:24:53 -08:00
Linus Torvalds
53846a21c1 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (103 commits)
  SUNRPC,RPCSEC_GSS: spkm3--fix config dependencies
  SUNRPC,RPCSEC_GSS: spkm3: import contexts using NID_cast5_cbc
  LOCKD: Make nlmsvc_traverse_shares return void
  LOCKD: nlmsvc_traverse_blocks return is unused
  SUNRPC,RPCSEC_GSS: fix krb5 sequence numbers.
  NFSv4: Dont list system.nfs4_acl for filesystems that don't support it.
  SUNRPC,RPCSEC_GSS: remove unnecessary kmalloc of a checksum
  SUNRPC: Ensure rpc_call_async() always calls tk_ops->rpc_release()
  SUNRPC: Fix memory barriers for req->rq_received
  NFS: Fix a race in nfs_sync_inode()
  NFS: Clean up nfs_flush_list()
  NFS: Fix a race with PG_private and nfs_release_page()
  NFSv4: Ensure the callback daemon flushes signals
  SUNRPC: Fix a 'Busy inodes' error in rpc_pipefs
  NFS, NLM: Allow blocking locks to respect signals
  NFS: Make nfs_fhget() return appropriate error values
  NFSv4: Fix an oops in nfs4_fill_super
  lockd: blocks should hold a reference to the nlm_file
  NFSv4: SETCLIENTID_CONFIRM should handle NFS4ERR_DELAY/NFS4ERR_RESOURCE
  NFSv4: Send the delegation stateid for SETATTR calls
  ...
2006-03-25 09:18:27 -08:00
Linus Torvalds
b55813a2e5 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [NETFILTER] x_table.c: sem2mutex
  [IPV4]: Aggregate route entries with different TOS values
  [TCP]: Mark tcp_*mem[] __read_mostly.
  [TCP]: Set default max buffers from memory pool size
  [SCTP]: Fix up sctp_rcv return value
  [NET]: Take RTNL when unregistering notifier
  [WIRELESS]: Fix config dependencies.
  [NET]: Fill in a 32-bit hole in struct sock on 64-bit platforms.
  [NET]: Ensure device name passed to SO_BINDTODEVICE is NULL terminated.
  [MODULES]: Don't allow statically declared exports
  [BRIDGE]: Unaligned accesses in the ethernet bridge
2006-03-25 08:39:20 -08:00
Davide Libenzi
f348d70a32 [PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications
Implement the half-closed devices notifiation, by adding a new POLLRDHUP
(and its alias EPOLLRDHUP) bit to the existing poll/select sets.  Since the
existing POLLHUP handling, that does not report correctly half-closed
devices, was feared to be changed, this implementation leaves the current
POLLHUP reporting unchanged and simply add a new bit that is set in the few
places where it makes sense.  The same thing was discussed and conceptually
agreed quite some time ago:

http://lkml.org/lkml/2003/7/12/116

Since this new event bit is added to the existing Linux poll infrastruture,
even the existing poll/select system calls will be able to use it.  As far
as the existing POLLHUP handling, the patch leaves it as is.  The
pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing
archs and sets the bit in the six relevant files.  The other attached diff
is the simple change required to sys/epoll.h to add the EPOLLRDHUP
definition.

There is "a stupid program" to test POLLRDHUP delivery here:

 http://www.xmailserver.org/pollrdhup-test.c

It tests poll(2), but since the delivery is same epoll(2) will work equally.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:56 -08:00
Jesper Juhl
21e4f95269 [PATCH] fix 'defined but not used' warning in net/rxrpc/main.c::rxrpc_initialise
net/rxrpc/main.c: In function `rxrpc_initialise':
net/rxrpc/main.c:83: warning: label `error_proc' defined but not used

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:52 -08:00
Al Viro
871751e25d [PATCH] slab: implement /proc/slab_allocators
Implement /proc/slab_allocators.   It produces output like:

idr_layer_cache: 80 idr_pre_get+0x33/0x4e
buffer_head: 2555 alloc_buffer_head+0x20/0x75
mm_struct: 9 mm_alloc+0x1e/0x42
mm_struct: 20 dup_mm+0x36/0x370
vm_area_struct: 384 dup_mm+0x18f/0x370
vm_area_struct: 151 do_mmap_pgoff+0x2e0/0x7c3
vm_area_struct: 1 split_vma+0x5a/0x10e
vm_area_struct: 11 do_brk+0x206/0x2e2
vm_area_struct: 2 copy_vma+0xda/0x142
vm_area_struct: 9 setup_arg_pages+0x99/0x214
fs_cache: 8 copy_fs_struct+0x21/0x133
fs_cache: 29 copy_process+0xf38/0x10e3
files_cache: 30 alloc_files+0x1b/0xcf
signal_cache: 81 copy_process+0xbaa/0x10e3
sighand_cache: 77 copy_process+0xe65/0x10e3
sighand_cache: 1 de_thread+0x4d/0x5f8
anon_vma: 241 anon_vma_prepare+0xd9/0xf3
size-2048: 1 add_sect_attrs+0x5f/0x145
size-2048: 2 journal_init_revoke+0x99/0x302
size-2048: 2 journal_init_revoke+0x137/0x302
size-2048: 2 journal_init_inode+0xf9/0x1c4

Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Alexander Nyberg <alexn@telia.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
DESC
slab-leaks3-locking-fix
EDESC
From: Andrew Morton <akpm@osdl.org>

Update for slab-remove-cachep-spinlock.patch

Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Alexander Nyberg <alexn@telia.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:49 -08:00
Ingo Molnar
9e19bb6d7a [NETFILTER] x_table.c: sem2mutex
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-25 01:41:10 -08:00
Ilia Sotnikov
cef2685e00 [IPV4]: Aggregate route entries with different TOS values
When we get an ICMP need-to-frag message, the original TOS value in the
ICMP payload cannot be used as a key to look up the routes to update.
This is because the TOS field may have been modified by routers on the
way.  Similarly, ip_rt_redirect should also ignore the TOS as the router
that gave us the message may have modified the TOS value.

The patch achieves this objective by aggregating entries with different
TOS values (but are otherwise identical) into the same bucket.  This
makes it easy to update them at the same time when an ICMP message is
received.

In future we should use a twin-hashing scheme where teh aggregation
occurs at the entry level.  That is, the TOS goes back into the hash
for normal lookups while ICMP lookups will end up with a node that
gives us a list that contains all other route entries that differ
only by TOS.

Signed-off-by: Ilia Sotnikov <hostcc@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-25 01:38:55 -08:00
David S. Miller
b8059eadf9 [TCP]: Mark tcp_*mem[] __read_mostly.
Suggested by Stephen Hemminger.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-25 01:36:56 -08:00
John Heffner
7b4f4b5ebc [TCP]: Set default max buffers from memory pool size
This patch sets the maximum TCP buffer sizes (available to automatic
buffer tuning, not to setsockopt) based on the TCP memory pool size.
The maximum sndbuf and rcvbuf each will be up to 4 MB, but no more
than 1/128 of the memory pressure threshold.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-25 01:34:07 -08:00
Herbert Xu
2babf9daae [SCTP]: Fix up sctp_rcv return value
I was working on the ipip/xfrm problem and as usual I get side-tracked by
other problems.

As part of an attempt to change the IPv4 protocol handler calling
convention I found that SCTP violated the existing convention.

It's returning non-zero values after freeing the skb.  This is doubly bad
as 1) the skb gets resubmitted; 2) the return value is interpreted as a
protocol number.

This patch changes those return values to zero.

IPv6 doesn't suffer from this problem because it uses a positive return
value as an indication for resubmission.  So the only effect of this patch
there is to increment the IPSTATS_MIB_INDELIVERS counter which IMHO is
the right thing to do.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-25 01:25:29 -08:00
Herbert Xu
9f514950bb [NET]: Take RTNL when unregistering notifier
The netdev notifier call chain is currently unregistered without taking
any locks outside the notifier system.  Because the notifier system itself
does not synchronise unregistration with respect to the calling of the
chain, we as its user need to do our own locking.

We are supposed to take the RTNL for all calls to netdev notifiers, so
taking the RTNL should be sufficient to protect it.

The registration path in dev.c already takes the RTNL so it's OK.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-25 01:24:25 -08:00
David S. Miller
f67ed26f2b [NET]: Ensure device name passed to SO_BINDTODEVICE is NULL terminated.
Found by Solar Designer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-24 15:44:59 -08:00
Peter Chubb
4dc6d9cc38 [BRIDGE]: Unaligned accesses in the ethernet bridge
I see lots of
	kernel unaligned access to 0xa0000001009dbb6f, ip=0xa000000100811591
	kernel unaligned access to 0xa0000001009dbb6b, ip=0xa0000001008115c1
	kernel unaligned access to 0xa0000001009dbb6d, ip=0xa0000001008115f1
messages in my logs on IA64 when using the ethernet bridge with 2.6.16.

Appended is a patch to fix them.

Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-24 15:44:57 -08:00
Alexey Dobriyan
53b3531bbb [PATCH] s/;;/;/g
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:24 -08:00
Paul Jackson
fffb60f93c [PATCH] cpuset memory spread: slab cache format
Rewrap the overly long source code lines resulting from the previous
patch's addition of the slab cache flag SLAB_MEM_SPREAD.  This patch
contains only formatting changes, and no function change.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:23 -08:00
Paul Jackson
4b6a9316fa [PATCH] cpuset memory spread: slab cache filesystems
Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD
memory spreading.

If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's
in a cpuset with the 'memory_spread_slab' option enabled goes to allocate
from such a slab cache, the allocations are spread evenly over all the
memory nodes (task->mems_allowed) allowed to that task, instead of favoring
allocation on the node local to the current cpu.

The following inode and similar caches are marked SLAB_MEM_SPREAD:

    file                               cache
    ====                               =====
    fs/adfs/super.c                    adfs_inode_cache
    fs/affs/super.c                    affs_inode_cache
    fs/befs/linuxvfs.c                 befs_inode_cache
    fs/bfs/inode.c                     bfs_inode_cache
    fs/block_dev.c                     bdev_cache
    fs/cifs/cifsfs.c                   cifs_inode_cache
    fs/coda/inode.c                    coda_inode_cache
    fs/dquot.c                         dquot
    fs/efs/super.c                     efs_inode_cache
    fs/ext2/super.c                    ext2_inode_cache
    fs/ext2/xattr.c (fs/mbcache.c)     ext2_xattr
    fs/ext3/super.c                    ext3_inode_cache
    fs/ext3/xattr.c (fs/mbcache.c)     ext3_xattr
    fs/fat/cache.c                     fat_cache
    fs/fat/inode.c                     fat_inode_cache
    fs/freevxfs/vxfs_super.c           vxfs_inode
    fs/hpfs/super.c                    hpfs_inode_cache
    fs/isofs/inode.c                   isofs_inode_cache
    fs/jffs/inode-v23.c                jffs_fm
    fs/jffs2/super.c                   jffs2_i
    fs/jfs/super.c                     jfs_ip
    fs/minix/inode.c                   minix_inode_cache
    fs/ncpfs/inode.c                   ncp_inode_cache
    fs/nfs/direct.c                    nfs_direct_cache
    fs/nfs/inode.c                     nfs_inode_cache
    fs/ntfs/super.c                    ntfs_big_inode_cache_name
    fs/ntfs/super.c                    ntfs_inode_cache
    fs/ocfs2/dlm/dlmfs.c               dlmfs_inode_cache
    fs/ocfs2/super.c                   ocfs2_inode_cache
    fs/proc/inode.c                    proc_inode_cache
    fs/qnx4/inode.c                    qnx4_inode_cache
    fs/reiserfs/super.c                reiser_inode_cache
    fs/romfs/inode.c                   romfs_inode_cache
    fs/smbfs/inode.c                   smb_inode_cache
    fs/sysv/inode.c                    sysv_inode_cache
    fs/udf/super.c                     udf_inode_cache
    fs/ufs/super.c                     ufs_inode_cache
    net/socket.c                       sock_inode_cache
    net/sunrpc/rpc_pipe.c              rpc_inode_cache

The choice of which slab caches to so mark was quite simple.  I marked
those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache,
inode_cache, and buffer_head, which were marked in a previous patch.  Even
though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same
potentially large file system i/o related slab caches as we need for memory
spreading.

Given that the rule now becomes "wherever you would have used a
SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use
the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain.
Future file system writers will just copy one of the existing file system
slab cache setups and tend to get it right without thinking.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:23 -08:00
Trond Myklebust
1ebbe2b200 Merge branch 'linus' 2006-03-23 23:44:19 -05:00
Linus Torvalds
aca361c1a0 Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (45 commits)
  [PATCH] Restore channel setting after scan.
  [PATCH] hostap: Fix memory leak on PCI probe error path
  [PATCH] hostap: Remove dead code (duplicated idx != 0)
  [PATCH] hostap: Fix unlikely read overrun in CIS parsing
  [PATCH] hostap: Fix double free in prism2_config() error path
  [PATCH] hostap: Fix ap_add_sta() return value verification
  [PATCH] hostap: Fix hw reset after CMDCODE_ACCESS_WRITE timeout
  [PATCH] wireless/airo: cache wireless scans
  [PATCH] wireless/airo: define default MTU
  [PATCH] wireless/airo: clean up printk usage to print device name
  [PATCH] WE-20 for kernel 2.6.16
  [PATCH] softmac: remove function_enter()
  [PATCH] skge: version 1.5
  [PATCH] skge: compute available ring buffers
  [PATCH] skge: dont free skb until multi-part transmit complete
  [PATCH] skge: multicast statistics fix
  [PATCH] skge: rx_reuse called twice
  [PATCH] skge: dont use dev_alloc_skb for rx buffs
  [PATCH] skge: align receive buffers
  [PATCH] sky2: dont need to use dev_kfree_skb_any
  ...
2006-03-23 16:25:49 -08:00
Jeff Garzik
9b7c84899e Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2006-03-23 17:15:41 -05:00
David Woodhouse
4edac92fcf [PATCH] Restore channel setting after scan.
After a scan, we weren't switching back to the original channel if we
were associated with an AP. So NetworkManager's periodic scans would
disrupt connectivity until the ESSID was manually set again. Fix that.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-23 16:18:47 -05:00
Jean Tourrilhes
711e2c33ac [PATCH] WE-20 for kernel 2.6.16
This is version 20 of the Wireless Extensions. This is the
completion of the RtNetlink work I started early 2004, it enables the
full Wireless Extension API over RtNetlink.

	Few comments on the patch :
	o totally driver transparent, no change in drivers needed.
	o iwevent were already RtNetlink based since they were created
(around 2.5.7). This adds all the regular SET and GET requests over
RtNetlink, using the exact same mechanism and data format as iwevents.
	o This is a Kconfig option, as currently most people have no
need for it. Surprisingly, patch is actually small and well
encapsulated.
	o Tested on SMP, attention as been paid to make it 64 bits clean.
	o Code do probably too many checks and could be further
optimised, but better safe than sorry.
	o RtNetlink based version of the Wireless Tools available on
my web page for people inclined to try out this stuff.

	I would also like to thank Alexey Kuznetsov for his helpful
suggestions to make this patch better.

Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-23 07:12:57 -05:00
John W. Linville
9a107aa24a [PATCH] softmac: remove function_enter()
Remove the function_enter() debugging macros.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-23 07:12:36 -05:00
Patrick McHardy
b30bd282cb [IPV6]: ip6_xmit: remove unnecessary NULL ptr check
The sk argument to ip6_xmit is never NULL nowadays since the skb->priority
assigment expects a valid socket.

Coverity #354

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-23 01:17:25 -08:00
Patrick McHardy
1ae39a430b [NET_SCHED]: cls_u32: remove unnecessary NULL-ptr check
In both cases n can't be NULL without crashing anyway.

Coverity #78

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-23 01:16:48 -08:00
Patrick McHardy
a5cdc03003 [IPV4]: Add fib rule netlink notifications
To really make sense of route notifications in the presence of
multiple tables, userspace also needs to be notified about routing
rule updates.  Notifications are sent to the so far unused
RTNLGRP_NOP1 (now RTNLGRP_RULE) group.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-23 01:16:06 -08:00
Steven Whitehouse
ca6549af77 [PKTGEN]: Add MPLS extension.
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-23 01:10:26 -08:00
Jeff Garzik
fa4fa40a99 Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2006-03-22 22:55:57 -05:00
Larry Finger
fe0b06b123 [PATCH] Fix softmac scan
Softmac scanning fails because the stop flag is not cleared before
scanning is started. The attached one-line patch fixes this.

Signed-Off-By: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:17:02 -05:00
Johannes Berg
1196862b79 [PATCH] softmac: remove dead code
This patch removes ieee80211softmac_reassoc which is neither implemented
nor used nor necessary.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:17:01 -05:00
Johannes Berg
b6c7658ef8 [PATCH] softmac: add reassociation code
This patch adds handling of reassociation to softmac when the AP
requests it. Patch from Larry Finger.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:17:01 -05:00
Johannes Berg
b10c991fa4 [PATCH] softmac: update deauth handler to quiet warning
Recently the deauth packet handler was updated to use a deauth packet
struct (identical to the auth packet struct) and this now gives a
warning. This patch updates the code to properly use a deauth struct and
deauth variable.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:17:00 -05:00
Johannes Berg
f484d582d3 [PATCH] trivial fixes to softmac
This patch removes a blank line that shouldn't be there and fixes a
spelling error in softmac.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:59 -05:00
Johannes Berg
7985905106 [PATCH] update copyright in softmac
This patch updates the copyright statements in softmac that I
erroneously added for 2005 only (when we already had 2006).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:59 -05:00
Denis Vlasenko
1a995b45a5 [PATCH] ieee80211_rx_any: filter out packets, call ieee80211_rx or ieee80211_rx_mgt
Version 2 of the patch. Added checks for version 0
and proper from/to DS bits. Even in promisc
mode we won't receive packets from another BSSes.

bcm43xx_rx() contains code to filter out packets from
foreign BSSes and decide whether to call ieee80211_rx
or ieee80211_rx_mgt. This is not bcm specific.

Patch adapts that code and adds it to 80211
as ieee80211_rx_any() function.

Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:58 -05:00
Johannes Berg
4c718cfd7d [PATCH] softmac: move EXPORT_SYMBOL_GPL right after functions
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:58 -05:00
Johannes Berg
9ebdd46681 [PATCH] softmac: add MODULE_DESCRIPTION and MODULE_AUTHORs
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:57 -05:00
Johannes Berg
4855d25b1e [PATCH] softmac: add copyright and license headers
add copyright and license headers to all softmac files

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:56 -05:00
Johannes Berg
b2b9b6518e [PATCH] softmac: some comment stuff
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:56 -05:00
Johannes Berg
bba52d5e9e [PATCH] softmac: properly check return value of ieee80211softmac_alloc_mgt
Properly check return value of ieee80211softmac_alloc_mgt
in ieee80211softmac_disassoc_deauth (patch by Denis Vlasenko)

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:55 -05:00
Johannes Berg
1dc09776d7 [PATCH] softmac: scan at least once before selecting a network by essid
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:55 -05:00
Johannes Berg
48b2e4ce69 [PATCH] softmac: check if disassociation is for us before processing it
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:54 -05:00
Johannes Berg
78e4f36e05 [PATCH] softmac: select "best" network based on rssi
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:54 -05:00
Johannes Berg
51da28a847 [PATCH] softmac: add fixme for disassoc
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:53 -05:00
Johannes Berg
d1469cf2c7 [PATCH] softmac: try to reassociate when being disassociated from the AP
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:53 -05:00
Johannes Berg
2dd50801b3 [PATCH] softmac: correctly use netif_carrier_{on,off}
TODO: add callbacks for ifup/ifdown (see mailing list)

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:52 -05:00
Johannes Berg
5c4df6da58 [PATCH] softmac: convert to use global workqueue
Convert softmac to use global workqueue instead of private one...

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:52 -05:00
Johannes Berg
45867e6a55 [PATCH] softmac: fix Makefiles
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:51 -05:00
Johannes Berg
714e1a5116 [PATCH] softmac: fix some sparse warnings
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:51 -05:00
Johannes Berg
32821837fa [PATCH] make softmac depend on IEEE80211 and EXPERIMENTAL
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:50 -05:00
Johannes Berg
370121e519 [PATCH] wireless: Add softmac layer to the kernel
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-22 22:16:50 -05:00
Alexey Kuznetsov
1a55d57b10 [TCP]: Do not use inet->id of global tcp_socket when sending RST.
The problem is in ip_push_pending_frames(), which uses:

        if (!df) {
                __ip_select_ident(iph, &rt->u.dst, 0);
        } else {
                iph->id = htons(inet->id++);
        }

instead of ip_select_ident().

Right now I think the code is a nonsense. Most likely, I copied it from
old ip_build_xmit(), where it was really special, we had to decide
whether to generate unique ID when generating the first (well, the last)
fragment.

In ip_push_pending_frames() it does not make sense, it should use plain
ip_select_ident() instead.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 14:27:59 -08:00
Patrick McHardy
6a534ee35c [NETFILTER]: Fix undefined references to get_h225_addr
get_h225_addr is exported, but declared static, which fails when
linking statically.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:57:25 -08:00
Patrick McHardy
81fbfd6925 [NETFILTER]: Fix xt_policy address matching
Fix missing inversion in address matching, it was broken during the
conversion to x_tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:56:33 -08:00
Pablo Neira Ayuso
b9f78f9fca [NETFILTER]: nf_conntrack: support for layer 3 protocol load on demand
x_tables matches and targets that require nf_conntrack_ipv[4|6] to work
don't have enough information to load on demand these modules. This
patch introduces the following changes to solve this issue:

o nf_ct_l3proto_try_module_get: try to load the layer 3 connection
tracker module and increases the refcount.
o nf_ct_l3proto_module put: drop the refcount of the module.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:56:08 -08:00
Pablo Neira Ayuso
a45049c51c [NETFILTER]: x_tables: set the protocol family in x_tables targets/matches
Set the family field in xt_[matches|targets] registered.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:55:40 -08:00
Pablo Neira Ayuso
4e3882f773 [NETFILTER]: conntrack: cleanup the conntrack ID initialization
Currently the first conntrack ID assigned is 2, use 1 instead.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:55:11 -08:00
Pablo Neira Ayuso
f0d835835b [NETFILTER]: nfnetlink_queue: fix nfnetlink message size
Fix oversized message, use NLMSG_SPACE just one since it reserves space
for the netlink header and NFA_SPACE for every attribute.

Thanks to Harald Welte for the feedback

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:54:40 -08:00
Pablo Neira Ayuso
1cde64365b [NETFILTER]: ctnetlink: Fix expectaction mask dumping
The expectation mask has some particularities that requires a different
handling. The protocol number fields can be set to non-valid protocols,
ie. l3num is set to 0xFFFF. Since that protocol does not exist, the mask
tuple will not be dumped. Moreover, this results in a kernel panic when
nf_conntrack accesses the array of protocol handlers, that is PF_MAX (0x1F)
long.

This patch introduces the function ctnetlink_exp_dump_mask, that correctly
dumps the expectation mask. Such function uses the l3num value from the
expectation tuple that is a valid layer 3 protocol number. The value of the
l3num mask isn't dumped since it is meaningless from the userspace side.

Thanks to Yasuyuki Kozakai and Patrick McHardy for the feedback.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:54:15 -08:00
Thomas Vgtle
50b521aa54 [NETFILTER]: Fix Kconfig typos
Signed-off-by: Thomas Vgtle <tv@lio96.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:53:48 -08:00
Patrick McHardy
443da0d527 [NETFILTER]: Fix ip6tables breakage from {get,set}sockopt compat layer
do_ipv6_getsockopt returns -EINVAL for unknown options, not
-ENOPROTOOPT as do_ipv6_setsockopt.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 13:53:20 -08:00
Shaun Pereira
9a6b9f2e76 [X25]: dte facilities 32 64 ioctl conversion
Allows dte facility patch to use 32 64 bit ioctl conversion mechanism

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 00:02:00 -08:00
Shaun Pereira
a64b7b936d [X25]: allow ITU-T DTE facilities for x25
Allows use of the optional user facility to insert ITU-T
(http://www.itu.int/ITU-T/) specified DTE facilities in call set-up x25
packets.  This feature is optional; no facilities will be added if the ioctl
is not used, and call setup packet remains the same as before.

If the ioctls provided by the patch are used, then a facility marker will be
added to the x25 packet header so that the called dte address extension
facility can be differentiated from other types of facilities (as described in
the ITU-T X.25 recommendation) that are also allowed in the x25 packet header.

Facility markers are made up of two octets, and may be present in the x25
packet headers of call-request, incoming call, call accepted, clear request,
and clear indication packets.  The first of the two octets represents the
facility code field and is set to zero by this patch.  The second octet of the
marker represents the facility parameter field and is set to 0x0F because the
marker will be inserted before ITU-T type DTE facilities.

Since according to ITU-T X.25 Recommendation X.25(10/96)- 7.1 "All networks
will support the facility markers with a facility parameter field set to all
ones or to 00001111", therefore this patch should work with all x.25 networks.

While there are many ITU-T DTE facilities, this patch implements only the
called and calling address extension, with placeholders in the
x25_dte_facilities structure for the rest of the facilities.

Testing:

This patch was tested using a cisco xot router connected on its serial ports
to an X.25 network, and on its lan ports to a host running an xotd daemon.

It is also possible to test this patch using an xotd daemon and an x25tap
patch, where the xotd daemons work back-to-back without actually using an x.25
network.  See www.fyonne.net for details on how to do this.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Andrew Hendry <ahendry@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 00:01:31 -08:00
Shaun Pereira
bac37ec830 [X25]: fix kernel error message 64 bit kernel
Fixes the following error from kernel
T2 kernel: schedule_timeout:
wrong timeout value ffffffffffffffff from ffffffff88164796

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 00:00:40 -08:00
Shaun Pereira
1b06e6ba25 [X25]: ioctl conversion 32 bit user to 64 bit kernel
To allow 32 bit x25 module structures to be passed to a 64 bit kernel via
ioctl using the new compat_sock_ioctl registration mechanism instead of the
obsolete 'register_ioctl32_conversion into hash table' mechanism

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22 00:00:12 -08:00
Shaun Pereira
f0ac261441 [NET]: socket timestamp 32 bit handler for 64 bit kernel
Get socket timestamp handler function that does not use the
ioctl32_hash_table.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-21 23:59:39 -08:00
Shaun Pereira
89bbfc95d6 [NET]: allow 32 bit socket ioctl in 64 bit kernel
Since the register_ioctl32_conversion() patch in the kernel is now obsolete,
provide another method to allow 32 bit user space ioctls to reach the kernel.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-21 23:58:08 -08:00
Tobias Klauser
67b52e554b [BLUETOOTH]: Return negative error constant
Return negative error constant.

Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-21 23:53:16 -08:00
Trond Myklebust
ac58c9059d Merge branch 'linus' 2006-03-21 12:08:21 -05:00
Jing Min Zhao
5e35941d99 [NETFILTER]: Add H.323 conntrack/NAT helper
Signed-off-by: Jing Min Zhao <zhaojignmin@hotmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 23:41:17 -08:00
Ingo Oeser
322f74a432 [IPV6]: Cleanups for net/ipv6/addrconf.c (kzalloc, early exit) v2
Here are some possible (and trivial) cleanups.
- use kzalloc() where possible
- invert allocation failure test like
  if (object) {
        /* Rest of function here */
  }
  to

  if (object == NULL)
        return NULL;

  /* Rest of function here */

Signed-off-by: Ingo Oeser <ioe-lkml@rameria.de>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 23:01:47 -08:00
Ingo Oeser
0c600eda4b [IPV6]: Nearly complete kzalloc cleanup for net/ipv6
Stupidly use kzalloc() instead of kmalloc()/memset()
everywhere where this is possible in net/ipv6/*.c .

Signed-off-by: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 23:01:32 -08:00
Ingo Oeser
78c784c47a [IPV6]: Cleanup of net/ipv6/reassambly.c
Two minor cleanups:

1. Using kzalloc() in fraq_alloc_queue()
   saves the memset() in ipv6_frag_create().

2. Invert sense of if-statements to streamline code.
   Inverts the comment, too.

Signed-off-by: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 23:01:17 -08:00
Andrew Morton
b3e83d6d18 [BRIDGE]: Remove duplicate const from is_link_local() argument type.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 23:00:56 -08:00
Adrian Bunk
3400112794 [DECNET]: net/decnet/dn_route.c: fix inconsequent NULL checking
The Coverity checker noted this inconsequent NULL checking in
dnrt_drop().

Since all callers ensure that NULL isn't passed, we can simply remove
the check.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 23:00:29 -08:00
Stephen Hemminger
12ac84c4a9 [BRIDGE]: use LLC to send STP
The bridge code can use existing LLC output code when building
spanning tree protocol packets.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:59:49 -08:00
Stephen Hemminger
f4ad2b162d [LLC]: llc_mac_hdr_init const arguments
Cleanup of LLC.  llc_mac_hdr_init can take constant arguments,
and it is defined twice once in llc_output.h that is otherwise unused.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:59:36 -08:00
Stephen Hemminger
fda93d92d7 [BRIDGE]: allow show/store of group multicast address
Bridge's communicate with each other using Spanning Tree Protocol
over a standard multicast address. There are times when testing or
layering bridges over existing topologies or tunnels, when it is
useful to use alternative multicast addresses for STP packets.

The 802.1d standard has some unused addresses, that can be used for this.
This patch is restrictive in that it only allows one of the possible
addresses in the standard.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:59:21 -08:00
Stephen Hemminger
cf0f02d04a [BRIDGE]: use llc for receiving STP packets
Use LLC for the receive path of Spanning Tree Protocol packets.
This allows link local multicast packets to be received by
other protocols (if they care), and uses the existing LLC
code to get STP packets back into bridge code.

The bridge multicast address is also checked, so bridges using
other link local multicast addresses are ignored. This allows
for use of different multicast addresses to define separate STP
domains.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:59:06 -08:00
Stephen Hemminger
18fdb2b25b [BRIDGE]: stp timer to jiffies cleanup
Cleanup the get/set of bridge timer value in the packets.
It is clearer not to bury the conversion in macro.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:58:49 -08:00
Stephen Hemminger
f8ae737dee [BRIDGE]: forwarding remove unneeded preempt and bh diasables
Optimize the forwarding and transmit paths. Both places are
called with bottom half/no preempt so there is no need to use
spin_lock_bh or rcu_read_lock.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:58:36 -08:00
Stephen Hemminger
fdeabdefb2 [BRIDGE]: netfilter inline cleanup
Move nf_bridge_alloc from header file to the one place it is
used and optimize it.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:58:21 -08:00
Stephen Hemminger
8b42ec3926 [BRIDGE]: netfilter VLAN macro cleanup
Fix the VLAN macros in bridge netfilter code. Macros should
not depend on magic variables.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:58:05 -08:00
Stephen Hemminger
f8a2602861 [BRIDGE]: netfilter dont use __constant_htons
Only use__constant_htons() for initializers and switch cases.
For other uses, it is just as efficient and clearer to use htons

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:57:46 -08:00
Stephen Hemminger
789bc3e5b6 [BRIDGE]: netfilter whitespace
Run br_netfilter through Lindent to fix whitespace.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:57:32 -08:00
Stephen Hemminger
d5513a7d32 [BRIDGE]: optimize frame pass up
The netfilter hook that is used to receive frames doesn't need to be a
stub.  It is only called in two ways, both of which ignore the return
value.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:57:18 -08:00
Stephen Hemminger
cee4854122 [BRIDGE]: use kzalloc
Use kzalloc versus kmalloc+memset. Also don't need to do
memset() of bridge address since it is in netdev private data
that is already zero'd in alloc_netdev.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:57:03 -08:00
Stephen Hemminger
3b781fa10b [BRIDGE]: use kcalloc
Use kcalloc rather than kmalloc + memset.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:56:50 -08:00
Stephen Hemminger
a95fcacdc3 [BRIDGE]: use setup_timer
Use the now standard setup_timer function.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:56:38 -08:00
Stephen Hemminger
e3efe08e9a [BRIDGE]: remove unneeded bh disables
The STP timers run off softirq (kernel timers), so there is no need to
disable bottom half in the spin locks.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:56:25 -08:00
Andrew Morton
9ebddc1aa3 [BRIDGE] br_netfilter: Warning fixes.
net/bridge/br_netfilter.c: In function `br_nf_pre_routing':
net/bridge/br_netfilter.c:427: warning: unused variable `vhdr'
net/bridge/br_netfilter.c:445: warning: unused variable `vhdr'

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:55:24 -08:00
Andrew Morton
74ca4e5acd [BRIDGE] ebtables: Build fix.
net/bridge/netfilter/ebtables.c:1481: warning: initialization makes pointer from integer without a cast

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:55:02 -08:00
David S. Miller
dbeff12b4d [INET]: Fix typo in Arnaldo's connection sock compat fixups.
"struct inet_csk" --> "struct inet_connection_sock" :-)

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:52:32 -08:00
Arnaldo Carvalho de Melo
8ca0d17bd7 [DCCP] feat: Pass dccp_minisock ptr where only the minisock is used
This is in preparation for having a dccp_minisock embedded into
dccp_request_sock so that feature negotiation can be done prior to
creating the full blown dccp_sock.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:51:53 -08:00
Arnaldo Carvalho de Melo
a4bf390242 [DCCP] minisock: Rename struct dccp_options to struct dccp_minisock
This will later be included in struct dccp_request_sock so that we can
have per connection feature negotiation state while in the 3way
handshake, when we clone the DCCP_ROLE_LISTEN socket (in
dccp_create_openreq_child) we'll just copy this state from
dreq_minisock to dccps_minisock.

Also the feature negotiation and option parsing code will mostly touch
dccps_minisock, which will simplify some stuff.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:50:58 -08:00
Arnaldo Carvalho de Melo
543d9cfeec [NET]: Identation & other cleanups related to compat_[gs]etsockopt cset
No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs
to just after the function exported, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:48:35 -08:00
Arnaldo Carvalho de Melo
f94691acf9 [SK_BUFF]: export skb_pull_rcsum
*** Warning: "skb_pull_rcsum" [net/bridge/bridge.ko] undefined!
*** Warning: "skb_pull_rcsum" [net/8021q/8021q.ko] undefined!
*** Warning: "skb_pull_rcsum" [drivers/net/pppoe.ko] undefined!
*** Warning: "skb_pull_rcsum" [drivers/net/ppp_generic.ko] undefined!

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:47:55 -08:00
Arnaldo Carvalho de Melo
dec73ff029 [ICSK] compat: Introduce inet_csk_compat_[gs]etsockopt
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:46:16 -08:00
Arnaldo Carvalho de Melo
d1d47beef8 [SNAP]: Remove leftover unused hdr variable
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:45:37 -08:00
Dmitry Mishin
3fdadf7d27 [NET]: {get|set}sockopt compatibility layer
This patch extends {get|set}sockopt compatibility layer in order to
move protocol specific parts to their place and avoid huge universal
net/compat.c file in the future.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:45:21 -08:00
Dave Jones
c750360938 [IPV6]: remove useless test in ip6_append_data
We've already dereferenced 'np' a dozen
times at this point, so it's safe to say it's not null.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:44:52 -08:00
Adrian Bunk
afb5bb5744 [PKT_SCHED]: Let NET_CLS_ACT no longer depend on EXPERIMENTAL
This option should IMHO no longer depend on EXPERIMENTAL.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
ACKed-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:44:24 -08:00
Herbert Xu
cbb042f9e1 [NET]: Replace skb_pull/skb_postpull_rcsum with skb_pull_rcsum
We're now starting to have quite a number of places that do skb_pull
followed immediately by an skb_postpull_rcsum.  We can merge these two
operations into one function with skb_pull_rcsum.  This makes sense
since most pull operations on receive skb's need to update the
checksum.

I've decided to make this out-of-line since it is fairly big and the
fast path where hardware checksums are enabled need to call
csum_partial anyway.

Since this is a brand new function we get to add an extra check on the
len argument.  As it is most callers of skb_pull ignore its return
value which essentially means that there is no check on the len
argument.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:43:56 -08:00
Steven Whitehouse
ecba320f2e [DECnet]: Use RCU locking in dn_rules.c
As per Robert Olsson's patch for ipv4, this is the DECnet
version to keep the code "in step". It changes the list
of rules to use RCU rather than an rwlock.

Inspired-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:43:28 -08:00
Patrick Caulfield
c60992db46 [DECnet]: Patch to fix recvmsg() flag check
This patch means that 64bit kernel/32bit userland platforms will now
work correctly with DECnet.

Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:43:05 -08:00
Steven Whitehouse
c4ea94ab37 [DECnet]: Endian annotation and fixes for DECnet.
The typedef for dn_address has been removed in favour of using __le16
or __u16 directly as appropriate. All the DECnet header files are
updated accordingly.

The byte ordering of dn_eth2dn() and dn_dn2eth() are both changed
since just about all their callers wanted network order rather than
host order, so the conversion is now done in the functions themselves.

Several missed endianess conversions have been picked up during the
conversion process. The nh_gw field in struct dn_fib_info has been
changed from a 32 bit field to 16 bits as it ought to be.

One or two cases of using htons rather than dn_htons in the routing
code have been found and fixed.

There are still a few warnings to fix, but this patch deals with the
important cases.

Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:42:39 -08:00
Catherine Zhang
2c7946a7bf [SECURITY]: TCP/UDP getpeersec
This patch implements an application of the LSM-IPSec networking
controls whereby an application can determine the label of the
security association its TCP or UDP sockets are currently connected to
via getsockopt and the auxiliary data mechanism of recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the
security context of an IPSec security association a particular TCP or
UDP socket is using.  The application can then use this security
context to determine the security context for processing on behalf of
the peer at the other end of this connection.  In the case of UDP, the
security context is for each individual packet.  An example
application is the inetd daemon, which could be modified to start
daemons running at security contexts dependent on the remote client.

Patch design approach:

- Design for TCP
The patch enables the SELinux LSM to set the peer security context for
a socket based on the security context of the IPSec security
association.  The application may retrieve this context using
getsockopt.  When called, the kernel determines if the socket is a
connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry
cache on the socket to retrieve the security associations.  If a
security association has a security context, the context string is
returned, as for UNIX domain sockets.

- Design for UDP
Unlike TCP, UDP is connectionless.  This requires a somewhat different
API to retrieve the peer security context.  With TCP, the peer
security context stays the same throughout the connection, thus it can
be retrieved at any time between when the connection is established
and when it is torn down.  With UDP, each read/write can have
different peer and thus the security context might change every time.
As a result the security context retrieval must be done TOGETHER with
the packet retrieval.

The solution is to build upon the existing Unix domain socket API for
retrieving user credentials.  Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).

Patch implementation details:

- Implementation for TCP
The security context can be retrieved by applications using getsockopt
with the existing SO_PEERSEC flag.  As an example (ignoring error
checking):

getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen);
printf("Socket peer context is: %s\n", optbuf);

The SELinux function, selinux_socket_getpeersec, is extended to check
for labeled security associations for connected (TCP_ESTABLISHED ==
sk->sk_state) TCP sockets only.  If so, the socket has a dst_cache of
struct dst_entry values that may refer to security associations.  If
these have security associations with security contexts, the security
context is returned.

getsockopt returns a buffer that contains a security context string or
the buffer is unmodified.

- Implementation for UDP
To retrieve the security context, the application first indicates to
the kernel such desire by setting the IP_PASSSEC option via
getsockopt.  Then the application retrieves the security context using
the auxiliary data mechanism.

An example server application for UDP should look like this:

toggle = 1;
toggle_len = sizeof(toggle);

setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, &toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
    cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
    if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
        cmsg_hdr->cmsg_level == SOL_IP &&
        cmsg_hdr->cmsg_type == SCM_SECURITY) {
        memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
    }
}

ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow
a server socket to receive security context of the peer.  A new
ancillary message type SCM_SECURITY.

When the packet is received we get the security context from the
sec_path pointer which is contained in the sk_buff, and copy it to the
ancillary message space.  An additional LSM hook,
selinux_socket_getpeersec_udp, is defined to retrieve the security
context from the SELinux space.  The existing function,
selinux_socket_getpeersec does not suit our purpose, because the
security context is copied directly to user space, rather than to
kernel space.

Testing:

We have tested the patch by setting up TCP and UDP connections between
applications on two machines using the IPSec policies that result in
labeled security associations being built.  For TCP, we can then
extract the peer security context using getsockopt on either end.  For
UDP, the receiving end can retrieve the security context using the
auxiliary data mechanism of recvmsg.

Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:41:23 -08:00
Patrick McHardy
be33690d8f [XFRM]: Fix aevent related crash
When xfrm_user isn't loaded xfrm_nl is NULL, which makes IPsec crash because
xfrm_aevent_is_on passes the NULL pointer to netlink_has_listeners as socket.
A second problem is that the xfrm_nl pointer is not cleared when the socket
is releases at module unload time.

Protect references of xfrm_nl from outside of xfrm_user by RCU, check
that the socket is present in xfrm_aevent_is_on and set it to NULL
when unloading xfrm_user.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:40:54 -08:00
Rick Jones
15d99e02ba [TCP]: sysctl to allow TCP window > 32767 sans wscale
Back in the dark ages, we had to be conservative and only allow 15-bit
window fields if the window scale option was not negotiated.  Some
ancient stacks used a signed 16-bit quantity for the window field of
the TCP header and would get confused.

Those days are long gone, so we can use the full 16-bits by default
now.

There is a sysctl added so that we can still interact with such old
stacks

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:40:29 -08:00
Neil Horman
abd596a4b6 [IPV4] ARP: Alloc acceptance of unsolicited ARP via netdevice sysctl.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:39:47 -08:00
Per Liden
87546b1c25 [TIPC]: Avoid compiler warning
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:38:33 -08:00
Per Liden
de70c5ba43 [TIPC]: Reduce stack usage
The node_map struct can be quite large (516 bytes) and allocating two of
them on the stack is not a good idea since we might only have a 4K stack
to start with.

Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:38:14 -08:00
Adrian Bunk
988f088a8e [TIPC]: Cleanups
This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 the following unused global functions:
  - name_table.c: tipc_nametbl_print()
  - name_table.c: tipc_nametbl_dump()
  - net.c: tipc_net_next_node()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:37:52 -08:00
Per Liden
7c501a5960 [TIPC]: Remove unused functions
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:37:27 -08:00
Sam Ravnborg
05790c6456 [TIPC]: Remove inlines from *.c
With reference to latest discussions on linux-kernel with respect to
inline here is a patch for tipc to remove all inlines as used in
the .c files. See also chapter 14 in Documentation/CodingStyle.

Before:
   text        data     bss     dec     hex filename
 102990        5292    1752  110034   1add2 tipc.o

Now:
   text        data     bss     dec     hex filename
 101190        5292    1752  108234   1a6ca tipc.o

This is a nice text size reduction which will improve icache usage.
In some cases bigger (> 4 lines) functions where declared inline
and used in many places, they are most probarly no longer inlined by gcc
resulting in the size reduction.
There are several one liners that no longer are declared inline, but gcc
should inline these just fine without the inline hint.

With this patch applied one warning is added about an unused static
function - that was hidded by utilising inline before.
The function in question were kept so this patch is solely a
inline removal patch.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:37:04 -08:00
Sam Ravnborg
1fc54d8f49 [TIPC]: Fix simple sparse warnings
Tried to run the new tipc stack through sparse.
Following patch fixes all cases where 0 was used
as replacement of NULL.
Use NULL to document this is a pointer and to silence sparse.

This brough sparse warning count down with 127 to 24 warnings.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:36:47 -08:00
David S. Miller
edb2c34fb2 [NETFILTER]: Fix warnings in ip_nat_snmp_basic.c
net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'asn1_header_decode':
net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'len' may be used uninitialized in this function
net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'def' may be used uninitialized in this function
net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'snmp_translate':
net/ipv4/netfilter/ip_nat_snmp_basic.c:672: warning: 'l' may be used uninitialized in this function
net/ipv4/netfilter/ip_nat_snmp_basic.c:668: warning: 'type' may be used uninitialized in this function

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:36:21 -08:00
David S. Miller
fb9504964d [DCCP]: Fix uninitialized var warnings in dccp_parse_options().
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:36:01 -08:00
Ingo Molnar
57b47a53ec [NET]: sem2mutex part 2
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:35:41 -08:00
Arjan van de Ven
4a3e2f711a [NET] sem2mutex: net/
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:33:17 -08:00
Stephen Hemminger
1533306186 [NET]: dev_put/dev_hold cleanup
Get rid of the old __dev_put macro that is just a hold over from pre 2.6
kernel.  And turn dev_hold into an inline instead of a macro.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:32:28 -08:00
Arnaldo Carvalho de Melo
2d0817d11e [DCCP] options: Make dccp_insert_options & friends yell on error
And not the silly LIMIT_NETDEBUG and silently return without inserting
the option requested.

Also drop some old debugging messages associated to option insertion.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:32:06 -08:00
Arnaldo Carvalho de Melo
110bae4efb [DCCP]: Remove leftover dccp_send_response prototype
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:31:46 -08:00
Arnaldo Carvalho de Melo
c5fed1597e [DCCP]: ditch dccp_v[46]_ctl_send_ack
Merging it with its only user: dccp_v[46]_reqsk_send_ack.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:31:26 -08:00
Arnaldo Carvalho de Melo
118b2c9532 [DCCP]: Use sk->sk_prot->max_header consistently for non-data packets
Using this also provides opportunities for introducing
inet_csk_alloc_skb that would call alloc_skb, account it to the sock
and skb_reserve(max_header), but I'll leave this for later, for now
using sk_prot->max_header consistently is enough.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:31:09 -08:00
Arnaldo Carvalho de Melo
e5a6de915b [DCCP] options: Fix handling of ackvecs in DATA packets
I.e. they should be just ignored, but we have to use 'break', not 'continue',
as we have to possibly reset the mandatory flag.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:30:51 -08:00
David S. Miller
aa837b5bbd [ATM]: Fix build after neigh->parms->neigh_destructor change.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:30:23 -08:00
Benjamin LaHaise
6cb153cab9 [NET]: use fget_light() in net/socket.c
Here's an updated copy of the patch to use fget_light in net/socket.c.
Rerunning the tests show a drop of ~80Mbit/s on average, which looks
bad until you see the drop in cpu usage from ~89% to ~82%.  That will
get fixed in another patch...

Before: max 8113.70, min 8026.32, avg 8072.34
 87380  16384  16384    10.01      8045.55   87.11    87.11    1.774   1.774
 87380  16384  16384    10.01      8065.14   90.86    90.86    1.846   1.846
 87380  16384  16384    10.00      8077.76   89.85    89.85    1.822   1.822
 87380  16384  16384    10.00      8026.32   89.80    89.80    1.833   1.833
 87380  16384  16384    10.01      8108.59   89.81    89.81    1.815   1.815
 87380  16384  16384    10.01      8034.53   89.01    89.01    1.815   1.815
 87380  16384  16384    10.00      8113.70   90.45    90.45    1.827   1.827
 87380  16384  16384    10.00      8111.37   89.90    89.90    1.816   1.816
 87380  16384  16384    10.01      8077.75   87.96    87.96    1.784   1.784
 87380  16384  16384    10.00      8062.70   90.25    90.25    1.834   1.834

After: max 8035.81, min 7963.69, avg 7998.14
 87380  16384  16384    10.01      8000.93   82.11    82.11    1.682   1.682
 87380  16384  16384    10.01      8016.17   83.67    83.67    1.710   1.710
 87380  16384  16384    10.01      7963.69   83.47    83.47    1.717   1.717
 87380  16384  16384    10.01      8014.35   81.71    81.71    1.671   1.671
 87380  16384  16384    10.00      7967.68   83.41    83.41    1.715   1.715
 87380  16384  16384    10.00      7995.22   81.00    81.00    1.660   1.660
 87380  16384  16384    10.00      8002.61   83.90    83.90    1.718   1.718
 87380  16384  16384    10.00      8035.81   81.71    81.71    1.666   1.666
 87380  16384  16384    10.01      8005.36   82.56    82.56    1.690   1.690
 87380  16384  16384    10.00      7979.61   82.50    82.50    1.694   1.694

Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:27:12 -08:00
Stephen Hemminger
8aca8a27d9 [NET]: minor net_rx_action optimization
The functions list_del followed by list_add_tail is equivalent to the
existing inline list_move_tail. list_move_tail avoids unnecessary
_LIST_POISON.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:26:39 -08:00
Michael S. Tsirkin
c5ecd62c25 [NET]: Move destructor from neigh->ops to neigh_params
struct neigh_ops currently has a destructor field, which no in-kernel
drivers outside of infiniband use.  The infiniband/ulp/ipoib in-tree
driver stashes some info in the neighbour structure (the results of
the second-stage lookup from ARP results to real link-level path), and
it uses neigh->ops->destructor to get a callback so it can clean up
this extra info when a neighbour is freed.  We've run into problems
with this: since the destructor is in an ops field that is shared
between neighbours that may belong to different net devices, there's
no way to set/clear it safely.

The following patch moves this field to neigh_parms where it can be
safely set, together with its twin neigh_setup.  Two additional
patches in the patch series update ipoib to use this new interface.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:25:41 -08:00
Luiz Capitulino
53dcb0e38c [PKTGEN]: Updates version.
Due to the thread's lock changes, we're at a new version now.

Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:25:05 -08:00
Luiz Capitulino
6146e6a43b [PKTGEN]: Removes thread_{un,}lock() macros.
As suggested by Arnaldo, this patch replaces the
thread_lock()/thread_unlock() by directly calls to
mutex_lock()/mutex_unlock().

This change makes the code a bit more readable, and the direct calls
are used everywhere in the kernel.

Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:24:45 -08:00
Luiz Capitulino
222fa07665 [PKTGEN]: Convert thread lock to mutexes.
pktgen's thread semaphores are strict mutexes, convert them to the
mutex implementation.

Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:24:27 -08:00
Stephen Hemminger
6756ae4b4e [NET]: Convert RTNL to mutex.
This patch turns the RTNL from a semaphore to a new 2.6.16 mutex and
gets rid of some of the leftover legacy.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:23:58 -08:00
David S. Miller
253aa11578 [IPSEC] xfrm_user: Kill PAGE_SIZE check in verify_sec_ctx_len()
First, it warns when PAGE_SIZE >= 64K because the ctx_len
field is 16-bits.

Secondly, if there are any real length limitations it can
be verified by the security layer security_xfrm_state_alloc()
call.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:23:35 -08:00
Baruch Even
50bf3e224a [TCP] H-TCP: Better time accounting
Instead of estimating the time since the last congestion event, count
it directly.

Signed-off-by: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:23:10 -08:00
Baruch Even
0bc6d90b82 [TCP] H-TCP: Account for delayed-ACKs
Account for delayed-ACKs in H-TCP.

Delayed-ACKs cause H-TCP to be less aggressive than its design calls
for. It is especially true when the receiver is a Linux machine where
the average delayed ack is over 3 packets with values of 7 not unheard
of.

Signed-off-By: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:22:47 -08:00
Baruch Even
c33ad6e476 [TCP] H-TCP: Use msecs_to_jiffies
Use functions to calculate jiffies from milliseconds and not the old,
crude method of dividing HZ by a value. Ensures more accurate values
even in the face of strange HZ values.

Signed-off-By: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:22:20 -08:00
Luiz Capitulino
65a3980e6b [PKTGEN]: Updates version.
With all the previous changes, we're at a new version now.

Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:18:31 -08:00
Luiz Capitulino
c26a80168f [PKTGEN]: Ports if_list to the in-kernel implementation.
This patch ports the per-thread interface list list to the in-kernel
linked list implementation. In the general, the resulting code is a
bit simpler.

Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:18:16 -08:00
Luiz Capitulino
8024bb2454 [PKTGEN]: Fix Initialization fail leak.
Even if pktgen's thread initialization fails for all CPUs, the module
will be successfully loaded.

This patch changes that behaivor, by returning an error on module load time,
and also freeing all the resources allocated. It also prints a warning if a
thread initialization has failed.

Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:17:55 -08:00
Luiz Capitulino
12e1872328 [PKTGEN]: Fix kernel_thread() fail leak.
Free all the alocated resources if kernel_thread() call fails.

Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:17:00 -08:00
Luiz Capitulino
cdcdbe0b17 [PKTGEN]: Ports thread list to Kernel list implementation.
The final result is a simpler and smaller code.

Note that I'm adding a new member in the struct pktgen_thread called
'removed'. The reason is that I didn't find a better wait condition to
be used in the place of the replaced one.

Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:16:40 -08:00
Luiz Capitulino
222f180658 [PKTGEN]: Lindent run.
Lindet run, with some fixes made by hand.

Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:16:13 -08:00
Arnaldo Carvalho de Melo
6df9424a9c [DCCP] options: Fix some aspects of mandatory option processing
According to dccp draft (draft-ietf-dccp-spec-13.txt) section 5.8.2
(Mandatory Option) the following patch correct the handling of the
following cases:

1) "... and any Mandatory options received on DCCP-Data packets MUST be
  ignored."

2) "The connection is in error and should be reset with Reset Code 5, ...
  if option O is absent (Mandatory was the last byte of the option list), or
  if option O equals Mandatory."

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:06:02 -08:00
Arnaldo Carvalho de Melo
c0c736db7e [DCCP] ccid2: coding style cleanups
No changes in the logic where made.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:05:37 -08:00
Arnaldo Carvalho de Melo
45329e71ee [DCCP] ipv6: cleanups
No changes in the logic were made, just removing trailing whitespaces,
etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:01:29 -08:00
Arnaldo Carvalho de Melo
c4d9390941 [ICSK]: Introduce inet_csk_ctl_sock_create
Consolidating open coded sequences in tcp and dccp, v4 and v6.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:01:03 -08:00
Arnaldo Carvalho de Melo
7247887357 [DCCP] ipv6: Add missing ipv6 control socket
I guess I forgot to add it, nah, now it just works:

18:04:33.274066 IP6 ::1.1476 > ::1.5001: request (service=0)
18:04:33.334482 IP6 ::1.5001 > ::1.1476: reset (code=bad_service_code)

Ditched IP_DCCP_UNLOAD_HACK, as now we would have to do it for both
IPv6 and IPv4, so I'll come up with another way for freeing the
control sockets in upcoming changesets.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 22:00:37 -08:00
Arnaldo Carvalho de Melo
c25a18ba34 [DCCP]: Uninline some functions
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:58:56 -08:00
Adrian Bunk
5e0817f84c [DCCP] ipv4: make struct dccp_v4_prot static
There's no reason for struct dccp_v4_prot being global.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:58:29 -08:00
David S. Miller
d76e60a5b5 [IPV6]: Fix some code/comment formatting in ip6_dst_output().
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:35:50 -08:00
Robert Olsson
06ef921d60 [IPV4]: fib_trie stats fix
fib_triestats has been buggy and caused oopses some platforms as
openwrt.  The patch below should cure those problems.

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:35:01 -08:00
Robert Olsson
5ddf0eb2bf [IPV4]: fib_trie initialzation fix
In some kernel configs /proc functions seems to be accessed before the
trie is initialized. The patch below checks for this.

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:34:12 -08:00
John Heffner
0e7b13685f [TCP] mtu probing: move tcp-specific data out of inet_connection_sock
This moves some TCP-specific MTU probing state out of
inet_connection_sock back to tcp_sock.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:32:58 -08:00
Benjamin LaHaise
e9df7d7f58 [AF_UNIX]: use shift instead of integer division
The patch below replaces a divide by 2 with a shift -- sk_sndbuf is an
integer, so gcc emits an idiv, which takes 10x longer than a shift by 1.
This improves af_unix bandwidth by ~6-10K/s.  Also, tidy up the comment
to fit in 80 columns while we're at it.

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:29:05 -08:00
Jrn Engel
231d06ae82 [NET]: Uninline kfree_skb and allow NULL argument
o Uninline kfree_skb, which saves some 15k of object code on my notebook.

o Allow kfree_skb to be called with a NULL argument.

  Subsequent patches can remove conditional from drivers and further
  reduce source and object size.

Signed-off-by: Jrn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:28:35 -08:00
Arnaldo Carvalho de Melo
2e1f47c74c [LLC]: Fix sap refcounting
Thanks to Leslie Harlley Watter <leslie@watter.org> for reporting the
problem an testing this patch.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:28:11 -08:00
Arnaldo Carvalho de Melo
2342c990bb [LLC]: Replace __inline__ with inline
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:27:43 -08:00
Arnaldo Carvalho de Melo
9c005e018c [LLC]: Fix struct proto .name
Cut'n'paste error from ddp_proto.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:27:23 -08:00
Arthur Kepner
95ed63f791 [NET] pktgen: Fix races between control/worker threads.
There's a race in pktgen which can lead to a double
free of a pktgen_dev's skb. If a worker thread is in
the midst of doing fill_packet(), and the controlling
thread gets a "stop" message, the already freed skb
can be freed once again in pktgen_stop_device(). This
patch gives all responsibility for cleaning up a
pktgen_dev's skb to the associated worker thread.

Signed-off-by: Arthur Kepner <akepner@sgi.com>
Acked-by: Robert Olsson <Robert.Olsson@data.slu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:26:56 -08:00
Arnaldo Carvalho de Melo
b61fafc4ef [DCCP]: Move the IPv4 specific bits from proto.c to ipv4.c
With this patch in place we can break down the complexity by better
compartmentalizing the code that is common to ipv6 and ipv4.

Now we have these modules:
Module                  Size  Used by
dccp_diag               1344  0
inet_diag               9448  1 dccp_diag
dccp_ccid3             15856  0
dccp_tfrc_lib          12320  1 dccp_ccid3
dccp_ccid2              5764  0
dccp_ipv4              16996  2
dccp                   48208  4 dccp_diag,dccp_ccid3,dccp_ccid2,dccp_ipv4

dccp_ipv6 still requires dccp_ipv4 due to dccp_ipv6_mapped, that is
the next target to work on the "hey, ipv4 is legacy, I only want ipv6
dude!" direction.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:25:11 -08:00
Arnaldo Carvalho de Melo
46f09ffa7d [DCCP]: Rename init_dccp_v4_mibs to dccp_mib_init
And introduce dccp_mib_exit grouping previously open coded sequence.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:24:42 -08:00
Arnaldo Carvalho de Melo
075ae86611 [DCCP]: Move dccp_hashinfo from ipv4.c to the core
As it is used by both ipv4 and ipv6.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:24:19 -08:00
Arnaldo Carvalho de Melo
0a1ec676dd [DCCP]: Dont use dccp_v4_checksum in dccp_make_response
dccp_make_response is shared by ipv4/6 and the ipv6 code was
recalculating the checksum, not good, so move the dccp_v4_checksum
call to dccp_v4_send_response.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:23:59 -08:00
Arnaldo Carvalho de Melo
c985ed705f [DCCP]: Move dccp_[un]hash from ipv4.c to the core
As this is used by both ipv4 and ipv6 and is not ipv4 specific.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:23:39 -08:00
Arnaldo Carvalho de Melo
3e0fadc51f [DCCP]: Move dccp_v4_{init,destroy}_sock to the core
Removing one more ipv6 uses ipv4 stuff case in dccp land.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 21:23:15 -08:00
J. Bruce Fields
0e19c1ea2f SUNRPC,RPCSEC_GSS: spkm3: import contexts using NID_cast5_cbc
Import the NID_cast5_cbc from the userland context. Not used.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 23:24:40 -05:00
J. Bruce Fields
eaa82edf20 SUNRPC,RPCSEC_GSS: fix krb5 sequence numbers.
Use a spinlock to ensure unique sequence numbers when creating krb5 gss tokens.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 23:24:04 -05:00
J. Bruce Fields
9e57b302cf SUNRPC,RPCSEC_GSS: remove unnecessary kmalloc of a checksum
Remove unnecessary kmalloc of temporary space to hold the md5 result; it's
small enough to just put on the stack.

This code may be called to process rpc's necessary to perform writes, so
there's a potential deadlock whenever we kmalloc() here.  After this a
couple kmalloc()'s still remain, to be removed soon.

This also fixes a rare double-free on error noticed by coverity.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 23:23:11 -05:00
Arnaldo Carvalho de Melo
017487d7d1 [DCCP]: Generalize dccp_v4_send_reset
Renaming it to dccp_send_reset and moving it from the ipv4 specific
code to the core dccp code.

This fixes some bugs in IPV6 where timers would send v4 resets, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:25:24 -08:00
Arnaldo Carvalho de Melo
e55d912f5b [DCCP] feat: Introduce sysctls for the default features
[root@qemu ~]# for a in /proc/sys/net/dccp/default/* ; do echo $a ; cat $a ; done
/proc/sys/net/dccp/default/ack_ratio
2
/proc/sys/net/dccp/default/rx_ccid
3
/proc/sys/net/dccp/default/send_ackvec
1
/proc/sys/net/dccp/default/send_ndp
1
/proc/sys/net/dccp/default/seq_window
100
/proc/sys/net/dccp/default/tx_ccid
3
[root@qemu ~]#

So if wanting to test ccid3 as the tx CCID one can just do:

[root@qemu ~]# echo 3 > /proc/sys/net/dccp/default/tx_ccid
[root@qemu ~]# echo 2 > /proc/sys/net/dccp/default/rx_ccid
[root@qemu ~]# cat /proc/sys/net/dccp/default/[tr]x_ccid
2
3
[root@qemu ~]#

Of course we also need the setsockopt for each app to tell its preferences, but
for testing or defining something other than CCID2 as the default for apps that
don't explicitely set their preference the sysctl interface is handy.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:25:02 -08:00
Arnaldo Carvalho de Melo
04e2661e9c [DCCP]: Call dccp_feat_init more early in dccp_v4_init_sock
So that dccp_feat_clean doesn't get confused with uninitialized
list_heads.

Noticed when testing with no ccid kernel modules.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:24:41 -08:00
Arnaldo Carvalho de Melo
057fc6755a [DCCP]: Kconfig tidy up
Make CCID2 and CCID3 default to what was selected for DCCP and use the
standard short description for the CCIDs (TCP-Like & TCP-Friendly).

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:24:22 -08:00
Andrea Bittau
60fe62e789 [DCCP]: sparse endianness annotations
This also fixes the layout of dccp_hdr short sequence numbers, problem
was not fatal now as we only support long (48 bits) sequence numbers.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:23:32 -08:00
Patrick McHardy
a193a4abdd [NETFILTER]: Fix skb->nf_bridge lifetime issues
The bridge netfilter code simulates the NF_IP_PRE_ROUTING hook and skips
the real hook by registering with high priority and returning NF_STOP if
skb->nf_bridge is present and the BRNF_NF_BRIDGE_PREROUTING flag is not
set. The flag is only set during the simulated hook.

Because skb->nf_bridge is only freed when the packet is destroyed, the
packet will not only skip the first invocation of NF_IP_PRE_ROUTING, but
in the case of tunnel devices on top of the bridge also all further ones.
Forwarded packets from a bridge encapsulated by a tunnel device and sent
as locally outgoing packet will also still have the incorrect bridge
information from the input path attached.

We already have nf_reset calls on all RX/TX paths of tunnel devices,
so simply reset the nf_bridge field there too. As an added bonus,
the bridge information for locally delivered packets is now also freed
when the packet is queued to a socket.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:23:05 -08:00
Andrea Bittau
6ffd30fbbb [DCCP] feat: Actually change the CCID upon negotiation
Change the CCID upon successful feature negotiation.

Commiter note: patch mostly rewritten to use the new ccid API.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:22:37 -08:00
Arnaldo Carvalho de Melo
91f0ebf7b6 [DCCP] CCID: Improve CCID infrastructure
1. No need for ->ccid_init nor ->ccid_exit, this is what module_{init,exit}
   does and anynways neither ccid2 nor ccid3 were using it.

2. Rename struct ccid to struct ccid_operations and introduce struct ccid
   with a pointer to ccid_operations and rigth after it the rx or tx
   private state.

3. Remove the pointer to the state of the half connections from struct
   dccp_sock, now its derived thru ccid_priv() from the ccid pointer.

Now we also can implement the setsockopt for changing the CCID easily as
no ccid init routines can affect struct dccp_sock in any way that prevents
other CCIDs from working if a CCID switch operation is asked by apps.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:21:44 -08:00
Patrick McHardy
f38c39d6ce [PKT_SCHED]: Convert sch_red to a classful qdisc
Convert sch_red to a classful qdisc. All qdiscs that maintain accurate
backlog counters are eligible as child qdiscs. When a queue limit larger
than zero is given, a bfifo qdisc is used for backwards compatibility.
Current versions of tc enforce a limit larger than zero, other users
can avoid creating the default qdisc by using zero.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:20:44 -08:00
David S. Miller
a70fcb0ba3 [XFRM]: Add some missing exports.
To fix the case of modular xfrm_user.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:18:52 -08:00
David S. Miller
ee857a7d67 [XFRM]: Move xfrm_nl to xfrm_state.c from xfrm_user.c
xfrm_user could be modular, and since generic code uses this symbol
now...

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:18:37 -08:00
David S. Miller
0ac8475248 [XFRM]: Make sure xfrm_replay_timer_handler() is declared early enough.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:18:23 -08:00
Jamal Hadi Salim
6c5c8ca7ff [IPSEC]: Sync series - policy expires
This is similar to the SA expire insertion patch - only it inserts
expires for SP.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:17:25 -08:00
Jamal Hadi Salim
53bc6b4d29 [IPSEC]: Sync series - SA expires
This patch allows a user to insert SA expires. This is useful to
do on an HA backup for the case of byte counts but may not be very
useful for the case of time based expiry.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:17:03 -08:00
Jamal Hadi Salim
980ebd2579 [IPSEC]: Sync series - acquire insert
This introduces a feature similar to the one described in RFC 2367:
"
   ... the application needing an SA sends a PF_KEY
   SADB_ACQUIRE message down to the Key Engine, which then either
   returns an error or sends a similar SADB_ACQUIRE message up to one or
   more key management applications capable of creating such SAs.
   ...
   ...
   The third is where an application-layer consumer of security
   associations (e.g.  an OSPFv2 or RIPv2 daemon) needs a security
   association.

        Send an SADB_ACQUIRE message from a user process to the kernel.

        <base, address(SD), (address(P),) (identity(SD),) (sensitivity,)
          proposal>

        The kernel returns an SADB_ACQUIRE message to registered
          sockets.

        <base, address(SD), (address(P),) (identity(SD),) (sensitivity,)
          proposal>

        The user-level consumer waits for an SADB_UPDATE or SADB_ADD
        message for its particular type, and then can use that
        association by using SADB_GET messages.

 "
An app such as OSPF could then use ipsec KM to get keys

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:16:40 -08:00
Jamal Hadi Salim
d51d081d65 [IPSEC]: Sync series - user
Add xfrm as the user of the core changes

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:16:12 -08:00
Jamal Hadi Salim
9500e8a81f [IPSEC]: Sync series - fast path
Fast path sequence updates that will generate ipsec async
events

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:15:29 -08:00
Jamal Hadi Salim
f8cd54884e [IPSEC]: Sync series - core changes
This patch provides the core functionality needed for sync events
for ipsec. Derived work of Krisztian KOVACS <hidden@balabit.hu>

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:15:11 -08:00
Patrick McHardy
f5539eb8ca [PKT_SCHED]: Keep backlog counter in sch_sfq
Keep backlog counter in SFQ qdisc to make it usable as child qdisc
with RED.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:01:38 -08:00
Patrick McHardy
053cfed75d [PKT_SCHED]: Restore TBF change semantic
When TBF was converted to a classful qdisc, the semantic of the limit
parameter was broken. On initilization an inner bfifo qdisc is created
for backwards compatibility, when changing parameters however the new
limit is ignored and the current child qdisc remains in place.

Always replace the child qdisc by the default bfifo when limit is above
zero, otherwise don't touch the inner qdisc. Current tc version enforce
a limit above zero, other users can avoid creating the inner qdisc by
using zero.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:01:21 -08:00
Patrick McHardy
cdc7f8e362 [PKT_SCHED]: Dump child qdisc handle in sch_{atm,dsmark}
A qdisc should set tcm_info to the child qdisc handle in its class
dump function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:01:06 -08:00
Patrick McHardy
6d037a26f0 [PKT_SCHED]: Qdisc drop operation is optional
The drop operation is optional and qdiscs must check if childs support it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 19:00:49 -08:00
Patrick McHardy
4277a083ec [NETLINK]: Add netlink_has_listeners for avoiding unneccessary event message generation
Keep a bitmask of multicast groups with subscribed listeners to let
netlink users check for listeners before generating multicast
messages.

Queries don't perform any locking, which may result in false
positives, it is guaranteed however that any new subscriptions are
visible before bind() or setsockopt() return.

Signed-off-by: Patrick McHardy <kaber@trash.net>
ACKed-by: Jamal Hadi Salim<hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 18:52:01 -08:00
Patrick McHardy
a242769248 [NETFILTER]: ctnetlink: avoid unneccessary event message generation
Avoid unneccessary event message generation by checking for netlink
listeners before building a message.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 18:03:59 -08:00
Patrick McHardy
c4b8851392 [NETFILTER]: x_tables: replace IPv4/IPv6 policy match by address family independant version
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 18:03:40 -08:00
Patrick McHardy
f2ffd9eeda [NETFILTER]: Move ip6_masked_addrcmp to include/net/ipv6.h
Replace netfilter's ip6_masked_addrcmp by a more efficient version
in include/net/ipv6.h to make it usable without module dependencies.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 18:03:16 -08:00
Patrick McHardy
c498673474 [NETFILTER]: x_tables: add xt_{match,target} arguments to match/target functions
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 18:02:56 -08:00
Patrick McHardy
1c524830d0 [NETFILTER]: x_tables: pass registered match/target data to match/target functions
This allows to make decisions based on the revision (and address family
with a follow-up patch) at runtime.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 18:02:15 -08:00
Patrick McHardy
5d04bff096 [NETFILTER]: Convert x_tables matches/targets to centralized error checking
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 18:01:58 -08:00
Patrick McHardy
7f9397138e [NETFILTER]: Convert ip6_tables matches/targets to centralized error checking
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 18:01:43 -08:00
Patrick McHardy
aa83c1ab43 [NETFILTER]: Convert arp_tables targets to centralized error checking
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 18:01:28 -08:00
Patrick McHardy
1d5cd90976 [NETFILTER]: Convert ip_tables matches/targets to centralized error checking
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 18:01:14 -08:00
Patrick McHardy
3cdc7c953e [NETFILTER]: Change {ip,ip6,arp}_tables to use centralized error checking
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 18:00:36 -08:00
Patrick McHardy
37f9f7334b [NETFILTER]: xt_tables: add centralized error checking
Introduce new functions for common match/target checks (private data
size, valid hooks, valid tables and valid protocols) to get more consistent
error reporting and to avoid each module duplicating them.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:59:06 -08:00
Yasuyuki Kozakai
6ea46c9c12 [NETFILTER]: nf_conntrack: use ipv6_addr_equal in nf_ct_reasm
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:58:44 -08:00
Holger Eitzenberger
f2ad52c9da [NETFILTER]: Fix CID offset bug in PPTP NAT helper debug message
The recent (kernel 2.6.15.1) fix for PPTP NAT helper introduced a
bug - which only appears if DEBUGP is enabled though.

The calculation of the CID offset into a PPTP request struct is
not correct, so that at least not the correct CID is displayed
if DEBUGP is enabled.

This patch corrects CID offset calculation and introduces a #define
for that.

Signed-off-by: Holger Eitzenberger <heitzenberger@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:58:21 -08:00
Andrea Bittau
77ff72d528 [DCCP] CCID2: Drop sock reference count on timer expiration and reset.
There was a hybrid use of standard timers and sk_timers.  This caused
the reference count of the sock to be incorrect when resetting the RTO
timer.  The sock reference count should now be correct, enabling its
destruction, and allowing the DCCP module to be unloaded.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-03-20 17:57:52 -08:00
Harald Welte
dc808fe28d [NETFILTER] nf_conntrack: clean up to reduce size of 'struct nf_conn'
This patch moves all helper related data fields of 'struct nf_conn'
into a separate structure 'struct nf_conn_help'.  This new structure
is only present in conntrack entries for which we actually have a
helper loaded.

Also, this patch cleans up the nf_conntrack 'features' mechanism to
resemble what the original idea was: Just glue the feature-specific
data structures at the end of 'struct nf_conn', and explicitly
re-calculate the pointer to it when needed rather than keeping
pointers around.

Saves 20 bytes per conntrack on my x86_64 box. A non-helped conntrack
is 276 bytes. We still need to save another 20 bytes in order to fit
into to target of 256bytes.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:56:32 -08:00
John Heffner
5d424d5a67 [TCP]: MTU probing
Implementation of packetization layer path mtu discovery for TCP, based on
the internet-draft currently found at
<http://www.ietf.org/internet-drafts/draft-ietf-pmtud-method-05.txt>.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:53:41 -08:00
Adrian Bunk
d15150f755 [IPV4] fib_rules.c: make struct fib_rules static again
struct fib_rules became global for no good reason.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:46:56 -08:00
Jesper Juhl
2b191befe2 [IPCOMP6]: don't check vfree() argument for NULL.
vfree does it's own NULL checking, so checking a pointer before
handing it to vfree is pointless.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:46:29 -08:00
Andrea Bittau
afe00251dd [DCCP]: Initial feature negotiation implementation
Still needs more work, but boots and doesn't crashes, even
does some negotiation!

18:38:52.174934  127.0.0.1.43458 > 127.0.0.1.5001: request <change_l ack_ratio 2, change_r ccid 2, change_l ccid 2>
18:38:52.218526  127.0.0.1.5001 > 127.0.0.1.43458: response <nop, nop, change_l ack_ratio 2, confirm_r ccid 2 2, confirm_l ccid 2 2, confirm_r ack_ratio 2>
18:38:52.185398  127.0.0.1.43458 > 127.0.0.1.5001: <nop, confirm_r ack_ratio 2, ack_vector0 0x00, elapsed_time 212>

:-)

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:43:56 -08:00
Andrea Bittau
2a91aa3967 [DCCP] CCID2: Initial CCID2 (TCP-Like) implementation
Original work by Andrea Bittau, Arnaldo Melo cleaned up and fixed several
issues on the merge process.

For now CCID2 was turned the default for all SOCK_DCCP connections, but this
will be remedied soon with the merge of the feature negotiation code.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:41:47 -08:00
Arnaldo Carvalho de Melo
aa5d7df3b2 [DCCP] CCID3: Set the no_feedback_timer fields near init_timer
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:35:13 -08:00
Arnaldo Carvalho de Melo
9833d6da00 [DCCP]: Don't alloc ack vector for the control sock
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:34:53 -08:00
Arnaldo Carvalho de Melo
d5e9b2c737 [DCCP] ackvec: Delete all the ack vector records in dccp_ackvec_free
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:20:46 -08:00
Arnaldo Carvalho de Melo
411447019a [DCCP] CCID: Allow ccid_{init,exit} to be NULL
Testing if the ccid being instantiated has these methods in
ccid_init().

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:20:23 -08:00
Andrea Bittau
02bcf28c82 [DCCP] ackvec: Introduce ack vector records
Based on a patch by Andrea Bittau.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:19:55 -08:00
Robert Olsson
7b204afd45 [IPV4]: Use RCU locking in fib_rules.
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:18:53 -08:00
Arnaldo Carvalho de Melo
9b07ef5dda [DCCP] ackvec: Introduce dccp_ackvec_slab
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:16:17 -08:00
Arnaldo Carvalho de Melo
fa23e2ecd3 [DCCP]: Fix error handling in dccp_init
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:16:01 -08:00
Arnaldo Carvalho de Melo
7400d78110 [DCCP] ackvec: Ditch dccpav_buf_len
Simplifying the code a bit as we're always using DCCP_MAX_ACKVEC_LEN.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:15:42 -08:00
Harald Welte
0af5f6c1eb [NETFILTER] nfnetlink_log: add sequence numbers for log events
By using a sequence number for every logged netfilter event, we can
determine from userspace whether logging information was lots somewhere
downstream.

The user has a choice of either having per-instance local sequence
counters, or using a global sequence counter, or both.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:15:11 -08:00
David S. Miller
39d8c1b6fb [NET]: Do not lose accepted socket when -ENFILE/-EMFILE.
Try to allocate the struct file and an unused file
descriptor before we try to pull a newly accepted
socket out of the protocol layer.

Based upon a patch by Prassana Meda.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:13:49 -08:00
Stefan Rompf
ddd7bf9fe4 [VLAN]: translate IF_OPER_DORMANT to netif_dormant_on()
this patch adds support to the VLAN driver to translate IF_OPER_DORMANT of the
underlying device to netif_dormant_on(). Beside clean state forwarding, this
allows running independant userspace supplicants on both the real device and
the stacked VLAN. It depends on my RFC2863 patch.

Signed-off-by: Stefan Rompf <stefan@loplof.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:11:41 -08:00
Stefan Rompf
b00055aacd [NET] core: add RFC2863 operstate
this patch adds a dormant flag to network devices, RFC2863 operstate derived
from these flags and possibility for userspace interaction. It allows drivers
to signal that a device is unusable for user traffic without disabling
queueing (and therefore the possibility for protocol establishment traffic to
flow) and a userspace supplicant (WPA, 802.1X) to mark a device unusable
without changes to the driver.

It is the result of our long discussion. However I must admit that it
represents what Jamal and I agreed on with compromises towards Krzysztof, but
Thomas and Krzysztof still disagree with some parts. Anyway I think it should
be applied.

Signed-off-by: Stefan Rompf <stefan@loplof.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:09:11 -08:00
YOSHIFUJI Hideaki
e843b9e1be [IPV6]: ROUTE: Ensure to accept redirects from nexthop for the target.
It is possible to get redirects from nexthop of "more-specific"
routes.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:07:49 -08:00
YOSHIFUJI Hideaki
09c884d4c3 [IPV6]: ROUTE: Add accept_ra_rt_info_max_plen sysctl.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:07:03 -08:00
YOSHIFUJI Hideaki
e317da9622 [IPV6]: ROUTE: Flag RTF_DEFAULT for Route Infomation for ::/0.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:06:42 -08:00
YOSHIFUJI Hideaki
70ceb4f539 [IPV6]: ROUTE: Add experimental support for Route Information Option in RA (RFC4191).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:06:24 -08:00
YOSHIFUJI Hideaki
52e1635631 [IPV6]: ROUTE: Add router_probe_interval sysctl.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:05:47 -08:00
YOSHIFUJI Hideaki
930d6ff2e2 [IPV6]: ROUTE: Add accept_ra_rtr_pref sysctl.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:05:30 -08:00
YOSHIFUJI Hideaki
270972554c [IPV6]: ROUTE: Add Router Reachability Probing (RFC4191).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:05:13 -08:00
YOSHIFUJI Hideaki
ebacaaa0fd [IPV6]: ROUTE: Add support for Router Preference (RFC4191).
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:04:53 -08:00
YOSHIFUJI Hideaki
8238dd0698 [IPV6]: ROUTE: Handle finding the next best route in reachability in BACKTRACK().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:04:35 -08:00
YOSHIFUJI Hideaki
bb133964e0 [IPV6]: ROUTE: Try finding the next best route.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:01:43 -08:00
YOSHIFUJI Hideaki
1ddef044ed [IPV6]: ROUTE: Clean up rt6_select() code path in ip6_route_{intput,output}().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:01:24 -08:00
YOSHIFUJI Hideaki
118f8c1654 [IPV6]: ROUTE: Try selecting better route for non-default routes as well.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:01:06 -08:00
YOSHIFUJI Hideaki
045927ff84 [IPV6]: ROUTE: More strict check for default routers in rt6_get_dflt_router().
Check RTF_ADDRCONF|RTF_DEFAULT in rt6_get_dflt_router().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:00:48 -08:00
YOSHIFUJI Hideaki
554cfb7ee5 [IPV6]: ROUTE: Eliminate lock for default route pointer.
And prepare for more advanced router selection.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:00:26 -08:00
YOSHIFUJI Hideaki
519fbd8715 [IPV6]: ROUTE: Clean-up cow'ing in ip6_route_{intput,output}().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 17:00:05 -08:00
YOSHIFUJI Hideaki
e40cf3533c [IPV6]: ROUTE: Convert rt6_cow() to rt6_alloc_cow().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:59:27 -08:00
YOSHIFUJI Hideaki
fb9de91ea8 [IPV6]: ROUTE: Clean up reference counting / unlocking for returning object.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:59:08 -08:00
YOSHIFUJI Hideaki
d5315b500b [IPV6]: ROUTE: Unify two code paths for pmtu disc.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:58:48 -08:00
YOSHIFUJI Hideaki
299d993908 [IPV6]: ROUTE: Add rt6_alloc_clone() for cloning route allocation.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:58:32 -08:00
YOSHIFUJI Hideaki
76f9edd17d [IPV6]: ROUTE: Copy u.dst.error for RTF_REJECT routes when cloning.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:56:50 -08:00
YOSHIFUJI Hideaki
a1e783634a [IPV6]: ROUTE: Set appropriate information before inserting a route.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:56:32 -08:00
YOSHIFUJI Hideaki
95a9a5ba02 [IPV6]: ROUTE: Split up rt6_cow() for future changes.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:55:51 -08:00
YOSHIFUJI Hideaki
c4fd30eb18 [IPV6]: ADDRCONF: Add accept_ra_pinfo sysctl.
This controls whether we accept Prefix Information in RAs.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:55:26 -08:00
YOSHIFUJI Hideaki
65f5c7c114 [IPV6]: ROUTE: Add accept_ra_defrtr sysctl.
This controls whether we accept default router information
in RAs.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:55:08 -08:00
YOSHIFUJI Hideaki
073a8e0e15 [IPV6]: ADDRCONF: Split up ipv6_generate_eui64() by device type.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:54:49 -08:00
YOSHIFUJI Hideaki
955189efb4 [IPV6]: ADDRCONF: Use our standard algorithm for randomized ifid.
RFC 3041 describes an algorithm to generate random interface
identifier.  In RFC 3041bis, it is allowed to use different
algorithm than one described in RFC 3041.

So, let's use our standard pseudo random algorithm to simplify
our implementation.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:54:09 -08:00
YOSHIFUJI Hideaki
955aaa2fe3 [NET]: NEIGHBOUR: Ensure to record time to neigh->updated when neighbour's state changed.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:52:52 -08:00
YOSHIFUJI Hideaki
74a3a0ed90 [IPV6]: TUNNEL6: Don't try to add multicast route twice.
Since addrconf_add_dev() has already called addrconf_add_mroute()
to added route for multicast prefix, there's no point to call it
again in addrconf_ip6_tnl_config().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20 16:51:48 -08:00
Trond Myklebust
7a1218a277 SUNRPC: Ensure rpc_call_async() always calls tk_ops->rpc_release()
Currently this will not happen if we exit before rpc_new_task() was called.
Also fix up rpc_run_task() to do the same (for consistency).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 18:11:10 -05:00
Steve Grubb
5bdb988680 [PATCH] promiscuous mode
Hi,

When a network interface goes into promiscuous mode, its an important security
issue. The attached patch is intended to capture that action and send an
event to the audit system.

The patch carves out a new block of numbers for kernel detected anomalies.
These are events that may indicate suspicious activity. Other examples of
potential kernel anomalies would be: exceeding disk quota, rlimit violations,
changes to syscall entry table.

Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-20 14:08:55 -05:00
Trond Myklebust
43ac3f2961 SUNRPC: Fix memory barriers for req->rq_received
We need to ensure that all writes to the XDR buffers are done before
req->rq_received is visible to other processors.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:51 -05:00
Trond Myklebust
5428154827 SUNRPC: Fix a 'Busy inodes' error in rpc_pipefs
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:49 -05:00
Chuck Lever
5eb53f41d1 SUNRPC: fix compile warnings on 64-bit platforms
Introduced by NFS metrics patch.

Test plan:
Compile kernel with CONFIG_NFS enabled on a 64-bit platform.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:42 -05:00
Chuck Lever
e95b85ec9d SUNRPC: minor cleanup
RPC_DEBUG_DATA no longer needed in net/sunrpc/xprt.c.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:23 -05:00
Chuck Lever
dead28da8e SUNRPC: eliminate rpc_call()
Clean-up: replace rpc_call() helper with direct call to rpc_call_sync.

This makes NFSv2 and NFSv3 synchronous calls more computationally
efficient, and reduces stack consumption in functions that used to
invoke rpc_call more than once.

Test plan:
Compile kernel with CONFIG_NFS enabled.  Connectathon on NFS version 2,
version 3, and version 4 mount points.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:23 -05:00
Chuck Lever
cc0175c1dc SUNRPC: display human-readable procedure name in rpc_iostats output
Add fields to the rpc_procinfo struct that allow the display of a
human-readable name for each procedure in the rpc_iostats output.

Also fix it so that the NFSv4 stats are broken up correctly by
sub-procedure number.  NFSv4 uses only two real RPC procedures:
NULL, and COMPOUND.

Test plan:
Mount with NFSv2, NFSv3, and NFSv4, and do "cat /proc/self/mountstats".

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:22 -05:00
Chuck Lever
11c556b3d8 SUNRPC: provide a mechanism for collecting stats in the RPC client
Add a simple mechanism for collecting stats in the RPC client.  Stats are
tabulated during xprt_release.  Note that per_cpu shenanigans are not
required here because the RPC client already serializes on the transport
write lock.

Test plan:
Compile kernel with CONFIG_NFS enabled.  Basic performance regression
testing with high-speed networking and high performance server.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:22 -05:00
Chuck Lever
ef759a2e54 SUNRPC: introduce per-task RPC iostats
Account for various things that occur while an RPC task is executed.
Separate timers for RPC round trip and RPC execution time show how
long RPC requests wait in queue before being sent.  Eventually these
will be accumulated at xprt_release time in one place where they can
be viewed from userland.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:17 -05:00
Chuck Lever
262ca07de4 SUNRPC: add a handful of per-xprt counters
Monitor generic transport events.  Add a transport switch callout to
format transport counters for export to user-land.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:16 -05:00
Chuck Lever
e19b63dafd SUNRPC: track length of RPC wait queues
RPC wait queue length will eventually be exported to userland via the RPC
iostats interface.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:15 -05:00
Levent Serinol
1356b8c28d SUNRPC: more verbose output for rpc auth weak error
This patch adds server ip address to be printed out when "server
requires stronger authentication" error occured.

Signed-off-by: Levent Serinol <lserinol@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:11 -05:00
Trond Myklebust
12de3b35ea SUNRPC: Ensure that rpc_mkpipe returns a refcounted dentry
If not, we cannot guarantee that idmap->idmap_dentry, gss_auth->dentry and
clnt->cl_dentry are valid dentries.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:09 -05:00
Trond Myklebust
24c5d9d7ea SUNRPC: Run rpci->queue_timeout on the rpciod workqueue instead of generic
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:08 -05:00
Olaf Kirch
f344f6df4b SUNRPC: Auto-load RPC authentication kernel modules
This patch adds a request_module call to rpcauth_create which will try
to auto-load the kernel module for the requested authentication flavor.
For kernels with modular sunrpc, this reduces the admin overhead for
the user.

Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:08 -05:00
Jeff Garzik
2e9ff56efb Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2006-03-20 04:38:50 -05:00
Jeff Garzik
d378aca6ec Merge branch 'master' 2006-03-20 04:38:03 -05:00
Ralf Baechle DL5RB
c7c694d196 [AX.25]: Fix potencial memory hole.
If the AX.25 dialect chosen by the sysadmin is set to DAMA master / 3
(or DAMA slave / 2, if CONFIG_AX25_DAMA_SLAVE=n) ax25_kick() will fall
through the switch statement without calling ax25_send_iframe() or any
other function that would eventually free skbn thus leaking the packet.

Fix by restricting the sysctl inferface to allow only actually supported
AX.25 dialects.

The system administration mistake needed for this to happen is rather
unlikely, so this is an uncritical hole.

Coverity #651.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-19 13:20:06 -08:00
James Ketrenos
f44349f221 [PATCH] ieee80211: Don't update network statistics from off-channel packets.
This patch fixes a problem in the ieee80211 probe response and beacon
reception code that would use the packet statistics for a network even
if they were received on a channel other than that which the network
exists on.

This causes a problem in overlapping channels where, for example, a
strong AP on channel 2 could have its beacons received on channels 1 and
3, but at much lower signal levels.  If scanning was done sequentially,
this means the beacon received on channel 3 would update the AP's signal
level as being much lower than it really is, which subsequently could
cause that AP to be passed over and an alternate AP selected.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-17 15:38:55 -05:00
Jeff Garzik
abc71c46dc Merge branch 'upstream-fixes' 2006-03-16 19:27:08 -05:00
John W. Linville
dd288e7d75 Merge branch 'upstream-fixes' 2006-03-15 17:02:08 -05:00
Hong Liu
72df16f109 [PATCH] ieee80211: Fix QoS is not active problem
Fix QoS is not active even the network and the card is QOS enabled.
The problem is we pass the wrong ieee80211_network address to
ipw_handle_beacon/ipw_handle_probe_response, thus the
ieee80211_network->qos_data.active will not be set, causing the driver
not sending QoS frames at all.

Signed-off-by: Hong Liu <hong.liu@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-15 16:16:07 -05:00
Zhu Yi
0df7861240 [PATCH] ieee80211: Fix CCMP decryption problem when QoS is enabled
Use the correct STYPE for Qos data.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-03-15 16:11:55 -05:00
Trond Myklebust
e6d83d5569 [PATCH] SUNRPC: Fix potential deadlock in RPC code
In rpc_wake_up() and rpc_wake_up_status(), it is possible for the call to
__rpc_wake_up_task() to fail if another thread happens to be calling
rpc_wake_up_task() on the same rpc_task.

Problem noticed by Bruno Faccini.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-14 07:57:18 -08:00
Adrian Bunk
712917d1c0 [PATCH] SUNRPC: fix a NULL pointer dereference in net/sunrpc/clnt.c
The Coverity checker spotted this possible NULL pointer dereference in
rpc_new_client().

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-14 07:57:17 -08:00
Herbert Xu
3759fa9c55 [TCP]: Fix zero port problem in IPv6
When we link a socket into the hash table, we need to make sure that we
set the num/port fields so that it shows us with a non-zero port value
in proc/netlink and on the wire.  This code and comment is copied over
from the IPv4 stack as is.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-03-13 14:26:12 -08:00
Patrick McHardy
31fe4d3317 [NETFILTER]: arp_tables: fix NULL pointer dereference
The check is wrong and lets NULL-ptrs slip through since !IS_ERR(NULL)
is true.

Coverity #190

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-12 20:40:43 -08:00
Patrick McHardy
baa829d892 [IPV4/6]: Fix UFO error propagation
When ufo_append_data fails err is uninitialized, but returned back.
Strangely gcc doesn't notice it.

Coverity #901 and #902

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-12 20:39:40 -08:00
Patrick McHardy
4a1ff6e2bd [TCP]: tcp_highspeed: fix AIMD table out-of-bounds access
Covertiy #547

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-12 20:39:39 -08:00
Patrick McHardy
cc9a06cd8d [NETLINK]: Fix use-after-free in netlink_recvmsg
The skb given to netlink_cmsg_recv_pktinfo is already freed, move it up
a few lines.

Coverity #948

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-12 20:39:38 -08:00
Patrick McHardy
f8dc01f543 [XFRM]: Fix leak in ah6_input
tmp_hdr is not freed when ipv6_clear_mutable_options fails.

Coverity #650

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-12 20:39:37 -08:00
Patrick McHardy
f6e57464df [NET_SCHED]: act_api: fix skb leak in error path
The skb is allocated by the function, so it needs to be freed instead
of trimmed on overrun.

Coverity #614

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-12 20:39:36 -08:00
Patrick McHardy
406dbfc9ae [NETFILTER]: nfnetlink_queue: fix possible NULL-ptr dereference
Fix NULL-ptr dereference when a config message for a non-existant
queue containing only an NFQA_CFG_PARAMS attribute is received.

Coverity #433

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-12 20:39:35 -08:00
David S. Miller
ba244fe900 [TCP]: Fix tcp_tso_should_defer() when limit>=65536
That's >= a full sized TSO frame, so we should always
return 0 in that case.

Based upon a report and initial patch from Lachlan
Andrew, final patch suggested by Herbert Xu.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-11 18:51:49 -08:00
Gregor Maier
c127437641 [NETFILTER]: Fix wrong option spelling in Makefile for CONFIG_BRIDGE_EBT_ULOG
Signed-off-by: Gregor Maier <gregor@net.in.tum.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-11 18:51:25 -08:00
Brian Haley
0d27b42739 [IPV6]: fix ipv6_saddr_score struct element
The scope element in the ipv6_saddr_score struct used in 
ipv6_dev_get_saddr() is an unsigned integer, but __ipv6_addr_src_scope() 
returns a signed integer (and can return -1).

Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-11 18:50:14 -08:00
Jeff Garzik
749dfc7055 Merge branch 'upstream-fixes' 2006-03-11 13:35:31 -05:00
Dipankar Sarma
529bf6be5c [PATCH] fix file counting
I have benchmarked this on an x86_64 NUMA system and see no significant
performance difference on kernbench.  Tested on both x86_64 and powerpc.

The way we do file struct accounting is not very suitable for batched
freeing.  For scalability reasons, file accounting was
constructor/destructor based.  This meant that nr_files was decremented
only when the object was removed from the slab cache.  This is susceptible
to slab fragmentation.  With RCU based file structure, consequent batched
freeing and a test program like Serge's, we just speed this up and end up
with a very fragmented slab -

llm22:~ # cat /proc/sys/fs/file-nr
587730  0       758844

At the same time, I see only a 2000+ objects in filp cache.  The following
patch I fixes this problem.

This patch changes the file counting by removing the filp_count_lock.
Instead we use a separate percpu counter, nr_files, for now and all
accesses to it are through get_nr_files() api.  In the sysctl handler for
nr_files, we populate files_stat.nr_files before returning to user.

Counting files as an when they are created and destroyed (as opposed to
inside slab) allows us to correctly count open files with RCU.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 14:14:01 -08:00
Thomas Graf
850a9a4e3c [NETFILTER] ip_queue: Fix wrong skb->len == nlmsg_len assumption
The size of the skb carrying the netlink message is not
equivalent to the length of the actual netlink message
due to padding. ip_queue matches the length of the payload
against the original packet size to determine if packet
mangling is desired, due to the above wrong assumption
arbitary packets may not be mangled depening on their
original size.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-07 14:56:12 -08:00
Ian McDonald
c09966608d [DCCP] ccid3: Divide by zero fix
In rare circumstances 0 is returned by dccp_li_hist_calc_i_mean which
leads to a divide by zero in ccid3_hc_rx_packet_recv. Explicitly check
for zero return now. Update copyright notice at same time.

Found by Arnaldo.

Signed-off-by: Ian McDonald <imcdnzl@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-04 21:06:29 -08:00
Chas Williams
0f8f325b25 [ATM]: keep atmsvc failure messages quiet
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-04 21:06:25 -08:00
Stephen Hemminger
125a12ccf3 [BRIDGE]: generate kobject remove event
The earlier round of kobject/sysfs changes to bridge caused
it not to generate a uevent on removal. Don't think any application
cares (not sure about Xen) but since it generates add uevent
it should generate remove as well.

Signed-off-by: Stephen Hemminger <shemmigner@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-04 21:06:23 -08:00
Stephen Hemminger
d32439c0d4 [BRIDGE]: port timer initialization
Initialize the STP timers for a port when it is created,
rather than when it is enabled. This will prevent future race conditions
where timer gets started before port is enabled.

Signed-off-by: Stephen Hemminger <shemmigner@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-04 21:06:21 -08:00
Stephen Hemminger
6e86b89084 [BRIDGE]: fix crash in STP
Bridge would crash because of uninitailized timer if STP is used and
device was inserted into a bridge before bridge was up. This got
introduced when the delayed port checking was added.  Fix is to not
enable STP on port unless bridge is up.

Bugzilla: http://bugzilla.kernel.org/show_bug.cgi?id=6140
Dup:      http://bugzilla.kernel.org/show_bug.cgi?id=6156

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-04 21:06:19 -08:00
Jay Vosburgh
8f903c708f [PATCH] bonding: suppress duplicate packets
Originally submitted by Kenzo Iwami; his original description is:

The current bonding driver receives duplicate packets when broadcast/
multicast packets are sent by other devices or packets are flooded by the
switch. In this patch, new flags are added in priv_flags of net_device
structure to let the bonding driver discard duplicate packets in
dev.c:skb_bond().

	Modified by Jay Vosburgh to change a define name, update some
comments, rearrange the new skb_bond() for clarity, clear all bonding
priv_flags on slave release, and update the driver version.

Signed-off-by: Kenzo Iwami <k-iwami@cj.jp.nec.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2006-03-03 20:58:00 -05:00
Jeff Garzik
75e47b3600 Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2006-03-01 01:59:15 -05:00
Jeff Garzik
68727fed54 Merge branch 'upstream-fixes' 2006-03-01 01:58:38 -05:00
Jeff Garzik
ce7eeb6b52 Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2006-02-28 18:04:30 -05:00
Pete Zaitcev
07981aa43f [PATCH] ieee80211_geo.c: remove frivolous BUG_ON's
I have come to consider BUG_ON generally harmful. The idea of an assert is
to prevent a program to execute past a point where its state is known
erroneous, thus preventing it from dealing more damage to the data
(or hiding the traces of malfunction). The problem is, in kernel this harm
has to be balanced against the harm of forced reboot.

The last straw was our softmac tree, where "iwlist eth1 scan" causes
a lockup. It is absolutely frivolus and provides no advantages a normal
assert has to provide. In fact, doing this impedes debugging.

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-02-27 20:14:58 -05:00
John W. Linville
acfaf10be5 Merge branch 'upstream-fixes' 2006-02-27 20:13:10 -05:00
John W. Linville
9f5a405b68 Merge branch 'from-linus' 2006-02-27 20:12:23 -05:00
Pete Zaitcev
4832843d77 [PATCH] ieee80211_rx.c: is_beacon
Fix broken is_beacon().

Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-02-27 20:12:02 -05:00
Arnaldo Carvalho de Melo
ba13c98405 [REQSK]: Don't reset rskq_defer_accept in reqsk_queue_alloc
In 295f7324ff I moved defer_accept from
tcp_sock to request_queue and mistakingly reset it at reqsl_queue_alloc, causing
calls to setsockopt(TCP_DEFER_ACCEPT ) to be lost after bind, the fix is to
remove the zeroing of rskq_defer_accept from reqsl_queue_alloc.

Thanks to Alexandra N. Kossovsky <Alexandra.Kossovsky@oktetlabs.ru> for
reporting and testing the suggested fix.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-27 13:30:43 -08:00
Patrick McHardy
bafac2a512 [NETFILTER]: Restore {ipt,ip6t,ebt}_LOG compatibility
The nfnetlink_log infrastructure changes broke compatiblity of the LOG
targets. They currently use whatever log backend was registered first,
which means that if ipt_ULOG was loaded first, no messages will be printed
to the ring buffer anymore.

Restore compatiblity by using the old log functions by default and only use
the nf_log backend if the user explicitly said so.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-27 13:04:17 -08:00
Patrick McHardy
45fe4dc08c [NETFILTER]: nf_queue: fix end-of-list check
The comparison wants to find out if the last list iteration reached the
end of the list. It needs to compare the iterator with the list head to
do this, not the element it is looking for.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-27 13:03:55 -08:00
Patrick McHardy
e121e9ecb0 [NETFILTER]: nf_queue: remove unnecessary check for outfn
The only point of registering a queue handler is to provide an outfn,
so there is no need to check for it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-27 13:03:39 -08:00
Patrick McHardy
7a11b9848a [NETFILTER]: nf_queue: fix rerouting after packet mangling
Packets should be rerouted when they come back from userspace, not before.
Also move the queue_rerouters to RCU to avoid taking the queue_handler_lock
for each reinjected packet.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-27 13:03:24 -08:00
Patrick McHardy
f92f871989 [NETFILTER]: nf_queue: check if rerouter is present before using it
Every rerouter needs to provide a save and a reroute function, we don't
need to check for them. But we do need to check if a rerouter is registered
at all for the current family, with bridging for example packets of
unregistered families can hit nf_queue.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-27 13:03:10 -08:00
Patrick McHardy
e02f7d1603 [NETFILTER]: nf_queue: don't copy registered rerouter data
Use the registered data structure instead of copying it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-27 13:02:52 -08:00
Herbert Xu
752c1f4c78 [IPSEC]: Kill post_input hook and do NAT-T in esp_input directly
The only reason post_input exists at all is that it gives us the
potential to adjust the checksums incrementally in future which
we ought to do.

However, after thinking about it for a bit we can adjust the
checksums without using this post_input stuff at all.  The crucial
point is that only the inner-most NAT-T SA needs to be considered
when adjusting checksums.  What's more, the checksum adjustment
comes down to a single u32 due to the linearity of IP checksums.

We just happen to have a spare u32 lying around in our skb structure :)
When ip_summed is set to CHECKSUM_NONE on input, the value of skb->csum
is currently unused.  All we have to do is to make that the checksum
adjustment and voila, there goes all the post_input and decap structures!

I've left in the decap data structures for now since it's intricately
woven into the sec_path stuff.  We can kill them later too.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-27 13:00:40 -08:00
Herbert Xu
4bf05eceec [IPSEC] esp: Kill unnecessary block and indentation
We used to keep sg on the stack which is why the extra block was useful.
We've long since stopped doing that so let's kill the block and save
some indentation.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-27 13:00:01 -08:00
Jeff Garzik
dbfedbb981 Merge branch 'master' 2006-02-27 11:33:51 -05:00
YOSHIFUJI Hideaki
d91675f9c7 [IPV6]: Do not ignore IPV6_MTU socket option.
Based on patch by Hoerdt Mickael <hoerdt@clarinet.u-strasbg.fr>.

Signed-off-by: YOSHIFUJI Hideaki <yosufuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-24 13:18:33 -08:00
Hugo Santos
0c0888908d [IPV6] ip6_tunnel: release cached dst on change of tunnel params
The included patch fixes ip6_tunnel to release the cached dst entry
when the tunnel parameters (such as tunnel endpoints) are changed so
they are used immediatly for the next encapsulated packets.

Signed-off-by: Hugo Santos <hsantos@av.it.pt>
Acked-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-24 13:16:25 -08:00
Jeff Garzik
7b0386921d Merge branch 'upstream-fixes' 2006-02-23 21:16:27 -05:00
Herbert Xu
4da3089f2b [IPSEC]: Use TOS when doing tunnel lookups
We should use the TOS because it's one of the routing keys.  It also
means that we update the correct routing cache entry when PMTU occurs.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-23 16:19:26 -08:00
Jamal Hadi Salim
f8d0e3f115 [NET] ethernet: Fix first packet goes out with MAC 00:00:00:00:00:00
When you turn off ARP on a netdevice then the first packet always goes
out with a dstMAC of all zeroes. This is because the first packet is
used to resolve ARP entries. Even though the ARP entry may be resolved
(I tried by setting a static ARP entry for a host i was pinging from),
it gets overwritten by virtue of having the netdevice disabling ARP.

Subsequent packets go out fine with correct dstMAC address (which may
be why people have ignored reporting this issue).

To cut the story short: 

the culprit code is in net/ethernet/eth.c::eth_header()

----
        /*
         *      Anyway, the loopback-device should never use this
function...
         */

        if (dev->flags & (IFF_LOOPBACK|IFF_NOARP))
        {
                memset(eth->h_dest, 0, dev->addr_len);
                return ETH_HLEN;
        }

	if(daddr)
        {
                memcpy(eth->h_dest,daddr,dev->addr_len);
                return ETH_HLEN;
        }

----

Note how the h_dest is being reset when device has IFF_NOARP.

As a note:
All devices including loopback pass a daddr. loopback in fact passes
a 0 all the time ;-> 
This means i can delete the check totaly or i can remove the IFF_NOARP

Alexey says:
--------------------
I think, it was me who did this crap. It was so long ago I do not remember
why it was made.

I remember some troubles with dummy device. It tried to resolve
addresses, apparently, without success and generated errors instead of
blackholing. I think the problem was eventually solved at neighbour
level.

After some thinking I suspect the deletion of this chunk could change
behaviour of some parts which do not use neighbour cache f.e. packet
socket.

I think safer approach would be to move this chunk after if (daddr).
And the possibility to remove this completely could be analyzed later.
--------------------

Patch updated with Alexey's safer suggestions.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-23 16:18:01 -08:00
Herbert Xu
21380b81ef [XFRM]: Eliminate refcounting confusion by creating __xfrm_state_put().
We often just do an atomic_dec(&x->refcnt) on an xfrm_state object
because we know there is more than 1 reference remaining and thus
we can elide the heavier xfrm_state_put() call.

Do this behind an inline function called __xfrm_state_put() so that is
more obvious and also to allow us to more cleanly add refcount
debugging later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-23 16:10:53 -08:00
Suresh Bhogavilli
8525987849 [IPV4]: Fix garbage collection of multipath route entries
When garbage collecting route cache entries of multipath routes
in rt_garbage_collect(), entries were deleted from the hash bucket
'i' while holding a spin lock on bucket 'k' resulting in a system
hang.  Delete entries, if any, from bucket 'k' instead.

Signed-off-by: Suresh Bhogavilli <sbhogavilli@verisign.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-23 16:10:52 -08:00
Patrick McHardy
42cf93cd46 [NETFILTER]: Fix bridge netfilter related in xfrm_lookup
The bridge-netfilter code attaches a fake dst_entry with dst->ops == NULL
to purely bridged packets. When these packets are SNATed and a policy
lookup is done, xfrm_lookup crashes because it tries to dereference
dst->ops.

Change xfrm_lookup not to dereference dst->ops before checking for the
DST_NOXFRM flag and set this flag in the fake dst_entry.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-23 16:10:51 -08:00
Linus Torvalds
cf70a6f264 Merge branch 'fixes.b8' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bird 2006-02-20 20:09:44 -08:00
YOSHIFUJI Hideaki
a8372f035a [NET]: NETFILTER: remove duplicated lines and fix order in skb_clone().
Some of netfilter-related members are initalized / copied twice in
skb_clone(). Remove one.

Pointed out by Olivier MATZ <olivier.matz@6wind.com>.

And this patch also fixes order of copying / clearing members.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-19 22:32:06 -08:00
Patrick McHardy
8e249f0881 [NETFILTER]: Fix outgoing redirects to loopback
When redirecting an outgoing packet to loopback, it keeps the original
conntrack reference and information from the outgoing path, which
falsely triggers the check for DNAT on input and the dst_entry is
released to trigger rerouting. ip_route_input refuses to route the
packet because it has a local source address and it is dropped.

Look at the packet itself to dermine if it was NATed. Also fix a
missing inversion that causes unneccesary xfrm lookups.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-19 22:29:47 -08:00
Patrick McHardy
bc6e14b6f0 [NETFILTER]: Fix NAT PMTUD problems
ICMP errors are only SNATed when their source matches the source of the
connection they are related to, otherwise the source address is not
changed. This creates problems with ICMP frag. required messages
originating from a router behind the NAT, if private IPs are used the
packet has a good change of getting dropped on the path to its destination.

Always NAT ICMP errors similar to the original connection.

Based on report by Al Viro.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-19 22:26:40 -08:00
Patrick McHardy
9951101438 [XFRM]: Fix policy double put
The policy is put once immediately and once at the error label, which results
in the following Oops:

kernel BUG at net/xfrm/xfrm_policy.c:250!
invalid opcode: 0000 [#2]
PREEMPT
[...]
CPU:    0
EIP:    0060:[<c028caf7>]    Not tainted VLI
EFLAGS: 00210246   (2.6.16-rc3 #39)
EIP is at __xfrm_policy_destroy+0xf/0x46
eax: d49f2000   ebx: d49f2000   ecx: f74bd880   edx: f74bd280
esi: d49f2000   edi: 00000001   ebp: cd506dcc   esp: cd506dc8
ds: 007b   es: 007b   ss: 0068
Process ssh (pid: 31970, threadinfo=cd506000 task=cfb04a70)
Stack: <0>cd506000 cd506e34 c028e92b ebde7280 cd506e58 cd506ec0 f74bd280 00000000
       00000214 0000000a 0000000a 00000000 00000002 f7ae6000 00000000 cd506e58
       cd506e14 c0299e36 f74bd280 e873fe00 c02943fd cd506ec0 ebde7280 f271f440
Call Trace:
 [<c0103a44>] show_stack_log_lvl+0xaa/0xb5
 [<c0103b75>] show_registers+0x126/0x18c
 [<c0103e68>] die+0x14e/0x1db
 [<c02b6809>] do_trap+0x7c/0x96
 [<c0104237>] do_invalid_op+0x89/0x93
 [<c01035af>] error_code+0x4f/0x54
 [<c028e92b>] xfrm_lookup+0x349/0x3c2
 [<c02b0b0d>] ip6_datagram_connect+0x317/0x452
 [<c0281749>] inet_dgram_connect+0x49/0x54
 [<c02404d2>] sys_connect+0x51/0x68
 [<c0240928>] sys_socketcall+0x6f/0x166
 [<c0102aa1>] syscall_call+0x7/0xb

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-19 22:11:50 -08:00
Al Viro
cc6cdac0cf [PATCH] missing ntohs() in ip6_tunnel
->payload_len is net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-02-18 16:02:18 -05:00
Jeff Garzik
b04a92e160 Merge branch 'upstream-fixes' 2006-02-17 16:20:30 -05:00
Johannes Berg
b7cffb028a [PATCH] ieee80211: fix sparse warning about missing "static"
This patch adds a missing "static" on a variable (sparse complaint)

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-02-17 10:41:34 -05:00
Zhu Yi
4716808283 [PATCH] ieee80211: Use IWEVGENIE to set WPA IE
It replaces returning WPA/RSN IEs as custom events with returning them
as IWEVGENIE events. I have tested that it returns proper information
with both Xsupplicant, and the latest development version of the Linux
wireless tools.

Signed-off-by: Chris Hessing <Chris.Hessing@utah.edu>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-02-17 08:16:59 -05:00
John W. Linville
750b50ab56 Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2006-02-17 08:15:41 -05:00
Yasuyuki Kozakai
7c6de05884 [NETFILTER]: nf_conntrack: Fix TCP/UDP HW checksum handling for IPv6 packet
If skb->ip_summed is CHECKSUM_HW here, skb->csum includes checksum
of actual IPv6 header and extension headers. Then such excess
checksum must be subtruct when nf_conntrack calculates TCP/UDP checksum
with pseudo IPv6 header. Spotted by Ben Skeggs.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15 15:25:18 -08:00
Yasuyuki Kozakai
763ecff187 [NETFILTER]: nf_conntrack: attach conntrack to locally generated ICMPv6 error
Locally generated ICMPv6 errors should be associated with the conntrack
of the original packet. Since the conntrack entry may not be in the hash
tables (for the first packet), it must be manually attached.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15 15:24:15 -08:00
Yasuyuki Kozakai
08857fa745 [NETFILTER]: nf_conntrack: attach conntrack to TCP RST generated by ip6t_REJECT
TCP RSTs generated by the REJECT target should be associated with the
conntrack of the original TCP packet. Since the conntrack entry is
usually not is the hash tables, it must be manually attached.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15 15:23:28 -08:00
Yasuyuki Kozakai
7d3cdc6b55 [NETFILTER]: nf_conntrack: move registration of __nf_ct_attach
Move registration of __nf_ct_attach to nf_conntrack_core to make it usable
for IPv6 connection tracking as well.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15 15:22:21 -08:00
Yasuyuki Kozakai
deac0ccdb4 [NETFILTER]: x_tables: fix dependencies of conntrack related modules
NF_CONNTRACK_MARK is bool and depends on NF_CONNTRACK which is
tristate.  If a variable depends on NF_CONNTRACK_MARK and doesn't take
care about NF_CONNTRACK, it can be y even if NF_CONNTRACK isn't y.
NF_CT_ACCT have same issue, too.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15 15:21:31 -08:00
Patrick McHardy
48d5cad87c [XFRM]: Fix SNAT-related crash in xfrm4_output_finish
When a packet matching an IPsec policy is SNATed so it doesn't match any
policy anymore it looses its xfrm bundle, which makes xfrm4_output_finish
crash because of a NULL pointer dereference.

This patch directs these packets to the original output path instead. Since
the packets have already passed the POST_ROUTING hook, but need to start at
the beginning of the original output path which includes another
POST_ROUTING invocation, a flag is added to the IPCB to indicate that the
packet was rerouted and doesn't need to pass the POST_ROUTING hook again.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15 15:10:22 -08:00
Adrian Drzewiecki
78872ccb68 [BRIDGE]: Fix deadlock in br_stp_disable_bridge
Looks like somebody forgot to use the _bh spin_lock variant. We ran into a 
deadlock where br->hello_timer expired while br_stp_disable_br() walked 
br->port_list. 

Signed-off-by: Adrian Drzewiecki <z@drze.net>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15 01:47:48 -08:00
Patrick McHardy
ee68cea2c2 [NETFILTER]: Fix xfrm lookup after SNAT
To find out if a packet needs to be handled by IPsec after SNAT, packets
are currently rerouted in POST_ROUTING and a new xfrm lookup is done. This
breaks SNAT of non-unicast packets to non-local addresses because the
packet is routed as incoming packet and no neighbour entry is bound to the
dst_entry. In general, it seems to be a bad idea to replace the dst_entry
after the packet was already sent to the output routine because its state
might not match what's expected.

This patch changes the xfrm lookup in POST_ROUTING to re-use the original
dst_entry without routing the packet again. This means no policy routing
can be used for transport mode transforms (which keep the original route)
when packets are SNATed to match the policy, but it looks like the best
we can do for now.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-15 01:34:23 -08:00
David S. Miller
b4d9eda028 [NET]: Revert skb_copy_datagram_iovec() recursion elimination.
Revert the following changeset:

bc8dfcb939

Recursive SKB frag lists are really possible and disallowing
them breaks things.

Noticed by: Jesse Brandeburg <jesse.brandeburg@intel.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-13 16:06:10 -08:00
Herbert Xu
00de651d14 [IPSEC]: Fix strange IPsec freeze.
Problem discovered and initial patch by Olaf Kirch:

	there's a problem with IPsec that has been bugging some of our users
	for the last couple of kernel revs. Every now and then, IPsec will
	freeze the machine completely. This is with openswan user land,
	and with kernels up to and including 2.6.16-rc2.

	I managed to debug this a little, and what happens is that we end
	up looping in xfrm_lookup, and never get out. With a bit of debug
	printks added, I can this happening:

		ip_route_output_flow calls xfrm_lookup

		xfrm_find_bundle returns NULL (apparently we're in the
			middle of negotiating a new SA or something)

		We therefore call xfrm_tmpl_resolve. This returns EAGAIN
			We go to sleep, waiting for a policy update.
			Then we loop back to the top

		Apparently, the dst_orig that was passed into xfrm_lookup
			has been dropped from the routing table (obsolete=2)
			This leads to the endless loop, because we now create
			a new bundle, check the new bundle and find it's stale
			(stale_bundle -> xfrm_bundle_ok -> dst_check() return 0)

	People have been testing with the patch below, which seems to fix the
	problem partially. They still see connection hangs however (things
	only clear up when they start a new ping or new ssh). So the patch
	is obvsiouly not sufficient, and something else seems to go wrong.

	I'm grateful for any hints you may have...

I suggest that we simply bail out always.  If the dst decides to die
on us later on, the packet will be dropped anyway.  So there is no
great urgency to retry here.  Once we have the proper resolution
queueing, we can then do the retry again.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-13 16:01:27 -08:00
Nicolas DICHTEL
6d3e85ecf2 [IPV6] Don't store dst_entry for RAW socket
Signed-off-by: Nicolas DICHTEL <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-13 15:56:13 -08:00
Jamal Hadi Salim
e200bd8065 [NETLINK] genetlink: Fix bugs spotted by Andrew Morton.
- panic() doesn't return.

- Don't forget to unlock on genl_register_family() error path

- genl_rcv_msg() is called via pointer so there's no point in declaring it
  `inline'.

Notes:

genl_ctrl_event() ignores the genlmsg_multicast() return value.

lots of things ignore the genl_ctrl_event() return value.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-13 15:51:24 -08:00
Stephen Hemminger
178a3259f2 [BRIDGE]: Better fix for netfilter missing symbol has_bridge_parent
Horms patch was the best of the three fixes. Dave, already applied
Harald's version, so this patch converts that to the better one.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-13 15:43:58 -08:00
Harald Welte
a6c1cd5726 [NETFILTER] Fix Kconfig menu level for x_tables
The new x_tables related Kconfig options appear at the wrong menu level
without this patch.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-13 15:42:48 -08:00
David S. Miller
15c38c6ecd Merge master.kernel.org:/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2006-02-13 15:40:55 -08:00
Dave Jones
99e382afd2 [P8023]: Fix tainting of kernel.
Missing license tag.
I've assumed this is GPL.  (It could also use a MODULE_AUTHOR)

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-13 15:38:42 -08:00
Dave Jones
77decfc716 [IPV4] ICMP: Invert default for invalid icmp msgs sysctl
isic can trigger these msgs to be spewed at a very high rate.
There's already a sysctl to turn them off. Given these messages
aren't useful for most people, this patch disables them by
default.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-13 15:36:21 -08:00
Dave Jones
bf3883c12f [ATM]: Ratelimit atmsvc failure messages
This seems to be trivial to trigger.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-13 15:34:58 -08:00
Marcel Holtmann
7b005bd34c [Bluetooth] Fix NULL pointer dereferences of the HCI socket
This patch fixes the two NULL pointer dereferences found by the sfuzz
tool from Ilja van Sprundel. The first one was a call of getsockname()
for an unbound socket and the second was calling accept() while this
operation isn't implemented for the HCI socket interface.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2006-02-13 11:40:03 +01:00
Marcel Holtmann
56f3a40a5e [Bluetooth] Reduce L2CAP MTU for RFCOMM connections
This patch reduces the default L2CAP MTU for all RFCOMM connections
from 1024 to 1013 to improve the interoperability with some broken
RFCOMM implementations. To make this more flexible the L2CAP MTU
becomes also a module parameter and so it can changed at runtime.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2006-02-13 11:39:57 +01:00
Jesper Juhl
3c791925da [PATCH] netfilter: fix build error due to missing has_bridge_parent macro
net/bridge/br_netfilter.c: In function `br_nf_post_routing':
net/bridge/br_netfilter.c:808: warning: implicit declaration of function `has_bridge_parent'

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Cc: Harald Welte <laforge@netfilter.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-12 16:10:47 -08:00
Stephen Hemminger
bab1deea30 [BRIDGE]: fix error handling for add interface to bridge
Refactor how the bridge code interacts with kobject system.
It should still use kobjects even if not using sysfs.
Fix the error unwind handling in br_add_if.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-09 17:10:12 -08:00
Stephen Hemminger
5dce971acf [BRIDGE]: netfilter handle RCU during removal
Bridge netfilter code needs to handle the case where device is
removed from bridge while packet in process. In these cases the
bridge_parent can become null while processing.

This should fix: http://bugzilla.kernel.org/show_bug.cgi?id=5803

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-09 17:09:38 -08:00
Stephen Hemminger
b3f1be4b54 [BRIDGE]: fix for RCU and deadlock on device removal
Change Bridge receive path to correctly handle RCU removal of device
from bridge.  Also fixes deadlock between carrier_check and del_nbp.
This replaces the previous deleted flag fix.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-09 17:08:52 -08:00
John Heffner
6fcf9412de [TCP]: rcvbuf lock when tcp_moderate_rcvbuf enabled
The rcvbuf lock should probably be honored here.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-09 17:06:57 -08:00
David Binderman
80ba250e59 [IRDA]: out of range array access
This patch fixes an out of range array access in irnet_irda.c.

Author: David Binderman <dcb314@hotmail.com>
Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-09 16:59:48 -08:00
Samuel Ortiz
d93077fb0e [IRDA]: Set proper IrLAP device address length
This patch set IrDA's addr_len properly, i.e to 4 bytes, the size of the
IrLAP device address.

Signed-off-by: Samuel Ortiz <samuel.ortiz@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-09 16:58:46 -08:00
Alexey Kuznetsov
28633514af [NETLINK]: illegal use of pid in rtnetlink
When a netlink message is not related to a netlink socket,
it is issued by kernel socket with pid 0. Netlink "pid" has nothing
to do with current->pid. I called it incorrectly, if it was named "port",
the confusion would be avoided.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-09 16:43:41 -08:00
Alexey Kuznetsov
a70ea994a0 [NETLINK]: Fix a severe bug
netlink overrun was broken while improvement of netlink.
Destination socket is used in the place where it was meant to be source socket,
so that now overrun is never sent to user netlink sockets, when it should be,
and it even can be set on kernel socket, which results in complete deadlock
of rtnetlink.

Suggested fix is to restore status quo passing source socket as additional
argument to netlink_attachskb().

A little explanation: overrun is set on a socket, when it failed
to receive some message and sender of this messages does not or even
have no way to handle this error. This happens in two cases:
1. when kernel sends something. Kernel never retransmits and cannot
   wait for buffer space.
2. when user sends a broadcast and the message was not delivered
   to some recipients.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-09 16:43:38 -08:00
Jeff Garzik
70c07e0262 Merge branch 'viro' 2006-02-09 14:17:05 -05:00
Kristian Slavov
9908104935 [IPV6]: Address autoconfiguration does not work after device down/up cycle
If you set network interface down and up again, the IPv6 address
autoconfiguration does not work. 'ip addr' shows that the link-local
address is in tentative state. We don't even react to periodical router
advertisements.

During NETDEV_DOWN we clear IF_READY, and we don't set it back in
NETDEV_UP. While starting to perform DAD on the link-local address, we
notice that the device is not in IF_READY, and we abort autoconfiguration
process (which would eventually send router solicitations).

Acked-by: Juha-Matti Tapio <jmtapio@verkkotelakka.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-08 16:13:28 -08:00
Al Viro
e80e28b6b6 [PATCH] net/ipv6/mcast.c NULL noise removal
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-02-07 20:58:56 -05:00
Al Viro
76edc6051e [PATCH] ipv4 NULL noise removal
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-02-07 20:57:37 -05:00
Al Viro
1b8623545b [PATCH] remove bogus asm/bug.h includes.
A bunch of asm/bug.h includes are both not needed (since it will get
pulled anyway) and bogus (since they are done too early).  Removed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-02-07 20:56:35 -05:00
Jeff Garzik
3c9b3a8575 Merge branch 'master' 2006-02-07 01:47:12 -05:00
Linus Torvalds
98bd0c07b6 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2006-02-05 11:10:29 -08:00
Eric Dumazet
88a2a4ac6b [PATCH] percpu data: only iterate over possible CPUs
percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.

As a preparation for changing that, we need to convert various 0 -> NR_CPUS
loops to use for_each_cpu().

(The above only applies to users of asm-generic/percpu.h.  powerpc has gone it
alone and is presently only allocating memory for present CPUs, so it's
currently corrupting memory).

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05 11:06:51 -08:00
Patrick McHardy
7918d212df [NETFILTER]: Fix check whether dst_entry needs to be released after NAT
After DNAT the original dst_entry needs to be released if present
so the packet doesn't skip input routing with its new address. The
current check for DNAT in ip_nat_in is reversed and checks for SNAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:29 -08:00
Patrick McHardy
0047c65a60 [NETFILTER]: Prepare {ipt,ip6t}_policy match for x_tables unification
The IPv4 and IPv6 version of the policy match are identical besides address
comparison and the data structure used for userspace communication. Unify
the data structures to break compatiblity now (before it is released), so
we can port it to x_tables in 2.6.17.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:28 -08:00
Patrick McHardy
878c41ce57 [NETFILTER]: Fix ip6t_policy address matching
Fix two bugs in ip6t_policy address matching:
- misorder arguments to ip6_masked_addrcmp, mask must be the second argument
- inversion incorrectly applied to the entire expression instead of just
  the address comparison

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:27 -08:00
Patrick McHardy
e55f1bc5dc [NETFILTER]: Check policy length in policy match strict mode
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:26 -08:00
Kirill Korotaev
ee4bb818ae [NETFILTER]: Fix possible overflow in netfilters do_replace()
netfilter's do_replace() can overflow on addition within SMP_ALIGN()
and/or on multiplication by NR_CPUS, resulting in a buffer overflow on
the copy_from_user().  In practice, the overflow on addition is
triggerable on all systems, whereas the multiplication one might require
much physical memory to be present due to the check above.  Either is
sufficient to overwrite arbitrary amounts of kernel memory.

I really hate adding the same check to all 4 versions of do_replace(),
but the code is duplicate...

Found by Solar Designer during security audit of OpenVZ.org

Signed-Off-By: Kirill Korotaev <dev@openvz.org>
Signed-Off-By: Solar Designer <solar@openwall.com>
Signed-off-by: Patrck McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:25 -08:00
Samir Bellabes
df4e9574a3 [NETFILTER]: nf_conntrack: fix incorrect memset() size in FTP helper
This memset() is executing with a bad size. According to Yasuyuki Kozakai,
this memset() can be deleted, as 'ftp' is declared in global area.

Signed-off-by: Samir Bellabes <sbellabes@mandriva.com>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:23 -08:00
Patrick McHardy
6f16930078 [NETFILTER]: Fix missing src port initialization in tftp expectation mask
Reported by David Ahern <dahern@avaya.com>, netfilter bugzilla #426.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:21 -08:00
Patrick McHardy
a706124d0a [NETFILTER]: nfnetlink_queue: fix packet marking over netlink
The packet marked is the netlink skb, not the queued skb.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:20 -08:00
Patrick McHardy
ad2ad0f965 [NETFILTER]: Fix undersized skb allocation in ipt_ULOG/ebt_ulog/nfnetlink_log
The skb allocated is always of size nlbufsize, even if that is smaller than
the size needed for the current packet.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:19 -08:00
Holger Eitzenberger
c2db292438 [NETFILTER]: ULOG/nfnetlink_log: Use better default value for 'nlbufsiz'
Performance tests showed that ULOG may fail on heavy loaded systems
because of failed order-N allocations (N >= 1).

The default value of 4096 is not optimal in the sense that it actually
allocates _two_ contigous physical pages.  Reasoning: ULOG uses
alloc_skb(), which adds another ~300 bytes for skb_shared_info.

This patch sets the default value to NLMSG_GOODSIZE and adds some
documentation at the top.

Signed-off-by: Holger Eitzenberger <heitzenberger@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:18 -08:00
Yasuyuki Kozakai
ddc8d029ac [NETFILTER]: nf_conntrack: check address family when finding protocol module
__nf_conntrack_{l3}proto_find() doesn't check the passed protocol family,
then it's possible to touch out of the array which has only AF_MAX items.

Spotted by Pablo Neira Ayuso.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:17 -08:00
Pablo Neira Ayuso
34f9a2e4de [NETFILTER]: ctnetlink: add MODULE_ALIAS for expectation subsystem
Add load-on-demand support for expectation request. eg. conntrack -L expect

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:16 -08:00
Marcus Sundberg
b633ad5fbf [NETFILTER]: ctnetlink: Fix subsystem used for expectation events
The ctnetlink expectation events should use the NFNL_SUBSYS_CTNETLINK_EXP
subsystem, not NFNL_SUBSYS_CTNETLINK.

Signed-off-by: Marcus Sundberg <marcus@ingate.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:15 -08:00
Herbert Xu
fa60cf7f64 [ICMP]: Fix extra dst release when ip_options_echo fails
When two ip_route_output_key lookups in icmp_send were combined I
forgot to change the error path for ip_options_echo to not drop the
dst reference since it now sits before the dst lookup.  To fix it we
simply jump past the ip_rt_put call.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-04 23:51:14 -08:00
Linus Torvalds
d6c8f6aaa1 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2006-02-03 08:33:06 -08:00
Stephen Hemminger
0dec456d1f [NET]: Add CONFIG_NETDEBUG to suppress bad packet messages.
If you are on a hostile network, or are running protocol tests, you can
easily get the logged swamped by messages about bad UDP and ICMP packets.
This turns those messages off unless a config option is enabled.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-02 20:40:09 -08:00
Horms
f00c401b9b [IPV4]: Remove suprious use of goto out: in icmp_reply
This seems to be an artifact of the follwoing commit in February '02.

e7e173af42dbf37b1d946f9ee00219cb3b2bea6a

In a nutshell, goto out and return actually do the same thing,
and both are called in this function. This patch removes out.

Signed-Off-By: Horms <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-02 17:03:18 -08:00
Herbert Xu
6f4b6ec1cf [IPV6]: Fix illegal dst locking in softirq context.
On Tue, Jan 31, 2006 at 10:24:32PM +0100, Ingo Molnar wrote:
>
>  [<c04de9e8>] _write_lock+0x8/0x10
>  [<c0499015>] inet6_destroy_sock+0x25/0x100
>  [<c04b8672>] tcp_v6_destroy_sock+0x12/0x20
>  [<c046bbda>] inet_csk_destroy_sock+0x4a/0x150
>  [<c047625c>] tcp_rcv_state_process+0xd4c/0xdd0
>  [<c047d8e9>] tcp_v4_do_rcv+0xa9/0x340
>  [<c047eabb>] tcp_v4_rcv+0x8eb/0x9d0

OK this is definitely broken.  We should never touch the dst lock in
softirq context.  Since inet6_destroy_sock may be called from that
context due to the asynchronous nature of sockets, we can't take the
lock there.

In fact this sk_dst_reset is totally redundant since all IPv6 sockets
use inet_sock_destruct as their socket destructor which always cleans
up the dst anyway.  So the solution is to simply remove the call.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-02 17:01:13 -08:00
Herbert Xu
f8addb3215 [IPV4] multipath_wrandom: Fix softirq-unsafe spin lock usage
The spin locks in multipath_wrandom may be obtained from either process
context or softirq context depending on whether the packet is locally
or remotely generated.  Therefore we need to disable BH processing when
taking these locks.

This bug was found by Ingo's lock validator.
 
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-02 16:59:16 -08:00
Vlad Yasevich
27852c26ba [SCTP]: Fix 'fast retransmit' to send a TSN only once.
SCTP used to "fast retransmit" a TSN every time we hit the number
of missing reports for the TSN.  However the Implementers Guide
specifies that we should only "fast retransmit" a given TSN once.
Subsequent retransmits should be timeouts only. Also change the
number of missing reports to 3 as per the latest IG(similar to TCP).

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-02 16:57:31 -08:00
Herbert Xu
4641e7a334 [IPV6]: Don't hold extra ref count in ipv6_ifa_notify
Currently the logic in ipv6_ifa_notify is to hold an extra reference
count for addrconf dst's that get added to the routing table.  Thus,
when addrconf dst entries are taken out of the routing table, we need
to drop that dst.  However, addrconf dst entries may be removed from
the routing table by means other than __ipv6_ifa_notify.

So we're faced with the choice of either fixing up all places where
addrconf dst entries are removed, or dropping the extra reference count
altogether.

I chose the latter because the ifp itself always holds a dst reference
count of 1 while it's alive.  This is dropped just before we kfree the
ifp object.  Therefore we know that in __ipv6_ifa_notify we will always
hold that count.

This bug was found by Eric W. Biederman.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-02 16:55:45 -08:00
Stephen Hemminger
42c5e15f18 [NET] snap: needs hardware checksum fix
The SNAP code pops off it's 5 byte header, but doesn't adjust
the checksum. This would cause problems when using device that
does IP over SNAP and hardware receive checksums.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-02-02 16:53:26 -08:00
Trond Myklebust
fba3bad488 SUNRPC: Move upcall out of auth->au_ops->crcreate()
This fixes a bug whereby if two processes try to look up the same auth_gss
 credential, they may end up creating two creds, and triggering two upcalls
 because the upcall is performed before the credential is added to the
 credcache.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-02-01 12:52:25 -05:00
Trond Myklebust
adb12f63e0 SUNRPC: Remove the deprecated function lookup_hash() from rpc_pipefs code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-02-01 12:52:24 -05:00
Trond Myklebust
9842ef3557 SUNRPC: rpc_timeout_upcall_queue should not sleep
The function rpc_timeout_upcall_queue runs from a workqueue, and hence
 sleeping is not recommended. Convert the protection of the upcall queue
 from being mutex-based to being spinlock-based.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-02-01 12:52:24 -05:00
Trond Myklebust
8a3177604b SUNRPC: Fix a lock recursion in the auth_gss downcall
When we look up a new cred in the auth_gss downcall so that we can stuff
 the credcache, we do not want that lookup to queue up an upcall in order
 to initialise it. To do an upcall here not only redundant, but since we
 are already holding the inode->i_mutex, it will trigger a lock recursion.

 This patch allows rpcauth cache searches to indicate that they can cope
 with uninitialised credentials.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-02-01 12:52:23 -05:00
Martin Waitz
99acf04421 [PATCH] DocBook: fix some kernel-doc comments in net/sunrpc
Fix the syntax of some kernel-doc comments

Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:27 -08:00
David S. Miller
0cbd782507 [DCCP] ipv6: dccp_v6_send_response() has a DST leak too.
It was copy&pasted from tcp_v6_send_synack() which has
a DST leak recently fixed by Eric W. Biederman.

So dccp_v6_send_response() needs the same fix too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-31 17:53:37 -08:00
Eric W. Biederman
78b910429e [IPV6] tcp_v6_send_synack: release the destination
This patch fix dst reference counting in tcp_v6_send_synack

Analysis:
Currently tcp_v6_send_synack is never called with a dst entry
so dst always comes in as NULL.

ip6_dst_lookup calls ip6_route_output which calls dst_hold
before it returns the dst entry.   Neither xfrm_lookup
nor tcp_make_synack consume the dst entry so we still have
a dst_entry with a bumped refrence count at the end of
this function.

Therefore we need to call dst_release just before we return
just like tcp_v4_send_synack does.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-31 17:51:44 -08:00
Sam Ravnborg
f9d9516db7 [NET]: Do not export inet_bind_bucket_create twice.
inet_bind_bucket_create was exported twice.  Keep the export in the
file where inet_bind_bucket_create is defined.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-31 17:47:02 -08:00
Stephen Hemminger
3f4cfc2d11 [BRIDGE]: Fix device delete race.
This is a simpler fix for the two races in bridge device removal.
The Xen race of delif and notify is managed now by a new deleted flag.
No need for barriers or other locking because of rtnl mutex.

The del_timer_sync()'s are unnecessary, because br_stp_disable_port
delete's the timers, and they will finish running before RCU callback.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-31 17:44:07 -08:00
Patrick McHardy
5d39a795bf [IPV4]: Always set fl.proto in ip_route_newports
ip_route_newports uses the struct flowi from the struct rtable returned
by ip_route_connect for the new route lookup and just replaces the port
numbers if they have changed. If an IPsec policy exists which doesn't match
port 0 the struct flowi won't have the proto field set and no xfrm lookup
is done for the changed ports.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-31 17:35:35 -08:00
Linus Torvalds
dd1c1853e2 Fix ipv4/igmp.c compile with gcc-4 and IP_MULTICAST
Modern versions of gcc do not like case statements at the end of a block
statement: you need at least an empty statement.  Using just a "break;"
is preferred for visual style.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-31 13:11:41 -08:00
Linus Torvalds
0827f2b698 Merge branch 'upstream-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2006-01-31 10:29:35 -08:00
Larry Finger
2f633db5e9 [PATCH] Add two management functions to ieee80211_rx.c
On my system, I get unhandled management functions corresponding
to IEEE80211_STYPE_REASSOC_REQ and IEEE80211_STYPE_ASSOC_REQ. The
attached patch adds the logic to pass these requests off to a user
stack. The patches to implement these requests in softmac have already
been sent to Johannes Berg.

Signed-Off-By: Larry Finger <Larry.Finger@lwfinger.net>

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-31 10:35:46 -05:00
Baruch Even
2c74088e41 [TCP] H-TCP: Fix accounting
This fixes the accounting in H-TCP, the ccount variable is also
adjusted a few lines above this one.

This line was not supposed to be there and wasn't there in the patches
originally submitted, the four patches submitted were merged to one
and in that merge the bug was introduced.

Signed-Off-By: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-30 20:54:39 -08:00
Dave Jones
c5d90e0004 [IPV4] igmp: remove pointless printk
This is easily triggerable by sending bogus packets,
allowing a malicious user to flood remote logs.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-30 20:27:17 -08:00
Larry Finger
dd5eeb461e [PATCH] ieee80211: common wx auth code
This patch creates two functions ieee80211_wx_set_auth and
ieee80211_wx_get_auth that can be used by drivers for the wireless
extension handlers instead of writing their own, if the implementation
should be software only.

These patches enable using bcm43xx devices with WPA and this seems (as
far as I can tell) to be the only difference between the stock ieee80211
and softmac's ieee80211 left.

Signed-Off-By: Johannes Berg <johannes@sipsolutions.net>

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-30 20:35:35 -05:00
Denis Vlasenko
9eafe76b8a [PATCH] ieee80211: trivial fix for misplaced ()'s
Patch fixes misplaced (). Diffed against wireless-2.6.git

Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-30 20:35:31 -05:00
Adrian Bunk
d86b5e0e6b [PATCH] net/: fix the WIRELESS_EXT abuse
This patch contains the following changes:
- add a CONFIG_WIRELESS_EXT select'ed by NET_RADIO for conditional
  code
- remove the now no longer required #ifdef CONFIG_NET_RADIO from some
  #include's

Based on a patch by Jean Tourrilhes <jt@hpl.hp.com>.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-30 20:35:30 -05:00
Vlad Yasevich
e2c2fc2c8f [SCTP]: heartbeats exceed maximum retransmssion limit
The number of HEARTBEAT chunks that an association may transmit is
limited by Association.Max.Retrans count; however, the code allows
us to send one extra heartbeat.

This patch limits the number of heartbeats to the maximum count.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-30 16:00:40 -08:00
Vlad Yasevich
81845c21dc [SCTP]: correct the number of INIT retransmissions
We currently count the initial INIT/COOKIE_ECHO chunk toward the
retransmit count and thus sends a total of sctp_max_retrans_init chunks.
The correct behavior is to retransmit the chunk sctp_max_retrans_init in
addition to sending the original.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-30 15:59:54 -08:00
John W. Linville
747af1e154 Merge branch 'upstream-fixes' 2006-01-30 17:43:25 -05:00
Larry Finger
1a1fedf4d3 [PATCH] Typo corrections for ieee80211
This patch, generated against 2.6.16-rc1-git4, corrects two typographical
errors in ieee80211_rx.c and adds the facility name to a bare printk.

Signed-Off-By: Larry Finger <Larry.Finger@lwfinger.net>

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-30 17:41:36 -05:00
Zhu Yi
d1b46b0fba [PATCH] ieee80211: Add 802.11h information element parsing
Added default handlers for various 802.11h DFS and TPC information
elements.  Moved all information elements into single location (called
from two places).  Added debug message with information on unparsed IEs
if debug_level set.  Added code to reset network IBSS DFS information
when appropriate.  Added code to invoke driver callback for 802.11h
ACTION STYPE.  Changed a few printk's to IEEE80211_DEBUG_MGMT.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 17:08:07 -05:00
Zhu Yi
15f385982e [PATCH] ieee80211: Add helpers for IBSS DFS handling
To support IEEE 802.11h in IBSS, an ibss_dfs field is added to struct
ieee80211_network. In IBSS, if one STA sends a beacon with DFS info
(for radar detection), all the other STAs should receive and store
this DFS.  All STAs should send the DFS as one of the information
element in the beacon they are scheduled to send (if possible) in
the future.

Since the ibss_dfs has variable length, it must be allocated
dynamically. ieee80211_network_reset() is added to clear the ibss_dfs
field. ieee80211_network_free() is also updated to free the ibss_dfs
field if it is not NULL.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 17:08:07 -05:00
Zhu Yi
b79e20b609 [PATCH] ieee80211: Add 802.11h data type and structures
Add 802.11h data types and structure definitions to ieee80211.h.

Signed-off-by: Hong Liu <hong.liu@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 17:08:07 -05:00
Zhu Yi
9184d9348a [PATCH] ieee80211: Add TKIP crypt->build_iv
This patch adds ieee80211 TKIP build_iv() method to support hardwares
that can do TKIP encryption but relies on ieee80211 layer to build
the IV. It also changes the build_iv() interface to return the key
if possible after the IV is built (this is required by TKIP).

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 17:08:07 -05:00
Zhu Yi
41a25c616b [PATCH] ieee80211: TIM information element parsing
Added partial support of TIM information element parsing

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 17:08:07 -05:00
Zhu Yi
8aa914b747 [PATCH] ieee80211: kmalloc+memset -> kzalloc cleanups
kmalloc+memset -> kzalloc cleanups in ieee80211_crypt_tkip

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 17:08:06 -05:00
Zhu Yi
7bd6436604 [PATCH] ieee80211: Add spectrum management information
Add spectrum management information and use stat.signal to provide
signal level information.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 17:08:06 -05:00
Zhu Yi
d128f6c176 [PATCH] ieee80211: add flags for all geo channels
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 17:08:06 -05:00
Zhu Yi
d652923751 [PATCH] ieee80211: Log if netif_rx() drops the packet
Log to wireless network stats if netif_rx() drops the packet.

(also trailing whitespace and Lindent cleanups as part of patch-apply
process)

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 17:08:06 -05:00
Denis Vlasenko
44d7a8cfbd [PATCH] WEP fields are incorrectly shown to be INSIDE snap in the doc
>If encryption is enabled, each fragment payload size is reduced by enough space
>to add the prefix and postfix (IV and ICV totalling 8 bytes in the case of WEP)
>So if you have 1500 bytes of payload with ieee->fts set to 500 without
>encryption it will take 3 frames.  With WEP it will take 4 frames as the
>payload of each frame is reduced to 492 bytes.

Text is correct, but in picture (IV,payload,ICV) sits inside SNAP.
Patch corrects this.

Signed-Off-By: Denis Vlasenko <vda@ilport.com.ua>
Acked-By: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 17:08:06 -05:00
Zhu Yi
55cd94aa1d [PATCH] ieee80211: Fix iwlist scan can only show about 20 APs
Limit the amount of output given to iwlist scan.

Signed-off-by: Hong Liu <hong.liu@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 16:49:58 -05:00
Zhu Yi
b6daa25d65 [PATCH] ieee80211: Fix problem with not decrypting broadcast packets
The code for pulling the key to use for decrypt was correctly using
the host_mc_decrypt flag.  The code that actually decrypted,
however, was based on host_decrypt.  This patch changes this
behavior.

Signed-off-by: Etay Bogner <etay.bogner@gmail.com>
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2006-01-27 16:49:58 -05:00
David L Stevens
7add2a4398 [IPV6] MLDv2: fix change records when transitioning to/from inactive
The following patch fixes these problems in MLDv2:

1) Add/remove "delete" records for sending change reports when
        addition of a filter results in that filter transitioning to/from
        inactive. [same as recent IPv4 IGMPv3 fix]
2) Remove 2 redundant "group_type" checks (can't be IPV6_ADDR_ANY
        within that loop, so checks are always true)
3) change an is_in() "return 0" to "return type == MLD2_MODE_IS_INCLUDE".
        It should always be "0" to get here, but it improves code locality 
        to not assume it, and if some race allowed otherwise, doing
        the check would return the correct result.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-24 13:06:39 -08:00
Jerome Borsboom
151bb0ffe5 [AF_KEY]: no message type set
When returning a message to userspace in reply to a SADB_FLUSH or 
SADB_X_SPDFLUSH message, the type was not set for the returned PFKEY 
message. The patch below corrects this problem.

Signed-off-by: Jerome Borsboom <j.borsboom@erasmusmc.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-24 12:57:19 -08:00
Thomas Graf
cabcac0b29 [BONDING]: Remove CAP_NET_ADMIN requirement for INFOQUERY ioctl
This information is already available via /proc/net/bonding/*
therefore it doesn't make sense to require CAP_NET_ADMIN
privileges.

Original patch by Laurent Deniel <laurent.deniel@free.fr>

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-24 12:46:33 -08:00
Herbert Xu
8798b3fb71 [NET]: Fix skb fclone error path handling.
On the error path if we allocated an fclone then we will free it in
the wrong pool.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-23 16:32:45 -08:00
Kris Katterjohn
8ae55f0489 [NET]: Fix some whitespace issues in af_packet.c
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-23 16:28:02 -08:00
Kris Katterjohn
2966b66c25 [NET]: more whitespace issues in net/core/filter.c
This fixes some whitespace issues in net/core/filter.c

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-23 16:26:16 -08:00
David S. Miller
cf9e50a920 Merge master.kernel.org:/pub/scm/linux/kernel/git/sridhar/lksctp-2.6 2006-01-19 16:53:02 -08:00
Alan Cox
715b49ef2d [PATCH] EDAC: atomic scrub operations
EDAC requires a way to scrub memory if an ECC error is found and the chipset
does not do the work automatically.  That means rewriting memory locations
atomically with respect to all CPUs _and_ bus masters.  That means we can't
use atomic_add(foo, 0) as it gets optimised for non-SMP

This adds a function to include/asm-foo/atomic.h for the platforms currently
supported which implements a scrub of a mapped block.

It also adjusts a few other files include order where atomic.h is included
before types.h as this now causes an error as atomic_scrub uses u32.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:30 -08:00
J. Bruce Fields
5fb8b49e29 [PATCH] svcrpc: gss: svc context creation error handling
Allow mechanisms to return more varied errors on the context creation
downcall.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:25 -08:00
Kevin Coffman
91a4762e0a [PATCH] svcrpc: gss: server context init failure handling
We require the server's gssd to create a completed context before asking the
kernel to send a final context init reply.  However, gssd could be buggy, or
under some bizarre circumstances we might purge the context from our cache
before we get the chance to use it here.

Handle this case by returning GSS_S_NO_CONTEXT to the client.

Also move the relevant code here to a separate function rather than nesting
excessively.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:25 -08:00
Andy Adamson
822f1005ae [PATCH] svcrpc: gss: handle the GSS_S_CONTINUE
Kerberos context initiation is handled in a single round trip, but other
mechanisms (including spkm3) may require more, so we need to handle the
GSS_S_CONTINUE case in svcauth_gss_accept.  Send a null verifier.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:25 -08:00
J. Bruce Fields
1918e34138 [PATCH] svcrpc: save and restore the daddr field when request deferred
The server code currently keeps track of the destination address on every
request so that it can reply using the same address.  However we forget to do
that in the case of a deferred request.  Remedy this oversight.  >From folks
at PolyServe.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18 19:20:24 -08:00
David S. Miller
27a7b0415f Merge git://tipc.cslab.ericsson.net/pub/git/tipc 2006-01-18 14:23:54 -08:00
David L Stevens
ad12583f46 [IPV4]: Fix multiple bugs in IGMPv3
1) fix "mld_marksources()" to
        a) send nothing when all queried sources are excluded
        b) send full exclude report when source queried sources are
                not excluded
        c) don't schedule a timer when there's nothing to report

2) fix "add_grec()" to send empty-source records when it should
        The original check doesn't account for a non-empty source
        list with all sources inactive; the new code keeps that
        short-circuit case, and also generates the group header
        with an empty list if needed.

3) fix mca_crcount decrement to be after add_grec(), which needs
        its original value

4) add/remove delete records and prevent current advertisements
        when an exclude-mode filter moves from "active" to "inactive"
        or vice versa based on new filter additions.

        Items 1-3 are just IPv4 versions of the IPv6 bugs found
by Yan Zheng and fixed earlier. Item #4 is a related bug that
affects exclude-mode change records only (but not queries) and
also occurs in IPv6 (IPv6 version coming soon).

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18 14:20:56 -08:00
David S. Miller
7ac5459ec0 [PKTGEN]: Respect hard_header_len of device.
Don't assume 16.

Found by Ben Greear.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-18 14:19:10 -08:00
Andrew Morton
dbd2915ce8 [IPV4]: RT_CACHE_STAT_INC() warning fix
BUG: using smp_processor_id() in preemptible [00000001] code: rpc.statd/2408

And it _is_ a bug, but I guess we don't care enough to add preempt_disable().

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 22:46:49 -08:00
Per Liden
4323add677 [TIPC] Avoid polluting the global namespace
This patch adds a tipc_ prefix to all externally visible symbols.

Signed-off-by: Per Liden <per.liden@ericsson.com>
2006-01-18 00:45:16 +01:00
Per Liden
1e63e681e0 [TIPC] Group protocols with sub-options in Kconfig
This is just a cosmetic change that moves the TIPC configuration
entry next to the other protocols that also have sub-options.
Makes the the networking options menu look a bit better.

Signed-off-by: Per Liden <per.liden@ericsson.com>
2006-01-18 00:45:15 +01:00
Per Liden
c11ac3f236 [TIPC] Add help text for TIPC configuration option
Signed-off-by: Per Liden <per.liden@ericsson.com>
2006-01-18 00:45:15 +01:00
Per Liden
50f9bcddf8 [TIPC] Remove unused #includes
Signed-off-by: Per Liden <per.liden@ericsson.com>
2006-01-18 00:45:15 +01:00
Per Liden
33a9c4da5a [TIPC] Move ethernet protocol id to linux/if_ether.h
Signed-off-by: Per Liden <per.liden@ericsson.com>
2006-01-18 00:45:15 +01:00
Per Liden
16cb4b333c [TIPC] Updated link priority macros
Added macros for min/default/max link priority in tipc_config.h.
Also renamed TIPC_NUM_LINK_PRI to TIPC_MEDIA_LINK_PRI since that
is a more accurate description of what it is used for.

Signed-off-by: Per Liden <per.liden@ericsson.com>
2006-01-18 00:45:15 +01:00
Jon Maloy
5f7c3ff6a2 [TIPC] Minor changes to #includes
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
2006-01-18 00:45:14 +01:00
Kris Katterjohn
3860288ee8 [NET]: Use is_zero_ether_addr() in net/core/netpoll.c
This replaces a memcmp() with is_zero_ether_addr().

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 15:15:38 -08:00
Kris Katterjohn
f404e9a67f [PKTGEN]: Replacing with (compare|is_zero)_ether_addr() and ETH_ALEN
This replaces some tests with is_zero_ether_addr(), memcmp(one, two,
6) with compare_ether_addr(one, two), and 6 with ETH_ALEN where
appropriate.

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 13:04:57 -08:00
Kris Katterjohn
a8fc3d8dec [NET]: "signed long" -> "long"
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 13:03:54 -08:00
Patrick McHardy
ab67a4d511 [EBTABLES]: Handle SCTP/DCCP in ebt_{ip,log}
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 13:01:31 -08:00
Patrick McHardy
ae82af54d7 [PKT_SCHED]: Handle SCTP/DCCP in sfq_hash
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 13:01:06 -08:00
Tsutomu Fujii
a7d1f1b66c [SCTP]: Fix sctp_rcv_ootb() to handle the last chunk of a packet correctly.
Signed-off-by: Tsutomu Fujii <t-fujii@nb.jp.nec.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2006-01-17 11:57:09 -08:00
Sridhar Samudrala
c4d2444e99 [SCTP]: Fix couple of races between sctp_peeloff() and sctp_rcv().
Validate and update the sk in sctp_rcv() to avoid the race where an
assoc/ep could move to a different socket after we get the sk, but before
the skb is added to the backlog.

Also migrate the skb's in backlog queue to new sk when doing a peeloff.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2006-01-17 11:56:26 -08:00
Vlad Yasevich
313e7b4d25 [SCTP]: Fix machine check/connection hang on IA64.
sctp_unpack_cookie used an on-stack array called digest as a result/out
parameter in the call to crypto_hmac. However, hmac code
(crypto_hmac_final)
assumes that the 'out' argument is in virtual memory (identity mapped
region)
and can use virt_to_page call on it.  This does not work with the on-stack
declared digest.  The problems observed so far have been:
 a) incorrect hmac digest
 b) machine check and hardware reset.

Solution is to define the digest in an identity mapped region by
kmalloc'ing
it.  We can do this once as part of the endpoint structure and re-use it
when
verifying the SCTP cookie.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2006-01-17 11:55:57 -08:00
Vlad Yasevich
8116ffad41 [SCTP]: Fix bad sysctl formatting of SCTP timeout values on 64-bit m/cs.
Change all the structure members that hold jiffies to be of type
unsigned long.  This also corrects bad sysctl formating on 64 bit
architectures.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2006-01-17 11:55:17 -08:00
Vlad Yasevich
38b0e42aba [SCTP]: Fix sctp_assoc_seq_show() panics on big-endian systems.
This patch corrects the panic by casting the argument to the
pointer of correct size.  On big-endian systems we ended up loading
only 32 bits of data because we are treating the pointer as an int*.
By treating this pointer as loff_t*, we'll load the full 64 bits
and then let regular integer demotion take place which will give us
the correct value.

Signed-off-by: Vlad Yaseivch <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2006-01-17 11:54:06 -08:00
Vlad Yasevich
49392e5ecf [SCTP]: sctp doesn't show all associations/endpoints in /proc
When creating a very large number of associations (and endpoints),
/proc/assocs and /proc/eps will not show all of them.  As a result
netstat will not show all of the either.  This is particularly evident
when creating 1000+ associations (or endpoints).  As an example with
1500 tcp style associations over loopback, netstat showed 1420 on my
system instead of 3000.

The reason for this is that the seq_operations start method is invoked
multiple times bacause of the amount of data that is provided.  The
start method always increments the position parameter and since we use
the position as the hash bucket id, we end up skipping hash buckets.

This patch corrects this situation and get's rid of the silly hash-1
decrement.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2006-01-17 11:53:06 -08:00
Vlad Yasevich
9834a2bb49 [SCTP]: Fix sctp_cookie alignment in the packet.
On 64 bit architectures, sctp_cookie sent as part of INIT-ACK is not
aligned on a 64 bit boundry and thus causes unaligned access exceptions.

The layout of the cookie prameter is this:
|<----- Parameter Header --------------------|<--- Cookie DATA --------
-----------------------------------------------------------------------
| param type (16 bits) | param len (16 bits) | sig [32 bytes] | cookie..
-----------------------------------------------------------------------

The cookie data portion contains 64 bit values on 64 bit architechtures
(timeval) that fall on a 32 bit alignment boundry when used as part of
the on-wire format, but align correctly when used in internal
structures.  This patch explicitely pads the on-wire format so that
it is properly aligned.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2006-01-17 11:52:12 -08:00
Sridhar Samudrala
7a48f923b8 [SCTP]: Fix potential race condition between sctp_close() and sctp_rcv().
Do not release the reference to association/endpoint if an incoming skb is
added to backlog. Instead release it after the chunk is processed in
sctp_backlog_rcv().

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2006-01-17 11:51:28 -08:00
Eric Dumazet
2f970d8357 [IPV4]: rt_cache_stat can be statically defined
Using __get_cpu_var(obj) is slightly faster than per_cpu_ptr(obj, 
raw_smp_processor_id()).

1) Smaller code and memory use
For static and small objects, DEFINE_PER_CPU(type, object) is preferred over a 
alloc_percpu() : Better and smaller code to access them, and no extra memory 
(storing the pointer, and the percpu array of pointers)

x86_64 code before patch

mov    1237577(%rip),%rax        # ffffffff803e5990 <rt_cache_stat>
not    %rax  # part of per_cpu machinery
mov    %gs:0x3c,%edx # get cpu number
movslq %edx,%rdx # extend 32 bits cpu number to 64 bits
mov    (%rax,%rdx,8),%rax # get the pointer for this cpu
incl   0x38(%rax)

x86_64 code after patch

mov    $per_cpu__rt_cache_stat,%rdx
mov    %gs:0x48,%rax # get percpu data offset
incl   0x38(%rax,%rdx,1)

2) False sharing avoidance for SMP :
For a small NR_CPUS, the array of per cpu pointers allocated in alloc_percpu() 
can be <= 32 bytes. This let slab code gives a part of a cache line. If the 
other part of this 64 bytes (or 128 bytes) cache line is used by a mostly 
written object, we can have false sharing and expensive per_cpu_ptr() operations.

Size of rt_cache_stat is 64 bytes, so this patch is not a danger of a too big 
increase of bss (in UP mode) or static per_cpu data for SMP 
(PERCPU_ENOUGH_ROOM is currently 32768 bytes)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 02:54:36 -08:00
David S. Miller
f09484ff87 [NETFILTER]: ip_conntrack_proto_gre.c needs linux/interrupt.h
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 02:42:02 -08:00
Yasuyuki Kozakai
f0daaa654a [NETFILTER] ip6tables: whitespace and indent cosmetic cleanup
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 02:39:39 -08:00
Yasuyuki Kozakai
6dd42af790 [NETFILTER] Makefile cleanup
These are replaced with x_tables matches and no longer exist.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 02:38:56 -08:00
Benoit Boissinot
ccc91324a1 [NETFILTER] ip[6]t_policy: Fix compilation warnings
ip[6]t_policy argument conversion slipped when merging with x_tables

Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 02:26:34 -08:00
Kris Katterjohn
e35bedf369 [NET]: Fix whitespace issues in net/core/filter.c
This fixes some whitespace issues in net/core/filter.c

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 02:25:52 -08:00
Amnon Aaronsohn
dd914b4082 [PKT_SCHED] sch_prio: fix qdisc bands init
Currently when PRIO is configured to use N bands, it lets the packets be
directed to any of the bands 0..N-1. However, PRIO attaches a fifo qdisc
only to the bands that appear in the priomap; the rest of the N bands
remain with a noop qdisc attached. This patch changes PRIO's behavior so
that it attaches a fifo qdisc to all of the N bands.

Signed-off-by: Amnon Aaronsohn <bla@cs.huji.ac.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 02:24:26 -08:00
YOSHIFUJI Hideaki
9343e79a7b [IPV6]: Preserve procfs IPV6 address output format
Procfs always output IPV6 addresses without the colon
characters, and we cannot change that.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-17 02:10:53 -08:00
Linus Torvalds
caf5b04c82 x86: Work around compiler code generation bug with -Os
Some versions of gcc generate incorrect code for the inet_check_attr()
function, apparently due to a totally bogus index -> pointer comparison
transformation.

At least "gcc version 4.0.1 20050727 (Red Hat 4.0.1-5)" from FC4 is
affected, possibly others too.

This changes the function subtly so that the buggy gcc transformation
doesn't trigger.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-14 22:08:28 -08:00
Arjan van de Ven
858119e159 [PATCH] Unlinline a bunch of other functions
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-14 18:27:06 -08:00
David S. Miller
37d8dc82e0 [NETFILTER] x-tables: Missing linux/ipv6.h includes.
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-13 16:19:44 -08:00
Patrick McHardy
dca80b962a [PKT_SCHED]: Change default clock source to gettimeofday
The default of using jiffies is very bad and results in
underutilization except with very low bandwidth.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-13 14:36:55 -08:00
Patrick McHardy
ee51b1b6ce [XFRM]: IPsec tunnel wildcard address support
When the source address of a tunnel is given as 0.0.0.0 do a routing lookup
to get the real source address for the destination and fill that into the
acquire message. This allows to specify policies like this:

spdadd 172.16.128.13/32 172.16.0.0/20 any -P out ipsec
        esp/tunnel/0.0.0.0-x.x.x.x/require;
spdadd 172.16.0.0/20 172.16.128.13/32 any -P in ipsec
        esp/tunnel/x.x.x.x-0.0.0.0/require;

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-13 14:34:36 -08:00
Kris Katterjohn
7b11f69fb5 [NET]: Clean up comments for sk_chk_filter()
This removes redundant comments, and moves one comment to a better
location.

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-13 14:33:06 -08:00
Joe Perches
46b86a2da0 [NET]: Use NIP6_FMT in kernel.h
There are errors and inconsistency in the display of NIP6 strings.
	ie: net/ipv6/ip6_flowlabel.c

There are errors and inconsistency in the display of NIPQUAD strings too.
	ie: net/netfilter/nf_conntrack_ftp.c

This patch:
	adds NIP6_FMT to kernel.h
	changes all code to use NIP6_FMT
	fixes net/ipv6/ip6_flowlabel.c
	adds NIPQUAD_FMT to kernel.h
	fixes net/netfilter/nf_conntrack_ftp.c
	changes a few uses of "%u.%u.%u.%u" to NIPQUAD_FMT for symmetry to NIP6_FMT

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-13 14:29:07 -08:00
Per Liden
23b0ca5bf5 [PATCH] genetlink: don't touch module ref count
Increasing the module ref count at registration will block the module from
ever being unloaded. In fact, genetlink should not care about the owner at
all. This patch removes the owner field from the struct registered with
genetlink.

Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-13 13:06:40 -08:00
Harald Welte
2e4e6a17af [NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables
This monster-patch tries to do the best job for unifying the data
structures and backend interfaces for the three evil clones ip_tables,
ip6_tables and arp_tables.  In an ideal world we would never have
allowed this kind of copy+paste programming... but well, our world
isn't (yet?) ideal.

o introduce a new x_tables module
o {ip,arp,ip6}_tables depend on this x_tables module
o registration functions for tables, matches and targets are only
  wrappers around x_tables provided functions
o all matches/targets that are used from ip_tables and ip6_tables
  are now implemented as xt_FOOBAR.c files and provide module aliases
  to ipt_FOOBAR and ip6t_FOOBAR
o header files for xt_matches are in include/linux/netfilter/,
  include/linux/netfilter_{ipv4,ipv6} contains compatibility wrappers
  around the xt_FOOBAR.h headers

Based on this patchset we're going to further unify the code,
gradually getting rid of all the layer 3 specific assumptions.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-12 14:06:43 -08:00
David S. Miller
880b005f29 [TIPC]: Fix 64-bit build warnings.
When storing u32 values in a pointer, need to do
some long casts to keep GCC happy.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-12 14:06:41 -08:00
Per Liden
593a5f22d8 [TIPC] More updates of file headers
Updated copyright notice to include the year the file was
actually created. Information about file creation dates
was extracted from the files in the old CVS repository
at tipc.sourceforge.net.

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
2006-01-12 14:06:39 -08:00
Per Liden
9da1c8b694 [TIPC] Update of file headers
The copyright statements from different parts of Ericsson
have been merged into one.

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
2006-01-12 14:06:38 -08:00
Per Liden
d0a14a9dbd [TIPC] Cleaned up info/warn/err macros
Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
2006-01-12 14:06:37 -08:00
Per Liden
9ea1fd3c1a [TIPC] License header update
The license header in each file now more clearly state that this
code is licensed under a dual BSD/GPL. Before this was only
evident if you looked at the MODULE_LICENSE line in core.c.

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
2006-01-12 14:06:36 -08:00
Per Liden
ea714ccda5 [TIPC] Moved configuration interface into tipc_config.h
Restored the old tipc_config.h to get a cleaner division between the
interfaces used by normal TIPC users and TIPC administration utilities.

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
2006-01-12 14:06:35 -08:00
Jon Maloy
b70e4f45a8 [TIPC} Fixed bug in disc_timeout()
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
2006-01-12 14:06:33 -08:00
Per Liden
1dba974333 [TIPC] Use dynamically allocated family id with NETLINK_GENERIC
Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
2006-01-12 14:06:32 -08:00
Per Liden
b97bf3fd8f [TIPC] Initial merge
TIPC (Transparent Inter Process Communication) is a protocol designed for
intra cluster communication. For more information see
http://tipc.sourceforge.net

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
2006-01-12 14:06:31 -08:00
Randy Dunlap
4fc268d24c [PATCH] capable/capability.h (net/)
net: Use <linux/capability.h> where capable() is used.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11 18:42:14 -08:00
Adrian Bunk
bb7e8c5a55 [PKT_SCHED] net/sched/Kconfig: fix typo in NET_EMATCH_META description
Noted by Matt LaPlante <webmaster@cyberdogtech.com>.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-11 16:40:30 -08:00
Evgeniy Polyakov
54608b7099 [PKT_SCHED] ematch: Remove bogus include.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-11 16:32:16 -08:00
Evgeniy Polyakov
c3f343e4d7 [NET]: Fix diverter build.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-11 16:32:15 -08:00
Kris Katterjohn
8b3a70058b [NET]: Remove more unneeded typecasts on *malloc()
This removes more unneeded casts on the return value for kmalloc(),
sock_kmalloc(), and vmalloc().

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-11 16:32:14 -08:00
David Woodhouse
ae0f7d5f83 [IPV6]: Avoid calling ip6_xmit() with NULL sk
The ip6_xmit() function now assumes that its sk argument is non-NULL,
which isn't currently true when TCPv6 code is sending RST or ACK
packets. This fixes that code to use a socket of its own for sending
such packets, as TCPv4 does. (Thanks Andi for the pointer).

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-11 16:32:13 -08:00
David S. Miller
a776809755 [NETFILTER]: ip_ct_proto_gre_fini() cannot be __exit
It is invoked from failures paths of __init code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-11 16:32:12 -08:00
David S. Miller
82bf7e97ac [NET]: Some more missing include/etherdevice.h includes
For compare_ether_addr()

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-11 16:32:11 -08:00
David S. Miller
5bf887f2ff [IPV6]: Fix modular build with netfilter enabled.
Also, drop __exit marker from ipv6_netfilter_fini() as this
can be invoked from inet6_init() error handling paths.

Based upon a report from Stephen Hemminger.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10 21:02:21 -08:00
Linus Torvalds
9819d85c21 Fix net/core/wireless.c link failure
It needs <linux/etherdevice.h> for compare_ether_addr()
2006-01-10 19:35:19 -08:00
Nicolas Kaiser
b8ab50bc55 netfilter: headers included twice
Headers included twice.

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-01-11 02:04:35 +01:00
Bart De Schuymer
8a4c8a96a4 [EBTABLES] Don't match tcp/udp source/destination port for IP fragments
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10 13:12:22 -08:00
Jesper Juhl
12fe2c588d [NET]: Remove unneeded kmalloc() return value casts
Get rid of needless casting of kmalloc() return value in net/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10 13:08:21 -08:00
Jesper Juhl
ea2e90dfce [RXRPC]: Decrease number of pointer derefs in connection.c
Decrease the number of pointer derefs in net/rxrpc/connection.c

Benefits of the patch:
 - Fewer pointer dereferences should make the code slightly faster.
 - Size of generated code is smaller
 - improved readability

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10 13:07:44 -08:00
Martin Murray
ad8e4b75c8 [AF_NETLINK]: Fix DoS in netlink_rcv_skb()
From: Martin Murray <murrayma@citi.umich.edu>

Sanity check nlmsg_len during netlink_rcv_skb.  An nlmsg_len == 0 can
cause infinite loop in kernel, effectively DoSing machine.  Noted by
Matin Murray.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10 13:02:29 -08:00
Patrick McHardy
babbdb1a18 [NETFILTER]: Fix timeout sysctls on big-endian 64bit architectures
The connection tracking timeout variables are unsigned long, but
proc_dointvec_jiffies is used with sizeof(unsigned int) in the sysctl
tables. Since there is no proc_doulongvec_jiffies function, change the
timeout variables to unsigned int.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10 12:54:35 -08:00
Patrick McHardy
9d28026b7e [NETFILTER]: Remove unused function from NAT protocol helpers
->print and ->print_range are not used (and apparently never were).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10 12:54:34 -08:00
Patrick McHardy
c07bc1ffbd [NETFILTER]: Fix return value confusion in PPTP NAT helper
ip_nat_mangle_tcp_packet doesn't return NF_* values but 0/1 for
failure/success.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10 12:54:33 -08:00
Patrick McHardy
03b9feca89 [NETFILTER]: Fix another crash in ip_nat_pptp
The PPTP NAT helper calculates the offset at which the packet needs
to be mangled as difference between two pointers to the header. With
non-linear skbs however the pointers may point to two seperate buffers
on the stack and the calculation results in a wrong offset beeing
used.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10 12:54:32 -08:00
Patrick McHardy
15db34702c [NETFILTER]: Fix crash in ip_nat_pptp
When an inbound PPTP_IN_CALL_REQUEST packet is received the
PPTP NAT helper uses a NULL pointer in pointer arithmentic to
calculate the offset in the packet which needs to be mangled
and corrupts random memory or crashes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10 12:54:30 -08:00
Patrick McHardy
bb94aa169e [NETFILTER]: net/ipv[46]/netfilter.c cleanups
Don't wrap entire file in #ifdef CONFIG_NETFILTER, remove a few
unneccessary includes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10 12:54:29 -08:00
Kris Katterjohn
d3f4a687f6 [NET]: Change memcmp(,,ETH_ALEN) to compare_ether_addr()
This changes some memcmp(one,two,ETH_ALEN) to compare_ether_addr(one,two).

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10 12:54:28 -08:00
Linus Torvalds
4f47707b05 Fix rpc shutdown event condition bug
We want to wait for the cl_users to go down to zero, not for it to stay
positive.  Quoth Trond (who wasn't even the author, but acked the wrong
version): "Argh! I need to increase my daily caffeine dosages."

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:56:39 -08:00
Alan Cox
33f0f88f1c [PATCH] TTY layer buffering revamp
The API and code have been through various bits of initial review by
serial driver people but they definitely need to live somewhere for a
while so the unconverted drivers can get knocked into shape, existing
drivers that have been updated can be better tuned and bugs whacked out.

This replaces the tty flip buffers with kmalloc objects in rings. In the
normal situation for an IRQ driven serial port at typical speeds the
behaviour is pretty much the same, two buffers end up allocated and the
kernel cycles between them as before.

When there are delays or at high speed we now behave far better as the
buffer pool can grow a bit rather than lose characters. This also means
that we can operate at higher speeds reliably.

For drivers that receive characters in blocks (DMA based, USB and
especially virtualisation) the layer allows a lot of driver specific
code that works around the tty layer with private secondary queues to be
removed. The IBM folks need this sort of layer, the smart serial port
people do, the virtualisers do (because a virtualised tty typically
operates at infinite speed rather than emulating 9600 baud).

Finally many drivers had invalid and unsafe attempts to avoid buffer
overflows by directly invoking tty methods extracted out of the innards
of work queue structs. These are no longer needed and all go away. That
fixes various random hangs with serial ports on overflow.

The other change in here is to optimise the receive_room path that is
used by some callers. It turns out that only one ldisc uses receive room
except asa constant and it updates it far far less than the value is
read. We thus make it a variable not a function call.

I expect the code to contain bugs due to the size alone but I'll be
watching and squashing them and feeding out new patches as it goes.

Because the buffers now dynamically expand you should only run out of
buffering when the kernel runs out of memory for real.  That means a lot of
the horrible hacks high performance drivers used to do just aren't needed any
more.

Description:

tty_insert_flip_char is an old API and continues to work as before, as does
tty_flip_buffer_push() [this is why many drivers dont need modification].  It
does now also return the number of chars inserted

There are also

tty_buffer_request_room(tty, len)

which asks for a buffer block of the length requested and returns the space
found.  This improves efficiency with hardware that knows how much to
transfer.

and tty_insert_flip_string_flags(tty, str, flags, len)

to insert a string of characters and flags

For a smart interface the usual code is

    len = tty_request_buffer_room(tty, amount_hardware_says);
    tty_insert_flip_string(tty, buffer_from_card, len);

More description!

At the moment tty buffers are attached directly to the tty.  This is causing a
lot of the problems related to tty layer locking, also problems at high speed
and also with bursty data (such as occurs in virtualised environments)

I'm working on ripping out the flip buffers and replacing them with a pool of
dynamically allocated buffers.  This allows both for old style "byte I/O"
devices and also helps virtualisation and smart devices where large blocks of
data suddenely materialise and need storing.

So far so good.  Lots of drivers reference tty->flip.*.  Several of them also
call directly and unsafely into function pointers it provides.  This will all
break.  Most drivers can use tty_insert_flip_char which can be kept as an API
but others need more.

At the moment I've added the following interfaces, if people think more will
be needed now is a good time to say

 int tty_buffer_request_room(tty, size)

Try and ensure at least size bytes are available, returns actual room (may be
zero).  At the moment it just uses the flipbuf space but that will change.
Repeated calls without characters being added are not cumulative.  (ie if you
call it with 1, 1, 1, and then 4 you'll have four characters of space.  The
other functions will also try and grow buffers in future but this will be a
more efficient way when you know block sizes.

 int tty_insert_flip_char(tty, ch, flag)

As before insert a character if there is room.  Now returns 1 for success, 0
for failure.

 int tty_insert_flip_string(tty, str, len)

Insert a block of non error characters.  Returns the number inserted.

 int tty_prepare_flip_string(tty, strptr, len)

Adjust the buffer to allow len characters to be added.  Returns a buffer
pointer in strptr and the length available.  This allows for hardware that
needs to use functions like insl or mencpy_fromio.

Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:59 -08:00
Ingo Molnar
532347e2bb [PATCH] nfs: sleep_on() removal
Convert sleep_on() to wait_event_timeout().  Probably safe with the BKL but
could be racy once BKL use in NFS-client is gone.

Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:42 -08:00
Andrey Borzenkov
6dd214b554 [PATCH] fix /sys/class/net/<if>/wireless without dev->get_wireless_stats
dev->get_wireless_stats is deprecated but removing it also removes wireless
subdirectory in sysfs. This patch puts it back.

akpm: I don't know what's happening here.  This might be appropriate as a
2.6.15.x compatibility backport.  Waiting to hear from Jeff.

Signed-off-by: Andrey Borzenkov <arvidjaar@mail.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10 08:01:24 -08:00
Linus Torvalds
80c0531514 Merge master.kernel.org:/pub/scm/linux/kernel/git/mingo/mutex-2.6 2006-01-09 17:31:38 -08:00
Linus Torvalds
a457aa6c2b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2006-01-09 17:06:53 -08:00
Jes Sorensen
1b1dcc1b57 [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem
This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.

Modified-by: Ingo Molnar <mingo@elte.hu>

(finished the conversion)

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2006-01-09 15:59:24 -08:00
Adrian Bunk
93b1fae491 spelling: s/trough/through/
Additionally, one comment was reformulated by Joe Perches <joe@perches.com>.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-01-10 00:13:33 +01:00
Arnaldo Carvalho de Melo
dff2c03534 [INET_DIAG]: Introduce sk_diag_fill
To be called from inet_diag_get_exact, also rename inet_diag_fill to
inet_csk_diag_fill, for consistency with inet_twsk_diag_fill.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:56:56 -08:00
Arnaldo Carvalho de Melo
c7d58aabdc [INET_DIAG]: Introduce inet_twsk_diag_dump & inet_twsk_diag_fill
To properly dump TIME_WAIT sockets and to reduce complexity a bit by
having per socket class accessor routines.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:56:38 -08:00
Arnaldo Carvalho de Melo
4e852c0279 [INET_DIAG]: whitespace/simple cleanups
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:56:19 -08:00
Arnaldo Carvalho de Melo
7dbf075524 [INET_DIAG]: Use inet_twsk() with TIME_WAIT sockets
The fields being accessed in inet_diag_dump are outside sock_common, the
common part of struct sock and struct inet_timewait_sock.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:56:03 -08:00
Patrick McHardy
a2c2064f7f [IPV6]: Set skb->priority in ip6_output.c
Set skb->priority = sk->sk_priority as in raw.c and IPv4.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:31 -08:00
Patrick McHardy
cfacb0577e [IPV4]: ip_output.c needs xfrm.h
This patch fixes a warning from my IPsec patches:

   CC      net/ipv4/ip_output.o
net/ipv4/ip_output.c: In function 'ip_finish_output':
net/ipv4/ip_output.c:208: warning: implicit declaration of function
'xfrm4_output_finish'

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:28 -08:00
Jamal Hadi Salim
29f1df6cc1 [PKT_SCHED]: Fix qdisc return code.
The mapping between TC_ACTION_SHOT and the qdisc return codes is better
suited to NET_XMIT_BYPASS so as not to confuse TCP

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:26 -08:00
Kris Katterjohn
09a626600b [NET]: Change some "if (x) BUG();" to "BUG_ON(x);"
This changes some simple "if (x) BUG();" statements to "BUG_ON(x);"

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:18 -08:00
Patrick McHardy
4bba392592 [PKT_SCHED]: Prefix tc actions with act_
Clean up the net/sched directory a bit by prefix all actions with act_.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:14 -08:00
Patrick McHardy
541673c859 [PKT_SCHED]: Fix memory leak when dumping in pedit action
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:12 -08:00
Patrick McHardy
31bd06eb33 [PKT_SCHED]: Remove some obsolete policer exports
Also make sure the legacy code is only built when CONFIG_NET_CLS_ACT
is not set.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:10 -08:00
Patrick McHardy
f43c5a0df3 [PKT_SCHED]: Convert tc action functions to single skb pointers
tcf_action_exec only gets a single skb pointer and doesn't own the skb,
but passes double skb pointers (to a local variable) to the action
functions. Change to use single skb pointers everywhere.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:08 -08:00
Patrick McHardy
538e43a4bd [PKT_SCHED]: Use USEC_PER_SEC
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:05 -08:00
Patrick McHardy
2941a48631 [NET]: Convert net/{ipv4,ipv6,sched} to netdev_priv
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-09 14:16:03 -08:00
Linus Torvalds
cf10b2853f Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2006-01-09 09:39:05 -08:00
Kirill Korotaev
14591de147 [PATCH] netlink oops fix due to incorrect error code
Fixed oops after failed netlink socket creation.

Wrong parathenses in if() statement caused err to be 1,
instead of negative value.

Trivial fix, not trivial to find though.

Signed-Off-By: Dmitry Mishin <dim@sw.ru>
Signed-Off-By: Kirill Korotaev <dev@openvz.org>
Signed-Off-By: Linus Torvalds <torvalds@osdl.org>
2006-01-09 09:36:52 -08:00
Johannes Berg
a4bf26f30e [PATCH] ieee80211: enable hw wep where host has to build IV
This patch fixes some of the ieee80211 crypto related code so that
instead of having the host fully do crypto operations, the host_build_iv
flag works properly (for WEP in this patch) which, if turned on,
requires the hardware to do all crypto operations, but the ieee80211
layer builds the IV. The hardware also has to build the ICV.

Previously, the host_build_iv flag couldn't be used at all for WEP, and
not alone (with both host_decrypt and host_encrypt disabled) because the
crypto algorithm wasn't assigned. This is also fixed.

I have tested this patch both in host crypto mode and in hw crypto mode
(with the Broadcom chipset).

[resent, signing digitally caused it to be MIME-junked, sorry]

Signed-Off-By: Johannes Berg <johannes@sipsolutions.net>

Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2006-01-09 10:34:25 -05:00
Matt Mackall
18e92b12e8 [PATCH] tiny: Trim non-IPX builds
trivial: drop unused 802.3 code if we compile without IPX

(originally from http://wohnheim.fh-wedel.de/~joern/software/kernel/je/25/)

Signed-off-by: Matt Mackall <mpm@selenic.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:14:10 -08:00
Eric Dumazet
5160ee6fc8 [PATCH] shrink dentry struct
Some long time ago, dentry struct was carefully tuned so that on 32 bits
UP, sizeof(struct dentry) was exactly 128, ie a power of 2, and a multiple
of memory cache lines.

Then RCU was added and dentry struct enlarged by two pointers, with nice
results for SMP, but not so good on UP, because breaking the above tuning
(128 + 8 = 136 bytes)

This patch reverts this unwanted side effect, by using an union (d_u),
where d_rcu and d_child are placed so that these two fields can share their
memory needs.

At the time d_free() is called (and d_rcu is really used), d_child is known
to be empty and not touched by the dentry freeing.

Lockless lookups only access d_name, d_parent, d_lock, d_op, d_flags (so
the previous content of d_child is not needed if said dentry was unhashed
but still accessed by a CPU because of RCU constraints)

As dentry cache easily contains millions of entries, a size reduction is
worth the extra complexity of the ugly C union.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Paul Jackson <pj@sgi.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Cc: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@epoch.ncsc.mil>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:58 -08:00
Pekka Enberg
f9f7500521 [PATCH] slab: remove unused align parameter from alloc_percpu
__alloc_percpu and alloc_percpu both take an 'align' argument which is
completely ignored.  snmp6_mib_init() in net/ipv6/af_inet6.c attempts to use
it, but it will be ignored.  Therefore, remove the 'align' argument and fixup
the lone caller.

Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:12:39 -08:00
Adrian Bunk
9f5336e218 [IPV6]: small cleanups
This patch contains the following cleanups:
- addrconf.c: make addrconf_dad_stop() static
- inet6_connection_sock.c should #include <net/inet6_connection_sock.h>
  for getting the prototypes of it's global functions

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 13:24:25 -08:00
Adrian Bunk
97dc627fb3 [IPV4]: make ip_fragment() static
Since there's no longer any external user of ip_fragment() we can make 
it static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 13:23:39 -08:00
Joe Kappus
da7bc6ee8e [NETFILTER]: ip_conntrack_proto_sctp.c needs linux/interrupt.h
Signed-off-by: Joe Kappus <joecool1029@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:41 -08:00
Patrick McHardy
e16a8f0b8c [NETFILTER]: Add ipt_policy/ip6t_policy matches
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:38 -08:00
Patrick McHardy
eb9c7ebe69 [NETFILTER]: Handle NAT in IPsec policy checks
Handle NAT of decapsulated IPsec packets by reconstructing the struct flowi
of the original packet from the conntrack information for IPsec policy
checks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:37 -08:00
Patrick McHardy
b59c270104 [NETFILTER]: Keep conntrack reference until IPsec policy checks are done
Keep the conntrack reference until policy checks have been performed for
IPsec NAT support. The reference needs to be dropped before a packet is
queued to avoid having the conntrack module unloadable.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:36 -08:00
Patrick McHardy
5c901daaea [NETFILTER]: Redo policy lookups after NAT when neccessary
When NAT changes the key used for the xfrm lookup it needs to be done
again. If a new policy is returned in POST_ROUTING the packet needs
to be passed to xfrm4_output_one manually after all hooks were called
because POST_ROUTING is called with fixed okfn (ip_finish_output).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:35 -08:00
Patrick McHardy
4e8e9de7c2 [NETFILTER]: Use conntrack information to determine if packet was NATed
Preparation for IPsec support for NAT:
Use conntrack information instead of saving the saving and comparing the
addresses to determine if a packet was NATed and needs to be rerouted to
make it easier to extend the key.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:34 -08:00
Patrick McHardy
3e3850e989 [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder
ip_route_me_harder doesn't use the port numbers of the xfrm lookup and
uses ip_route_input for non-local addresses which doesn't do a xfrm
lookup, ip6_route_me_harder doesn't do a xfrm lookup at all.

Use xfrm_decode_session and do the lookup manually, make sure both
only do the lookup if the packet hasn't been transformed already.

Makeing sure the lookup only happens once needs a new field in the
IP6CB, which exceeds the size of skb->cb. The size of skb->cb is
increased to 48b. Apparently the IPv6 mobile extensions need some
more room anyway.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:33 -08:00
Patrick McHardy
8cdfab8a43 [IPV4]: reset IPCB flags when neccessary
Reset IPSKB_XFRM_TUNNEL_SIZE flags in ipip and ip_gre hard_start_xmit
function before the packet reenters IP. This is neccessary so the
encapsulated packets are checked not to be oversized in xfrm4_output.c
again. Reset all flags in sit when a packet changes its address family.

Also remove some obsolete IPSKB flags.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:32 -08:00
Patrick McHardy
b05e106698 [IPV4/6]: Netfilter IPsec input hooks
When the innermost transform uses transport mode the decapsulated packet
is not visible to netfilter. Pass the packet through the PRE_ROUTING and
LOCAL_IN hooks again before handing it to upper layer protocols to make
netfilter-visibility symetrical to the output path.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:31 -08:00
Patrick McHardy
951dbc8ac7 [IPV6]: Move nextheader offset to the IP6CB
Move nextheader offset to the IP6CB to make it possible to pass a
packet to ip6_input_finish multiple times and have it skip already
parsed headers. As a nice side effect this gets rid of the manual
hopopts skipping in ip6_input_finish.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:29 -08:00
Patrick McHardy
16a6677fdf [XFRM]: Netfilter IPsec output hooks
Call netfilter hooks before IPsec transforms. Packets visit the
FORWARD/LOCAL_OUT and POST_ROUTING hook before the first encapsulation
and the LOCAL_OUT and POST_ROUTING hook before each following tunnel mode
transform.

Patch from Herbert Xu <herbert@gondor.apana.org.au>:

Move the loop from dst_output into xfrm4_output/xfrm6_output since they're
the only ones who need to it. xfrm{4,6}_output_one() processes the first SA
all subsequent transport mode SAs and is called in a loop that calls the
netfilter hooks between each two calls.

In order to avoid the tail call issue, I've added the inline function
nf_hook which is nf_hook_slow plus the empty list check.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:28 -08:00
David S. Miller
aa0e4e4aea [DCCP]: ipv6.c needs net/ip6_checksum.c
Reported by Dave Jones.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07 12:57:26 -08:00
Linus Torvalds
d8d8f6a4fd Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2006-01-06 15:24:28 -08:00
Linus Torvalds
57d1c91fa6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild 2006-01-06 15:23:56 -08:00
Alexey Dobriyan
a2167dc62e [NET]: Endian-annotate in_aton()
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-06 13:24:54 -08:00
Alexey Dobriyan
76ab608d86 [NET]: Endian-annotate struct iphdr
And fix trivial warnings that emerged.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-06 13:24:29 -08:00
Trent Jaeger
5f8ac64b15 [LSM-IPSec]: Corrections to LSM-IPSec Nethooks
This patch contains two corrections to the LSM-IPsec Nethooks patches
previously applied.  

(1) free a security context on a failed insert via xfrm_user 
interface in xfrm_add_policy.  Memory leak.

(2) change the authorization of the allocation of a security context
in a xfrm_policy or xfrm_state from both relabelfrom and relabelto 
to setcontext.

Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-06 13:22:39 -08:00
Luiz Capitulino
69549ddd2f [PKTGEN]: Adds missing __init.
pktgen_find_thread() and pktgen_create_thread() are only called at
initialization time.

Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-06 13:19:31 -08:00
Joe
3cbc4ab58f [NETFILTER]: ipt_helper.c needs linux/interrupt.h
From: Joe <joecool1029@gmail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-06 13:15:11 -08:00
Stephen Hemminger
ee02b3a613 [BRIDGE] netfilter: vlan + hw checksum = bug?
It looks like the bridge netfilter code does not correctly update
the hardware checksum after popping off the VLAN header.

This is by inspection, I have *not* tested this.
To test you would need to set up a filtering bridge with vlans
and a device the does hardware receive checksum (skge, or sungem)

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-06 13:13:29 -08:00
Shaun Pereira
a20a855479 [X25]: Fix for broken x25 module.
When a user-space server application calls bind on a socket, then in kernel
space this bound socket is considered 'x25-linked' and the SOCK_ZAPPED flag
is unset.(As in x25_bind()/af_x25.c).

Now when a user-space client application attempts to connect to the server
on the listening socket, if the kernel accepts this in-coming call, then it
returns a new socket to userland and attempts to reply to the caller.

The reply/x25_sendmsg() will fail, because the new socket created on
call-accept has its SOCK_ZAPPED flag set by x25_make_new().
(sock_init_data() called by x25_alloc_socket() called by x25_make_new()
sets the flag to SOCK_ZAPPED)).

Fix: Using the sock_copy_flag() routine available in sock.h fixes this.

Tested on 32 and 64 bit kernels with x25 over tcp.

Signed-off-by: Shaun Pereira <pereira.shaun@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-06 13:11:35 -08:00
Kris Katterjohn
4bad4dc919 [NET]: Change sk_run_filter()'s return type in net/core/filter.c
It should return an unsigned value, and fix sk_filter() as well.

Signed-off-by: Kris Katterjohn <kjak@ispwest.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-06 13:08:20 -08:00
Kris Katterjohn
dbbc098828 [NET]: Use newer is_multicast_ether_addr() in some files
This uses is_multicast_ether_addr() because it has recently been
changed to do the same thing these seperate tests are doing.

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-06 13:05:58 -08:00
Sam Ravnborg
367cb70421 kbuild: un-stringnify KBUILD_MODNAME
Now when kbuild passes KBUILD_MODNAME with "" do not __stringify it when
used. Remove __stringnify for all users.
This also fixes the output of:

$ ls -l /sys/module/
drwxr-xr-x 4 root root 0 2006-01-05 14:24 pcmcia
drwxr-xr-x 4 root root 0 2006-01-05 14:24 pcmcia_core
drwxr-xr-x 3 root root 0 2006-01-05 14:24 "processor"
drwxr-xr-x 3 root root 0 2006-01-05 14:24 "psmouse"

The quoting of the module names will be gone again.
Thanks to GregKH + Kay Sievers for reproting this.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2006-01-06 21:17:50 +01:00
J. Bruce Fields
9e56904e41 SUNRPC: Make krb5 report unsupported encryption types
Print messages when an unsupported encrytion algorthm is requested or
 there is an error locating a supported algorthm.

 Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:59:00 -05:00
J. Bruce Fields
42181d4baf SUNRPC: Make spkm3 report unsupported encryption types
Print messages when an unsupported encrytion algorthm is requested or
 there is an error locating a supported algorthm.

 Signed-off-by: Kevin Coffman <kwc@citi.umich.edu>
 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:59 -05:00
J. Bruce Fields
9eed129bbd SUNRPC: Update the spkm3 code to use the make_checksum interface
Also update the tokenlen calculations to accomodate g_token_size().

 Signed-off-by: Andy Adamson <andros@citi.umich.edu>
 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:59 -05:00
Trond Myklebust
0065db3285 SUNRPC: Clean up xprt_destroy()
We ought never to be calling xprt_destroy() if there are still active
 rpc_tasks. Optimise away the broken code that attempts to "fix" that case.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:58 -05:00
Trond Myklebust
632e3bdc50 SUNRPC: Ensure client closes the socket when server initiates a close
If the server decides to close the RPC socket, we currently don't actually
 respond until either another RPC call is scheduled, or until xprt_autoclose()
 gets called by the socket expiry timer (which may be up to 5 minutes
 later).

 This patch ensures that xprt_autoclose() is called much sooner if the
 server closes the socket.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:57 -05:00
Chuck Lever
f518e35aec SUNRPC: get rid of cl_chatty
Clean up: Every ULP that uses the in-kernel RPC client, except the NLM
 client, sets cl_chatty.  There's no reason why NLM shouldn't set it, so
 just get rid of cl_chatty and always be verbose.

 Test-plan:
 Compile with CONFIG_NFS enabled.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:56 -05:00
Chuck Lever
922004120b SUNRPC: transport switch API for setting port number
At some point, transport endpoint addresses will no longer be IPv4.  To hide
 the structure of the rpc_xprt's address field from ULPs and port mappers,
 add an API for setting the port number during an RPC bind operation.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.  NFSv2/3 and NFSv4 mounting should be carefully checked.
 Probably need to rig a server where certain services aren't running, or
 that returns an error for some typical operation.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:56 -05:00
Chuck Lever
35f5a422ce SUNRPC: new interface to force an RPC rebind
We'd like to hide fields in rpc_xprt and rpc_clnt from upper layer protocols.
 Start by creating an API to force RPC rebind, replacing logic that simply
 sets cl_port to zero.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.  NFSv2/3 and NFSv4 mounting should be carefully checked.
 Probably need to rig a server where certain services aren't running, or
 that returns an error for some typical operation.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:56 -05:00
Chuck Lever
0210714834 SUNRPC: switchable buffer allocation
Add RPC client transport switch support for replacing buffer management
 on a per-transport basis.

 In the current IPv4 socket transport implementation, RPC buffers are
 allocated as needed for each RPC message that is sent.  Some transport
 implementations may choose to use pre-allocated buffers for encoding,
 sending, receiving, and unmarshalling RPC messages, however.  For
 transports capable of direct data placement, the buffers can be carved
 out of a pre-registered area of memory rather than from a slab cache.

 Test-plan:
 Millions of fsx operations.  Performance characterization with "sio" and
 "iozone".  Use oprofile and other tools to look for significant regression
 in CPU utilization.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:55 -05:00
Adrian Bunk
fb459f45f7 SUNRPC: net/sunrpc/xdr.c: remove xdr_decode_string()
This patch removes ths unused function xdr_decode_string().

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Neil Brown <neilb@suse.de>
Acked-by: Charles Lever <Charles.Lever@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:53 -05:00
Trond Myklebust
969b7f2522 SUNRPC: Fix a potential race in rpc_pipefs.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:51 -05:00
Trond Myklebust
2bd615797e SUNRPC: Ensure that SIGKILL will always terminate a synchronous RPC call.
...and make sure that the "intr" flag also enables SIGHUP and SIGTERM to
 interrupt RPC calls too (as per the Solaris implementation).

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:45 -05:00
Trond Myklebust
e60859ac0e SUNRPC: rpc_execute should not return task->tk_status;
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:42 -05:00
Trond Myklebust
89991c24e4 SUNRPC: Get rid of some unused exports
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:41 -05:00
Trond Myklebust
44c288732f NFSv4: stateful NFSv4 RPC call interface
The NFSv4 model requires us to complete all RPC calls that might
 establish state on the server whether or not the user wants to
 interrupt it. We may also need to schedule new work (including
 new RPC calls) in order to cancel the new state.

 The asynchronous RPC model will allow us to ensure that RPC calls
 always complete, but in order to allow for "synchronous" RPC, we
 want to add the ability to wait for completion.
 The waits are, of course, interruptible.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:40 -05:00
Trond Myklebust
4ce70ada1f SUNRPC: Further cleanups
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:40 -05:00
Trond Myklebust
963d8fe533 RPC: Clean up RPC task structure
Shrink the RPC task structure. Instead of storing separate pointers
 for task->tk_exit and task->tk_release, put them in a structure.

 Also pass the user data pointer as a parameter instead of passing it via
 task->tk_calldata. This enables us to nest callbacks.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:39 -05:00
Trond Myklebust
abbcf28f23 SUNRPC: Yet more RPC cleanups
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:39 -05:00
Olaf Kirch
93fbf1a5de [PATCH] Keep nfsd from exiting when seeing recv() errors
I submitted this one previously - svc_tcp_recvfrom currently returns
any errors to the caller, including ECONNRESET and the like.

This is something svc_recv isn't able to deal with:

	len = svsk->sk_recvfrom(rqstp);
	[...]
	if (len == 0 || len == -EAGAIN) {
		[...]
		return -EAGAIN;
	}

	[...]
	return len;

The nfsd main loop will exit when it sees an error code other than
EAGAIN.

The following patch fixes this problem

svc_recv is not equipped to deal with error codes other than EAGAIN,
and will propagate anything else (such as ECONNRESET) up to nfsd,
causing it to exit.

Signed-off-by: Olaf Kirch <okir@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:59 -08:00
NeilBrown
1f1e030bf7 [PATCH] knfsd: fix hash function for IP addresses on 64bit little-endian machines.
The hash.h hash_long function, when used on a 64 bit machine, ignores many
of the middle-order bits.  (The prime chosen it too bit-sparse).

IP addresses for clients of an NFS server are very likely to differ only in
the low-order bits.  As addresses are stored in network-byte-order, these
bits become middle-order bits in a little-endian 64bit 'long', and so do
not contribute to the hash.  Thus you can have the situation where all
clients appear on one hash chain.

So, until hash_long is fixed (or maybe forever), us a hash function that
works well on IP addresses - xor the bytes together.

Thanks to "Iozone" <capps@iozone.org> for identifying this problem.

Cc: "Iozone" <capps@iozone.org>

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:21 -08:00
Kris Katterjohn
46f25dffba [NET]: Change 1500 to ETH_DATA_LEN in some files
These patches add the header linux/if_ether.h and change 1500 to
ETH_DATA_LEN in some files.

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 16:48:56 -08:00
Andrew Morton
e924283bf9 [IPVS]: Another file needs linux/interrupt.h
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 16:48:55 -08:00
Yasuyuki Kozakai
e8eaedf2f8 [NETFILTER]: Use HOPLIMIT metric as TTL of TCP reset sent by REJECT
HOPLIMIT metric is appropriate to TCP reset sent by REJECT target
than hard-coded max TTL. Thanks to David S. Miller for hint.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:28:57 -08:00
Patrick McHardy
0ae2cfe7f3 [NETFILTER]: nf_conntrack_l3proto_ipv4.c needs net/route.h
CC [M]  net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.o
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c: In function 'ipv4_refrag':
net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:198: error: dereferencing pointer to incomplete type
make[3]: *** [net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.o] Error 1

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:21:52 -08:00
Patrick McHardy
22dea562bb [NETFILTER]: Export ip6_masked_addrcmp, don't pass IPv6 addresses on stack
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:21:34 -08:00
Patrick McHardy
b777e0ce74 [NETFILTER]: make ipv6_find_hdr() find transport protocol header
The original ipv6_find_hdr() finds the specified header in IPv6 packets.
This makes it possible to get transport header so that we can kill similar
loop in ip6_match_packet().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:21:16 -08:00
Patrick McHardy
1bd9bef6f9 [NETFILTER]: Call POST_ROUTING hook before fragmentation
Call POST_ROUTING hook before fragmentation to get rid of the okfn use
in ip_refrag and save the useless fragmentation/defragmentation step
when NAT is used.

The patch introduces one user-visible change, the POSTROUTING chain
in the mangle table gets entire packets, not fragments, which should
simplify use of the MARK and CLASSIFY targets for queueing as a nice
side-effect.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:20:59 -08:00
Patrick McHardy
abbcc73982 [NETFILTER]: Remove okfn usage in ip_vs_core.c
okfn should only be used from different contexts to avoid deep call chains,
i.e. by nf_queue.

Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:20:40 -08:00
Patrick McHardy
a9b305c4e5 [NETFILTER]: ctnetlink: Fix dumping of helper name
Properly dump the helper name instead of internal kernel data.
Based on patch by Marcus Sundberg <marcus@ingate.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:20:02 -08:00
Patrick McHardy
e7be6994ec [NETFILTER]: Fix module_param types and permissions
Fix netfilter module_param types and permissions. Also fix an off-by-one in
the ipt_ULOG nlbufsiz < 128k check.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:19:46 -08:00
Pablo Neira Ayuso
87711cb81c [NETFILTER]: Filter dumped entries based on the layer 3 protocol number
Dump entries of a given Layer 3 protocol number.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:19:23 -08:00
Pablo Neira Ayuso
c1d10adb4a [NETFILTER]: Add ctnetlink port for nf_conntrack
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:19:05 -08:00
Pablo Neira Ayuso
205d67c7d9 [NETFILTER]: ctnetlink: remove unused variable
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:18:44 -08:00
Pablo Neira Ayuso
d4d6bb41e0 [NETFILTER]: ctnetlink: fix conntrack mark race
Set conntrack mark before it is in hashes.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:18:25 -08:00
Pablo Neira Ayuso
0368309cb4 [NETFILTER]: ctnetlink: ctnetlink_event cleanup
Cleanup: Use 'else if' instead of a ugly 'goto' statement.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:18:08 -08:00
Pablo Neira Ayuso
47116eb201 [NETFILTER]: ctnetlink: use u_int32_t instead of unsigned int
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:17:50 -08:00
Pablo Neira Ayuso
984955b3d7 [NETFILTER]: ctnetlink: propagate ctnetlink_dump_tuples_proto return value back
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:17:29 -08:00
Yasuyuki Kozakai
90c4656eb4 [NETFILTER]: ctnetlink: Add sanity checkings for ICMP
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:17:03 -08:00
Pablo Neira Ayuso
684f7b296c [NETFILTER]: ctnetlink: remove bogus checks in ICMP protocol at dumping
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:16:41 -08:00
Jesper Juhl
d695aa8a1f [NETFILTER]: Decrease number of pointer derefs in nf_conntrack_core.c
Benefits of the patch:
 - Fewer pointer dereferences should make the code slightly faster.
 - Size of generated code is smaller
 - improved readability

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:16:16 -08:00
Jesper Juhl
3e4ead4fe5 [NETFILTER]: Decrease number of pointer derefs in nfnetlink_queue.c
Benefits of the patch:
 - Fewer pointer dereferences should make the code slightly faster.
 - Size of generated code is smaller
 - improved readability

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:15:58 -08:00
Adrian Bunk
4ffd2e4907 [IPVS]: Fix compilation
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05 12:14:43 -08:00
Linus Torvalds
db9edfd7e3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6
Trivial manual merge fixup for usb_find_interface clashes.
2006-01-04 18:44:12 -08:00
Linus Torvalds
52347f4e81 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2006-01-04 16:34:57 -08:00
Linus Torvalds
d779188d2b Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2006-01-04 16:31:56 -08:00
Kay Sievers
fd586bacf4 [PATCH] net: swich device attribute creation to default attrs
Recent udev versions don't longer cover bad sysfs timing with built-in
logic. Explicit rules are required to do that. For net devices, the
following is needed:
  ACTION=="add", SUBSYSTEM=="net", WAIT_FOR_SYSFS="address"
to handle access to net device properties from an event handler without
races.

This patch changes the main net attributes to be created by the driver
core, which is done _before_ the event is sent out and will not require
the stat() loop of the WAIT_FOR_SYSFS key.

Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04 16:18:10 -08:00
Kay Sievers
312c004d36 [PATCH] driver core: replace "hotplug" by "uevent"
Leave the overloaded "hotplug" word to susbsystems which are handling
real devices. The driver core does not "plug" anything, it just exports
the state to userspace and generates events.

Signed-off-by: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-01-04 16:18:08 -08:00
Thomas Young
74cb879822 [TCP] tcp_vegas: Fix slow start
Vegas' slow start was only adding one MSS per RTT rather than one for
every ack. Slow start behavior should now match Reno.

Signed-off-by: Thomas Young <tyo@ee.mu.oz.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-04 13:59:32 -08:00
Kris Katterjohn
9369986306 [NET]: More instruction checks fornet/core/filter.c
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-04 13:58:36 -08:00
YOSHIFUJI Hideaki
181a46a56e [NETFILTER]: Use macro for spinlock_t/rwlock_t initializations/definition.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-04 13:56:54 -08:00
YOSHIFUJI Hideaki
196433c5b7 [IPV6]: Use macro for rwlock_t initialization.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-04 13:56:31 -08:00
YOSHIFUJI Hideaki
ca40330248 [ECONET]: Use macro for spinlock_t definition.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-04 13:56:08 -08:00
Arnaldo Carvalho de Melo
f190055ff5 [IPVS]: Add missing include <linux/net.h>
CC [M]  net/ipv4/ipvs/ip_vs_conn.o
  /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c: In
  function 'ip_vs_conn_new':
  /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c:606:
  warning: implicit declaration of function 'net_ratelimit'
  /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c: In
  function 'ip_vs_random_dropentry':
  /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/ipvs/ip_vs_conn.c:810:
  warning: implicit declaration of function 'net_random'

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-01-04 02:02:20 -02:00
Arnaldo Carvalho de Melo
80e40daa47 [TCP]: syn_flood_warning is only needed if CONFIG_SYN_COOKIES is selected
CC      net/ipv4/tcp_ipv4.o
  /pub/scm/linux/kernel/git/acme/net-2.6/net/ipv4/tcp_ipv4.c:665: warning:
  'syn_flood_warning' defined but not used

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-01-04 01:58:06 -02:00
Arnaldo Carvalho de Melo
e4dfd449c8 [DCCP] ackvec: use u8 for the buf offsets
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-01-04 01:46:34 -02:00
Andrea Bittau
6742bbcbb8 [DCCP] ackvec: Fix spelling of "throw"
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-01-04 01:45:17 -02:00
Stephen Hemminger
40efc6fa17 [TCP]: less inline's
TCP inline usage cleanup:
 * get rid of inline in several places
 * replace __inline__ with inline where possible
 * move functions used in one file out of tcp.h
 * let compiler decide on used once cases

On x86_64: 
   text	   data	    bss	    dec	    hex	filename
3594701	 648348	 567400	4810449	 4966d1	vmlinux.orig
3593133	 648580	 567400	4809113	 496199	vmlinux

On sparc64:
   text	   data	    bss	    dec	    hex	filename
2538278	 406152	 530392	3474822	 350586	vmlinux.ORIG
2536382	 406384	 530392	3473158	 34ff06	vmlinux

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 16:03:49 -08:00
Stephen Hemminger
3c19065a1e [IEEE80211] ipw2200: Simplify multicast checks.
From: Stephen Hemminger <shemminger@osdl.org>

is_multicast_ether_addr() accepts broadcast too, so the
is_broadcast_ether_addr() calls are redundant.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 15:27:38 -08:00
Stephen Hemminger
cd8787ab04 [IPV4] fib_trie: build fix
Need this to fix build of fib_trie in net-2.6.16 (rebased) tree.
The code needs the new inet_make_mask inline.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:38:34 -08:00
Stephen Hemminger
554c9a8ec3 [BRIDGE]: Fix faulty check in br_stp_recalculate_bridge_id()
One of the conversions from memcmp to compare_ether_addr is incorrect.
We need to do relative comparison to determine min MAC address to
use in bridge id. 

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:35:54 -08:00
Andrea Bittau
e84a9f5e9c [DCCP]: Notify CCID only after ACK vectors have been processed.
The CCID should be notified of packet reception only when a packet is
valid.  Therefore, the ACK vector needs to be processed before
notifying the CCID.  Also, the CCID might need information provided by
the ACK vector.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:26:15 -08:00
Andrea Bittau
9e377202d2 [DCCP]: Send an ACK vector when ACKing a response packet
If ACK vectors are used, each packet with an ACK should contain an ACK
vector.  The only exception currently is response packets.  It
probably is not a good idea to store ACK vector state before the
connection is completed (to help protect from syn floods).

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:25:49 -08:00
Andrea Bittau
709dd3aaf5 [DCCP]: Do not process a packet twice when it's not in state DCCP_OPEN.
When packets are received, the connection is either in DCCP_OPEN
[fast-path] or it isn't.  If it's not [e.g. DCCP_PARTOPEN] upper
layers will perform sanity checks and parse options.  If it is in
DCCP_OPEN, dccp_rcv_established() will do it.  It is important not to
re-parse options in dccp_rcv_established() when it is not called from
the fast-path.  Else, fore example, the ack vector will be added twice
and the CCID will see the packet twice.

The solution is to always enfore sanity checks from the upper layers.
When packets arrive in the fast-path, sanity checks will be performed
before calling dccp_rcv_established().

Note(acme): I rewrote the patch to achieve the same result but keeping
dccp_rcv_established with the previous semantics and having it split
into __dccp_rcv_established, that doesn't does do any sanity check,
code in state != DCCP_OPEN use this lighter version as they already do
the sanity checks.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:25:17 -08:00
Patrick Caulfield
5062430c5c [DECNET]: Only use local routers
The attached patch makes DECnet routing only use routers from the same
area - rather than the highest rated router seen.

In theory there should not be an out-of-area router on a local network
but some networks are bridged rather than properly routed. VMS seems
to behave similarly: if I bring up a VMS node with no router then it
can't see anything else on the global network.

Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:24:02 -08:00
Roberto Nibali
4b5bdf5cc3 [IPVS]: Cleanup IP_VS_DBG statements.
From: Roberto Nibali <ratz@drugphish.ch>

The attached patch (against current -GIT) is a cleanup patch which does
following:

o lookup debug messages shifted back to 9
o added more informational value to flags and refcnt since those
entries can be in multiple referenced structures
o cleanup 80 char violation

It's the prepatch to the session pool implementation and helps very much
to debug and monitor important variables and structures regarding the
threshold limitation and persistency without the thousands of lookup
messages which noone is interested in.

Signed-off-by: Horms <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:22:59 -08:00
Christoph Hellwig
b5e5fa5e09 [NET]: Add a dev_ioctl() fallback to sock_ioctl()
Currently all network protocols need to call dev_ioctl as the default
fallback in their ioctl implementations.  This patch adds a fallback
to dev_ioctl to sock_ioctl if the protocol returned -ENOIOCTLCMD.
This way all the procotol ioctl handlers can be simplified and we don't
need to export dev_ioctl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:18:33 -08:00
Christoph Hellwig
5ff7630e4a [NETROM]: Remove unessecary lock_sock calls in netrom_ioctl()
lock_sock is needed only in very few cases, so do it there instead of
around the switch statement.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:14:46 -08:00
Per Liden
b461d2f218 [NETLINK] genetlink: fix cmd type in genl_ops to be consistent to u8
Signed-off-by: Per Liden <per.liden@ericsson.com>
ACKed-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:13:29 -08:00
Benjamin LaHaise
fd19f329a3 [AF_UNIX]: Convert to use a spinlock instead of rwlock
From: Benjamin LaHaise <bcrl@kvack.org>

In af_unix, a rwlock is used to protect internal state.  At least on my 
P4 with HT it is faster to use a spinlock due to the simpler memory 
barrier used to unlock.  This patch raises bw_unix to ~690K/s.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:10:46 -08:00
Benjamin LaHaise
4947d3ef8d [NET]: Speed up __alloc_skb()
From: Benjamin LaHaise <bcrl@kvack.org>

In __alloc_skb(), the use of skb_shinfo() which casts a u8 * to the 
shared info structure results in gcc being forced to do a reload of the 
pointer since it has no information on possible aliasing.  Fix this by 
using a pointer to refer to skb_shared_info.

By initializing skb_shared_info sequentially, the write combining buffers 
can reduce the number of memory transactions to a single write.  Reorder 
the initialization in __alloc_skb() to match the structure definition.  
There is also an alignment issue on 64 bit systems with skb_shared_info 
by converting nr_frags to a short everything packs up nicely.

Also, pass the slab cache pointer according to the fclone flag instead 
of using two almost identical function calls.

This raises bw_unix performance up to a peak of 707KB/s when combined 
with the spinlock patch.  It should help other networking protocols, too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 14:06:50 -08:00
Arnaldo Carvalho de Melo
14c850212e [INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.

Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:21 -08:00
Arnaldo Carvalho de Melo
25995ff577 [SOCK]: Introduce sk_receive_skb
Its common enough to to justify that, TCP still can't use it as it has the
prequeueing stuff, still to be made generic in the not so distant future :-)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:19 -08:00
Christoph Hellwig
ce1d4d3e88 [NET]: restructure sock_aio_{read,write} / sock_{readv,writev}
Mid-term I plan to restructure the file_operations so that we don't need
to have all these duplicate aio and vectored versions.  This patch is
a small step in that direction but also a worthwile cleanup on it's own:

(1) introduce a alloc_sock_iocb helper that encapsulates allocating a
    proper sock_iocb
(2) add do_sock_read and do_sock_write helpers for common read/write
    code

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:18 -08:00
David S. Miller
cbeb321a64 [NET]: Fix sock_init() return value.
It needs to return zero now that it is an initcall.

Also, net/nonet.c no longer needs a dummy sock_init().

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:17 -08:00
Jaco Kroon
f34fbb9713 [PKTGEN]: Deinitialise static variables.
static variables should not be explicitly initialised to 0.  This causes
them to be placed in .data instead of .bss.  This patch de-initialises 3
static variables in net/core/pktgen.c.

There are approximately 800 more such variables in the source tree
(2.6.15rc5).  If there is more interrest I'd be willing to track down the
rest of these as well and de-initialise them as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:16 -08:00
Eric Dumazet
90ddc4f047 [NET]: move struct proto_ops to const
I noticed that some of 'struct proto_ops' used in the kernel may share
a cache line used by locks or other heavily modified data. (default
linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at
least)

This patch makes sure a 'struct proto_ops' can be declared as const,
so that all cpus can share all parts of it without false sharing.

This is not mandatory : a driver can still use a read/write structure
if it needs to (and eventually a __read_mostly)

I made a global stubstitute to change all existing occurences to make
them const.

This should reduce the possibility of false sharing on SMP, and
speedup some socket system calls.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:15 -08:00
Andi Kleen
77d76ea310 [NET]: Small cleanup to socket initialization
sock_init can be done as a core_initcall instead of calling
it directly in init/main.c

Also I removed an out of date #ifdef.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:14 -08:00
Frank Filz
7708610b1b [SCTP]: Add support for SCTP_DELAYED_ACK_TIME socket option.
Signed-off-by: Frank Filz <ffilz@us.ibm.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:13 -08:00
Frank Filz
52ccb8e90c [SCTP]: Update SCTP_PEER_ADDR_PARAMS socket option to the latest api draft.
This patch adds support to set/get heartbeat interval, maximum number of
retransmissions, pathmtu, sackdelay time for a particular transport/
association/socket as per the latest SCTP sockets api draft11.

Signed-off-by: Frank Filz <ffilz@us.ibm.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:11 -08:00
Robert Olsson
fd9662555c [IPV4] fib_trie: Add credits.
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:10 -08:00
Stephen Hemminger
9eb2d62719 [TCP] cubic: use Newton-Raphson
Replace cube root algorithim with a faster version using Newton-Raphson.
Surprisingly, doing the scaled div64_64 is faster than a true 64 bit
division on 64 bit CPU's.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:09 -08:00
Stephen Hemminger
89b3d9aaf4 [TCP] cubic: precompute constants
Revised version of patch to pre-compute values for TCP cubic.
  * d32,d64 replaced with descriptive names
  * cube_factor replaces
	 srtt[scaled by count] / HZ * ((1 << (10+2*BICTCP_HZ)) / bic_scale)
  * beta_scale replaces
	8*(BICTCP_BETA_SCALE+beta)/3/(BICTCP_BETA_SCALE-beta);

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:08 -08:00
Stephen Hemminger
c865e5d99e [PKT_SCHED] netem: packet corruption option
Here is a new feature for netem in 2.6.16. It adds the ability to
randomly corrupt packets with netem. A version was done by
Hagen Paul Pfeifer, but I redid it to handle the cases of backwards
compatibility with netlink interface and presence of hardware checksum
offload. It is useful for testing hardware offload in devices.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:05 -08:00
Stephen Hemminger
8cbb512e50 [BRIDGE]: add version number
Add version info to bridge module.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:04 -08:00
Stephen Hemminger
edb5e46fc0 [BRIDGE]: limited ethtool support
Add limited ethtool support to bridge to allow disabling
features.

Note: if underlying device does not support a feature (like checksum
offload), then the bridge device won't inherit it.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:03 -08:00
Stephen Hemminger
0e5eabac49 [BRIDGE]: filter packets in learning state
While in the learning state, run filters but drop the result.
This prevents us from acquiring bad fdb entries in learning state.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:02 -08:00
Stephen Hemminger
4433f420e5 [BRIDGE]: handle speed detection after carrier changes
Speed of a interface may not be available until carrier
is detected in the case of autonegotiation. To get the correct value
we need to recheck speed after carrier event.  But the check needs to
be done in a context that is similar to normal ethtool interface (can sleep).

Also, delay check for 1ms to try avoid any carrier bounce transitions.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:01 -08:00
Stephen Hemminger
4505a3ef72 [BRIDGE]: allow setting hardware address of bridge pseudo-dev
Some people are using bridging to hide multiple machines from an ISP
that restricts by MAC address. So in that case allow the bridge mac
address to be set to any of the existing interfaces.  I don't want to
allow any arbitrary value and confuse STP.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:11:00 -08:00
David S. Miller
fbe9cc4a87 [AF_UNIX]: Use spinlock for unix_table_lock
This lock is actually taken mostly as a writer,
so using a rwlock actually just makes performance
worse especially on chips like the Intel P4.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:59 -08:00
Arnaldo Carvalho de Melo
d83d8461f9 [IP_SOCKGLUE]: Remove most of the tcp specific calls
As DCCP needs to be called in the same spots.

Now we have a member in inet_sock (is_icsk), set at sock creation time from
struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and
DCCP) to see if a struct sock instance is a inet_connection_sock for places
like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if
sk_type was SOCK_STREAM, that is insufficient because we now use the same code
for DCCP, that has sk_type SOCK_DCCP.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:58 -08:00
Arnaldo Carvalho de Melo
d8313f5ca2 [INET6]: Generalise tcp_v6_hash_connect
Renaming it to inet6_hash_connect, making it possible to ditch
dccp_v6_hash_connect and share the same code with TCP instead.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:56 -08:00
Arnaldo Carvalho de Melo
a7f5e7f164 [INET]: Generalise tcp_v4_hash_connect
Renaming it to inet_hash_connect, making it possible to ditch
dccp_v4_hash_connect and share the same code with TCP instead.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:55 -08:00
Arnaldo Carvalho de Melo
6d6ee43e0b [TWSK]: Introduce struct timewait_sock_ops
So that we can share several timewait sockets related functions and
make the timewait mini sockets infrastructure closer to the request
mini sockets one.

Next changesets will take advantage of this, moving more code out of
TCP and DCCP v4 and v6 to common infrastructure.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:54 -08:00
Arnaldo Carvalho de Melo
fc44b98053 [DCCP]: Use reqsk_free in dccp_v4_conn_request
Now we have the destructor (dccp_v4_reqsk_destructor) in our
request_sock_ops vtable.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:53 -08:00
Arnaldo Carvalho de Melo
3df80d9320 [DCCP]: Introduce DCCPv6
Still needs mucho polishing, specially in the checksum code, but works
just fine, inet_diag/iproute2 and all 8)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:52 -08:00
Arnaldo Carvalho de Melo
399c07def6 [IPV6]: Export ipv6_opt_accepted
It was already non-TCP specific, will be used by DCCPv6.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:51 -08:00
Arnaldo Carvalho de Melo
f21e68caa0 [DCCP]: Prepare the AF agnostic core for the introduction of DCCPv6
Basically exports a similar set of functions as the one exported by
the non-AF specific TCP code.

In the process moved some non-AF specific code from dccp_v4_connect to
dccp_connect_init and moved the checksum verification from
dccp_invalid_packet to dccp_v4_rcv, so as to use it in dccp_v6_rcv
too.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:50 -08:00
Arnaldo Carvalho de Melo
34ca686081 [DCCP]: Just rename dccp_v4_prot to dccp_prot
To match TCP equivalent.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:49 -08:00
Arnaldo Carvalho de Melo
3cf3dc6c2e [IPV6]: Export some symbols for DCCPv6
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:48 -08:00
Arnaldo Carvalho de Melo
0fa1a53e1f [IPV6]: Introduce inet6_timewait_sock
Out of tcp6_timewait_sock, that now is just an aggregation of
inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct
inet_timewait_sock, that is common to the IPv6 transport protocols that use
timewait sockets, like DCCP and TCP.

tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic
code to find the IPv6 area in a timewait sock.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:47 -08:00
Arnaldo Carvalho de Melo
b9750ce13c [IPV6]: Generalise some functions
Using sk->sk_protocol instead of IPPROTO_TCP.

Will be used by DCCPv6 in the next changesets.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:46 -08:00
Benjamin LaHaise
830a1e5c21 [AF_UNIX]: Remove superfluous reference counting in unix_stream_sendmsg
AF_UNIX stream socket performance on P4 CPUs tends to suffer due to a
lot of pipeline flushes from atomic operations.  The patch below
removes the sock_hold() and sock_put() in unix_stream_sendmsg().  This
should be safe as the socket still holds a reference to its peer which
is only released after the file descriptor's final user invokes
unix_release_sock().  The only consideration is that we must add a
memory barrier before setting the peer initially.

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:45 -08:00
Benjamin LaHaise
c1cbe4b7ad [NET]: Avoid atomic xchg() for non-error case
It also looks like there were 2 places where the test on sk_err was
missing from the event wait logic (in sk_stream_wait_connect and
sk_stream_wait_memory), while the rest of the sock_error() users look
to be doing the right thing.  This version of the patch fixes those,
and cleans up a few places that were testing ->sk_err directly.

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:44 -08:00
Roberto Nibali
f1f71e03b1 [IPVS]: remove dead code
This patch removes dead code. I don't see the reason to keep this cruft
around, besides cluttering the nice and functionally working code.

Signed-off-by: Roberto Nibali <ratz@drugphish.ch>
Signed-off-by: Horms <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:43 -08:00
Stephen Hemminger
65a45441d7 [UDP]: udp_checksum_init return value
Since udp_checksum_init always returns 0 there is no point in
having it return a value.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:42 -08:00
Herbert Xu
3305b80c21 [IP]: Simplify and consolidate MSG_PEEK error handling
When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled
it is left on the socket receive queue.  This means that when we detect
a checksum error we have to be careful when trying to free the packet
as someone could have dequeued it in the time being.

Currently this delicate logic is duplicated three times between UDPv4,
UDPv6 and RAWv6.  This patch moves them into a one place and simplifies
the code somewhat.

This is based on a suggestion by Eric Dumazet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:41 -08:00
Arnaldo Carvalho de Melo
57cca05af1 [DCCP]: Introduce dccp_ipv4_af_ops
And make the core DCCP code AF agnostic, just like TCP, now its time
to work on net/dccp/ipv6.c, we are close to the end!

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:40 -08:00
Arnaldo Carvalho de Melo
af05dc9394 [ICSK]: Move v4_addr2sockaddr from TCP to icsk
Renaming it to inet_csk_addr2sockaddr.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:39 -08:00
Arnaldo Carvalho de Melo
8292a17a39 [ICSK]: Rename struct tcp_func to struct inet_connection_sock_af_ops
And move it to struct inet_connection_sock. DCCP will use it in the
upcoming changesets.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:38 -08:00
Arnaldo Carvalho de Melo
ca304b6104 [IPV6]: Introduce inet6_rsk()
And inet6_rsk_offset in inet_request_sock, for the same reasons as
inet_sock's pinfo6 member.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:37 -08:00
Arnaldo Carvalho de Melo
8129765ac0 [IPV6]: Generalise tcp_v6_search_req & tcp_v6_synq_add
More work is needed tho to introduce inet6_request_sock from
tcp6_request_sock, in the same layout considerations as ipv6_pinfo in
inet_sock, next changeset will do that.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:36 -08:00
Arnaldo Carvalho de Melo
c2977c2213 [ICSK]: make inet_csk_reqsk_queue_hash_add timeout arg unsigned long
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:34 -08:00
Arnaldo Carvalho de Melo
90b19d3169 [IPV6]: Generalise __tcp_v6_hash, renaming it to __inet6_hash
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:33 -08:00
Arnaldo Carvalho de Melo
971af18bbf [IPV6]: Reuse inet_csk_get_port in tcp_v6_get_port
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:33 -08:00
Herbert Xu
89cee8b1cb [IPV4]: Safer reassembly
Another spin of Herbert Xu's "safer ip reassembly" patch
for 2.6.16.

(The original patch is here:
http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2
and my only contribution is to have tested it.)

This patch (optionally) does additional checks before accepting IP
fragments, which can greatly reduce the possibility of reassembling
fragments which originated from different IP datagrams.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:31 -08:00
Bart De Schuymer
d5228a4f49 [NETFILTER] ebtables: Support nf_log API from ebt_log and ebt_ulog
This makes ebt_log and ebt_ulog use the new nf_log api.  This enables
the bridging packet filter to log packets e.g. via nfnetlink_log.

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:30 -08:00
Eric Dumazet
3183606469 [NETFILTER] ip_tables: NUMA-aware allocation
Part of a performance problem with ip_tables is that memory allocation
is not NUMA aware, but 'only' SMP aware (ie each CPU normally touch
separate cache lines)

Even with small iptables rules, the cost of this misplacement can be
high on common workloads.  Instead of using one vmalloc() area
(located in the node of the iptables process), we now allocate an area
for each possible CPU, using vmalloc_node() so that memory should be
allocated in the CPU's node if possible.

Port to arp_tables and ip6_tables by Harald Welte.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:29 -08:00
Stephen Hemminger
df3271f336 [TCP] BIC: CUBIC window growth (2.0)
Replace existing BIC version 1.1 with new version 2.0.
The main change is to replace the window growth function
with a cubic function as described in:
  http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/cubic-paper.pdf

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:28 -08:00
Stephen Hemminger
05d054503a [TCP] BIC: spelling and whitespace
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:27 -08:00
Stephen Hemminger
018da8f44c [TCP] BIC: remove low utilization code.
The latest BICTCP patch at:
http://www.csc.ncsu.edu:8080/faculty/rhee/export/bitcp/index_files/Page546.htm

disables the low_utilization feature of BICTCP because it doesn't work
in some cases. This patch removes it.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:26 -08:00
Trent Jaeger
df71837d50 [LSM-IPSec]: Security association restriction.
This patch series implements per packet access control via the
extension of the Linux Security Modules (LSM) interface by hooks in
the XFRM and pfkey subsystems that leverage IPSec security
associations to label packets.  Extensions to the SELinux LSM are
included that leverage the patch for this purpose.

This patch implements the changes necessary to the XFRM subsystem,
pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a
socket to use only authorized security associations (or no security
association) to send/receive network packets.

Patch purpose:

The patch is designed to enable access control per packets based on
the strongly authenticated IPSec security association.  Such access
controls augment the existing ones based on network interface and IP
address.  The former are very coarse-grained, and the latter can be
spoofed.  By using IPSec, the system can control access to remote
hosts based on cryptographic keys generated using the IPSec mechanism.
This enables access control on a per-machine basis or per-application
if the remote machine is running the same mechanism and trusted to
enforce the access control policy.

Patch design approach:

The overall approach is that policy (xfrm_policy) entries set by
user-level programs (e.g., setkey for ipsec-tools) are extended with a
security context that is used at policy selection time in the XFRM
subsystem to restrict the sockets that can send/receive packets via
security associations (xfrm_states) that are built from those
policies.

A presentation available at
www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
from the SELinux symposium describes the overall approach.

Patch implementation details:

On output, the policy retrieved (via xfrm_policy_lookup or
xfrm_sk_policy_lookup) must be authorized for the security context of
the socket and the same security context is required for resultant
security association (retrieved or negotiated via racoon in
ipsec-tools).  This is enforced in xfrm_state_find.

On input, the policy retrieved must also be authorized for the socket
(at __xfrm_policy_check), and the security context of the policy must
also match the security association being used.

The patch has virtually no impact on packets that do not use IPSec.
The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as
before.

Also, if IPSec is used without security contexts, the impact is
minimal.  The LSM must allow such policies to be selected for the
combination of socket and remote machine, but subsequent IPSec
processing proceeds as in the original case.

Testing:

The pfkey interface is tested using the ipsec-tools.  ipsec-tools have
been modified (a separate ipsec-tools patch is available for version
0.5) that supports assignment of xfrm_policy entries and security
associations with security contexts via setkey and the negotiation
using the security contexts via racoon.

The xfrm_user interface is tested via ad hoc programs that set
security contexts.  These programs are also available from me, and
contain programs for setting, getting, and deleting policy for testing
this interface.  Testing of sa functions was done by tracing kernel
behavior.

Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-03 13:10:24 -08:00
Jeff Garzik
ac67c62473 Merge branch 'master' 2006-01-03 10:49:18 -05:00
Matt Mackall
4a4efbdee2 s/retreiv/retriev/g
As everyone knows, the rule is: "i before e.. um.. always."

Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-01-03 13:27:11 +01:00
David L Stevens
5ab4a6c81e [IPV6] mcast: Fix multiple issues in MLDv2 reports.
The below "jumbo" patch fixes the following problems in MLDv2.

1) Add necessary "ntohs" to recent "pskb_may_pull" check [breaks
        all nonzero source queries on little-endian (!)]

2) Add locking to source filter list [resend of prior patch]

3) fix "mld_marksources()" to
        a) send nothing when all queried sources are excluded
        b) send full exclude report when source queried sources are
                not excluded
        c) don't schedule a timer when there's nothing to report

NOTE: RFC 3810 specifies the source list should be saved and each
  source reported individually as an IS_IN. This is an obvious DOS
  path, requiring the host to store and then multicast as many sources
  as are queried (e.g., millions...). This alternative sends a full, 
  relevant report that's limited to number of sources present on the
  machine.

4) fix "add_grec()" to send empty-source records when it should
        The original check doesn't account for a non-empty source
        list with all sources inactive; the new code keeps that
        short-circuit case, and also generates the group header
        with an empty list if needed.

5) fix mca_crcount decrement to be after add_grec(), which needs
        its original value

These issues (other than item #1 ;-) ) were all found by Yan Zheng,
much thanks!

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-27 14:03:00 -08:00
David S. Miller
1b93ae64ca [NET]: Validate socket filters against BPF_MAXINSNS in one spot.
Currently the checks are scattered all over and this leads
to inconsistencies and even cases where the check is not made.

Based upon a patch from Kris Katterjohn.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-27 13:57:59 -08:00
YOSHIFUJI Hideaki
6732badee0 [IPV6]: Fix addrconf dead lock.
We need to release idev->lcok before we call addrconf_dad_stop().
It calls ipv6_addr_del(), which will hold idev->lock.

Bug spotted by Yasuyuki KOZAKAI <yasuyuki.kozakai@toshiba.co.jp>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-27 13:35:15 -08:00
David Kimdon
79cac2a221 [BR_NETFILTER]: Fix leak if skb traverses > 1 bridge
Call nf_bridge_put() before allocating a new nf_bridge structure and
potentially overwriting the pointer to a previously allocated one.
This fixes a memory leak which can occur when the bridge topology
allows for an skb to traverse more than one bridge.

Signed-off-by: David Kimdon <david.kimdon@devicescape.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-26 17:27:10 -08:00
David L Stevens
6f4353d891 [IPV6]: Increase default MLD_MAX_MSF to 64.
The existing default of 10 is just way too low.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-26 17:03:46 -08:00
Hiroyuki YAMAMORI
291d809ba5 [IPV6]: Fix Temporary Address Generation
From: Hiroyuki YAMAMORI <h-yamamo@db3.so-net.ne.jp>

Since regen_count is stored in the public address, we need to reset it
when we start renewing temporary address.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-23 11:24:05 -08:00
YOSHIFUJI Hideaki
3dd3bf8357 [IPV6]: Fix dead lock.
We need to relesae ifp->lock before we call addrconf_dad_stop(),
which will hold ifp->lock.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-23 11:23:21 -08:00
David S. Miller
e6469297d4 Merge git://git.skbuff.net/gitroot/yoshfuji/linux-2.6.14+git+ipv6-fix-20051221a 2005-12-22 07:41:27 -08:00
David S. Miller
9b78a82c1c [IPSEC]: Fix policy updates missed by sockets
The problem is that when new policies are inserted, sockets do not see
the update (but all new route lookups do).

This bug is related to the SA insertion stale route issue solved
recently, and this policy visibility problem can be fixed in a similar
way.

The fix is to flush out the bundles of all policies deeper than the
policy being inserted.  Consider beginning state of "outgoing"
direction policy list:

	policy A --> policy B --> policy C --> policy D

First, realize that inserting a policy into a list only potentially
changes IPSEC routes for that direction.  Therefore we need not bother
considering the policies for other directions.  We need only consider
the existing policies in the list we are doing the inserting.

Consider new policy "B'", inserted after B.

	policy A --> policy B --> policy B' --> policy C --> policy D

Two rules:

1) If policy A or policy B matched before the insertion, they
   appear before B' and thus would still match after inserting
   B'

2) Policy C and D, now "shadowed" and after policy B', potentially
   contain stale routes because policy B' might be selected
   instead of them.

Therefore we only need flush routes assosciated with policies
appearing after a newly inserted policy, if any.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-22 07:39:48 -08:00
Ian McDonald
4c7e689502 [DCCP]: Comment typo
I hope to actually change this behaviour shortly but this will help
anybody grepping code at present.

Signed-off-by: Ian McDonald <imcdnzl@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-21 19:02:39 -08:00
Kristian Slavov
1d1428045c [IPV6]: Fix address deletion
If you add more than one IPv6 address belonging to the same prefix and 
delete the address that was last added, routing table entry for that 
prefix is also deleted.
Tested on 2.6.14.4

To reproduce:
ip addr add 3ffe::1/64 dev eth0
ip addr add 3ffe::2/64 dev eth0
/* wait DAD */
sleep 1
ip addr del 3ffe::2/64 dev eth0
ip -6 route

(route to 3ffe::/64 should be gone)

In ipv6_del_addr(), if ifa == ifp, we set ifa->if_next to NULL, and later 
assign ifap = &ifa->if_next, effectively terminating the for-loop.
This prevents us from checking if there are other addresses using the same 
prefix that are valid, and thus resulting in deletion of the prefix.
This applies only if the first entry in idev->addr_list is the address to 
be deleted.

Signed-off-by: Kristian Slavov <kristian.slavov@nomadiclab.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-21 18:47:24 -08:00
Mika Kukkonen
7eb1b3d372 [VLAN]: Add two missing checks to vlan_ioctl_handler()
In vlan_ioctl_handler() the code misses couple checks for
error return values.

Signed-off-by: Mika Kukkonen <mikukkon@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-21 18:39:49 -08:00
Mika Kukkonen
0d77d59f62 [NETROM]: Fix three if-statements in nr_state1_machine()
I found these while compiling with extra gcc warnings;
considering the indenting surely they are not intentional?

Signed-off-by: Mika Kukkonen <mikukkon@iki.fi>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-21 18:38:26 -08:00
YOSHIFUJI Hideaki
6b3ae80a63 [IPV6]: Don't select a tentative address as a source address.
A tentative address is not considered "assigned to an interface"
in the traditional sense (RFC2462 Section 4).
Don't try to select such an address for the source address.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2005-12-21 22:58:01 +09:00
YOSHIFUJI Hideaki
c5e33bddd3 [IPV6]: Run DAD when the link becomes ready.
If the link was not available when the interface was created,
run DAD for pending tentative addresses when the link becomes ready.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2005-12-21 22:57:44 +09:00
YOSHIFUJI Hideaki
3c21edbd11 [IPV6]: Defer IPv6 device initialization until the link becomes ready.
NETDEV_UP might be sent even if the link attached to the interface was
not ready.  DAD does not make sense in such case, so we won't do so.
After interface

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2005-12-21 22:57:24 +09:00
YOSHIFUJI Hideaki
8de3351e6e [IPV6]: Try not to send icmp to anycast address.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2005-12-21 22:57:06 +09:00
YOSHIFUJI Hideaki
58c4fb86ea [IPV6]: Flag RTF_ANYCAST for anycast routes.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2005-12-21 22:56:42 +09:00
Trond Myklebust
48e4918775 SUNRPC: Fix "EPIPE" error on mount of rpcsec_gss-protected partitions
gss_create_upcall() should not error just because rpc.gssd closed the
 pipe on its end. Instead, it should requeue the pending requests and then
 retry.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-19 23:12:21 -05:00
Trond Myklebust
b079fa7baa RPC: Do not block on skb allocation
If we get something like the following,
 [  125.300636]  [<c04086e1>] schedule_timeout+0x54/0xa5
 [  125.305931]  [<c040866e>] io_schedule_timeout+0x29/0x33
 [  125.311495]  [<c02880c4>] blk_congestion_wait+0x70/0x85
 [  125.317058]  [<c014136b>] throttle_vm_writeout+0x69/0x7d
 [  125.322720]  [<c014714d>] shrink_zone+0xe0/0xfa
 [  125.327560]  [<c01471d4>] shrink_caches+0x6d/0x6f
 [  125.332581]  [<c01472a6>] try_to_free_pages+0xd0/0x1b5
 [  125.338056]  [<c013fa4b>] __alloc_pages+0x135/0x2e8
 [  125.343258]  [<c03b74ad>] tcp_sendmsg+0xaa0/0xb78
 [  125.348281]  [<c03d4666>] inet_sendmsg+0x48/0x53
 [  125.353212]  [<c0388716>] sock_sendmsg+0xb8/0xd3
 [  125.358147]  [<c0388773>] kernel_sendmsg+0x42/0x4f
 [  125.363259]  [<c038bc00>] sock_no_sendpage+0x5e/0x77
 [  125.368556]  [<c03ee7af>] xs_tcp_send_request+0x2af/0x375
 then the socket is blocked until memory is reclaimed, and no
 progress can ever be made.

 Try to access the emergency pools by using GFP_ATOMIC.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-19 23:11:54 -05:00
Neil Horman
9bffc4ace1 [SCTP]: Fix sctp to not return erroneous POLLOUT events.
Make sctp_writeable() use sk_wmem_alloc rather than sk_wmem_queued to
determine the sndbuf space available. It also removes all the modifications
to sk_wmem_queued as it is not currently used in SCTP.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-19 14:24:40 -08:00
David S. Miller
399c180ac5 [IPSEC]: Perform SA switchover immediately.
When we insert a new xfrm_state which potentially
subsumes an existing one, make sure all cached
bundles are flushed so that the new SA is used
immediately.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-19 14:23:23 -08:00
Patrick McHardy
9e999993c7 [XFRM]: Handle DCCP in xfrm{4,6}_decode_session
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-19 14:03:46 -08:00
YOSHIFUJI Hideaki
3dd4bc68fa [IPV6]: Fix route lifetime.
The route expiration time is stored in rt6i_expires in jiffies.
The argument of rt6_route_add() for adding a route is not the
expiration time in jiffies nor in clock_t, but the lifetime
(or time left before expiration) in clock_t.

Because of the confusion, we sometimes saw several strange errors
(FAILs) in TAHI IPv6 Ready Logo Phase-2 Self Test.
The symptoms were analyzed by Mitsuru Chinen <CHINEN@jp.ibm.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-19 14:02:45 -08:00
Bart De Schuymer
b03664869a [BRIDGE-NF]: Fix bridge-nf ipv6 length check
A typo caused some bridged IPv6 packets to get dropped randomly,
as reported by Sebastien Chaumontet. The patch below fixes this
(using skb->nh.raw instead of raw) and also makes the jumbo packet
length checking up-to-date with the code in
net/ipv6/exthdrs.c::ipv6_hop_jumbo.

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-19 14:00:08 -08:00
Patrick McHardy
31cb5bd4dc [NETFILTER]: Fix incorrect dependency for IP6_NF_TARGET_NFQUEUE
IP6_NF_TARGET_NFQUEUE depends on IP6_NF_IPTABLES, not IP_NF_IPTABLES.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-19 13:53:26 -08:00
Patrick McHardy
0476f171af [NETFILTER]: Fix NAT init order
As noticed by Phil Oester, the GRE NAT protocol helper is initialized
before the NAT core, which makes registration fail.

Change the linking order to make NAT be initialized first.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-19 13:53:09 -08:00
Jeff Garzik
418fbfe979 Merge branch 'master' 2005-12-19 00:09:53 -05:00
Al Viro
d3a880e1ff [PATCH] Address of void __user * is void __user * *, not void * __user *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-15 10:04:31 -08:00
Stephen Hemminger
a388442c37 [VLAN]: Fix hardware rx csum errors
Receiving VLAN packets over a device (without VLAN assist) that is
doing hardware checksumming (CHECKSUM_HW), causes errors because the
VLAN code forgets to adjust the hardware checksum.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-14 16:23:16 -08:00
Herbert Xu
1542272a60 [GRE]: Fix hardware checksum modification
The skb_postpull_rcsum introduced a bug to the checksum modification.
Although the length pulled is offset bytes, the origin of the pulling
is the GRE header, not the IP header.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-14 12:55:24 -08:00
David S. Miller
2edc2689f8 [PKT_SCHED]: Disable debug tracing logs by default in packet action API.
Noticed by Andi Kleen.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-13 22:59:50 -08:00
David S. Miller
a1493d9cd1 [IPV6] addrconf: Do not print device pointer in privacy log message.
Noticed by Andi Kleen, it is pointless to emit the device
structure pointer in the kernel logs like this.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-13 22:59:36 -08:00
Jeff Garzik
783e3385a1 Merge branch 'upstream-fixes' 2005-12-13 00:07:46 -05:00
Olaf Hering
1cf9e8a786 [PATCH] ieee80211_crypt_tkip depends on NET_RADIO
*** Warning: ".wireless_send_event" [net/ieee80211/ieee80211_crypt_tkip.ko] undefined!

Signed-off-by: Olaf Hering <olh@suse.de>

 net/ieee80211/Kconfig |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-12-12 23:59:28 -05:00
Linus Torvalds
14ee0a1414 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/nf-2.6 2005-12-12 15:49:56 -08:00
Marcus Sundberg
2f9616d4c4 [NETFILTER]: ip_nat_tftp: Fix expectation NAT
When a TFTP client is SNATed so that the port is also changed, the
port is never changed back for the expected connection.

Signed-off-by: Marcus Sundberg <marcus@ingate.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-12 15:02:48 -08:00
Arnaldo Carvalho de Melo
ecc51b6d5c [TCPv6]: Fix skb leak
Spotted by Francois Romieu, thanks!

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-12 14:38:10 -08:00
Jeff Garzik
b1086eef81 Merge branch 'master' 2005-12-12 15:24:45 -05:00
Kazunori MIYAZAWA
73d4f84fd0 [IPv6] IPsec: fix pmtu calculation of esp
It is a simple bug which uses the wrong member.

This bug does not seriously affect ordinary use of IPsec.
But it is important to pass IPv6 ready logo phase-2
conformance test of IPsec SGW.

Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-08 23:11:42 -08:00
Stephen Hemminger
246a421207 [NET]: Fix NULL pointer deref in checksum debugging.
The problem I was seeing turned out to be that skb->dev is NULL when
the checksum is being completed in user context. This happens because
the reference to the device is dropped (to allow it to be released
when packets are in the queue).

Because skb->dev was NULL, the netdev_rx_csum_fault was panicing on
deref of dev->name. How about this?

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-08 15:21:39 -08:00
David S. Miller
4ebf0ae261 [AF_PACKET]: Convert PACKET_MMAP over to vm_insert_page().
So we can properly use __GFP_COMP and avoid the use of
PG_reserved pages.

With extremely helpful review from Hugh Dickins.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-06 16:38:35 -08:00
David S. Miller
dfb4b9dceb [TCP] Vegas: timestamp before clone
We have to store the congestion control timestamp on the SKB before we
clone it, not after.  Else we get no timestamping information at all.

tcp_transmit_skb() has been reworked so that we can do the timestamp
still in one spot, instead of at all the call sites.

Problem discovered, and initial fix, from Tom Young
<tyo@ee.unimelb.edu.au>.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-06 16:24:52 -08:00
Thomas Young
0d7bef600a [TCP] Vegas: Remove extra call to tcp_vegas_rtt_calc
Remove unneeded call to tcp_vegas_rtt_calc. The more accurate
microsecond value has already been registered prior to calling
tcp_vegas_cong_avoid.

Signed-off-by: Thomas Young <tyo@ee.mu.oz.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-06 16:17:11 -08:00
Thomas Young
5b49561381 [TCP] Vegas: stop resetting rtt every ack
Move the resetting of rtt measurements to inside the once per RTT
block of code.

Signed-off-by: Thomas Young <tyo@ee.mu.oz.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-06 16:16:34 -08:00
Steven Whitehouse
1f12bcc9d1 [DECNET]: add memory buffer settings
The patch (originally from Steve) simply adds memory buffer settings to 
DECnet similar to those in TCP.

Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-05 13:42:06 -08:00
Martin Waitz
dab9630fb3 [NET]: make function pointer argument parseable by kernel-doc
When a function takes a function pointer as argument it should use the 'return
(*pointer)(params...)' syntax used everywhere else in the kernel as this is
recognized by kernel-doc.

Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-05 13:40:12 -08:00
Patrick McHardy
2fdf1faa8e [NETFILTER]: Don't use conntrack entry after dropping the reference
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-05 13:38:16 -08:00
Patrick McHardy
266c854348 [NETFILTER]: Fix unbalanced read_unlock_bh in ctnetlink
NFA_NEST calls NFA_PUT which jumps to nfattr_failure if the skb has no
room left. We call read_unlock_bh at nfattr_failure for the NFA_PUT inside
the locked section, so move NFA_NEST inside the locked section too.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-05 13:37:33 -08:00
Patrick McHardy
6636568cf8 [NETFILTER]: Wait for untracked references in nf_conntrack module unload
Noticed by Pablo Neira <pablo@eurodev.net>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-05 13:36:50 -08:00
Patrick McHardy
a795756333 [NETFILTER]: Mark ctnetlink as EXPERIMENTAL
Should have been marked EXPERIMENTAL from the beginning, as the current
bunch of fixes show.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-05 13:36:25 -08:00
Patrick McHardy
0be7fa92ca [NETFILTER]: Fix CTA_PROTO_NUM attribute size in ctnetlink
CTA_PROTO_NUM is a u_int8_t.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-05 13:34:51 -08:00
Patrick McHardy
afe5c6bb03 [NETFILTER]: Fix ip_conntrack_flush abuse in ctnetlink
ip_conntrack_flush() used to be part of ip_conntrack_cleanup(), which needs
to drop _all_ references on module unload. Table flushed using ctnetlink
just needs to clean the table and doesn't need to flush the event cache or
wait for any references attached to skbs. Move everything but pure table
flushing back to ip_conntrack_cleanup().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-05 13:33:50 -08:00
Yasuyuki Kozakai
3ebbe0cdd4 [NETFILTER]: nfnetlink: Fix calculation of minimum message length
At least, valid nfnetlink message should have nlmsghdr and nfgenmsg.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-05 13:33:26 -08:00
Yasuyuki Kozakai
f16c910724 [NETFILTER]: nf_conntrack: Fix missing check for ICMPv6 type
This makes nf_conntrack_icmpv6 check that ICMPv6 type isn't < 128
to avoid accessing out of array valid_new[] and invmap[].

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-05 13:32:50 -08:00
Pablo Neira Ayuso
8d1ca69984 [NETFILTER]: Fix incorrect argument to ip_nat_initialized() in ctnetlink
ip_nat_initialized() takes enum ip_nat_manip_type as it's second argument,
not a hook number.

Noticed and initial patch by Marcus Sundberg <marcus@ingate.com>.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-05 13:32:14 -08:00
Jeff Garzik
2fde9901f6 Merge branch 'master' 2005-12-03 21:03:28 -05:00
Trond Myklebust
bb184f3356 SUNRPC: Fix Oopsable condition in rpc_pipefs
The elements on rpci->in_upcall are tracked by the filp->private_data,
 which will ensure that they get released when the file is closed.

 The exception is if rpc_close_pipes() gets called first, since that
 sets rpci->ops to NULL.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-03 15:20:10 -05:00
YOSHIFUJI Hideaki
af1afe8662 [IPV6]: Load protocol module dynamically.
[ Modified to match inet_create() bug fix by Herbert Xu -DaveM ]

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-02 20:56:57 -08:00
Herbert Xu
86c8f9d158 [IPV4] Fix EPROTONOSUPPORT error in inet_create
There is a coding error in inet_create that causes it to always return
ESOCKTNOSUPPORT.  It should return EPROTONOSUPPORT when there are
protocols registered for a given socket type but none of them match
the requested protocol.

This is based on a patch by Jayachandran C.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-02 20:43:26 -08:00
David Stevens
24c6927505 [IGMP]: workaround for IGMP v1/v2 bug
From: David Stevens <dlstevens@us.ibm.com>

As explained at:

	http://www.cs.ucsb.edu/~krishna/igmp_dos/

With IGMP version 1 and 2 it is possible to inject a unicast
report to a client which will make it ignore multicast
reports sent later by the router.

The fix is to only accept the report if is was sent to a
multicast or unicast address.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-02 20:32:59 -08:00
Neil Horman
bf031fff1f [SCTP]: Fix getsockname for sctp when an ipv6 socket accepts a connection from
an ipv4 socket.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-02 20:32:29 -08:00
Neil Horman
6736dc35e9 [SCTP]: Return socket errors only if the receive queue is empty.
This patch fixes an issue where it is possible to get valid data after
a ENOTCONN error. It returns socket errors only after data queued on
socket receive queue is consumed.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-02 20:30:06 -08:00
Thomas Graf
ea86575eaf [NETLINK]: Fix processing of fib_lookup netlink messages
The receive path for fib_lookup netlink messages is lacking sanity
checks for header and payload and is thus vulnerable to malformed
netlink messages causing illegal memory references.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-01 14:30:00 -08:00
Phil Oester
2a43c4af3f [NETFILTER]: Fix recent match jiffies wrap mismatches
Around jiffies wrap time (i.e. within first 5 mins after boot), recent
match rules which contain both --seconds and --hitcount arguments
experience false matches.

This is because the last_pkts array is filled with zeros on creation, and
when comparing 'now' to 0 (+ --seconds argument), time_before_eq thinks it
has found a hit.

Below patch adds a break if the packet value is zero.  This has the
unfortunate side effect of causing mismatches if a packet was received
when jiffies really was equal to zero.  The odds of that happening are
slim compared to the problems caused by not adding the break however.
Plus, the author used this same method just below, so it is "good enough".

This fixes netfilter bugs #383 and #395.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-01 14:29:24 -08:00
Jozsef Kadlecsik
73f306024c [NETFILTER]: Ignore ACKs ACKs on half open connections in TCP conntrack
Mounting NFS file systems after a (warm) reboot could take a long time if
firewalling and connection tracking was enabled.

The reason is that the NFS clients tends to use the same ports (800 and
counting down). Now on reboot, the server would still have a TCB for an
existing TCP connection client:800 -> server:2049. The client sends a
SYN from port 800 to server:2049, which elicits an ACK from the server.
The firewall on the client drops the ACK because (from its point of
view) the connection is still in half-open state, and it expects to see
a SYNACK.

The client will eventually time out after several minutes.

The following patch corrects this, by accepting ACKs on half open
connections as well.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-12-01 14:28:58 -08:00
Jeff Garzik
e538af42e4 Merge branch 'master' 2005-12-01 01:54:02 -05:00
Adrian Bunk
34a0b3cdc0 [IPV6]: make two functions static
This patch makes two needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 16:28:56 -08:00
Adrian Bunk
d127e94a5c [NETFILTER] ipv4: small cleanups
This patch contains the following cleanups:
- make needlessly global code static
- ip_conntrack_core.c: ip_conntrack_flush() -> ip_conntrack_flush(void)

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 16:28:18 -08:00
Adrian Bunk
4b30b1c6a3 [IPV4]: make two functions static
This patch makes two needlessly global functions static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 16:27:20 -08:00
Arjan van de Ven
9b5b5cff9a [NET]: Add const markers to various variables.
the patch below marks various variables const in net/; the goal is to
move them to the .rodata section so that they can't false-share
cachelines with things that get written to, as well as potentially
helping gcc a bit with optimisations.  (these were found using a gcc
patch to warn about such variables)

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 16:21:38 -08:00
Stanislaw Gruszka
64bf69ddff [ATM]: deregistration removes device from atm_devs list immediately
atm_dev_deregister() removes device from atm_dev list immediately to
prevent operations on a phantom device.  Decision to free device based
only on ->refcnt  now. Remove shutdown_atm_dev() use atm_dev_deregister()
instead.  atm_dev_deregister() also asynchronously releases all vccs
related to device.

Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 16:16:41 -08:00
Stanislaw Gruszka
aaaaaadbe7 [ATM]: avoid race conditions related to atm_devs list
Use semaphore to protect atm_devs list, as no one need access to it from
interrupt context.  Avoid race conditions between atm_dev_register(),
atm_dev_lookup() and atm_dev_deregister().  Fix double spin_unlock() bug.

Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 16:16:21 -08:00
Mitchell Blank Jr
50accc9c42 [ATM]: attempt to autoload atm drivers
From: Mitchell Blank Jr <mitch@sfgoth.com>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 16:15:18 -08:00
Mitchell Blank Jr
c219750b2e [ATM]: atm_pcr_goal() doesn't modify its argument's contents -- mark it as const
Signed-off-by: Mitchell Blank Jr <mitch@sfgoth.com>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 16:13:55 -08:00
Mitchell Blank Jr
c9933d0856 [ATM]: always return the first interface for ATM_ITF_ANY
From: Mitchell Blank Jr <mitch@sfgoth.com>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 16:13:32 -08:00
Mike Stroyan
18955cfcb2 [IPV4] tcp/route: Another look at hash table sizes
The tcp_ehash hash table gets too big on systems with really big memory.
It is worse on systems with pages larger than 4KB.  It wastes memory that
could be better used.  It also makes the netstat command slow because reading
/proc/net/tcp and /proc/net/tcp6 needs to go through the full hash table.

  The default value should not be larger for larger page sizes.  It seems
that the effect of page size is an unintended error dating back a long
time.  I also wonder if the default value really should be a larger
fraction of memory for systems with more memory.  While systems with
really big ram can afford more space for hash tables, it is not clear to
me that they benefit from increasing the allocation ratio for this table.

  The amount of memory allocated is determined by net/ipv4/tcp.c:tcp_init and
mm/page_alloc.c:alloc_large_system_hash.

tcp_init calls alloc_large_system_hash passing parameters-
    bucketsize=sizeof(struct tcp_ehash_bucket)
    numentries=thash_entries
    scale=(num_physpages >= 128 * 1024) ? (25-PAGE_SHIFT) : (27-PAGE_SHIFT)
    limit=0

On i386, PAGE_SHIFT is 12 for a page size of 4K
On ia64, PAGE_SHIFT defaults to 14 for a page size of 16K

The num_physpages test above makes the allocation take a larger fraction
of the total memory on systems with larger memory.  The threshold size
for a i386 system is 512MB.  For an ia64 system with 16KB pages the
threshold is 2GB.

For smaller memory systems-
On i386, scale = (27 - 12) = 15
On ia64, scale = (27 - 14) = 13
For larger memory systems-
On i386, scale = (25 - 12) = 13
On ia64, scale = (25 - 14) = 11

  For the rest of this discussion, I'll just track the larger memory case.

  The default behavior has numentries=thash_entries=0, so the allocated
size is determined by either scale or by the default limit of 1/16 of
total memory.

In alloc_large_system_hash-
|	numentries = (flags & HASH_HIGHMEM) ? nr_all_pages : nr_kernel_pages;
|	numentries += (1UL << (20 - PAGE_SHIFT)) - 1;
|	numentries >>= 20 - PAGE_SHIFT;
|	numentries <<= 20 - PAGE_SHIFT;

  At this point, numentries is pages for all of memory, rounded up to the
nearest megabyte boundary.

|	/* limit to 1 bucket per 2^scale bytes of low memory */
|	if (scale > PAGE_SHIFT)
|		numentries >>= (scale - PAGE_SHIFT);
|	else
|		numentries <<= (PAGE_SHIFT - scale);

On i386, numentries >>= (13 - 12), so numentries is 1/8196 of
bytes of total memory.
On ia64, numentries <<= (14 - 11), so numentries is 1/2048 of
bytes of total memory.

|        log2qty = long_log2(numentries);
|
|        do {
|                size = bucketsize << log2qty;

bucketsize is 16, so size is 16 times numentries, rounded
down to a power of two.

On i386, size is 1/512 of bytes of total memory.
On ia64, size is 1/128 of bytes of total memory.

For smaller systems the results are
On i386, size is 1/2048 of bytes of total memory.
On ia64, size is 1/512 of bytes of total memory.

  The large page effect can be removed by just replacing
the use of PAGE_SHIFT with a constant of 12 in the calls to
alloc_large_system_hash.  That makes them more like the other uses of
that function from fs/inode.c and fs/dcache.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-29 16:12:55 -08:00
Jeff Garzik
2226340eb8 Merge branch 'master' 2005-11-29 03:50:33 -05:00
YOSHIFUJI Hideaki
220bbd7483 [IPV6]: Implement appropriate dummy rule 4 in ipv6_dev_get_saddr().
Ensure to update hiscore.rule in dummy rule 4 in ipv6_dev_get_saddr().
Pointed out by Yan Zheng <yanzheng@21cn.com>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-28 22:27:11 -08:00
Trond Myklebust
b3eb67a2ab SUNRPC: Funny looking code in __rpc_purge_upcall
In __rpc_purge_upcall (net/sunrpc/rpc_pipe.c), the newer code to clean up
 the in_upcall list has a typo.
 Thanks to Vince Busam <vbusam@google.com> for spotting this!

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-11-25 17:11:30 -05:00
Olaf Rempel
133747e8d1 [BRIDGE]: recompute features when adding a new device
We must recompute bridge features everytime the list of underlying 
devices changes, or we might end up with features that are not
supported by all devices (eg. NETIF_F_TSO)
This patch adds the missing recompute when adding a device to the bridge.

Signed-off-by: Olaf Rempel <razzor@kopf-tisch.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-23 19:04:08 -08:00
Benoit Boissinot
de919820cf [NETFILTER]: ip_conntrack_netlink.c needs linux/interrupt.h
net/ipv4/netfilter/ip_conntrack_netlink.c: In function 'ctnetlink_dump_table':
net/ipv4/netfilter/ip_conntrack_netlink.c:409: warning: implicit declaration of function 'local_bh_disable'
net/ipv4/netfilter/ip_conntrack_netlink.c:427: warning: implicit declaration of function 'local_bh_enable'

Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-23 19:03:46 -08:00
Pablo Neira Ayuso
00cb277a4a [NETFILTER] ctnetlink: Fix refcount leak ip_conntrack/nat_proto
Remove proto == NULL checking since ip_conntrack_[nat_]proto_find_get
always returns a valid pointer.

Fix missing ip_conntrack_proto_put in some paths.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-22 14:54:34 -08:00
Jamal Hadi Salim
0ff60a4567 [IPV4]: Fix secondary IP addresses after promotion
This patch fixes the problem with promoting aliases when:
a) a single primary and > 1 secondary addresses
b) multiple primary addresses each with at least one secondary address

Based on earlier efforts from Brian Pomerantz <bapper@piratehaven.org>,
Patrick McHardy <kaber@trash.net> and Thomas Graf <tgraf@suug.ch>

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-22 14:47:37 -08:00
Herbert Xu
c27bd492fd [NETLINK]: Use tgid instead of pid for nlmsg_pid
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-22 14:41:50 -08:00
Patrick McHardy
a516b04950 [DCCP]: Add missing no_policy flag to struct net_protocol
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-20 21:16:13 -08:00
Yasuyuki Kozakai
2b8f2ff6f4 [NETFILTER]: fixed dependencies between modules related with ip_conntrack
- IP_NF_CONNTRACK_MARK is bool and depends on only IP_NF_CONNTRACK
  which is tristate. If a variable depends on IP_NF_CONNTRACK_MARK and
  doesn't care about IP_NF_CONNTRACK, it can be y. This must be avoided.
- IP_NF_CT_ACCT has same problem.
- IP_NF_TARGET_CLUSTERIP also depends on IP_NF_MANGLE.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-20 21:09:55 -08:00
Patrick McHardy
c9e53cbe7a [FIB_TRIE]: Don't show local table in /proc/net/route output
Don't show local table to behave similar to fib_hash.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-20 21:09:00 -08:00
David S. Miller
1ef43204f4 Merge git://git.skbuff.net/gitroot/yoshfuji/linux-2.6.14+advapi-fix/ 2005-11-20 20:52:16 -08:00
Yan Zheng
5d5780df23 [IPV6]: Acquire addrconf_hash_lock for read in addrconf_verify(...)
addrconf_verify(...) only traverse address hash table when
addrconf_hash_lock is held for writing, and it may hold
addrconf_hash_lock for a long time. So I think it's better to acquire
addrconf_hash_lock for reading instead of writing

Signed-off-by: Yan Zheng <yanzheng@21cn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-20 13:42:20 -08:00
Kris Katterjohn
fb0d366b08 [NET]: Reject socket filter if division by constant zero is attempted.
This way we don't have to check it in sk_run_filter().

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-20 13:41:34 -08:00
Andrea Bittau
aa8751667d [PKT_SCHED]: sch_netem: correctly order packets to be sent simultaneously
If two packets were queued to be sent at the same time in the future,
their order would be reversed.  This would occur because the queue is
traversed back to front, and a position is found by checking whether
the new packet needs to be sent before the packet being examined.  If
the new packet is to be sent at the same time of a previous packet, it
would end up before the old packet in the queue.  This patch places
packets in the correct order when they are queued to be sent at a same
time in the future.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-20 13:41:05 -08:00
YOSHIFUJI Hideaki
df9890c31a [IPV6]: Fix sending extension headers before and including routing header.
Based on suggestion from Masahide Nakamura <nakam@linux-ipv6.org>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2005-11-20 12:23:18 +09:00
Ville Nuorvala
a305989386 [IPV6]: Fix calculation of AH length during filling ancillary data.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2005-11-20 12:21:59 +09:00
YOSHIFUJI Hideaki
8b8aa4b5a6 [IPV6]: Fix memory management error during setting up new advapi sockopts.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2005-11-20 12:18:17 +09:00
Jeff Garzik
638cbac8de Merge branch 'master' 2005-11-18 13:23:21 -05:00
David S. Miller
9e147a1cfc [IPV6]: Fib dump really needs GFP_ATOMIC.
Revert: 8225ccbaf0

Based upon a report by Yan Zheng.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-17 16:52:51 -08:00
Roman Zippel
05b8b0fafd [NET]: Sanitize NET_SCHED protection in /net/sched/Kconfig
On Thu, 17 Nov 2005, David Gmez wrote:

> I found out that if i select NET_CLS_ROUTE4, save my changes and exit
> menuconfig, execute again make menuconfig and go to QoS options, then the new
> available options are visible. So menuconfig has some problem refreshing
> contents :?

No, they were there before too, but you have to go up one level to see 
them.

It's better in 2.6.15-rc1-git5, but the menu structure is still a little 
messed up, the patch below properly indents all menu entries.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-17 15:22:39 -08:00
David S. Miller
381998241f [LLC]: Fix compiler warnings introduced by TX window scaling changes.
Noticed by Olaf Hering.

The comparisons want a u8 here (the data type on the left-hand branch
is a u8 structure member, and the constant on the right-hand branch is
"~((u8) 128)"), but C turns it into an integer so we get:

net/llc/llc_c_ac.c: In function `llc_conn_ac_inc_npta_value':
net/llc/llc_c_ac.c:998: warning: comparison is always true due to limited range of data type
net/llc/llc_c_ac.c:999: warning: large integer implicitly truncated to unsigned type

Fix this up by explicitly recasting the right-hand branch constant
into a "u8" once more.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-17 15:17:42 -08:00
Harald Welte
2fce76afdb [NETFILTER] ip_conntrack: fix ftp/irc/tftp helpers on ports >= 32768
Since we've converted the ftp/irc/tftp helpers to use the new
module_parm_array() some time ago, we ware accidentially using signed data
types - thus preventing those modules from being used on ports >= 32768.

This patch fixes it by using 'ushort' module parameters.

Thanks to Jan Nijs for reporting this bug.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-17 15:06:47 -08:00
Stephen Hemminger
bd6af700a7 [TCP]: TCP highspeed build error
There is a compile error that crept in with the last patch of
TCP patches.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-17 14:11:18 -08:00
Patrick McHardy
4a59a81051 [NETFILTER]: Fix nf_conntrack compilation with CONFIG_NETFILTER_DEBUG
CC [M]  net/netfilter/nf_conntrack_core.o
net/netfilter/nf_conntrack_core.c: In function 'nf_ct_unlink_expect':
net/netfilter/nf_conntrack_core.c:390: error: 'exp_timeout' undeclared (first use in this function)
net/netfilter/nf_conntrack_core.c:390: error: (Each undeclared identifier is reported only once
net/netfilter/nf_conntrack_core.c:390: error: for each function it appears in.)

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-16 23:14:19 -08:00
Yasuyuki Kozakai
e7c8a41e81 [IPV4,IPV6]: replace handmade list with hlist in IPv{4,6} reassembly
Both of ipq and frag_queue have *next and **prev, and they can be replaced
with hlist. Thanks Arnaldo Carvalho de Melo for the suggestion.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-16 12:55:37 -08:00
Linus Torvalds
f6ff56cd56 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-11-15 16:59:38 -08:00
KOVACS Krisztian
5a6f294e43 [NETFILTER] Free layer-3 specific protocol tables at cleanup
Although the comment around the allocation code tells us that
the layer-3 specific protocol tables will be freed when cleaning up,
they aren't. And this makes nfsim complain loudly...

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-15 16:47:34 -08:00
KOVACS Krisztian
96479376c8 [NETFILTER] Remove nf_conntrack stat proc file when cleaning up
Fix nf_conntrack statistics proc file removal. Looks like the old bug
was forward-ported from ip_conntrack. :-]

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-15 16:47:09 -08:00
Stephen Hemminger
31f3426904 [TCP]: More spelling fixes.
From Joe Perches

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-15 15:17:10 -08:00
NeilBrown
1887b93529 [PATCH] knfsd: make sure nfsd doesn't hog a cpu forever
Being kernel-threads, nfsd servers don't get pre-empted (depending on
CONFIG).  If there is a steady stream of NFS requests that can be served
from cache, an nfsd thread may hold on to a cpu indefinitely, which isn't
very friendly.

So it is good to have a cond_resched in there (just before looking for a
new request to serve), to make sure we play nice.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-15 08:59:19 -08:00
Jeff Garzik
f055408957 Merge branch 'master' 2005-11-15 04:51:40 -05:00
Jochen Friedrich
451677c46f [LLC]: Make core block on remote busy.
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Acked-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 21:57:46 -08:00
Jochen Friedrich
59c6196e59 [LLC]: Fix TX window scaling
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Acked-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 21:57:15 -08:00
Luiz Capitulino
cb422c464b [IPV6]: Fixes sparse warning in ipv6/ipv6_sockglue.c
The patch below fixes the following sparse warning:

net/ipv6/ipv6_sockglue.c:291:13: warning: Using plain integer as NULL pointer

Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 21:43:36 -08:00
Yan Zheng
12da2a435c [IPV6]: small fix for ipv6_dev_get_saddr(...)
The "score.rule++" doesn't make any sense for me. 
According to codes above, I think it should be "hiscore.rule++;" .

Signed-off-by: Yan Zheng<yanzheng@21cn.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 21:42:46 -08:00
Yasuyuki Kozakai
302fe1758d [NETFILTER] fix leak of fragment queue at unloading nf_conntrack_ipv6
This patch makes nf_conntrack_ipv6 free all IPv6 fragment queues at module
unloading time.  Also introduce a BUG_ON if we ever again have leaks in
the memory accounting.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 15:28:45 -08:00
Yasuyuki Kozakai
1ba430bc3e [NETFILTER] nf_conntrack: fix possibility of infinite loop while evicting nf_ct_frag6_queue
This synchronizes nf_ct_reasm with ipv6 reassembly, and fixes a possibility
of an infinite loop if CPUs evict and create nf_ct_frag6_queue in parallel.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 15:28:18 -08:00
Yasuyuki Kozakai
7686a02c0e [NETFILTER]: fix type of sysctl variables in nf_conntrack_ipv6
These variables should be unsigned.  This fixes sysctl handler for
nf_ct_frag6_{low,high}_thresh.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 15:27:43 -08:00
Yasuyuki Kozakai
9bdf87d90b [NETFILTER]: cleanup IPv6 Netfilter Kconfig
This removes linux 2.4 configs in comments as TODO lists.
And this also move the entry of nf_conntrack to top like IPv4 Netfilter
Kconfig.

Based on original patch by Krzysztof Piotr Oledzki <ole@ans.pl>.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 15:26:58 -08:00
Krzysztof Oledzki
47d4305bf2 [NETFILTER]: link 'netfilter' before ipv4
Staticaly linked nf_conntrack_ipv4 requires nf_conntrack. but currently
nf_conntrack is linked after it. This changes the order of ipv4 and netfilter
to fix this.

Signed-off-by: Krzysztof Oledzki <olenf@ans.pl>
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 15:25:59 -08:00
Harald Welte
37d2e7a20d [NETFILTER] nfnetlink: unconditionally require CAP_NET_ADMIN
This patch unconditionally requires CAP_NET_ADMIN for all nfnetlink
messages.  It also removes the per-message cap_required field, since all
existing subsystems use CAP_NET_ADMIN for all their messages anyway.

Patrick McHardy owes me a beer if we ever need to re-introduce this.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 15:24:59 -08:00
KOVACS Krisztian
3746a2b140 [NETFILTER] nf_conntrack: Add missing code to TCP conntrack module
Looks like the nf_conntrack TCP code was slightly mismerged: it does
not contain an else branch present in the IPv4 version. Let's add that
code and make the testsuite happy.

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 15:23:01 -08:00
Pablo Neira Ayuso
5655820852 [NETFILTER] ctnetlink: More thorough size checking of attributes
Add missing size checks. Thanks Patrick McHardy for the hint.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 15:22:11 -08:00
Pablo Neira Ayuso
dbd36ea496 [NETFILTER] ctnetlink: use size_t to make gcc-4.x happy
Make gcc-4.x happy. Use size_t instead of int. Thanks to Patrick McHardy
for the hint.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-14 15:21:01 -08:00
Mitch Williams
c2373ee989 [PATCH] net: make dev_valid_name public
dev_valid_name() is a useful function.  Make it public.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2005-11-13 14:48:18 -05:00
Mitch Williams
1e2e565965 [PATCH] net: allow newline terminated IP addresses in in_aton
in_aton() gives weird results if it sees a newline at the end of the
input. This patch makes it able to handle such input correctly.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2005-11-13 14:48:17 -05:00
Thomas Graf
8225ccbaf0 [IPV6]: Fix unnecessary GFP_ATOMIC allocation in fib6 dump
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-12 12:15:16 -08:00
Vlad Drukker
a2d7222f0f [NETFILTER] {ip,nf}_conntrack TCP: Accept SYN+PUSH like SYN
Some devices (e.g. Qlogic iSCSI HBA hardware like QLA4010 up to firmware
3.0.0.4) initiates TCP with SYN and PUSH flags set.

The Linux TCP/IP stack deals fine with that, but the connection tracking
code doesn't.

This patch alters TCP connection tracking to accept SYN+PUSH as a valid
flag combination.

Signed-off-by: Vlad Drukker <vlad@storewiz.com>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-12 12:13:14 -08:00
Herbert Xu
efacfbcb6c [IPV6]: Fix rtnetlink dump infinite loop
The recent change to netlink dump "done" callback handling broke IPv6
which played dirty tricks with the "done" callback.  This causes an
infinite loop during a dump.

The following patch fixes it.

This bug was reported by Jeff Garzik.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-12 12:12:05 -08:00
Neil Horman
049b3ff5a8 [SCTP]: Include ulpevents in socket receive buffer accounting.
Also introduces a sysctl option to configure the receive buffer
accounting policy to be either at socket or association level.
Default is all the associations on the same socket share the
receive buffer.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-11 16:08:24 -08:00
Vladislav Yasevich
1e7d3d90c9 [SCTP]: Remove timeouts[] array from sctp_endpoint.
The socket level timeout values are maintained in sctp_sock and
association level timeouts are in sctp_association. So there is
no need for ep->timeouts.

Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-11 16:06:16 -08:00
Vladislav Yasevich
23ec47a088 [SCTP]: Fix potential NULL pointer dereference in sctp_v4_get_saddr
It is possible to get to sctp_v4_get_saddr() without a valid
association.  This happens when processing OOTB packets and
the cached route entry is no longer valid.
However, when responding to OOTB packets we already properly
set the source address based on the information in the OOTB
packet.  So, if we we get to sctp_v4_get_saddr() without an
association we can simply return.

Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-11 16:05:55 -08:00
David S. Miller
8eb5591052 [IPV6]: Fix inet6_init missing unregister.
Based mostly upon a patch from Olaf Kirch <okir@suse.de>

When initialization fails in inet6_init(), we should
unregister the PF_INET6 socket ops.

Also, check sock_register()'s return value for errors.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-11 15:05:47 -08:00
Patrick Caulfield
9eb5c94ef2 [DECNET]: fix SIGPIPE
Currently recvmsg generates SIGPIPE whereas sendmsg does not; for the
other stacks it seems to be the other way round!

It also fixes the bug where reading from a socket whose peer has shutdown
returned -EINVAL rather than 0.

Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-11 12:04:28 -08:00
Jeff Garzik
c050970a25 [PATCH] TCP: fix vegas build
Recent TCP changes broke the build.

Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-11 09:21:28 -08:00
Stephen Hemminger
6a438bbe68 [TCP]: speed up SACK processing
Use "hints" to speed up the SACK processing. Various forms 
of this have been used by TCP developers (Web100, STCP, BIC)
to avoid the 2x linear search of outstanding segments.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 17:14:59 -08:00
Stephen Hemminger
caa20d9abe [TCP]: spelling fixes
Minor spelling fixes for TCP code.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 17:13:47 -08:00
John Heffner
326f36e9e7 [TCP]: receive buffer growth limiting with mixed MTU
This is a patch for discussion addressing some receive buffer growing issues.
This is partially related to the thread "Possible BUG in IPv4 TCP window
handling..." last week.

Specifically it addresses the problem of an interaction between rcvbuf
moderation (receiver autotuning) and rcv_ssthresh.  The problem occurs when
sending small packets to a receiver with a larger MTU.  (A very common case I
have is a host with a 1500 byte MTU sending to a host with a 9k MTU.)  In
such a case, the rcv_ssthresh code is targeting a window size corresponding
to filling up the current rcvbuf, not taking into account that the new rcvbuf
moderation may increase the rcvbuf size.

One hunk makes rcv_ssthresh use tcp_rmem[2] as the size target rather than
rcvbuf.  The other changes the behavior when it overflows its memory bounds
with in-order data so that it tries to grow rcvbuf (the same as with
out-of-order data).

These changes should help my problem of mixed MTUs, and should also help the
case from last week's thread I think.  (In both cases though you still need
tcp_rmem[2] to be set much larger than the TCP window.)  One question is if
this is too aggressive at trying to increase rcvbuf if it's under memory
stress.

Orignally-from: John Heffner <jheffner@psc.edu>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 17:11:48 -08:00
Stephen Hemminger
9772efb970 [TCP]: Appropriate Byte Count support
This is an updated version of the RFC3465 ABC patch originally
for Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting
bytes ack'd rather than packets when updating congestion control.

The orignal ABC described in the RFC applied to a Reno style
algorithm. For advanced congestion control there is little
change after leaving slow start.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 17:09:53 -08:00
Stephen Hemminger
7faffa1c7f [TCP]: add tcp_slow_start helper
Move all the code that does linear TCP slowstart to one
inline function to ease later patch to add ABC support.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 17:07:24 -08:00
Stephen Hemminger
2d2abbab63 [TCP]: simplify microsecond rtt sampling
Simplify the code that comuputes microsecond rtt estimate used
by TCP Vegas. Move the callback out of the RTT sampler and into
the end of the ack cleanup.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 16:56:12 -08:00
Stephen Hemminger
f4805eded7 [TCP]: fix congestion window update when using TSO deferal
TCP peformance with TSO over networks with delay is awful.
On a 100Mbit link with 150ms delay, we get 4Mbits/sec with TSO and
50Mbits/sec without TSO.

The problem is with TSO, we intentionally do not keep the maximum
number of packets in flight to fill the window, we hold out to until 
we can send a MSS chunk. But, we also don't update the congestion window 
unless we have filled, as per RFC2861.

This patch replaces the check for the congestion window being full
with something smarter that accounts for TSO.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 16:53:30 -08:00
Herbert Xu
fb286bb299 [NET]: Detect hardware rx checksum faults correctly
Here is the patch that introduces the generic skb_checksum_complete
which also checks for hardware RX checksum faults.  If that happens,
it'll call netdev_rx_csum_fault which currently prints out a stack
trace with the device name.  In future it can turn off RX checksum.

I've converted every spot under net/ that does RX checksum checks to
use skb_checksum_complete or __skb_checksum_complete with the
exceptions of:

* Those places where checksums are done bit by bit.  These will call
netdev_rx_csum_fault directly.

* The following have not been completely checked/converted:

ipmr
ip_vs
netfilter
dccp

This patch is based on patches and suggestions from Stephen Hemminger
and David S. Miller.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 13:01:24 -08:00
Linus Torvalds
b01a55a865 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-11-09 19:32:25 -08:00
Trond Myklebust
940e3318c3 [PATCH] SUNRPC: don't reencode when looping in call transmit.
If the call to xprt_transmit() fails due to socket buffer space
exhaustion, we do not need to re-encode the RPC message when we
loop back through call_transmit.

Re-encoding can actually end up triggering the WARN_ON() in
call_decode() if we re-encode something like a read() request and
auth->au_rslack has changed.
It can also cause us to increment the RPCSEC_GSS sequence number
beyond the limits of the allowed window.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 19:31:33 -08:00
Thomas Graf
482a8524f8 [NETLINK]: Generic netlink family
The generic netlink family builds on top of netlink and provides
simplifies access for the less demanding netlink users. It solves
the problem of protocol numbers running out by introducing a so
called controller taking care of id management and name resolving.

Generic netlink modules register themself after filling out their
id card (struct genl_family), after successful registration the
modules are able to register callbacks to command numbers by
filling out a struct genl_ops and calling genl_register_op(). The
registered callbacks are invoked with attributes parsed making
life of simple modules a lot easier.

Although generic netlink modules can request static identifiers,
it is recommended to use GENL_ID_GENERATE and to let the controller
assign a unique identifier to the module. Userspace applications
will then ask the controller and lookup the idenfier by the module
name.

Due to the current multicast implementation of netlink, the number
of generic netlink modules is restricted to 1024 to avoid wasting
memory for the per socket multiacst subscription bitmask.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 02:26:41 +01:00
Thomas Graf
9ac4a16983 [RTNETLINK]: Use generic netlink receive queue processor
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 02:26:40 +01:00
Thomas Graf
88fc2c8431 [XFRM]: Use generic netlink receive queue processor
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 02:26:40 +01:00
Thomas Graf
82ace47a72 [NETLINK]: Generic netlink receive queue processor
Introduces netlink_run_queue() to handle the receive queue of
a netlink socket in a generic way. Processes as much as there
was in the queue upon entry and invokes a callback function
for each netlink message found. The callback function may
refuse a message by returning a negative error code but setting
the error pointer to 0 in which case netlink_run_queue() will
return with a qlen != 0.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 02:26:40 +01:00
Thomas Graf
a8f74b2288 [NETLINK]: Make netlink_callback->done() optional
Most netlink families make no use of the done() callback, making
it optional gets rid of all unnecessary dummy implementations.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 02:26:40 +01:00
Thomas Graf
bfa83a9e03 [NETLINK]: Type-safe netlink messages/attributes interface
Introduces a new type-safe interface for netlink message and
attributes handling. The interface is fully binary compatible
with the old interface towards userspace. Besides type safety,
this interface features attribute validation capabilities,
simplified message contstruction, and documentation.

The resulting netlink code should be smaller, less error prone
and easier to understand.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-10 02:26:40 +01:00
Yasuyuki Kozakai
9fb9cbb108 [NETFILTER]: Add nf_conntrack subsystem.
The existing connection tracking subsystem in netfilter can only
handle ipv4.  There were basically two choices present to add
connection tracking support for ipv6.  We could either duplicate all
of the ipv4 connection tracking code into an ipv6 counterpart, or (the
choice taken by these patches) we could design a generic layer that
could handle both ipv4 and ipv6 and thus requiring only one sub-protocol
(TCP, UDP, etc.) connection tracking helper module to be written.

In fact nf_conntrack is capable of working with any layer 3
protocol.

The existing ipv4 specific conntrack code could also not deal
with the pecularities of doing connection tracking on ipv6,
which is also cured here.  For example, these issues include:

1) ICMPv6 handling, which is used for neighbour discovery in
   ipv6 thus some messages such as these should not participate
   in connection tracking since effectively they are like ARP
   messages

2) fragmentation must be handled differently in ipv6, because
   the simplistic "defrag, connection track and NAT, refrag"
   (which the existing ipv4 connection tracking does) approach simply
   isn't feasible in ipv6

3) ipv6 extension header parsing must occur at the correct spots
   before and after connection tracking decisions, and there were
   no provisions for this in the existing connection tracking
   design

4) ipv6 has no need for stateful NAT

The ipv4 specific conntrack layer is kept around, until all of
the ipv4 specific conntrack helpers are ported over to nf_conntrack
and it is feature complete.  Once that occurs, the old conntrack
stuff will get placed into the feature-removal-schedule and we will
fully kill it off 6 months later.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-09 16:38:16 -08:00
Ken-ichirou MATSUZAWA
9f0ede52a0 [IPV6]: ip6ip6_lock is not unlocked in error path.
From: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:08:29 -08:00
Peter Chubb
44fd0261d3 [IPV6]: Fix fallout from CONFIG_IPV6_PRIVACY
Trying to build today's 2.6.14+git snapshot gives undefined references
to use_tempaddr

Looks like an ifdef got left out.

Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:05:47 -08:00
Krzysztof Piotr Oledzki
5fd52fe098 [NETFILTER] ctnetlink: ICMP_ID is u_int16_t not u_int8_t.
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:04:32 -08:00
Krzysztof Piotr Oledzki
439a9994bb [NETFILTER] ctnetlink: Fix oops when no ICMP ID info in message
This patch fixes an userspace triggered oops. If there is no ICMP_ID
info the reference to attr will be NULL.

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:04:08 -08:00
Pablo Neira Ayuso
a856a19a9f [NETFILTER] ctnetlink: Add support to identify expectations by ID's
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:03:42 -08:00
Pablo Neira Ayuso
fcda46128d [NETFILTER] ctnetlink: propagate error instaed of returning -EPERM
Propagate the error to userspace instead of returning -EPERM if the get
conntrack operation fails.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:03:26 -08:00
Pablo Neira Ayuso
fe902a91ff [NETFILTER] ctnetlink: return -EINVAL if size is wrong
Return -EINVAL if the size isn't OK instead of -EPERM.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:03:09 -08:00
Yasuyuki Kozakai
d63a928108 [NETFILTER]: stop tracking ICMP error at early point
Currently connection tracking handles ICMP error like normal packets
if it failed to get related connection. But it fails that after all.

This makes connection tracking stop tracking ICMP error at early point.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:02:45 -08:00
Harald Welte
ed77de9fc6 [NETFILTER] nfnetlink: only load subsystems if CAP_NET_ADMIN is set
Without this patch, any user can cause nfnetlink subsystems to be
autoloaded.  Those subsystems however could add significant processing
overhead to packet processing, and would refuse any configuration messages
from non-CAP_NET_ADMIN processes anyway.

This patch follows a suggestion from Patrick McHardy.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:02:16 -08:00
Philip Craig
5978a9b82c [NETFILTER] PPTP helper: fix PNS-PAC expectation call id
The reply tuple of the PNS->PAC expectation was using the wrong call id.

So we had the following situation:
- PNS behind NAT firewall
- PNS call id requires NATing
- PNS->PAC gre packet arrives first

then the PNS->PAC expectation is matched, and the other expectation
is deleted, but the PAC->PNS gre packets do not match the gre conntrack
because the call id is wrong.

We also cannot use ip_nat_follow_master().

Signed-off-by: Philip Craig <philipc@snapgear.com>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:01:53 -08:00
Pablo Neira Ayuso
81e5c27d08 [NETFILTER] ctnetlink: get_conntrack can use GFP_KERNEL
ctnetlink_get_conntrack is always called from user context, so GFP_KERNEL
is enough.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:01:19 -08:00
Pablo Neira Ayuso
7a4fe3664b [NETFILTER] ctnetlink: kill unused includes
Kill some useless headers included in ctnetlink. They aren't used in any
way.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:00:47 -08:00
Pablo Neira Ayuso
119a318494 [NETFILTER] ctnetlink: add module alias to fix autoloading
Add missing module alias. This is a must to load ctnetlink on demand. For
example, the conntrack tool will fail if the module isn't loaded.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:00:29 -08:00
Pablo Neira Ayuso
02a78cdf42 [NETFILTER] ctnetlink: add marking support from userspace
This patch adds support for conntrack marking from user space.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 13:00:04 -08:00
Pablo Neira Ayuso
51df784ed7 [NETFILTER] ctnetlink: check if protoinfo is present
This fixes an oops triggered from userspace. If we don't pass information
about the private protocol info, the reference to attr will be NULL. This is
likely to happen in update messages.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 12:59:41 -08:00
Harald Welte
a2506c0432 [NETFILTER] nfnetlink: nfattr_parse() can never fail, make it void
nfattr_parse (and thus nfattr_parse_nested) always returns success. So we
can make them 'void' and remove all the checking at the caller side.

Based on original patch by Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 12:59:13 -08:00
Yasuyuki Kozakai
eaae4fa45e [NETFILTER]: refcount leak of proto when ctnetlink dumping tuple
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 12:58:46 -08:00
Yasuyuki Kozakai
46998f59c0 [NETFILTER]: packet counter of conntrack is 32bits
The packet counter variable of conntrack was changed to 32bits from 64bits.
This follows that change.
		    
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09 12:58:05 -08:00
Linus Torvalds
a7c243b544 Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2005-11-09 08:34:36 -08:00
Christoph Hellwig
49705b7743 [PATCH] sanitize lookup_hash prototype
->permission and ->lookup have a struct nameidata * argument these days to
pass down lookup intents.  Unfortunately some callers of lookup_hash don't
actually pass this one down.  For lookup_one_len() we don't have a struct
nameidata to pass down, but as this function is a library function only
used by filesystem code this is an acceptable limitation.  All other
callers should pass down the nameidata, so this patch changes the
lookup_hash interface to only take a struct nameidata argument and derives
the other two arguments to __lookup_hash from it.  All callers already have
the nameidata argument available so this is not a problem.

At the same time I'd like to deprecate the lookup_hash interface as there
are better exported interfaces for filesystem usage.  Before it can
actually be removed I need to fix up rpc_pipefs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:00 -08:00
Christoph Hellwig
e4543eddfd [PATCH] add a vfs_permission helper
Most permission() calls have a struct nameidata * available.  This helper
takes that as an argument and thus makes sure we pass it down for lookup
intents and prepares for per-mount read-only support where we need a struct
vfsmount for checking whether a file is writeable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:55:58 -08:00
Christoph Hellwig
e3305626e0 ieee80211: cleanup crypto list handling, other minor cleanups. 2005-11-09 01:01:04 -05:00
Jeff Garzik
f24e09754b Merge rsync://bughost.org/repos/ieee80211-delta/ 2005-11-09 00:00:29 -05:00
Marcel Holtmann
be9d122730 [Bluetooth]: Remove the usage of /proc completely
This patch removes all relics of the /proc usage from the Bluetooth
subsystem core and its upper layers. All the previous information are
now available via /sys/class/bluetooth through appropriate functions.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:57:38 -08:00
Marcel Holtmann
1ebb92521d [Bluetooth]: Add endian annotations to the core
This patch adds the endian annotations to the Bluetooth core.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:57:21 -08:00
Herbert Xu
89f5f0aeed [IPV4]: Fix ip_queue_xmit identity increment for TSO packets
When ip_queue_xmit calls ip_select_ident_more for IP identity selection
it gives it the wrong packet count for TSO packets.  The ip_select_*
functions expect one less than the number of packets, so we need to
subtract one for TSO packets.

This bug was diagnosed and fixed by Tom Young.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:41:56 -08:00
Jesper Juhl
a51482bde2 [NET]: kfree cleanup
From: Jesper Juhl <jesper.juhl@gmail.com>

This is the net/ part of the big kfree cleanup patch.

Remove pointless checks for NULL prior to calling kfree() in net/.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2005-11-08 09:41:34 -08:00
Julian Anastasov
dc8103f25f [IPVS]: fix connection leak if expire_nodest_conn=1
There was a fix in 2.6.13 that changed the behaviour of
ip_vs_conn_expire_now function not to put reference to connection,
its callers should hold write lock or connection refcnt. But we
forgot to convert one caller, when the real server for connection
is unavailable caller should put the connection reference. It
happens only when sysctl var expire_nodest_conn is set to 1 and
such connections never expire. Thanks to Roberto Nibali who found
the problem and tested a 2.4.32-rc2 patch, which is equal to this
2.6 version. Patch for 2.4 is already sent to Marcelo.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Roberto Nibali <ratz@drugphish.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:40:05 -08:00
Thomas Graf
b541ca2c5a [PKT_SCHED]: Correctly handle empty ematch trees
Fixes an invalid memory reference when the basic classifier
is used without any ematches but just actions.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:39:17 -08:00
YOSHIFUJI Hideaki
072047e4de [IPV6]: RFC3484 compliant source address selection
Choose more appropriate source address; e.g.
 - outgoing interface
 - non-deprecated
 - scope
 - matching label

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:38:30 -08:00
YOSHIFUJI Hideaki
b1cacb6820 [IPV6]: Make ipv6_addr_type() more generic so that we can use it for source address selection.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:38:12 -08:00
YOSHIFUJI Hideaki
971f359ddc [IPV6]: Put addr_diff() into common header for future use.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-08 09:37:56 -08:00
Jeff Garzik
3133c5e896 Merge git://git.tuxdriver.com/git/netdev-jwl 2005-11-07 22:54:48 -05:00
Adrian Bunk
fd7a516efb [PATCH] fix NET_RADIO=n, IEEE80211=y compile
This patch fixes the following compile error with CONFIG_NET_RADIO=n and
CONFIG_IEEE80211=y:

  LD      .tmp_vmlinux1
net/built-in.o: In function `ieee80211_rx':
: undefined reference to `wireless_spy_update'
make: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2005-11-07 21:50:00 -05:00
Volker Braun
e189277a3f Fix problem with WEP unicast key > index 0
The functions ieee80211_wx_{get,set}_encodeext fail if one tries to set
unicast (IW_ENCODE_EXT_GROUP_KEY not set) keys at key indices>0. But at
least some Cisco APs dish out dynamic WEP unicast keys at index !=0.

Signed-off-by: Volker Braun <volker.braun@physik.hu-berlin.de>
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-11-07 16:19:02 -06:00
James Ketrenos
81f875208e scripts/Lindent on ieee80211 subsystem.
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-11-07 16:18:48 -06:00
Linus Torvalds
8e33ba4976 Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6 2005-11-07 08:05:11 -08:00
Linus Torvalds
8cde0776ec Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2005-11-07 08:04:01 -08:00
NeilBrown
80d188a643 [PATCH] knfsd: make sure svc_process call the correct pg_authenticate for multi-service port
If an RPC socket is serving multiple programs, then the pg_authenticate of
the first program in the list is called, instead of pg_authenticate for the
program to be run.

This does not cause a problem with any programs in the current kernel, but
could confuse future code.

Also set pg_authenticate for nfsd_acl_program incase it ever gets used.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:49 -08:00
Jeff Garzik
a10b5aacea Remove linux/version.h include from drivers/net/phy/* and net/ieee80211/*.
Unused, and causes the files to be needlessly rebuilt in some cases.
2005-11-05 23:39:54 -05:00
Arnaldo Carvalho de Melo
2d43f1128a Merge branch 'red' of 84.73.165.173:/home/tgr/repos/net-2.6 2005-11-05 22:30:29 -02:00
Stephen Hemminger
6df716340d [TCP/DCCP]: Randomize port selection
This patch randomizes the port selected on bind() for connections
to help with possible security attacks. It should also be faster
in most cases because there is no need for a global lock.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 21:23:15 -02:00
Herbert Xu
6151b31c96 [NET]: Fix race condition in sk_stream_wait_connect
When sk_stream_wait_connect detects a state transition to ESTABLISHED
or CLOSE_WAIT prior to it going to sleep, it will return without
calling finish_wait and decrementing sk_write_pending.

This may result in crashes and other unintended behaviour.

The fix is to always call finish_wait and update sk_write_pending since
it is safe to do so even if the wait entry is no longer on the queue.

This bug was tracked down with the help of Alex Sidorenko and the
fix is also based on his suggestion.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 21:05:20 -02:00
Stephen Hemminger
eb229c4cdc [NETEM]: Add version string
Add a version string to help support issues.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 20:59:21 -02:00
Stephen Hemminger
300ce174eb [NETEM]: Support time based reordering
Change netem to support packets getting reordered because of variations in
delay. Introduce a special case version of FIFO that queues packets in order
based on the netem delay.

Since netem is classful, those users that don't want jitter based reordering
can just insert a pfifo instead of the default.

This required changes to generic skbuff code to allow finer grain manipulation
of sk_buff_head.  Insertion into the middle and reverse walk.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 20:56:41 -02:00
Thomas Graf
bdc450a0bb [PKT_SCHED]: (G)RED: Introduce hard dropping
Introduces a new flag TC_RED_HARDDROP which specifies that if ECN
marking is enabled packets should still be dropped once the
average queue length exceeds the maximum threshold.

This _may_ help to avoid global synchronisation during small
bursts of peers advertising but not caring about ECN. Use this
option very carefully, it does more harm than good if
(qth_max - qth_min) does not cover at least two average burst
cycles.

The difference to the current behaviour, in which we'd run into
the hard queue limit, is that due to the low pass filter of RED
short bursts are less likely to cause a global synchronisation.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:29 +01:00
Thomas Graf
b38c7eef7e [PKT_SCHED]: GRED: Support ECN marking
Adds a new u8 flags in a unused padding area of the netlink
message. Adds ECN marking support to be used instead of dropping
packets immediately.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:29 +01:00
Thomas Graf
d8f64e1960 [PKT_SCHED]: GRED: Fix restart of idle period in WRED mode upon dequeue and drop
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:28 +01:00
Thomas Graf
1e4dfaf9b9 [PKT_SCHED]: GRED: Cleanup and remove unnecessary code
Removes unnecessary includes, initializers, and simplifies
the code a bit.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:28 +01:00
Thomas Graf
6214e653cc [PKT_SCHED]: GRED: Remove auto-creation of default VQ
Since we are no longer depending on the default VQ to be always
allocated we can leave it up to the user to actually create it.
This gives the user the ability to leave it out on purpose and
enqueue packets directly to the device without applying the RED
algorithm.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:28 +01:00
Thomas Graf
7051703b99 [PKT_SCHED]: GRED: Dont abuse default VQ for equalizing
Introduces a new red parameter set for use in equalize mode,
although only the qavg variable and the idle period marker are
being used for now this makes it possible to allow a separate
parameter set to be used for equalize later on.

The use of this separate parameter set fixes a bogus start of
an idle period in gred_drop() which did start an idle period
on the default VQ even if equalize mode was disabled.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:28 +01:00
Thomas Graf
4a591834cf [PKT_SCHED]: GRED: Remove initd flag
The case when the default VQ is not set up yet is already handled
in a less error prone way.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:28 +01:00
Thomas Graf
18e3fb84e6 [PKT_SCHED]: GRED: Improve error handling and messages
Try to enqueue packets if we cannot associate it with a VQ, this
basically means that the default VQ has not been set up yet.

We must check if the VQ still exists while requeueing, the VQ
might have been changed between dequeue and the requeue of the
underlying qdisc.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:28 +01:00
Thomas Graf
716a1b40b0 [PKT_SCHED]: GRED: Introduce tc_index_to_dp()
Adds a transformation function returning the DP index for a
given skb according to its tc_index.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:27 +01:00
Thomas Graf
edf7a7b1f0 [PKT_SCHED]: GRED: Use generic queue management interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:27 +01:00
Thomas Graf
c3b553cdaf [PKT_SCHED]: GRED: Report congestion related drops as NET_XMIT_CN
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:27 +01:00
Thomas Graf
301d063c29 [PKT_SCHED]: GRED: Do not reset statistics in gred_reset/gred_change
Qdiscs are not supposed to reset statistics in reset() and while
changing parameters. My argumentation is that if the user wants
the counters to be reset he can simply remove and readd the
qdiscs, that's what most users do anyway.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:27 +01:00
Thomas Graf
22b33429ab [PKT_SCHED]: GRED: Use new generic red interface
Simplifies code a lot by separating the red algorithm and the
queueing logic. We now differentiate between probability marks
and forced marks but sum them together again to not break
backwards compatibility.

This brings GRED back to the level of RED and improves the
accuracy of the averge queue length calculations when stab
suggests a zero shift.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:27 +01:00
Thomas Graf
f62d6b936d [PKT_SCHED]: GRED: Use central VQ change procedure
Introduces a function gred_change_vq() acting as a central point
to change VQ parameters. Fixes priority inheritance in rio mode
when the default DP equals 0. Adds proper locking during changes.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:27 +01:00
Thomas Graf
a8aaa9958e [PKT_SCHED]: GRED: Report out-of-bound DPs as illegal
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:26 +01:00
Thomas Graf
6639607ed9 [PKT_SCHED]: GRED: Use a central table definition change procedure
Introduces a function gred_change_table_def() acting as a central
point to change the table definition.

Adds missing validations for table definition: MAX_DPs > DPs > 0
and def_DP < DPs thus fixing possible invalid memory reference
oopses. Only root could do it but having a typo crashing the
machine is a bit hard.

Adds missing locking while changing the table definition, the
operation of changing the number of DPs and removing shadowed VQs
may not be interrupted by a dequeue.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:26 +01:00
Thomas Graf
e06368221c [PKT_SCHED]: GRED: Dump table definition
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:26 +01:00
Thomas Graf
05f1cc01b4 [PKT_SCHED]: GRED: Cleanup dumping
Avoids the allocation of a buffer by appending the VQs directly
to the skb and simplifies the code by using the appropriate
message construction macros.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:26 +01:00
Thomas Graf
d6fd4e9667 [PKT_SCHED]: GRED: Transform grio to GRED_RIO_MODE
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:25 +01:00
Thomas Graf
dea3f62852 [PKT_SCHED]: GRED: Cleanup equalize flag and add new WRED mode detection
Introduces a flags variable using bitops and transforms eqp to use
it. Converts the conditions of the form (wred && rio) to (wred)
since wred can only be enabled in rio mode anyway.

The patch also improves WRED mode detection. The current behaviour
does not allow WRED mode to be turned off again without removing
the whole qdisc first. The new algorithm checks each VQ against
each other looking for equal priorities every time a VQ is changed
or added. The performance is poor, O(n**2), but it's used only
during administrative tasks and the number of VQs is strictly
limited.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:25 +01:00
Thomas Graf
dba051f36a [PKT_SCHED]: RED: Cleanup and remove unnecessary code
Removes the skb trimming code which is not needed since we never
touch the skb upon failure. Removes unnecessary includes,
initializers, and simplifies the code a bit. Removes Jamal's
obsolete email addresses upon his own request.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:25 +01:00
Thomas Graf
6a1b63d467 [PKT_SCHED]: RED: Dont start idle periods while already idling
We should not interrupt and restart an idle period while idling already.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:25 +01:00
Thomas Graf
9e178ff27c [PKT_SCHED]: RED: Use generic queue management interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:25 +01:00
Thomas Graf
6b31b28a44 [PKT_SCHED]: RED: Use new generic red interface
Simplifies code a lot by separating the red algorithm and the
queueing logic. We now differentiate between probability marks
and forced marks but sum them together again to not break
backwards compatibility.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 22:02:25 +01:00
Stephen Hemminger
07aaa11540 [NETEM]: use PSCHED_LESS
Convert netem to use PSCHED_LESS and warn if requeue fails.
With some of the psched clock sources, the subtraction doesn't
work always work right without wrapping.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 17:03:46 -02:00
Harald Welte
1758ee0ea2 [NETFILTER] nf_queue: Fix Ooops when no queue handler registered
With the new nf_queue generalization in 2.6.14, we've introduced a bug
that causes an oops as soon as a packet is queued but no queue handler
registered.  This patch fixes it.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 16:43:29 -02:00
Harald Welte
433a4d3b54 [NETFILTER]: CONNMARK target needs ip_conntrack
There's a missing dependency from the CONNMARK target to ip_conntrack.

Signed-off-by: Pablo Neira Ayuso <pablo@eurodev.net>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 16:39:20 -02:00
Harald Welte
10dfdc69ea [NETFILTER] nfnetlink: Use kzalloc
These is a cleanup patch, kzalloc can be used in a couple of cases

Signed-off-by: Samir Bellabes <sbellabes@mandriva.com>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 16:35:27 -02:00
Harald Welte
0f81eb4db4 [NETFILTER]: Fix double free after netlink_unicast() in ctnetlink
It's not necessary to free skb if netlink_unicast() failed.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 03:28:37 -02:00
Harald Welte
d2a7bb7141 [NETFILTER] NAT: Fix module refcount dropping too far
The unknown protocol is used as a fallback when a protocol isn't known.
Hence we cannot handle it failing, so don't set ".me".  It's OK, since we
only grab a reference from within the same module (iptable_nat.ko), so we
never take the module refcount from 0 to 1.

Also, remove the "protocol is NULL" test: it's never NULL.

Signed-off-by: Rusty Rusty <rusty@rustcorp.com.au>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-05 01:23:34 -02:00
Harald Welte
d811552eda [NETFILTER] PPTP helper: Fix endianness bug in GRE key / CallID NAT
This endianness bug slipped through while changing the 'gre.key' field in the
conntrack tuple from 32bit to 16bit.

None of my tests caught the problem, since the linux pptp client always has
'0' as call id / gre key.  Only windows clients actually trigger the bug.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-04 23:19:17 -02:00
Harald Welte
3428c209c6 [NETFILTER] PPTP helper: Fix compilation of conntrack helper without NAT
This patch fixes compilation of the PPTP conntrack helper when NAT is
configured off.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-04 23:02:53 -02:00
Chuck Lever
0bbacc402e NFS,SUNRPC,NLM: fix unused variable warnings when CONFIG_SYSCTL is disabled
Fix some dprintk's so that NLM, NFS client, and RPC client compile
 cleanly if CONFIG_SYSCTL is disabled.

 Test plan:
 Compile kernel with CONFIG_NFS enabled and CONFIG_SYSCTL disabled.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-11-04 15:39:48 -05:00
Chuck Lever
c556b75496 SUNRPC: allow sunrpc.o to link when CONFIG_SYSCTL is disabled
The sunrpc module should build properly even when CONFIG_SYSCTL is
 disabled.

 Reported by Jan-Benedict Glaw.

 Test plan:
 Compile kernel with CONFIG_NFS as a module and built-in, and CONFIG_SYSCTL
 enabled and disabled.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-11-04 15:39:45 -05:00
Thomas Graf
52ab4ac258 [PKT_SCHED]: Rework QoS and/or fair queueing configuration
Make "QoS and/or fair queueing" have its own menu, it's too big to be
inlined into "Network options". Remove the obsolete NET_QOS option.
Automatically select NET_CLS if needed. Do the same for NET_ESTIMATOR
but allow it to be selected manually for statistical purposes. Add
comments to separate queueing from classification. Fix dependencies
and ordering of classifiers. Improve descriptions/help texts and
remove outdated pieces.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-03 02:29:06 -02:00
Yan Zheng
979ad66312 [IPV6]: inet6_ifinfo_notify should use RTM_DELLINK in addrconf_ifdown
Signed-off-by: Yan Zheng <yanzheng@21cn.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-03 01:03:05 -02:00
Herbert Xu
c75d721c76 [NET]: Fix zero-size datagram reception
The recent rewrite of skb_copy_datagram_iovec broke the reception of
zero-size datagrams.  This patch fixes it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-02 22:25:04 -02:00
Stephen Hemminger
450b5b1898 [TCP]: BIC max increment too large
The max growth of BIC TCP is too large. Original code was based on
BIC 1.0 and the default there was 32. Later code (2.6.13) included
compensation for delayed acks, and should have reduced the default
value to 16; since normally TCP gets one ack for every two packets sent.

The current value of 32 makes BIC too aggressive and unfair to other
flows.

Submitted-by: Injong Rhee <rhee@eos.ncsu.edu>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: Ian McDonald <imcdnzl@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-02 21:24:01 -02:00
Yan Zheng
8713dbf057 [MCAST]: ip[6]_mc_add_src should be called when number of sources is zero
And filter mode is exclude.

Further explanation by David Stevens:

Multicast source filters aren't widely used yet, and that's really the only
feature that's affected if an application actually exercises this bug, as far
as I can tell. An ordinary filter-less multicast join should still work, and
only forwarded multicast traffic making use of filters and doing empty-source
filters with the MSFILTER ioctl would be at risk of not getting multicast
traffic forwarded to them because the reports generated would not be based on
the correct counts.

Signed-off-by: Yan Zheng <yanzheng@21cn.com
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-11-02 21:03:57 -02:00
Yan Zheng
97300b5fdf [MCAST] IPv6: Check packet size when process Multicast
Signed-off-by: Yan Zheng <yanzheng@21cn.com
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-31 22:52:03 -02:00
Herbert Xu
edc9e81917 [DCCP]: Set socket owner iff packet is not data
Here is a complimentary insurance policy for those feeling a bit insecure.
You don't have to accept this.  However, if you do, you can't blame me for
it :)
  
> 1) dccp_transmit_skb sets the owner for all packets except data packets.
  
We can actually verify this by looking at pkt_type.
  
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-31 22:30:02 -02:00
Herbert Xu
48918a4dbd [DCCP]: Simplify skb_set_owner_w semantics
While we're at it let's reorganise the set_owner_w calls a little so that:
  
1) dccp_transmit_skb sets the owner for all packets except data packets.
2) Add dccp_skb_entail to set owner for packets queued for retransmission.
3) Make dccp_transmit_skb static.
  
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-31 19:26:17 -02:00
Yan Zheng
9d17f21893 [IPV6]: Fix behavior of ip6_route_input() for link local address
I find that linux will reply echo request destined to an address which
belongs to an interface other than the one from which the request received.
This behavior doesn't make sense for link local address.

YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> said:

Please note that sender does need to setup neighbor entry by hand to reproduce
this bug.  (Link-local address on eth1 is not visible on eth0, from the point
of view of neighbor discovery in IPv6.)

 +--------+               +--------+
 | sender |               | router |
 +---+----+               +-+----+-+
     |eth0              eth0|    |eth1
-----+----------------------+-  -+--------------

Signed-off-by: Yan Zheng <yanzheng@21cn.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Andrew Morton <akpm@osdl.org> (forwarded)
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-31 16:54:05 -02:00
Andrew Morton
a3d7a9d775 [ROSE]: rose_heartbeat_expiry() locking fix
Missing unlock, as noted by Ted Unangst <tedu@coverity.com>.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-31 16:41:45 -02:00
Harald Welte
6b7d31fcdd [NETFILTER]: Add "revision" support to arp_tables and ip6_tables
Like ip_tables already has it for some time, this adds support for
having multiple revisions for each match/target.  We steal one byte from
the name in order to accomodate a 8 bit version number.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-31 16:36:08 -02:00
Stephen Hemminger
6ede2463c8 [BRIDGE]: Use ether_compare
Use compare_ether_addr in bridge code.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-31 16:34:10 -02:00
Jean Delvare
3fa63c7d82 [PATCH] Typo fix: dot after newline in printk strings
Typo fix: dots appearing after a newline in printk strings.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30 17:37:20 -08:00
Herbert Xu
6df5b9f48d [CRYPTO] Simplify one-member scatterlist expressions
This patch rewrites various occurences of &sg[0] where sg is an array
of length one to simply sg.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-30 11:19:43 +11:00
David Hardeman
378f058cc4 [PATCH] Use sg_set_buf/sg_init_one where applicable
This patch uses sg_set_buf/sg_init_one in some places where it was
duplicated.

Signed-off-by: David Hardeman <david@2gen.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-30 11:19:43 +11:00
Al Viro
a6e0eb3791 [PATCH] bluetooth hidp is broken on s390
Bluetooth HIDP selects INPUT and it really needs it to be there - module
depends on input core.  And input core is never built on s390...

Marked as broken on s390, for now; if somebody has better ideas, feel
free to fix it and remove dependency...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29 10:35:08 -07:00
Jayachandran C
9fcc2e8a75 [IPV4]: Fix issue reported by Coverity in ipv4/fib_frontend.c
fib_del_ifaddr() dereferences ifa->ifa_dev, so the code already assumes that
ifa->ifa_dev is non-NULL, the check is unnecessary.

Signed-off-by: Jayachandran C. <c.jayachandran at gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-29 02:53:39 -02:00
Stephen Hemminger
360ac8e2f1 [ETH]: ether address compare
Expose faster ether compare for use by protocols and other
driver. And change name to be more consistent with other ether
address manipulation routines in same file

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-29 02:23:58 -02:00
Arnaldo Carvalho de Melo
974f7bc578 Merge master.kernel.org:/pub/scm/linux/kernel/git/sridhar/lksctp-2.6 2005-10-28 23:35:02 -02:00
Ivan Skytte Jorgensen
64a0c1c81e [SCTP] Do not allow unprivileged programs initiating new associations on
privileged ports.

Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2005-10-28 15:39:02 -07:00
Ivan Skytte Jorgensen
96a339985d [SCTP] Allow SCTP_MAXSEG to revert to default frag point with a '0' value.
Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2005-10-28 15:36:12 -07:00
Ivan Skytte Jorgensen
a1ab358269 [SCTP] Fix SCTP_SETADAPTION sockopt to use the correct structure.
Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2005-10-28 15:33:24 -07:00
Ivan Skytte Jorgensen
eaa5c54dbe [SCTP] Rename SCTP specific control message flags.
Rename SCTP specific control message flags to use SCTP_ prefix rather than
MSG_ prefix as per the latest sctp sockets API draft.

Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2005-10-28 15:10:00 -07:00
Linus Torvalds
84860bf064 Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 2005-10-28 13:09:47 -07:00
Yan Zheng
f12baeab9d [MCAST] IPv6: Fix algorithm to compute Querier's Query Interval
5.1.3.  Maximum Response Code

   The Maximum Response Code field specifies the maximum time allowed
   before sending a responding Report.  The actual time allowed, called
   the Maximum Response Delay, is represented in units of milliseconds,
   and is derived from the Maximum Response Code as follows:

   If Maximum Response Code < 32768,
      Maximum Response Delay = Maximum Response Code

   If Maximum Response Code >=32768, Maximum Response Code represents a
   floating-point value as follows:

       0 1 2 3 4 5 6 7 8 9 A B C D E F
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |1| exp |          mant         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Maximum Response Delay = (mant | 0x1000) << (exp+3)


5.1.9.  QQIC (Querier's Query Interval Code)

   The Querier's Query Interval Code field specifies the [Query
   Interval] used by the Querier.  The actual interval, called the
   Querier's Query Interval (QQI), is represented in units of seconds,
   and is derived from the Querier's Query Interval Code as follows:

   If QQIC < 128, QQI = QQIC

   If QQIC >= 128, QQIC represents a floating-point value as follows:

       0 1 2 3 4 5 6 7
      +-+-+-+-+-+-+-+-+
      |1| exp | mant  |
      +-+-+-+-+-+-+-+-+

   QQI = (mant | 0x10) << (exp + 3)

                                                -- rfc3810

#define MLDV2_QQIC(value) MLDV2_EXP(0x80, 4, 3, value)
#define MLDV2_MRC(value) MLDV2_EXP(0x8000, 12, 3, value)

Above macro are defined in mcast.c. but 1 << 4 == 0x10 and 1 << 12 == 0x1000.
So the result computed by original Macro is larger.

Signed-off-by: Yan Zheng <yanzheng@21cn.com>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-28 16:35:18 -02:00
Ananda Raju
e89e9cf539 [IPv4/IPv6]: UFO Scatter-gather approach
Attached is kernel patch for UDP Fragmentation Offload (UFO) feature.

1. This patch incorporate the review comments by Jeff Garzik.
2. Renamed USO as UFO (UDP Fragmentation Offload)
3. udp sendfile support with UFO

This patches uses scatter-gather feature of skb to generate large UDP
datagram. Below is a "how-to" on changes required in network device
driver to use the UFO interface.

UDP Fragmentation Offload (UFO) Interface:
-------------------------------------------
UFO is a feature wherein the Linux kernel network stack will offload the
IP fragmentation functionality of large UDP datagram to hardware. This
will reduce the overhead of stack in fragmenting the large UDP datagram to
MTU sized packets

1) Drivers indicate their capability of UFO using
dev->features |= NETIF_F_UFO | NETIF_F_HW_CSUM | NETIF_F_SG

NETIF_F_HW_CSUM is required for UFO over ipv6.

2) UFO packet will be submitted for transmission using driver xmit routine.
UFO packet will have a non-zero value for

"skb_shinfo(skb)->ufo_size"

skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
fragment going out of the adapter after IP fragmentation by hardware.

skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
indicating that hardware has to do checksum calculation. Hardware should
compute the UDP checksum of complete datagram and also ip header checksum of
each fragmented IP packet.

For IPV6 the UFO provides the fragment identification-id in
skb_shinfo(skb)->ip6_frag_id. The adapter should use this ID for generating
IPv6 fragments.

Signed-off-by: Ananda Raju <ananda.raju@neterion.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (forwarded)
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-28 16:30:00 -02:00
Arnaldo Carvalho de Melo
de5144164f Merge master.kernel.org:/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2005-10-28 15:49:24 -02:00
Marcel Holtmann
dd7f5527b3 [Bluetooth] Update security filter for Extended Inquiry Response
This patch updates the HCI security filter with support for the Extended
Inquiry Response (EIR) feature.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-10-28 19:20:53 +02:00
Marcel Holtmann
6516455d3b [Bluetooth] Make more functions static
This patch makes another bunch of functions static.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-10-28 19:20:48 +02:00
Marcel Holtmann
408c1ce271 [Bluetooth] Move CRC table into RFCOMM core
This patch moves rfcomm_crc_table[] into the RFCOMM core, because there
is no need to keep it in a separate file.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-10-28 19:20:36 +02:00
Greg KH
6fbfddcb52 Merge ../bleed-2.6 2005-10-28 10:13:16 -07:00
Dmitry Torokhov
34abf91f40 [PATCH] Input: convert net/bluetooth to dynamic input_dev allocation
Input: convert net/bluetooth to dynamic input_dev allocation

This is required for input_dev sysfs integration

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-10-28 09:52:54 -07:00
Linus Torvalds
e5dfa9282f Merge branch 'upstream' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2005-10-28 09:05:25 -07:00
Linus Torvalds
236fa08168 Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.15 2005-10-28 08:50:37 -07:00
Al Viro
7d877f3bda [PATCH] gfp_t: net/*
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 08:16:47 -07:00
Trond Myklebust
434f1d10c1 Merge /home/trondmy/scm/kernel/git/torvalds/linux-2.6 2005-10-27 22:13:32 -04:00
Trond Myklebust
6070fe6f82 RPC: Ensure that nobody can queue up new upcalls after rpc_close_pipes()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27 22:12:46 -04:00
Jeff Garzik
b2ab040db8 Merge branch 'master' 2005-10-27 20:35:17 -04:00
Trond Myklebust
4c2cb58c55 Merge /home/trondmy/scm/kernel/git/torvalds/linux-2.6 2005-10-27 19:12:49 -04:00
Trond Myklebust
6fa05b1736 Revert "RPC: stops the release_pipe() funtion from being called twice"
This reverts 747c5534c9 commit.
2005-10-27 19:08:18 -04:00
Herbert Xu
2ad41065d9 [TCP]: Clear stale pred_flags when snd_wnd changes
This bug is responsible for causing the infamous "Treason uncloaked"
messages that's been popping up everywhere since the printk was added.
It has usually been blamed on foreign operating systems.  However,
some of those reports implicate Linux as both systems are running
Linux or the TCP connection is going across the loopback interface.

In fact, there really is a bug in the Linux TCP header prediction code
that's been there since at least 2.1.8.  This bug was tracked down with
help from Dale Blount.

The effect of this bug ranges from harmless "Treason uncloaked"
messages to hung/aborted TCP connections.  The details of the bug
and fix is as follows.

When snd_wnd is updated, we only update pred_flags if
tcp_fast_path_check succeeds.  When it fails (for example,
when our rcvbuf is used up), we will leave pred_flags with
an out-of-date snd_wnd value.

When the out-of-date pred_flags happens to match the next incoming
packet we will again hit the fast path and use the current snd_wnd
which will be wrong.

In the case of the treason messages, it just happens that the snd_wnd
cached in pred_flags is zero while tp->snd_wnd is non-zero.  Therefore
when a zero-window packet comes in we incorrectly conclude that the
window is non-zero.

In fact if the peer continues to send us zero-window pure ACKs we
will continue making the same mistake.  It's only when the peer
transmits a zero-window packet with data attached that we get a
chance to snap out of it.  This is what triggers the treason
message at the next retransmit timeout.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-27 15:11:04 -02:00
Andrew Morton
4bcde03d41 [PATCH] svcsock timestamp fix
Convert nanoseconds to microseconds correctly.

Spotted by Steve Dickson <SteveD@redhat.com>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-26 10:39:43 -07:00
Jeff Garzik
35848e048f [PATCH] kill massive wireless-related log spam
Although this message is having the intended effect of causing wireless
driver maintainers to upgrade their code, I never should have merged this
patch in its present form.  Leading to tons of bug reports and unhappy
users.

Some wireless apps poll for statistics regularly, which leads to a printk()
every single time they ask for stats.  That's a little bit _too_ much of a
reminder that the driver is using an old API.

Change this to printing out the message once, per kernel boot.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-26 10:39:43 -07:00
Jeff Garzik
1f57389a38 Merge branch 'master' 2005-10-26 01:06:45 -04:00
James Ketrenos
077783f877 [PATCH] ieee80211 build fix
James Ketrenos wrote:
> [3/4] Use the tx_headroom and reserve requested space.

This patch introduced a compile problem; patch below corrects this.

Fixed compilation error due to not passing tx_headroom in
ieee80211_tx_frame.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-10-26 00:54:23 -04:00
David Engel
dcab5e1eec [IPV4]: Fix setting broadcast for SIOCSIFNETMASK
Fix setting of the broadcast address when the netmask is set via
SIOCSIFNETMASK in Linux 2.6.  The code wanted the old value of
ifa->ifa_mask but used it after it had already been overwritten with
the new value.

Signed-off-by: David Engel <gigem@comcast.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26 01:20:21 -02:00
Ralf Baechle
95df1c04ab [AX.25]: Use constant instead of magic number
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26 01:14:09 -02:00
Randy Dunlap
c83c248618 [SK_BUFF] kernel-doc: fix skbuff warnings
Add kernel-doc to skbuff.h, skbuff.c to eliminate kernel-doc warnings.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26 01:10:18 -02:00
Jayachandran C
0d0d2bba97 [IPV4]: Remove dead code from ip_output.c
skb_prev is assigned from skb, which cannot be NULL. This patch removes the
unnecessary NULL check.

Signed-off-by: Jayachandran C. <c.jayachandran at gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26 00:58:54 -02:00
Jayachandran C
ea7ce40649 [NETLINK]: Remove dead code in af_netlink.c
Remove the variable nlk & call to nlk_sk as it does not have any side effect.

Signed-off-by: Jayachandran C. <c.jayachandran at gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26 00:54:46 -02:00
Herbert Xu
80b30c1023 [IPSEC]: Kill obsolete get_mss function
Now that we've switched over to storing MTUs in the xfrm_dst entries,
we no longer need the dst's get_mss methods.  This patch gets rid of
them.

It also documents the fact that our MTU calculation is not optimal
for ESP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26 00:48:45 -02:00
Herbert Xu
1371e37da2 [IPV4]: Kill redundant rcu_dereference on fa_info
This patch kills a redundant rcu_dereference on fa->fa_info in fib_trie.c.
As this dereference directly follows a list_for_each_entry_rcu line, we
have already taken a read barrier with respect to getting an entry from
the list.

This read barrier guarantees that all values read out of fa are valid.
In particular, the contents of structure pointed to by fa->fa_info is
initialised before fa->fa_info is actually set (see fn_trie_insert);
the setting of fa->fa_info itself is further separated with a write
barrier from the insertion of fa into the list.

Therefore by taking a read barrier after obtaining fa from the list
(which is given by list_for_each_entry_rcu), we can be sure that
fa->fa_info contains a valid pointer, as well as the fact that the
data pointed to by fa->fa_info is itself valid.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26 00:25:03 -02:00
Harald Welte
eed75f191d [NETFILTER] ip_conntrack: Make "hashsize" conntrack parameter writable
It's fairly simple to resize the hash table, but currently you need to
remove and reinsert the module.  That's bad (we lose connection
state).  Harald has even offered to write a daemon which sets this
based on load.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26 00:19:27 -02:00
Stephen Hemminger
d50a6b56f0 [PKTGEN]: proc interface revision
The code to handle the /proc interface can be cleaned up in several places:
* use seq_file for read
* don't need to remember all the filenames separately
* use for_online_cpu's
* don't vmalloc a buffer for small command from user.

Committer note:
This patch clashed with John Hawkes's "[NET]: Wider use of for_each_*cpu()",
so I fixed it up manually.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26 00:12:18 -02:00
Stephen Hemminger
b4099fab75 [PKTGEN]: Spelling and white space
Fix some cosmetic issues. Indentation, spelling errors, and some whitespace.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26 00:08:10 -02:00
Stephen Hemminger
2845b63b50 [PKTGEN]: Use kzalloc
These are cleanup patches for pktgen that can go in 2.6.15
Can use kzalloc in a couple of places.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26 00:05:32 -02:00
Stephen Hemminger
b7c8921bf1 [PKTGEN]: Sleeping function called under lock
pktgen is calling kmalloc GFP_KERNEL and vmalloc with lock held.
The simplest fix is to turn the lock into a semaphore, since the
thread lock is only used for admin control from user context.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26 00:03:12 -02:00
John Hawkes
670c02c2bf [NET]: Wider use of for_each_*cpu()
In 'net' change the explicit use of for-loops and NR_CPUS into the
general for_each_cpu() or for_each_online_cpu() constructs, as
appropriate.  This widens the scope of potential future optimizations
of the general constructs, as well as takes advantage of the existing
optimizations of first_cpu() and next_cpu(), which is advantageous
when the true CPU count is much smaller than NR_CPUS.

Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-25 23:54:01 -02:00
Patrick Caulfield
900e0143a5 [DECNET]: Remove some redundant ifdeffed code
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-25 23:49:29 -02:00
Jochen Friedrich
5ac660ee13 [TR]: Preserve RIF flag even for 2 byte RIF fields.
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-25 21:31:38 -02:00
Yan Zheng
4ea6a8046b [IPV6]: Fix refcnt of struct ip6_flowlabel
Signed-off-by: Yan Zheng <yanzheng@21cn.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-25 21:17:52 -02:00
Herbert Xu
49636bb128 [NEIGH] Fix timer leak in neigh_changeaddr
neigh_changeaddr attempts to delete neighbour timers without setting
nud_state.  This doesn't work because the timer may have already fired
when we acquire the write lock in neigh_changeaddr.  The result is that
the timer may keep firing for quite a while until the entry reaches
NEIGH_FAILED.

It should be setting the nud_state straight away so that if the timer
has already fired it can simply exit once we relinquish the lock.

In fact, this whole function is simply duplicating the logic in
neigh_ifdown which in turn is already doing the right thing when
it comes to deleting timers and setting nud_state.

So all we have to do is take that code out and put it into a common
function and make both neigh_changeaddr and neigh_ifdown call it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-23 17:18:00 +10:00
Herbert Xu
6fb9974f49 [NEIGH] Fix add_timer race in neigh_add_timer
neigh_add_timer cannot use add_timer unconditionally.  The reason is that
by the time it has obtained the write lock someone else (e.g., neigh_update)
could have already added a new timer.

So it should only use mod_timer and deal with its return value accordingly.

This bug would have led to rare neighbour cache entry leaks.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-23 16:37:48 +10:00
Herbert Xu
203755029e [NEIGH] Print stack trace in neigh_add_timer
Stack traces are very helpful in determining the exact nature of a bug.
So let's print a stack trace when the timer is added twice.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-23 16:11:39 +10:00
Julian Anastasov
c98d80edc8 [SK_BUFF]: ipvs_property field must be copied
IPVS used flag NFC_IPVS_PROPERTY in nfcache but as now nfcache was removed the
new flag 'ipvs_property' still needs to be copied. This patch should be
included in 2.6.14.

Further comments from Harald Welte:

Sorry, seems like the bug was introduced by me.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-22 17:06:01 -02:00
Michael Buesch
d3f7bf4fa9 ieee80211 subsystem:
* Use GFP mask on TX skb allocation.
* Use the tx_headroom and reserve requested space.

Signed-off-by: Michael Buesch <mbuesch@freenet.de>
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-21 13:00:28 -05:00
Herbert Xu
b2cc99f04c [TCP] Allow len == skb->len in tcp_fragment
It is legitimate to call tcp_fragment with len == skb->len since
that is done for FIN packets and the FIN flag counts as one byte.
So we should only check for the len > skb->len case.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-20 17:13:13 -02:00
Herbert Xu
49c5bfaffe [DCCP]: Clear the IPCB area
Turns out the problem has nothing to do with use-after-free or double-free.
It's just that we're not clearing the CB area and DCCP unlike TCP uses a CB
format that's incompatible with IP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ian McDonald <imcdnzl@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-20 14:49:59 -02:00
Herbert Xu
ffa29347df [DCCP]: Make dccp_write_xmit always free the packet
icmp_send doesn't use skb->sk at all so even if skb->sk has already
been freed it can't cause crash there (it would've crashed somewhere
else first, e.g., ip_queue_xmit).

I found a double-free on an skb that could explain this though.
dccp_sendmsg and dccp_write_xmit are a little confused as to what
should free the packet when something goes wrong.  Sometimes they
both go for the ball and end up in each other's way.

This patch makes dccp_write_xmit always free the packet no matter
what.  This makes sense since dccp_transmit_skb which in turn comes
from the fact that ip_queue_xmit always frees the packet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-20 14:44:29 -02:00
Herbert Xu
fda0fd6c5b [DCCP]: Use skb_set_owner_w in dccp_transmit_skb when skb->sk is NULL
David S. Miller <davem@davemloft.net> wrote:
> One thing you can probably do for this bug is to mark data packets
> explicitly somehow, perhaps in the SKB control block DCCP already
> uses for other data.  Put some boolean in there, set it true for
> data packets.  Then change the test in dccp_transmit_skb() as
> appropriate to test the boolean flag instead of "skb_cloned(skb)".

I agree.  In fact we already have that flag, it's called skb->sk.
So here is patch to test that instead of skb_cloned().

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ian McDonald <imcdnzl@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-20 14:25:28 -02:00
Hong Liu
f0f15ab554 Fixed oops if an uninitialized key is used for encryption.
Without this patch, if you try and use a key that has not been
configured, for example:

% iwconfig eth1 key deadbeef00 [2]

without having configured key [1], then the active key will still be
[1], but privacy will now be enabled.  Transmission of a packet in this
situation will result in a kernel oops.

Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-20 11:06:36 -05:00
Hong Liu
5b74eda78d Fixed problem with not being able to decrypt/encrypt broadcast packets.
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-19 16:49:03 -05:00
J. Bruce Fields
a0857d03b2 RPCSEC_GSS: krb5 cleanup
Remove some senseless wrappers.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:47 -07:00
J. Bruce Fields
00fd6e1425 RPCSEC_GSS remove all qop parameters
Not only are the qop parameters that are passed around throughout the gssapi
 unused by any currently implemented mechanism, but there appears to be some
 doubt as to whether they will ever be used.  Let's just kill them off for now.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:47 -07:00
J. Bruce Fields
14ae162c24 RPCSEC_GSS: Add support for privacy to krb5 rpcsec_gss mechanism.
Add support for privacy to the krb5 rpcsec_gss mechanism.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:46 -07:00
J. Bruce Fields
bfa91516b5 RPCSEC_GSS: krb5 pre-privacy cleanup
The code this was originally derived from processed wrap and mic tokens using
 the same functions.  This required some contortions, and more would be required
 with the addition of xdr_buf's, so it's better to separate out the two code
 paths.

 In preparation for adding privacy support, remove the last vestiges of the
 old wrap token code.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:45 -07:00
J. Bruce Fields
f7b3af64c6 RPCSEC_GSS: Simplify rpcsec_gss crypto code
Factor out some code that will be shared by privacy crypto routines

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:45 -07:00
J. Bruce Fields
2d2da60c63 RPCSEC_GSS: client-side privacy support
Add the code to the client side to handle privacy.  This is dead code until
 we actually add privacy support to krb5.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:44 -07:00
J. Bruce Fields
24b2605bec RPCSEC_GSS: cleanup au_rslack calculation
Various xdr encode routines use au_rslack to guess where the reply argument
 will end up, so we can set up the xdr_buf to recieve data into the right place
 for zero copy.

 Currently we calculate the au_rslack estimate when we check the verifier.
 Normally this only depends on the verifier size.  In the integrity case we add
 a few bytes to allow for a length and sequence number.

 It's a bit simpler to calculate only the verifier size when we check the
 verifier, and delay the full calculation till we unwrap.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:44 -07:00
J. Bruce Fields
f3680312a7 SUNRPC: Retry wrap in case of memory allocation failure.
For privacy we need to allocate extra pages to hold encrypted page data when
 wrapping requests.  This allocation may fail, and we handle that case by
 waiting and retrying.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:43 -07:00
J. Bruce Fields
ead5e1c26f SUNRPC: Provide a callback to allow free pages allocated during xdr encoding
For privacy, we need to allocate pages to store the encrypted data (passed
 in pages can't be used without the risk of corrupting data in the page cache).
 So we need a way to free that memory after the request has been transmitted.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:43 -07:00
J. Bruce Fields
293f1eb551 SUNRPC: Add support for privacy to generic gss-api code.
Add support for privacy to generic gss-api code.  This is dead code until we
 have both a mechanism that supports privacy and code in the client or server
 that uses it.

 Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:42 -07:00
Steve Dickson
747c5534c9 RPC: stops the release_pipe() funtion from being called twice
This patch stops the release_pipe() funtion from being called
 twice by invalidating the ops pointer in the rpc_inode
 when rpc_pipe_release() is called.

 Signed-off-by: Steve Dickson <steved@redhat.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 23:19:40 -07:00
Jiri Benc
757d18faee [PATCH] ieee80211: division by zero fix
This fixes division by zero bug in ieee80211_wx_get_scan().

Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-10-18 17:25:36 -04:00
Trond Myklebust
5e5ce5be6f RPC: allow call_encode() to delay transmission of an RPC call.
Currently, call_encode will cause the entire RPC call to abort if it returns
 an error. This is unnecessarily rigid, and gets in the way of attempts
 to allow the NFSv4 layer to order RPC calls that carry sequence ids.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:11 -07:00
Chuck Lever
ea635a517e SUNRPC: Retry rpcbind requests if the server's portmapper isn't up
After a server crash/reboot, rebinding should always retry, otherwise
 requests on "hard" mounts will fail when they shouldn't.

 Test plan:
 Run a lock-intensive workload against a server while rebooting the server
 repeatedly.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18 14:20:10 -07:00
Jeff Garzik
28af493cd7 Merge branch 'master' 2005-10-18 17:14:17 -04:00
Trond Myklebust
cff6bf9709 Merge /home/trondmy/scm/kernel/git/torvalds/linux-2.6 2005-10-18 13:50:52 -07:00
Andrew Morton
e6850cce8f [NETFILTER]: Fix ip6_table.c build with NETFILTER_DEBUG enabled.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-15 16:15:38 -07:00
Jeff Garzik
59aee3c2a1 Merge branch 'master' 2005-10-13 21:22:27 -04:00
Herbert Xu
046d20b739 [TCP]: Ratelimit debugging warning.
Better safe than sorry.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-13 14:42:24 -07:00
Andi Kleen
34cb711ba9 [NET]: Disable NET_SCH_CLK_CPU for SMP x86 hosts
Opterons with frequency scaling have fully unsynchronized TSCs
running at different frequencies, so using TSCs there is not a good idea. 
Also some other x86 boxes have this problem. gettimeofday should be good 
enough, so just disable it.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-13 14:41:44 -07:00
David S. Miller
c8923c6b85 [NETFILTER]: Fix OOPSes on machines with discontiguous cpu numbering.
Original patch by Harald Welte, with feedback from Herbert Xu
and testing by Sbastien Bernard.

EBTABLES, ARP tables, and IP/IP6 tables all assume that cpus
are numbered linearly.  That is not necessarily true.

This patch fixes that up by calculating the largest possible
cpu number, and allocating enough per-cpu structure space given
that.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-13 14:41:23 -07:00
Herbert Xu
9ff5c59ce2 [TCP]: Add code to help track down "BUG at net/ipv4/tcp_output.c:438!"
This is the second report of this bug.  Unfortunately the first
reporter hasn't been able to reproduce it since to provide more
debugging info.

So let's apply this patch for 2.6.14 to

1) Make this non-fatal.
2) Provide the info we need to track it down.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-12 15:59:39 -07:00
Stephen Hemminger
ab4060e858 [BRIDGE]: fix race on bridge del if
This fixes the RCU race on bridge delete interface.  Basically,
the network device has to be detached from the bridge in the first
step (pre-RCU), rather than later. At that point, no more bridge traffic
will come in, and the other code will not think that network device
is part of a bridge.

This should also fix the XEN test problems.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-12 15:10:01 -07:00
Arnaldo Carvalho de Melo
eeb2b85606 [TWSK]: Grab the module refcount for timewait sockets
This is required to avoid unloading a module that has active timewait
sockets, such as DCCP.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:25:23 -07:00
Arnaldo Carvalho de Melo
2a9bc9bb4d [DCCP]: Transition from PARTOPEN to OPEN when receiving DATA packets
Noticed by Andrea Bittau, that provided a patch that was modified to
not transition from RESPOND to OPEN when receiving DATA packets.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:25:00 -07:00
Arnaldo Carvalho de Melo
777b25a2fe [CCID]: Check if ccid is NULL in the hc_[tr]x_exit functions
For consistency with ccid_exit and to fix a bug when
IP_DCCP_UNLOAD_HACK is enabled as the control sock is not associated
to any CCID.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:24:20 -07:00
Pablo Neira Ayuso
061cb4a0ec [NETFILTER] ctnetlink: add support to change protocol info
This patch add support to change the state of the private protocol
information via conntrack_netlink.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:23:46 -07:00
Pablo Neira Ayuso
3392315375 [NETFILTER] ctnetlink: allow userspace to change TCP state
This patch adds the ability of changing the state a TCP connection. I know
that this must be used with care but it's required to provide a complete
conntrack creation via conntrack_netlink. So I'll document this aspect on
the upcoming docs.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:23:28 -07:00
Harald Welte
a051a8f730 [NETFILTER]: Use only 32bit counters for CONNTRACK_ACCT
Initially we used 64bit counters for conntrack-based accounting, since we
had no event mechanism to tell userspace that our counters are about to
overflow.  With nfnetlink_conntrack, we now have such a event mechanism and
thus can save 16bytes per connection.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:21:10 -07:00
Herbert Xu
d4875b049b [IPSEC] Fix block size/MTU bugs in ESP
This patch fixes the following bugs in ESP:

* Fix transport mode MTU overestimate.  This means that the inner MTU
  is smaller than it needs be.  Worse yet, given an input MTU which
  is a multiple of 4 it will always produce an estimate which is not
  a multiple of 4.

  For example, given a standard ESP/3DES/MD5 transform and an MTU of
  1500, the resulting MTU for transport mode is 1462 when it should
  be 1464.

  The reason for this is because IP header lengths are always a multiple
  of 4 for IPv4 and 8 for IPv6.

* Ensure that the block size is at least 4.  This is required by RFC2406
  and corresponds to what the esp_output function does.  At the moment
  this only affects crypto_null as its block size is 1.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:11:34 -07:00
Herbert Xu
a02a64223e [IPSEC]: Use ALIGN macro in ESP
This patch uses the macro ALIGN in all the applicable spots for ESP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 21:11:08 -07:00
Pablo Neira Ayuso
e1c73b78e3 [NETFILTER] ctnetlink: add one nesting level for TCP state
To keep consistency, the TCP private protocol information is nested
attributes under CTA_PROTOINFO_TCP. This way the sequence of attributes to
access the TCP state information looks like here below:

CTA_PROTOINFO
CTA_PROTOINFO_TCP
CTA_PROTOINFO_TCP_STATE

instead of:

CTA_PROTOINFO
CTA_PROTOINFO_TCP_STATE

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:55:49 -07:00
Pablo Neira Ayuso
a1bcc3f268 [NETFILTER] ctnetlink: ICMP ID is not mandatory
The ID is only required by ICMP type 8 (echo), so it's not
mandatory for all sort of ICMP connections. This patch makes
mandatory only the type and the code for ICMP netlink messages.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:53:16 -07:00
Harald Welte
d000eaf772 [NETFILTER] conntrack_netlink: Fix endian issue with status from userspace
When we send "status" from userspace, we forget to convert the endianness.
This patch adds the reqired conversion.  Thanks to Pablo Neira for
discovering this.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:52:51 -07:00
Harald Welte
ebe0bbf06c [NETFILTER] nfnetlink: use highest bit of nfa_type to indicate nested TLV
As Henrik Nordstrom pointed out, all our efforts with "split endian" (i.e.
host byte order tags, net byte order values) are useless, unless a parser
can determine whether an attribute is nested or not.

This patch steals the highest bit of nfattr.nfa_type to indicate whether
the data payload contains a nested nfattr (1) or not (0).

This will break userspace compatibility, but luckily no kernel with
nfnetlink was released so far.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:52:19 -07:00
Harald Welte
f40863cec8 [NETFILTER] ipt_ULOG: Mark ipt_ULOG as OBSOLETE
Similar to nfnetlink_queue and ip_queue, we mark ipt_ULOG as obsolete.
This should have been part of the original nfnetlink_log merge, but
I somehow missed it.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:51:53 -07:00
Harald Welte
85d9b05d9b [NETFILTER] PPTP helper: Add missing Kconfig dependency
PPTP should not be selectable without conntrack enabled

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10 20:47:42 -07:00
Al Viro
dd0fc66fb3 [PATCH] gfp flags annotations - part 1
- added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-08 15:00:57 -07:00
Jean-Denis Boyer
4f55cd105c [ATM]: [br2684] if we free the skb, we should return 0
From: "Jean-Denis Boyer" <jdboyer@mediatrix.com>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-07 13:44:35 -07:00
Eric Kinzie
0f21ba7cc3 [ATM]: add support for LECS addresses learned from network
From: Eric Kinzie <ekinzie@cmf.nrl.navy.mil>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-06 22:19:28 -07:00
Ivan Skytte Jrgensen
5fe467ee97 [SCTP] Fix sctp_get{pl}addrs() API to work with 32-bit apps on 64-bit kernels.
The old socket options are marked with a _OLD suffix so that the
existing 32-bit apps on 32-bit kernels do not break.

Signed-off-by: Ivan Skytte Jrgensen <isj-sctp@i1.dk>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-06 21:36:17 -07:00
Ralf Baechle
3a867b36c3 [AX.25]: Fix packet socket crash
Since changeset 98a82febb6 AX.25 is passing
received IP and ARP packets to the stack through netif_rx() but we don't
set the skb->mac.raw to right value which may result in a crash with
applications that use a packet socket.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-05 12:16:04 -07:00
Herbert Xu
77d8d7a684 [IPSEC]: Document that policy direction is derived from the index.
Here is a patch that adds a helper called xfrm_policy_id2dir to
document the fact that the policy direction can be and is derived
from the index.

This is based on a patch by YOSHIFUJI Hideaki and 210313105@suda.edu.cn.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-05 12:15:12 -07:00
YOSHIFUJI Hideaki
140e26fcd5 [IPV6]: Fix NS handing for proxy/anycast address
Timer set up by pneigh_enqueue() ended up calling ndisc_rcv()
via pndisc_redo(), which clears LOCALLY_ENQUEUED flag in
NEIGH_CB(skb) and NS was queued again.
Let's call ndisc_recv_ns() directly to avoid the loop.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-05 12:11:41 -07:00
Stephen Hemminger
42a39450f8 [TCP]: BIC coding bug in Linux 2.6.13
Missing parenthesis in causes BIC to be slow in increasing congestion
window.

Spotted by Injong Rhee.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-05 12:09:31 -07:00
Yan Zheng
fab10fe37a [MCAST] ipv6: Fix address size in grec_size
Signed-Off-By: Yan Zheng <yanzheng@21cn.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-05 12:08:13 -07:00
Jeff Garzik
0d69ae5fb7 Merge branch 'master' 2005-10-05 02:11:33 -04:00
Randy Dunlap
83fa3400eb [XFRM]: fix sparse gfp nocast warnings
Fix implicit nocast warnings in xfrm code:
net/xfrm/xfrm_policy.c:232:47: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 22:45:35 -07:00
Randy Dunlap
dd13a285b7 [RPC]: fix sparse gfp nocast warnings
Fix nocast sparse warnings:
net/rxrpc/call.c:2013:25: warning: implicit cast to nocast type
net/rxrpc/connection.c:538:46: warning: implicit cast to nocast type
net/sunrpc/sched.c:730:36: warning: implicit cast to nocast type
net/sunrpc/sched.c:734:56: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 22:44:45 -07:00
Randy Dunlap
00fa023345 [AF_KEY]: fix sparse gfp nocast warnings
Fix implicit nocast warnings in net/key code:
net/key/af_key.c:195:27: warning: implicit cast to nocast type
net/key/af_key.c:1439:28: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 22:43:04 -07:00
Randy Dunlap
c6f4fafccf [NETFILTER]: fix sparse gfp nocast warnings
Fix implicit nocast warnings in nfnetlink code:
net/netfilter/nfnetlink.c:204:43: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 22:42:42 -07:00
Randy Dunlap
8eea00a44d [IPVS]: fix sparse gfp nocast warnings
From: Randy Dunlap <rdunlap@xenotime.net>

Fix implicit nocast warnings in ip_vs code:
net/ipv4/ipvs/ip_vs_app.c:631:54: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 22:42:15 -07:00
Randy Dunlap
f4a19a56e3 [DECNET]: fix sparse gfp nocast warnings
Fix implicit nocast warnings in decnet code:
net/decnet/af_decnet.c:458:40: warning: implicit cast to nocast type
net/decnet/dn_nsp_out.c:125:35: warning: implicit cast to nocast type
net/decnet/dn_nsp_out.c:219:29: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 22:41:48 -07:00
Randy Dunlap
7b5b3f3d82 [ATM]: fix sparse gfp nocast warnings
Fix implicit nocast warnings in atm code:
net/atm/atm_misc.c:35:44: warning: implicit cast to nocast type
drivers/atm/fore200e.c:183:33: warning: implicit cast to nocast type

Also use kzalloc() instead of kmalloc().

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 22:38:44 -07:00
Horst H. von Brand
a5181ab06d [NETFILTER]: Fix Kconfig typo
Signed-off-by: Horst H. von Brand <vonbrand@inf.utfsm.cl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 15:58:56 -07:00
Robert Olsson
e6308be85a [IPV4]: fib_trie root-node expansion
The patch below introduces special thresholds to keep root node in the trie 
large. This gives a flatter tree at the cost of a modest memory increase.
Overall it seems to be gain and this was also proposed by one the authors 
of the paper in recent a seminar.

Main table after loading 123 k routes.

	Aver depth:     3.30
	Max depth:      9
        Root-node size  12 bits
        Total size: 4044  kB

With the patch:
	Aver depth:     2.78
	Max depth:      8
        Root-node size  15 bits
        Total size: 4150  kB

An increase of 8-10% was seen in forwading performance for an rDoS attack. 

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 13:01:58 -07:00
YOSHIFUJI Hideaki
87bf9c97b4 [IPV6]: Fix infinite loop in udp_v6_get_port().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04 13:00:39 -07:00
Jeff Garzik
13d1ef29bc Merge rsync://bughost.org/repos/ieee80211-delta/ 2005-10-04 08:22:13 -04:00
Jeff Garzik
d9e34325fd Merge branch 'upstream-fixes' 2005-10-04 05:30:02 -04:00
Randy Dunlap
f36a29d567 [PATCH] ieee80211: fix gfp flags type
Fix implicit nocast warnings in ieee80211 code, including __nocast:
net/ieee80211/ieee80211_tx.c:215:9: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-10-04 05:29:48 -04:00
Jeff Garzik
3c8c7b2f32 Merge branch 'upstream-fixes' 2005-10-03 22:06:19 -04:00
Randy Dunlap
8cb6108bae [PATCH] ieee80211: fix gfp flags type
Fix implicit nocast warnings in ieee80211 code:
net/ieee80211/ieee80211_tx.c:215:9: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-10-03 22:01:14 -04:00
David S. Miller
7ce312467e [IPV4]: Update icmp sysctl docs and disable broadcast ECHO/TIMESTAMP by default
It's not a good idea to be smurf'able by default.
The few people who need this can turn it on.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 16:07:30 -07:00
Herbert Xu
3e56a40bb3 [IPV4]: Get rid of bogus __in_put_dev in pktgen
This patch gets rid of a bogus __in_dev_put() in pktgen.c.  This was
spotted by Suzanne Wood.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 14:36:32 -07:00
Herbert Xu
e5ed639913 [IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl
The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
introduces __in_dev_get_rcu() to cover the second case.

1) RCU with refcnt should use in_dev_get().
2) RCU without refcnt should use __in_dev_get_rcu().
3) All others must hold RTNL and use __in_dev_get_rtnl().

There is one exception in net/ipv4/route.c which is in fact a pre-existing
race condition.  I've marked it as such so that we remember to fix it.

This patch is based on suggestions and prior work by Suzanne Wood and
Paul McKenney.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 14:35:55 -07:00
David S. Miller
a5e7c210fe [IPV6]: Fix leak added by udp connect dst caching fix.
Based upon a patch from Mitsuru KANDA <mk@linux-ipv6.org>

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 14:21:58 -07:00
Yan Zheng
f36d6ab182 [IPV6]: Fix ipv6 fragment ID selection at slow path
Signed-Off-By: Yan Zheng <yanzheng@21cn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 14:19:15 -07:00
Herbert Xu
444fc8fc3a [IPV4]: Fix "Proxy ARP seems broken"
Meelis Roos <mroos@linux.ee> wrote:
> RK> My firewall setup relies on proxyarp working.  However, with 2.6.14-rc3,
> RK> it appears to be completely broken.  The firewall is 212.18.232.186,
> 
> Same here with some kernel between 14-rc2 and 14-rc3 - no reposnse to
> ARP on a proxyarp gateway. Sorry, no exact revison and no more debugging
> yet since it'a a production gateway.

The breakage is caused by the change to use the CB area for flagging
whether a packet has been queued due to proxy_delay.  This area gets
cleared every time arp_rcv gets called.  Unfortunately packets delayed
due to proxy_delay also go through arp_rcv when they are reprocessed.

In fact, I can't think of a reason why delayed proxy packets should go
through netfilter again at all.  So the easiest solution is to bypass
that and go straight to arp_process.

This is essentially what would've happened before netfilter support
was added to ARP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> 
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 14:18:10 -07:00
Russell King
496a22b08f [NET]: Fix "sysctl_net.c:36: error: 'core_table' undeclared here"
During the build for ARM machine type "fortunet", this error occurred:

  CC      net/sysctl_net.o
net/sysctl_net.c:36: error: 'core_table' undeclared here (not in a function)

It appears that the following configuration settings cause this error
due to a missing include:
CONFIG_SYSCTL=y
CONFIG_NET=y
# CONFIG_INET is not set

core_table appears to be declared in net/sock.h.  if CONFIG_INET were
defined, net/sock.h would have been included via:
  sysctl_net.c -> net/ip.h -> linux/ip.h -> net/sock.h

so include it directly.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 14:16:34 -07:00
Eric Dumazet
81c3d5470e [INET]: speedup inet (tcp/dccp) lookups
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &head->chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.

As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.

This can be problematic if some chains are very long.

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};

Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.

Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)

File include/net/inet_hashtables.h

64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)->sk_hash == (__hash))
     ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie))   &&  \
     ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports))   &&  \
     (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

32bits platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)->sk_hash == (__hash))                 &&  \
     (inet_sk(__sk)->daddr          == (__saddr))   &&  \
     (inet_sk(__sk)->rcv_saddr      == (__daddr))   &&  \
     (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))


- Adds a prefetch(head->chain.first) in 
__inet_lookup_established()/__tcp_v4_check_established() and 
__inet6_lookup_established()/__tcp_v6_check_established() and 
__dccp_v4_check_established() to bring into cache the first element of the 
list, before the {read|write}_lock(&head->lock);

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 14:13:38 -07:00
Herbert Xu
325ed82393 [NET]: Fix packet timestamping.
I've found the problem in general.  It affects any 64-bit
architecture.  The problem occurs when you change the system time.

Suppose that when you boot your system clock is forward by a day.
This gets recorded down in skb_tv_base.  You then wind the clock back
by a day.  From that point onwards the offset will be negative which
essentially overflows the 32-bit variables they're stored in.

In fact, why don't we just store the real time stamp in those 32-bit
variables? After all, we're not going to overflow for quite a while
yet.

When we do overflow, we'll need a better solution of course.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 13:57:23 -07:00
James Ketrenos
ff0037b259 Lindent and trailing whitespace script executed ieee80211 subsystem
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-03 10:23:42 -05:00
Ivo van Doorn
c1bda44a4a When an assoc_resp is received the network structure is not completely
initialized which can cause problems for drivers that expect the network
structure to be completely filled in.

This patch will make sure the network is filled in as much as possible.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-03 10:20:47 -05:00
Ivo van Doorn
ff9e00f1b0 Currently the info_element is parsed by 2 seperate functions, this
results in a lot of duplicate code.

This will move the parsing stage into a seperate function.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-03 10:19:25 -05:00
Randy Dunlap
e846cbb112 Fix implicit nocast warnings in ieee80211 code:
net/ieee80211/ieee80211_tx.c:215:9: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-03 10:02:14 -05:00
Ivo van Doorn
7c254d3dba This will move the ieee80211_is_ofdm_rate function to the ieee80211.h
header, and I also added the ieee80211_is_cck_rate counterpart.

Various drivers currently create there own version of these functions,
but I guess the ieee80211 stack is the best place to provide such
routines.

Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-03 09:50:40 -05:00
Scott Talbert
75b895c15b [ATM]: [lec] reset retry counter when new arp issued
From: Scott Talbert <scott.talbert@lmco.com>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-29 17:31:30 -07:00
Scott Talbert
4a7097fcc4 [ATM]: [lec] attempt to support cisco failover
From: Scott Talbert <scott.talbert@lmco.com>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-29 17:30:54 -07:00
Alexey Kuznetsov
09e9ec8711 [TCP]: Don't over-clamp window in tcp_clamp_window()
From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>

Handle better the case where the sender sends full sized
frames initially, then moves to a mode where it trickles
out small amounts of data at a time.

This known problem is even mentioned in the comments
above tcp_grow_window() in tcp_input.c, specifically:

...
 * The scheme does not work when sender sends good segments opening
 * window and then starts to feed us spagetti. But it should work
 * in common situations. Otherwise, we have to rely on queue collapsing.
...

When the sender gives full sized frames, the "struct sk_buff" overhead
from each packet is small.  So we'll advertize a larger window.
If the sender moves to a mode where small segments are sent, this
ratio becomes tilted to the other extreme and we start overrunning
the socket buffer space.

tcp_clamp_window() tries to address this, but it's clamping of
tp->window_clamp is a wee bit too aggressive for this particular case.

Fix confirmed by Ion Badulescu.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-29 17:17:15 -07:00
David S. Miller
01ff367e62 [TCP]: Revert 6b251858d3
But retain the comment fix.

Alexey Kuznetsov has explained the situation as follows:

--------------------

I think the fix is incorrect. Look, the RFC function init_cwnd(mss) is
not continuous: f.e. for mss=1095 it needs initial window 1095*4, but
for mss=1096 it is 1096*3. We do not know exactly what mss sender used
for calculations. If we advertised 1096 (and calculate initial window
3*1096), the sender could limit it to some value < 1096 and then it
will need window his_mss*4 > 3*1096 to send initial burst.

See?

So, the honest function for inital rcv_wnd derived from
tcp_init_cwnd() is:

	init_rcv_wnd(mss)=
	  min { init_cwnd(mss1)*mss1 for mss1 <= mss }

It is something sort of:

	if (mss < 1096)
		return mss*4;
	if (mss < 1096*2)
		return 1096*4;
	return mss*2;

(I just scrablled a graph of piece of paper, it is difficult to see or
to explain without this)

I selected it differently giving more window than it is strictly
required.  Initial receive window must be large enough to allow sender
following to the rfc (or just setting initial cwnd to 2) to send
initial burst.  But besides that it is arbitrary, so I decided to give
slack space of one segment.

Actually, the logic was:

If mss is low/normal (<=ethernet), set window to receive more than
initial burst allowed by rfc under the worst conditions
i.e. mss*4. This gives slack space of 1 segment for ethernet frames.

For msses slighlty more than ethernet frame, take 3. Try to give slack
space of 1 frame again.

If mss is huge, force 2*mss. No slack space.

Value 1460*3 is really confusing. Minimal one is 1096*2, but besides
that it is an arbitrary value. It was meant to be ~4096. 1460*3 is
just the magic number from RFC, 1460*3 = 1095*4 is the magic :-), so
that I guess hands typed this themselves.

--------------------

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-29 17:07:20 -07:00
Linus Torvalds
eb693d2994 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-09-29 08:56:47 -07:00
Al Viro
666002218d [PATCH] proc_mkdir() should be used to create procfs directories
A bunch of create_proc_dir_entry() calls creating directories had crept
in since the last sweep; converted to proc_mkdir().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-29 08:46:26 -07:00
David S. Miller
01d40f28b1 [NET]: Fix reversed logic in eth_type_trans().
I got the second compare_eth_addr() test reversed, oops.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-28 22:37:53 -07:00
Martin Whitaker
735631a919 [ATM]: fix bug in atm address list handling
From: Martin Whitaker <atm@martin-whitaker.co.uk>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
2005-09-28 16:35:22 -07:00
Chas Williams
9301e320e9 [ATM]: track and close listen sockets when sigd exits
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
2005-09-28 16:35:01 -07:00
Roman Kagan
e2c4b72158 [ATM]: net/atm/ioctl.c: autoload pppoatm and br2684
Signed-off-by: Roman Kagan <rkagan@mail.ru>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
2005-09-28 16:34:24 -07:00
David S. Miller
6b251858d3 [TCP]: Fix init_cwnd calculations in tcp_select_initial_window()
Match it up to what RFC2414 really specifies.
Noticed by Rick Jones.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-28 16:31:48 -07:00
Oliver Dawid
64233bffbb [APPLETALK]: Fix broadcast bug.
From: Oliver Dawid <oliver@helios.de>

we found a bug in net/appletalk/ddp.c concerning broadcast packets. In 
kernel 2.4 it was working fine. The bug first occured 4 years ago when 
switching to new SNAP layer handling. This bug can be splitted up into a 
sending(1) and reception(2) problem:

Sending(1)
In kernel 2.4 broadcast packets were sent to a matching ethernet device 
and atalk_rcv() was called to receive it as "loopback" (so loopback 
packets were shortcutted and handled in DDP layer).

When switching to the new SNAP structure, this shortcut was removed and 
the loopback packet was send to SNAP layer. The author forgot to replace 
the remote device pointer by the loopback device pointer before sending 
the packet to SNAP layer (by calling ddp_dl->request() ) therfor the 
packet was not sent back by underlying layers to ddp's atalk_rcv().

Reception(2)
In atalk_rcv() a packet received by this loopback mechanism contains now 
the (rigth) loopback device pointer (in Kernel 2.4 it was the (wrong) 
remote ethernet device pointer) and therefor no matching socket will be 
found to deliver this packet to. Because a broadcast packet should be 
send to the first matching socket (as it is done in many other protocols 
(?)), we removed the network comparison in broadcast case.

Below you will find a patch to correct this bug. Its diffed to kernel 
2.6.14-rc1

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27 16:11:29 -07:00
David S. Miller
ba645c1602 [NET]: Slightly optimize ethernet address comparison.
We know the thing is at least 2-byte aligned, so take
advantage of that instead of invoking memcmp() which
results in truly horrifically inefficient code because
it can't assume anything about alignment.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27 16:03:05 -07:00
Alexey Dobriyan
520d1b830a [ROSE]: fix typo (regeistration)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27 15:45:15 -07:00
Alexey Dobriyan
a83cd2cc90 [ROSE]: check rose_ndevs earlier
* Don't bother with proto registering if rose_ndevs is bad.
* Make escape structure more coherent.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27 15:44:36 -07:00
Alexey Dobriyan
70ff3b66d7 [ROSE]: return sane -E* from rose_proto_init()
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27 15:43:46 -07:00
Alexey Dobriyan
c3c4ed652e [ROSE]: do proto_unregister() on exit paths
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27 15:42:58 -07:00
Frank Filz
a79af59efd [NET]: Fix module reference counts for loadable protocol modules
I have been experimenting with loadable protocol modules, and ran into
several issues with module reference counting.

The first issue was that __module_get failed at the BUG_ON check at
the top of the routine (checking that my module reference count was
not zero) when I created the first socket. When sk_alloc() is called,
my module reference count was still 0. When I looked at why sctp
didn't have this problem, I discovered that sctp creates a control
socket during module init (when the module ref count is not 0), which
keeps the reference count non-zero. This section has been updated to
address the point Stephen raised about checking the return value of
try_module_get().

The next problem arose when my socket init routine returned an error.
This resulted in my module reference count being decremented below 0.
My socket ops->release routine was also being called. The issue here
is that sock_release() calls the ops->release routine and decrements
the ref count if sock->ops is not NULL. Since the socket probably
didn't get correctly initialized, this should not be done, so we will
set sock->ops to NULL because we will not call try_module_get().

While searching for another bug, I also noticed that sys_accept() has
a possibility of doing a module_put() when it did not do an
__module_get so I re-ordered the call to security_socket_accept().

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27 15:23:38 -07:00
Eric Dumazet
2d7ceece08 [NET]: Prefetch dev->qdisc_lock in dev_queue_xmit()
We know the lock is going to be taken.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27 15:22:58 -07:00
Daniel Phillips
bc8dfcb939 [NET]: Use non-recursive algorithm in skb_copy_datagram_iovec()
Use iteration instead of recursion.  Fraglists within fraglists
should never occur, so we BUG check this.

Signed-off-by: Daniel Phillips <phillips@istop.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27 15:22:35 -07:00
David S. Miller
667347f1ca [NEIGH]: Add debugging check when adding timers.
If we double-add a neighbour entry timer, which should be
impossible but has been reported, dump the current state of
the entry so that we can debug this.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27 12:07:44 -07:00
David S. Miller
56e9b26324 Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/llc-2.6 2005-09-26 15:29:31 -07:00
Harald Welte
188bab3ae0 [NETFILTER]: Fix invalid module autoloading by splitting iptable_nat
When you've enabled conntrack and NAT as a module (standard case in all
distributions), and you've also enabled the new conntrack netlink
interface, loading ip_conntrack_netlink.ko will auto-load iptable_nat.ko.
This causes a huge performance penalty, since for every packet you iterate
the nat code, even if you don't want it.

This patch splits iptable_nat.ko into the NAT core (ip_nat.ko) and the
iptables frontend (iptable_nat.ko).  Threfore, ip_conntrack_netlink.ko will
only pull ip_nat.ko, but not the frontend.  ip_nat.ko will "only" allocate
some resources, but not affect runtime performance.

This separation is also a nice step in anticipation of new packet filters
(nf-hipac, ipset, pkttables) being able to use the NAT core.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26 15:25:11 -07:00
David S. Miller
b85daee0e4 [AF_PACKET]: Remove bogus checks added to packet_sendmsg().
These broke existing apps, and the checks are superfluous
as the values being verified aren't even used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26 15:23:58 -07:00
Herbert Xu
c62dba9011 [IPV6]: Fix [Bug 5306] Oops on IPv6 route lookup
> Steps to reproduce:
> 1. Boot Linux, do NOT setup any IPv6 routes
> 2. ip route get 2001::1 (or any unroutable address)

Well caught.  We never set rt6i_idev on ip6_null_entry.
This patch should make the problem go away.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26 15:10:16 -07:00
Alex Williamson
b9d717a7b4 [NET]: Make sure ctl buffer is aligned properly in sys_sendmsg().
It's on the stack and declared as "unsigned char[]", but pointers
and similar can be in here thus we need to give it an explicit
alignment attribute.

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26 14:28:02 -07:00
Harald Welte
8ddec7460d [NETFILTER] ip_conntrack: Update event cache when status changes
The GRE, SCTP and TCP protocol helpers did not call
ip_conntrack_event_cache() when updating ct->status.  This patch adds
the respective calls.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-24 16:56:08 -07:00
Alexey Dobriyan
8689c07e47 [IRDA]: *irttp cleanup
* Remove useless comment.
* Remove useless assertions.
* Remove useless comparison.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-24 16:55:17 -07:00
Alexey Dobriyan
15166fadb0 [IRDA]: Fix memory leak in irttp_init()
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-24 16:54:50 -07:00
Amos Waterland
45fc3b11f1 [NET]: Protect neigh_stat_seq_fops by CONFIG_PROC_FS
From: Amos Waterland <apw@us.ibm.com>

If CONFIG_PROC_FS is not selected, the compiler emits this warning:

 net/core/neighbour.c:64: warning: `neigh_stat_seq_fops' defined but not used

Which is correct, because neigh_stat_seq_fops is in fact only
initialized and used by code that is protected by CONFIG_PROC_FS.  So
this patch fixes that up.

Signed-off-by: Amos Waterland <apw@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-24 16:53:16 -07:00
Harald Welte
d67b24c40f [NETFILTER]: Fix ip[6]t_NFQUEUE Kconfig dependency
We have to introduce a separate Kconfig menu entry for the NFQUEUE targets.
They cannot "just" depend on nfnetlink_queue, since nfnetlink_queue could
be linked into the kernel, whereas iptables can be a module.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-24 16:52:03 -07:00