linux

Author	SHA1	Message	Date
Gerrit Renker	73f18fdbca	dccp: Bug-Fix - AWL was never updated The AWL lower Ack validity window advances in proportion to GSS, the greatest sequence number sent. Updating AWL other than at connection setup (in the DCCP-Request sent by dccp_v{4,6}_connect()) was missing in the DCCP code. This bug lead to syslog messages such as "kernel: dccp_check_seqno: DCCP: Step 6 failed for DATAACK packet, [...] P.ackno exists or LAWL(82947089) <= P.ackno(82948208) <= S.AWH(82948728), sending SYNC..." The difference between AWL/AWH here is 1639 packets, while the expected value (the Sequence Window) would have been 100 (the default). A closer look showed that LAWL = AWL = 82947089 equalled the ISS on the Response. The patch now updates AWL with each increase of GSS. Further changes: ---------------- The patch also enforces more stringent checks on the ISS sequence number: * AWL is initialised to ISS at connection setup and remains at this value; * AWH is then always set to GSS (via dccp_update_gss()); * so on the first Request: AWL = AWH = ISS, and on the n-th Request: AWL = ISS, AWH = ISS + n. As a consequence, only Response packets that refer to Requests sent by this host will pass, all others are discarded. This is the intention and in effect implements the initial adjustments for AWL as specified in RFC 4340, 7.5.1. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-07-26 11:59:10 +01:00
Gerrit Renker	59435444a1	dccp: Allow to distinguish original and retransmitted packets This patch allows the sender to distinguish original and retransmitted packets, which is in particular needed for the retransmission of DCCP-Requests: * the first Request uses ISS (generated in net/dccp/ip.c), and sets GSS = ISS; all retransmitted Requests use GSS' = GSS + 1, so that the n-th retransmitted Request has sequence number ISS + n (mod 48). To add generic support, the patch reorganises existing code so that: * icsk_retransmits == 0 for the original packet and * icsk_retransmits = n > 0 for the n-th retransmitted packet at the time dccp_transmit_skb() is called, via dccp_retransmit_skb(). Thanks to Wei Yongjun for pointing this problem out. Further changes: ---------------- * removed the `skb' argument from dccp_retransmit_skb(), since sk_send_head is used for all retransmissions (the exception is client-Acks in PARTOPEN state, but these do not use sk_send_head); * since sk_send_head always contains the original skb (via dccp_entail()), skb_cloned() never evaluated to true and thus pskb_copy() was never used. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-26 11:59:09 +01:00
Ilpo Järvinen	547b792cac	net: convert BUG_TRAP to generic WARN_ON Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 21:43:18 -07:00
Pavel Emelyanov	de0744af1f	mib: add net to NET_INC_STATS_BH Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:31:16 -07:00
Pavel Emelyanov	ca12a1a443	inet: prepare net on the stack for NET accounting macros Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:28:42 -07:00
Pavel Emelyanov	7c73a6faff	mib: add net to IP_INC_STATS_BH Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:20:11 -07:00
Pavel Emelyanov	dcfc23cac1	mib: add struct net to ICMP_INC_STATS_BH Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 23:05:29 -07:00
Pavel Emelyanov	fd54d716b1	inet: toss struct net initialization around Some places, that deal with ICMP statistics already have where to get a struct net from, but use it directly, without declaring a separate variable on the stack. Since I will need this net soon, I declare a struct net on the stack and use it in the existing places in a separate patch not to spoil the future ones. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 23:05:26 -07:00
Gerrit Renker	2eeea7ba6b	dccp ccid-3: Length of loss intervals This corrects an error in the computation of the open loss interval I_0: * the interval length is (highest_seqno - start_seqno) + 1 * and not (highest_seqno - start_seqno). This condition was not fully clear in RFC 3448, but reflects the current revision state of rfc3448bis and is also consistent with RFC 4340, 6.1.1. Further changes: ---------------- * variable renamed due to line length constraints; * explicit typecast to `s64' to avoid implicit signed/unsigned casting. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-13 11:51:40 +01:00
Gerrit Renker	b552c6231f	dccp ccid-3: Fix a loss detection bug This fixes a bug in the logic of the TFRC loss detection: * new_loss_indicated() should not be called while a loss is pending; * but the code allows this; * thus, for two subsequent gaps in the sequence space, when loss_count has not yet reached NDUPACK=3, the loss_count is falsely reduced to 1. To avoid further and similar problems, all loss handling and loss detection is now done inside tfrc_rx_hist_handle_loss(), using an appropriate routine to track new losses. Further changes: ---------------- * added a reminder that no RX history operations should be performed when rx_handle_loss() has identified a (new) loss, since the function takes care of packet reordering during loss detection; * made tfrc_rx_hist_loss_pending() bool (thanks to an earlier suggestion by Arnaldo); * removed unused functions. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-13 11:51:40 +01:00
Gerrit Renker	5b5d0e7048	dccp: Upgrade NDP count from 3 to 6 bytes RFC 4340, 7.7 specifies up to 6 bytes for the NDP Count option, whereas the code is currently limited to up to 3 bytes. This seems to be a relict of an earlier draft version and is brought up to date by the patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-13 11:51:40 +01:00
Gerrit Renker	2013c7e35a	dccp ccid-3: Fix error in loss detection The TFRC loss detection code used the wrong loss condition (RFC 4340, 7.7.1): * the difference between sequence numbers s1 and s2 instead of * the number of packets missing between s1 and s2 (one less than the distance). Since this condition appears in many places of the code, it has been put into a separate function, dccp_loss_free(). Further changes: ---------------- * tidied up incorrect typing (it was using `int' for u64/s64 types); * optimised conditional statements for common case of non-reordered packets; * rewrote comments/documentation to match the changes. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-13 11:51:40 +01:00
Brian Haley	7d06b2e053	net: change proto destroy method to return void Change struct proto destroy function pointer to return void. Noticed by Al Viro. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-14 17:04:49 -07:00
Gerrit Renker	be4c798a41	dccp: Bug in initial acknowledgment number assignment Step 8.5 in RFC 4340 says for the newly cloned socket Initialize S.GAR := S.ISS, but what in fact the code (minisocks.c) does is Initialize S.GAR := S.ISR, which is wrong (typo?) -- fixed by the patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-06-11 11:19:10 +01:00
Gerrit Renker	7deb0f8510	dccp ccid-3: X truncated due to type conversion This fixes a bug in computing the inter-packet-interval t_ipi = s/X: scaled_div32(a, b) uses u32 for b, but in "scaled_div32(s, X)" the type of the sending rate `X' is u64. Since X is scaled by 2^6, this truncates rates greater than 2^26 Bps (~537 Mbps). Using full 64-bit division now. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-06-11 11:19:10 +01:00
Gerrit Renker	1e8a287c79	dccp ccid-3: TFRC reverse-lookup Bug-Fix This fixes a bug in the reverse lookup of p: given a value f(p), instead of p, the function returned the smallest tabulated value f(p). The smallest tabulated value of 10^6 * f(p) = sqrt(2p/3) + 12 sqrt(3p/8) (32 * p^3 + p) for p=0.0001 is 8172. Since this value is scaled by 10^6, the outcome of this bug is that a loss of 8172/10^6 = 0.8172% was reported whenever the input was below the table resolution of 0.01%. This means that the value was over 80 times too high, resulting in large spikes of the initial loss interval, thus unnecessarily reducing the throughput. Also corrected the printk format (%u for u32). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-06-11 11:19:10 +01:00
Gerrit Renker	65907a433a	dccp ccid-2: Bug-Fix - Ack Vectors need to be ignored on request sockets This fixes an oversight from an earlier patch, ensuring that Ack Vectors are not processed on request sockets. The issue is that Ack Vectors must not be parsed on request sockets, since the Ack Vector feature depends on the selection of the (TX) CCID. During the initial handshake the CCIDs are undefined, and so RFC 4340, 10.3 applies: "Using CCID-specific options and feature options during a negotiation for the corresponding CCID feature is NOT RECOMMENDED [...]" And it is not even possible: when the server receives the Request from the client, the CCID and Ack vector features are undefined; when the Ack finalising the 3-way hanshake arrives, the request socket has not been cloned yet into a full socket. (This order is necessary, since otherwise the newly created socket would have to be destroyed whenever an option error occurred - a malicious hacker could simply send garbage options and exploit this.) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-06-11 11:19:09 +01:00
Gerrit Renker	1e2f0e5e83	dccp: Fix sparse warnings This patch fixes the following sparse warnings: * nested min(max()) expression: net/dccp/ccids/ccid3.c:91:21: warning: symbol '__x' shadows an earlier one net/dccp/ccids/ccid3.c:91:21: warning: symbol '__y' shadows an earlier one * Declaration of function prototypes in .c instead of .h file, resulting in "should it be static?" warnings. * Declared "struct dccpw" static (local to dccp_probe). * Disabled dccp_delayed_ack() - not fully removed due to RFC 4340, 11.3 ("Receivers SHOULD implement delayed acknowledgement timers ..."). * Used a different local variable name to avoid net/dccp/ackvec.c:293:13: warning: symbol 'state' shadows an earlier one net/dccp/ackvec.c:238:33: originally declared here * Removed unused functions `dccp_ackvector_print' and `dccp_ackvec_print'. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-06-11 11:19:09 +01:00
Gerrit Renker	3294f202dc	dccp ccid-3: Bug-Fix - Zero RTT is possible In commit $(`825de27d9e`) (from 27th May, commit message `dccp ccid-3: Fix "t_ipi explosion" bug'), the CCID-3 window counter computation was fixed to cope with RTTs < 4 microseconds. Such RTTs can be found e.g. when running CCID-3 over loopback. The fix removed a check against RTT < 4, but introduced a divide-by-zero bug. All steady-state RTTs in DCCP are filtered using dccp_sample_rtt(), which ensures non-zero samples. However, a zero RTT is possible on initialisation, when there is no RTT sample from the Request/Response exchange. The fix is to use the fallback-RTT from RFC 4340, 3.4. This is also better than just fixing update_win_count() since it allows other parts of the code to always assume that the RTT is non-zero during the time that the CCID is used. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-06-11 11:19:09 +01:00
Arnaldo Carvalho de Melo	ce4a7d0d48	inet{6}_request_sock: Init ->opt and ->pktopts in the constructor Wei Yongjun noticed that we may call reqsk_free on request sock objects where the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc where we initialize ->opt to NULL and set ->pktopts to NULL in inet6_reqsk_alloc. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-10 12:39:35 -07:00
Gerrit Renker	825de27d9e	dccp ccid-3: Fix "t_ipi explosion" bug The identification of this bug is thanks to Cheng Wei and Tomasz Grobelny. To avoid divide-by-zero, the implementation previously ignored RTTs smaller than 4 microseconds when performing integer division RTT/4. When the RTT reached a value less than 4 microseconds (as observed on loopback), this prevented the Window Counter CCVal value from advancing. As a result, the receiver stopped sending feedback. This in turn caused non-ending expiries of the nofeedback timer at the sender, so that the sending rate was progressively reduced until reaching the minimum of one packet per 64 seconds. The patch fixes this bug by handling integer division more intelligently. Due to consistent use of dccp_sample_rtt(), divide-by-zero-RTT is avoided. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-27 06:33:54 -07:00
Wei Yongjun	6079a463cf	dccp: Fix to handle short sequence numbers packet correctly RFC4340 said: 8.5. Pseudocode ... If P.type is not Data, Ack, or DataAck and P.X == 0 (the packet has short sequence numbers), drop packet and return But DCCP has some mistake to handle short sequence numbers packet, now it drop packet only if P.type is Data, Ack, or DataAck and P.X == 0. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-27 06:22:38 -07:00
Chris Wright	19443178fb	dccp: return -EINVAL on invalid feature length dccp_feat_change() validates length and on error is returning 1. This happens to work since call chain is checking for 0 == success, but this is returned to userspace, so make it a real error value. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-05 13:50:24 -07:00
Harvey Harrison	84994e16f2	dccp: ccid2.c, ccid3.c use clamp(), clamp_t() Makes the intention of the nested min/max clear. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-02 16:44:07 -07:00
Linus Torvalds	2e561c7b7e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (48 commits) net: Fix wrong interpretation of some copy_to_user() results. xfrm: alg_key_len & alg_icv_len should be unsigned [netdrvr] tehuti: move ioctl perm check closer to function start ipv6: Fix typo in net/ipv6/Kconfig via-velocity: fix vlan receipt tg3: sparse cleanup forcedeth: realtek phy crossover detection ibm_newemac: Increase MDIO timeouts gianfar: Fix skb allocation strategy netxen: reduce stack usage of netxen_nic_flash_print smc911x: test after postfix decrement fails in smc911x_{reset,drop_pkt} net drivers: fix platform driver hotplug/coldplug forcedeth: new backoff implementation ehea: make things static phylib: Add support for board-level PHY fixups [netdrvr] atlx: code movement: move atl1 parameter parsing atlx: remove flash vendor parameter korina: misc cleanup korina: fix misplaced return statement WAN: Fix confusing insmod error code for C101 too. ...	2008-04-25 12:28:28 -07:00
Pavel Emelyanov	653252c230	net: Fix wrong interpretation of some copy_to_user() results. I found some places, that erroneously return the value obtained from the copy_to_user() call: if some amount of bytes were not able to get to the user (this is what this one returns) the proper behavior is to return the -EFAULT error, not that number itself. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-25 01:49:48 -07:00
Linus Torvalds	b0d19a378a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: iwlwifi: Fix built-in compilation of iwlcore net: Unexport move_addr_to_{kernel,user} rt2x00: Select LEDS_CLASS. iwlwifi: Select LEDS_CLASS. leds: Do not guard NEW_LEDS with HAS_IOMEM [IPSEC]: Fix catch-22 with algorithm IDs above 31 time: Export set_normalized_timespec. tcp: Make use of before macro in tcp_input.c hamradio: Remove unneeded and deprecated cli()/sti() calls in dmascc.c [NETNS]: Remove empty ->init callback. [DCCP]: Convert do_gettimeofday() to getnstimeofday(). [NETNS]: Don't initialize err variable twice. [NETNS]: The ip6_fib_timer can work with garbage on net namespace stop. [IPV4]: Convert do_gettimeofday() to getnstimeofday(). [IPV4]: Make icmp_sk_init() static. [IPV6]: Make struct ip6_prohibit_entry_template static. tcp: Trivial fix to correct function name in a comment in net/ipv4/tcp.c [NET]: Expose netdevice dev_id through sysfs skbuff: fix missing kernel-doc notation [ROSE]: Fix soft lockup wrt. rose_node_list_lock	2008-04-23 12:23:45 -07:00
YOSHIFUJI Hideaki	cdd04d98f6	[DCCP]: Convert do_gettimeofday() to getnstimeofday(). What do_gettimeofday() does is to call getnstimeofday() and to convert the result from timespec{} to timeval{}. We do not always need timeval{} and we can convert timespec{} when we really need (to print). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-21 14:28:45 -07:00
Matthew Wilcox	5f090dcb4d	net: Remove unnecessary inclusions of asm/semaphore.h None of these files use any of the functionality promised by asm/semaphore.h. It's possible that they rely on it dragging in some unrelated header file, but I can't build all these files, so we'll have fix any build failures as they come up. Signed-off-by: Matthew Wilcox <willy@linux.intel.com>	2008-04-18 22:15:50 -04:00
Pavel Emelyanov	e56d8b8a2e	[INET]: Drop the inet_inherit_port() call. As I can see from the code, two places (tcp_v6_syn_recv_sock and dccp_v6_request_recv_sock) that call this one already run with BHs disabled, so it's safe to call __inet_inherit_port there. Besides (in case I missed smth with code review) the calltrace tcp_v6_syn_recv_sock `- tcp_v4_syn_recv_sock `- __inet_inherit_port and the similar for DCCP are valid, but assumes BHs to be disabled. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-17 23:17:34 -07:00
Pavel Emelyanov	13f51d82ac	[DCCP]: Fix comment about control sockets. These sockets now have a bit other names and are no longer global. Shame on me, I haven't provided a good comment for this when sending DCCP netnsization patches. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-14 02:38:45 -07:00
David S. Miller	df39e8ba56	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/ehea/ehea_main.c drivers/net/wireless/iwlwifi/Kconfig drivers/net/wireless/rt2x00/rt61pci.c net/ipv4/inet_timewait_sock.c net/ipv6/raw.c net/mac80211/ieee80211_sta.c	2008-04-14 02:30:23 -07:00
Pavel Emelyanov	671a1c7401	[NETNS][DCCPV6]: Make per-net socket lookup. The inet6_lookup family of functions requires a net to lookup a socket in, so give a proper one to them. No more things to do for dccpv6, since routing is OK and the ipv4-like transport layer filtering is not done for ipv6. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:33:06 -07:00
Pavel Emelyanov	334527d351	[NETNS][DCCPV6]: Actually create ctl socket on each net and use it. Move the call to inet_ctl_sock_create to init callback (and inet_ctl_sock_destroy to exit one) and use proper ctl sock in dccp_v6_ctl_send_reset. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:32:45 -07:00
Pavel Emelyanov	0204774191	[NETNS][DCCPV6]: Move the dccp_v6_ctl_sk on the struct net. And replace all its usage with init_net's socket. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:32:25 -07:00
Pavel Emelyanov	8231bd270d	[NETNS][DCCPV6]: Add dummy per-net operations. They will be responsible for ctl socket initialization, but currently they are void. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:32:02 -07:00
Pavel Emelyanov	68d185980f	[NETNS][DCCPV6]: Don't pass NULL to ip6_dst_lookup. This call uses the sock to get the net to lookup the routing in. With CONFIG_NET_NS this code will OOPS, since the sk ptr is NULL. After looking inside the ip6_dst_lookup and drawing the analogy with respective ipv6 code, it seems, that the dccp ctl socket is a good candidate for the first argument. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:31:32 -07:00
Pavel Emelyanov	fc5f8580d3	[NETNS][DCCPV4]: Enable DCCPv4 in net namespaces. This enables sockets creation with IPPROTO_DCCP and enables the ip level to pass DCCP packets to the DCCP level. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:31:05 -07:00
Pavel Emelyanov	b9901a84c9	[NETNS][DCCPV4]: Make per-net socket lookup. The inet_lookup family of functions requires a net to lookup a socket in, so give a proper one to them. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:30:43 -07:00
Pavel Emelyanov	f54873982c	[NETNS][DCCPV4]: Use proper net to route the reset packet. The dccp_v4_route_skb used in dccp_v4_ctl_send_reset, currently works with init_net's routing tables - fix it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:30:19 -07:00
Pavel Emelyanov	b76c4b27fe	[NETNS][DCCPV4]: Actually create ctl socket on each net and use it. Move the call to inet_ctl_sock_create to init callback (and inet_ctl_sock_destroy to exit one) and use proper ctl sock in dccp_v4_ctl_send_reset. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:29:59 -07:00
Pavel Emelyanov	7b1cffa8c9	[NETNS][DCCPV4]: Move the dccp_v4_ctl_sk on the struct net. And replace all its usage with init_net's socket. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:29:37 -07:00
Pavel Emelyanov	72a2d61382	[NETNS][DCCPV4]: Add dummy per-net operations. They will be responsible for ctl socket initialization, but currently they are void. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:29:13 -07:00
Patrick McHardy	028b027524	[DCCP]: Fix skb->cb conflicts with IP dev_queue_xmit() and the other IP output functions expect to get a skb with clear or properly initialized skb->cb. Unlike TCP and UDP, the dccp_skb_cb doesn't contain a struct inet_skb_parm at the beginning, so the DCCP-specific data is interpreted by the IP output functions. This can cause false negatives for the conditional POST_ROUTING hook invocation, making the packet bypass the hook. Add a inet_skb_parm/inet6_skb_parm union to the beginning of dccp_skb_cb to avoid clashes. Also add a BUILD_BUG_ON to make sure it fits in the cb. [ Combined with patch from Gerrit Renker to remove two now unnecessary memsets of IPCB(skb)->opt ] Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-12 18:35:41 -07:00
YOSHIFUJI Hideaki	24e8b7e484	[DCCP]: Use snmp_mib_{init,free}(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-10 03:48:43 -07:00
Denis V. Lunev	5677242f43	[NETNS]: Inet control socket should not hold a namespace. This is a generic requirement, so make inet_ctl_sock_create namespace aware and create a inet_ctl_sock_destroy wrapper around sk_release_kernel. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:28:30 -07:00
Denis V. Lunev	eee4fe4ded	[INET]: Let inet_ctl_sock_create return sock rather than socket. All upper protocol layers are already use sock internally. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:27:58 -07:00
Denis V. Lunev	3d58b5fa8e	[INET]: Rename inet_csk_ctl_sock_create to inet_ctl_sock_create. This call is nothing common with INET connection sockets code. It simply creates an unhashes kernel sockets for protocol messages. Move the new call into af_inet.c after the rename. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:22:32 -07:00
Denis V. Lunev	4f049b4f33	[DCCP]: dccp_v(4\|6)_ctl_socket is leaked. This seems a purism as module can't be unloaded, but though if cleanup method is present it should be correct and clean all staff created. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:21:33 -07:00
Denis V. Lunev	7630f02681	[DCCP]: Replace socket with sock for reset sending. Replace dccp_v(4\|6)_ctl_socket with sock to unify a code with TCP/ICMP. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:20:52 -07:00
Pavel Emelyanov	bdcde3d71a	[SOCK]: Drop inuse pcounter from struct proto (v2). An uppercut - do not use the pcounter on struct proto. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 16:39:33 -07:00
Pavel Emelyanov	39d8cda76c	[SOCK]: Add udp_hash member to struct proto. Inspired by the commit `ab1e0a13` ([SOCK] proto: Add hashinfo member to struct proto) from Arnaldo, I made similar thing for UDP/-Lite IPv4 and -v6 protocols. The result is not that exciting, but it removes some levels of indirection in udpxxx_get_port and saves some space in code and text. The first step is to union existing hashinfo and new udp_hash on the struct proto and give a name to this union, since future initialization of tcpxxx_prot, dccp_vx_protinfo and udpxxx_protinfo will cause gcc warning about inability to initialize anonymous member this way. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:50:58 -07:00
Harvey Harrison	0dc47877a3	net: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-05 20:47:47 -08:00
Eric Dumazet	ee6b967301	[IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts (Anonymous) unions can help us to avoid ugly casts. A common cast it the (struct rtable )skb->dst one. Defining an union like : union { struct dst_entry dst; struct rtable *rtable; }; permits to use skb->rtable in place. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-05 18:30:47 -08:00
Denis V. Lunev	fd80eb942a	[INET]: Remove struct dst_entry *dst from request_sock_ops.rtx_syn_ack. It looks like dst parameter is used in this API due to historical reasons. Actually, it is really used in the direct call to tcp_v4_send_synack only. So, create a wrapper for tcp_v4_send_synack and remove dst from rtx_syn_ack. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:43:03 -08:00
Arnaldo Carvalho de Melo	ab1e0a13d7	[SOCK] proto: Add hashinfo member to struct proto This way we can remove TCP and DCCP specific versions of sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port sk->sk_prot->hash: inet_hash is directly used, only v6 need a specific version to deal with mapped sockets sk->sk_prot->unhash: both v4 and v6 use inet_hash directly struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so that inet_csk_get_port can find the per family routine. Now only the lookup routines receive as a parameter a struct inet_hashtable. With this we further reuse code, reducing the difference among INET transport protocols. Eventually work has to be done on UDP and SCTP to make them share this infrastructure and get as a bonus inet_diag interfaces so that iproute can be used with these protocols. net-2.6/net/ipv4/inet_hashtables.c: struct proto \| +8 struct inet_connection_sock_af_ops \| +8 2 structs changed __inet_hash_nolisten \| +18 __inet_hash \| -210 inet_put_port \| +8 inet_bind_bucket_create \| +1 __inet_hash_connect \| -8 5 functions changed, 27 bytes added, 218 bytes removed, diff: -191 net-2.6/net/core/sock.c: proto_seq_show \| +3 1 function changed, 3 bytes added, diff: +3 net-2.6/net/ipv4/inet_connection_sock.c: inet_csk_get_port \| +15 1 function changed, 15 bytes added, diff: +15 net-2.6/net/ipv4/tcp.c: tcp_set_state \| -7 1 function changed, 7 bytes removed, diff: -7 net-2.6/net/ipv4/tcp_ipv4.c: tcp_v4_get_port \| -31 tcp_v4_hash \| -48 tcp_v4_destroy_sock \| -7 tcp_v4_syn_recv_sock \| -2 tcp_unhash \| -179 5 functions changed, 267 bytes removed, diff: -267 net-2.6/net/ipv6/inet6_hashtables.c: __inet6_hash \| +8 1 function changed, 8 bytes added, diff: +8 net-2.6/net/ipv4/inet_hashtables.c: inet_unhash \| +190 inet_hash \| +242 2 functions changed, 432 bytes added, diff: +432 vmlinux: 16 functions changed, 485 bytes added, 492 bytes removed, diff: -7 /home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c: tcp_v6_get_port \| -31 tcp_v6_hash \| -7 tcp_v6_syn_recv_sock \| -9 3 functions changed, 47 bytes removed, diff: -47 /home/acme/git/net-2.6/net/dccp/proto.c: dccp_destroy_sock \| -7 dccp_unhash \| -179 dccp_hash \| -49 dccp_set_state \| -7 dccp_done \| +1 5 functions changed, 1 bytes added, 242 bytes removed, diff: -241 /home/acme/git/net-2.6/net/dccp/ipv4.c: dccp_v4_get_port \| -31 dccp_v4_request_recv_sock \| -2 2 functions changed, 33 bytes removed, diff: -33 /home/acme/git/net-2.6/net/dccp/ipv6.c: dccp_v6_get_port \| -31 dccp_v6_hash \| -7 dccp_v6_request_recv_sock \| +5 3 functions changed, 5 bytes added, 38 bytes removed, diff: -33 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-03 04:28:52 -08:00
Pavel Emelyanov	d86e0dac2c	[NETNS]: Tcp-v6 sockets per-net lookup. Add a net argument to inet6_lookup and propagate it further. Actually, this is tcp-v6 implementation of what was done for tcp-v4 sockets in a previous patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:20 -08:00
Pavel Emelyanov	c67499c0e7	[NETNS]: Tcp-v4 sockets per-net lookup. Add a net argument to inet_lookup and propagate it further into lookup calls. Plus tune the __inet_check_established. The dccp and inet_diag, which use that lookup functions pass the init_net into them. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:19 -08:00
Denis V. Lunev	f1b050bf7a	[NETNS]: Add namespace parameter to ip_route_output_flow. Needed to propagate it down to the __ip_route_output_key. Signed_off_by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:06 -08:00
Pavel Emelyanov	b5ccd792fa	[NET]: Simple ctl_table to ctl_path conversions. This patch includes many places, that only required replacing the ctl_table-s with appropriate ctl_paths and call register_sysctl_paths(). Nothing special was done with them. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:07 -08:00
Ilpo Järvinen	c4e18dade1	[CCID3]: Kill some bloat Without a number of CONFIG.*DEBUG: net/dccp/ccids/ccid3.c: ccid3_hc_tx_update_x \| -170 ccid3_hc_tx_packet_sent \| -175 ccid3_hc_tx_packet_recv \| -169 ccid3_hc_tx_no_feedback_timer \| -192 ccid3_hc_tx_send_packet \| -144 5 functions changed, 850 bytes removed, diff: -850 net/dccp/ccids/ccid3.c: ccid3_update_send_interval \| +191 1 function changed, 191 bytes added, diff: +191 net/dccp/ccids/ccid3.o: 6 functions changed, 191 bytes added, 850 bytes removed, diff: -659 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:44 -08:00
Pavel Emelyanov	152da81deb	[INET]: Uninline the __inet_hash function. This one is used in quite many places in the networking code and seems to big to be inline. After the patch net/ipv4/build-in.o loses ~650 bytes: add/remove: 2/0 grow/shrink: 0/5 up/down: 461/-1114 (-653) function old new delta __inet_hash_nolisten - 282 +282 __inet_hash - 179 +179 tcp_sacktag_write_queue 2255 2254 -1 __inet_lookup_listener 284 274 -10 tcp_v4_syn_recv_sock 755 493 -262 tcp_v4_hash 389 35 -354 inet_hash_connect 1086 599 -487 This version addresses the issue pointed by Eric, that while being inline this function was optimized by gcc in respect to the 'listen_possible' argument. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:26 -08:00
Gerrit Renker	a07a5a86d0	[DCCP]: Remove unused inline function The function follows48(), which is a special-case of dccp_delta_seqno(), is nowhere used in the DCCP code, thus removed by this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:24 -08:00
Gerrit Renker	52515e77a7	[CCID3]: Nofeedback timer according to rfc3448bis This implements the changes to the nofeedback timer handling suggested in draft rfc3448bis00, section 4.4. In particular, these changes mean: * better handling of the lossless case (p == 0) * the timestamp for computing t_ld becomes obsolete * much more recent document (RFC 3448 is almost 5 years old) * concepts in rfc3448bis arose from a real, working implementation (cf. sec. 12) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:23 -08:00
Gerrit Renker	d8d1252f74	[CCID3]: Implement rfc3448bis changes to feedback reception This implements the algorithm to update the allowed sending rate X upon receiving feedback packets, as described in draft rfc3448bis, 4.2/4.3. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:22 -08:00
Gerrit Renker	5bd370a63d	[CCID3]: Remove two irrelevant states in TX feedback handling * the NO_SENT state is only triggered in bidirectional mode, costing unnecessary processing. * the TERM (terminating) state is irrelevant. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:22 -08:00
Gerrit Renker	8e138e7949	[CCID3]: Use a function to update p_inv, and p is never used This patch 1) concentrates previously scattered computation of p_inv into one function; 2) removes the `p' element of the CCID3 RX sock (it is redundant); 3) makes the tfrc_rx_info structure standalone, only used on demand. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:21 -08:00
Gerrit Renker	6179983ad3	[DCCP]: Introducing CCMPS This introduces a CCMPS field for setting a CCID-specific upper bound on the application payload size, as is defined in RFC 4340, section 14. Only the TX CCID is considered in setting this limit, since the RX CCID generates comparatively small (DCCP-Ack) feedback packets. The CCMPS field includes network and transport layer header lengths. The only current CCMPS customer is CCID4 (via RFC 4828). A wrapper is used to allow querying the CCMPS even at times where the CCID modules may not have been fully negotiated yet. In dccp_sync_mss() the variable `mss_now' has been renamed into `cur_mps', to reflect that we are dealing with an MPS, but not an MSS. Since the DCCP code closely follows the TCP code, the identifiers `dccp_sync_mss' and `dccps_mss_cache' have been kept, as they have direct TCP counterparts. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:59 -08:00
Gerrit Renker	84a97b0af8	[CCID]: More informative registration The patch makes the registration messages of CCID 2/3 a bit more informative: instead of repeating the CCID number as currently done, "CCID: Registered CCID 2 (ccid2)" or "CCID: Registered CCID 3 (ccid3)", the descriptive names of the CCID's (from RFCs) are now used: "CCID: Registered CCID 2 (TCP-like)" and "CCID: Registered CCID 3 (TCP-Friendly Rate Control)". To allow spaces in the name, the slab name string has been changed to refer to the numeric CCID identifier, using the same format as before. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:58 -08:00
Gerrit Renker	9cb2345a8c	[DCCP]: Documentation for CCID operations This adds documentation for the ccid_operations structure. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:58 -08:00
Gerrit Renker	cf86314cb7	[DCCP]: Ignore feature negotiation on Data packets This implements [RFC 4340, p. 32]: "any feature negotiation options received on DCCP-Data packets MUST be ignored". Also added a FIXME for further processing, since the code currently (wrongly) classifies empty Confirm options as invalid - this needs to be resolved in a separate patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:54 -08:00
Gerrit Renker	5cdae198de	[DCCP]: Make code assumptions explicit This removes several `XXX' references which indicate a missing support for non-1-byte feature values: this is unnecessary, as all currently known (standardised) SP feature values are 1-byte quantities. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:53 -08:00
Gerrit Renker	dd6303df09	[DCCP]: Remove unused and redundant validation functions This removes two inlines which were both called in a single function only: 1) dccp_feat_change() is always called with either DCCPO_CHANGE_L or DCCPO_CHANGE_R as argument * from dccp_set_socktopt_change() via do_dccp_setsockopt() with DCCP_SOCKOPT_CHANGE_R/L * from __dccp_feat_init() via dccp_feat_init() also with DCCP_SOCKOPT_CHANGE_R/L. Hence the dccp_feat_is_valid_type() is completely unnecessary and always returns true. 2) Due to (1), the length test reduces to 'len >= 4', which in turn makes dccp_feat_is_valid_length() unnecessary. Furthermore, the inline function dccp_feat_is_reserved() was unfolded, since only called in a single place. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:52 -08:00
Gerrit Renker	af3b867e2f	[DCCP]: Support inserting options during the 3-way handshake This provides a separate routine to insert options during the initial handshake. The main purpose is to conduct feature negotiation, for the moment the only user is the timestamp echo needed for the (CCID3) handshake RTT sample. Padding of options has been put into a small separate routine, to be shared among the two functions. This could also be used as a generic routine to finish inserting options. Also removed an `XXX' comment since its content was obvious. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:52 -08:00
Gerrit Renker	b4d4f7c70f	[DCCP]: Handle timestamps on Request/Response exchange separately In DCCP, timestamps can occur on packets anytime, CCID3 uses a timestamp(/echo) on the Request/Response exchange. This patch addresses the following situation: * timestamps are recorded on the listening socket; * Responses are sent from dccp_request_sockets; * suppose two connections reach the listening socket with very small time in between: * the first timestamp value gets overwritten by the second connection request. This is not really good, so this patch separates timestamps into * those which are received by the server during the initial handshake (on dccp_request_sock); * those which are received by the client or the client after connection establishment. As before, a timestamp of 0 is regarded as indicating that no (meaningful) timestamp has been received (in addition, a warning message is printed if hosts send 0-valued timestamps). The timestamp-echoing now works as follows: * when a timestamp is present on the initial Request, it is placed into dreq, due to the call to dccp_parse_options in dccp_v{4,6}_conn_request; * when a timestamp is present on the Ack leading from RESPOND => OPEN, it is copied over from the request_sock into the child cocket in dccp_create_openreq_child; * timestamps received on an (established) dccp_sock are treated as before. Since Elapsed Time is measured in hundredths of milliseconds (13.2), the new dccp_timestamp() function is used, as it is expected that the time between receiving the timestamp and sending the timestamp echo will be very small against the wrap-around time. As a byproduct, this allows smaller timestamping-time fields. Furthermore, inserting the Timestamp Echo option has been taken out of the block starting with '!dccp_packet_without_ack()', since Timestamp Echo can be carried on any packet (5.8 and 13.3). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:51 -08:00
Gerrit Renker	8109616e2e	[DCCP]: Add (missing) option parsing to request_sock processing This adds option-parsing code to processing of Acks in the listening state on request_socks on the server, serving two purposes (i) resolves a FIXME (removed); (ii) paves the way for feature-negotiation during connection-setup. There is an intended subtlety here with regard to dccp_check_req: Parsing options happens only after testing whether the received packet is a retransmitted Request. Otherwise, if the Request contained (a possibly large number of) feature-negotiation options, recomputing state would have to happen each time a retransmitted Request arrives, which opens the door to an easy DoS attack. Since in a genuine retransmission the options should not be different from the original, reusing the already computed state seems better. The other point is - if there are timestamp options on the Request, they will not be answered; which means that in the presence of retransmission (likely due to loss and/or other problems), the use of Request/Response RTT sampling is suspended, so that startup problems here do not propagate. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:50 -08:00
Gerrit Renker	8b81941248	[DCCP]: Allow to parse options on Request Sockets The option parsing code currently only parses on full sk's. This causes a problem for options sent during the initial handshake (in particular timestamps and feature-negotiation options). Therefore, this patch extends the option parsing code with an additional argument for request_socks: if it is non-NULL, options are parsed on the request socket, otherwise the normal path (parsing on the sk) is used. Subsequent patches, which implement feature negotiation during connection setup, make use of this facility. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:50 -08:00
Gerrit Renker	7913350663	[DCCP]: Collapse repeated `len' statements into one This replaces 4 individual assignments for `len' with a single one, placed where the control flow of those 4 leads to. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:49 -08:00
Gerrit Renker	b8599d2070	[DCCP]: Support for server holding timewait state This adds a socket option and signalling support for the case where the server holds timewait state on closing the connection, as described in RFC 4340, 8.3. Since holding timewait state at the server is the non-usual case, it is enabled via a socket option. Documentation for this socket option has been added. The setsockopt statement has been made resilient against different possible cases of expressing boolean `true' values using a suggestion by Ian McDonald. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:48 -08:00
Gerrit Renker	28be544004	[DCCP]: Use maximum-RTO backoff from DCCP spec This removes another Fixme, using the TCP maximum RTO rather than the value specified by the DCCP specification. Across the sections in RFC 4340, 64 seconds is consistently suggested as maximum RTO backoff value; and this is the value which is now used. I have checked both termination cases for retransmissions of Close/CloseReq: with the default value 15 of `retries2', and an initial icsk_retransmit = 0, it takes about 614 seconds to declare a non-responding peer as dead, after which the final terminating Reset is sent. With the TCP maximum RTO value of 120 seconds it takes (as might be expected) almost twice as long, about 23 minutes. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:47 -08:00
Gerrit Renker	92d31920b8	[DCCP]: Shift the retransmit timer for active-close into output.c When performing active close, RFC 4340, 8.3. requires to retransmit the Close/CloseReq with a backoff-retransmit timer starting at intially 2 RTTs. This patch shifts the existing code for active-close retransmit timer into output.c, so that the retransmit timer is started when the first Close/CloseReq is sent. Previously, the timer was started when, after releasing the socket in dccp_close(), the actively-closing side had not yet reached the CLOSED/TIMEWAIT state. The patch further reduces the initial timeout from 3 seconds to the required 2 RTTs, where - in absence of a known RTT - the fallback value specified in RFC 4340, 3.4 is used. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:47 -08:00
Gerrit Renker	69567d0b63	[DCCP]: Perform SHUT_RD and SHUT_WR on receiving close This patch performs two changes: 1) Close the write-end in addition to the read-end when a fin-like segment (Close or CloseReq) is received by DCCP. This accounts for the fact that DCCP, in contrast to TCP, does not have a half-close. RFC 4340 says in this respect that when a fin-like segment has been sent there is no guarantee at all that any further data will be processed. Thus this patch performs SHUT_WR in addition to the SHUT_RD when a fin-like segment is encountered. 2) Minor change: I noted that code appears twice in different places and think it makes sense to put this into a self-contained function (dccp_enqueue()). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:45 -08:00
Herbert Xu	bb72845e69	[IPSEC]: Make callers of xfrm_lookup to use XFRM_LOOKUP_WAIT This patch converts all callers of xfrm_lookup that used an explicit value of 1 to indiciate blocking to use the new flag XFRM_LOOKUP_WAIT. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:42 -08:00
Gerrit Renker	3f71c81ac3	[TFRC]: Remove previous loss intervals implementation Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:20 -08:00
Gerrit Renker	954c2db868	[CCID3]: Interface CCID3 code with newer Loss Intervals Database This hooks up the TFRC Loss Interval database with CCID 3 packet reception. In addition, it makes the CCID-specific computation of the first loss interval (which requires access to all the guts of CCID3) local to ccid3.c. The patch also fixes an omission in the DCCP code, that of a default / fallback RTT value (defined in section 3.4 of RFC 4340 as 0.2 sec); while at it, the upper bound of 4 seconds for an RTT sample has been reduced to match the initial TCP RTO value of 3 seconds from[RFC 1122, 4.2.3.1]. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:20 -08:00
Gerrit Renker	de0d411cb8	[TFRC]: CCID3 (and CCID4) needs to access these inlines This moves two inlines back to packet_history.h: these are not private to packet_history.c, but are needed by CCID3/4 to detect whether a new loss is indicated, or whether a loss is already pending. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:19 -08:00
Gerrit Renker	db64196038	[CCID3]: Redundant debugging output / documentation Each time feedback is sent two lines are printed: ccid3_hc_rx_send_feedback: client ... - entry ccid3_hc_rx_send_feedback: Interval ...usec, X_recv=..., 1/p=... The first line is redundant and thus removed. Further, documentation of ccid3_hc_rx_sock (capitalisation) is made consistent. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:18 -08:00
Gerrit Renker	8a9c7e92e0	[TFRC]: Ringbuffer to track loss interval history A ringbuffer-based implementation of loss interval history is easier to maintain, allocate, and update. The `swap' routine to keep the RX history sorted is due to and was written by Arnaldo Carvalho de Melo, simplifying an earlier macro-based variant. Details: * access to the Loss Interval Records via macro wrappers (with safety checks); * simplified, on-demand allocation of entries (no extra memory consumption on lossless links); cache allocation is local to the module / exported as service; * provision of RFC-compliant algorithm to re-compute average loss interval; * provision of comprehensive, new loss detection algorithm - support for all cases of loss, including re-ordered/duplicate packets; - waiting for NDUPACK=3 packets to fill the hole; - updating loss records when a late-arriving packet fills a hole. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:18 -08:00
Gerrit Renker	8995a238ef	[TFRC]: Loss interval code needs the macros/inlines that were moved This moves the inlines (which were previously declared as macros) back into packet_history.h since the loss detection code needs to be able to read entries from the RX history in order to create the relevant loss entries: it needs at least tfrc_rx_hist_loss_prev() and tfrc_rx_hist_last_rcv(), which in turn require the definition of the other inlines (macros). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:16 -08:00
Gerrit Renker	df8f83fdd6	[TFRC]: Put RX/TX initialisation into tfrc.c This separates RX/TX initialisation and puts all packet history / loss intervals initialisation into tfrc.c. The organisation is uniform: slab declaration -> {rx,tx}_init() -> {rx,tx}_exit() Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:15 -08:00
Gerrit Renker	385ac2e3f2	[CCID3]: HC-receiver should not insert timestamps as HC-sender doesn't uses it Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:04 -08:00
Gerrit Renker	797eba424d	[TFRC]: The function tfrc_rx_hist_entry_delete() is not used anymore Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:03 -08:00
Gerrit Renker	78282d2af5	[TFRC]: Move comment. Moved up the comment "Receiver routines" above the first occurrence of RX history routines. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:03 -08:00
Arnaldo Carvalho de Melo	b84a2189c4	[TFRC]: New rx history code Credit here goes to Gerrit Renker, that provided the initial implementation for this new codebase. I modified it just to try to make it closer to the existing API, renaming some functions, add namespacing and fix one bug where the tfrc_rx_hist_alloc was not freeing the allocated ring entries on the error path. Original changeset comment from Gerrit: ----------- This provides a new, self-contained and generic RX history service for TFRC based protocols. Details: * new data structure, initialisation and cleanup routines; * allocation of dccp_rx_hist entries local to packet_history.c, as a service exported by the dccp_tfrc_lib module. * interface to automatically track highest-received seqno; * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1); * a generic function to test for `data packets' as per RFC 4340, sec. 7.7. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:43 -08:00
Gerrit Renker	30a0eacd47	[CCID3]: The receiver of a half-connection does not set window counter values Only the sender sets window counters [RFC 4342, sections 5 and 8.1]. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:43 -08:00
Arnaldo Carvalho de Melo	d58d1af03a	[TFRC]: Rename dccp_rx_ to tfrc_rx_ This is in preparation for merging the new rx history code written by Gerrit Renker. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:42 -08:00
Arnaldo Carvalho de Melo	34a9e7ea91	[TFRC]: Make the rx history slab be global This is in preparation for merging the new rx history code written by Gerrit Renker. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:41 -08:00
Arnaldo Carvalho de Melo	e9c8b24a6a	[TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:40 -08:00
Gerrit Renker	2180c41ca5	[DCCP]: Introduce generic function to test for `data packets' as per RFC 4340, sec. 7.7. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:40 -08:00
Gerrit Renker	c40616c597	[TFRC]: Provide central source file and debug facility This patch changes the tfrc_lib module in the following manner: (1) a dedicated tfrc source file to call the packet history & loss interval init/exit functions. (2) a dedicated tfrc_pr_debug macro with toggle switch `tfrc_debug'. Commiter note: renamed tfrc_module.c to tfrc.c, and made CONFIG_IP_DCCP_CCID3 select IP_DCCP_TFRC_LIB. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:39 -08:00

1 2 3 4 5 ...

623 Commits