Commit Graph

6953 Commits

Author SHA1 Message Date
YOSHIFUJI Hideaki
8d614434ab [SUNRPC]: Use htonl() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:05 -08:00
YOSHIFUJI Hideaki
ae445d172a [RXRPC]: Use cpu_to_be32() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:04 -08:00
YOSHIFUJI Hideaki
f831e90971 [MAC80211]: Use htons() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:04 -08:00
YOSHIFUJI Hideaki
6d82de9e57 [IRDA]: Use htons() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:03 -08:00
YOSHIFUJI Hideaki
5661df7b6c [IPVS]: Use htons() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:02 -08:00
YOSHIFUJI Hideaki
4e39430ae3 [IEEE80211]: Use htons() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:02 -08:00
YOSHIFUJI Hideaki
b98999dc38 [DECNET]: Use htons() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:01 -08:00
YOSHIFUJI Hideaki
91c5ec3ed1 [BRIDGE]: Use cpu_to_be16() where appropriate.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:58:00 -08:00
Gerrit Renker
6179983ad3 [DCCP]: Introducing CCMPS
This introduces a CCMPS field for setting a CCID-specific upper bound on the application payload
size, as is defined in RFC 4340, section 14.

Only the TX CCID is considered in setting this limit, since the RX CCID generates comparatively
small (DCCP-Ack) feedback packets. The CCMPS field includes network and transport layer header
lengths. The only current CCMPS customer is CCID4 (via RFC 4828).

A wrapper is used to allow querying the CCMPS even at times where the CCID modules may not have
been fully negotiated yet.

In dccp_sync_mss() the variable `mss_now' has been renamed into `cur_mps', to reflect that we are
dealing with an MPS, but not an MSS.
Since the DCCP code closely follows the TCP code, the identifiers `dccp_sync_mss' and
`dccps_mss_cache' have been kept, as they have direct TCP counterparts.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:59 -08:00
Gerrit Renker
84a97b0af8 [CCID]: More informative registration
The patch makes the registration messages of CCID 2/3 a bit more
informative: instead of repeating the CCID number as currently done,

        "CCID: Registered CCID 2 (ccid2)"  or
        "CCID: Registered CCID 3 (ccid3)",

the descriptive names of the CCID's (from RFCs) are now used:

	"CCID: Registered CCID 2 (TCP-like)" and
	"CCID: Registered CCID 3 (TCP-Friendly Rate Control)".

To allow spaces in the name, the slab name string has been changed to
refer to the numeric CCID identifier, using the same format as before.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:58 -08:00
Gerrit Renker
9cb2345a8c [DCCP]: Documentation for CCID operations
This adds documentation for the ccid_operations structure.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:58 -08:00
Denis V. Lunev
f5026fabda [IPV4]: Thresholds in fib_trie.c are used as consts, so make them const.
There are several thresholds for trie fib hash management. They are used
in the code as a constants. Make them constants from the compiler point of
view.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:57 -08:00
Michal Schmidt
8a4a50f98b [IPV6] sit: Rebinding of SIT tunnels to other interfaces
This is similar to the change already done for IPIP tunnels.

Once created, a SIT tunnel can't be bound to another device.
To reproduce:

# create a tunnel:
ip tunnel add tunneltest0 mode sit remote 10.0.0.1 dev eth0
# try to change the bounding device from eth0 to eth1:
ip tunnel change tunneltest0 dev eth1
# show the result:
ip tunnel show tunneltest0

tunneltest0: ipv6/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit

Notice the bound device has not changed from eth0 to eth1.

This patch fixes it. When changing the binding, it also recalculates the
MTU according to the new bound device's MTU.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:56 -08:00
Michal Schmidt
ee34c1eb35 [IP_GRE]: Rebinding of GRE tunnels to other interfaces
This is similar to the change already done for IPIP tunnels.

Once created, a GRE tunnel can't be bound to another device.
To reproduce:

# create a tunnel:
ip tunnel add tunneltest0 mode gre remote 10.0.0.1 dev eth0
# try to change the bounding device from eth0 to eth1:
ip tunnel change tunneltest0 dev eth1
# show the result:
ip tunnel show tunneltest0

tunneltest0: gre/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit

Notice the bound device has not changed from eth0 to eth1.

This patch fixes it. When changing the binding, it also recalculates the
MTU according to the new bound device's MTU.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:56 -08:00
Denis V. Lunev
528c4ceb42 [IPV6]: Always pass a valid nl_info to inet6_rt_notify.
This makes the code in the inet6_rt_notify more straightforward and provides
groud for namespace passing.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:55 -08:00
Herbert Xu
aef2178599 [IPSEC]: Fix zero return value in xfrm_lookup on error
Further testing shows that my ICMP relookup patch can cause xfrm_lookup
to return zero on error which isn't very nice since it leads to the caller
dying on null pointer dereference.  The bug is due to not setting err
to ENOENT just before we leave xfrm_lookup in case of no policy.

This patch moves the err setting to where it should be.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:54 -08:00
Gerrit Renker
cf86314cb7 [DCCP]: Ignore feature negotiation on Data packets
This implements [RFC 4340, p. 32]: "any feature negotiation options received
on DCCP-Data packets MUST be ignored".

Also added a FIXME for further processing, since the code currently (wrongly)
classifies empty Confirm options as invalid - this needs to be resolved in
a separate patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:54 -08:00
Gerrit Renker
5cdae198de [DCCP]: Make code assumptions explicit
This removes several `XXX' references which indicate a missing support
for non-1-byte feature values: this is unnecessary, as all currently known
(standardised) SP feature values are 1-byte quantities.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:53 -08:00
Gerrit Renker
dd6303df09 [DCCP]: Remove unused and redundant validation functions
This removes two inlines which were both called in a single function only:

 1) dccp_feat_change() is always called with either DCCPO_CHANGE_L or DCCPO_CHANGE_R as argument
    * from dccp_set_socktopt_change() via do_dccp_setsockopt() with DCCP_SOCKOPT_CHANGE_R/L
    * from __dccp_feat_init() via dccp_feat_init() also with DCCP_SOCKOPT_CHANGE_R/L.

    Hence the dccp_feat_is_valid_type() is completely unnecessary and always returns true.

 2) Due to (1), the length test reduces to 'len >= 4', which in turn makes
    dccp_feat_is_valid_length() unnecessary.

Furthermore, the inline function dccp_feat_is_reserved() was unfolded,
since only called in a single place.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:52 -08:00
Gerrit Renker
af3b867e2f [DCCP]: Support inserting options during the 3-way handshake
This provides a separate routine to insert options during the initial handshake.
The main purpose is to conduct feature negotiation, for the moment the only user
is the timestamp echo needed for the (CCID3) handshake RTT sample.

Padding of options has been put into a small separate routine, to be shared among
the two functions. This could also be used as a generic routine to finish inserting
options.

Also removed an `XXX' comment since its content was obvious.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:52 -08:00
Gerrit Renker
b4d4f7c70f [DCCP]: Handle timestamps on Request/Response exchange separately
In DCCP, timestamps can occur on packets anytime, CCID3 uses a timestamp(/echo) on the Request/Response
exchange. This patch addresses the following situation:
	* timestamps are recorded on the listening socket;
	* Responses are sent from dccp_request_sockets;
	* suppose two connections reach the listening socket with very small time in between:
	* the first timestamp value gets overwritten by the second connection request.

This is not really good, so this patch separates timestamps into
 * those which are received by the server during the initial handshake (on dccp_request_sock);
 * those which are received by the client or the client after connection establishment.

As before, a timestamp of 0 is regarded as indicating that no (meaningful) timestamp has been
received (in addition, a warning message is printed if hosts send 0-valued timestamps).

The timestamp-echoing now works as follows:
 * when a timestamp is present on the initial Request, it is placed into dreq, due to the
   call to dccp_parse_options in dccp_v{4,6}_conn_request;
 * when a timestamp is present on the Ack leading from RESPOND => OPEN, it is copied over
   from the request_sock into the child cocket in dccp_create_openreq_child;
 * timestamps received on an (established) dccp_sock are treated as before.

Since Elapsed Time is measured in hundredths of milliseconds (13.2), the new dccp_timestamp()
function is used, as it is expected that the time between receiving the timestamp and
sending the timestamp echo will be very small against the wrap-around time. As a byproduct,
this allows smaller timestamping-time fields.

Furthermore, inserting the Timestamp Echo option has been taken out of the block starting with
'!dccp_packet_without_ack()', since Timestamp Echo can be carried on any packet (5.8 and 13.3).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:51 -08:00
Gerrit Renker
8109616e2e [DCCP]: Add (missing) option parsing to request_sock processing
This adds option-parsing code to processing of Acks in the listening state
on request_socks on the server, serving two purposes
 (i)  resolves a FIXME (removed);
 (ii) paves the way for feature-negotiation during connection-setup.

There is an intended subtlety here with regard to dccp_check_req:

 Parsing options happens only after testing whether the received packet is
 a retransmitted Request.  Otherwise, if the Request contained (a possibly
 large number of) feature-negotiation options, recomputing state would have to
 happen each time a retransmitted Request arrives, which opens the door to an
 easy DoS attack.  Since in a genuine retransmission the options should not be
 different from the original, reusing the already computed state seems better.

 The other point is - if there are timestamp options on the Request, they will
 not be answered; which means that in the presence of retransmission (likely
 due to loss and/or other problems), the use of Request/Response RTT sampling
 is suspended, so that startup problems here do not propagate.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:50 -08:00
Gerrit Renker
8b81941248 [DCCP]: Allow to parse options on Request Sockets
The option parsing code currently only parses on full sk's. This causes a problem for
options sent during the initial handshake (in particular timestamps and feature-negotiation
options). Therefore, this patch extends the option parsing code with an additional argument
for request_socks: if it is non-NULL, options are parsed on the request socket, otherwise
the normal path (parsing on the sk) is used.

Subsequent patches, which implement feature negotiation during connection setup, make use
of this facility.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:50 -08:00
Gerrit Renker
7913350663 [DCCP]: Collapse repeated `len' statements into one
This replaces 4 individual assignments for `len' with a single
one, placed where the control flow of those 4 leads to.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:49 -08:00
Gerrit Renker
b8599d2070 [DCCP]: Support for server holding timewait state
This adds a socket option and signalling support for the case where the server
holds timewait state on closing the connection, as described in RFC 4340, 8.3.

Since holding timewait state at the server is the non-usual case, it is enabled
via a socket option. Documentation for this socket option has been added.

The setsockopt statement has been made resilient against different possible cases
of expressing boolean `true' values using a suggestion by Ian McDonald.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:48 -08:00
Gerrit Renker
28be544004 [DCCP]: Use maximum-RTO backoff from DCCP spec
This removes another Fixme, using the TCP maximum RTO rather than the value
specified by the DCCP specification. Across the sections in RFC 4340, 64
seconds is consistently suggested as maximum RTO backoff value; and this is
the value which is now used.

I have checked both termination cases for retransmissions of Close/CloseReq:
with the default value 15 of `retries2', and an initial icsk_retransmit = 0,
it takes about 614 seconds to declare a non-responding peer as dead, after
which the final terminating Reset is sent. With the TCP maximum RTO value of
120 seconds it takes (as might be expected) almost twice as long, about 23
minutes.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:47 -08:00
Gerrit Renker
92d31920b8 [DCCP]: Shift the retransmit timer for active-close into output.c
When performing active close, RFC 4340, 8.3. requires to retransmit the
Close/CloseReq with a backoff-retransmit timer starting at intially 2 RTTs.

This patch shifts the existing code for active-close retransmit timer
into output.c, so that the retransmit timer is started when the first
Close/CloseReq is sent. Previously, the timer was started when, after
releasing the socket in dccp_close(), the actively-closing side had not yet
reached the CLOSED/TIMEWAIT state.

The patch further reduces the initial timeout from 3 seconds to the required
2 RTTs, where - in absence of a known RTT - the fallback value specified in
RFC 4340, 3.4 is used.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:47 -08:00
Daniel Lezcano
09f7709f49 [IPV6]: fix section mismatch warnings
Removed useless and buggy __exit section in the different
ipv6 subsystems. Otherwise they will be called inside an
init section during rollbacking in case of an error in the
protocol initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:46 -08:00
Gerrit Renker
69567d0b63 [DCCP]: Perform SHUT_RD and SHUT_WR on receiving close
This patch performs two changes:

1) Close the write-end in addition to the read-end when a fin-like segment
  (Close or CloseReq) is received by DCCP. This accounts for the fact that DCCP,
  in contrast to TCP, does not have a half-close. RFC 4340 says in this respect
  that when a fin-like segment has been sent there is no guarantee at all that
  any   further data will be processed.
  Thus this patch performs SHUT_WR in addition to the SHUT_RD when a fin-like
  segment is encountered.

2) Minor change: I noted that code appears twice in different places and think it
   makes sense to put this into a self-contained function (dccp_enqueue()).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:45 -08:00
Herbert Xu
96eba69dba [DECNET]: Fix inverted wait flag in xfrm_lookup call
My previous patch made the wait flag take the opposite value to what
it should be.  This patch fixes that.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:44 -08:00
Herbert Xu
a66207121f [NET]: Check RTNL status in unregister_netdevice
The caller must hold the RTNL so let's check it in unregister_netdevice.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:43 -08:00
Herbert Xu
aebcf82c1f [IPSEC]: Do not let packets pass when ICMP flag is off
This fixes a logical error in ICMP policy checks which lets
packets through if the state ICMP flag is off.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:43 -08:00
Herbert Xu
bb72845e69 [IPSEC]: Make callers of xfrm_lookup to use XFRM_LOOKUP_WAIT
This patch converts all callers of xfrm_lookup that used an
explicit value of 1 to indiciate blocking to use the new flag
XFRM_LOOKUP_WAIT.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:42 -08:00
Herbert Xu
7233b9f33e [IPSEC]: Fix reversed ICMP6 policy check
The policy check I added for ICMP on IPv6 is reversed.  This
patch fixes that.

It also adds an skb->sp check so that unprotected packets that
fail the policy check do not crash the machine.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:41 -08:00
Michal Schmidt
5533995b62 [IPIP]: Allow rebinding the tunnel to another interface
Once created, an IP tunnel can't be bound to another device.
(reported as https://bugzilla.redhat.com/show_bug.cgi?id=419671)

To reproduce:

# create a tunnel:
ip tunnel add tunneltest0 mode ipip remote 10.0.0.1 dev eth0
# try to change the bounding device from eth0 to eth1:
ip tunnel change tunneltest0 dev eth1
# show the result:
ip tunnel show tunneltest0

tunneltest0: ip/ip  remote 10.0.0.1  local any  dev eth0  ttl inherit

Notice the bound device has not changed from eth0 to eth1.

This patch fixes it. When changing the binding, it also recalculates the
MTU according to the new bound device's MTU.

If the change is acceptable, I'll do the same for GRE and SIT tunnels.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:25 -08:00
Denis V. Lunev
81103a52f2 [NETNS]: network namespace was passed into dev_getbyhwaddr but not used
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:24 -08:00
Herbert Xu
8b7817f3a9 [IPSEC]: Add ICMP host relookup support
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch implements this
for ICMP traffic that originates from or terminates on localhost.

This is activated on outbound with the new policy flag XFRM_POLICY_ICMP,
and on inbound by the new state flag XFRM_STATE_ICMP.

On inbound the policy check is now performed by the ICMP protocol so
that it can repeat the policy check where necessary.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:23 -08:00
Herbert Xu
d5422efe68 [IPSEC]: Added xfrm_decode_session_reverse and xfrmX_policy_check_reverse
RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch adds the functions
xfrm_decode_session_reverse and xfrmX_policy_check_reverse so we can get
the reverse flow to perform such a lookup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:22 -08:00
Herbert Xu
815f4e57e9 [IPSEC]: Make xfrm_lookup flags argument a bit-field
This patch introduces an enum for bits in the flags argument of xfrm_lookup.
This is so that we can cram more information into it later.

Since all current users use just the values 0 and 1, XFRM_LOOKUP_WAIT has
been added with the value 1 << 0 to represent the current meaning of flags.

The test in __xfrm_lookup has been changed accordingly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:21 -08:00
Gerrit Renker
3f71c81ac3 [TFRC]: Remove previous loss intervals implementation
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:20 -08:00
Gerrit Renker
954c2db868 [CCID3]: Interface CCID3 code with newer Loss Intervals Database
This hooks up the TFRC Loss Interval database with CCID 3 packet reception.
In addition, it makes the CCID-specific computation of the first loss
interval (which requires access to all the guts of CCID3) local to ccid3.c.

The patch also fixes an omission in the DCCP code, that of a default /
fallback RTT value (defined in section 3.4 of RFC 4340 as 0.2 sec); while
at it, the  upper bound of 4 seconds for an RTT sample has  been reduced to
match the initial TCP RTO value of 3 seconds from[RFC 1122, 4.2.3.1].

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:20 -08:00
Gerrit Renker
de0d411cb8 [TFRC]: CCID3 (and CCID4) needs to access these inlines
This moves two inlines back to packet_history.h: these are not private
to packet_history.c, but are needed by CCID3/4 to detect whether a new
loss is indicated, or whether a loss is already pending.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:19 -08:00
Gerrit Renker
db64196038 [CCID3]: Redundant debugging output / documentation
Each time feedback is sent two lines are printed:

	ccid3_hc_rx_send_feedback: client ... - entry
	ccid3_hc_rx_send_feedback: Interval ...usec, X_recv=..., 1/p=...

The first line is redundant and thus removed.

Further, documentation of ccid3_hc_rx_sock (capitalisation) is made consistent.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:18 -08:00
Gerrit Renker
8a9c7e92e0 [TFRC]: Ringbuffer to track loss interval history
A ringbuffer-based implementation of loss interval history is easier to
maintain, allocate, and update.

The `swap' routine to keep the RX history sorted is due to and was written
by Arnaldo Carvalho de Melo, simplifying an earlier macro-based variant.

Details:
 * access to the Loss Interval Records via macro wrappers (with safety checks);
 * simplified, on-demand allocation of entries (no extra memory consumption on
   lossless links); cache allocation is local to the module / exported as service;
 * provision of RFC-compliant algorithm to re-compute average loss interval;
 * provision of comprehensive, new loss detection algorithm
 	- support for all cases of loss, including re-ordered/duplicate packets;
 	- waiting for NDUPACK=3 packets to fill the hole;
	- updating loss records when a late-arriving packet fills a hole.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:18 -08:00
Gerrit Renker
8995a238ef [TFRC]: Loss interval code needs the macros/inlines that were moved
This moves the inlines (which were previously declared as macros) back into
packet_history.h since the loss detection code needs to be able to read entries
from the RX history in order to create the relevant loss entries: it needs at
least tfrc_rx_hist_loss_prev() and tfrc_rx_hist_last_rcv(), which in turn
require the definition of the other inlines (macros).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:16 -08:00
Gerrit Renker
df8f83fdd6 [TFRC]: Put RX/TX initialisation into tfrc.c
This separates RX/TX initialisation and puts all packet history / loss intervals
initialisation into tfrc.c.
The organisation is uniform: slab declaration -> {rx,tx}_init() -> {rx,tx}_exit()

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:15 -08:00
Denis V. Lunev
2aaef4e47f [NETNS]: separate af_packet netns data
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:15 -08:00
Denis V. Lunev
a0a53c8ba9 [NETNS]: struct net content re-work (v3)
Recently David Miller and Herbert Xu pointed out that struct net becomes
overbloated and un-maintainable. There are two solutions:
- provide a pointer to a network subsystem definition from struct net.
  This costs an additional dereferrence
- place sub-system definition into the structure itself. This will speedup
  run-time access at the cost of recompilation time

The second approach looks better for us. Other sub-systems will follow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:14 -08:00
Daniel Lezcano
7f4e4868f3 [IPV6]: make the protocol initialization to return an error code
This patchset makes the different protocols to return an error code, so
the af_inet6 module can check the initialization was correct or not.

The raw6 was taken into account to be consistent with the rest of the
protocols, but the registration is at the same place.
Because the raw6 has its own init function, the proto and the ops structure
can be moved inside the raw6.c file.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:13 -08:00
Daniel Lezcano
87c3efbfdd [IPV6]: make inet6_register_protosw to return an error code
This patch makes the inet6_register_protosw to return an error code.
The different protocols can be aware the registration was successful or
not and can pass the error to the initial caller, af_inet6.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:12 -08:00
Daniel Lezcano
853cbbaaa4 [IPV6]: make frag to return an error at initialization
This patch makes the frag_init to return an error code, so the af_inet6
module can handle the error.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:11 -08:00
Daniel Lezcano
248b238dc9 [IPV6]: make extended headers to return an error at initialization
This patch factorize the code for the differents init functions for rthdr,
nodata, destopt in a single function exthdrs_init.
This function returns an error so the af_inet6 module can check correctly
the initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:10 -08:00
Daniel Lezcano
0a3e78ac2c [IPV6]: make flowlabel to return an error
This patch makes the flowlab subsystem to return an error code and makes
some cleanup with procfs ifdefs.
The af_inet6 will use the flowlabel init return code to check the initialization
was correct.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:10 -08:00
Pavel Emelyanov
51602b2a5e [IPV4]: Cleanup sysctl manipulations in devinet.c
This includes:

 * moving neigh_sysctl_(un)register calls inside
   devinet_sysctl_(un)register ones, as they are always
   called in pairs;
 * making __devinet_sysctl_unregister() to unregister
   the ipv4_devconf struct, while original devinet_sysctl_unregister()
   works with the in_device to handle both - devconf and
   neigh sysctls;
 * make stubs for CONFIG_SYSCTL=n case to get rid of
   in-code ifdefs.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:09 -08:00
Pavel Emelyanov
95c9382a34 [INET]: Use BUILD_BUG_ON in inet_timewait_sock.c checks
Make the INET_TWDR_TWKILL_SLOTS vs sizeof(twdr->thread_slots)
check nicer.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:08 -08:00
Pavel Emelyanov
1f9e636ea2 [TCP]: Use BUILD_BUG_ON for tcp_skb_cb size checking
The sizeof(struct tcp_skb_cb) should not be less than the
sizeof(skb->cb). This is checked in net/ipv4/tcp.c, but
this check can be made more gracefully.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:07 -08:00
Eric Dumazet
ea72912c88 [NETLINK]: kzalloc() conversion
nl_pid_hash_alloc() is renamed to nl_pid_hash_zalloc().
It is now returning zeroed memory to its callers.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:06 -08:00
Eric Dumazet
64b7d96167 [NET]: dst_ifdown() cleanup
This cleanup shrinks size of net/core/dst.o on i386 from 1299 to 1289 bytes.
(This is because dev_hold()/dev_put() are doing atomic_inc()/atomic_dec() and
force compiler to re-evaluate memory contents.)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:05 -08:00
Herbert Xu
005011211f [IPSEC]: Add xfrm_input_state helper
This patch adds the xfrm_input_state helper function which returns the
current xfrm state being processed on the input path given an sk_buff.
This is currently only used by xfrm_input but will be used by ESP upon
asynchronous resumption.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:05 -08:00
Gerrit Renker
385ac2e3f2 [CCID3]: HC-receiver should not insert timestamps as HC-sender doesn't uses it
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:04 -08:00
Gerrit Renker
797eba424d [TFRC]: The function tfrc_rx_hist_entry_delete() is not used anymore
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:03 -08:00
Gerrit Renker
78282d2af5 [TFRC]: Move comment.
Moved up the comment "Receiver routines" above the first occurrence of
RX history routines.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:03 -08:00
YOSHIFUJI Hideaki
c69bce20dd [NET]: Remove unused "mibalign" argument for snmp_mib_init().
With fixes from Arnaldo Carvalho de Melo.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:02 -08:00
Denis V. Lunev
971b893e79 [IPV4]: last default route is a fib table property
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:01 -08:00
Denis V. Lunev
a2bbe6822f [IPV4]: Unify assignment of fi to fib_result
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:01 -08:00
Denis V. Lunev
c17860a039 [IPV4]: no need pass pointer to a default into fib_detect_death
ipv4: no need pass pointer to a default into fib_detect_death

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:57:00 -08:00
Daniel Lezcano
7e5449c215 [IPV6]: route6 remove ifdef for fib_rules
The patch defines the usual static inline functions when the code is
disabled for fib6_rules. That's allow to remove some ifdef in route.c
file and make the code a little more clear.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:59 -08:00
Daniel Lezcano
c35b7e72cd [IPV6]: remove ifdef in route6 for xfrm6
The following patch create the usual static inline functions to disable
the xfrm6_init and xfrm6_fini function when XFRM is off.
That's allow to remove some ifdef and make the code a little more clear.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:59 -08:00
Daniel Lezcano
75314fb383 [IPV6]: create route6 proc init-fini functions
Make the proc creation/destruction to be a separate function. That
allows to remove the #ifdef CONFIG_PROC_FS in the init/fini function
and make them more readable.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:58 -08:00
Pavel Emelyanov
b8e1f9b5c3 [NET] sysctl: make sysctl_somaxconn per-namespace
Just move the variable on the struct net and adjust
its usage.

Others sysctls from sys.net.core table are more
difficult to virtualize (i.e. make them per-namespace),
but I'll look at them as well a bit later.

Signed-off-by: Pavel Emelyanov <xemul@oenvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:57 -08:00
Pavel Emelyanov
790a353289 [NET] sysctl: prepare core tables to point to netns variables
Some of ctl variables are going to be on the struct
net. Here's the way to adjust the ->data pointer on the
ctl_table-s to point on the right variable.

Since some pointers still point on the global variables,
I keep turning the write bits off on such tables.

This looks to become a common procedure for net sysctls,
so later parts of this code may migrate to some more
generic place.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:56 -08:00
Pavel Emelyanov
024626e36d [NET] sysctl: make the sys.net.core sysctls per-namespace
Making them per-namespace is required for the following
two reasons:

 First, some ctl values have a per-namespace meaning.
 Second, making them writable from the sub-namespace
 is an isolation hole.

So I introduce the pernet operations to create these
tables. For init_net I use the existing statically
declared tables, for sub-namespace they are duplicated
and the write bits are removed from the mode.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:56 -08:00
Denis Cheng
b5e78337b5 [IUCV]: use LIST_HEAD instead of LIST_HEAD_INIT
these three list_head are all local variables, but can also use
LIST_HEAD.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:54 -08:00
Denis Cheng
df01812eba [XFRM] net/xfrm/xfrm_state.c: use LIST_HEAD instead of LIST_HEAD_INIT
single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:54 -08:00
Denis Cheng
0e3cf7e916 [X25]: use LIST_HEAD instead of LIST_HEAD_INIT
single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:53 -08:00
Denis Cheng
14d0e7b74e [LAPB] net/lapb/lapb_iface.c: use LIST_HEAD instead of LIST_HEAD_INIT
single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:52 -08:00
Denis Cheng
1596c97aa8 [IPV4] net/ipv4/cipso_ipv4.c: use LIST_HEAD instead of LIST_HEAD_INIT
single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:52 -08:00
Denis Cheng
3b5b34fd2b [NET] net/core/dev.c: use LIST_HEAD instead of LIST_HEAD_INIT
single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:51 -08:00
Eric W. Biederman
877a9bff38 [IPV4]: Move trie_local and trie_main into the proc iterator.
We only use these variables when displaying the trie in proc so
place them into the iterator to make this explicit.  We should
probably do something smarter to handle the CONFIG_IP_MULTIPLE_TABLES
case but at least this makes it clear that the silliness is limited
to the display in /proc.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:49 -08:00
Eric W. Biederman
bb80317586 [IPV4]: Remove ip_fib_local_table and ip_fib_main_table defines.
There are only 2 users and it doesn't hurt to call fib_get_table
instead, and it makes it easier to make the fib network namespace
aware.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:49 -08:00
Daniel Lezcano
f845ab6b7d [IPV6] route6/fib6: Don't panic a kmem_cache_create.
If the kmem_cache_creation fails, the kernel will panic. It is
acceptable if the system is booting, but if the ipv6 protocol is
compiled as a module and it is loaded after the system has booted, do
we want to panic instead of just failing to initialize the protocol ?

The init function is now returning an error and this one is checked
for protocol initialization. So the ipv6 protocol will safely fails.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:48 -08:00
Daniel Lezcano
e2fddf5e96 [IPV6]: Make af_inet6 to check ip6_route_init return value.
The af_inet6 initialization function does not check the return code of
the route initilization, so if something goes wrong, the protocol
initialization will continue anyway.  This patch takes into account
the modification made in the different route's initialization
subroutines to check the return value and to make the protocol
initialization to fail.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:47 -08:00
Daniel Lezcano
433d49c3bb [IPV6]: Make ip6_route_init to return an error code.
The route initialization function does not return any value to notify
if the initialization is successful or not. This patch checks all
calls made for the initilization in order to return a value for the
caller.

Unfortunately, proc_net_fops_create will return a NULL pointer if
CONFIG_PROC_FS is off, so we can not check the return code without an
ifdef CONFIG_PROC_FS block in the ip6_route_init function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:47 -08:00
Daniel Lezcano
9eb87f3f7e [IPV6]: Make fib6_rules_init to return an error code.
When the fib_rules initialization finished, no return code is provided
so there is no way to know, for the caller, if the initialization has
been successful or has failed. This patch fix that.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:46 -08:00
Daniel Lezcano
0013cabab3 [IPV6]: Make xfrm6_init to return an error code.
The xfrm initialization function does not return any error code, so if
there is an error, the caller can not be advise of that.  This patch
checks the return code of the different called functions in order to
return a successful or failed initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:45 -08:00
Daniel Lezcano
d63bddbe90 [IPV6]: Make fib6_init to return an error code.
If there is an error in the initialization function, nothing is
followed up to the caller. So I add a return value to be set for the
init function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:45 -08:00
Denis V. Lunev
5a3e55d68e [NET]: Multiple namespaces in the all dst_ifdown routines.
Move dst entries to a namespace loopback to catch refcounting leaks.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:44 -08:00
Arnaldo Carvalho de Melo
b84a2189c4 [TFRC]: New rx history code
Credit here goes to Gerrit Renker, that provided the initial implementation for
this new codebase.

I modified it just to try to make it closer to the existing API, renaming some
functions, add namespacing and fix one bug where the tfrc_rx_hist_alloc was not
freeing the allocated ring entries on the error path.

Original changeset comment from Gerrit:
      -----------
This provides a new, self-contained and generic RX history service for TFRC
based protocols.

Details:
 * new data structure, initialisation and cleanup routines;
 * allocation of dccp_rx_hist entries local to packet_history.c,
   as a service exported by the dccp_tfrc_lib module.
 * interface to automatically track highest-received seqno;
 * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
 * a generic function to test for `data packets' as per  RFC 4340, sec. 7.7.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:43 -08:00
Gerrit Renker
30a0eacd47 [CCID3]: The receiver of a half-connection does not set window counter values
Only the sender sets window counters [RFC 4342, sections 5 and 8.1].

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:43 -08:00
Arnaldo Carvalho de Melo
d58d1af03a [TFRC]: Rename dccp_rx_ to tfrc_rx_
This is in preparation for merging the new rx history code written by Gerrit Renker.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:42 -08:00
Arnaldo Carvalho de Melo
34a9e7ea91 [TFRC]: Make the rx history slab be global
This is in preparation for merging the new rx history code written by Gerrit Renker.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:41 -08:00
Arnaldo Carvalho de Melo
e9c8b24a6a [TFRC]: Rename tfrc_tx_hist to tfrc_tx_hist_slab, for consistency
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:40 -08:00
Gerrit Renker
2180c41ca5 [DCCP]: Introduce generic function to test for `data packets'
as per  RFC 4340, sec. 7.7.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:40 -08:00
Gerrit Renker
c40616c597 [TFRC]: Provide central source file and debug facility
This patch changes the tfrc_lib module in the following manner:

 (1) a dedicated tfrc source file to call the packet history &
     loss interval init/exit functions.
 (2) a dedicated tfrc_pr_debug macro with toggle switch `tfrc_debug'.

Commiter note: renamed tfrc_module.c to tfrc.c, and made CONFIG_IP_DCCP_CCID3
select IP_DCCP_TFRC_LIB.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:39 -08:00
Pavel Emelyanov
f8b33fdfaf [ARP]: Consolidate some code in arp_req_set/delete_publc
The PROXY_ARP is set on devconfigs in a similar way in
both calls.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:38 -08:00
Pavel Emelyanov
46479b4329 [ARP]: Minus one level of ndentation in arp_req_delete
The same cleanup for deletion requests.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:38 -08:00
Pavel Emelyanov
43dc170117 [ARP]: Minus one level of indentation in arp_req_set
The ATF_PUBL requests are handled completely separate from
the others. Emphasize it with a separate function. This also
reduces the indentation level.

The same issue exists with the arp_delete_request, but
when I tried to make it in one patch diff produced completely
unreadable patch. So I split it into two, but they may be
done with one commit.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:37 -08:00
Pavel Emelyanov
1ff1cc202e [IPV4] ROUTE: Convert rt_hash_lock_init() macro into function
There's no need in having this function exist in a form
of macro. Properly formatted function looks much better.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:36 -08:00
Pavel Emelyanov
107f163428 [IPV4] ROUTE: Clean up proc files creation.
The rt_cache, stats/rt_cache and rt_acct(optional) files
creation looks a bit messy. Clean this out and join them
to other proc-related functions under the proper ifdef.

The struct net * argument in a new function will help net
namespaces patches look nicer.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:36 -08:00
Pavel Emelyanov
78c686e9fa [IPV4] ROUTE: Collect proc-related functions together
The net/ipv4/route.c file declares some entries for proc
to dump some routing info. The reading functions are
scattered over this file - collect them together.

Besides, remove a useless IP_RT_ACCT_CPU macro.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:35 -08:00
Herbert Xu
a59322be07 [UDP]: Only increment counter on first peek/recv
The previous move of the the UDP inDatagrams counter caused each
peek of the same packet to be counted separately.  This may be
undesirable.

This patch fixes this by adding a bit to sk_buff to record whether
this packet has already been seen through skb_recv_datagram.  We
then only increment the counter when the packet is seen for the
first time.

The only dodgy part is the fact that skb_recv_datagram doesn't have
a good way of returning this new bit of information.  So I've added
a new function __skb_recv_datagram that does return this and made
skb_recv_datagram a wrapper around it.

The plan is to eventually replace all uses of skb_recv_datagram with
this new function at which time it can be renamed its proper name.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:34 -08:00
Herbert Xu
1781f7f580 [UDP]: Restore missing inDatagrams increments
The previous move of the the UDP inDatagrams counter caused the
counting of encapsulated packets, SUNRPC data (as opposed to call)
packets and RXRPC packets to go missing.

This patch restores all of these.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:33 -08:00
Herbert Xu
27ab256864 [UDP]: Avoid repeated counting of checksum errors due to peeking
Currently it is possible for two processes to peek on the same socket
and end up incrementing the error counter twice for the same packet.

This patch fixes it by making skb_kill_datagram return whether it
succeeded in unlinking the packet and only incrementing the counter
if it did.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:32 -08:00
Pavel Emelyanov
c8fecf2242 [IPV6]: Eliminate difference in actions of sysctl and proc handler for conf.all.forwarding
The only difference in this case is that updating all.forwarding
causes the update in default.forwarding when done via proc, but
not via the system call.

Besides, this consolidates a good portion of code.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:32 -08:00
Pavel Emelyanov
68dd299bc8 [INET]: Merge sys.net.ipv4.ip_forward and sys.net.ipv4.conf.all.forwarding
AFAIS these two entries should do the same thing - change the
forwarding state on ipv4_devconf and on all the devices.

I propose to merge the handlers together using ctl paths.

The inet_forward_change() is static after this and I move
it higher to be closer to other "propagation" helpers and
to avoid diff making patches based on { and } matching :)
i.e. - make them easier to read.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:31 -08:00
Pavel Emelyanov
4d43b78ac2 [IPV6]: Use sysctl paths to register ipv6 sysctl tables
I have already done this for core, ipv4 and tr tables, so repeat this
for the ipv6 ones.

This makes the ipv6.ko smaller and creates the ground needed for net
namespaces support in ipv6.ko ssctls.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:30 -08:00
Pavel Emelyanov
4a61b586cd [IPV6]: Make the ipv6/sysctl_net_ipv6.c compilation cleaner
Since this file is entirely enclosed with the
#ifdef CONFIG_SYSCTL/#endif pair, it's OK to move this
CONFIG_ into a Makefile.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:29 -08:00
Pavel Emelyanov
08913681e4 [NET]: Remove the empty net_table
I have removed all the entries from this table (core_table,
ipv4_table and tr_table), so now we can safely drop it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:29 -08:00
Pavel Emelyanov
36f0bebd98 [TR]: Use ctl paths to register net/token-ring/ table
The same thing for token-ring - use ctl paths and get
rid of external references on the tr_table.

Unfortunately, I couldn't split this patch into cleanup and
use-the-paths parts.

As a lame excuse I can say, that the cleanup is just moving
the tr_table from one file to another - closet to a single
variable, that this ctl table tunes. Since the source  file
becomes empty after the move, I remove it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:28 -08:00
Pavel Emelyanov
3e37c3f997 [IPV4]: Use ctl paths to register net/ipv4/ table
This is the same as I did for the net/core/ table in the
second patch in his series: use the paths and isolate the
whole table in the .c file.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:27 -08:00
Pavel Emelyanov
9ba6397976 [IPV4]: Cleanup the sysctl_net_ipv4.c file
This includes several cleanups:

 * tune Makefile to compile out this file when SYSCTL=n. Now
   it looks like net/core/sysctl_net_core.c one;
 * move the ipv4_config to af_inet.c to exist all the time;
 * remove additional sysctl_ip_nonlocal_bind declaration
   (it is already declared in net/ip.h);
 * remove no nonger needed ifdefs from this file.

This is a preparation for using ctl paths for net/ipv4/
sysctl table.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:27 -08:00
Pavel Emelyanov
33eb9cfc70 [NET]: Isolate the net/core/ sysctl table
Using ctl paths we can put all the stuff, related to net/core/
sysctl table, into one file and remove all the references on it.

As a good side effect this hides the "core_table" name from
the global scope :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:26 -08:00
Pavel Emelyanov
7e2e109cef [NET]: Remove unneeded ifdefs from sysctl_net_core.c
This file is already compiled out when the SYSCTL=n, so
these ifdefs, that enclose the whole file, can be removed.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:25 -08:00
Patrick McHardy
2eeeba390a [NETFILTER]: Select CONFIG_NETFILTER_NETLINK when needed
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:25 -08:00
Patrick McHardy
ab4f58c77a [NETFILTER]: remove NF_CONNTRACK_ENABLED option
Remove the NF_CONNTRACK_ENABLED option. It was meant for a smoother upgrade
to nf_conntrack, people having reconfigured their kernel at least once since
ip_conntrack was removed will have the NF_CONNTRACK option already set.
People upgrading from older kernels have to reconfigure a lot anyway.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:24 -08:00
Patrick McHardy
4ad9d4fa94 [NETFILTER]: nfnetlink_queue: update copyright
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:23 -08:00
Patrick McHardy
0ef0f46580 [NETFILTER]: nfnetlink_queue: remove useless enqueue status codes
The queueing core doesn't care about the exact return value from
the queue handler, so there's no need to go through the trouble
of returning a meaningful value as long as we indicate an error.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:23 -08:00
Patrick McHardy
861934c7c8 [NETFILTER]: nfnetlink_queue: eliminate impossible switch case
We don't need a default case in nfqnl_build_packet_message(), the
copy_mode is validated when it is set. Tell the compiler about
the possible types and remove the default case. Saves 80b of
text on x86_64.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:22 -08:00
Patrick McHardy
ea3a66ff5a [NETFILTER]: nfnetlink_queue: use endianness-aware attribute functions
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:21 -08:00
Patrick McHardy
9d6023ab8b [NETFILTER]: nfnetlink_queue: mark hash table __read_mostly
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:21 -08:00
Patrick McHardy
2bd0119729 [NETFILTER]: nfnetlink_queue: remove useless debugging
Originally I wanted to just remove the QDEBUG macro and use pr_debug, but
none of the messages seems worth keeping.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:20 -08:00
Patrick McHardy
c5de0dfde8 [NETFILTER]: nfnetlink_queue: kill useless wrapper
nfqnl_set_mode takes the queue lock and calls __nfqnl_set_mode. Just move
the code from __nfqnl_set_mode to nfqnl_set_mode since there is no other
user.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:19 -08:00
Patrick McHardy
9872bec773 [NETFILTER]: nfnetlink: use RCU for queue instances hash
Use RCU for queue instances hash. Avoids multiple atomic operations
for each packet.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:18 -08:00
Patrick McHardy
a3c8e7fd4b [NETFILTER]: nfnetlink_queue: fix checks in nfqnl_recv_config
The peer_pid must be checked in all cases when a queue exists, currently
it is not checked if for NFQA_CFG_QUEUE_MAXLEN when a NFQA_CFG_CMD
attribute exists in some cases. Same for the queue existance check,
which can cause a NULL pointer dereference.

Also consistently return -ENODEV for "queue not found". -ENOENT would
be better, but that is already used to indicate a queued skb id doesn't
exist.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:18 -08:00
Patrick McHardy
e48b9b2fb3 [NETFILTER]: nfnetlink_queue: avoid unnecessary atomic operation
The sequence counter doesn't need to be an atomic_t, just move the increment
inside the locked section.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:17 -08:00
Patrick McHardy
f9c6399050 [NETFILTER]: remove annoying debugging message
Don't log "nf_hook: Verdict = QUEUE." message with NETFILTER_DEBUG=y.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:16 -08:00
Patrick McHardy
daaa8be2e0 [NETFILTER]: nf_queue: clean up error paths
Move duplicated error handling to end of function and add a helper function
to release the device and module references from the queue entry.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:16 -08:00
Patrick McHardy
4b3d15ef4a [NETFILTER]: {nfnetlink,ip,ip6}_queue: kill issue_verdict
Now that issue_verdict doesn't need to free the queue entries anymore,
all it does is disable local BHs and call nf_reinject. Move the BH
disabling to the okfn invocation in nf_reinject and kill the
issue_verdict functions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:15 -08:00
Patrick McHardy
02f014d888 [NETFILTER]: nf_queue: move list_head/skb/id to struct nf_info
Move common fields for queue management to struct nf_info and rename it
to struct nf_queue_entry. The avoids one allocation/free per packet and
simplifies the code a bit.

Alternatively we could add some private room at the tail, but since
all current users use identical structs this seems easier.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:14 -08:00
Patrick McHardy
7a6c6653b3 [NETFILTER]: ip6_queue: resync dev-index based flushing
Resync dev_cmp to take bridge devices into account.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:14 -08:00
Patrick McHardy
171b7fc4fc [NETFILTER]: ip6_queue: deobfuscate entry lookups
A queue entry lookup currently looks like this:

ipq_find_dequeue_entry -> __ipq_find_dequeue_entry ->
	__ipq_find_entry -> cmpfn -> id_cmp

Use simple open-coded list walking and kill the cmpfn for
ipq_find_dequeue_entry. Instead add it to ipq_flush (after
similar cleanups) and use ipq_flush for both complete flushes
and flushing entries related to a device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:13 -08:00
Patrick McHardy
9521409265 [NETFILTER]: ip_queue: deobfuscate entry lookups
A queue entry lookup currently looks like this:

ipq_find_dequeue_entry -> __ipq_find_dequeue_entry ->
	__ipq_find_entry -> cmpfn -> id_cmp

Use simple open-coded list walking and kill the cmpfn for
ipq_find_dequeue_entry. Instead add it to ipq_flush (after
similar cleanups) and use ipq_flush for both complete flushes
and flushing entries related to a device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:12 -08:00
Patrick McHardy
b43d8d8598 [NETFILTER]: nfnetlink_queue: deobfuscate entry lookups
A queue entry lookup currently looks like this:

find_dequeue_entry -> __find_dequeue_entry ->
	__find_entry -> cmpfn -> id_cmp

Use simple open-coded list walking and kill the cmpfn for
find_dequeue_entry. Instead add it to nfqnl_flush (after
similar cleanups) and use nfqnl_flush for both complete
flushes and flushing entries related to a device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:12 -08:00
Patrick McHardy
0ac41e8146 [NETFILTER]: {nf_netlink,ip,ip6}_queue: use list_for_each_entry
Use list_add_tail/list_for_each_entry instead of list_add and
list_for_each_prev as a preparation for switching to RCU.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:11 -08:00
Patrick McHardy
c01cd429fc [NETFILTER]: nf_queue: move queueing related functions/struct to seperate header
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:10 -08:00
Patrick McHardy
f9d8928f83 [NETFILTER]: nf_queue: remove unused data pointer
Remove the data pointer from struct nf_queue_handler. It has never been used
and is useless for the only handler that really matters, nfnetlink_queue,
since the handler is shared between all instances.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:10 -08:00
Patrick McHardy
e3ac529815 [NETFILTER]: nf_queue: make queue_handler const
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:09 -08:00
Patrick McHardy
fb46990dba [NETFILTER]: nf_queue: remove unnecessary hook existance check
We hold a module reference for each queued packet, so the hook that
queued the packet can't disappear. Also remove an obsolete  comment
stating the opposite.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:08 -08:00
Patrick McHardy
8b1cf0db2a [NETFILTER]: nf_queue: minor cleanup
Clean up

if (x) y;

constructs. We've got nothing to hide :)

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:07 -08:00
Patrick McHardy
1999414a4e [NETFILTER]: Mark hooks __read_mostly
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:07 -08:00
Patrick McHardy
41c5b31703 [NETFILTER]: Use nf_register_hooks for multiple registrations
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:06 -08:00
Patrick McHardy
279c2c74b6 [NETFILTER]: nf_conntrack_proto_icmp: kill extern declaration in .c file
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:05 -08:00
Patrick McHardy
1841a4c7ae [NETFILTER]: nf_ct_h323: remove ipv6 module dependency
nf_conntrack_h323 needs ip6_route_output for the call forwarding filter.
Add a ->route function to nf_afinfo and use that to avoid pulling in the
ipv6 module.

Fix the #ifdef for the IPv6 code while I'm at it - the IPv6 support is
only needed when IPv6 conntrack is enabled.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:05 -08:00
Patrick McHardy
193b23c5a0 [NETFILTER]: xt_hashlimit: remove ip6tables module dependency
Switch from ipv6_find_hdr to ipv6_skip_exthdr to avoid pulling in ip6_tables
and ipv6 when only using it for IPv4.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:04 -08:00
Maciej Soltysiak
17dfc93f6d [NETFILTER]: {ip,ip6}t_LOG: log GID
Log GID in addition to UID

Signed-off-by: Maciej Soltysiak <maciej.soltysiak@ae.poznan.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:03 -08:00
Patrick McHardy
50c164a81f [NETFILTER]: x_tables: add rateest match
Add rate estimator match. The rate estimator match can match on
estimated rates by the RATEEST target. It supports matching on
absolute bps/pps values, comparing two rate estimators and matching
on the difference between two rate estimators.

This is what I use to route outgoing data connections from a FTP
server over two lines based on the  available bandwidth:

# estimate outgoing rates
iptables -t mangle -A POSTROUTING -o eth0 -j RATEEST --rateest-name eth0 \
                                                     --rateest-interval 250ms \
                                                     --rateest-ewma 0.5s
iptables -t mangle -A POSTROUTING -o ppp0 -j RATEEST --rateest-name ppp0 \
                                                     --rateest-interval 250ms \
                                                     --rateest-ewma 0.5s

# mark based on available bandwidth
iptables -t mangle -A BALANCE -m state --state NEW \
                              -m helper --helper ftp \
                              -m rateest --rateest-delta \
                                         --rateest1 eth0 \
                                         --rateest-bps1 2.5mbit \
                                         --rateest-gt \
                                         --rateest2 ppp0 \
                                         --rateest-bps2 2mbit \
                              -j CONNMARK --set-mark 0x1

iptables -t mangle -A BALANCE -m state --state NEW \
                              -m helper --helper ftp \
                              -m rateest --rateest-delta \
                                         --rateest1 ppp0 \
                                         --rateest-bps1 2mbit \
                                         --rateest-gt \
                                         --rateest2 eth0 \
                                         --rateest-bps2 2.5mbit \
                              -j CONNMARK --set-mark 0x2

iptables -t mangle -A BALANCE -j CONNMARK --restore-mark

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:03 -08:00
Patrick McHardy
5859034d7e [NETFILTER]: x_tables: add RATEEST target
Add new rate estimator target (using gen_estimator). In combination with
the rateest match (next patch) this can be used for load-based multipath
routing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:02 -08:00
Patrick McHardy
cb76c6a597 [NETFILTER]: ip_tables: remove obsolete SAME target
Remove the ipt_SAME target as scheduled in feature-removal-schedule.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:01 -08:00
Jan Engelhardt
5c350e5a38 [NETFILTER]: IPv6 capable xt_TOS v1 target
Extends the xt_DSCP target by xt_TOS v1 to add support for selectively
setting and flipping any bit in the IPv4 TOS and IPv6 Priority fields.
(ipt_TOS and xt_DSCP only accepted a limited range of possible
values.)

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:00 -08:00
Jan Engelhardt
f1095ab51d [NETFILTER]: IPv6 capable xt_tos v1 match
Extends the xt_dscp match by xt_tos v1 to add support for selectively
matching any bit in the IPv4 TOS and IPv6 Priority fields. (ipt_tos
and xt_dscp only accepted a limited range of possible values.)

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:00 -08:00