Support more fine grained control of bridge netfilter iptables invocation
by adding seperate brnf_call_*tables parameters for each device using the
sysfs interface. Packets are passed to layer 3 netfilter when either the
global parameter or the per bridge parameter is enabled.
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The LOG targets print the entire MAC header as one long string, which is not
readable very well:
IN=eth0 OUT= MAC=00:15:f2:24:91:f8:00:1b:24:dc:61:e6:08:00 ...
Add an option to decode known header formats (currently just ARPHRD_ETHER devices)
in their individual fields:
IN=eth0 OUT= MACSRC=00:1b:24:dc:61:e6 MACDST=00:15:f2:24:91:f8 MACPROTO=0800 ...
IN=eth0 OUT= MACSRC=00:1b:24:dc:61:e6 MACDST=00:15:f2:24:91:f8 MACPROTO=86dd ...
The option needs to be explicitly enabled by userspace to avoid breaking
existing parsers.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Remove the comparison within the loop to print the macheader by prepending
the colon to all but the first printk.
Based on suggestion by Jan Engelhardt <jengelh@medozas.de>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
CONFIG_NF_CT_ACCT has been deprecated for awhile and
was originally scheduled for removal by 2.6.29.
Removing support for this config option also stops
this deprecation warning message in the kernel log.
[ 61.669627] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 61.669850] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
[ 61.669852] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
[ 61.669853] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
[Patrick: changed default value to 0]
Signed-off-by: Patrick McHardy <kaber@trash.net>
Check at rule install time that CT accounting is enabled. Force it
to be enabled if not while also emitting a warning since this is not
the default state.
This is in preparation for deprecating CONFIG_NF_CT_ACCT upon which
CONFIG_NETFILTER_XT_MATCH_CONNBYTES depended being set.
Added 2 CT accounting support functions:
nf_ct_acct_enabled() - Get CT accounting state.
nf_ct_set_acct() - Enable/disable CT accountuing.
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Destination was spelled wrong in KConfig.
Signed-off-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add header file to fix build error:
net/netfilter/xt_IDLETIMER.c:276: error: implicit declaration of function 'MKDEV'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Allow one-packet scheduling for UDP connections. When the fwmark-based or
normal virtual service is marked with '-o' or '--ops' options all
connections are created only to schedule one packet. Useful to schedule UDP
packets from same client port to different real servers. Recommended with
RR or WRR schedulers (the connections are not visible with ipvsadm -L).
Signed-off-by: Nick Chalk <nick@loadbalancer.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2.6.34 introduced 'conntrack zones' to deal with cases where packets
from multiple identical networks are handled by conntrack/NAT. Packets
are looped through veth devices, during which they are NATed to private
addresses, after which they can continue normally through the stack
and possibly have NAT rules applied a second time.
This works well, but is needlessly complicated for cases where only
a single SNAT/DNAT mapping needs to be applied to these packets. In that
case, all that needs to be done is to assign each network to a seperate
zone and perform NAT as usual. However this doesn't work for packets
destined for the machine performing NAT itself since its corrently not
possible to configure SNAT mappings for the LOCAL_IN chain.
This patch adds a new INPUT chain to the NAT table and changes the
targets performing SNAT to be usable in that chain.
Example usage with two identical networks (192.168.0.0/24) on eth0/eth1:
iptables -t raw -A PREROUTING -i eth0 -j CT --zone 1
iptables -t raw -A PREROUTING -i eth0 -j MARK --set-mark 1
iptables -t raw -A PREROUTING -i eth1 -j CT --zone 2
iptabels -t raw -A PREROUTING -i eth1 -j MARK --set-mark 2
iptables -t nat -A INPUT -m mark --mark 1 -j NETMAP --to 10.0.0.0/24
iptables -t nat -A POSTROUTING -m mark --mark 1 -j NETMAP --to 10.0.0.0/24
iptables -t nat -A INPUT -m mark --mark 2 -j NETMAP --to 10.0.1.0/24
iptables -t nat -A POSTROUTING -m mark --mark 2 -j NETMAP --to 10.0.1.0/24
iptables -t raw -A PREROUTING -d 10.0.0.0/24 -j CT --zone 1
iptables -t raw -A OUTPUT -d 10.0.0.0/24 -j CT --zone 1
iptables -t raw -A PREROUTING -d 10.0.1.0/24 -j CT --zone 2
iptables -t raw -A OUTPUT -d 10.0.1.0/24 -j CT --zone 2
iptables -t nat -A PREROUTING -d 10.0.0.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A OUTPUT -d 10.0.0.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A PREROUTING -d 10.0.1.0/24 -j NETMAP --to 192.168.0.0/24
iptables -t nat -A OUTPUT -d 10.0.1.0/24 -j NETMAP --to 192.168.0.0/24
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch implements an idletimer Xtables target that can be used to
identify when interfaces have been idle for a certain period of time.
Timers are identified by labels and are created when a rule is set with a new
label. The rules also take a timeout value (in seconds) as an option. If
more than one rule uses the same timer label, the timer will be restarted
whenever any of the rules get a hit.
One entry for each timer is created in sysfs. This attribute contains the
timer remaining for the timer to expire. The attributes are located under
the xt_idletimer class:
/sys/class/xt_idletimer/timers/<label>
When the timer expires, the target module sends a sysfs notification to the
userspace, which can then decide what to do (eg. disconnect to save power).
Cc: Timo Teras <timo.teras@iki.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
- clusterip_lock becomes a spinlock
- lockless lookups
- kfree() deferred after RCU grace period
- rcu_barrier_bh() inserted in clusterip_tg_exit()
v2)
- As Patrick pointed out, we use atomic_inc_not_zero() in
clusterip_config_find_get().
- list_add_rcu() and list_del_rcu() variants are used.
- atomic_dec_and_lock() used in clusterip_config_entry_put()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Change 2wire transfer rate of SFP+ module EEPROM from 400Khz to 100Khz
since some DACs(direct attached cables) do not work at 400Khz.
Reported-by: Krzysztof Oldzki <ole@ans.pl>
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Try to reduce cache line contentions in peer management, to reduce IP
defragmentation overhead.
- peer_fake_node is marked 'const' to make sure its not modified.
(tested with CONFIG_DEBUG_RODATA=y)
- Group variables in two structures to reduce number of dirtied cache
lines. One named "peers" for avl tree root, its number of entries, and
associated lock. (candidate for RCU conversion)
- A second one named "unused_peers" for unused list and its lock
- Add a !list_empty() test in unlink_from_unused() to avoid taking lock
when entry is not unused.
- Use atomic_dec_and_lock() in inet_putpeer() to avoid taking lock in
some cases.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uses a seqcount_t to synchronize stat producer and consumer, for packets
and bytes counter, now u64 types.
(dropped counter being rarely used, stay a native "unsigned long" type)
No noticeable performance impact on x86, as it only adds two increments
per frame. It might be more expensive on arches where smp_wmb() is not
free.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use RCU to avoid atomic ops on idev refcnt in ipv6_get_mtu()
and ip6_dst_hoplimit()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use __in6_dev_get() instead of in6_dev_get()/in6_dev_put()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 97f8aefbbf "net: fix ethtool
coding style errors and warnings" changed the indentation of several
macro definitions in ethtool.h. These definitions line up in the diff
where there is an extra character at the start of each line, but not
in the resulting file.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The parameter (work) is unused, remove it.
Reported from Eric Dumazet.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Instead of doing one atomic operation per frag, we can factorize them.
Reported from Eric Dumazet.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
If the returned csum value is 0, We has set ip_summed with
CHECKSUM_UNNECESSARY flag in __skb_checksum_complete_head().
So this patch kills the check and changes to return to upper
caller directly.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
- must use atomic_inc_not_zero() in instance_lookup_get()
- must use hlist_add_head_rcu() instead of hlist_add_head()
- must use hlist_del_rcu() instead of hlist_del()
- Introduce NFULNL_COPY_DISABLED to stop lockless reader from using an
instance, before we do final instance_put() on it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch added to 2.6.34:
commit f8d1dcaf88
Author: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date: Tue Apr 27 01:37:20 2010 +0000
ixgbe: enable extremely low latency
introduced a feature where LRO (called RSC on the hardware) was disabled
automatically when setting rx-usecs to 0 via ethtool. Some might not
like the fact that LRO was disabled automatically, but I'm fine with
that. What I don't like is that LRO/RSC is automatically enabled when
rx-usecs is set >0 via ethtool.
This would certainly be a problem if the device was used for forwarding
and it was determined that the low latency wasn't needed after the
device was already forwarding. I played around with saving the state of
LRO in the driver, but it just didn't seem worthwhile and would require
a small change to dev_disable_lro() that I did not like.
This patch simply leaves LRO disabled when setting rx-usecs >0 and
requires that the user enable it again. An extra informational message
will also now appear in the log so users can understand why LRO isn't
being enabled as they expect.
Inconsistency of LRO setting first noticed by Stanislaw Gruszka.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
CC: Stanislaw Gruszka <sgruszka@redhat.com>
CC: stable@kernel.org
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 675ad47375
removed the capability to use ethtool.set_msglevel to
control the types of messages emitted by the driver.
That commit should probably be reverted.
If not, then this patch fixes a message logging defect
introduced by converting a printk without KERN_<level>
to e_info.
This also reduces text by about 200 bytes.
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a small window where the watchdog could be running as the
interface is brought down on a NIC with two ports wired back to back.
If ixgbe_update_status is then called can lead to a panic. This patch
allows the update to bail if we are in that condition.
This issue was orignally reported and fix proposed by Akihiko Saitou.
CC: Akihiko Saitou <asaitou@users.sourceforge.net>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to copy rxhash again in __skb_clone()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
deliver_no_wcard is not being set in skb_copy_header.
In the skb_cloned case it is not being cleared and
may cause the skb to be dropped when the loopback device
pushes it back up the stack.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Device statistics have type unsigned long and several of the
device-specific parameters printed here have type __u32.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use struct rtnl_link_stats64 as the statistics structure.
On 32-bit architectures, insert 32 bits of padding after/before each
field of struct net_device_stats to make its layout compatible with
struct rtnl_link_stats64. Add an anonymous union in net_device; move
stats into the union and add struct rtnl_link_stats64 stats64.
Add net_device_ops::ndo_get_stats64, implementations of which will
return a pointer to struct rtnl_link_stats64. Drivers that implement
this operation must not update the structure asynchronously.
Change dev_get_stats() to call ndo_get_stats64 if available, and to
return a pointer to struct rtnl_link_stats64. Change callers of
dev_get_stats() accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ioctl operation (ndo_do_ioctl) is added to make mii-tools work
Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix build warning on i386 (32-bit) with 32-bit dma_addr_t:
drivers/net/enic/vnic_dev.c: In function 'vnic_dev_init_prov':
drivers/net/enic/vnic_dev.c:716: warning: passing argument 3 of 'pci_alloc_consistent' from incompatible pointer type
include/asm-generic/pci-dma-compat.h:16: note: expected 'dma_addr_t *' but argument is of type 'u64 *'
Now builds without warnings on i386 and on x86_64.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Scott Feldman <scofeldm@cisco.com>
Cc: Vasanthy Kolluri <vkolluri@cisco.com>
Cc: Roopa Prabhu <roprabhu@cisco.com>
Acked-by: Scott Feldman <scofeldm@cisco.com>
This patch increases the granularity of the rate generated by pktgen.
The previous version of pktgen uses micro seconds (udelay) resolution when it
was delayed causing gaps in the rates. It is changed to nanosecond (ndelay).
Now any rate is possible.
Also it allows to set, the desired rate in Mb/s or packets per second.
The documentation has been updated.
Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
econet lacks proper locking. It holds econet_lock only when inserting or
deleting an entry in econet_sklist, not during lookups.
- convert econet_lock from rwlock to spinlock
- use econet_lock in ec_listening_socket() lookup
- use appropriate sock_hold() / sock_put() to avoid corruptions.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gen_kill_estimator() API is incomplete or not well documented, since
caller should make sure an RCU grace period is respected before
freeing stats_lock.
This was partially addressed in commit 5d944c640b
(gen_estimator: deadlock fix), but same problem exist for all
gen_kill_estimator() users, if lock they use is not already RCU
protected.
A code review shows xt_RATEEST.c, act_api.c, act_police.c have this
problem. Other are ok because they use qdisc lock, already RCU
protected.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/bnx2.c: In function 'bnx2_disable_forced_2g5':
drivers/net/bnx2.c:1489: warning: 'bmcr' may be used uninitialized in this function
We fix it by checking return values from all bnx2_read_phy() and proceeding
to do read-modify-write only if the read operation is successful.
The related bnx2_enable_forced_2g5() is also fixed the same way.
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If oui were a null variable then vic_provinfo_alloc() would leak memory.
But this function is only called from one place and oui is not null so
I removed the check.
I also moved the memory allocation down a line so it was easier to spot.
(No one ever reads variable declarations).
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The point of using the devres resource management routines is that they
simplify the driver by taking care of releasing resources on failure and
release. A recent commit added a bunch of error handling that is unnecessary
in this context.
This patch removes this redundant error handling, as well as using
dmam_alloc_coherent in place of dma_alloc_coherent in order to use this
framework consistenly throughout the driver.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
This matches what ethoc_mdio_read does and makes the functions
symmetric.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
- No need to iterate over all possible addresses on bus
- Use helper function phy_find_first
- Use phy_connect_direct as we already have the relevant structure
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
This moves the write of the TX_BD_NUM to init_ring together with the
rest of the code setting up the transmission buffers.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethoc driver should be writing bus addresses to the ethoc registers, not
virtual addresses. This patch adds an array to store the virtual addresses
in and references that array when manipulating the contents of the buffer
descriptors.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
This moves the calculation of the number of transmission buffers to
ethoc_probe where it more logically fits with the rest of the memory
allocation code.
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
i2400m_fw_hdr_check() was accessing hardware field
bcf_hdr->module_type (little endian 32) without converting to host
byte sex.
Reported-by: Данилин Михаил <mdanilin@nsg.net.ru>
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>