When you've enabled conntrack and NAT as a module (standard case in all
distributions), and you've also enabled the new conntrack netlink
interface, loading ip_conntrack_netlink.ko will auto-load iptable_nat.ko.
This causes a huge performance penalty, since for every packet you iterate
the nat code, even if you don't want it.
This patch splits iptable_nat.ko into the NAT core (ip_nat.ko) and the
iptables frontend (iptable_nat.ko). Threfore, ip_conntrack_netlink.ko will
only pull ip_nat.ko, but not the frontend. ip_nat.ko will "only" allocate
some resources, but not affect runtime performance.
This separation is also a nice step in anticipation of new packet filters
(nf-hipac, ipset, pkttables) being able to use the NAT core.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GRE, SCTP and TCP protocol helpers did not call
ip_conntrack_event_cache() when updating ct->status. This patch adds
the respective calls.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to introduce a separate Kconfig menu entry for the NFQUEUE targets.
They cannot "just" depend on nfnetlink_queue, since nfnetlink_queue could
be linked into the kernel, whereas iptables can be a module.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a number of bugs. It cannot be reasonably split up in
multiple fixes, since all bugs interact with each other and affect the same
function:
Bug #1:
The event cache code cannot be called while a lock is held. Therefore, the
call to ip_conntrack_event_cache() within ip_ct_refresh_acct() needs to be
moved outside of the locked section. This fixes a number of 2.6.14-rcX
oops and deadlock reports.
Bug #2:
We used to call ct_add_counters() for unconfirmed connections without
holding a lock. Since the add operations are not atomic, we could race
with another CPU.
Bug #3:
ip_ct_refresh_acct() lost REFRESH events in some cases where refresh
(and the corresponding event) are desired, but no accounting shall be
performed. Both, evenst and accounting implicitly depended on the skb
parameter bein non-null. We now re-introduce a non-accounting
"ip_ct_refresh()" variant to explicitly state the desired behaviour.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noted by Alexey Dobriyan, the DEBUGP statement prints the wrong
callID.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Those exports are needed by the PPTP helper following in the next
couple of changes.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both __ip_conntrack_expect_find and ip_conntrack_expect_find_get take
a reference to the expectation, the difference is that callers of
__ip_conntrack_expect_find must hold ip_conntrack_lock.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This new "version 3" PPTP conntrack/nat helper is finally ready for
mainline inclusion. Special thanks to lots of last-minute bugfixing
by Patric McHardy.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_ct_refresh_acct() can be called without a valid "skb" pointer.
This used to work, since ct_add_counters() deals with that fact.
However, the recently-added event cache doesn't handle this at all.
This patch is a quick fix that is supposed to be replaced soon by a cleaner
solution during the pending redesign of the event cache.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of maintaining an array containing a list of nodes this instance
is responsible for let's use a simple bitmap. This provides the
following features:
* clusterip_responsible() and the add_node()/delete_node() operations
become very simple and don't need locking
* the config structure is much smaller
In spite of the completely different internal data representation the
user-space interface remains almost unchanged; the only difference is
that the proc file does not list nodes in the order they were added.
(The target info structure remains the same.)
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CLUSTERIP target creates a procfs entry for all different cluster
IPs. Although more than one rules can refer to a single cluster IP (and
thus a single config structure), removal of the procfs entry is done
unconditionally in destroy(). In more complicated situations involving
deferred dereferencing of the config structure by procfs and creating a
new rule with the same cluster IP it's also possible that no entry will
be created for the new rule.
This patch fixes the problem by counting the number of entries
referencing a given config structure and moving the config list
manipulation and procfs entry deletion parts to the
clusterip_config_entry_put() function.
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In 2.6.13-rcX the MASQUERADE target was changed not to exclude local
packets for better source address consistency. This breaks DHCP clients
using UDP sockets when the DHCP requests are caught by a MASQUERADE rule
because the MASQUERADE target drops packets when no address is configured
on the outgoing interface. This patch makes it ignore packets with a
source address of 0.
Thanks to Rusty for this suggestion.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't parse the packet, the data is already available in the conntrack
structure.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With large port numbers the helper_names buffer can overflow.
Noticed by Samir Bellabes <sbellabes@mandriva.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the use of RCU in files structure, the look-up of files using fds can now
be lock-free. The lookup is protected by rcu_read_lock()/rcu_read_unlock().
This patch changes the readers to use lock-free lookup.
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Ravikiran Thirumalai <kiran_th@gmail.com>
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch kills __ip_ct_expect_unlink_destroy and export
unlink_expect as ip_ct_unlink_expect. As it was discussed [1], the function
__ip_ct_expect_unlink_destroy is a bit confusing so better do the following
sequence: ip_ct_destroy_expect and ip_conntrack_expect_put.
[1] https://lists.netfilter.org/pipermail/netfilter-devel/2005-August/020794.html
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the NAT module is loaded when connections are already confirmed
it must not change their tuples anymore. This is especially important
with CONFIG_NETFILTER_DEBUG, the netfilter listhelp functions will
refuse to remove an entry from a list when it can not be found on
the list, so when a changed tuple hashes to a new bucket the entry
is kept in the list until and after the conntrack is freed.
Allocate the exact conntrack tuple for NAT for already confirmed
connections or drop them if that fails.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Connection mark tracking support is one of the feature in connection
tracking, so IP_NF_CONNTRACK_MARK depends on IP_NF_CONNTRACK.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
A permanent expectation exists until timeing out and can expect
multiple related connections.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a trivial typo in clusterip_config_init().
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This new iptables target allows manipulation of the TTL of an IPv4 packet.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch puts mostly read only data in the right section
(read_mostly), to help sharing of these data between CPUS without
memory ping pongs.
On one of my production machine, tcp_statistics was sitting in a
heavily modified cache line, so *every* SNMP update had to force a
reload.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally written by Henrik Nordstrom <hno@marasystems.com>, taken
from netfilter patch-o-matic and added ip6_tables support.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@eurodev.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rip out cmd/sid/pid matching since its unfixable broken and stands in the
way of locking changes to tasklist_lock.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Gary Wayne Smith <gary.w.smith@primeexalia.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increases consistency in source-address selection.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ads a new "connbytes" match that utilizes the CONFIG_NF_CT_ACCT
per-connection byte and packet counters. Using it you can do things like
packet classification on average packet size within a connection.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using this new iptables DCCP protocol header match, it is possible to
create simplistic stateless packet filtering rules for DCCP. It
permits matching of port numbers, packet type and options.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the bug which doesn't return ERR_PTR(-ENOMEM) if it
failed to allocate memory space from slab cache. This bug leads to
erroneously not dropped packets under stress, and wrong statistic
counters ('invalid' is incremented instead of 'drop'). It was
introduced during the ctnetlink merge in the net-2.6.14 tree, so no
stable or mainline releases affected.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a /proc/net/netfilter/nf_queue file, similar to the
recently-added /proc/net/netfilter/nf_log. It indicates which queue
handler is registered to which protocol family. This is useful since
there are now multiple queue handlers in the treee (ip[6]_queue,
nfnetlink_queue).
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since nfnetlink_queue can override ip{6}_queue as queue handlers, we
can no longer blindly unregister whoever is registered for PF_INET[6],
but only unregister ourselves.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to DaveM, it is preferrable to have large data structures be
allocated dynamically from the module init() function rather than
putting them as static global variables into BSS.
This patch moves the conntrack helper packet buffers into dynamically
allocated memory.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently conntracks are inserted after the head. That means that
conntracks are sorted from the biggest to the smallest id. This happens
because we use list_prepend (list_add) instead list_add_tail. This can
result in problems during the list iteration.
list_for_each(i, &ip_conntrack_hash[cb->args[0]]) {
h = (struct ip_conntrack_tuple_hash *) i;
if (DIRECTION(h) != IP_CT_DIR_ORIGINAL)
continue;
ct = tuplehash_to_ctrack(h);
if (ct->id <= *id)
continue;
In that case just the first conntrack in the bucket will be dumped. To
fix this, we iterate the list from the tail to the head via
list_for_each_prev. Same thing for the list of expectations.
Signed-off-by: Pablo Neira Ayuso <pablo@eurodev.net>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the size of the ctnl_exp_cb array that is IPCTNL_MSG_EXP_MAX
instead of IPCTNL_MSG_MAX. Simple typo.
Signed-off-by: Pablo Neira Ayuso <pablo@eurodev.net>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>