Commit graph

41879 commits

Author SHA1 Message Date
Bart De Schuymer
d12cdc3ccf [NETFILTER]: ebtables: add --snap-arp option
The attached patch adds --snat-arp support, which makes it possible to
change the source mac address in both the mac header and the arp header
with one rule.

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:32 -08:00
Patrick McHardy
baf7b1e112 [NETFILTER]: x_tables: add NFLOG target
Add new NFLOG target to allow use of nfnetlink_log for both IPv4 and IPv6.
Currently we have two (unsupported by userspace) hacks in the LOG and ULOG
targets to optionally call to the nflog API. They lack a few features,
namely the IPv4 and IPv6 LOG targets can not specify a number of arguments
related to nfnetlink_log, while the ULOG target is only available for IPv4.
Remove those hacks and add a clean way to use nfnetlink_log.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:31 -08:00
Patrick McHardy
39b46fc6f0 [NETFILTER]: x_tables: add port of hashlimit match for IPv4 and IPv6
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:31 -08:00
Patrick McHardy
d7a5c32442 [NETFILTER]: nfnetlink_log: remove useless prefix length limitation
There is no reason for limiting netlink attributes in size.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:30 -08:00
Eric Leblond
829e17a1a6 [NETFILTER]: nfnetlink_queue: allow changing queue length through netlink
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:29 -08:00
Pablo Neira Ayuso
7b621c1ea6 [NETFILTER]: ctnetlink: rework conntrack fields dumping logic on events
|   NEW   | UPDATE  | DESTROY |
     ----------------------------------------|
     tuples    |    Y    |    Y    |    Y    |
     status    |    Y    |    Y    |    N    |
     timeout   |    Y    |    Y    |    N    |
     protoinfo |    S    |    S    |    N    |
     helper    |    S    |    S    |    N    |
     mark      |    S    |    S    |    N    |
     counters  |    F    |    F    |    Y    |

 Leyend:
         Y: yes
         N: no
         S: iif the field is set
	 F: iif overflow

This patch also replace IPCT_HELPINFO by IPCT_HELPER since we want to
track the helper assignation process, not the changes in the private
information held by the helper.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:28 -08:00
Pablo Neira Ayuso
bbb3357d14 [NETFILTER]: ctnetlink: check for status attribute existence on conntrack creation
Check that status flags are available in the netlink message received
to create a new conntrack.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:27 -08:00
Patrick McHardy
1b683b5512 [NETFILTER]: sip conntrack: better NAT handling
The NAT handling of the SIP helper has a few problems:

- Request headers are only mangled in the reply direction, From/To headers
  not at all, which can lead to authentication failures with DNAT in case
  the authentication domain is the IP address

- Contact headers in responses are only mangled for REGISTER responses

- Headers may be mangled even though they contain addresses not
  participating in the connection, like alternative addresses

- Packets are droppen when domain names are used where the helper expects
  IP addresses

This patch takes a different approach, instead of fixed rules what field
to mangle to what content, it adds symetric mapping of From/To/Via/Contact
headers, which allows to deal properly with echoed addresses in responses
and foreign addresses not belonging to the connection.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:26 -08:00
Patrick McHardy
77a78dec48 [NETFILTER]: sip conntrack: make header shortcuts optional
Not every header has a shortcut, so make them optional instead
of searching for the same string twice.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:25 -08:00
Patrick McHardy
40883e8184 [NETFILTER]: sip conntrack: do case insensitive SIP header search
SIP headers are generally case-insensitive, only SDP headers are
case sensitive.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:24 -08:00
Patrick McHardy
9d5b8baa4e [NETFILTER]: sip conntrack: minor cleanup
- Use enum for header field enumeration
- Use numerical value instead of pointer to header info structure to
  identify headers, unexport ct_sip_hdrs
- group SIP and SDP entries in header info structure
- remove double forward declaration of ct_sip_get_info

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:23 -08:00
Patrick McHardy
337fbc4166 [NETFILTER]: ip_conntrack: fix NAT helper unload races
The NAT helpr hooks are protected by RCU, but all of the
conntrack helpers test and use the global pointers instead
of copying them first using rcu_dereference()

Also replace synchronize_net() by synchronize_rcu() for clarity
since sychronizing only with packet receive processing is
insufficient to prevent races.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:22 -08:00
Yasuyuki Kozakai
468ec44bd5 [NETFILTER]: conntrack: add '_get' to {ip, nf}_conntrack_expect_find
We usually uses 'xxx_find_get' for function which increments
reference count.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:21 -08:00
Patrick McHardy
e4bd8bce3e [NETFILTER]: nf_conntrack: /proc compatibility with old connection tracking
This patch adds /proc/net/ip_conntrack, /proc/net/ip_conntrack_expect and
/proc/net/stat/ip_conntrack files to keep old programs using them working.

The /proc/net/ip_conntrack and /proc/net/ip_conntrack_expect files show only
IPv4 entries, the /proc/net/stat/ip_conntrack shows global statistics.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:20 -08:00
Patrick McHardy
a999e68376 [NETFILTER]: nf_conntrack: sysctl compatibility with old connection tracking
This patch adds an option to keep the connection tracking sysctls visible
under their old names.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:19 -08:00
Patrick McHardy
933a41e7e1 [NETFILTER]: nf_conntrack: move conntrack protocol sysctls to individual modules
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:18 -08:00
Patrick McHardy
d62f9ed4a4 [NETFILTER]: nf_conntrack: automatic sysctl registation for conntrack protocols
Add helper functions for sysctl registration with optional instantiating
of common path elements (like net/netfilter) and use it for support for
automatic registation of conntrack protocol sysctls.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:17 -08:00
Patrick McHardy
f8eb24a89a [NETFILTER]: nf_conntrack: move extern declaration to header files
Using extern in a C file is a bad idea because the compiler can't
catch type errors.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:16 -08:00
Patrick McHardy
d734685334 [NETFILTER]: nf_conntrack_ftp: fix missing helper mask initilization
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:15 -08:00
Martin Josefsson
be00c8e489 [NETFILTER]: nf_conntrack: reduce timer updates in __nf_ct_refresh_acct()
Only update the conntrack timer if there's been at least HZ jiffies since
the last update. Reduces the number of del_timer/add_timer cycles from one
per packet to one per connection per second (plus once for each state change
of a connection)

Should handle timer wraparounds and connection timeout changes.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:14 -08:00
Martin Josefsson
824621eddd [NETFILTER]: nf_conntrack: remove unused struct list_head from protocols
Remove unused struct list_head from struct nf_conntrack_l3proto and
nf_conntrack_l4proto as all protocols are kept in arrays, not linked
lists.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:13 -08:00
Martin Josefsson
3ffd5eeb1a [NETFILTER]: nf_conntrack: minor __nf_ct_refresh_acct() whitespace cleanup
Minor whitespace cleanup.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:12 -08:00
Martin Josefsson
951d36cace [NETFILTER]: nf_conntrack: remove ASSERT_{READ,WRITE}_LOCK
Remove the usage of ASSERT_READ_LOCK/ASSERT_WRITE_LOCK in nf_conntrack,
it didn't do anything, it was just an empty define and it uglified the code.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:11 -08:00
Martin Josefsson
ae5718fb3d [NETFILTER]: nf_conntrack: more sanity checks in protocol registration/unregistration
Add some more sanity checks when registering/unregistering l3/l4 protocols.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:10 -08:00
Martin Josefsson
605dcad6c8 [NETFILTER]: nf_conntrack: rename struct nf_conntrack_protocol
Rename 'struct nf_conntrack_protocol' to 'struct nf_conntrack_l4proto' in
order to help distinguish it from 'struct nf_conntrack_l3proto'. It gets
rather confusing with 'nf_conntrack_protocol'.

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:09 -08:00
Martin Josefsson
e2b7606cdb [NETFILTER]: More __read_mostly annotations
Place rarely written variables in the read-mostly section by using
__read_mostly

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:08 -08:00
Martin Josefsson
8f03dea52b [NETFILTER]: nf_conntrack: split out protocol handling
This patch splits out L3/L4 protocol handling into its own file
nf_conntrack_proto.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:07 -08:00
Martin Josefsson
f61801218a [NETFILTER]: nf_conntrack: split out the event cache
This patch splits out the event cache into its own file
nf_conntrack_ecache.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:06 -08:00
Martin Josefsson
7e5d03bb9d [NETFILTER]: nf_conntrack: split out helper handling
This patch splits out handling of helpers into its own file
nf_conntrack_helper.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:05 -08:00
Martin Josefsson
77ab9cff0f [NETFILTER]: nf_conntrack: split out expectation handling
This patch splits out expectation handling into its own file
nf_conntrack_expect.c

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02 21:31:04 -08:00
David S. Miller
d2e4bdc870 [TCP] Vegas: Increase default alpha to 2 and beta to 4.
This helps Vegas cope better with delayed ACKs, see
analysis at:

http://www.cs.caltech.edu/%7Eweixl/technical/ns2linux/known_linux/index.html#vegas

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:31:03 -08:00
Gerrit Renker
6b57c93dc3 [DCCP]: Use `unsigned' for packet lengths
This patch implements a suggestion by Ian McDonald and

 1) Avoids tests against negative packet lengths by using unsigned int
    for packet payload lengths in the CCID send_packet()/packet_sent() routines

 2) As a consequence, it removes an now unnecessary test with regard to `len > 0'
    in ccid3_hc_tx_packet_sent: that condition is always true, since
      * negative packet lengths are avoided
      * ccid3_hc_tx_send_packet flags an error whenever the payload length is 0.
        As a consequence, ccid3_hc_tx_packet_sent is never called as all errors
        returned by ccid_hc_tx_send_packet are caught in dccp_write_xmit

 3) Removes the third argument of ccid_hc_tx_send_packet (the `len' parameter),
    since it is currently always set to skb->len. The code is updated with regard
    to this parameter change.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:31:02 -08:00
Gerrit Renker
a79ef76f4d [DCCP] ccid3: Larger initial windows
This implements the larger-initial-windows feature for CCID 3, as described in
section 5 of RFC 4342. When the first feedback packet arrives, the sender can
send up to 2..4 packets per RTT, instead of just one.

The patch further
 * reduces the number of timestamping calls by passing the timestamp value
   (which is computed in one of the calling functions anyway) as argument

 * renames one constant with a very long name into one which is shorter and
   resembles the one in RFC 3448 (t_mbi)

 * simplifies some of the min_t/max_t cases where both `x', `y' have the same
   type

Commiter note: renamed TFRC_t_mbi to TFRC_T_MBI, to follow Linux coding style.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:31:01 -08:00
Arnaldo Carvalho de Melo
841bac1d60 [DCCP]: Make {set,get}sockopt(DCCP_SOCKOPT_PACKET_SIZE) return 0
To reflect the fact that this now is of no effect, not making apps
stop working, just be warned in the system log.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:31:00 -08:00
Gerrit Renker
5aed324369 [DCCP]: Tidy up unused structures
This removes and cleans up unused variables and structures which have become
unnecessary following the introduction of the EWMA patch to automatically track
the CCID 3 receiver/sender packet sizes `s'.

It deprecates the PACKET_SIZE socket option by returning an error code and
printing a deprecation warning if an application tries to read or write this
socket option.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:59 -08:00
Gerrit Renker
78ad713da6 [DCCP] ccid3: Track RX/TX packet size `s' using moving-average
Problem:
2006-12-02 21:30:58 -08:00
Gerrit Renker
2a1fda6f6c [DCCP] ccid3: Set NoFeedback Timeout according to RFC 3448
This corrects the setting of the nofeedback timer with regard to RFC
3448 - previously it was not set to max(4*R, 2*s/X) as specified. Using
the maximum of 1 second as upper bound (as it was done before) can have
detrimental effects, especially if R is small.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:57 -08:00
Gerrit Renker
4384260443 [DCCP]: Remove allocation of sysctl numbers
This is in response to a request sent earlier by Eric W. Biederman
and replaces all sysctl numbers for net.dccp.default with CTL_UNNUMBERED.

It has been tested to compile and to work.

Commiter note: I've removed the use of CTL_UNNUMBERED, not setting .ctl_name
               sets it to 0, that is the what CTL_UNNUMBERED is, reason is
               to avoid unneeded source code cluttering.

Signed-off-by: Gerrit Renker  <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:56 -08:00
Arnaldo Carvalho de Melo
ee41e2dff1 [INET]: Change protocol field in struct inet_protosw to u16
[acme@newtoy net-2.6.20]$ pahole /tmp/tcp_ipv6.o inet_protosw
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/net/protocol.h:69 */
struct inet_protosw {
        struct list_head           list;                 /*     0     8 */
        short unsigned int         type;                 /*     8     2 */

        /* XXX 2 bytes hole, try to pack */

        int                        protocol;             /*    12     4 */
        struct proto *             prot;                 /*    16     4 */
        const struct proto_ops  *  ops;                  /*    20     4 */
        int                        capability;           /*    24     4 */
        char                       no_check;             /*    28     1 */
        unsigned char              flags;                /*    29     1 */
}; /* size: 32, sum members: 28, holes: 1, sum holes: 2, padding: 2 */

So that we can kill that hole, protocol can only go all the way to 255 (RAW).

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:55 -08:00
Arnaldo Carvalho de Melo
3a137d2065 [TCP]: Renove the __ prefix on the struct tcp_sock members
As this struct is not userland visible at all.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:54 -08:00
Arnaldo Carvalho de Melo
2ff52f282c [TCP]: Change tcp_header_len member in tcp_sock to u16
With this we eliminate the last hole in struct tcp_sock.

End result:

[acme@newtoy net-2.6.20]$ codiff -sV /tmp/tcp.o.before net/ipv4/tcp.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp.c:
  struct tcp_sock |   -4
    tcp_header_len;
     from: int                   /*  1000(0)     4(0) */
     to:   u16                   /*  1000(0)     2(0) */
 1 struct changed
[acme@newtoy net-2.6.20]$

Now sizeof(tcp_sock) is just...

[acme@newtoy net-2.6.20]$ pahole --sizes ../OUTPUT/qemu/net-2.6.20/net/ipv4/tcp.o | grep -w tcp_sock
struct tcp_sock: 1500 0

1500 bytes ;-)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:53 -08:00
Gerrit Renker
5d0dbc4a9b [DCCP] ccid3: Consolidate handling of t_RTO
This patch
 * removes setting t_RTO in ccid3_hc_tx_init (per [RFC 3448, 4.2], t_RTO is
   undefined until feedback has been received);

 * makes some trivial changes (updates of comments);

 * performs a small optimisation by exploiting that the feedback timeout
   uses the value of t_ipi. The way it is done is safe, because the timeouts
   appear after the changes to t_ipi, ensuring that up-to-date values are used;

 * in ccid3_hc_tx_packet_recv, moves the t_rto statement closer to the calculation
   of the next_tmout. This makes the code clearer to read and is also safe, since
   t_rto is not updated until the next call of ccid3_hc_tx_packet_recv, and is not
   read by the functions called via ccid_wait_for_ccid();

 * removes a `max' statement in sk_reset_timer, this is not needed since the timeout
   value is always greater than 1E6 microseconds.

 * adds `XXX'es to highlight that currently the nofeedback timer is set
   in a non-standard way

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:52 -08:00
Gerrit Renker
17893bc1a6 [DCCP] ccid3: Consistently update t_nom, t_ipi, t_delta
This patch:

 * consolidates updating of parameters (t_nom, t_ipi, t_delta) which
   need to be updated at the same time, since they are inter-dependent

 * removes two inline functions which are no longer needed as a result of
   the above consolidation

 * resolves a FIXME regarding the re-calculation of t_ipi within the nofeedback
   timer, in the state where no feedback has previously been received

 * ties updating these parameters to updating the sending rate X, exploiting
   that all three parameters in turn depend on X; and using a small optimisation
   which can reduce the number of required instructions: only update the three
   parameters when X really changes

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:51 -08:00
Gerrit Renker
48e03eee71 [DCCP] ccid3: Consolidate timer resets
This patch concerns updating the value of the nofeedback timer when no feedback
has been received so far.

Since in this case the value of R is still undefined according to [RFC 3448,
4.2], we can not perform step (3) of [RFC 3448, 4.3].  A clarification is
provided in [RFC 4342, sec. 5], which states that in these cases the nofeedback
timer (still) expires "after two seconds".

Many thanks to Ian McDonald for pointing this out and providing the
clarification.

The patch
  * implements [RFC 4342, sec. 5] with regard to the above case
  * consolidates handling timer restart by
	- adding an appropriate jump label and
	- initialising the timeout value

Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:50 -08:00
Jamal Hadi Salim
b798a9ede2 [XFRM]: Convert a few __u8 to proper u8
Caught by the EyeBalls(tm) of Thomas Graf

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:50 -08:00
Jamal Hadi Salim
0c51f53c57 [XFRM]: Make flush notifier prettier when subpolicy used
Might as well make flush notifier prettier when subpolicy used

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:49 -08:00
Arnaldo Carvalho de Melo
46ca5f5dc4 [XFRM]: Pack struct xfrm_policy
[acme@newtoy net-2.6.20]$ pahole net/ipv4/tcp.o xfrm_policy
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/security.h:67 */
struct xfrm_policy {
        struct xfrm_policy *       next;                 /*     0     4 */
        struct hlist_node          bydst;                /*     4     8 */
        struct hlist_node          byidx;                /*    12     8 */
        rwlock_t                   lock;                 /*    20    36 */
        atomic_t                   refcnt;               /*    56     4 */
        struct timer_list          timer;                /*    60    24 */
        u8                         type;                 /*    84     1 */

        /* XXX 3 bytes hole, try to pack */

        u32                        priority;             /*    88     4 */
        u32                        index;                /*    92     4 */
        struct xfrm_selector       selector;             /*    96    56 */
        struct xfrm_lifetime_cfg   lft;                  /*   152    64 */
        struct xfrm_lifetime_cur   curlft;               /*   216    32 */
        struct dst_entry *         bundles;              /*   248     4 */
        __u16                      family;               /*   252     2 */
        __u8                       action;               /*   254     1 */
        __u8                       flags;                /*   255     1 */
        __u8                       dead;                 /*   256     1 */
        __u8                       xfrm_nr;              /*   257     1 */

        /* XXX 2 bytes hole, try to pack */

        struct xfrm_sec_ctx *      security;             /*   260     4 */
        struct xfrm_tmpl           xfrm_vec[6];          /*   264   360 */
}; /* size: 624, sum members: 619, holes: 2, sum holes: 5 */

So lets have just one hole instead of two, by moving 'type' to just before 'action',
end result:

[acme@newtoy net-2.6.20]$ codiff -s /tmp/tcp.o.before net/ipv4/tcp.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp.c:
  struct xfrm_policy |   -4
 1 struct changed
[acme@newtoy net-2.6.20]$

[acme@newtoy net-2.6.20]$ pahole -c 64 net/ipv4/tcp.o xfrm_policy
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/security.h:67 */
struct xfrm_policy {
        struct xfrm_policy *       next;                 /*     0     4 */
        struct hlist_node          bydst;                /*     4     8 */
        struct hlist_node          byidx;                /*    12     8 */
        rwlock_t                   lock;                 /*    20    36 */
        atomic_t                   refcnt;               /*    56     4 */
        struct timer_list          timer;                /*    60    24 */
        u32                        priority;             /*    84     4 */
        u32                        index;                /*    88     4 */
        struct xfrm_selector       selector;             /*    92    56 */
        struct xfrm_lifetime_cfg   lft;                  /*   148    64 */
        struct xfrm_lifetime_cur   curlft;               /*   212    32 */
        struct dst_entry *         bundles;              /*   244     4 */
        u16                        family;               /*   248     2 */
        u8                         type;                 /*   250     1 */
        u8                         action;               /*   251     1 */
        u8                         flags;                /*   252     1 */
        u8                         dead;                 /*   253     1 */
        u8                         xfrm_nr;              /*   254     1 */

        /* XXX 1 byte hole, try to pack */

        struct xfrm_sec_ctx *      security;             /*   256     4 */
        struct xfrm_tmpl           xfrm_vec[6];          /*   260   360 */
}; /* size: 620, sum members: 619, holes: 1, sum holes: 1 */

Are there any fugly data dependencies here? None that I know.

In the process changed the removed the __ prefixed types, that are just for
userspace visible headers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:48 -08:00
Arnaldo Carvalho de Melo
d5c42c0ec4 [NET]: Pack struct hh_cache
[acme@newtoy net-2.6.20]$ pahole net/ipv4/tcp.o hh_cache
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/linux/netdevice.h:190 */
struct hh_cache {
        struct hh_cache *          hh_next;              /*     0     4 */
        atomic_t                   hh_refcnt;            /*     4     4 */
        __be16                     hh_type;              /*     8     2 */

        /* XXX 2 bytes hole, try to pack */

        int                        hh_len;               /*    12     4 */
        int                        (*hh_output)();       /*    16     4 */
        rwlock_t                   hh_lock;              /*    20    36 */
        long unsigned int          hh_data[24];          /*    56    96 */
}; /* size: 152, sum members: 150, holes: 1, sum holes: 2 */

[acme@newtoy net-2.6.20]$ find net -name "*.[ch]" | xargs grep 'hh_len.\+=' | sort -u
net/atm/br2684.c:               hh->hh_len = PADLEN + ETH_HLEN;
net/ethernet/eth.c:     hh->hh_len = ETH_HLEN;
net/ipv4/ipconfig.c:    int hh_len = LL_RESERVED_SPACE(dev);
net/ipv4/ip_output.c:   hh_len = LL_RESERVED_SPACE(rt->u.dst.dev);
net/ipv4/ip_output.c:   int hh_len = LL_RESERVED_SPACE(dev);
net/ipv4/netfilter.c:   hh_len = (*pskb)->dst->dev->hard_header_len;
net/ipv4/raw.c: hh_len = LL_RESERVED_SPACE(rt->u.dst.dev);
net/ipv6/ip6_output.c:  hh_len = LL_RESERVED_SPACE(rt->u.dst.dev);
net/ipv6/netfilter/ip6t_REJECT.c:       hh_len = (dst->dev->hard_header_len + 15)&~15;
net/ipv6/raw.c: hh_len = LL_RESERVED_SPACE(rt->u.dst.dev);
[acme@newtoy net-2.6.20]$

[acme@newtoy net-2.6.20]$ find include -name "*.h" | xargs grep 'define ETH_HLEN'
include/linux/if_ether.h:#define ETH_HLEN       14              /* Total octets in header.       */

        (((dev)->hard_header_len&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)

[acme@newtoy net-2.6.20]$ pahole net/ipv4/tcp.o net_device | grep hard_header_len
        short unsigned int         hard_header_len;      /*   106     2 */
[acme@newtoy net-2.6.20]$

So I think we're safe in turning hh_len an u16, end result:

[acme@newtoy net-2.6.20]$ codiff -sV /tmp/tcp.o.before net/ipv4/tcp.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp.c:
  struct hh_cache |   -4
    hh_len;
     from: int                   /*    12(0)     4(0) */
     to:   u16                   /*    10(0)     2(0) */
 1 struct changed
[acme@newtoy net-2.6.20]$

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:47 -08:00
Arnaldo Carvalho de Melo
850db6b8c5 [INET_CONNECTION_SOCK]: Pack struct inet_connection_sock_af_ops
We have a hole in:

[acme@newtoy net-2.6.20]$ pahole net/ipv6/tcp_ipv6.o inet_connection_sock_af_ops
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/net/inet_connection_sock.h:38 */
struct inet_connection_sock_af_ops {
        int                        (*queue_xmit)();      /*     0     4 */
        void                       (*send_check)();      /*     4     4 */
        int                        (*rebuild_header)();  /*     8     4 */
        int                        (*conn_request)();    /*    12     4 */
        struct sock *              (*syn_recv_sock)();   /*    16     4 */
        int                        (*remember_stamp)();  /*    20     4 */
        __u16                      net_header_len;       /*    24     2 */

        /* XXX 2 bytes hole, try to pack */

        int                        (*setsockopt)();      /*    28     4 */
        int                        (*getsockopt)();      /*    32     4 */
        int                        (*compat_setsockopt)(); /*    36     4 */
        int                        (*compat_getsockopt)(); /*    40     4 */
        void                       (*addr2sockaddr)();   /*    44     4 */
        int                        sockaddr_len;         /*    48     4 */
}; /* size: 52, sum members: 50, holes: 1, sum holes: 2 */

But we don't need sockaddr_len to be an int:

[acme@newtoy net-2.6.20]$ find net -name "*.[ch]" | xargs grep '\.sockaddr_len.\+=' | sort -u
net/dccp/ipv4.c:        .sockaddr_len      = sizeof(struct sockaddr_in),
net/dccp/ipv6.c:        .sockaddr_len      = sizeof(struct sockaddr_in6),
net/ipv4/tcp_ipv4.c:    .sockaddr_len      = sizeof(struct sockaddr_in),
net/ipv6/tcp_ipv6.c:    .sockaddr_len      = sizeof(struct sockaddr_in6),
net/sctp/ipv6.c:        .sockaddr_len      = sizeof(struct sockaddr_in6),
net/sctp/protocol.c:    .sockaddr_len      = sizeof(struct sockaddr_in),

[acme@newtoy net-2.6.20]$ pahole --sizes net/ipv6/tcp_ipv6.o | grep sockaddr_in
struct sockaddr_in: 16 0
struct sockaddr_in6: 28 0
[acme@newtoy net-2.6.20]$

So I turned sockaddr_len a 'u16', and now:

[acme@newtoy net-2.6.20]$ pahole net/ipv6/tcp_ipv6.o inet_connection_sock_af_ops
/* /pub/scm/linux/kernel/git/acme/net-2.6.20/include/net/inet_connection_sock.h:38 */
struct inet_connection_sock_af_ops {
        int            (*queue_xmit)();        /*     0   4 */
        void           (*send_check)();        /*     4   4 */
        int            (*rebuild_header)();    /*     8   4 */
        int            (*conn_request)();      /*    12   4 */
        struct sock *  (*syn_recv_sock)();     /*    16   4 */
        int            (*remember_stamp)();    /*    20   4 */
        u16            net_header_len;         /*    24   2 */
        u16            sockaddr_len;           /*    26   2 */
        int            (*setsockopt)();        /*    28   4 */
        int            (*getsockopt)();        /*    32   4 */
        int            (*compat_setsockopt)(); /*    36   4 */
        int            (*compat_getsockopt)(); /*    40   4 */
        void           (*addr2sockaddr)();     /*    44   4 */
}; /* size: 48 */

So we've saved 4 bytes:

[acme@newtoy net-2.6.20]$ codiff -sV /tmp/tcp_ipv6.o.before net/ipv6/tcp_ipv6.o
/pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv6/tcp_ipv6.c:
  struct inet_connection_sock_af_ops |   -4
    net_header_len;
     from: __u16                 /*    24(0)     2(0) */
     to:   u16                   /*    24(0)     2(0) */
    sockaddr_len;
     from: int                   /*    48(0)     4(0) */
     to:   u16                   /*    26(0)     2(0) */
 1 struct changed
[acme@newtoy net-2.6.20]$

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-02 21:30:46 -08:00
Gerrit Renker
4c0a6cb0db [UDP(-Lite)]: consolidate v4 and v6 get|setsockopt code
This patch consolidates set/getsockopt code between UDP(-Lite) v4 and 6. The
justification is that UDP(-Lite) is a transport-layer protocol and therefore
the socket option code (at least in theory) should be AF-independent.

Furthermore, there is the following code reduplication:
 * do_udp{,v6}_getsockopt is 100% identical between v4 and v6
 * do_udp{,v6}_setsockopt is identical up to the following differerence
	--v4 in contrast to v4 additionally allows the experimental encapsulation
          types  UDP_ENCAP_ESPINUDP and UDP_ENCAP_ESPINUDP_NON_IKE
	--the remainder is identical between v4 and v6
   I believe that this difference is of little relevance.

The advantages in not duplicating twice almost completely identical code.

The patch further simplifies the interface of udp{,v6}_push_pending_frames,
since for the second argument (struct udp_sock *up) it always holds that
up = udp_sk(sk); where sk is the first function argument.

Signed-off-by: Gerrit Renker  <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02 21:30:45 -08:00