Commit graph

201456 commits

Author SHA1 Message Date
Jeremy Kerr
e6b8b3e21a ARM: 6260/1: arm/plat-spear: fix debug macro compilation failure
mov rx, =<immediate> isn't valid, use #<immediate> instead.

Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-26 10:33:06 +01:00
Jeremy Kerr
f63a79f653 ARM: 6259/1: arm/ns9xxx: fix debug macro compilation failure
We need asm/memory.h for NS9XXX_CSxSTAT_PHYS (via mach/memory.h).

Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-26 10:33:05 +01:00
Jeremy Kerr
9729c0ca19 ARM: 6258/1: arm/h720x: fix debug macro compilation failure
IO_BASE shoule be IO_VIRT, and IO_START should be IO_PHYS. We also need
mach/hardware.h for these definitions.

Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-26 10:33:05 +01:00
Ben Greear
c736eefadb net: dev_forward_skb should call nf_reset
With conn-track zones and probably with different network
namespaces, the netfilter logic needs to be re-calculated
on packet receive.  If the netfilter logic is not reset,
it will not be recalculated properly.  This patch adds
the nf_reset logic to dev_forward_skb.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-25 21:58:46 -07:00
Rajiv Andrade
59f6fbe429 tpm_tis: fix subsequent suspend failures
Fix subsequent suspends by issuing tpm_continue_selftest during resume.
Otherwise, the tpm chip seems to be not fully initialized and will reject
the save state command during suspend, thus preventing the whole system
to suspend.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16256

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Cc: James Morris <jmorris@namei.org>
Cc: Debora Velarde <debora@linux.vnet.ibm.com>
Cc: David Safford <safford@watson.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Morris <jmorris@namei.org>
2010-07-26 10:25:45 +10:00
Robert P. J. Day
25848b3ec6 ceph: Correct obvious typo of Kconfig variable "CRYPTO_AES"
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-07-24 21:36:07 -07:00
stephen hemminger
3b87956ea6 net sched: fix race in mirred device removal
This fixes hang when target device of mirred packet classifier
action is removed.

If a mirror or redirection action is configured to cause packets
to go to another device, the classifier holds a ref count, but was assuming
the adminstrator cleaned up all redirections before removing. The fix
is to add a notifier and cleanup during unregister.

The new list is implicitly protected by RTNL mutex because
it is held during filter add/delete as well as notifier.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-24 21:04:20 -07:00
David S. Miller
76ac21f5ef Merge branch 'wimax-2.6.35.y' of git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax 2010-07-24 20:51:45 -07:00
Michael S. Tsirkin
ef3db4a595 tun: avoid BUG, dump packet on GSO errors
There are still some LRO cards that cause GSO errors in tun,
and BUG on this is an unfriendly way to tell the admin
to disable LRO.

Further, experience shows we might have more GSO bugs lurking.
See https://bugzilla.kernel.org/show_bug.cgi?id=16413
as a recent example.
dumping a packet will make it easier to figure it out.

Replace BUG with warning+dump+drop the packet to make
GSO errors in tun less critical and easier to debug.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Alex Unigovsky <unik@compot.ru>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-24 20:47:20 -07:00
Greg Edwards
d8190dff01 bonding: set device in RLB ARP packet handler
After:

commit 6146b1a4da
Author: Jay Vosburgh <fubar@us.ibm.com>
Date:   Tue Nov 4 17:51:15 2008 -0800

    bonding: Fix ALB mode to balance traffic on VLANs

the dev field in the RLB ARP packet handler was set to NULL to wildcard
and accommodate balancing VLANs on top of bonds.

This has the side-effect of the packet handler being called against
other, non RLB-enabled bonds, and a kernel oops results when it tries to
dereference rx_hashtbl in rlb_update_entry_from_arp(), which won't be
set for those bonds, e.g. active-backup.

With the __netif_receive_skb() changes from:

commit 1f3c8804ac
Author: Andy Gospodarek <andy@greyhouse.net>
Date:   Mon Dec 14 10:48:58 2009 +0000

    bonding: allow arp_ip_targets on separate vlans to use arp validation

frames received on VLANs correctly make their way to the bond's handler,
so we no longer need to wildcard the device.

The oops can be reproduced by:

modprobe bonding

echo active-backup > /sys/class/net/bond0/bonding/mode
echo 100 > /sys/class/net/bond0/bonding/miimon
ifconfig bond0 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
echo +eth0 > /sys/class/net/bond0/bonding/slaves
echo +eth1 > /sys/class/net/bond0/bonding/slaves

echo +bond1 > /sys/class/net/bonding_masters
echo balance-alb > /sys/class/net/bond1/bonding/mode
echo 100 > /sys/class/net/bond1/bonding/miimon
ifconfig bond1 xxx.xxx.xxx.xxx netmask xxx.xxx.xxx.xxx
echo +eth2 > /sys/class/net/bond1/bonding/slaves
echo +eth3 > /sys/class/net/bond1/bonding/slaves

Pass some traffic on bond0.  Boom.

[ Tested, behaves as advertised.  I do not believe a test of the bonding
mode is necessary, as there is no race between the packet handler and
the bonding mode changing (the mode can only change when the device is
closed).  Also updated the log message to include the reproduction and
full commit ids.  -J ]

Signed-off-by: Greg Edwards <greg.edwards@hp.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-24 20:37:48 -07:00
Len Brown
0e1cf38889 Merge branch 'bugzilla-16396' into release 2010-07-24 23:26:22 -04:00
Rafael J. Wysocki
72ad5d77fb ACPI / Sleep: Allow the NVS saving to be skipped during suspend to RAM
Commit 2a6b69765a
(ACPI: Store NVS state even when entering suspend to RAM) caused the
ACPI suspend code save the NVS area during suspend and restore it
during resume unconditionally, although it is known that some systems
need to use acpi_sleep=s4_nonvs for hibernation to work.  To allow
the affected systems to avoid saving and restoring the NVS area
during suspend to RAM and resume, introduce kernel command line
option acpi_sleep=nonvs and make acpi_sleep=s4_nonvs work as its
alias temporarily (add acpi_sleep=s4_nonvs to the feature removal
file).

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16396 .

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-and-tested-by: tomas m <tmezzadra@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-07-24 23:26:09 -04:00
Sage Weil
1dadcce358 ceph: fix dentry lease release
When we embed a dentry lease release notification in a request, invalidate
our lease so we don't think we still have it.  Otherwise we can get all
sorts of incorrect client behavior when multiple clients are interacting
with the same part of the namespace.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-07-23 13:54:21 -07:00
Linus Torvalds
86c65a7857 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  vmlinux.lds: fix .data..init_task output section (fix popwerpc boot)
  powerpc: Fix erroneous lmb->memblock conversions
  powerpc/mm: Add some debug output when hash insertion fails
  powerpc/mm: Fix bugs in huge page hashing
  powerpc/mm: Move around testing of _PAGE_PRESENT in hash code
  powerpc/mm: Handle hypervisor pte insert failure in __hash_page_huge
  powerpc/kexec: Fix boundary case for book-e kexec memory limits
2010-07-23 13:26:16 -07:00
Linus Torvalds
20a52d4f59 Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
  nconfig: Fix segfault when help contains special characters
  kbuild: Fix make rpm
  kbuild: Make the setlocalversion script POSIX-compliant
2010-07-23 13:25:00 -07:00
Linus Torvalds
339a2afcaa Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tools: Fix fallback to cplus_demangle() when bfd_demangle() is not available
  perf annotate: Fix handling of goto labels that are valid hex numbers
  tracing: Properly align linker defined symbols
  perf symbols: Fix directory descriptor leaking
  perf: Fix various display bugs with parent filtering
2010-07-23 13:24:02 -07:00
Sage Weil
8c696737aa ceph: fix leak of dentry in ceph_init_dentry() error path
If we fail to allocate a ceph_dentry_info, don't leak the dn reference.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-07-23 10:02:07 -07:00
Sage Weil
bc4fdca857 ceph: fix pg_mapping leak on pg_temp updates
Free the ceph_pg_mapping structs when they are removed from the pg_temp
rbtree.  Also fix a leak in the __insert_pg_mapping() error path.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-07-23 10:02:06 -07:00
Sage Weil
252af52146 ceph: fix d_release dop for snapdir, snapped dentries
We need to set the d_release dop for snapdir and snapped dentries so that
the ceph_dentry_info struct gets released.  We also use the dcache to
cache readdir results when possible, which only works if we know when
dentries are dropped from the cache.  Since we don't use the dcache for
readdir in the hidden snapdir, avoid that case in ceph_dentry_release.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-07-23 10:02:05 -07:00
Stefano Stabellini
ff4878089e x86: Do not try to disable hpet if it hasn't been initialized before
hpet_disable is called unconditionally on machine reboot if hpet support
is compiled in the kernel.
hpet_disable only checks if the machine is hpet capable but doesn't make
sure that hpet has been initialized.

[ tglx: Made it a one liner and removed the redundant hpet_address check ]

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <alpine.DEB.2.00.1007211726240.22235@kaball-desktop>
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-07-23 12:53:00 +02:00
Stephen Boyd
58f915a311 nconfig: Fix segfault when help contains special characters
nconfig segfaults when help text contains the character '%'. For a quick
example, navigate to the kernel compression options and get the help for
bzip2. Doing so triggers a call to mvwprintw() with a string containing
'%' and no extra arguments to fill in the specifier's value. Fix this
case by printing the literal string retrieved from the kconfig.

 #0  0x00002b52b6b11d83 in vfprintf () from /lib/libc.so.6
 #1  0x00002b52b6bad010 in __vsnprintf_chk () from /lib/libc.so.6
 #2  0x00002b52b623991b in _nc_printf_string () from
 /lib/libncursesw.so.5
 #3  0x00002b52b6234cff in vwprintw () from /lib/libncursesw.so.5
 #4  0x00002b52b6234db9 in mvwprintw () from /lib/libncursesw.so.5
 #5  0x00000000004151d8 in fill_window (win=0x21b64c0,
     text=0x21b62b0 "CONFIG_KERNEL_BZIP2:\n\nIts compression ratio and
     speed is intermediate.\nDecompression speed is slowest among the
     three.  The kernel\nsize is about 10% smaller with bzip2, in
     comparison to gzip.\nBzip2 us"...)
     at scripts/kconfig/nconf.gui.c:229
 #6  0x0000000000416335 in show_scroll_win (main_window=0x21a5630,
         title=0x157fa30 "Bzip2",
 	    text=0x21b62b0 "CONFIG_KERNEL_BZIP2:\n\nIts compression
 	    ratio and speed is intermediate.\nDecompression speed is
 	    slowest among the three.  The kernel\nsize is about 10%
 	    smaller with bzip2, in comparison to gzip.\nBzip2 us"...)
     at scripts/kconfig/nconf.gui.c:535
 #7  0x00000000004055b2 in show_help (menu=0x157f9d0)
         at scripts/kconfig/nconf.c:1257
 #8  0x0000000000405897 in conf_choice (menu=0x157f130)
 	    at scripts/kconfig/nconf.c:1321
 #9  0x0000000000405326 in conf (menu=0x157d130) at
 	    scripts/kconfig/nconf.c:1208
 #10 0x00000000004052e8 in conf (menu=0xb434a0) at
 	    scripts/kconfig/nconf.c:1203
 #11 0x0000000000406092 in main (ac=2, av=0x7fff96a93c38)

Cc: Michal Marek <mmarek@suse.cz>
Cc: Nir Tzachar <nir.tzachar@gmail.com>
Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2010-07-23 11:23:42 +02:00
Avi Kivity
7a73c0283d KVM: Use kmalloc() instead of vmalloc() for KVM_[GS]ET_MSR
We don't need more than a page, and vmalloc() is slower (much
slower recently due to a regression).

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-07-23 09:07:14 +03:00
Xiao Guangrong
6aa0b9dec5 KVM: MMU: fix conflict access permissions in direct sp
In no-direct mapping, we mark sp is 'direct' when we mapping the
guest's larger page, but its access is encoded form upper page-struct
entire not include the last mapping, it will cause access conflict.

For example, have this mapping:
        [W]
      / PDE1 -> |---|
  P[W]          |   | LPA
      \ PDE2 -> |---|
        [R]

P have two children, PDE1 and PDE2, both PDE1 and PDE2 mapping the
same lage page(LPA). The P's access is WR, PDE1's access is WR,
PDE2's access is RO(just consider read-write permissions here)

When guest access PDE1, we will create a direct sp for LPA, the sp's
access is from P, is W, then we will mark the ptes is W in this sp.

Then, guest access PDE2, we will find LPA's shadow page, is the same as
PDE's, and mark the ptes is RO.

So, if guest access PDE1, the incorrect #PF is occured.

Fixed by encode the last mapping access into direct shadow page

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-07-23 09:07:04 +03:00
Benjamin Herrenschmidt
7ffb65f84b Merge commit 'kumar/merge' into merge 2010-07-23 13:46:21 +10:00
Sam Ravnborg
da5e37efe8 vmlinux.lds: fix .data..init_task output section (fix popwerpc boot)
The .data..init_task output section was missing
a load offset causing a popwerpc target to fail to boot.

Sean MacLennan tracked it down to the definition of
INIT_TASK_DATA_SECTION().

There are only two users of INIT_TASK_DATA_SECTION()
in the kernel today: cris and popwerpc.
cris do not support relocatable kernels and is thus not
impacted by this change.

Fix INIT_TASK_DATA_SECTION() to specify load offset like
all other output sections.

Reported-by: Sean MacLennan <smaclennan@pikatech.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-23 13:45:12 +10:00
Benjamin Herrenschmidt
3fdfd99051 powerpc: Fix erroneous lmb->memblock conversions
Oooops... we missed these. We incorrectly converted strings
used when parsing the device-tree on pseries, thus breaking
access to drconf memory and hotplug memory.

While at it, also revert some variable names that represent
something the FW calls "lmb" and thus don't need to be converted
to "memblock".

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
2010-07-23 12:56:57 +10:00
Benjamin Herrenschmidt
4b8692c022 powerpc/mm: Add some debug output when hash insertion fails
This adds some debug output to our MMU hash code to print out some
useful debug data if the hypervisor refuses the insertion (which
should normally never happen).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
2010-07-23 12:56:56 +10:00
Benjamin Herrenschmidt
171aa2caaa powerpc/mm: Fix bugs in huge page hashing
There's a couple of nasty bugs lurking in our huge page hashing code.

First, we don't check the access permission atomically with setting
the _PAGE_BUSY bit, which means that the PTE value we end up using
for the hashing might be different than the one we have checked
the access permissions for.

We've seen cases where that leads us to try to use an invalidated
PTE for hashing, causing all sort of "interesting" issues.

Then, we also failed to set _PAGE_DIRTY on a write access.

Finally, a minor tweak but we should return 0 when we find the
PTE busy, in order to just re-execute the access, rather than 1
which means going to do_page_fault().

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
2010-07-23 12:55:21 +10:00
Benjamin Herrenschmidt
ca91e6c09d powerpc/mm: Move around testing of _PAGE_PRESENT in hash code
Instead of adding _PAGE_PRESENT to the access permission mask
in each low level routine independently, we add it once from
hash_page().

We also move the preliminary access check (the racy one before
the PTE is locked) up so it applies to the huge page case. This
duplicates code in __hash_page_huge() which we'll remove in a
subsequent patch to fix a race in there.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-23 08:53:23 +10:00
Anton Blanchard
b1623e7eb2 powerpc/mm: Handle hypervisor pte insert failure in __hash_page_huge
If the hypervisor gives us an error on a hugepage insert we panic. The
normal page code already handles this by returning an error instead and we end
calling low_hash_fault which will just kill the task if possible.

The patch below does a similar thing for the hugepage case.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-23 08:44:51 +10:00
Len Brown
bbac30edb3 Merge branch 'misc' into release 2010-07-22 18:19:12 -04:00
Len Brown
4a973f2495 Merge branch 'bugzilla-15886' into release 2010-07-22 18:18:28 -04:00
Len Brown
be48b11573 Merge branch 'bugzilla-102904-workaround' into release 2010-07-22 18:18:18 -04:00
Len Brown
27568d8e5f Merge branch 'bugzilla-16244' into release 2010-07-22 18:18:05 -04:00
Len Brown
855977ef6d Merge branch 'bugzilla-16271' into release 2010-07-22 18:17:39 -04:00
Len Brown
840ba24dd6 Merge branch 'bugzilla-16357' into release 2010-07-22 18:17:33 -04:00
Alexey Shvetsov
41a8730c23 wimax/i2400m: Add PID & VID for Intel WiMAX 6250
This version of intel wimax device was found in my IBM ThinkPad x201

Signed-off-by: Alexey Shvetsov <alexxy@gentoo.org>
2010-07-22 14:50:34 -07:00
Len Brown
d3e7e99f2f ACPI: create "processor.bm_check_disable" boot param
processor.bm_check_disable=1" prevents Linux from checking BM_STS
before entering C3-type cpu power states.

This may be useful for a system running acpi_idle
where the BIOS exports FADT C-states, _CST IO C-states,
or _CST FFH C-states with the BM_STS bit set;
while configuring the chipset to set BM_STS
more frequently than perhaps is optimal.

Note that such systems may have been developed
using a tickful OS that would quickly clear BM_STS,
rather than a tickless OS that may go for some time
between checking and clearing BM_STS.

Note also that an alternative for newer systems
is to use the intel_idle driver, which always
ignores BM_STS, relying Linux device drivers
to register constraints explicitly via PM_QOS.

https://bugzilla.kernel.org/show_bug.cgi?id=15886

Signed-off-by: Len Brown <len.brown@intel.com>
2010-07-22 17:23:10 -04:00
Len Brown
718be4aaf3 ACPI: skip checking BM_STS if the BIOS doesn't ask for it
It turns out that there is a bit in the _CST for Intel FFH C3
that tells the OS if we should be checking BM_STS or not.

Linux has been unconditionally checking BM_STS.
If the chip-set is configured to enable BM_STS,
it can retard or completely prevent entry into
deep C-states -- as illustrated by turbostat:

http://userweb.kernel.org/~lenb/acpi/utils/pmtools/turbostat/

ref: Intel Processor Vendor-Specific ACPI Interface Specification
table 4 "_CST FFH GAS Field Encoding"
Bit 1: Set to 1 if OSPM should use Bus Master avoidance for this C-state

https://bugzilla.kernel.org/show_bug.cgi?id=15886

Signed-off-by: Len Brown <len.brown@intel.com>
2010-07-22 16:54:27 -04:00
Sage Weil
a0dff78dab ceph: avoid dcache readdir for snapdir
We should always go to the MDS for readdir on the hidden snapdir.  The
set of snapshots can change at any time; the client can't trust its cache
for that.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-07-22 13:50:45 -07:00
Brian Haley
64e724f62a ipv6: Don't add routes to ipv6 disabled interfaces.
If the interface has IPv6 disabled, don't add a multicast or
link-local route since we won't be adding a link-local address.

Reported-by: Mahesh Kelkar <maheshkelkar@gmail.com>
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-22 13:41:32 -07:00
Conny Seidel
8a4fd31e0e perf tools: Fix fallback to cplus_demangle() when bfd_demangle() is not available
make version 3.80 doesn't support "else ifdef" on the same line, also it
doesn't support unindented nested constructs.

Build fails with:
Makefile:608: Extraneous text after `else' directive
Makefile:611: *** only one `else' per conditional.  Stop.

This patch fixes the build for make 3.80.

Cc: Ingo Molnar <mingo@elte.hu>,
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1278430783-17259-1-git-send-email-conny.seidel@amd.com>
Signed-off-by: Conny Seidel <conny.seidel@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-07-22 17:30:39 -03:00
David S. Miller
be2b6e6235 net: Fix skb_copy_expand() handling of ->csum_start
It should only be adjusted if ip_summed == CHECKSUM_PARTIAL.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-22 13:27:09 -07:00
Andrea Shepard
00c5a9834b net: Fix corruption of skb csum field in pskb_expand_head() of net/core/skbuff.c
Make pskb_expand_head() check ip_summed to make sure csum_start is really
csum_start and not csum before adjusting it.

This fixes a bug I encountered using a Sun Quad-Fast Ethernet card and VLANs.
On my configuration, the sunhme driver produces skbs with differing amounts
of headroom on receive depending on the packet size.  See line 2030 of
drivers/net/sunhme.c; packets smaller than RX_COPY_THRESHOLD have 52 bytes
of headroom but packets larger than that cutoff have only 20 bytes.

When these packets reach the VLAN driver, vlan_check_reorder_header()
calls skb_cow(), which, if the packet has less than NET_SKB_PAD (== 32) bytes
of headroom, uses pskb_expand_head() to make more.

Then, pskb_expand_head() needs to adjust a lot of offsets into the skb,
including csum_start.  Since csum_start is a union with csum, if the packet
has a valid csum value this will corrupt it, which was the effect I observed.
The sunhme hardware computes receive checksums, so the skbs would be created
by the driver with ip_summed == CHECKSUM_COMPLETE and a valid csum field, and
then pskb_expand_head() would corrupt the csum field, leading to an "hw csum
error" message later on, for example in icmp_rcv() for pings larger than the
sunhme RX_COPY_THRESHOLD.

On the basis of the comment at the beginning of include/linux/skbuff.h,
I believe that the csum_start skb field is only meaningful if ip_csummed is
CSUM_PARTIAL, so this patch makes pskb_expand_head() adjust it only in that
case to avoid corrupting a valid csum value.

Please see my more in-depth disucssion of tracking down this bug for
more details if you like:

http://puellavulnerata.livejournal.com/112186.html
http://puellavulnerata.livejournal.com/112567.html
http://puellavulnerata.livejournal.com/112891.html
http://puellavulnerata.livejournal.com/113096.html
http://puellavulnerata.livejournal.com/113591.html

I am not subscribed to this list, so please CC me on replies.

Signed-off-by: Andrea Shepard <andrea@persephoneslair.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-22 13:25:18 -07:00
Herbert Xu
8a35747a5d macvtap: Limit packet queue length
Mark Wagner reported OOM symptoms when sending UDP traffic over
a macvtap link to a kvm receiver.

This appears to be caused by the fact that macvtap packet queues
are unlimited in length.  This means that if the receiver can't
keep up with the rate of flow, then we will hit OOM. Of course
it gets worse if the OOM killer then decides to kill the receiver.

This patch imposes a cap on the packet queue length, in the same
way as the tuntap driver, using the device TX queue length.

Please note that macvtap currently has no way of giving congestion
notification, that means the software device TX queue cannot be
used and packets will always be dropped once the macvtap driver
queue fills up.

This shouldn't be a great problem for the scenario where macvtap
is used to feed a kvm receiver, as the traffic is most likely
external in origin so congestion notification can't be applied
anyway.

Of course, if anybody decides to complain about guest-to-guest
UDP packet loss down the track, then we may have to revisit this.

Incidentally, this patch also fixes a real memory leak when
macvtap_get_queue fails.

Chris Wright noticed that for this patch to work, we need a
non-zero TX queue length.  This patch includes his work to change
the default macvtap TX queue length to 500.

Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-22 13:08:56 -07:00
Linus Torvalds
b37fa16e78 Linux 2.6.35-rc6 2010-07-22 12:13:38 -07:00
Linus Torvalds
27efd7e2e6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: synaptics - relax capability ID checks on newer hardware
  Input: twl40300-keypad - fix handling of "all ground" rows
  Input: gamecon - reference correct pad in gc_psx_command()
  Input: gamecon - reference correct input device in NES mode
  Input: w90p910_keypad - change platfrom driver name to 'nuc900-kpi'
  Input: i8042 - add Gigabyte Spring Peak to dmi_noloop_table
  Input: qt2160 - rename kconfig symbol name
2010-07-22 11:46:15 -07:00
Linus Torvalds
84d4db0e22 Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon/kms: add quirk to make HP DV5000 laptop resume
  drm/radeon/kms: fix RADEON_INFO_CRTC_FROM_ID info ioctl
  Fix ttm_page_alloc.c build breakage
  drm/radeon/kms: fix legacy LVDS dpms sequence
  drm/radeon/kms: drop taking lock around crtc lookup.
2010-07-22 11:45:57 -07:00
Linus Torvalds
38ea6e62d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: talitos - fix bug in sg_copy_end_to_buffer
2010-07-22 11:45:23 -07:00
Linus Torvalds
2851785deb Merge branch 'x86/auditsyscall' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland
* 'x86/auditsyscall' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
  x86: auditsyscall: fix fastpath return value after reschedule
2010-07-22 11:45:02 -07:00