Commit graph

211523 commits

Author SHA1 Message Date
Jeremy Fitzhardinge
42ee1471e9 xen: implement "extra" memory to reserve space for pages not present at boot
When using the e820 map to get the initial pseudo-physical address space,
look for either Xen-provided memory which doesn't lie within an E820
region, or an E820 RAM region which extends beyond the Xen-provided
memory range.

Count these pages, and add them to a new "extra memory" range.  This range
has an E820 RAM range to describe it - so the kernel will allocate page
structures for it - but it is also marked reserved so that the kernel
will not attempt to use it.

The balloon driver can then add this range as a set of currently
ballooned-out pages, which can be used to extend the domain beyond its
original size.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:27 -07:00
Ian Campbell
35ae11fd14 xen: Use host-provided E820 map
Rather than simply using a flat memory map from Xen, use its provided
E820 map.  This allows the domain builder to tell the domain to reserve
space for more pages than those initially provided at domain-build time.

It also allows the host to specify holes in the address space (for
PCI-passthrough, for example).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:27 -07:00
Jeremy Fitzhardinge
cfd8951e08 xen: don't map missing memory
When setting up a pte for a missing pfn (no matching mfn), just create
an empty pte rather than a junk mapping.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:26 -07:00
Jeremy Fitzhardinge
33a847502b xen: defer building p2m mfn structures until kernel is mapped
When building mfn parts of p2m structure, we rely on being able to
use mfn_to_virt, which in turn requires kernel to be mapped into
the linear area (which is distinct from the kernel image mapping
on 64-bit).  Defer calling xen_build_mfn_list_list() until after
xen_setup_kernel_pagetable();

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:25 -07:00
Jeremy Fitzhardinge
c3798062f1 xen: add return value to set_phys_to_machine()
set_phys_to_machine() can return false on failure, which means a memory
allocation failure for the p2m structure.  It can only fail if setting
the mfn for a pfn in previously unused address space.  It is guaranteed
to succeed if you're setting a mapping to INVALID_P2M_ENTRY or updating
the mfn for an existing pfn.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:25 -07:00
Jeremy Fitzhardinge
58e05027b5 xen: convert p2m to a 3 level tree
Make the p2m structure a 3 level tree which covers the full possible
physical space.

The p2m structure contains mappings from the domain's pfns to system-wide
mfns.  The structure has 3 levels and two roots.  The first root is for
the domain's own use, and is linked with virtual addresses.  The second
is all mfn references, and is used by Xen on save/restore to allow it to
update the p2m mapping for the domain.

At boot, the domain builder provides a simple flat p2m array for all the
initially present pages.  We construct the two levels above that using
the early_brk allocator.  After early boot time, set_phys_to_machine()
will allocate any missing levels using the normal kernel allocator
(at GFP_KERNEL, so it must be called in a normal blocking context).

Because the early_brk() API requires us to pre-reserve the maximum amount
of memory we could allocate, there is still a CONFIG_XEN_MAX_DOMAIN_MEMORY
config option, but its only negative side-effect is to increase the
kernel's apparent bss size.  However, since all unused brk memory is
returned to the heap, there's no real downside to making it large.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:24 -07:00
Jeremy Fitzhardinge
bbbf61eff9 xen: make install_p2mtop_page() static
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:23 -07:00
Jeremy Fitzhardinge
1f2d9dd309 xen: set the actual extent of the mfn_list_list
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:23 -07:00
Jeremy Fitzhardinge
b7eb4ad391 xen: set shared_info->arch.max_pfn to max_p2m_pfn
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:22 -07:00
Jeremy Fitzhardinge
3588fe2e3f xen/events: change to using fasteoi
Change event delivery to:
 - mask+clear event in the upcall function
 - use handle_fasteoi_irq as the handler
 - unmask in the eoi function (and handle migration)

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:21 -07:00
Jeremy Fitzhardinge
1e17fc7eff xen: remove noise about registering vcpu info
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:21 -07:00
Jeremy Fitzhardinge
764f0138b9 xen: allocate level1_ident_pgt
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:20 -07:00
Jeremy Fitzhardinge
f0991802bb xen: use early_brk for level2_kernel_pgt
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:19 -07:00
Jeremy Fitzhardinge
a2e8752987 xen: allocate p2m size based on actual max size
Allocate p2m tables based on the actual runtime maximum pfn rather than
the static config-time limit.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:19 -07:00
Jeremy Fitzhardinge
a171ce6e7b xen: dynamically allocate p2m space
Use early brk mechanism to allocate p2m tables, to save memory when
booting non-Xen.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:18 -07:00
Jeremy Fitzhardinge
5e941c0939 x86: add RESERVE_BRK_ARRAY() helper
Useful when converting static arrays into boottime brk allocated objects.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-10-22 12:57:17 -07:00
Linus Torvalds
cd07202cc8 Linux 2.6.36-rc8 2010-10-14 16:26:43 -07:00
Linus Torvalds
3aa0ce825a Un-inline the core-dump helper functions
Tony Luck reports that the addition of the access_ok() check in commit
0eead9ab41 ("Don't dump task struct in a.out core-dumps") broke the
ia64 compile due to missing the necessary header file includes.

Rather than add yet another include (<asm/unistd.h>) to make everything
happy, just uninline the silly core dump helper functions and move the
bodies to fs/exec.c where they make a lot more sense.

dump_seek() in particular was too big to be an inline function anyway,
and none of them are in any way performance-critical.  And we really
don't need to mess up our include file headers more than they already
are.

Reported-and-tested-by: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-14 14:32:06 -07:00
Linus Torvalds
ae42d8d441 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  ehea: Fix a checksum issue on the receive path
  net: allow FEC driver to use fixed PHY support
  tg3: restore rx_dropped accounting
  b44: fix carrier detection on bind
  net: clear heap allocations for privileged ethtool actions
  NET: wimax, fix use after free
  ATM: iphase, remove sleep-inside-atomic
  ATM: mpc, fix use after free
  ATM: solos-pci, remove use after free
  net/fec: carrier off initially to avoid root mount failure
  r8169: use device model DMA API
  r8169: allocate with GFP_KERNEL flag when able to sleep
2010-10-14 11:19:44 -07:00
Linus Torvalds
0eead9ab41 Don't dump task struct in a.out core-dumps
akiphie points out that a.out core-dumps have that odd task struct
dumping that was never used and was never really a good idea (it goes
back into the mists of history, probably the original core-dumping
code).  Just remove it.

Also do the access_ok() check on dump_write().  It probably doesn't
matter (since normal filesystems all seem to do it anyway), but he
points out that it's normally done by the VFS layer, so ...

[ I suspect that we should possibly do "vfs_write()" instead of
  calling ->write directly.  That also does the whole fsnotify and write
  statistics thing, which may or may not be a good idea. ]

And just to be anal, do this all for the x86-64 32-bit a.out emulation
code too, even though it's not enabled (and won't currently even
compile)

Reported-by: akiphie <akiphie@lavabit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-14 10:57:40 -07:00
Linus Torvalds
53eeb64e80 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
  ioat2: fix performance regression
2010-10-13 16:51:59 -07:00
Linus Torvalds
8c35bf368c Merge branch 'for-2.6.36' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux:
  nfsd: fix BUG at fs/nfsd/nfsfh.h:199 on unlink
2010-10-13 16:51:29 -07:00
Linus Torvalds
fec896e21b Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  ring-buffer: Fix typo of time extends per page
  perf, MIPS: Support cross compiling of tools/perf for MIPS
  perf: Fix incorrect copy_from_user() usage
2010-10-13 16:50:23 -07:00
Linus Torvalds
d94bc4fc24 Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: relax ioremap prohibition (309caa9) for -final and -stable
  ARM: 6440/1: ep93xx: DMA: fix channel_disable
  cpuimx27: fix i2c bus selection
  cpuimx27: fix compile when ULPI is selected
  ARM: 6435/1: Fix HWCAP_TLS flag for ARM11MPCore/Cortex-A9
  ARM: 6436/1: AT91: Fix power-saving in idle-mode on 926T processors
  ARM: fix section mismatch warnings in Versatile Express
  ARM: 6412/1: kprobes-decode: add support for MOVW instruction
  ARM: 6419/1: mmu: Fix MT_MEMORY and MT_MEMORY_NONCACHED pte flags
  ARM: 6416/1: errata: faulty hazard checking in the Store Buffer may lead to data corruption
2010-10-13 16:35:33 -07:00
Linus Torvalds
7081319658 Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
  omap: iommu-load cam register before flushing the entry
2010-10-13 16:35:05 -07:00
Linus Torvalds
a56f31a0c6 Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon/kms: Silent spurious error message
  drm/radeon/kms: fix bad cast/shift in evergreen.c
  drm/radeon/kms: make TV/DFP table info less verbose
  drm/radeon/kms: leave certain CP int bits enabled
  drm/radeon/kms: avoid corner case issue with unmappable vram V2
2010-10-13 16:34:46 -07:00
Linus Torvalds
509d4486bd Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, numa: For each node, register the memory blocks actually used
  x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order
  x86, mce, therm_throt.c: Fix missing curly braces in error handling logic
2010-10-13 16:34:23 -07:00
Dan Williams
c50a898fd4 ioat2: fix performance regression
Commit 0793448 "DMAENGINE: generic channel status v2" changed the interface for
how dma channel progress is retrieved.  It inadvertently exported an internal
helper function ioat_tx_status() instead of ioat_dma_tx_status().  The latter
polls the hardware to get the latest completion state, while the helper just
evaluates the current state without touching hardware.  The effect is that we
end up waiting for completion timeouts or descriptor allocation errors before
the completion state is updated.

iperf (before fix):
[SUM]  0.0-41.3 sec   364 MBytes  73.9 Mbits/sec

iperf (after fix):
[SUM]  0.0- 4.5 sec   499 MBytes   940 Mbits/sec

This is a regression starting with 2.6.35.

Cc: <stable@kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Reported-by: Richard Scobie <richard@sauce.co.nz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2010-10-13 15:43:10 -07:00
Breno Leitao
71085ce828 ehea: Fix a checksum issue on the receive path
Currently we set all skbs with CHECKSUM_UNNECESSARY, even
those whose protocol we don't know. This patch just
add the CHECKSUM_COMPLETE tag for non TCP/UDP packets.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-13 14:24:59 -07:00
J. Bruce Fields
b1e86db1de nfsd: fix BUG at fs/nfsd/nfsfh.h:199 on unlink
As of commit 43a9aa64a2 "NFSD:
Fill in WCC data for REMOVE, RMDIR, MKNOD, and MKDIR", we sometimes call
fh_unlock on a filehandle that isn't fully initialized.

We should fix up the callers, but as a quick fix it is also sufficient
just to remove this assertion.

Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-13 15:48:55 -04:00
Greg Ungerer
6fcc040f02 net: allow FEC driver to use fixed PHY support
At least one board using the FEC driver does not have a conventional
PHY attached to it, it is directly connected to a somewhat simple
ethernet switch (the board is the SnapGear/LITE, and the attached
4-port ethernet switch is a RealTek RTL8305). This switch does not
present the usual register interface of a PHY, it presents nothing.
So a PHY scan will find nothing - it finds ID's of 0 for each PHY
on the attached MII bus.

After the FEC driver was changed to use phylib for supporting PHYs
it no longer works on this particular board/switch setup.

Add code support to use a fixed phy if no PHY is found on the MII bus.
This is based on the way the cpmac.c driver solved this same problem.

Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-13 09:56:31 -07:00
Russell King
06c1088448 ARM: relax ioremap prohibition (309caa9) for -final and -stable
... but produce a big warning about the problem as encouragement
for people to fix their drivers.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-10-13 00:19:03 +01:00
Russell King
841f48a849 Merge branch 'for-rmk' of git://git.pengutronix.de/git/imx/linux-2.6 2010-10-12 22:43:36 +01:00
Mika Westerberg
10d48b3934 ARM: 6440/1: ep93xx: DMA: fix channel_disable
When channel_disable() is called, it disables per channel interrupts and
waits until channels state becomes STATE_STALL, and then disables the
channel. Now, if the DMA transfer is disabled while the channel is in
STATE_NEXT we will not wait anything and disable the channel immediately.
This seems to cause weird data corruption for example in audio transfers.

Fix is to wait while we are in STATE_NEXT or STATE_ON and only then
disable the channel.

Signed-off-by: Mika Westerberg <mika.westerberg@iki.fi>
Acked-by: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-10-12 22:43:19 +01:00
Linus Torvalds
0acc1b2afb Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Move TSC reset out of vmcb_init
  KVM: x86: Fix SVM VMCB reset
2010-10-12 09:16:01 -07:00
Steven Rostedt
d01343244a ring-buffer: Fix typo of time extends per page
Time stamps for the ring buffer are created by the difference between
two events. Each page of the ring buffer holds a full 64 bit timestamp.
Each event has a 27 bit delta stamp from the last event. The unit of time
is nanoseconds, so 27 bits can hold ~134 milliseconds. If two events
happen more than 134 milliseconds apart, a time extend is inserted
to add more bits for the delta. The time extend has 59 bits, which
is good for ~18 years.

Currently the time extend is committed separately from the event.
If an event is discarded before it is committed, due to filtering,
the time extend still exists. If all events are being filtered, then
after ~134 milliseconds a new time extend will be added to the buffer.

This can only happen till the end of the page. Since each page holds
a full timestamp, there is no reason to add a time extend to the
beginning of a page. Time extends can only fill a page that has actual
data at the beginning, so there is no fear that time extends will fill
more than a page without any data.

When reading an event, a loop is made to skip over time extends
since they are only used to maintain the time stamp and are never
given to the caller. As a paranoid check to prevent the loop running
forever, with the knowledge that time extends may only fill a page,
a check is made that tests the iteration of the loop, and if the
iteration is more than the number of time extends that can fit in a page
a warning is printed and the ring buffer is disabled (all of ftrace
is also disabled with it).

There is another event type that is called a TIMESTAMP which can
hold 64 bits of data in the theoretical case that two events happen
18 years apart. This code has not been implemented, but the name
of this event exists, as well as the structure for it. The
size of a TIMESTAMP is 16 bytes, where as a time extend is only
8 bytes. The macro used to calculate how many time extends can fit on
a page used the TIMESTAMP size instead of the time extend size
cutting the amount in half.

The following test case can easily trigger the warning since we only
need to have half the page filled with time extends to trigger the
warning:

 # cd /sys/kernel/debug/tracing/
 # echo function > current_tracer
 # echo 'common_pid < 0' > events/ftrace/function/filter
 # echo > trace
 # echo 1 > trace_marker
 # sleep 120
 # cat trace

Enabling the function tracer and then setting the filter to only trace
functions where the process id is negative (no events), then clearing
the trace buffer to ensure that we have nothing in the buffer,
then write to trace_marker to add an event to the beginning of a page,
sleep for 2 minutes (only 35 seconds is probably needed, but this
guarantees the bug), and then finally reading the trace which will
trigger the bug.

This patch fixes the typo and prevents the false positive of that warning.

Reported-by: Hans J. Koch <hjk@linutronix.de>
Tested-by: Hans J. Koch <hjk@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-12 12:06:43 -04:00
Deng-Cheng Zhu
c1e028ef40 perf, MIPS: Support cross compiling of tools/perf for MIPS
Changes:
 v4: Fix the cosmetic issue of redundant dot-ops
 v3: Change rmb() to use SYNC
 v2: Include mips unistd.h and define rmb()/cpu_relax() in tools/perf/perf.h

Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 13:34:37 +02:00
Jean Delvare
a8c051f0c8 drm/radeon/kms: Silent spurious error message
I see the following error message in my kernel log from time to time:
radeon 0000:07:00.0: ffff88007c334000 reserve failed for wait
radeon 0000:07:00.0: ffff88007c334000 reserve failed for wait

After investigation, it turns out that there's nothing to be afraid of
and everything works as intended. So remove the spurious log message.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-10-12 20:18:07 +10:00
Alex Deucher
d31dba5848 drm/radeon/kms: fix bad cast/shift in evergreen.c
Missing parens.

fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=30718

Reported-by: Dave Gilbert <freedesktop@treblig.org>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-10-12 20:18:06 +10:00
Alex Deucher
40f76d81fb drm/radeon/kms: make TV/DFP table info less verbose
Make TV standard and DFP table revisions debug only.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-10-12 20:18:05 +10:00
Alex Deucher
3555e53b5b drm/radeon/kms: leave certain CP int bits enabled
These bits are used for internal communication and should
be left enabled.  This may fix s/r issues on some systems.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-10-12 20:18:04 +10:00
Jerome Glisse
c919b371cb drm/radeon/kms: avoid corner case issue with unmappable vram V2
We should not allocate any object into unmappable vram if we
have no means to access them which on all GPU means having the
CP running and on newer GPU having the blit utility working.

This patch limit the vram allocation to visible vram until
we have acceleration up and running.

Note that it's more than unlikely that we run into any issue
related to that as when acceleration is not woring userspace
should allocate any object in vram beside front buffer which
should fit in visible vram.

V2 use real_vram_size as mc_vram_size could be bigger than
   the actual amount of vram

[airlied: fixup r700_cp_stop case]

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-10-12 20:17:43 +10:00
John Blackwood
ad0cf3478d perf: Fix incorrect copy_from_user() usage
perf events: repair incorrect use of copy_from_user

This makes the perf_event_period() return 0 instead of
-EFAULT on success.

Signed-off-by: John Blackwood<john.blackwood@ccur.com>
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100928220311.GA18145@tsunami.ccur.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-12 11:45:01 +02:00
Eric Paris
7c5347733d fanotify: disable fanotify syscalls
This patch disables the fanotify syscalls by just not building them and
letting the cond_syscall() statements in kernel/sys_ni.c redirect them
to sys_ni_syscall().

It was pointed out by Tvrtko Ursulin that the fanotify interface did not
include an explicit prioritization between groups.  This is necessary
for fanotify to be usable for hierarchical storage management software,
as they must get first access to the file, before inotify-like notifiers
see the file.

This feature can be added in an ABI compatible way in the next release
(by using a number of bits in the flags field to carry the info) but it
was suggested by Alan that maybe we should just hold off and do it in
the next cycle, likely with an (new) explicit argument to the syscall.
I don't like this approach best as I know people are already starting to
use the current interface, but Alan is all wise and noone on list backed
me up with just using what we have.  I feel this is needlessly ripping
the rug out from under people at the last minute, but if others think it
needs to be a new argument it might be the best way forward.

Three choices:
Go with what we got (and implement the new feature next cycle).  Add a
new field right now (and implement the new feature next cycle).  Wait
till next cycle to release the ABI (and implement the new feature next
cycle).  This is number 3.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-11 18:15:28 -07:00
Eric Dumazet
b0057c51db tg3: restore rx_dropped accounting
commit 511d22247b (tg3: 64 bit stats on all arches), overlooked the
rx_dropped accounting.

We use a full "struct rtnl_link_stats64" to hold rx_dropped value, but
forgot to report it in tg3_get_stats64().

Use an "unsigned long" instead to shrink "struct tg3" by 176 bytes, and
report this value to stats readers.

Increment rx_dropped counter for oversized frames.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Michael Chan <mchan@broadcom.com>
CC: Matt Carlson <mcarlson@broadcom.com>
Acked-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11 16:06:24 -07:00
Paul Fertser
bcf64aa379 b44: fix carrier detection on bind
For carrier detection to work properly when binding the driver with a cable
unplugged, netif_carrier_off() should be called after register_netdev(),
not before.

Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11 15:45:35 -07:00
Yinghai Lu
73cf624d02 x86, numa: For each node, register the memory blocks actually used
Russ reported SGI UV is broken recently. He said:

| The SRAT table shows that memory range is spread over two nodes.
|
| SRAT: Node 0 PXM 0 100000000-800000000
| SRAT: Node 1 PXM 1 800000000-1000000000
| SRAT: Node 0 PXM 0 1000000000-1080000000
|
|Previously, the kernel early_node_map[] would show three entries
|with the proper node.
|
|[    0.000000]     0: 0x00100000 -> 0x00800000
|[    0.000000]     1: 0x00800000 -> 0x01000000
|[    0.000000]     0: 0x01000000 -> 0x01080000
|
|The problem is recent community kernel early_node_map[] shows
|only two entries with the node 0 entry overlapping the node 1
|entry.
|
|    0: 0x00100000 -> 0x01080000
|    1: 0x00800000 -> 0x01000000

After looking at the changelog, Found out that it has been broken for a while by
following commit

|commit 8716273cae
|Author: David Rientjes <rientjes@google.com>
|Date:   Fri Sep 25 15:20:04 2009 -0700
|
|    x86: Export srat physical topology

Before that commit, register_active_regions() is called for every SRAT memory
entry right away.

Use nodememblk_range[] instead of nodes[] in order to make sure we
capture the actual memory blocks registered with each node.  nodes[]
contains an extended range which spans all memory regions associated
with a node, but that does not mean that all the memory in between are
included.

Reported-by: Russ Anderson <rja@sgi.com>
Tested-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4CB27BDF.5000800@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org> 2.6.33 .34 .35 .36
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-10-11 15:26:15 -07:00
Kees Cook
b00916b189 net: clear heap allocations for privileged ethtool actions
Several other ethtool functions leave heap uncleared (potentially) by
drivers. Some interfaces appear safe (eeprom, etc), in that the sizes
are well controlled. In some situations (e.g. unchecked error conditions),
the heap will remain unchanged in areas before copying back to userspace.
Note that these are less of an issue since these all require CAP_NET_ADMIN.

Cc: stable@kernel.org
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11 12:23:25 -07:00
Jiri Slaby
0aa7deadff NET: wimax, fix use after free
Stanse found that i2400m_rx frees skb, but still uses skb->len even
though it has skb_len defined. So use skb_len properly in the code.

And also define it unsinged int rather than size_t to solve
compilation warnings.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Cc: linux-wimax@intel.com
Acked-by: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11 11:05:43 -07:00
Jiri Slaby
ec622ab072 ATM: iphase, remove sleep-inside-atomic
Stanse found that ia_init_one locks a spinlock and inside of that it
calls ia_start which calls:
* request_irq
* tx_init which does kmalloc(GFP_KERNEL)

Both of them can thus sleep and result in a deadlock. I don't see a
reason to have a per-device spinlock there which is used only there
and inited right before the lock location. So remove it completely.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-11 11:05:42 -07:00