It's possible for init_memory_mapping() to fail to map the entire region
if it crosses a boundary, so ensure that we complete the mapping.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-5-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Experimentation with various EFI implementations has shown that functions
outside runtime services will still update their pointers if
SetVirtualAddressMap() is called with memory descriptors outside the
runtime area. This is obviously insane, and therefore is unsurprising.
Evidence from instrumenting another EFI implementation suggests that it
only passes the set of descriptors covering runtime regions, so let's
avoid any problems by doing the same. Runtime descriptors are copied to
a separate memory map, and only that map is passed back to the firmware.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-4-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Some firmware implementations assume that physically contiguous regions
will be contiguous in virtual address space. This assumption is, obviously,
entirely unjustifiable. Said firmware implementations lack the good grace
to handle their failings in a measured and reasonable manner, instead
tending to shit all over address space and oopsing the kernel.
In an ideal universe these firmware implementations would simultaneously
catch fire and cease to be a problem, but since some of them are present
in attractively thin and shiny metal devices vanity wins out and some
poor developer spends an extended period of time surrounded by a
growing array of empty bottles until the underlying reason becomes
apparent. Said developer presents this patch, which simply merges
adjacent regions if they happen to be contiguous and have the same EFI
memory type and caching attributes.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-3-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The core EFI code and 64-bit EFI code currently have independent
implementations of code for setting memory regions as executable or not.
Let's consolidate them.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-2-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The spec says that SetVirtualAddressMap doesn't work once you're in
virtual mode, so there's no point in having infrastructure for calling
it from there.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Link: http://lkml.kernel.org/r/1304623186-18261-1-git-send-email-mjg@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* 'stable/bug-fixes-for-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top
xen/mmu: Add workaround "x86-64, mm: Put early page table high"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
sysctl: net: call unregister_net_sysctl_table where needed
Revert: veth: remove unneeded ifname code from veth_newlink()
smsc95xx: fix reset check
tg3: Fix failure to enable WoL by default when possible
networking: inappropriate ioctl operation should return ENOTTY
amd8111e: trivial typo spelling: Negotitate -> Negotiate
ipv4: don't spam dmesg with "Using LC-trie" messages
af_unix: Only allow recv on connected seqpacket sockets.
mii: add support of pause frames in mii_get_an
net: ftmac100: fix scheduling while atomic during PHY link status change
usbnet: Transfer of maintainership
usbnet: add support for some Huawei modems with cdc-ether ports
bnx2: cancel timer on device removal
iwl4965: fix "Received BA when not expected"
iwlagn: fix "Received BA when not expected"
dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085
usbnet: Resubmit interrupt URB if device is open
iwl4965: fix "TX Power requested while scanning"
iwlegacy: led stay solid on when no traffic
b43: trivial: update module info about ucode16_mimo firmware
...
The use of base for %ebx in this file is arbitrary, *except* that we
also use it to compute the real-mode segment. Therefore, make it so
that r_base really is the true address to which %ebx points.
This resolves kernel bugzilla 33302.
Reported-and-tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-08os5wi3yq1no0y4i5m4z7he@git.kernel.org
mask_rw_pte is currently checking if a pfn is a pagetable page if it
falls in the range pgt_buf_start - pgt_buf_end but that is incorrect
because pgt_buf_end is a moving target: pgt_buf_top is the real
boundary.
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
As a consequence of the commit:
commit 4b239f458c
Author: Yinghai Lu <yinghai@kernel.org>
Date: Fri Dec 17 16:58:28 2010 -0800
x86-64, mm: Put early page table high
it causes the Linux kernel to crash under Xen:
mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7)
(XEN) mm.c:3027:d0 Error while pinning mfn b1d89
(XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...
The reason is that at some point init_memory_mapping is going to reach
the pagetable pages area and map those pages too (mapping them as normal
memory that falls in the range of addresses passed to init_memory_mapping
as argument). Some of those pages are already pagetable pages (they are
in the range pgt_buf_start-pgt_buf_end) therefore they are going to be
mapped RO and everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW. When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).
In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).
For this reason, this function is introduced which is called _after_
the init_memory_mapping has completed (in a perfect world we would
call this function from init_memory_mapping, but lets ignore that).
Because we are called _after_ init_memory_mapping the pgt_buf_[start,
end,top] have all changed to new values (b/c another init_memory_mapping
is called). Hence, the first time we enter this function, we save
away the pgt_buf_start value and update the pgt_buf_[end,top].
When we detect that the "old" pgt_buf_start through pgt_buf_end
PFNs have been reserved (so memblock_x86_reserve_range has been called),
we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top.
And then we update those "old" pgt_buf_[end|top] with the new ones
so that we can redo this on the next pagetable.
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Updated with Jeremy's comments]
[v2: Added the crash output]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm: (47 commits)
CLKDEV: Fix clkdev return value for NULL clk case
ARM: 6891/1: prevent heap corruption in OABI semtimedop
ARM: kprobes: Tidy-up kprobes-decode.c
ARM: kprobes: Add emulation of hint instructions like NOP and WFI
ARM: kprobes: Add emulation of SBFX, UBFX, BFI and BFC instructions
ARM: kprobes: Add emulation of MOVW and MOVT instructions
ARM: kprobes: Reject probing of undefined data processing instructions
ARM: kprobes: Remove redundant code in space_1111
ARM: kprobes: Fix emulation of PLD instructions
ARM: kprobes: Reject probing of SETEND instructions
ARM: kprobes: Consolidate stub decoding functions
ARM: kprobes: Reject probing of all coprocessor instructions
ARM: kprobes: Fix emulation of USAD8 instructions
ARM: kprobes: Fix emulation of SMUAD, SMUSD and SMMUL instructions
ARM: kprobes: Fix emulation of SXTB16, SXTB, SXTH, UXTB16, UXTB and UXTH instructions
ARM: kprobes: Reject probing of undefined media instructions
ARM: kprobes: Add emulation of RBIT instruction
ARM: kprobes: Reject probing of LDRB instructions which load PC
ARM: kprobes: Fix emulation of LDRD and STRD instructions
ARM: kprobes: Reject probing of LDR/STR instructions which update PC unpredictably
...
numa_cleanup_meminfo() trims each memblk between low (0) and
high (max_pfn) limits and discards empty ones. However, the
emptiness detection incorrectly used equality test. If the
start of a memblk is higher than max_pfn, it is empty but fails
the equality test and doesn't get discarded.
The condition triggers when max_pfn is lower than start of a
NUMA node and results in memory misconfiguration - leading to
WARN_ON()s and other funnies. The bug was discovered in devel
branch where 32bit too uses this code path for NUMA init. If a
node is above the addressing limit, max_pfn ends up lower than
the node triggering this problem.
The failure hasn't been observed on x86-64 but is still possible
with broken hardware e820/NUMA info. As the fix is very low
risk, it would be better to apply it even for 64bit.
Fix it by using >= instead of ==.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Extracted the actual fix from the original patch and rewrote patch description. ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110501171204.GO29280@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Older AMD K8 processors (Revisions A-E) are affected by erratum
400 (APIC timer interrupts don't occur in C states greater than
C1). This, for example, means that X86_FEATURE_ARAT flag should
not be set for these parts.
This addresses regression introduced by commit
b87cf80af3 ("x86, AMD: Set ARAT
feature on AMD processors") where the system may become
unresponsive until external interrupt (such as keyboard input)
occurs. This results, for example, in time not being reported
correctly, lack of progress on the system and other lockups.
Reported-by: Joerg-Volker Peetz <jvpeetz@web.de>
Tested-by: Joerg-Volker Peetz <jvpeetz@web.de>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Boris Ostrovsky <Boris.Ostrovsky@amd.com>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1304113663-6586-1-git-send-email-ostr@amd64.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf, x86, nmi: Move LVT un-masking into irq handlers
perf events, x86: Work around the Nehalem AAJ80 erratum
perf, x86: Fix BTS condition
ftrace: Build without frame pointers on Microblaze
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: ce4100: Configure IOAPIC pins for USB and SATA to level type
x86: devicetree: Configure IOAPIC pin only once
x86, setup: When probing memory with e801, use ax/bx as a pair
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
OMAP3+: voltage: remove initial voltage
OMAP4: Intialize IVA Device in addition to DSP device.
omap: rx51: mark reserved memory earlier
OMAP3: l3: fix for "irq 10: nobody cared" message
arm: omap2: enable smc instruction for sleep34xx
OMAP2/3: hwmod: fix gpio-reset timeouts seen during bootup.
OMAP3: PM: Do not rely on ROM code to restore CM_AUTOIDLE_PLL.AUTO_PERIPH_DPLL
OMAP2+: PM: Fix the saving of CM_AUTOIDLE_PLL register on scratchpad area
OMAP4: clock data: Change DSS clock aliases
OMAP2+: hwmod data: Fix wrong dma_system end address
When CONFIG_OABI_COMPAT is set, the wrapper for semtimedop does not
bound the nsops argument. A sufficiently large value will cause an
integer overflow in allocation size, followed by copying too much data
into the allocated buffer. Fix this by restricting nsops to SEMOPM.
Untested.
Cc: stable@kernel.org
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
pfault, dasd diag and virtio all use the same external interrupt number.
The respective interrupt handlers decide by the subcode if they are
meant to handle the interrupt.
Counting is currently done before looking at the subcode which means
each handler counts an interrupt even if it is not handling it.
Fix this by moving the kstat code after the code which looks at the
subcode.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- Remove coding standard violations reported by checkpatch.pl
- Delete comment about handling of conditional branches which is no
longer true.
- Delete comment at end of file which lists all ARM instructions. This
duplicates data available in the ARM ARM and seems like an
unnecessary maintenance burden to keep this up to date and accurate.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Being able to probe NOP instructions is useful for hard-coding probeable
locations and is used by the kprobes test code.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
These bit field manipulation instructions occur several thousand
times in an ARMv7 kernel.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The MOVW and MOVT instructions account for approximately 7% of all
instructions in a ARMv7 kernel as GCC uses them instead of a literal
pool.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The instruction decoding in space_cccc_000x needs to reject probing of
instructions with undefined patterns as they may in future become
defined and then emulated faultily - as has already happened with the
SMC instruction.
This fix is achieved by testing for the instruction patterns we want to
probe and making the the default fall-through paths reject probes. This
also allows us to remove some explicit tests for instructions that we
wish to reject, as that is now the default action.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The tests to explicitly reject probing CPS, RFE and SRS instructions
are redundant as the default case is now to reject undecoded patterns.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The PLD instructions wasn't being decoded correctly and the emulation
code wasn't adjusting PC correctly.
As the PLD instruction is only a performance hint we emulate it as a
simple nop, and we can broaden the instruction decoding to take into
account newer PLI and PLDW instructions.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The emulation of SETEND was broken as it changed the endianess for
the running kprobes handling code. Rather than adding a new simulation
routine to fix this we'll just reject probing of SETEND as these should
be very rare in the kernel.
Note, the function emulate_none is now unused but it is left in the
source code as future patches will use it.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Following the change to remove support for coprocessor instructions
we are left with three stub functions which can be consolidated.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The kernel doesn't currently support VFP or Neon code, and probing of
code with CP15 operations is fraught with bad consequences. Therefore we
don't need the ability to probe coprocessor instructions and the code to
support this can be removed.
The removed code also had at least two bugs:
- MRC into R15 should set CPSR not trash PC
- LDC and STC which use PC as base register needed the address offset by 8
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The USAD8 instruction wasn't being explicitly decoded leading
to the incorrect emulation routine being called. It can be correctly
decoded in the same way as the signed multiply instructions so we move
the decoding there.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The signed multiply instructions were being decoded incorrectly.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
These sign extension instructions are encoded as extend-and-add
instructions where the register to add is specified as r15. The decoding
routines weren't checking for this and were using the incorrect
emulation code, giving incorrect results.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The instructions space for media instructions contains some undefined
patterns. We need to reject probing of these because they may in future
become defined and the kprobes code may then emulate them faultily.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The v6T2 RBIT instruction was accidentally being emulated correctly,
this patch adds correct decoding for the instruction.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
These instructions are specified as UNPREDICTABLE.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The decoding of these instructions got the register indexed and
immediate indexed forms the wrong way around, causing incorrect
emulation.
Instructions like "LDRD Rx, [Rx]" were corrupting Rx because the base
register writeback was being performed unconditionally, overwriting the
value just loaded from memory. The fix is to only writeback the base
register when that form of the instruction is used. Note, now that we
reject probing writeback with PC the emulation code doesn't need the
check rn!=15.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Using PC as an base register with writeback is UNPREDICTABLE, as is non
word-sized loads or stores of PC. (We only really care about preventing
loads to PC but it keeps the code simpler if we also exclude stores.)
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The decoding of these instructions got the register indexed and
immediate indexed forms the wrong way around, causing incorrect
emulation.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The emulation code for STREX and LDREX instructions is faulty, however,
rather than attempting to fix this we reject probes of these
instructions. We do this because they can never succeed in gaining
exclusive access as the exception framework clears the exclusivity
monitor when a probes breakpoint is hit. (This is a general problem
when probing all instructions executing between a LDREX and its
corresponding STREX and can lead to infinite retry loops.)
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The instructions space for 'Multiply and multiply-accumulate'
instructions contains some undefined patterns. We need to reject
probing of these because they may in future become defined and the
kprobes code may then emulate them faultily.
This has already happened with the new MLS instruction which this patch
also adds correct decoding for as well as tightening up other decoding
tests. (Before this patch the wrong emulation routine was being called
for MLS though it still produced correct results.)
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The MRS instruction should set mode and interrupt bits in the read value
so it is simpler to use a new simulation routine (simulate_mrs) rather
than some modified emulation.
prep_emulate_rd12 is now unused and removed.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
We need to reject probing of instructions which read SPSR because
we can't handle this as the value in SPSR is lost when the exception
handler for the probe breakpoint first runs.
This patch also fixes the bitmask for MRS instructions decoding to
include checking bits 5-7.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Emulation of instructions like "ADD rd, rn, #<const>" would result in a
corrupted value for rd.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Probing these instructions was corrupting R0 because the emulation code
didn't account for the fact that they don't write a result to a
register.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Now we have the framework code handling conditionally executed
instructions we can remove redundant checks in individual simulation
routines.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
When a kprobe is placed onto conditionally executed ARM instructions,
many of the emulation routines used to single step them produce corrupt
register results. Rather than fix all of these cases we modify the
framework which calls them to test the relevant condition flags and, if
the test fails, skip calling the emulation code.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>