Commit graph

9287 commits

Author SHA1 Message Date
Borislav Petkov
4a90a99c4f x86: Add a static_cpu_has_safe variant
We want to use this in early code where alternatives might not have run
yet and for that case we fall back to the dynamic boot_cpu_has.

For that, force a 5-byte jump since the compiler could be generating
differently sized jumps for each label.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-5-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:38:14 -07:00
Borislav Petkov
5700f743b5 x86: Sanity-check static_cpu_has usage
static_cpu_has may be used only after alternatives have run. Before that
it always returns false if constant folding with __builtin_constant_p()
doesn't happen. And you don't want that.

This patch is the result of me debugging an issue where I overzealously
put static_cpu_has in code which executed before alternatives have run
and had to spend some time with scratching head and cursing at the
monitor.

So add a jump to a warning which screams loudly when we use this
function too early. The alternatives patch that check away in
conjunction with patching the rest of the kernel image.

[ hpa: factored this into its own configuration option.  If we want to
  have an overarching option, it should be an option which selects
  other options, not as a group option in the source code. ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:37:19 -07:00
Borislav Petkov
c3b83598c1 x86, cpu: Add a synthetic, always true, cpu feature
This will be used in alternatives later as an always-replace flag.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:06:07 -07:00
Linus Torvalds
a3d5c3460a Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Two smaller fixes - plus a context tracking tracing fix that is a bit
  bigger"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tracing/context-tracking: Add preempt_schedule_context() for tracing
  sched: Fix clear NOHZ_BALANCE_KICK
  sched/x86: Construct all sibling maps if smt
2013-06-20 08:18:35 -10:00
Linus Torvalds
86c76676cf Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Four fixes.  The mmap ones are unfortunately larger than desired -
  fuzzing uncovered bugs that needed perf context life time management
  changes to fix properly"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
  perf: Fix mmap() accounting hole
  perf: Fix perf mmap bugs
  kprobes: Fix to free gone and unused optprobes
2013-06-20 08:17:36 -10:00
Linus Torvalds
805e318548 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu idle fixes from Thomas Gleixner:
 - Add a missing irq enable. Fallout of the idle conversion
 - Fix stackprotector wreckage caused by the idle conversion

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  idle: Enable interrupts in the weak arch_cpu_idle() implementation
  idle: Add the stack canary init to cpu_startup_entry()
2013-06-20 08:16:07 -10:00
Ingo Molnar
f070a4dba9 Merge branch 'perf/urgent' into perf/core
Merge in two hw_breakpoint fixes, before applying another 5.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 17:57:40 +02:00
Masami Hiramatsu
003002e04e kprobes: Fix arch_prepare_kprobe to handle copy insn failures
Fix arch_prepare_kprobe() to handle failures in copy instruction
correctly. This fix is related to the previous fix: 8101376
which made __copy_instruction return an error result if failed,
but caller site was not updated to handle it. Thus, this is the
other half of the bugfix.

This fix is also related to the following bug-report:

   https://bugzilla.redhat.com/show_bug.cgi?id=910649

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Jonathan Lebon <jlebon@redhat.com>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: systemtap@sourceware.org
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20130605031216.15285.2001.stgit@mhiramat-M0-7522
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 14:25:48 +02:00
Michel Lespinasse
b52e0a7c4e x86: Fix trigger_all_cpu_backtrace() implementation
The following change fixes the x86 implementation of
trigger_all_cpu_backtrace(), which was previously (accidentally,
as far as I can tell) disabled to always return false as on
architectures that do not implement this function.

trigger_all_cpu_backtrace(), as defined in include/linux/nmi.h,
should call arch_trigger_all_cpu_backtrace() if available, or
return false if the underlying arch doesn't implement this
function.

x86 did provide a suitable arch_trigger_all_cpu_backtrace()
implementation, but it wasn't actually being used because it was
declared in asm/nmi.h, which linux/nmi.h doesn't include. Also,
linux/nmi.h couldn't easily be fixed by including asm/nmi.h,
because that file is not available on all architectures.

I am proposing to fix this by moving the x86 definition of
arch_trigger_all_cpu_backtrace() to asm/irq.h.

Tested via: echo l > /proc/sysrq-trigger

Before the change, this uses a fallback implementation which
shows backtraces on active CPUs (using
smp_call_function_interrupt() )

After the change, this shows NMI backtraces on all CPUs

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1370518875-1346-1-git-send-email-walken@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 14:00:21 +02:00
Borislav Petkov
719038de98 x86/intel/cacheinfo: Shut up last long-standing warning
arch/x86/kernel/cpu/intel_cacheinfo.c: In function ‘init_intel_cacheinfo’:
arch/x86/kernel/cpu/intel_cacheinfo.c:642:28: warning: ‘this_leaf.size’ may be used uninitialized in this function [-Wmaybe-uninitialized] arch/x86/kernel/cpu/intel_cacheinfo.c:643:29: warning: ‘this_leaf.eax.split.num_threads_sharing’ may be used uninitialized in this function [-Wmaybe-uninitialized]

This keeps on happening during randbuilds and the compiler is
wrong here:

In the case where cpuid4_cache_lookup_regs() returns 0, both
this_leaf.size and this_leaf.eax get initialized. In the case
where the CPUID leaf doesn't contain valid cache info, we error
out which init_intel_cacheinfo() handles correctly without
touching the abovementioned fields.

So shut up the warning by clearing out the struct which we hand
down.

While at it, reverse error handling and gain one indentation
level.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370710095-20547-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 12:27:41 +02:00
Konrad Rzeszutek Wilk
d6a77ead21 x86 / ACPI / sleep: Provide registration for acpi_suspend_lowlevel.
Which by default will be x86_acpi_suspend_lowlevel.
This registration allows us to register another callback
if there is a need to use another platform specific callback.

Signed-off-by: Liang Tang <liang.tang@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Ben Guthro <benjamin.guthro@citrix.com>
Acked-by: "H. Peter Anvin" <hpa@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-19 23:36:30 +02:00
Rusty Russell
5a802e1530 x86: Remove weird PTR_ERR() in do_debug
62edab905 changed the argument to notify_die() from dr6 to &dr6,
but weirdly, used PTR_ERR() to cast it to a long.  Since dr6 is
on the stack, this is an abuse of PTR_ERR().  Cast to long, as
per kernel standard.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1371357768-4968-8-git-send-email-rusty@rustcorp.com.au
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 15:01:36 +02:00
Andi Kleen
f9134f36ae perf/x86/intel: Add mem-loads/stores support for Haswell
mem-loads is basically the same as Sandy Bridge,
but we use a separate string for changes later.

Haswell doesn't support the full precise store mode,
so we emulate it using the "DataLA" facility.
This allows to do everything, but for data sources we
can only detect L1 hit or not.

There is no explicit enable bit anymore, so we have
to tie it to a perf internal only flag.

The address is supported for all memory related PEBS
events with DataLA. Instead of only logging for the
load and store events we allow logging it for all
(it will be simply 0 if the current event does not
support it)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-7-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:35 +02:00
Andi Kleen
135c5612c4 perf/x86/intel: Support Haswell/v4 LBR format
Haswell has two additional LBR from flags for TSX: in_tx and
abort_tx, implemented as a new "v4" version of the LBR format.

Handle those in and adjust the sign extension code to still
correctly extend. The flags are exported similarly in the LBR
record to the existing misprediction flag

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-6-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:35 +02:00
Andi Kleen
72db559646 perf/x86/intel: Move NMI clearing to end of PMI handler
This avoids some problems with spurious PMIs on Haswell.
Haswell seems to behave more like P4 in this regard. Do
the same thing as the P4 perf handler by unmasking
the NMI only at the end. Shouldn't make any difference
for earlier family 6 cores.

(Tested on Haswell, IvyBridge, Westmere, Saltwell (Atom).)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:34 +02:00
Andi Kleen
3044318f1f perf/x86/intel: Add Haswell PEBS support
Add simple PEBS support for Haswell.

The constraints are similar to SandyBridge with a few new
events.

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:33 +02:00
Andi Kleen
3a632cb229 perf/x86/intel: Add simple Haswell PMU support
Similar to SandyBridge, but has a few new events and two
new counter bits.

There are some new counter flags that need to be prevented
from being set on fixed counters, and allowed to be set
for generic counters.

Also we add support for the counter 2 constraint to handle
all raw events.

(Contains fixes from Stephane Eranian.)

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:33 +02:00
Andi Kleen
130768b8c9 perf/x86/intel: Add Haswell PEBS record support
Add support for the Haswell extended (fmt2) PEBS format.

It has a superset of the nhm (fmt1) PEBS fields, but has a
longer record so we need to adjust the code paths.

The main advantage is the new "EventingRip" support which
directly gives the instruction, not off-by-one instruction. So
with precise == 2 we use that directly and don't try to use LBRs
and walking basic blocks. This lowers the overhead of using
precise significantly.

Some other features are added in later patches.

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:32 +02:00
Dave Jones
4338774cd4 x86/debug: Only print out DR registers if they are not power-on defaults
The DR registers are rarely useful when decoding oopses.
With screen real estate during oopses at a premium, we can save
two lines by only printing out these registers when they are set
to something other than they power-on state.

Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130618160911.GA24487@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:33:59 +02:00
Ingo Molnar
2e7e98b85d Fix typo in define
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRsuoUAAoJEBLB8Bhh3lVKpFUP/j6763r5WPtrzUx0sAZBt7SN
 T9VAdqWd3SvxP+lDnRoU1luGQc0j+05njUnc74e2k2rGu2+WDonkxJIv4xMHsuG/
 cItdK+ThmxOQa4ijm3uJ0N8I16k15cZkW0H5hyHSVqbawPaAjCt25DEXexTSgBQM
 wnQSsDNXRjtU/LSBPAZX4t9ePdawy/EwLGO+YYJ1ZKck4oesCwp0j/PVqQG+IGTM
 QUbB3mh0GvdHW5pPgVH6n+yqQ7puN92WHzJicYsYCtJDHa1Yf5D6/C+ZBq8GJ1V0
 rCdxQ8uI5tcn9WOJ1gZrUGPmjEMotnLrZX26b/cTEPzBP0mjf+Kd/Ew+gNmdmvX8
 5jv1Agzb1GQVXesPpv+N3hw5HNqLEQCBdlLQ5kiX63wYwF4vck+JaAGDWUILO6qp
 TtdEiotBHDKpk9YuWeepmCUDB2u/uyMrHSA0HGLBh6ACQXsl5/RnQj3KnGFlbvo6
 FbMvoOEXbNIOB92OEnC9kZteXGlIBtsK0sHmRXfOsckbH/gyYyBcFwqmeHFpzuKV
 N5f3NV8GTrUyrNdn4S28S/wlnx+KOqpqFhllVkgp/OcW2/Ry2eEZYM7ypU8bXMpo
 +HOFwQZimmQonVqRzGrQQI9w5+D5xGJ3gI/0lcjkX2r2Njowx8q13wfKKmqE8EDB
 7a+BsNdTBUt01dskFlSe
 =8FzZ
 -----END PGP SIGNATURE-----

Merge tag 'ras_fixlet_for_3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull "Fix typo in define" change from Borislav Petkov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:51:54 +02:00
Yan, Zheng
b2fa344d0c perf/x86/intel: Fix sparse warning
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370421025-10986-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:04:55 +02:00
Suravee Suthikulpanit
7be6296fdd perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
Implement a perf PMU to handle IOMMU performance counters and events.
The PMU only supports counting mode (e.g. perf stat). Since the counters
are shared across all cores, the PMU is implemented as "system-wide" mode.

To invoke the AMD IOMMU PMU, issue a perf tool command such as:

  ./perf stat -a -e amd_iommu/<events>/ <command>

or:

  ./perf stat -a -e amd_iommu/config=<config-data>,config1=<config1-data>/ <command>

For example:

  ./perf stat -a -e amd_iommu/mem_trans_total/ <command>

The resulting count will be how many IOMMU total peripheral memory
operations were performed during the command execution window.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370466709-3212-3-git-send-email-suravee.suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:04:53 +02:00
Dave Hansen
ae0def05ed perf/x86: Only print PMU state when also WARN()'ing
intel_pmu_handle_irq() has a warning in it if it does too many
loops.  It is a WARN_ONCE(), but the perf_event_print_debug()
call beneath it is unconditional. For the first warning, you get
a nice backtrace and message, but subsequent ones just dump the
PMU state with no leading messages.  I doubt this is what was
intended.

This patch will only print the PMU state when paired with the
WARN_ON() text.  It effectively open-codes WARN_ONCE()'s
one-time-only logic.

My suspicion is that the code really just wants to make sure we
do not sit in the loop and spit out a warning for every loop
iteration after the 100th.  From what I've seen, this is very
unlikely to happen since we also clear the PMU state.

After this patch, instead of seeing the PMU state dumped each
time, you will just see:

	[57494.894540] perf_event_intel: clearing PMU state on CPU#129
	[57579.539668] perf_event_intel: clearing PMU state on CPU#10
	[57587.137762] perf_event_intel: clearing PMU state on CPU#134
	[57623.039912] perf_event_intel: clearing PMU state on CPU#114
	[57644.559943] perf_event_intel: clearing PMU state on CPU#118
	...

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130530174559.0DB049F4@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:50:47 +02:00
Andrew Hunter
43b4578071 perf/x86: Reduce stack usage of x86_schedule_events()
x86_schedule_events() caches event constraints on the stack during
scheduling.  Given the number of possible events, this is 512 bytes of
stack; since it can be invoked under schedule() under god-knows-what,
this is causing stack blowouts.

Trade some space usage for stack safety: add a place to cache the
constraint pointer to struct perf_event.  For 8 bytes per event (1% of
its size) we can save the giant stack frame.

This shouldn't change any aspect of scheduling whatsoever and while in
theory the locality's a tiny bit worse, I doubt we'll see any
performance impact either.

Tested: `perf stat whatever` does not blow up and produces
results that aren't hugely obviously wrong.  I'm not sure how to run
particularly good tests of perf code, but this should not produce any
functional change whatsoever.

Signed-off-by: Andrew Hunter <ahh@google.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1369332423-4400-1-git-send-email-ahh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:50:44 +02:00
Ingo Molnar
eff2108f02 Merge branch 'perf/urgent' into perf/core
Merge in the latest fixes, to avoid conflicts with ongoing work.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:44:41 +02:00
Stephane Eranian
f1a527899e perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
This patch fixes broken support of PEBS-LL on SNB-EP/IVB-EP.
For some reason, the LDLAT extra reg definition for snb_ep
showed up as duplicate in the snb table.

This patch moves the definition of LDLAT back into the
snb_ep table.

Thanks to Don Zickus for tracking this one down.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130607212210.GA11849@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:44:16 +02:00
Igor Mammedov
07868fc6aa x86: kvmclock: zero initialize pvclock shared memory area
kernel might hung in pvclock_clocksource_read() due to
uninitialized memory might contain odd version value in
following cycle:

        do {
                version = __pvclock_read_cycles(src, &ret, &flags);
        } while ((src->version & 1) || version != src->version);

if secondary kvmclock is accessed before it's registered with kvm.

Clear garbage in pvclock shared memory area right after it's
allocated to avoid this issue.

Ref: https://bugzilla.kernel.org/show_bug.cgi?id=59521
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[See BZ for analysis.  We may want a different fix for 3.11, but
 this is the safest for now - Paolo]
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-06-19 12:25:28 +02:00
Yinghai Lu
d8d386c106 x86, mtrr: Fix original mtrr range get for mtrr_cleanup
Joshua reported: Commit cd7b304dfaf1 (x86, range: fix missing merge
during add range) broke mtrr cleanup on his setup in 3.9.5.
corresponding commit in upstream is fbe06b7bae.

  *BAD*gran_size: 64K chunk_size: 16M num_reg: 6 lose cover RAM: -0G

https://bugzilla.kernel.org/show_bug.cgi?id=59491

So it rejects new var mtrr layout.

It turns out we have some problem with initial mtrr range retrieval.
The current sequence is:
	x86_get_mtrr_mem_range
		==> bunchs of add_range_with_merge
		==> bunchs of subract_range
		==> clean_sort_range
	add_range_with_merge for [0,1M)
	sort_range()

add_range_with_merge could have blank slots, so we can not just
sort only, that will have final result have extra blank slot in head.

So move that calling add_range_with_merge for [0,1M), with that we
could avoid extra clean_sort_range calling.

Reported-by: Joshua Covington <joshuacov@googlemail.com>
Tested-by: Joshua Covington <joshuacov@googlemail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371154622-8929-2-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-18 11:32:02 -05:00
Fenghua Yu
6bb2ff846f x86 thermal: Disable power limit notification interrupt by default
The package power limit notification interrupt is primarily for
system diagnosis, and should not be blindly enabled on every
system by default -- particuarly since Linux does nothing in the
handler except count how many times it has been called...

Add a new kernel cmdline parameter "int_pln_enable" for situations where
users want to oberve these events via existing system counters:

$ grep TRM /proc/interrupts

$ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/*

https://bugzilla.kernel.org/show_bug.cgi?id=36182

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-14 14:49:00 -07:00
Fenghua Yu
c81147483e x86 thermal: Delete power-limit-notification console messages
Package power limits are common on some systems under some conditions --
so printing console messages when limits are reached
causes unnecessary customer concern and support calls.

Note that even with these console messages gone,
the events can still be observed via system counters:

$ grep TRM /proc/interrupts

Shows total thermal interrupts, which includes both power
limit notifications and thermal throttling interrupts.

$ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/*

Will show what caused those interrupts, core and package
throttling and power limit notifications.

https://bugzilla.kernel.org/show_bug.cgi?id=36182

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-14 14:48:37 -07:00
Kees Cook
c8a22d19dd x86: Fix typo in kexec register clearing
Fixes a typo in register clearing code. Thanks to PaX Team for fixing
this originally, and James Troup for pointing it out.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130605184718.GA8396@www.outflux.net
Cc: <stable@vger.kernel.org> v2.6.30+
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-12 15:16:18 -07:00
Thomas Gleixner
d7880812b3 idle: Add the stack canary init to cpu_startup_entry()
Moving x86 to the generic idle implementation (commit 7d1a9417 "x86:
Use generic idle loop") wreckaged the stack protector.

I stupidly missed that boot_init_stack_canary() must be inlined from a
function which never returns, but I put that call into
arch_cpu_idle_prepare() which of course returns.

I pondered to play tricks with arch_cpu_idle_prepare() first, but then
I noticed, that the other archs which have implemented the
stackprotector (ARM and SH) do not initialize the canary for the
non-boot cpus.

So I decided to move the boot_init_stack_canary() call into
cpu_startup_entry() ifdeffed with an CONFIG_X86 for now. This #ifdef
is just a temporary measure as I don't want to inflict the
boot_init_stack_canary() call on ARM and SH that late in the cycle.

I'll queue a patch for 3.11 which removes the #ifdef if the ARM/SH
maintainers have no objection.

Reported-by: Wouter van Kesteren <woutershep@gmail.com>
Cc: x86@kernel.org
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-06-11 22:04:47 +02:00
H. Peter Anvin
60e019eb37 x86: Get rid of ->hard_math and all the FPU asm fu
Reimplement FPU detection code in C and drop old, not-so-recommended
detection method in asm. Move all the relevant stuff into i387.c where
it conceptually belongs. Finally drop cpuinfo_x86.hard_math.

[ hpa: huge thanks to Borislav for taking my original concept patch
  and productizing it ]

[ Boris, note to self: do not use static_cpu_has before alternatives! ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1367244262-29511-2-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/1365436666-9837-2-git-send-email-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-06 14:32:04 -07:00
Jacob Shin
cd1c32ca96 x86, microcode, amd: Allow multiple families' bin files appended together
Add support for parsing through multiple families' microcode patch
container binary files appended together when early loading. This is
already supported on Intel.

Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1370463236-2115-3-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-05 13:56:55 -07:00
Jacob Shin
275bbe2e29 x86, microcode, amd: Make find_ucode_in_initrd() __init
Change find_ucode_in_initrd() to __init and only let BSP call it
during cold boot. This is the right thing to do because only BSP will
see initrd loaded by the boot loader. APs will offset into
initrd_start to find the microcode patch binary.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1370463236-2115-2-git-send-email-jacob.shin@amd.com
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-05 13:56:47 -07:00
Mathias Krause
a90936845d x86, mce: Fix "braodcast" typo
Fix the typo in MCJ_IRQ_BRAODCAST.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-06-05 11:59:17 +02:00
Jacob Shin
6b3389ac21 x86, microcode, amd: Fix warnings and errors on with CONFIG_MICROCODE=m
Fix section mismatch warnings on microcode_amd_early.
Compile error occurs when CONFIG_MICROCODE=m, change so that early
loading depends on microcode_core.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20130531150241.GA12006@jshin-Toonie
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-31 13:56:58 -07:00
Paul Bolle
71c69f7f4b x86/mce: Remove check for CONFIG_X86_MCE_P4THERMAL
The Kconfig symbol X86_MCE_P4THERMAL was removed in v2.6.32.
Remove a useless check for its macro, as it will now always
evaluate to false.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Link: http://lkml.kernel.org/r/1369853850.23034.28.camel@x61.thuisdomein
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 13:12:35 +02:00
Andrew Jones
b0bc225d0e sched/x86: Construct all sibling maps if smt
Commit 316ad24830 ("sched/x86: Rewrite
set_cpu_sibling_map()") broke the construction of sibling maps,
which also broke the booted_cores accounting.

Before the rewrite, if smt was present, then each map was
updated for each smt sibling. After the rewrite only
cpu_sibling_mask gets updated, as the llc and core maps depend
on 'has_mc = x86_max_cores > 1' instead. This leads to problems
with topologies like the following

(qemu -smp sockets=2,cores=1,threads=2)

  processor       : 0
  physical id     : 0
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 1

  processor       : 1
  physical id     : 0
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 0    <= should be 1

  processor       : 2
  physical id     : 1
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 1

  processor       : 3
  physical id     : 1
  siblings        : 1    <= should be 2
  core id         : 0
  cpu cores       : 0    <= should be 1

This patch restores the former construction by defining has_mc
as (has_smt || x86_max_cores > 1). This should be fine as there
were no (has_smt && !has_mc) conditions in the context.

Aso rename has_mc to has_mp now that it's not just for cores.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: a.p.zijlstra@chello.nl
Cc: fenghua.yu@intel.com
Link: http://lkml.kernel.org/r/1369831695-11970-1-git-send-email-drjones@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-31 13:10:38 +02:00
Jacob Shin
757885e94a x86, microcode, amd: Early microcode patch loading support for AMD
Add early microcode patch loading support for AMD.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-5-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Jacob Shin
a76096a657 x86, microcode, amd: Refactor functions to prepare for early loading
In preparation work for early loading, refactor some common functions
that will be shared, and move some struct defines to a common header file.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-4-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Jacob Shin
f2b3ee820a x86, microcode: Vendor abstract out save_microcode_in_initrd()
Currently save_microcode_in_initrd() is declared in vendor neutural
microcode.h file, but defined in vendor specific
microcode_intel_early.c file. Vendor abstract it out to
microcode_core_early.c with a wrapper function.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1369940959-2077-3-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Borislav Petkov
83b325f1b4 x86, microcode, intel: Correct typo in printk
User-visible so correct it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1369940959-2077-2-git-send-email-jacob.shin@amd.com
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
2013-05-30 20:19:25 -07:00
Linus Torvalds
484b002e28 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:

 - Three EFI-related fixes

 - Two early memory initialization fixes

 - build fix for older binutils

 - fix for an eager FPU performance regression -- currently we don't
   allow the use of the FPU at interrupt time *at all* in eager mode,
   which is clearly wrong.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Allow FPU to be used at interrupt time even with eagerfpu
  x86, crc32-pclmul: Fix build with older binutils
  x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
  x86, range: fix missing merge during add range
  x86, efi: initial the local variable of DataSize to zero
  efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse
  efivarfs: Never return ENOENT from firmware again
2013-05-31 09:44:10 +09:00
Pekka Riikonen
5187b28ff0 x86: Allow FPU to be used at interrupt time even with eagerfpu
With the addition of eagerfpu the irq_fpu_usable() now returns false
negatives especially in the case of ksoftirqd and interrupted idle task,
two common cases for FPU use for example in networking/crypto.  With
eagerfpu=off FPU use is possible in those contexts.  This is because of
the eagerfpu check in interrupted_kernel_fpu_idle():

...
  * For now, with eagerfpu we will return interrupted kernel FPU
  * state as not-idle. TBD: Ideally we can change the return value
  * to something like __thread_has_fpu(current). But we need to
  * be careful of doing __thread_clear_has_fpu() before saving
  * the FPU etc for supporting nested uses etc. For now, take
  * the simple route!
...
 	if (use_eager_fpu())
 		return 0;

As eagerfpu is automatically "on" on those CPUs that also have the
features like AES-NI this patch changes the eagerfpu check to return 1 in
case the kernel_fpu_begin() has not been said yet.  Once it has been the
__thread_has_fpu() will start returning 0.

Notice that with eagerfpu the __thread_has_fpu is always true initially.
FPU use is thus always possible no matter what task is under us, unless
the state has already been saved with kernel_fpu_begin().

[ hpa: this is a performance regression, not a correctness regression,
  but since it can be quite serious on CPUs which need encryption at
  interrupt time I am marking this for urgent/stable. ]

Signed-off-by: Pekka Riikonen <priikone@iki.fi>
Link: http://lkml.kernel.org/r/alpine.GSO.2.00.1305131356320.18@git.silcnet.org
Cc: <stable@vger.kernel.org> v3.7+
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-30 16:36:42 -07:00
Dimitri Sivanich
879d5ad0dc x86/UV: Add GRU distributed mode mappings
GRU hardware will support an optional distributed mode that will
allow  per-node address mapping of local GRU space, as opposed
to mapping all GRU hardware to the same contiguous high space.

If GRU distributed mode is selected, setup per-node page table
mappings.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/r/20130529155609.GB22917@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-30 09:46:09 +02:00
Zhang Yanfei
e9d0626ed4 x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
In head_64.S, a switchover has been used to handle kernel crossing
1G, 512G boundaries.

And commit 8170e6bed4
    x86, 64bit: Use a #PF handler to materialize early mappings on demand
said:
    During the switchover in head_64.S, before #PF handler is available,
    we use three pages to handle kernel crossing 1G, 512G boundaries with
    sharing page by playing games with page aliasing: the same page is
    mapped twice in the higher-level tables with appropriate wraparound.

But from the switchover code, when we set up the PUD table:
114         addq    $4096, %rdx
115         movq    %rdi, %rax
116         shrq    $PUD_SHIFT, %rax
117         andl    $(PTRS_PER_PUD-1), %eax
118         movq    %rdx, (4096+0)(%rbx,%rax,8)
119         movq    %rdx, (4096+8)(%rbx,%rax,8)

It seems line 119 has a potential bug there. For example,
if the kernel is loaded at physical address 511G+1008M, that is
    000000000 111111111 111111000 000000000000000000000
and the kernel _end is 512G+2M, that is
    000000001 000000000 000000001 000000000000000000000
So in this example, when using the 2nd page to setup PUD (line 114~119),
rax is 511.
In line 118, we put rdx which is the address of the PMD page (the 3rd page)
into entry 511 of the PUD table. But in line 119, the entry we calculate from
(4096+8)(%rbx,%rax,8) has exceeded the PUD page. IMO, the entry in line
119 should be wraparound into entry 0 of the PUD table.

The patch fixes the bug.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Link: http://lkml.kernel.org/r/5191DE5A.3020302@cn.fujitsu.com
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-28 15:41:59 -07:00
David Vrabel
3565184ed0 x86: Increase precision of x86_platform.get/set_wallclock()
All the virtualized platforms (KVM, lguest and Xen) have persistent
wallclocks that have more than one second of precision.

read_persistent_wallclock() and update_persistent_wallclock() allow
for nanosecond precision but their implementation on x86 with
x86_platform.get/set_wallclock() only allows for one second precision.
This means guests may see a wallclock time that is off by up to 1
second.

Make set_wallclock() and get_wallclock() take a struct timespec
parameter (which allows for nanosecond precision) so KVM and Xen
guests may start with a more accurate wallclock time and a Xen dom0
can maintain a more accurate wallclock for guests.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-05-28 14:00:59 -07:00
Thorsten Glaser
8ef726cb23 update AMD powerflags comments
from http://www.flounder.com/cpuid80000007amd.gif
and http://support.amd.com/us/Embedded_TechDocs/25481.pdf

Signed-off-by: Thorsten Glaser <t.glaser@tarent.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-28 12:02:10 +02:00
Peter Zijlstra
1b45adcd9a perf/x86/amd: Rework AMD PMU init code
Josh reported that his QEMU is a bad hardware emulator and trips a
WARN in the AMD PMU init code. He requested the WARN be turned into a
pr_err() or similar.

While there, rework the code a little.

Reported-by: Josh Boyer <jwboyer@redhat.com>
Acked-by: Robert Richter <rric@kernel.org>
Acked-by: Jacob Shin <jacob.shin@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130521110537.GG26912@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-05-28 09:13:55 +02:00