Commit graph

12899 commits

Author SHA1 Message Date
Stefano Stabellini
24bdb0b62c xen: do not create the extra e820 region at an addr lower than 4G
Do not add the extra e820 region at a physical address lower than 4G
because it breaks e820_end_of_low_ram_pfn().

It is OK for us to move the xen_extra_mem_start up and down because this
is the index of the memory that can be ballooned in/out - it is memory
not available to the kernel during bootup.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-20 09:04:40 -04:00
Linus Torvalds
fdfc552abe Merge branches 'core-fixes-for-linus', 'perf-fixes-for-linus', 'sched-fixes-for-linus', 'timer-fixes-for-linus' and 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: Set FLAGS_HAS_TIMEOUT during futex_wait restart setup

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_event: Fix cgrp event scheduling bug in perf_enable_on_exec()
  perf: Fix a build error with some GCC versions

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix erroneous all_pinned logic
  sched: Fix sched-domain avg_load calculation

* 'timer-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  RTC: rtc-mrst: follow on to the change of rtc_device_register()
  RTC: add missing "return 0" in new alarm func for rtc-bfin.c
  RTC: Fix s3c compile error due to missing s3c_rtc_setpie
  RTC: Fix early irqs caused by calling rtc_set_alarm too early

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, amd: Disable GartTlbWlkErr when BIOS forgets it
  x86, NUMA: Fix fakenuma boot failure
  x86/mrst: Fix boot crash caused by incorrect pin to irq mapping
  x86/ce4100: Add reg property to bridges
2011-04-16 09:45:08 -07:00
Joerg Roedel
5bbc097d89 x86, amd: Disable GartTlbWlkErr when BIOS forgets it
This patch disables GartTlbWlk errors on AMD Fam10h CPUs if
the BIOS forgets to do is (or is just too old). Letting
these errors enabled can cause a sync-flood on the CPU
causing a reboot.

The AMD BKDG recommends disabling GART TLB Wlk Error completely.

This patch is the fix for

	https://bugzilla.kernel.org/show_bug.cgi?id=33012

on my machine.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/20110415131152.GJ18463@8bytes.org
Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-04-15 16:03:16 -07:00
KOSAKI Motohiro
7d6b46707f x86, NUMA: Fix fakenuma boot failure
Currently, numa=fake boot parameter is broken. If it's used,
kernel may panic due to devide by zero error depending on CPU
configuration

Call Trace:
 [<ffffffff8104ad4c>] find_busiest_group+0x38c/0xd30
 [<ffffffff81086aff>] ? local_clock+0x6f/0x80
 [<ffffffff81050533>] load_balance+0xa3/0x600
 [<ffffffff81050f53>] idle_balance+0xf3/0x180
 [<ffffffff81550092>] schedule+0x722/0x7d0
 [<ffffffff81550538>] ? wait_for_common+0x128/0x190
 [<ffffffff81550a65>] schedule_timeout+0x265/0x320
 [<ffffffff81095815>] ? lock_release_holdtime+0x35/0x1a0
 [<ffffffff81550538>] ? wait_for_common+0x128/0x190
 [<ffffffff8109bb6c>] ? __lock_release+0x9c/0x1d0
 [<ffffffff815534e0>] ? _raw_spin_unlock_irq+0x30/0x40
 [<ffffffff815534e0>] ? _raw_spin_unlock_irq+0x30/0x40
 [<ffffffff81550540>] wait_for_common+0x130/0x190
 [<ffffffff81051920>] ? try_to_wake_up+0x510/0x510
 [<ffffffff8155067d>] wait_for_completion+0x1d/0x20
 [<ffffffff8107f36c>] kthread_create_on_node+0xac/0x150
 [<ffffffff81077bb0>] ? process_scheduled_works+0x40/0x40
 [<ffffffff8155045f>] ? wait_for_common+0x4f/0x190
 [<ffffffff8107a283>] __alloc_workqueue_key+0x1a3/0x590
 [<ffffffff81e0cce2>] cpuset_init_smp+0x6b/0x7b
 [<ffffffff81df3d07>] kernel_init+0xc3/0x182
 [<ffffffff8155d5e4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81553cd4>] ? retint_restore_args+0x13/0x13
 [<ffffffff81df3c44>] ? start_kernel+0x400/0x400
 [<ffffffff8155d5e0>] ? gs_change+0x13/0x13

The divede by zero is caused by the following line,
group->cpu_power==0:

 kernel/sched_fair.c::update_sg_lb_stats()
        /* Adjust by relative CPU power of the group */
        sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;

This regression was caused by commit e23bba6044 ("x86-64, NUMA: Unify
emulated distance mapping") because it changes cpu -> node
mapping in the process of dropping fake_physnodes().

  old) all cpus are assinged node 0
  now) cpus are assigned round robin
       (the logic is implemented by numa_init_array())

  Note: The change in behavior only happens if the system doesn't
        have neither ACPI SRAT table nor AMD northbridge NUMA
	information.

Round robin assignment doesn't work because init_numa_sched_groups_power()
assumes all logical cpus in the same physical cpu share the same node
(then it only accounts for group_first_cpu()), and the simple round robin
breaks the above assumption.

Thus, this patch implements a reassignment of node-ids if buggy firmware
or numa emulation makes wrong cpu node map. Tt enforce all logical cpus
in the same physical cpu share the same node.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/20110415203928.1303.A69D9226@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-15 20:28:19 +02:00
Linus Torvalds
aaa119a3d4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  fix XEN_SAVE_RESTORE Kconfig dependencies
  PM / Hibernate: Introduce CONFIG_HIBERNATE_CALLBACKS
2011-04-12 17:18:05 -07:00
Jacob Pan
9d90e49da5 x86/mrst: Fix boot crash caused by incorrect pin to irq mapping
Moorestown systems crash on boot because the secondary CPU
clockevent (apbt1) will fail to request irq#1, which does not
have ioapic chip in its irq_desc[] entry.

Background:

Moorestown platform does not have ISA bus nor legacy IRQs. It
reuses the range of legacy IRQs for regular device interrupts.
The routing information of early system device IRQs (timers) are
obtained from firmware provided SFI tables. We reuse/fake MP
configuration table to facilitate IRQ setup with IOAPIC.

Maintaining a 1:1 mapping of IOAPIC pin (RTE entry) and IRQ#
makes routing information clean and easy to understand on
Moorestown. Though optional.

This patch allows SFI timer and vRTC IRQ to be treated as ISA
IRQ so that pin2irq mapping will be 1:1.

Also fixed MP table type and use macros to clearly set MP IRQ
entries. As a result, apbt timer and RTC interrupts on
Moorestown are within legacy IRQ range:

 # cat /proc/interrupts
            CPU0       CPU1
   0:      11249          0   IO-APIC-edge      apbt0
   1:          0      12271   IO-APIC-edge      apbt1
   8:        887          0   IO-APIC-fasteoi   dw_spi
  13:          0          0   IO-APIC-fasteoi   INTEL_MID_DMAC2
  14:          0          0   IO-APIC-fasteoi   rtc0

Further discussion of this patch can be found at:

  https://lkml.org/lkml/2010/6/10/70

Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/1302286980-21139-1-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-12 08:38:52 +02:00
Linus Torvalds
16ad56972c Merge branch 'stable/bug-fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug-fixes-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Allow PV-OPS kernel to detect whether XSAVE is supported
  xen: just completely disable XSAVE
  xen/debug: Don't be so verbose with WARN on 1-1 mapping errors.
  xen: events: fix error checks in bind_*_to_irqhandler()
2011-04-11 20:01:04 -07:00
Shriram Rajagopalan
d419e4c0f7 fix XEN_SAVE_RESTORE Kconfig dependencies
Make XEN_SAVE_RESTORE select HIBERNATE_CALLBACKS.
Remove XEN_SAVE_RESTORE dependency from PM_SLEEP.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-04-11 22:54:48 +02:00
Sebastian Andrzej Siewior
30d746c680 x86/ce4100: Add reg property to bridges
without the reg property Ben's new code won't find the PCI & ISA
bridge and the devices won't get the DT-node attached.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: monstr@monstr.eu
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: http://lkml.kernel.org/r/20110407121315.GA9204@linutronix.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-11 17:37:02 +02:00
Linus Torvalds
8b9686ff4d Merge branches 'x86-fixes-for-linus', 'sched-fixes-for-linus', 'timers-fixes-for-linus', 'irq-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32, fpu: Fix FPU exception handling on non-SSE systems
  x86, hibernate: Initialize mmu_cr4_features during boot
  x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change
  x86: visws: Fixup irq overhaul fallout

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Clean up rebalance_domains() load-balance interval calculation

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/mrst/vrtc: Fix boot crash in mrst_rtc_init()
  rtc, x86/mrst/vrtc: Fix boot crash in rtc_read_alarm()

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: Fix cpumask leak in __setup_irq()

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf probe: Fix listing incorrect line number with inline function
  perf probe: Fix to find recursively inlined function
  perf probe: Fix multiple --vars options behavior
  perf probe: Fix to remove redundant close
  perf probe: Fix to ensure function declared file
2011-04-07 12:12:58 -07:00
Feng Tang
09552b2696 x86/mrst/vrtc: Fix boot crash in mrst_rtc_init()
The sfi_mrtc_array[] only gets initialized when the sfi mrtc
table is parsed, so the vrtc_paddr should be initalized after it
too.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1302140389-27603-1-git-send-email-feng.tang@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-07 11:27:42 +02:00
Hans Rosenfeld
f994d99cf1 x86-32, fpu: Fix FPU exception handling on non-SSE systems
On 32bit systems without SSE (that is, they use FSAVE/FRSTOR for FPU
context switches), FPU exceptions in user mode cause Oopses, BUGs,
recursive faults and other nasty things:

fpu exception: 0000 [#1]
last sysfs file: /sys/power/state
Modules linked in: psmouse evdev pcspkr serio_raw [last unloaded: scsi_wait_scan]

Pid: 1638, comm: fxsave-32-excep Not tainted 2.6.35-07798-g58a992b-dirty #633 VP3-596B-DD/VT82C597
EIP: 0060:[<c1003527>] EFLAGS: 00010202 CPU: 0
EIP is at math_error+0x1b4/0x1c8
EAX: 00000003 EBX: cf9be7e0 ECX: 00000000 EDX: cf9c5c00
ESI: cf9d9fb4 EDI: c1372db3 EBP: 00000010 ESP: cf9d9f1c
DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
Process fxsave-32-excep (pid: 1638, ti=cf9d8000 task=cf9be7e0 task.ti=cf9d8000)
Stack:
00000000 00000301 00000004 00000000 00000000 cf9d3000 cf9da8f0 00000001
<0> 00000004 cf9b6b60 c1019a6b c1019a79 00000020 00000242 000001b6 cf9c5380
<0> cf806b40 cf791880 00000000 00000282 00000282 c108a213 00000020 cf9c5380
Call Trace:
[<c1019a6b>] ? need_resched+0x11/0x1a
[<c1019a79>] ? should_resched+0x5/0x1f
[<c108a213>] ? do_sys_open+0xbd/0xc7
[<c108a213>] ? do_sys_open+0xbd/0xc7
[<c100353b>] ? do_coprocessor_error+0x0/0x11
[<c12d5965>] ? error_code+0x65/0x70
Code: a8 20 74 30 c7 44 24 0c 06 00 03 00 8d 54 24 04 89 d9 b8 08 00 00 00 e8 9b 6d 02 00 eb 16 8b 93 5c 02 00 00 eb 05 e9 04 ff ff ff <9b> dd 32 9b e9 16 ff ff ff 81 c4 84 00 00 00 5b 5e 5f 5d c3 c6
EIP: [<c1003527>] math_error+0x1b4/0x1c8 SS:ESP 0068:cf9d9f1c

This usually continues in slight variations until the system is reset.

This bug was introduced by commit 58a992b9cb:
	x86-32, fpu: Rewrite fpu_save_init()

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Link: http://lkml.kernel.org/r/1302106003-366952-1-git-send-email-hans.rosenfeld@amd.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-04-06 16:53:01 -07:00
H. Peter Anvin
4da9484bde x86, hibernate: Initialize mmu_cr4_features during boot
Restore the initialization of mmu_cr4_features during boot, which was
removed without comment in checkin e5f15b45dd

x86: Cleanup highmap after brk is concluded

thereby breaking resume from hibernate.  This restores previous
functionality in approximately the same place, and corrects the
reading of %cr4 on pre-CPUID hardware (%cr4 exists if and only if
CPUID is supported.)

However, part of the problem is that the hibernate suspend/resume
sequence should manage the save/restore of %cr4 explicitly.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <201104020154.57136.rjw@sisk.pl>
2011-04-06 13:10:02 -07:00
Shan Haitao
947ccf9c3c xen: Allow PV-OPS kernel to detect whether XSAVE is supported
Xen fails to mask XSAVE from the cpuid feature, despite not historically
supporting guest use of XSAVE.  However, now that XSAVE support has been
added to Xen, we need to reliably detect its presence.

The most reliable way to do this is to look at the OSXSAVE feature in
cpuid which is set iff the OS (Xen, in this case), has set
CR4.OSXSAVE.

[ Cleaned up conditional a bit. - Jeremy ]

Signed-off-by: Shan Haitao <haitao.shan@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-06 08:31:13 -04:00
Jeremy Fitzhardinge
61f4237d5b xen: just completely disable XSAVE
Some (old) versions of Xen just kill the domain if it tries to set any
unknown bits in CR4, so we can't reliably probe for OSXSAVE in
CR4.

Since Xen doesn't support XSAVE for guests at the moment, and no such
support is being worked on, there's no downside in just unconditionally
masking XSAVE support.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-06 08:31:00 -04:00
Andre Przywara
bd22f5cfcf KVM: move and fix substitue search for missing CPUID entries
If KVM cannot find an exact match for a requested CPUID leaf, the
code will try to find the closest match instead of simply confessing
it's failure.
The implementation was meant to satisfy the CPUID specification, but
did not properly check for extended and standard leaves and also
didn't account for the index subleaf.
Beside that this rule only applies to CPUID intercepts, which is not
the only user of the kvm_find_cpuid_entry() function.

So fix this algorithm and call it from kvm_emulate_cpuid().
This fixes a crash of newer Linux kernels as KVM guests on
AMD Bulldozer CPUs, where bogus values were returned in response to
a CPUID intercept.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-04-06 13:15:56 +03:00
Andre Przywara
20800bc940 KVM: fix XSAVE bit scanning
When KVM scans the 0xD CPUID leaf for propagating the XSAVE save area
leaves, it assumes that the leaves are contigious and stops at the
first zero one. On AMD hardware there is a gap, though, as LWP uses
leaf 62 to announce it's state save area.
So lets iterate through all 64 possible leaves and simply skip zero
ones to also cover later features.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-04-06 13:15:55 +03:00
Konrad Rzeszutek Wilk
d88885d092 xen/debug: Don't be so verbose with WARN on 1-1 mapping errors.
There are valid situations in which this error is not
a warning. Mainly when QEMU maps a guest memory and uses
the VM_IO flag to set the MFNs. For right now make the
WARN be WARN_ONCE. In the future we will:

 1). Remove the VM_IO code handling..
 2). .. which will also remove this debug facility.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-04 14:48:20 -04:00
Linus Torvalds
d7c764c4c7 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV: Fix kdump reboot
  x86, amd-nb: Rename CPU PCI id define for F4
  sound: Add delay.h to sound/soc/codecs/sn95031.c
  x86, mtrr, pat: Fix one cpu getting out of sync during resume
  x86, microcode: Unregister syscore_ops after microcode unloaded
  x86: Stop including <linux/delay.h> in two asm header files
2011-04-04 08:37:45 -07:00
Linus Torvalds
4da7e90e65 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix task_struct reference leak
  perf: Fix task context scheduling
  perf: mmap 512 kiB by default
  perf: Rebase max unprivileged mlock threshold on top of page size
  perf tools: Fix NO_NEWT=1 python build error
  perf symbols: Properly align symbol_conf.priv_size
  perf tools: Emit clearer message for sys_perf_event_open ENOENT return
  perf tools: Fixup exit path when not able to open events
  perf symbols: Fix vsyscall symbol lookup
  oprofile, x86: Allow setting EDGE/INV/CMASK for counter events
2011-04-04 08:36:40 -07:00
Linus Torvalds
fb9a7d76da Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: create new rcu_access_index() and use in mce
  WARN_ON_SMP(): Add comment to explain ({0;})
2011-04-04 08:36:15 -07:00
Tejun Heo
765af22da8 x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change
Commit d8fc3afc49 (x86, NUMA: Move *_numa_init() invocations
into initmem_init()) moved acpi_numa_init() call into NUMA
initmem_init() but forgot to update 32bit NUMA init breaking ACPI
NUMA configuration for 32bit.

acpi_numa_init() call was later moved again to srat_64.c.  Match
it by adding the call to get_memcfg_from_srat() in srat_32.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <20110404100645.GE1420@mtj.dyndns.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-04-04 16:56:33 +02:00
Thomas Gleixner
43a6246f9c x86: visws: Fixup irq overhaul fallout
Reported-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-04-04 16:51:15 +02:00
Paul E. McKenney
a4dd99250d rcu: create new rcu_access_index() and use in mce
The MCE subsystem needs to sample an RCU-protected index outside of
any protection for that index.  If this was a pointer, we would use
rcu_access_pointer(), but there is no corresponding rcu_access_index().
This commit therefore creates an rcu_access_index() and applies it
to MCE.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Zdenek Kabelac <zkabelac@redhat.com>
2011-04-01 07:27:31 -07:00
Cliff Wickman
818987e9a1 x86, UV: Fix kdump reboot
After a crash dump on an SGI Altix UV system the crash kernel
fails to cause a reboot.  EFI mode is disabled in the kdump
kernel, so only the reboot_type of BOOT_ACPI works.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: rja@sgi.com
LKML-Reference: <E1Q5Iuo-00013b-UK@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 18:44:03 +02:00
Borislav Petkov
cb6c8520f6 x86, amd-nb: Rename CPU PCI id define for F4
With increasing number of PCI function ids, add the PCI function
id in the define name instead of its symbolic name in the BKDG
for more clarity. This renames function 4 define.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <20110330183447.GA3668@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 08:51:38 +02:00
Ingo Molnar
9f644c4ba8 Merge commit 'v2.6.39-rc1' into perf/urgent
Merge reason: use the post-merge-window tree.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-30 09:07:43 +02:00
Suresh Siddha
84ac7cdbdd x86, mtrr, pat: Fix one cpu getting out of sync during resume
On laptops with core i5/i7, there were reports that after resume
graphics workloads were performing poorly on a specific AP, while
the other cpu's were ok. This was observed on a 32bit kernel
specifically.

Debug showed that the PAT init was not happening on that AP
during resume and hence it contributing to the poor workload
performance on that cpu.

On this system, resume flow looked like this:

1. BP starts the resume sequence and we reinit BP's MTRR's/PAT
   early on using mtrr_bp_restore()

2. Resume sequence brings all AP's online

3. Resume sequence now kicks off the MTRR reinit on all the AP's.

4. For some reason, between point 2 and 3, we moved from BP
   to one of the AP's. My guess is that printk() during resume
   sequence is contributing to this. We don't see similar
   behavior with the 64bit kernel but there is no guarantee that
   at this point the remaining resume sequence (after AP's bringup)
   has to happen on BP.

5. set_mtrr() was assuming that we are still on BP and skipped the
   MTRR/PAT init on that cpu (because of 1 above)

6. But we were on an AP and this led to not reprogramming PAT
   on this cpu leading to bad performance.

Fix this by doing unconditional mtrr_if->set_all() in set_mtrr()
during MTRR/PAT init. This might be unnecessary if we are still
running on BP. But it is of no harm and will guarantee that after
resume, all the cpu's will be in sync with respect to the
MTRR/PAT registers.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1301438292-28370-1-git-send-email-eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Keith Packard <keithp@keithp.com>
Cc: stable@kernel.org	[v2.6.32+]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-03-29 16:17:42 -07:00
Thomas Gleixner
86cc8dfc21 x86: apb_timer: Fixup genirq fallout
The lonely user of the internal interface was not in the coccinelle
script.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-30 00:13:30 +02:00
Linus Torvalds
90f1e7481e Merge branch 'stable/bug-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug-fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Use new irq_move functions
  xen: Convert genirq namespace
  xen: fix p2m section mismatches
  xen/p2m: Allocate p2m tracking pages on override
  xen-gntdev: unlock on error path in gntdev_mmap()
  xen-gntdev: return -EFAULT on copy_to_user failure
2011-03-29 11:36:52 -07:00
Randy Dunlap
b83c6e55ac xen: fix p2m section mismatches
Fix section mismatch warnings:
set_phys_range_identity() is called by __init xen_set_identity(),
so also mark set_phys_range_identity() as __init.
then:
__early_alloc_p2m() is called set_phys_range_identity(), so also mark
__early_alloc_p2m() as __init.

WARNING: arch/x86/built-in.o(.text+0x7856): Section mismatch in reference from the function __early_alloc_p2m() to the function .init.text:extend_brk()
The function __early_alloc_p2m() references
the function __init extend_brk().
This is often because __early_alloc_p2m lacks a __init
annotation or the annotation of extend_brk is wrong.

WARNING: arch/x86/built-in.o(.text+0x7967): Section mismatch in reference from the function set_phys_range_identity() to the function .init.text:extend_brk()
The function set_phys_range_identity() references
the function __init extend_brk().
This is often because set_phys_range_identity lacks a __init
annotation or the annotation of extend_brk is wrong.

[v2: Per Stephen Hemming recommonedation made __early_alloc_p2m static]
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-03-29 10:01:03 -04:00
Xiaotian Feng
4ac5fc6a3e x86, microcode: Unregister syscore_ops after microcode unloaded
Currently, microcode doesn't unregister syscore_ops after it's
unloaded. So if we modprobe then rmmod microcode, the stale
microcode syscore_ops info will stay on syscore_ops_list.

Later when we're trying to reboot/halt/shutdown the machine, kernel
will panic on syscore_shutdown().

With the patch applied, I can reboot/halt/shutdown my machine successfully.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
LKML-Reference: <1301387672-23661-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-29 11:12:04 +02:00
Jean Delvare
ca444564a9 x86: Stop including <linux/delay.h> in two asm header files
Stop including <linux/delay.h> in x86 header files which don't
need it. This will let the compiler complain when this header is
not included by source files when it should, so that
contributors can fix the problem before building on other
architectures starts to fail.

Credits go to Geert for the idea.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: James E.J. Bottomley <James.Bottomley@suse.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20110325152014.297890ec@endymion.delvare>
[ this also fixes an upstream build bug in drivers/media/rc/ite-cir.c ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-29 09:37:42 +02:00
Ingo Molnar
1dfd7b494b Merge branch 'core' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent 2011-03-29 09:32:28 +02:00
Linus Torvalds
036a98263a Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: aesni-intel - fixed problem with packets that are not multiple of 64bytes
2011-03-28 13:07:49 -07:00
Linus Torvalds
551b0bda46 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
  mfd: Clean up max8997 IRQ namespace
  mfd: Fold irq_set_chip/irq_set_handler
  mfd: Cleanup irq namespace
  mfd: twl6030: Cleanup interrupt handling
  mfd: twl4030: Cleanup interrupt handling
  mfd: mx8925: Remove irq_desc leftovers
  mfd: htc-i2cpld: Cleanup interrupt handling
  mfd: htc-egpio: Cleanup interrupt handling
  mfd: ezx-pcap: Remvove open coded irq handling
  mfd: 88pm860x: Remove unused irq_desc leftovers
  mfd: asic3: Cleanup irq handling
  mfd: Select MFD_CORE if TPS6105X driver is configured
  mfd: Add MODULE_DEVICE_TABLE to rdc321x-southbridge
  mfd: Add MAX8997/8966 IRQ control
  mfd: Constify i2c_device_id tables
  mfd: OLPC: Clean up names to match what OLPC actually uses
  mfd: Add mfd_clone_cell(), convert cs5535-mfd/olpc-xo1 to it
2011-03-27 20:07:01 -07:00
Christoph Lameter
d7c3f8cee8 percpu: Omit segment prefix in the UP case for cmpxchg_double
Omit the segment prefix in the UP case. GS is not used then
and we will generate segfaults if cmpxchg16b is used otherwise.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-27 19:25:36 -07:00
Tadeusz Struk
60af520cf2 crypto: aesni-intel - fixed problem with packets that are not multiple of 64bytes
This patch fixes problem with packets that are not multiple of 64bytes.

Signed-off-by: Adrian Hoban <adrian.hoban@intel.com>
Signed-off-by: Aidan O'Mahony <aidan.o.mahony@intel.com>
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-03-27 10:29:39 +08:00
Daniel Drake
adfa4bd4a8 mfd: OLPC: Clean up names to match what OLPC actually uses
The cs5535-pms cell doesn't actually need to be cloned, so we can drop that
and simply have the olpc-xo1.c driver use "cs5535-pms" directly.

Also, rename the cs5535-acpi clones to what we actually use for the (currently
out-of-tree) SCI driver.  In the process, that fixes a subtle bug in
olpc-xo1.c which broke powerdown on XO-1s.. olpc-xo1-ac-acpi was a typo, not
something that actually existed.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-27 00:09:31 +01:00
Andres Salomon
fa1df69168 mfd: Add mfd_clone_cell(), convert cs5535-mfd/olpc-xo1 to it
Replace mfd_shared_platform_driver_register with mfd_clone_cell.  The
former was called by an mfd client, and registered both a platform driver
and device.  The latter is called by an mfd driver, and registers only a
platform device.

The downside of this is that mfd drivers need to be modified whenever
new clients are added that share a cell; the upside is that it fits
Linux's driver model better.  It's also simpler.

This also converts cs5535-mfd/olpc-xo1 from the old API.  cs5535-mfd
now creates the olpc-xo1-{acpi,pms} devices, while olpc-xo1 binds to
them via platform drivers.

Signed-off-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-27 00:09:30 +01:00
Linus Torvalds
16c29dafcc Merge branch 'syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  Introduce ARCH_NO_SYSDEV_OPS config option (v2)
  cpufreq: Use syscore_ops for boot CPU suspend/resume (v2)
  KVM: Use syscore_ops instead of sysdev class and sysdev
  PCI / Intel IOMMU: Use syscore_ops instead of sysdev class and sysdev
  timekeeping: Use syscore_ops instead of sysdev class and sysdev
  x86: Use syscore_ops instead of sysdev classes and sysdevs
2011-03-25 21:07:59 -07:00
Linus Torvalds
95e14ed7fc Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  kdb: add usage string of 'per_cpu' command
  kgdb,x86_64: fix compile warning found with sparse
  kdb: code cleanup to use macro instead of value
  kgdboc,kgdbts: strlen() doesn't count the terminator
2011-03-25 21:04:56 -07:00
Linus Torvalds
2a20b02c05 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue
  perf, x86: P4 PMU - Read proper MSR register to catch unflagged overflows
  perf symbols: Look at .dynsym again if .symtab not found
  perf build-id: Add quirk to deal with perf.data file format breakage
  perf session: Pass evsel in event_ops->sample()
  perf: Better fit max unprivileged mlock pages for tools needs
  perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()
  perf top: Fix uninitialized 'counter' variable
  tracing: Fix set_ftrace_filter probe function display
  perf, x86: Fix Intel fixed counters base initialization
2011-03-25 17:53:09 -07:00
Linus Torvalds
94df491c4a Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: Fix WARN_ON() test for UP
  WARN_ON_SMP(): Allow use in if() statements on UP
  x86, dumpstack: Use %pB format specifier for stack trace
  vsprintf: Introduce %pB format specifier
  lockdep: Remove unused 'factor' variable from lockdep_stats_show()
2011-03-25 17:52:22 -07:00
Linus Torvalds
26ff6801f7 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: DT: Cleanup namespace and call irq_set_irq_type() unconditional
  x86: DT: Fix return condition in irq_create_of_mapping()
  x86, mpparse: Move check_slot into CONFIG_X86_IO_APIC context
2011-03-25 17:51:51 -07:00
Jason Wessel
21431c2900 kgdb,x86_64: fix compile warning found with sparse
Fix sparse warning:

arch/x86/kernel/kgdb.c:123:9: warning: switch with no cases

Reported-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2011-03-25 16:37:31 -05:00
Ingo Molnar
45daae575e perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue
Eric Dumazet reported that hardware PMU events do not work on his
system, due to the BIOS corrupting PMU state:

    Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only.
    [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 43003c)

Linus suggested that we continue in the face of such BIOS-induced CPU
state corruption:

   http://lkml.org/lkml/2011/3/24/608

Such BIOSes will have to be fixed - Linux developers rely on a working and
fully capable PMU and the BIOS interfering with the CPU's PMU state is simply
not acceptable.

So this patch changes perf to continue when it detects such BIOS
interaction, some hardware events may be unreliable due to the BIOS
writing and re-writing them - there's not much the kernel can do
about that but to detect the corruption and report it.

Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-25 11:23:41 +01:00
Thomas Gleixner
07611dbda5 x86: DT: Cleanup namespace and call irq_set_irq_type() unconditional
That call escaped the name space cleanup. Fix it up.

We really want to call there. The chip might have changed since the
irq was setup initially. So let the core code and the chip decide what
to do. The status is just an unreliable snapshot.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2011-03-24 23:17:56 +01:00
Thomas Gleixner
00a30b254b x86: DT: Fix return condition in irq_create_of_mapping()
The xlate() function returns 0 or a negative error code. Returning the
error code blindly will be seen as an huge irq number by the calling
function because irq_create_of_mapping() returns an unsigned value.

Return 0 (NO_IRQ) as required.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2011-03-24 23:17:56 +01:00
Don Zickus
242214f9c1 perf, x86: P4 PMU - Read proper MSR register to catch unflagged overflows
The read of a proper MSR register was missed and instead of
counter the configration register was tested (it has
ARCH_P4_UNFLAGGED_BIT always cleared) leading to unknown NMI
hitting the system. As result the user may obtain "Dazed and
confused, but trying to continue" message. Fix it by reading a
proper MSR register.

When an NMI happens on a P4, the perf nmi handler checks the
configuration register to see if the overflow bit is set or not
before taking appropriate action.  Unfortunately, various P4
machines had a broken overflow bit, so a backup mechanism was
implemented.  This mechanism checked to see if the counter
rolled over or not.

A previous commit that implemented this backup mechanism was
broken. Instead of reading the counter register, it used the
configuration register to determine if the counter rolled over
or not. Reading that bit would give incorrect results.

This would lead to 'Dazed and confused' messages for the end
user when using the perf tool (or if the nmi watchdog is
running).

The fix is to read the counter register before determining if
the counter rolled over or not.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <4D8BAB49.3080701@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-24 21:40:01 +01:00