smp_call_function and smp_call_function_single are almost complete
duplicates of the same logic. This patch combines them by
implementing them in terms of the more general
smp_call_function_mask().
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Andi Kleen <ak@suse.de>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Ingo Molnar <mingo@elte.hu>
The reboot_fixups stuff seems to be a bit of a mess, specifically the
header is in linux/ when its a purely i386-specific piece of code. I'm
not sure why it has its config option; its only currently needed for
"geode-gx1/cs5530a", so perhaps whatever config option controls that
hardware should enable this?
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Change sysenter_setup to __cpuinit.
Change __INIT & __INITDATA to be cpu hotplug aware.
Resolve MODPOST warnings similar to:
WARNING: vmlinux - Section mismatch: reference to .init.text:sysenter_setup from
.text between 'identify_cpu' (at offset 0xc040a380) and 'detect_ht'
and
WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_int80_end
from .text between 'sysenter_setup' (at offset 0xc041a269) and 'enable_sep_cpu'
WARNING: vmlinux - Section mismatch: reference to
.init.data:vsyscall_int80_start from .text between 'sysenter_setup' (at offset
0xc041a26e) and 'enable_sep_cpu'
WARNING: vmlinux - Section mismatch: reference to
.init.data:vsyscall_sysenter_end from .text between 'sysenter_setup' (at offset
0xc041a275) and 'enable_sep_cpu'
WARNING: vmlinux - Section mismatch: reference to
.init.data:vsyscall_sysenter_start from .text between 'sysenter_setup' (at
offset 0xc041a27a) and 'enable_sep_cpu'
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Hello,
This patch against 2.6.20-git14 makes the NMI watchdog use PERFSEL1/PERFCTR1
instead of PERFSEL0/PERFCTR0 on processors supporting Intel architectural
perfmon, such as Intel Core 2. Although all PMU events can work on
both counters, the Precise Event-Based Sampling (PEBS) requires that the
event be in PERFCTR0 to work correctly (see section 18.14.4.1 in the
IA32 SDM Vol 3b).
A similar patch for x86-64 is to follow.
Changelog:
- make the i386 NMI watchdog use PERFSEL1/PERFCTR1 instead of PERFSEL0/PERFCTR0
on processors supporting the Intel architectural perfmon (e.g. Core 2 Duo).
This allows PEBS to work when the NMI watchdog is active.
signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
a userspace fault or a kernelspace fault which will result in the
immediate death of the process. They should not be filled in as a
result of a kernelspace fault which can be fixed up.
Otherwise, if the process is handling SIGSEGV and examining the fault
information, this can result in the kernel space fault trashing the
previously stored fault information if it arrives between the
userspace fault happening and the SIGSEGV being delivered to the process.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
--
arch/i386/kernel/traps.c | 24 ++++++++++++++++++------
arch/x86_64/kernel/traps.c | 30 +++++++++++++++++++++++-------
2 files changed, 41 insertions(+), 13 deletions(-)
Remove the assumption that if the first page of a legacy ROM is mapped,
it'll all be mapped. This'll also stop people reading this code from
wondering if they're looking at a bug...
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Martin Murray <murrayma@citi.umich.edu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Eliminated the arch/i386/kernel/timers in 2.6.18, use clocksoures instead.
pit_latch_buggy was referred in timers/timer_tsc.c, and currently removed.
Therefore nobody refer it.
Until 2.6.17, MediaGX's TSC works correctly. after 2.6.18, warned "TSC
appears to be running slowly. Marking it as unstable". So marked unstable
TSC when CS55x0.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Whether a region is below 1Mb is determined by its start rather than
its end.
This hunk got erroneously dropped from a previous patch.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
No need to use -traditional for processing asm in i386/kernel/
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Synchronize i386's smp_send_stop() with x86-64's in only try-locking
the call lock to prevent deadlocks when called from panic().
In both version, disable interrupts before clearing the CPU off the
online map to eliminate races with IRQ handlers inspecting this map.
Also in both versions, save/restore interrupts rather than disabling/
enabling them.
On x86-64, eliminate one function used here by folding it into its
single caller, convert to static, and rename for consistency with i386
(lkcd may like this).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This isn't true for voyager, so alter setup_pit_timer() to initialise
the cpumask from the current processor id (which should be the boot
processor) rather than defaulting to zero.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The PowerNow! driver for Opteron reports the number of cores
in the system, but claims to report the number of processors.
Fix this minor cosmetic bug.
Signed-off-by: Bhavana Nagendra <bhavana.nagendra@amd.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
fill_powernow_table_pstate() and fill_powernow_table_fidvid() are only
defined and used for X86_POWERNOW_K8_ACPI.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Dave Jones <davej@redhat.com>
There is something wrong with this code. It needs more
testing. It is better to disable it for now because support
for some machines will be broken.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Replace obsolete pci_find_device with pci_get_device.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
The following patch prevent this warning to be displayed again & again (eg:
nine times on my NForce2 motherboard) and thus improve signal to noise
ratio in logs.
The ATI quirk below probably needs a similar "fix" but I don't have
the hardware to test.
Btw arch/x86_64/kernel/early-quirks.c::nvidia_bugs() would probably need to
be synced (but I don't have an x86_64 NVidia motherboard to boot test it).
Still it shows the usefullity of the recent x86 merge thread.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Thierry Vignaud <tvignaud@mandriva.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
noreplacement is dangerous on modern systems because it will not replace the
context switch FNSAVE with SSE aware FXSAVE. But other places in the kernel still assume
SSE and do FXSAVE and the CPU will then access FXSAVE information with
FNSAVE and cause corruption.
Easiest way to avoid this is to remove the option. It was mostly for paranoia
reasons anyways and alternative()s have been stable for some time.
Thanks to Jeremy F. for reporting and helping debug it.
Signed-off-by: Andi Kleen <ak@suse.de>
Support for Longhaul ver. 2 broke driver for VIA C3 Eden 600MHz with
Samuel 2 core. Processor is not able to switch frequency anymore. I
don't know much about this issue at the moment, but until (if ever) I
will know why, this part should be reversed.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While reviewing this code again I found a potential overflow of the bitmap.
The p4 oprofile can theoretically set bits beyond the reservation bitmap for
specific configurations. Avoid that by sizing the bitmaps properly.
Signed-off-by: Andi Kleen <ak@suse.de>
Due to an over aggressive optimizer gcc 4.2 cannot optimize away _proxy_pda
in all cases (counter intuitive, but true). This breaks loading of some
modules.
The earlier workaround to just export a dummy symbol didn't work unfortunately
because the module code ignores exports with 0 value.
Make it 1 instead.
Signed-off-by: Andi Kleen <ak@suse.de>
Fix logic error in VMI relocation processing. NOPs would always cause
a BUG_ON to fire because the != RELOCATION_NONE in the first if clause
precluding the == VMI_RELOCATION_NOP in the second clause. Make these
direct equality tests and just warn for unsupported relocation types
(which should never happen), falling back to native in that case.
Thanks to Anthony Liguori for noting this!
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since lazy MMU batching mode still allows interrupts to enter, it is
possible for interrupt handlers to try to use kmap_atomic, which fails when
lazy mode is active, since the PTE update to highmem will be delayed. The
best workaround is to issue an explicit flush in kmap_atomic_functions
case; this is the only way nested PTE updates can happen in the interrupt
handler.
Thanks to Jeremy Fitzhardinge for noting the bug and suggestions on a fix.
This patch gets reverted again when we start 2.6.22 and the bug gets fixed
differently.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6:
[PATCH] x86: Don't probe for DDC on VBE1.2
[PATCH] x86-64: Increase NMI watchdog probing timeout
[PATCH] x86-64: Let oprofile reserve MSR on all CPUs
[PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E
Fix the regression resulting from the recent change of suspend code
ordering that causes systems based on Intel x86 CPUs using the microcode
driver to hang during the resume.
The problem occurs since the microcode driver uses request_firmware() in
its CPU hotplug notifier, which is called after tasks has been frozen and
hangs. It can be fixed by telling the microcode driver to use the
microcode stored in memory during the resume instead of trying to load it
from disk.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Adrian Bunk <bunk@stusta.de>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Maxim <maximlevitsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The MSR reservation is per CPU and oprofile would only allocate them
on the CPU it was initialized on. Change this to handle all CPUs.
This also fixes a warning about unprotected use of smp_processor_id()
in preemptible kernels.
Signed-off-by: Andi Kleen <ak@suse.de>
AMD dual core laptops with C1E do not run the APIC timer correctly
when they go idle. Previously the code assumed this only happened
on C2 or deeper. But not all of these systems report support C2.
Use a AMD supplied snippet to detect C1E being enabled and then disable
local apic timer use.
This supercedes an earlier workaround using DMI detection of specific systems.
Thanks to Mark Langsdorf for the detection snippet.
Signed-off-by: Andi Kleen <ak@suse.de>
This adds support of suspend/resume on i386 for HPET, which fixes a
number of timer-related failures around STR.
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Acked-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The clockevents / tick management code expects an error value, when the
event is already expired. hpet_next_event() returns 1 in that case.
Fix it to return the proper -ETIME error code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit f9690982b8 removed the check for
cpu_khz from sched_clock(), which prevented early access to the TSC by
non obvious magic.
This is harmless as long as the CPU has a TSC. On TSCless systems this
results in an illegal instruction trap.
Replace tsc_disabled and tsc_unstable by tsc_enabled, which is only set
when the tsc is available and not unstable.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It turned out that it is almost impossible to trust ACPI, BIOS & Co.
regarding the C states. This was the reason to switch the local apic
timer off in C2 state already. OTOH there are sane and well behaving
systems, which get punished by that decision.
Allow the user to confirm that the local apic timer is trustworthy in C2
state. This keeps the default behaviour on the safe side.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
latest -git triggers an irqtrace/lockdep warning of a leaked
irqs-off condition:
BUG: at kernel/fork.c:1033 copy_process()
after some debugging it turns out that commit ca1b940c accidentally left
interrupts disabled - which trickled down all the way to the first time
we fork a kernel thread and triggered the warning.
the fix is to re-enable interrupts in the 'else' branch of
setup_boot_APIC_clock()'s pmtimers calibration path.
Reported-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@brown.paperbag.linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The local APIC timer stops to work in deeper C-States. This is handled by
the ACPI code and a broadcast mechanism in the clockevents / tick managment
code.
Some systems do not expose the deeper C-States to the kernel, but switch
into deeper C-States behind the kernels back. This delays the local apic
timer interrupts for ever and makes the systems unusable.
Add a command line option to disable the local apic timer and a dmi
quirk for known broken systems.
Andi sayeth:
While not wrong by itself i think it is still better to use some heuristic
-- like "has battery in ACPI" With the DMI table if the problem is more wide
spread we will just continue extending it.
But anyways should be ok now for .21 although I'm not really happy with
it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Grudgingly-acked-by: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The PIT has no dedicated mode for shut down. The only way to disable PIT
is to put it into one shot mode. AMD implementations of PIT on Geode
(also observed on Cyrix) are confused by an "empty" transition from
CLOCK_EVT_MODE_UNUSED to CLOCK_EVT_MODE_SHUTDOWN, which puts the PIT
into one shot mode momentarily.
I realized after staring helpless at the bug report
http://bugzilla.kernel.org/show_bug.cgi?id=8027 for quite a while, that
the only change, which might influence the bogomips calibration, is the
above transition during the PIT initialization.
Avoiding the unnecessary switch to oneshot and later to periodic mode
fixes the weird bogomips value and also the resulting slowness.
The fix is confirmed on OLPC and another Geode based box.
Note: this is unrelated to the Dual Core problem discussed here:
http://lkml.org/lkml/2007/3/17/48
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When PM-Timer is available for local APIC timer calibration we can skip the
verification of the calibrated time value. The resulting error is quite
small on a bunch of evaluated platforms and is less harming than the
observed false positives.
We need to keep the verification on systems, which have no PM-Timer to
avoid bogus local APIC timer calibrations in the range of factor 2-10,
which can be observed when swicthing off the PM-timer support in the kernel
configuration.
The wrong calibration values are probably caused by SMM code trying to
emulate a PS/2 keyboard from a (maybe connected or not) USB keyboard. This
prohibits the accurate delivery of PIT interrupts, which are used to
calibrate the local APIC timer. Unfortunately we have no way to disable
this BIOS misfeature in the early boot process.
Add also the dropped cpu_relax() back to the wait loops.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The symbol is not actually used, but the compiler unforunately generates
a (unused) reference to it. This can happen even in modules. So export it.
Signed-off-by: Andi Kleen <ak@suse.de>
VMI ROMs are pretty intimate to the kernel, so enforce their GPLness.
No \0 tricks checking for now
This rules out BSD/MIT modules for now, sorry -- the trouble is those
could come without source.
Acked-by: Zachary Amsden <zach@vmware.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
x86_64 nvidia_bugs() broke when we bailed out on not finding the HPET.
However, the quirk works by checking for _not_ finding the HPET...
Delete the nvidia_hpet_detected flag and simply test for
not finding the HPET, which is simple to do now that
acpi_table_parse returns 1 on failure.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The root cause of this bug shows that this machine
could not possibly run an ACPI-aware OS without a
model specific workaround.
http://bugzilla.kernel.org/show_bug.cgi?id=5966
Signed-off-by: Len Brown <len.brown@intel.com>
check_tsc_sync_source() depends on being called with irqs disabled (it
checks whether the TSC is coherent across two specific CPUs). This is
incidentally true during bootup, but not during cpu hotplug __cpu_up().
This got found via smp_processor_id() debugging.
disable irqs explicitly and remove the unconditional enabling of
interrupts. Add touch_nmi_watchdog() to the cpu_online_map busy loop.
this bug is present both on i386 and on x86_64.
Reported-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The obsolete SA_xxx interrupt flags have been used despite the scheduled
removal. Fixup the remaining users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CC arch/i386/kernel/vmi.o
/home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c: In function 'vmi_map_pt_hook':
/home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c:387: error: 'KM_PTE0' undeclared (first use in this function)
/home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c:387: error: (Each undeclared identifier is reported only once
/home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c:387: error: for each function it appears in.)
/home/bunk/linux/kernel-2.6/linux-2.6.21-rc2-mm1/arch/i386/kernel/vmi.c:387: error: 'KM_PTE1' undeclared (first use in this function)
make[2]: *** [arch/i386/kernel/vmi.o] Error 1
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch resolves the issue found here:
http://bugme.osdl.org/show_bug.cgi?id=7426
The basic summary is:
Currently we register most of i386/x86_64 clocksources at module_init
time. Then we enable clocksource selection at late_initcall time. This
causes some problems for drivers that use gettimeofday for init
calibration routines (specifically the es1968 driver in this case),
where durring module_init, the only clocksource available is the low-res
jiffies clocksource. This may cause slight calibration errors, due to
the small sampling time used.
It should be noted that drivers that require fine grained time may not
function on architectures that do not have better then jiffies
resolution timekeeping (there are a few). However, this does not
discount the reasonable need for such fine-grained timekeeping at init
time.
Thus the solution here is to register clocksources earlier (ideally when
the hardware is being initialized), and then we enable clocksource
selection at fs_initcall (before device_initcall).
This patch should probably get some testing time in -mm, since
clocksource selection is one of the most important issues for correct
timekeeping, and I've only been able to test this on a few of my own
boxes.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Testing NMI watchdog ... CPU#0: NMI appears to be stuck (54->54)!
CPU#1: NMI appears to be stuck (0->0)!
Keep the PIT/HPET alive when nmi_watchdog = 1 is given on the command
line.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Critical fixes for SMP.
Fix a couple functions which needed to be __devinit and fix a bogus parameter
to AP startup that just so happened to work because the low virtual mapping of
memory was still established.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use para_fill instead of directly setting the APIC ops to the result of the
vmi_get_function call - this allows one to implement a VMI ROM without
implementing APIC functions, just using the native APIC functions.
While doing this, I realized that there is a lot more cleanup that should have
been done. Basically, we should never assume that the ROM implements a
specific set of functions, and always allow fallback to the native
implementation.
This is critical for future compatibility.
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
More goo from hrtimers integration. We do compile and run properly with NO_HZ
enabled. There was a period when we didn't because of a missing export, but
that was since fixed.
And with the clocksource code now firmly in place, we can get rid of code that
fixes up the wallclock, since this is done in the common infrastructure. This
actually fixes a timer bug as well, that was caused by do_settimeofday no
longer being callable with interrupts disabled due to the use of
on_each_cpu().
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The time_init_hook in paravirt-ops no longer functions in the correct manner
after the integration of the hrtimers code. The problem is that now the call
path for time initialization is:
time_init :
late_time_init = hpet_time_init;
late_time_init -> hpet_time_init:
setup_pit_timer (BAD)
do_time_init --> (via paravirt.h)
time_init_hook --> (via arch_hooks.h)
time_init_hook (in SUBARCH/setup.c)
If this isn't confusing enough, the paravirt case goes through an indirect
function pointer in the paravirt-ops table. The problem is, by the time the
paravirt hook is called, the pit timer is already enabled.
But paravirt guests have their own timer, and don't want to use the PIT.
Rather than intensify the struggle for power going on here, just make it all
nice and simple and just unconditionally do all timer setup in the
late_time_init hook. This also has the advantage of enabling timers in the
same place in all code paths, so everyone has the same bugs and we don't have
outliers who break other code because they turn on timer too early or too
late.
So the paravirt-ops time init function is now by default hpet_time_init, which
is the time init function used for native hardware. Paravirt guests have the
chance to override this when they setup the paravirt-ops table, and should
need no change.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Not respecting udelay causes problems with any virtual hardware that is passed
through to real hardware. This can be noticed by any device that interacts
with the real world in real time - like AP startup, which takes real time. Or
keyboard LEDs, which should blink in real-time. Or floppy drives, but only
when passed through to a real floppy controller on OSes which can't
sufficiently buffer the floppy commands to emulate a zero latency floppy. Or
IDE drives, when connecting to a physical CDROM.
This was mostly a hack to get the kernel to boot faster, but it introduced a
number of misvirtualization bugs, and Alan and Pavel argued pretty strongly
against it. We were the only client, and now want to clean up this cruft.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide a PT map hook for HIGHPTE kernels to designate where they are mapping
page tables. This information is required so the physical address of PTE
updates can be determined; otherwise, the mm layer would have to carry the
physical address all the way to each PTE modification callsite, which is even
more hideous that the macros required to provide the proper hooks.
So lets not mess up arch neutral code to achieve this, but keep the horror in
an #ifdef HIGHPTE in include/asm-i386/pgtable.h. I had to use macros here
because some types are not yet defined in all the include paths for this
header.
This patch is absolutely required for HIGHPTE kernels to operate properly with
VMI.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to share the common code in tsc.c which does CPU Khz calibration, we
need to make an accurate value of CPU speed available to the tsc.c code. This
value loses a lot of precision in a VM because of the timing differences with
real hardware, but we need it to be as precise as possible so the guest can
make accurate time calculations with the cycle counters.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The custom_sched_clock hook is broken. The result from sched_clock needs to
be in nanoseconds, not in CPU cycles. The TSC is insufficient for this
purpose, because TSC is poorly defined in a virtual environment, and mostly
represents real world time instead of scheduled process time (which can be
interrupted without notice when a virtual machine is descheduled).
To make the scheduler consistent, we must expose a different nature of time,
that is scheduled time. So deprecate this custom_sched_clock hack and turn it
into a paravirt-op, as it should have been all along. This allows the tsc.c
code which converts cycles to nanoseconds to be shared by all paravirt-ops
backends.
It is unfortunate to add a new paravirt-op, but this is a very distinct
abstraction which is clearly different for all virtual machine
implementations, and it gets rid of an ugly indirect function which I
ashamedly admit I hacked in to try to get this to work earlier, and then even
got in the wrong units.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Critical bugfixes for the VMI-Timer code.
1) Do not setup a one shot alarm if we are keeping the periodic alarm
armed. Additionally, since the periodic alarm can be run at a lower rate
than HZ, let's fixup the guard to the no-idle-hz mode appropriately. This
fixes the bug where the no-idle-hz mode might have a higher interrupt rate
than the non-idle case.
2) The interrupt handler can no longer adjust xtime due to nested lock
acquisition. Drop this. We don't need to check for wallclock time at
every tick, it can be done in userspace instead.
3) Add a bypass to disable noidle operation. This is useful as a last
minute workaround, or testing measure.
4) The code to skip the IO_APIC timer testing (no_timer_check) should be
conditional on IO_APIC, not SMP, since UP kernels can have this configured
in as well.
Signed-off-by: Dan Hecht <dhecht@vmware.com>
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When it goes to free1_out, dev->dma_mem has not been freed.
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When removing set_native_irq I missed the fact that it was
called in a couple of places that were compiled even when
SMP support is disabled. And since the irq_desc[].affinity
field only exists in SMP things broke.
Thanks to Simon Arlott <simon@arlott.org> for spotting this.
There are a couple of ways to fix this but the simplest one
is to just remove the assignments. The affinity field is only
used to display a value to the user, and nothing on either i386
or x86_64 reads it or depends on it being any particlua value,
so skipping the assignment is safe. The assignment that
is being removed is just for the initial affinity value before
the user explicitly sets it. The irq_desc array initializes
this field to CPU_MASK_ALL so the field is initialized to
a reasonable value in the SMP case without being set.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit aeeddc1435, which was
half-baked and broken. It just resulted in compile errors, since
cpufreq_register_driver() still changes the 'driver_data' by setting
bits in the flags field. So claiming it is 'const' _really_ doesn't
work.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch replaces all instances of "set_native_irq_info(irq, mask)"
with "irq_desc[irq].affinity = mask". The latter form is clearer
uses fewer abstractions, and makes access to this field uniform
accross different architectures.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 2ff2d3d747.
Uwe Bugla reports that he cannot mount a floppy drive any more, and Jiri
Slaby bisected it down to this commit.
Benjamin LaHaise also points out that this is a big hot-path, and that
interrupt delivery while idle is very common and should not go through
all these expensive gyrations.
Fix up conflicts in arch/i386/kernel/apic.c and arch/i386/kernel/irq.c
due to other unrelated irq changes.
Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Uwe Bugla <uwe.bugla@gmx.de>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
Documentation/kernel-docs.txt update.
arch/cris: typo in KERN_INFO
Storage class should be before const qualifier
kernel/printk.c: comment fix
update I/O sched Kconfig help texts - CFQ is now default, not AS.
Remove duplicate listing of Cris arch from README
kbuild: more doc. cleanups
doc: make doc. for maxcpus= more visible
drivers/net/eexpress.c: remove duplicate comment
add a help text for BLK_DEV_GENERIC
correct a dead URL in the IP_MULTICAST help text
fix the BAYCOM_SER_HDX help text
fix SCSI_SCAN_ASYNC help text
trivial documentation patch for platform.txt
Fix typos concerning hierarchy
Fix comment typo "spin_lock_irqrestore".
Fix misspellings of "agressive".
drivers/scsi/a100u2w.c: trivial typo patch
Correct trivial typo in log2.h.
Remove useless FIND_FIRST_BIT() macro from cardbus.c.
...
Now that disable_irq() defaults to delayed-disable semantics, the IRQ_DISABLED
flag is not needed anymore.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Never mask interrupts immediately upon request. Disabling interrupts in
high-performance codepaths is rare, and on the other hand this change could
recover lost edges (or even other types of lost interrupts) by conservatively
only masking interrupts after they happen. (NOTE: with this change the
highlevel irq-disable code still soft-disables this IRQ line - and if such an
interrupt happens then the IRQ flow handler keeps the IRQ masked.)
Mark i8529A controllers as 'never loses an edge'.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In preparation for supporting generic timekeeping, this patch cleans up
x86-64's use of vxtime.hpet_address, changing it to just hpet_address as is
also used in i386. This is necessary since the vxtime structure will be going
away.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@muc.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The NMI watchdog implementation assumes that the local APIC timer interrupt is
happening. This assumption is not longer true when high resolution timers and
dynamic ticks come into play, as they may switch off the local APIC timer
completely. Take the PIT/HPET interrupts into account too, to avoid false
positives.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The local apic timer calibration has two problem cases:
1. The calibration is based on readout of the PIT/HPET timer to detect the
wrap of the periodic tick. It happens that a box gets stuck in the
calibration loop due to a PIT with a broken readout function.
2. CoreDuo boxen show a sporadic PIT runs too slow defect, which results
in a wrong lapic calibration. The PIT goes back to normal operation once
the lapic timer is switched to periodic mode.
Both are existing and unfixed problems in the current upstream kernel and
prevent certain laptops and other systems from booting Linux.
Rework the code to address both problems:
- Make the calibration interrupt driven. This removes the wait_timer_tick
magic hackery from lapic.c and time_hpet.c. The clockevents framework
allows easy substitution of the global tick event handler for the
calibration. This is more accurate than monitoring jiffies. At this point
of the boot process, nothing disturbes the interrupt delivery, so the
results are very accurate.
- Verify the calibration against the PM timer, when available by using the
early access function. When the measured calibration period is outside of
an one percent window, then the lapic timer calibration is adjusted to the
pm timer result.
- Verify the calibration by running the lapic timer with the calibration
handler. Disable lapic timer in case of deviation.
This also removes the "synchronization" of the local apic timer to the global
tick. This synchronization never worked, as there is no way to synchronize
PIT(HPET) and local APIC timer. The synchronization by waiting for the tick
just alignes the local APIC timer for the first events, but later the events
drift away due to the different clocks. Removing the "sync" is just
randomizing the asynchronous behaviour at setup time.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add clockevent drivers for i386: lapic (local) and PIT/HPET (global). Update
the timer IRQ to call into the PIT/HPET driver's event handler and the
lapic-timer IRQ to call into the lapic clockevent driver. The assignement of
timer functionality is delegated to the core framework code and replaces the
compile and runtime evalution in do_timer_interrupt_hook()
Use the clockevents broadcast support and implement the lapic_broadcast
function for ACPI.
No changes to existing functionality.
[ kdump fix from Vivek Goyal <vgoyal@in.ibm.com> ]
[ fixes based on review feedback from Arjan van de Ven <arjan@infradead.org> ]
Cleanups-from: Adrian Bunk <bunk@stusta.de>
Build-fixes-from: Andrew Morton <akpm@osdl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The apic code is quite unstructured and missing a lot of comments.
- Restructure the code into helper functions, timer, setup/shutdown,
interrupt and power management blocks.
- Fixup comments.
- Namespace fixups
- Inline helpers for version and is_integrated
- Combine the ack_bad_irq functions
No functional changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow early access to the power management timer by exposing the verified read
function and providing a helper function which checks the pmtmr_ioport
variable and returns either the pm timer readout or 0 in case the pm timer is
not available.
Create a new header file and replace also the ifdef'ed extern definition in
arch/i386/kernel/acpi/boot.c
This is a preperatory patch for the rework of the local apic timer
calibration.
No functional changes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Geode can safely use the TSC for highres, since:
1) Does not support frequency scaling,
2) The TSC _does_ count when the CPU is halted. Furthermore, the Geode
supports a mode called "suspension on halt", where Suspend mode (which
interacts with the power management states) is entered. TSC counting
during suspend mode is controlled by bit 8 of the Bus Controller
Configuration Register #0 (thanks Tom!).
3) no SMP :)
Check if "RTSC counts during suspension" and remove the requirement for
verification, so the clocksource code can safely select it as an timesource
for the highres timers subsystem.
Signed-off-by: Marcelo Tosatti <marcelo@kvack.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The TSC needs to be verified against another clocksource. Instead of using
hardwired assumptions of available hardware, provide a generic verification
mechanism. The verification uses the best available clocksource and handles
the usability for high resolution timers / dynticks of the clocksource which
needs to be verified.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The clocksource code allows direct updates of the rating of a given
clocksource now. Change TSC unstable tracking to use this interface and
remove the update callback.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using a flag filed allows to encode more than one information into a variable.
Preparatory patch for the generic clocksource verification.
[mingo@elte.hu: convert vmitime.c to the new clocksource flag]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
make the TSC synchronization code more robust, and unify it between x86_64 and
i386.
The biggest change is the removal of the 'fix up TSCs' code on x86_64 and
i386, in some rare cases it was /causing/ time-warps on SMP systems.
The new code only checks for TSC asynchronity - and if it can prove a
time-warp (if it can observe the TSC going backwards when going from one CPU
to another within a critical section), then the TSC clock-source is turned
off.
The TSC synchronization-checking code also got moved into a separate file.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Enqueue clocksources in rating order to make selection of the clocksource
easier. Also check the match with an user override at enqueue time.
Preparatory patch for the generic clocksource verification.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The delayed work code in arch/i386/kernel/tsc.c is an unused leftover of the
GTOD conversion. Remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a flag so we can prevent the irq balancing of an interrupt. Move the
bits, so we have room for more :)
Necessary for the ability to setup clocksources more flexible (e.g. use the
different HPET channels per CPU)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Start using v2 version of Longhaul when available. It provides
voltage scaling and can use ACPI C3 state. That's curious. CPU
will not change frequency on ACPI C3 when v1 is in use, but it will
when v2 is used. Driver will return max frequency all the time if
this isn't true for all processors. There is strange thing with
mobile voltage. Looks like only Nehemiah (C3-M) supports it.
Earlier processors have different mobile VRM (in docs), but I can't
find any which is using it. Looks like all are using VRM 8.5. So
fail for non Nehemiah with mobile VRM.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Solution for small, but nasty bug: access beyond end of f_table for C7 brand.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
After updating several machines to 2.6.20, I can't boot anymore the single
one of them that supports the NX bit and is configured as a 32-bit system.
My understanding is that the VDSO changes in 2.6.20-rc7 were not fully
cooked, in that with that config option enabled VDSO_SYM(x) now equals
x, meaning that an address in the fixmap area is now being passed to
apps via AT_SYSINFO. However, the page is mapped with PAGE_READONLY
rather than PAGE_READONLY_EXEC.
I'm not certain whether having app code go through the fixmap area is
intended, but in case it is here is the simple patch that makes things work
again.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
When I implemented the DECLARE_PER_CPU(var) macros, I was careful that
people couldn't use "var" in a non-percpu context, by prepending
percpu__. I never considered that this would allow them to overload
the same name for a per-cpu and a non-percpu variable.
It is only one of many horrors in the i386 boot code, but let's rename
the non-perpcu cpu_gdt_descr to early_gdt_descr (not boot_gdt_descr,
that's something else...)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
===================================================================
The current code simply calls "start_kernel" directly if we're under a
hypervisor and no paravirt_ops backend wants us, because paravirt.c
registers that as a backend.
This was always a vain hope; start_kernel won't get far without setup.
It's also impossible for paravirt_ops backends which don't sit in the
arch/i386/kernel directory: they can't link before paravirt.o anyway.
Keep it simple: if we pass all the registered paravirt probes, BUG().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
The old Cyrix 5520 CPU detection code relied upon the PCI layer setup being
done earlier than the CPU setup, which is no longer true. Fortunately we
know that if the processor is a MediaGX we can do type 1 pci config
accesses to check the companion chip. We thus do those directly and from
this find the 5520 and implement the workarounds for the timer problem
Original report from takada@mbf.nifty.com, I sent a proposed patch which
Takara then corrected, tested and sent back to the list on 10th January.
Submitting for merging as it seems to have been missed
AK: Changed to use pci-direct.h and fix warning for !CONFIG_PCI (later
AK: originally from akpm)
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: <takada@mbf.nifty.com>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fix bogus warning
linux/arch/i386/kernel/cpu/transmeta.c:12: warning: ‘cpu_freq’ may be used uninitialized in this function
Signed-off-by: Andi Kleen <ak@suse.de>
Fix bogus gcc warning
linux/arch/i386/kernel/microcode.c:387: warning: ‘new_mc’ may be used uninitialized in this function
Signed-off-by: Andi Kleen <ak@suse.de>
Just various new acronyms. The new popcnt bit is in the middle
of Intel space. This looks a little weird, but I've been assured
it's ok.
Also I fixed RDTSCP for i386 which was at the wrong place.
For i386 and x86-64.
Signed-off-by: Andi Kleen <ak@suse.de>
Original code doesn't write back to CCR4 register. This patch reflects a
value of a register.
Cc: Jordan Crouse <jordan.crouse@amd.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Sometimes developers need to see more object code in an oops report,
e.g. when kernel may be corrupted at runtime.
Add the "code_bytes" option for this.
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Annotate i386/kernel/entry.S with END/ENDPROC to assist disassemblers and
other analysis tools.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
I hope to support "classic" MediaGXm in kernel.
The DIR1 register of MediaGXm( or Geode) shows the following values for
identify CPU. For example, My MediaGXm shows 0x42.
We can read National Semiconductor's datasheet without any NDAs.
http://www.national.com/pf/GX/GXLV.html
from datasheets:
DIR1
0x30 - 0x33 GXm rev. 1.0 - 2.3
0x34 - 0x4f GXm rev. 2.4 - 3.x
0x5x GXm rev. 5.0 - 5.4
0x6x GXLV
0x7x (unknow)
0x8x Gx1
In nsc driver of X, accept 0x30 through 0x82. What will 0x7x mean?
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
All Transmeta CPUs ever produced have constant-rate TSCs.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
During kernel bootup, a new T60 laptop (CoreDuo, 32-bit) hangs about
10%-20% of the time in acpi_init():
Calling initcall 0xc055ce1a: topology_init+0x0/0x2f()
Calling initcall 0xc055d75e: mtrr_init_finialize+0x0/0x2c()
Calling initcall 0xc05664f3: param_sysfs_init+0x0/0x175()
Calling initcall 0xc014cb65: pm_sysrq_init+0x0/0x17()
Calling initcall 0xc0569f99: init_bio+0x0/0xf4()
Calling initcall 0xc056b865: genhd_device_init+0x0/0x50()
Calling initcall 0xc056c4bd: fbmem_init+0x0/0x87()
Calling initcall 0xc056dd74: acpi_init+0x0/0x1ee()
It's a hard hang that not even an NMI could punch through! Frustratingly,
adding printks or function tracing to the ACPI code made the hangs go away
...
After some time an additional detail emerged: disabling the NMI watchdog
made these occasional hangs go away.
So i spent the better part of today trying to debug this and trying out
various theories when i finally found the likely reason for the hang: if
acpi_ns_initialize_devices() executes an _INI AML method and an NMI
happens to hit that AML execution in the wrong moment, the machine would
hang. (my theory is that this must be some sort of chipset setup method
doing stores to chipset mmio registers?)
Unfortunately given the characteristics of the hang it was sheer
impossible to figure out which of the numerous AML methods is impacted
by this problem.
As a workaround i wrote an interface to disable chipset-based NMIs while
executing _INI sections - and indeed this fixed the hang. I did a
boot-loop of 100 separate reboots and none hung - while without the patch
it would hang every 5-10 attempts. Out of caution i did not touch the
nmi_watchdog=2 case (it's not related to the chipset anyway and didnt
hang).
I implemented this for both x86_64 and i686, tested the i686 laptop both
with nmi_watchdog=1 [which triggered the hangs] and nmi_watchdog=2, and
tested an Athlon64 box with the 64-bit kernel as well. Everything builds
and works with the patch applied.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mtrr: fix size_or_mask and size_and_mask
This fixes two bugs in /proc/mtrr interface:
o If physical address size crosses the 44 bit boundary
size_or_mask is evaluated wrong.
o size_and_mask limits width of physical base
address for an MTRR to be less than 44 bits.
TBD: later patch had one more change, but I think that was bogus.
TBD: need to double check
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Use adding __init to romsignature() (it's only called from probe_roms()
which is itself __init) as an excuse to submit a pedantic cleanup.
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Clean up sched_clock() on i686: it will use the TSC if available and falls
back to jiffies only if the user asked for it to be disabled via notsc or
the CPU calibration code didnt figure out the right cpu_khz.
This generally makes the scheduler timestamps more finegrained, on all
hardware. (the current scheduler is pretty resistant against asynchronous
sched_clock() values on different CPUs, it will allow at most up to a jiffy
of jitter.)
Also simplify sched_clock()'s check for TSC availability: propagate the
desire and ability to use the TSC into the tsc_disable flag, previously
this flag only indicated whether the notsc option was passed. This makes
the rare low-res sched_clock() codepath a single branch off a read-mostly
flag.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Add a notifier mechanism to the low level idle loop. You can register a
callback function which gets invoked on entry and exit from the low level idle
loop. The low level idle loop is defined as the polling loop, low-power call,
or the mwait instruction. Interrupts processed by the idle thread are not
considered part of the low level loop.
The notifier can be used to measure precisely how much is spent in useless
execution (or low power mode). The perfmon subsystem uses it to turn on/off
monitoring.
Signed-off-by: stephane eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Every file should include the headers containing the prototypes for
it's global functions.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
o Entry startup_32 was in .text section but it was accessing some init
data too and it prompts MODPOST to generate compilation warnings.
WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
.text between '_text' (at offset 0xc0100029) and 'startup_32_smp'
WARNING: vmlinux - Section mismatch: reference to .init.data:boot_params from
.text between '_text' (at offset 0xc0100037) and 'startup_32_smp'
WARNING: vmlinux - Section mismatch: reference to
.init.data:init_pg_tables_end from .text between '_text' (at offset
0xc0100099) and 'startup_32_smp'
o Can't move startup_32 to .init.text as this entry point has to be at the
start of bzImage. Hence moved startup_32 to a new section .text.head and
instructed MODPOST to not to generate warnings if init data is being
accessed from .text.head section. This code has been audited.
o SMP boot up code (startup_32_smp) can go into .init.text if CPU hotplug
is not supported. Otherwise it generates more warnings
WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
.text between 'checkCPUtype' (at offset 0xc0100126) and 'is486'
WARNING: vmlinux - Section mismatch: reference to .init.data:new_cpu_data from
.text between 'checkCPUtype' (at offset 0xc0100130) and 'is486'
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Because timer code moves around, and we might eventually move our init to a
late_time_init hook, save and restore IRQs around this code because it is
definitely not interrupt safe.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Kprobes bugfix for paravirt compatibility - RPL on the CS when inserting
BPs must match running kernel.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Eric Biederman <ebiederm@xmission.com>
Profile_pc was broken when using paravirtualization because the
assumption the kernel was running at CPL 0 was violated, causing
bad logic to read a random value off the stack.
The only way to be in kernel lock functions is to be in kernel
code, so validate that assumption explicitly by checking the CS
value. We don't want to be fooled by BIOS / APM segments and
try to read those stacks, so only match KERNEL_CS.
I moved some stuff in segment.h to make it prettier.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
VMI timer code. It works by taking over the local APIC clock when APIC is
configured, which requires a couple hooks into the APIC code. The backend
timer code could be commonized into the timer infrastructure, but there are
some pieces missing (stolen time, in particular), and the exact semantics of
when to do accounting for NO_IDLE need to be shared between different
hypervisors as well. So for now, VMI timer is a separate module.
[Adrian Bunk: cleanups]
Subject: VMI timer patches
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Add VMI SMP boot hook. We emulate a regular boot sequence and use the same
APIC IPI initiation, we just poke magic values to load into the CPU state when
the startup IPI is received, rather than having to jump through a real mode
trampoline.
This is all that was needed to get SMP to work.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
I found a clever way to make the extra IOPL switching invisible to
non-paravirt compiles - since kernel_rpl is statically defined to be zero
there, and only non-zero rpl kernel have a problem restoring IOPL, as popf
does not restore IOPL flags unless run at CPL-0.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
The VMI ROM has a mode where hypercalls can be queued and batched. This turns
out to be a significant win during context switch, but must be done at a
specific point before side effects to CPU state are visible to subsequent
instructions. This is similar to the MMU batching hooks already provided.
The same hooks could be used by the Xen backend to implement a context switch
multicall.
To explain a bit more about lazy modes in the paravirt patches, basically, the
idea is that only one of lazy CPU or MMU mode can be active at any given time.
Lazy MMU mode is similar to this lazy CPU mode, and allows for batching of
multiple PTE updates (say, inside a remap loop), but to avoid keeping some
kind of state machine about when to flush cpu or mmu updates, we just allow
one or the other to be active. Although there is no real reason a more
comprehensive scheme could not be implemented, there is also no demonstrated
need for this extra complexity.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
The VMI backend uses explicit page type notification to track shadow page
tables. The allocation of page table roots is especially tricky. We need to
clone the root for non-PAE mode while it is protected under the pgd lock to
correctly copy the shadow.
We don't need to allocate pgds in PAE mode, (PDPs in Intel terminology) as
they only have 4 entries, and are cached entirely by the processor, which
makes shadowing them rather simple.
For base page table level allocation, pmd_populate provides the exact hook
point we need. Also, we need to allocate pages when splitting a large page,
and we must release pages before returning the page to any free pool.
Despite being required with these slightly odd semantics for VMI, Xen also
uses these hooks to determine the exact moment when page tables are created or
released.
AK: All nops for other architectures
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Every file should #include the headers containing the prototypes for
its global functions.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
The "fasteoi" IRQ handler is named "fasteio" incorrectly. This is a fix.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Convert the PDA code to use %fs rather than %gs as the segment for
per-processor data. This is because some processors show a small but
measurable performance gain for reloading a NULL segment selector (as %fs
generally is in user-space) versus a non-NULL one (as %gs generally is).
On modern processors the difference is very small, perhaps undetectable.
Some old AMD "K6 3D+" processors are noticably slower when %fs is used
rather than %gs; I have no idea why this might be, but I think they're
sufficiently rare that it doesn't matter much.
This patch also fixes the math emulator, which had not been adjusted to
match the changed struct pt_regs.
[frederik.deweerdt@gmail.com: fixit with gdb]
[mingo@elte.hu: Fix KVM too]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Ian Campbell <Ian.Campbell@XenSource.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Zachary Amsden <zach@vmware.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
[akpm@osdl.org: sparc64 fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update all arch/*/kernel/vmlinux.lds.S to not include space for initramfs
when CONFIG_BLK_DEV_INITRAMFS is not selected. This saves another 4 kbytes
on most platfoms (some reserve PAGE_SIZE for initramfs).
Signed-off-by: Jean-Paul Saman <jean-paul.saman@nxp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This change should make Longhaul more compatible with
both ver. 2 and Powersaver processors. Voltage transitions
will be done before or after frequency transition. That depends
on direction of change. I don't know how to force conservative
governor when voltage scaling is enabled, so there is only
a warning for user. Minimal voltage is calculated in different
way now because in this way more power is saved at lower
multipliers.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
This is driver for Enhanced Powersaver which is present in VIA C7
processors. Beta tested by Jorgen (jorgen (at) greven dot dk).
Thanks! Based on documentation provided by Dave Jones (Thanks!)
and C7 Eden datasheet available from www.via.com.tw. Looks like all
these C7 Eden CPU's don't have P-states in BIOS. I know that 2
p-states is low, but Jorgen finds it usefull anyway because board
is passive cooled.
There are 3 different types of C7 processors (called brands):
0. C7-M - these processors can set any maultiplier between min and
max, any voltage between min and max.
1. C7 - only min and max states are supported. Voltage is different
for min and max states.
2. Eden - only min and max states are supported. Looks like this
brand can only change multiplier. Voltage seems to be the same for
min and max frequency.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>