Commit graph

2450 commits

Author SHA1 Message Date
Linus Torvalds
42a886af72 x86-64: Fix "bytes left to copy" return value for copy_from_user()
Most users by far do not care about the exact return value (they only
really care about whether the copy succeeded in its entirety or not),
but a few special core routines actually care deeply about exactly how
many bytes were copied from user space.

And the unrolled versions of the x86-64 user copy routines would
sometimes report that it had copied more bytes than it actually had.

Very few uses actually have partial copies to begin with, but to make
this bug even harder to trigger, most x86 CPU's use the "rep string"
instructions for normal user copies, and that version didn't have this
issue.

To make it even harder to hit, the one user of this that really cared
about the return value (and used the uncached version of the copy that
doesn't use the "rep string" instructions) was the generic write
routine, which pre-populated its source, once more hiding the problem by
avoiding the exception case that triggers the bug.

In other words, very special thanks to Bron Gondwana who not only
triggered this, but created a test-program to show it, and bisected the
behavior down to commit 08291429cf ("mm:
fix pagecache write deadlocks") which changed the access pattern just
enough that you can now trigger it with 'writev()' with multiple
iovec's.

That commit itself was not the cause of the bug, it just allowed all the
stars to align just right that you could trigger the problem.

[ Side note: this is just the minimal fix to make the copy routines
  (with __copy_from_user_inatomic_nocache as the particular version that
  was involved in showing this) have the right return values.

  We really should improve on the exceptional case further - to make the
  copy do a byte-accurate copy up to the exact page limit that causes it
  to fail.  As it is, the callers have to do extra work to handle the
  limit case gracefully. ]

Reported-by: Bron Gondwana <brong@fastmail.fm>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

 (which didn't have this problem), and since
most users that do the carethis was very hard to trigger, but
2008-06-17 17:47:50 -07:00
Linus Torvalds
0269c5c6d9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: fixup write combine comment in pci_mmap_resource
  x86: PAT export resource_wc in pci sysfs
  x86, pci-dma.c: don't always add __GFP_NORETRY to gfp
  suspend-vs-iommu: prevent suspend if we could not resume
  x86: pci-dma.c: use __GFP_NO_OOM instead of __GFP_NORETRY
  pci, x86: add workaround for bug in ASUS A7V600 BIOS (rev 1005)
  PCI: use dev_to_node in pci_call_probe
  PCI: Correct last two HP entries in the bfsort whitelist
2008-06-14 13:32:56 -07:00
Stas Sergeev
1da2e3d679 provide rtc_cmos platform device
Recently (around 2.6.25) I've noticed that RTC no longer works for me.  It
turned out this is because I use pnpacpi=off kernel option to work around
the parport_pc bugs.  I always did so, but RTC used to work fine in the
past, and now it have regressed.

The patch fixes the problem by creating the platform device for the RTC
when PNP is disabled.  This may also help running the PNP-enabled kernel
on an older PCs.

Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Cc: David Brownell <david-b@pacbell.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Adam Belay <ambx1@neo.rr.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-12 18:05:42 -07:00
Jesse Barnes
883eed1b3e Merge branch 'pci-for-jesse' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip into for-linus 2008-06-12 13:51:05 -07:00
Linus Torvalds
cbfa66b88d Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest
  x86, lockdep: fix "WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()"
  x86: fix an incompatible pointer type warning on 64-bit compilations
  x86: fix lockdep warning during suspend-to-ram
  x86: fix unused variable 'loops' warning in arch/x86/boot/a20.c
  Revert "x86: fix ioapic bug again"
  x86: fix asm warning in head_32.S
  x86: fix endless page faults in mount_block_root for Linux 2.6
  geode: fix modular build
2008-06-12 12:55:32 -07:00
Kevin Winchester
f8a45704f5 x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest
Changed the call to find_e820_area_size to pass u64 instead of unsigned long.

Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:36:23 +02:00
Vegard Nossum
4461145ef1 x86, lockdep: fix "WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()"
Alessandro Suardi reported:
> Recently upgraded my FC6 desktop to Fedora 9; with the
>  latest nautilus RPM updates my VNC session went nuts
>  with nautilus pegging the CPU for everything that breathed.
>
> I now reverted to an earlier nautilus package, but during
>  the peak CPU period my kernel spat this:
>
> [314185.623294] ------------[ cut here ]------------
> [314185.623414] WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x128()
> [314185.623514] Modules linked in: iptable_filter ip_tables x_tables
> sunrpc ipv6 fuse snd_via82xx snd_ac97_codec ac97_bus snd_mpu401_uart
> snd_rawmidi via686a hwmon parport_pc sg parport uhci_hcd ehci_hcd
> [314185.623924] Pid: 12314, comm: nautilus Not tainted 2.6.26-rc5-git2 #4
> [314185.624021]  [<c0115b95>] warn_on_slowpath+0x41/0x7b
> [314185.624021]  [<c010de70>] ? do_page_fault+0x2c1/0x5fd
> [314185.624021]  [<c0128396>] ? up_read+0x16/0x28
> [314185.624021]  [<c010de70>] ? do_page_fault+0x2c1/0x5fd
> [314185.624021]  [<c012fa33>] ? __lock_acquire+0xbb4/0xbc3
> [314185.624021]  [<c012d0a0>] check_flags+0x4c/0x128
> [314185.624021]  [<c012fa73>] lock_acquire+0x31/0x7d
> [314185.624021]  [<c0128cf6>] __atomic_notifier_call_chain+0x30/0x80
> [314185.624021]  [<c0128cc6>] ? __atomic_notifier_call_chain+0x0/0x80
> [314185.624021]  [<c0128d52>] atomic_notifier_call_chain+0xc/0xe
> [314185.624021]  [<c0128d81>] notify_die+0x2d/0x2f
> [314185.624021]  [<c01043b0>] do_int3+0x1f/0x4d
> [314185.624021]  [<c02f2d3b>] int3+0x27/0x2c
> [314185.624021]  =======================
> [314185.624021] ---[ end trace 1923f65a2d7bb246 ]---
> [314185.624021] possible reason: unannotated irqs-off.
> [314185.624021] irq event stamp: 488879
> [314185.624021] hardirqs last  enabled at (488879): [<c0102d67>]
> restore_nocheck+0x12/0x15
> [314185.624021] hardirqs last disabled at (488878): [<c0102dca>]
> work_resched+0x19/0x30
> [314185.624021] softirqs last  enabled at (488876): [<c011a1ba>]
> __do_softirq+0xa6/0xac
> [314185.624021] softirqs last disabled at (488865): [<c010476e>]
> do_softirq+0x57/0xa6
>
> I didn't seem to find it with some googling, so here it is.
>
> I was incidentally ltracing that process to try and find out
>  what was gulping down that much CPU (sorry, no idea
>  whether ltrace and the WARNING happened at the same
>  time or which came first) and:

Yeah, this is extremely likely to be the source of the warning.

The warning should be harmless, however.

> Box is my trusty noname K7-800, 512MB RAM; if there's
>  anything else useful I might be able to provide, just ask.

It would be interesting to see where the int3 comes from.  Too bad,
lockdep doesn't provide the register dump. The stacktrace also doesn't
go further than the int3(), I wonder if this int3 came from userspace?
The ltrace readme says "software breakpoints, like gdb", so I guess
this is the case. Yep, seems like it.

This looks relevant:

| commit fb1dac909d
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date:   Wed Jan 16 09:51:59 2008 +0100
|
|     lockdep: more hardirq annotations for notify_die()

I'm attaching a similarly-looking patch for this case (DO_VM86_ERROR),
though I suspect it might be missing for the other cases
(DO_ERROR/DO_ERROR_INFO) as well.

Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:19 +02:00
David Howells
eb53e9f3ea x86: fix an incompatible pointer type warning on 64-bit compilations
Fix an incompatible pointer type warning on x86_64 compilations.
early_memtest() is passing a u64* to find_e820_area_size() which is expecting
an unsigned long.  Change t_start and t_size to unsigned long as those are
also 64-bit types on x88_64.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:15 +02:00
Peter Zijlstra
e32e58a96d x86: fix lockdep warning during suspend-to-ram
Andrew Morton wrote:

> I've been seeing the below for a long time during suspend-to-ram on the Vaio.
>
>
> PM: Syncing filesystems ... done.
> PM: Preparing system for mem sleep
> Freezing user space processes ... <4>------------[ cut here ]------------
> WARNING: at kernel/lockdep.c:2658 check_flags+0x4c/0x127()
> Modules linked in: i915 drm ipw2200 sonypi ipv6 autofs4 hidp l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables acpi_cpufreq nvram ohci1394 ieee1394 ehci_hcd uhci_hcd sg joydev snd_hda_intel snd_seq_dummy sr_mod snd_seq_oss cdrom snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ieee80211 pcspkr ieee80211_crypt snd_pcm i2c_i801 snd_timer i2c_core ide_pci_generic piix snd soundcore snd_page_alloc button ext3 jbd ide_disk ide_core [last unloaded: ipw2200]
> Pid: 3250, comm: zsh Not tainted 2.6.26-rc5 #1
>  [<c011c5f5>] warn_on_slowpath+0x41/0x6d
>  [<c01080e6>] ? native_sched_clock+0x82/0x96
>  [<c013789c>] ? mark_held_locks+0x41/0x5c
>  [<c0315688>] ? _spin_unlock_irqrestore+0x36/0x58
>  [<c0137a29>] ? trace_hardirqs_on+0xe6/0x10d
>  [<c0138637>] ? __lock_acquire+0xae3/0xb2b
>  [<c0313413>] ? schedule+0x39b/0x3b4
>  [<c0135596>] check_flags+0x4c/0x127
>  [<c01386b9>] lock_acquire+0x3a/0x86
>  [<c0315075>] _spin_lock+0x26/0x53
>  [<c0140660>] ? refrigerator+0x13/0xc3
>  [<c0140660>] refrigerator+0x13/0xc3
>  [<c012684a>] get_signal_to_deliver+0x3c/0x31e
>  [<c0102fe7>] do_notify_resume+0x91/0x6ee
>  [<c01359fd>] ? lock_release_holdtime+0x50/0x56
>  [<c0315688>] ? _spin_unlock_irqrestore+0x36/0x58
>  [<c0235d24>] ? read_chan+0x0/0x58c
>  [<c0137a29>] ? trace_hardirqs_on+0xe6/0x10d
>  [<c0315694>] ? _spin_unlock_irqrestore+0x42/0x58
>  [<c0230afa>] ? tty_ldisc_deref+0x5c/0x63
>  [<c0233104>] ? tty_read+0x66/0x98
>  [<c014b3f0>] ? audit_syscall_exit+0x2aa/0x2c5
>  [<c0109430>] ? do_syscall_trace+0x6b/0x16f
>  [<c0103a9c>] work_notifysig+0x13/0x1b
>  =======================
> ---[ end trace 25b49fe59a25afa5 ]---
> possible reason: unannotated irqs-off.
> irq event stamp: 58919
> hardirqs last  enabled at (58919): [<c0103afd>] syscall_exit_work+0x11/0x26

Joy - I so love entry.S

Best I can make of it:

syscall_exit_work
  resume_userspace
    DISABLE_INTERRUPTS
    (no TRACE_IRQS_OFF)
      work_pending
        work_notifysig
          do_notify_resume()
            do_signal()
              get_signal_to_deliver()
                try_to_freeze()
                  refrigerator()
                    task_lock() -> check_flags() -> BANG

The normal path is:

syscall_exit_work
  resume_userspace
    DISABLE_INTERRUPTS
    restore_all
      TRACE_IRQS_IRET
      iret

No idea why that would not warn..

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:09 +02:00
Manish Katiyar
52aaa12fbe x86: fix unused variable 'loops' warning in arch/x86/boot/a20.c
Following patch fixes the below warning message :
arch/x86/boot/a20.c:118: warning: unused variable 'loops'

Signed-off-by : Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:27:05 +02:00
Ingo Molnar
0b6a39f7eb Revert "x86: fix ioapic bug again"
This reverts commit 6e908947b4.

Németh Márton reported:

| there is a problem in 2.6.26-rc3 which was not there in case of
| 2.6.25: the CPU wakes up ~90,000 times per sec instead of ~60 per sec.
|
| I also "git bisected" the problem, the result is:
|
| 6e908947b4 is first bad commit
| commit 6e908947b4
| Author: Ingo Molnar <mingo@elte.hu>
| Date:   Fri Mar 21 14:32:36 2008 +0100
|
|     x86: fix ioapic bug again

the original problem is fixed by Maciej W. Rozycki in the tip/x86/apic
branch (confirmed by Márton), but those changes are too intrusive for
v2.6.26 so we'll go for the less intrusive (repeated) revert now.

Reported-and-bisected-by: Németh Márton <nm127@freemail.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:26:28 +02:00
Joe Korty
86b2b70e15 x86: fix asm warning in head_32.S
On Mon, May 19, 2008 at 04:10:02PM -0700, Linus Torvalds wrote:
> It also causes these warnings on 32-bit PAE:
>
> 	  AS      arch/x86/kernel/head_32.o
> 	arch/x86/kernel/head_32.S: Assembler messages:
> 	arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
> 	arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed
>
> and I do not see why (the end result seems to be identical).

Fix head_32.S gcc bignum warnings when CONFIG_PAE=y.

    arch/x86/kernel/head_32.S: Assembler messages:
    arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
    arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed

The assembler was stumbling over the 64-bit constant 0x100000000 in the
KPMDS #define.

Testing: a cmp(1) on head_32.o before and after shows the binary is unchanged.

Signed-off-by: Joe Korty <joe.korty@ccur.com
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Gabriel C <nix.or.die@googlemail.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: "Siddha Suresh B" <suresh.b.siddha@intel.com>
Cc: bugme-daemon@bugzilla.kernel.org
Cc: airlied@linux.ie
Cc: "Barnes Jesse" <jesse.barnes@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:26:12 +02:00
Henry Nestler
b29c701dea x86: fix endless page faults in mount_block_root for Linux 2.6
Page faults in kernel address space between PAGE_OFFSET up to
VMALLOC_START should not try to map as vmalloc.

Fix rarely endless page faults inside mount_block_root for root
filesystem at boot time.

All 32bit kernels up to 2.6.25 can fail into this hole.
I can not present this under native linux kernel. I see, that the 64bit
has fixed the problem. I copied the same lines into 32bit part.

Recorded debugs are from coLinux kernel 2.6.22.18 (virtualisation):
http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt
The physicaly memory was trimmed down to 192MB to better catch the bug.
More memory gets the bug more rarely.

Details, how every x86 32bit system can fail:

Start from "mount_block_root",
http://lxr.linux.no/linux/init/do_mounts.c#L297
There the variable "fs_names" got one memory page with 4096 bytes.
Variable "p" walks through the existing file system types. The first
string is no problem.
But, with the second loop in mount_block_root the offset of "p" is not
at beginning of page, the offset is for example +9, if "reiserfs" is the
first in list.
Than calls do_mount_root, and lands in sys_mount.
Remember: Variable "type_page" contains now "fs_type+9" and not contains
a full page.
The sys_mount copies 4096 bytes with function "exact_copy_from_user()":
http://lxr.linux.no/linux/fs/namespace.c#L1540

Mostly exist pages after the buffer "fs_names+4096+9" and the page fault
handler was not called. No problem.

In the case, if the page after "fs_names+4096" is not mapped, the page
fault handler was called from http://lxr.linux.no/linux/fs/namespace.c#L1320

The do_page_fault gots an address 0xc03b4000.
It's kernel address, address >= TASK_SIZE, but not from vmalloc! It's
from "__getname()" alias "kmem_cache_alloc".
The "error_code" is 0. "vmalloc_fault" will be call:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332

"vmalloc_fault" tryed to find the physical page for a non existing
virtual memory area. The macro "pte_present" in vmalloc_fault()
got a next page fault for 0xc0000ed0 at:
http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282

No PTE exist for such virtual address. The page fault handler was trying
to sync the physical page for the PTE lockup.

This called vmalloc_fault() again for address 0xc000000, and that also
was not existing. The endless began...

In normal case the cpu would still loop with disabled interrrupts. Under
coLinux this was catched by a stack overflow inside printk debugs.

Signed-off-by: Henry Nestler <henry.nestler@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-12 21:26:07 +02:00
Ingo Molnar
3703f39965 geode: fix modular build
-tip testing found this build bug:

 MODPOST 331 modules
 ERROR: "geode_mfgpt_toggle_event" [drivers/watchdog/geodewdt.ko] undefined!
 ERROR: "geode_mfgpt_alloc_timer" [drivers/watchdog/geodewdt.ko] undefined!
 make[1]: *** [__modpost] Error 1
 make: *** [modules] Error 2

with this config:

  http://redhat.com/~mingo/misc/config-Wed_Jun__4_18_01_59_CEST_2008.bad

export those symbols.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 21:25:51 +02:00
Linus Torvalds
dc10885d68 Merge branch 'core/iter-div' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core/iter-div' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  always_inline timespec_add_ns
  add an inlined version of iter_div_u64_rem
  common implementation of iterative div/mod
2008-06-12 07:47:44 -07:00
Jeremy Fitzhardinge
f595ec964d common implementation of iterative div/mod
We have a few instances of the open-coded iterative div/mod loop, used
when we don't expcet the dividend to be much bigger than the divisor.
Unfortunately modern gcc's have the tendency to strength "reduce" this
into a full mod operation, which isn't necessarily any faster, and
even if it were, doesn't exist if gcc implements it in libgcc.

The workaround is to put a dummy asm statement in the loop to prevent
gcc from performing the transformation.

This patch creates a single implementation of this loop, and uses it
to replace the open-coded versions I know about.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-12 10:47:56 +02:00
Linus Torvalds
da50ccc6a0 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (23 commits)
  ACPICA: fix stray va_end() caused by mis-merge
  ACPI: Reject below-freezing temperatures as invalid critical temperatures
  ACPICA: Fix for access to deleted object <regression>
  ACPICA: Fix to make _SST method optional
  ACPICA: Fix for Load operator, load table at the namespace root
  ACPICA: Ignore ACPI table signature for Load() operator
  ACPICA: Fix to allow zero-length ASL field declarations
  ACPI: use memory_read_from_buffer()
  bay: exit if notify handler cannot be installed
  dock.c remove trailing printk whitespace
  proper prototype for acpi_processor_tstate_has_changed()
  ACPI: handle invalid ACPI SLIT table
  PNPACPI: use _CRS IRQ descriptor length for _SRS
  pnpacpi: fix shareable IRQ encode/decode
  pnpacpi: fix IRQ flag decoding
  MAINTAINERS: update ACPI homepage
  ACPI 2.6.26-rc2: Add missing newline to DSDT/SSDT warning message
  ACPI: EC: Use msleep instead of udelay while waiting for event.
  thinkpad-acpi: fix LED handling on older ThinkPads
  thinkpad-acpi: fix initialization error paths
  ...
2008-06-11 17:16:32 -07:00
Fenghua Yu
39b8931b5c ACPI: handle invalid ACPI SLIT table
This is a SLIT sanity checking patch.  It moves slit_valid() function to
generic ACPI code and does sanity checking for both x86 and ia64.  It sets up
node_distance with LOCAL_DISTANCE and REMOTE_DISTANCE when hitting invalid
SLIT table on ia64.  It also cleans up unused variable localities in
acpi_parse_slit() on x86.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-06-11 19:13:46 -04:00
Linus Torvalds
a4df1ac12d Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: MMU: Fix is_empty_shadow_page() check
  KVM: MMU: Fix printk() format string
  KVM: IOAPIC: only set remote_irr if interrupt was injected
  KVM: MMU: reschedule during shadow teardown
  KVM: VMX: Clear CR4.VMXE in hardware_disable
  KVM: migrate PIT timer
  KVM: ppc: Report bad GFNs
  KVM: ppc: Use a read lock around MMU operations, and release it on error
  KVM: ppc: Remove unmatched kunmap() call
  KVM: ppc: add lwzx/stwz emulation
  KVM: ppc: Remove duplicate function
  KVM: s390: Fix race condition in kvm_s390_handle_wait
  KVM: s390: Send program check on access error
  KVM: s390: fix interrupt delivery
  KVM: s390: handle machine checks when guest is running
  KVM: s390: fix locking order problem in enable_sie
  KVM: s390: use yield instead of schedule to implement diag 0x44
  KVM: x86 emulator: fix hypercall return value on AMD
  KVM: ia64: fix zero extending for mmio ld1/2/4 emulation in KVM
2008-06-11 10:35:44 -07:00
Miquel van Smoorenburg
b7f09ae583 x86, pci-dma.c: don't always add __GFP_NORETRY to gfp
Currently arch/x86/kernel/pci-dma.c always adds __GFP_NORETRY
to the allocation flags, because it wants to be reasonably
sure not to deadlock when calling alloc_pages().

But really that should only be done in two cases:
- when allocating memory in the lower 16 MB DMA zone.
  If there's no free memory there, waiting or OOM killing is of no use
- when optimistically trying an allocation in the DMA32 zone
  when dma_mask < DMA_32BIT_MASK hoping that the allocation
  happens to fall within the limits of the dma_mask

Also blindly adding __GFP_NORETRY to the the gfp variable might
not be a good idea since we then also use it when calling
dma_ops->alloc_coherent(). Clearing it might also not be a
good idea, dma_alloc_coherent()'s caller might have set it
on purpose. The gfp variable should not be clobbered.

[ mingo@elte.hu: converted to delta patch ontop of previous version. ]

Signed-off-by: Miquel van Smoorenburg <miquels@cistron.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-10 12:22:18 +02:00
Avi Kivity
3c9155106d KVM: MMU: Fix is_empty_shadow_page() check
The check is only looking at one of two possible empty ptes.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:36:33 +03:00
Avi Kivity
ebb0e6264c KVM: MMU: Fix printk() format string
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:36:20 +03:00
Linus Torvalds
256a13dd70 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86/PCI: add workaround for bug in ASUS A7V600 BIOS (rev 1005)
  PCI/x86: fix up PCI stuff so that PCI_GOANY supports OLPC
2008-06-06 11:33:08 -07:00
Avi Kivity
8d2d73b9a5 KVM: MMU: reschedule during shadow teardown
Shadows for large guests can take a long time to tear down, so reschedule
occasionally to avoid softlockup warnings.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:32:20 +03:00
Eli Collins
e693d71b46 KVM: VMX: Clear CR4.VMXE in hardware_disable
Clear CR4.VMXE in hardware_disable. There's no reason to leave it set
after doing a VMXOFF.

VMware Workstation 6.5 checks CR4.VMXE as a proxy for whether the CPU is
in VMX mode, so leaving VMXE set means we'll refuse to power on. With this
change the user can power on after unloading the kvm-intel module. I
tested on kvm-67 and kvm-69.

Signed-off-by: Eli Collins <ecollins@vmware.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:30:20 +03:00
Marcelo Tosatti
2f5997140f KVM: migrate PIT timer
Migrate the PIT timer to the physical CPU which vcpu0 is scheduled on,
similarly to what is done for the LAPIC timers, otherwise PIT interrupts
will be delayed until an unrelated event causes an exit.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:25:51 +03:00
Avi Kivity
33e3885de2 KVM: x86 emulator: fix hypercall return value on AMD
The hypercall instructions on Intel and AMD are different.  KVM allows the
guest to choose one or the other (the default is Intel), and if the guest
chooses incorrectly, KVM will patch it at runtime to select the correct
instruction.  This allows live migration between Intel and AMD machines.

This patching occurs in the x86 emulator.  The current code also executes
the hypercall.  Unfortunately, the tail end of the x86 emulator code also
executes, overwriting the return value of the hypercall with the original
contents of rax (which happens to be the hypercall number).

Fix not by executing the hypercall in the emulator context; instead let the
guest reissue the patched instruction and execute the hypercall via the
normal path.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:08:25 +03:00
Bertram Felgenhauer
9f67fd5db5 x86/PCI: add workaround for bug in ASUS A7V600 BIOS (rev 1005)
This BIOS claims the VIA 8237 south bridge to be compatible with VIA 586,
which it is not.

Without this patch, I get the following warning while booting,
among others,

| PCI: Using IRQ router VIA [1106/3227] at 0000:00:11.0
| ------------[ cut here ]------------
| WARNING: at arch/x86/pci/irq.c:265 pirq_via586_get+0x4a/0x60()
| Modules linked in:
| Pid: 1, comm: swapper Not tainted 2.6.26-rc4-00015-g1ec7d99 #1
|  [<c0119fd4>] warn_on_slowpath+0x54/0x70
|  [<c02246e0>] ? vt_console_print+0x210/0x2b0
|  [<c02244d0>] ? vt_console_print+0x0/0x2b0
|  [<c011a413>] ? __call_console_drivers+0x43/0x60
|  [<c011a482>] ? _call_console_drivers+0x52/0x80
|  [<c011aa89>] ? release_console_sem+0x1c9/0x200
|  [<c0291d21>] ? raw_pci_read+0x41/0x70
|  [<c0291e8f>] ? pci_read+0x2f/0x40
|  [<c029151a>] pirq_via586_get+0x4a/0x60
|  [<c02914d0>] ? pirq_via586_get+0x0/0x60
|  [<c029178d>] pcibios_lookup_irq+0x15d/0x430
|  [<c03b895a>] pcibios_irq_init+0x17a/0x3e0
|  [<c03a66f0>] ? kernel_init+0x0/0x250
|  [<c03a6763>] kernel_init+0x73/0x250
|  [<c03b87e0>] ? pcibios_irq_init+0x0/0x3e0
|  [<c0114d00>] ? schedule_tail+0x10/0x40
|  [<c0102dee>] ? ret_from_fork+0x6/0x1c
|  [<c03a66f0>] ? kernel_init+0x0/0x250
|  [<c03a66f0>] ? kernel_init+0x0/0x250
|  [<c010324b>] kernel_thread_helper+0x7/0x1c
|  =======================
| ---[ end trace 4eaa2a86a8e2da22 ]---

and IRQ trouble later,

| irq 10: nobody cared (try booting with the "irqpoll" option)

Now that's an VIA 8237 chip, so pirq_via586_get shouldn't be called
at all; adding this workaround to via_router_probe() fixes the
problem for me.

Amazingly I have a 2.6.23.8 kernel that somehow works fine ... I'll
never understand why.

Signed-off-by: Bertram Felgenhauer <int-e@gmx.de>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-05 15:32:15 -07:00
Andres Salomon
2bdd1b031b PCI/x86: fix up PCI stuff so that PCI_GOANY supports OLPC
Previously, one would have to specifically choose CONFIG_OLPC and
CONFIG_PCI_GOOLPC in order to enable PCI_OLPC.  That doesn't really work
for distro kernels, so this patch allows one to choose CONFIG_OLPC and
CONFIG_PCI_GOANY in order to build in OLPC support in a generic kernel (as
requested by Robert Millan).

This also moves GOOLPC before GOANY in the menuconfig list.

Finally, make pci_access_init return early if we detect OLPC hardware.
There's no need to continue probing stuff, and pci_pcbios_init
specifically trashes our settings (we didn't run into that before because
PCI_GOANY wasn't supported).

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-05 14:29:25 -07:00
Stefan Richter
16104b5504 x86: fix CONFIG_NONPROMISC_DEVMEM prompt and help text
Here is an attempt to translate the prompt and help text into something
which is legible and, as a bonus, correct.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-05 14:21:45 -07:00
Suresh Siddha
870568b390 x86, fpu: fix CONFIG_PREEMPT=y corruption of application's FPU stack
Jürgen Mell reported an FPU state corruption bug under CONFIG_PREEMPT,
and bisected it to commit v2.6.19-1363-gacc2076, "i386: add sleazy FPU
optimization".

Add tsk_used_math() checks to prevent calling math_state_restore()
which can sleep in the case of !tsk_used_math(). This prevents
making a blocking call in __switch_to().

Apparently "fpu_counter > 5" check is not enough, as in some signal handling
and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible.

It's a side effect though. This is the failing scenario:

process 'A' in save_i387_ia32() just after clear_used_math()

Got an interrupt and pre-empted out.

At the next context switch to process 'A' again, kernel tries to restore
the math state proactively and sees a fpu_counter > 0 and !tsk_used_math()

This results in init_fpu() during the __switch_to()'s math_state_restore()

And resulting in fpu corruption which will be saved/restored
(save_i387_fxsave and restore_i387_fxsave) during the remaining
part of the signal handling after the context switch.

Bisected-by: Jürgen Mell <j.mell@t-online.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Jürgen Mell <j.mell@t-online.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2008-06-04 16:21:24 +02:00
Pavel Machek
cd76374e9d suspend-vs-iommu: prevent suspend if we could not resume
iommu/gart support misses suspend/resume code, which can do bad stuff,
including memory corruption on resume.  Prevent system suspend in case we
would be unable to resume.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Tested-by: Patrick <ragamuffin@datacomm.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Andrew Morton
be524fb960 x86: section mismatch fix
Fix this:

 WARNING: vmlinux.o(.text+0x114bb): Section mismatch in reference from
 the function nopat() to the function .cpuinit.text:pat_disable()
 The function nopat() references
 the function __cpuinit pat_disable().
 This is often because nopat lacks a __cpuinit
 annotation or the annotation of pat_disable is wrong.

Reported-by: "Fabio Comolli" <fabio.comolli@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Venki Pallipadi
282c454cd3 x86: fix Xorg crash with xf86MapVidMem error
Clarify the usage of mtrr_lookup() in PAT code, and to make PAT code
resilient to mtrr lookup problems.

Specifically, pat_x_mtrr_type() is restructured to highlight, under what
conditions we look for mtrr hint. pat_x_mtrr_type() uses a default type
when there are any errors in mtrr lookup (still maintaining the pat
consistency). And, reserve_memtype() highlights its usage ot mtrr_lookup
for request type of '-1' and also defaults in a sane way on any mtrr
lookup failure.

pat.c looks at mtrr type of a range to get a hint on what mapping type
to request when user/API: (1) hasn't specified any type (/dev/mem
mapping) and we do not want to take performance hit by always mapping
UC_MINUS. This will be the case for /dev/mem mappings used to map BIOS
area or ACPI region which are WB'able. In this case, as long as MTRR is
not WB, PAT will request UC_MINUS for such mappings.

(2) user/API requests WB mapping while in reality MTRR may have UC or
WC. In this case, PAT can map as WB (without checking MTRR) and still
effective type will be UC or WC. But, a subsequent request to map same
region as UC or WC may fail, as the region will get trackked as WB in
PAT list. Looking at MTRR hint helps us to track based on effective type
rather than what user requested. Again, here mtrr_lookup is only used as
hint and we fallback to WB mapping (as requested by user) as default.

In both cases, after using the mtrr hint, we still go through the
memtype list to make sure there are no inconsistencies among multiple
users.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Rufus & Azrael <rufus-azrael@numericable.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Kevin Winchester
511631011d x86: fix pointer type warning in arch/x86/mm/init_64.c:early_memtest
Changed the call to find_e820_area_size to pass u64 instead of unsigned long.

Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Hugh Dickins
2884f110d5 x86: fix bad pmd ffff810000207xxx(9090909090909090)
OGAWA Hirofumi and Fede have reported rare pmd_ERROR messages:
mm/memory.c:127: bad pmd ffff810000207xxx(9090909090909090).

Initialization's cleanup_highmap was leaving alignment filler
behind in the pmd for MODULES_VADDR: when vmalloc's guard page
would occupy a new page table, it's not allocated, and then
module unload's vfree hits the bad 9090 pmd entry left over.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-04 13:11:47 +02:00
Ingo Molnar
226e9a93a2 x86: ioremap fix failing nesting check
Mika Kukkonen noticed that the nesting check in early_iounmap() is not
actually done.

Reported-by: Mika Kukkonen <mikukkon@srv1-m700-lanp.koti>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: torvalds@linux-foundation.org
Cc: arjan@linux.intel.com
Cc: mikukkon@iki.fi
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:47 +02:00
Suresh Siddha
e8a496ac8c x86: fix broken math-emu with lazy allocation of fpu area
Fix the math emulation that got broken with the recent lazy allocation of FPU
area. init_fpu() need to be added for the math-emulation path aswell
for the FPU area allocation.

math emulation enabled kernel booted fine with this, in the presence
of "no387 nofxsr" boot param.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: hpa@zytor.com
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:46 +02:00
Steven Rostedt
5c1ea08215 x86: enable preemption in delay
The RT team has been searching for a nasty latency. This latency shows
up out of the blue and has been seen to be as big as 5ms!

Using ftrace I found the cause of the latency.

   pcscd-2995  3dNh1 52360300us : irq_exit (smp_apic_timer_interrupt)
   pcscd-2995  3dN.2 52360301us : idle_cpu (irq_exit)
   pcscd-2995  3dN.2 52360301us : rcu_irq_exit (irq_exit)
   pcscd-2995  3dN.1 52360771us : smp_apic_timer_interrupt (apic_timer_interrupt
)
   pcscd-2995  3dN.1 52360771us : exit_idle (smp_apic_timer_interrupt)

Here's an example of a 400 us latency. pcscd took a timer interrupt and
returned with "need resched" enabled, but did not reschedule until after
the next interrupt came in at 52360771us 400us later!

At first I thought we somehow missed a preemption check in entry.S. But
I also noticed that this always seemed to happen during a __delay call.

   pcscd-2995  3dN.2 52360836us : rcu_irq_exit (irq_exit)
   pcscd-2995  3.N.. 52361265us : preempt_schedule (__delay)

Looking at the x86 delay, I found my problem.

In git commit 35d5d08a08, Andrew Morton
placed preempt_disable around the entire delay due to TSC's not working
nicely on SMP.  Unfortunately for those that care about latencies this
is devastating! Especially when we have callers to mdelay(8).

Here I enable preemption during the loop and account for anytime the task
migrates to a new CPU. The delay asked for may be extended a bit by
the migration, but delay only guarantees that it will delay for that minimum
time. Delaying longer should not be an issue.

[
  Thanks to Thomas Gleixner for spotting that cpu wasn't updated,
    and to place the rep_nop between preempt_enabled/disable.
]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: akpm@osdl.org
Cc: Clark Williams <clark.williams@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
Cc: Gregory Haskins <ghaskins@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi-suse@firstfloor.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:46 +02:00
Ingo Molnar
deef325086 x86: disable preemption in native_smp_prepare_cpus
Priit Laes reported the following warning:

Call Trace:
 [<ffffffff8022f1e1>] warn_on_slowpath+0x51/0x63
 [<ffffffff80282e48>] sys_ioctl+0x2d/0x5d
 [<ffffffff805185ff>] _spin_lock+0xe/0x24
 [<ffffffff80227459>] task_rq_lock+0x3d/0x73
 [<ffffffff805133c3>] set_cpu_sibling_map+0x336/0x350
 [<ffffffff8021c1b8>] read_apic_id+0x30/0x62
 [<ffffffff806d921d>] verify_local_APIC+0x90/0x138
 [<ffffffff806d84b5>] native_smp_prepare_cpus+0x1f9/0x305
 [<ffffffff806ce7b1>] kernel_init+0x59/0x2d9
 [<ffffffff80518a26>] _spin_unlock_irq+0x11/0x2b
 [<ffffffff8020bf48>] child_rip+0xa/0x12
 [<ffffffff806ce758>] kernel_init+0x0/0x2d9
 [<ffffffff8020bf3e>] child_rip+0x0/0x12

fix this by generally disabling preemption in native_smp_prepare_cpus().

Reported-and-bisected-by: Priit Laes <plaes@plaes.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:46 +02:00
Yinghai Lu
fb3bbd6a66 x86: fix APIC warning on 32bit v2
for http://bugzilla.kernel.org/show_bug.cgi?id=10613

BIOS bug, APIC version is 0 for CPU#0! fixing up to 0x10. (tell your hw vendor)

v2: fix 64 bit compilation

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Gabriel C <nix.or.die@googlemail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-06-04 13:11:46 +02:00
Pavel Machek
f529626a86 suspend-vs-iommu: prevent suspend if we could not resume
iommu/gart support misses suspend/resume code, which can do bad stuff,
including memory corruption on resume.  Prevent system suspend in case we
would be unable to resume.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Tested-by: Patrick <ragamuffin@datacomm.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 13:02:48 +02:00
Miquel van Smoorenburg
db9f600b96 x86: pci-dma.c: use __GFP_NO_OOM instead of __GFP_NORETRY
On Wed, 2008-05-28 at 04:47 +0200, Andi Kleen wrote:
> > So...  why not just remove the setting of __GFP_NORETRY?  Why is it
> > wrong to oom-kill things in this case?
>
> When the 16MB zone overflows (which can be common in some workloads)
> calling the OOM killer is pretty useless because it has barely any
> real user data [only exception would be the "only 16MB" case Alan
> mentioned]. Killing random processes in this case is bad.
>
> I think for 16MB __GFP_NORETRY is ok because there should be
> nothing freeable in there so looping is useless. Only exception would be the
> "only 16MB total" case again but I'm not sure 2.6 supports that at all
> on x86.
>
> On the other hand d_a_c() does more allocations than just 16MB, especially
> on 64bit and the other zones need different strategies.

Okay, so how about this then ?

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 12:14:58 +02:00
Bertram Felgenhauer
75b19b790b pci, x86: add workaround for bug in ASUS A7V600 BIOS (rev 1005)
This BIOS claims the VIA 8237 south bridge to be compatible with VIA 586,
which it is not.

Without this patch, I get the following warning while booting,
among others,

| PCI: Using IRQ router VIA [1106/3227] at 0000:00:11.0
| ------------[ cut here ]------------
| WARNING: at arch/x86/pci/irq.c:265 pirq_via586_get+0x4a/0x60()
| Modules linked in:
| Pid: 1, comm: swapper Not tainted 2.6.26-rc4-00015-g1ec7d99 #1
|  [<c0119fd4>] warn_on_slowpath+0x54/0x70
|  [<c02246e0>] ? vt_console_print+0x210/0x2b0
|  [<c02244d0>] ? vt_console_print+0x0/0x2b0
|  [<c011a413>] ? __call_console_drivers+0x43/0x60
|  [<c011a482>] ? _call_console_drivers+0x52/0x80
|  [<c011aa89>] ? release_console_sem+0x1c9/0x200
|  [<c0291d21>] ? raw_pci_read+0x41/0x70
|  [<c0291e8f>] ? pci_read+0x2f/0x40
|  [<c029151a>] pirq_via586_get+0x4a/0x60
|  [<c02914d0>] ? pirq_via586_get+0x0/0x60
|  [<c029178d>] pcibios_lookup_irq+0x15d/0x430
|  [<c03b895a>] pcibios_irq_init+0x17a/0x3e0
|  [<c03a66f0>] ? kernel_init+0x0/0x250
|  [<c03a6763>] kernel_init+0x73/0x250
|  [<c03b87e0>] ? pcibios_irq_init+0x0/0x3e0
|  [<c0114d00>] ? schedule_tail+0x10/0x40
|  [<c0102dee>] ? ret_from_fork+0x6/0x1c
|  [<c03a66f0>] ? kernel_init+0x0/0x250
|  [<c03a66f0>] ? kernel_init+0x0/0x250
|  [<c010324b>] kernel_thread_helper+0x7/0x1c
|  =======================
| ---[ end trace 4eaa2a86a8e2da22 ]---

and IRQ trouble later,

| irq 10: nobody cared (try booting with the "irqpoll" option)

Now that's an VIA 8237 chip, so pirq_via586_get shouldn't be called
at all; adding this workaround to via_router_probe() fixes the
problem for me.

Amazingly I have a 2.6.23.8 kernel that somehow works fine ... I'll
never understand why.

Signed-off-by: Bertram Felgenhauer <int-e@gmx.de>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-02 11:29:10 +02:00
Rusty Russell
a16ffe93c4 lguest: fix ugly <NULL> in /proc/interrupts
Before:
	root@ubuntu:~# cat /proc/interrupts
	           CPU0
	  1:       1672    lguest-<NULL>    virtio0
	  2:          1    lguest-<NULL>    virtio1
	  ...
After:
	root@ubuntu:~# cat /proc/interrupts
	           CPU0
	  1:       2889    lguest-level     virtio0
	  2:          9    lguest-level     virtio1

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-30 15:09:43 +10:00
Sam Ravnborg
73531905ed Kconfig: introduce ARCH_DEFCONFIG to DEFCONFIG_LIST
init/Kconfig contains a list of configs that are searched
for if 'make *config' are used with no .config present.
Extend this list to look at the config identified by
ARCH_DEFCONFIG.

With this change we now try the defconfig targets last.

This fixes a regression reported
by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2008-05-25 23:03:18 +02:00
Linus Torvalds
eb90d81d03 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
  x86: prevent PGE flush from interruption/preemption
  x86: use explicit copy in vdso_gettimeofday()
  namespacecheck: automated fixes
  x86/xen: fix arbitrary_virt_to_machine()
  x86: don't read maxlvt before checking if APIC is mapped
  x86: disable TSC for sched_clock() when calibration failed
  x86: distangle user disabled TSC from unstable
  x86: fix setup of cyc2ns in tsc_64.c
2008-05-24 10:20:00 -07:00
Linus Torvalds
e6b027a398 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] clarify license of freq_table.c
  [CPUFREQ] Remove documentation of removed ondemand tunable.
  [CPUFREQ] Crusoe: longrun cpufreq module reports false min freq
  [CPUFREQ] powernow-k8: improve error messages
2008-05-23 09:24:52 -07:00
Harvey Harrison
7fafd91d85 x86: fix integer as NULL pointer warning
arch/x86/boot/printf.c:59:10: warning: Using plain integer as NULL pointer

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-23 08:11:06 -07:00
Andi Kleen
a1289643ad x86: use explicit copy in vdso_gettimeofday()
Jeremy's gcc 3.4 seems to be unable to inline a 8 byte memcpy.  But the
vdso doesn't support external references.  Copy the structure members
of struct timezone explicitely instead.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 14:08:06 +02:00
Ingo Molnar
2ddfd20e7c namespacecheck: automated fixes
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-23 14:08:06 +02:00
Jan Beulich
de067814d6 x86/xen: fix arbitrary_virt_to_machine()
While I realize that the function isn't currently being used, I still
think an obvious mistake like this should be corrected.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 14:08:06 +02:00
Chuck Ebbert
2584a82dee x86: don't read maxlvt before checking if APIC is mapped
A check for unmapped apic was added before reading maxlvt but the early
read of maxlvt wasn't removed.

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2008-05-23 14:08:06 +02:00
Thomas Gleixner
74dc51a3de x86: disable TSC for sched_clock() when calibration failed
When the TSC calibration fails then TSC is still used in
sched_clock(). Disable it completely in that case.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2008-05-23 14:08:06 +02:00
Thomas Gleixner
9ccc906c97 x86: distangle user disabled TSC from unstable
tsc_enabled is set to 0 from the command line switch "notsc" and from
the mark_tsc_unstable code. Seperate those functionalities and replace
tsc_enable with tsc_disable. This makes also the native_sched_clock()
decision when to use TSC understandable.

Preparatory patch to solve the sched_clock() issue on 32 bit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-23 14:08:06 +02:00
Thomas Gleixner
b6db80ee13 x86: fix setup of cyc2ns in tsc_64.c
When the TSC is calibrated against the PIT due to the nonavailability
of PMTIMER/HPET or due to SMI interference then the setup of the per
CPU cyc2ns variables is skipped. This is unlikely to happen but it
would definitely render sched_clock() unusable.

This was introduced with commit 53d517cdba

    x86: scale cyc_2_nsec according to CPU frequency

Update the per CPU cyc2ns variables in all exit pathes of tsc_calibrate.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
2008-05-23 14:08:06 +02:00
Tony Camuso
a167607255 PCI: Correct last two HP entries in the bfsort whitelist
Greetings.

There is a code flaw in the bfsort whitelist, where there are redundant
entries for the same two HP systems, DL385 G2 and DL585 G2. This patch
replaces those redundant entries with the correct ones. The correct
entries are for large-volume systems, the DL360 and DL380.

-----------------------------------------------------------------------

commit ec69f0374c3b0ad7ea991b0e9ac00377acfe5b1a
Author: Tony Camuso <tony.camuso@hp.com>
Date:   Wed May 14 07:09:28 2008 -0400

     Replace Redundant Whitelist Entries with the Correct Ones

     The ProLiant DL585 G2 and the DL585 G2 are entered reundantly
     in the dmi_system_id table. What should have been there are the
     DL360 and DL380. This patch simply replaces the redundant
     entries with the correct entries.

 arch/x86/pci/common.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

     Signed-off-by: Tony Camuso <tony.camuso@hp.com>
     Signed-off-by: Pat Schoeller <patrick.schoeller@hp.com>

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-22 18:16:24 +02:00
Linus Torvalds
737b0fbf44 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: correct mailing list address
  PCI: Correct last two HP entries in the bfsort whitelist
2008-05-20 10:55:04 -07:00
Linus Torvalds
e23a5f6687 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  [PATCH] return to old errno choice in mkdir() et.al.
  [Patch] fs/binfmt_elf.c: fix wrong return values
  [PATCH] get rid of leak in compat_execve()
  [Patch] fs/binfmt_elf.c: fix a wrong free
  [PATCH] avoid multiplication overflows and signedness issues for max_fds
  [PATCH] dup_fd() part 4 - race fix
  [PATCH] dup_fd() - part 3
  [PATCH] dup_fd() part 2
  [PATCH] dup_fd() fixes, part 1
  [PATCH] take init_files to fs/file.c
2008-05-19 16:37:45 -07:00
maximilian attems
667ad4f701 [CPUFREQ] Crusoe: longrun cpufreq module reports false min freq
The longrun cpufreq module reports a false minimum frequency 3MHz on
300-600MHz Crusoe processor.  This may be due to a calculation bug
in the module.

Original patch from Kaz Sasayama <kazssym@hypercore.co.jp>
submitted as http://bugs.debian.org/468149 patch ported to x86

Cc: Kaz Sasayama <kazssym@hypercore.co.jp>
Signed-off-by: maximilian attems <max@stro.at>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-05-19 18:17:28 -04:00
Mark Langsdorf
eba9fe93a2 [CPUFREQ] powernow-k8: improve error messages
The most common error with powernow-k8 is an ACPI _PSS error
caused either by failure to load the ACPI processor module
or a bad parse of the _PSS object.  Make the error message
returned to the user in these situations more straightforward
and easier to understand.

-Mark Langsdorf
Operating System Research Center
AMD

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-05-19 18:17:27 -04:00
Linus Torvalds
88d53766bd Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: LAPIC: ignore pending timers if LVTT is disabled
  KVM: Update MAINTAINERS for new mailing lists
  KVM: Fix kvm_vcpu_block() task state race
  KVM: ia64: Set KVM_IOAPIC_NUM_PINS to 48
  KVM: ia64: fix GVMM module including position-dependent objects
  KVM: ia64: Define new kvm_fpreg struture to replace ia64_fpreg
  KVM: PIT: take inject_pending into account when emulating hlt
  s390: KVM guest: fix compile error
  KVM: x86 emulator: fix writes to registers with modrm encodings
2008-05-19 13:53:21 -07:00
Tony Camuso
8d64c781f0 PCI: Correct last two HP entries in the bfsort whitelist
Replace Redundant Whitelist Entries with the Correct Ones

The ProLiant DL585 G2 and the DL585 G2 are entered reundantly in the
dmi_system_id table. What should have been there are the DL360 and DL380. This
patch simply replaces the redundant entries with the correct entries.

Signed-off-by: Tony Camuso <tony.camuso@hp.com>
Signed-off-by: Pat Schoeller <patrick.schoeller@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-05-19 12:21:36 -07:00
Marcelo Tosatti
54aaacee35 KVM: LAPIC: ignore pending timers if LVTT is disabled
Only use the APIC pending timers count to break out of HLT emulation if
the timer vector is enabled.

Certain configurations of Windows simply mask out the vector without
disabling the timer.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-18 14:39:39 +03:00
Marcelo Tosatti
eedaa4e2af KVM: PIT: take inject_pending into account when emulating hlt
Otherwise hlt emulation fails if PIT is not injecting IRQ's.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-18 14:34:15 +03:00
Avi Kivity
107d6d2efa KVM: x86 emulator: fix writes to registers with modrm encodings
A register destination encoded with a mod=3 encoding left dst.ptr NULL.
Normally we don't trap writes to registers, but in the case of smsw, we do.

Fix by pointing dst.ptr at the destination register.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-18 14:34:14 +03:00
Thomas Gleixner
e9623b3559 x86: disable mwait for AMD family 10H/11H CPUs
The previous revert of 0c07ee38c9 left
out the mwait disable condition for AMD family 10H/11H CPUs.

Andreas Herrman said:

It depends on the CPU. For AMD CPUs that support MWAIT this is wrong.
Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings then
depend on a clock divisor and current Pstate of the core.

If all cores of a processor are in halt state (C1) the processor can
enter the C1E (C1 enhanced) state. If mwait is used this will never
happen.

Thus HLT saves more power than MWAIT here.

It might be best to switch off the mwait flag for these AMD CPU
families like it was introduced with commit
f039b75471 (x86: Don't use MWAIT on AMD
Family 10)

Re-add the AMD families 10H/11H check and disable the mwait usage for
those.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-17 22:57:20 +02:00
Avi Kivity
31f4d870b0 x86: fix crash on cpu hotplug on pat-incapable machines
pat_disable() is __init, which means it goes away after booting is complete.
Unfortunately it is used by the hotplug code if the machine is not
pat-capable, causing a crash.

Fix by marking pat_disable() as __cpuinit.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-17 22:57:20 +02:00
Ingo Molnar
a738d897b7 x86: remove mwait capability C-state check
Vegard Nossum reports:

| powertop shows between 200-400 wakeups/second with the description
| "<kernel IPI>: Rescheduling interrupts" when all processors have load (e.g.
| I need to run two busy-loops on my 2-CPU system for this to show up).
|
| The bisect resulted in this commit:
|
| commit 0c07ee38c9
| Date:   Wed Jan 30 13:33:16 2008 +0100
|
|     x86: use the correct cpuid method to detect MWAIT support for C states

remove the functional effects of this patch and make mwait unconditional.

A future patch will turn off mwait on specific CPUs where that causes
power to be wasted.

Bisected-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-17 22:57:20 +02:00
Al Viro
f52111b154 [PATCH] take init_files to fs/file.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-05-16 17:22:20 -04:00
Linus Torvalds
4ef7e3e90f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: user_regset_view table fix for ia32 on 64-bit
  x86: arch/x86/mm/pat.c - fix warning
  x86: fix csum_partial() export
  x86: early_init_centaur(): use set_cpu_cap()
  x86: fix app crashes after SMP resume
  x86: wakeup.lds.S - section ordering fix
  x86: [VOYAGER] fix duplicate phys_cpu_present_map symbol
  x86/pci: fix broken ISA DMA
2008-05-13 12:33:56 -07:00
Roland McGrath
1f465f4e47 x86: user_regset_view table fix for ia32 on 64-bit
The user_regset_view table for the 32-bit regsets on the 64-bit build had
the wrong sizes for the FP regsets.  This bug had no user-visible effect
(just on kernel modules using the user_regset interfaces and the like).
But the fix is trivial and risk-free.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:40:20 +02:00
Pranith Kumar
afc8534380 x86: arch/x86/mm/pat.c - fix warning
fix this warning:

 arch/x86/mm/pat.c: In function `phys_mem_access_prot_allowed':
 arch/x86/mm/pat.c:558: warning: long long unsigned int format, long
 unsigned int arg (arg 6)
 arch/x86/mm/pat.c: In function `map_devmem':
 arch/x86/mm/pat.c:580: warning: long long unsigned int format, long
 unsigned int arg (arg 6)

Signed-off-by: D Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:39:30 +02:00
Ingo Molnar
89804c022f x86: fix csum_partial() export
Fix this symbol export problem:

    Building modules, stage 2.
    MODPOST 193 modules
    ERROR: "csum_partial" [fs/reiserfs/reiserfs.ko] undefined!
    make[1]: *** [__modpost] Error 1
    make: *** [modules] Error 2

This is due to a known weakness of symbol exports: if a symbol's
only in-core user is an EXPORT_SYMBOL from a lib-y section, the
symbol is not linked in.

The solution is to move the export to x8664_ksyms_64.c - but the real
solution would be to fix kbuild.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:38:47 +02:00
Andrew Morton
8c45a4e4f2 x86: early_init_centaur(): use set_cpu_cap()
arch/x86/kernel/setup_64.c:954: warning: passing argument 2 of 'set_bit' from incompatible pointer type

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:37:38 +02:00
Hugh Dickins
61165d7a03 x86: fix app crashes after SMP resume
After resume on a 2cpu laptop, kernel builds collapse with a sed hang,
sh or make segfault (often on 20295564), real-time signal to cc1 etc.

Several hurdles to jump, but a manually-assisted bisect led to -rc1's
d2bcbad5f3 x86: do not zap_low_mappings
in __smp_prepare_cpus.  Though the low mappings were removed at bootup,
they were left behind (with Global flags helping to keep them in TLB)
after resume or cpu online, causing the crashes seen.

Reinstate zap_low_mappings (with local __flush_tlb_all) for each cpu_up
on x86_32.  This used to be serialized by smp_commenced_mask: that's now
gone, but a low_mappings flag will do.  No need for native_smp_cpus_done
to repeat the zap: let mem_init zap BSP's low mappings just like on UP.

(In passing, fix error code from native_cpu_up: do_boot_cpu returns a
variety of diagnostic values, Dprintk what it says but convert to -EIO.
And save_pg_dir separately before zap_low_mappings: doesn't matter now,
but zapping twice in succession wiped out resume's swsusp_pg_dir.)

That worked well on the duo and one quad, but wouldn't boot 3rd or 4th
cpu on P4 Xeon, oopsing just after unlock_ipi_call_lock.  The TLB flush
IPI now being sent reveals a long-standing bug: the booting cpu has its
APIC readied in smp_callin at the top of start_secondary, but isn't put
into the cpu_online_map until just before that unlock_ipi_call_lock.

So native_smp_call_function_mask to online cpus would send_IPI_allbutself,
including the cpu just coming up, though it has been excluded from the
count to wait for: by the time it handles the IPI, the call data on
native_smp_call_function_mask's stack may well have been overwritten.

So fall back to send_IPI_mask while cpu_online_map does not match
cpu_callout_map: perhaps there's a better APICological fix to be
made at the start_secondary end, but I wouldn't know that.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-13 19:36:12 +02:00
Venki Pallipadi
77db988564 x86/PCI: X86_PAT & mprotect
Some versions of X used the mprotect workaround to change caching type from UC
to WB, so that it can then use mtrr to program WC for that region [1].  Change
the mmap of pci space through /sys or /proc interfaces from UC to UC_MINUS.
With this change, X will not need to use mprotect workaround to get WC type
since the MTRR mapping type will be honored.

The bug in mprotect that clobbers PAT bits is fixed in a follow on patch. So,
this X workaround will stop working as well.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-05-13 09:51:54 -07:00
Takashi Iwai
4a367f3a9d x86/PCI: fix broken ISA DMA
Rene Herman reported:

> commit 8779f2fc3b
>
> "x86: don't try to allocate from DMA zone at first"
>
> breaks all of ISA DMA. Or all of ALSA ISA DMA at least. All
> ISA soundcards are silent following that commit -- no error
> messages, everything appears fine, just silence.

That patch is buggy. We had an implicit assumption that
dev = NULL for ISA devices that require 24bit DMA.

The recent work on x86 dma_alloc_coherent() breaks the ISA DMA buffer
allocation, which is represented by "dev = NULL" and requires 24bit
DMA implicitly.

Bisected-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-05-13 09:51:53 -07:00
Cyrill Gorcunov
8c6b0ef2ea x86: wakeup.lds.S - section ordering fix
To allow linker to catch sections overlapping we have to declare
them in appropriate order.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:27:51 +02:00
James Bottomley
f8955ebe3e x86: [VOYAGER] fix duplicate phys_cpu_present_map symbol
The phys_cpu_present_map is an expected symbol in the SMP harness.
Unfortunately, x86 recently moved this and a few others to
kernel/setup.c where it doesn't quite work because voyager has to
define its own.  Use CONFIG_X86_LOCAL_APIC to isolate these
definitions and fix up another area in setup.c where CONFIG_X86_SMP
should be used instead of CONFIG_SMP.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: toralf.foerster@gmx.de
Cc: Mike Travis <travis@sgi.com>
Cc: Alexey Starikovskiy <astarikovskiy@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 21:27:51 +02:00
Takashi Iwai
8965eb1938 x86/pci: fix broken ISA DMA
Rene Herman reported:

> commit 8779f2fc3b
>
> "x86: don't try to allocate from DMA zone at first"
>
> breaks all of ISA DMA. Or all of ALSA ISA DMA at least. All
> ISA soundcards are silent following that commit -- no error
> messages, everything appears fine, just silence.

That patch is buggy. We had an implicit assumption that
dev = NULL for ISA devices that require 24bit DMA.

The recent work on x86 dma_alloc_coherent() breaks the ISA DMA buffer
allocation, which is represented by "dev = NULL" and requires 24bit
DMA implicitly.

Bisected-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Rene Herman <rene.herman@keyaccess.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-12 21:27:50 +02:00
Linus Torvalds
3e1b83ab39 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
  x86: rdc: leds build/config fix
  x86: sysfs cpu?/topology is empty in 2.6.25 (32-bit Intel system)
  x86: revert commit 709f744 ("x86: bitops asm constraint fixes")
  x86: restrict keyboard io ports reservation to make ipmi driver work
  x86: fix fpu restore from sig return
  x86: remove spew print out about bus to node mapping
  x86: revert printk format warning change which is for linux-next
  x86: cleanup PAT cpu validation
  x86: geode: define geode_has_vsa2() even if CONFIG_MGEODE_LX is not set
  x86: GEODE: cache results from geode_has_vsa2() and uninline
  x86: revert geode config dependency
2008-05-10 21:10:48 -07:00
Ingo Molnar
82fd866701 x86: rdc: leds build/config fix
select NEW_LEDS for now until the Kconfig dependencies have been
fixed.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-10 19:31:45 +02:00
Helge Wagner
9096bd7a66 x86: restrict keyboard io ports reservation to make ipmi driver work
On some of our (single board computer) boards (x86) we are using an
IPMI controller that uses I/O ports 0x62 and 0x66 for a KCS (keyboard
controller style) IPMI system interface.

Trying to load the openipmi driver fails, because the ports
(0x62/0x66) are reserved for keyboard. keyboard reserves the full
range 0x60-0x6F while it doesn't need to.

Reserve only ports 0x60 and 0x64 for the legacy PS/2 i8042 keyboad
controller instead of 0x60-0x6F to allow the openipmi driver to work.

[ tglx: added 64bit fixup ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-10 19:31:45 +02:00
Suresh Siddha
fd3c3ed5d1 x86: fix fpu restore from sig return
If the task never used fpu, initialize the fpu before restoring the FP
state from the signal handler context. This will allocate the fpu
state, if the task never needed it before.

Reported-and-bisected-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Cc: Frederik Deweerdt <deweerdt@free.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-10 19:31:45 +02:00
Yinghai Lu
0646153921 x86: remove spew print out about bus to node mapping
Jeff Garzik pointed out that this printout is not needed.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-10 19:31:45 +02:00
Thomas Gleixner
5ecddcebfb x86: revert printk format warning change which is for linux-next
commit 62179849b4
    x86: fix setup printk format warning

is for linux-next and not for .26

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-10 19:31:44 +02:00
Linus Torvalds
8d53910856 Revert "PCI: remove default PCI expansion ROM memory allocation"
This reverts commit 9f8daccaa0, which was
reported to break X startup (xf86-video-ati-6.8.0). See

	http://bugs.freedesktop.org/show_bug.cgi?id=15523

for details.

Reported-by: Laurence Withers <l@lwithers.me.uk>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-08 19:02:55 -07:00
Linus Torvalds
f589274533 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  [ALSA] soc at91 minor bug fixes
  [ALSA] soc - at91-pcm - Fix line wrapping
  pcspkr: fix dependancies
2008-05-08 10:58:45 -07:00
Thomas Gleixner
8d4a430085 x86: cleanup PAT cpu validation
Move the scattered checks for PAT support to a single function. Its
moved to addon_cpuid_features.c as this file is shared between 32 and
64 bit.

Remove the manipulation of the PAT feature bit and just disable PAT in
the PAT layer, based on the PAT bit provided by the CPU and the
current CPU version/model white list.

Change the boot CPU check so it works on Voyager somewhere in the
future as well :) Also panic, when a secondary has PAT disabled but
the primary one has alrady switched to PAT. We have no way to undo
that.

The white list is kept for now to ensure that we can rely on known to
work CPU types and concentrate on the software induced problems
instead of fighthing CPU erratas and subtle wreckage caused by not yet
verified CPUs. Once the PAT code has stabilized enough, we can remove
the white list and open the can of worms.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-08 15:43:51 +02:00
Andres Salomon
547acec7ec x86: GEODE: cache results from geode_has_vsa2() and uninline
This moves geode_has_vsa2 into a .c file, caches the result we get from
the VSA virtual registers, and causes the function to no longer be inline.

[akpm@linux-foundation.org: cleanup]

Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-08 15:43:50 +02:00
Thomas Gleixner
ac44cc96fb x86: revert geode config dependency
commit e26a28d190
    x86: olpc build fix

was a fix to a patch that was withdrawn/delayed and then erroneously
commited to x86.git. Revert it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-08 15:43:50 +02:00
Stas Sergeev
e5e1d3cb20 pcspkr: fix dependancies
fix pcspkr dependancies: make the pcspkr platform
drivers to depend on a platform device, and
not the other way around.

Signed-off-by: Stas Sergeev <stsp@aknet.ru>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
CC: Vojtech Pavlik <vojtech@suse.cz>
CC: Michael Opdenacker <michael-lists@free-electrons.com>
[fixed for 2.6.26-rc1 by tiwai]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2008-05-07 12:42:03 +02:00
Hugh Dickins
aeed5fce37 x86: fix PAE pmd_bad bootup warning
Fix warning from pmd_bad() at bootup on a HIGHMEM64G HIGHPTE x86_32.

That came from 9fc34113f6 x86: debug pmd_bad();
but we understand now that the typecasting was wrong for PAE in the previous
version: pagetable pages above 4GB looked bad and stopped Arjan from booting.

And revert that cded932b75 x86: fix pmd_bad
and pud_bad to support huge pages.  It was the wrong way round: we shouldn't
weaken every pmd_bad and pud_bad check to let huge pages slip through - in
part they check that we _don't_ have a huge page where it's not expected.

Put the x86 pmd_bad() and pud_bad() definitions back to what they have long
been: they can be improved (x86_32 should use PTE_MASK, to stop PAE thinking
junk in the upper word is good; and x86_64 should follow x86_32's stricter
comparison, to stop thinking any subset of required bits is good); but that
should be a later patch.

Fix Hans' good observation that follow_page() will never find pmd_huge()
because that would have already failed the pmd_bad test: test pmd_huge in
between the pmd_none and pmd_bad tests.  Tighten x86's pmd_huge() check?
No, once it's a hugepage entry, it can get quite far from a good pmd: for
example, PROT_NONE leaves it with only ACCESSED of the KERN_PGTABLE bits.

However... though follow_page() contains this and another test for huge
pages, so it's nice to keep it working on them, where does it actually get
called on a huge page?  get_user_pages() checks is_vm_hugetlb_page(vma) to
to call alternative hugetlb processing, as does unmap_vmas() and others.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Earlier-version-tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Chua <jeff.chua.linux@gmail.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-06 13:08:58 -07:00
Linus Torvalds
bb896afe20 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes:
  sched: default to n for GROUP_SCHED and FAIR_GROUP_SCHED
  sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
  sched, x86: add HAVE_UNSTABLE_SCHED_CLOCK
  sched: fix cpu clock
  sched: fair-group: fix a Div0 error of the fair group scheduler
  sched: fix missing locking in sched_domains code
  sched: make clock sync tunable by architecture code
  sched: fix debugging
  sched: fix sched_info_switch not being called according to documentation
  sched: fix hrtick_start_fair and CPU-Hotplug
  sched: fix SCHED_FAIR wake-idle logic error
  sched: fix RT task-wakeup logic
  sched: add statics, don't return void expressions
  sched: add debug checks to idle functions
  sched: remove old sched doc
  sched: make rt_sched_class, idle_sched_class static
  sched: optimize calc_delta_mine()
  sched: fix normalized sleeper
2008-05-05 17:31:14 -07:00
Ingo Molnar
a5574cf65b sched, x86: add HAVE_UNSTABLE_SCHED_CLOCK
add the HAVE_UNSTABLE_SCHED_CLOCK, for architectures to select.

the next change utilizes it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05 23:56:18 +02:00
Linus Torvalds
108c196184 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  x86 PCI: call dmi_check_pciprobe()
  x86/pci: add pci=skip_isa_align command lines.
  x86/pci: remove flag in pci_cfg_space_size_ext
  x86: fix section mismatch in pci_scan_bus
2008-05-05 12:39:10 -07:00
Yinghai Lu
0df18ff366 x86 PCI: call dmi_check_pciprobe()
this change:

| commit 08f1c192c3
| Author: Muli Ben-Yehuda <muli@il.ibm.com>
| Date:   Sun Jul 22 00:23:39 2007 +0300
|
|    x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata
|
|    This patch introduces struct pci_sysdata to x86 and x86-64, and
|    converts the existing two users (NUMA, Calgary) to use it.
|
|    This lays the groundwork for having other users of sysdata, such as
|    the PCI domains work.
|
|    The Calgary bits are tested, the NUMA bits just look ok.

replaces pcibios_scan_root with pci_scan_bus_parented...

but in pcibios_scan_root we have a DMI check:

    dmi_check_system(pciprobe_dmi_table);

when when have several peer root buses this could be called multiple
times (which is bad), so move that call to pci_access_init().

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-05-05 09:24:00 -07:00
Yinghai Lu
13a6ddb08e x86/pci: add pci=skip_isa_align command lines.
so we don't align the io port start address for pci cards.

also move out dmi check out acpi.c, because it has nothing to do with acpi.
it could spare some calling when we have several peer root buses.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-05-05 09:22:08 -07:00
Linus Torvalds
45ea2103d8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-fixes:
  x86: fix setup printk format warning
  x86: olpc build fix
  x86: video/fbdev.c: add MODULE_LICENSE
  x86: fix up bootparam.h for userspace inclusion
  x86: relocs ELF handling - use SELFMAG instead of numeric constant
  x86: vdso ELF handling - use SELFMAG instead of numeric constant
  x86: remove dell reboot dmi quirk board name match
  x86: es7000 build fix
  x86: make additional_cpus static
  x86: make start_secondary() static
  kbuild, suspend, x86: fix rebuild of wakeup.bin
  uml: fix gcc problem
  x86: undo visws/numaq build changes
2008-05-04 17:11:43 -07:00