* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
m68k: rename global variable vmalloc_end to m68k_vmalloc_end
percpu: add missing per_cpu_ptr_to_phys() definition for UP
percpu: Fix kdump failure if booted with percpu_alloc=page
percpu: make misc percpu symbols unique
percpu: make percpu symbols in ia64 unique
percpu: make percpu symbols in powerpc unique
percpu: make percpu symbols in x86 unique
percpu: make percpu symbols in xen unique
percpu: make percpu symbols in cpufreq unique
percpu: make percpu symbols in oprofile unique
percpu: make percpu symbols in tracer unique
percpu: make percpu symbols under kernel/ and mm/ unique
percpu: remove some sparse warnings
percpu: make alloc_percpu() handle array types
vmalloc: fix use of non-existent percpu variable in put_cpu_var()
this_cpu: Use this_cpu_xx in trace_functions_graph.c
this_cpu: Use this_cpu_xx for ftrace
this_cpu: Use this_cpu_xx in nmi handling
this_cpu: Use this_cpu operations in RCU
this_cpu: Use this_cpu ops for VM statistics
...
Fix up trivial (famous last words) global per-cpu naming conflicts in
arch/x86/kvm/svm.c
mm/slab.c
Interrupt vector 0xec has been doubly defined in irq_vectors.h
It seems arbitrary whether LOCAL_PENDING_VECTOR or
UV_BAU_MESSAGE is the higher number. As long as they are
unique. If they are not unique we'll hit a BUG in
alloc_system_vector().
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <E1NJ9Pe-0004P7-0Q@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The simplest method was to add an extra asm-offsets.h
file in arch/$ARCH/include/asm that references the generated file.
We can now migrate the architectures one-by-one to reference
the generated file direct - and when done we can delete the
temporary arch/$ARCH/include/asm/asm-offsets.h file.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michal Marek <mmarek@suse.cz>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/amd-iommu: Fix PCI hotplug with passthrough mode
x86/amd-iommu: Fix passthrough mode
x86: mmio-mod.c: Use pr_fmt
x86: kmmio.c: Add and use pr_fmt(fmt)
x86: i8254.c: Add pr_fmt(fmt)
x86: setup_percpu.c: Use pr_<level> and add pr_fmt(fmt)
x86: es7000_32.c: Use pr_<level> and add pr_fmt(fmt)
x86: Print DMI_BOARD_NAME as well as DMI_PRODUCT_NAME from __show_regs()
x86: Factor duplicated code out of __show_regs() into show_regs_common()
arch/x86/kernel/microcode*: Use pr_fmt() and remove duplicated KERN_ERR prefix
x86, mce: fix confusion between bank attributes and mce attributes
x86/mce: Set up timer unconditionally
x86: Fix bogus warning in apic_noop.apic_write()
x86: Fix typo in arch/x86/mm/kmmio.c
x86: ASUS P4S800 reboot=bios quirk
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/mmap:
Add missing alignment check in arch/score sys_mmap()
fix broken aliasing checks for MAP_FIXED on sparc32, mips, arm and sh
Get rid of open-coding in ia64_brk()
sparc_brk() is not needed anymore
switch do_brk() to get_unmapped_area()
Take arch_mmap_check() into get_unmapped_area()
fix a struct file leak in do_mmap_pgoff()
Unify sys_mmap*
Cut hugetlb case early for 32bit on ia64
arch_mmap_check() on mn10300
Kill ancient crap in s390 compat mmap
arm: add arch_mmap_check(), get rid of sys_arm_mremap()
file ->get_unmapped_area() shouldn't duplicate work of get_unmapped_area()
kill useless checks in sparc mremap variants
fix pgoff in "have to relocate" case of mremap()
fix the arch checks in MREMAP_FIXED case
fix checks for expand-in-place mremap
do_mremap() untangling, part 3
do_mremap() untangling, part 2
untangling do_mremap(), part 1
The current rd/wrmsr_on_cpus helpers assume that the supplied
cpumasks are contiguous. However, there are machines out there
like some K8 multinode Opterons which have a non-contiguous core
enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see
http://www.gossamer-threads.com/lists/linux/kernel/1160268.
This patch fixes out-of-bounds writes (see URL above) by adding per-CPU
msr structs which are used on the respective cores.
Additionally, two helpers, msrs_{alloc,free}, are provided for use by
the callers of the MSR accessors.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091211171440.GD31998@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
New helper - sys_mmap_pgoff(); switch syscalls to using it.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Jens found the following crash/regression:
[ 0.000000] found SMP MP-table at [ffff8800000fdd80] fdd80
[ 0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 0-fff BIOS data page
and
[ 0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 6000-7fff TRAMPOLINE
and bisected it to b24c2a9 ("x86: Move find_smp_config()
earlier and avoid bootmem usage").
It turns out the BIOS is using the first 64k for mptable,
without reserving it.
So try to find good range for the real-mode trampoline instead of
hard coding it, in case some bios tries to use that range for sth.
Reported-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <4B21630A.6000308@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The device change notifier is initialized in the dma_ops
initialization path. But this path is never executed for
iommu=pt. Move the notifier initialization to IOMMU hardware
init code to fix this.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The data structure changes to use dev->archdata.iommu field
broke the iommu=pt mode because in this case the
dev->archdata.iommu was left uninitialized. This moves the
inititalization of the devices into the main init function
and fixes the problem.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Change 32-bit sys_clone to new PTREGSCALL stub, and merge with 64-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-7-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Convert these to new PTREGSCALL stubs.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-6-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Change 32-bit sys_sigaltstack to PTREGSCALL2, and merge with 64-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-5-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Change 32-bit sys_execve to PTREGSCALL3, and merge with 64-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Change 32-bit sys_iopl to PTREGSCALL1, and merge with 64-bit.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Unify x86_32 and x86_64 implementations of __show_regs() header,
standardizing on the x86_64 format string in the process. Also,
32-bit will now call print_modules.
Signed-off-by: Andy Isaacson <adi@hexapodia.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Robert Hancock <hancockrwd@gmail.com>
Cc: Richard Zidlicky <rz@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091208082942.GA27174@hexapodia.org>
[ v2: resolved conflict ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: hpet: Make WARN_ON understandable
x86: arch specific support for remapping HPET MSIs
intr-remap: generic support for remapping HPET MSIs
x86, hpet: Simplify the HPET code
x86, hpet: Disable per-cpu hpet timer if ARAT is supported
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: core - Prevent too-small buffer sizes
hwrng: virtio-rng - Convert to new API
hwrng: core - Replace u32 in driver API with byte array
crypto: ansi_cprng - Move FIPS functions under CONFIG_CRYPTO_FIPS
crypto: testmgr - Add ghash algorithm test before provide to users
crypto: ghash-clmulni-intel - Put proper .data section in place
crypto: ghash-clmulni-intel - Use gas macro for PCLMULQDQ-NI and PSHUFB
crypto: aesni-intel - Use gas macro for AES-NI instructions
x86: Generate .byte code for some new instructions via gas macro
crypto: ghash-intel - Fix irq_fpu_usable usage
crypto: ghash-intel - Add PSHUFB macros
crypto: ghash-intel - Hard-code pshufb
crypto: ghash-intel - Fix building failure on x86_32
crypto: testmgr - Fix warning
crypto: ansi_cprng - Fix test in get_prng_bytes
crypto: hash - Remove cra_u.{digest,hash}
crypto: api - Remove digest case from procfs show handler
crypto: hash - Remove legacy hash/digest code
crypto: ansi_cprng - Add FIPS wrapper
crypto: ghash - Add PCLMULQDQ accelerated implementation
* 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64: merge the standard and compat start_thread() functions
x86-64: make compat_start_thread() match start_thread()
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
x86, mm: Correct the implementation of is_untracked_pat_range()
x86/pat: Trivial: don't create debugfs for memtype if pat is disabled
x86, mtrr: Fix sorting of mtrr after subtracting
x86: Move find_smp_config() earlier and avoid bootmem usage
x86, platform: Change is_untracked_pat_range() to bool; cleanup init
x86: Change is_ISA_range() into an inline function
x86, mm: is_untracked_pat_range() takes a normal semiclosed range
x86, mm: Call is_untracked_pat_range() rather than is_ISA_range()
x86: UV SGI: Don't track GRU space in PAT
x86: SGI UV: Fix BAU initialization
x86, numa: Use near(er) online node instead of roundrobin for NUMA
x86, numa, bootmem: Only free bootmem on NUMA failure path
x86: Change crash kernel to reserve via reserve_early()
x86: Eliminate redundant/contradicting cache line size config options
x86: When cleaning MTRRs, do not fold WP into UC
x86: remove "extern" from function prototypes in <asm/proto.h>
x86, mm: Report state of NX protections during boot
x86, mm: Clean up and simplify NX enablement
x86, pageattr: Make set_memory_(x|nx) aware of NX support
x86, sleep: Always save the value of EFER
...
Fix up conflicts (added both iommu_shutdown and is_untracked_pat_range)
to 'struct x86_platform_ops') in
arch/x86/include/asm/x86_init.h
arch/x86/kernel/x86_init.c
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: ucode-amd: Move family check to microcde_amd.c's init function
x86, ucode-amd: Ensure ucode update on suspend/resume after CPU off/online cycle
x86: ucode-amd: Convert printk(KERN_*...) to pr_*(...)
x86: ucode-amd: Don't warn when no ucode is available for a CPU revision
x86: ucode-amd: Load ucode-patches once and not separately of each CPU
x86, amd-ucode: Remove needless log messages
* 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block: (113 commits)
cfq-iosched: Do not access cfqq after freeing it
block: include linux/err.h to use ERR_PTR
cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit
blkio: Allow CFQ group IO scheduling even when CFQ is a module
blkio: Implement dynamic io controlling policy registration
blkio: Export some symbols from blkio as its user CFQ can be a module
block: Fix io_context leak after failure of clone with CLONE_IO
block: Fix io_context leak after clone with CLONE_IO
cfq-iosched: make nonrot check logic consistent
io controller: quick fix for blk-cgroup and modular CFQ
cfq-iosched: move IO controller declerations to a header file
cfq-iosched: fix compile problem with !CONFIG_CGROUP
blkio: Documentation
blkio: Wait on sync-noidle queue even if rq_noidle = 1
blkio: Implement group_isolation tunable
blkio: Determine async workload length based on total number of queues
blkio: Wait for cfq queue to get backlogged if group is empty
blkio: Propagate cgroup weight updation to cfq groups
blkio: Drop the reference to queue once the task changes cgroup
blkio: Provide some isolation between groups
...
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (84 commits)
KVM: VMX: Fix comparison of guest efer with stale host value
KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c
KVM: Drop user return notifier when disabling virtualization on a cpu
KVM: VMX: Disable unrestricted guest when EPT disabled
KVM: x86 emulator: limit instructions to 15 bytes
KVM: s390: Make psw available on all exits, not just a subset
KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
KVM: VMX: Report unexpected simultaneous exceptions as internal errors
KVM: Allow internal errors reported to userspace to carry extra data
KVM: Reorder IOCTLs in main kvm.h
KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
KVM: only clear irq_source_id if irqchip is present
KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
KVM: VMX: Remove vmx->msr_offset_efer
KVM: MMU: update invlpg handler comment
KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
KVM: remove duplicated task_switch check
KVM: powerpc: Fix BUILD_BUG_ON condition
KVM: VMX: Use shared msr infrastructure
...
Trivial conflicts due to new Kconfig options in arch/Kconfig and kernel/Makefile
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
mac80211: fix reorder buffer release
iwmc3200wifi: Enable wimax core through module parameter
iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
iwmc3200wifi: Coex table command does not expect a response
iwmc3200wifi: Update wiwi priority table
iwlwifi: driver version track kernel version
iwlwifi: indicate uCode type when fail dump error/event log
iwl3945: remove duplicated event logging code
b43: fix two warnings
ipw2100: fix rebooting hang with driver loaded
cfg80211: indent regulatory messages with spaces
iwmc3200wifi: fix NULL pointer dereference in pmkid update
mac80211: Fix TX status reporting for injected data frames
ath9k: enable 2GHz band only if the device supports it
airo: Fix integer overflow warning
rt2x00: Fix padding bug on L2PAD devices.
WE: Fix set events not propagated
b43legacy: avoid PPC fault during resume
b43: avoid PPC fault during resume
tcp: fix a timewait refcnt race
...
Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
kernel/sysctl_check.c
net/ipv4/sysctl_net_ipv4.c
net/ipv6/addrconf.c
net/sctp/sysctl.c
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
security/tomoyo: Remove now unnecessary handling of security_sysctl.
security/tomoyo: Add a special case to handle accesses through the internal proc mount.
sysctl: Drop & in front of every proc_handler.
sysctl: Remove CTL_NONE and CTL_UNNUMBERED
sysctl: kill dead ctl_handler definitions.
sysctl: Remove the last of the generic binary sysctl support
sysctl net: Remove unused binary sysctl code
sysctl security/tomoyo: Don't look at ctl_name
sysctl arm: Remove binary sysctl support
sysctl x86: Remove dead binary sysctl support
sysctl sh: Remove dead binary sysctl support
sysctl powerpc: Remove dead binary sysctl support
sysctl ia64: Remove dead binary sysctl support
sysctl s390: Remove dead sysctl binary support
sysctl frv: Remove dead binary sysctl support
sysctl mips/lasat: Remove dead binary sysctl support
sysctl drivers: Remove dead binary sysctl support
sysctl crypto: Remove dead binary sysctl support
sysctl security/keys: Remove dead binary sysctl support
sysctl kernel: Remove binary sysctl logic
...
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, msr, cpumask: Use struct cpumask rather than the deprecated cpumask_t
x86, cpuid: Simplify the code in cpuid_open
x86, cpuid: Remove the bkl from cpuid_open()
x86, msr: Remove the bkl from msr_open()
x86: AMD Geode LX optimizations
x86, msr: Unify rdmsr_on_cpus/wrmsr_on_cpus
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
include/linux/compiler-gcc4.h: Fix build bug - gcc-4.0.2 doesn't understand __builtin_object_size
x86/alternatives: No need for alternatives-asm.h to re-invent stuff already in asm.h
x86/alternatives: Check replacementlen <= instrlen at build time
x86, 64-bit: Set data segments to null after switching to 64-bit mode
x86: Clean up the loadsegment() macro
x86: Optimize loadsegment()
x86: Add missing might_fault() checks to copy_{to,from}_user()
x86-64: __copy_from_user_inatomic() adjustments
x86: Remove unused thread_return label from switch_to()
x86, 64-bit: Fix bstep_iret jump
x86: Don't use the strict copy checks when branch profiling is in use
x86, 64-bit: Move K8 B step iret fixup to fault entry asm
x86: Generate cmpxchg build failures
x86: Add a Kconfig option to turn the copy_from_user warnings into errors
x86: Turn the copy_from_user check into an (optional) compile time warning
x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
x86, apic: Enable lapic nmi watchdog on AMD Family 11h
x86: Remove unnecessary mdelay() from cpu_disable_common()
x86, ioapic: Document another case when level irq is seen as an edge
x86, ioapic: Fix the EOI register detection mechanism
x86, io-apic: Move the effort of clearing remoteIRR explicitly before migrating the irq
x86: SGI UV: Map low MMR ranges
x86: apic: Print out SRAT table APIC id in hex
x86: Re-get cfg_new in case reuse/move irq_desc
x86: apic: Remove not needed #ifdef
x86: io-apic: IO-APIC MMIO should not fail on resource insertion
x86: Remove asm/apicnum.h
x86: apic: Do not use stacked physid_mask_t
x86, apic: Get rid of apicid_to_cpu_present assign on 64-bit
x86, ioapic: Use snrpintf while set names for IO-APIC resourses
x86, apic: Use PAGE_SIZE instead of numbers
x86: Remove local_irq_enable()/local_irq_disable() in fixup_irqs()
x86: Use EOI register in io-apic on intel platforms
x86: Force irq complete move during cpu offline
x86: Remove move_cleanup_count from irq_cfg
x86, intr-remap: Avoid irq_chip mask/unmask in fixup_irqs() for intr-remapping
...
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree
x86/amd-iommu: Remove amd_iommu_pd_table
x86/amd-iommu: Move reset_iommu_command_buffer out of locked code
x86/amd-iommu: Cleanup DTE flushing code
x86/amd-iommu: Introduce iommu_flush_device() function
x86/amd-iommu: Cleanup attach/detach_device code
x86/amd-iommu: Keep devices per domain in a list
x86/amd-iommu: Add device bind reference counting
x86/amd-iommu: Use dev->arch->iommu to store iommu related information
x86/amd-iommu: Remove support for domain sharing
x86/amd-iommu: Rearrange dma_ops related functions
x86/amd-iommu: Move some pte allocation functions in the right section
x86/amd-iommu: Remove iommu parameter from dma_ops_domain_alloc
x86/amd-iommu: Use get_device_id and check_device where appropriate
x86/amd-iommu: Move find_protection_domain to helper functions
x86/amd-iommu: Simplify get_device_resources()
x86/amd-iommu: Let domain_for_device handle aliases
x86/amd-iommu: Remove iommu specific handling from dma_ops path
x86/amd-iommu: Remove iommu parameter from __(un)map_single
x86/amd-iommu: Make alloc_new_range aware of multiple IOMMUs
...
Use the new unreachable() macro instead of for(;;);. When
allyesconfig is built with a GCC-4.5 snapshot on i686 the size of the
text segment is reduced by 3987 bytes (from 6827019 to 6823032).
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: x86@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"Definition" is misspelled "defintion" in several comments; this
patch fixes them. No code changes.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
update_transition_efer() masks out some efer bits when deciding whether
to switch the msr during guest entry; for example, NX is emulated using the
mmu so we don't need to disable it, and LMA/LME are handled by the hardware.
However, with shared msrs, the comparison is made against a stale value;
at the time of the guest switch we may be running with another guest's efer.
Fix by deferring the mask/compare to the actual point of guest entry.
Noted by Marcelo.
Signed-off-by: Avi Kivity <avi@redhat.com>
While we are never normally passed an instruction that exceeds 15 bytes,
smp games can cause us to attempt to interpret one, which will cause
large latencies in non-preempt hosts.
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
This new IOCTL exports all yet user-invisible states related to
exceptions, interrupts, and NMIs. Together with appropriate user space
changes, this fixes sporadic problems of vmsave/restore, live migration
and system reset.
[avi: future-proof abi by adding a flags field]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The various syscall-related MSRs are fairly expensive to switch. Currently
we switch them on every vcpu preemption, which is far too often:
- if we're switching to a kernel thread (idle task, threaded interrupt,
kernel-mode virtio server (vhost-net), for example) and back, then
there's no need to switch those MSRs since kernel threasd won't
be exiting to userspace.
- if we're switching to another guest running an identical OS, most likely
those MSRs will have the same value, so there's little point in reloading
them.
- if we're running the same OS on the guest and host, the MSRs will have
identical values and reloading is unnecessary.
This patch uses the new user return notifiers to implement last-minute
switching, and checks the msr values to avoid unnecessary reloading.
Signed-off-by: Avi Kivity <avi@redhat.com>
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.
Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.
This proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short, and
more importantly, never goes backwards. Userspace may also need to trigger
the current data, since from the first migration onwards, it won't be
reflected by a simple call to clock_gettime() anymore.
[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Commit 705c5323 opened the doors of hell by unconditionally injecting
single-step flags as long as guest_debug signaled this. This doesn't
work when the guest branches into some interrupt or exception handler
and triggers a vmexit with flag reloading.
Fix it by saving cs:rip when user space requests single-stepping and
restricting the trace flag injection to this guest code position.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.
A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future. Thus this patch
adds special support for the Xen HVM MSR.
I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files. When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.
I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.
[jan: fix i386 build warning]
[avi: future proof abi with a flags field]
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
New AMD processors (Family 0x10 models 8+) support the Pause
Filter Feature. This feature creates a new field in the VMCB
called Pause Filter Count. If Pause Filter Count is greater
than 0 and intercepting PAUSEs is enabled, the processor will
increment an internal counter when a PAUSE instruction occurs
instead of intercepting. When the internal counter reaches the
Pause Filter Count value, a PAUSE intercept will occur.
This feature can be used to detect contended spinlocks,
especially when the lock holding VCPU is not scheduled.
Rescheduling another VCPU prevents the VCPU seeking the
lock from wasting its quantum by spinning idly.
Experimental results show that most spinlocks are held
for less than 1000 PAUSE cycles or more than a few
thousand. Default the Pause Filter Counter to 3000 to
detect the contended spinlocks.
Processor support for this feature is indicated by a CPUID
bit.
On a 24 core system running 4 guests each with 16 VCPUs,
this patch improved overall performance of each guest's
32 job kernbench by approximately 3-5% when combined
with a scheduler algorithm thati caused the VCPU to
sleep for a brief period. Further performance improvement
may be possible with a more sophisticated yield algorithm.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap - upper bound on the amount of time between two successive
executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
a PAUSE loop
If the time, between this execution of PAUSE and previous one, exceeds the
PLE_Gap, processor consider this PAUSE belongs to a new loop.
Otherwise, processor determins the the total execution time of this loop(since
1st PAUSE in this loop), and triggers a VM exit if total time exceeds the
PLE_Window.
* Refer SDM volume 3b section 21.6.13 & 22.1.3.
Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is sched-out after hold a spinlock, then other VPs for same lock are sched-in
to waste the CPU time.
Our tests indicate that most spinlocks are held for less than 212 cycles.
Performance tests show that with 2X LP over-commitment we can get +2% perf
improvement for kernel build(Even more perf gain with more LPs).
Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Push TF and RF injection and filtering on guest single-stepping into the
vender get/set_rflags callbacks. This makes the whole mechanism more
robust wrt user space IOCTL order and instruction emulations.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Much of so far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part /wrt processing KVM_GUESTDBG_ENABLE.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).
Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.
To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.
So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Maintain back mapping from irqchip/pin to gsi to speedup
interrupt acknowledgment notifications.
[avi: build fix on non-x86/ia64]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This removes assumptions that max GSIs is smaller than number of pins.
Sharing is tracked on pin level not GSI level.
[avi: no PIC on ia64]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This at once also gets the alignment specification right for
x86-64.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4B0FF8F80200007800022708@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Having run into the run-(boot-)time check a couple of times lately,
I finally took time to find a build-time check so that one doesn't
need to analyze the register/stack dump and resolve this (through
manual lookup in vmlinux) to the offending construct.
The assembler will emit a message like "Error: value of <num> too
large for field of 1 bytes at <offset>", which while not pointing
out the source location still makes analysis quite a bit easier.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4B0FF8AA0200007800022703@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The semantics the PAT code expect of is_untracked_pat_range() is "is
this range completely contained inside the untracked region." This
means that checkin 8a27138924 was
technically wrong, because the implementation needlessly confusing.
The sane interface is for it to take a semiclosed range like just
about everything else (as evidenced by the sheer number of "- 1"'s
removed by that patch) so change the actual implementation to match.
Reported-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
The data that was stored in this table is now available in
dev->archdata.iommu. So this table is not longer necessary.
This patch removes the remaining uses of that variable and
removes it from the code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch cleans up the code to flush device table entries
in the IOMMU. With this chance the driver can get rid of the
iommu_queue_inv_dev_entry() function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch introduces a list to each protection domain which
keeps all devices associated with the domain. This can be
used later to optimize certain functions and to completly
remove the amd_iommu_pd_table.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch adds a reference count to each device to count
how often the device was bound to that domain. This is
important for single devices that act as an alias for a
number of others. These devices must stay bound to their
domains until all devices that alias to it are unbound from
the same domain.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch changes IOMMU code to use dev->archdata->iommu to
store information about the alias device and the domain the
device is attached to.
This allows the driver to get rid of the amd_iommu_pd_table
in the future.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch makes device isolation mandatory and removes
support for the amd_iommu=share option. This simplifies the
code in several places.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The non-present cache flag was IOMMU local until now which
doesn't make sense. Make this a global flag so we can remove
the lase user of 'struct iommu' in the map/unmap path.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch adds code to keep a global list of all protection
domains. This allows to simplify the resume code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch adds reference counting for protection domains
per IOMMU. This allows a smarter TLB flushing strategy.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch adds an index field to struct amd_iommu which can
be used to lookup it up in an array. This index will be used
in struct protection_domain to keep track which protection
domain has devices behind which IOMMU.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch updates the copyright headers in the relevant AMD
IOMMU driver files to match the date of the latest changes.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch moves all function declarations which are only
used inside the driver code to a seperate header file.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Make it readable in the source too, not just in the assembly output.
No change in functionality.
Cc: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Zero the input register in the exception handler instead of
using an extra register to pass in a zero value.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1259176706-5908-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mtdblock driver doesn't call flush_dcache_page for pages in request. So,
this causes problems on architectures where the icache doesn't fill from
the dcache or with dcache aliases. The patch fixes this.
The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid
pointless empty cache-thrashing loops on architectures for which
flush_dcache_page() is a no-op. Every architecture was provided with this
flush pages on architectires where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is
equal 1 or do nothing otherwise.
See "fix mtd_blkdevs problem with caches on some architectures" discussion
on LKML for more information.
Signed-off-by: Ilya Loginov <isloginov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Peter Horton <phorton@bitbox.co.uk>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Percpu symbols now occupy the same namespace as other global
symbols and as such short global symbols without subsystem
prefix tend to collide with local variables. dr7 percpu
variable used by x86 was hit by this. Rename it to cpu_dr7.
The rename also makes it more consistent with its fellow
cpu_debugreg percpu variable.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091125115856.GA17856@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
This patch factors out the search for an MMCONFIG region, which was
previously implemented in both mmconfig_32 and mmconfig_64. No functional
change.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This changes pci_mmcfg_region from a table to a list, to make it easier
to add and remove MMCONFIG regions for PCI host bridge hotplug.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The virtual address is only used for x86_64, but it's so much simpler
to manage it as part of the pci_mmcfg_region that I think it's worth
wasting a pointer per MMCONFIG region on x86_32.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch adds a resource and corresponding name to the MMCONFIG
structure. This makes allocation simpler (we can allocate the
resource and name at the same time we allocate the pci_mmcfg_region),
and gives us a way to hang onto the resource after inserting it.
This will be needed so we can release and free it when hot-removing
a host bridge.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This only renames the struct pci_mmcfg_region members; no functional change.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This adds a struct pci_mmcfg_region with a little more information
than the struct acpi_mcfg_allocation used previously. The acpi_mcfg
structure is defined by the spec, so we can't change it.
To begin with, struct pci_mmcfg_region is basically the same as the
ACPI MCFG version, but future patches will add more information.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This factors out the common "bus << 20" expression used when computing the
MMCONFIG address.
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Move the find_smp_config() call to before bootmem is initialized.
Use reserve_early() instead of reserve_bootmem() in it.
This simplifies the code, we only need to call find_smp_config()
once and can remove the now unneeded reserve parameter from
x86_init_mpparse::find_smp_config.
We thus also reduce x86's dependency on bootmem allocations.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B0BB9F2.70907@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- Change is_untracked_pat_range() to return bool.
- Clean up the initialization of is_untracked_pat_range() -- by default,
we simply point it at is_ISA_range() directly.
- Move is_untracked_pat_range to the end of struct x86_platform, since
it is the newest field.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
Change is_ISA_range() from a macro to an inline function. This makes
it type safe, and also allows it to be assigned to a function pointer
if necessary.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <20091119202341.GA4420@sgi.com>
is_untracked_pat_range() -- like its components, is_ISA_range() and
is_GRU_range(), takes a normal semiclosed interval (>=, <) whereas the
PAT code called it as if it took a closed range (>=, <=). Fix.
Although this is a bug, I believe it is non-manifest, simply because
none of the callers will call this with non-page-aligned addresses.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
Checkin fd12a0d69a made the PAT
untracked range a platform configurable, but missed on occurrence of
is_ISA_range() which still refers to PAT-untracked memory, and
therefore should be using the configurable.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
GRU space is always mapped as WB in the page table. There is
no need to track the mappings in the PAT. This also eliminates
the "freeing invalid memtype" messages when the GRU space is
unmapped.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
[ v2: fix build failure ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
irq_thermal_count is only being maintained when
X86_THERMAL_VECTOR, and both X86_THERMAL_VECTOR and
X86_MCE_THRESHOLD don't need extra wrapping in X86_MCE
conditionals.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Yong Wang <yong.y.wang@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Arjan van de Ven <arjan@infradead.org>
LKML-Reference: <4B06AFA902000078000211F8@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/x86/kernel/kprobes.c
kernel/trace/Makefile
Merge reason: hw-breakpoints perf integration is looking
good in testing and in reviews, plus conflicts
are mounting up - so merge & resolve.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
(with inconsistent defaults), just having the latter suffices as
the former can be easily calculated from it.
To be consistent, also change X86_INTERNODE_CACHE_BYTES to
X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA
to account for last level cache line size (which here matters
more than L1 cache line size).
Finally, make sure the default value for X86_L1_CACHE_SHIFT,
when X86_GENERIC is selected, is being seen before that for the
individual CPU model options (other than on x86-64, where
GENERIC_CPU is part of the choice construct, X86_GENERIC is a
separate option on ix86).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Ravikiran Thirumalai <kiran@scalex86.org>
Acked-by: Nick Piggin <npiggin@suse.de>
LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler
gets a small bug fix and the sysctl tree where I am removing all
sysctl strategy routines.
This kills bad_dma_address variable, the old mechanism to enable
IOMMU drivers to make dma_mapping_error() work in IOMMU's
specific way.
bad_dma_address variable was introduced to enable IOMMU drivers
to make dma_mapping_error() work in IOMMU's specific way.
However, it can't handle systems that use both swiotlb and HW
IOMMU. SO we introduced dma_map_ops->mapping_error to solve that
case.
Intel VT-d, GART, and swiotlb already use
dma_map_ops->mapping_error. Calgary, AMD IOMMU, and nommu use
zero for an error dma address. This adds DMA_ERROR_CODE and
converts them to use it (as SPARC and POWER does).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: muli@il.ibm.com
Cc: joerg.roedel@amd.com
LKML-Reference: <1258287594-8777-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is possible for x86_64 systems to lack the NX bit either due to the
hardware lacking support or the BIOS having turned off the CPU capability,
so NX status should be reported. Additionally, anyone booting NX-capable
CPUs in 32bit mode without PAE will lack NX functionality, so this change
provides feedback for that case as well.
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1258154897-6770-6-git-send-email-hpa@zytor.com>
The 32- and 64-bit code used very different mechanisms for enabling
NX, but even the 32-bit code was enabling NX in head_32.S if it is
available. Furthermore, we had a bewildering collection of tests for
the available of NX.
This patch:
a) merges the 32-bit set_nx() and the 64-bit check_efer() function
into a single x86_configure_nx() function. EFER control is left
to the head code.
b) eliminates the nx_enabled variable entirely. Things that need to
test for NX enablement can verify __supported_pte_mask directly,
and cpu_has_nx gives the supported status of NX.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Chris Wright <chrisw@sous-sol.org>
LKML-Reference: <1258154897-6770-5-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
If IO-APIC base address is 1K aligned we should not fail
on resourse insertion procedure. For this sake we define
IO_APIC_SLOT_SIZE constant which should cover all IO-APIC
direct accessible registers.
An example of a such configuration is there
http://marc.info/?l=linux-kernel&m=118114792006520
|
| Quoting the message
|
| IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
| IOAPIC[1]: apic_id 3, version 32, address 0xfec80000, GSI 24-47
| IOAPIC[2]: apic_id 4, version 32, address 0xfec80400, GSI 48-71
| IOAPIC[3]: apic_id 5, version 32, address 0xfec84000, GSI 72-95
| IOAPIC[4]: apic_id 8, version 32, address 0xfec84400, GSI 96-119
|
Reported-by: "Maciej W. Rozycki" <macro@linux-mips.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20091116151426.GC5653@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On x86-64, copy_[to|from]_user() rely on assembly routines that
never call might_fault(), making us missing various lockdep
checks.
This doesn't apply to __copy_from,to_user() that explicitly
handle these calls, neither is it a problem in x86-32 where
copy_to,from_user() rely on the "__" prefixed versions that
also call might_fault().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1258382538-30979-1-git-send-email-fweisbec@gmail.com>
[ v2: fix module export ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This v2.6.26 commit:
ad2fc2c: x86: fix copy_user on x86
rendered __copy_from_user_inatomic() identical to
copy_user_generic(), yet didn't make the former just call the
latter from an inline function.
Furthermore, this v2.6.19 commit:
b885808: [PATCH] Add proper sparse __user casts to __copy_to_user_inatomic
converted the return type of __copy_to_user_inatomic() from
unsigned long to int, but didn't do the same to
__copy_from_user_inatomic().
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <v.mayatskih@gmail.com>
LKML-Reference: <4AFD5778020000780001F8F4@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
iommu_init_noop() is in arch/x86/kernel/x86_init.c but
iommu_shutdown_noop() in arch/x86/include/asm/iommu.h.
This moves iommu_shutdown_noop() to x86_init.c for consistency.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1258199198-16657-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This build error:
arch/x86/kvm/x86.c:3655: error: implicit declaration of function 'hw_breakpoint_restore'
Happens because in the CONFIG_KVM=m case there's no 'CONFIG_KVM' define
in the kernel - it's CONFIG_KVM_MODULE in that case.
Make the prototype available unconditionally.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <1258114575-32655-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86/PCI: Adjust GFP mask handling for coherent allocations
PCI ASPM: fix oops on root port removal
Instead of using bootmem, try find_e820_area()/reserve_early(),
and call acpi_reserve_memory() early, to allocate the wakeup
trampoline code area below 1M.
This is more reliable, and it also removes a dependency on
bootmem.
-v2: change function name to acpi_reserve_wakeup_memory(),
as suggested by Rafael.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: pm list <linux-pm@lists.linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4AFA210B.3020207@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
POWERPC doesn't expect it to be used.
This fixes the linux-next build failure reported by
Stephen Rothwell:
lib/swiotlb.c: In function 'setup_io_tlb_npages':
lib/swiotlb.c:114: error: 'swiotlb' undeclared (first use in this function)
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: peterz@infradead.org
LKML-Reference: <20091112000258F.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If HW IOMMU initialization fails (Intel VT-d often does this,
typically due to BIOS bugs), we fall back to nommu. It doesn't
work for the majority since nowadays we have more than 4GB
memory so we must use swiotlb instead of nommu.
The problem is that it's too late to initialize swiotlb when HW
IOMMU initialization fails. We need to allocate swiotlb memory
earlier from bootmem allocator. Chris explained the issue in
detail:
http://marc.info/?l=linux-kernel&m=125657444317079&w=2
The current x86 IOMMU initialization sequence is too complicated
and handling the above issue makes it more hacky.
This patch changes x86 IOMMU initialization sequence to handle
the above issue cleanly.
The new x86 IOMMU initialization sequence are:
1. we initialize the swiotlb (and setting swiotlb to 1) in the case
of (max_pfn > MAX_DMA32_PFN && !no_iommu). dma_ops is set to
swiotlb_dma_ops or nommu_dma_ops. if swiotlb usage is forced by
the boot option, we finish here.
2. we call the detection functions of all the IOMMUs
3. the detection function sets x86_init.iommu.iommu_init to the
IOMMU initialization function (so we can avoid calling the
initialization functions of all the IOMMUs needlessly).
4. if the IOMMU initialization function doesn't need to swiotlb
then sets swiotlb to zero (e.g. the initialization is
sucessful).
5. if we find that swiotlb is set to zero, we free swiotlb
resource.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-10-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This changes amd_iommu_detect() to set amd_iommu_init to
iommu_init hook if amd_iommu_detect() finds the AMD IOMMU.
We can kill the code to check if we found the IOMMU in
amd_iommu_init() since amd_iommu_detect() sets amd_iommu_init()
only when it found the IOMMU.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-5-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This changes gart_iommu_hole_init() to set gart_iommu_init() to
iommu_init hook if gart_iommu_hole_init() finds the GART IOMMU.
We can kill the code to check if we found the IOMMU in
gart_iommu_init() since gart_iommu_hole_init() sets
gart_iommu_init() only when it found the IOMMU.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-4-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This changes detect_calgary() to set init_calgary() to
iommu_init hook if detect_calgary() finds the Calgary IOMMU.
We can kill the code to check if we found the IOMMU in
init_calgary() since detect_calgary() sets init_calgary() only
when it found the IOMMU.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
LKML-Reference: <1257849980-22640-3-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We call the detections functions of all the IOMMUs then all
their initialization functions. The latter is pointless since we
don't detect multiple different IOMMUs. What we need to do is
calling the initialization function of the detected IOMMU.
This adds iommu_init hook to x86_init_ops so if an IOMMU
detection function can set its initialization function to the
hook.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: chrisw@sous-sol.org
Cc: dwmw2@infradead.org
Cc: joerg.roedel@amd.com
Cc: muli@il.ibm.com
LKML-Reference: <1257849980-22640-2-git-send-email-fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This also implies that corresponding log messages, e.g.
platform microcode: firmware: requesting amd-ucode/microcode_amd.bin
show up only once on module load and not when ucode is updated
for each CPU.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: dimm <dmitry.adamushko@gmail.com>
LKML-Reference: <20091110110723.GH30802@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Wrap in the cpu dr7 check that tells if we have active
breakpoints that need to be restored in the cpu.
This wrapper makes the check more self-explainable and also
reusable for any further other uses.
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
Fix the broken a.out format dump. For now we only dump the ptrace
breakpoints.
TODO: Dump every perf breakpoints for the current thread, not only
ptrace based ones.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
On platforms where the BIOS handles the thermal monitor interrupt,
APIC_LVTTHMR on each logical CPU is programmed to generate a SMI
and OS must not touch it.
Unfortunately AP bringup sequence using INIT-SIPI-SIPI clears all
the LVT entries except the mask bit. Essentially this results in
all LVT entries including the thermal monitoring interrupt set
to masked (clearing the bios programmed value for APIC_LVTTHMR).
And this leads to kernel take over the thermal monitoring
interrupt on AP's but not on BSP (leaving the bios programmed
value only on BSP).
As a result of this, we have seen system hangs when the thermal
monitoring interrupt is generated.
Fix this by reading the initial value of thermal LVT entry on
BSP and if bios has taken over the control, then program the
same value on all AP's and leave the thermal monitoring
interrupt control on all the logical cpu's to the bios.
Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Arjan van de Ven <arjan@infradead.org>
LKML-Reference: <20091110013824.GA24940@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
We should not use physid_mask_t as a stack based
variable in apic code. This type depends on MAX_APICS
parameter which may be huge enough.
Especially it became a problem with apic NOOP driver which
is portable between 32 bit and 64 bit environment
(where we have really huge MAX_APICS).
So apic driver should operate with pointers and a caller
in turn should aware of allocation physid_mask_t variable.
As a side (but positive) effect -- we may use already
implemented physid_set_mask_of_physid function eliminating
default_apicid_to_cpu_present completely.
Note that physids_coerce and physids_promote turned into static
inline from macro (since macro hides the fact that parameter is
being interpreted as unsigned long, make it explicit).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20091109220659.GA5568@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It will take some time for binutils (gas) to support some newly added
instructions, such as SSE4.1 instructions or the AES-NI instructions
found in upcoming Intel CPU.
To make the source code can be compiled by old binutils, .byte code is
used instead of the assembly instruction. But the readability and
flexibility of raw .byte code is not good.
This patch solves the issue of raw .byte code via generating it via
assembly instruction like gas macro. The syntax is as close as
possible to real assembly instruction.
Some helper macros such as MODRM is not a full feature
implementation. It can be extended when necessary.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Rather than forcing GFP flags and DMA mask to be inconsistent,
GFP flags should be determined even for the fallback device
through dma_alloc_coherent_mask()/dma_alloc_coherent_gfp_flags().
This restores 64-bit behavior as it was prior to commits
8965eb1938 and
4a367f3a9d (not sure why there are
two of them), where GFP_DMA was forced on for 32-bit, but not
for 64-bit, with the slight adjustment that afaict even 32-bit
doesn't need this without CONFIG_ISA.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
LKML-Reference: <4AF18187020000780001D8AA@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch rebase the implementation of the breakpoints API on top of
perf events instances.
Each breakpoints are now perf events that handle the
register scheduling, thread/cpu attachment, etc..
The new layering is now made as follows:
ptrace kgdb ftrace perf syscall
\ | / /
\ | / /
/
Core breakpoint API /
/
| /
| /
Breakpoints perf events
|
|
Breakpoints PMU ---- Debug Register constraints handling
(Part of core breakpoint API)
|
|
Hardware debug registers
Reasons of this rewrite:
- Use the centralized/optimized pmu registers scheduling,
implying an easier arch integration
- More powerful register handling: perf attributes (pinned/flexible
events, exclusive/non-exclusive, tunable period, etc...)
Impact:
- New perf ABI: the hardware breakpoints counters
- Ptrace breakpoints setting remains tricky and still needs some per
thread breakpoints references.
Todo (in the order):
- Support breakpoints perf counter events for perf tools (ie: implement
perf_bpcounter_event())
- Support from perf tools
Changes in v2:
- Follow the perf "event " rename
- The ptrace regression have been fixed (ptrace breakpoint perf events
weren't released when a task ended)
- Drop the struct hw_breakpoint and store generic fields in
perf_event_attr.
- Separate core and arch specific headers, drop
asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
- Use new generic len/type for breakpoint
- Handle off case: when breakpoints api is not supported by an arch
Changes in v3:
- Fix broken CONFIG_KVM, we need to propagate the breakpoint api
changes to kvm when we exit the guest and restore the bp registers
to the host.
Changes in v4:
- Drop the hw_breakpoint_restore() stub as it is only used by KVM
- EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
module
- Restore the breakpoints unconditionally on kvm guest exit:
TIF_DEBUG_THREAD doesn't anymore cover every cases of running
breakpoints and vcpu->arch.switch_db_regs might not always be
set when the guest used debug registers.
(Waiting for a reliable optimization)
Changes in v5:
- Split-up the asm-generic/hw-breakpoint.h moving to
linux/hw_breakpoint.h into a separate patch
- Optimize the breakpoints restoring while switching from kvm guest
to host. We only want to restore the state if we have active
breakpoints to the host, otherwise we don't care about messed-up
address registers.
- Add asm/hw_breakpoint.h to Kbuild
- Fix bad breakpoint type in trace_selftest.c
Changes in v6:
- Fix wrong header inclusion in trace.h (triggered a build
error with CONFIG_FTRACE_SELFTEST
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kiszka <jan.kiszka@web.de>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
This patch cleans up pci_iommu_shutdown() a bit to use
x86_platform (similar to how IA64 initializes an IOMMU driver).
This adds iommu_shutdown() to x86_platform to avoid calling
every IOMMUs' shutdown functions in pci_iommu_shutdown() in
order. The IOMMU shutdown functions are platform specific (we
don't have multiple different IOMMU hardware) so the current way
is pointless.
An IOMMU driver sets x86_platform.iommu_shutdown to the shutdown
function if necessary.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: joerg.roedel@amd.com
LKML-Reference: <20091027163358F.fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
k8.h uses struct bootnode but does not #include a header file
for it, so provide a simple declaration for it.
arch/x86/include/asm/k8.h:13: warning: 'struct bootnode'
declared inside parameter list arch/x86/include/asm/k8.h:13:
warning: its scope is only this definition or declaration, which is probably not what you want
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
LKML-Reference: <20091028160955.d27ccb16.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This makes the declarations match the definitions, which already
use 'struct cpumask'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <200911052245.41803.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Remove unused thread_return label from switch_to() macro on
x86-64. Since this symbol cuts into schedule(), backtrace at the
latter half of schedule() was always shown as thread_return().
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
LKML-Reference: <20091105160359.5181.26225.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that we have a generic 32bit compatibility implementation
there is no need for x86 to implement it's own.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
We plan to make the breakpoints parameters generic among architectures.
For that it's better to move the asm-generic header to a generic linux
header.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
The current implementation of get_user_desc() sign extends the return
value because of integer promotion rules. For the most part, this
doesn't matter, because the top bit of base2 is usually 0. If, however,
that bit is 1, then the entire value will be 0xffff... which is
probably not what the caller intended.
This patch casts the entire thing to unsigned before returning, which
generates almost the same assembly as the current code but replaces the
final "cltq" (sign extend) with a "mov %eax %eax" (zero-extend). This
fixes booting certain guests under KVM.
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix kthread_bind() by moving the body of kthread_bind() to sched.c
sched: Disable SD_PREFER_LOCAL at node level
sched: Fix boot crash by zalloc()ing most of the cpu masks
sched: Strengthen buddies and mitigate buddy induced latencies
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, fs: Fix x86 procfs stack information for threads on 64-bit
x86: Add reboot quirk for 3 series Mac mini
x86: Fix printk message typo in mtrr cleanup code
dma-debug: Fix compile warning with PAE enabled
x86/amd-iommu: Un__init function required on shutdown
x86/amd-iommu: Workaround for erratum 63
Move xen_domain and related tests out of asm-x86 to xen/xen.h so they
can be included whenever they are necessary.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch fixes two issues in the procfs stack information on
x86-64 linux.
The 32 bit loader compat_do_execve did not store stack
start. (this was figured out by Alexey Dobriyan).
The stack information on a x64_64 kernel always shows 0 kbyte
stack usage, because of a missing implementation of the KSTK_ESP
macro which always returned -1.
The new implementation now returns the right value.
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1257240160.4889.24.camel@wall-e>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
tools/perf/Makefile
Merge reason: Resolve the conflict, merge to upstream and merge in
perf fixes so we can add a dependent patch.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yanmin Zhang reported that SD_PREFER_LOCAL induces an order of
magnitude increase in select_task_rq_fair() overhead while
running heavy wakeup benchmarks (tbench and vmark).
Since SD_BALANCE_WAKE is off at node level, turn SD_PREFER_LOCAL
off as well pending further investigation.
Reported-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt reported that we are unconditionally making the
kernel text mapping as read-only. i.e., if someone does cpa() to
the kernel text area for setting/clearing any page table
attribute, we unconditionally clear the read-write attribute for
the kernel text mapping that is set at compile time.
We should delay (to forbid the write attribute) and enforce only
after the kernel has mapped the text as read-only.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091029024820.996634347@sbs-t61.sc.intel.com>
[ marked kernel_set_to_readonly as __read_mostly ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When a cpu goes offline, fixup_irqs() try to move irq's
currently destined to the offline cpu to a new cpu. But this
attempt will fail if the irq is recently moved to this cpu and
the irq still hasn't arrived at this cpu (for non intr-remapping
platforms this is when we free the vector allocation at the
previous destination) that is about to go offline.
This will endup with the interrupt subsystem still pointing the
irq to the offline cpu, causing that irq to not work any more.
Fix this by forcing the irq to complete its move (its been a
long time we moved the irq to this cpu which we are offlining
now) and then move this irq to a new cpu before this cpu goes
offline.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Gary Hade <garyhade@us.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <20091026230001.848830905@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move_cleanup_count for each irq in irq_cfg is keeping track of
the total number of cpus that need to free the corresponding
vectors associated with the irq which has now been migrated to
new destination. As long as this move_cleanup_count is non-zero
(i.e., as long as we have n't freed the vector allocations on
the old destinations) we were preventing the irq's further
migration.
This cleanup count is unnecessary and it is enough to not allow
the irq migration till we send the cleanup vector to the
previous irq destination, for which we already have irq_cfg's
move_in_progress. All we need to make sure is that we free the
vector at the old desintation but we don't need to wait till
that gets freed.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Gary Hade <garyhade@us.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <20091026230001.752968906@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Do less agressive buddy clearing
sched: Disable SD_PREFER_LOCAL for MC/CPU domains
Now that the return from alloc_percpu is compatible with the address
of per-cpu vars, it makes sense to hand around the address of per-cpu
variables. To make this sane, we remove the per_cpu__ prefix we used
created to stop people accidentally using these vars directly.
Now we have sparse, we can use that (next patch).
tj: * Updated to convert stuff which were missed by or added after the
original patch.
* Kill per_cpu_var() macro.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Make the following changes to remove some sparse warnings.
* Make DEFINE_PER_CPU_SECTION() declare __pcpu_unique_* before
defining it.
* Annotate pcpu_extend_area_map() that it is entered with pcpu_lock
held, releases it and then reacquires it.
* Make percpu related macros use unique nested variable names.
* While at it, add pcpu prefix to __size_call[_return]() macros as
to-be-implemented sparse annotations will add percpu specific stuff
to these macros.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Add Intel AVX(Advanced Vector Extensions) instruction set
support to x86 instruction decoder. This adds insn.vex_prefix
field for storing VEX prefixes, and introduces some original
tags for expressing opcodes attributes.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
LKML-Reference: <20091027204226.30545.23451.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
tools/perf/Makefile
Merge reason:
- fix the conflict
- pick up the pr_*() infrastructure to queue up dependent patch
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When sending a NMI_VECTOR IPI using the UV_HUB_IPI_INT register,
we need to ensure the delivery mode field of that register has
NMI delivery selected.
This makes those IPIs true NMIs, instead of flat IPIs. It
matters to reboot sequences and KGDB, both of which use NMI
IPIs.
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Cc: Martin Hicks <mort@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <20091020193620.877322000@alcatraz.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
CONFIG_DEBUG_RODATA chops the large pages spanning boundaries of kernel
text/rodata/data to small 4KB pages as they are mapped with different
attributes (text as RO, RODATA as RO and NX etc).
On x86_64, preserve the large page mappings for kernel text/rodata/data
boundaries when CONFIG_DEBUG_RODATA is enabled. This is done by allowing the
RODATA section to be hugepage aligned and having same RWX attributes
for the 2MB page boundaries
Extra Memory pages padding the sections will be freed during the end of the boot
and the kernel identity mappings will have different RWX permissions compared to
the kernel text mappings.
Kernel identity mappings to these physical pages will be mapped with smaller
pages but large page mappings are still retained for kernel text,rodata,data
mappings.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091014220254.190119924@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
PCLMULQDQ is used to accelerate the most time-consuming part of GHASH,
carry-less multiplication. More information about PCLMULQDQ can be
found at:
http://software.intel.com/en-us/articles/carry-less-multiplication-and-its-usage-for-computing-the-gcm-mode/
Because PCLMULQDQ changes XMM state, its usage must be enclosed with
kernel_fpu_begin/end, which can be used only in process context, the
acceleration is implemented as crypto_ahash. That is, request in soft
IRQ context will be defered to the cryptd kernel thread.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Conflicts:
kernel/Makefile
kernel/trace/Makefile
kernel/trace/trace.h
samples/Makefile
Merge reason: We need to be uptodate with the perf events development
branch because we plan to rewrite the breakpoints API on top of
perf events.
Create an inline function to extract the pnode from a global
physical address and then convert the broadcast assist unit to
use the newly created uv_gpa_to_pnode function.
The open-coded code was wrong as well - it might explain a
few of our unexplained bau hangs.
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Cliff Whickman <cpw@sgi.com>
Cc: linux-mm@kvack.org
Cc: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20091016112920.GZ8903@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Prefix global/setup routines with "mcheck_" thus differentiating
from the internal facilities prefixed with "mce_". Also, prefix
the per cpu calls with mcheck_cpu and rename them to reflect the
MCE setup hierarchy of calls better.
There should be no functionality change resulting from this
patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <1255689093-26921-1-git-send-email-borislav.petkov@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This warning:
In file included from arch/x86/include/asm/ipi.h:23,
from arch/x86/kernel/apic/apic_noop.c:27:
arch/x86/include/asm/hw_irq.h:105: warning: ‘struct irq_desc’ declared inside parameter list
arch/x86/include/asm/hw_irq.h:105: warning: its scope is only this definition or declaration, which is probably not what you want
triggers because irq_desc is defined after hw_irq.h is included
in irq.h. Since it's pointer reference only, a forward declaration
of the type will solve the problem.
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yanmin reported that both tbench and hackbench were significantly
hurt by trying to keep tasks local on these domains, esp on small
cache machines.
So disable it in order to promote spreading outside of the cache
domains.
Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1255083400.8802.15.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move UV specific functionality out of the generic IO-APIC code.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20091013203236.GD20543@sgi.com>
[ Cleaned up the code some more in their new places. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch fixes handling of uv hub irq affinity. IRQs with ALL or
NODE affinity can be routed to cpus other than their originally
assigned cpu. Those with CPU affinity cannot be rerouted.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
LKML-Reference: <20090930160259.GA7822@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce NOOP APIC driver. We should use it in case if apic was
disabled due to hardware of software/firmware problems (including
user requested to disable it case).
The driver is attempting to catch any inappropriate apic operation
call with warning issue.
Also it is possible to use some apic operation like IPI calls,
read/write without checking for apic presence which should make
callers code easier.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: yinghai@kernel.org
Cc: macro@linux-mips.org
LKML-Reference: <20091013201022.534682104@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Bastian Blank reported a boot crash with stackprotector enabled,
and debugged it back to edx register corruption.
For historical reasons irq enable/disable/save/restore had special
calling sequences to make them more efficient. With the more
recent introduction of higher-level and more general optimisations
this is no longer necessary so we can just use the normal PVOP_
macros.
This fixes some residual bugs in the old implementations which left
edx liable to inadvertent clobbering. Also, fix some bugs in
__PVOP_VCALLEESAVE which were revealed by actual use.
Reported-by: Bastian Blank <bastian@waldi.eu.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: Xen-devel <xen-devel@lists.xensource.com>
LKML-Reference: <4AD3BC9B.7040501@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.
Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.
This takes into account comments made by:
. Paul Moore: sock_recvmsg is called only for the first datagram,
sock_recvmsg_nosec is used for the rest.
. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
works in the same fashion as the ppoll one.
If the underlying protocol returns a datagram with MSG_OOB set, this
will make recvmmsg return right away with as many datagrams (+ the OOB
one) it has received so far.
. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
datagrams and then recvmsg returns an error, recvmmsg will return
the successfully received datagrams, store the error and return it
in the next call.
This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the counterpart to "x86: export k8 physical topology" for
SRAT. It is not as invasive because the acpi code already seperates
node setup into detection and registration steps, with the
exception of registering e820 active regions in
acpi_numa_memory_affinity_init(). This is now moved to
acpi_scan_nodes() if NUMA emulation is disabled or deferred.
acpi_numa_init() now returns a value which specifies whether an
underlying SRAT was located. If so, that topology can be used by
the emulation code to interleave emulated nodes over physical nodes
or to register the nodes for ACPI.
acpi_get_nodes() may now be used to export the srat physical
topology of the machine for NUMA emulation.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518580.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To eventually interleave emulated nodes over physical nodes, we
need to know the physical topology of the machine without actually
registering it. This does the k8 node setup in two parts:
detection and registration. NUMA emulation can then used the
physical topology detected to setup the address ranges of emulated
nodes accordingly. If emulation isn't used, the k8 nodes are
registered as normal.
Two formals are added to the x86 NUMA setup functions: `acpi' and
`k8'. These represent whether ACPI or K8 NUMA has been detected;
both cannot be true at the same time. This specifies to the NUMA
emulation code whether an underlying physical NUMA topology exists
and which interface to use.
This patch deals solely with separating the k8 setup path into
Northbridge detection and registration steps and leaves the ACPI
changes for a subsequent patch. The `acpi' formal is added here,
however, to avoid touching all the header files again in the next
patch.
This approach also ensures emulated nodes will not span physical
nodes so the true memory latency is not misrepresented.
k8_get_nodes() may now be used to export the k8 physical topology
of the machine for NUMA emulation.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the handling of truncated %rip from an iret fault to the fault
entry path.
This allows x86-64 to use the standard search_extable() function.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <1255357103-5418-1-git-send-email-brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add an atomic notifier which ensures proper locking when conveying
MCE info to EDAC for decoding. The actual notifier call overrides a
default, negative priority notifier.
Note: make sure we register the default decoder only once since
mcheck_init() runs on each CPU.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091003065752.GA8935@liondog.tnic>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For no real good reason, compat_start_thread() was embedded inline in
<asm/elf.h> whereas the native start_thread() lives in process_*.c.
Move compat_start_thread() to process_64.c, remove gratuitious
differences, and fix a few items which mostly look like bit rot.
In particular, compat_start_thread() didn't do free_thread_xstate(),
which means it was hanging on to the xstate store area even when it
was not needed. It was also not setting old_rsp, but it looks like
that generally shouldn't matter for a 32-bit process.
Note: compat_start_thread *has* to be a macro, since it is tested with
start_thread_ia32() as the out of line function name.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
There is an erratum for IOMMU hardware which documents
undefined behavior when forwarding SMI requests from
peripherals and the DTE of that peripheral has a sysmgt
value of 01b. This problem caused weird IO_PAGE_FAULTS in my
case.
This patch implements the suggested workaround for that
erratum into the AMD IOMMU driver. The erratum is
documented with number 63.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Rework the x86 cmpxchg() implementation to generate build failures
when used on improper types.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1254771187.21044.22.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Intel fixed counters do not support all the filters possible with a
generic counter. Thus, if a fixed counter event is passed but with
certain filters set, then the fixed_mode_idx() function must fail
and the event must be measured in a generic counter instead.
Reject filters are: inv, edge, cnt-mask.
Signed-off-by: Stephane Eranian <eranian@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1254840129-6198-2-git-send-email-eranian@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
this is needed for kvm if it want ksm to directly map pages into its
shadow page tables.
[marcelo: cast pfn assignment to u64]
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Basically the existing percpu ops can be used for this_cpu variants that allow
operations also on dynamically allocated percpu data. However, we do not pass a
reference to a percpu variable in. Instead a dynamically or statically
allocated percpu variable is provided.
Preempt, the non preempt and the irqsafe operations generate the same code.
It will always be possible to have the requires per cpu atomicness in a single
RMW instruction with segment override on x86.
64 bit this_cpu operations are not supported on 32 bit.
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
For automated testing it is useful to have the option to turn
the warnings on copy_from_user() etc checks into errors:
In function ‘copy_from_user’,
inlined from ‘fd_copyin’ at drivers/block/floppy.c:3080,
inlined from ‘fd_ioctl’ at drivers/block/floppy.c:3503:
linux/arch/x86/include/asm/uaccess_32.h:213:
error: call to ‘copy_from_user_overflow’ declared with attribute error:
copy_from_user buffer size is not provably correct
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091002075050.4e9f7641@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make decoding of MCEs happen only on AMD hardware by registering a
non-default callback only on CPU families which support it.
While looking at the interaction of decode_mce() with the other MCE
code i also noticed a few other things and made the following
cleanups/fixes:
- Fixed the mce_decode() weak alias - a weak alias is really not
good here, it should be a proper callback. A weak alias will be
overriden if a piece of code is built into the kernel - not
good, obviously.
- The patch initializes the callback on AMD family 10h and 11h.
- Added the more correct fallback printk of:
No support for human readable MCE decoding on this CPU type.
Transcribe the message and run it through 'mcelog --ascii' to decode.
On CPUs that dont have a decoder.
- Made the surrounding code more readable.
Note that the callback allows us to have a default fallback -
without having to check the CPU versions during the printout
itself. When an EDAC module registers itself, it can install the
decode-print function.
(there's no unregister needed as this is core code.)
version -v2 by Borislav Petkov:
- add K8 to the set of supported CPUs
- always build in edac_mce_amd since we use an early_initcall now
- fix checkpatch warnings
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <20091001141432.GA11410@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Just like ip_fast_csum, the assembly snippet in csum_ipv6_magic needs a
memory clobber, as it is only passed the address of the buffer, not a
memory reference to the buffer itself.
This caused failures in Hurd's pfinetv4 when we tried to compile it with
gcc-4.3 (bogus checksums).
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a general per-cpu notifier that is called whenever the kernel is
about to return to userspace. The notifier uses a thread_info flag
and existing checks, so there is no impact on user return or context
switch fast paths.
This will be used initially to speed up KVM task switching by lazily
updating MSRs.
Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1253342422-13811-1-git-send-email-avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
A previous patch added the buffer size check to copy_from_user().
One of the things learned from analyzing the result of the previous
patch is that in general, gcc is really good at proving that the
code contains sufficient security checks to not need to do a
runtime check. But that for those cases where gcc could not prove
this, there was a relatively high percentage of real security
issues.
This patch turns the case of "gcc cannot prove" into a compile time
warning, as long as a sufficiently new gcc is in use that supports
this. The objective is that these warnings will trigger developers
checking new cases out before a security hole enters a linux kernel
release.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Cc: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <20090930130523.348ae6c4@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Just like ip_fast_csum, the assembly snippet in csum_ipv6_magic needs a
memory clobber, as it is only passed the address of the buffer, not a
memory reference to the buffer itself.
This caused failures in Hurd's pfinetv4 when we tried to compile it with
gcc-4.3 (bogus checksums).
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cmpxchg64() today generates, to quote Linus, "barf bag" code.
cmpxchg64() is about to get used in the scheduler to fix a bug there,
but it's a prerequisite that cmpxchg64() first be made non-sucking.
This patch turns cmpxchg64() into an efficient implementation that
uses the alternative() mechanism to just use the raw instruction on
all modern systems.
Note: the fallback is NOT smp safe, just like the current fallback
is not SMP safe. (Interested parties with i486 based SMP systems
are welcome to submit fix patches for that.)
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
[ fixed asm constraint bug ]
Fixed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090930170754.0886ff2e@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
GCC provides reasonable memset/memcpy functions itself, with __builtin_memset
and __builtin_memcpy. For the "unknown" cases, it'll fall back to our
current existing functions, but for fixed size versions it'll inline
something smart. Quite often that will be the same as we have now,
but sometimes it can do something smarter (for example, if the code
then sets the first member of a struct, it can do a shorter memset).
In addition, and this is more important, gcc knows which registers and
such are not clobbered (while for our asm version it pretty much
acts like a compiler barrier), so for various cases it can avoid reloading
values.
The effect on codesize is shown below on my typical laptop .config:
text data bss dec hex filename
5605675 2041100 6525148 14171923 d83f13 vmlinux.before
5595849 2041668 6525148 14162665 d81ae9 vmlinux.after
Due to some not-so-good behavior in the gcc 3.x series, this change
is only done for GCC 4.x and above.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <20090928142122.6fc57e9c@infradead.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Remove redundant non-NUMA topology functions
x86: early_printk: Protect against using the same device twice
x86: Reduce verbosity of "PAT enabled" kernel message
x86: Reduce verbosity of "TSC is reliable" message
x86: mce: Use safer ways to access MCE registers
x86: mce, inject: Use real inject-msg in raise_local
x86: mce: Fix thermal throttling message storm
x86: mce: Clean up thermal throttling state tracking code
x86: split NX setup into separate file to limit unstack-protected code
xen: check EFER for NX before setting up GDT mapping
x86: Cleanup linker script using new linker script macros.
x86: Use section .data.page_aligned for the idt_table.
x86: convert to use __HEAD and HEAD_TEXT macros.
x86: convert compressed loader to use __HEAD and HEAD_TEXT macros.
x86: fix fragile computation of vsyscall address
gcc (4.x) supports the __builtin_object_size() builtin, which
reports the size of an object that a pointer point to, when known
at compile time. If the buffer size is not known at compile time, a
constant -1 is returned.
This patch uses this feature to add a sanity check to
copy_from_user(); if the target buffer is known to be smaller than
the copy size, the copy is aborted and a WARNing is emitted in
memory debug mode.
These extra checks compile away when the object size is not known,
or if both the buffer size and the copy length are constants.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <20090926143301.2c396b94@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It's unused.
It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.
It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/x86/include/asm/topology.h declares inline fns cpu_to_node and
cpumask_of_node for !NUMA, even though they are then declared as
macros by asm-generic/topology.h, which is #included just below.
The macros (which are the same) end up being used; these functions
are just confusing.
Noticed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: "Greg Kroah-Hartman" <gregkh@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
LKML-Reference: <200909241748.45629.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
cpumask: Move deprecated functions to end of header.
cpumask: remove unused deprecated functions, avoid accusations of insanity
cpumask: use new-style cpumask ops in mm/quicklist.
cpumask: use mm_cpumask() wrapper: x86
cpumask: use mm_cpumask() wrapper: um
cpumask: use mm_cpumask() wrapper: mips
cpumask: use mm_cpumask() wrapper: mn10300
cpumask: use mm_cpumask() wrapper: m32r
cpumask: use mm_cpumask() wrapper: arm
cpumask: Use accessors for cpu_*_mask: um
cpumask: Use accessors for cpu_*_mask: powerpc
cpumask: Use accessors for cpu_*_mask: mips
cpumask: Use accessors for cpu_*_mask: m32r
cpumask: remove arch_send_call_function_ipi
cpumask: arch_send_call_function_ipi_mask: s390
cpumask: arch_send_call_function_ipi_mask: powerpc
cpumask: arch_send_call_function_ipi_mask: mips
cpumask: arch_send_call_function_ipi_mask: m32r
cpumask: arch_send_call_function_ipi_mask: alpha
cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
...
Makes code futureproof against the impending change to mm->cpu_vm_mask (to be a pointer).
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (30 commits)
Use macros for .data.page_aligned section.
Use macros for .bss.page_aligned section.
Use new __init_task_data macro in arch init_task.c files.
kbuild: Don't define ALIGN and ENTRY when preprocessing linker scripts.
arm, cris, mips, sparc, powerpc, um, xtensa: fix build with bash 4.0
kbuild: add static to prototypes
kbuild: fail build if recordmcount.pl fails
kbuild: set -fconserve-stack option for gcc 4.5
kbuild: echo the record_mcount command
gconfig: disable "typeahead find" search in treeviews
kbuild: fix cc1 options check to ensure we do not use -fPIC when compiling
checkincludes.pl: add option to remove duplicates in place
markup_oops: use modinfo to avoid confusion with underscored module names
checkincludes.pl: provide usage helper
checkincludes.pl: close file as soon as we're done with it
ctags: usability fix
kernel hacking: move STRIP_ASM_SYMS from General
gitignore usr/initramfs_data.cpio.bz2 and usr/initramfs_data.cpio.lzma
kbuild: Check if linker supports the -X option
kbuild: introduce ld-option
...
Fix trivial conflict in scripts/basic/fixdep.c
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (119 commits)
ACPI: don't pass handle for fixed hardware notifications
ACPI: remove null pointer checks in deferred execution path
ACPI: simplify deferred execution path
acerhdf: additional BIOS versions
acerhdf: convert to dev_pm_ops
acerhdf: fix fan control for AOA150 model
thermal: add missing Kconfig dependency
acpi: switch /proc/acpi/{debug_layer,debug_level} to seq_file
hp-wmi: fix rfkill memory leak on unload
ACPI: remove unnecessary #ifdef CONFIG_DMI
ACPI: linux/acpi.h should not include linux/dmi.h
hwmon driver for ACPI 4.0 power meters
topstar-laptop: add new driver for hotkeys support on Topstar N01
thinkpad_acpi: fix rfkill memory leak on unload
thinkpad-acpi: report brightness events when required
thinkpad-acpi: don't poll by default any of the reserved hotkeys
thinkpad-acpi: Fix procfs hotkey reset command
thinkpad-acpi: deprecate hotkey_bios_mask
thinkpad-acpi: hotkey poll fixes
thinkpad-acpi: be more strict when detecting a ThinkPad
...
Move the NX setup into a separate file so that it can be compiled
without stack-protection while leaving the rest of the mm/init code
protected.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
* 'perfcounters-rename-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Tidy up after the big rename
perf: Do the big rename: Performance Counters -> Performance Events
perf_counter: Rename 'event' to event_id/hw_event
perf_counter: Rename list_entry -> group_entry, counter_list -> group_list
Manually resolved some fairly trivial conflicts with the tracing tree in
include/trace/ftrace.h and kernel/trace/trace_syscalls.c.
Bye-bye Performance Counters, welcome Performance Events!
In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.
Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.
All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)
The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.
Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.
User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)
This patch has been generated via the following script:
FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
sed -i \
-e 's/PERF_EVENT_/PERF_RECORD_/g' \
-e 's/PERF_COUNTER/PERF_EVENT/g' \
-e 's/perf_counter/perf_event/g' \
-e 's/nb_counters/nb_events/g' \
-e 's/swcounter/swevent/g' \
-e 's/tpcounter_event/tp_event/g' \
$FILES
for N in $(find . -name perf_counter.[ch]); do
M=$(echo $N | sed 's/perf_counter/perf_event/g')
mv $N $M
done
FILES=$(find . -name perf_event.*)
sed -i \
-e 's/COUNTER_MASK/REG_MASK/g' \
-e 's/COUNTER/EVENT/g' \
-e 's/\<event\>/event_id/g' \
-e 's/counter/event/g' \
-e 's/Counter/Event/g' \
$FILES
... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.
Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.
( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )
Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes the remaining direct references to
.data.page_aligned in C and assembly code to use the macros in
include/linux/linkage.h.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alok N Kataria <akataria@vmware.com>
Cc: "Tan Wei Chong" <wei.chong.tan@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Bob Moore <robert.moore@intel.com>
LKML-Reference: <1253137123-18047-2-git-send-email-felipe.contreras@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In case of discrete (pretty old) apics we may have cpu_has_apic bit
not set but have to check if smp_found_config (MP spec) is there
and apic was not disabled.
Also don't forget to print apic/io-apic for such case as well.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090915071230.GA10604@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (38 commits)
x86: Move get/set_wallclock to x86_platform_ops
x86: platform: Fix section annotations
x86: apic namespace cleanup
x86: Distangle ioapic and i8259
x86: Add Moorestown early detection
x86: Add hardware_subarch ID for Moorestown
x86: Add early platform detection
x86: Move tsc_init to late_time_init
x86: Move tsc_calibration to x86_init_ops
x86: Replace the now identical time_32/64.c by time.c
x86: time_32/64.c unify profile_pc
x86: Move calibrate_cpu to tsc.c
x86: Make timer setup and global variables the same in time_32/64.c
x86: Remove mca bus ifdef from timer interrupt
x86: Simplify timer_ack magic in time_32.c
x86: Prepare unification of time_32/64.c
x86: Remove do_timer hook
x86: Add timer_init to x86_init_ops
x86: Move percpu clockevents setup to x86_init_ops
x86: Move xen_post_allocator_init into xen_pagetable_setup_done
...
Fix up conflicts in arch/x86/include/asm/io_apic.h
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (34 commits)
time: Prevent 32 bit overflow with set_normalized_timespec()
clocksource: Delay clocksource down rating to late boot
clocksource: clocksource_select must be called with mutex locked
clocksource: Resolve cpu hotplug dead lock with TSC unstable, fix crash
timers: Drop a function prototype
clocksource: Resolve cpu hotplug dead lock with TSC unstable
timer.c: Fix S/390 comments
timekeeping: Fix invalid getboottime() value
timekeeping: Fix up read_persistent_clock() breakage on sh
timekeeping: Increase granularity of read_persistent_clock(), build fix
time: Introduce CLOCK_REALTIME_COARSE
x86: Do not unregister PIT clocksource on PIT oneshot setup/shutdown
clocksource: Avoid clocksource watchdog circular locking dependency
clocksource: Protect the watchdog rating changes with clocksource_mutex
clocksource: Call clocksource_change_rating() outside of watchdog_lock
timekeeping: Introduce read_boot_clock
timekeeping: Increase granularity of read_persistent_clock()
timekeeping: Update clocksource with stop_machine
timekeeping: Add timekeeper read_clock helper functions
timekeeping: Move NTP adjusted clock multiplier to struct timekeeper
...
Fix trivial conflict due to MIPS lemote -> loongson renaming.
The early initialization of the pci bus to node mapping leaves all busses
with a node id of -1 if it lacks memory affinity. Thus, cpumask_of_pcibus
must return all online cpus for such busses.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add volatile-semantics to the SGI UV read/write macros that are
used to access chipset memory mapped registers. No direct
references to volatile are made. Instead the readq/writeq macros
are used.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: linux-mm@kvack.org
Cc: dwalker@fifo99.com
Cc: cfriesen@nortel.com
LKML-Reference: <20090910143149.GA14273@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The UV BIOS has changed the way interrupt remapping is being done.
This affects the id used for sending IPIs. The upper id bits no
longer need to be masked off.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <20090909154104.GA25083@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits)
x86, mce: Fix compilation with !CONFIG_DEBUG_FS in mce-severity.c
x86, mce: CE in last bank prevents panic by unknown MCE
x86, mce: Fake panic support for MCE testing
x86, mce: Move debugfs mce dir creating to mce.c
x86, mce: Support specifying raise mode for software MCE injection
x86, mce: Support specifying context for software mce injection
x86, mce: fix reporting of Thermal Monitoring mechanism enabled
x86, mce: remove never executed code
x86, mce: add missing __cpuinit tags
x86, mce: fix "mce" boot option handling for CONFIG_X86_NEW_MCE
x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUs
x86: mce: Lower maximum number of banks to architecture limit
x86: mce: macros to compute banks MSRs
x86: mce: Move per bank data in a single datastructure
x86: mce: Move code in mce.c
x86: mce: Rename CONFIG_X86_NEW_MCE to CONFIG_X86_MCE
x86: mce: Remove old i386 machine check code
x86: mce: Update X86_MCE description in x86/Kconfig
x86: mce: Make CONFIG_X86_ANCIENT_MCE dependent on CONFIG_X86_MCE
x86, mce: use atomic_inc_return() instead of add by 1
...
Manually fixed up trivial conflicts:
Documentation/feature-removal-schedule.txt
arch/x86/kernel/cpu/mcheck/mce.c
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (37 commits)
sched: Fix SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL vs SD_WAKE_AFFINE
sched: Stop buddies from hogging the system
sched: Add new wakeup preemption mode: WAKEUP_RUNNING
sched: Fix TASK_WAKING & loadaverage breakage
sched: Disable wakeup balancing
sched: Rename flags to wake_flags
sched: Clean up the load_idx selection in select_task_rq_fair
sched: Optimize cgroup vs wakeup a bit
sched: x86: Name old_perf in a unique way
sched: Implement a gentler fair-sleepers feature
sched: Add SD_PREFER_LOCAL
sched: Add a few SYNC hint knobs to play with
sched: Fix sync wakeups again
sched: Add WF_FORK
sched: Rename sync arguments
sched: Rename select_task_rq() argument
sched: Feature to disable APERF/MPERF cpu_power
x86: sched: Provide arch implementations using aperf/mperf
x86: Add generic aperf/mperf code
x86: Move APERF/MPERF into a X86_FEATURE
...
Fix up trivial conflict in arch/x86/include/asm/processor.h due to
nearby addition of amd_get_nb_id() declaration from the EDAC merge.
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, pat: don't use rb-tree based lookup in reserve_memtype()
x86: Increase MIN_GAP to include randomized stack
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (44 commits)
vsnprintf: remove duplicate comment of vsnprintf
softirq: add BLOCK_IOPOLL to softirq_to_name
oprofile: fix oprofile regression: select RING_BUFFER_ALLOW_SWAP
tracing: switch function prints from %pf to %ps
vsprintf: add %ps that is the same as %pS but is like %pf
tracing: Fix minor bugs for __unregister_ftrace_function_probe
tracing: remove notrace from __kprobes annotation
tracing: optimize global_trace_clock cachelines
MAINTAINERS: Update tracing tree details
ftrace: document function and function graph implementation
tracing: make testing syscall events a separate configuration
tracing: remove some unused macros
ftrace: add compile-time check on F_printk()
tracing: fix F_printk() typos
tracing: have TRACE_EVENT macro use __flags to not shadow parameter
tracing: add static to generated TRACE_EVENT functions
ring-buffer: typecast cmpxchg to fix PowerPC warning
tracing: add filter event logic to special, mmiotrace and boot tracers
tracing: remove trace_event_types.h
tracing: use the new trace_entries.h to create format files
...
Merge reason:
Suresh Siddha (1):
x86, pat: don't use rb-tree based lookup in reserve_memtype()
... requires previous x86/pat commits already pushed to Linus.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
amd64_edac: check NB MCE bank enable on the current node properly
amd64_edac: Rewrite unganged mode code of f10_early_channel_count
amd64_edac: cleanup amd64_check_ecc_enabled
x86, EDAC: Provide function to return NodeId of a CPU
amd64_edac: build driver only on AMD hardware
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (75 commits)
PCI hotplug: clean up acpi_run_hpp()
PCI hotplug: acpiphp: use generic pci_configure_slot()
PCI hotplug: shpchp: use generic pci_configure_slot()
PCI hotplug: pciehp: use generic pci_configure_slot()
PCI hotplug: add pci_configure_slot()
PCI hotplug: clean up acpi_get_hp_params_from_firmware() interface
PCI hotplug: acpiphp: don't cache hotplug_params in acpiphp_bridge
PCI hotplug: acpiphp: remove superfluous _HPP/_HPX evaluation
PCI: Clear saved_state after the state has been restored
PCI PM: Return error codes from pci_pm_resume()
PCI: use dev_printk in quirk messages
PCI / PCIe portdrv: Fix pcie_portdrv_slot_reset()
PCI Hotplug: convert acpi_pci_detect_ejectable() to take an acpi_handle
PCI Hotplug: acpiphp: find bridges the easy way
PCI: pcie portdrv: remove unused variable
PCI / ACPI PM: Propagate wake-up enable for devices w/o ACPI support
ACPI PM: Replace wakeup.prepared with reference counter
PCI PM: Introduce device flag wakeup_prepared
PCI / ACPI PM: Rework some debug messages
PCI PM: Simplify PCI wake-up code
...
Fixed up conflict in arch/powerpc/kernel/pci_64.c due to OF device tree
scanning having been moved and merged for the 32- and 64-bit cases. The
'needs_freset' initialization added in 6e19314cc ("PCI/powerpc: support
PCIe fundamental reset") is now in arch/powerpc/kernel/pci_of_scan.c.
Sysbench thinks SD_BALANCE_WAKE is too agressive and kbuild doesn't
really mind too much, SD_BALANCE_NEWIDLE picks up most of the
slack.
On a dual socket, quad core, dual thread nehalem system:
sysbench (--num_threads=16):
SD_BALANCE_WAKE-: 13982 tx/s
SD_BALANCE_WAKE+: 15688 tx/s
kbuild (-j16):
SD_BALANCE_WAKE-: 47.648295846 seconds time elapsed ( +- 0.312% )
SD_BALANCE_WAKE+: 47.608607360 seconds time elapsed ( +- 0.026% )
(same within noise)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
get/set_wallclock() have already a set of platform dependent
implementations (default, EFI, paravirt). MRST will add another
variant.
Moving them to platform ops simplifies the existing code and minimizes
the effort to integrate new variants.
Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <new-submission>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
powerpc64: convert to dynamic percpu allocator
sparc64: use embedding percpu first chunk allocator
percpu: kill lpage first chunk allocator
x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
percpu: update embedding first chunk allocator to handle sparse units
percpu: use group information to allocate vmap areas sparsely
vmalloc: implement pcpu_get_vm_areas()
vmalloc: separate out insert_vmalloc_vm()
percpu: add chunk->base_addr
percpu: add pcpu_unit_offsets[]
percpu: introduce pcpu_alloc_info and pcpu_group_info
percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
percpu: add @align to pcpu_fc_alloc_fn_t
percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
percpu: drop @static_size from first chunk allocators
percpu: generalize first chunk allocator selection
percpu: build first chunk allocators selectively
percpu: rename 4k first chunk allocator to page
percpu: improve boot messages
percpu: fix pcpu_reclaim() locking
...
Fix trivial conflict as by Tejun Heo in kernel/sched.c
* 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, pat: Fix cacheflush address in change_page_attr_set_clr()
mm: remove !NUMA condition from PAGEFLAGS_EXTENDED condition set
x86: Fix earlyprintk=dbgp for machines without NX
x86, pat: Sanity check remap_pfn_range for RAM region
x86, pat: Lookup the protection from memtype list on vm_insert_pfn()
x86, pat: Add lookup_memtype to get the current memtype of a paddr
x86, pat: Use page flags to track memtypes of RAM pages
x86, pat: Generalize the use of page flag PG_uncached
x86, pat: Add rbtree to do quick lookup in memtype tracking
x86, pat: Add PAT reserve free to io_mapping* APIs
x86, pat: New i/f for driver to request memtype for IO regions
x86, pat: ioremap to follow same PAT restrictions as other PAT users
x86, pat: Keep identity maps consistent with mmaps even when pat_disabled
x86, mtrr: make mtrr_aps_delayed_init static bool
x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled
x86: Fix an incorrect argument of reserve_bootmem()
x86: Fix system crash when loading with "reservetop" parameter
* 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, intel_txt: clean up the impact on generic code, unbreak non-x86
x86, intel_txt: Handle ACPI_SLEEP without X86_TRAMPOLINE
x86, intel_txt: Fix typos in Kconfig help
x86, intel_txt: Factor out the code for S3 setup
x86, intel_txt: tboot.c needs <asm/fixmap.h>
intel_txt: Force IOMMU on for Intel TXT launch
x86, intel_txt: Intel TXT Sx shutdown support
x86, intel_txt: Intel TXT reboot/halt shutdown support
x86, intel_txt: Intel TXT boot support
* 'agp-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
agp/intel: remove restore in resume
agp: fix uninorth build
intel-agp: Set dma mask for i915
agp: kill phys_to_gart() and gart_to_phys()
intel-agp: fix sglist allocation to avoid vmalloc()
intel-agp: Move repeated sglist free into separate function
agp: Switch agp_{un,}map_page() to take struct page * argument
agp: tidy up handling of scratch pages w.r.t. DMA API
intel_agp: Use PCI DMA API correctly on chipsets new enough to have IOMMU
agp: Add generic support for graphics dma remapping
agp: Switch mask_memory() method to take address argument again, not page
Move some of the aperf/mperf code out from the cpufreq driver
thingy so that other people can enjoy it too.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: cpufreq@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the APERFMPERF capacility into a X86_FEATURE flag so that it
can be used outside of the acpi cpufreq driver.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: cpufreq@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If we're looking to place a new task, we might as well find the
idlest position _now_, not 1 tick ago.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make the idle balancer more agressive, to improve a
x264 encoding workload provided by Jason Garrett-Glaser:
NEXT_BUDDY NO_LB_BIAS
encoded 600 frames, 252.82 fps, 22096.60 kb/s
encoded 600 frames, 250.69 fps, 22096.60 kb/s
encoded 600 frames, 245.76 fps, 22096.60 kb/s
NO_NEXT_BUDDY LB_BIAS
encoded 600 frames, 344.44 fps, 22096.60 kb/s
encoded 600 frames, 346.66 fps, 22096.60 kb/s
encoded 600 frames, 352.59 fps, 22096.60 kb/s
NO_NEXT_BUDDY NO_LB_BIAS
encoded 600 frames, 425.75 fps, 22096.60 kb/s
encoded 600 frames, 425.45 fps, 22096.60 kb/s
encoded 600 frames, 422.49 fps, 22096.60 kb/s
Peter pointed out that this is better done via newidle_idx,
not via LB_BIAS, newidle balancing should look for where
there is load _now_, not where there was load 2 ticks ago.
Worst-case latencies are improved as well as no buddies
means less vruntime spread. (as per prior lkml discussions)
This change improves kbuild-peak parallelism as well.
Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1253011667.9128.16.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When merging select_task_rq_fair() and sched_balance_self() we lost
the use of wake_idx, restore that and set them to 0 to make wake
balancing more aggressive.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The problem with wake_idle() is that is doesn't respect things like
cpu_power, which means it doesn't deal well with SMT nor the recent
RT interaction.
To cure this, it needs to do what sched_balance_self() does, which
leads to the possibility of merging select_task_rq_fair() and
sched_balance_self().
Modify sched_balance_self() to:
- update_shares() when walking up the domain tree,
(it only called it for the top domain, but it should
have done this anyway), which allows us to remove
this ugly bit from try_to_wake_up().
- do wake_affine() on the smallest domain that contains
both this (the waking) and the prev (the wakee) cpu for
WAKE invocations.
Then use the top-down balance steps it had to replace wake_idle().
This leads to the dissapearance of SD_WAKE_BALANCE and
SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE.
SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.
Touch all topology bits to replace the old with new SD flags --
platforms might need re-tuning, enabling SD_BALANCE_WAKE
conditionally on a NUMA distance seems like a good additional
feature, magny-core and small nehalem systems would want this
enabled, systems with slow interconnects would not.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since rdmsr_on_cpus and wrmsr_on_cpus are almost identical, unify them
into a common __rwmsr_on_cpus helper thus avoiding code duplication.
While at it, convert cpumask_t's to const struct cpumask *.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: (23 commits)
at_hdmac: Rework suspend_late()/resume_early()
PM: Reset transition_started at dpm_resume_noirq
PM: Update kerneldoc comments in drivers/base/power/main.c
PM: Add convenience macro to make switching to dev_pm_ops less error-prone
hp-wmi: Switch driver to dev_pm_ops
floppy: Switch driver to dev_pm_ops
PM: Trivial fixes
PM / Hibernate / Memory hotplug: Always use for_each_populated_zone()
PM/Hibernate: Do not try to allocate too much memory too hard (rev. 2)
PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)
PM/Hibernate: Rework shrinking of memory
PM: Fix typo in label name s/Platofrm_finish/Platform_finish/
PM: Run-time PM platform device bus support
PM: Introduce core framework for run-time PM of I/O devices (rev. 17)
Driver Core: Make PM operations a const pointer
PM: Remove platform device suspend_late()/resume_early() V2
USB: Rework musb suspend()/resume_early()
I2C: Rework i2c-s3c2410 suspend_late()/resume() V2
I2C: Rework i2c-pxa suspend_late()/resume_early()
DMA: Rework txx9dmac suspend_late()/resume_early()
...
Fix trivial conflict in drivers/base/platform.c (due to same
constification patch being merged in both sides, along with some other
PM work in the PM branch)
* 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (202 commits)
MAINTAINERS: update KVM entry
KVM: correct error-handling code
KVM: fix compile warnings on s390
KVM: VMX: Check cpl before emulating debug register access
KVM: fix misreporting of coalesced interrupts by kvm tracer
KVM: x86: drop duplicate kvm_flush_remote_tlb calls
KVM: VMX: call vmx_load_host_state() only if msr is cached
KVM: VMX: Conditionally reload debug register 6
KVM: Use thread debug register storage instead of kvm specific data
KVM guest: do not batch pte updates from interrupt context
KVM: Fix coalesced interrupt reporting in IOAPIC
KVM guest: fix bogus wallclock physical address calculation
KVM: VMX: Fix cr8 exiting control clobbering by EPT
KVM: Optimize kvm_mmu_unprotect_page_virt() for tdp
KVM: Document KVM_CAP_IRQCHIP
KVM: Protect update_cr8_intercept() when running without an apic
KVM: VMX: Fix EPT with WP bit change during paging
KVM: Use kvm_{read,write}_guest_virt() to read and write segment descriptors
KVM: x86 emulator: Add adc and sbb missing decoder flags
KVM: Add missing #include
...
* 'x86-percpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, percpu: Collect hot percpu variables into one cacheline
x86, percpu: Fix DECLARE/DEFINE_PER_CPU_PAGE_ALIGNED()
x86, percpu: Add 'percpu_read_stable()' interface for cacheable accesses
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, highmem_32.c: Clean up comment
x86, pgtable.h: Clean up types
x86: Clean up dump_pagetable()
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64: move clts into batch cpu state updates when preloading fpu
x86-64: move unlazy_fpu() into lazy cpu state part of context switch
x86-32: make sure clts is batched during context switch
x86: split out core __math_state_restore
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
x86: Fix code patching for paravirt-alternatives on 486
x86, msr: change msr-reg.o to obj-y, and export its symbols
x86: Use hard_smp_processor_id() to get apic id for AMD K8 cpus
x86, sched: Workaround broken sched domain creation for AMD Magny-Cours
x86, mcheck: Use correct cpumask for shared bank4
x86, cacheinfo: Fixup L3 cache information for AMD multi-node processors
x86: Fix CPU llc_shared_map information for AMD Magny-Cours
x86, msr: Fix msr-reg.S compilation with gas 2.16.1, on 32-bit too
x86: Move kernel_fpu_using to irq_fpu_usable in asm/i387.h
x86, msr: fix msr-reg.S compilation with gas 2.16.1
x86, msr: Export the register-setting MSR functions via /dev/*/msr
x86, msr: Create _on_cpu helpers for {rw,wr}msr_safe_regs()
x86, msr: Have the _safe MSR functions return -EIO, not -EFAULT
x86, msr: CFI annotations, cleanups for msr-reg.S
x86, asm: Make _ASM_EXTABLE() usable from assembly code
x86, asm: Add 32-bit versions of the combined CFI macros
x86, AMD: Disable wrongly set X86_FEATURE_LAHF_LM CPUID bit
x86, msr: Rewrite AMD rd/wrmsr variants
x86, msr: Add rd/wrmsr interfaces with preset registers
x86: add specific support for Intel Atom architecture
...
* 'x86-asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: remove all now-duplicate header files
x86: convert termios.h to the asm-generic version
x86: convert almost generic headers to asm-generic version
x86: convert trivial headers to asm-generic version
x86: add copies of some headers to convert to asm-generic