* git://git.infradead.org/iommu-2.6:
intel-iommu: Fix address wrap on 32-bit kernel.
intel-iommu: Enable DMAR on 32-bit kernel.
intel-iommu: fix PCI device detach from virtual machine
intel-iommu: VT-d page table to support snooping control bit
iommu: Add domain_has_cap iommu_ops
intel-iommu: Snooping control support
Fixed trivial conflicts in arch/x86/Kconfig and drivers/pci/intel-iommu.c
This patch adds preadv and pwritev system calls. These syscalls are a
pretty straightforward combination of pread and readv (same for write).
They are quite useful for doing vectored I/O in threaded applications.
Using lseek+readv instead opens race windows you'll have to plug with
locking.
Other systems have such system calls too, for example NetBSD, check
here: http://www.daemon-systems.org/man/preadv.2.html
The application-visible interface provided by glibc should look like
this to be compatible to the existing implementations in the *BSD family:
ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);
This prototype has one problem though: On 32bit archs is the (64bit)
offset argument unaligned, which the syscall ABI of several archs doesn't
allow to do. At least s390 needs a wrapper in glibc to handle this. As
we'll need a wrappers in glibc anyway I've decided to push problem to
glibc entriely and use a syscall prototype which works without
arch-specific wrappers inside the kernel: The offset argument is
explicitly splitted into two 32bit values.
The patch sports the actual system call implementation and the windup in
the x86 system call tables. Other archs follow as separate patches.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-api@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add macros for using the UV hub to send interrupts. Change the IPI code
to use these macros. These macros will also be used in additional patches
that will follow.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Container-init must behave like global-init to processes within the
container and hence it must be immune to unhandled fatal signals from
within the container (i.e SIG_DFL signals that terminate the process).
But the same container-init must behave like a normal process to processes
in ancestor namespaces and so if it receives the same fatal signal from a
process in ancestor namespace, the signal must be processed.
Implementing these semantics requires that send_signal() determine pid
namespace of the sender but since signals can originate from workqueues/
interrupt-handlers, determining pid namespace of sender may not always be
possible or safe.
This patchset implements the design/simplified semantics suggested by
Oleg Nesterov. The simplified semantics for container-init are:
- container-init must never be terminated by a signal from a
descendant process.
- container-init must never be immune to SIGKILL from an ancestor
namespace (so a process in parent namespace must always be able
to terminate a descendant container).
- container-init may be immune to unhandled fatal signals (like
SIGUSR1) even if they are from ancestor namespace. SIGKILL/SIGSTOP
are the only reliable signals to a container-init from ancestor
namespace.
This patch:
Based on an earlier patch submitted by Oleg Nesterov and comments from
Roland McGrath (http://lkml.org/lkml/2008/11/19/258).
The handler parameter is currently unused in the tracehook functions.
Besides, the tracehook functions are called with siglock held, so the
functions can check the handler if they later need to.
Removing the parameter simiplifies changes to sig_ignored() in a follow-on
patch.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
PCI: fix HT MSI mapping fix
PCI: don't enable too much HT MSI mapping
x86/PCI: make pci=lastbus=255 work when acpi is on
PCI: save and restore PCIe 2.0 registers
PCI: update fakephp for bus_id removal
PCI: fix kernel oops on bridge removal
PCI: fix conflict between SR-IOV and config space sizing
powerpc/PCI: include pci.h in powerpc MSI implementation
PCI Hotplug: schedule fakephp for feature removal
PCI Hotplug: rename legacy_fakephp to fakephp
PCI Hotplug: restore fakephp interface with complete reimplementation
PCI: Introduce /sys/bus/pci/devices/.../rescan
PCI: Introduce /sys/bus/pci/devices/.../remove
PCI: Introduce /sys/bus/pci/rescan
PCI: Introduce pci_rescan_bus()
PCI: do not enable bridges more than once
PCI: do not initialize bridges more than once
PCI: always scan child buses
PCI: pci_scan_slot() returns newly found devices
PCI: don't scan existing devices
...
Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
Make the following header file changes:
- remove arch ifdefs and asm/suspend.h from linux/suspend.h
- add asm/suspend.h to disk.c (for arch_prepare_suspend())
- add linux/io.h to swsusp.c (for ioremap())
- x86 32/64 bit compile fixes
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
Revert "proc: revert /proc/uptime to ->read_proc hook"
proc 2/2: remove struct proc_dir_entry::owner
proc 1/2: do PDE usecounting even for ->read_proc, ->write_proc
proc: fix sparse warnings in pagemap_read()
proc: move fs/proc/inode-alloc.txt comment into a source file
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PCI PM: Make pci_prepare_to_sleep() disable wake-up if needed
radeonfb: Use __pci_complete_power_transition()
PCI PM: Introduce __pci_[start|complete]_power_transition() (rev. 2)
PCI PM: Restore config spaces of all devices during early resume
PCI PM: Make pci_set_power_state() handle devices with no PM support
PCI PM: Put devices into low power states during late suspend (rev. 2)
PCI PM: Move pci_restore_standard_config to pci-driver.c
PCI PM: Use pci_set_power_state during early resume
PCI PM: Consistently use variable name "error" for pm call return values
kexec: Change kexec jump code ordering
PM: Change hibernation code ordering
PM: Change suspend code ordering
PM: Rework handling of interrupts during suspend-resume
PM: Introduce functions for suspending and resuming device interrupts
Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can lookup entry with NULL
->owner, thus not pinning enything, and release it later resulting
in module refcount underflow.
We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.
But this leaves ->owner as easy-manipulative field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.
->read_proc/->write_proc were just fixed to not require ->owner for
protection.
rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.
Removing ->owner will also make PDE smaller.
So, let's nuke it.
Kudos to Jeff Layton for reminding about this, let's say, oversight.
http://bugzilla.kernel.org/show_bug.cgi?id=12454
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
* 'iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
dma-debug: make memory range checks more consistent
dma-debug: warn of unmapping an invalid dma address
dma-debug: fix dma_debug_add_bus() definition for !CONFIG_DMA_API_DEBUG
dma-debug/x86: register pci bus for dma-debug leak detection
dma-debug: add a check dma memory leaks
dma-debug: add checks for kernel text and rodata
dma-debug: print stacktrace of mapping path on unmap error
dma-debug: Documentation update
dma-debug: x86 architecture bindings
dma-debug: add function to dump dma mappings
dma-debug: add checks for sync_single_sg_*
dma-debug: add checks for sync_single_range_*
dma-debug: add checks for sync_single_*
dma-debug: add checking for [alloc|free]_coherent
dma-debug: add add checking for map/unmap_sg
dma-debug: add checking for map/unmap_page/single
dma-debug: add core checking functions
dma-debug: add debugfs interface
dma-debug: add kernel command line parameters
dma-debug: add initialization code
...
Fix trivial conflicts due to whitespace changes in arch/x86/kernel/pci-nommu.c
Use the functions introduced in by the previous patch,
suspend_device_irqs(), resume_device_irqs() and check_wakeup_irqs(),
to rework the handling of interrupts during suspend (hibernation) and
resume. Namely, interrupts will only be disabled on the CPU right
before suspending sysdevs, while device drivers will be prevented
from receiving interrupts, with the help of the new helper function,
before their "late" suspend callbacks run (and analogously during
resume).
In addition, since the device interrups are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
* 'x86-stage-3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (190 commits)
Revert "cpuacct: reduce one NULL check in fast-path"
Revert "x86: don't compile vsmp_64 for 32bit"
x86: Correct behaviour of irq affinity
x86: early_ioremap_init(), use __fix_to_virt(), because we are sure it's safe
x86: use default_cpu_mask_to_apicid for 64bit
x86: fix set_extra_move_desc calling
x86, PAT, PCI: Change vma prot in pci_mmap to reflect inherited prot
x86/dmi: fix dmi_alloc() section mismatches
x86: e820 fix various signedness issues in setup.c and e820.c
x86: apic/io_apic.c define msi_ir_chip and ir_ioapic_chip all the time
x86: irq.c keep CONFIG_X86_LOCAL_APIC interrupts together
x86: irq.c use same path for show_interrupts
x86: cpu/cpu.h cleanup
x86: Fix a couple of sparse warnings in arch/x86/kernel/apic/io_apic.c
Revert "x86: create a non-zero sized bm_pte only when needed"
x86: pci-nommu.c cleanup
x86: io_delay.c cleanup
x86: rtc.c cleanup
x86: i8253 cleanup
x86: kdebugfs.c cleanup
...
Impact: cleanup
It's unused, since about 1995. So remove all initialization of it in
preparation for actually removing the field.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
* 'sched-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (46 commits)
sched: Add comments to find_busiest_group() function
sched: Refactor the power savings balance code
sched: Optimize the !power_savings_balance during fbg()
sched: Create a helper function to calculate imbalance
sched: Create helper to calculate small_imbalance in fbg()
sched: Create a helper function to calculate sched_domain stats for fbg()
sched: Define structure to store the sched_domain statistics for fbg()
sched: Create a helper function to calculate sched_group stats for fbg()
sched: Define structure to store the sched_group statistics for fbg()
sched: Fix indentations in find_busiest_group() using gotos
sched: Simple helper functions for find_busiest_group()
sched: remove unused fields from struct rq
sched: jiffies not printed per CPU
sched: small optimisation of can_migrate_task()
sched: fix typos in documentation
sched: add avg_overlap decay
x86, sched_clock(): mark variables read-mostly
sched: optimize ttwu vs group scheduling
sched: TIF_NEED_RESCHED -> need_reshed() cleanup
sched: don't rebalance if attached on NULL domain
...
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: (35 commits)
[CPUFREQ] Prevent p4-clockmod from auto-binding to the ondemand governor.
[CPUFREQ] Make cpufreq-nforce2 less obnoxious
[CPUFREQ] p4-clockmod reports wrong frequency.
[CPUFREQ] powernow-k8: Use a common exit path.
[CPUFREQ] Change link order of x86 cpufreq modules
[CPUFREQ] conservative: remove 10x from def_sampling_rate
[CPUFREQ] conservative: fixup governor to function more like ondemand logic
[CPUFREQ] conservative: fix dbs_cpufreq_notifier so freq is not locked
[CPUFREQ] conservative: amend author's email address
[CPUFREQ] Use swap() in longhaul.c
[CPUFREQ] checkpatch cleanups for acpi-cpufreq
[CPUFREQ] powernow-k8: Only print error message once, not per core.
[CPUFREQ] ondemand/conservative: sanitize sampling_rate restrictions
[CPUFREQ] ondemand/conservative: deprecate sampling_rate{min,max}
[CPUFREQ] powernow-k8: Always compile powernow-k8 driver with ACPI support
[CPUFREQ] Introduce /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_transition_latency
[CPUFREQ] checkpatch cleanups for powernow-k8
[CPUFREQ] checkpatch cleanups for ondemand governor.
[CPUFREQ] checkpatch cleanups for powernow-k7
[CPUFREQ] checkpatch cleanups for speedstep related drivers.
...
Partial revert of commit 129d8bc828
titled 'x86: don't compile vsmp_64 for 32bit'
Commit reverted to compile vsmp_64.c if CONFIG_X86_64 is defined,
since is_vsmp_box() needs to indicate that TSCs are not synchronized, and
hence, not a valid time source, even when CONFIG_X86_VSMP is not defined.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: shai@scalex86.org
LKML-Reference: <20090324061429.GH7278@localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: get correct smp_affinity as user requested
The effect of setting desc->affinity (ie. from userspace via sysfs) has
varied over time. In 2.6.27, the 32-bit code anded the value with
cpu_online_map, and both 32 and 64-bit did that anding whenever a cpu
was unplugged.
2.6.29 consolidated this into one routine (and fixed hotplug) but
introduced another variation: anding the affinity with cfg->domain.
We should just set it to what the user said - if possible.
(cpu_mask_to_apicid_and already takes cpu_online_mask into account)
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49C94DDF.2010703@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix bug with irq-descriptor moving when logical flat
Rusty observed:
> The effect of setting desc->affinity (ie. from userspace via sysfs) has varied
> over time. In 2.6.27, the 32-bit code anded the value with cpu_online_map,
> and both 32 and 64-bit did that anding whenever a cpu was unplugged.
>
> 2.6.29 consolidated this into one routine (and fixed hotplug) but introduced
> another variation: anding the affinity with cfg->domain. Is this right, or
> should we just set it to what the user said? Or as now, indicate that we're
> restricting it.
Eric pointed out that desc->affinity should be what the user requested,
if it is at all possible to honor the user space request.
This bug got introduced by commit 22f65d31b "x86: Update io_apic.c to use
new cpumask API".
Fix it by moving the masking to before the descriptor moving ...
Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <49C94134.4000408@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This iommu_op can tell if domain have a specific capability, like snooping
control for Intel IOMMU, which can be used by other components of kernel to
adjust the behaviour.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Impact: cleanup
This fixed various signedness issues in setup.c and e820.c:
arch/x86/kernel/setup.c:455:53: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/setup.c:455:53: expected int *pnr_map
arch/x86/kernel/setup.c:455:53: got unsigned int extern [toplevel] *<noident>
arch/x86/kernel/setup.c:639:53: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/setup.c:639:53: expected int *pnr_map
arch/x86/kernel/setup.c:639:53: got unsigned int extern [toplevel] *<noident>
arch/x86/kernel/setup.c:820:54: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/setup.c:820:54: expected int *pnr_map
arch/x86/kernel/setup.c:820:54: got unsigned int extern [toplevel] *<noident>
arch/x86/kernel/e820.c:670:53: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/e820.c:670:53: expected int *pnr_map
arch/x86/kernel/e820.c:670:53: got unsigned int [toplevel] *<noident>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
move out msi_ir_chip and ir_ioapic_chip from CONFIG_INTR_REMAP shadow
Fix:
arch/x86/kernel/apic/io_apic.c:1431: warning: ‘msi_ir_chip’ defined but not used
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Impact: cleanup
This patch fixes the following sparse warnings:
arch/x86/kernel/apic/io_apic.c:3602:17: warning: symbol 'hpet_msi_type'
was not declared. Should it be static?
arch/x86/kernel/apic/io_apic.c:3467:30: warning: Using plain integer as
NULL pointer
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
LKML-Reference: <1237741871-5827-2-git-send-email-dmitri.vorobiev@movial.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To reduce the size of the oversized function __get_smp_config()
There should be no impact to functionality.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Impact: fix incorrect error message
- IO APIC resource allocation error message contains one too many "be".
- Print the error message iff there are IO APICs in the system.
I've seen this error message for some time on my x86-32 laptop...
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Alan Bartlett <ajb.stxsl@googlemail.com>
LKML-Reference: <200903202100.30789.bzolnier@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Check alternate signal stack overflow with proper stack pointer.
The stack pointer of the next signal frame is different if that
task has i387 state.
On x86_64, redzone would be included.
No need to check SA_ONSTACK if we're already using alternate signal stack.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Roland McGrath <roland@redhat.com>
LKML-Reference: <49C2874D.3080002@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>