Commit graph

1587 commits

Author SHA1 Message Date
David Woodhouse
e758936e02 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	include/asm-x86/statfs.h
2008-10-13 17:13:56 +01:00
Ingo Molnar
206855c321 Merge branch 'x86/urgent' into core/signal
Conflicts:
	arch/x86/kernel/signal_64.c
2008-10-12 11:32:17 +02:00
Linus Torvalds
bf6f51e3a4 Merge phase #3 (IOMMU) of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-v28-for-linus-phase3-B' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits)
  AMD IOMMU: use iommu_device_max_index, fix
  AMD IOMMU: use iommu_device_max_index
  x86: add PCI IDs for AMD Barcelona PCI devices
  x86/iommu: use __GFP_ZERO instead of memset for GART
  x86/iommu: convert GART need_flush to bool
  x86/iommu: make GART driver checkpatch clean
  x86 gart: remove unnecessary initialization
  x86: restore old GART alloc_coherent behavior
  revert "x86: make GART to respect device's dma_mask about virtual mappings"
  x86: export pci-nommu's alloc_coherent
  iommu: remove fullflush and nofullflush in IOMMU generic option
  x86: remove set_bit_string()
  iommu: export iommu_area_reserve helper function
  AMD IOMMU: use coherent_dma_mask in alloc_coherent
  add AMD IOMMU tree to MAINTAINERS file
  AMD IOMMU: use cmd_buf_size when freeing the command buffer
  AMD IOMMU: calculate IVHD size with a function
  AMD IOMMU: remove unnecessary cast to u64 in the init code
  AMD IOMMU: free domain bitmap with its allocation order
  AMD IOMMU: simplify dma_mask_to_pages
  ...
2008-10-11 11:03:12 -07:00
Ingo Molnar
725c25819e Merge branches 'core/iommu', 'x86/amd-iommu' and 'x86/iommu' into x86-v28-for-linus-phase3-B
Conflicts:
	arch/x86/kernel/pci-gart_64.c
	include/asm-x86/dma-mapping.h
2008-10-10 19:47:12 +02:00
Ingo Molnar
990d0f2ced Merge branches 'sched/devel', 'sched/cpu-hotplug', 'sched/cpusets' and 'sched/urgent' into sched/core 2008-10-08 11:31:02 +02:00
Tony Luck
c459ce8b5a [IA64] Put the space for cpu0 per-cpu area into .data section
Initial fix for making sure that we can access percpu variables
in all C code (commit: 10617bbe84)
inadvertantly allocated the memory in the "percpu" section of
the vmlinux ELF executable.  This confused kexec/dump.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-09-29 16:39:19 -07:00
Srinivasa Ds
da654b74bd signals: demultiplexing SIGTRAP signal
Currently a SIGTRAP can denote any one of below reasons.
	- Breakpoint hit
	- H/W debug register hit
	- Single step
	- Signal sent through kill() or rasie()

Architectures like powerpc/parisc provides infrastructure to demultiplex
SIGTRAP signal by passing down the information for receiving SIGTRAP through
si_code of siginfot_t structure. Here is an attempt is generalise this
infrastructure by extending it to x86 and x86_64 archs.

Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: akpm@linux-foundation.org
Cc: paulus@samba.org
Cc: linuxppc-dev@ozlabs.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-23 13:26:52 +02:00
Jay Lan
d3758f87f3 [IA64] kexec fails on systems with blocks of uncached memory
Currently a memory segment in memory map with attribute of EFI_MEMORY_UC
is denoted as "System RAM" in /proc/iomem, while memory of attribute
(EFI_MEMORY_WB|EFI_MEMORY_UC) is also labeled the same.

The kexec utility then includes uncached memory as part of vmcore. The
kdump kernel MCA'ed when it tries to save the vmcore to a disk. A normal
"cached" access may cause MCAs.

This patch would label memory with attribute of EFI_MEMORY_UC only as
"Uncached RAM" so that kexec would know not to include it in the vmcore.
I will submit a separate kexec-tools patch to the kexec list.

Signed-off-by: Jay Lan <jlan@sgi.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-09-22 14:21:19 -07:00
Alex Chiang
06f95ea898 [IA64] Ski simulator doesn't need check_sal_cache_flush
Peter Chubb reported that commit 3463a93def
(Update check_sal_cache_flush to use platform_send_ipi()) broke
Ski because it does not implement IPIs.

Tony Luck suggested we just #ifndef out the call (since the simulator
does not have the SAL bug that this code is attempting to detect and
workaround)

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-09-22 14:13:32 -07:00
Jes Sorensen
9f7263236a KVM: ia64: 'struct fdesc' build fix
Commit 4611a77 ("[IA64] fix compile failure with non modular builds")
introduced struct fdesc into asm/elf.h, which duplicates KVM's definition.
Remove the latter to avoid the build error.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-09-19 16:49:01 -07:00
Ingo Molnar
6e03f99803 Merge branch 'linus' into x86/iommu
Conflicts:
	lib/swiotlb.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-14 14:07:00 +02:00
Paul E. McKenney
e7b140365b [IA64] prevent ia64 from invoking irq handlers on offline CPUs
Make ia64 refrain from clearing a given to-be-offlined CPU's bit in the
cpu_online_mask until it has processed pending irqs.  This change
prevents other CPUs from being blindsided by an apparently offline CPU
nevertheless changing globally visible state.  Also remove the existing
redundant cpu_clear(cpu, cpu_online_map).

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-09-10 10:52:42 -07:00
Julia Lawall
6bf6a1a493 [IA64] arch/ia64/sn/pci/tioca_provider.c: introduce missing kfree
Error handling code following a kmalloc should free the allocated data.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-09-10 10:49:36 -07:00
Robin Holt
47633cf0d6 [IA64] fix up bte.h
bte.h expects a #define of L1_CACHE_MASK which is currently only
in bte.c.  This small patch gets bte.h to include cleanly and makes
BTE_UNALIGNED_COPY not report errors.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-09-10 10:48:06 -07:00
James Bottomley
4611a771fc [IA64] fix compile failure with non modular builds
Broke the non modular builds by moving an essential function into
modules.c.  Fix this by moving it out again and into asm/sections.h as
an inline.  To do this, the definitions of struct fdesc and struct
got_val have been lifted out of modules.c and put in asm/elf.h where
they belong.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-09-10 10:46:32 -07:00
Ingo Molnar
e92b4fdacc Merge commit 'v2.6.27-rc6' into x86/iommu 2008-09-10 11:32:52 +02:00
James Bottomley
deac93df26 lib: Correct printk %pF to work on all architectures
It was introduced by "vsprintf: add support for '%pS' and '%pF' pointer
formats" in commit 0fe1ef24f7.  However,
the current way its coded doesn't work on parisc64.  For two reasons: 1)
parisc isn't in the #ifdef and 2) parisc has a different format for
function descriptors

Make dereference_function_descriptor() more accommodating by allowing
architecture overrides.  I put the three overrides (for parisc64, ppc64
and ia64) in arch/kernel/module.c because that's where the kernel
internal linker which knows how to deal with function descriptors sits.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-09 11:51:15 -07:00
Manfred Spraul
e545a6140b kernel/cpu.c: create a CPU_STARTING cpu_chain notifier
Right now, there is no notifier that is called on a new cpu, before the new
cpu begins processing interrupts/softirqs.
Various kernel function would need that notification, e.g. kvm works around
by calling smp_call_function_single(), rcu polls cpu_online_map.

The patch adds a CPU_STARTING notification. It also adds a helper function
that sends the message to all cpu_chain handlers.

Tested on x86-64.
All other archs are untested. Especially on sparc, I'm not sure if I got
it right.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-08 19:25:24 +02:00
FUJITA Tomonori
3a80b6aa27 ia64: dma_alloc_coherent always use GFP_DMA
This patch makes dma_alloc_coherent use GFP_DMA at all times. This is
necessary for swiotlb, which requires the callers to set up the gfp
flags properly.

swiotlb_alloc_coherent tries to allocate pages with the gfp flags. If
the allocated memory isn't fit for dev->coherent_dma_mask,
swiotlb_alloc_coherent reserves some of the swiotlb memory area, which
is precious resource. So the callers need to set up the gfp flags
properly.

This patch means that other IA64 IOMMUs' dma_alloc_coherent also use
GFP_DMA. These IOMMUs (e.g. SBA IOMMU) don't need GFP_DMA since they
can map a memory to any address. But IA64's GFP_DMA is large,
generally drivers allocate small memory with dma_alloc_coherent only
at startup. So I chose the simplest way to set up the gfp flags for
swiotlb.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-08 15:50:05 +02:00
Adrian Bunk
9d5a9e7465 Remove asm/a.out.h files for all architectures without a.out support.
This patch also includes the required removal of (unused) inclusion of
<asm/a.out.h> <linux/a.out.h>'s in the arch/ code for these
architectures.

[dwmw2: updated for 2.6.27-rc]
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2008-09-06 19:30:24 +01:00
David Woodhouse
4dc429243d IA64: Use <asm-generic/statfs.h>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2008-09-04 09:46:13 +01:00
James Bottomley
8a549f8b58 [IA64] Fix __{in,out}s{w,l} to handle unaligned data
Some ia64 systems produce several repeats of kernel messages like this:

 kernel unaligned access to 0xe000000644220466, ip=0xa000000100516fa1

This was tracked to ide code using the __cmd[] field in "struct request"
via the __outsw() function.  __cmd[] is a char array, so is not guaranteed
to be properly aligned when accessed as words.

Tested-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-25 11:23:13 -07:00
Robin Holt
42aca483dd [IA64] Fix ia64 build failure when CONFIG_SFC=m
CONFIG_SFC=m uses topology_core_siblings() which, for ia64, expects
cpu_core_map to be exported.  It is not.  This patch exports the needed
symbol.

Maintainers note: This really looks like the wrong thing to do ... it
would be much better for the kernel to export an API to provide
drivers like this with data they need (which in the case of this
driver seems to be an estimate of the effective parallelism available
on the platform).  But x86 has exported this forever ... so go with
the flow until such an API is defined.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-25 11:10:11 -07:00
Christoph Hellwig
37c23e7fda [IA64] use generic compat_old_sys_readdir
Switch ia64 to the generic compat_sys_old_readdir which is identical
except for slightly better error handling.  Also remove sys32_getdents
which already isn't wired up to the syscall table anymore in favour of
compat_sys_getdents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-18 15:42:11 -07:00
Luck, Tony
8a20fd52c6 [IA64] pci_acpi_scan_root cleanup
The code walks all the acpi _CRS methods to see how many windows
to allocate.  It then scans them all again to insert_resource()
for each *even if the first scan found that there were none*.

Move the second scan inside the "if (windows)" clause.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-18 15:41:21 -07:00
Robin Holt
97653f92c0 [IA64] Shrink shadow_flush_counts to a short array to save 8k of per_cpu area.
Making allmodconfig will break the current build.  This patch shrinks
the per_cpu__shadow_flush_counts from 16k to 8k which frees enough space
to allow allmodconfig to successfully complete.

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=11338

Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-18 15:39:48 -07:00
Robin Holt
ea42b8ce8c [IA64] Remove sn2_defconfig.
Not really a patch as much as a remove this file request.  Now that
generic_defconfig supports all the configurations SGI currently supports
and has NR_CPUS and NR_NODES at our largest configurations, we have no
reason to maintain the extra defconfig file.

Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-18 15:33:40 -07:00
Huang Ying
163f6876f5 kexec jump: rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because control
page is used for not only code on some platform.  For example in kexec
jump, it is used for data and stack too.

[akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15 08:35:42 -07:00
Adrian Bunk
430ac5ba9c [IA64] use bcd2bin/bin2bcd
This patch changes ia64 to use the new bcd2bin/bin2bcd functions instead
of the obsolete BCD2BIN/BIN2BCD macros.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-12 13:32:09 -07:00
Tony Luck
10617bbe84 [IA64] Ensure cpu0 can access per-cpu variables in early boot code
ia64 handles per-cpu variables a litle differently from other architectures
in that it maps the physical memory allocated for each cpu at a constant
virtual address (0xffffffffffff0000). This mapping is not enabled until
the architecture specific cpu_init() function is run, which causes problems
since some generic code is run before this point. In particular when
CONFIG_PRINTK_TIME is enabled, the boot cpu will trap on the access to
per-cpu memory at the first printk() call so the boot will fail without
the kernel printing anything to the console.

Fix this by allocating percpu memory for cpu0 in the kernel data section
and doing all initialization to enable percpu access in head.S before
calling any generic code.

Other cpus must take care not to access per-cpu variables too early, but
their code path from start_secondary() to cpu_init() is all in arch/ia64

Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-12 10:34:20 -07:00
Tony Luck
bcbd2b6586 [IA64] Update generic config
Changes to support a new platform in my lab.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-04 15:47:25 -07:00
Jack Steiner
3351ab9b34 [IA64] Eliminate trailing backquote in IA64_SGI_UV
Eliminate trailing backquote in IA64_SGI_UV config.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-04 11:19:35 -07:00
Robin Holt
ceffacc1d6 [IA64] update generic_defconfig to support sn2.
This patch changes the generic_defconfig so it works on all sn2
platforms I have access to.  There is only one support configuration
which was not tested and that configuration is only a combination of two
tested configurations.  With this patchset applied, a generic kernel can
be booted on either a RHEL 5.2, RHEL5.3, or SLES10 SP1 root and operate.
All features needed by SGI's ProPack are also working.  I have not
tested all features of RHEL or SLES, but they do at least boot.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-04 11:13:21 -07:00
Robin Holt
ac0af91ebc [IA64] update generic_defconfig for 2.6.27-rc1
This patch updates the generic_defconfig for 2.6.27-rc1 by simply doing
a make oldconfig and holding down the carriage return.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-04 11:11:51 -07:00
Robin Holt
d1339df1f4 [IA64] Allow ia64 to CONFIG_NR_CPUS up to 4096
ia64 has compiled with NR_CPUS=4096 for a couple releases, just forgot
to update Kconfig to allow it.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-04 11:09:51 -07:00
Robin Holt
94567ef16b [IA64] Cleanup generated file not ignored by .gitignore
arch/ia64/kernel/vmlinux.lds is a generated file. Tell
git to ignore it.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-04 11:06:16 -07:00
Isaku Yamahata
9b3cbf725f [IA64] pv_ops: fix ivt.S paravirtualization
Recent kernels are not booting on some HP systems (though
it does boot on others). James and Willy reported the
problem.  James did the bisection to find the commit
that caused the problem:
	498c517047.
	[IA64] pvops: paravirtualize ivt.S

Two instructions were wrongly paravirtualized such that
_FROM_ macro had been used where _TO_ was intended

Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Wilcox, Matthew  R" <matthew.r.wilcox@intel.com>
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-04 10:52:12 -07:00
Tony Luck
7f30491ccd [IA64] Move include/asm-ia64 to arch/ia64/include/asm
After moving the the include files there were a few clean-ups:

1) Some files used #include <asm-ia64/xyz.h>, changed to <asm/xyz.h>

2) Some comments alerted maintainers to look at various header files to
make matching updates if certain code were to be changed. Updated these
comments to use the new include paths.

3) Some header files mentioned their own names in initial comments. Just
deleted these self references.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-08-01 10:21:21 -07:00
Jack Steiner
34d8a380d7 GRU Driver: hardware data structures
This series of patches adds a driver for the SGI UV GRU.  The driver is
still in development but it currently compiles for both x86_64 & IA64.
All simple regression tests pass on IA64.  Although features remain to be
added, I'd like to start the process of getting the driver into the
kernel.  Additional kernel drivers will depend on services provide by the
GRU driver.

The GRU is a hardware resource located in the system chipset.  The GRU
contains memory that is mmaped into the user address space.  This memory
is used to communicate with the GRU to perform functions such as
load/store, scatter/gather, bcopy, AMOs, etc.  The GRU is directly
accessed by user instructions using user virtual addresses.  GRU
instructions (ex., bcopy) use user virtual addresses for operands.

The GRU contains a large TLB that is functionally very similar to
processor TLBs.  Because the external contains a TLB with user virtual
address, it requires callouts from the core VM system when certain types
of changes are made to the process page tables.  There are several MMUOPS
patches currently being discussed but none has been accepted into the
kernel.  The GRU driver is built using version V18 from Andrea Arcangeli.

This patch:

Contains the definitions of the hardware GRU data structures that are used
by the driver to manage the GRU.

[akpm@linux-foundation;org: export hpage_shift]
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30 09:41:47 -07:00
Julia Lawall
cab7a1eeeb KVM: ia64: Fix irq disabling leak in error handling code
There is a call to local_irq_restore in the normal exit case, so it would
seem that there should be one on an error return as well.

The semantic patch that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
expression l;
expression E,E1,E2;
@@

local_irq_save(l);
... when != local_irq_restore(l)
    when != spin_unlock_irqrestore(E,l)
    when any
    when strict
(
if (...) { ... when != local_irq_restore(l)
               when != spin_unlock_irqrestore(E1,l)
+   local_irq_restore(l);
    return ...;
}
|
if (...)
+   {local_irq_restore(l);
    return ...;
+   }
|
spin_unlock_irqrestore(E2,l);
|
local_irq_restore(l);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-27 11:35:32 +03:00
Roland McGrath
85ba2d862e tracehook: wait_task_inactive
This extends wait_task_inactive() with a new argument so it can be used in
a "soft" mode where it will check for the task changing state unexpectedly
and back off.  There is no change to existing callers.  This lays the
groundwork to allow robust, noninvasive tracing that can try to sample a
blocked thread but back off safely if it wakes up.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:09 -07:00
FUJITA Tomonori
8d8bb39b9e dma-mapping: add the device argument to dma_mapping_error()
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:

This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).

I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread).  So I
CC'ed this to KVM camp.  Comments are appreciated.

A pointer to dma_mapping_ops to struct dev_archdata is added.  If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it.  If it's
NULL, the system-wide dma_ops pointer is used as before.

If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging).  It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.

The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
device.  Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.

The first patch adds the device argument to dma_mapping_error.  The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.

This patch:

dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations.  So we can't have dma_mapping_ops per device.

Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function.  x86 IOMMUs use device
argument.

[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:03 -07:00
Linus Torvalds
29ca069cc6 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] Wire up new system calls
2008-07-25 17:29:03 -07:00
Srinivasa D S
ef53d9c5e4 kprobes: improve kretprobe scalability with hashed locking
Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table.  We have one
global kretprobe lock to serialise the access to these lists.  This causes
only one kretprobe handler to execute at a time.  Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).

Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.

Solution:

 1) Instead of having one global lock to protect kretprobe instances
    present in kretprobe object and kretprobe hash table.  We will have
    two locks, one lock for protecting kretprobe hash table and another
    lock for kretporbe object.

 2) We hold lock present in kretprobe object while we modify kretprobe
    instance in kretprobe object and we hold per-hash-list lock while
    modifying kretprobe instances present in that hash list.  To prevent
    deadlock, we never grab a per-hash-list lock while holding a kretprobe
    lock.

 3) We can remove used_instances from struct kretprobe, as we can
    track used instances of kretprobe instances using kretprobe hash
    table.

Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.

cacheline              non-cacheline             Un-patched kernel
aligned patch 	       aligned patch
===============================================================================
real    9m46.784s       9m54.412s                  10m2.450s
user    40m5.715s       40m7.142s                  40m4.273s
sys     2m57.754s       2m58.583s                  3m17.430s
===========================================================

Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real    9m26.389s
user    40m8.775s
sys     2m7.283s
=========================

Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:30 -07:00
Tony Luck
3e4d0cab61 [IA64] Wire up new system calls
Six new system calls: signalfd4, eventfd2, epoll_create1,
dup3, pipe2 and inotify_init1.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2008-07-25 10:10:28 -07:00
Ulrich Drepper
ed8cae8ba0 flag parameters: pipe
This patch introduces the new syscall pipe2 which is like pipe but it also
takes an additional parameter which takes a flag value.  This patch implements
the handling of O_CLOEXEC for the flag.  I did not add support for the new
syscall for the architectures which have a special sys_pipe implementation.  I
think the maintainers of those archs have the chance to go with the unified
implementation but that's up to them.

The implementation introduces do_pipe_flags.  I did that instead of changing
all callers of do_pipe because some of the callers are written in assembler.
I would probably screw up changing the assembly code.  To avoid breaking code
do_pipe is now a small wrapper around do_pipe_flags.  Once all callers are
changed over to do_pipe_flags the old do_pipe function can be removed.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_pipe2
# ifdef __x86_64__
#  define __NR_pipe2 293
# elif defined __i386__
#  define __NR_pipe2 331
# else
#  error "need __NR_pipe2"
# endif
#endif

int
main (void)
{
  int fd[2];
  if (syscall (__NR_pipe2, fd, 0) != 0)
    {
      puts ("pipe2(0) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      int coe = fcntl (fd[i], F_GETFD);
      if (coe == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if (coe & FD_CLOEXEC)
        {
          printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
          return 1;
        }
    }
  close (fd[0]);
  close (fd[1]);

  if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
    {
      puts ("pipe2(O_CLOEXEC) failed");
      return 1;
    }
  for (int i = 0; i < 2; ++i)
    {
      int coe = fcntl (fd[i], F_GETFD);
      if (coe == -1)
        {
          puts ("fcntl failed");
          return 1;
        }
      if ((coe & FD_CLOEXEC) == 0)
        {
          printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
          return 1;
        }
    }
  close (fd[0]);
  close (fd[1]);

  puts ("OK");

  return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:28 -07:00
Johannes Weiner
3560e249ab bootmem: replace node_boot_start in struct bootmem_data
Almost all users of this field need a PFN instead of a physical address,
so replace node_boot_start with node_min_pfn.

[Lee.Schermerhorn@hp.com: fix spurious BUG_ON() in mark_bootmem()]
Signed-off-by: Johannes Weiner <hannes@saeureba.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:20 -07:00
Andi Kleen
ceb8687961 hugetlb: introduce pud_huge
Straight forward extensions for huge pages located in the PUD instead of
PMDs.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:18 -07:00
Andi Kleen
a551643895 hugetlb: modular state for hugetlb page size
The goal of this patchset is to support multiple hugetlb page sizes.  This
is achieved by introducing a new struct hstate structure, which
encapsulates the important hugetlb state and constants (eg.  huge page
size, number of huge pages currently allocated, etc).

The hstate structure is then passed around the code which requires these
fields, they will do the right thing regardless of the exact hstate they
are operating on.

This patch adds the hstate structure, with a single global instance of it
(default_hstate), and does the basic work of converting hugetlb to use the
hstate.

Future patches will add more hstate structures to allow for different
hugetlbfs mounts to have different page sizes.

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Adam Litke <agl@us.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:17 -07:00
Jan Beulich
42b7772812 mm: remove double indirection on tlb parameter to free_pgd_range() & Co
The double indirection here is not needed anywhere and hence (at least)
confusing.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 10:47:15 -07:00