Use new cpumask API in /proc/cpuinfo code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This separates the per cpu output from the summary output at the end of the
file, making it easier to convert to the new cpumask API in a subsequent
patch.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use the new cpumask API and add some comments to clarify how get_irq_server
works.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use new cpumask functions in pseries SMP startup code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use new cpumask functions in iseries SMP startup code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use new cpumask_* functions, and dynamically allocate cpumask in fixup_irqs.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use the new cpumask_* functions and dynamically allocate the cpumask in
smp_cpus_done.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use cpumask_first, cpumask_next in rtasd code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Change &cpu_online_map to cpu_online_mask.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As explained in commit 1c0fe6e3bd, we want to call the architecture independent
oom killer when getting an unexplained OOM from handle_mm_fault, rather than
simply killing current.
Cc: linuxppc-dev@ozlabs.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to keep track of the backing pages that get allocated by
vmemmap_populate() so that when we use kdump, the dump-capture kernel knows
where these pages are.
We use a simple linked list of structures that contain the physical address
of the backing page and corresponding virtual address to track the backing
pages.
To save space, we just use a pointer to the next struct vmemmap_backing. We
can also do this because we never remove nodes. We call the pointer "list"
to be compatible with changes made to the crash utility.
vmemmap_populate() is called either at boot-time or on a memory hotplug
operation. We don't have to worry about the boot-time calls because they
will be inherently single-threaded, and for a memory hotplug operation
vmemmap_populate() is called through:
sparse_add_one_section()
|
V
kmalloc_section_memmap()
|
V
sparse_mem_map_populate()
|
V
vmemmap_populate()
and in sparse_add_one_section() we're protected by pgdat_resize_lock().
So, we don't need a spinlock to protect the vmemmap_list.
We allocate space for the vmemmap_backing structs by allocating whole pages
in vmemmap_list_alloc() and then handing out chunks of this to
vmemmap_list_populate().
This means that we waste at most just under one page, but this keeps the code
is simple.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently the parsing of the device tree in
arch/powerpc/include/asm/parport.h assumes that the interrupt provided in
the parallel port node is a valid virtual irq. The values for the
interrupts provided in the device tree should have meaning in the context
of the driver for the specific interrupt controller to which the interrupt
is connected and irq_of_parse_and_map() should be used to determine the
correct virtual irq.
Signed-off-by: Martyn Welch <martyn.welch@ge.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
So we tried to speed things up a bit using flush_hash_pages() directly
but that falls over on 603 of course meaning we fail to flush the TLB
properly and we may even end up having it corrupt memory randomly by
accessing a hash table that doesn't exist.
This removes the "optimization" by always going through flush_tlb_page()
for now at least.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently we always call start-cpu irrespective of if the CPU is
stopped or not. Unfortunatley on POWER7, firmware seems to not like
start-cpu being called when a cpu already been started. This was not
the case on POWER6 and earlier.
This patch checks to see if the CPU is stopped or not via an
query-cpu-stopped-state call, and only calls start-cpu on CPUs which
are stopped.
This fixes a bug with kexec on POWER7 on PHYP where only the primary
thread would make it to the second kernel.
Reported-by: Ankita Garg <ankita@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This moves query_cpu_stopped() out of the hotplug cpu code and into
smp.c so it can called in other places and renames it to
smp_query_cpu_stopped().
It also cleans up the return values by adding some #defines
Cc: <stable@kernel.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
By setting "reset_type" to one of the following values, the default
software reset mechanism may be overidden. Here the possible values of
"reset_type":
1 - PPC4xx core reset
2 - PPC4xx chip reset
3 - PPC4xx system reset (default)
This will be used by a new PPC440SPe board port, which needs a "chip
reset" instead of the default "system reset" to be asserted.
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
A defconfig for the IBM ISS 476 simulator
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This is a trivial 4xx plaform that uses the new simple bsp from
Josh and is handy to use in simulators such as ISS or even Mambo
who don't properly implement most of the actual devices in the
SoC but really only the core.
Signed-off-by: Torez Smith <lnxtorez@linux.vnet.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
476 requires an isync after loading MMU and debug related SPR's. Some of
these are in performance-critical paths and may need to be optimized, but
initially, we're playing it safe.
Signed-off-by: Torez Smith <lnxtorez@linux.vnet.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The 47x core's MCSR varies from 44x, so it needs it's own machine check
handler.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This patch adds the base support for the 476 processor. The code was
primarily written by Ben Herrenschmidt and Torez Smith, but I've been
maintaining it for a while.
The goal is to have a single binary that will run on 44x and 47x, but
we still have some details to work out. The biggest is that the L1 cache
line size differs on the two platforms, but it's currently a compile-time
option.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Torez Smith <lnxtorez@linux.vnet.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The 47x platform supports multiple cores and shares code with 44x.
Break out code that is common for initializing the primary and secondary
cpus into a function which can be called for both.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This patch adds a marker to the exception stack frame to aid in debugging.
It's already inserted on other platforms and xmon recognizes it and
identifies exception frames when showing stack traces.
Signed-off-by: Torez Smith <lnxtorez@linux.vnet.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
When we compare the devices DMA mask to the amount of memory we need to
make sure we treat the DMA mask as an address boundary. For example if
the DMA_MASK(32) and we have 4G of memory we'd incorrectly set the dma
ops to swiotlb. We need to add one to the dma mask when we convert it.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The bypassing of this test is a leftover from 2.4 vintage
kernels, and is no longer appropriate, or even used by KGDB.
Currently KGDB uses probe_kernel_write() for all access to
memory via the KGDB core, so it can simply be deleted.
This fixes CVE-2010-1446.
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Wufei <fei.wu@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Currently, platforms using CONFIG_OF add a 'struct device_node *of_node'
to dev->archdata. However, with CONFIG_OF becoming generic for all
architectures, it makes sense for commonality to move it out of archdata
and into struct device proper.
This patch adds a struct device_node *of_node member to struct device
and updates all locations which currently write the device_node pointer
into archdata to also update dev->of_node. Subsequent patches will
modify callers to use the archdata location and ultimately remove
the archdata member entirely.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
CC: Michal Simek <monstr@monstr.eu>
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: "David S. Miller" <davem@davemloft.net>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: Jeremy Kerr <jeremy.kerr@canonical.com>
CC: microblaze-uclinux@itee.uq.edu.au
CC: linux-kernel@vger.kernel.org
CC: linuxppc-dev@ozlabs.org
CC: sparclinux@vger.kernel.org
Refresh ps3_defconfig to latest kernel sources and change
these kernel config options:
o CONFIG_USB_ANNOUNCE_NEW_DEVICES: n -> y
o CONFIG_USB_EHCI_TT_NEWSCHED: n -> y
o CONFIG_CMDLINE_BOOL: n -> y
o CONFIG_CMDLINE: n -> ""
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This ensures that the translations for unmapped IO mappings or
unmapped memory are properly removed from the MMU hash table
before such an unplug. Without this, the hypervisor refuses the
unplug operations due to those resources still being mapped by
the partition.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Firmware changed the way it represents memory and cpu affinity on POWER7.
Unfortunately the old method now caps the topology to work around issues
with legacy operating systems. For Linux to get the correct topology we
need to use the new form 1 affinity information.
We set the form 1 field in the client architecture, and if we see "1" in the
ibm,associativity-form property firmware supports form 1 affinity and
we should look at the first field in the ibm,associativity-reference-points
array. If not we use the second field as we always have.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The following commit broke CONFIG_RELOCATABLE support on FSL Book-E
parts:
commit 549e8152de
Author: Paul Mackerras <paulus@samba.org>
Date: Sat Aug 30 11:43:47 2008 +1000
powerpc: Make the 64-bit kernel as a position-independent executable
The change to __va and __pa to use PAGE_OFFSET & MEMORY_START causes
problems on the Book-E parts because we don't know MEMORY_START until
after we parse the device tree. We need __va to work properly to even
parse the device tree so we have a chicken an egg. So go back to using
he other definition of __va/__pa on CONFIG_BOOKE and use the
PAGE_OFFSET/MEMORY_START version on "Classic" PPC64.
Also updated casts to handle phys_addr_t being a different size from
unsigned long (ie 36-bit physical on PPC32).
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
When we destory a vcpu, we should also make sure to kill all pending
timers that could still be up. When not doing this, hrtimers might
dereference null pointers trying to call our code.
This patch fixes spontanious kernel panics seen after closing VMs.
Signed-off-by: Alexander Graf <alex@csgraf.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
While converting the kzalloc we used to allocate our vcpu struct to
vmalloc, I forgot to memset the contents to zeros. That broke quite
a lot.
This patch memsets it to zero again.
Signed-off-by: Alexander Graf <alex@csgraf.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We used to use get_free_pages to allocate our vcpu struct. Unfortunately
that call failed on me several times after my machine had a big enough
uptime, as memory became too fragmented by then.
Fortunately, we don't need it to be page aligned any more! We can just
vmalloc it and everything's great.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We don't need as complex code. I had some thinkos while writing it, figuring
I needed to support PPC32 paths on PPC64 which would have required DR=0, but
everything just runs fine with DR=1.
So let's make the functions simple C call wrappers that reserve some space on
the stack for the respective functions to clobber.
Fixes out-of-RMA-access (and thus guest FPU loading) on the PS3.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We had code to make use of the secondary htab buckets, but kept that
disabled because it was unstable when I put it in.
I checked again if that's still the case and apparently it was only
exposing some instability that was there anyways before. I haven't
seen any badness related to usage of secondary htab entries so far.
This should speed up guest memory allocations by quite a bit, because
we now have more space to put PTEs in.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We need to tell userspace that we can emulate paired single instructions.
So let's add a capability export.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The one big thing about the Gekko is paired singles.
Paired singles are an extension to the instruction set, that adds 32 single
precision floating point registers (qprs), some SPRs to modify the behavior
of paired singled operations and instructions to deal with qprs to the
instruction set.
Unfortunately, it also changes semantics of existing operations that affect
single values in FPRs. In most cases they get mirrored to the coresponding
QPR.
Thanks to that we need to emulate all FPU operations and all the new paired
single operations too.
In order to achieve that, we use the just introduced FPU call helpers to
call the real FPU whenever the guest wants to modify an FPR. Additionally
we also fix up the QPR values along the way.
That way we can execute paired single FPU operations without implementing a
soft fpu.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
When we get a program interrupt we usually don't expect it to perform an
MMIO operation. But why not? When we emulate paired singles, we can end
up loading or storing to an MMIO address - and the handling of those
happens in the program interrupt handler.
So let's teach the program interrupt handler how to deal with EMULATE_MMIO.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The PowerPC specification always lists bits from MSB to LSB. That is
really confusing when you're trying to write C code, because it fits
in pretty badly with the normal (1 << xx) schemes.
So I came up with some nice wrappers that allow to get and set fields
in a u64 with bit numbers exactly as given in the spec. That makes the
code in KVM and the spec easier comparable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
BATs didn't work. Well, they did, but only up to BAT3. As soon as we
came to BAT4 the offset calculation was screwed up and we ended up
overwriting BAT0-3.
Fortunately, Linux hasn't been using BAT4+. It's still a good
idea to write correct code though.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
To emulate paired single instructions, we need to be able to call FPU
operations from within the kernel. Since we don't want gcc to spill
arbitrary FPU code everywhere, we tell it to use a soft fpu.
Since we know we can really call the FPU in safe areas, let's also add
some calls that we can later use to actually execute real world FPU
operations on the host's FPU.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We need to call the ext giveup handlers from code outside of book3s.c.
So let's make it non-static.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The Book3S KVM implementation contains some helper functions to load and store
data from and to virtual addresses.
Unfortunately, this helper used to keep the physical address it so nicely
found out for us to itself. So let's change that and make it return the
physical address it resolved.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The Book3S_32 specifications allows for two instructions to modify segment
registers: mtsrin and mtsr.
Most normal operating systems use mtsrin, because it allows to define which
segment it wants to change using a register. But since I was trying to run
an embedded guest, it turned out to be using mtsr with hardcoded values.
So let's also emulate mtsr. It's a valid instruction after all.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
There's a typo in the debug ifdef of the book3s_32 mmu emulation. While trying
to debug something I stumbled across that and wanted to save anyone after me
(or myself later) from having to debug that again.
So let's fix the ifdef.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
There are some situations when we're pretty sure the guest will use the
FPU soon. So we can save the churn of going into the guest, finding out
it does want to use the FPU and going out again.
This patch adds preloading of the FPU when it's reasonable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
When we for example get an Altivec interrupt, but our guest doesn't support
altivec, we need to inject a program interrupt, not an altivec interrupt.
The same goes for paired singles. When an altivec interrupt arrives, we're
pretty sure we need to emulate the instruction because it's a paired single
operation.
So let's make all the ext handlers aware that they need to jump to the
program interrupt handler when an extension interrupt arrives that
was not supposed to arrive for the guest CPU.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>