There are two types of addresses used for EEH operations. The first
is the BDF (Bus/Device/Function) address, which is retrieved from
the reg property of the corresponding FDT node. The other is the PE
address, which on the pSeries platform has to be obtained from
firmware through an RTAS call. When issuing an EEH operation, the
PE address takes precedence over the BDF address.
The patch implements retrieving the PE address for a given BDF
address on the pSeries platform. Also, struct eeh_early_enable_info
has been removed, since the information can be taken from
dn->pdn->phb->buid directly, which simplifies the code.
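As an illustration only (the token variable name and exact argument layout
below are assumptions based on the ibm,get-config-addr-info2 RTAS call, not
the literal patch), the lookup boils down to something like:

static int pseries_eeh_get_pe_addr_sketch(int config_addr, u64 phb_buid)
{
	int pe_addr = 0;
	int ret;

	/* Ask firmware for the PE config address behind this BDF address */
	ret = rtas_call(ibm_get_config_addr_info2, 4, 2, &pe_addr,
			config_addr,
			(int)(phb_buid >> 32),		/* BUID high */
			(int)(phb_buid & 0xffffffff),	/* BUID low  */
			1);
	if (ret)
		return 0;	/* no PE address: fall back to the BDF address */

	return pe_addr;
}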
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There are four EEH operations covered by the dedicated RTAS call
<ibm,set-eeh-option>: enabling EEH, disabling EEH, enabling MMIO and
enabling DMA. At an early stage of system boot, we try to enable EEH
on the device nodes associated with PCI devices. MMIO and DMA for a
particular PE should be enabled while recovering from EEH errors so
that the PE can function properly again.
The patch implements this and abstracts it through struct
eeh_ops::set_eeh. That will help EEH support multiple platforms in
the future.
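For illustration, the pSeries backend for eeh_ops::set_eeh maps all four
operations onto the one RTAS call, selected by the last argument; the sketch
below assumes the usual rtas_call() helper and an illustrative option
encoding:

/* option: 0 = disable EEH, 1 = enable EEH,
 *         2 = enable MMIO, 3 = enable DMA (assumed encoding) */
static int pseries_eeh_set_option_sketch(int config_addr, u64 phb_buid, int option)
{
	return rtas_call(ibm_set_eeh_option, 4, 1, NULL,
			 config_addr,
			 (int)(phb_buid >> 32),		/* BUID high */
			 (int)(phb_buid & 0xffffffff),	/* BUID low  */
			 option);
}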
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The platform specific EEH operations have been abstracted by
struct eeh_ops. The individual platforms, including pSeries, need
to do the necessary initialization before the platform dependent EEH
operations can work properly.
The patch addresses that and does the necessary platform
initialization for the pSeries platform. More specifically, it
figures out the tokens of the EEH related RTAS calls.
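Roughly speaking (a sketch with illustrative variable names; the RTAS service
names are the standard pSeries EEH ones), the initialization looks like:

static int ibm_set_eeh_option;
static int ibm_get_config_addr_info2;
static int ibm_configure_bridge;

static int pseries_eeh_init_sketch(void)
{
	ibm_set_eeh_option        = rtas_token("ibm,set-eeh-option");
	ibm_get_config_addr_info2 = rtas_token("ibm,get-config-addr-info2");
	ibm_configure_bridge      = rtas_token("ibm,configure-bridge");

	/* RTAS_UNKNOWN_SERVICE means firmware doesn't implement the call */
	if (ibm_set_eeh_option == RTAS_UNKNOWN_SERVICE)
		return -EINVAL;

	return 0;
}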
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
EEH has been implemented for the RTAS-compliant pSeries platform.
That is to say, the EEH operations are eventually carried out through
RTAS calls. That situation limits how far EEH can reasonably be
extended. In order to support EEH on multiple platforms, such as
pseries and powernv, simultaneously, we have to split the platform
dependent EEH operations out of the current implementation.
The patch addresses supporting EEH on multiple platforms. The pseries
platform dependent EEH operations will be abstracted by struct eeh_ops.
EEH core components will be built on top of the registered EEH operations.
With this mechanism, all an individual platform needs to do is implement
its platform dependent EEH operations.
For now, only the pseries platform is covered by the mechanism, so other
platforms that want EEH support, such as powernv, still have to be brought
under it. Besides, this patch only provides the framework; the
implementation for the pseries platform comes later.
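For orientation, the abstraction has roughly the following shape (member
names here are illustrative rather than the final list; set_eeh is the
operation discussed in the earlier patch):

struct eeh_ops {
	char *name;
	int (*init)(void);
	int (*set_eeh)(struct device_node *dn, int option);
	int (*get_pe_addr)(struct device_node *dn);
	int (*get_state)(struct device_node *dn);
	int (*reset)(struct device_node *dn, int option);
	int (*configure_bridge)(struct device_node *dn);
};

/* A platform registers its operations with the EEH core, e.g.: */
int __init eeh_ops_register(struct eeh_ops *ops);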
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
EEH has been implemented on the pSeries platform, but the original
code looks a little bit messy. The patch cleans up the current EEH
implementation so that it reads more cleanly.
* Add the prefix "eeh" to functions where appropriate.
* Some function names have been adjusted so that they are shorter
  and more meaningful.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
EEH has been implemented on the pSeries platform, but the original
code looks a little bit messy. The patch cleans up the current EEH
implementation so that it reads more cleanly.
* Duplicated comments have been removed from the corresponding
  header files.
* Comments have been reorganized so that they are cleaner.
* The leading comments of functions have been adjusted slightly so
  that the output of "make pdfdocs" is more uniform.
* Function definitions and calls use the unified format "xxx()";
  that is, the format "xxx ()" has been replaced by "xxx()".
* There are multiple functions implemented for resetting a PE. Those
  functions have been moved around so that they are adjacent to each
  other, reflecting their relationship.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On 64-bit, the mfmsr instruction can be quite slow, slower
than loading a field from the cache-hot PACA, which happens
to already contain the value we want in most cases.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We were using CR0.EQ after EXCEPTION_COMMON, hoping it still
contained whether we came from userspace or kernel space.
However, under some circumstances, EXCEPTION_COMMON will
call C code and clobber non-volatile registers, so we really
need to re-load the previous MSR from the stackframe and
re-test.
While there, invert the condition to make the fast path more
obvious, remove the BUG_OPCODE which was a debugging leftover, and
call .ret_from_except as we should.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When running under a hypervisor that supports stolen time accounting,
we may call C code from the macro EXCEPTION_PROLOG_COMMON in the
exception entry path, which clobbers CR0.
However, the FPU and vector traps rely on CR0 indicating whether we
are coming from userspace or kernel to decide what to do.
So we need to restore that value after the C call.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Also use local_paca instead of get_paca() to avoid getting into
the smp_processor_id() debugging code from the debugger.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Other architectures such as x86 and ARM have been growing
new support for features like retrying page faults after
dropping the mm semaphore to break contention, or being
able to return from a stuck page fault when a SIGKILL is
pending.
This refactors our implementation of do_page_fault() to
move the error handling out of line in a way similar to
x86 and adds support for those two features.
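The resulting flow is, in rough outline (a simplified sketch of the generic
retry pattern, not the full powerpc handler; everything other than the
generic mm helpers is illustrative):

static int do_page_fault_sketch(struct mm_struct *mm, unsigned long address)
{
	struct vm_area_struct *vma;
	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
	int fault;

retry:
	down_read(&mm->mmap_sem);
	vma = find_vma(mm, address);
	/* ... vma and access checks elided ... */
	fault = handle_mm_fault(mm, vma, address, flags);

	if (fault & VM_FAULT_RETRY) {
		/* handle_mm_fault() dropped mmap_sem for us */
		if (fatal_signal_pending(current))
			return 0;		/* SIGKILL pending: just bail */
		flags &= ~FAULT_FLAG_ALLOW_RETRY;
		goto retry;			/* retry exactly once */
	}

	up_read(&mm->mmap_sem);
	return 0;
}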
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we get a floating point, altivec or vsx unavailable interrupt in
the kernel, we trigger a kernel error. There is no point preserving
the interrupt state; in fact, that can even make debugging harder
as the processor state might change (we may even preempt) between
taking the exception and landing in a debugger.
So just make those three exceptions disable interrupts unconditionally.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
v2: On BookE only disable when hitting the kernel unavailable
path, otherwise it will fail to restore softe as
fast_exception_return doesn't do it.
We currently turn interrupts back to their previous state before
calling do_page_fault(). This can be annoying when debugging as
a bad fault will potentially have lost some processor state before
getting into the debugger.
We also end up calling some generic code, such as
notify_page_fault(), with interrupts enabled, which could be
unexpected.
This changes our code to behave more like other architectures,
and make the assembly entry code call into do_page_fault() with
interrupts disabled. They are conditionally re-enabled from
within do_page_fault() in the same spot x86 does it.
While there, add the might_sleep() test in the case of a successful
trylock of the mmap semaphore, again like x86.
Also fix a bug in the existing assembly where r12 (_MSR) could get
clobbered by C calls (the DTL accounting in the exception common
macro and DISABLE_INTS) in some cases.
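In outline, the entry side of do_page_fault() then behaves roughly like this
(a sketch of the convention described above, not the literal patch):

static void do_page_fault_entry_sketch(struct pt_regs *regs, struct mm_struct *mm)
{
	/* We now arrive with interrupts hard-disabled; only re-enable them
	 * if they were enabled in the interrupted context. */
	if (regs->msr & MSR_EE)
		local_irq_enable();

	if (down_read_trylock(&mm->mmap_sem)) {
		/* Like x86: run the debug check even though the trylock
		 * succeeded (down_read() would have done it for us). */
		might_sleep();
		/* ... handle the fault ... */
		up_read(&mm->mmap_sem);
	}
}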
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
v2. Add the r12 clobber fix
Some exceptions would unconditionally disable interrupts on entry,
which is fine, but calling lockdep every time not only adds more
overhead than strictly needed, but also means we get quite a few
"redundant" disables logged, which makes it hard to spot the really
bad ones.
So instead, split the macro used by the exception code into a
normal one and a separate one used when CONFIG_TRACE_IRQFLAGS is
enabled, and make the latter skip the tracing if interrupts were
already disabled.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We unconditionally hard enable interrupts on syscall entry. This is
unnecessary as syscalls are expected to always be called with
interrupts enabled, so let's remove the enabling (and the associated
irq tracing) from the syscall entry path. While at it, we add a
WARN_ON if that is not the case and CONFIG_TRACE_IRQFLAGS is enabled
(we don't want to add overhead to the fast path when this is not set
though). Also on Book3S, replace a few mfmsr instructions with loads
of PACAMSR from the PACA, which should be faster and schedule better.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This moves the inlines into system.h and changes the runlatch
code to use the thread local flags (non-atomic) rather than
the TIF flags (atomic) to keep track of the latch state.
The code to turn it back on in an asynchronous interrupt is
now simplified and partially inlined.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The perfmon interrupt is the sole user of a special variant of the
interrupt prolog which differs from the one used by external and timer
interrupts in that it saves the non-volatile GPRs and doesn't turn the
runlatch on.
The former is unnecessary and the latter is arguably incorrect, so
let's clean that up by using the same prolog. While at it we rename
that prolog to use the _ASYNC prefix.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This removes the various bits of assembly in the kernel entry,
exception handling and SLB management code that were specific
to running under the legacy iSeries hypervisor which is no
longer supported.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This cleans up vio.c after the removal of the legacy iSeries platform.
It also removes some no longer referenced include files.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/kvm/book3s_hv.c: included 'linux/sched.h' twice,
remove the duplicate.
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Some members of kvm_memory_slot are not used by every architecture.
This patch is the first step to make this difference clear by
introducing kvm_memory_slot::arch; lpage_info is moved into it.
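Directionally, the change looks something like the following (the contents
of the arch struct are an assumption, with x86's lpage_info used as the
example):

struct kvm_arch_memory_slot {
	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];	/* assumed */
};

struct kvm_memory_slot {
	gfn_t base_gfn;
	unsigned long npages;
	/* ... generic members ... */
	struct kvm_arch_memory_slot arch;	/* per-arch data lives here now */
};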
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
- Use memchr_inv to check whether the data is all 0xFF bytes (see the
  sketch below). It is faster than looping over each byte.
- Use memcmp to compare memory areas
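The two idioms, as a minimal sketch (helper names are illustrative):

#include <linux/string.h>
#include <linux/types.h>

static bool region_is_erased(const void *buf, size_t len)
{
	/* memchr_inv() returns NULL iff every byte equals 0xff */
	return memchr_inv(buf, 0xff, len) == NULL;
}

static bool regions_match(const void *a, const void *b, size_t len)
{
	return memcmp(a, b, len) == 0;
}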
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
All IRQs on powerpc are managed via irq_domain anyway, there isn't really
any advantage to turning SPARSE_IRQ off, and it's the direction we want
to take the kernel design anyway. This patch makes powerpc always use
SPARSE_IRQ.
On pseries_defconfig, SPARSE_IRQ adds only about 0x300 bytes to the
.text sections, and removes about 0x20000 from the data section for the
static irq_desc table.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On a 16TB system (using AMS/CMO), I get:
WARNING: ignoring large property [/ibm,dynamic-reconfiguration-memory] ibm,dynamic-memory length 0x000000000017ffec
and significantly less memory is thus shown to the partition. As far as
I can tell, the constant used is arbitrary. Ben Herrenschmidt provided
additional background that
> The limit was originally set because of Apple machines carrying ROM
> images in the device-tree, at a time where we were much more memory
> constrained than we are now.
and that it is likely not very useful any longer.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As described in e6fa16ab ("signal: sigprocmask() should do
retarget_shared_pending()") the modification of current->blocked is
incorrect as we need to check whether the signal we're about to block
is pending in the shared queue.
Also, use the new helper function introduced in commit 5e6292c0f2
("signal: add block_sigmask() for adding sigmask to current->blocked")
which centralises the code for updating current->blocked after
successfully delivering a signal and reduces the amount of duplicate
code across architectures. In the past some architectures got this
code wrong, so using this helper function should stop that from
happening again.
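For reference, the generic helper behaves roughly like this (a sketch of the
behaviour described in the referenced commit, not a verbatim copy):

static void block_sigmask_sketch(struct k_sigaction *ka, int signr)
{
	sigset_t blocked;

	/* Block everything already blocked plus the handler's sa_mask */
	sigorsets(&blocked, &current->blocked, &ka->sa.sa_mask);
	/* Block the delivered signal itself unless SA_NODEFER was set */
	if (!(ka->sa.sa_flags & SA_NODEFER))
		sigaddset(&blocked, signr);
	/* set_current_blocked() handles retargeting shared pending signals */
	set_current_blocked(&blocked);
}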
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Emit the function name, not the address, when possible.
__builtin_return_address() gives an address; when building
a kernel with CONFIG_KALLSYMS, emit the actual function
name instead of the address.
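One common way to do that, shown purely as an illustration (the exact printk
specifier used by the patch is an assumption):

/* %pS resolves an address to "symbol+offset" when CONFIG_KALLSYMS is
 * enabled, and falls back to printing the raw address otherwise. */
pr_warn("%s: called from %pS\n", __func__, __builtin_return_address(0));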
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is a race where a thread causes a coprocessor type to be valid
in its own ACOP _and_ in the current context, but it does not
propagate to the ACOP register of other threads in time for them to
use it. The original code tries to solve this by sending an IPI to
all threads on the system, which is heavy handed, but unfortunately
still provides a window where the icswx is issued by other threads and
the ACOP is not up to date.
This patch detects that the ACOP DSI fault was a "false positive",
syncs the ACOP, and causes the icswx to be replayed.
Signed-off-by: Jimi Xenidis <jimix@pobox.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Implement atomic_inc_not_zero and atomic64_inc_not_zero. At the
moment we use atomic*_add_unless which requires us to put 0 and
1 constants into registers. We can also avoid a subtract by
saving the original value in a second temporary.
This removes 3 instructions from fget:
- c0000000001b63c0: 39 00 00 00 li r8,0
- c0000000001b63c4: 39 40 00 01 li r10,1
...
- c0000000001b63e8: 7c 0a 00 50 subf r0,r10,r0
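Semantically, atomic_inc_not_zero() is the following loop (a conceptual
sketch in terms of cmpxchg, not the ll/sc assembly the patch adds):

static inline int atomic_inc_not_zero_sketch(atomic_t *v)
{
	int old;

	do {
		old = atomic_read(v);
		if (old == 0)
			return 0;	/* never resurrect a zero refcount */
	} while (atomic_cmpxchg(v, old, old + 1) != old);

	return 1;
}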
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This compatible value will be used to distinguish some special features of the APM821XX EMAC: no half-duplex mode support, and jumbo frame configuration.
Signed-off-by: Duc Dang <dhdang@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're currently allocating 16MB of linear memory on demand when creating
a guest. That does work sometimes, but finding 16MB of linear memory
available in the system at runtime is definitely not a given.
So let's add another command line option, similar to the RMA preallocator,
that we can use to keep a pool of page tables around. Now, when a guest
gets created, it has a pretty low chance of hitting an OOM condition.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
RMAs and HPT preallocated spaces should be zeroed, so we don't accidentally
leak information from previous VM executions.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
We have code to allocate big chunks of linear memory on bootup for later use.
This code is currently used for RMA allocation, but can be useful beyond
that.
Make it generic so we can reuse it for other stuff later.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
This moves __gfn_to_memslot() and search_memslots() from kvm_main.c to
kvm_host.h to reduce the code duplication caused by the need for
non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c to call
gfn_to_memslot() in real mode.
Rather than putting gfn_to_memslot() itself in a header, which would
lead to increased code size, this puts __gfn_to_memslot() in a header.
Then, the non-modular uses of gfn_to_memslot() are changed to call
__gfn_to_memslot() instead. This way there is only one place in the
source code that needs to be changed should the gfn_to_memslot()
implementation need to be modified.
On powerpc, the Book3S HV style of KVM has code that is called from
real mode which needs to call gfn_to_memslot() and thus needs this.
(Module code is allocated in the vmalloc region, which can't be
accessed in real mode.)
With this, we can remove builtin_gfn_to_memslot() from book3s_hv_rm_mmu.c.
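The header-resident helpers end up looking roughly like this (a simplified
sketch of the shape, not necessarily the exact code):

static inline struct kvm_memory_slot *
search_memslots(struct kvm_memslots *slots, gfn_t gfn)
{
	struct kvm_memory_slot *memslot;

	kvm_for_each_memslot(memslot, slots)
		if (gfn >= memslot->base_gfn &&
		    gfn < memslot->base_gfn + memslot->npages)
			return memslot;

	return NULL;
}

static inline struct kvm_memory_slot *
__gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn)
{
	return search_memslots(slots, gfn);
}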
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of keeping separate copies of struct kvm_vcpu_arch_shared (one in
the code, one in the docs) that inevitably fail to be kept in sync
(already sr[] is missing from the doc version), just point to the header
file as the source of documentation on the contents of the magic page.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We need the KVM_REG namespace for generic register settings now, so
let's rename the existing users to something different, enabling
us to reuse the namespace for more visible interfaces.
While at it, also move these private constants to a private header.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
This moves the get/set_one_reg implementation down from powerpc.c into
booke.c, book3s_pr.c and book3s_hv.c. This avoids #ifdefs in C code,
but more importantly, it fixes a bug on Book3s HV where we were
accessing beyond the end of the kvm_vcpu struct (via the to_book3s()
macro) and corrupting memory, causing random crashes and file corruption.
On Book3s HV we only accept setting the HIOR to zero, since the guest
runs in supervisor mode and its vectors are never offset from zero.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
[agraf update to apply on top of changed ONE_REG patches]
Signed-off-by: Avi Kivity <avi@redhat.com>
Until now, we always set HIOR based on the PVR, but this is just wrong.
Instead, we should be setting HIOR explicitly, so user space can decide
what the initial HIOR value is - just like on real hardware.
We keep the old PVR based way around for backwards compatibility, but
once user space uses the SET_ONE_REG based method, we drop the PVR logic.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Right now we transfer a static struct every time we want to get or set
registers. Unfortunately, over time we realize that there are more of
these than we thought of before and the extensibility and flexibility of
transferring a full struct every time is limited.
So this is a new approach to the problem. With these new ioctls, we can
get and set a single register that is identified by an ID. This allows for
very precise and limited transmittal of data. When we later realize that
it's a better idea to shove over multiple registers at once, we can reuse
most of the infrastructure and simply implement a GET_MANY_REGS / SET_MANY_REGS
interface.
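From userspace the new interface is used roughly as follows (a sketch;
KVM_REG_PPC_HIOR serves purely as an example register ID):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int set_one_reg(int vcpu_fd, uint64_t id, uint64_t value)
{
	struct kvm_one_reg reg = {
		.id   = id,			/* which register to set */
		.addr = (uintptr_t)&value,	/* where the data lives */
	};

	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}

/* e.g. set_one_reg(vcpu_fd, KVM_REG_PPC_HIOR, 0); */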
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Currently the code kzalloc()s new VCPUs instead of using the kmem_cache
which is created when KVM is initialized.
Modify it to allocate VCPUs from that kmem_cache.
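The pattern being adopted, sketched (kvm_vcpu_cache is the cache kvm_init()
sets up; error handling trimmed):

static struct kvm_vcpu *alloc_vcpu_sketch(void)
{
	struct kvm_vcpu *vcpu;

	vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
	if (!vcpu)
		return ERR_PTR(-ENOMEM);

	return vcpu;	/* freed later with kmem_cache_free(kvm_vcpu_cache, vcpu) */
}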
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The existing kvm_stlb_write/kvm_gtlb_write were a poor match for
the e500/book3e MMU -- mas1 was passed as "tid", mas2 was limited
to "unsigned int" which will be a problem on 64-bit, mas3/7 got
split up rather than treated as a single 64-bit word, etc.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
[scottwood@freescale.com: made mas2 64-bit, and added mas8 init]
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
This changes the implementation of kvm_vm_ioctl_get_dirty_log() for
Book3s HV guests to use the hardware C (changed) bits in the guest
hashed page table. Since this makes the implementation quite different
from the Book3s PR case, this moves the existing implementation from
book3s.c to book3s_pr.c and creates a new implementation in book3s_hv.c.
That implementation calls kvmppc_hv_get_dirty_log() to do the actual
work by calling kvm_test_clear_dirty on each page. It iterates over
the HPTEs, clearing the C bit if set, and returns 1 if any C bit was
set (including the saved C bit in the rmap entry).
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
This uses the host view of the hardware R (referenced) bit to speed
up kvm_age_hva() and kvm_test_age_hva(). Instead of removing all
the relevant HPTEs in kvm_age_hva(), we now just reset their R bits
if set. Also, kvm_test_age_hva() now scans the relevant HPTEs to
see if any of them have R set.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
This allows both the guest and the host to use the referenced (R) and
changed (C) bits in the guest hashed page table. The guest has a view
of R and C that is maintained in the guest_rpte field of the revmap
entry for the HPTE, and the host has a view that is maintained in the
rmap entry for the associated gfn.
Both views are updated from the guest HPT. If a bit (R or C) is zero
in either view, it will be initially set to zero in the HPTE (or HPTEs),
until set to 1 by hardware. When an HPTE is removed for any reason,
the R and C bits from the HPTE are ORed into both views. We have to
be careful to read the R and C bits from the HPTE after invalidating
it, but before unlocking it, in case of any late updates by the hardware.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
This reworks the implementations of the H_REMOVE and H_BULK_REMOVE
hcalls to make sure that we keep the HPTE locked and in the reverse-
mapping chain until we have finished invalidating it. Previously
we would remove it from the chain and unlock it before invalidating
it, leaving a tiny window when the guest could access the page even
though we believe we have removed it from the guest (e.g.,
kvm_unmap_hva() has been called for the page and has found no HPTEs
in the chain). In addition, we'll need this for future patches where
we will need to read the R and C bits in the HPTE after invalidating
it.
Doing this required restructuring kvmppc_h_bulk_remove() substantially.
Since we want to batch up the tlbies, we now need to keep several
HPTEs locked simultaneously. In order to avoid possible deadlocks,
we don't spin on the HPTE bitlock for any except the first HPTE in
a batch. If we can't acquire the HPTE bitlock for the second or
subsequent HPTE, we terminate the batch at that point, do the tlbies
that we have accumulated so far, unlock those HPTEs, and then start
a new batch to do the remaining invalidations.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
PPC KVM lacks these two capabilities, and as such a userland system must assume
a max of 4 VCPUs (following api.txt). With these, a userland can determine
a more realistic limit.
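Userspace would query them in the usual way (a sketch assuming the two
capabilities are KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS):

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int guest_vcpu_limit(int kvm_fd)
{
	int max = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS);

	if (max <= 0)		/* capability absent: fall back to api.txt's 4 */
		max = 4;

	return max;
}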
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Fix usage of the vcpu struct before checking that it's actually valid.
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
With this, if a guest does an H_ENTER with a read/write HPTE on a page
which is currently read-only, we make the actual HPTE inserted be a
read-only version of the HPTE. We now intercept protection faults as
well as HPTE not found faults, and for a protection fault we work out
whether it should be reflected to the guest (e.g. because the guest HPTE
didn't allow write access to usermode) or handled by switching to
kernel context and calling kvmppc_book3s_hv_page_fault, which will then
request write access to the page and update the actual HPTE.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>