If the page fault is caused by mmio, we can cache the mmio info, later, we do
not need to walk guest page table and quickly know it is a mmio fault while we
emulate the mmio instruction
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Introduce vcpu_mmio_gva_to_gpa to translate the gva to gpa, we can use it
to cleanup the code between read emulation and write emulation
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Properly check the last mapping, and do not walk to the next level if last spte
is met
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the kvm bits of the steal time infrastructure.
The most important part of it, is the steal time clock. It is an
continuous clock that shows the accumulated amount of steal time
since vcpu creation. It is supposed to survive cpu offlining/onlining.
[marcelo: fix build with CONFIG_KVM_GUEST=n]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Avi Kivity <avi@redhat.com>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Provide additional information on SIGTRAP by using a sig_info signal.
Use TRAP_BRKPT for breakpoints via illegal operation and TRAP_HWBKPT
for breakpoints via program event recording. Provide the address of
the instruction that caused the breakpoint via si_addr.
While we are at it get rid of tracehook_consider_fatal_signal.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch extends the DASD statistics to allow for a more detailed
analysis of DASD I/O operations. In particular we want the statistics
to provide answers to the following questions:
- How many requests used a PAV alias?
- How many requests used High Performance FICON?
- How do read request perform versus write requests?
The existing DASD statistics interface has several shortcomings
- The interface for global data is a formatted text table in procfs
(/proc/dasd/statistics). The layout is meant for human readers and
is not to easy to parse. If values get to large for the table
layout, they get scaled down.
- The statistics which are collected per block device can be
accessed via an ioctl interface, which can only be extended by
defining a new ioctl.
- There is no statistics interface for individual PAV base and alias
devices.
To overcome theses shortcomings we create a new DASD statistics
interface in debugfs. This interface will contain one entry for global
data, one per DASD block device, and one per DASD base and alias
device. Each file contains the statistic data in easy to parse
name/value and name/array pairs. The existing interfaces will remain
functional, but they will not be extended.
Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
SIGP emerg needs to pass the source vpu adress into __LC_CPU_ADDRESS of the
target guest.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The cpu measurement alerts that are used for instance by oprofile
for hardware sampling are not turned off on a cpu that is going
offline. Add the appropriate control register bit that should be
disabled to the list.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Do not set the cr0 enablement bit for iucv by default in head[31|64].S,
move the enablement to iucv_init in the iucv base layer.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The (un-)register_external_interrupt functions are not race safe if
more than one interrupt handler is added or deleted for an external
interrupt concurrently.
Make the registration / unregistration of external interrupts race safe
by using RCU and a spinlock. RCU is used to avoid a performance penalty
in the external interrupt handler, the register and unregister functions
are protected by the spinlock and are not performance critical.
call_rcu must be used since the SCLP driver uses the interface with
IRQs disabled. Also use the generic list implementation rather than
homebrewn list code.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
After git commit 66ceed5ad1 removed
the tape block device driver, remove its documentation as well.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add toleration support for ap devices with device type 10.
Signed-off-by: Holger Dengler <hd@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch removes the mmu reload logic for kvm on s390. Via Martin's
new gmap interface, we can safely add or remove memory slots while
guest CPUs are in-flight. Thus, the mmu reload logic is not needed
anymore.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch removes kvm-s390 internal assumption of a linear mapping
of guest address space to user space. Previously, guest memory was
translated to user addresses using a fixed offset (gmsor). The new
code uses gmap_fault to resolve guest addresses.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch switches kvm from using (Qemu's) user address space to
Martin's gmap address space. This way QEMU does not have to use a
linker script in order to fit large guests at low addresses in its
address space.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add code that allows KVM to control the virtual memory layout that
is seen by a guest. The guest address space uses a second page table
that shares the last level pte-tables with the process page table.
If a page is unmapped from the process page table it is automatically
unmapped from the guest page table as well.
The guest address space mapping starts out empty, KVM can map any
individual 1MB segments from the process virtual memory to any 1MB
aligned location in the guest virtual memory. If a target segment in
the process virtual memory does not exist or is unmapped while a
guest mapping exists the desired target address is stored as an
invalid segment table entry in the guest page table.
The population of the guest page table is fault driven.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The alignment is missing for various global symbols in s390 assembly code.
With a recent gcc and an instruction like stgrl this can lead to a
specification exception if the instruction uses such a mis-aligned address.
Specify the alignment explicitely and while add it define __ALIGN for s390
and use the ENTRY define to save some lines of code.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The entry to / exit from sie has subtle dependencies to the first level
interrupt handler. Move the sie assembler code to entry64.S and replace
the SIE_HOOK callback with a test and the new _TIF_SIE bit.
In addition this patch fixes several problems in regard to the check for
the_TIF_EXIT_SIE bits. The old code checked the TIF bits before executing
the interrupt handler and it only modified the instruction address if it
pointed directly to the sie instruction. In both cases it could miss
a TIF bit that normally would cause an exit from the guest and would
reenter the guest context.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When running a kvm guest we can get intercepts for tprot, if the host
page table is read-only or not populated. This patch implements the
most common case (linux memory detection).
This also allows host copy on write for guest memory on newer systems.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The following race can occur with qdio devices that use the shared device
state change indicator:
Device (Shared DSCI) CPU0 CPU1
===============================================================================
1. DSCI 0 => 1,
INT pending
2. Thinint handler
* si_used = 1
* Inbound tasklet_schedule
* DSCI 1 => 0
3. DSCI 0 => 1,
INT pending
4. Thinint handler
* si_used = 1
* Inbound tasklet_schedu
le
=> NOP
5. Inbound tasklet run
6. DSCI = 1,
INT surpressed
7. DSCI 1 => 0
The race would lead to a stall where new data in the input queue is
not recognized so the device stops working in case of no further traffic.
Fix the race by resetting the DSCI before scheduling the inbound tasklet
so the device generates an interrupt if new data arrives in the above
scenario in step 6.
Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On x86 a page without a mapper is by definition not referenced / old.
The s390 architecture keeps the reference bit in the storage key and
the current code will check the storage key for page without a mapper.
This leads to an interesting effect: the first time an s390 system
needs to write pages to swap it only finds referenced pages. This
causes a lot of pages to get added and written to the swap device.
To avoid this behaviour change page_referenced to query the storage
key only if there is a mapper of the page.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Do not trace arch_local_save_flags(), arch_local_irq_*() and friends.
Although they are marked inline, gcc may still make a function out of
them and add it to the pool of functions that are traced by the function
tracer. This can cause undesirable results (kernel panic, triple faults,
etc).
Add the notrace notation to prevent them from ever being traced.
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix improper protocol err_handler, current implementation is fully
unapplicable and may cause kernel crash due to double kfree_skb.
Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rt_tos was changed to iph->tos but it must be filtered by RT_TOS
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove duplicated #include('s) in
drivers/net/via-velocity.c
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove duplicated #include('s) in
drivers/net/qlge/qlge_main.c
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove duplicated #include('s) in
drivers/net/igb/igb_main.c
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove duplicated #include('s) in
drivers/net/can/c_can/c_can.c
drivers/net/can/c_can/c_can_platform.c
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove duplicated #include('s) in
drivers/net/bna/bnad.c
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanup sff_pio_task_link when a command is cancel while the
pio_task thread has been scheduled.
Signed-off-by: Gwendal Grignou <gwendal@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch adds an additional SATA RAID controller DeviceID for the Intel Panther Point PCH.
Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The missing comma causes the wrong RAID type to be displayed.
Introduced by commit 963e4975c6 three
years ago, odd that nobody noticed before.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Fixed hsdev memleak on sata_dwc_probe() error.
As dma_dwc_exit() can be called multiple times without sata_dma_regs and
irq_dma changes, it might lead to double free on sequential
dma_dwc_exit() calls. So, zero these fields after free calls.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Like e65cc194f7 this patch enables 64bit DMA
for the AHCI SATA controller of a board that has the SB600 southbridge. In
this case though we're enabling 64bit DMA for the Asus M3A motherboard. It
is a new enough board that all of the BIOS releases since the initial
release (0301 from 2007-10-22) work correctly with 64bit DMA enabled.
Signed-off-by: Mark Nelson <mdnelson8@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Link resume failure in itself isn't an error condition and may happen
regularly depending on hardware configuration. Reporting it as
KERN_ERR makes the condition unnecessarily prominent (e.g. reported
during boot). Use KERN_WARNING instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Shaw <dshaw@jabberwocky.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
ahci_sb600_softreset was in ahci.c. This function is used
to fix soft reset failure and renames as ahci_pmp_retry_softreset
in libahci.c.
Signed-off-by: Yuan-Hsin Chen <yhchen@faraday-tech.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
libata EH intentionally left a port frozen if it failed
ata_eh_reset(). The intention was avoiding continuous loop of resets
when the controller or attached device is flaky and reporting spurious
hotplug events. Once port enters this state, it can be recovered with
manual rescan, which seemed reasonable.
However, outside of my convoluted test setup, there have been very few
reports justifying this choice while there have been more cases where
the automatic freezing of the port after hotplug attempt of a faulty
device caused confusion and led to unnecessary resets.
This patch changes the behavior so that the port is thawed after reset
failure. This change doesn't necessarily solve but makes it easier
and more intuitive to work around hotplug related problems
(ie. re-pluggin or power cycling the device) as reported in the
followings.
https://bugzilla.kernel.org/show_bug.cgi?id=34712http://thread.gmane.org/gmane.linux.kernel/1123265/focus=49548
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Reartes Guillermo <rtguille@gmail.com>
Reported-by: Bruce Stenning <b.stenning@indigovision.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Use normal debugging path for dynamic debug capability.
Convert dev_printk(KERN_DEBUG to dev_dbg(
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Use a single mechanism to show driver version.
Reduces text a tiny bit too.
Remove uses of static int printed_version
Add and use ata_print_version(const struct device *, const char *ver)
and ata_print_version_once.
$ size drivers/ata/built-in.*
text data bss dec hex filename
544969 73893 116584 735446 b38d6 drivers/ata/built-in.allyesconfig.ata.o
543870 73893 116592 734355 b34ad drivers/ata/built-in.allyesconfig.print_once.o
141328 14689 4220 160237 271ed drivers/ata/built-in.defconfig.ata.o
141212 14689 4220 160121 27179 drivers/ata/built-in.defconfig.print_once.o
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Saves a bit of text as the call takes fewer args.
Coalesce a few formats.
Convert a few bare printks to pr_cont.
$ size drivers/ata/built-in.o*
text data bss dec hex filename
558429 73893 117864 750186 b726a drivers/ata/built-in.o.allyesconfig.new
559574 73893 117888 751355 b76fb drivers/ata/built-in.o.allyesconfig.old
149567 14689 4220 168476 2921c drivers/ata/built-in.o.defconfig.new
149851 14689 4220 168760 29338 drivers/ata/built-in.o.defconfig.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Some systems benefit from completions always being steered to the strict
requester cpu rather than the looser "per-socket" steering that
blk_cpu_to_group() attempts by default. This is because the first
CPU in the group mask ends up being completely overloaded with work,
while the others (including the original submitter) has power left
to spare.
Allow the strict mode to be set by writing '2' to the sysfs control
file. This is identical to the scheme used for the nomerges file,
where '2' is a more aggressive setting than just being turned on.
echo 2 > /sys/block/<bdev>/queue/rq_affinity
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Roland Dreier <roland@purestorage.com>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>