This adds infrastructure which will be needed to allow book3s_hv KVM to
run on older POWER processors, including PPC970, which don't support
the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
Offset (RMO) facility. These processors require a physically
contiguous, aligned area of memory for each guest. When the guest does
an access in real mode (MMU off), the address is compared against a
limit value, and if it is lower, the address is ORed with an offset
value (from the Real Mode Offset Register (RMOR)) and the result becomes
the real address for the access. The size of the RMA has to be one of
a set of supported values, which usually includes 64MB, 128MB, 256MB
and some larger powers of 2.
Since we are unlikely to be able to allocate 64MB or more of physically
contiguous memory after the kernel has been running for a while, we
allocate a pool of RMAs at boot time using the bootmem allocator. The
size and number of the RMAs can be set using the kvm_rma_size=xx and
kvm_rma_count=xx kernel command line options.
KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
of the pool of preallocated RMAs. The capability value is 1 if the
processor can use an RMA but doesn't require one (because it supports
the VRMA facility), or 2 if the processor requires an RMA for each guest.
This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
pool and returns a file descriptor which can be used to map the RMA. It
also returns the size of the RMA in the argument structure.
Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION
ioctl calls from userspace. To cope with this, we now preallocate the
kvm->arch.ram_pginfo array when the VM is created with a size sufficient
for up to 64GB of guest memory. Subsequently we will get rid of this
array and use memory associated with each memslot instead.
This moves most of the code that translates the user addresses into
host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
to kvmppc_core_prepare_memory_region. Also, instead of having to look
up the VMA for each page in order to check the page size, we now check
that the pages we get are compound pages of 16MB. However, if we are
adding memory that is mapped to an RMA, we don't bother with calling
get_user_pages_fast and instead just offset from the base pfn for the
RMA.
Typically the RMA gets added after vcpus are created, which makes it
inconvenient to have the LPCR (logical partition control register) value
in the vcpu->arch struct, since the LPCR controls whether the processor
uses RMA or VRMA for the guest. This moves the LPCR value into the
kvm->arch struct and arranges for the MER (mediated external request)
bit, which is the only bit that varies between vcpus, to be set in
assembly code when going into the guest if there is a pending external
interrupt request.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This lifts the restriction that book3s_hv guests can only run one
hardware thread per core, and allows them to use up to 4 threads
per core on POWER7. The host still has to run single-threaded.
This capability is advertised to qemu through a new KVM_CAP_PPC_SMT
capability. The return value of the ioctl querying this capability
is the number of vcpus per virtual CPU core (vcore), currently 4.
To use this, the host kernel should be booted with all threads
active, and then all the secondary threads should be offlined.
This will put the secondary threads into nap mode. KVM will then
wake them from nap mode and use them for running guest code (while
they are still offline). To wake the secondary threads, we send
them an IPI using a new xics_wake_cpu() function, implemented in
arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage
we assume that the platform has a XICS interrupt controller and
we are using icp-native.c to drive it. Since the woken thread will
need to acknowledge and clear the IPI, we also export the base
physical address of the XICS registers using kvmppc_set_xics_phys()
for use in the low-level KVM book3s code.
When a vcpu is created, it is assigned to a virtual CPU core.
The vcore number is obtained by dividing the vcpu number by the
number of threads per core in the host. This number is exported
to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes
to run the guest in single-threaded mode, it should make all vcpu
numbers be multiples of the number of threads per core.
We distinguish three states of a vcpu: runnable (i.e., ready to execute
the guest), blocked (that is, idle), and busy in host. We currently
implement a policy that the vcore can run only when all its threads
are runnable or blocked. This way, if a vcpu needs to execute elsewhere
in the kernel or in qemu, it can do so without being starved of CPU
by the other vcpus.
When a vcore starts to run, it executes in the context of one of the
vcpu threads. The other vcpu threads all go to sleep and stay asleep
until something happens requiring the vcpu thread to return to qemu,
or to wake up to run the vcore (this can happen when another vcpu
thread goes from busy in host state to blocked).
It can happen that a vcpu goes from blocked to runnable state (e.g.
because of an interrupt), and the vcore it belongs to is already
running. In that case it can start to run immediately as long as
the none of the vcpus in the vcore have started to exit the guest.
We send the next free thread in the vcore an IPI to get it to start
to execute the guest. It synchronizes with the other threads via
the vcore->entry_exit_count field to make sure that it doesn't go
into the guest if the other vcpus are exiting by the time that it
is ready to actually enter the guest.
Note that there is no fixed relationship between the hardware thread
number and the vcpu number. Hardware threads are assigned to vcpus
as they become runnable, so we will always use the lower-numbered
hardware threads in preference to higher-numbered threads if not all
the vcpus in the vcore are runnable, regardless of which vcpus are
runnable.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This improves I/O performance for guests using the PAPR
paravirtualization interface by making the H_PUT_TCE hcall faster, by
implementing it in real mode. H_PUT_TCE is used for updating virtual
IOMMU tables, and is used both for virtual I/O and for real I/O in the
PAPR interface.
Since this moves the IOMMU tables into the kernel, we define a new
KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables. The
ioctl returns a file descriptor which can be used to mmap the newly
created table. The qemu driver models use them in the same way as
userspace managed tables, but they can be updated directly by the
guest with a real-mode H_PUT_TCE implementation, reducing the number
of host/guest context switches during guest IO.
There are certain circumstances where it is useful for userland qemu
to write to the TCE table even if the kernel H_PUT_TCE path is used
most of the time. Specifically, allowing this will avoid awkwardness
when we need to reset the table. More importantly, we will in the
future need to write the table in order to restore its state after a
checkpoint resume or migration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode. Using hypervisor mode means
that the guest can use the processor's supervisor mode. That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host. This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.
This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses. That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification. In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.
Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.
This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.
With the guest running in supervisor mode, most exceptions go straight
to the guest. We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest. Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.
We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.
In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount. Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.
The POWER7 processor has a restriction that all threads in a core have
to be in the same partition. MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest. At present we require the host and guest to run
in single-thread mode because of this hardware restriction.
This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management. This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.
This also adds a few new exports needed by the book3s_hv code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This is a shared page used for paravirtualization. It is always present
in the guest kernel's effective address space at the address indicated
by the hypercall that enables it.
The physical address specified by the hypercall is not used, as
e500 does not have real mode.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
When CR0.WP=0, we sometimes map user pages as kernel pages (to allow
the kernel to write to them). Unfortunately this also allows the kernel
to fetch from these pages, even if CR4.SMEP is set.
Adjust for this by also setting NX on the spte in these circumstances.
Signed-off-by: Avi Kivity <avi@redhat.com>
The documented behavior did not match the implemented one (which also
never changed).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Neither host_irq nor the guest_msi struct are used anymore today.
Tag the former, drop the latter to avoid confusion.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Document KVM_IOEVENTFD that can be used to receive
notifications of PIO/MMIO events without triggering
an exit.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch includes a brief introduction to the nested vmx feature in the
Documentation/kvm directory. The document also includes a copy of the
vmcs12 structure, as requested by Avi Kivity.
[marcelo: move to Documentation/virtual/kvm]
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This boolean function simply returns whether or not the runtime status
of the device is 'suspended'. Unlike pm_runtime_suspended(), this
function returns the runtime status whether or not runtime PM for the
device has been disabled or not.
Also add entry to Documentation/power/runtime.txt
Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
That file harkens back to the days of the big 2.4 -> 2.6 version jump,
and was based even then on older versions. Some of it is just obsolete,
and Jesper Juhl points out that it talks about kernel versions 2.6 and
should be updated to 3.0.
Remove some obsolete text, and re-phrase some other to not be 2.6-specific.
Reported-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
[media] msp3400: fill in v4l2_tuner based on vt->type field
[media] tuner-core.c: don't change type field in g_tuner or g_frequency
[media] cx18/ivtv: fix g_tuner support
[media] tuner-core: power up tuner when called with s_power(1)
[media] v4l2-ioctl.c: check for valid tuner type in S_HW_FREQ_SEEK
[media] tuner-core: simplify the standard fixup
[media] tuner-core/v4l2-subdev: document that the type field has to be filled in
[media] v4l2-subdev.h: remove unused s_mode tuner op
[media] feature-removal-schedule: change in how radio device nodes are handled
[media] bttv: fix s_tuner for radio
[media] pvrusb2: fix g/s_tuner support
[media] v4l2-ioctl.c: prefill tuner type for g_frequency and g/s_tuner
[media] tuner-core: fix tuner_resume: use t->mode instead of t->type
[media] tuner-core: fix s_std and s_tuner
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86:
hp-wmi: fix use after free
dell-laptop - using buffer without mutex_lock
Revert: "dell-laptop: Toggle the unsupported hardware killswitch"
platform-drivers-x86: set backlight type to BACKLIGHT_PLATFORM
thinkpad-acpi: handle HKEY 0x4010, 0x4011 events
drivers/platform/x86: Fix memory leak
thinkpad-acpi: handle some new HKEY 0x60xx events
acer-wmi: fix bitwise bug when set device state
acer-wmi: Only update rfkill status for associated hotkey events
Since we removed sti()/cli() and related, how about removing it from
Documentation/spinlocks.txt?
Signed-off-by: Muthukumar R <muthur@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I changed the TKIP key functions, but forgot to
update the documentation includes, fix that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add sysfs files for each led of the wiimote. Writing 1 to the file
enables the led and 0 disables the led.
We do not need memory barriers when checking wdata->ready since we use
a spinlock directly after it.
Signed-off-by: David Herrmann <dh.herrmann@googlemail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This patch enables fetching configuration data, which is normally provided
via platform_data, from the device-tree instead.
If the device is configured from device-tree data, the platform_data struct
is not used, and button data needs to be allocated dynamically. Big part of
this patch deals with confining pdata usage to the probe function, to make
this possible.
Signed-off-by: David Jander <david@protonic.nl>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
SiRFprimaII is the latest generation application processor from CSR’s
Multifunction SoC product family. Designed around an ARM cortex A9 core,
high-speed memory bus, advanced 3D accelerator and full-HD multi-format
video decoder, SiRFprimaII is able to meet the needs of complicated
applications for modern multifunction devices that require heavy concurrent
applications and fluid user experience. Integrated with GPS baseband,
analog and PMU, this new platform is designed to provide a cost effective
solution for Automotive and Consumer markets.
This patch adds the basic support for this SoC and EVB board based on device
tree. It is following the ZYNQ of Xilinx in some degree.
Signed-off-by: Binghua Duan <Binghua.Duan@csr.com>
Signed-off-by: Rongjun Ying <Rongjun.Ying@csr.com>
Signed-off-by: Zhiwu Song <Zhiwu.Song@csr.com>
Signed-off-by: Yuping Luo <Yuping.Luo@csr.com>
Signed-off-by: Bin Shi <Bin.Shi@csr.com>
Signed-off-by: Huayi Li <Huayi.Li@csr.com>
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Multiple attempts to dynamically reallocate pci resources have
unfortunately lead to regressions. Though we continue to fix the
regressions and fine tune the dynamic-reallocation behavior, we have not
reached a acceptable state yet.
This patch provides a interim solution. It disables dynamic reallocation
by default, but adds the ability to enable it through pci=realloc kernel
command line parameter.
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
There are cases, when 80% max isochronous bandwidth is too limiting.
For example I have two USB video capture cards which stream uncompressed
video, and to stream full NTSC + PAL videos we'd need
NTSC 640x480 YUV422 @30fps ~17.6 MB/s
PAL 720x576 YUV422 @25fps ~19.7 MB/s
isoc bandwidth.
Now, due to limited alt settings in capture devices NTSC one ends up
streaming with max_pkt_size=2688 and PAL with max_pkt_size=2892, both
with interval=1. In terms of microframe time allocation this gives
NTSC ~53us
PAL ~57us
and together
~110us > 100us == 80% of 125us uframe time.
So those two devices can't work together simultaneously because the'd
over allocate isochronous bandwidth.
80% seemed a bit arbitrary to me, and I've tried to raise it to 90% and
both devices started to work together, so I though sometimes it would be
a good idea for users to override hardcoded default of max 80% isoc
bandwidth.
After all, isn't it a user who should decide how to load the bus? If I
can live with 10% or even 5% bulk bandwidth that should be ok. I'm a USB
newcomer, but that 80% set in stone by USB 2.0 specification seems to be
chosen pretty arbitrary to me, just to serve as a reasonable default.
NOTE 1
~~~~~~
for two streams with max_pkt_size=3072 (worst case) both time
allocation would be 60us+60us=120us which is 96% periodic bandwidth
leaving 4% for bulk and control. Alan Stern suggested that bulk then
would be problematic (less than 300*8 bittimes left per microframe), but
I think that is still enough for control traffic.
NOTE 2
~~~~~~
Sarah Sharp expressed concern that maxing out periodic bandwidth
could lead to vendor-specific hardware bugs on host controllers, because
> It's entirely possible that you'll run into
> vendor-specific bugs if you try to pack the schedule with isochronous
> transfers. I don't think any hardware designer would seriously test or
> validate their hardware with a schedule that is basically a violation of
> the USB bus spec (more than 80% for periodic transfers).
So far I've only tested this patch on my HP Mini 5103 with N10 chipset
kirr@mini:~$ lspci
00:00.0 Host bridge: Intel Corporation N10 Family DMI Bridge
00:02.0 VGA compatible controller: Intel Corporation N10 Family Integrated Graphics Controller
00:02.1 Display controller: Intel Corporation N10 Family Integrated Graphics Controller
00:1b.0 Audio device: Intel Corporation N10/ICH 7 Family High Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 1 (rev 02)
00:1c.3 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 4 (rev 02)
00:1d.0 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation NM10 Family LPC Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation N10/ICH7 Family SATA AHCI Controller (rev 02)
01:00.0 Network controller: Broadcom Corporation BCM4313 802.11b/g/n Wireless LAN Controller (rev 01)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8059 PCI-E Gigabit Ethernet Controller (rev 11)
and the system works stable with 110us/uframe (~88%) isoc bandwith allocated for
above-mentioned isochronous transfers.
NOTE 3
~~~~~~
This feature is off by default. I mean max periodic bandwidth is set to
100us/uframe by default exactly as it was before the patch. So only those of us
who need the extreme settings are taking the risk - normal users who do not
alter uframe_periodic_max sysfs attribute should not see any change at all.
NOTE 4
~~~~~~
I've tried to update documentation in Documentation/ABI/ thoroughly, but
only "TBD" was put into Documentation/usb/ehci.txt -- the text there seems
to be outdated and much needing refreshing, before it could be amended.
Cc: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The patch adds device tree probe support for gpio-mxc driver.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Add the drivers/virt directory, which houses drivers that support
virtualization environments, and add the Freescale hypervisor management
driver.
The Freescale hypervisor management driver provides several services to
drivers and applications related to the Freescale hypervisor:
1. An ioctl interface for querying and managing partitions
2. A file interface to reading incoming doorbells
3. An interrupt handler for shutting down the partition upon receiving the
shutdown doorbell from a manager partition
4. A kernel interface for receiving callbacks when a managed partition
shuts down.
Signed-off-by: Timur Tabi <timur@freescale.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add an FS-Cache helper to bulk uncache pages on an inode. This will
only work for the circumstance where the pages in the cache correspond
1:1 with the pages attached to an inode's page cache.
This is required for CIFS and NFS: When disabling inode cookie, we were
returning the cookie and setting cifsi->fscache to NULL but failed to
invalidate any previously mapped pages. This resulted in "Bad page
state" errors and manifested in other kind of errors when running
fsstress. Fix it by uncaching mapped pages when we disable the inode
cookie.
This patch should fix the following oops and "Bad page state" errors
seen during fsstress testing.
------------[ cut here ]------------
kernel BUG at fs/cachefiles/namei.c:201!
invalid opcode: 0000 [#1] SMP
Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs
RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles]
RSP: 0018:ffff88002ce6dd00 EFLAGS: 00010282
RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282
RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300
R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840
R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0
FS: 00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0)
Stack:
0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00
ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380
ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56
Call Trace:
cachefiles_lookup_object+0x78/0xd4 [cachefiles]
fscache_lookup_object+0x131/0x16d [fscache]
fscache_object_work_func+0x1bc/0x669 [fscache]
process_one_work+0x186/0x298
worker_thread+0xda/0x15d
kthread+0x84/0x8c
kernel_thread_helper+0x4/0x10
RIP cachefiles_walk_to_object+0x436/0x745 [cachefiles]
---[ end trace 1d481c9af1804caa ]---
I tested the uncaching by the following means:
(1) Create a big file on my NFS server (104857600 bytes).
(2) Read the file into the cache with md5sum on the NFS client. Look in
/proc/fs/fscache/stats:
Pages : mrk=25601 unc=0
(3) Open the file for read/write ("bash 5<>/warthog/bigfile"). Look in proc
again:
Pages : mrk=25601 unc=25601
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since technically it's not powerpc arch-specific. Also rename it sec2
to differentiate it from its incompatible successor, the SEC 4.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Radio devices have weird side-effects when used with combined TV/radio
tuners and the V4L2 spec is ambiguous on how it should work. This results
in inconsistent driver behavior which makes life hard for everyone.
Be more strict in when and how the switch between radio and tv mode
takes place and make sure all drivers behave the same.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Handle events 0x4010 and 0x4011 so that we do not pester users about them.
These events report when the thinkpad is docked/undocked to a native
hotplug dock (i.e. one that does not need ACPI handling, nor is represented
in the ACPI device tree). Such docks are based on USB 2.0/3.0, and also
work as port replicators.
We really want a proper dock class to report these, or at least new input
EV_SW events. Since it is not clear which one to use yet, keep reporting
them as vendor-specific ThinkPad events.
WARNING: As defined by the thinkpad-acpi sysfs ABI rules of engagement, the
vendor-specific events will be REMOVED as soon as generic events are made
available (duplicate events are a big problem), with an appropriate update
to the thinkpad-acpi sysfs/event ABI versioning. Userspace is already
prepared to provide easy backwards compatibility for such changes when
convenient to the distro (see acpi-fakekey).
* Event 0x4010: docking to hotplug dock/port replicator
* Event 0x4011: undocking from hotplug dock/port replicator
Typical usecase would be to trigger display reconfiguration.
Reports mention T410, T510, and series 3 docks/port replicators. Special
thanks to Robert de Rooy for his extensive report and analysis of the
situation.
http://www.thinkwiki.org/wiki/ThinkPad_Port_Replicator_Series_3http://www.thinkwiki.org/wiki/ThinkPad_Mini_Dock_Series_3http://www.thinkwiki.org/wiki/ThinkPad_Mini_Dock_Plus_Series_3http://www.thinkwiki.org/wiki/ThinkPad_Mini_Dock_Plus_Series_3_for_Mobile_Workstationshttp://lenovoblogs.com/insidethebox/?p=290
Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Reported-by: Claudius Hubig <claudiushubig@chubig.net>
Reported-by: Doctor Bill <docbill@gmail.com>
Reported-by: Korte Noack <gbk.noack@gmx.de>
Reported-by: Robert de Rooy <robert.de.rooy@gmail.com>
Reported-by: Sebastian Will <swill@csail.mit.edu>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Handle some user interface events from the newer Lenovo models. We are likely
to do something smart with these events in the future, for now, hide the ones
we are already certain about from the user and userspace both.
* Events 0x6000 and 0x6005 are key-related. 0x6005 is not properly identified
yet. Ignore these events, and do not report them.
* Event 0x6040 has not been properly identified yet, and we don't know if it
is important (looks like it isn't, but still...). Keep reporting it.
* Change the message the driver outputs on unknown 0x6xxx events, as all
recent events are not related to thermal alarms. Degrade log level from
ALERT to WARNING.
Thanks to all users who reported these events or asked about them in a number
of mailing lists. Your help is highly appreciated, even if I did took a lot of
time to act on them. For that I apologise.
I will list those that identified the reasons for the events as "reported-by",
and I apologise in advance if I leave anyone out: it was not done on purpose, I
made the mistake of not properly tagging all event report emails separately,
and might have missed some.
Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Reported-by: Markus Malkusch <markus@malkusch.de>
Reported-by: Peter Giles <g1l3sp@gmail.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Fix following warning in ifenslave.c with gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4).
Documentation/networking/ifenslave.c:263:4: warning: format not a string literal and no format arguments
Documentation/networking/ifenslave.c:271:3: warning: format not a string literal and no format arguments
Documentation/networking/ifenslave.c:277:3: warning: format not a string literal and no format arguments
Documentation/networking/ifenslave.c:285:3: warning: format not a string literal and no format arguments
Documentation/networking/ifenslave.c:291:3: warning: format not a string literal and no format arguments
Documentation/networking/ifenslave.c:292:3: warning: format not a string literal and no format arguments
Documentation/networking/ifenslave.c:312:4: warning: format not a string literal and no format arguments
Documentation/networking/ifenslave.c:323:3: warning: format not a string literal and no format arguments
Documentation/networking/ifenslave.c:342:4: warning: format not a string literal and no format arguments
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a driver to configure the XO-1 RTC via CS5536 MSRs, to be used as a
system wakeup source via olpc-xo1-pm.
Device detection is based on finding the relevant device tree node.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Link: http://lkml.kernel.org/r/1309019658-1712-11-git-send-email-dsd@laptop.org
Acked-by: Andres Salomon <dilinger@queued.net>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: devicetree-discuss@lists.ozlabs.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
All the blkio.throttle.* file names are incorrectly reported without
".throttle" in the documentation. Fix it.
Signed-off-by: Andrea Righi <andrea@betterlinux.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The list of available general purpose memory allocators in
Documentation/CodingStyle chapter 14 is incomplete. This patch adds
the missing vzalloc() to the list.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The runtime PM documentation and kerneldoc comments sometimes spell
"runtime" with a dash (i.e. "run-time"). Replace all of those
instances with "runtime" to make the naming consistent.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The runtime PM documentation in Documentation/power/runtime_pm.txt
doesn't say that pm_runtime_enable() and pm_runtime_disable() work by
operating on power.disable_depth, which is wrong, because the
possibility of nesting disables doesn't follow from the description
of these functions. Also, there is no description of
pm_runtime_barrier() at all in the document, which is confusing.
Improve the documentation by fixing those issues.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
One of the roles of the PM core is to prevent different PM callbacks
executed for the same device object from racing with each other.
Unfortunately, after commit e866500247
(PM: Allow pm_runtime_suspend() to succeed during system suspend)
runtime PM callbacks may be executed concurrently with system
suspend/resume callbacks for the same device.
The main reason for commit e866500247
was that some subsystems and device drivers wanted to use runtime PM
helpers, pm_runtime_suspend() and pm_runtime_put_sync() in
particular, for carrying out the suspend of devices in their
.suspend() callbacks. However, as it's been determined recently,
there are multiple reasons not to do so, inlcuding:
* The caller really doesn't control the runtime PM usage counters,
because user space can access them through sysfs and effectively
block runtime PM. That means using pm_runtime_suspend() or
pm_runtime_get_sync() to suspend devices during system suspend
may or may not work.
* If a driver calls pm_runtime_suspend() from its .suspend()
callback, it causes the subsystem's .runtime_suspend() callback to
be executed, which leads to the call sequence:
subsys->suspend(dev)
driver->suspend(dev)
pm_runtime_suspend(dev)
subsys->runtime_suspend(dev)
recursive from the subsystem's point of view. For some subsystems
that may actually work (e.g. the platform bus type), but for some
it will fail in a rather spectacular fashion (e.g. PCI). In each
case it means a layering violation.
* Both the subsystem and the driver can provide .suspend_noirq()
callbacks for system suspend that can do whatever the
.runtime_suspend() callbacks do just fine, so it really isn't
necessary to call pm_runtime_suspend() during system suspend.
* The runtime PM's handling of wakeup devices is usually different
from the system suspend's one, so .runtime_suspend() may simply be
inappropriate for system suspend.
* System suspend is supposed to work even if CONFIG_PM_RUNTIME is
unset.
* The runtime PM workqueue is frozen before system suspend, so if
whatever the driver is going to do during system suspend depends
on it, that simply won't work.
Still, there is a good reason to allow pm_runtime_resume() to
succeed during system suspend and resume (for instance, some
subsystems and device drivers may legitimately use it to ensure that
their devices are in full-power states before suspending them).
Moreover, there is no reason to prevent runtime PM callbacks from
being executed in parallel with the system suspend/resume .prepare()
and .complete() callbacks and the code removed by commit
e866500247 went too far in this
respect. On the other hand, runtime PM callbacks, including
.runtime_resume(), must not be executed during system suspend's
"late" stage of suspending devices and during system resume's "early"
device resume stage.
Taking all of the above into consideration, make the PM core
acquire a runtime PM reference to every device and resume it if
there's a runtime PM resume request pending right before executing
the subsystem-level .suspend() callback for it. Make the PM core
drop references to all devices right after executing the
subsystem-level .resume() callbacks for them. Additionally,
make the PM core disable the runtime PM framework for all devices
during system suspend, after executing the subsystem-level .suspend()
callbacks for them, and enable the runtime PM framework for all
devices during system resume, right before executing the
subsystem-level .resume() callbacks for them.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Kevin Hilman <khilman@ti.com>
Engineering names are more stable than marketing names. Hence, use them
for Device Tree compatible properties instead.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Engineering names are more stable than marketing names. Hence, use them
for Device Tree compatible properties instead.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Signed-off-by: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This handles the merge conflicts with the
drivers/staging/brcm80211/Kconfig file due to changes on the two
different branches.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
tcp_rmem and tcp_wmem use 1 page as default value for the minimum
amount of memory to be used, same as udp_wmem_min and udp_rmem_min.
Pages are different size on different architectures - use the right
units when describing the defaults.
Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Max Matveev <makc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp does not use second and third ("default" and "max") values
of sctp_rmem tunable. The format is the same as tcp_rmem
but the meaning is different so make the documentation explicit to
avoid confusion.
sctp_wmem is not used at all.
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Max Matveev <makc@redhat.com>
Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>