To check wether we are truncating a very large device due to limited
meta data space, we need to check the ll_dev size.
Also improve the printk to suggest "flexible" or "internal".
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
So remove both the comment and the inline requirement, going back to the
inline hint.
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Currently there is no barrier support in the block device code. That
means we cannot guarantee any sort of data integerity when using the
block device node with dis kwrite caches enabled. Using the raw block
device node is a typical use case for virtualization (and I assume
databases, too). This patch changes block_fsync to issue a cache flush
and thus make fsync on block device nodes actually useful.
Note that in mainline we would also need to add such code to the
->aio_write method for O_SYNC handling, but assuming that Jan's patch
series for the O_SYNC rewrite goes in it will also call into ->fsync
for 2.6.32.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
There's nothing block related about them, the backing device
is used by things like NFS etc as well. This gets rid of the
need to protect such calls by CONFIG_BLOCK.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Hi,
Some workloads issue batches of small I/O, and the performance is poor
due to the call to blk_run_address_space for every single iocb. Nathan
Roberts pointed this out, and suggested that by deferring this call
until all I/Os in the iocb array are submitted to the block layer, we
can realize some impressive performance gains (up to 30% for sequential
4k reads in batches of 16).
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Hi,
The WRITE_ODIRECT flag is only used in one place, and that code path
happens to also call blk_run_address_space. The introduction of this
flag, then, could result in the device being unplugged twice for every
I/O.
Further, with the batching changes in the next patch, we don't want an
O_DIRECT write to imply a queue unplug.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
If active queue hasn't enough requests and idle window opens, cfq will not
dispatch sufficient requests to hardware. In such situation, current code
will zero hw_tag. But this is because cfq doesn't dispatch enough requests
instead of hardware queue doesn't work. Don't zero hw_tag in such case.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
cfq_queues are merged if they are issuing requests within the mean seek
distance of one another. This patch detects when the coopearting stops and
breaks the queues back up.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The flag used to indicate that a cfqq was allowed to jump ahead in the
scheduling order due to submitting a request close to the queue that
just executed. Since closely cooperating queues are now merged, the flag
holds little meaning. Change it to indicate that multiple queues were
merged. This will later be used to allow the breaking up of merged queues
when they are no longer cooperating.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
When cooperating cfq_queues are detected currently, they are allowed to
skip ahead in the scheduling order. It is much more efficient to
automatically share the cfq_queue data structure between cooperating processes.
Performance of the read-test2 benchmark (which is written to emulate the
dump(8) utility) went from 12MB/s to 90MB/s on my SATA disk. NFS servers
with multiple nfsd threads also saw performance increases.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
async cfq_queue's are already shared between processes within the same
priority, and forthcoming patches will change the mapping of cic to sync
cfq_queue from 1:1 to 1:N. So, calculate the seekiness of a process
based on the cfq_queue instead of the cfq_io_context.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The destination keyring specified to request_key() and co. is made available to
the process that instantiates the key (the slave process started by
/sbin/request-key typically). This is passed in the request_key_auth struct as
the dest_keyring member.
keyctl_instantiate_key and keyctl_negate_key() call get_instantiation_keyring()
to get the keyring to attach the newly constructed key to at the end of
instantiation. This may be given a specific keyring into which a link will be
made later, or it may be asked to find the keyring passed to request_key(). In
the former case, it returns a keyring with the refcount incremented by
lookup_user_key(); in the latter case, it returns the keyring from the
request_key_auth struct - and does _not_ increment the refcount.
The latter case will eventually result in an oops when the keyring prematurely
runs out of references and gets destroyed. The effect may take some time to
show up as the key is destroyed lazily.
To fix this, the keyring returned by get_instantiation_keyring() must always
have its refcount incremented, no matter where it comes from.
This can be tested by setting /etc/request-key.conf to:
#OP TYPE DESCRIPTION CALLOUT INFO PROGRAM ARG1 ARG2 ARG3 ...
#====== ======= =============== =============== ===============================
create * test:* * |/bin/false %u %g %d %{user:_display}
negate * * * /bin/keyctl negate %k 10 @u
and then doing:
keyctl add user _display aaaaaaaa @u
while keyctl request2 user test:x test:x @u &&
keyctl list @u;
do
keyctl request2 user test:x test:x @u;
sleep 31;
keyctl list @u;
done
which will oops eventually. Changing the negate line to have @u rather than
%S at the end is important as that forces the latter case by passing a special
keyring ID rather than an actual keyring ID.
Reported-by: Alexander Zangerl <az@bond.edu.au>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Alexander Zangerl <az@bond.edu.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/pci: Fix MODPOST warning
powerpc/oprofile: Add ppc750 CL as supported by oprofile
powerpc: warning: allocated section `.data_nosave' not in segment
powerpc/kgdb: Fix build failure caused by "kgdb.c: unused variable 'acc'"
powerpc: Fix hypervisor TLB batching
powerpc/mm: Fix hang accessing top of vmalloc space
powerpc: Fix memory leak in axon_msi.c
powerpc/pmac: Fix issues with sleep on some powerbooks
powerpc64/ftrace: use PACA to retrieve TOC in mod_return_to_handler
powerpc/ftrace: show real return addresses in modules
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI button: don't try to use a non-existent lid device
ACPI: video: Loosen strictness of video bus detection code
eeepc-laptop: Prevent a panic when disabling RT2860 wireless when associated
eeepc-laptop: Properly annote eeepc_enable_camera().
ACPI / PCI: Fix NULL pointer dereference in acpi_get_pci_dev() (rev. 2)
fujitsu-laptop: address missed led-class ifdef fixup
ACPI: Kconfig, fix proc aggregator text
ACPI: add AC/DC notifier
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
OMAP2xxx clock: set up clockdomain pointer in struct clk
OMAP: Fix race condition with autodeps
omap: McBSP: Fix incorrect receiver stop in omap_mcbsp_stop
omap: Initialization of SDRC params on Zoom2
omap: RX-51: Drop I2C-1 speed to 2200
omap: SDMA: Fixing bug in omap_dma_set_global_params()
omap: CONFIG_ISP1301_OMAP redefined in Beagle defconfig
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: always pin metadata in discard mode
Btrfs: enable discard support
Btrfs: add -o discard option
Btrfs: properly wait log writers during log sync
Btrfs: fix possible ENOSPC problems with truncate
Btrfs: fix btrfs acl #ifdef checks
Btrfs: streamline tree-log btree block writeout
Btrfs: avoid tree log commit when there are no changes
Btrfs: only write one super copy during fsync
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
sysfs: Allow sysfs_notify_dirent to be called from interrupt context.
sysfs: Allow sysfs_move_dir(..., NULL) again.
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
USB: gadget: Fix EEM driver comments and VID/PID
usb-storage: Workaround devices with bogus sense size
USB: ehci: Fix IST boundary checking interval math.
USB: option: Support for AIRPLUS MCD650 Datacard
USB: whci-hcd: always do an update after processing a halted qTD
USB: whci-hcd: handle early deletion of endpoints
USB: wusb: don't use the stack to read security descriptor
USB: rename Documentation/ABI/.../sysfs-class-usb_host
* branch 'tty-fixes'
tty: use the new 'flush_delayed_work()' helper to do ldisc flush
workqueue: add 'flush_delayed_work()' to run and wait for delayed work
tty: Make flush_to_ldisc() locking more robust
The 2.6.32 merge window brought a number of changes to the flexible array
API; this patch updates the documentation to match the new state of
affairs.
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This reverts commit e9a63a4e55.
This breaks older binutils, where sink-less asserts are broken.
See this commit for further details:
d2ba8b2: x86: Fix assert syntax in vmlinux.lds.S
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4AD6523D.5030909@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
vmxnet3 uses in_dev* interfaces so it should depend on INET.
Also fix so that the driver builds when CONFIG_PCI_MSI is disabled.
vmxnet3_drv.c:(.text+0x2a88cb): undefined reference to `in_dev_finish_destroy'
drivers/net/vmxnet3/vmxnet3_drv.c:1335: error: 'struct vmxnet3_intr' has no member named 'msix_entries'
drivers/net/vmxnet3/vmxnet3_drv.c:1384: error: 'struct vmxnet3_intr' has no member named 'msix_entries'
drivers/net/vmxnet3/vmxnet3_drv.c:2137: error: 'struct vmxnet3_intr' has no member named 'msix_entries'
drivers/net/vmxnet3/vmxnet3_drv.c:2138: error: 'struct vmxnet3_intr' has no member named 'msix_entries'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Bhavesh davda <bhavesh@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call compat_unimap_ioctl, not do_unimap_ioctl.
This was broken by commit e9216651.
The compat_unimap_ioctl was originally called do_unimap_ioctl in
fs/compat_ioctl.h which got moved to drivers/char/vt_ioctl.c.
In that patch, the caller was not updated and consequently called
the native handler.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
clock24xx.c is missing a omap2_init_clk_clkdm() in its
omap2_clk_init() function. Among other bad effects, this causes the
OMAP hwmod layer to oops on boot.
Thanks to Carlos Aguiar <carlos.aguiar@indt.org.br> and Stefano
Panella <Stefano.Panella@csr.com> for reporting this bug. Thanks to Tony
Lindgren <tony@atomide.com> for N800 booting advice.
Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: Carlos Aguiar <carlos.aguiar@indt.org.br>
Cc: Stefano Panella <Stefano.Panella@csr.com>
Cc: Tony Lindgren <tony@atomide.com>
There is a possible race condition in clockdomain
code handling hw supported idle transitions.
When multiple autodeps dependencies are being added
or removed, a transition of still remaining dependent
powerdomain can result in false readings of the
state counter. This is especially fatal for off mode
state counter, as it could result in a driver not
noticing a context loss.
Fixed by disabling hw supported state transitions
when autodeps are being changed.
Signed-off-by: Kalle Jokiniemi <kalle.jokiniemi@digia.com>
Signed-off-by: Paul Walmsley <paul@pwsan.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
firewire: sbp2: provide fallback if mgt_ORB_timeout is missing
ieee1394: add documentation entry to MAINTAINERS
ieee1394: update URLs in debugging-via-ohci1394.txt
* branch 'tty-fixes':
tty: use the new 'flush_delayed_work()' helper to do ldisc flush
workqueue: add 'flush_delayed_work()' to run and wait for delayed work
Make flush_to_ldisc properly handle parallel calls
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
watchdog: Fix rio watchdog probe function
sparc64: Set IRQF_DISABLED on LDC channel IRQs.
sparc64: Fix D-cache flushing on swapin from SW devices.
sparc64: Fix niagara2 perf IRQ bits.
* 'sh/for-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: Fix a TRACE_IRQS_OFF typo.
sh: Optimize the setup_rt_frame() I-cache flush.
sh: Populate initial secondary CPU info from boot_cpu_data.
sh: Tidy up SMP cpuinfo.
sh: Use boot_cpu_data for FPU tests in sigcontext paths.
sh: ftrace: Fix up syscall tracepoint support.
sh: force dcache flush if dcache_dirty bit set.
sh: update die() output.