This patch fixes an issue with the delalloc metadata space reservation
code. The problem is we used to free the reservation as soon as we
allocated the delalloc region. The problem with this is if we are not
inserting an inline extent, we don't actually insert the extent item until
after the ordered extent is written out. This patch does 3 things,
1) It moves the reservation clearing stuff into the ordered code, so when
we remove the ordered extent we remove the reservation.
2) It adds a EXTENT_DO_ACCOUNTING flag that gets passed when we clear
delalloc bits in the cases where we want to clear the metadata reservation
when we clear the delalloc extent, in the case that we do an inline extent
or we invalidate the page.
3) It adds another waitqueue to the space info so that when we start a fs
wide delalloc flush, anybody else who also hits that area will simply wait
for the flush to finish and then try to make their allocation.
This has been tested thoroughly to make sure we did not regress on
performance.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
futex: fix requeue_pi key imbalance
futex: Fix typo in FUTEX_WAIT/WAKE_BITSET_PRIVATE definitions
rcu: Place root rcu_node structure in separate lockdep class
rcu: Make hot-unplugged CPU relinquish its own RCU callbacks
rcu: Move rcu_barrier() to rcutree
futex: Move exit_pi_state() call to release_mm()
futex: Nullify robust lists after cleanup
futex: Fix locking imbalance
panic: Fix panic message visibility by calling bust_spinlocks(0) before dying
rcu: Replace the rcu_barrier enum with pointer to call_rcu*() function
rcu: Clean up code based on review feedback from Josh Triplett, part 4
rcu: Clean up code based on review feedback from Josh Triplett, part 3
rcu: Fix rcu_lock_map build failure on CONFIG_PROVE_LOCKING=y
rcu: Clean up code to address Ingo's checkpatch feedback
rcu: Clean up code based on review feedback from Josh Triplett, part 2
rcu: Clean up code based on review feedback from Josh Triplett
When compression is on, the cow_file_range code is farmed off to
worker threads. This allows us to do significant CPU work in parallel
on SMP machines.
But it is a delicate balance around when we clear flags and how. In
the past we cleared the delalloc flag immediately, which was safe
because the pages stayed locked.
But this is causing problems with the newest ENOSPC code, and with the
recent extent state cleanups we can now clear the delalloc bit at the
same time the uncompressed code does.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
extent_clear_unlock_delalloc has a growing set of ugly parameters
that is very difficult to read and maintain.
This switches to a flag field and well named flag defines.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Set correct normal_prio and prio values in sched_fork()
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
tracing: user local buffer variable for trace branch tracer
tracing: fix warning on kernel/trace/trace_branch.c andtrace_hw_branches.c
ftrace: check for failure for all conversions
tracing: correct module boundaries for ftrace_release
tracing: fix transposed numbers of lock_depth and preempt_count
trace: Fix missing assignment in trace_ctxwake_*
tracing: Use free_percpu instead of kfree
tracing: Check total refcount before releasing bufs in profile_enable failure
* 'sparc-perf-events-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
mm, perf_event: Make vmalloc_user() align base kernel virtual address to SHMLBA
perf_event: Provide vmalloc() based mmap() backing
* 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_events: Make ABI definitions available to userspace
perf tools: elf_sym__is_function() should accept "zero" sized functions
tracing/syscalls: Use long for syscall ret format and field definitions
perf trace: Update eval_flag() flags array to match interrupt.h
perf trace: Remove unused code in builtin-trace.c
perf: Propagate term signal to child
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, timers: Check for pending timers after (device) interrupts
NOHZ: update idle state also when NOHZ is inactive
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: ice1724: increase SPDIF and independent stereo buffer sizes
ALSA: opl3: circular locking in the snd_opl3_note_on() and snd_opl3_note_off()
ALSA: ICE1712/24 - Change the Multi Track Peak control (level meters) from MIXER to PCM type
ALSA: hda - Fix yet another auto-mic bug in ALC268
ASoC: WM8350 capture PGA mutes are inverted
ASoC: Remove absent SYNC and TDM DAI format options from i.MX SSI
sound: via82xx: move DXS volume controls to PCM interface
ALSA: hda - Don't pick up invalid HP pins in alc_subsystem_id()
ALSA: hda - Add a workaround for ASUS A7K
ALSA: hda - Fix invalid initializations for ALC861 auto mode
ASoC: wm8940: Fix check on error code form snd_soc_codec_set_cache_io
ASoC: Fix SND_SOC_DAPM_LINE handling
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (24 commits)
drm/radeon/kms: fix vline register for second head.
drm/r600: avoid assigning vb twice in blit code
drm/radeon: use list_for_each_entry instead of list_for_each
drm/radeon/kms: Fix AGP support for R600/RV770 family (v2)
drm/radeon/kms: Fallback to non AGP when acceleration fails to initialize (v2)
drm/radeon/kms: Fix RS600/RV515/R520/RS690 IRQ
drm/radeon: Fix setting of bits
drm/ttm: fix refcounting in ttm global code.
drm/fb: add more correct 8/16/24/32 bpp fb support.
drm/fb: add setcmap and fix 8-bit support.
drm/radeon/kms: respect single crtc cards, only create one crtc. (v2)
drm: Delete the DRM_DEBUG_KMS in drm_mode_cursor_ioctl
drm/radeon/kms: add support for "Surround View"
drm/radeon/kms: Fix irq handling on AVIVO hw
drm/radeon/kms: R600/RV770 remove dead code and print message for wrong BIOS
drm/radeon/kms: Fix R600/RV770 disable acceleration path
drm/radeon/kms: Fix R600/RV770 startup path & reset
drm/radeon/kms: Fix R600 write back buffer
drm/radeon/kms: Remove old init path as no hw use it anymore
drm/radeon/kms: Convert RS600 to new init path
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
ethoc: limit the number of buffers to 128
ethoc: use system memory as buffer
ethoc: align received packet to make IP header at word boundary
ethoc: fix buffer address mapping
ethoc: fix typo to compute number of tx descriptors
au1000_eth: Duplicate test of RX_OVERLEN bit in update_rx_stats()
netxen: Fix Unlikely(x) > y
pasemi_mac: ethtool get settings fix
add maintainer for network drop monitor kernel service
tg3: Fix phylib locking strategy
rndis_host: support ETHTOOL_GPERMADDR
ipv4: arp_notify address list bug
gigaset: add kerneldoc comments
gigaset: correct debugging output selection
gigaset: improve error recovery
gigaset: fix device ERROR response handling
gigaset: announce if built with debugging
gigaset: handle isoc frame errors more gracefully
gigaset: linearize skb
gigaset: fix reject/hangup handling
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
Revert "Revert "ide: try to use PIO Mode 0 during probe if possible""
sis5513: fix PIO setup for ATAPI devices
Now that the VFS actually waits for the data I/O to complete before
calling into ->fsync we can stop doing it ourselves.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
This is for bug #850,
http://oss.sgi.com/bugzilla/show_bug.cgi?id=850
XFS file system segfaults , repeatedly and 100% reproducable in 2.6.30 , 2.6.31
The above only showed up on a CONFIG_XFS_DEBUG=y kernel, because
xfs_bmapi() ASSERTs that it has been asked for at least one map,
and it was getting 0.
The root cause is that our guesstimated "bufsize" from xfs_file_readdir
was fairly small, and the
bufsize -= length;
in the loop was going negative - except bufsize is a size_t, so it
was wrapping to a very large number.
Then when we did
ra_want = howmany(bufsize + mp->m_dirblksize,
mp->m_sb.sb_blocksize) - 1;
with that very large number, the (int) ra_want was coming out
negative, and a subsequent compare:
if (1 + ra_want > map_blocks ...
was coming out -true- (negative int compare w/ uint) and we went
back to xfs_bmapi() for more, even though we did not need more,
and asked for 0 maps, and hit the ASSERT.
We have kind of a type mess here, but just keeping bufsize from
going negative is probably sufficient to avoid the problem.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
We want to always cover the log after writing out the superblock, and
in case of a synchronous writeout make sure we actually wait for the
log to be covered. That way a filesystem that has been sync()ed can
be considered clean by log recovery.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
To make sure they get properly waited on in sync when I/O is in flight and
we latter need to update the inode size. Requires a new helper to check if an
ioend structure is beyond the current EOF.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Sort out ->sync_fs to not perform a superblock writeback for the wait = 0 case
as that is just an optional first pass and the superblock will be written back
properly in the next call with wait = 1. Instead perform an opportunistic
quota writeback to have less work later. Also remove the freeze special case
as we do a proper wait = 1 call in the freeze code anyway.
Also rename the function to xfs_fs_sync_fs to match the normal naming
convention, update comments and avoid calling into the laptop_mode logic on
an error.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
We need to do a synchronous xfs_sync_fsdata to make sure the superblock
actually is on disk when we return.
Also remove SYNC_BDFLUSH flag to xfs_sync_inodes because that particular
flag is never checked.
Move xfs_filestream_flush call later to only release inodes after they
have been written out.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
This is picking up on Felix's repost of Dave's patch to implement a
.dirty_inode method. We really need this notification because
the VFS keeps writing directly into the inode structure instead
of going through methods to update this state. In addition to
the long-known atime issue we now also have a caller in VM code
that updates c/mtime that way for shared writeable mmaps. And
I found another one that no one has noticed in practice in the FIFO
code.
So implement ->dirty_inode to set i_update_core whenever the
inode gets externally dirtied, and switch the c/mtime handling to
the same scheme we already use for atime (always picking up
the value from the Linux inode).
Note that this patch also removes the xfs_synchronize_atime call
in xfs_reclaim it was superflous as we already synchronize the time
when writing the inode via the log (xfs_inode_item_format) or the
normal buffers (xfs_iflush_int).
In addition also remove the I_CLEAR check before copying the Linux
timestamps - now that we always have the Linux inode available
we can always use the timestamps in it.
Also switch to just using file_update_time for regular reads/writes -
that will get us all optimization done to it for free and make
sure we notice early when it breaks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
The unencrypted files are being measured. Update the counters to get
rid of the ecryptfs imbalance message. (http://bugzilla.redhat.com/519737)
Reported-by: Sachin Garg
Cc: Eric Paris <eparis@redhat.com>
Cc: Dustin Kirkland <kirkland@canonical.com>
Cc: James Morris <jmorris@namei.org>
Cc: David Safford <safford@watson.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
eCryptfs no longer uses a netlink interface to communicate with
ecryptfsd, so NET is not a valid dependency anymore.
MD5 is required and must be built for eCryptfs to be of any use.
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
ecryptfs uses crypto APIs so it should depend on CRYPTO.
Otherwise many build errors occur. [63 lines not pasted]
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: ecryptfs-devel@lists.launchpad.net
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
The NFSv4 renew daemon is shared between all active super blocks that refer
to a particular NFS server, so it is wrong to be shutting it down in
nfs4_kill_super every time a super block is destroyed.
This patch therefore kills nfs4_renewd_prepare_shutdown altogether, and
leaves it up to nfs4_shutdown_client() to also shut down the renew daemon
by means of the existing call to nfs4_kill_renewd().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that range timers and deferred timers are common, I found a
problem with these using the "perf timechart" tool. Frans Pop also
reported high scheduler latencies via LatencyTop, when using
iwlagn.
It turns out that on x86, these two 'opportunistic' timers only get
checked when another "real" timer happens. These opportunistic
timers have the objective to save power by hitchhiking on other
wakeups, as to avoid CPU wakeups by themselves as much as possible.
The change in this patch runs this check not only at timer
interrupts, but at all (device) interrupts. The effect is that:
1) the deferred timers/range timers get delayed less
2) the range timers cause less wakeups by themselves because
the percentage of hitchhiking on existing wakeup events goes up.
I've verified the working of the patch using "perf timechart", the
original exposed bug is gone with this patch. Frans also reported
success - the latencies are now down in the expected ~10 msec
range.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Tested-by: Frans Pop <elendil@planet.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091008064041.67219b13@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When a vmalloc'd area is mmap'd into userspace, some kind of
co-ordination is necessary for this to work on platforms with cpu
D-caches which can have aliases.
Otherwise kernel side writes won't be seen properly in userspace
and vice versa.
If the kernel side mapping and the user side one have the same
alignment, modulo SHMLBA, this can work as long as VM_SHARED is
shared of VMA and for all current users this is true. VM_SHARED
will force SHMLBA alignment of the user side mmap on platforms with
D-cache aliasing matters.
The bulk of this patch is just making it so that a specific
alignment can be passed down into __get_vm_area_node(). All
existing callers pass in '1' which preserves existing behavior.
vmalloc_user() gives SHMLBA for the alignment.
As a side effect this should get the video media drivers and other
vmalloc_user() users into more working shape on such systems.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <200909211922.n8LJMYjw029425@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6:
agp: parisc-agp.c - use correct page_mask function
parisc: Fix linker script breakage.
parisc: convert to asm-generic/hardirq.h
parisc: Make THREAD_SIZE available to assembly files and linker scripts.
parisc: correct use of SHF_ALLOC
parisc: rename parisc's vmalloc_start to parisc_vmalloc_start
parisc: add me to Maintainers
parisc: includecheck fix: signal.c
parisc: HAVE_ARCH_TRACEHOOK
parisc: add skeleton syscall.h
parisc: stop using task->ptrace for {single,block}step flags
parisc: split syscall_trace into two halves
parisc: add missing TI_TASK macro in syscall.S
parisc: tracehook_signal_handler
parisc: tracehook_report_syscall
The PC Card 8.0 specification (vol. 4, section 3.2.10) says the
TPLLV1_INFO field of the CISTPL_VERS_1 tuple must contain 4 strings. Some
cards don't have all 4 so just parse as many as we can.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Vrabel <david.vrabel@csr.com>
Tested-by: Jonathan Cameron <jic23@cam.ac.uk>
Tested-by: Bing Zhao <bzhao@marvell.com>
Cc: Roel Kluin <roel.kluin@gmail.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For hwpoison stress testing. The debugfs mount point is assumed to be
/debug/.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Refactor the code to be more modular and easier to reuse.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This helps merge duplicate code (now and future) and outstand the main
logic.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It indicates to the system admin that processes mapping such pages may be
eating less physical memory than the reported numbers by legacy tools.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Acked-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This flag indicates a hardware detected memory corruption on the page.
Any future access of the page data may bring down the machine.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update documentation of cgroups tasks and procs files
Document the cgroup.procs file.
Clarify the semantics of the cgroup.procs and tasks files. Although the
current cgroup.procs interface returns a sorted and uniqified list of
pids, potential future performance enhancements could result in those
properties being removed - explicitly document this aspect of the API.
There are no existing users of cgroup.procs, so compatibility isn't an
issue. There are users of the "tasks" file, but none that would appear to
break in the event of the sorted property being broken. The standard
"libcpuset" explicitly sorts the results of reading from the tasks file,
and "libcg" and other users don't appear to care about ordering.
Signed-off-by: Paul Menage <menage@google.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix the following 'make includecheck' warning:
drivers/video/da8xx-fb.c: linux/device.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix the following 'make includecheck' warning:
drivers/video/msm/mddi.c: linux/delay.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix the following 'make includecheck' warning:
fs/proc/kcore.c: linux/mm.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix the following 'make includecheck' warning:
mm/vmalloc.c: linux/highmem.h is included more than once.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adjust the max_kernel_pages default to a quarter of totalram_pages,
instead of nr_free_buffer_pages() / 4: the KSM pages themselves come from
highmem, and even on a 16GB PAE machine, 4GB of KSM pages would only be
pinning 32MB of lowmem with their rmap_items, so no need for the more
obscure calculation (nor for its own special init function).
There is no way for the user to switch KSM on if CONFIG_SYSFS is not
enabled, so in that case default run to KSM_RUN_MERGE.
Update KSM Documentation and Kconfig to reflect the new defaults.
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Increase the default and maximum PCM buffer prellocation size for ice1724's
SPDIF and independent stereo pair outputs to 256K, which is the hardware's
maximum supported size. This allows a reduction in interrupt rate and
potentially power usage when an application is not latency-critical.
Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>