The below patch lets userspace have more control over the inodes that
inotify will watch. It introduces two new flags.
IN_ONLYDIR -- only watch the inode if it is a directory.
This is needed to avoid the race that can occur when we want to be
sure that we are watching a directory.
IN_DONT_FOLLOW -- don't follow a symlink. In combination
with IN_ONLYDIR we can make sure that we don't watch the target of
symlinks.
The issues the flags fix came up when writing the gnome-vfs inotify
backend. Default behaviour is unchanged.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Acked-by: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This undoes the put_disk patch I sent in before.
If I had been paying attention I would have seen that we call put_disk
from free_hba during driver unload. That's the only time we want to
call it. If it's called from deregister disk we may remove the
controller (cNd0) unintentionally.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When registering multiple kprobes at the same address, we leave a small
window where the kprobe hlist will not contain a reference to the
registered kprobe, leading to potentially, a system crash if the breakpoint
is hit on another processor.
Patch below now automically relpace the old kprobe with the new
kprobe from the hash list.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add list_replace_rcu: replace old entry by new one.
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds a timestamp field to the events sent via the process event
connector. The timestamp allows listeners to accurately account the
duration(s) between a process' events and offers strong means with which
to determine the order of events with respect to a given task while also
avoiding the addition of per-task data.
This alters the size and layout of the event structure and hence would
break compatibility if process events connector as it stands in 2.6.15-rc2
were released as a mainline kernel.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are several functions that might seem appropriate for a timestamp:
get_cycles()
current_kernel_time()
do_gettimeofday()
<read jiffies/jiffies_64>
Each has problems with combinations of SMP-safety, low resolution, and
monotonicity. This patch adds a new function that returns a monotonic SMP-safe
timestamp with nanosecond resolution where available.
Changes:
Split timestamp into separate patch
Moved to kernel/time.c
Renamed to getnstimestamp
Fixed unintended-pointer-arithmetic bug
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit f549d6c18c introduced a generic
fallback for security xattrs, but appears to include a subtle bug.
Gentoo users with kernels with selinux compiled in, and coreutils compiled
with acl support, noticed that they could not copy files on tmpfs using
'cp'.
cp (compiled with acl support) copies the file, lists the extended
attributes on the old file, copies them all to the new file, and then
exits. However the listxattr() calls were failing with this odd behaviour:
llistxattr("a.out", (nil), 0) = 17
llistxattr("a.out", 0x7fffff8c6cb0, 17) = -1 ERANGE (Numerical result out of
range)
I believe this is a simple problem in the logic used to check the buffer
sizes; if the user sends a buffer the exact size of the data, then its ok
:)
This change solves the problem.
More info can be found at http://bugs.gentoo.org/113138
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Accessing nohz_cpu_mask before incrementing rcp->cur is racy. It can cause
tickless idle CPUs to be included in rsp->cpumask, which will extend
graceperiods unnecessarily.
Fix this race. It has been tested using extensions to RCU torture module
that forces various CPUs to become idle.
Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While doing some test of RCU torture module, I hit a OOPS in rcu_do_batch,
which was trying to processes callback of a module that was just removed.
This is because we weren't waiting long enough for all callbacks to fire.
Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces a new interface - rcu_barrier() which waits until all
the RCUs queued until this call have been completed.
Reiser4 needs this, because we do more than just freeing memory object
in our RCU callback: we also remove it from the list hanging off
super-block. This means, that before freeing reiser4-specific portion
of super-block (during umount) we have to wait until all pending RCU
callbacks are executed.
The only change of reiser4 made to the original patch, is exporting of
rcu_barrier().
Cc: Hans Reiser <reiser@namesys.com>
Cc: Vladimir V. Saveliev <vs@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reported by Jacques de Mer and Daniel Drake <dsd@gentoo.org>.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sam Ravnborg <sam@ravnborg.org> writes:
> Author: Uwe Zeisberger <zeisberg@informatik.uni-freiburg.de>
>
> [PATCH] kbuild: make kernelrelease in unconfigured kernel prints an error
>
> Do not include .config for target kernelrelease
This is wrong. KERNELRELEASE depends on CONFIG_LOCALVERSION, thus you
need .config.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With CPU hotplug enabled, NMI watchdog stoped working. It appears the
violation is the cpu_online check in nmi handler. local ACPI based NMI
watchdog is initialized before we set CPU online for APs. It's quite
possible a NMI is fired before we set CPU online, and that's what happens
here.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When a Kprobes are inserted/removed on a modules, the modules must be ref
counted so as not to allow to unload while probes are registered on that
module.
Without this patch, the probed module is free to unload, and when the
probing module unregister the probe, the kpobes code while trying to
replace the original instruction might crash.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Mao Bibo <bibo.mao@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SetPageDirty() and ClearPageDirty() are low-level thing which filesystems
shouldn't be using. They bypass dirty page accounting.
Cc: David Woodhouse <dwmw2@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
IA64 is using the generic version of __raw_read_trylock, which always
waits for the lock to be free instead of returning when the lock is in
use. Define an ia64 version of __raw_read_trylock which behaves
correctly, and drop the generic one.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The VM layer (for historical reasons) turns a read-only shared mmap into
a private-like mapping with the VM_MAYWRITE bit clear. Thus checking
just VM_SHARED isn't actually sufficient.
So use a trivial helper function for the cases where we wanted to inquire
if a mapping was COW-like or not.
Moo!
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With the previous commit, we can handle arbitrary shared re-mappings
even without this complexity, and since the only known private mappings
are for strange users of /dev/mem (which never create an incomplete one),
there seems to be no reason to support it.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A shared mapping doesn't cause COW-pages, so we don't need to worry
about the whole vm_pgoff logic to decide if a PFN-remapped page has
gone through COW or not.
This makes it possible to entirely avoid the special "partial remapping"
logic for the common case.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ppc32 kernel, when built with CONFIG_SMP and booted on a single CPU
machine, will not properly set smp_tb_synchronized, thus causing
gettimeofday() to not use the HW timebase and to be limited to jiffy
resolution. This, among others, causes unacceptable pauses when launching
X.org.
Signed-Off-By: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The code that sets the clock spreading feature of the Intrepid ASIC
must not be run on some machine models or those won't boot. This
fixes it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Patch from Nikola Valerjev
Single stepping an application using ptrace() fails over ARM instructions BX and BLX.
Steps to reproduce:
Compile and link the following files
main.c
-----
void foo();
int main() {
foo();
return 0;
}
foo.s
-----
.text
.globl foo
foo:
BX LR
Using ptrace() functionality, run to main(), and start singlestepping.
Singlestep over \"BX LR\" instruction won\'t transfer the control back
to main, but run the code to completion.
This problems seems to be in the function get_branch_address() in
arch/arm/kernel/ptrace.c. The function doesn\'t seem to recognize BX
and BLX instructions as branches. BX and BLX instructions can be used
to convert from ARM to Thumb mode if the target address has the low
bit set. However, they are also perfectly legal in the ARM only mode.
Although other things in the kernel seem to indicate that only ARM
mode is accepted (and not Thumb), many compilers will generate BX
and BLX instructions even when generating ARM only code.
Signed-off-by: Nikola Valerjev <nikola@ghs.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
It is a simple bug which uses the wrong member.
This bug does not seriously affect ordinary use of IPsec.
But it is important to pass IPv6 ready logo phase-2
conformance test of IPsec SGW.
Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On ppc64, when opening a new hugepage region, we need to make sure any
old normal-page SLBs for the area are flushed on all CPUs. There was
a bug in this logic - after putting the new hugepage area masks into
the thread structure, we copied it into the paca (read by the SLB miss
handler) only on one CPU, not on all. This could cause incorrect SLB
entries to be loaded when a multithreaded program was running
simultaneously on several CPUs. This patch corrects the error,
copying the context information into the PACA on all CPUs using the mm
in question before flushing any existing SLB entries.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On most powerpc CPUs, the dcache and icache are not coherent so
between writing and executing a page, the caches must be flushed.
Userspace programs assume pages given to them by the kernel are icache
clean, so we must do this flush between the kernel clearing a page and
it being mapped into userspace for execute. We were not doing this
for hugepages, this patch corrects the situation.
We use the same lazy mechanism as we use for normal pages, delaying
the flush until userspace actually attempts to execute from the page
in question.
Tested on G5.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cache info is setup by walking the device tree in initialize_cache_info().
However, icache_flush_range might be called before that, in
slb_initialize()->patch_slb_encoding, which modifies the load immediate
instructions used with SLB fault code.
Not only that, but depending on memory layout, we might take SLB faults
during unflatten_device_tree. So that fault will load an SLB entry that
might not contain the right LLP flags for the segment.
Either we can walk the flattened device tree to setup cache info, or
we can pick the known defaults that are known to work. Doing it in the
flattened device tree is hairier since we need to know the machine type
to know what property to look for, etc, etc.
For now, it's just easier to go with the defaults. Worst thing that
happens from it is that we might waste a few cycles doing too small
dcbst/icbi increments.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
model_id fields of wf_smu_sys_all_params should match the model ID
they are supposed to represent (as commented). Fixes windfarm on some
iMac 8,1 models.
Signed-off-by: Michal Ostrowski <mostrows at watson ibm com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Patch from Deepak Saxena
This looks like a leftover from 2.4 days...
Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The problem I was seeing turned out to be that skb->dev is NULL when
the checksum is being completed in user context. This happens because
the reference to the device is dropped (to allow it to be released
when packets are in the queue).
Because skb->dev was NULL, the netdev_rx_csum_fault was panicing on
deref of dev->name. How about this?
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some debug code wasn't properly removed from the initial 64k pages
patch, and while it's harmless, it's also slowing down significantly a
very hot code path, thus it should really be removed.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 64k pages patch changed the meaning of one argument passed to the
low level hash functions (from "large" it became "psize" or page size
index), but one of the call sites wasn't properly updated, causing
potential random weird problems with huge pages. This fixes it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This bug exists in the current code and prevents machines from booting
with numa enabled if there is a node that does not contain memory.
Workaround is to boot with 'numa=off'. Looks like a simple typo.
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Local add/sub macros need to have a parameter to specify
the addend/subtrahend respectively.
Signed-off-by: Christoph Lameter <clameter@sgi.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Move assosciated code comment to the correct spot, and
update driver version and release date -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
So we can properly use __GFP_COMP and avoid the use of
PG_reserved pages.
With extremely helpful review from Hugh Dickins.
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to store the congestion control timestamp on the SKB before we
clone it, not after. Else we get no timestamping information at all.
tcp_transmit_skb() has been reworked so that we can do the timestamp
still in one spot, instead of at all the call sites.
Problem discovered, and initial fix, from Tom Young
<tyo@ee.unimelb.edu.au>.
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unneeded call to tcp_vegas_rtt_calc. The more accurate
microsecond value has already been registered prior to calling
tcp_vegas_cong_avoid.
Signed-off-by: Thomas Young <tyo@ee.mu.oz.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the resetting of rtt measurements to inside the once per RTT
block of code.
Signed-off-by: Thomas Young <tyo@ee.mu.oz.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch that added support for a new platform chipset (shub2) broke
PTC deadlock recovery on older versions of the chipset. (PTCs are the
SN platform-specific method for doing a global TLB purge). This
patch fixes deadlock recovery so that it works on both the old & new
chipsets.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
We have a customer application which trips a bug. The problem arises
when a driver attempts to call do_munmap on an area which is mapped, but
because current->thread.task_size has been set to 0xC0000000, the call
to do_munmap fails thinking it is an unmap beyond the user's address
space.
The comment in fs/binfmt_elf.c in load_elf_library() before the call
to SET_PERSONALITY() indicates that task_size must not be changed for
the running application until flush_thread, but is for ia64 executing
ia32 binaries.
This patch moves the setting of task_size from SET_PERSONALITY() to
flush_thread() as indicated. The customer application no longer is able
to trip the bug.
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The per-node data structures are allocated with strided offsets that are a
function of the node number. This prevents excessive cache-aliasing from
occurring.
On systems with a large number of nodes, the strided offset becomes
too large. This patch restricts the maximum offset to 32MB. This is far larger
than the size of any current L3 cache.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Altix only patch to add fixup code that sets up
pci_controller->window. This code is a temporary
fix until ACPI support on Altix is added.
Also, corrects the usage of pci_dev->sysdata,
which had previously been used to reference
platform specific device info, to now point to
a pci_controller struct.
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The checksum offsets for receive offload were not being set correctly.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
The patch (originally from Steve) simply adds memory buffer settings to
DECnet similar to those in TCP.
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a function takes a function pointer as argument it should use the 'return
(*pointer)(params...)' syntax used everywhere else in the kernel as this is
recognized by kernel-doc.
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFA_NEST calls NFA_PUT which jumps to nfattr_failure if the skb has no
room left. We call read_unlock_bh at nfattr_failure for the NFA_PUT inside
the locked section, so move NFA_NEST inside the locked section too.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>