main executable of (specially compiled/linked -pie/-fpie) ET_DYN binaries
onto a random address (in cases in which mmap() is allowed to perform a
randomization).
The code has been extraced from Ingo's exec-shield patch
http://people.redhat.com/mingo/exec-shield/
[akpm@linux-foundation.org: fix used-uninitialsied warning]
[kamezawa.hiroyu@jp.fujitsu.com: fixed ia32 ELF on x86_64 handling]
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Randomize the location of the heap (brk) for i386 and x86_64. The range is
randomized in the range starting at current brk location up to 0x02000000
offset for both architectures. This, together with
pie-executable-randomization.patch and
pie-executable-randomization-fix.patch, should make the address space
randomization on i386 and x86_64 complete.
Arjan says:
This is known to break older versions of some emacs variants, whose dumper
code assumed that the last variable declared in the program is equal to the
start of the dynamically allocated memory region.
(The dumper is the code where emacs effectively dumps core at the end of it's
compilation stage; this coredump is then loaded as the main program during
normal use)
iirc this was 5 years or so; we found this way back when I was at RH and we
first did the security stuff there (including this brk randomization). It
wasn't all variants of emacs, and it got fixed as a result (I vaguely remember
that emacs already had code to deal with it for other archs/oses, just
ifdeffed wrongly).
It's a rare and wrong assumption as a general thing, just on x86 it mostly
happened to be true (but to be honest, it'll break too if gcc does
something fancy or if the linker does a non-standard order). Still its
something we should at least document.
Note 2: afaik it only broke the emacs *build*. I'm not 100% sure about that
(it IS 5 years ago) though.
[ akpm@linux-foundation.org: deuglification ]
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The same delegation may have been handed out to more than one nfs_client.
Ensure that if a recall occurs, we return all instances.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If a (broken?) server hands out two different delegations for the same
file, then we should return one of them.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Otherwise, there is a potential deadlock if the last dput() from an NFSv4
close() or other asynchronous operation leads to nfs_clear_inode calling
the synchronous delegreturn.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
David Howells noticed that repeating the same mount option twice during an
NFS mount request can result in orphaned memory in certain cases.
Only the client_address and mount_server.hostname strings are initialized
in the mount parsing loop, so those appear to be the only two pointers that
might be written over by repeating a mount option. The strings in the
nfs_server section of the nfs_parsed_mount_data structure are set only once
after the options are parsed, thus these are not susceptible to being
overwritten.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The rfc doesn't give any reason it shouldn't be possible to set an
attribute on a non-regular file. And if the server supports it, then it
shouldn't be up to us to prevent it.
Thanks to Erez for the report and Trond for further analysis.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Tested-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
There are no interruptible waits for asynchronous RPC tasks, so we don't
need to wrap calls to rpc_run_task() with an
rpc_clnt_sigmask/rpc_clnt_unsigmask pair.
Instead we can wrap the wait_for_completion_interruptible() in
nfs_direct_wait(). This means that we completely optimise away sigmask
setting for the case of non-blocking aio/dio.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: pass 5 arguments to nlmclnt_init() in a structure similar to the
new nfs_client_initdata structure.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that each NFS mount point caches its own nlm_host structure, it can be
passed to nlmclnt_proc() for each lock request. By pinning an nlm_host for
each mount point, we trade the overhead of looking up or creating a fresh
nlm_host struct during every NLM procedure call for a little extra memory.
We also restrict the nlmclnt_proc symbol to limit the use of this call to
in-tree modules.
Note that nlm_lookup_host() (just removed from the client's per-request
NLM processing) could also trigger an nlm_host garbage collection. Now
client-side nlm_host garbage collection occurs only during NFS mount
processing. Since the NFS client now holds a reference on these nlm_host
structures, they wouldn't have been affected by garbage collection
anyway.
Given that nlm_lookup_host() reorders the global nlm_host chain after
every successful lookup, and that a garbage collection could be triggered
during the call, we've removed a significant amount of per-NLM-request
CPU processing overhead.
Sidebar: there are only a few remaining references to the internals of
NFS inodes in the client-side NLM code. The only references I found are
related to extracting or comparing the inode's file handle via NFS_FH().
One is in nlmclnt_grant(); the other is in nlmclnt_setlockargs().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cache an appropriate nlm_host structure in the NFS client's mount point
metadata for later use.
Note that there is no need to set NFS_MOUNT_NONLM in the error case -- if
nfs_start_lockd() returns a non-zero value, its callers ensure that the
mount request fails outright.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We would like to remove the per-lock-operation nlm_lookup_host() call from
nlmclnt_proc().
The new architecture pins an nlm_host structure to each NFS client
superblock that has the "lock" mount option set. The NFS client passes
in the pinned nlm_host structure during each call to nlmclnt_proc(). NFS
client unmount processing "puts" the nlm_host so it can be garbage-
collected later.
This patch introduces externally callable NLM functions that handle
mount-time nlm_host set up and tear-down.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The cookie->len field is unsigned, so the loop index variable in
nlmdbg_cookie2a() should also be unsigned.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: commit 4899f9c8 added nfs_write_end(), which introduces a
conditional expression that returns an unsigned integer in one arm and
a signed integer in the other.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: PAGE_CACHE_SIZE is unsigned, and nfs_pageio_init() takes a size_t.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: always use the same type when handling buffer lengths. As a
bonus, this prevents a mixed sign comparison in idmap_lookup_name.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The idmap_pipe_upcall() function expects the copy_to_user() function to
return a negative error value if the call fails, but copy_to_user()
returns an unsigned long number of bytes that couldn't be copied.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up white space damage and use standard kernel coding conventions for
return statements.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, if you have a server mounted using networking protocol, you
cannot specify a different value using the 'proto=' option on another
mountpoint.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that the needed IPv6 infrastructure is in place, allow the NFS client's
IP address parser to generate AF_INET6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace the nfs_server and mount_server address fields in the
nfs_parsed_mount_data structure with a "struct sockaddr_storage"
instead of a "struct sockaddr_in".
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Refactor the logic to parse incoming text-based IP addresses. Use the
in4_pton() function instead of the older in_aton(), following the lead
of the in-kernel CIFS client.
Later we'll add IPv6 address parsing using the matching in6_pton()
function. For now we can't allow IPv6 address parsing: we must expand
the size of the address storage fields in the nfs_parsed_mount_options
struct before we can parse and store IPv6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In the name of address family compatibility, we can't have the NIP_FMT and
NIPQUAD macros in nfs_try_mount(). Instead, we can make use of an unused
mount option to display the mount server's hostname.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Change the addr field in the nfs_clone_mount structure to store a "struct
sockaddr *" to support non-IPv4 addresses in the NFS client.
Note this is mostly a cosmetic change, and does not actually allow
referrals using IPv6 addresses. The existing referral code assumes that
the server returns a string that represents an IPv4 address. This code
needs to support hostnames and IPv6 addresses as well as IPv4 addresses,
thus it will need to be reorganized completely (to handle DNS resolution
in user space).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Adjust the arguments and callers of nfs4_set_client() to pass a "struct
sockaddr *" instead of a "struct sockaddr_in *" to support non-IPv4
addresses in the NFS client.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Adjust arguments and callers of nfs_get_client() to pass a
"struct sockaddr *" instead of "struct sockaddr_in *" to support
non-IPv4 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Adjust arguments and callers of nfs_find_client() to pass a
"struct sockaddr *" instead of "struct sockaddr_in *" to support non-IPv4
addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Trond: Also fix up protocol version number argument in nfs_find_client() to
use the correct u32 type.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Change the addr field in the cb_recallargs struct to a "struct sockaddr *"
to support non-IPv4 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Change the addr field in the cb_getattrargs struct to a "struct sockaddr *"
to support non-IPv4 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Prepare for managing larger addresses in the NFS client by widening the
nfs_client struct's cl_addr field.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
(Modified to work with the new parameters for nfs_alloc_client)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Create a helper function to set the default NFS port for NFSv4 mount
points. The helper supports both AF_INET and AF_INET6 family addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We'll need to set the port number of an AF_INET or AF_INET6 address in
several places in fs/nfs/super.c, so introduce a helper that can manage
this for us. We put this helper to immediate use.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add support to nfs_verify_server_address for recognizing AF_INET6
addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Refactor nfs_compare_super() and add AF_INET6 support.
Replace the generic memcmp() to document explicitly what parts of the
addresses must match in this check, and make the comparison independent
of the lengths of both addresses.
A side benefit is both tests are more computationally efficient than a
memcmp().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: fix an outdated block comment, and address a comparison
between a signed and unsigned integer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: The client side peer address is available in callback_proc.c,
so move a dprintk out of fs/nfs/callback.c and into
fs/nfs/callback_proc.c.
This is more consistent with other debugging messages, and the proc
routines have more information about each request to display.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
To ensure the NFS client displays IPv6 addresses properly, replace
address family-specific NIPQUAD() invocations with a call to the RPC
client to get a formatted string representing the remote peer's
address.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We recently added methods to RPC transports that provide string versions of
the remote peer address information. Convert the NFSv4 SETCLIENTID
procedure to use those methods instead of building the client ID out of
whole cloth.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that the RPC buffer size specified for NFSv4 SETCLIENTID procedures
matches what we are encoding into the buffer. See the definition of
struct nfs4_setclientid {} and the encode_setclientid() function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: The header tag length is unsigned, so checking that it is less
than zero is unnecessary.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The address comparison in the __nfs_find_client() function is deceptive.
It uses a memcmp() to check a pair of u32 fields for equality. Not only is
this inefficient, but usually memcmp() is used for comparing two *whole*
sockaddr_in's (which includes comparisons of the address family and port
number), so it's easy to mistake the comparison here for a whole sockaddr
comparison, which it isn't.
So for clarity and efficiency, we replace the memcmp() with a simple test
for equality between the two s_addr fields. This should have no
behavioral effect.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: mount option parsing uses kstrndup in several places, rather than
using kzalloc. Replace the few remaining uses of kzalloc with kstrndup,
for consistency.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Remove the mount option that allows users to specify an alternate mountd
program number. The client hasn't support setting an alternate mountd
program number for a very long time.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Remove the mount option that allows users to specify an alternate NFS
program number. The client hasn't support setting an alternate NFS
program number for a very long time.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Text-based mount option parsing introduced a minor regression in the
behavior of NFS version 4 mounts. NFS version 4 is not supposed to require
a running rpcbind service on the server in order for a mount to succeed.
In other words, if the mount options don't specify a port number, the port
number is supposed to default to 2049. For earlier versions of NFS, the
default port number was zero in order to cause the RPC client to autobind
to the server's NFS service.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
POSIX requires that ctime and mtime, as reported by the stat(2) call,
reflect the activity of the most recent write(2). To that end, nfs_getattr()
flushes pending dirty writes to a file before doing a GETATTR to allow the
NFS server to set the file's size, ctime, and mtime properly.
However, nfs_getattr() can be starved when a constant stream of application
writes to a file prevents nfs_wb_nocommit() from completing. This usually
results in hangs of programs doing a stat against an NFS file that is being
written. "ls -l" is a common victim of this behavior.
To prevent starvation, hold the file's i_mutex in nfs_getattr() to
freeze applications writes temporarily so the client can more quickly obtain
clean values for a file's size, mtime, and ctime.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The nfs_wcc_update_inode() function omits logic to convert the type of
the NFS on-the-wire value of a file's size (__u64) to the type of file
size value stored in struct inode (loff_t, which is signed).
Everywhere else in the NFS client I checked already correctly converts the
file size type.
This effects only very large files.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replace use of rpc_call_setup() with rpc_init_task(), and in cases where we
need to initialise task->tk_action, with rpc_call_start().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move the common code for setting up the nfs_write_data and nfs_read_data
structures into fs/nfs/read.c, fs/nfs/write.c and fs/nfs/direct.c.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We want the default scheduling priority (priority == 0) to remain
RPC_PRIORITY_NORMAL.
Also ensure that the priority wait queue scheduling is per process id
instead of sometimes being per thread, and sometimes being per inode.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Added an active/deactive mechanism to the nfs_server structure
allowing async operations to hold off umount until the
operations are done.
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reduce the time spent locking the rpc_sequence structure by queuing the
nfs_seqid only when we are ready to take the lock (when calling
nfs_wait_on_sequence).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The current model locks the page twice for no good reason. Optimise by
inlining the parts of nfs_write_begin()/nfs_write_end() that we care about.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the server returns an ENOENT error, we still need to do a d_delete() in
order to ensure that the dentry is deleted.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
In nfs_do_call_unlink() we check that we haven't raced, and that lookup()
hasn't created an aliased dentry to our sillydeleted dentry. If somebody
has deleted the file on the server and the lookup() resulted in a negative
dentry, then ignore...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that readdir revalidates its data cache after blocking on
sillyrename.
Also fix a typo in nfs_do_call_unlink(): swap the ^= for an |=. The result
is the same, since we've already checked that the flag is unset, but it
makes the code more readable.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch addresses a problem introduced with the last round of
lowcomms patches where the 'othercon' connections do not get freed when
the DLM shuts down.
This results in the error message
"slab error in kmem_cache_destroy(): cache `dlm_conn': Can't free all
objects"
and the DLM cannot be restarted without a system reboot.
See bz#428119
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: David Teigland <teigland@redhat.com>
The dlm functions in memory.c should use the dlm_ prefix. Also, use
kzalloc/kfree directly for dlm_direntry's, removing the wrapper functions.
Signed-off-by: David Teigland <teigland@redhat.com>
Change log_error() to log_debug() for conditions that can occur in
large number in normal operation.
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds a proper prototype for some functions in
fs/dlm/dlm_internal.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Teigland <teigland@redhat.com>
A common problem occurs when multiple IP addresses within the same
subnet are assigned to the same NIC. If we make a connection attempt to
another address on the same subnet as one of those addresses, the
connection attempt will not necessarily be routed from the address we
want.
In the case of the DLM, the other nodes will quickly drop the connection
attempt, causing problems.
This patch makes the DLM bind to the local address it acquired from the
cluster manager when using TCP prior to making a connection, obviating
the need for administrators to "fix" their systems or use clever routing
tricks.
Signed-off-by: Lon Hohberger <lhh@redhat.com>
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
A bug report on nfsd that states that since it was switched to use
splice instead of sendfile, the atime was no longer being updated
on the input file. do_generic_mapping_read() does this when accessing
the file, make splice do it for the direct splice handler.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits)
[IPV6] ADDRLABEL: Fix double free on label deletion.
[PPP]: Sparse warning fixes.
[IPV4] fib_trie: remove unneeded NULL check
[IPV4] fib_trie: More whitespace cleanup.
[NET_SCHED]: Use nla_policy for attribute validation in ematches
[NET_SCHED]: Use nla_policy for attribute validation in actions
[NET_SCHED]: Use nla_policy for attribute validation in classifiers
[NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
[NET_SCHED]: sch_api: introduce constant for rate table size
[NET_SCHED]: Use typeful attribute parsing helpers
[NET_SCHED]: Use typeful attribute construction helpers
[NET_SCHED]: Use NLA_PUT_STRING for string dumping
[NET_SCHED]: Use nla_nest_start/nla_nest_end
[NET_SCHED]: Propagate nla_parse return value
[NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
[NET_SCHED]: act_api: use nlmsg_parse
[NET_SCHED]: act_api: fix netlink API conversion bug
[NET_SCHED]: sch_netem: use nla_parse_nested_compat
[NET_SCHED]: sch_atm: fix format string warning
[NETNS]: Add namespace for ICMP replying code.
...
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (79 commits)
Remove references to "make dep"
kconfig: document use of HAVE_*
Introduce new section reference annotations tags: __ref, __refdata, __refconst
kbuild: warn about ld added unique sections
kbuild: add verbose option to Section mismatch reporting in modpost
kconfig: tristate choices with mixed tristate and boolean values
asm-generic/vmlix.lds.h: simplify __mem{init,exit}* dependencies
remove __attribute_used__
kbuild: support ARCH=x86 in buildtar
kconfig: remove "enable"
kbuild: simplified warning report in modpost
kbuild: introduce a few helpers in modpost
kbuild: use simpler section mismatch warnings in modpost
kbuild: link vmlinux.o before kallsyms passes
kbuild: introduce new option to enhance section mismatch analysis
Use separate sections for __dev/__cpu/__mem code/data
compiler.h: introduce __section()
all archs: consolidate init and exit sections in vmlinux.lds.h
kbuild: check section names consistently in modpost
kbuild: introduce blacklisting in modpost
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
jbd2: sparse pointer use of zero as null
jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup
jbd2: Mark jbd2 slabs as SLAB_TEMPORARY
jbd2: add lockdep support
ext4: Use the ext4_ext_actual_len() helper function
ext4: fix uniniatilized extent splitting error
ext4: Check for return value from sb_set_blocksize
ext4: Add stripe= option to /proc/mounts
ext4: Enable the multiblock allocator by default
ext4: Add multi block allocator for ext4
ext4: Add new functions for searching extent tree
ext4: Add ext4_find_next_bit()
ext4: fix up EXT4FS_DEBUG builds
ext4: Fix ext4_show_options to show the correct mount options.
ext4: Add EXT4_IOC_MIGRATE ioctl
ext4: Add inode version support in ext4
vfs: Add 64 bit i_version support
ext4: Add the journal checksum feature
jbd2: jbd2 stats through procfs
ext4: Take read lock during overwrite case.
...
Get rid of sparse related warnings from places that use integer as NULL
pointer. (Ported from upstream ext3/jbd changes.)
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
While "every 5 seconds" doesn't sound as a problem, there can be many
of these (and these timers do add up over all the kernel). The "5
second" wakeup isn't really timing sensitive; in addition even with
rounding it'll still happen every 5 seconds (with the exception of the
very first time, which is likely to be rounded up to somewhere closer
to 6 seconds)
(Ported from similar JBD patch made by Arjan van de Ven to
fs/jbd/transaction.c)
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch marks slab allocations by jbd2 as short-lived in support of
Mel Gorman's "Group short-lived and reclaimable kernel allocations"
patch. (Ported from similar changes made to fs/jbd/journal.c and
fs/jbd/revoke.c in Mel's patch.)
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext4 uses the high bit of the extent length to encode whether the extent
is intialized or not. The helper function ext4_ext_get_actual_len should
be used to get the actual length of the extent.
This addresses the kernel bug documented here:
http://bugzilla.kernel.org/show_bug.cgi?id=9732
kernel BUG at fs/ext4/extents.c:1056!
....
Call Trace:
[<ffffffff88366073>] :ext4dev:ext4_ext_get_blocks+0x5ba/0x8c1
[<ffffffff81053c91>] lock_release_holdtime+0x27/0x49
[<ffffffff812748f6>] _spin_unlock+0x17/0x20
[<ffffffff883400a6>] :jbd2:start_this_handle+0x4e0/0x4fe
[<ffffffff88366564>] :ext4dev:ext4_fallocate+0x175/0x39a
[<ffffffff81053c91>] lock_release_holdtime+0x27/0x49
[<ffffffff81056480>] __lock_acquire+0x4e7/0xc4d
[<ffffffff81053c91>] lock_release_holdtime+0x27/0x49
[<ffffffff810a8de7>] sys_fallocate+0xe4/0x10d
[<ffffffff8100c043>] tracesys+0xd5/0xda
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Fix bug reported by Dmitry Monakhov caused by lost error code
Testcase:
blksize = 0x1000;
fd = open(argv[1], O_RDWR|O_CREAT, 0700);
unsigned long long sz = 0x10000000UL;
/* allocating big blocks chunk */
syscall(__NR_fallocate, fd, 0, 0UL, sz)
/* grab all other available filesystem space */
tfd = open("tmp", O_RDWR|O_CREAT|O_DIRECT, 0700);
while( write(tfd, buf, 4096) > 0); /* loop untill ENOSPC */
fsync(fd); /* just in case */
while (pos < sz) {
/* each seek+ write operation result in splits uninitialized extent
in three extents. Splitting may result in new extent allocation
which probably will fail because of ENOSPC*/
lseek(fd, blksize*2 -1, SEEK_CUR);
if ((ret = write(fd, 'a', 1)) != 1)
exit(1);
pos += blksize * 2;
}
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
sb_set_blocksize validates whether the specfied block size can be used by
the file system. Make sure we fail mounting the file system if the
blocksize specfied cannot be used.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Enable the multiblock allocator by default.
Fix ext4_show_options() so if it is not enabled, the nomballoc option
included in /proc/mounts.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add the functions ext4_ext_search_left() and ext4_ext_search_right(),
which are used by mballoc during ext4_ext_get_blocks to decided whether
to merge extent information.
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Johann Lombardi <johann@clusterfs.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail
without these changes. Clean up some format warnings too.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
We need to look at the default value and make sure
the mount options are not set via default value
before showing them via ext4_show_options
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>