Apparently the patch "NFS: Always use the same SETCLIENTID boot verifier"
is tickling a Linux nfs server bug, and causing a regression: the server
can get into a situation where it keeps replying NFS4ERR_SEQ_MISORDERED
to our CREATE_SESSION request even when we are sending the correct
sequence ID.
Fix this by purging the lease and then retrying.
Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Try to consolidate the error handling for nfs4_reclaim_lease into
a single function instead of doing a bit here, and a bit there...
Also ensure that NFS4CLNT_PURGE_STATE handles errors correctly.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pull ext2, ext3 and quota fixes from Jan Kara:
"Interesting bits are:
- removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since
quota code does not need i_mutex anymore in any unusual way.
- backport (from ext4) of a fix of a checkpointing bug (missing cache
flush) that could lead to fs corruption on power failure
The rest are just random small fixes & cleanups."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: trivial fix to comment for ext2_free_blocks
ext2: remove the redundant comment for ext2_export_ops
ext3: return 32/64-bit dir name hash according to usage type
quota: Get rid of nested I_MUTEX_QUOTA locking subclass
quota: Use precomputed value of sb_dqopt in dquot_quota_sync
ext2: Remove i_mutex use from ext2_quota_write()
reiserfs: Remove i_mutex use from reiserfs_quota_write()
ext4: Remove i_mutex use from ext4_quota_write()
ext3: Remove i_mutex use from ext3_quota_write()
quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG
jbd: Write journal superblock with WRITE_FUA after checkpointing
jbd: protect all log tail updates with j_checkpoint_mutex
jbd: Split updating of journal superblock and marking journal empty
ext2: do not register write_super within VFS
ext2: Remove s_dirt handling
ext2: write superblock only once on unmount
ext3: update documentation with barrier=1 default
ext3: remove max_debt in find_group_orlov()
jbd: Refine commit writeout logic
* Extend the APIC ops implementation and add IRQ_WORKER vector support so that 'perf' can work properly.
* Fix self-ballooning code, and balloon logic when booting as initial domain.
* Move array printing code to generic debugfs
* Support XenBus domains.
* Lazily free grants when a domain is dead/non-existent.
* In M2P code use batching calls
Bug-fixes:
* Fix NULL dereference in allocation failure path (hvc_xen)
* Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug
* Fix HVM guest resume - we would leak an PIRQ value instead of reusing the existing one.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJPu9MpAAoJEFjIrFwIi8fJaNQH/RylThiO+O+LBpPrO8VRUw+2
/Io98T7ZK2ggoUeaJx0C8irM0JMFAkxGMcfX3w9fwNt/BTec4s++4JhbN1jYN0da
6a0PqINo+M8y73So6CBfuJDCunaRLGKVG/ibIO3Y3WAff51/H+DMvO7uYYDAE0aA
mikyOxnaty0DiG5i4JGDHGmCzDASfK/jgGccZ03m6522mDx5ZIbTzZWONLfz8dqT
rbxnn9vrNLgEYWuzyLMwW0GymToUtt01xBQvwJLAbhn8lr1WBRBLpxXA+5iYNQrn
Ri25G7keYJhG4uwZfaHnR+4HTrmhlGzK1Z96dkqpGUaeIcdyWmPMp22VtBBiwG8=
=uyRr
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen updates from Konrad Rzeszutek Wilk:
"Features:
* Extend the APIC ops implementation and add IRQ_WORKER vector
support so that 'perf' can work properly.
* Fix self-ballooning code, and balloon logic when booting as initial
domain.
* Move array printing code to generic debugfs
* Support XenBus domains.
* Lazily free grants when a domain is dead/non-existent.
* In M2P code use batching calls
Bug-fixes:
* Fix NULL dereference in allocation failure path (hvc_xen)
* Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug
* Fix HVM guest resume - we would leak an PIRQ value instead of
reusing the existing one."
Fix up add-add onflicts in arch/x86/xen/enlighten.c due to addition of
apic ipi interface next to the new apic_id functions.
* tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: do not map the same GSI twice in PVHVM guests.
hvc_xen: NULL dereference on allocation failure
xen: Add selfballoning memory reservation tunable.
xenbus: Add support for xenbus backend in stub domain
xen/smp: unbind irqworkX when unplugging vCPUs.
xen: enter/exit lazy_mmu_mode around m2p_override calls
xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep
xen: implement IRQ_WORK_VECTOR handler
xen: implement apic ipi interface
xen/setup: update VA mapping when releasing memory during setup
xen/setup: Combine the two hypercall functions - since they are quite similar.
xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
xen/gnttab: add deferred freeing logic
debugfs: Add support to print u32 array in debugfs
xen/p2m: An early bootup variant of set_phys_to_machine
xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
xen/p2m: Move code around to allow for better re-usage.
Pull sparc changes from David S. Miller:
"This has the generic strncpy_from_user() implementation architectures
can now use, which we've been developing on linux-arch over the past
few days.
For good measure I ran both a 32-bit and a 64-bit glibc testsuite run,
and the latter of which pointed out an adjustment I needed to make to
sparc's user_addr_max() definition. Linus, you were right, STACK_TOP
was not the right thing to use, even on sparc itself :-)
From Sam Ravnborg, we have a conversion of sparc32 over to the common
alloc_thread_info_node(), since the aspect which originally blocked
our doing so (sun4c) has been removed."
Fix up trivial arch/sparc/Kconfig and lib/Makefile conflicts.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Fix user_addr_max() definition.
lib: Sparc's strncpy_from_user is generic enough, move under lib/
kernel: Move REPEAT_BYTE definition into linux/kernel.h
sparc: Increase portability of strncpy_from_user() implementation.
sparc: Optimize strncpy_from_user() zero byte search.
sparc: Add full proper error handling to strncpy_from_user().
sparc32: use the common implementation of alloc_thread_info_node()
Pull XFS update from Ben Myers:
- Removal of xfsbufd
- Background CIL flushes have been moved to a workqueue.
- Fix to xfs_check_page_type applicable to filesystems where
blocksize < page size
- Fix for stale data exposure when extsize hints are used.
- A series of xfs_buf cache cleanups.
- Fix for XFS_IOC_ALLOCSP
- Cleanups for includes and removal of xfs_lrw.[ch].
- Moved all busy extent handling to it's own file so that it is easier
to merge with userspace.
- Fix for log mount failure.
- Fix to enable inode reclaim during quotacheck at mount time.
- Fix for delalloc quota accounting.
- Fix for memory reclaim deadlock on agi buffer.
- Fixes for failed writes and to clean up stale delalloc blocks.
- Fix to use GFP_NOFS in blkdev_issue_flush
- SEEK_DATA/SEEK_HOLE support
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (57 commits)
xfs: add trace points for log forces
xfs: fix memory reclaim deadlock on agi buffer
xfs: fix delalloc quota accounting on failure
xfs: protect xfs_sync_worker with s_umount semaphore
xfs: introduce SEEK_DATA/SEEK_HOLE support
xfs: make xfs_extent_busy_trim not static
xfs: make XBF_MAPPED the default behaviour
xfs: flush outstanding buffers on log mount failure
xfs: Properly exclude IO type flags from buffer flags
xfs: clean up xfs_bit.h includes
xfs: move xfs_do_force_shutdown() and kill xfs_rw.c
xfs: move xfs_get_extsz_hint() and kill xfs_rw.h
xfs: move xfs_fsb_to_db to xfs_bmap.h
xfs: clean up busy extent naming
xfs: move busy extent handling to it's own file
xfs: move xfsagino_t to xfs_types.h
xfs: use iolock on XFS_IOC_ALLOCSP calls
xfs: kill XBF_DONTBLOCK
xfs: kill xfs_read_buf()
xfs: kill XBF_LOCK
...
Exchange ID can be called in a lease reclaim situation, so it
will deadlock if it then tries to write out dirty NFS pages.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The state manager can handle SEQ4_STATUS_CB_PATH_DOWN* flags with a
BIND_CONN_TO_SESSION instead of destroying the session and creating a new one.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch adds the BIND_CONN_TO_SESSION operation which is needed for
upcoming SP4_MACH_CRED work and useful for recovering from broken connections
without destroying the session.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Keep track of the number of bytes read or written via buffered, direct, and
mem-mapped i/o for use by mdsthreshold size_io hints.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We only support one layout type per file system, so one threshold_item4 per
mdsthreshold4.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pull more networking updates from David Miller:
"Ok, everything from here on out will be bug fixes."
1) One final sync of wireless and bluetooth stuff from John Linville.
These changes have all been in his tree for more than a week, and
therefore have had the necessary -next exposure. John was just away
on a trip and didn't have a change to send the pull request until a
day or two ago.
2) Put back some defines in user exposed header file areas that were
removed during the tokenring purge. From Stephen Hemminger and Paul
Gortmaker.
3) A bug fix for UDP hash table allocation got lost in the pile due to
one of those "you got it.. no I've got it.." situations. :-)
From Tim Bird.
4) SKB coalescing in TCP needs to have stricter checks, otherwise we'll
try to coalesce overlapping frags and crash. Fix from Eric Dumazet.
5) RCU routing table lookups can race with free_fib_info(), causing
crashes when we deref the device pointers in the route. Fix by
releasing the net device in the RCU callback. From Yanmin Zhang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (293 commits)
tcp: take care of overlaps in tcp_try_coalesce()
ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
mm: add a low limit to alloc_large_system_hash
ipx: restore token ring define to include/linux/ipx.h
if: restore token ring ARP type to header
xen: do not disable netfront in dom0
phy/micrel: Fix ID of KSZ9021
mISDN: Add X-Tensions USB ISDN TA XC-525
gianfar:don't add FCB length to hard_header_len
Bluetooth: Report proper error number in disconnection
Bluetooth: Create flags for bt_sk()
Bluetooth: report the right security level in getsockopt
Bluetooth: Lock the L2CAP channel when sending
Bluetooth: Restore locking semantics when looking up L2CAP channels
Bluetooth: Fix a redundant and problematic incoming MTU check
Bluetooth: Add support for Foxconn/Hon Hai AR5BBU22 0489:E03C
Bluetooth: Fix EIR data generation for mgmt_device_found
Bluetooth: Fix Inquiry with RSSI event mask
Bluetooth: improve readability of l2cap_seq_list code
Bluetooth: Fix skb length calculation
...
UDP stack needs a minimum hash size value for proper operation and also
uses alloc_large_system_hash() for proper NUMA distribution of its hash
tables and automatic sizing depending on available system memory.
On some low memory situations, udp_table_init() must ignore the
alloc_large_system_hash() result and reallocs a bigger memory area.
As we cannot easily free old hash table, we leak it and kmemleak can
issue a warning.
This patch adds a low limit parameter to alloc_large_system_hash() to
solve this problem.
We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table
allocation.
Reported-by: Mark Asselstine <mark.asselstine@windriver.com>
Reported-by: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull user namespace enhancements from Eric Biederman:
"This is a course correction for the user namespace, so that we can
reach an inexpensive, maintainable, and reasonably complete
implementation.
Highlights:
- Config guards make it impossible to enable the user namespace and
code that has not been converted to be user namespace safe.
- Use of the new kuid_t type ensures the if you somehow get past the
config guards the kernel will encounter type errors if you enable
user namespaces and attempt to compile in code whose permission
checks have not been updated to be user namespace safe.
- All uids from child user namespaces are mapped into the initial
user namespace before they are processed. Removing the need to add
an additional check to see if the user namespace of the compared
uids remains the same.
- With the user namespaces compiled out the performance is as good or
better than it is today.
- For most operations absolutely nothing changes performance or
operationally with the user namespace enabled.
- The worst case performance I could come up with was timing 1
billion cache cold stat operations with the user namespace code
enabled. This went from 156s to 164s on my laptop (or 156ns to
164ns per stat operation).
- (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
Most uid/gid setting system calls treat these value specially
anyway so attempting to use -1 as a uid would likely cause
entertaining failures in userspace.
- If setuid is called with a uid that can not be mapped setuid fails.
I have looked at sendmail, login, ssh and every other program I
could think of that would call setuid and they all check for and
handle the case where setuid fails.
- If stat or a similar system call is called from a context in which
we can not map a uid we lie and return overflowuid. The LFS
experience suggests not lying and returning an error code might be
better, but the historical precedent with uids is different and I
can not think of anything that would break by lying about a uid we
can't map.
- Capabilities are localized to the current user namespace making it
safe to give the initial user in a user namespace all capabilities.
My git tree covers all of the modifications needed to convert the core
kernel and enough changes to make a system bootable to runlevel 1."
Fix up trivial conflicts due to nearby independent changes in fs/stat.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
userns: Silence silly gcc warning.
cred: use correct cred accessor with regards to rcu read lock
userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
userns: Convert cgroup permission checks to use uid_eq
userns: Convert tmpfs to use kuid and kgid where appropriate
userns: Convert sysfs to use kgid/kuid where appropriate
userns: Convert sysctl permission checks to use kuid and kgids.
userns: Convert proc to use kuid/kgid where appropriate
userns: Convert ext4 to user kuid/kgid where appropriate
userns: Convert ext3 to use kuid/kgid where appropriate
userns: Convert ext2 to use kuid/kgid where appropriate.
userns: Convert devpts to use kuid/kgid where appropriate
userns: Convert binary formats to use kuid/kgid where appropriate
userns: Add negative depends on entries to avoid building code that is userns unsafe
userns: signal remove unnecessary map_cred_ns
userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
userns: Convert stat to return values mapped from kuids and kgids
userns: Convert user specfied uids and gids in chown into kuids and kgid
userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
...
* Implementation of opportunistic suspend (autosleep) and user space interface
for manipulating wakeup sources.
* Hibernate updates from Bojan Smojver and Minho Ban.
* Updates of the runtime PM core and generic PM domains framework related to
PM QoS.
* Assorted fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIcBAABAgAGBQJPu+jwAAoJEKhOf7ml8uNsOw0P/0w1FqXD64a1laE43JIlBe9w
yHEcLHc9MXN+8lS0XQ6jFiL/VC3U5Sj7Ro+DFKcL2MWX//dfDcZcwA9ep/qh4tHV
tJ987IijdWqJV14pde3xQafhp/9i12rArLxns7S5fzkdfVk0iDjhZZaZy4afFJYM
SuCsDhCwWefZh89+oLikByiFPnhW+f2ZC9YQeokBM/XvZLtxmOiVfL6duloT/Cr+
58jkrJ8xz/5kmmN4bXM4Wlpf9ZIYFXbvtbKrq3GZOXc+LpNKlWQyFgg/pIuxBewC
uSgsNXXV0LFDi5JfER/8l9MMLtJwwc4VHzpLvMnRv+GtwO2/FKIIr9Fcv000IL2N
0/Ppr52M7XpRruM/k+YroUQ4F1oBX6HB4e3rwqC+XG6n5bwn/Jc7kdy7aUojqNLG
Nlr5f0vBjLTSF66Jnel71Bn+gbA1ogER7E+esSTMpyX+RgGJAUVt5oX9IjbXl3PI
bk8xW1csSRxBI2NkFOd9EM3vMzdGc5uu+iOoy7iBvcAK0AEfo2Ml9YuSVFQeqAu0
A96MUW155A+GKMC7I/LK8pTgMvYDedWhVW9uyXpMRjwdFC5/ywZU1aM00tL9HMpG
pzHOFJgsYrf/6VCV8BwqgudRYd0K5EPSGeITCg973os/XzJIOCfJuy+Pn5V/F0ew
lTbi8ipQD0Hh8A/Xt0QB
=Q2vo
-----END PGP SIGNATURE-----
Merge tag 'pm-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
- Implementation of opportunistic suspend (autosleep) and user space
interface for manipulating wakeup sources.
- Hibernate updates from Bojan Smojver and Minho Ban.
- Updates of the runtime PM core and generic PM domains framework
related to PM QoS.
- Assorted fixes.
* tag 'pm-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
epoll: Fix user space breakage related to EPOLLWAKEUP
PM / Domains: Make it possible to add devices to inactive domains
PM / Hibernate: Use get_gendisk to verify partition if resume_file is integer format
PM / Domains: Fix computation of maximum domain off time
PM / Domains: Fix link checking when add subdomain
PM / Sleep: User space wakeup sources garbage collector Kconfig option
PM / Sleep: Make the limit of user space wakeup sources configurable
PM / Documentation: suspend-and-cpuhotplug.txt: Fix typo
PM / Domains: Cache device stop and domain power off governor results, v3
PM / Domains: Make device removal more straightforward
PM / Sleep: Fix a mistake in a conditional in autosleep_store()
epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready
PM / QoS: Create device constraints objects on notifier registration
PM / Runtime: Remove device fields related to suspend time, v2
PM / Domains: Rework default domain power off governor function, v2
PM / Domains: Rework default device stop governor function, v2
PM / Sleep: Add user space interface for manipulating wakeup sources, v3
PM / Sleep: Add "prevent autosleep time" statistics to wakeup sources
PM / Sleep: Implement opportunistic sleep, v2
PM / Sleep: Add wakeup_source_activate and wakeup_source_deactivate tracepoints
...
Ensure that a process that uses the nfs_client->cl_cons_state test
for whether the initialisation process is finished does not read
stale data.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since the struct nfs_client gets added to the global nfs_client_list
before it is initialised, it is possible that rpc_pipefs_event can
end up trying to create idmapper entries on such a thing.
The solution is to have the mount notification wait for the
initialisation of each nfs_client to complete, and then to
skip any entries for which the it failed.
Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Session initialisation is not complete until the lease manager
has run. We need to ensure that both nfs4_init_session and
nfs4_init_ds_session do so, and that they check for any resulting
errors in clp->cl_cons_state.
Only after this is done, can nfs4_ds_connect check the contents
of clp->cl_exchange_flags.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
Pull fpu state cleanups from Ingo Molnar:
"This tree streamlines further aspects of FPU handling by eliminating
the prepare_to_copy() complication and moving that logic to
arch_dup_task_struct().
It also fixes the FPU dumps in threaded core dumps, removes and old
(and now invalid) assumption plus micro-optimizes the exit path by
avoiding an FPU save for dead tasks."
Fixed up trivial add-add conflict in arch/sh/kernel/process.c that came
in because we now do the FPU handling in arch_dup_task_struct() rather
than the legacy (and now gone) prepare_to_copy().
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, fpu: drop the fpu state during thread exit
x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
coredump: ensure the fpu state is flushed for proper multi-threaded core dump
fork: move the real prepare_to_copy() users to arch_dup_task_struct()
While traversing the linked list of open file handles, if the identfied
file handle is invalid, a reopen is attempted and if it fails, we
resume traversing where we stopped and cifs can oops while accessing
invalid next element, for list might have changed.
So mark the invalid file handle and attempt reopen if no
valid file handle is found in rest of the list.
If reopen fails, move the invalid file handle to the end of the list
and start traversing the list again from the begining.
Repeat this four times before giving up and returning an error if
file reopen keeps failing.
Cc: <stable@vger.kernel.org>
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
For more details see <file: Documentation/filesystems/porting>.
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
As with Linux nfs client, which uses "nfsvers=" or "vers=" to
indicate which protocol to use for mount, specifying
"vers=2.1"
will force an SMB2 mount. When vers is not specified CIFS is used
"vers=1"
We can eventually autonegotiate down from SMB2 to CIFS
when SMB2 is stable enough to make it the default, but this
is for the future. At that time we could also implement a
"maxprotocol" mount option as smbclient and Samba have today,
but that would be premature until SMB2 is stable.
Intially the SMB2 Kconfig option will depend on "BROKEN"
until the merge is complete, and then be "EXPERIMENTAL"
When it is no longer experimental we can consider changing
the default protocol to attempt first.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
SMB2 is the followon to the CIFS (and SMB) protocols
and the default for Windows since Windows Vista, and also
now implemented by various non-Windows servers. SMB2
is more secure, has various performance advantages, including
larger i/o sizes, flow control, better caching model and more.
SMB2 also resolves some scalability limits in the CIFS
protocol and adds many new features while being much
simpler (only a few dozen commands instead of hundreds)
and since the protocol is clearer it is also more consistently
implemented across servers and thus easier to optimize.
After much discussion with Jeff Layton, Jeremy Allison
and others at Connectathon, we decided to move the SMB2
code from a distinct .ko and fstype into distinct
C files that optionally build in cifs.ko. As a result
the Kconfig gets simpler.
To avoid destabilizing CIFS, the SMB2 code is going
to be moved into its own experimental CONFIG_CIFS_SMB2 ifdef
as it is merged and rereviewed. The changes to stable
CIFS (builds with the SMB2 ifdef off) are expected to be
fairly small.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This set includes some minor fixes and improvements.
The one large patch addresses the special "nodir" mode,
which has been a long neglected proof of concept, but
with these fixes seems to be quite usable. It allows
the resource master to be assigned statically instead of
dynamically, which can improve performance if there is
little locality and most resources are shared.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJPu/MlAAoJEDgbc8f8gGmq860P/0o+tYG2pAUz87WnKg92cGwm
ajaI78ydY6qOjndcEjbgdX6uWqVQ7f/OKo3drzVH8KFQ67eiaXC4wv2xTL3aymbX
2Ua55oiVsW+k9d9yK5Dzfa4qAlR5QPV1WEAnoVkiEDNoiGCGecjmVebhK1/Sb5Lu
1gaIJ3C+3L1ngfAzpfeB+7LwuVB36UlIyBrvPOj6yWiSDgpPaVbTrEU0NaDDDDIi
oo7tTiqivCZf/GH+ZcIjPE/LBen/lVqXSDU2YShiac/ErRfpRk9rnDFIUeN2nYPd
JwPjzutFWM+N6HIA2RCBXKo7FkK2rvYXw84/RVMvA4goEH/Qu8yDtBww20BmvFYY
3guU1udka0/NR7/ap98Btdqsvqco6R2X/rpzx8y1eD1jzUvb6El6yg3PM1Qvd8zQ
72aVzcdgAI4qtEAVziy5X4omNeQ6a55sUYXlCcvkiwZJQdPzkDuzntC28q3bgJva
QD0ugX7ltBpHuZZZb2tbBN9hfMqyo7gneaY2OoGVCTb1U9ibb5JgfZOswTC2gQsE
17vykdL5owQ8bbBj2tkRQiJ8dZoxn23hV+sZrvLm3TR8xF4oJtDqUdRs9K7iX8It
YxTTCL1LmxHRFG/0Cy2l7VhoqkIKsoVFdavW7pivFNkzp/yQNHk4r2iJWhR9YArV
qaE2HqIxJsev/B/lBPyo
=mHOh
-----END PGP SIGNATURE-----
Merge tag 'dlm-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This set includes some minor fixes and improvements. The one large
patch addresses the special "nodir" mode, which has been a long
neglected proof of concept, but with these fixes seems to be quite
usable. It allows the resource master to be assigned statically
instead of dynamically, which can improve performance if there is
little locality and most resources are shared."
* tag 'dlm-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: NULL dereference on failure in kmem_cache_create()
gfs2: fix recovery during unmount
dlm: fixes for nodir mode
dlm: improve error and debug messages
dlm: avoid unnecessary search in search_rsb
dlm: limit rcom debug messages
dlm: fix waiter recovery
dlm: prevent connections during shutdown
* Always support xattrs (remove the Kconfig option)
* Always support debugging (remove the Kconfig option)
* A fix for a memory leak on error path
* A number of clean-ups
UBI:
* Always support debugging (remove the Kconfig option)
* Remove "data type" hint support
* Huge amount of renames to prepare for the fastmap wor
* A lot of clean-ups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJPuzTxAAoJECmIfjd9wqK0D/AP/iNOnYWnYZmmO18jDM48kKt/
Jp7VTAE0l7DBUDxtiIthq4c7YxIE1o1bN9gMmvzZibvwIrZcoAOnQpeL96s1Bc9J
t0aGm8ONvrtuyFeyxPC0aplWgqWQ49qDLGV/lIVJ+BSGmXMeU4giUIXqbsjyCPR4
YVJJw6rLTC10EhuAUs99keJxxuN5ZMrCB8y47fD+bkalVxgqNh9JNkKabyjevt5C
AERVWnP20hnEcwnbQWMHueGWiaqFeesTytNOy6heRi0uL3bNy5nrol7AFXKqnDc9
OpSkApH6SCO3C8X/bIep2bL9kKiW1LpClxgDIF6p7lj2t2ToPn6PZJbP60zSHQPb
0bgy1SzHccF3ihIMgCdOXYZ5EomBgKZyDyU6Ec+gAttE00ZbIigNmjFmukwMhO89
I0bGvjQdKFAFSzo+ffm8xNfYjmmNfB+edLkPaVttjMWAbQ4V831ZPDT07Q11W4TQ
2p2NDKTps3etbtkemZ/Cm1jeEWI3KuogrFhyDhpcgXc7pxlJbvMg+tt22FusoQ8T
VPGGT+WhmXfF0ZG/gurI69k8opj4BUhm4EfGL6pGEoUMe1nGp2pSUNv5Kwby1wau
1wElJt2qO9xdjJ4QlLc+Ux1vm8rCS1iQst9plUX1BZt2bKja7tZaW7uu4hGKqe5u
UwrosuYcmS1Ei1Rs7Sqz
=+6Qi
-----END PGP SIGNATURE-----
Merge tag 'upstream-3.5-rc1' of git://git.infradead.org/linux-ubifs
Pull UBI and UBIFS updates from Artem Bityutskiy:
UBIFS:
* Always support xattrs (remove the Kconfig option)
* Always support debugging (remove the Kconfig option)
* A fix for a memory leak on error path
* A number of clean-ups
UBI:
* Always support debugging (remove the Kconfig option)
* Remove "data type" hint support
* Huge amount of renames to prepare for the fastmap wor
* A lot of clean-ups
* tag 'upstream-3.5-rc1' of git://git.infradead.org/linux-ubifs: (54 commits)
UBI: modify ubi_wl_flush function to clear work queue for a lnum
UBI: introduce UBI_ALL constant
UBI: add lnum and vol_id to struct ubi_work
UBI: add volume id struct ubi_ainf_peb
UBI: add in hex the value for UBI_INTERNAL_VOL_START to comment
UBI: rename scan.c to attach.c
UBI: remove scan.h
UBI: rename UBI_SCAN_UNKNOWN_EC
UBI: move and rename attach_by_scanning
UBI: rename _init_scan functions
UBI: amend comments after all the renamings
UBI: rename ubi_scan_leb_slab
UBI: rename ubi_scan_move_to_list
UBI: rename ubi_scan_destroy_ai
UBI: rename ubi_scan_get_free_peb
UBI: rename ubi_scan_rm_volume
UBI: rename ubi_scan_find_av
UBI: rename ubi_scan_add_used
UBI: remove unused function
UBI: make ubi_scan_erase_peb static and rename
...
Pull trivial updates from Jiri Kosina:
"As usual, it's mostly typo fixes, redundant code elimination and some
documentation updates."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits)
edac, mips: don't change code that has been removed in edac/mips tree
xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer
lib: Change mail address of Oskar Schirmer
net: Change mail address of Oskar Schirmer
arm/m68k: Change mail address of Sebastian Hess
i2c: Change mail address of Oskar Schirmer
net: Fix tcp_build_and_update_options comment in struct tcp_sock
atomic64_32.h: fix parameter naming mismatch
Kconfig: replace "--- help ---" with "---help---"
c2port: fix bogus Kconfig "default no"
edac: Fix spelling errors.
qla1280: Remove redundant NULL check before release_firmware() call
remoteproc: remove redundant NULL check before release_firmware()
qla2xxx: Remove redundant NULL check before release_firmware() call.
aic94xx: Get rid of redundant NULL check before release_firmware() call
tehuti: delete redundant NULL check before release_firmware()
qlogic: get rid of a redundant test for NULL before call to release_firmware()
bna: remove redundant NULL test before release_firmware()
tg3: remove redundant NULL test before release_firmware() call
typhoon: get rid of redundant conditional before all to release_firmware()
...
Here is the big staging tree pull request for the 3.5-rc1 merge window.
Loads of changes here, and we just narrowly added more lines than we
added:
622 files changed, 28356 insertions(+), 26059 deletions(-)
But, good news is that there is a number of subsystems that moved out of
the staging tree, to their respective "real" portions of the kernel.
Code that moved out was:
- iio core code
- mei driver
- vme core and bridge drivers
There was one broken network driver that moved into staging as a step
before it is removed from the tree (pc300), and there was a few new
drivers added to the tree:
- new iio drivers
- gdm72xx wimax USB driver
- ipack subsystem and 2 drivers
All of the movements around have acks from the various subsystem
maintainers, and all of this has been in the linux-next tree for a
while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAk+7q8MACgkQMUfUDdst+ymjogCguo8fANFVlPWeZGeoBTL+aQfQ
yTkAoLE0codmh+2SvhulYgyU1Wh6ZDK2
=nJ2F
-----END PGP SIGNATURE-----
Merge tag 'staging-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging tree changes from Greg Kroah-Hartman:
"Here is the big staging tree pull request for the 3.5-rc1 merge
window.
Loads of changes here, and we just narrowly added more lines than we
added:
622 files changed, 28356 insertions(+), 26059 deletions(-)
But, good news is that there is a number of subsystems that moved out
of the staging tree, to their respective "real" portions of the
kernel.
Code that moved out was:
- iio core code
- mei driver
- vme core and bridge drivers
There was one broken network driver that moved into staging as a step
before it is removed from the tree (pc300), and there was a few new
drivers added to the tree:
- new iio drivers
- gdm72xx wimax USB driver
- ipack subsystem and 2 drivers
All of the movements around have acks from the various subsystem
maintainers, and all of this has been in the linux-next tree for a
while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
Fixed up various trivial conflicts, along with a non-trivial one found
in -next and pointed out by Olof Johanssen: a clean - but incorrect -
merge of the arch/arm/boot/dts/at91sam9g20.dtsi file. Fix up manually
as per Stephen Rothwell.
* tag 'staging-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (536 commits)
Staging: bcm: Remove two unused variables from Adapter.h
Staging: bcm: Removes the volatile type definition from Adapter.h
Staging: bcm: Rename all "INT" to "int" in Adapter.h
Staging: bcm: Fix warning: __packed vs. __attribute__((packed)) in Adapter.h
Staging: bcm: Correctly format all comments in Adapter.h
Staging: bcm: Fix all whitespace issues in Adapter.h
Staging: bcm: Properly format braces in Adapter.h
Staging: ipack/bridges/tpci200: remove unneeded casts
Staging: ipack/bridges/tpci200: remove TPCI200_SHORTNAME constant
Staging: ipack: remove board_name and bus_name fields from struct ipack_device
Staging: ipack: improve the register of a bus and a device in the bus.
staging: comedi: cleanup all the comedi_driver 'detach' functions
staging: comedi: remove all 'default N' in Kconfig
staging: line6/config.h: Delete unused header
staging: gdm72xx depends on NET
staging: gdm72xx: Set up parent link in sysfs for gdm72xx devices
staging: drm/omap: initial dmabuf/prime import support
staging: drm/omap: dmabuf/prime mmap support
pstore/ram: Add ECC support
pstore/ram: Switch to persistent_ram routines
...
Here's the driver core, and other driver subsystems, pull request for
the 3.5-rc1 merge window.
Outside of a few minor driver core changes, we ended up with the
following different subsystem and core changes as well, due to
interdependancies on the driver core:
- hyperv driver updates
- drivers/memory being created and some drivers moved into it
- extcon driver subsystem created out of the old Android staging switch
driver code
- dynamic debug updates
- printk rework, and /dev/kmsg changes
All of this has been tested in the linux-next releases for a few weeks
with no reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAk+7q28ACgkQMUfUDdst+ykXmwCfcPASzC+/bDkuqdWsqzxlWZ7+
VOQAnAriySv397St36J6Hz5bMQZwB1Yq
=SQc+
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg Kroah-Hartman:
"Here's the driver core, and other driver subsystems, pull request for
the 3.5-rc1 merge window.
Outside of a few minor driver core changes, we ended up with the
following different subsystem and core changes as well, due to
interdependancies on the driver core:
- hyperv driver updates
- drivers/memory being created and some drivers moved into it
- extcon driver subsystem created out of the old Android staging
switch driver code
- dynamic debug updates
- printk rework, and /dev/kmsg changes
All of this has been tested in the linux-next releases for a few weeks
with no reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
Fix up conflicts in drivers/extcon/extcon-max8997.c where git noticed
that a patch to the deleted drivers/misc/max8997-muic.c driver needs to
be applied to this one.
* tag 'driver-core-3.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (90 commits)
uio_pdrv_genirq: get irq through platform resource if not set otherwise
memory: tegra{20,30}-mc: Remove empty *_remove()
printk() - isolate KERN_CONT users from ordinary complete lines
sysfs: get rid of some lockdep false positives
Drivers: hv: util: Properly handle version negotiations.
Drivers: hv: Get rid of an unnecessary check in vmbus_prep_negotiate_resp()
memory: tegra{20,30}-mc: Use dev_err_ratelimited()
driver core: Add dev_*_ratelimited() family
Driver Core: don't oops with unregistered driver in driver_find_device()
printk() - restore prefix/timestamp printing for multi-newline strings
printk: add stub for prepend_timestamp()
ARM: tegra30: Make MC optional in Kconfig
ARM: tegra20: Make MC optional in Kconfig
ARM: tegra30: MC: Remove unnecessary BUG*()
ARM: tegra20: MC: Remove unnecessary BUG*()
printk: correctly align __log_buf
ARM: tegra30: Add Tegra Memory Controller(MC) driver
ARM: tegra20: Add Tegra Memory Controller(MC) driver
printk() - restore timestamp printing at console output
printk() - do not merge continuation lines of different threads
...
Save the server major and minor ID results from EXCHANGE_ID, as they
are needed for detecting server trunking.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
"noresvport" and "discrtry" can be passed to nfs_create_rpc_client()
by setting flags in the passed-in nfs_client. This change makes it
easy to add new flags.
Note that these settings are now "sticky" over the lifetime of a
struct nfs_client, and may even be copied when an nfs_client is
cloned.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Continue to rationalize the locking in nfs_get_client() by
moving the logic that handles the case where a matching server IP
address is not found.
When we support server trunking detection, client initialization may
return a different nfs_client struct than was passed to it. Change
the synopsis of the init_client methods to return an nfs_client.
The client initialization logic in nfs_get_client() is not much more
than a wrapper around ->init_client. It's simpler to keep the little
bits of error handling in the version-specific init_client methods.
No behavior change is expected.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Code that takes and releases nfs_client_lock remains in
nfs_get_client(). Logic that handles a pre-existing nfs_client is
moved to a separate function.
No behavior change is expected.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently our NFS client assigns a unique SETCLIENTID boot verifier
for each server IP address it knows about. It's set to CURRENT_TIME
when the struct nfs_client for that server IP is created.
During the SETCLIENTID operation, our client also presents an
nfs_client_id4 string to servers, as an identifier on which the server
can hang all of this client's NFSv4 state. Our client's
nfs_client_id4 string is unique for each server IP address.
An NFSv4 server is obligated to wipe all NFSv4 state associated with
an nfs_client_id4 string when the client presents the same
nfs_client_id4 string along with a changed SETCLIENTID boot verifier.
When our client unmounts the last of a server's shares, it destroys
that server's struct nfs_client. The next time the client mounts that
NFS server, it creates a fresh struct nfs_client with a fresh boot
verifier. On seeing the fresh verifer, the server wipes any previous
NFSv4 state associated with that nfs_client_id4.
However, NFSv4.1 clients are supposed to present the same
nfs_client_id4 string to all servers. And, to support Transparent
State Migration, the same nfs_client_id4 string should be presented
to all NFSv4.0 servers so they recognize that migrated state for this
client belongs with state a server may already have for this client.
(This is known as the Uniform Client String model).
If the nfs_client_id4 string is the same but the boot verifier changes
for each server IP address, SETCLIENTID and EXCHANGE_ID operations
from such a client could unintentionally result in a server wiping a
client's previously obtained lease.
Thus, if our NFS client is going to use a fixed nfs_client_id4 string,
either for NFSv4.0 or NFSv4.1 mounts, our NFS client should use a
boot verifier that does not change depending on server IP address.
Replace our current per-nfs_client boot verifier with a per-nfs_net
boot verifier.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs4_reset_all_state() refreshes the boot verifier a server sees to
trigger that server to wipe this client's state. This function is
invoked when an NFSv4.1 server reports that it has revoked some or
all of a client's NFSv4 state.
To facilitate server trunking discovery, we will eventually want to
move the cl_boot_time field to a more global structure. The Uniform
Client String model (and specifically, server trunking detection)
requires that all servers see the same boot verifier until the client
actually does reboot, and not a fresh verifier every time the client
unmounts and remounts the server.
Without the cl_boot_time field, however, nfs4_reset_all_state() will
have to find some other way to force the server to purge the client's
NFSv4 state.
Because these verifiers are opaque (ie, the server doesn't know or
care that they happen to be timestamps), we can force the server
to wipe NFSv4 state by updating the boot verifier as we do now, then
immediately afterwards establish a fresh client ID using the old boot
verifier again.
Hopefully there are no extra paranoid server implementations that keep
track of the client's boot verifiers and prevent clients from reusing
a previous one.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: update to use matching types in "if" expressions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: When naming fields and data types, follow established
conventions to facilitate accurate grep/cscope searches.
Introduced by commit e50a7a1a "NFS: make NFS client allocated per
network namespace context," Tue Jan 10, 2012.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: When naming fields and data types, follow established
conventions to facilitate accurate grep/cscope searches.
Additionally, for consistency, move the impl_id field into the NFSv4-
specific part of the nfs_client, and free that memory in the logic
that shuts down NFSv4 nfs_clients.
Introduced by commit 7d2ed9ac "NFSv4: parse and display server
implementation ids," Fri Feb 17, 2012.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: When naming fields and data types, follow established
conventions to facilitate accurate grep/cscope searches.
Additionally, for consistency, move the scope field into the NFSv4-
specific part of the nfs_client, and free that memory in the logic
that shuts down NFSv4 nfs_clients.
Introduced by commit 99fe60d0 "nfs41: exchange_id operation", April
1 2009.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fs/nfs/nfs4state.c does not yet have any dprintk() call sites, and I'm
about to introduce some. We will need a new flag for enabling them.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The SETCLIENTID boot verifier is opaque to NFSv4 servers, thus there
is no requirement for byte swapping before the client puts the
verifier on the wire.
This treatment is similar to other timestamp-based verifiers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The "struct inode *inode" was only used in a dprintk, so compiling with
CONFIG_SUNRPC_DEBUG off triggers a warning. To get around this, I
remove the "struct inode *inode" variable and instead change the
dprintk()s to use hdr->inode instead.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We reset all I/O on a disconnected data server through the pgio layer indicated
by the NFS_IOHDR_REDO flag.
Differentiate between on-the-wire tasks returning with an error which must
call rpc_call_done and tasks woken from the data server slot_table_waitq
waiting for a session slot with a status of zero which call rpc_exit in
rpc_prepare and need to skip rpc_call_done.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
filelayout_scan_commit_lists needs to bump the reference count on
the struct nfs_page just like nfs_scan_commit_list().
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Commit 4d7e30d (epoll: Add a flag, EPOLLWAKEUP, to prevent
suspend while epoll events are ready) caused some applications to
malfunction, because they set the bit corresponding to the new
EPOLLWAKEUP flag in their eventpoll flags and they don't have the
new CAP_EPOLLWAKEUP capability.
To prevent that from happening, change epoll_ctl() to clear
EPOLLWAKEUP in epds.events if the caller doesn't have the
CAP_EPOLLWAKEUP capability instead of failing and returning an
error code, which allows the affected applications to function
normally.
Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Pull security subsystem updates from James Morris:
"New notable features:
- The seccomp work from Will Drewry
- PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski
- Longer security labels for Smack from Casey Schaufler
- Additional ptrace restriction modes for Yama by Kees Cook"
Fix up trivial context conflicts in arch/x86/Kconfig and include/linux/filter.h
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits)
apparmor: fix long path failure due to disconnected path
apparmor: fix profile lookup for unconfined
ima: fix filename hint to reflect script interpreter name
KEYS: Don't check for NULL key pointer in key_validate()
Smack: allow for significantly longer Smack labels v4
gfp flags for security_inode_alloc()?
Smack: recursive tramsmute
Yama: replace capable() with ns_capable()
TOMOYO: Accept manager programs which do not start with / .
KEYS: Add invalidation support
KEYS: Do LRU discard in full keyrings
KEYS: Permit in-place link replacement in keyring list
KEYS: Perform RCU synchronisation on keys prior to key destruction
KEYS: Announce key type (un)registration
KEYS: Reorganise keys Makefile
KEYS: Move the key config into security/keys/Kconfig
KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32 compat
Yama: remove an unused variable
samples/seccomp: fix dependencies on arch macros
Yama: add additional ptrace scopes
...
Pull GFS2 changes from Steven Whitehouse.
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (24 commits)
GFS2: Fix quota adjustment return code
GFS2: Add rgrp information to block_alloc trace point
GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_buffer
GFS2: Update glock doc to add new stats info
GFS2: Update main gfs2 doc
GFS2: Remove redundant metadata block type check
GFS2: Fix sgid propagation when using ACLs
GFS2: eliminate log elements and simplify
GFS2: Eliminate vestigial sd_log_le_rg
GFS2: Eliminate needless parameter from function gfs2_setbit
GFS2: Log code fixes
GFS2: Remove unused argument from gfs2_internal_read
GFS2: Remove bd_list_tr
GFS2: Remove duplicate log code
GFS2: Clean up log write code path
GFS2: Use variable rather than qa to determine if unstuff necessary
GFS2: Change variable blk to biblk
GFS2: Fix function parameter comments in rgrp.c
GFS2: Eliminate offset parameter to gfs2_setbit
GFS2: Use slab for block reservation memory
...
This reverts commit 8c01a529b8.
It turns out the d_unhashed() check isn't unnecessary after all: while
it's true that unhashing will increment the sequence numbers, that does
not necessarily invalidate the RCU lookup, because it might have seen
the dentry pointer (before it got unhashed), but by the time it loaded
the sequence number, it could have seen the *new* sequence number (after
it got unhashed).
End result: we might look up an unhashed dentry that is about to be
freed, with the sequence number never indicating anything bad about it.
So checking that the dentry is still hashed (*after* reading the sequence
number) is indeed the proper fix, and was never unnecessary.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi points out that we need to also worry about memory
odering when doing the dentry name comparison asynchronously with RCU.
In particular, doing a rename can do a memcpy() of one dentry name over
another, and we want to make sure that any unlocked reader will always
see the proper terminating NUL character, so that it won't ever run off
the allocation.
Rather than having to be extra careful with the name copy or at lookup
time for each character, this resolves the issue by making sure that all
names that are inlined in the dentry always have a NUL character at the
end of the name allocation. If we do that at dentry allocation time, we
know that no future name copy will ever change that final NUL to
anything else, so there are no memory ordering issues.
So even if a concurrent rename ends up overwriting the NUL character
that terminates the original name, we always know that there is one
final NUL at the end, and there is no worry about the lockless RCU
lookup traversing the name too far.
The out-of-line allocations are never copied over, so we can just make
sure that we write the name (with terminating NULL) and do a write
barrier before we expose the name to anything else by setting it in the
dentry.
Reported-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We had for some reason overlooked the AIO interface, and it didn't use
the proper rw_verify_area() helper function that checks (for example)
mandatory locking on the file, and that the size of the access doesn't
cause us to overflow the provided offset limits etc.
Instead, AIO did just the security_file_permission() thing (that
rw_verify_area() also does) directly.
This fixes it to do all the proper helper functions, which not only
means that now mandatory file locking works with AIO too, we can
actually remove lines of code.
Reported-by: Manish Honap <manish_honap_vit@yahoo.co.in>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJPunCjAAoJEOiN4VijXeFPR3cP+gLDhWJGWC/xnI+vr1WJJxZU
xJQbaopEyqJa7GBT+3BasCCLUCU7MVIrYaqubzEI4dzVCMDxwuJYPT7m/N3Avssg
bm1ZaLUnanc8ZLfLHlnsG02xtBtEYI4mUM7ggCfcDJJC0b7tDGcz0kiM2WDGK0fq
LRbZmlC57qK0UTx2IEvl0DoMuH7pq7N1sSXFr6keBm/lqN7Qj3VtAf/ASVQdIgN8
TfQiWbu40v8EEcKAZSkVKvhS3kSiPI9Dr1DGeR1JIhjFqFYUaw52aOzNTJyC9k1g
4oPIoDh0xnSzYrYpetP1gN5bT/FMRcVXLvKM7cIbeUzoAlgnQsBDAGA7x205X8Bf
ChrQEpm/yKZnwEBtp9uIEQY3rRT7iecqVgeVO5LXskg+/OX0+gO5CtCDPopWzqJN
wSOCsP9va0G9W0a8G5hKqEGYhbCXDkzU6KzmcgxbxIeF+CvJ+72mWgqxJkF0GtuU
KCoLQT7Necq+4p/SaiXL7KogS9m2rClbszManceyAYSGwqAPIfU3RzUw+Fz6NcIh
DrYXNTZelrVN2wrVrlhn3B6GdgfyVabhQ1e3CftmAtbY/i+i/ArHCT9JxjyAHTuV
ekTnpVVDBZUCUluJXFJKzrvRuSUt9X18AxK0NfJ+xglOPQnPh026K/sCuvmn6mAq
U1f1XmCqG2l2TDtkZ6Wu
=q4Z6
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming
Pull c6x updates from Mark Salter:
"Clean up some c6x Kconfig items and add support for Elf FDPIC loader."
* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
C6X: remove unused config items
C6X: add support to build with BINFMT_ELF_FDPIC
C6X: change main arch kbuild symbol
Pull networking changes from David Miller:
1) Get rid of the error prone NLA_PUT*() macros that used an embedded
goto.
2) Kill off the token-ring and MCA networking drivers, from Paul
Gortmaker.
3) Reduce high-order allocations made by datagram AF_UNIX sockets, from
Eric Dumazet.
4) Add PTP hardware clock support to IGB and IXGBE, from Richard
Cochran and Jacob Keller.
5) Allow users to query timestamping capabilities of a card via
ethtool, from Richard Cochran.
6) Add loadbalance mode to the teaming driver, from Jiri Pirko. Part
of this is that we can now have BPF filters not attached to sockets,
and the loadbalancing function is calculated using one.
7) Francois Romieu went through the network drivers removing gratuitous
uses of netdev->base_addr, perhaps some day we can remove it
completely but it's used for ISA probing still.
8) Add a BPF JIT for sparc. I know, who cares, right? :-)
9) Move networking sysctl registry away from using the compatability
mode interfaces in the sysctl code. From Eric W Biederman.
10) Pavel Emelyanov added a way to save and restore TCP socket state via
TCP_REPAIR, TCP_REPAIR_QUEUE, and TCP_QUEUE_SEQ socket options as
well as a way to forcefully bind a socket to a port via the
sk->sk_reuse value SK_FORCE_REUSE. There is also a
TCP_REPAIR_OPTIONS which allows to reinstante the TCP options
enabled on the connection.
11) Several enhancements from Eric Dumazet that, in particular, can
enhance splice performance on TCP sockets significantly.
a) Reset the offset of the per-socket sendmsg page when we know
we're the only use of the page in linear_to_page().
b) Add facilities such that skb->data can be backed a page rather
than SLAB kmalloc'd memory. In particular devices which were
receiving into linear RX buffers can now end up providing paged
data.
The big result is that code like splice and GRO do not have to copy
any more.
12) Allow a pure sender to more gracefully handle ACK backlogs in TCP.
What can happen at high rates is that the sender hasn't grown his
receive buffer limits at all (he's not receiving data so really
doesn't need to), but the non-data ACKs consume receive buffer
space.
sk_add_backlog() is too aggressive in dropping frames in this case,
so relax it's requirements by using the receive buffer plus the send
buffer limit as the backlog limit instead of just the former.
Also from Eric Dumazet.
13) Add ipv6 support to L2TP, from Benjamin LaHaise, James Chapman, and
Chris Elston.
14) Implement TCP early retransmit (RFC 5827), from Yuchung Cheng.
Basically, we can start fast retransmit before hiting the dupack
threshold under certain conditions.
15) New CODEL active queue management packet scheduler, from Eric
Dumazet based upon initial work by Dave Taht.
Basically, the big feature is that packets are dropped (or ECN bits
are set) based upon how long packets live in the queue, rather than
the queue length (which is what RED uses).
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1341 commits)
drivers/net/stmmac: seq_file fix memory leak
ipv6/exthdrs: strict Pad1 and PadN check
USB: qmi_wwan: Add ZTE (Vodafone) K3520-Z
USB: qmi_wwan: Add ZTE (Vodafone) K3765-Z
USB: qmi_wwan: Make forced int 4 whitelist generic
net/ipv4: replace simple_strtoul with kstrtoul
net/ipv4/ipconfig: neaten __setup placement
net: qmi_wwan: Add Vodafone/Huawei K5005 support
net: cdc_ether: Add ZTE WWAN matches before generic Ethernet
ipv6: use skb coalescing in reassembly
ipv4: use skb coalescing in defragmentation
net: introduce skb_try_coalesce()
net:ipv6:fixed space issues relating to operators.
net:ipv6:fixed a trailing white space issue.
ipv6: disable GSO on sockets hitting dst_allfrag
tg3: use netdev_alloc_frag() API
net: napi_frags_skb() is static
ppp: avoid false drop_monitor false positives
ipv6: bool/const conversions phase2
ipx: Remove spurious NULL checking in ipx_ioctl().
...
This branch simplifies and clarifies the dcache lookup, and allows us to
do certain nice optimizations when comparing dentries. It also cleans
up the interface to __d_lookup_rcu(), especially around passing the
inode information around.
* dentry-cleanups:
vfs: make it possible to access the dentry hash/len as one 64-bit entry
vfs: move dentry name length comparison from dentry_cmp() into callers
vfs: do the careful dentry name access for all dentry_cmp cases
vfs: remove unnecessary d_unhashed() check from __d_lookup_rcu
vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces
This teaches vfs_fstat() to use the appropriate f[get|put]_light
functions, allowing it to avoid some unnecessary locking for the common
case.
More noticeably, it also cleans up and simplifies the "getname_flags()"
function, which now relies on the architecture strncpy_from_user() doing
all the user access checks properly, instead of hacking around the fact
that on x86 it didn't use to do it right (see commit 92ae03f2ef: "x86:
merge 32/64-bit versions of 'strncpy_from_user()' and speed it up").
* vfs-cleanups:
VFS: make vfs_fstat() use f[get|put]_light()
VFS: clean up and simplify getname_flags()
x86: make word-at-a-time strncpy_from_user clear bytes at the end
To enable easy tracing of the location of log forces and the
frequency of them via perf, add a pair of trace points to the log
force functions. This will help debug where excessive log forces
are being issued from by simple perf commands like:
# ~/perf/perf top -e xfs:xfs_log_force -G -U
Which gives this sort of output:
Events: 141 xfs:xfs_log_force
- 100.00% [kernel] [k] xfs_log_force
- xfs_log_force
87.04% xfsaild
kthread
kernel_thread_helper
- 12.87% xfs_buf_lock
_xfs_buf_find
xfs_buf_get
xfs_trans_get_buf
xfs_da_do_buf
xfs_da_get_buf
xfs_dir2_data_init
xfs_dir2_leaf_addname
xfs_dir_createname
xfs_create
xfs_vn_mknod
xfs_vn_create
vfs_create
do_last.isra.41
path_openat
do_filp_open
do_sys_open
sys_open
system_call_fastpath
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sig.com>
Note xfs_iget can be called while holding a locked agi buffer. If
it goes into memory reclaim then inode teardown may try to lock the
same buffer. Prevent the deadlock by calling radix_tree_preload
with GFP_NOFS.
Signed-off-by: Peter Watkins <treestem@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
xfstest 270 was causing quota reservations way beyond what was sane
(ten to hundreds of TB) for a 4GB filesystem. There's a sign problem
in the error handling path of xfs_bmapi_reserve_delalloc() because
xfs_trans_unreserve_quota_nblks() simple negates the value passed -
which doesn't work for an unsigned variable. This causes
reservations of close to 2^32 block instead of removing a
reservation of a handful of blocks.
Fix the same problem in the other xfs_trans_unreserve_quota_nblks()
callers where unsigned integer variables are used, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
This makes cp_new_stat() a bit more readable, and avoids having to
memset() the whole structure just to fill in a couple of padding fields.
This is another result of me looking at code generation of functions
that show up high on certain kernel profiles, and just going "Oh, let's
just clean that up".
Architectures that don't supply the #define to fill just the padding
fields will still fall back to memset().
* stat-cleanups:
vfs: don't force a big memset of stat data just to clear padding fields
vfs: de-crapify "cp_new_stat()" function
Introduce sysfs infrastructure for exofs cluster filesystem.
Each OSD target shows up as below in the sysfs hierarchy:
/sys/fs/exofs/<osdname>_<partition_id>/devX
Where <osdname>_<partition_id> is the unique identification
of a Superblock.
Where devX: 0 <= X < device_table_size. They are ordered
in device-table order as specified to the mkfs.exofs command
Each OSD device devX has following attributes :
osdname - ReadOnly
systemid - ReadOnly
uri - Read/Write
It is up to user-mode to update devX/uri for support of
autologin.
These sysfs information are used both for autologin as well
as support for exporting exofs via a pNFSD server in user-mode.
(.eg NFS-Ganesha)
Signed-off-by: Sachin Bhamare <sbhamare@panasas.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Richard removed the "dtype" hint, but few commentaries were left and this patch
removes them. I've also added a better description about the "dtype" field in
the ubi-user.h for people who may ever wonder what was that dtype thing about.
This patch also adds an important note that it is better to use value "3" for
the "dtype" field.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
We do not need this feature and to our shame it even was not working
and there was a bug found very recently.
-- Artem Bityutskiy
Without the data type hint UBI2 (fastmap) will be easier to implement.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
UBIFS leaks memory on error path in 'mount_ubifs()'. In case of failure in
'ubifs_fixup_free_space()', it does not call 'ubifs_lpt_free()' whereas LPT
data structures can potentially be allocated. The amount of memory leaked can
be quite high -- see 'ubifs_lpt_init()'.
The bug was introduced when moving the LPT initialisation earlier in the
mount process (commit '781c5717a95a74b294beb38b8276943b0f8b5bb4').
Signed-off-by: Sidney Amani <seed95@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Most functions in UBIFS follow the following designn pattern: if the function
allocates multiple resources, and failss at some point, it frees what it has
allocated and returns an error. So the caller can rely on the fact that the
callee has cleaned up everything after own failure.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Sidney Amani <seed95@gmail.com>
If at exofs_fill_super() we had an early termination
do to any error, like an IO error while reading the
super-block. We would crash inside exofs_free_sbi().
This is because sbi->oc.numdevs was set to 1, before
we actually have a device table at all.
Fix it by moving the sbi->oc.numdevs = 1 to after the
allocation of the device table.
Reported-by: Johannes Schild <JSchild@gmx.de>
Stable: This is a bug since v3.2.0
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
The "invalid layout" class of errors is handled by destroying the layout and
getting a new layout from the server. Currently, the layout must be
destroyed before a new layout can be obtained.
This means that all references (e.g.lsegs) to the "to be destroyed" layout
header must be dropped before it can be destroyed. This in turn means waiting
for all in flight RPC's using the old layout as well as draining the data
server session slot table wait queue.
Set the NFS_LAYOUT_INVALID flag to redirect I/O to the MDS while waiting for
the old layout to be destroyed.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When the last DS io is processed, the data server client record will be
freed.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Prepare to put a dis-connected DS client record.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Let the MDS know that you are redirecting I/O from pNFS to MDS.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The DS has a connection error (invalid deviceid). Drain the fore channel
slot table waitq.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tasks sleeping on the slot table waitq wake to the rpc_prepare_task state.
Reset the task for io through the MDS if the deviceid is invalid.
The reset functions put the io pages through the pageio layer which has the
advantage of re-coalescing which allows for the MDS and DS having different
r/wsizes. Exit the awakened task without executing the rpc_call_done routine.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Replaced by filelayout_reset_write and filelayout_reset_read
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This prevents the use of any layout for i/o that references the deviceid.
I/O is redirected through the MDS.
Redirect the unhandled failed I/O to the MDS without marking either the
layout or the deviceid invalid.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Set the recovery parameters for data servers.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
RPC_TASK_SOFTCONN returns connection errors to the caller which allows the pNFS
file layout to quickly try the MDS or perhaps another DS.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The invalid layout bits are should only be used to block LAYOUTGETs.
Do not invalidate a layout on deviceid invalidation.
Do not invalidate a layout on un-handled READ, WRITE, COMMIT errors.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move the invalid deviceid test into nfs4_fl_prepare_ds, called by the
filelayout read, write, and commit routines. NFS4_DEVICE_ID_NEG_ENTRY
is no longer needed.
Remove redundant printk's - filelayout_mark_devid_invalid prints a KERN_WARNING.
An invalid device prevents pNFS io.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Simplified error gotos to make it slightly easier to read,
it doesn't affect the functionality of the routine.
Signed-off-by: Matthew Treinish <treinish@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pull block layer fixes from Jens Axboe:
"A few small, but important fixes. Most of them are marked for stable
as well
- Fix failure to release a semaphore on error path in mtip32xx.
- Fix crashable condition in bio_get_nr_vecs().
- Don't mark end-of-disk buffers as mapped, limit it to i_size.
- Fix for build problem with CONFIG_BLOCK=n on arm at least.
- Fix for a buffer overlow on UUID partition printing.
- Trivial removal of unused variables in dac960."
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix buffer overflow when printing partition UUIDs
Fix blkdev.h build errors when BLOCK=n
bio allocation failure due to bio_get_nr_vecs()
block: don't mark buffers beyond end of disk as mapped
mtip32xx: release the semaphore on an error path
dac960: Remove unused variables from DAC960_CreateProcEntries()
Merge misc fixes from Andrew Morton.
* emailed from Andrew Morton <akpm@linux-foundation.org>: (4 patches)
frv: delete incorrect task prototypes causing compile fail
slub: missing test for partial pages flush work in flush_all()
fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries
drivers/rtc/rtc-pl031.c: configure correct wday for 2000-01-01
Instead of doing the i_mode calculations at proc_fd_instantiate() time,
move them into tid_fd_revalidate(), which is where the other inode state
(notably uid/gid information) is updated too.
Otherwise we'll end up with stale i_mode information if an fd is re-used
while the dentry still hangs around. Not that anything really *cares*
(symlink permissions don't really matter), but Tetsuo Handa noticed that
the owner read/write bits don't always match the state of the
readability of the file descriptor, and we _used_ to get this right a
long time ago in a galaxy far, far away.
Besides, aside from fixing an ugly detail (that has apparently been this
way since commit 61a2878402: "proc: Remove the hard coded inode
numbers" in 2006), this removes more lines of code than it adds. And it
just makes sense to update i_mode in the same place we update i_uid/gid.
Al Viro correctly points out that we could just do the inode fill in the
inode iops ->getattr() function instead. However, that does require
somewhat slightly more invasive changes, and adds yet *another* lookup
of the file descriptor. We need to do the revalidate() for other
reasons anyway, and have the file descriptor handy, so we might as well
fill in the information at this point.
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
map_files/ entries are never supposed to be executed, still curious
minds might try to run them, which leads to the following deadlock
======================================================
[ INFO: possible circular locking dependency detected ]
3.4.0-rc4-24406-g841e6a6 #121 Not tainted
-------------------------------------------------------
bash/1556 is trying to acquire lock:
(&sb->s_type->i_mutex_key#8){+.+.+.}, at: do_lookup+0x267/0x2b1
but task is already holding lock:
(&sig->cred_guard_mutex){+.+.+.}, at: prepare_bprm_creds+0x2d/0x69
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&sig->cred_guard_mutex){+.+.+.}:
validate_chain+0x444/0x4f4
__lock_acquire+0x387/0x3f8
lock_acquire+0x12b/0x158
__mutex_lock_common+0x56/0x3a9
mutex_lock_killable_nested+0x40/0x45
lock_trace+0x24/0x59
proc_map_files_lookup+0x5a/0x165
__lookup_hash+0x52/0x73
do_lookup+0x276/0x2b1
walk_component+0x3d/0x114
do_last+0xfc/0x540
path_openat+0xd3/0x306
do_filp_open+0x3d/0x89
do_sys_open+0x74/0x106
sys_open+0x21/0x23
tracesys+0xdd/0xe2
-> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
check_prev_add+0x6a/0x1ef
validate_chain+0x444/0x4f4
__lock_acquire+0x387/0x3f8
lock_acquire+0x12b/0x158
__mutex_lock_common+0x56/0x3a9
mutex_lock_nested+0x40/0x45
do_lookup+0x267/0x2b1
walk_component+0x3d/0x114
link_path_walk+0x1f9/0x48f
path_openat+0xb6/0x306
do_filp_open+0x3d/0x89
open_exec+0x25/0xa0
do_execve_common+0xea/0x2f9
do_execve+0x43/0x45
sys_execve+0x43/0x5a
stub_execve+0x6c/0xc0
This is because prepare_bprm_creds grabs task->signal->cred_guard_mutex
and when do_lookup happens we try to grab task->signal->cred_guard_mutex
again in lock_trace.
Fix it using plain ptrace_may_access() helper in proc_map_files_lookup()
and in proc_map_files_readdir() instead of lock_trace(), the caller must
be CAP_SYS_ADMIN granted anyway.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is now straightforward: just introduce a module parameter and pass
the needed value to persistent_ram_new().
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Marco Stornelli <marco.stornelli@gmail.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The patch switches pstore RAM backend to use persistent_ram routines,
one step closer to the ECC support.
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>