Wait until all users have closed their device context before allowing
device unregistration to complete. This prevents a crash caused by
referring to stale data structures.
A better solution would be to have a way to revoke contexts rather
than waiting for userspace to close the context, but that's a much
bigger change that will have to wait. For now let's at least avoid
the crash.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IB/iser: iSER Kconfig and Makefile
IB/iser: iSER handling of memory for RDMA
IB/iser: iSER RDMA CM (CMA) and IB verbs interaction
IB/iser: iSER initiator iSCSI PDU and TX/RX
IB/iser: iSCSI iSER transport provider high level code
IB/iser: iSCSI iSER transport provider header file
IB/uverbs: Remove unnecessary list_del()s
IB/uverbs: Don't free wr list when it's known to be empty
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.
The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).
The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.
This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.
The patch also makes the following changes:
(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.
(*) If one of the convenience function is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().
(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().
This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.
However, with the way NFS superblock sharing are currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.
[*] Anonymous until discovered from another tree.
(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.
[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In ib_uverbs_cleanup_ucontext(), when iterating through the lists of
objects, there's no reason to do list_del() to remove the objects,
since both the objects and the lists that contain them are about to be
freed anyway. Since list_del() is a moderately big inline function,
getting rid of this extra work saves quite a bit of .text:
add/remove: 0/0 grow/shrink: 1/2 up/down: 3/-217 (-214)
function old new delta
ib_uverbs_comp_handler 225 228 +3
ib_uverbs_async_handler 256 255 -1
ib_uverbs_close 905 689 -216
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Currently, all userspace verbs operations that call into the kernel
are serialized by ib_uverbs_idr_mutex. This can be a scalability
issue for some workloads, especially for devices driven by the ipath
driver, which needs to call into the kernel even for datapath
operations.
Fix this by adding reference counts to the userspace objects, and then
converting ib_uverbs_idr_mutex into a spinlock that only protects the
idrs long enough to take a reference on the object being looked up.
Because remove operations may fail, we have to do a slightly funky
two-step deletion, which is described in the comments at the top of
uverbs_cmd.c.
This also still leaves ib_uverbs_idr_lock as a single lock that is
possibly subject to contention. However, the lock hold time will only
be a single idr operation, so multiple threads should still be able to
make progress, even if ib_uverbs_idr_lock is being ping-ponged.
Surprisingly, these changes even shrink the object code:
add/remove: 23/5 grow/shrink: 4/21 up/down: 633/-693 (-60)
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support to uverbs to handle querying userspace SRQs (shared
receive queues), including adding an ABI for marshalling requests and
responses. The kernel midlayer already has the underlying
ib_query_srq() function.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support to uverbs to handle querying userspace QPs (queue pairs),
including adding an ABI for marshalling requests and responses. The
kernel midlayer already has the underlying ib_query_qp() function.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support to uverbs to handle resizing userspace CQs (completion
queues), including adding an ABI for marshalling requests and
responses. The kernel midlayer already has ib_resize_cq().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
uverbs might schedule work to clean up when a file is closed. Make
sure that this work runs before allowing module text to go away.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
semaphore to mutex conversion by Ingo and Arjan's script.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Sanity-checked on real IB hardware ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
uverbs needs to track which multicast groups is each qp
attached to, in order to properly detach when cleanup
is performed on device file close.
Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Replace kmalloc()+memset(,0,) with kzalloc(), for a net savings of 35
source lines and about 500 bytes of text.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Userspace CQs that have no completion event channel attached end up
with their cq_context set to NULL. However, asynchronous events like
"CQ overrun" can still occur on such CQs, so add a uverbs_file member
to struct ib_ucq_object that we can follow to deliver these events.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Move ib_uverbs module to using cdev_alloc() and class_device_create()
so that we can handle device lifetime properly. Now we can make sure
we keep all of our data structures around until the last way to reach
them is gone.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add idr_destroy() calls to the module_exit() functions of the four IB
driver modules that use idrs, so we don't leak idr_layer_cache objects
when these modules are unloaded.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add kernel support for userspace calling poll CQ, request CQ
notification, post send, post receive, post SRQ receive, create AH and
destroy AH commands. These commands allow us to support userspace
verbs for devices that can't perform these operations directly from
userspace (eg the PathScale HCA).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Give each device a uverbs_cmd_mask, so that a low-level driver can
control which methods may be called on behalf of userspace.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add abi_version attribute to uverbs class devices to allow for
ABI versioning of device-specific interfaces.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Introduce new userspace verbs ABI version 3. This eliminates some
unneeded commands, and adds support for user-created completion
channels. This cleans up problems with file leaks on error paths, and
also makes sure that file descriptors are always installed into the
correct process.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Al Viro pointed out that the current IB userspace verbs interface
allows userspace to cause mischief by closing file descriptors before
we're ready, or issuing the same command twice at the same time. This
patch closes those races, and fixes other obvious problems such as a
module reference leak.
Some other interface bogosities will require an ABI change to fix
properly, so I'm deferring those fixes until 2.6.15.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
completion events after destroying a CQ, QP or SRQ. We do this by
sweeping the event lists before returning from a destroy calls, and
then return the number of events already reported before the destroy
call. This allows userspace wait until it has processed all events
for an object returned from the kernel before it frees its context for
the object.
The ABI of the destroy CQ, destroy QP and destroy SRQ commands has to
change to return the event count, so bump the ABI version from 1 to 2.
The userspace libibverbs library has already been updated to handle
both the old and new ABI versions.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add SRQ support to userspace verbs module. This adds several commands
and associated structures, but it's OK to do this without bumping the
ABI version because the commands are added at the end of the list so
they don't change the existing numbering. There are two cases to
worry about:
1. New kernel, old userspace. This is OK because old userspace simply
won't try to use the new SRQ commands. None of the old commands are
changed.
2. Old kernel, new userspace. This works perfectly as long as
userspace doesn't try to use SRQ commands. If userspace tries to
use SRQ commands, it will get EINVAL, which is perfectly
reasonable: the kernel doesn't support SRQs, so we couldn't do any
better.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Make some lawyers happy and add copyright notices for people who
forgot to include them when they actually touched the code.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a use-after-free bug in userspace verbs cleanup: we can't touch
mr->device after we free mr by calling ib_dereg_mr().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for O_ASYNC notifications on userspace verbs
completion and asynchronous event file descriptors.
Signed-off-by: Gleb Natapov <glebn@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add the core of the InfiniBand userspace verbs implementation, including
creating character device nodes, dispatching requests from userspace, and
passing event notifications back up to userspace.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>