2009-04-03 09:42:35 -06:00
|
|
|
====================================
|
|
|
|
SLOW WORK ITEM EXECUTION THREAD POOL
|
|
|
|
====================================
|
|
|
|
|
|
|
|
By: David Howells <dhowells@redhat.com>
|
|
|
|
|
|
|
|
The slow work item execution thread pool is a pool of threads for performing
|
|
|
|
things that take a relatively long time, such as making mkdir calls.
|
|
|
|
Typically, when processing something, these items will spend a lot of time
|
|
|
|
blocking a thread on I/O, thus making that thread unavailable for doing other
|
|
|
|
work.
|
|
|
|
|
|
|
|
The standard workqueue model is unsuitable for this class of work item as that
|
|
|
|
limits the owner to a single thread or a single thread per CPU. For some
|
|
|
|
tasks, however, more threads - or fewer - are required.
|
|
|
|
|
|
|
|
There is just one pool per system. It contains no threads unless something
|
|
|
|
wants to use it - and that something must register its interest first. When
|
|
|
|
the pool is active, the number of threads it contains is dynamic, varying
|
|
|
|
between a maximum and minimum setting, depending on the load.
|
|
|
|
|
|
|
|
|
|
|
|
====================
|
|
|
|
CLASSES OF WORK ITEM
|
|
|
|
====================
|
|
|
|
|
|
|
|
This pool support two classes of work items:
|
|
|
|
|
|
|
|
(*) Slow work items.
|
|
|
|
|
|
|
|
(*) Very slow work items.
|
|
|
|
|
|
|
|
The former are expected to finish much quicker than the latter.
|
|
|
|
|
|
|
|
An operation of the very slow class may do a batch combination of several
|
|
|
|
lookups, mkdirs, and a create for instance.
|
|
|
|
|
|
|
|
An operation of the ordinarily slow class may, for example, write stuff or
|
|
|
|
expand files, provided the time taken to do so isn't too long.
|
|
|
|
|
|
|
|
Operations of both types may sleep during execution, thus tying up the thread
|
|
|
|
loaned to it.
|
|
|
|
|
2009-11-19 11:10:47 -07:00
|
|
|
A further class of work item is available, based on the slow work item class:
|
|
|
|
|
|
|
|
(*) Delayed slow work items.
|
|
|
|
|
|
|
|
These are slow work items that have a timer to defer queueing of the item for
|
|
|
|
a while.
|
|
|
|
|
2009-04-03 09:42:35 -06:00
|
|
|
|
|
|
|
THREAD-TO-CLASS ALLOCATION
|
|
|
|
--------------------------
|
|
|
|
|
|
|
|
Not all the threads in the pool are available to work on very slow work items.
|
|
|
|
The number will be between one and one fewer than the number of active threads.
|
|
|
|
This is configurable (see the "Pool Configuration" section).
|
|
|
|
|
|
|
|
All the threads are available to work on ordinarily slow work items, but a
|
|
|
|
percentage of the threads will prefer to work on very slow work items.
|
|
|
|
|
|
|
|
The configuration ensures that at least one thread will be available to work on
|
|
|
|
very slow work items, and at least one thread will be available that won't work
|
|
|
|
on very slow work items at all.
|
|
|
|
|
|
|
|
|
|
|
|
=====================
|
|
|
|
USING SLOW WORK ITEMS
|
|
|
|
=====================
|
|
|
|
|
|
|
|
Firstly, a module or subsystem wanting to make use of slow work items must
|
|
|
|
register its interest:
|
|
|
|
|
2009-11-19 11:10:23 -07:00
|
|
|
int ret = slow_work_register_user(struct module *module);
|
2009-04-03 09:42:35 -06:00
|
|
|
|
2009-11-19 11:10:23 -07:00
|
|
|
This will return 0 if successful, or a -ve error upon failure. The module
|
|
|
|
pointer should be the module interested in using this facility (almost
|
|
|
|
certainly THIS_MODULE).
|
2009-04-03 09:42:35 -06:00
|
|
|
|
|
|
|
|
|
|
|
Slow work items may then be set up by:
|
|
|
|
|
|
|
|
(1) Declaring a slow_work struct type variable:
|
|
|
|
|
|
|
|
#include <linux/slow-work.h>
|
|
|
|
|
|
|
|
struct slow_work myitem;
|
|
|
|
|
|
|
|
(2) Declaring the operations to be used for this item:
|
|
|
|
|
|
|
|
struct slow_work_ops myitem_ops = {
|
|
|
|
.get_ref = myitem_get_ref,
|
|
|
|
.put_ref = myitem_put_ref,
|
|
|
|
.execute = myitem_execute,
|
|
|
|
};
|
|
|
|
|
|
|
|
[*] For a description of the ops, see section "Item Operations".
|
|
|
|
|
|
|
|
(3) Initialising the item:
|
|
|
|
|
|
|
|
slow_work_init(&myitem, &myitem_ops);
|
|
|
|
|
2009-11-19 11:10:47 -07:00
|
|
|
or:
|
|
|
|
|
|
|
|
delayed_slow_work_init(&myitem, &myitem_ops);
|
|
|
|
|
2009-04-03 09:42:35 -06:00
|
|
|
or:
|
|
|
|
|
|
|
|
vslow_work_init(&myitem, &myitem_ops);
|
|
|
|
|
|
|
|
depending on its class.
|
|
|
|
|
|
|
|
A suitably set up work item can then be enqueued for processing:
|
|
|
|
|
|
|
|
int ret = slow_work_enqueue(&myitem);
|
|
|
|
|
|
|
|
This will return a -ve error if the thread pool is unable to gain a reference
|
2009-11-19 11:10:47 -07:00
|
|
|
on the item, 0 otherwise, or (for delayed work):
|
|
|
|
|
|
|
|
int ret = delayed_slow_work_enqueue(&myitem, my_jiffy_delay);
|
2009-04-03 09:42:35 -06:00
|
|
|
|
|
|
|
|
|
|
|
The items are reference counted, so there ought to be no need for a flush
|
2009-11-19 11:10:43 -07:00
|
|
|
operation. But as the reference counting is optional, means to cancel
|
|
|
|
existing work items are also included:
|
|
|
|
|
|
|
|
cancel_slow_work(&myitem);
|
2009-11-19 11:10:47 -07:00
|
|
|
cancel_delayed_slow_work(&myitem);
|
2009-11-19 11:10:43 -07:00
|
|
|
|
|
|
|
can be used to cancel pending work. The above cancel function waits for
|
|
|
|
existing work to have been executed (or prevent execution of them, depending
|
|
|
|
on timing).
|
|
|
|
|
|
|
|
|
|
|
|
When all a module's slow work items have been processed, and the
|
2009-04-03 09:42:35 -06:00
|
|
|
module has no further interest in the facility, it should unregister its
|
|
|
|
interest:
|
|
|
|
|
2009-11-19 11:10:23 -07:00
|
|
|
slow_work_unregister_user(struct module *module);
|
|
|
|
|
|
|
|
The module pointer is used to wait for all outstanding work items for that
|
|
|
|
module before completing the unregistration. This prevents the put_ref() code
|
|
|
|
from being taken away before it completes. module should almost certainly be
|
|
|
|
THIS_MODULE.
|
2009-04-03 09:42:35 -06:00
|
|
|
|
|
|
|
|
2009-11-19 11:10:53 -07:00
|
|
|
================
|
|
|
|
HELPER FUNCTIONS
|
|
|
|
================
|
|
|
|
|
|
|
|
The slow-work facility provides a function by which it can be determined
|
|
|
|
whether or not an item is queued for later execution:
|
|
|
|
|
|
|
|
bool queued = slow_work_is_queued(struct slow_work *work);
|
|
|
|
|
|
|
|
If it returns false, then the item is not on the queue (it may be executing
|
|
|
|
with a requeue pending). This can be used to work out whether an item on which
|
|
|
|
another depends is on the queue, thus allowing a dependent item to be queued
|
|
|
|
after it.
|
|
|
|
|
SLOW_WORK: Allow a requeueable work item to sleep till the thread is needed
Add a function to allow a requeueable work item to sleep till the thread
processing it is needed by the slow-work facility to perform other work.
Sometimes a work item can't progress immediately, but must wait for the
completion of another work item that's currently being processed by another
slow-work thread.
In some circumstances, the waiting item could instead - theoretically - put
itself back on the queue and yield its thread back to the slow-work facility,
thus waiting till it gets processing time again before attempting to progress.
This would allow other work items processing time on that thread.
However, this only works if there is something on the queue for it to queue
behind - otherwise it will just get a thread again immediately, and will end
up cycling between the queue and the thread, eating up valuable CPU time.
So, slow_work_sleep_till_thread_needed() is provided such that an item can put
itself on a wait queue that will wake it up when the event it is actually
interested in occurs, then call this function in lieu of calling schedule().
This function will then sleep until either the item's event occurs or another
work item appears on the queue. If another work item is queued, but the
item's event hasn't occurred, then the work item should requeue itself and
yield the thread back to the slow-work facility by returning.
This can be used by CacheFiles for an object that is being created on one
thread to wait for an object being deleted on another thread where there is
nothing on the queue for the creation to go and wait behind. As soon as an
item appears on the queue that could be given thread time instead, CacheFiles
can stick the creating object back on the queue and return to the slow-work
facility - assuming the object deletion didn't also complete.
Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-19 11:10:57 -07:00
|
|
|
If the above shows an item on which another depends not to be queued, then the
|
|
|
|
owner of the dependent item might need to wait. However, to avoid locking up
|
|
|
|
the threads unnecessarily be sleeping in them, it can make sense under some
|
|
|
|
circumstances to return the work item to the queue, thus deferring it until
|
|
|
|
some other items have had a chance to make use of the yielded thread.
|
|
|
|
|
|
|
|
To yield a thread and defer an item, the work function should simply enqueue
|
|
|
|
the work item again and return. However, this doesn't work if there's nothing
|
|
|
|
actually on the queue, as the thread just vacated will jump straight back into
|
|
|
|
the item's work function, thus busy waiting on a CPU.
|
|
|
|
|
|
|
|
Instead, the item should use the thread to wait for the dependency to go away,
|
|
|
|
but rather than using schedule() or schedule_timeout() to sleep, it should use
|
|
|
|
the following function:
|
|
|
|
|
|
|
|
bool requeue = slow_work_sleep_till_thread_needed(
|
|
|
|
struct slow_work *work,
|
|
|
|
signed long *_timeout);
|
|
|
|
|
|
|
|
This will add a second wait and then sleep, such that it will be woken up if
|
|
|
|
either something appears on the queue that could usefully make use of the
|
|
|
|
thread - and behind which this item can be queued, or if the event the caller
|
|
|
|
set up to wait for happens. True will be returned if something else appeared
|
|
|
|
on the queue and this work function should perhaps return, of false if
|
|
|
|
something else woke it up. The timeout is as for schedule_timeout().
|
|
|
|
|
|
|
|
For example:
|
|
|
|
|
|
|
|
wq = bit_waitqueue(&my_flags, MY_BIT);
|
|
|
|
init_wait(&wait);
|
|
|
|
requeue = false;
|
|
|
|
do {
|
|
|
|
prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
|
|
|
|
if (!test_bit(MY_BIT, &my_flags))
|
|
|
|
break;
|
|
|
|
requeue = slow_work_sleep_till_thread_needed(&my_work,
|
|
|
|
&timeout);
|
|
|
|
} while (timeout > 0 && !requeue);
|
|
|
|
finish_wait(wq, &wait);
|
|
|
|
if (!test_bit(MY_BIT, &my_flags)
|
|
|
|
goto do_my_thing;
|
|
|
|
if (requeue)
|
|
|
|
return; // to slow_work
|
|
|
|
|
2009-11-19 11:10:53 -07:00
|
|
|
|
2009-04-03 09:42:35 -06:00
|
|
|
===============
|
|
|
|
ITEM OPERATIONS
|
|
|
|
===============
|
|
|
|
|
|
|
|
Each work item requires a table of operations of type struct slow_work_ops.
|
2009-11-19 11:10:51 -07:00
|
|
|
Only ->execute() is required; the getting and putting of a reference and the
|
|
|
|
describing of an item are all optional.
|
2009-04-03 09:42:35 -06:00
|
|
|
|
|
|
|
(*) Get a reference on an item:
|
|
|
|
|
|
|
|
int (*get_ref)(struct slow_work *work);
|
|
|
|
|
|
|
|
This allows the thread pool to attempt to pin an item by getting a
|
|
|
|
reference on it. This function should return 0 if the reference was
|
|
|
|
granted, or a -ve error otherwise. If an error is returned,
|
|
|
|
slow_work_enqueue() will fail.
|
|
|
|
|
|
|
|
The reference is held whilst the item is queued and whilst it is being
|
|
|
|
executed. The item may then be requeued with the same reference held, or
|
|
|
|
the reference will be released.
|
|
|
|
|
|
|
|
(*) Release a reference on an item:
|
|
|
|
|
|
|
|
void (*put_ref)(struct slow_work *work);
|
|
|
|
|
|
|
|
This allows the thread pool to unpin an item by releasing the reference on
|
|
|
|
it. The thread pool will not touch the item again once this has been
|
|
|
|
called.
|
|
|
|
|
|
|
|
(*) Execute an item:
|
|
|
|
|
|
|
|
void (*execute)(struct slow_work *work);
|
|
|
|
|
|
|
|
This should perform the work required of the item. It may sleep, it may
|
|
|
|
perform disk I/O and it may wait for locks.
|
|
|
|
|
2009-11-19 11:10:51 -07:00
|
|
|
(*) View an item through /proc:
|
|
|
|
|
|
|
|
void (*desc)(struct slow_work *work, struct seq_file *m);
|
|
|
|
|
|
|
|
If supplied, this should print to 'm' a small string describing the work
|
|
|
|
the item is to do. This should be no more than about 40 characters, and
|
|
|
|
shouldn't include a newline character.
|
|
|
|
|
|
|
|
See the 'Viewing executing and queued items' section below.
|
|
|
|
|
2009-04-03 09:42:35 -06:00
|
|
|
|
|
|
|
==================
|
|
|
|
POOL CONFIGURATION
|
|
|
|
==================
|
|
|
|
|
|
|
|
The slow-work thread pool has a number of configurables:
|
|
|
|
|
|
|
|
(*) /proc/sys/kernel/slow-work/min-threads
|
|
|
|
|
|
|
|
The minimum number of threads that should be in the pool whilst it is in
|
|
|
|
use. This may be anywhere between 2 and max-threads.
|
|
|
|
|
|
|
|
(*) /proc/sys/kernel/slow-work/max-threads
|
|
|
|
|
|
|
|
The maximum number of threads that should in the pool. This may be
|
|
|
|
anywhere between min-threads and 255 or NR_CPUS * 2, whichever is greater.
|
|
|
|
|
|
|
|
(*) /proc/sys/kernel/slow-work/vslow-percentage
|
|
|
|
|
|
|
|
The percentage of active threads in the pool that may be used to execute
|
|
|
|
very slow work items. This may be between 1 and 99. The resultant number
|
|
|
|
is bounded to between 1 and one fewer than the number of active threads.
|
|
|
|
This ensures there is always at least one thread that can process very
|
|
|
|
slow work items, and always at least one thread that won't.
|
2009-11-19 11:10:51 -07:00
|
|
|
|
|
|
|
|
|
|
|
==================================
|
|
|
|
VIEWING EXECUTING AND QUEUED ITEMS
|
|
|
|
==================================
|
|
|
|
|
2009-12-01 08:36:11 -07:00
|
|
|
If CONFIG_SLOW_WORK_DEBUG is enabled, a debugfs file is made available:
|
2009-11-19 11:10:51 -07:00
|
|
|
|
2009-12-01 08:36:11 -07:00
|
|
|
/sys/kernel/debug/slow_work/runqueue
|
2009-11-19 11:10:51 -07:00
|
|
|
|
|
|
|
through which the list of work items being executed and the queues of items to
|
|
|
|
be executed may be viewed. The owner of a work item is given the chance to
|
|
|
|
add some information of its own.
|
|
|
|
|
|
|
|
The contents look something like the following:
|
|
|
|
|
|
|
|
THR PID ITEM ADDR FL MARK DESC
|
|
|
|
=== ===== ================ == ===== ==========
|
|
|
|
0 3005 ffff880023f52348 a 952ms FSC: OBJ17d3: LOOK
|
|
|
|
1 3006 ffff880024e33668 2 160ms FSC: OBJ17e5 OP60d3b: Write1/Store fl=2
|
|
|
|
2 3165 ffff8800296dd180 a 424ms FSC: OBJ17e4: LOOK
|
|
|
|
3 4089 ffff8800262c8d78 a 212ms FSC: OBJ17ea: CRTN
|
|
|
|
4 4090 ffff88002792bed8 2 388ms FSC: OBJ17e8 OP60d36: Write1/Store fl=2
|
|
|
|
5 4092 ffff88002a0ef308 2 388ms FSC: OBJ17e7 OP60d2e: Write1/Store fl=2
|
|
|
|
6 4094 ffff88002abaf4b8 2 132ms FSC: OBJ17e2 OP60d4e: Write1/Store fl=2
|
|
|
|
7 4095 ffff88002bb188e0 a 388ms FSC: OBJ17e9: CRTN
|
|
|
|
vsq - ffff880023d99668 1 308ms FSC: OBJ17e0 OP60f91: Write1/EnQ fl=2
|
|
|
|
vsq - ffff8800295d1740 1 212ms FSC: OBJ16be OP4d4b6: Write1/EnQ fl=2
|
|
|
|
vsq - ffff880025ba3308 1 160ms FSC: OBJ179a OP58dec: Write1/EnQ fl=2
|
|
|
|
vsq - ffff880024ec83e0 1 160ms FSC: OBJ17ae OP599f2: Write1/EnQ fl=2
|
|
|
|
vsq - ffff880026618e00 1 160ms FSC: OBJ17e6 OP60d33: Write1/EnQ fl=2
|
|
|
|
vsq - ffff880025a2a4b8 1 132ms FSC: OBJ16a2 OP4d583: Write1/EnQ fl=2
|
|
|
|
vsq - ffff880023cbe6d8 9 212ms FSC: OBJ17eb: LOOK
|
|
|
|
vsq - ffff880024d37590 9 212ms FSC: OBJ17ec: LOOK
|
|
|
|
vsq - ffff880027746cb0 9 212ms FSC: OBJ17ed: LOOK
|
|
|
|
vsq - ffff880024d37ae8 9 212ms FSC: OBJ17ee: LOOK
|
|
|
|
vsq - ffff880024d37cb0 9 212ms FSC: OBJ17ef: LOOK
|
|
|
|
vsq - ffff880025036550 9 212ms FSC: OBJ17f0: LOOK
|
|
|
|
vsq - ffff8800250368e0 9 212ms FSC: OBJ17f1: LOOK
|
|
|
|
vsq - ffff880025036aa8 9 212ms FSC: OBJ17f2: LOOK
|
|
|
|
|
|
|
|
In the 'THR' column, executing items show the thread they're occupying and
|
|
|
|
queued threads indicate which queue they're on. 'PID' shows the process ID of
|
|
|
|
a slow-work thread that's executing something. 'FL' shows the work item flags.
|
|
|
|
'MARK' indicates how long since an item was queued or began executing. Lastly,
|
|
|
|
the 'DESC' column permits the owner of an item to give some information.
|
|
|
|
|