Merge tag 'dm-3.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm

Pull device-mapper changes from Alasdair G Kergon:
 "Add a device-mapper target called dm-switch to provide a multipath
  framework for storage arrays that dynamically reconfigure their
  preferred paths for different device regions.

  Fix a bug in the verity target that prevented its use with some
  specific sizes of devices.

  Improve some locking mechanisms in the device-mapper core and bufio.

  Add Mike Snitzer as a device-mapper maintainer.

  A few more clean-ups and fixes"

* tag 'dm-3.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm: add switch target
  dm: update maintainers
  dm: optimize reorder structure
  dm: optimize use SRCU and RCU
  dm bufio: submit writes outside lock
  dm cache: fix arm link errors with inline
  dm verity: use __ffs and __fls
  dm flakey: correct ctr alloc failure mesg
  dm verity: remove pointless comparison
  dm: use __GFP_HIGHMEM in __vmalloc
  dm verity: fix inability to use a few specific devices sizes
  dm ioctl: set noio flag to avoid __vmalloc deadlock
  dm mpath: fix ioctl deadlock when no paths
commit
9903883f1d
15 changed files with 951 additions and 185 deletions
126
Documentation/device-mapper/switch.txt
Normal file
@@ -0,0 +1,126 @@
+dm-switch
+=========
+
+The device-mapper switch target creates a device that supports an
+arbitrary mapping of fixed-size regions of I/O across a fixed set of
+paths. The path used for any specific region can be switched
+dynamically by sending the target a message.
+
+It maps I/O to underlying block devices efficiently when there is a large
+number of fixed-sized address regions but there is no simple pattern
+that would allow for a compact representation of the mapping such as
+dm-stripe.
+
+Background
+----------
+
+Dell EqualLogic and some other iSCSI storage arrays use a distributed
+frameless architecture. In this architecture, the storage group
+consists of a number of distinct storage arrays ("members") each having
+independent controllers, disk storage and network adapters. When a LUN
+is created it is spread across multiple members. The details of the
+spreading are hidden from initiators connected to this storage system.
+The storage group exposes a single target discovery portal, no matter
+how many members are being used. When iSCSI sessions are created, each
+session is connected to an eth port on a single member. Data to a LUN
+can be sent on any iSCSI session, and if the blocks being accessed are
+stored on another member the I/O will be forwarded as required. This
+forwarding is invisible to the initiator. The storage layout is also
+dynamic, and the blocks stored on disk may be moved from member to
+member as needed to balance the load.
+
+This architecture simplifies the management and configuration of both
+the storage group and initiators. In a multipathing configuration, it
+is possible to set up multiple iSCSI sessions to use multiple network
+interfaces on both the host and target to take advantage of the
+increased network bandwidth. An initiator could use a simple round
+robin algorithm to send I/O across all paths and let the storage array
+members forward it as necessary, but there is a performance advantage to
+sending data directly to the correct member.
+
+A device-mapper table already lets you map different regions of a
+device onto different targets. However in this architecture the LUN is
+spread with an address region size on the order of 10s of MBs, which
+means the resulting table could have more than a million entries and
+consume far too much memory.
+
+Using this device-mapper switch target we can now build a two-layer
+device hierarchy:
+
+    Upper Tier – Determine which array member the I/O should be sent to.
+    Lower Tier – Load balance amongst paths to a particular member.
+
+The lower tier consists of a single dm multipath device for each member.
+Each of these multipath devices contains the set of paths directly to
+the array member in one priority group, and leverages existing path
+selectors to load balance amongst these paths. We also build a
+non-preferred priority group containing paths to other array members for
+failover reasons.
+
+The upper tier consists of a single dm-switch device. This device uses
+a bitmap to look up the location of the I/O and choose the appropriate
+lower tier device to route the I/O. By using a bitmap we are able to
+use 4 bits for each address range in a 16 member group (which is very
+large for us). This is a much denser representation than the dm table
+b-tree can achieve.
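Reviewer note: the packed bitmap described above can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the code added by this commit. The names `packed_table`, `bits_for_paths`, `packed_table_read` and `packed_table_write` are hypothetical, and kernel concerns such as locking and vmalloc are left out. It only shows how fixed-width entries (4 bits when there are 16 paths) can share one `unsigned long` slot.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* One slot packs several fixed-width entries (hypothetical user-space analogue). */
typedef unsigned long slot_t;

#define SLOT_BITS ((unsigned)(sizeof(slot_t) * CHAR_BIT))

struct packed_table {
	unsigned entry_bits;		/* bits per entry */
	unsigned entries_per_slot;	/* entries that fit in one slot */
	unsigned long nr_entries;	/* number of regions tracked */
	slot_t *slots;
};

/* Smallest width, in bits, that can hold path numbers 0 .. nr_paths - 1. */
static unsigned bits_for_paths(unsigned nr_paths)
{
	unsigned bits = 1;

	while (bits < SLOT_BITS && ((slot_t)1 << bits) < nr_paths)
		bits++;
	return bits;
}

/* Returns 0 on success, -1 if allocation fails. All entries start at path 0. */
static int packed_table_init(struct packed_table *t, unsigned nr_paths,
			     unsigned long nr_entries)
{
	t->entry_bits = bits_for_paths(nr_paths);
	t->entries_per_slot = SLOT_BITS / t->entry_bits;
	t->nr_entries = nr_entries;
	/* One extra slot is a simple upper bound on the slots needed. */
	t->slots = calloc(nr_entries / t->entries_per_slot + 1, sizeof(slot_t));
	return t->slots ? 0 : -1;
}

/* Mask with the low entry_bits bits set; avoids shifting by the full width. */
static slot_t entry_mask(const struct packed_table *t)
{
	return ~(slot_t)0 >> (SLOT_BITS - t->entry_bits);
}

static unsigned packed_table_read(const struct packed_table *t, unsigned long n)
{
	unsigned shift = (unsigned)(n % t->entries_per_slot) * t->entry_bits;

	assert(n < t->nr_entries);
	return (unsigned)((t->slots[n / t->entries_per_slot] >> shift) &
			  entry_mask(t));
}

static void packed_table_write(struct packed_table *t, unsigned long n,
			       unsigned path)
{
	unsigned shift = (unsigned)(n % t->entries_per_slot) * t->entry_bits;
	slot_t *slot = &t->slots[n / t->entries_per_slot];

	assert(n < t->nr_entries);
	/* Clear the old entry, then store the new path number in its place. */
	*slot = (*slot & ~(entry_mask(t) << shift)) |
		(((slot_t)path & entry_mask(t)) << shift);
}
```

With 16 paths each entry takes 4 bits, so a table of 2^20 regions needs 2^20 * 4 / 8 bytes, which is 512 KiB, regardless of the slot width.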
+
+Construction Parameters
+=======================
+
+    <num_paths> <region_size> <num_optional_args> [<optional_args>...]
+    [<dev_path> <offset>]+
+
+<num_paths>
+    The number of paths across which to distribute the I/O.
+
+<region_size>
+    The number of 512-byte sectors in a region. Each region can be redirected
+    to any of the available paths.
+
+<num_optional_args>
+    The number of optional arguments. Currently, no optional arguments
+    are supported and so this must be zero.
+
+<dev_path>
+    The block device that represents a specific path to the device.
+
+<offset>
+    The offset of the start of data on the specific <dev_path> (in units
+    of 512-byte sectors). This number is added to the sector number when
+    forwarding the request to the specific path. Typically it is zero.
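Reviewer note: the arithmetic implied by `<region_size>` and `<offset>` can be sketched as follows. This is a user-space illustration under assumptions, not the target's map function. The helper names `log2_or_minus1`, `region_of_sector`, `nr_regions` and `sector_on_path` are hypothetical. The shift-or-divide split mirrors the "log2 of region_size or -1" idea in the `switch_ctx` comments of this commit.

```c
#include <assert.h>
#include <stdint.h>

/* log2(n) when n is a power of two, otherwise -1. */
static int log2_or_minus1(uint64_t n)
{
	int bits = 0;

	if (n == 0 || (n & (n - 1)) != 0)
		return -1;
	while ((n >>= 1) != 0)
		bits++;
	return bits;
}

/* Region number that a sector (relative to the target start) falls into. */
static uint64_t region_of_sector(uint64_t sector, uint64_t region_size)
{
	int bits = log2_or_minus1(region_size);

	assert(region_size != 0);
	/* A power-of-two region size allows a shift instead of a division. */
	return bits >= 0 ? sector >> bits : sector / region_size;
}

/* Number of regions needed to cover a device, rounding up. */
static uint64_t nr_regions(uint64_t dev_sectors, uint64_t region_size)
{
	assert(region_size != 0);
	return dev_sectors / region_size + (dev_sectors % region_size != 0);
}

/* Sector on the chosen path: the per-path <offset> is simply added. */
static uint64_t sector_on_path(uint64_t sector, uint64_t path_offset)
{
	return sector + path_offset;
}
```

With a region size of 128 sectors, sector 300 falls in region 2 (sectors 256 to 383); the path for that region is then looked up in the region table and the path's offset is added to the sector.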
+
+Messages
+========
+
+set_region_mappings <index>:<path_nr> [<index>]:<path_nr> [<index>]:<path_nr>...
+
+Modify the region table by specifying which regions are redirected to
+which paths.
+
+<index>
+    The region number (region size was specified in constructor parameters).
+    If index is omitted, the next region (previous index + 1) is used.
+    Expressed in hexadecimal (WITHOUT any prefix like 0x).
+
+<path_nr>
+    The path number in the range 0 ... (<num_paths> - 1).
+    Expressed in hexadecimal (WITHOUT any prefix like 0x).
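Reviewer note: the argument syntax above can be illustrated with a small user-space parser. This is a sketch under assumptions and is not the message handler from dm-switch.c. The names `parse_hex` and `parse_mapping` are hypothetical, the caller supplies the index to use when `<index>` is omitted, and range checks against the region count and `<num_paths>` are left to the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Parse unprefixed hexadecimal digits at *s; advance *s past them. */
static int parse_hex(const char **s, unsigned long *out)
{
	const char *p = *s;
	unsigned long v = 0;
	int digits = 0;

	for (;; p++) {
		int d;

		if (*p >= '0' && *p <= '9')
			d = *p - '0';
		else if (*p >= 'a' && *p <= 'f')
			d = *p - 'a' + 10;
		else if (*p >= 'A' && *p <= 'F')
			d = *p - 'A' + 10;
		else
			break;
		if (v > (ULONG_MAX >> 4))
			return -1;	/* value would overflow */
		v = (v << 4) | (unsigned long)d;
		digits++;
	}
	if (!digits)
		return -1;
	*s = p;
	*out = v;
	return 0;
}

/*
 * Parse one "<index>:<path_nr>" or ":<path_nr>" argument.
 * next_index is used when <index> is omitted.
 * Returns 0 on success, -1 on a malformed argument.
 */
static int parse_mapping(const char *arg, unsigned long next_index,
			 unsigned long *index, unsigned long *path_nr)
{
	const char *p = arg;

	assert(arg != NULL);
	if (*p == ':')
		*index = next_index;
	else if (parse_hex(&p, index) != 0 || *p != ':')
		return -1;
	p++;	/* skip ':' */
	if (parse_hex(&p, path_nr) != 0 || *p != '\0')
		return -1;
	return 0;
}
```

Because `parse_hex` stops at the first non-hex character, an argument such as `0x1:0` is rejected: the `x` is found where `:` is expected.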
+
+Status
+======
+
+No status line is reported.
+
+Example
+=======
+
+Assume that you have volumes vg1/switch0 vg1/switch1 vg1/switch2 with
+the same size.
+
+Create a switch device with 64kB region size:
+    dmsetup create switch --table "0 `blockdev --getsize /dev/vg1/switch0`
+	switch 3 128 0 /dev/vg1/switch0 0 /dev/vg1/switch1 0 /dev/vg1/switch2 0"
+
+Set mappings for the first 7 entries to point to devices switch0, switch1,
+switch2, switch0, switch1, switch2, switch1:
+    dmsetup message switch 0 set_region_mappings 0:0 :1 :2 :0 :1 :2 :1
@@ -2574,6 +2574,7 @@ S: Maintained

 DEVICE-MAPPER (LVM)
 M: Alasdair Kergon <agk@redhat.com>
+M: Mike Snitzer <snitzer@redhat.com>
 M: dm-devel@redhat.com
 L: dm-devel@redhat.com
 W: http://sources.redhat.com/dm
@@ -2585,6 +2586,7 @@ F: drivers/md/dm*
 F: drivers/md/persistent-data/
 F: include/linux/device-mapper.h
 F: include/linux/dm-*.h
+F: include/uapi/linux/dm-*.h

 DIOLAN U2C-12 I2C DRIVER
 M: Guenter Roeck <linux@roeck-us.net>
@@ -412,4 +412,18 @@ config DM_VERITY

 	  If unsure, say N.

+config DM_SWITCH
+	tristate "Switch target support (EXPERIMENTAL)"
+	depends on BLK_DEV_DM
+	---help---
+	  This device-mapper target creates a device that supports an arbitrary
+	  mapping of fixed-size regions of I/O across a fixed set of paths.
+	  The path used for any specific region can be switched dynamically
+	  by sending the target a message.
+
+	  To compile this code as a module, choose M here: the module will
+	  be called dm-switch.
+
+	  If unsure, say N.
+
 endif # MD
@@ -40,6 +40,7 @@ obj-$(CONFIG_DM_FLAKEY) += dm-flakey.o
 obj-$(CONFIG_DM_MULTIPATH) += dm-multipath.o dm-round-robin.o
 obj-$(CONFIG_DM_MULTIPATH_QL) += dm-queue-length.o
 obj-$(CONFIG_DM_MULTIPATH_ST) += dm-service-time.o
+obj-$(CONFIG_DM_SWITCH) += dm-switch.o
 obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o
 obj-$(CONFIG_DM_PERSISTENT_DATA) += persistent-data/
 obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o
@@ -145,6 +145,7 @@ struct dm_buffer {
 	unsigned long state;
 	unsigned long last_accessed;
 	struct dm_bufio_client *c;
+	struct list_head write_list;
 	struct bio bio;
 	struct bio_vec bio_vec[DM_BUFIO_INLINE_VECS];
 };
@@ -349,7 +350,7 @@ static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
 	if (gfp_mask & __GFP_NORETRY)
 		noio_flag = memalloc_noio_save();

-	ptr = __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);
+	ptr = __vmalloc(c->block_size, gfp_mask | __GFP_HIGHMEM, PAGE_KERNEL);

 	if (gfp_mask & __GFP_NORETRY)
 		memalloc_noio_restore(noio_flag);
@@ -630,7 +631,8 @@ static int do_io_schedule(void *word)
  * - Submit our write and don't wait on it. We set B_WRITING indicating
  *   that there is a write in progress.
  */
-static void __write_dirty_buffer(struct dm_buffer *b)
+static void __write_dirty_buffer(struct dm_buffer *b,
+				 struct list_head *write_list)
 {
 	if (!test_bit(B_DIRTY, &b->state))
 		return;
@@ -639,7 +641,24 @@ static void __write_dirty_buffer(struct dm_buffer *b)
 	wait_on_bit_lock(&b->state, B_WRITING,
 			 do_io_schedule, TASK_UNINTERRUPTIBLE);

-	submit_io(b, WRITE, b->block, write_endio);
+	if (!write_list)
+		submit_io(b, WRITE, b->block, write_endio);
+	else
+		list_add_tail(&b->write_list, write_list);
+}
+
+static void __flush_write_list(struct list_head *write_list)
+{
+	struct blk_plug plug;
+	blk_start_plug(&plug);
+	while (!list_empty(write_list)) {
+		struct dm_buffer *b =
+			list_entry(write_list->next, struct dm_buffer, write_list);
+		list_del(&b->write_list);
+		submit_io(b, WRITE, b->block, write_endio);
+		dm_bufio_cond_resched();
+	}
+	blk_finish_plug(&plug);
 }

 /*
@@ -655,7 +674,7 @@ static void __make_buffer_clean(struct dm_buffer *b)
 		return;

 	wait_on_bit(&b->state, B_READING, do_io_schedule, TASK_UNINTERRUPTIBLE);
-	__write_dirty_buffer(b);
+	__write_dirty_buffer(b, NULL);
 	wait_on_bit(&b->state, B_WRITING, do_io_schedule, TASK_UNINTERRUPTIBLE);
 }

@@ -802,7 +821,8 @@ static void __free_buffer_wake(struct dm_buffer *b)
 	wake_up(&c->free_buffer_wait);
 }

-static void __write_dirty_buffers_async(struct dm_bufio_client *c, int no_wait)
+static void __write_dirty_buffers_async(struct dm_bufio_client *c, int no_wait,
+					struct list_head *write_list)
 {
 	struct dm_buffer *b, *tmp;

@@ -818,7 +838,7 @@ static void __write_dirty_buffers_async(struct dm_bufio_client *c, int no_wait)
 		if (no_wait && test_bit(B_WRITING, &b->state))
 			return;

-		__write_dirty_buffer(b);
+		__write_dirty_buffer(b, write_list);
 		dm_bufio_cond_resched();
 	}
 }
@@ -853,7 +873,8 @@ static void __get_memory_limit(struct dm_bufio_client *c,
  * If we are over threshold_buffers, start freeing buffers.
  * If we're over "limit_buffers", block until we get under the limit.
  */
-static void __check_watermark(struct dm_bufio_client *c)
+static void __check_watermark(struct dm_bufio_client *c,
+			      struct list_head *write_list)
 {
 	unsigned long threshold_buffers, limit_buffers;

@@ -872,7 +893,7 @@ static void __check_watermark(struct dm_bufio_client *c)
 	}

 	if (c->n_buffers[LIST_DIRTY] > threshold_buffers)
-		__write_dirty_buffers_async(c, 1);
+		__write_dirty_buffers_async(c, 1, write_list);
 }

 /*
@@ -897,7 +918,8 @@ static struct dm_buffer *__find(struct dm_bufio_client *c, sector_t block)
  *--------------------------------------------------------------*/

 static struct dm_buffer *__bufio_new(struct dm_bufio_client *c, sector_t block,
-				     enum new_flag nf, int *need_submit)
+				     enum new_flag nf, int *need_submit,
+				     struct list_head *write_list)
 {
 	struct dm_buffer *b, *new_b = NULL;

@@ -924,7 +946,7 @@ static struct dm_buffer *__bufio_new(struct dm_bufio_client *c, sector_t block,
 		goto found_buffer;
 	}

-	__check_watermark(c);
+	__check_watermark(c, write_list);

 	b = new_b;
 	b->hold_count = 1;
@@ -992,10 +1014,14 @@ static void *new_read(struct dm_bufio_client *c, sector_t block,
 	int need_submit;
 	struct dm_buffer *b;

+	LIST_HEAD(write_list);
+
 	dm_bufio_lock(c);
-	b = __bufio_new(c, block, nf, &need_submit);
+	b = __bufio_new(c, block, nf, &need_submit, &write_list);
 	dm_bufio_unlock(c);

+	__flush_write_list(&write_list);
+
 	if (!b)
 		return b;

@@ -1047,6 +1073,8 @@ void dm_bufio_prefetch(struct dm_bufio_client *c,
 {
 	struct blk_plug plug;

+	LIST_HEAD(write_list);
+
 	BUG_ON(dm_bufio_in_request());

 	blk_start_plug(&plug);
@@ -1055,7 +1083,15 @@ void dm_bufio_prefetch(struct dm_bufio_client *c,
 	for (; n_blocks--; block++) {
 		int need_submit;
 		struct dm_buffer *b;
-		b = __bufio_new(c, block, NF_PREFETCH, &need_submit);
+		b = __bufio_new(c, block, NF_PREFETCH, &need_submit,
+				&write_list);
+		if (unlikely(!list_empty(&write_list))) {
+			dm_bufio_unlock(c);
+			blk_finish_plug(&plug);
+			__flush_write_list(&write_list);
+			blk_start_plug(&plug);
+			dm_bufio_lock(c);
+		}
 		if (unlikely(b != NULL)) {
 			dm_bufio_unlock(c);

@@ -1069,7 +1105,6 @@ void dm_bufio_prefetch(struct dm_bufio_client *c,
 				goto flush_plug;
 			dm_bufio_lock(c);
 		}
-
 	}

 	dm_bufio_unlock(c);
@@ -1126,11 +1161,14 @@ EXPORT_SYMBOL_GPL(dm_bufio_mark_buffer_dirty);

 void dm_bufio_write_dirty_buffers_async(struct dm_bufio_client *c)
 {
+	LIST_HEAD(write_list);
+
 	BUG_ON(dm_bufio_in_request());

 	dm_bufio_lock(c);
-	__write_dirty_buffers_async(c, 0);
+	__write_dirty_buffers_async(c, 0, &write_list);
 	dm_bufio_unlock(c);
+	__flush_write_list(&write_list);
 }
 EXPORT_SYMBOL_GPL(dm_bufio_write_dirty_buffers_async);

@@ -1147,8 +1185,13 @@ int dm_bufio_write_dirty_buffers(struct dm_bufio_client *c)
 	unsigned long buffers_processed = 0;
 	struct dm_buffer *b, *tmp;

+	LIST_HEAD(write_list);
+
+	dm_bufio_lock(c);
+	__write_dirty_buffers_async(c, 0, &write_list);
+	dm_bufio_unlock(c);
+	__flush_write_list(&write_list);
 	dm_bufio_lock(c);
-	__write_dirty_buffers_async(c, 0);

 again:
 	list_for_each_entry_safe_reverse(b, tmp, &c->lru[LIST_DIRTY], lru_list) {
@@ -1274,7 +1317,7 @@ void dm_bufio_release_move(struct dm_buffer *b, sector_t new_block)
 	BUG_ON(!b->hold_count);
 	BUG_ON(test_bit(B_READING, &b->state));

-	__write_dirty_buffer(b);
+	__write_dirty_buffer(b, NULL);
 	if (b->hold_count == 1) {
 		wait_on_bit(&b->state, B_WRITING,
 			    do_io_schedule, TASK_UNINTERRUPTIBLE);
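Reviewer note: the bufio hunks above all follow one pattern: while the client lock is held, dirty buffers are only queued on a caller-owned `write_list`, and the I/O is issued after the lock is dropped. The user-space sketch below illustrates that pattern only. It is a toy model under assumptions, not the dm-bufio implementation: `struct buffer`, `struct client`, `struct write_list`, `submit_write` and the other helpers are hypothetical names, and the `locked` flag stands in for the real mutex so that the "no I/O under the lock" rule can be checked with an assertion.

```c
#include <assert.h>
#include <stddef.h>

struct buffer {
	int dirty;
	int submitted;
	struct buffer *next_write;	/* link on a caller-owned write list */
};

struct client {
	int locked;			/* stand-in for the client mutex */
	struct buffer *bufs;
	size_t nr_bufs;
};

struct write_list {
	struct buffer *head;
	struct buffer **tail;
};

static void write_list_init(struct write_list *wl)
{
	wl->head = NULL;
	wl->tail = &wl->head;
}

static void client_lock(struct client *c)
{
	assert(!c->locked);
	c->locked = 1;
}

static void client_unlock(struct client *c)
{
	assert(c->locked);
	c->locked = 0;
}

/* Stand-in for issuing the write: must never run with the lock held. */
static void submit_write(struct client *c, struct buffer *b)
{
	assert(!c->locked);
	b->submitted = 1;
}

/* Called with the lock held: queue dirty buffers, issue no I/O. */
static void queue_dirty_buffers(struct client *c, struct write_list *wl)
{
	size_t i;

	assert(c->locked);
	for (i = 0; i < c->nr_bufs; i++) {
		struct buffer *b = &c->bufs[i];

		if (!b->dirty)
			continue;
		b->dirty = 0;
		b->next_write = NULL;
		*wl->tail = b;
		wl->tail = &b->next_write;
	}
}

/* Called after unlocking: drain the list and submit each buffer. */
static void flush_write_list(struct client *c, struct write_list *wl)
{
	while (wl->head) {
		struct buffer *b = wl->head;

		wl->head = b->next_write;
		b->next_write = NULL;
		submit_write(c, b);
	}
	wl->tail = &wl->head;
}

static void write_dirty_buffers_async(struct client *c)
{
	struct write_list wl;

	write_list_init(&wl);
	client_lock(c);
	queue_dirty_buffers(c, &wl);
	client_unlock(c);
	flush_write_list(c, &wl);
}
```

The list lives on the caller's stack, so no shared state is touched after the lock is released; that is the same reason the patch declares `LIST_HEAD(write_list)` locally in each entry point.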
@@ -425,6 +425,10 @@ static bool block_size_is_power_of_two(struct cache *cache)
 	return cache->sectors_per_block_shift >= 0;
 }

+/* gcc on ARM generates spurious references to __udivdi3 and __umoddi3 */
+#if defined(CONFIG_ARM) && __GNUC__ == 4 && __GNUC_MINOR__ <= 6
+__always_inline
+#endif
 static dm_block_t block_div(dm_block_t b, uint32_t n)
 {
 	do_div(b, n);
@@ -176,7 +176,7 @@ static int flakey_ctr(struct dm_target *ti, unsigned int argc, char **argv)

 	fc = kzalloc(sizeof(*fc), GFP_KERNEL);
 	if (!fc) {
-		ti->error = "Cannot allocate linear context";
+		ti->error = "Cannot allocate context";
 		return -ENOMEM;
 	}
 	fc->start_time = jiffies;
@@ -36,6 +36,14 @@ struct hash_cell {
 	struct dm_table *new_map;
 };

+/*
+ * A dummy definition to make RCU happy.
+ * struct dm_table should never be dereferenced in this file.
+ */
+struct dm_table {
+	int undefined__;
+};
+
 struct vers_iter {
 	size_t param_size;
 	struct dm_target_versions *vers, *old_vers;
@@ -242,9 +250,10 @@ static int dm_hash_insert(const char *name, const char *uuid, struct mapped_devi
 	return -EBUSY;
 }

-static void __hash_remove(struct hash_cell *hc)
+static struct dm_table *__hash_remove(struct hash_cell *hc)
 {
 	struct dm_table *table;
+	int srcu_idx;

 	/* remove from the dev hash */
 	list_del(&hc->uuid_list);
@@ -253,16 +262,18 @@ static void __hash_remove(struct hash_cell *hc)
 	dm_set_mdptr(hc->md, NULL);
 	mutex_unlock(&dm_hash_cells_mutex);

-	table = dm_get_live_table(hc->md);
-	if (table) {
+	table = dm_get_live_table(hc->md, &srcu_idx);
+	if (table)
 		dm_table_event(table);
-		dm_table_put(table);
-	}
+	dm_put_live_table(hc->md, srcu_idx);

+	table = NULL;
 	if (hc->new_map)
-		dm_table_destroy(hc->new_map);
+		table = hc->new_map;
 	dm_put(hc->md);
 	free_cell(hc);
+
+	return table;
 }

 static void dm_hash_remove_all(int keep_open_devices)
@@ -270,6 +281,7 @@ static void dm_hash_remove_all(int keep_open_devices)
 	int i, dev_skipped;
 	struct hash_cell *hc;
 	struct mapped_device *md;
+	struct dm_table *t;

 retry:
 	dev_skipped = 0;
@@ -287,10 +299,14 @@ static void dm_hash_remove_all(int keep_open_devices)
 				continue;
 			}

-			__hash_remove(hc);
+			t = __hash_remove(hc);

 			up_write(&_hash_lock);

+			if (t) {
+				dm_sync_table(md);
+				dm_table_destroy(t);
+			}
 			dm_put(md);
 			if (likely(keep_open_devices))
 				dm_destroy(md);
@@ -356,6 +372,7 @@ static struct mapped_device *dm_hash_rename(struct dm_ioctl *param,
 	struct dm_table *table;
 	struct mapped_device *md;
 	unsigned change_uuid = (param->flags & DM_UUID_FLAG) ? 1 : 0;
+	int srcu_idx;

 	/*
 	 * duplicate new.
@@ -418,11 +435,10 @@ static struct mapped_device *dm_hash_rename(struct dm_ioctl *param,
 	/*
 	 * Wake up any dm event waiters.
 	 */
-	table = dm_get_live_table(hc->md);
-	if (table) {
+	table = dm_get_live_table(hc->md, &srcu_idx);
+	if (table)
 		dm_table_event(table);
-		dm_table_put(table);
-	}
+	dm_put_live_table(hc->md, srcu_idx);

 	if (!dm_kobject_uevent(hc->md, KOBJ_CHANGE, param->event_nr))
 		param->flags |= DM_UEVENT_GENERATED_FLAG;
@@ -620,11 +636,14 @@ static int check_name(const char *name)
  * _hash_lock without first calling dm_table_put, because dm_table_destroy
  * waits for this dm_table_put and could be called under this lock.
  */
-static struct dm_table *dm_get_inactive_table(struct mapped_device *md)
+static struct dm_table *dm_get_inactive_table(struct mapped_device *md, int *srcu_idx)
 {
 	struct hash_cell *hc;
 	struct dm_table *table = NULL;

+	/* increment rcu count, we don't care about the table pointer */
+	dm_get_live_table(md, srcu_idx);
+
 	down_read(&_hash_lock);
 	hc = dm_get_mdptr(md);
 	if (!hc || hc->md != md) {
@@ -633,8 +652,6 @@ static struct dm_table *dm_get_inactive_table(struct mapped_device *md)
 	}

 	table = hc->new_map;
-	if (table)
-		dm_table_get(table);

 out:
 	up_read(&_hash_lock);
@@ -643,10 +660,11 @@ static struct dm_table *dm_get_inactive_table(struct mapped_device *md)
 }

 static struct dm_table *dm_get_live_or_inactive_table(struct mapped_device *md,
-						      struct dm_ioctl *param)
+						      struct dm_ioctl *param,
+						      int *srcu_idx)
 {
 	return (param->flags & DM_QUERY_INACTIVE_TABLE_FLAG) ?
-		dm_get_inactive_table(md) : dm_get_live_table(md);
+		dm_get_inactive_table(md, srcu_idx) : dm_get_live_table(md, srcu_idx);
 }

 /*
@@ -657,6 +675,7 @@ static void __dev_status(struct mapped_device *md, struct dm_ioctl *param)
 {
 	struct gendisk *disk = dm_disk(md);
 	struct dm_table *table;
+	int srcu_idx;

 	param->flags &= ~(DM_SUSPEND_FLAG | DM_READONLY_FLAG |
 			  DM_ACTIVE_PRESENT_FLAG);
@@ -676,26 +695,27 @@ static void __dev_status(struct mapped_device *md, struct dm_ioctl *param)
 	param->event_nr = dm_get_event_nr(md);
 	param->target_count = 0;

-	table = dm_get_live_table(md);
+	table = dm_get_live_table(md, &srcu_idx);
 	if (table) {
 		if (!(param->flags & DM_QUERY_INACTIVE_TABLE_FLAG)) {
 			if (get_disk_ro(disk))
 				param->flags |= DM_READONLY_FLAG;
 			param->target_count = dm_table_get_num_targets(table);
 		}
-		dm_table_put(table);

 		param->flags |= DM_ACTIVE_PRESENT_FLAG;
 	}
+	dm_put_live_table(md, srcu_idx);

 	if (param->flags & DM_QUERY_INACTIVE_TABLE_FLAG) {
-		table = dm_get_inactive_table(md);
+		int srcu_idx;
+		table = dm_get_inactive_table(md, &srcu_idx);
 		if (table) {
 			if (!(dm_table_get_mode(table) & FMODE_WRITE))
 				param->flags |= DM_READONLY_FLAG;
 			param->target_count = dm_table_get_num_targets(table);
-			dm_table_put(table);
 		}
+		dm_put_live_table(md, srcu_idx);
 	}
 }

@@ -796,6 +816,7 @@ static int dev_remove(struct dm_ioctl *param, size_t param_size)
 	struct hash_cell *hc;
 	struct mapped_device *md;
 	int r;
+	struct dm_table *t;

 	down_write(&_hash_lock);
 	hc = __find_device_hash_cell(param);
@@ -819,9 +840,14 @@ static int dev_remove(struct dm_ioctl *param, size_t param_size)
 		return r;
 	}

-	__hash_remove(hc);
+	t = __hash_remove(hc);
 	up_write(&_hash_lock);

+	if (t) {
+		dm_sync_table(md);
+		dm_table_destroy(t);
+	}
+
 	if (!dm_kobject_uevent(md, KOBJ_REMOVE, param->event_nr))
 		param->flags |= DM_UEVENT_GENERATED_FLAG;

@@ -986,6 +1012,7 @@ static int do_resume(struct dm_ioctl *param)

 		old_map = dm_swap_table(md, new_map);
 		if (IS_ERR(old_map)) {
+			dm_sync_table(md);
 			dm_table_destroy(new_map);
 			dm_put(md);
 			return PTR_ERR(old_map);
@@ -1003,6 +1030,10 @@ static int do_resume(struct dm_ioctl *param)
 			param->flags |= DM_UEVENT_GENERATED_FLAG;
 	}

+	/*
+	 * Since dm_swap_table synchronizes RCU, nobody should be in
+	 * read-side critical section already.
+	 */
 	if (old_map)
 		dm_table_destroy(old_map);

@@ -1125,6 +1156,7 @@ static int dev_wait(struct dm_ioctl *param, size_t param_size)
 	int r = 0;
 	struct mapped_device *md;
 	struct dm_table *table;
+	int srcu_idx;

 	md = find_device(param);
 	if (!md)
@@ -1145,11 +1177,10 @@ static int dev_wait(struct dm_ioctl *param, size_t param_size)
 	 */
 	__dev_status(md, param);

-	table = dm_get_live_or_inactive_table(md, param);
-	if (table) {
+	table = dm_get_live_or_inactive_table(md, param, &srcu_idx);
+	if (table)
 		retrieve_status(table, param, param_size);
-		dm_table_put(table);
-	}
+	dm_put_live_table(md, srcu_idx);

 out:
 	dm_put(md);
@@ -1221,7 +1252,7 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 {
 	int r;
 	struct hash_cell *hc;
-	struct dm_table *t;
+	struct dm_table *t, *old_map = NULL;
 	struct mapped_device *md;
 	struct target_type *immutable_target_type;

@@ -1277,14 +1308,14 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 	hc = dm_get_mdptr(md);
 	if (!hc || hc->md != md) {
 		DMWARN("device has been removed from the dev hash table.");
-		dm_table_destroy(t);
 		up_write(&_hash_lock);
+		dm_table_destroy(t);
 		r = -ENXIO;
 		goto out;
 	}

 	if (hc->new_map)
-		dm_table_destroy(hc->new_map);
+		old_map = hc->new_map;
 	hc->new_map = t;
 	up_write(&_hash_lock);

@@ -1292,6 +1323,11 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 	__dev_status(md, param);

 out:
+	if (old_map) {
+		dm_sync_table(md);
+		dm_table_destroy(old_map);
+	}
+
 	dm_put(md);

 	return r;
@@ -1301,6 +1337,7 @@ static int table_clear(struct dm_ioctl *param, size_t param_size)
 {
 	struct hash_cell *hc;
 	struct mapped_device *md;
+	struct dm_table *old_map = NULL;

 	down_write(&_hash_lock);

@@ -1312,7 +1349,7 @@ static int table_clear(struct dm_ioctl *param, size_t param_size)
 	}

 	if (hc->new_map) {
-		dm_table_destroy(hc->new_map);
+		old_map = hc->new_map;
 		hc->new_map = NULL;
 	}

@@ -1321,6 +1358,10 @@ static int table_clear(struct dm_ioctl *param, size_t param_size)
 	__dev_status(hc->md, param);
 	md = hc->md;
 	up_write(&_hash_lock);
+	if (old_map) {
+		dm_sync_table(md);
+		dm_table_destroy(old_map);
+	}
 	dm_put(md);

 	return 0;
@@ -1370,6 +1411,7 @@ static int table_deps(struct dm_ioctl *param, size_t param_size)
 {
 	struct mapped_device *md;
 	struct dm_table *table;
+	int srcu_idx;

 	md = find_device(param);
 	if (!md)
@@ -1377,11 +1419,10 @@ static int table_deps(struct dm_ioctl *param, size_t param_size)

 	__dev_status(md, param);

-	table = dm_get_live_or_inactive_table(md, param);
-	if (table) {
+	table = dm_get_live_or_inactive_table(md, param, &srcu_idx);
+	if (table)
 		retrieve_deps(table, param, param_size);
-		dm_table_put(table);
-	}
+	dm_put_live_table(md, srcu_idx);

 	dm_put(md);

@@ -1396,6 +1437,7 @@ static int table_status(struct dm_ioctl *param, size_t param_size)
 {
 	struct mapped_device *md;
 	struct dm_table *table;
+	int srcu_idx;

 	md = find_device(param);
 	if (!md)
@@ -1403,11 +1445,10 @@ static int table_status(struct dm_ioctl *param, size_t param_size)

 	__dev_status(md, param);

-	table = dm_get_live_or_inactive_table(md, param);
-	if (table) {
+	table = dm_get_live_or_inactive_table(md, param, &srcu_idx);
+	if (table)
 		retrieve_status(table, param, param_size);
-		dm_table_put(table);
-	}
+	dm_put_live_table(md, srcu_idx);

 	dm_put(md);

@@ -1443,6 +1484,7 @@ static int target_message(struct dm_ioctl *param, size_t param_size)
 	struct dm_target_msg *tmsg = (void *) param + param->data_start;
 	size_t maxlen;
 	char *result = get_result_buffer(param, param_size, &maxlen);
+	int srcu_idx;

 	md = find_device(param);
 	if (!md)
@@ -1470,9 +1512,9 @@ static int target_message(struct dm_ioctl *param, size_t param_size)
 	if (r <= 1)
 		goto out_argv;

-	table = dm_get_live_table(md);
+	table = dm_get_live_table(md, &srcu_idx);
 	if (!table)
-		goto out_argv;
+		goto out_table;

 	if (dm_deleting_md(md)) {
 		r = -ENXIO;
@@ -1491,7 +1533,7 @@ static int target_message(struct dm_ioctl *param, size_t param_size)
 	}

 out_table:
-	dm_table_put(table);
+	dm_put_live_table(md, srcu_idx);
 out_argv:
 	kfree(argv);
 out:
@@ -1644,7 +1686,10 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
 	}

 	if (!dmi) {
-		dmi = __vmalloc(param_kernel->data_size, GFP_NOIO | __GFP_REPEAT | __GFP_HIGH, PAGE_KERNEL);
+		unsigned noio_flag;
+		noio_flag = memalloc_noio_save();
+		dmi = __vmalloc(param_kernel->data_size, GFP_NOIO | __GFP_REPEAT | __GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL);
+		memalloc_noio_restore(noio_flag);
 		if (dmi)
 			*param_flags |= DM_PARAMS_VMALLOC;
 	}
@@ -1561,7 +1561,6 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd,
 	unsigned long flags;
 	int r;

-again:
 	bdev = NULL;
 	mode = 0;
 	r = 0;
@@ -1579,7 +1578,7 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd,
 	}

 	if ((pgpath && m->queue_io) || (!pgpath && m->queue_if_no_path))
-		r = -EAGAIN;
+		r = -ENOTCONN;
 	else if (!bdev)
 		r = -EIO;

@@ -1591,11 +1590,8 @@ static int multipath_ioctl(struct dm_target *ti, unsigned int cmd,
 	if (!r && ti->len != i_size_read(bdev->bd_inode) >> SECTOR_SHIFT)
 		r = scsi_verify_blk_ioctl(NULL, cmd);

-	if (r == -EAGAIN && !fatal_signal_pending(current)) {
+	if (r == -ENOTCONN && !fatal_signal_pending(current))
 		queue_work(kmultipathd, &m->process_queued_ios);
-		msleep(10);
-		goto again;
-	}

 	return r ? : __blkdev_driver_ioctl(bdev, mode, cmd, arg);
 }
|
@ -0,0 +1,538 @@
|
|||
/*
|
||||
* Copyright (C) 2010-2012 by Dell Inc. All rights reserved.
|
||||
* Copyright (C) 2011-2013 Red Hat, Inc.
|
||||
*
|
||||
* This file is released under the GPL.
|
||||
*
|
||||
* dm-switch is a device-mapper target that maps IO to underlying block
|
||||
* devices efficiently when there are a large number of fixed-sized
|
||||
* address regions but there is no simple pattern to allow for a compact
|
||||
* mapping representation such as dm-stripe.
|
||||
*/
|
||||
|
||||
#include <linux/device-mapper.h>
|
||||
|
||||
#include <linux/module.h>
|
||||
#include <linux/init.h>
|
||||
#include <linux/vmalloc.h>
|
||||
|
||||
#define DM_MSG_PREFIX "switch"
|
||||
|
||||
/*
|
||||
* One region_table_slot_t holds <region_entries_per_slot> region table
|
||||
* entries each of which is <region_table_entry_bits> in size.
|
||||
*/
|
||||
typedef unsigned long region_table_slot_t;
|
||||
|
||||
/*
|
||||
* A device with the offset to its start sector.
|
||||
*/
|
||||
struct switch_path {
|
||||
struct dm_dev *dmdev;
|
||||
sector_t start;
|
||||
};
|
||||
|
||||
/*
|
||||
* Context block for a dm switch device.
|
||||
*/
|
||||
struct switch_ctx {
|
||||
struct dm_target *ti;
|
||||
|
||||
unsigned nr_paths; /* Number of paths in path_list. */
|
||||
|
||||
unsigned region_size; /* Region size in 512-byte sectors */
|
||||
unsigned long nr_regions; /* Number of regions making up the device */
|
||||
signed char region_size_bits; /* log2 of region_size or -1 */
|
||||
|
||||
unsigned char region_table_entry_bits; /* Number of bits in one region table entry */
|
||||
unsigned char region_entries_per_slot; /* Number of entries in one region table slot */
|
||||
signed char region_entries_per_slot_bits; /* log2 of region_entries_per_slot or -1 */
|
||||
|
||||
region_table_slot_t *region_table; /* Region table */
|
||||
|
||||
/*
|
||||
* Array of dm devices to switch between.
|
||||
*/
|
||||
struct switch_path path_list[0];
|
||||
};
|
||||
|
||||
static struct switch_ctx *alloc_switch_ctx(struct dm_target *ti, unsigned nr_paths,
|
||||
unsigned region_size)
|
||||
{
|
||||
struct switch_ctx *sctx;
|
||||
|
||||
sctx = kzalloc(sizeof(struct switch_ctx) + nr_paths * sizeof(struct switch_path),
|
||||
GFP_KERNEL);
|
||||
if (!sctx)
|
||||
return NULL;
|
||||
|
||||
sctx->ti = ti;
|
||||
sctx->region_size = region_size;
|
||||
|
||||
ti->private = sctx;
|
||||
|
||||
return sctx;
|
||||
}
|
||||
|
||||
static int alloc_region_table(struct dm_target *ti, unsigned nr_paths)
|
||||
{
|
||||
struct switch_ctx *sctx = ti->private;
|
||||
sector_t nr_regions = ti->len;
|
||||
sector_t nr_slots;
|
||||
|
||||
if (!(sctx->region_size & (sctx->region_size - 1)))
|
||||
sctx->region_size_bits = __ffs(sctx->region_size);
|
||||
else
|
||||
sctx->region_size_bits = -1;
|
||||
|
||||
sctx->region_table_entry_bits = 1;
|
||||
while (sctx->region_table_entry_bits < sizeof(region_table_slot_t) * 8 &&
|
||||
(region_table_slot_t)1 << sctx->region_table_entry_bits < nr_paths)
|
||||
sctx->region_table_entry_bits++;
|
||||
|
||||
sctx->region_entries_per_slot = (sizeof(region_table_slot_t) * 8) / sctx->region_table_entry_bits;
|
||||
if (!(sctx->region_entries_per_slot & (sctx->region_entries_per_slot - 1)))
|
||||
sctx->region_entries_per_slot_bits = __ffs(sctx->region_entries_per_slot);
|
||||
else
|
||||
sctx->region_entries_per_slot_bits = -1;
|
||||
|
||||
if (sector_div(nr_regions, sctx->region_size))
|
||||
nr_regions++;
|
||||
|
||||
sctx->nr_regions = nr_regions;
|
||||
if (sctx->nr_regions != nr_regions || sctx->nr_regions >= ULONG_MAX) {
|
||||
ti->error = "Region table too large";
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
nr_slots = nr_regions;
|
||||
if (sector_div(nr_slots, sctx->region_entries_per_slot))
|
||||
nr_slots++;
|
||||
|
||||
if (nr_slots > ULONG_MAX / sizeof(region_table_slot_t)) {
|
||||
ti->error = "Region table too large";
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
sctx->region_table = vmalloc(nr_slots * sizeof(region_table_slot_t));
|
||||
if (!sctx->region_table) {
|
||||
ti->error = "Cannot allocate region table";
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void switch_get_position(struct switch_ctx *sctx, unsigned long region_nr,
|
||||
unsigned long *region_index, unsigned *bit)
|
||||
{
|
||||
if (sctx->region_entries_per_slot_bits >= 0) {
|
||||
*region_index = region_nr >> sctx->region_entries_per_slot_bits;
|
||||
*bit = region_nr & (sctx->region_entries_per_slot - 1);
|
||||
} else {
|
||||
*region_index = region_nr / sctx->region_entries_per_slot;
|
||||
*bit = region_nr % sctx->region_entries_per_slot;
|
||||
}
|
||||
|
||||
*bit *= sctx->region_table_entry_bits;
|
||||
}
|
||||
|
||||
/*
|
||||
* Find which path to use at given offset.
|
||||
*/
|
||||
static unsigned switch_get_path_nr(struct switch_ctx *sctx, sector_t offset)
|
||||
{
|
||||
unsigned long region_index;
|
||||
unsigned bit, path_nr;
|
||||
sector_t p;
|
||||
|
||||
p = offset;
|
||||
if (sctx->region_size_bits >= 0)
|
||||
p >>= sctx->region_size_bits;
|
||||
else
|
||||
sector_div(p, sctx->region_size);
|
||||
|
||||
switch_get_position(sctx, p, &region_index, &bit);
|
||||
path_nr = (ACCESS_ONCE(sctx->region_table[region_index]) >> bit) &
|
||||
((1 << sctx->region_table_entry_bits) - 1);
|
||||
|
||||
/* This can only happen if the processor uses non-atomic stores. */
|
||||
if (unlikely(path_nr >= sctx->nr_paths))
|
||||
path_nr = 0;
|
||||
|
||||
return path_nr;
|
||||
}
|
||||
|
||||
static void switch_region_table_write(struct switch_ctx *sctx, unsigned long region_nr,
|
||||
unsigned value)
|
||||
{
|
||||
unsigned long region_index;
|
||||
unsigned bit;
|
||||
region_table_slot_t pte;
|
||||
|
||||
switch_get_position(sctx, region_nr, &region_index, &bit);
|
||||
|
||||
pte = sctx->region_table[region_index];
|
||||
pte &= ~((((region_table_slot_t)1 << sctx->region_table_entry_bits) - 1) << bit);
|
||||
pte |= (region_table_slot_t)value << bit;
|
||||
sctx->region_table[region_index] = pte;
|
||||
}
|
||||
|
||||
/*
|
||||
* Fill the region table with an initial round robin pattern.
|
||||
*/
|
||||
static void initialise_region_table(struct switch_ctx *sctx)
|
||||
{
|
||||
unsigned path_nr = 0;
|
||||
unsigned long region_nr;
|
||||
|
||||
for (region_nr = 0; region_nr < sctx->nr_regions; region_nr++) {
|
||||
switch_region_table_write(sctx, region_nr, path_nr);
|
||||
if (++path_nr >= sctx->nr_paths)
|
||||
path_nr = 0;
|
||||
}
|
||||
}
|
||||
|
||||
static int parse_path(struct dm_arg_set *as, struct dm_target *ti)
|
||||
{
|
||||
struct switch_ctx *sctx = ti->private;
|
||||
unsigned long long start;
|
||||
int r;
|
||||
|
||||
r = dm_get_device(ti, dm_shift_arg(as), dm_table_get_mode(ti->table),
|
||||
&sctx->path_list[sctx->nr_paths].dmdev);
|
||||
if (r) {
|
||||
ti->error = "Device lookup failed";
|
||||
return r;
|
||||
}
|
||||
|
||||
if (kstrtoull(dm_shift_arg(as), 10, &start) || start != (sector_t)start) {
|
||||
ti->error = "Invalid device starting offset";
|
||||
dm_put_device(ti, sctx->path_list[sctx->nr_paths].dmdev);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
sctx->path_list[sctx->nr_paths].start = start;
|
||||
|
||||
sctx->nr_paths++;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Destructor: Don't free the dm_target, just the ti->private data (if any).
|
||||
*/
|
||||
static void switch_dtr(struct dm_target *ti)
|
||||
{
|
||||
struct switch_ctx *sctx = ti->private;
|
||||
|
||||
while (sctx->nr_paths--)
|
||||
dm_put_device(ti, sctx->path_list[sctx->nr_paths].dmdev);
|
||||
|
||||
vfree(sctx->region_table);
|
||||
kfree(sctx);
|
||||
}
|
||||
|
||||
/*
|
||||
* Constructor arguments:
|
||||
* <num_paths> <region_size> <num_optional_args> [<optional_args>...]
|
||||
* [<dev_path> <offset>]+
|
||||
*
|
||||
* Optional args are to allow for future extension: currently this
|
||||
* parameter must be 0.
|
||||
*/
|
||||
static int switch_ctr(struct dm_target *ti, unsigned argc, char **argv)
|
||||
{
|
||||
static struct dm_arg _args[] = {
|
||||
{1, (KMALLOC_MAX_SIZE - sizeof(struct switch_ctx)) / sizeof(struct switch_path), "Invalid number of paths"},
|
||||
{1, UINT_MAX, "Invalid region size"},
|
||||
{0, 0, "Invalid number of optional args"},
|
||||
};
|
||||
|
||||
struct switch_ctx *sctx;
|
||||
struct dm_arg_set as;
|
||||
unsigned nr_paths, region_size, nr_optional_args;
|
||||
int r;
|
||||
|
||||
as.argc = argc;
|
||||
as.argv = argv;
|
||||
|
||||
r = dm_read_arg(_args, &as, &nr_paths, &ti->error);
|
||||
if (r)
|
||||
return -EINVAL;
|
||||
|
||||
r = dm_read_arg(_args + 1, &as, &region_size, &ti->error);
|
||||
if (r)
|
||||
return r;
|
||||
|
||||
r = dm_read_arg_group(_args + 2, &as, &nr_optional_args, &ti->error);
|
||||
if (r)
|
||||
return r;
|
||||
/* parse optional arguments here, if we add any */
|
||||
|
||||
if (as.argc != nr_paths * 2) {
|
||||
ti->error = "Incorrect number of path arguments";
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
sctx = alloc_switch_ctx(ti, nr_paths, region_size);
|
||||
if (!sctx) {
|
||||
ti->error = "Cannot allocate redirection context";
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
r = dm_set_target_max_io_len(ti, region_size);
|
||||
if (r)
|
||||
goto error;
|
||||
|
||||
while (as.argc) {
|
||||
r = parse_path(&as, ti);
|
||||
if (r)
|
||||
goto error;
|
||||
}
|
||||
|
||||
r = alloc_region_table(ti, nr_paths);
|
||||
if (r)
|
||||
goto error;
|
||||
|
||||
initialise_region_table(sctx);
|
||||
|
||||
/* For UNMAP, sending the request down any path is sufficient */
|
||||
ti->num_discard_bios = 1;
|
||||
|
||||
return 0;
|
||||
|
||||
error:
|
||||
switch_dtr(ti);
|
||||
|
||||
return r;
|
||||
}
|
||||
|
||||
static int switch_map(struct dm_target *ti, struct bio *bio)
|
||||
{
|
||||
struct switch_ctx *sctx = ti->private;
|
||||
sector_t offset = dm_target_offset(ti, bio->bi_sector);
|
||||
unsigned path_nr = switch_get_path_nr(sctx, offset);
|
||||
|
||||
bio->bi_bdev = sctx->path_list[path_nr].dmdev->bdev;
|
||||
bio->bi_sector = sctx->path_list[path_nr].start + offset;
|
||||
|
||||
return DM_MAPIO_REMAPPED;
|
||||
}
|
||||
|
||||
/*
|
||||
* We need to parse hex numbers in the message as quickly as possible.
|
||||
*
|
||||
* This table-based hex parser improves performance.
|
||||
* It improves a time to load 1000000 entries compared to the condition-based
|
||||
* parser.
|
||||
* table-based parser condition-based parser
|
||||
* PA-RISC 0.29s 0.31s
|
||||
* Opteron 0.0495s 0.0498s
|
||||
*/
|
||||
static const unsigned char hex_table[256] = {
|
||||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 255, 255, 255, 255, 255, 255,
|
||||
255, 10, 11, 12, 13, 14, 15, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
255, 10, 11, 12, 13, 14, 15, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
|
||||
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255
|
||||
};
|
||||
|
||||
static __always_inline unsigned long parse_hex(const char **string)
|
||||
{
|
||||
unsigned char d;
|
||||
unsigned long r = 0;
|
||||
|
||||
while ((d = hex_table[(unsigned char)**string]) < 16) {
|
||||
r = (r << 4) | d;
|
||||
(*string)++;
|
||||
}
|
||||
|
||||
return r;
|
||||
}
|
||||
|
||||
static int process_set_region_mappings(struct switch_ctx *sctx,
|
||||
unsigned argc, char **argv)
|
||||
{
|
||||
unsigned i;
|
||||
unsigned long region_index = 0;
|
||||
|
||||
for (i = 1; i < argc; i++) {
|
||||
unsigned long path_nr;
|
||||
const char *string = argv[i];
|
||||
|
||||
if (*string == ':')
|
||||
region_index++;
|
||||
else {
|
||||
region_index = parse_hex(&string);
|
||||
if (unlikely(*string != ':')) {
|
||||
DMWARN("invalid set_region_mappings argument: '%s'", argv[i]);
|
||||
return -EINVAL;
|
||||
}
|
||||
}
|
||||
|
||||
string++;
|
||||
if (unlikely(!*string)) {
|
||||
DMWARN("invalid set_region_mappings argument: '%s'", argv[i]);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
path_nr = parse_hex(&string);
|
||||
if (unlikely(*string)) {
|
||||
DMWARN("invalid set_region_mappings argument: '%s'", argv[i]);
|
||||
return -EINVAL;
|
||||
}
|
||||
if (unlikely(region_index >= sctx->nr_regions)) {
|
||||
DMWARN("invalid set_region_mappings region number: %lu >= %lu", region_index, sctx->nr_regions);
|
||||
return -EINVAL;
|
||||
}
|
||||
if (unlikely(path_nr >= sctx->nr_paths)) {
|
||||
DMWARN("invalid set_region_mappings device: %lu >= %u", path_nr, sctx->nr_paths);
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
switch_region_table_write(sctx, region_index, path_nr);
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Messages are processed one-at-a-time.
|
||||
*
|
||||
* Only set_region_mappings is supported.
|
||||
*/
|
||||
static int switch_message(struct dm_target *ti, unsigned argc, char **argv)
|
||||
{
|
||||
static DEFINE_MUTEX(message_mutex);
|
||||
|
||||
struct switch_ctx *sctx = ti->private;
|
||||
int r = -EINVAL;
|
||||
|
||||
mutex_lock(&message_mutex);
|
||||
|
||||
if (!strcasecmp(argv[0], "set_region_mappings"))
|
||||
r = process_set_region_mappings(sctx, argc, argv);
|
||||
else
|
||||
DMWARN("Unrecognised message received.");
|
||||
|
||||
mutex_unlock(&message_mutex);
|
||||
|
||||
return r;
|
||||
}
|
||||
|
||||
static void switch_status(struct dm_target *ti, status_type_t type,
|
||||
unsigned status_flags, char *result, unsigned maxlen)
|
||||
{
|
||||
struct switch_ctx *sctx = ti->private;
|
||||
unsigned sz = 0;
|
||||
int path_nr;
|
||||
|
||||
switch (type) {
|
||||
case STATUSTYPE_INFO:
|
||||
result[0] = '\0';
|
||||
break;
|
||||
|
||||
case STATUSTYPE_TABLE:
|
||||
DMEMIT("%u %u 0", sctx->nr_paths, sctx->region_size);
|
||||
for (path_nr = 0; path_nr < sctx->nr_paths; path_nr++)
|
||||
DMEMIT(" %s %llu", sctx->path_list[path_nr].dmdev->name,
|
||||
(unsigned long long)sctx->path_list[path_nr].start);
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Switch ioctl:
|
||||
*
|
||||
* Passthrough all ioctls to the path for sector 0
|
||||
*/
|
||||
static int switch_ioctl(struct dm_target *ti, unsigned cmd,
|
||||
unsigned long arg)
|
||||
{
|
||||
struct switch_ctx *sctx = ti->private;
|
||||
struct block_device *bdev;
|
||||
fmode_t mode;
|
||||
unsigned path_nr;
|
||||
int r = 0;
|
||||
|
||||
path_nr = switch_get_path_nr(sctx, 0);
|
||||
|
||||
bdev = sctx->path_list[path_nr].dmdev->bdev;
|
||||
mode = sctx->path_list[path_nr].dmdev->mode;
|
||||
|
||||
/*
|
||||
* Only pass ioctls through if the device sizes match exactly.
|
||||
*/
|
||||
if (ti->len + sctx->path_list[path_nr].start != i_size_read(bdev->bd_inode) >> SECTOR_SHIFT)
|
||||
r = scsi_verify_blk_ioctl(NULL, cmd);
|
||||
|
||||
return r ? : __blkdev_driver_ioctl(bdev, mode, cmd, arg);
|
||||
}
|
||||
|
||||
static int switch_iterate_devices(struct dm_target *ti,
|
||||
iterate_devices_callout_fn fn, void *data)
|
||||
{
|
||||
struct switch_ctx *sctx = ti->private;
|
||||
int path_nr;
|
||||
int r;
|
||||
|
||||
for (path_nr = 0; path_nr < sctx->nr_paths; path_nr++) {
|
||||
r = fn(ti, sctx->path_list[path_nr].dmdev,
|
||||
sctx->path_list[path_nr].start, ti->len, data);
|
||||
if (r)
|
||||
return r;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static struct target_type switch_target = {
|
||||
.name = "switch",
|
||||
.version = {1, 0, 0},
|
||||
.module = THIS_MODULE,
|
||||
.ctr = switch_ctr,
|
||||
.dtr = switch_dtr,
|
||||
.map = switch_map,
|
||||
.message = switch_message,
|
||||
.status = switch_status,
|
||||
.ioctl = switch_ioctl,
|
||||
.iterate_devices = switch_iterate_devices,
|
||||
};
|
||||
|
||||
static int __init dm_switch_init(void)
|
||||
{
|
||||
int r;
|
||||
|
||||
r = dm_register_target(&switch_target);
|
||||
if (r < 0)
|
||||
DMERR("dm_register_target() failed %d", r);
|
||||
|
||||
return r;
|
||||
}
|
||||
|
||||
static void __exit dm_switch_exit(void)
|
||||
{
|
||||
dm_unregister_target(&switch_target);
|
||||
}
|
||||
|
||||
module_init(dm_switch_init);
|
||||
module_exit(dm_switch_exit);
|
||||
|
||||
MODULE_DESCRIPTION(DM_NAME " dynamic path switching target");
|
||||
MODULE_AUTHOR("Kevin D. O'Kelley <Kevin_OKelley@dell.com>");
|
||||
MODULE_AUTHOR("Narendran Ganapathy <Narendran_Ganapathy@dell.com>");
|
||||
MODULE_AUTHOR("Jim Ramsay <Jim_Ramsay@dell.com>");
|
||||
MODULE_AUTHOR("Mikulas Patocka <mpatocka@redhat.com>");
|
||||
MODULE_LICENSE("GPL");
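
The region table in dm-switch.c above packs several small path numbers into each `unsigned long` slot. The following is a minimal userspace sketch of that packing scheme, not the kernel code itself: the names `struct region_table`, `entry_bits_for`, `table_init`, `table_write` and `table_read` are hypothetical, plain division and modulo are used instead of the power-of-two shift fast path, and `calloc` stands in for `vmalloc`.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

typedef unsigned long slot_t;               /* plays the role of region_table_slot_t */
#define SLOT_BITS (sizeof(slot_t) * CHAR_BIT)

struct region_table {                       /* hypothetical, for illustration only */
	unsigned entry_bits;                /* bits used by one entry */
	unsigned entries_per_slot;          /* entries packed into one slot */
	unsigned long nr_regions;
	slot_t *slots;
};

/* Smallest width (at least 1 bit) such that 2^width >= nr_paths. */
static unsigned entry_bits_for(unsigned nr_paths)
{
	unsigned bits = 1;

	while (bits < SLOT_BITS && ((slot_t)1 << bits) < nr_paths)
		bits++;
	return bits;
}

/* Returns 0 on success, -1 if the table cannot be allocated. */
static int table_init(struct region_table *t, unsigned nr_paths,
		      unsigned long nr_regions)
{
	unsigned long nr_slots;

	t->entry_bits = entry_bits_for(nr_paths);
	t->entries_per_slot = SLOT_BITS / t->entry_bits;
	t->nr_regions = nr_regions;
	/* Round up so a partially filled last slot is still allocated. */
	nr_slots = (nr_regions + t->entries_per_slot - 1) / t->entries_per_slot;
	t->slots = calloc(nr_slots ? nr_slots : 1, sizeof(slot_t));
	return t->slots ? 0 : -1;
}

static slot_t entry_mask(const struct region_table *t)
{
	/* Shifting by the full slot width would be undefined behaviour. */
	if (t->entry_bits >= SLOT_BITS)
		return ~(slot_t)0;
	return ((slot_t)1 << t->entry_bits) - 1;
}

static void table_write(struct region_table *t, unsigned long region,
			unsigned path)
{
	unsigned long idx = region / t->entries_per_slot;
	unsigned shift = (region % t->entries_per_slot) * t->entry_bits;
	slot_t v;

	assert(region < t->nr_regions);
	v = t->slots[idx];
	v &= ~(entry_mask(t) << shift);     /* clear the old entry */
	v |= (slot_t)path << shift;         /* store the new path number */
	t->slots[idx] = v;
}

static unsigned table_read(const struct region_table *t, unsigned long region)
{
	unsigned long idx = region / t->entries_per_slot;
	unsigned shift = (region % t->entries_per_slot) * t->entry_bits;

	return (unsigned)((t->slots[idx] >> shift) & entry_mask(t));
}
```

With three paths each entry needs two bits, so a 64-bit slot holds 32 regions. A caller would run `table_init`, fill the table round-robin with `table_write(&t, r, r % nr_paths)` as the target's initialisation does, and later overwrite individual regions. The kernel reader additionally wraps the slot load in `ACCESS_ONCE` and falls back to path 0 if it reads an out-of-range value; this sketch omits both.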
|
|
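
The `set_region_mappings` message handled in dm-switch.c above takes arguments of the form `<region_index>:<path_nr>`, both in hexadecimal, where an argument that starts with `:` means "the previous region index plus one". The sketch below shows that argument grammar in userspace C under some assumptions: `hex_val` and `parse_mapping` are hypothetical names, a comparison-based digit decoder replaces the 256-entry lookup table, and the range checks against `nr_regions` and `nr_paths` are left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Consume leading hex digits and advance *s past them. */
static unsigned long parse_hex(const char **s)
{
	unsigned long r = 0;
	int d;

	while ((d = hex_val(**s)) >= 0) {
		r = (r << 4) | (unsigned long)d;
		(*s)++;
	}
	return r;
}

/*
 * Parse one message argument. *region carries the running region index
 * between calls, so ":<path>" can mean "next region".
 * Returns 0 on success, -1 on a malformed argument.
 */
static int parse_mapping(const char *arg, unsigned long *region,
			 unsigned long *path)
{
	const char *s = arg;

	assert(arg != NULL && region != NULL && path != NULL);

	if (*s == ':') {
		(*region)++;
	} else {
		*region = parse_hex(&s);
		if (*s != ':')
			return -1;      /* missing separator */
	}

	s++;                            /* skip ':' */
	if (!*s)
		return -1;              /* no path number given */

	*path = parse_hex(&s);
	return *s ? -1 : 0;             /* trailing junk is an error */
}
```

Starting from a region index of 0, the arguments `1f:2` and `:A` would map region 0x1f to path 2 and region 0x20 to path 10. As in the target, an empty digit string parses as 0, so only the separator and trailing-character checks reject bad input.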
@ -26,22 +26,8 @@
|
|||
#define KEYS_PER_NODE (NODE_SIZE / sizeof(sector_t))
|
||||
#define CHILDREN_PER_NODE (KEYS_PER_NODE + 1)
|
||||
|
||||
/*
|
||||
* The table has always exactly one reference from either mapped_device->map
|
||||
* or hash_cell->new_map. This reference is not counted in table->holders.
|
||||
* A pair of dm_create_table/dm_destroy_table functions is used for table
|
||||
* creation/destruction.
|
||||
*
|
||||
* Temporary references from the other code increase table->holders. A pair
|
||||
* of dm_table_get/dm_table_put functions is used to manipulate it.
|
||||
*
|
||||
* When the table is about to be destroyed, we wait for table->holders to
|
||||
* drop to zero.
|
||||
*/
|
||||
|
||||
struct dm_table {
|
||||
struct mapped_device *md;
|
||||
atomic_t holders;
|
||||
unsigned type;
|
||||
|
||||
/* btree table */
|
||||
|
@ -208,7 +194,6 @@ int dm_table_create(struct dm_table **result, fmode_t mode,
|
|||
|
||||
INIT_LIST_HEAD(&t->devices);
|
||||
INIT_LIST_HEAD(&t->target_callbacks);
|
||||
atomic_set(&t->holders, 0);
|
||||
|
||||
if (!num_targets)
|
||||
num_targets = KEYS_PER_NODE;
|
||||
|
@ -246,10 +231,6 @@ void dm_table_destroy(struct dm_table *t)
|
|||
if (!t)
|
||||
return;
|
||||
|
||||
while (atomic_read(&t->holders))
|
||||
msleep(1);
|
||||
smp_mb();
|
||||
|
||||
/* free the indexes */
|
||||
if (t->depth >= 2)
|
||||
vfree(t->index[t->depth - 2]);
|
||||
|
@ -274,22 +255,6 @@ void dm_table_destroy(struct dm_table *t)
|
|||
kfree(t);
|
||||
}
|
||||
|
||||
void dm_table_get(struct dm_table *t)
|
||||
{
|
||||
atomic_inc(&t->holders);
|
||||
}
|
||||
EXPORT_SYMBOL(dm_table_get);
|
||||
|
||||
void dm_table_put(struct dm_table *t)
|
||||
{
|
||||
if (!t)
|
||||
return;
|
||||
|
||||
smp_mb__before_atomic_dec();
|
||||
atomic_dec(&t->holders);
|
||||
}
|
||||
EXPORT_SYMBOL(dm_table_put);
|
||||
|
||||
/*
|
||||
* Checks to see if we need to extend highs or targets.
|
||||
*/
|
||||
|
|
|
@ -451,7 +451,7 @@ static void verity_prefetch_io(struct work_struct *work)
|
|||
goto no_prefetch_cluster;
|
||||
|
||||
if (unlikely(cluster & (cluster - 1)))
|
||||
cluster = 1 << (fls(cluster) - 1);
|
||||
cluster = 1 << __fls(cluster);
|
||||
|
||||
hash_block_start &= ~(sector_t)(cluster - 1);
|
||||
hash_block_end |= cluster - 1;
|
||||
|
@ -695,8 +695,8 @@ static int verity_ctr(struct dm_target *ti, unsigned argc, char **argv)
|
|||
goto bad;
|
||||
}
|
||||
|
||||
if (sscanf(argv[0], "%d%c", &num, &dummy) != 1 ||
|
||||
num < 0 || num > 1) {
|
||||
if (sscanf(argv[0], "%u%c", &num, &dummy) != 1 ||
|
||||
num > 1) {
|
||||
ti->error = "Invalid version";
|
||||
r = -EINVAL;
|
||||
goto bad;
|
||||
|
@ -723,7 +723,7 @@ static int verity_ctr(struct dm_target *ti, unsigned argc, char **argv)
|
|||
r = -EINVAL;
|
||||
goto bad;
|
||||
}
|
||||
v->data_dev_block_bits = ffs(num) - 1;
|
||||
v->data_dev_block_bits = __ffs(num);
|
||||
|
||||
if (sscanf(argv[4], "%u%c", &num, &dummy) != 1 ||
|
||||
!num || (num & (num - 1)) ||
|
||||
|
@ -733,7 +733,7 @@ static int verity_ctr(struct dm_target *ti, unsigned argc, char **argv)
|
|||
r = -EINVAL;
|
||||
goto bad;
|
||||
}
|
||||
v->hash_dev_block_bits = ffs(num) - 1;
|
||||
v->hash_dev_block_bits = __ffs(num);
|
||||
|
||||
if (sscanf(argv[5], "%llu%c", &num_ll, &dummy) != 1 ||
|
||||
(sector_t)(num_ll << (v->data_dev_block_bits - SECTOR_SHIFT))
|
||||
|
@ -812,7 +812,7 @@ static int verity_ctr(struct dm_target *ti, unsigned argc, char **argv)
|
|||
}
|
||||
|
||||
v->hash_per_block_bits =
|
||||
fls((1 << v->hash_dev_block_bits) / v->digest_size) - 1;
|
||||
__fls((1 << v->hash_dev_block_bits) / v->digest_size);
|
||||
|
||||
v->levels = 0;
|
||||
if (v->data_blocks)
|
||||
|
@ -831,9 +831,8 @@ static int verity_ctr(struct dm_target *ti, unsigned argc, char **argv)
|
|||
for (i = v->levels - 1; i >= 0; i--) {
|
||||
sector_t s;
|
||||
v->hash_level_block[i] = hash_position;
|
||||
s = verity_position_at_level(v, v->data_blocks, i);
|
||||
s = (s >> v->hash_per_block_bits) +
|
||||
!!(s & ((1 << v->hash_per_block_bits) - 1));
|
||||
s = (v->data_blocks + ((sector_t)1 << ((i + 1) * v->hash_per_block_bits)) - 1)
|
||||
>> ((i + 1) * v->hash_per_block_bits);
|
||||
if (hash_position + s < hash_position) {
|
||||
ti->error = "Hash device offset overflow";
|
||||
r = -E2BIG;
|
||||
|
|
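
The last dm-verity hunk above changes how many hash blocks are reserved per tree level. Judging by the hunk header (nine lines become eight), the three-line expression built on `verity_position_at_level` appears to be the removed code and the two-line rounded-up shift its replacement. The sketch below compares the two calculations in userspace C. The function names are hypothetical, `bits` stands for `hash_per_block_bits`, and the old variant assumes that `verity_position_at_level(v, data_blocks, i)` amounts to `data_blocks >> (i * bits)`.

```c
#include <assert.h>
#include <stdint.h>

/* Replacement form: ceil(data_blocks / 2^((level + 1) * bits)). */
static uint64_t level_blocks(uint64_t data_blocks, unsigned bits,
			     unsigned level)
{
	unsigned shift = (level + 1) * bits;

	assert(shift < 64);             /* sketch does not handle wider shifts */
	return (data_blocks + ((uint64_t)1 << shift) - 1) >> shift;
}

/*
 * Earlier form: round down to this level's position first, then round up
 * when dividing by the hashes per block. The first step discards low bits.
 */
static uint64_t level_blocks_old(uint64_t data_blocks, unsigned bits,
				 unsigned level)
{
	uint64_t s;

	assert((level + 1) * bits < 64);
	s = data_blocks >> (level * bits);
	return (s >> bits) + !!(s & (((uint64_t)1 << bits) - 1));
}
```

With 128 hashes per block (`bits` = 7) and 16385 data blocks, level 1 needs two hash blocks, since one block at that level covers 16384 data blocks. The replacement form returns 2, while the earlier form returns 1 because the remainder is lost in the first shift. For sizes that are exact multiples, such as 16384, both forms agree, which fits the merge description's wording that only some specific device sizes were affected.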
177
drivers/md/dm.c
|
@ -116,16 +116,30 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo);
|
|||
#define DMF_NOFLUSH_SUSPENDING 5
|
||||
#define DMF_MERGE_IS_OPTIONAL 6
|
||||
|
||||
/*
|
||||
* A dummy definition to make RCU happy.
|
||||
* struct dm_table should never be dereferenced in this file.
|
||||
*/
|
||||
struct dm_table {
|
||||
int undefined__;
|
||||
};
|
||||
|
||||
/*
|
||||
* Work processed by per-device workqueue.
|
||||
*/
|
||||
struct mapped_device {
|
||||
struct rw_semaphore io_lock;
|
||||
struct srcu_struct io_barrier;
|
||||
struct mutex suspend_lock;
|
||||
rwlock_t map_lock;
|
||||
atomic_t holders;
|
||||
atomic_t open_count;
|
||||
|
||||
/*
|
||||
* The current mapping.
|
||||
* Use dm_get_live_table{_fast} or take suspend_lock for
|
||||
* dereference.
|
||||
*/
|
||||
struct dm_table *map;
|
||||
|
||||
unsigned long flags;
|
||||
|
||||
struct request_queue *queue;
|
||||
|
@ -154,11 +168,6 @@ struct mapped_device {
|
|||
*/
|
||||
struct workqueue_struct *wq;
|
||||
|
||||
/*
|
||||
* The current mapping.
|
||||
*/
|
||||
struct dm_table *map;
|
||||
|
||||
/*
|
||||
* io objects are allocated from here.
|
||||
*/
|
||||
|
@ -386,10 +395,14 @@ static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
|
|||
unsigned int cmd, unsigned long arg)
|
||||
{
|
||||
struct mapped_device *md = bdev->bd_disk->private_data;
|
||||
struct dm_table *map = dm_get_live_table(md);
|
||||
int srcu_idx;
|
||||
struct dm_table *map;
|
||||
struct dm_target *tgt;
|
||||
int r = -ENOTTY;
|
||||
|
||||
retry:
|
||||
map = dm_get_live_table(md, &srcu_idx);
|
||||
|
||||
if (!map || !dm_table_get_size(map))
|
||||
goto out;
|
||||
|
||||
|
@ -408,7 +421,12 @@ static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
|
|||
r = tgt->type->ioctl(tgt, cmd, arg);
|
||||
|
||||
out:
|
||||
dm_table_put(map);
|
||||
dm_put_live_table(md, srcu_idx);
|
||||
|
||||
if (r == -ENOTCONN) {
|
||||
msleep(10);
|
||||
goto retry;
|
||||
}
|
||||
|
||||
return r;
|
||||
}
|
||||
|
@ -502,20 +520,39 @@ static void queue_io(struct mapped_device *md, struct bio *bio)
|
|||
/*
|
||||
* Everyone (including functions in this file), should use this
|
||||
* function to access the md->map field, and make sure they call
|
||||
* dm_table_put() when finished.
|
||||
* dm_put_live_table() when finished.
|
||||
*/
|
||||
struct dm_table *dm_get_live_table(struct mapped_device *md)
|
||||
struct dm_table *dm_get_live_table(struct mapped_device *md, int *srcu_idx) __acquires(md->io_barrier)
|
||||
{
|
||||
struct dm_table *t;
|
||||
unsigned long flags;
|
||||
*srcu_idx = srcu_read_lock(&md->io_barrier);
|
||||
|
||||
read_lock_irqsave(&md->map_lock, flags);
|
||||
t = md->map;
|
||||
if (t)
|
||||
dm_table_get(t);
|
||||
read_unlock_irqrestore(&md->map_lock, flags);
|
||||
return srcu_dereference(md->map, &md->io_barrier);
|
||||
}
|
||||
|
||||
return t;
|
||||
void dm_put_live_table(struct mapped_device *md, int srcu_idx) __releases(md->io_barrier)
|
||||
{
|
||||
srcu_read_unlock(&md->io_barrier, srcu_idx);
|
||||
}
|
||||
|
||||
void dm_sync_table(struct mapped_device *md)
|
||||
{
|
||||
synchronize_srcu(&md->io_barrier);
|
||||
synchronize_rcu_expedited();
|
||||
}
|
||||
|
||||
/*
|
||||
* A fast alternative to dm_get_live_table/dm_put_live_table.
|
||||
* The caller must not block between these two functions.
|
||||
*/
|
||||
static struct dm_table *dm_get_live_table_fast(struct mapped_device *md) __acquires(RCU)
|
||||
{
|
||||
rcu_read_lock();
|
||||
return rcu_dereference(md->map);
|
||||
}
|
||||
|
||||
static void dm_put_live_table_fast(struct mapped_device *md) __releases(RCU)
|
||||
{
|
||||
rcu_read_unlock();
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -1349,17 +1386,18 @@ static int __split_and_process_non_flush(struct clone_info *ci)
|
|||
/*
|
||||
* Entry point to split a bio into clones and submit them to the targets.
|
||||
*/
|
||||
static void __split_and_process_bio(struct mapped_device *md, struct bio *bio)
|
||||
static void __split_and_process_bio(struct mapped_device *md,
|
||||
struct dm_table *map, struct bio *bio)
|
||||
{
|
||||
struct clone_info ci;
|
||||
int error = 0;
|
||||
|
||||
ci.map = dm_get_live_table(md);
|
||||
if (unlikely(!ci.map)) {
|
||||
if (unlikely(!map)) {
|
||||
bio_io_error(bio);
|
||||
return;
|
||||
}
|
||||
|
||||
ci.map = map;
|
||||
ci.md = md;
|
||||
ci.io = alloc_io(md);
|
||||
ci.io->error = 0;
|
||||
|
@ -1386,7 +1424,6 @@ static void __split_and_process_bio(struct mapped_device *md, struct bio *bio)
|
|||
|
||||
/* drop the extra reference count */
|
||||
dec_pending(ci.io, error);
|
||||
dm_table_put(ci.map);
|
||||
}
|
||||
/*-----------------------------------------------------------------
|
||||
* CRUD END
|
||||
|
@ -1397,7 +1434,7 @@ static int dm_merge_bvec(struct request_queue *q,
|
|||
struct bio_vec *biovec)
|
||||
{
|
||||
struct mapped_device *md = q->queuedata;
|
||||
struct dm_table *map = dm_get_live_table(md);
|
||||
struct dm_table *map = dm_get_live_table_fast(md);
|
||||
struct dm_target *ti;
|
||||
sector_t max_sectors;
|
||||
int max_size = 0;
|
||||
|
@ -1407,7 +1444,7 @@ static int dm_merge_bvec(struct request_queue *q,
|
|||
|
||||
ti = dm_table_find_target(map, bvm->bi_sector);
|
||||
if (!dm_target_is_valid(ti))
|
||||
goto out_table;
|
||||
goto out;
|
||||
|
||||
/*
|
||||
* Find maximum amount of I/O that won't need splitting
|
||||
|
@ -1436,10 +1473,8 @@ static int dm_merge_bvec(struct request_queue *q,
|
|||
|
||||
max_size = 0;
|
||||
|
||||
out_table:
|
||||
dm_table_put(map);
|
||||
|
||||
out:
|
||||
dm_put_live_table_fast(md);
|
||||
/*
|
||||
* Always allow an entire first page
|
||||
*/
|
||||
|
@ -1458,8 +1493,10 @@ static void _dm_request(struct request_queue *q, struct bio *bio)
|
|||
int rw = bio_data_dir(bio);
|
||||
struct mapped_device *md = q->queuedata;
|
||||
int cpu;
|
||||
int srcu_idx;
|
||||
struct dm_table *map;
|
||||
|
||||
down_read(&md->io_lock);
|
||||
map = dm_get_live_table(md, &srcu_idx);
|
||||
|
||||
cpu = part_stat_lock();
|
||||
part_stat_inc(cpu, &dm_disk(md)->part0, ios[rw]);
|
||||
|
@ -1468,7 +1505,7 @@ static void _dm_request(struct request_queue *q, struct bio *bio)
|
|||
|
||||
/* if we're suspended, we have to queue this io for later */
|
||||
if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))) {
|
||||
up_read(&md->io_lock);
|
||||
dm_put_live_table(md, srcu_idx);
|
||||
|
||||
if (bio_rw(bio) != READA)
|
||||
queue_io(md, bio);
|
||||
|
@ -1477,8 +1514,8 @@ static void _dm_request(struct request_queue *q, struct bio *bio)
|
|||
return;
|
||||
}
|
||||
|
||||
__split_and_process_bio(md, bio);
|
||||
up_read(&md->io_lock);
|
||||
__split_and_process_bio(md, map, bio);
|
||||
dm_put_live_table(md, srcu_idx);
|
||||
return;
|
||||
}
|
||||
|
||||
|
@ -1664,7 +1701,8 @@ static struct request *dm_start_request(struct mapped_device *md, struct request
|
|||
static void dm_request_fn(struct request_queue *q)
|
||||
{
|
||||
struct mapped_device *md = q->queuedata;
|
||||
struct dm_table *map = dm_get_live_table(md);
|
||||
int srcu_idx;
|
||||
struct dm_table *map = dm_get_live_table(md, &srcu_idx);
|
||||
struct dm_target *ti;
|
||||
struct request *rq, *clone;
|
||||
sector_t pos;
|
||||
|
@ -1719,7 +1757,7 @@ static void dm_request_fn(struct request_queue *q)
|
|||
delay_and_out:
|
||||
blk_delay_queue(q, HZ / 10);
|
||||
out:
|
||||
dm_table_put(map);
|
||||
dm_put_live_table(md, srcu_idx);
|
||||
}
|
||||
|
||||
int dm_underlying_device_busy(struct request_queue *q)
|
||||
|
@ -1732,14 +1770,14 @@ static int dm_lld_busy(struct request_queue *q)
|
|||
{
|
||||
int r;
|
||||
struct mapped_device *md = q->queuedata;
|
||||
struct dm_table *map = dm_get_live_table(md);
|
||||
struct dm_table *map = dm_get_live_table_fast(md);
|
||||
|
||||
if (!map || test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))
|
||||
r = 1;
|
||||
else
|
||||
r = dm_table_any_busy_target(map);
|
||||
|
||||
dm_table_put(map);
|
||||
dm_put_live_table_fast(md);
|
||||
|
||||
return r;
|
||||
}
|
||||
|
@ -1751,7 +1789,7 @@ static int dm_any_congested(void *congested_data, int bdi_bits)
|
|||
struct dm_table *map;
|
||||
|
||||
if (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
|
||||
map = dm_get_live_table(md);
|
||||
map = dm_get_live_table_fast(md);
|
||||
if (map) {
|
||||
/*
|
||||
* Request-based dm cares about only own queue for
|
||||
|
@ -1762,9 +1800,8 @@ static int dm_any_congested(void *congested_data, int bdi_bits)
|
|||
bdi_bits;
|
||||
else
|
||||
r = dm_table_any_congested(map, bdi_bits);
|
||||
|
||||
dm_table_put(map);
|
||||
}
|
||||
dm_put_live_table_fast(md);
|
||||
}
|
||||
|
||||
return r;
|
||||
|
@ -1869,12 +1906,14 @@ static struct mapped_device *alloc_dev(int minor)
|
|||
if (r < 0)
|
||||
goto bad_minor;
|
||||
|
||||
r = init_srcu_struct(&md->io_barrier);
|
||||
if (r < 0)
|
||||
goto bad_io_barrier;
|
||||
|
||||
md->type = DM_TYPE_NONE;
|
||||
init_rwsem(&md->io_lock);
|
||||
mutex_init(&md->suspend_lock);
|
||||
mutex_init(&md->type_lock);
|
||||
spin_lock_init(&md->deferred_lock);
|
||||
rwlock_init(&md->map_lock);
|
||||
atomic_set(&md->holders, 1);
|
||||
atomic_set(&md->open_count, 0);
|
||||
atomic_set(&md->event_nr, 0);
|
||||
|
@ -1937,6 +1976,8 @@ static struct mapped_device *alloc_dev(int minor)
|
|||
bad_disk:
|
||||
blk_cleanup_queue(md->queue);
|
||||
bad_queue:
|
||||
cleanup_srcu_struct(&md->io_barrier);
|
||||
bad_io_barrier:
|
||||
free_minor(minor);
|
||||
bad_minor:
|
||||
module_put(THIS_MODULE);
|
||||
|
@ -1960,6 +2001,7 @@ static void free_dev(struct mapped_device *md)
|
|||
bioset_free(md->bs);
|
||||
blk_integrity_unregister(md->disk);
|
||||
del_gendisk(md->disk);
|
||||
cleanup_srcu_struct(&md->io_barrier);
|
||||
free_minor(minor);
|
||||
|
||||
spin_lock(&_minor_lock);
|
||||
|
@ -2102,7 +2144,6 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
|
|||
struct dm_table *old_map;
|
||||
struct request_queue *q = md->queue;
|
||||
sector_t size;
|
||||
unsigned long flags;
|
||||
int merge_is_optional;
|
||||
|
||||
size = dm_table_get_size(t);
|
||||
|
@@ -2131,9 +2172,8 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
 
 	merge_is_optional = dm_table_merge_is_optional(t);
 
-	write_lock_irqsave(&md->map_lock, flags);
 	old_map = md->map;
-	md->map = t;
+	rcu_assign_pointer(md->map, t);
 	md->immutable_target_type = dm_table_get_immutable_target_type(t);
 
 	dm_table_set_restrictions(t, q, limits);
@@ -2141,7 +2181,7 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
 		set_bit(DMF_MERGE_IS_OPTIONAL, &md->flags);
 	else
 		clear_bit(DMF_MERGE_IS_OPTIONAL, &md->flags);
-	write_unlock_irqrestore(&md->map_lock, flags);
+	dm_sync_table(md);
 
 	return old_map;
 }
@@ -2152,15 +2192,13 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
 static struct dm_table *__unbind(struct mapped_device *md)
 {
 	struct dm_table *map = md->map;
-	unsigned long flags;
 
 	if (!map)
 		return NULL;
 
 	dm_table_event_callback(map, NULL, NULL);
-	write_lock_irqsave(&md->map_lock, flags);
-	md->map = NULL;
-	write_unlock_irqrestore(&md->map_lock, flags);
+	rcu_assign_pointer(md->map, NULL);
+	dm_sync_table(md);
 
 	return map;
 }
@@ -2312,11 +2350,12 @@ EXPORT_SYMBOL_GPL(dm_device_name);
 static void __dm_destroy(struct mapped_device *md, bool wait)
 {
 	struct dm_table *map;
+	int srcu_idx;
 
 	might_sleep();
 
 	spin_lock(&_minor_lock);
-	map = dm_get_live_table(md);
+	map = dm_get_live_table(md, &srcu_idx);
 	idr_replace(&_minor_idr, MINOR_ALLOCED, MINOR(disk_devt(dm_disk(md))));
 	set_bit(DMF_FREEING, &md->flags);
 	spin_unlock(&_minor_lock);
@@ -2326,6 +2365,9 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
 		dm_table_postsuspend_targets(map);
 	}
 
+	/* dm_put_live_table must be before msleep, otherwise deadlock is possible */
+	dm_put_live_table(md, srcu_idx);
+
 	/*
 	 * Rare, but there may be I/O requests still going to complete,
 	 * for example. Wait for all references to disappear.
@@ -2340,7 +2382,6 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
 		       dm_device_name(md), atomic_read(&md->holders));
 
 	dm_sysfs_exit(md);
-	dm_table_put(map);
 	dm_table_destroy(__unbind(md));
 	free_dev(md);
 }
@@ -2397,8 +2438,10 @@ static void dm_wq_work(struct work_struct *work)
 	struct mapped_device *md = container_of(work, struct mapped_device,
 						work);
 	struct bio *c;
+	int srcu_idx;
+	struct dm_table *map;
 
-	down_read(&md->io_lock);
+	map = dm_get_live_table(md, &srcu_idx);
 
 	while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
 		spin_lock_irq(&md->deferred_lock);
@@ -2408,17 +2451,13 @@ static void dm_wq_work(struct work_struct *work)
 		if (!c)
 			break;
 
-		up_read(&md->io_lock);
-
 		if (dm_request_based(md))
 			generic_make_request(c);
 		else
-			__split_and_process_bio(md, c);
-
-		down_read(&md->io_lock);
+			__split_and_process_bio(md, map, c);
 	}
 
-	up_read(&md->io_lock);
+	dm_put_live_table(md, srcu_idx);
 }
 
 static void dm_queue_flush(struct mapped_device *md)
@@ -2450,10 +2489,10 @@ struct dm_table *dm_swap_table(struct mapped_device *md, struct dm_table *table)
 	 * reappear.
 	 */
 	if (dm_table_has_no_data_devices(table)) {
-		live_map = dm_get_live_table(md);
+		live_map = dm_get_live_table_fast(md);
 		if (live_map)
 			limits = md->queue->limits;
-		dm_table_put(live_map);
+		dm_put_live_table_fast(md);
 	}
 
 	if (!live_map) {
@@ -2533,7 +2572,7 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
 		goto out_unlock;
 	}
 
-	map = dm_get_live_table(md);
+	map = md->map;
 
 	/*
 	 * DMF_NOFLUSH_SUSPENDING must be set before presuspend.
@@ -2554,7 +2593,7 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
 	if (!noflush && do_lockfs) {
 		r = lock_fs(md);
 		if (r)
-			goto out;
+			goto out_unlock;
 	}
 
 	/*
@@ -2569,9 +2608,8 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
 	 * (dm_wq_work), we set BMF_BLOCK_IO_FOR_SUSPEND and call
 	 * flush_workqueue(md->wq).
 	 */
-	down_write(&md->io_lock);
 	set_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags);
-	up_write(&md->io_lock);
+	synchronize_srcu(&md->io_barrier);
 
 	/*
 	 * Stop md->queue before flushing md->wq in case request-based
@@ -2589,10 +2627,9 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
 	 */
 	r = dm_wait_for_completion(md, TASK_INTERRUPTIBLE);
 
-	down_write(&md->io_lock);
 	if (noflush)
 		clear_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
-	up_write(&md->io_lock);
+	synchronize_srcu(&md->io_barrier);
 
 	/* were we interrupted ? */
 	if (r < 0) {
@@ -2602,7 +2639,7 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
 			start_queue(md->queue);
 
 		unlock_fs(md);
-		goto out; /* pushback list is already flushed, so skip flush */
+		goto out_unlock; /* pushback list is already flushed, so skip flush */
 	}
 
 	/*
@@ -2615,9 +2652,6 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
 
 	dm_table_postsuspend_targets(map);
 
-out:
-	dm_table_put(map);
-
 out_unlock:
 	mutex_unlock(&md->suspend_lock);
 	return r;
@@ -2632,7 +2666,7 @@ int dm_resume(struct mapped_device *md)
 	if (!dm_suspended_md(md))
 		goto out;
 
-	map = dm_get_live_table(md);
+	map = md->map;
 	if (!map || !dm_table_get_size(map))
 		goto out;
 
@@ -2656,7 +2690,6 @@ int dm_resume(struct mapped_device *md)
 
 	r = 0;
 out:
-	dm_table_put(map);
 	mutex_unlock(&md->suspend_lock);
 
 	return r;

@@ -446,9 +446,9 @@ int __must_check dm_set_target_max_io_len(struct dm_target *ti, sector_t len);
 /*
  * Table reference counting.
  */
-struct dm_table *dm_get_live_table(struct mapped_device *md);
-void dm_table_get(struct dm_table *t);
-void dm_table_put(struct dm_table *t);
+struct dm_table *dm_get_live_table(struct mapped_device *md, int *srcu_idx);
+void dm_put_live_table(struct mapped_device *md, int srcu_idx);
+void dm_sync_table(struct mapped_device *md);
 
 /*
  * Queries

@@ -267,9 +267,9 @@ enum {
 #define DM_DEV_SET_GEOMETRY _IOWR(DM_IOCTL, DM_DEV_SET_GEOMETRY_CMD, struct dm_ioctl)
 
 #define DM_VERSION_MAJOR 4
-#define DM_VERSION_MINOR 24
+#define DM_VERSION_MINOR 25
 #define DM_VERSION_PATCHLEVEL 0
-#define DM_VERSION_EXTRA "-ioctl (2013-01-15)"
+#define DM_VERSION_EXTRA "-ioctl (2013-06-26)"
 
 /* Status bits */
 #define DM_READONLY_FLAG (1 << 0) /* In/Out */