Commit graph

505 commits

Author SHA1 Message Date
Yang Shi
419c434c35 Fix DMA access of block device in 64-bit kernel on some non-x86 systems with 4GB or upper 4GB memory
For some non-x86 systems with 4GB or upper 4GB memory,
we need increase the range of addresses that can be
used for direct DMA in 64-bit kernel.

Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-03-04 11:20:51 +01:00
Tejun Heo
e3790c7d42 block: separate out padding from alignment
Block layer alignment was used for two different purposes - memory
alignment and padding.  This causes problems in lower layers because
drivers which only require memory alignment ends up with adjusted
rq->data_len.  Separate out padding such that padding occurs iff
driver explicitly requests it.

Tomo: restorethe code to update bio in blk_rq_map_user
      introduced by the commit 40b01b9bbd
      according to padding alignment.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-03-04 11:18:17 +01:00
FUJITA Tomonori
7a85f8896f block: restore the meaning of rq->data_len to the true data length
The meaning of rq->data_len was changed to the length of an allocated
buffer from the true data length. It breaks SG_IO friends and
bsg. This patch restores the meaning of rq->data_len to the true data
length and adds rq->extra_len to store an extended length (due to
drain buffer and padding).

This patch also removes the code to update bio in blk_rq_map_user
introduced by the commit 40b01b9bbd.
The commit adjusts bio according to memory alignment
(queue_dma_alignment). However, memory alignment is NOT padding
alignment. This adjustment also breaks SG_IO friends and bsg. Padding
alignment needs to be fixed in a proper way (by a separate patch).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
2008-03-04 11:17:11 +01:00
Randy Dunlap
5d87a052c7 block: fix kernel-docbook parameters and files
kernel-doc for block/:
- add missing parameters
- fix one function's parameter list (remove blank line)
- add 2 source files to docbook for non-exported kernel-doc functions

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-03-04 11:14:39 +01:00
Tejun Heo
db0a2e0099 block: clear drain buffer if draining for write command
Clear drain buffer before chaining if the command in question is a
write.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-19 11:36:55 +01:00
Tejun Heo
2fb98e8414 block: implement request_queue->dma_drain_needed
Draining shouldn't be done for commands where overflow may indicate
data integrity issues.  Add dma_drain_needed callback to
request_queue.  Drain buffer is appened iff this function returns
non-zero.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-19 11:36:53 +01:00
Tejun Heo
6b00769fe1 block: add request->raw_data_len
With padding and draining moved into it, block layer now may extend
requests as directed by queue parameters, so now a request has two
sizes - the original request size and the extended size which matches
the size of area pointed to by bios and later by sgs.  The latter size
is what lower layers are primarily interested in when allocating,
filling up DMA tables and setting up the controller.

Both padding and draining extend the data area to accomodate
controller characteristics.  As any controller which speaks SCSI can
handle underflows, feeding larger data area is safe.

So, this patch makes the primary data length field, request->data_len,
indicate the size of full data area and add a separate length field,
request->raw_data_len, for the unmodified request size.  The latter is
used to report to higher layer (userland) and where the original
request size should be fed to the controller or device.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-19 11:36:35 +01:00
Tejun Heo
40b01b9bbd block: update bio according to DMA alignment padding
DMA start address and transfer size alignment for PC requests are
achieved using bio_copy_user() instead of bio_map_user().  This works
because bio_copy_user() always uses full pages and block DMA alignment
isn't allowed to go over PAGE_SIZE.

However, the implementation didn't update the last bio of the request
to make this padding visible to lower layers.  This patch makes
blk_rq_map_user() extend the last bio such that it includes the
padding area and the size of area pointed to by the request is
properly aligned.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-19 11:35:38 +01:00
Jens Axboe
e164094964 elevator: make elevator_get() attempt to load the appropriate module
Currently we fail if someone requests a valid io scheduler, but it's
modular and not currently loaded. That can happen from a driver init
asking for a different scheduler, or online switching through sysfs
as requested by a user.

This patch makes elevator_get() request_module() to attempt to load
the appropriate module, instead of requiring that done manually.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-19 10:20:37 +01:00
Jens Axboe
ffc4e75957 cfq-iosched: add hlist for browsing parallel to the radix tree
It's cumbersome to browse a radix tree from start to finish, especially
since we modify keys when a process exits. So add a hlist for the single
purpose of browsing over all known cfq_io_contexts, used for exit,
io prio change, etc.

This fixes http://bugzilla.kernel.org/show_bug.cgi?id=9948

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-19 10:04:00 +01:00
Jens Axboe
84e9e03c55 block: make blk_rq_map_user() clear ->bio if it unmaps it
That way the interface is symmetric, and calling blk_rq_unmap_user()
on the request wont oops.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-19 10:04:00 +01:00
Adrian Bunk
52ff4cae65 make blk_settings_init() static
blk_settings_init() can become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
2008-02-19 10:04:00 +01:00
Adrian Bunk
1334159826 make blk_ioc_init() static
blk_ioc_init() can become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
2008-02-19 10:04:00 +01:00
Adrian Bunk
5ece6c52ea make blk-core.c:request_cachep static again
request_cachep needlessly became global.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
2008-02-19 10:04:00 +01:00
Jerome Marchand
c3c930d933 Enhanced partition statistics: remove old partition statistics
Removes the now unused old partition statistic code.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-08 12:42:01 +01:00
Jerome Marchand
28f39d553e Enhanced partition statistics: procfs
Reports enhanced partition statistics in /proc/diskstats.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2008-02-08 12:42:00 +01:00
Jerome Marchand
6f2576af5b Enhanced partition statistics: update partition statitics
Updates the enhanced partition statistics in generic block layer
besides the disk statistics.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-08 12:41:56 +01:00
Jens Axboe
63a7138671 block: fixup rq_init() a bit
Rearrange fields in cache order and initialize some fields that
we didn't previously init. Remove init of ->completion_data, it's
part of a union with ->hash. Luckily clearing the rb node is the same
as setting it to null!

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-08 12:41:03 +01:00
Jens Axboe
3bc217ffe6 block: kill swap_io_context()
It blindly copies everything in the io_context, including the lock.
That doesn't work so well for either lock ordering or lockdep.

There seems zero point in swapping io contexts on a request to request
merge, so the best point of action is to just remove it.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-01 11:34:49 +01:00
Jens Axboe
8bdd3f8a69 as-iosched: fix inconsistent ioc->lock context
Since it's acquired from irq context, all locking must be of the
irq safe variant. Most are already inside the queue lock (which
already disables interrupts), but the io scheduler rmmod path
always has irqs enabled and the put_io_context() path may legally
be called with irqs enabled (even if it isn't usually). So fixup
those two.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-01 09:44:28 +01:00
Jens Axboe
4eb166d987 block: make elevator lib checkpatch compliant
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-01 09:26:33 +01:00
Jens Axboe
fe094d98e7 cfq-iosched: make checkpatch compliant
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-01 09:26:33 +01:00
Jens Axboe
6728cb0e63 block: make core bits checkpatch compliant
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-01 09:26:33 +01:00
Jens Axboe
22b132102f block: new end request handling interface should take unsigned byte counts
No point in passing signed integers as the byte count, they can never
be negative.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-02-01 09:26:33 +01:00
James Bottomley
40f620286d [SCSI] bsg: copy the cmd_type field to the subordinate request for bidi
This fixes a problem in SCSI where we use the (previously
uninitialised) cmd_type via blk_pc_request() to set up the transfer in
scsi_init_sgtable().

Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-01-30 13:14:26 -06:00
Jens Axboe
149a051f82 as-iosched: fix double locking bug in as_merged_requests()
If the two requests belong to the same io context, we will attempt
to lock the same lock twice. But swapping contexts is pointless in
that case, so just check for rioc == nioc before doing the double
lock and copy.

Tested-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-30 09:11:10 +01:00
Jan Engelhardt
12f32bb317 block: constify function pointer tables
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-29 21:55:19 +01:00
Martin K. Petersen
e68b903c6b Expose hardware sector size
Expose hardware sector size in sysfs queue directory.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-29 21:55:17 +01:00
Jens Axboe
d6d4819696 block: ll_rw_blk.c split, add blk-merge.c
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-29 21:55:12 +01:00
Jens Axboe
db1d08c646 block: remove dated (and wrong) comment in blk-core.c
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-29 21:55:11 +01:00
Jens Axboe
26b8256e2b block: get rid of unnecessary forward declarations in blk-core.c
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-29 21:55:09 +01:00
Jens Axboe
86db1e2977 block: continue ll_rw_blk.c splitup
Adds files for barrier handling, rq execution, io context handling,
mapping data to requests, and queue settings.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-29 21:55:08 +01:00
Jens Axboe
8324aa91d1 block: split tag and sysfs handling from blk-core.c
Seperates the tag and sysfs handling from ll_rw_blk.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-29 21:55:07 +01:00
Jens Axboe
a168ee84c9 block: first step of splitting ll_rw_blk, rename it
Then we retain history in blk-core.c

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-29 21:55:05 +01:00
Linus Torvalds
8d01eddf29 Merge branch 'for-2.6.25' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.25' of git://git.kernel.dk/linux-2.6-block:
  block: implement drain buffers
  __bio_clone: don't calculate hw/phys segment counts
  block: allow queue dma_alignment of zero
  blktrace: Add blktrace ioctls to SCSI generic devices
2008-01-29 08:51:56 +11:00
Linus Torvalds
f0f0052069 Merge branch 'blk-end-request' of git://git.kernel.dk/linux-2.6-block
* 'blk-end-request' of git://git.kernel.dk/linux-2.6-block: (30 commits)
  blk_end_request: changing xsysace (take 4)
  blk_end_request: changing ub (take 4)
  blk_end_request: cleanup of request completion (take 4)
  blk_end_request: cleanup 'uptodate' related code (take 4)
  blk_end_request: remove/unexport end_that_request_* (take 4)
  blk_end_request: changing scsi (take 4)
  blk_end_request: add bidi completion interface (take 4)
  blk_end_request: changing ide-cd (take 4)
  blk_end_request: add callback feature (take 4)
  blk_end_request: changing ide normal caller (take 4)
  blk_end_request: changing cpqarray (take 4)
  blk_end_request: changing cciss (take 4)
  blk_end_request: changing ide-scsi (take 4)
  blk_end_request: changing s390 (take 4)
  blk_end_request: changing mmc (take 4)
  blk_end_request: changing i2o_block (take 4)
  blk_end_request: changing viocd (take 4)
  blk_end_request: changing xen-blkfront (take 4)
  blk_end_request: changing viodasd (take 4)
  blk_end_request: changing sx8 (take 4)
  ...
2008-01-29 08:51:32 +11:00
Jens Axboe
febffd6181 cfq-iosched: kill some big inlines
Use of inlines were a bit over the top, trim them down a bit.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 13:19:43 +01:00
Jens Axboe
0871714e08 cfq-iosched: relax IOPRIO_CLASS_IDLE restrictions
Currently you must be root to set idle io prio class on a process. This
is due to the fact that the idle class is implemented as a true idle
class, meaning that it will not make progress if someone else is
requesting disk access. Unfortunately this means that it opens DOS
opportunities by locking down file system resources, hence it is root
only at the moment.

This patch relaxes the idle class a little, by removing the truly idle
part (which entals a grace period with associated timer). The
modifications make the idle class as close to zero impact as can be done
while still guarenteeing progress. This means we can relax the root only
criteria as well.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 11:38:15 +01:00
James Bottomley
fa0ccd837e block: implement drain buffers
These DMA drain buffer implementations in drivers are pretty horrible
to do in terms of manipulating the scatterlist.  Plus they're being
done at least in drivers/ide and drivers/ata, so we now have code
duplication.

The one use case for this, as I understand it is AHCI controllers doing
PIO mode to mmc devices but translating this to DMA at the controller
level.

So, what about adding a callback to the block layer that permits the
adding of the drain buffer for the problem devices.  The idea is that
you'd do this in slave_configure after you find one of these devices.

The beauty of doing it in the block layer is that it quietly adds the
drain buffer to the end of the sg list, so it automatically gets mapped
(and unmapped) without anything unusual having to be done to the
scatterlist in driver/scsi or drivers/ata and without any alteration to
the transfer length.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:54:11 +01:00
Jens Axboe
521f3bbdba io_context sharing - anticipatory changes
changes to anticipatory io scheduler for io_context sharing

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:50:35 +01:00
Jens Axboe
4ac845a2e9 block: cfq: make the io contect sharing lockless
The io context sharing introduced a per-ioc spinlock, that would protect
the cfq io context lookup. That is a regression from the original, since
we never needed any locking there because the ioc/cic were process private.

The cic lookup is changed from an rbtree construct to a radix tree, which
we can then use RCU to make the reader side lockless. That is the performance
critical path, modifying the radix tree is only done on process creation
(when that process first does IO, actually) and on process exit (if that
process has done IO).

As it so happens, radix trees are also much faster for this type of
lookup where the key is a pointer. It's a very sparse tree.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:50:33 +01:00
Nikanth Karthikesan
66dac98ed0 io_context sharing - cfq changes
changes in the cfq for io_context sharing

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:50:32 +01:00
Jens Axboe
d38ecf935f io context sharing: preliminary support
Detach task state from ioc, instead keep track of how many processes
are accessing the ioc.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:50:31 +01:00
Jens Axboe
fd0928df98 ioprio: move io priority from task_struct to io_context
This is where it belongs and then it doesn't take up space for a
process that doesn't do IO.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:50:29 +01:00
Kiyoshi Ueda
b8286239dd blk_end_request: cleanup of request completion (take 4)
This patch merges complete_request() into end_that_request_last()
for cleanup.

complete_request() was introduced by earlier part of this patch-set,
not to break the existing users of end_that_request_last().

Since all users are converted to blk_end_request interfaces and
end_that_request_last() is no longer exported, the code can be
merged to end_that_request_last().

Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:37:15 +01:00
Kiyoshi Ueda
5450d3e1d6 blk_end_request: cleanup 'uptodate' related code (take 4)
This patch converts 'uptodate' arguments of no longer exported
interfaces, end_that_request_first/last, to 'error', and removes
internal conversions for it in blk_end_request interfaces.

Also, this patch removes no longer needed end_io_error().

Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:37:13 +01:00
Kiyoshi Ueda
3bcddeac1c blk_end_request: remove/unexport end_that_request_* (take 4)
This patch removes the following functions:
  o end_that_request_first()
  o end_that_request_chunk()
and stops exporting the functions below:
  o end_that_request_last()

Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:37:12 +01:00
Kiyoshi Ueda
e3a04fe34a blk_end_request: add bidi completion interface (take 4)
This patch adds a variant of the interface, blk_end_bidi_request(),
which completes a bidi request.

Bidi request must be completed as a whole, both rq and rq->next_rq
at once.  So the interface has 2 arguments for completion size.

As for ->end_io, only rq->end_io is called (rq->next_rq->end_io is not
called).  So if special completion handling is needed, the handler
must be set to rq->end_io.
And the handler must take care of freeing next_rq too, since
the interface doesn't care of it if rq->end_io is not NULL.

Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:37:08 +01:00
Kiyoshi Ueda
e19a3ab058 blk_end_request: add callback feature (take 4)
This patch adds a variant of the interface, blk_end_request_callback(),
which has driver callback feature.

Drivers may need to do special works between end_that_request_first()
and end_that_request_last().
For such drivers, blk_end_request_callback() allows it to pass
a callback function which is called between end_that_request_first()
and end_that_request_last().

This interface is only for fallback of other blk_end_request interfaces.
Drivers should avoid their tricky behaviors and use other interfaces
as much as possible.

Currently, only one driver, ide-cd, needs this interface.
So this interface should/will be removed, after the driver removes
such tricky behaviors.

o ide-cd (cdrom_newpc_intr())
  In PIO mode, cdrom_newpc_intr() needs to defer end_that_request_last()
  until the device clears DRQ_STAT and raises an interrupt after
  end_that_request_first().
  So end_that_request_first() and end_that_request_last() are called
  separately in cdrom_newpc_intr().

  This means blk_end_request_callback() has to return without
  completing request even if no leftover in the request.
  To satisfy the requirement, callback function has return value
  so that drivers can tell blk_end_request_callback() to return
  without completing request.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:37:04 +01:00
Kiyoshi Ueda
9e6e39f2c4 blk_end_request: changing block layer core (take 4)
This patch converts core parts of block layer to use blk_end_request
interfaces.  Related 'uptodate' arguments are converted to 'error'.

'dequeue' argument was originally introduced for end_dequeued_request(),
where no attempt should be made to dequeue the request as it's already
dequeued.
However, it's not necessary as it can be checked with
list_empty(&rq->queuelist).
(Dequeued request has empty list and queued request doesn't.)
And it has been done in blk_end_request interfaces.

As a result of this patch, end_queued_request() and
end_dequeued_request() become identical.  A future patch will merge
and rename them and change users of those functions.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-28 10:35:57 +01:00