* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
DMAENGINE: extend the control command to include an arg
async_tx: trim dma_async_tx_descriptor in 'no channel switch' case
DMAENGINE: DMA40 fix for allocation of logical channel 0
DMAENGINE: DMA40 support paused channel status
dmaengine: mpc512x: Use resource_size
DMA ENGINE: Do not reset 'private' of channel
ioat: Remove duplicated devm_kzalloc() calls for ioatdma_device
ioat3: disable cacheline-unaligned transfers for raid operations
ioat2,3: convert to producer/consumer locking
ioat: convert to circ_buf
DMAENGINE: Support for ST-Ericssons DMA40 block v3
async_tx: use of kzalloc/kfree requires the include of slab.h
dmaengine: provide helper for setting txstate
DMAENGINE: generic channel status v2
DMAENGINE: generic slave control v2
dma: timb-dma: Update comment and fix compiler warning
dma: Add timb-dma
DMAENGINE: COH 901 318 fix bytesleft
DMAENGINE: COH 901 318 rename confusing vars
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (127 commits)
sh: update defconfigs.
sh: Fix up the NUMA build for recent LMB changes.
sh64: provide a stub per_cpu_trap_init() definition.
sh: fix up CONFIG_KEXEC=n build.
sh: fixup the docbook paths for clock framework shuffling.
driver core: Early dev_name() depends on slab_is_available().
sh: simplify WARN usage in SH clock driver
sh: Check return value of clk_get on ms7724
sh: Check return value of clk_get on ecovec24
sh: move sh clock-cpg.c contents to drivers/sh/clk-cpg.c
sh: move sh clock.c contents to drivers/sh/clk.
sh: move sh asm/clock.h contents to linux/sh_clk.h V2
sh: remove unused clock lookup
sh: switch boards to clkdev
sh: switch sh4-202 to clkdev
sh: switch shx3 to clkdev
sh: switch sh7757 to clkdev
sh: switch sh7763 to clkdev
sh: switch sh7780 to clkdev
sh: switch sh7786 to clkdev
...
The following structure elements duplicate the information in
'struct device.of_node' and so are being eliminated. This patch
makes all readers of these elements use device.of_node instead.
(struct of_device *)->node
(struct dev_archdata *)->prom_node (sparc)
(struct dev_archdata *)->of_node (powerpc & microblaze)
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This adds an argument to the DMAengine control function, so that
we can later provide control commands that need some external data
passed in through an argument akin to the ioctl() operation
prototype.
[dan.j.williams@intel.com: fix up some missed conversions]
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Saves 24 bytes per descriptor (64-bit) when the channel-switching
capabilities of async_tx are not required.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fix for allocation failure of logical channel when event line
happens to be number 0.
Signed-off-by: Marcin Mielczarczyk <marcin.mielczarczyk@tieto.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Support determining whether a channel is paused or
not using the status function.
Signed-off-by: Jonas Aaberg <jonas.aberg@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Use the resource_size function instead of manually calculating the
resource size. This reduces the chance of introducing off-by-one
errors.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The member 'private' of 'struct dma_chan' is meant for passing
data between client and the controller driver.
The DMA client driver may point it to platform specific stuff after
acquiring the channel. So, it is the responsiblity of the same code
to reset it, if it must.
The DMA engine doesn't set it and hence, shouldn't reset it either.
This reseting of private by DMA Engine comes in the way of implementing
default channel settings during DMAC probe. That capability is useful
for not having the clients to always provide platform specific data,
like Rx/Tx FIFO addresses, which usually doesn't change across channel
requests.
Signed-off-by: Jassi Brar <jassi.brar@samsung.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* 'sh/for-2.6.34' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: fix a number of Oopses and leaks in SH framebuffer driver
SH: fix error paths in DMA driver
sh: sh7751 pci controller io port fix
sh: Fix maximum number of SCIF ports in R2D defconfigs
SH: fix TS field shift calculation for DMA drivers
The memory for ioatdma_device structure is being allocated in
alloc_ioatdma()
Signed-off-by: Minskey Guo <chaohong_guo@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
There are cases where cacheline-unaligned raid operations can hang the
dma channel. Simply disable these operations by increasing the
alignment constraints published to async_tx. The raid456 driver always
issues page aligned requests, so the only in-kernel user of the ioatdma
driver that is affected by this change is dmatest.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Use separate locks for the descriptor prep (producer) and descriptor
cleanup (consumer) paths. Allows the producer path to run concurrently
with the cleanup path. Inspired by Documentation/circular-buffer.txt.
Cc: David Howells <dhowells@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This enables autoloading of the TXx9 sound driver on RBTX4927.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
To: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Linux MIPS Mailing List <linux-mips@linux-mips.org>
Patchwork: http://patchwork.linux-mips.org/patch/1101/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
If channel allocation is failing, mark the channel unused and give PM a chance
to power down the hardware.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Lists of DMA channels and slaves are not changed, make them constant. Besides,
SH7724 channel and slave configuration of both DMA controllers is identical,
remove the extra copy of the configuration data.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This is a straightforward driver for the ST-Ericsson DMA40 DMA
controller found in U8500, implemented akin to the existing
COH 901 318 driver.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Srinidh Kasagar <srinidhi.kasagar@stericsson.com>
Cc: STEricsson_nomadik_linux@list.st.com
Cc: Alessandro Rubini <rubini@unipv.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Convert the device_is_tx_complete() operation on the
DMA engine to a generic device_tx_status()operation which
can return three states, DMA_TX_RUNNING, DMA_TX_COMPLETE,
DMA_TX_PAUSED.
[dan.j.williams@intel.com: update for timberdale]
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Li Yang <leoli@freescale.com>
Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Cc: Magnus Damm <damm@opensource.se>
Cc: Liam Girdwood <lrg@slimlogic.co.uk>
Cc: Joe Perches <joe@perches.com>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Convert the device_terminate_all() operation on the
DMA engine to a generic device_control() operation
which can now optionally support also pausing and
resuming DMA on a certain channel. Implemented for the
COH 901 318 DMAC as an example.
[dan.j.williams@intel.com: update for timberdale]
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Li Yang <leoli@freescale.com>
Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Cc: Magnus Damm <damm@opensource.se>
Cc: Liam Girdwood <lrg@slimlogic.co.uk>
Cc: Joe Perches <joe@perches.com>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
An incremental patch which clarifies what the spinlock is used for
and fixes a compiler warning.
Signed-off-by: Richard Röjfors <richard.rojfors@pelagicore.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Adds the support for the DMA engine withing the timberdale FPGA.
The DMA channels are strict device to host, or host to device
and can not be used for generic memcpy.
Signed-off-by: Richard Röjfors <richard.rojfors@pelagicore.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This makes the function to get the number of bytes left in the
ongoing DMA transaction actually work: the old code did not take
neither lli:s nor queued jobs into account. Also fix a missing
spinlock while we're at it.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This fixes up the code with a lot of comments that make it readable,
rename things with opaque names like "data" into something more
appropriate, and remove some very confusing BUG() statements.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Create a common platform data header file for the
shdma dmaengine driver. This is done by moving
common structures from sh asm/dmaengine.h to
linux/sh_dma.h. DMA registers are also copied from
sh asm/dma-register.h to make the code architecture
independent.
The sh header file asm/dmaengine.h is still kept
with the slave id enum. This allows us to keep the
old processor specific code as is and slowly move
over to slave id enums in per-processor headers.
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Move SHDMA_SLAVE_NUMBER from asm/dmaengine.h to
shdma.h. Set it to 256 to support a wide range
of processors. The amount of memory consumed by
this change is limited to 256 bits.
While at it, rename to SH_DMA_SLAVE_NUMBER to
match with the rest of the file.
Signed-off-by: Magnus Damm <damm@opensource.se>
Acked-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch replaces the sh_dmae_slave_chan_id enum
with an unsigned int. The purpose of this chainge is
to make it possible to separate the slave id enums
from the dmaengine header.
The slave id enums varies with processor model, so in
the future it makes sense to put these in the processor
specific headers together with the pinmux enums.
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits)
doc: fix typo in comment explaining rb_tree usage
Remove fs/ntfs/ChangeLog
doc: fix console doc typo
doc: cpuset: Update the cpuset flag file
Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed
Remove drivers/parport/ChangeLog
Remove drivers/char/ChangeLog
doc: typo - Table 1-2 should refer to "status", not "statm"
tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments
No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h
devres/irq: Fix devm_irq_match comment
Remove reference to kthread_create_on_cpu
tree-wide: Assorted spelling fixes
tree-wide: fix 'lenght' typo in comments and code
drm/kms: fix spelling in error message
doc: capitalization and other minor fixes in pnp doc
devres: typo fix s/dev/devm/
Remove redundant trailing semicolons from macros
fix typo "definetly" -> "definitely" in comment
tree-wide: s/widht/width/g typo in comments
...
Fix trivial conflict in Documentation/laptops/00-INDEX
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (26 commits)
sh: Convert sh to use read/update_persistent_clock
sh: Move PMB debugfs entry initialization to later stage
sh: Fix up flush_cache_vmap() on SMP.
sh: fix up MMU reset with variable PMB mapping sizes.
sh: establish PMB mappings for NUMA nodes.
sh: check for existing mappings for bolted PMB entries.
sh: fixed virt/phys mapping helpers for PMB.
sh: make pmb iomapping configurable.
sh: reworked dynamic PMB mapping.
sh: Fix up cpumask_of_pcibus() for the NUMA build.
serial: sh-sci: Tidy up build warnings.
sh: Fix up ctrl_read/write stragglers in migor setup.
serial: sh-sci: Add DMA support.
dmaengine: shdma: extend .device_terminate_all() to record partial transfer
sh: merge sh7722 and sh7724 DMA register definitions
sh: activate runtime PM for dmaengine on sh7722 and sh7724
dmaengine: shdma: add runtime PM support.
dmaengine: shdma: separate DMA headers.
dmaengine: shdma: convert to platform device resources
dmaengine: shdma: fix DMA error handling.
...
Rename for_each_bit to for_each_set_bit in the kernel source tree. To
permit for_each_clear_bit(), should that ever be added.
The patch includes a macro to map the old for_each_bit() onto the new
for_each_set_bit(). This is a (very) temporary thing to ease the migration.
[akpm@linux-foundation.org: add temporary for_each_bit()]
Suggested-by: Alexey Dobriyan <adobriyan@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the calling convention of ->timer_fn() and ->cleanup_fn() are unified
across hardware versions we can drop parameters to ioat_init_channel() and
unify ioat_is_dma_complete() implementations.
Both ->timer_fn() and ->cleanup_fn() are modified to expect a struct
dma_chan pointer.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The hardware automatically disables further interrupts after each event
until rearmed. This allows a delay to be injected between the occurence
of the interrupt and the running of the cleanup routine. The delay is
scaled by the descriptor backlog and then written to the INTRDELAY
register which specifies the number of microseconds to hold off
interrupt delivery after an interrupt event occurs. According to
powertop this reduces the interrupt rate from ~5000 intr/s to ~150
intr/s per without affecting throughput (simple dd to a raid6 array).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Since ioat_cleanup_preamble() and the update of the last completed
descriptor are not synchronized there is a chance that two cleanup threads
can see descriptors to clean. If the first cleans up all pending
descriptors then the second will trigger the BUG_ON.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The pending == 2 case no longer exists in the driver so, we can use
ioat2_ring_pending() outside the lock to determine if there might be any
descriptors in the ring that the hardware has not seen.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We already disallow raid operations while DCA is globally enabled, so
having it locally enabled is a nop and confusing when reading the code.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This makes the COH 901 318 respect the scatter offset field by using
the sg_phys() rather than the sg_dma_address() so we get a pointer
to the actual data we want to send rather than the beginning of the
buffer. Also initialize the lli:s a bit more thoroughly.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This makes the COH 901 318 configure channel direction (to or from
device) dynamically, instead of being passed in from the platform
data. This was necessary in order to get the MMC/SD-card channel
bidirectional (all other channels on the U300 were either RX or
TX but this one was both). This also sets memcpy() alignent to
even 2^2 (32bit) boundaries, which makes the memcpy() stress tests
start working.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This removes the pointless irq counting for the COH 901 318, as
it turns out the hardware will only ever fire one IRQ for a linked
list anyway. In the process also a missing spinlock was introduced.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This centralize some spread-out initialization of descriptors into
one function and cleans up the error paths.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This cleans up the some debug code that was not working in the
COH 901 318 driver, adds some helpful comments and rearrange the
code a bit.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Adds initial version of MPC512x DMA driver.
Only memory to memory transfers are currenly supported.
Signed-off-by: Piotr Ziecik <kosmo@semihalf.com>
Signed-off-by: Wolfgang Denk <wd@denx.de>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Cc: John Rigby <jcrigby@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This adds Kconfig options for DEBUG and VERBOSE_DEBUG to the DMA
engine subsystem, I got tired of editing the Makefile manually
each time I want to debug things in here, modelled this on the
debug switches for other subsystems and works like a charm when
working on our DMA engines.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch extends the .device_terminate_all() method of the shdma driver
to return number of bytes transfered in the current descriptor.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Provided platforms implement runtime PM, this disables the controller, when not
in use.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Separate SH DMA headers into ones, commonly used by both drivers, and ones,
specific to each of them. This will make the future development of the
dmaengine driver easier.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The shdma dmaengine driver currently uses numerous macros to support various
platforms, selected by ifdef's. Convert it to use platform device resources and
lists of channel descriptors to specify register locations, interrupt numbers
and other system-specific configuration variants. Unavoidably, we have to
simultaneously convert all shdma users to provide those resources.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Present DMA error ISR in shdma.c is bogus, it locks the system hard in multiple
ways. Fix it to abort all queued transactions on all channels on the affected
controller and giving submitters a chance to get a DMA_ERROR status for aborted
transactions. Afterwards further functionality is again possible without the
need to re-load the driver.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Just like commit ac5d73fc, we need to be careful to use 'src_cnt' as it
contains the fixed up number of xor sources (forced odd) to meet dmatest's
data verification scheme.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The number of PQ sources specified by module parameter "pq_sources"
is always forced odd to fit into dmatest's destination verificaton
scheme. But number of PQ sources and coefficients as passed to the
driver's prep_dma_pq() is not adjusted accordingly.
Fix it now to get correct PQ testing results in the case passed
"pq_sources" parameter is even.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
fsl_dma_update_completed_cookie() appears to calculate the last completed
cookie incorrectly in the corner case where DMA on cookie 1 is in progress
just following a cookie wrap.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: Ira W. Snyder <iws@ovro.caltech.edu>
[dan.j.williams@intel.com: fix an integer overflow warning with INT_MAX]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
fsl_dma_tx_submit() only sets the cookie on the first descriptor of a
transaction. It should set the cookie on all.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add __percpu sparse annotations to places which didn't make it in one
of the previous patches. All converions are trivial.
These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors. This patch doesn't affect normal builds.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Neil Brown <neilb@suse.de>
cohd_fin has already been verified not to be NULL, so the argument to
BUG_ON cannot be true.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@r@
expression *x;
expression e;
identifier l;
@@
if (x == NULL || ...) {
... when forall
return ...; }
... when != goto l;
when != x = e
when != &x
*x == NULL
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If submitting new buffer failed, a wrong descriptor gets completed and it
doesn't check, if a callback is at all defined, which can lead to an Oops. Fix
these bugs and make ipu_update_channel_buffer() void, because it never fails.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In particular, several occurances of funny versions of 'success',
'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
'beginning', 'desirable', 'separate' and 'necessary' are fixed.
Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Joe Perches <joe@perches.com>
Cc: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Tested to work with a SIU ASoC driver on sh7722 (migor).
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Both the original arch/sh/drivers/dma/dma-sh.c and the new SH dmaengine drivers
do not take into account bits 3:2 of the Transfer Size field in the CHCR
register, besides, bit-field defines set bit 2, but the mask only passes bits
1:0 through. TS_16BLK and TS_32BLK macros are bogus too. This patch fixes all
these issues for sh7722 and sh7724, other CPUs stay unchanged and might need to
be fixed too.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Slave DMA functionality uses scatter-gather arrays for data transfers,
whereas memcpy just uses a single data buffer. This patch converts the
current memcpy implementation in shdma.c to use scatter-gather, making it
just a special case with one SG-element. This allows us to isolate
descriptor list manipulations and locking into one function, thus reducing
error chances.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Use DECLARE_WAIT_QUEUE_HEAD_ONSTACK to make lockdep happy
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In these cases the same statements are executed.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The match_table field of the struct of_device_id is constant in <linux/of_platform.h>
so it is worth to make the initialization data also constant.
The semantic match that finds this kind of pattern is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r@
disable decl_init,const_decl_init;
identifier I1, I2, x;
@@
struct I1 {
...
const struct I2 *x;
...
};
@s@
identifier r.I1, y;
identifier r.x, E;
@@
struct I1 y = {
.x = E,
};
@c@
identifier r.I2;
identifier s.E;
@@
const struct I2 E[] = ... ;
@depends on !c@
identifier r.I2;
identifier s.E;
@@
+ const
struct I2 E[] = ...;
// </smpl>
Signed-off-by: Márton Németh <nm127@freemail.hu>
Cc: Julia Lawall <julia@diku.dk>
Cc: cocci@diku.dk
[dan.j.williams@intel.com: resolved conflict with recent fsldma updates]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fix typo in ioat2_quiesce. check 'tmo' is zero, not 'end'. Also applies
to 2.6.32.3
Cc: <stable@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
While debugging a dma driver I noticed a memleak after
unloading the driver module.
Caught by kmemleak.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fix locking. Use two queues in the driver, one for pending transacions, and
one for transactions which are actually running on the hardware. Call
dma_run_dependencies() on descriptor cleanup so that the async_tx API works
correctly.
There are a number of places throughout the code where lists of descriptors
are freed in a loop. Create functions to handle this, and use them instead
of open-coding the loop each time.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The name fsl_chan seems too long, so it has been shortened to chan. There
are only a few places where the higher level "struct dma_chan *chan" name
conflicts. These have been changed to "struct dma_chan *dchan" instead.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The IRQ probing is needlessly complex. All off the 83xx device trees in
arch/powerpc/boot/dts/ specify 5 interrupts per DMA controller: one for the
controller, and one for each channel. These interrupts are all attached to
the same IRQ line.
This causes an interesting situation if two channels interrupt at the same
time. The per-controller handler will handle the first channel, and the
per-channel handler will handle the remaining channels.
Instead of this mess, we fix the bug in the per-controller handler, and
make it handle all channels that generated an interrupt. When a
per-controller handler is specified in the device tree, we prefer to use
the shared handler instead of the per-channel handler.
The 85xx/86xx controllers do not have a per-controller interrupt, and
instead use a per-channel interrupt. This behavior has not been changed.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This fixes some errors in the cleanup paths of the OF subsystem, including
missing checks for ioremap failing. Also, some variables were renamed for
brevity.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Most functions in the standard library use "dst" as a parameter, rather
than "dest". This renames all use of "dest" to "dst" to match the usual
convention.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This is the beginning of a cleanup which will change all instances of
"fsl_dma" to "fsldma" to match the name of the driver itself.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Remove some unused members from the fsldma data structures. A few trivial
uses of struct resource were converted to use the stack rather than keeping
the memory allocated for the lifetime of the driver.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Some of the functions are written in a way where they use multiple reads
and writes where a single read/write pair could suffice. This shrinks the
kernel text size measurably, while making the functions easier to
understand.
add/remove: 0/0 grow/shrink: 1/4 up/down: 4/-196 (-192)
function old new delta
fsl_chan_set_request_count 120 124 +4
dma_halt 300 272 -28
fsl_chan_set_src_loop_size 208 156 -52
fsl_chan_set_dest_loop_size 208 156 -52
fsl_chan_xfer_ld_queue 500 436 -64
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
drivers/dma: Correct use after free
drivers/dma: drop unnecesary memset
ioat2,3: put channel hardware in known state at init
async_tx: expand async raid6 test to cover ioatdma corner case
ioat3: fix p-disabled q-continuation
sh: fix DMA driver's descriptor chaining and cookie assignment
dma: at_hdmac: correct incompatible type for argument 1 of 'spin_lock_bh'
Move the kfree after the iounmap that refers to the same structure.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x,e;
identifier f;
iterator I;
statement S;
@@
*kfree(x);
... when != &x
when != x = e
when != I(x,...) S
*x->f
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
memset of 0 is not needed after kzalloc
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x;
statement S;
@@
x = kzalloc(...);
if (x == NULL) S
... when != x
-memset(x,0,...);// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Put the ioat2 and ioat3 state machines in the halted state with all
errors cleared.
The ioat1 init path is not disturbed for stability, there are no
reported ioat1 initiaization issues.
Cc: <stable@kernel.org>
Reported-by: Roland Dreier <rdreier@cisco.com>
Tested-by: Roland Dreier <rdreier@cisco.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When continuing a pq calculation the driver needs 3 extra sources. The
driver can perform a 3 source calculation with a single descriptor, but
needs an extended descriptor to process up to 8 sources in one
operation. However, in the p-disabled case only one extra source is
needed. When continuing a p-disabled operation there are occasions
(i.e. 0 < src_cnt % 8 < 3) where the tail operation does not need an
extended descriptor. Properly account for this fact otherwise invalid
'dmacount' values will be written to hardware usually causing the
channel to halt with 'invalid descriptor' errors.
Cc: <stable@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The SH DMA driver wrongly assigns negative cookies to transfer descriptors,
also, its chaining of partial descriptors is broken. The latter problem is
usually invisible, because maximum transfer size per chunk is 16M, but if you
artificially set this limit lower, the driver fails. Since cookies are also
used in chunk management, both these problems are fixed in one patch. As side
effects a possible memory leak, when descriptors are prepared, but not
submitted, and multiple races have also been fixed.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
ppc440spe-adma: adds updated ppc440spe adma driver
iop-adma.c: use resource_size()
dmaengine: clarify the meaning of the DMA_CTRL_ACK flag
sh: stylistic improvements for the DMA driver
dmaengine: fix dmatest to verify minimum transfer length and test buffer size
sh: DMA driver has to specify its alignment requirements
Add COH 901 318 DMA block driver v5
Correct a typo error in locking calls.
Cc: <stable@kernel.org>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
m68k: rename global variable vmalloc_end to m68k_vmalloc_end
percpu: add missing per_cpu_ptr_to_phys() definition for UP
percpu: Fix kdump failure if booted with percpu_alloc=page
percpu: make misc percpu symbols unique
percpu: make percpu symbols in ia64 unique
percpu: make percpu symbols in powerpc unique
percpu: make percpu symbols in x86 unique
percpu: make percpu symbols in xen unique
percpu: make percpu symbols in cpufreq unique
percpu: make percpu symbols in oprofile unique
percpu: make percpu symbols in tracer unique
percpu: make percpu symbols under kernel/ and mm/ unique
percpu: remove some sparse warnings
percpu: make alloc_percpu() handle array types
vmalloc: fix use of non-existent percpu variable in put_cpu_var()
this_cpu: Use this_cpu_xx in trace_functions_graph.c
this_cpu: Use this_cpu_xx for ftrace
this_cpu: Use this_cpu_xx in nmi handling
this_cpu: Use this_cpu operations in RCU
this_cpu: Use this_cpu ops for VM statistics
...
Fix up trivial (famous last words) global per-cpu naming conflicts in
arch/x86/kvm/svm.c
mm/slab.c
This patch adds new version of the PPC440SPe ADMA driver.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The size of the requested and ioremaped memory is off by 1.
Use resource_size() to get the correct value.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Transfers and the test buffer have to be at least align bytes long.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The SH DMA driver by default uses 32-byte transfers, in this mode buffers and
sizes have to be 32-byte aligned. Specifying this requirement also fixes Oopses
with dmatest.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
1/ Error handling code following a kzalloc should free the allocated data.
2/ Report an error when no platform data is detected
Both problems fixed by moving the platform data check before the allocation,
and allows a goto to be killed.
Reported-by: Julia Lawall <julia@diku.dk>
Acked-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch adds support for the ST-Ericsson COH 901 318 DMA block,
found in the U300 series platforms. It registers a DMA slave for
device I/O and also a memcpy slave for memcpy.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The completion of a pq operation is notified with a null descriptor
appended to the end of the chain. This descriptor needs to be visible
to dma clients otherwise the client is precluded from ensuring all
operations are quiesced before freeing channel resources, i.e. due to
descriptor polling it may get the completion notification ahead of the
interrupt delivered by the null descriptor.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ioat3.2 does not support asynchronous error notifications which makes
the driver experience latencies when non-zero pq validate results are
expected. Provide a mechanism for turning off async_xor_val and
async_syndrome_val via Kconfig. This approach is generally useful for
any driver that specifies ASYNC_TX_DISABLE_CHANNEL_SWITCH and would like
to force the async_tx api to fall back to the synchronous path for
certain operations.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Error interrupts and error completions may cause channel hangs, so
poll the channel status register after a timeout.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
RAID operations cause a system hang on platforms with DCA
(Direct-Cache-Access) enabled. So turn off RAID capabilities in this
case.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add at91sam9g45 dependency to drivers/dma/Kconfig
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
There are cases where we can use this_cpu_ptr and as the result
of using this_cpu_ptr() we no longer need to determine the
currently executing cpu.
In those places no get/put_cpu combination is needed anymore.
The local cpu variable can be eliminated.
Preemption still needs to be disabled and enabled since the
modifications of the per cpu variables is not atomic. There may
be multiple per cpu variables modified and those must all
be from the same processor.
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
cc: Eric Biederman <ebiederm@aristanetworks.com>
cc: Stephen Hemminger <shemminger@vyatta.com>
cc: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
drivers/dma/ioat/dma_v3.c: In function 'ioat3_prep_memset_lock':
drivers/dma/ioat/dma_v3.c:439: warning: 'fill' may be used uninitialized in this function
drivers/dma/ioat/dma_v3.c:437: warning: 'desc' may be used uninitialized in this function
drivers/dma/ioat/dma_v3.c: In function '__ioat3_prep_xor_lock':
drivers/dma/ioat/dma_v3.c:489: warning: 'xor' may be used uninitialized in this function
drivers/dma/ioat/dma_v3.c:486: warning: 'desc' may be used uninitialized in this function
drivers/dma/ioat/dma_v3.c: In function '__ioat3_prep_pq_lock':
drivers/dma/ioat/dma_v3.c:631: warning: 'pq' may be used uninitialized in this function
drivers/dma/ioat/dma_v3.c:628: warning: 'desc' may be used uninitialized in this function
gcc-4.0, unlike gcc-4.3, does not see that these variables are
initialized before use. Convert the descriptor loops to do-while make
this initialization apparent.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
drivers/dma/ioat/dma_v2.c: In function 'ioat2_dma_prep_memcpy_lock':
drivers/dma/ioat/dma_v2.c:680: warning: 'hw' may be used uninitialized in this function
drivers/dma/ioat/dma_v2.c:681: warning: 'desc' may be used uninitialized in this function
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
With the addition of ioat_max_alloc_order it is not clear what the
maximum allocation order is, so document that in the modinfo. Also take
an opportunity to kill a stray semicolon.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch reworks platform driver power management code
for at_hdmac from legacy late/early callbacks to dev_pm_ops.
The callbacks are converted for CONFIG_SUSPEND like this:
suspend_late() -> suspend_noirq()
resume_early() -> resume_noirq()
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
A new ring implementation and the addition of raid functionality
constitutes a bump in the driver major version number.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch enables DCA support on multiple-IOH/multiple-IIO architectures.
It modifies dca module by replacing single dca_providers list
with dca_domains list, each domain containing separate list of providers.
This approach lets dca driver manage multiple domains, i.e. sets of providers
and requesters mapped back to the same PCI root complex device.
The driver takes care to register each requester to a provider
from the same domain.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
This restriction prevented ASYNC_TX_DMA from being enabled on platform
configurations where DMA address conversion could not be performed in
place on the stack. Since commit 04ce9ab3 ("async_xor: permit callers
to pass in a 'dma/page scribble' region") the async_tx api now either
uses a caller provided 'scribble' buffer, or performs the conversion in
place when sizeof(dma_addr_t) <= sizeof(struct page *).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This supported all DMA channels, and it was tested in SH7722,
SH7780, SH7785 and SH7763.
This can not use with SH DMA API.
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
Reviewed-by: Matt Fleming <matt@console-pimps.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams wrote:
... DMA-slave clients request specific channels and know the hardware
details at a low level, so it should not be too high an expectation to
push dma mapping responsibility to the client.
Also this patch includes DMA_COMPL_{SRC,DEST}_UNMAP_SINGLE support for
dw_dmac driver.
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Use the DMA_SLAVE capability of the DMAEngine API to copy/from a
scatterlist into an arbitrary list of hardware address/length pairs.
This allows a single DMA transaction to copy data from several different
devices into a scatterlist at the same time.
This also adds support to enable some controller-specific features such as
external start and external pause for a DMA transaction.
[dan.j.williams@intel.com: rebased on tx_list movement]
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Acked-by: Li Yang <leoli@freescale.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When using the Freescale DMA controller in external control mode, both the
request count and external pause bits need to be setup correctly. This was
being done with the same function.
The 83xx controller lacks the external pause feature, but has a similar
feature called external start. This feature requires that the request count
bits be setup correctly.
Split the function into two parts, to make it possible to use the external
start feature on the 83xx controller.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
All the necessary fields for handling an ioat2,3 ring entry can fit into
one cacheline. Move ->len prior to ->txd in struct ioat_ring_ent, and
move allocation of these entries to a hw-cache-aligned kmem cache to
reduce the number of cachelines dirtied for descriptor management.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The tx_list attribute of struct dma_async_tx_descriptor is common to
most, but not all dma driver implementations. None of the upper level
code (dmaengine/async_tx) uses it, so allow drivers to implement it
locally if they need it. This saves sizeof(struct list_head) bytes for
drivers that do not manage descriptors with a linked list (e.g.: ioatdma
v2,3).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop txx9dmac's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop at_hdmac's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop mv_xor's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Cc: Saeed Bishara <saeed@marvell.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop ioatdma's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Cc: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop iop-adma's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop fsldma's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Drop dw_dmac's use of tx_list from struct dma_async_tx_descriptor in
preparation for removal of this field.
Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Trivial cleanup to make the PCI ID table easier to read.
[dan.j.williams@intel.com: extended to v3.2 devices]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The ioatdma module is missing aliases for the PCI devices it supports,
so it is not autoloaded on boot. Add a MODULE_DEVICE_TABLE() to get
these aliases.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The cleanup routine for the raid cases imposes extra checks for handling
raid descriptors and extended descriptors. If the channel does not
support raid it can avoid this extra overhead by using the ioat2 cleanup
path.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jasper Forest introduces raid offload support via ioat3.2 support. When
raid offload is enabled two (out of 8 channels) will report raid5/raid6
offload capabilities. The remaining channels will only report ioat3.0
capabilities (memcpy).
Signed-off-by: Tom Picard <tom.s.picard@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The async_tx api uses the DMA_INTERRUPT operation type to terminate a
chain of issued operations with a callback routine.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If a platform advertises pq capabilities, but not xor, then use
ioat3_prep_pqxor and ioat3_prep_pqxor_val to simulate xor support.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ioat3.2 adds support for raid6 syndrome generation (xor sum of galois
field multiplication products) using up to 8 sources. It can also
perform an pq-zero-sum operation to validate whether the syndrome for a
given set of sources matches a previously computed syndrome.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This adds a hardware specific self test to be called from ioat_probe.
In the ioat3 case we will have tests for all the different raid
operations, while ioat1 and ioat2 will continue to just test memcpy.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ioat3.2 adds xor offload support for up to 8 sources. It can also
perform an xor-zero-sum operation to validate whether all given sources
sum to zero, without writing to a destination. Xor descriptors differ
from memcpy in that one operation may require multiple descriptors
depending on the number of sources. When the number of sources exceeds
5 an extended descriptor is needed. These descriptors need to be
accounted for when updating the DMA_COUNT register.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Tag completion writes for direct cache access to reduce the latency of
checking for descriptor completions.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Export driver attributes for diagnostic purposes:
'ring_size': total number of descriptors available to the engine
'ring_active': number of descriptors in-flight
'capabilities': supported operation types for this channel
'version': Intel(R) QuickData specfication revision
This also allows some chattiness to be removed from the driver startup
as this information is now available via sysfs.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Up until this point the driver for Intel(R) QuickData Technology
engines, specification versions 2 and 3, were mostly identical save for
a few quirks. Version 3.2 hardware adds many new capabilities (like
raid offload support) requiring some infrastructure that is not relevant
for v2. For better code organization of the new funcionality move v3
and v3.2 support to its own file dma_v3.c, and export some routines from
the base files (dma.c and dma_v2.c) that can be reused directly.
The first new capability included in this code reorganization is support
for v3.2 memset operations.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
ioat3.2 adds raid5 and raid6 offload capabilities.
Signed-off-by: Tom Picard <tom.s.picard@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In preparation for adding more operation types to the ioat3 path the
driver needs to honor the DMA_PREP_FENCE flag. For example the async_tx api
will hand xor->memcpy->xor chains to the driver with the 'fence' flag set on
the first xor and the memcpy operation. This flag in turn sets the 'fence'
flag in the descriptor control field telling the hardware that future
descriptors in the chain depend on the result of the current descriptor, so
wait for all writes to complete before starting the next operation.
Note that ioat1 does not prefetch the descriptor chain, so does not
require/support fenced operations.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Some engines have transfer size and address alignment restrictions. Add
a per-operation alignment property to struct dma_device that the async
routines and dmatest can use to check alignment capabilities.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Channel switching is problematic for some dmaengine drivers as the
architecture precludes separating the ->prep from ->submit. In these
cases the driver can select ASYNC_TX_DISABLE_CHANNEL_SWITCH to modify
the async_tx allocator to only return channels that support all of the
required asynchronous operations.
For example MD_RAID456=y selects support for asynchronous xor, xor
validate, pq, pq validate, and memcpy. When
ASYNC_TX_DISABLE_CHANNEL_SWITCH=y any channel with all these
capabilities is marked DMA_ASYNC_TX allowing async_tx_find_channel() to
quickly locate compatible channels with the guarantee that dependency
chains will remain on one channel. When
ASYNC_TX_DISABLE_CHANNEL_SWITCH=n async_tx_find_channel() may select
channels that lead to operation chains that need to cross channel
boundaries using the async_tx channel switch capability.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Handle descriptor allocation failures by polling for a descriptor. The
driver will force forward progress when polled. In the best case this
polling interval will be the time it takes for one dma memcpy
transaction to complete. In the worst case, channel hang, we will need
to wait 100ms for the cleanup watchdog to fire (ioatdma driver).
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Increment the allocation order of the descriptor ring every time we run
out of descriptors up to a maximum of allocation order specified by the
module parameter 'ioat_max_alloc_order'. After each idle period
decrement the allocation order to a minimum order of
'ioat_ring_alloc_order' (i.e. the default ring size, tunable as a module
parameter).
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In order to support dynamic resizing of the descriptor ring or polling
for a descriptor in the presence of a hung channel the reset handler
needs to make progress while in a non-preemptible context. The current
workqueue implementation precludes polling channel reset completion
under spin_lock().
This conversion also allows us to return to opportunistic cleanup in the
ioat2 case as the timer implementation guarantees at least one cleanup
after every descriptor is submitted. This means the worst case
completion latency becomes the timer frequency (for exceptional
circumstances), but with the benefit of avoiding busy waiting when the
lock is contended.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Save 4 bytes per software descriptor by transmitting tx_cnt in an unused
portion of the hardware descriptor.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Mark all single use initialization routines with __devinit.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The register write in ioat_dma_cleanup_tasklet is unfortunate in two
ways:
1/ It clears the extra 'enable' bits that we set at alloc_chan_resources time
2/ It gives the impression that it disables interrupts when it is in
fact re-arming interrupts
[ Impact: fix, persist the value of the chanctrl register when re-arming ]
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Don't trust that the reserved bits are always zero, also sanity check
the returned value.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The cleanup path makes an effort to only perform an atomic read of the
64-bit completion address. However in the 32-bit case it does not
matter if we read the upper-32 and lower-32 non-atomically because the
upper-32 will always be zero.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Provide some output for debugging the driver.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The unified ioat1/ioat2 ioat_dma_unmap() implementation derives the
source and dest addresses from the unmap descriptor. There is no longer
a need to track this information in struct ioat_desc_sw.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Replace the current linked list munged into a ring with a native ring
buffer implementation. The benefit of this approach is reduced overhead
as many parameters can be derived from ring position with simple pointer
comparisons and descriptor allocation/freeing becomes just a
manipulation of head/tail pointers.
It requires a contiguous allocation for the software descriptor
information.
Since this arrangement is significantly different from the ioat1 chain,
move ioat2,3 support into its own file and header. Common routines are
exported from driver/dma/ioat/dma.[ch].
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Prepare the code for the conversion of the ioat2 linked-list-ring into a
native ring buffer. After this conversion ioat2 channels will share
less of the ioat1 infrastructure, but there will still be places where
sharing is possible. struct ioat_chan_common is created to house the
channel attributes that will remain common between ioat1 and ioat2
channels.
For every routine that accesses both common and hardware specific fields
the old unified 'ioat_chan' pointer is split into an 'ioat' and 'chan'
pointer. Where 'chan' references common fields and 'ioat' the
hardware/version specific.
[ Impact: pure structure member movement/variable renames, no logic changes ]
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If a callback is to be attached to a descriptor the channel needs to
know at ->prep time so it can set the interrupt enable bit. This is in
preparation for moving descriptor ioat2 descriptor preparation from
->submit to ->prep.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The async_tx api assumes that after a successful ->prep a subsequent
->submit will not fail due to a lack of resources.
This also fixes a bug in the allocation failure case. Previously the
descriptors allocated prior to the allocation failure would not be
returned to the free list.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This cleans up a mess of and'ing and or'ing bit definitions, and allows
simple assignments from the specified dma_ctrl_flags parameter.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
->dmacount tracks the sequence number of active descriptors. It is
written to the DMACOUNT register to update the channel's view of pending
descriptors in the chain. The register is 16-bits so ->dmacount should
be unsigned and 16-bit as well. Also modify ->desccount to maintain
alignment.
This was never a problem in practice because we never compared dmacount
values, but this is a bug waiting to happen.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Towards the removal of ioatdma_device.version split the initialization
path into distinct versions. This conversion:
1/ moves version specific probe code to version specific routines
2/ removes the need for ioat_device
3/ turns off the ioat1 msi quirk if the device is reinitialized for intx
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The only .c files that utilize these protected prototypes depend on
CONFIG_INTEL_IOATDMA=y, so there is no value gained in providing empty
prototypes.
[ Impact: pure cleanup ]
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* reduce device->common. to dma-> in ioat_dma_{probe,remove,selftest}
* ioat_lookup_chan_by_index to ioat_chan_by_index
* multi-line function definitions
* ioat_desc_sw.async_tx to ioat_desc_sw.txd
* desc->txd. to tx-> in cleanup routine
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The driver currently duplicates much of what these routines offer, so
just use the common code. For example ->irq_mode tracks what interrupt
mode was initialized, which duplicates the ->msix_enabled and
->msi_enabled handling in pcim_release.
This also adds a check to the return value of dma_async_device_register,
which can fail.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Some of these defines may be useful outside of dma.c and the header is
private so there are no namespace pollution concerns.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Even though the intent is to extend dmatest with P+Q tests there is
still value in having an always-on sanity check to prevent an
unintentionally broken driver from registering.
This depends on raid6_pq.ko for verification, the side effect being that
PQ capable channels will fail to register when raid6 is disabled.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
iop33x support is not included because that engine is a bit more awkward
to handle in that it can either be in xor mode or pq mode. The
dmaengine/async_tx layers currently only comprehend static capabilities.
Note iop13xx does not support hardware PQ continuation so the driver
must handle the DMA_PREP_CONTINUE flag for operations across > 16
sources. From the comment for dma_maxpq:
/* When an engine does not support native continuation we need 3 extra
* source slots to reuse P and Q with the following coefficients:
* 1/ {00} * P : remove P from Q', but use it as a source for P'
* 2/ {01} * Q : use Q to continue Q' calculation
* 3/ {00} * Q : subtract Q from P' to cancel (2)
*/
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
lockdep correctly identifies a potential recursive locking case for
iop_chan->lock, but in the dependency submission case we expect that the same
class will be acquired for both the parent dependency and the child channel.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Test raid6 p+q operations with a simple "always multiply by 1" q
calculation to fit into dmatest's current destination verification
scheme.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[ Based on an original patch by Yuri Tikhonov ]
This adds support for doing asynchronous GF multiplication by adding
two additional functions to the async_tx API:
async_gen_syndrome() does simultaneous XOR and Galois field
multiplication of sources.
async_syndrome_val() validates the given source buffers against known P
and Q values.
When a request is made to run async_pq against more than the hardware
maximum number of supported sources we need to reuse the previous
generated P and Q values as sources into the next operation. Care must
be taken to remove Q from P' and P from Q'. For example to perform a 5
source pq op with hardware that only supports 4 sources at a time the
following approach is taken:
p, q = PQ(src0, src1, src2, src3, COEF({01}, {02}, {04}, {08}))
p', q' = PQ(p, q, q, src4, COEF({00}, {01}, {00}, {10}))
p' = p + q + q + src4 = p + src4
q' = {00}*p + {01}*q + {00}*q + {10}*src4 = q + {10}*src4
Note: 4 is the minimum acceptable maxpq otherwise we punt to
synchronous-software path.
The DMA_PREP_CONTINUE flag indicates to the driver to reuse p and q as
sources (in the above manner) and fill the remaining slots up to maxpq
with the new sources/coefficients.
Note1: Some devices have native support for P+Q continuation and can skip
this extra work. Devices with this capability can advertise it with
dma_set_maxpq. It is up to each driver how to handle the
DMA_PREP_CONTINUE flag.
Note2: The api supports disabling the generation of P when generating Q,
this is ignored by the synchronous path but is implemented by some dma
devices to save unnecessary writes. In this case the continuation
algorithm is simplified to only reuse Q as a source.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We currently walk the parent chain when waiting for a given tx to
complete however this walk may race with the driver cleanup routine.
The routines in async_raid6_recov.c may fall back to the synchronous
path at any point so we need to be prepared to call async_tx_quiesce()
(which calls dma_wait_for_async_tx). To remove the ->parent walk we
guarantee that every time a dependency is attached ->issue_pending() is
invoked, then we can simply poll the initial descriptor until
completion.
This also allows for a lighter weight 'issue pending' implementation as
there is no longer a requirement to iterate through all the channels'
->issue_pending() routines as long as operations have been submitted in
an ordered chain. async_tx_issue_pending() is added for this case.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
dmaengine: at_hdmac: add DMA slave transfers
dmaengine: at_hdmac: new driver for the Atmel AHB DMA Controller
dmaengine: dmatest: correct thread_count while using multiple thread per channel
dmaengine: dmatest: add a maximum number of test iterations
drivers/dma: Remove unnecessary semicolons
drivers/dma/fsldma.c: Remove unnecessary semicolons
dmaengine: move HIGHMEM64G restriction to ASYNC_TX_DMA
fsldma: do not clear bandwidth control bits on the 83xx controller
fsldma: enable external start for the 83xx controller
fsldma: use PCI Read Multiple command
When first created the ioat driver was the only inhabitant of
drivers/dma/. Now, it is the only multi-file (more than a .c and a .h)
driver in the directory. Moving it to an ioat/ subdirectory allows the
naming convention to be cleaned up, and allows for future splitting of
the source files by hardware version (v1, v2, and v3).
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch for at_hdmac adds the slave transfers capability to the Atmel DMA
controller available on some AT91 SOCs. This allow peripheral to memory and
memory to peripheral transfers with hardware handshaking.
Slave structure for controller specific information is passed through channel
private data. This at_dma_slave structure is defined in at_hdmac.h header file
and relative hardware definition are moved to this file from at_hdmac_regs.h.
Doing this we allow the channel configuration from platform definition code.
This work is intensively based on dw_dmac and several slave implementations.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This AHB DMA Controller (aka HDMA or DMAC on AT91 systems) is availlable on
at91sam9rl chip. It will be used on other products in the future.
This first release covers only the memory-to-memory tranfer type. This is the
only tranfer type supported by this chip. On other products, it will be used
also for peripheral DMA transfer (slave API support to come).
I used dmatest client without problem in different configurations to test it.
Full documentation for this controller can be found in the SAM9RL datasheet:
http://www.atmel.com/dyn/products/product_card.asp?part_id=4243
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
It seems that thread_count is not properly calculated in dmatest.
In fact the thread count number that is returned from dmatest_add_threads() is
not correctly added to the thread_count and thus not properly printed.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The dmatest usually waits for the killing of its kthreads to stop
running tests. This patch adds a parameter that sets a maximum
number of test iterations.
This feature is quite interesting for debugging when you set a lot of
traces in your dmaengine controller driver.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch reworks platform driver power management code
for txx9dmac from legacy late/early callbacks to dev_pm_ops.
The callbacks are converted for CONFIG_SUSPEND like this:
suspend_late() -> suspend_noirq()
resume_early() -> resume_noirq()
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
This patch reworks platform driver power management code
for dw_dmac from legacy late/early callbacks to dev_pm_ops.
The callbacks are converted for CONFIG_SUSPEND like this:
suspend_late() -> suspend_noirq()
resume_early() -> resume_noirq()
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
On HIGHMEM64G systems dma_addr_t is known to be larger than (void *)
which precludes async_xor from performing dma address conversions by
reusing the input parameter address list. However, other parts of the
dmaengine infrastructure do not suffer this constraint, so the
HIGHMEM64G restriction can be down-levelled.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch does not change actual behaviour since dma_unmap_page is just
an alias of dma_unmap_single on MIPS.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch adds support for the integrated DMAC of the TXx9 family.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The 83xx controller does not support the external pause feature. The bit
in the mode register that controls external pause on the 85xx controller
happens to be part of the bandwidth control settings for the 83xx
controller.
This patch fixes the driver so that it only clears the external pause bit
if the hardware is the 85xx controller. When driving the 83xx controller,
the bit is left untouched. This follows the existing convention that mode
registers settings are not touched unless necessary.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The 83xx controller has external start capability, but lacks external pause
capability. Hook up the external start function pointer for the 83xx
controller.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
By default, the Freescale 83xx DMA controller uses the PCI Read Line
command when reading data over the PCI bus. Setting the controller to use
the PCI Read Multiple command instead allows the controller to read much
larger bursts of data, which provides a drastic speed increase.
The slowdown due to using PCI Read Line was only observed when a PCI-to-PCI
bridge was between the devices trying to communicate.
A simple test driver showed an increase from 4MB/sec to 116MB/sec when
performing DMA over the PCI bus. Using DMA to transfer between blocks of
local SDRAM showed no change in performance with this patch. The dmatest
driver was also used to verify the correctness of the transfers, and showed
no errors.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Acked-by: Timur Tabi <timur@freescale.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
async_xor() needs space to perform dma and page address conversions. In
most cases the code can simply reuse the struct page * array because the
size of the native pointer matches the size of a dma/page address. In
order to support archs where sizeof(dma_addr_t) is larger than
sizeof(struct page *), or to preserve the input parameters, we utilize a
memory region passed in by the caller.
Since the code is now prepared to handle the case where it cannot
perform address conversions on the stack, we no longer need the
!HIGHMEM64G dependency in drivers/dma/Kconfig.
[ Impact: don't clobber input buffers for address conversions ]
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Testing the i7300_idle driver on i5000-series hardware required
an edit to i7300_idle.h to "#define SUPPORT_I5000 1" and a re-build
of both i7300_idle and ioat_dma.
Replace that build-time scheme with a load-time module parameter:
"7300_idle.forceload=1" to make it easier to test the driver
on hardware that while not officially validated, works fine
and is much more commonly available.
By default (no modparam) the driver will continue to load
only on the i7300.
Note that ioat_dma runs a copy of i7300_idle's probe routine
to know to reserve an IOAT channel for i7300_idle.
This change makes ioat_dma do that always on the i5000,
just like it does on the i7300.
Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Andrew Henroid <andrew.d.henroid@intel.com>
We we build with dma_addr_t as a 64-bit quantity we get:
drivers/dma/fsldma.c: In function 'fsl_chan_xfer_ld_queue':
drivers/dma/fsldma.c:625: warning: cast to pointer from integer of different size
drivers/dma/fsldma.c: In function 'fsl_dma_chan_do_interrupt':
drivers/dma/fsldma.c:737: warning: cast to pointer from integer of different size
drivers/dma/fsldma.c:737: warning: cast to pointer from integer of different size
drivers/dma/fsldma.c: In function 'of_fsl_dma_probe':
drivers/dma/fsldma.c:927: warning: cast to pointer from integer of different
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When preparing a memcpy operation, if the kernel fails to allocate memory
for a link descriptor after the first link descriptor has already been
allocated, then some memory will never be released. Fix the problem by
walking the list of allocated descriptors backwards, and freeing the
allocated descriptors back into the DMA pool.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Li Yang <leoli@freescale.com>
On the 83xx controller, snooping is necessary for the DMA controller to
ensure cache coherence with the CPU when transferring to/from RAM.
The last descriptor in a chain will always have the End-of-Chain interrupt
bit set, so we can set the snoop bit while adding the End-of-Chain
interrupt bit.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Li Yang <leoli@freescale.com>
When creating a DMA transaction with multiple descriptors, the async_tx
cookie is set to 0 for each descriptor in the chain, excluding the last
descriptor, whose cookie is set to -EBUSY.
When fsl_dma_tx_submit() is run, it only assigns a cookie to the first
descriptor. All of the remaining descriptors keep their original value,
including the last descriptor, which is set to -EBUSY.
After the DMA completes, the driver will update the last completed cookie
to be -EBUSY, which is an error code instead of a valid cookie. This causes
dma_async_is_complete() to always return DMA_IN_PROGRESS.
This causes the fsldma driver to never cleanup the queue of link
descriptors, and the driver will re-run the DMA transaction on the hardware
each time it receives the End-of-Chain interrupt. This causes an infinite
loop.
With this patch, fsl_dma_tx_submit() is changed to assign a cookie to every
descriptor in the chain. The rest of the code then works without problems.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Li Yang <leoli@freescale.com>
When using the DMA controller from multiple threads at the same time, it is
possible to get lots of "DMA halt timeout!" errors printed to the kernel
log.
This occurs due to a race between fsl_dma_memcpy_issue_pending() and the
interrupt handler, fsl_dma_chan_do_interrupt(). Both call the
fsl_chan_xfer_ld_queue() function, which does not protect against
concurrent accesses to dma_halt() and dma_start().
The existing spinlock is moved to cover the dma_halt() and dma_start()
functions. Testing shows that the "DMA halt timeout!" errors disappear.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Li Yang <leoli@freescale.com>
Fix the check of potential array overflow when using corrupted channel
device tree nodes.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Li Yang <leoli@freescale.com>
This also fixes the case of a single queued buffer, for example, when taking a
single frame snapshot with the mx3_camera driver.
Reported-by: Agustin Ferrin Pozuelo <gatoguan-os@yahoo.com>
Tested-by: Agustin Ferrin Pozuelo <gatoguan-os@yahoo.com>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
as reported by Alexander Beregalov <a.beregalov@gmail.com>
ioatdma 0000:00:08.0: DMA-API: device driver frees DMA memory with
wrong function [device address=0x000000007f76f800] [size=2000 bytes]
[map
ped as single] [unmapped as page]
The ioatdma driver was unmapping all regions
(either allocated as page or single) using unmap_page.
This patch lets dma driver recognize if unmap_single or unmap_page should be used.
It introduces two new dma control flags:
DMA_COMPL_SRC_UNMAP_SINGLE and DMA_COMPL_DEST_UNMAP_SINGLE.
They should be set to indicate dma driver to do dma-unmapping as single
(first one for the source, tha latter for the destination).
If respective flag is not set, the driver assumes dma-unmapping as page.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
disable_irq() should wait for all running handlers to complete
before returning. As such, if it's used to disable an interrupt
from that interrupt's handler it will deadlock. This replaces
the dangerous instances with the _nosync() variant which doesn't
have this problem.
Note the 2 handlers in question are only used #ifdef DEBUG so
I imagine these code paths don't get hit often.
Signed-off-by: Ben Nizette <bn@niasdigital.com>
Acked-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The check for reaching max_channels is short circuited by 'continuing'
after successfully adding a channel.
[ Impact: make the 'max_channels' module parameter actually have an effect ]
Cc: <stable@kernel.org>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
'zero_sum' does not properly describe the operation of generating parity
and checking that it validates against an existing buffer. Change the
name of the operation to 'val' (for 'validate'). This is in
anticipation of the p+q case where it is a requirement to identify the
target parity buffers separately from the source buffers, because the
target parity buffers will not have corresponding pq coefficients.
Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace all DMA_64BIT_MASK macro with DMA_BIT_MASK(64)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
dma: Add SoF and EoF debugging to ipu_idmac.c, minor cleanup
dw_dmac: add cyclic API to DW DMA driver
dmaengine: Add privatecnt to revert DMA_PRIVATE property
dmatest: add dma interrupts and callbacks
dmatest: add xor test
dmaengine: allow dma support for async_tx to be toggled
async_tx: provide __async_inline for HAS_DMA=n archs
dmaengine: kill some unused headers
dmaengine: initialize tx_list in dma_async_tx_descriptor_init
dma: i.MX31 IPU DMA robustness improvements
dma: improve section assignment in i.MX31 IPU DMA driver
dma: ipu_idmac driver cosmetic clean-up
dmaengine: fail device registration if channel registration fails
Add Start-of-Frame and End-of-Frame debugging to ipu_idmac.c, in the
future it might also be needed for the actual video processing in
mx3-camera, at which point, the ISRs will have to be transferred to
mx3_camera.c, for which ipu_irq_map() and ipu_irq_unmap() functions will
have to be exported.
Also simplify a couple of pointer-dereferences.
Signed-off-by: Guennadi Liakhovetski <lg@denx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch adds a cyclic DMA interface to the DW DMA driver. This is
very useful if you want to use the DMA controller in combination with a
sound device which uses cyclic buffers.
Using a DMA channel for cyclic DMA will disable the possibility to use
it as a normal DMA engine until the user calls the cyclic free function
on the DMA channel. Also a cyclic DMA list can not be prepared if the
channel is already active.
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Currently dma_request_channel() set DMA_PRIVATE capability but never
clear it. So if a public channel was once grabbed by
dma_request_channel(), the device stay PRIVATE forever. Add
privatecnt member to dma_device to correctly revert it.
[lg@denx.de: fix bad usage of 'chan' in dma_async_device_register]
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Use the callback infrastructure to report driver/hardware hangs or
missed interrupts. Since this makes the test threads much more
aggressive (from: explicit 1ms sleep to: wait_for_completion) we set the
nice value to 10 so as to not swamp legitimate tasks.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Centralize this common initialization (and one case where ipu_idmac is
duplicating ->chan initialization).
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add DMA error handling to the ISR, move common code fragments to functions, fix
scatter-gather element queuing in the ISR, survive channel freeing and
re-allocation in a quick succession.
Signed-off-by: Guennadi Liakhovetski <lg@denx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The i.MX31 IPU DMA driver is a platform driver, but doesn't need hotplug, so we
can use __init and __exit function attributes.
Signed-off-by: Guennadi Liakhovetski <lg@denx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Atsushi points out:
"If alloc_percpu or kzalloc failed, chan_id does not match with its
position in device->channels list.
And above "continue" looks buggy anyway. Keeping incomplete channels
in device->channels list looks very dangerous..."
Also, fix up leakage of idr_ref in the idr_pre_get() and channel init
fail cases.
Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch adds clkdev support for i.MX31. This is done in a
similar way done previously for i.MX27
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
dmatest: fix use after free in dmatest_exit
ipu_idmac: fix spinlock type
iop-adma, mv_xor: fix mem leak on self-test setup failure
fsldma: fix off by one in dma_halt
I/OAT: fail self-test if callback test reaches timeout
I/OAT: update driver version and copyright dates
I/OAT: list usage cleanup
I/OAT: set tcp_dma_copybreak to 256k for I/OAT ver.3
I/OAT: cancel watchdog before dma remove
I/OAT: fail initialization on zero channels detection
I/OAT: do not set DCACTRL_CMPL_WRITE_ENABLE for I/OAT ver.3
I/OAT: add verification for proper APICID_TAG_MAP setting by BIOS
dmaengine: update kerneldoc
dmatest_cleanup_chanel will free dtc, so grab ->chan before it goes away
and use it to do the release.
Reported-by: Thierry Reding <thierry.reding@avionic-design.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
fix a probably accidently dropped reference operator while calling
spin_unlock_restore to an ipu lock.
Signed-off-by: Luotao Fu <l.fu@pengutronix.de>
Cc: Guennadi Liakhovetski <lg@denx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
iop_adma_zero_sum_self_test has the brackets in the wrong place for the
setup failure deallocation path. This error was duplicated in
mv_xor_xor_self_test.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Prevent dev_err from firing even if we successfully detected 'dma-idle'
before the full 1ms timeout has elapsed.
Acked-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
If we miss interrupts in the self test then fail registration of this
channel as it is unsuitable for use as a public channel.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Together with new fixes update driver version
and extend copyright dates ranges.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Trivial cleanup, list_del(); list_add_tail() is equivalent
to list_move_tail(). Semantic patch for coccinelle can be
found at www.cccmz.de/~snakebyte/list_move_tail.spatch
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Upcoming server platforms from Intel based on the Nehalem performance
have significantly improved CPU based copy performance.
However, the DMA engine can still be effective at higher I/O sizes
for TCP traffic and at this time copybreak
should be set to 256k for TCP traffic only.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Channel watchdog should be canceled before the rest of dma remove stuff.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
On some systems with I/OAT ver.2 when DCA is disabled in BIOS
situations have been observed
that zero DMA channels are detected instead of four.
To avoid kernel panic driver should fail gracefully with appropriate message.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Flag DCACTRL_CMPL_WRITE_ENABLE is valid only for I/OAT ver.2
so it should not be set for I/OAT ver.3.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
BIOS versions for systems with I/OAT ver.2 have been found
which fail to program APICID_TAG_MAP for DCA.
The ioatdma driver should recognize incorrectly set APICID_TAG_MAP
and disable DCA in that case.
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>